FIFTH GENERATION
COMPUTER SYSTEMS 1992
Edited by
Institute for New Generation
Computer Technology (ICOT)
Volume 2

Ohmsha, Ltd.

IOS Press

FIFTH GENERATION COMPUTER SYSTEMS 1992
Copyright

©

1992 by Institute for New Generation Computer Technology

All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system or transmitted, in any form or by any means, electronic, mechanical, recording or
otherwise, without the prior permission of the copyright owner.
ISBN 4-274-07724-1 (Ohmsha)
ISBN 90-5199-099-5 (IOS Press)
Library of Congress Catalog Card Number: 92-073166

Published and distributed in Japan by
Ohmsha, Ltd.
3-1 Kanda Nishiki-cho, Chiyoda-ku, Tokyo 101, Japan

Distributed in North America by
IOS Press, Inc.
Postal Drawer 10558, Burke, VA 22009-0558, U.S.A.

Distributed in the United Kingdom by
IOS Press
73 Lime Walk, Headington, Oxford OX3 7AD, England

Distributed in Europe and the rest of the world by
IOS Press
Van Diemenstraat 94, 1013 CN Amsterdam, Netherlands

Distributed in the Far East jointly by
Ohmsha, Ltd. and IOS Press

Printed in Japan


CONTENTS OF VOLUME 1

PLENARY SESSIONS

Keynote Speech
Launching the New Era
  Kazuhiro Fuchi ... 3

General Report on ICOT Research and Development
Overview of the Ten Years of the FGCS Project
  Takashi Kurozumi ... 9
Summary of Basic Research Activities of the FGCS Project
  Koichi Furukawa ... 20
Summary of the Parallel Inference Machine and its Basic Software
  Shunichi Uchida ... 33

Report on ICOT Research Results
Parallel Inference Machine PIM
  Kazuo Taki ... 50
Operating System PIMOS and Kernel Language KL1
  Takashi Chikayama ... 73
Towards an Integrated Knowledge-Base Management System: Overview of R&D on Databases and Knowledge-Bases in the FGCS Project
  Kazumasa Yokota and Hideki Yasukawa ... 89
Constraint Logic Programming System: CAL, GDCC and Their Constraint Solvers
  Akira Aiba and Ryuzo Hasegawa ... 113
Parallel Theorem Provers and Their Applications
  Ryuzo Hasegawa and Masayuki Fujita ... 132
Natural Language Processing Software
  Yuichi Tanaka ... 155
Experimental Parallel Inference Software
  Katsumi Nitta, Kazuo Taki and Nobuyuki Ichiyoshi ... 166

Invited Lectures
Formalism vs. Conceptualism: Interfaces between Classical Software Development Techniques and Knowledge Engineering
  Dines Bjørner ... 191
The Role of Logic in Computer Science and Artificial Intelligence
  J. A. Robinson ... 199
Programs are Predicates
  C. A. R. Hoare ... 211

Panel Discussion: A Springboard for Information Processing in the 21st Century
PANEL: A Springboard for Information Processing in the 21st Century
  Robert A. Kowalski (Chairman) ... 219
Finding the Best Route for Logic Programming
  Herve Gallaire ... 220
The Role of Logic Programming in the 21st Century
  Ross Overbeek ... 223
Object-Based Versus Logic Programming
  Peter Wegner ... 225
Concurrent Logic Programming as a Basis for Large-Scale Knowledge Information Processing
  Koichi Furukawa ... 230
Knowledge Information Processing in the 21st Century
  Shunichi Uchida ... 232

ICOT SESSIONS

Parallel VLSI-CAD and KBM Systems
LSI-CAD Programs on Parallel Inference Machine
  Hiroshi Date, Yukinori Matsumoto, Kouichi Kimura, Kazuo Taki, Hiroo Kato and Masahiro Hoshi ... 237
Parallel Database Management System: Kappa-P
  Moto Kawamura, Hiroyuki Sato, Kazutomo Naganuma and Kazumasa Yokota ... 248
Objects, Properties, and Modules in QUIXOTE
  Hideki Yasukawa, Hiroshi Tsuda and Kazumasa Yokota ... 257

Parallel Operating System, PIMOS
Resource Management Mechanism of PIMOS
  Hiroshi Yashiro, Tetsuro Fujise, Takashi Chikayama, Masahiro Matsuo, Atsushi Hori and Kumiko Wada ... 269
The Design of the PIMOS File System
  Fumihide Itoh, Takashi Chikayama, Takeshi Mori, Masaki Sato, Tatsuo Kato and Tadashi Sato ... 278
ParaGraph: A Graphical Tuning Tool for Multiprocessor Systems
  Seiichi Aikawa, Mayumi Kamiko, Hideyuki Kubo, Fumiko Matsuzawa and Takashi Chikayama ... 286

Genetic Information Processing
Protein Sequence Analysis by Parallel Inference Machine
  Masato Ishikawa, Masaki Hoshida, Makoto Hirosawa, Tomoyuki Toya, Kentaro Onizuka and Katsumi Nitta ... 294
Folding Simulation using Temperature Parallel Simulated Annealing
  Makoto Hirosawa, Richard J. Feldmann, David Rawn, Masato Ishikawa, Masaki Hoshida and George Michaels ... 300
Toward a Human Genome Encyclopedia
  Kaoru Yoshida, Cassandra Smith, Toni Kazic, George Michaels, Ron Taylor, David Zawada, Ray Hagstrom and Ross Overbeek ... 307
Integrated System for Protein Information Processing
  Hidetoshi Tanaka ... 321

Constraint Logic Programming and Parallel Theorem Proving
Parallel Constraint Logic Programming Language GDCC and its Parallel Constraint Solvers
  Satoshi Terasaki, David J. Hawley, Hiroyuki Sawada, Ken Satoh, Satoshi Menju, Taro Kawagishi, Noboru Iwayama and Akira Aiba ... 330
cu-Prolog for Constraint-Based Grammar
  Hiroshi Tsuda ... 347
Model Generation Theorem Provers on a Parallel Inference Machine
  Masayuki Fujita, Ryuzo Hasegawa, Miyuki Koshimura and Hiroshi Fujita ... 357

Natural Language Processing
On a Grammar Formalism, Knowledge Bases and Tools for Natural Language Processing in Logic Programming
  Hiroshi Sano and Fumiyo Fukumoto ... 376
Argument Text Generation System (Dulcinea)
  Teruo Ikeda, Akira Kotani, Kaoru Hagiwara and Yukihiro Kubo ... 385
Situated Inference of Temporal Information
  Satoshi Tojo and Hideki Yasukawa ... 395
A Parallel Cooperation Model for Natural Language Processing
  Shigeichiro Yamasaki, Michiko Turuta, Ikuko Nagasawa and Kenji Sugiyama ... 405

Parallel Inference Machine (PIM)
Architecture and Implementation of PIM/p
  Kouichi Kumon, Akira Asato, Susumu Arai, Tsuyoshi Shinogi, Akira Hattori, Hiroyoshi Hatazawa and Kiyoshi Hirano ... 414
Architecture and Implementation of PIM/m
  Hiroshi Nakashima, Katsuto Nakajima, Seiichi Kondo, Yasutaka Takeda, Yu Inamura, Satoshi Onishi and Kanae Masuda ... 425
Parallel and Distributed Implementation of Concurrent Logic Programming Language KL1
  Keiji Hirata, Reki Yamamoto, Akira Imai, Hideo Kawai, Kiyoshi Hirano, Tsuneyoshi Takagi, Kazuo Taki, Akihiko Nakase and Kazuaki Rokusawa ... 436

Author Index ... i

CONTENTS OF VOLUME 2

FOUNDATIONS

Reasoning about Programs
Logic Program Synthesis from First Order Logic Specifications
  Tadashi Kawamura ... 463
Sound and Complete Partial Deduction with Unfolding Based on Well-Founded Measures
  Bern Martens, Danny De Schreye and Maurice Bruynooghe ... 473
A Framework for Analyzing the Termination of Definite Logic Programs with respect to Call Patterns
  Danny De Schreye, Kristof Verschaetse and Maurice Bruynooghe ... 481
Automatic Verification of GHC-Programs: Termination
  Lutz Plümer ... 489

Analogy
Analogical Generalization
  Takenao Ohkawa, Toshiaki Mori, Noboru Babaguchi and Yoshikazu Tezuka ... 497
Logical Structure of Analogy: Preliminary Report
  Jun Arima ... 505

Abduction (1)
Consistency-Based and Abductive Diagnoses as Generalised Stable Models
  Chris Preist and Kave Eshghi ... 514
A Forward-Chaining Hypothetical Reasoner Based on Upside-Down Meta-Interpretation
  Yoshihiko Ohta and Katsumi Inoue ... 522
Logic Programming, Abduction and Probability
  David Poole ... 530

Abduction (2)
Abduction in Logic Programming with Equality
  P. T. Cox, E. Knill and T. Pietrzykowski ... 539
Hypothetico-Deductive Reasoning
  Chris Evans and Antonios C. Kakas ... 546
Acyclic Disjunctive Logic Programs with Abductive Procedures as Proof Procedure
  Phan Minh Dung ... 555

Semantics of Logic Programs
Adding Closed World Assumptions to Well Founded Semantics
  Luis Moniz Pereira, Jose J. Alferes and Joaquim N. Aparicio ... 562
Contributions to the Semantics of Open Logic Programs
  A. Bossi, M. Gabbrielli, G. Levi and M. C. Meo ... 570
A Generalized Semantics for Constraint Logic Programs
  Roberto Giacobazzi, Saumya K. Debray and Giorgio Levi ... 581
Extended Well-Founded Semantics for Paraconsistent Logic Programs
  Chiaki Sakama ... 592

Invited Paper
Formalizing Database Evolution in the Situation Calculus
  Raymond Reiter ... 600

Machine Learning
Learning Missing Clauses by Inverse Resolution
  Peter Idestam-Almquist ... 610
A Machine Discovery from Amino Acid Sequences by Decision Trees over Regular Patterns
  Setsuo Arikawa, Satoru Kuhara, Satoru Miyano, Yasuhito Mukouchi, Ayumi Shinohara and Takeshi Shinohara ... 618
Efficient Induction of Version Spaces through Constrained Language Shift
  Claudio Carpineto ... 626

Theorem Proving
Theorem Proving Engine and Strategy Description Language
  Massimo Bruschi ... 634
A New Algorithm for Subsumption Test
  Byeong Man Kim, Sang Ho Lee, Seung Ryoul Maeng and Jung Wan Cho ... 643
On the Duality of Abduction and Model Generation
  Marc Denecker and Danny De Schreye ... 650

Functional Programming and Constructive Logic
Defining Concurrent Processes Constructively
  Yukihide Takayama ... 658
Realizability Interpretation of Coinductive Definitions and Program Synthesis with Streams
  Makoto Tatsuta ... 666
MLOG: A Strongly Typed Confluent Functional Language with Logical Variables
  Vincent Poirriez ... 674
A New Perspective on Integrating Functional and Logic Languages
  John Darlington, Yi-ke Guo and Helen Pull ... 682

Temporal Reasoning
A Mechanism for Reasoning about Time and Belief
  Hideki Isozaki and Yoav Shoham ... 694
Dealing with Time Granularity in the Event Calculus
  Angelo Montanari, Enrico Maim, Emanuele Ciapessoni and Elena Ratto ... 702

ARCHITECTURES & SOFTWARE

Hardware Architecture and Evaluation
UNIRED II: The High Performance Inference Processor for the Parallel Inference Machine PIE64
  Kentaro Shimada, Hanpei Koike and Hidehiko Tanaka ... 715
Hardware Implementation of Dynamic Load Balancing in the Parallel Inference Machine PIM/c
  T. Nakagawa, N. Ido, T. Tarui, M. Asaie and M. Sugie ... 723
Evaluation of the EM-4 Highly Parallel Computer using a Game Tree Searching Problem
  Yuetsu Kodama, Shuichi Sakai and Yoshinori Yamaguchi ... 731
OR-Parallel Speedups in a Knowledge Based System: on Muse and Aurora
  Khayri A. M. Ali and Roland Karlsson ... 739

Invited Paper
A Universal Parallel Computer Architecture
  William J. Dally ... 746

AND-Parallelism and OR-Parallelism
An Automatic Translation Scheme from Prolog to the Andorra Kernel Language
  Francisco Bueno and Manuel Hermenegildo ... 759
Recomputation based Implementations of And-Or Parallel Prolog
  Gopal Gupta and Manuel V. Hermenegildo ... 770
Estimating the Inherent Parallelism in Prolog Programs
  David C. Sehr and Laxmikant V. Kale ... 783

Implementation Techniques
Implementing Streams on Parallel Machines with Distributed Memory
  Koichi Konishi, Tsutomu Maruyama, Akihiko Konagaya, Kaoru Yoshida and Takashi Chikayama ... 791
Message-Oriented Parallel Implementation of Moded Flat GHC
  Kazunori Ueda and Masao Morita ... 799
Towards an Efficient Compile-Time Granularity Analysis Algorithm
  X. Zhong, E. Tick, S. Duvvuru, L. Hansen, A. V. S. Sastry and R. Sundararajan ... 809
Providing Iteration and Concurrency in Logic Programs through Bounded Quantifications
  Jonas Barklund and Håkan Millroth ... 817

Extension of Logic Programming
An Implementation for a Higher Level Logic Programming Language
  Anthony S. K. Cheng and Ross A. Paterson ... 825
Implementing Prolog Extensions: a Parallel Inference Machine
  Jean-Marc Alliot, Andreas Herzig and Mamede Lima-Marques ... 833
Parallel Constraint Solving in Andorra-I
  Steve Gregory and Rong Yang ... 843
A Parallel Execution of Functional Logic Language with Lazy Evaluation
  Jong H. Nang, D. W. Shin, S. R. Maeng and Jung W. Cho ... 851

Task Scheduling and Load Analysis
Self-Organizing Task Scheduling for Parallel Execution of Logic Programs
  Zheng Lin ... 859
Asymptotic Load Balance of Distributed Hash Tables
  Nobuyuki Ichiyoshi and Kouichi Kimura ... 869

Concurrency
Constructing and Collapsing a Reflective Tower in Reflective Guarded Horn Clauses
  Jiro Tanaka and Fumio Matono ... 877
CHARM: Concurrency and Hiding in an Abstract Rewriting Machine
  Andrea Corradini, Ugo Montanari and Francesca Rossi ... 887
Less Abstract Semantics for Abstract Interpretation of FGHC Programs
  Kenji Horiuchi ... 897

Databases and Distributed Systems
Parallel Optimization and Execution of Large Join Queries
  Eileen Tien Lin, Edward Omiecinski and Sudhakar Yalamanchili ... 907
Towards an Efficient Evaluation of Recursive Aggregates in Deductive Databases
  Alexandre Lefebvre ... 915
A Distributed Programming Environment based on Logic Tuple Spaces
  Paolo Ciancarini and David Gelernter ... 926

Programming Environment
Visualizing Parallel Logic Programs with VISTA
  E. Tick ... 934
Concurrent Constraint Programs to Parse and Animate Pictures of Concurrent Constraint Programs
  Kenneth M. Kahn ... 943
Logic Programs with Inheritance
  Yaron Goldberg, William Silverman and Ehud Shapiro ... 951
Implementing a Process Oriented Debugger with Reflection and Program Transformation
  Munenori Maeda ... 961

Production Systems
A New Parallelization Method for Production Systems
  E. Bahr, F. Barachini and H. Mistelberger ... 969
Performance Evaluation of the Multiple Root Node Approach to the Rete Pattern Matcher for Production Systems
  Andrew Sohn and Jean-Luc Gaudiot ... 977

APPLICATIONS & SOCIAL IMPACTS

Constraint Logic Programming
Output in CLP(R)
  Joxan Jaffar, Michael J. Maher, Peter J. Stuckey and Roland H. C. Yap ... 987
Adapting CLP(R) to Floating-Point Arithmetic
  J. H. M. Lee and M. H. van Emden ... 996
Domain Independent Propagation
  Thierry Le Provost and Mark Wallace ... 1004
A Feature-Based Constraint System for Logic Programming with Entailment
  Hassan Aït-Kaci, Andreas Podelski and Gert Smolka ... 1012

Qualitative Reasoning
Range Determination of Design Parameters by Qualitative Reasoning and its Application to Electronic Circuits
  Masaru Ohki, Eiji Oohira, Hiroshi Shinjo and Masahiro Abe ... 1022
Logical Implementation of Dynamical Models
  Yoshiteru Ishida ... 1030

Knowledge Representation
The CLASSIC Knowledge Representation System or, KL-ONE: The Next Generation
  Ronald J. Brachman, Alexander Borgida, Deborah L. McGuinness, Peter F. Patel-Schneider and Lori Alperin Resnick ... 1036
Morphe: A Constraint-Based Object-Oriented Language Supporting Situated Knowledge
  Shigeru Watari, Yasuaki Honda and Mario Tokoro ... 1044
On the Evolution of Objects in a Logic Programming Framework
  F. Nihan Kesim and Marek Sergot ... 1052

Panel Discussion: Future Direction of Next Generation Applications
The Panel on a Future Direction of New Generation Applications
  Fumio Mizoguchi ... 1061
Knowledge Representation Theory Meets Reality: Some Brief Lessons from the CLASSIC Experience
  Ronald J. Brachman ... 1063
Reasoning with Constraints
  Catherine Lassez ... 1066
Developments in Inductive Logic Programming
  Stephen Muggleton ... 1071
Towards the General-Purpose Parallel Processing System
  Kazuo Taki ... 1074

Knowledge-Based Systems
A Hybrid Reasoning System for Explaining Mistakes in Chinese Writing
  Jacqueline Castaing ... 1076
Automatic Generation of a Domain Specific Inference Program for Building a Knowledge Processing System
  Takayasu Kasahara, Naoyuki Yamada, Yasuhiro Kobayashi, Katsuyuki Yoshino and Kikuo Yoshimura ... 1084
Knowledge-Based Functional Testing for Large Software Systems
  Uwe Nonnenmann and John K. Eddy ... 1091
A Diagnostic and Control Expert System Based on a Plant Model
  Junzo Suzuki, Chiho Konuma, Mikito Iwamasa, Naomichi Sueda, Shigeru Mochiji and Akimoto Kamiya ... 1099

Legal Reasoning
A Semiformal Metatheory for Fragmentary and Multilayered Knowledge as an Interactive Metalogic Program
  Andreas Hamfelt and Åke Hansson ... 1107
HELIC-II: A Legal Reasoning System on the Parallel Inference Machine
  Katsumi Nitta, Yoshihisa Ohtake, Shigeru Maeda, Masayuki Ono, Hiroshi Ohsaki and Kiyokazu Sakane ... 1115

Natural Language Processing
Chart Parsers as Proof Procedures for Fixed-Mode Logic Programs
  David A. Rosenblueth ... 1125
A Discourse Structure Analyzer for Japanese Text
  K. Sumita, K. Ono, T. Chino, T. Ukita and S. Amano ... 1133
Dynamics of Symbol Systems: An Integrated Architecture of Cognition
  Kôiti Hasida ... 1141

Knowledge Support Systems
Mental Ergonomics as Basis for New-Generation Computer Systems
  M. H. van Emden ... 1149
An Integrated Knowledge Support System
  B. R. Gaines, M. Linster and M. L. G. Shaw ... 1157
Modeling the Generational Infrastructure of Information Technology
  B. R. Gaines ... 1165

Parallel Applications
Co-HLEX: Co-operative Recursive LSI Layout Problem Solver on Japan's Fifth Generation Parallel Inference Machine
  Toshinori Watanabe and Keiko Komatsu ... 1173
A Cooperative Logic Design Expert System on a Multiprocessor
  Yoriko Minoda, Shuho Sawada, Yuka Takizawa, Fumihiro Maruyama and Nobuaki Kawato ... 1181
A Parallel Inductive Learning Algorithm for Adaptive Diagnosis
  Yoichiro Nakakuki, Yoshiyuki Koseki and Midori Tanaka ... 1190
Parallel Logic Simulator based on Time Warp and its Evaluation
  Yukinori Matsumoto and Kazuo Taki ... 1198

Invited Paper
Applications of Machine Learning: Towards Knowledge Synthesis
  Ivan Bratko ... 1207

Author Index ... i

FOUNDATIONS

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992


Logic Program Synthesis from First Order Logic
Specifications
Tadashi KAWAMURA
Institute for New Generation Computer Technology
1-4-28 Mita, Minato-ku, Tokyo 108, Japan
tkawamur@icot.or.jp

Abstract

In this paper, a logic program synthesis method from first
order logic specifications is described. The specifications
are described by Horn clauses extended by universally
quantified implicational formulae. Those formulae are
transformed into definite clause programs by meaning-preserving
unfold/fold transformation. We show some classes of first
order formulae which can be successfully transformed into
definite clauses automatically by unfold/fold transformation.

1  Introduction

Logic program synthesis based on unfold/fold transformation [1]
is a standard method and has been investigated by many
researchers [2, 3, 5, 6, 11, 12, 19]. As for the correctness of
unfold/fold rules in logic programming, Tamaki and Sato proposed
meaning-preserving unfold/fold rules for definite clause
programs [20]. Then, Kanamori and Horiuchi proposed unfold/fold
rules for a class of first order formulae [7]. Recently, Sato
proposed unfold/fold rules for full first order formulae [18].
In the studies of program synthesis, unfold/fold rules
are used to eliminate quantifiers by folding, to obtain definite
clause programs from first order formulae. However, in most
of those studies, unfold/fold rules were applied
nondeterministically, and general methods to derive
definite clauses were not known. Recently, Dayantis [3]
showed a deterministic method to derive logic programs
from a class of first order formulae. Sato and Tamaki [19]
also showed a deterministic method by incorporating the
concept of continuation.
This paper shows another characterization of classes of
first order formulae from which definite clause programs
can be derived automatically. Those formulae are described
by Horn clauses extended by universally quantified
implicational formulae. As for transformation rules,
Kanamori and Horiuchi's unfold/fold rules are adopted.
A synthesis procedure based on unfold/fold rules is given,
and with some syntactic restrictions, those formulae are
successfully transformed into equivalent definite clause
programs. This study is also an extension of those by
Pettorossi and Proietti [14, 15, 16] on logic program
transformations.
The rest of this paper is organized as follows. Section 2
describes unfold/fold rules and formalizes the synthesis
process. Section 3 describes a program synthesis procedure
and proves that definite clause programs can be successfully
derived from some classes of first order formulae using this
procedure. Section 4 discusses the relations to other works
and Section 5 gives a conclusion.
In the following, familiarity with the basic terminology
of logic programming is assumed [13]. As syntactical
variables, X, Y, Z, U, V are used for variables, A, B, H
for atoms and F, G for formulae, possibly with primes
and subscripts. In addition, θ is used for a substitution,
Fθ for the formula obtained from formula F by applying
substitution θ, X for a vector of variables and F_G[G'] for
the replacement of an occurrence of subformula G of formula
F with formula G'.

2  Unfold/Fold Transformation for Logic Program Synthesis

In this section, preliminary notions of our logic program
synthesis are shown.

2.1  Preliminaries

Preliminary notions are described first.
A formula is called an implicational goal when it is of
the form F1 → F2, where F1 and F2 are conjunctions of
atoms.

Definition 2.1 Definite Formula
Formula C is called a definite formula when C is of
the form
    A ← G1 ∧ G2 ∧ ... ∧ Gn   (n ≥ 0),
where Gi is a (possibly universally quantified) conjunction
of implicational goals for i = 1, 2, ..., n. A is called
the head of C, G1 ∧ G2 ∧ ... ∧ Gn is called the body of
C and each Gi is called a goal in the body of C.


Note that the notion of a definite formula is a restricted
form of that in [7].
A set of definite formulae is called a definite formula
program, while a set of definite clauses is called a definite
clause program. We may simply say programs instead of
definite formula (or clause) programs when it is obvious
to which we are referring.

Definition 2.2 Definition Formula
Let P be a definite formula program. A definite formula D
is called a definition formula for P when all the
predicates appearing in D's body are defined by definite
clauses in P and the predicate of D's head does not appear
in P. The predicate of D's head is called a new predicate,
while those defined by definite clauses in P are old
predicates. A set of formulae 𝒟 is called a definition
formula set for P when every element D of 𝒟 is a definition
formula for P and the predicate of D's head appears only
once in 𝒟.
Atoms with new predicates are called new atoms, while
those with old predicates are called old atoms.

2.2  Unfold/Fold Transformation

In this subsection, unfold/fold transformation rules are
shown following [7]. Below, we assume that the logical
constant true implicitly appears in the body of every unit
clause. Further, we assume that a goal is always deleted
from the body of a definite formula when it is the logical
constant true, and a definite formula is always deleted
when some goal in its body is the logical constant false.
Further, we introduce the reduction of implicational
goals with the logical constants true and false, such as
¬true ⇒ false, true ∧ F ⇒ F, and so on. (See [7] for
details.) Let G be an implicational goal. The reduced
form of G, denoted by G↓, is the normal form of G in the
above reduction system.
Variables not quantified in formula F are called global
variables of F. Atoms appearing positively (negatively)
in formula F are called positive (negative) atoms of F.

Definition 2.3 Positive Unfolding
Let Pi be a program, C be a definite formula in Pi,
G be a goal in the body of C and A be a positive old
atom of G containing no universally quantified variable.
Then, let G0 be G_A[false]↓ and C0 be the definite formula
obtained from C by replacing G with G0. Further,
let C1, C2, ..., Ck be all the definite clauses in Pi whose
heads are unifiable with A, say by mgu's θ1, θ2, ..., θk.
Let Gj be the reduced form of Gθj after replacing Aθj in
Gθj with the body of Cjθj, and C'j be the definite formula
obtained from Cθj by replacing Gθj in the body with Gj.
(New variables introduced from Cj are global variables
of Gj.) Then, Pi+1 = (Pi − {C}) ∪ {C0, C'1, C'2, ..., C'k}.
C0, C'1, C'2, ..., C'k are called the results of positive
unfolding C at A (or G).

Example 2.1 Let P be a definite clause program as follows:
C1 : list([]).
C2 : list([X|L]) ← list(L).
C3 : 0 < suc(Y).
C4 : suc(X) < suc(Y) ← X < Y.
C5 : member(U,[U|L]).
C6 : member(U,[V|L]) ← member(U,L).
Let C7 be a definition formula for P as follows:
C7 : less-than-all(X,L) ← list(L) ∧ ∀Y(member(Y,L) → X < Y)
Definition 2.8 Closed Program
A program P' obtained from P ∪ {C} (via a U-selection rule)
is called a closed program with respect to <P, C, 𝒟>
when every descendant formula C' of C in P' satisfies
one of the following:

(a) C' is a definite clause.
(b) There exists a goal G consisting of positive atoms
    only in the body of C' such that an old atom in G is
    not unifiable with the head of any definite clause in P'.
(c) By successively folding C' by clauses in {C} ∪ 𝒟, a
    definite clause can be obtained.
P ∪ {C} is said to be closed with respect to 𝒟 when there
exists a closed program with respect to <P, C, 𝒟> and
for every definition formula D in 𝒟 there exists a closed
program with respect to <P, D, 𝒟 ∪ {C}>.
Example 2.5 Let P and P3 be the programs in Example 2.2.
Then, P3 is closed w.r.t. <P, C7, ∅>. Further,
P ∪ {C7} is closed w.r.t. ∅.
The above framework is an extension of the one shown
in [8], and also a modification of the one Pettorossi and
Proietti proposed [14, 15, 16] in their studies of program
transformation.
Now, our problem can be formalized as follows: for a
given definite clause program P and definition formula
C for P, find a finite definition formula set 𝒟 for P such
that P ∪ {C} is closed with respect to 𝒟.

3  Some Classes of First Order Formulae from Which Logic Programs Can Be Derived

In this section, we specify some classes of first order
formulae from which definite clause programs can be derived
by unfold/fold transformation.

3.1  A Program Synthesis Procedure

In this subsection, we show a naive program synthesis
procedure. In the following, we borrow some notions
about programs from [15, 16]. We consider definite formula
(clause) programs with the predicate =, which has no
explicit definition in the programs. The predicate = is
called a base predicate, while other predicates are called
defined predicates. Atoms with base predicates are called
base atoms, while those with defined predicates are called
defined atoms. Transformation rules can be applied to
defined atoms only.
A formula containing base atoms can be reduced by
unifying the arguments of =. When a universally quantified
variable and a global variable are unified, the global
variable is substituted for the universal one. The above
reduction is called the reduction with respect to =. We
assume that no formulae are reduced w.r.t. = unless this
is explicitly mentioned.
Further, we assume that the following operations are
always applied implicitly to the results of positive or
negative unfolding. A goal G is said to be connected when
at most one universally quantified implicational goal G'
appears in G and each atom in G' shares universally
quantified variables with at least one other atom in G'.
Let C be a definite formula such that all the goals
in its body are connected. Let C' be one of the results of
positive or negative unfolding C at some goal. By logical
deduction, definite formulae C'1, C'2, ..., C'm (m ≥ 1) are
obtained from C' such that all the goals in the body of
each C'i are connected. (Note that when some goal G in the
body of C' is of the form F1 → F2 or F1 ∨ F2 and no
universally quantified variables appear in both F1 and F2,
C' can be split into two formulae by replacing G in C' with
¬F1 (or F1) and F2.)
Before showing our program synthesis procedure, a notion is defined.
Definition 3.1 Sound Unfolding
Suppose that positive or negative unfolding is applied
to a definite formula at atom A. Then, the application
of unfolding is said to be sound when no two distinct
universally quantified variables in A are unified when
reducing the result of unfolding with respect to =.
Some syntactic restrictions on programs ensure the
soundness of all possible applications of unfolding. In
fact, the restriction shown in [3] ensures the soundness.
However, in the following, we assume that every application of unfolding is sound, without giving any syntactic
restriction, for simplicity.
Now, we show our program synthesis procedure, which
is similar to partial evaluation procedures (cf. [9, 10]).
First, a procedure to synthesize new predicates is shown.
Procedure 3.1 Synthesis of New Predicates
Suppose that a definite formula program P and a definite
formula C in P of the form A ← G1 ∧ G2 ∧ ... ∧ Gn are
given. Let G'i be the reduced formula obtained from Gi
by removing all base atoms and by replacing the universally
quantified variables appearing in every base atom with
distinct fresh global variables if global variables are
substituted for them when reducing Gi w.r.t. =. Let Di
be of the form Hi ← G'i for i = 1, 2, ..., n, where Hi is
an atom whose predicate does not appear in P or Hj for
i ≠ j and whose arguments are all the global variables of C
appearing in Gi. Then, D1, D2, ..., Dn are returned.
Note that in Procedure 3.1, C can be folded by
D1, D2, ..., Dn after reducing it w.r.t. = when C is the
result of sound unfolding, and the result of the folding is
a definite clause.
Example 3.1 Let P be a program as follows.
C1 : all-less-than(L,M) ← list(L) ∧ list(M) ∧
       ∀U,V (member(U,L) ∧ member(V,M) → U < V).
C2 : member(U,[V|X]) ← U = V.
C3 : member(U,[V|X]) ← member(U,X).
The definition of '<' is given in Example 2.1. Suppose
that C1's body consists of only one goal. By applying
positive unfolding and negative unfolding to C1 successively,
the following formulae are obtained. (The reduction
w.r.t. = is done when no universally quantified variable
appears as an argument of =.)
C4 : all-less-than([],M) ← list(M).
C5 : all-less-than([X|L],M) ← (list(L) ∧ list(M)) ∧
       (list(L) ∧ list(M) ∧
        ∀U,V (U = X ∧ member(V,M) → U < V)) ∧
       (list(L) ∧ list(M) ∧
        ∀U,V (member(U,L) ∧ member(V,M) → U < V)).
Then, by Procedure 3.1, the following new predicates are
defined from C5.
D1 : new1(X,L,M) ← list(L) ∧ list(M) ∧
       ∀V (member(V,M) → X < V).
D2 : new2(L,M) ← list(L) ∧ list(M) ∧
       ∀U,V (member(U,L) ∧ member(V,M) → U < V).
Next, the whole procedure for program synthesis is
shown.
Procedure 3.2 A Program Synthesis Procedure
Suppose that a definite clause program P and a definition
formula C for P are given. Let 𝒟 be the set {C}.
(a) If there exist no unmarked formulae in 𝒟, then return
    P and stop.
(b) Select an unmarked definition formula D from 𝒟.
    Mark D 'selected.' Let P' be the set {D}.
(c) If there exist no formulae in P' which do not satisfy
    conditions (a) and (b) in Definition 2.8, then P :=
    P ∪ P' and go to (a).
(d) Select a definite formula C' from P'. Apply positive
    or negative unfolding to C'. Let C1, ..., Cn be the
    results. Remove C' from P'.
(e) Apply Procedure 3.1 to C1, ..., Cn. Let D1, ..., Dm
    be the outputs. Add Di to 𝒟 if it is not a definite
    clause and there exists no formula in 𝒟 which is
    identical to Di except for the predicate of the head.
    Fold C1, ..., Cn by the formulae in 𝒟 and add the
    results to P'.
(f) Go to (c).
Example 3.2 Consider the program in Example 3.1
again. We see that D2 is identical to C1 except for the
predicate of the head. C5 can be folded by D1 and C1
after reduction w.r.t. =. The result is as follows.
C6 : all-less-than([X|L],M) ← list(L) ∧ list(M) ∧
       new1(X,L,M) ∧ all-less-than(L,M).
Similar operations are applied to D1, and finally, the
following clauses are obtained.
D3 : new1(X,L,[]) ← list(L).
D4 : new1(X,L,[Y|M]) ← X < Y ∧ new1(X,L,M).
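Collecting C4, C6, D3 and D4 gives the whole synthesized program for
all-less-than. The following transcription into standard executable
Prolog syntax is only an illustrative sketch: hyphens in predicate
names are replaced by underscores, and integer elements with the
built-in '<' stand in for the suc-numerals and the '<' of Example 2.1.

    % Sketch: the program synthesized in Examples 3.1 and 3.2
    % (clauses C4, C6, D3, D4), transcribed into executable Prolog.
    list([]).
    list([_|L]) :- list(L).

    all_less_than([], M) :- list(M).
    all_less_than([X|L], M) :-
        list(L), list(M),
        new1(X, L, M),
        all_less_than(L, M).

    new1(_, L, []) :- list(L).
    new1(X, L, [Y|M]) :- X < Y, new1(X, L, M).

    % Example query:
    % ?- all_less_than([1,2], [3,4,5]).
    % true (every element of the first list is smaller than
    % every element of the second).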
Note that Procedure 3.2 does not necessarily derive
a definite clause program from a definite formula program.
For example, when the following program is given
as input, Procedure 3.2 does not halt.
C1 : p(X,Y) ← p(X,Z) ∧ p(Z,Y)
C2 : h(X,Y) ← ∀Z (p(X,Z) → p(Y,Z))

3.2  Classes of First Order Formulae

In this section, we show some classes of definite formula
programs which can be transformed into equivalent definite clause programs by Procedure 3.2.
Throughout this subsection, we assume that unfolding
is always applicable to every definite formula at an atom
when there exist definite clauses whose heads are unifiable with the atom. Note that the above assumption
does not always hold. This problem will be discussed
in 3.3.
After giving a notion, we show a theorem which is an
extension of the results shown in [15]. A simple expression is either a term or an atom.

Definition 3.2 Depth of Symbol in Simple Expression
Let X be a variable or a constant and E be a simple
expression in which X appears. The depth of X in E,
denoted by depth(X,E), is defined as follows.
(a) depth(X,X) = 1.
(b) depth(X,E) = max{depth(X,ti) | X appears in ti
    for i = 1, ..., n} + 1, if E is either f(t1, ..., tn) or
    p(t1, ..., tn), for any function symbol f or any
    predicate symbol p.
The depth of the deepest variable or constant in E is
denoted by maxdepth(E).
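Definition 3.2 can be read operationally. The following Prolog
sketch (not from the paper) computes depth/3 and maxdepth/2 for
ground simple expressions; it assumes that object-level variables
are represented by Prolog constants such as v1, v2, and it uses the
library predicates member/2, setof/3 and max_list/2 (as provided,
e.g., by SWI-Prolog).

    % depth(X, E, D): D is the depth of the variable-or-constant X in E.
    depth(X, X, 1).
    depth(X, E, D) :-
        compound(E),
        E =.. [_|Args],
        setof(Di, T^(member(T, Args), depth(X, T, Di)), Ds),
        max_list(Ds, M),
        D is M + 1.

    % maxdepth(E, D): D is the depth of the deepest variable or
    % constant occurring in E.
    maxdepth(E, 1) :- \+ compound(E).
    maxdepth(E, D) :-
        compound(E),
        E =.. [_|Args],
        setof(Di, T^(member(T, Args), maxdepth(T, Di)), Ds),
        max_list(Ds, M),
        D is M + 1.

    % ?- depth(v1, f(g(v1), v2), D).    % D = 3
    % ?- maxdepth(f(g(v1), v2), D).     % D = 3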
Theorem 3.1 Let P be a definite clause program. Suppose
that for any definition formula C for P, there exists
a U-selection rule R for P ∪ {C} rooted on C such that R
is defined for all descendant clauses of C in which at least
one defined atom appears. Suppose also that there exist
two positive integers H and W such that every descendant
clause C' of C in every program P' obtained from
P ∪ {C} via R satisfies the following two conditions.
(a) The depth of every term appearing in every goal in
    the body of C' is less than H.
(b) Let G1, G2, ..., Gn be the connected goals in the body
    of C'. Then, the number of atoms appearing in Gi is
    less than W, for i = 1, 2, ..., n.
Then, there exists a finite definition formula set 𝒟 for P
such that P ∪ {C} is closed with respect to 𝒟.
Proof. From hypothesis (a), only a finite number of distinct
atoms (modulo renaming of variables) can appear
in the goals of all the descendant formulae of C. Then,
apply Procedure 3.2 to P and C. Note that every goal in
the body of every descendant formula of C is connected.
Then, for every goal of every descendant formula of C,
the number of atoms appearing in the goal is less than
W, from hypothesis (b). Hence, only a finite number of
distinct goals can appear in all the descendant formulae
of C. Thus, we can obtain a finite definition formula
set 𝒟0 for P such that there exists a closed program P'
w.r.t. <P, C, 𝒟0>.
The above discussion holds for all the definition formulae
in 𝒟0, since those formulae are constructed from
bodies of the descendant formulae of C. Evidently, only
a finite number of distinct definition formulae can be
defined. Thus, there exists a finite definition formula
set 𝒟 for P such that P ∪ {C} is closed w.r.t. 𝒟.  □
Theorem 3.1 shows that Procedure 3.2 can derive a
definite clause program when neither (a) a term of infinite
depth nor (b) an infinite number of atoms in a connected
goal can appear during the transformation process. In the
following, we show some syntactic restrictions on programs
which satisfy the above conditions.
Proietti and Pettorossi showed some classes of definite
clause programs which satisfy the conditions in Theorem 3.1
in their studies of program transformation [15].
We show that some extensions of their results are applicable
to our problem.
The following definitions are according to [15]. The set
of variables occurring in a simple expression E is denoted
by vars(E).

Definition 3.3 Linear Term Formula and Program
A simple expression or a formula is said to be linear
when no variable appears in it more than once. A definite
formula (clause) is called a linear term formula (clause)
when every atom appearing in it is linear. A definite
formula (clause) program is called a linear term program
when it consists of linear term formulae (clauses) only.
A linear term formula (clause) is called a strongly linear
term formula (clause) when its body is linear. A definite
formula (clause) program is called a strongly linear term
program when it consists of strongly linear term formulae
(clauses) only.
Note that the following definite clause is not a linear
term clause:
member(X,[X|L]).
However, it is easy to obtain an equivalent linear term
clause as follows:
member(X,[Y|L]) ← X = Y.

Definition 3.4 A Relation ≤ between Linear Simple Expressions
Let E1 and E2 be linear simple expressions. When
depth(X,E1) ≤ depth(X,E2) holds for every variable X in
vars(E1) ∩ vars(E2), we write E1 ≤ E2. (Both E1 ≤ E2 and
E2 ≤ E1 hold when vars(E1) ∩ vars(E2) = ∅.)
Definition 3.5 Non-Ascending Formula and Program
Let C be a linear term formula and H be the head of
C. C is said to be non-ascending when A ≤ H holds
for every defined atom A appearing in the body of C. A
linear term program is said to be non-ascending when it
consists of non-ascending formulae only.
A definite formula (clause) is said to be strongly
non-ascending when it is a strongly linear term formula
(clause) and non-ascending. A definite formula (clause)
program is said to be strongly non-ascending when it
consists of strongly non-ascending formulae (clauses) only.
Definition 3.6 Synchronized Descent Rule
Let P be a linear term program, R be a U-selection
rule for P and C be any descendant formula of the root
formula for R. Let A1, A2, ..., An be all the atoms
appearing in the body of C. Then, R is called a
synchronized descent rule when
(a) R selects the application of positive or negative
    unfolding to C at Ai if and only if Aj ≤ Ai holds for
    j = 1, ..., n, and
(b) R is not defined for C, otherwise.
Note that synchronized descent rules are not necessarily
defined uniquely for given programs and definition formulae.
The following lemma is an extension of the one shown
in [15, 16].
Lemma 3.2 Let P be a non-ascending definite clause
program, C be a linear term definition formula for P, and
R be a synchronized descent rule rooted on C. Let P' be
a program obtained from P ∪ {C} via R. For each defined
atom A appearing in the body of every descendant clause
of C in P', the following holds:
    maxdepth(A) ≤
    max{maxdepth(B) | B is a defined atom in P ∪ {C}}
Proof. By induction on the number of applications of
unfolding.  □

Now we show some classes of definite formula programs
which satisfy the hypotheses of Theorem 3.1. In the
following, for simplicity, we deal with definition formulae
with only one universally quantified implicational goal
in the body. The results are easily extended to definite
formulae with a conjunction of universally quantified
implicational goals.
The following results are also extensions of those
shown in [15].
Theorem 3.3 Let P be a strongly non-ascending definite
clause program and C be a linear term definition formula
for P of the form H ← A1 ∧ ∀X(A2 → A3), such that the
following hold.
(a) For every clause D in P of the form HD ← B1 ∧ ... ∧
    Bn ∧ B'1 ∧ ... ∧ B'm, where B1, ..., Bn are defined
    atoms and B'1, ..., B'm are base atoms, the following
    hold.
    (a-1) Let tH be any argument of HD. For every argument
          ti of Bi, if tH contains a common variable with ti,
          then ti is a subterm of tH.
    (a-2) For every argument ti of Bi, if ti is a subterm
          of an argument tH of HD, then no other argument
          of Bi is a subterm of tH.
(b) There exist two arguments ti and si of some Ai
    (ti ≠ si, i = 1, 2 or 3) such that the following hold.
    (b-1) There exists an argument tj of Aj (i ≠ j) such that
          vars(Ai) ∩ vars(Aj) = vars(ti) ∩ vars(tj), and
          either ti is a subterm of tj, tj is a subterm of ti
          or vars(ti) ∩ vars(tj) = ∅.
    (b-2) There exists an argument sk of Ak (k ≠ i, j) such
          that the same relations as above hold for si and sk.
    (b-3) Aj contains no common variable with Ak.
Then, there exists a definition formula set 𝒟 for P such
that P ∪ {C} is closed with respect to 𝒟.
Proof. Note that there exists an atom A in the body of C
s.t. an argument of A is a maximal term in the body of
C w.r.t. the subterm ordering relation. Let C' be any result
of unfolding C at A and G be any connected goal in the
body of C' of the form F1 ∧ ∀X(F2 → F3), where Fi is a
conjunction of atoms. Then, from the hypothesis, it can
be shown that a similar property to hypothesis (b) holds
for G. Note that the number of implicational goals does
not increase by applying positive unfolding and no global
variables are instantiated by applying negative unfolding.
Then, again there exists an atom in the body of C' s.t.
one of its arguments is a maximal term in the body of
C' w.r.t. the subterm ordering relation. By induction on
the number of applications of unfolding, a synchronized
descent rule can be defined for every descendant formula
of C. Then, from Lemma 3.2, the depth of every term
appearing in every descendant clause of C is bounded.
Note that the number of different subterms of a term
is bounded. Then, from the hypothesis, the number of
atoms appearing in every connected goal in the body of
every descendant formula of C is bounded. Thus, P and
C satisfy the hypotheses of Theorem 3.1. Hence, there
exists a definition formula set 𝒟 for P such that P ∪ {C}
is closed with respect to 𝒟.  □

Note that Theorem 3.3 holds for any nondeterministic
choice of synchronized descent rules in the above proof.
Note also that any program can be modified to satisfy
hypothesis (a) of Theorem 3.3 by introducing atoms with
= in the body.
Corollary 3.4 Let P be a strongly non-ascending definite
clause program and P' be a definite clause program
such that no predicate appears in both P and P'. Let
C be a linear term definition formula for P ∪ P' of the
form H ← A1 ∧ ∀X(A2 → A3), where the predicates of
A1 and A2 are defined in P and that of A3 is defined in
P'. Suppose that the following hold.
(a) Hypothesis (a) of Theorem 3.3 holds for every clause
    D in P.
(b) There exist arguments t1 of A1 and t2 of A2 such
    that the following hold.
    (b-1) vars(A1) ∩ vars(A2) = vars(t1) ∩ vars(t2).
    (b-2) Either t1 is a subterm of t2, t2 is a subterm of t1
          or vars(t1) ∩ vars(t2) = ∅.
(c) No variable in A3 is instantiated by applying positive
    or negative unfolding to C successively.
Then, there exists a definition formula set 𝒟 for P ∪ P'
such that P ∪ P' ∪ {C} is closed with respect to 𝒟.
Proof. Suppose that unfolding is never applied at A3. A
synchronized descent rule can be defined by neglecting
A3. Since variables in A3 are never instantiated, no other
atoms are derived from A3. Thus, the corollary holds.  □

In Corollary 3.4, no restrictions are required on the
definition of A3. This result corresponds to that in [3].
Note that any program can be modified to satisfy
hypothesis (c) of Corollary 3.4 by introducing atoms with
= in the body.
Example 3.3 The program and the definition formula
in Example 2.1 satisfy the hypotheses of Theorem 3.3 and
Corollary 3.4, if clause C5 is replaced with the equivalent
clause:
C'5 : member(U,[V|L]) ← U = V.
In fact, a definite clause program can be obtained, as
shown in subsection 2.2.
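The derivation referred to here lies on a page that is not part of
this extract. As a purely illustrative sketch under that assumption,
a definite clause program equivalent to C7 could take the following
shape in executable Prolog, with less_than_all standing for
less-than-all and a user-defined lt/2 standing for the '<' of
Example 2.1; this is not the paper's own derivation.

    % lt/2: the suc-numeral ordering of Example 2.1 (clauses C3, C4).
    lt(0, suc(_)).
    lt(suc(X), suc(Y)) :- lt(X, Y).

    % Hypothetical synthesized form of less-than-all (C7).
    less_than_all(_, []).
    less_than_all(X, [Y|L]) :- lt(X, Y), less_than_all(X, L).

    % ?- less_than_all(0, [suc(0), suc(suc(0))]).
    % true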
Next, we show an extension of the results shown in
Theorem 3.3. Let P be a non-ascending definite clause
program and C be a definition formula for P of the form
H ← A ∧ ∀X(F1 → F2), where A is an atom, and F1 and
F2 are conjunctions of atoms. Let Di be the definition
clause for P of the form Hi ← Fi for i = 1, 2. If each Di
can be transformed into a set of definite clauses which
satisfies the hypotheses of Theorem 3.3, then, by replacing
Fi with Hi, we can show that P ∪ {C} can be transformed
into an equivalent definite clause program.
The above problem is related to the foldability problem
in [16]. The foldability problem is described informally
as follows. Let P be a definite clause program and
C be a definition clause for P. Then, find a program P'
obtained from P ∪ {C} which satisfies the following: for
every descendant clause C' of C in P', there exists an
ancestor clause D of C' such that C''s body is an instance
of D's.
Proietti and Pettorossi showed some classes of definite
clause programs such that the foldability problem can be
solved [16]. We show that their results are also applicable
to our problem.
A definite clause program P is said to be linear recursive when at most one defined atom appears in the body
of each clause in P. Note that a linear recursive and
linear term program (clause) is a strongly linear term
program (clause).
Lemma 3.5 Let P be a linear recursive non-ascending
program and C be a non-ascending definition clause for
P of the form H ← A1 ∧ A2 ∧ B1 ∧ ... ∧ Bn, where A1
and A2 are defined atoms and B1, ..., Bn are base atoms.
Suppose that the following hold.
(a) For every clause D in P of the form HD ← AD ∧
    B'1 ∧ ... ∧ B'm, where AD is the only defined atom in
    the body of D, the following hold.
    (a-1) Let tH be any argument of HD. For every argument
          tA of AD, if tH contains a common variable with tA,
          then tA is a subterm of tH.
    (a-2) For every argument tA of AD, if tA is a subterm
          of an argument tH of HD, then no other argument
          of AD is a subterm of tH.
(b) There exist arguments t1 of A1 and t2 of A2 such
    that the following hold.
    (b-1) vars(A1) ∩ vars(A2) = vars(t1) ∩ vars(t2).
    (b-2) Either t1 is a subterm of t2, t2 is a subterm of t1
          or vars(t1) ∩ vars(t2) = ∅.
Then, from P ∪ {C}, we can obtain a linear recursive
non-ascending program which defines the predicate of H
by unfold/fold transformation.
Proof. As shown in [16], we can get a solution of the
foldability problem for P and C. Then, obviously, a
linear recursive program is obtained.  □

Example 3.4 Let P be a linear recursive non-ascending
program as follows.
C1 : subseq([],L).
C2 : subseq([X|L],[Y|M]) ← X = Y ∧ subseq(L,M).
C3 : subseq([X|L],[Y|M]) ← subseq([X|L],M).
Let C be a non-ascending definition clause for P as follows.
C : csub(X,Y,Z) ← subseq(X,Y) ∧ subseq(X,Z).
Then, P ∪ {C} can be transformed into a linear recursive
non-ascending program as follows.
csub([],Y,Z).
csub([A|X],[B|Y],Z) ← A = B ∧ cs(A,X,Y,Z).
csub([A|X],[B|Y],Z) ← csub([A|X],Y,Z).
cs(A,X,Y,[B|Z]) ← A = B ∧ csub(X,Y,Z).
cs(A,X,Y,[B|Z]) ← cs(A,X,Y,Z).
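In standard Prolog syntax the derived program, together with subseq
itself, runs directly; the query below is a small illustrative check,
not taken from the paper, that csub(X,Y,Z) holds when X is a common
subsequence of Y and Z.

    % subseq/2 (clauses C1-C3) and the derived csub/3 and cs/4,
    % transcribed into executable Prolog.
    subseq([], _).
    subseq([X|L], [Y|M]) :- X = Y, subseq(L, M).
    subseq([X|L], [_|M]) :- subseq([X|L], M).

    csub([], _, _).
    csub([A|X], [B|Y], Z) :- A = B, cs(A, X, Y, Z).
    csub([A|X], [_|Y], Z) :- csub([A|X], Y, Z).

    cs(A, X, Y, [B|Z]) :- A = B, csub(X, Y, Z).
    cs(A, X, Y, [_|Z]) :- cs(A, X, Y, Z).

    % ?- csub([a,c], [a,b,c], [a,c,d]).
    % true ([a,c] is a subsequence of both lists)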
Though Proietti and Pettorossi showed one more
class [16], we will not discuss it here.
Now, we get the following theorem.
Theorem 3.6 Let P be a linear recursive non-ascending
program and C be a linear term definition formula for
P of the form H ← A1 ∧ ∀X(A2 ∧ B2 → A3 ∧ B3), such
that the following hold.
(a) Hypothesis (a) of Lemma 3.5 holds for P.
(b) Let S1 be the set of all the arguments of A1, and
    Si be the set of all the arguments of Ai and Bi for
    i = 2, 3. Then, there exist two terms tj and sj in
    some Sj (tj ≠ sj, j = 1, 2 or 3) such that the
    following hold.
    (b-1) There exists a term tk in Sk (j ≠ k) such that
          vars(Sj) ∩ vars(Sk) = vars(tj) ∩ vars(tk), and
          either tj is a subterm of tk, tk is a subterm of tj
          or vars(tj) ∩ vars(tk) = ∅.
    (b-2) There exists a term sl of Sl (l ≠ j, k) such that
          the same relations as above hold for sj and sl.
    (b-3) Sk contains no common variable with Sl.
Then, there exists a definition formula set 𝒟 for P such
that P ∪ {C} is closed with respect to 𝒟.

Proof. Obvious from Theorem 3.3 and Lemma 3.5.  □

... unfolding G, or they are unified with terms consisting of
constants and global variables by reduction w.r.t. =.
We believe that techniques such as mode analysis are
available to guarantee that every applicable negative
unfolding satisfies the above conditions.

Negative unfolding should be applied without instantiating
global variables. In some cases, this restriction may
be critical. However, we can deal with most of those
cases by adding positive atoms to the formula such that
the global variables can be instantiated by applying
positive unfolding at those atoms. Atoms with predicates
which specify data types (cf. list) are available. For
example, with the definitions of 'member' and '<' in
Example 2.1, negative unfolding can not be applied to the
definite formula below.
    less-than-all(X,L) ← ∀Y(member(Y,L) → X < Y)
Sound and Complete Partial Deduction with Unfolding
Based on Well-Founded Measures
Bern Martens, Danny De Schreye and Maurice Bruynooghe

    if t = f(t1, ..., tn) with n > 0
        then |t| = 1 + |t1| + ... + |tn|
        else |t| = 0
It is then possible to introduce weight-functions on atoms.

Definition 3.2 Let p be a predicate of arity n and S =
{a1, ..., am}, 1 ≤ ak ≤ n, 1 ≤ k ≤ m, a set of argument
positions for p. We define |.|_{p,S} : {A | A is an atom with
predicate symbol p} → ℕ as follows:
    |p(t1, ..., tn)|_{p,S} = |t_{a1}| + ... + |t_{am}|

The next two definitions introduce useful relations on
literals and goals in an SLD-tree.

Definition 3.3 Let (G, i) = ((←A1, ..., Aj, ..., An), i)
be a node in an SLD-tree T, let R(G) = Aj be the
call selected by the computation rule R, let H ←
B1, ..., Bm be a clause whose head unifies with Aj
and let θ = mgu(Aj, H) be the most general unifier.
Then (G, i) has a son (G', k) in T, (G', k) =
((←A1, ..., Aj−1, B1, ..., Bm, Aj+1, ..., An)θ, k). We
say that B1θ, ..., Bmθ in G' are direct descendents of Aj
in G and that Aj in G is a direct ancestor of B1θ, ..., Bmθ
in G'.

The binary relations descendent and ancestor, defined on
atoms in goals, are the transitive closures of the direct
descendent and direct ancestor relations respectively. For
A an atom in G and B an atom in G', A is an ancestor
of B is denoted as A >pr B ("pr" stands for proof tree).
Notice that we also speak about one goal G' being an
ancestor (or descendent) of another goal G. This terminology
refers to the obvious relationships between goals in
an SLD-tree and should not be confused with the proof-tree
based relationships between literals, introduced in
the previous definition. The following definition does
introduce a relationship between goals, based on definition 3.3.

Definition 3.4 Let G and G' denote two different nodes
in an SLD-tree T. Let R be the computation rule used
in T. Then G' covers G iff
1. R(G') and R(G) are atoms with the same predicate
2. R(G') >pr R(G)
Notice that G' covers G implies that G' is an ancestor of G.
We need one more piece of terminology.

Definition 3.5 Let G and G' denote two different nodes
in an SLD-tree T. We call G' the youngest covering
ancestor of G iff
1. G' covers G
2. For any other node G'' such that G'' covers G, we
   have that G'' covers G'
We are now finally able to formulate the following algorithm:

Algorithm 3.6

Input
    a definite program P
    a definite goal ←A

Output
    a finite SLD-tree T for P ∪ {←A}

Initialisation
    T := {(←A, 1)}
    Pr := ∅
    Terminated := ∅
    Failed := ∅
    For each recursive predicate p/n in P and
    for the derivation D in T:
        S_{p,D} := {1, ..., n}

While there exists a derivation D in T such that
D ∉ Terminated do
    Let (G, i) name the leaf of D
476

Select the leftmost atom p( t 1 , ..• ,t n ) in G
satisfying the following condition:
If p is recursive and there is
a youngest covering ancestor (G', j) of (G, i) in D
then IR(G')lp,Sp,D new > Ip(t 1 , ..• , tn)lp,Sp,D new where
Sp,D new = Sp,D \ Sp,Dremove and
Sp,Dremove

... rev([l,2IXsJ,[],Zs)

... rev([2IXs],[1],a)

=

{ak E Sp,D IIp(t 1, ... , tn)lp,{ak} > IR(G')lp,{ak}}
If such an atom p( t 1 , ••. ,tn ) can be found
then

.... rev(Xs,[2,1],Zs)

R(G) :=P(tl, ... ,tn)
Let Derive( G, i) name the set of all derivation steps
that can be performed
If Derive( G, i) = 0
then
Add D to Terminated and Failed
else
Let Descend(R(G), i) name the set of
all pairs ((R(G), i), (BO,j)), where
- B is an atom in the body of a clause
applied in an element of Derive( G, i)
- 0 is the corresponding m.g. u.
- j is the number of the corresponding
descendent of (G, i)
Expand D in T with the elements of Derive( G, i)
Add the elements of Descend( R( G), i) to Pr
For every newly created extension D' of D and
for every recursive predicate q in P:
if q = p and (G, i) has a covering ancestor in D
new
then Sq,D' := Sq,D
else Sq,D' := Sq,D
else
Add D to Terminated
Endwhile
We have the following theorem.
Theorem 3.7 Algorithm 3.6 terminates. If a definite
program P and a definite goal -A are given as inputs,
its output T is a finite (possibly incomplete) SLD-tree for

P U {-A}.
Proof The theorem is an immediate consequence of
0
proposition 3.1 in [Bruynooghe et al., 1991aJ.
Example 3.8 The SLD-tree generated by algorithm 3.6
for the program and the query from example 2.2, are
depicted in figure 1. ("reverse" has been abbreviated to
"rev" .)

4
4.1

Combining These Techniques
Introduction

In the previous section, we introduced an algorithm for
the automatic construction of (incomplete) finite SLDtrees. In this section, we present sound and complete

Zs=[2.1]

~ Xs=[X'IXs']

xs=[/

o

~

.... rev(Xs',[X',2,l],a)

Figure 1: The SLD-tree for example 3.8.

partial deduction methods, based on it. Moreover, these
methods ar.e guaranteed to terminate. The following example shows that this latter property is not obvious, even
when termination of the basic unfolding procedure is ensured. We use the basic partial deduction algorithm from
[Benkerimi and Lloyd, 1990], together with our unfolding algorithm.
Example 4.1 For the reverse program with accumulating parameter (see example 2.2 for the program and the
starting query), an infinite number of (finite) SLD-trees
is produced (see figure 2). This behaviour is caused by
the constant generation of "fresh" body-literals which,
because of the growing accumulating parameter, are not
an instance of any atom that was obtained before.
In [Benkerimi and Lloyd, 1989], it is remarked that a solution to this kind of problems can be truncating atoms
put into A at some fixed depth bound. However, this
again seems to have an ad-hoc flavour to it, and we therefore devised an alternative method, described in the next
section.

4.2

An algorithm for partial deduction

We first introduce some useful definitions and prove a
lemma.
Definition 4.2 Let P be a definite program and p a
predicate symbol of the language underlying P. Then a
pp' -renaming of P is any program obtained in the following way:
• Take P together with a fresh-duplicate-copy of
the clauses defining p.
• Replace p in the heads of these new clauses by some
new (predicate) symbol pi (of the same arity as p).

477
• Replace p by p' in any number of goals in the bodies
of (old and new) clauses.
___ rev([1,2IXs],[],Zs)

~

rev([2IXs],[1],Zs)

--- rev(Xs,[2,1],Zs)
Zs=[2,1]

~XS=[X'IXS']

Xs=[Y
o

~

.... rev(Xs',[X',2,1],Zs)

.... rev(Xs',[X',2,l],Zs)

o

--- rev(Xs",[X",X',2,1],Zs)

--- rev(Xs",[X",X',2,1],Zs)

Figure 2: An infinite number of (finite) SLD-trees.

Lemma 4.3 Let P be a definite program and Pr a pp'renaming of P. Let G be a definite goal in the language
underlying P. Then the following hold:
• Pr U {G} has an SLD-refutation with computed answer e iff P U {G} does.

• Pr U {G} has a finitely failed SLD-tree iff P U {G}
does.

Proof There is an obvious equivalence between SLDderivations and -trees for P and Pr •
0
Definition 4.4 Let P be a definite program and p a
predicate symbol of the language underlying P. Then
the complete pp' -renaming of P is the pp'-renaming of P
where p has been replaced by p' in all goals in the bodies
of clauses.
Our method for partial deduction can then be formulated as the following algorithm.

Algorithm 4.5
Input
a definite program P
a definite goal ~A =~p(tl, .. . , t n )
in the language underlying P
a predicate symbol p', of the same arity as p,
not in the language underlying P
Output
a set of atoms A
a partial deduction P/ of Pr ,
the complete pp'-renaming of P, wrt A
Initialisation
Pr := the complete pp'-renaming of P
A := {A} and label A unmarked
While there is an unmarked atom B in A do
Apply algorithm 3.6 with Pr and ~B as inputs
Let TB name the resulting SLD-tree
Form PrB, a partial deduction for B in Pr , from TB
Label B marked
Let AB name the set of body literals in Pr B
For each predicate q appearing in an atom in AB
Let msg q name an msg of all atoms having q
as predicate symbol in A and AB
If there is an atom in A having q as predicate
symbol and it is less general than msgq
then remove this atom from A
,If now there is no atom in A having q as
predicate symbol
then add msgq to A and label it unmarked
Endfor
Endwhile
Finally, construct the partial deduction P/ of Pr wrt A:
Replace the definitions of the partially deduced
predicates by the union of the partial deductions Pr B
for the elements B of A.
We illustrate the algorithm on our running example.

Example 4.6
complete renaming of the reverse program:
reverse( [] ,L,L) .
reverse([X\Xs]'Y s,Zs) ~ reverse'(Xs,[X\Y s]'Zs).
reverse'([],L,L ).
reverse'([X\Xs],Y s,Zs) ~ reverse'(Xs,[X\Ys],Zs).
partial deduction for ~reverse([1,2\Xs],[],Zs):
reverse( [1 ,2], [], [2,1]).
reverse([1,2,X\Xs]'[],Zs) ~ reverse'(Xs,[X,2,1],Zs).
partial deduction for ~reverse'(Xs,[X,2,1 ]'Zs):
reverse'( [] ,[X,2,1] ,[X,2,1]).
reverse'( [X'\Xs], [X,2,1] ,Zs) ~
reverse'(Xs, [X',X,2, 1],Zs).
msg of reverse'(Xs,[X,2,1]'Zs) and
reverse'(Xs,[X',X,2,1 ],Zs): reverse'(Xs,[X,Y,Z\Y s],Zs)

478

partial deduction for +--reverse'(Xs,[X,Y,ZIYs}'Zs):
reverse'( [J ,[X, Y,ZIY sJ ,[X,Y,ZI YsJ).
reverse'([X'IXs],[X,Y,ZIY s],Zs) +-reverse'(Xs ,[X' ,X, Y,Z IY s] ,Zs).

Corollary 4.9 Let P be a definite program, A =

p( i 1 , •.• , in) be an atom and p' be a predicate symbol
used as inputs to algorithm 4.5. Let A be the set of
atoms and P/ be the program output by algorithm 4.5.
Let G =+--Al, ... , Am be a goal in the language underlying P, consisting of atoms that are instances of atoms
in A. Then the following hold:

resulting set A:
{reverse([1 ,2IXs] ,[J,Zs ),reverse'(Xs,[X,Y,ZIYs],Zs)}
resulting partial deduction:
reverse( (1,2],[J ,[2,1]).
reverse((1 ,2,XIXs],[J ,Zs) +-- reverse'(Xs,[X,2,1]'Zs).
reverse'( [], [X, Y,ZIYs] ,[X,Y,Z IYs]).
reverse'([X'IXs],[X,Y,ZIY s],Zs) +-reverse'( Xs, [X' ,X, Y,Z IY sJ,Zs).

• P/ U {G} has an SLD-refutation with computed an- .
swer () iff P U {G} does.
• P/ U {G} has a finitely failed SLD-tree iff P U {G}
does.

We can prove the following interesting properties of
algorithm 4.5.
Theorem 4.7 Algorithm 4.5 terminates.
Proof
Due to space restrictions,
(Martens and De Schreye, 1992].

we refer to
o

Theorem 4.8 Let P be a definite program, A
p( i 1 , .•• , in) be an atom and p' be a predicate symbol
used as inputs to algorithm 4.5. Let A be the (finite) set
of atoms and P/ be the program output by algorithm 4.5.
Then the following hold:
• A is independent.
• For any goal G =+--Al, . .. , Am consisting of atoms
that are instances of atoms in A, P/ U {G} is Acovered.
Proof
• We first prove that A is independent.
From the way A is constructed in the For-loop, it
is obvious that A cannot contain two atoms with
the same predicate symbol. Independence of A is
an immediate consequence of this.
• To prove the second part of the theorem, let Pr * be
the subprogram of P/ consisting of the definitions
of the predicates in P/ upon which G depends. We
show that Pr * U {G} is A-closed.
Let A be an atom in A. Then the For-loop in algorithm 4.5 ensures there is in A a generalisation of
any body literal in the computed partial deduction
for A in Pr'. The A-closedness of P/ U {G} now
follows from the following two facts:
1. Pr ' is a partial deduction of a program (Pr ) wrt
A.
2. All atoms in G are instances of atoms in A.

o

Proof The corollary is an immediate consequence of
lemma 4.3 and theorems 2.1 and 4.8.
0
Proposition 4.10 Let P be a definite program and A
be an atom used as inputs to algorithm 4.5. Let A be
the set of atoms output by algorithm 4.5. Then A E A.
Proof A is put into A in the initialisation phase. From
definition 4.4, it follows that no clause in Pr contains a
condition literal with the same predicate symbol as A.
Therefore, A will never be removed from A.
0
This proposition ensures us that algorithm 4.5 does
not suffer from the kind of specialisation loss mentioned
in section 2.1: The definition of the predicate which appears in the query +--A, used as starting input for the
partial deduction, will indeed be replaced by a partial
deduction for A in P in the program output by the algorithm.
Finally, we have:
Corollary 4.11 Let P be a definite program, A =

p( i 1 , ... , in) be an atom and p' be a predicate symbol
used as inputs to algorithm 4.5. Let P/ be the program
output by algorithm 4.5. Then the following hold for any
instance A' of A:
• P/ U {+--A'} has an SLD-refutation with computed
answer () iff P U {+--A'} does.
• P/ U {+--A'} has a finitely failed SLD-tree iff P U

{ +-- A'} does.
Proof The corollary immediately follows from corollary 4.9 and proposition 4.10.
0
Theorem 4.7 and corollary 4.11 are the most important results of this paper. In words, their contents can
be stated as follows. Given a program and a goal, algorithm 4.5 produces a prograrri which provides the same
answers as the original program to the given query and
any instances of it. Moreover, computing this (hopefully
more efficient) program terminates in all cases.

479

5

Discussion and Conclusion

In [Lloyd and Shepherdson, 1991], important criteria ensuring soundness and completeness of partial deduction are introduced. In the present paper, we started
from a recently proposed strategy for finite unfolding
([Bruynooghe et al., 1991a]) and developed a procedure
for partial deduction of definite logic programs. We
proved this procedure produces programs satisfying the
mentioned criteria and, in an important sense, showing
the desired specialisation. Moreover, the algorithm terminates on all definite programs and goals.
The unfolding method as it is presented in section 3
was proposed in [Bruynooghe et al., 1991a]' but appears
here for the first time in this detailed and automatisable form, specialised for object level programs. It
tries to maximise unfolding while retaining termination.
We know, however, of two classes of programs where
the first goal is not achieved. First, meta programs
require a somewhat more refined control of unfolding.
This issue is addressed in [Bruynooghe et ai., 1991a].
We refer the interested reader to that paper (or to
[Bruynooghe et al., 1991b]) for further comments on this
topic. Second, (datalog) programs where the information
contained in constants appearing in the program text
plays an important role, are not treated in a satisfactory
way. Further research is necessary to improve the unfolding in this case. (A combination of our rule with the Rv
computation rule seems promising.) As far as the used
unfolding strategy does maximise unfolding, however, it
probably diminishes or eliminates the need for dynamic
renaming as proposed in [Benkerimi and Hill, 1989].
We now compare briefly algorithm 4.5 with the partial deduction procedure with static renaming presented
in [Benkerimi and Lloyd, 1990]. First, we showed above
that our procedure terminates for all definite programs
and queries while the latter does not. The culprit
of this difference in behaviour is (apart from the unfolding strategy used) the way in which msg's are
taken. We do this predicatewise, while the authors of
[Benkerimi and Lloyd, 1990] only take an msg when this
is necessary to keep A independent. This may keep more
specialisation (though only for predicates different from
the one in the starting goal), but causes non-termination
whenever an infinite, independent set A is generated (as
illustrated in example 4.1). Observe, moreover, that we
have kept a clear separation between the issues of control
of unfolding and of ensuring soundness and completeness. The use of algorithm 3.6 - or further refinements
(see above) - guarantees that all sensible unfolding and therefore specialisation - is obtained. The way in
which algorithm 4.5, in addition, ensures soundness and
completeness, takes care that none of the obtained specialisation is undone. Therefore, it does not seem worthwhile to consider more than one msg per predicate. Note
that one can even consider restricting the partial deduc-

tion to the predicate in the starting query and simply
retaining the original clauses for all other predicates in
the result program. This can perhaps be formalised as a
partial deduction where only a 1-step trivial unfolding is
performed for these predicates.
Next, the method in [Benkerimi and Lloyd, 1990] is
formulated in a somewhat more general framework than
the one presented here. A reformulation of the latter
incorporating the concept of L-selectability and allowing more than one literal in the starting query seems
straightforward. However, a generalisation to normal
programs and queries and SLDNF-resolution while retaining the termination property, is not immediate. In
e:g. [Benkerimi and Lloyd, 1990], it is proposed that
during unfolding, negated calls can be executed when
ground and remain in the resultant when non-ground.
This of course jeopardises termination, since termination of "ordinary" ground logic program execution is not
guaranteed in general. One solution is restricting attention to specific subclasses of programs (e.g. acyclic
or acceptable programs, see [Apt and Bezem, 1990],
[Apt and Pedreschi, 1990]). Another might be to use an
adapted version of our unfolding criterion in the evaluation of the ground negative call, and to keep the latter one in the resultant whenever the SLD(NF)-tree produced is not a complete one. Yet a third way might be
offered by the use of more powerful techniques related to
constructive negation (see [Chan and Wallace, 1989]).
Finally, [Gallagher and Bruynooghe, 1990] presents
another approach to partial deduction focusing both on
soundness and completeness and on control of unfolding.
The main difference is the control of unfolding by a condition based on maximal deterministic paths, where our
approach is based on maximal data consumption, monitored through well-founded measures.

References
[Apt and Bezem, 1990] K. R. Apt and M. Bezem.
Acyclic programs.
In D. H. D. Warren and
P. Szeredi, editors, Proceedings ICLP'90, pages 617633, Jerusalem, June 1990. The MIT Press. Revised
version in New Generation Computing, 9(3 & 4):335364.
[Apt and Pedreschi, 1990] K. R. Apt and D. Pedreschi.
Studies in pure prolog: Termination.
In J. W.
Lloyd, editor, Proceedings of the Esprit Symposium on
Computational Logic, pages 150-176. Springer-Verlag,
November 1990.
[Benkerimi and Hill, 1989] K. Benkerimi and P. M. Hill.
Supporting transformations for the partial evaluation of logic programs. Technical report, Department
of Computer Science, University of Bristol GreatBritain, 1989.
'

480
[Benkerimi and Lloyd, 1989J K. Benkerimi and J. W.
Lloyd. A procedure for the partial evaluation of logic
programs. Technical Report TR-89-04, Department
of Computer Science, University of Bristol, GreatBritain, May 1989.

[Martens and De Schreye, 1992J B. Martens and D. De
Schreye. Sound and complete partial deduction with
unfolding based on well-founded measures. Technical
Report CW-137, Departement Computerwetenschappen, K.U.Leuven, Belgium, January 1992.

[Benkerimi and Lloyd, 1990J K. Benkerimi and J. W.
Lloyd. A partial evaluation procedure for logic programs. In S. Debray and M. Hermenegildo, editors, Proceedings NACLP'90, pages 343-358. The MIT
Press, October 1990.

[Safra and Shapiro, 1986J S. Safra and E. Shapiro. Meta
interpreters for real. In Information Processing 86,
pages 271-278, 1986.

[Bruynooghe et al., 1991aJ M. Bruynooghe, D. De Schreye, and B. Martens. A general criterion for avoiding infinite unfolding during partial deduction of logic
programs. In V. Saraswat and K. Ueda, editors, Proceedings ILPS'91, pages 117-131, October 1991.
[Bruynooghe et al., 1991bJ M. Bruynooghe, D. De Schreye, and B. Martens. A general criterion for avoiding
infinite unfolding during partial deduction. Technical
Report CW-126, Departement Computerwetenschappen, K.U.Leuven, Belgium, March 1991.
[Chan and Wallace, 1989J D. Chan and M. Wallace. A
treatment of negation during partial evaluation. In
H. D. Abramson and M. H. Rogers, editors, Proceedings il1eta '88, pages 299-318. MIT Press, 1989.
[Gallagher and Bruynooghe, 1990]
J. Gallagher and M. Bruynooghe. The derivation of
an algorithm for program specialisation. In D. H. D.
Warren and P. Szeredi, editors, Proceedings ICLP'90,
pages 732-746, Jerusalem, June 1990. Revised version
in New Generation Computing, 9(3 & 4):305-334.
[Gallagher, 1986] J. Gallagher. Transforming logic programs by specialising interpreters. In Proceedings
ECAI'86, pages 109-122, 1986.
[Komorowski,1981J H. J. Komorowski. A specification
of an abstract Prolog machine and its application to
partial evaluation. Technical Report LSST69, Linkoping University, 1981.
[Komorowski, 1989] H. J. Komorowski. Synthesis of programs in the framework of partial deduction. Technical
Report Ser.A, No.81, Departments of Computer Science and Mathematics, Abo Akademi, Finland, 1989.
[Levi and Sardu, 1988J G. Levi and G. Sardu. Partial
evaluation of metaprograms in a multiple worlds logic
language. New Generation Computing, 6(2 & 3), 1988.
[Lloyd and Shepherdson, 1991J J. W. Lloyd and J. C.
Shepherdson. Partial evaluation in logic programming.
Journal of Logic Programming, 11(3 & 4):217-242,
1991.

The Mixtus approach to
[Sahlin, 1990j D. Sahlin.
automatic partial evaluation of full Prolog.
In
S. Debray and M. Hermenegildo, editors, Proceedings
NACLP'90, pages 377-398, 1990.
[Sterling and Beer, 1986] L. Sterling and R. D. Beer. Incremental flavor-mixing of meta-interpreters for expert
system construction. In Proceedings ILPS'86, pages
20-27. IEEE Compo Society Press, 1986.
[Sterling and Beer, 1989J L. Sterling and R. D. Beer.
Metainterpreters for expert system construction.
Journal of Logic Programming, pages 163-178, 1989.
[Takeuchi and Furukawa, 1986J A. Takeuchi and K. Furukawa. Partial evaluation of Prolog programs and its
application to metaprogramming. In H.-J. Kugler, editor, Information Processing 86, pages 415-420, 1986.
[Venken and Demoen, 1988J R. Venken and B. Demoen.
A partial evaluation system for Prolog : Some practical considerations. New Generation Computing, 6(2
& 3):279-290, 1988.
[Venken, 1984J R. Venken. A Prolog meta interpreter
for partial evaluation and its application to source to
source transformation and query optimization. In Proceedings ECAI'84, pages 91-100. North-Holland, 1984.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992

481

A Framework for Analysing the Termination of Definite Logic
Programs with respect to Call Patterns
Danny De Schreye'"

Kristof Verschaetse t

Maurice Bruynooghe>4<

Department of COlnputer Science, K.U.Leuvell,
Celestijnenlaan 200A, B-3001 Heverlee, Belgium.
e-Inail: {dannyd,kristof,nlaurice }@cs.kuleuven.ac.be

Abstract
We extend the notions 'recurrency' and 'acceptability'
of a logic program, which were respectively defined in
the work of M. Bezem and the work of K. R. Apt and
D. Pedreschi, and which were shown to be equivalent
to respectively termination under an arbitrary computation rule and termination under the Prolog computation
rule. We show that these equivalences still hold for the
extended definitions. The main idea is that instead of
measuring ground instances of atoms, all possible calls
are measured (which are not necessarily ground). By
doing so, a more practical technique is obtained, in the
sense that "more natural" measures can be used, which
can easily be found automatically.

1

Introduction

In the last few years, a strong research effort in the field
of logic programming has addressed the issue of termination. From the more theoretical point of view, the results
obtained by Vasak and Potter [1986]' Baudinet [1988]'
Bezem [1989], Cavedon [1989], Apt and Pedreschi [1990],
and Bossi et ai. [1991] have provided several frameworks
and basic techniques to formulate and solve questions
regarding the termination of logic programs in semantically clear and general terms. Other researchers, such
as Ullman and Van Gelder [1988], Plumer [1990], Wang
and Shyamasundar [1990], Verschaetse and De Schreye
[1991], and Solm and Van Gelder [1991] have provided
practical and automatable tecliniques for proving the termination of logic programs with respect to certain classes
of queries at compile time.
In this paper, we propose an extension of the theoretical frameworks for the characterisation of terminating programs and queries proposed in [Bezem 1989] and
[Apt and Pedreschi 1990]. The framework does not only
provide slightly more general results, but also increases
the practicality of the techniques in view of automation.
·Supported by the National Fund for Scientific Research.
tSupported by ESPRIT BRA COMPULOG project nr. 3012.

Let us recall some definitions from [Bezem 1989] in
order to explain our motivation and the intuition behind
our approach.
Definition 1.1 (see [Bezem 1989]; Definition 2.1) A level
mapping for a definite logic program P is a mapping
1.1: Bp -+ IN.
Definition 1.2 (see [Bezem 1989]; Definition 2.2) A
definite logic program P is recurrent if there exists a
level mapping 1.1, such that for each ground instance
A-B l , ••. , Bn of a clause in P, IAI > IBi!, for each
i = 1, .. . ,n.
Definition 1.3 (see [Bezem 1989]; Definition 2.7) A definite logic program P is terminating if all SLD-derivations
for (P, -G), where G is a ground goal, are finite.
One of the basic results of [Bezem 1989] is that a program is recurrent if and only if it is terminating. Although this result is very interesting from a theoretical
perspective, it is not a very practical one in terms of automated detection of terminat.ing programs and queries.
The problem comes from the fact that the definition of
recurrency requires that the level mapping "compares"
the head of each ground instance of a clause with every corresponding atom in the body and imposes a decrease. Intuitively, what would be preferable is to obtain
a well-founding based on a measure function (or level
mapping), which only decreases on each recursive call to
a same predicate. This corresponds better to our intuition, since nontermination (for pure logic programs) can
only be caused by infinite recursion.
As we stated above, the problem is not merely related
to our intuition on the cause of nontermination, but more
importantly to the practicality of level mappings. Consider the following example.
Example 1.4

p(O)·
p([ HIT)) -

q([HIT)), p(T).

q( []).
q([HIT))

q(T).

-

482
It is not possible to take as level mapping a function
that maps ground instances p(:e) and q(:e) to the same
level, namely list-length(:e) if :e is a ground list, and 0
otherwise. Instead, the definition of recurrency obliges
us to take a level mapping that has a "unnatural" offset
(1 in this case).

Ip(:e)1
Iq{:e )1

1. We first compute all atoms that call occur as calls
during any SLD-derivation for the top-level goal( s)
under consideration.
2. We use an extended notion oflevel mapplllg, defined
on all such atoms - not only the growld ones.
3. We have an adapted definition of recurrency, with
as its most important features:

list-length(:e) + 1
list-length(:e ).

(a) the condition IAI > IBil is not imposed 011
growld instances of a clause, but instead, 011
each instance obtained after unification with a
(possible) call,

In a naive attempt to improve on the results of
[Bezem 1989], one could try to start from an adapted
definition for a recurrent program, in which the relation
IAI > IBil would only be required if A and Bi are atoms
with the same predicate symbol. However, the equivalence with termination would immediately be lost even for programs having only direct recursion - as the
following example shows.
Example 1.5

appenci([), L, L).
appenci((HIS], T,
p([HITJ) -

[H/uD -

append(S, T, U).

append(X, Y, Z), peT).

An "extended" notion of recurrency, where the level
mapping only relates the measure of ground instances of
the recursive calls, would hold with respect to the level
mapping:

Ip(:e )1
lappenci(:e, y, z)1

list-length(:e )
list-length(:e) .

On the other hand, the program is clearly not terminating - if it would be terminating, then we would have
shown that append/3 terminates for a call with all three
arguments free.
The heart of the problem is that in the definition of
recurrency, the level mapping is used for two quite distinct purposes at the same time. First, the level mapping
does ensure that on each derivation step, the measure of
a recursive descending call is smaller than the measure of
the ancestor call (or at least: for each ground instance of
such a derivation step). Second, since we are only given
that the top level goal is ground (or, in a more general
version of the theorem, bounded) - but we have no information on the instantiation of any of the descending
calls - the level mapping is also used to ensure that we
have some upper limit on the measures for the calls of
the (independent) recursive subcomputation evoked by
the original call. In the current definition, this is done
by imposing that the level also decreases between a call
and its descendants that are not related through recursion.
The way in which we address the problem here, differs
from the approach in [Bezem 1989] in three ways:

(b) "the decrease IAI > IBil is only imposed if A
and Bi are calls to the same predicate symbol.
(This is for direct recursion - in the context of
.indirect recursion, the condition is more complex).
One of the side effects of taking this approach is
that there is no more necessity to start the analysis
for one ground or bounded goal. The technique works
equally well when we start from any general set of
atoms. The additional advantage that we gain here is
that in practice, we are usually interested in the termination properties of a program with respect to some
call pattern. Such call patterns can always be specified in terms of abstract properties of the arguments in
the goals through mode information, type lllformation
or combined (rigid or integrated) mode and type information (see [Janssens and Bruynooghe 1990)). Any such
call pattern corresponds to a set of atoms in the concrete domain, and can therefore be analysed with our
approach.
The paper is organised as follows. In the next section we extend the equivalence theorem of [Bezem 1989]
in the way described above. In section 3 we take
a completely similar approach to extend results of
[Apt and Pedreschi 1990] on left termination. In section 4, we illustrate the improved practicality of
the new framework.
We also indicate how some
simple extensions are likely to provide full theoretical support for the automated technique proposed in
[Verschaetse and De Schreye 1991].
All proofs have been omitted from the paper. They
can be found in [De Schreye and Verschaetse 1992J.

2

Recurrency with respect to a
set of atoms

We first introduce some conventions and recall some
basic terminology. Throughout the paper, P will denote a definite logic program. The extended Herbrand Universe,
and the extended Herbrand Base,
Bffi, associated to a program P, were introduced ill

Up,

483

[Falaschi et al. 1989]. They are defined as follows. Let
Termp and Atomp denote the sets of respectively all
terms and all atoms that can be constructed from the
alphabet underlying to P. The variant relation, denoted ~, defines an equivalence. Up and BP are respectively the quotient sets Termp / ~ and Atomp / ~.
For any term t (or atom A), we denote its class in U:
(B~) as {(A). There is a natural partial order on Up
(and BP), defined as: s S; [if there exist representants s' of sand t' of [in Termp and a substitution
0, such that s' = t'O. Throughout the paper, 5 will denote a subset of B~. We define its closure under < as:
5 e = {A E Bffi \ :3B E 5 : A S; B}.
Definition 2.1 P is terminating with respect to S if for
any representant A' of any element A of 5, every SLDtree for (P, ~ A') is finite.
Denoting the classical notion of a Herbrand Base (of
ground atoms) over P as B p, then with the terminology
of [Bezem 1989] we have:
Lemma 2.2 P is terminating if and only if it is terminating with respect to B p.
Lemma 2.3 If all SLD-derivations for (P, ~A) are finite,
and 0 is any substitution, then all SLD-derivations for
(P, ~AO) are finite.
From lemma 2.3 it follows that in order to verify definition 2.1 for a set 5 ~
it suffices to verify the
finiteness of the SLD-trees for (P, ~A) for only one representant of each element in ..1. It also follows that P is
terminating with respect to a set 5 ~ B~ if and only if it
is terminating with respect to 5 e • In fact, given that P
terminates with respect to 5, it will in general be terminating with respect to a larger set of atoms than those in
se. It is clear that if all SLD-trees for (P, ~A) are finite,
and if H ~Bl' ... , Bn is a clause in P, such that A and
H unify, then all SLD-trees for (P, ~BiO), i = 1, ... , n,
where 0 = mgu(A, H), are finite. We can characterise
the complete set of terminating atoms associated to a
given set S as follows.

B:,

Definition 2.4 For any T ~ B~, define Tp-l(T) =
{BiO E Bffi \ A' is a representant of A E T, H
~ Bl"'" Bn is a clause in P, 0 = mgu(A', H) and
1 ~ i ~ n}.
Denote 1ts = {T E 2B~ \ 5 e ~ T}. 1t s is a complete
lattice with bottom element se.
Definition 2.5 Rs : 1is

-+

1is : Rs(T) = T U Tp-l(Tr.

Lemma 2.7 P is terminating with respect to 5 if and
only if P is terminating with respect to RsTw.
As a result of our construction (in fact: as the very
purpose of it), RsTw contains every call in every SLDtree for any atomic goal of S. Formally:
Proposition 2.8 Let call( P, 5) denote the set of all
atoms B, such that B is the subgoal selected by the
computation rule in some goal of some SLD-tree for a
pair (P, ~A), with A the representant of an element of
S. Then, call(P, 5) ~ RsTw.
We now introduce a variant of the definition of a level
mapping, where the mapping is defined on equivalence
classes of calls.
Definition 2.9 (level mapping)
A level mapping with respect to a set 5 ~ Bffi is a function
\.\ : RsTw -+ IN. A level mapping \.\ is called rigid
i!J:.or all A E Rs jw and for any substitution 0, IAI =
IAOI, i.e. the level of an atom remains invariant under
substitution.
With slight abuse of notation, we will often write IA I,
where A is a representant of A E Bffi. The associated
notion of recurrency with respect to 5 will not be defined on ground instances of clauses, but instead OIl all
instances (H ~Bl"'" Bn}e of clauses H ~Bl"'" En of
P, such that 0 = mgu(A, H), where A is a representant
of an element of Rs Tw. The definition in [Bezem 1989J
does not explicitly impose a decrease of the level mapping at each inference step. The level mapping's values
should only decrease for ground instances of clauses. By
considering more general instances of clauses (as above),
we can explicitly impose a decrease of the level mapping's
value during (recursive) inference steps. As a result, the
adapted level mapping no longer needs to perform different functionalities at once, and we can concentrate on
the real structure of the recursion.
Now, concerning this recursive structure, there are a
number of different possibilities for a new definition of
recurrency, depending on how we aim to deal with indirect recursion. In order not to confuse all issues involved
we first provide a definition for programs P, relying onI;
on direct recursion.
Definition 2.10 A (directly recursive) program P is recurrent with respect to S, if there exists a level mapping
1.1 with respect to 5, such that:
• for any A' representant of

A E Rs jw,

• for any clause H ~Bl"'" Bn in P, such that
mgu( A', H) = 0 exists,

Lemma 2.6 Rs is continuous.
As a result, the least fix-pohl.t for Rs is Rs Tw.

• for any atom Bi, 1 S; i S; n, with the same predicate
symbol as H: IA'I > IBiOI.

484
What is expressed in this definition is that for any two
recursively descending calls with a same predicate symbol in any SLD-tree for (represent ants of) atoms in S,
the level mapping's value should decrease. This condition has the advantage of being perfectly natural and
therefore, of being easy to verify in an automated way.
The only possible problem in view of automation is that
it requires the computation of Rsiw. But, this problem
is precisely the type of problem that can easily be solved
(or approxinlated) through abstract interpretation (see
section 4).
In the presence of indirect recursion, we need a more
complex definition, that deals with the problem that a recursive call with a same predicate symbol as an ancestor
call may only appear after a finite number of inference
steps (instead of in the body of the particular instance
of the applied clause). Tlus can be done in several ways.
We first provide a defuution related to the concept of a
resultant of a finite (incomplete) derivation. Based on
tIus definition, we prove the equivalence with ternunation. After that, we provide a more practical condition,
of which definition 2.10 is an obvious instance for the
case of direct recursion.
First, we need some additional terminology.
Definition 2.11 Let A be an atom and (Go = - A),
G l , G 2 , ••• , G n , (n > 0), a finite, incomplete SLDderivation for (P, _A).
Let 01 , ••• , On be the corresponding sequence of substitutions, and let 0 =
0 10 2 " , On and G n = -B I , ••• , Bm. With the terminology of [Lloyd and Shepherds on 1991] we say that
AO-B1 , • •• , Bm is the resultant of the derivation.
Definition 2.12 A resultant AO-B 1 , ••• , Bm of a
derivation (Go = -A), G l , ..• , Gn , is a recursive resultant for A if there exists i (1 ::; i ::; m), such that Bi has
the same predicate symbol as A.
Definition 2.13 (recurrency wrt a set of atoms)
A program P is recurrent with respect to S, if there exists
a level mapping, 1.1, with respect to S, such that:
• for any A' representant of A E Rs iw,
• for any recursive resultant A'O-B l , ... , B m , for A',
• for any atom B i , 1 ::; i ::; m, with the same predicate
symbol as A': IA'I > IBil.
Proposition 2.14 If P is recurrent with respect to S,
then P terminates with respect to S.
Just as in the framework of Bezem, the converse statement holds as well.
Theorem 2.15
P is recurrent with respect to S if and only if it is terminating with respect to S.

One of the nice consequences of this result is that we
can now relate the concept of a recurrent program in the
sense of [Bezem 1989] to recurrellCY with respect to a set
of (ground) atoms.
Corollary 2.16 P is recurrent if and only if it is recurrent with respect to B p •
It may seem surprising to the reader that two apparently very different notions such as recurrency and recurrency with respect to B p coincide. It is our experience
from our work in termination of wlfolding in the context
of partial deduction ([Bruynooghe et ai. 1991]) that this_
is not unusual. The reason is that conditions occurring
in these contexts require the 11 existence 11 of some wellfounded measure. The specific properties of such measures can take totally different form without loosing the
termination property. The only real difference lies in the
practicality.
We conclude the section by introducing a condition
that implies definition 2.13. This condition has the advantage over definition 2.13 that it does not rely on the
verification of some property for each of a potentially
infinite number of recursive resultants. Instead it only
requires such a verification for a finite number of clauses,
which can be characterised through the minimal, cyclic
collections of P.

Definition 2.17 (minimal cyclic collection)
A minimal cyclic collection of P is a finite sequence of
clauses of P:

such that:
• for each pair (i -=f j), the heads of the clauses, Ai
and A j , are atoms with distinct predicate symbols,

• Ai and
m),
• A~+l

Ai have the same predicate symbols (1 < i

:::;

has the same predicate symbol as AI'

Only a finite number of minimal cyclic collections exists.
They can easily be characterised and computed from the
predicate dependency graph for P.
Proposition 2.18
Let S ~ B~ and 1.1 a rigid level mapping with respect to
S, such that for any minimal cyclic collection of P (after
standardizing apart),

485

and for any AI,"" Am E Rsjw, with A~, ... , A~ as
their respective representants, and 0i = mgu(Ai, An,
(1 :::; i :::; m), the following condition holds:

{

IA~Oll

~ IA~I }

IA~Om-ll

>

IA~I

Then, P is recurrent with respect to 5.
The conditions in proposition 2.18 seem rather unnatural at first sight and need some clarification. First, observe that in the case of direct recursion - except for the
rigidity of the level mapping - the conditions coincide
with those of definition 2.10.
For the case of indirect recursion, the conditions that
one would intuitively expect, are that for each minimal
cyclic collection

-

Am -

BL···,A~, ... ,B~I

Bi,···, A:n+l , ... , B;:'m

and each A~ representant of Al E Rs jw, such that 0 =
mgu(A~, Ad and Oi = mgu(AL Ai), 1 < i :::; m, exist and
are consistent, we have

IA~I > IA~+1001" ·Oml·
The problem is that such a condition is not correct. Consider the clauses:

p(a,[_IX])
p( b, X)
q(b,X)
q(a, [_IX])

++++-

p(b,X).
q(a, [_IX]).
p(a, [_IX]).
q(b, X).

Acceptability with respect to
a set of atoms

All definitions and propositions from the previous section can be specialised for the Prolog computation rule.
Following [Apt and Pedreschi 1990], we call an SLDderivation that uses Prolog's left-to-right computation
rule, an LD-derivation.

JJIA~I > IA:n+lOml·

Al

3

(ell)
(el2)
(el3)
(cl4)

There are 4 associated minimal collections: ( cll ),
Consider for instance
(cl2,cl3), (cl3,cl2) and (cl4).
the derivation +-p(a, [_, _]), +-p( b, [_]), +-q(a, [_, _]),
-q( b, [-]), -p( a, [-, _D.
The problem is caused by resultants associated to
derivations that start with a clause from one minimal
cyclic collection - say (cl2) in the collection ( cl2 ,cl3) then shift to applying another collection, (cl4), and only
after this resume the first collection and apply clause
(cl3). The head of the third clause, q( b, X), does not
unify with q(a, [_IX']), and therefore, the condition on
the cycle (cl2,cl3) can not be applied.
So, we have to impose th; condition in proposition
2.18. It states that, even if the next call in the traversal
of a mininlal collection (An is not really related - as
an instance - to a call we obtained earlier (A~ei-l)' but
if - through the intermediate computation in another
minimal collection - the level between these two has
decreased anyway, then the final conclusion bet.ween the
original call to the collection and the indirectly depending one must still hold. We will not discuss the condition
any further here, but we will return to its practicality in
section 4.

Definition 3.1 (left termination wrt 5) Let 5 be
a subset of B:. A program P is left-terminating with
respect to 5 if for any representant A of any element of
5, every LD-derivation is finite.
Recall definitions 2.4 and 2.5. The motivation behind
these definitions was finding an overestimation of all calls
that are possible in any SLD-derivation using an arbitrary computation rule. The fact that no fixed computation rule is used, forces us to take the closure under all
possible instantiations in definition 2.5, and hence Rs j w
contains in general a lot more calls than can really occur
when a particular computation rule is chosen.
In this section, we focus our analysis on computations
that use Prolog's left-to-right computation rule. Therefore, adapted definitions of the Tp- 1 and Rs functions are
needed.
Definition 3.2 For any T ~ Bffi, define: Ppl(T) =
{BieO'l ... O'i-l E Bffi I A' is a representant of A E T,
H +- B 1, ... , Bn is a clause in P, = mgu(A', H), 1 ~
i ~ n, :30'1, ... , O'i-l, such that Vj = 1, ... , i-I: O'j is an
answer for (P, +-BjOO'l .. , O'j-t)}.

e

The answer substitutions O'j are computed using LDresolution. Let 1it;r denote {T E 2B~ I 5 ~ T}.
Definition 3.3 Rt;r : 1it;r
Ppl(T)

-t

1i~-r

: RZ;r (T)

=T u

In a completely analogous way as in the previous section, we find that R~-r is continuous. Hence, the least fix
point R~-r j w contains all atoms that can possibly occur
as a call when P is executed under the Prolog computation rule, and when a representant of an element from 5
is used as query.
Level mappings are now defined on RZ;r. Recursive resultants are constructed using the left-to-right computation rule. This allows us to consider only recursive resultants of the formp(sl,"" sn)-p(t 1 , ••• , tn), B 2 , · · · , Bm·
The analogue of recurrency with respect to a set 5 of
atoms, is acceptability with respect to 5.
Definition 3.4 (acceptability wrt a set of atoms)
A program P is acceptable with respect to 5,
if there exist.s a level mapping 1.\ with respect
to 5, such that for any p( S1, . . . , Sn), representant of an element in R~-r j w, and for any recursive resultant P(Sl,"" sn)e-p(t 1 , ••• , tn), B 2 , . · · , Em:

Ip(sl,,,,,sn)1 > Ip(t1, ... ,tn)l·

486

Theorem 3.5
P is acceptable with respect to S if and only if it is leftterminating with respect to S.
As in section 2, we provide a more practical, sufficient
condition. The result is completely analogous to proposition 2.18.
Proposition 3.6
Let S ~ B: and 1.1 a level mapping with respect to 5,
such that for any minimal cyclic collection of P (after
standardizing apart),
~

Al

Bf, ... , BlI ' A~, ... , B~I

and for any AI' ... ' Am E R~-r jw, with A~, ... , A~
as their respective representants, and with OJ
mgu(Aj, Ai) (1 ~ j ~ m) and crt is a computed an(1 ~ k ~ ij),
swer substitution for (P, '--B~8jcr{ ...
the following condition holds:

crtl)

IA~81 cr~

{

... crI I

~

I

IA~8m-1 cr~-1

... crr:-=~ I >

IA~I
IA~I

}

or integrated types of (Janssens and BruYllooghe 1990].
Abstract interpretation can be applied to automatically infer a safe approximation of Rs jw or R~-r jw (see
[Janssens and Bruynooghe 1990]).
Automated techniques for proving termination use
various types of norms. A norm is a mapping 11.11 : U: ---+
IN. Several examples of norms can be found in the literature. When dealing with lists, it is often appropriate
to use list-length, which gives the depth of the rightmost
branch in the tree representation of the term. A more
general norm is term-size, which counts the number of
function symbols in a term. Another frequently used
norm is term-depth, which gives the maximum depth of
(the tree representation of) a term.
However, we restrict ourselves to semi-linear norms,
which were defined in [Bossi et al. 1991].
Definition 4.1 (semi-linear norm)
A norm 11.11 is semi-linear if it satisfies the folowing conditions:

• IIVII = 0 if V
• IIf(t l

, ..• ,

is a variable, and

in)11

= c+lltil /1+·· ·+1 Itj", II where c E IN,

1 ::; i l < ... < im
on fin.

~

nand c, i l , ••• , im depend only

.lJIA~I

> IA~+1emcri·· ·cr7:l,

Then, P is acceptable with respect to 5.

4

Practicality and automation

A fully automated technique needs to address the following issues:
• safe approximations of Rs j w and R~-r j w must be
computed,
• precise and natural level mappings are needed, and
• the condit.ions in propositions 2.18 and 3.6 must be
automatically verifiable.
For left termination, there is one extra issue:
• some properties of the answer substitutions for the
atoms in R~-r jw are needed; ill particular, after application of a computed answer substitution we want
an estimation of the relationship between the sizes
of the argwnents of the atoms in R~-rjw.
Concerning the first issue, observe that in practice, the
sets of atoms S in the framework are likely to be specified
in terms of call patterns over some abstract domain. The
framework contains no implicit restriction on the kind of
abstractions that are used for this purpose. They could
be either expressing mode or type information, or even
combined mode and type information - as in the rigid

Examples of semi-linear norms are list-length and
term-size.
As was pointed out in [Bossi et al. 1991), proving termination is significantly facilitated if the norm of a term
remains invariant under substitution. Such terms are
called rigid.
Definition 4.2 (rigid term; see [Bossi et al. 1991])
Let 11.11 be a (semi-linear) norm. A term t is rigid with
respect to 11.11 if for any substitution cr, IItcrll = Iltll.
Rigidity is a generalisation of groundnessj by using this
concept it is possible to avoid restricting the definition of
a norm to ground terms only, a restriction that is often
found in the literature.
Given a semi-linear norm and a set of atoms S, a very
natural level mapping with respect to S can be associated
to them.
Definition 4.3 (natural level mapping)
Given is a semi-linear norm 11.11 and a set of atoms s.
1.lnat' the natural level mapping induced by S, is defined
as follows: Vp(t l , • .. ,in) E Rs jw:
Ip(t l

, .•• ,

tn)lnat

:EiEllitill,
= 0

if I :;t: 0
otherwise,

with 1= {i I Vp(Ul,.'.'U n ) E RsTw: Ui is rigid}.
Let us illustrate the practicality of such mappings and of the framework itself - with some examples.

487

Example 4.4
Reconsider example 1.4 from the introduc tion. Assume
that S = {p(:u) I :u is a nil-terminated list}. Let 11.11, be
the list-length norm. The argument positions of all atoms
in Rs j ware rigid under this norm. So, Ip(:u) Inat = 1I:v II,
and Iq(:z: )Inat = 1I:z:II,. The program is directly recursive,
so that it suffices to verify the conditions of definition
2.10.
For the clause p([HIT])+-q([HIT]),p(T) and for each
call p(:u) E Rsjw, with 0 = mgu(:u, [HIT]), we have
Ip(:u)lnat > Ip(T)Olnat' By the same argunlent, the condition on the clause q{[HIT])+-q(T) holds as well. Thus,
the program is recurrent with respect to S under the
natural, list-length level mapping with respect to S.

Assume that e(:z:), f(y) and g(z) are any atoms with
ground terms :v, y and z, and that:

As a second example, we take a program with indirect
recursion. It defines some form of well-formed expressions built from integers and the function symbols +/2,
*/2 and -/1.

In the context of left termination, definition 4.3 can be
adapted to produce equally natural level mappings with
respect to a set S. Obviously, Rs jw should be replaced
by R~-rjw. In the context of left termination there is
an extra issue, namely, (an approximation of) the set of
possible answer substitutions for an atom is needed. The
next example illustrates how this is handled.

Example 4.5

e{X + Y)
e(X)
f(X * Y)
f(X)
g(-(X))
g(X)

++++++-

f(X), e(Y). (ell)
f(X).
(cZ2)

e(X).
integer(X).

=
=

+-

f(X), e(Y).
g(X'), f(Y').

+-

e(X").

+-

1 Since collections are sequel\ces of clauses, cyclic permutatiol\s
should be considered as well.

mgu(f(y), f(X' * yI))
= mgu(g(z), g( -(X"))).

Also assume' that If(X)Oll :2: If(y)1 and Ig(X')021 :2:
Ig(z )1· We then have le(:u)1 > If(X)Oll :2: If(y)1 >
Ig(X')021 :2: Ig(z)1 > le(X")031, so that le(x)1 >
le(X")031, and the conditions of proposition 2.18 (for the
third cycle) are fulfilled. All other cycles can be verified
in a similar way. The conclusion is that the program is
recurrent with respect to S and the very natural termsize level mapping.

Example 4.6

p([],O)·
p([HIT], [GIS])

(cZS)

In the context of our framework, consider the set S =
{e(:u) I :u is ground}. Through abstract interpretation,
we can find that Rs j w ~ B p.
Let 11.ll t be the term-size norm. Again, the argument
positions of all atoms in Rs jw }tore rigid (even ground) under this norm. Thus, le(:u)lnat = 1I:z:llp If(:z:)lnat = 11:vll t
and Ig(:z:)lnat = 11:ull t . The program contains essentiallyl
6 minimal, cyclic collections: (cll), (el3), (ell, cl3, clS ),
(ell, cl4, elS ), (cl2, cl3, clS ), (cl2, cl4, clS ).
Let us consider, as an example, the third collection:

e(X + Y)
f(X' * Y')
g( -(X"))

()3

+-

d(H, [HIT], T).
d(G, [HIT], [HIU])

(d6)

3 x term-size(:v)+2
3 x term-size(:u) + 1
3 x term-size(:z:).

= mgu(e(:u), e(X + Y))

()2 =

g(X), f(Y). (d3)
g(X).
(el4)

The obvious choice for a level mapping for this program is
term-size. However, the program is not recurrent in the
sense of [Bezem 1989] with respect to this norm. Since it
is clearly terminating, a level mapping exists. The most
natural mapping (in the sense of [Bezem 1989]) we were
able to come up with is:
le{:u)1
If(:z:)1
Ig(:u)1

Ol

d(G, [HIT], U),p(U, S).

+-

d(G, T, U).

Assume that S = {p(:u, y) I :u is a nil-terminated list and
y is free}. Notice that Rs j w contains the set {p( x, y) I :z:
and yare free variables}. We are not able to define a level
mapping on Rs jw that can be used to prove recurrency
with respect to S. This is not surprising, since P is not
terminating with respect to S.
However, program P is left terminating with respect
to S. We prove this by showing that P is acceptable with respect to S. The set R~-r Tw is the union
of {p(;z:, y) I x is a nil- terminated list and y is free}
and {d(:v, y, z) I :v and z are free variables and y is a
nil-terminated list}. This can be found by using abstract interpretation. Since there is only direct recursion in program P, it suffices to show that: (1) for
any p(:v,y) E R~-rTw, ip(:v,y)1 > Ip(U,S)Oo-\, where
= mgu(p(:v, y), p([HIT], [GIS])) and 0- is a computed
answer substitution for (P, +- d(G, [HIT], U)O), and (2)
for any d(:v,y,z) E R~-rjw, Id(x,y,z)1 > Id(G,T,U)01,
where () = mgu(d(x,y,z),d(G,[HIT],[HIU])).
Now, in practice, the statement "0- is a computed answer substitution for (P, +- d( G, [HIT], U)O)" can be
replaced by "11[HIT]fJo-lll = 11U()0-11, + I". This latter
statement is a so-called linear size relation, which expresses a relation between the norms of the arguments
of the atoms in the success set of the program. Alternatively, it can also be interpreted as a (non-Herbralld)

o

488

model of the program. For more details we refer to
[Verschaetse and De Schreye 1992], where we describe
an automated technique for deriving linear size relations.
By taking this information into account, and by taking
Ip(;e, y)1 = II:ell, for any p(;e, y) E R~-" jw -notice that ;z;
is rigid with respect to 11.11, - we find: Ip(;e, y)1 = II;ell, =

II[HIT]Oll, =
Ip(U, 5)00-1·

II [HIT]Oo-lI, = II UO o-lI, + 1

>

11U00-1I, =

The second inequality, Id(;e, y, z)1 > Id(G, T, U)oI, is
more easy to prove. TIns time, the list-length of the
second argument can be taken as level mapping. Since
both inequalities hold, we can conclude that the program
is acceptable with respect to the set of atoms that is
considered.
Automatic verification of the conditions for recurrency
and acceptability is handled by reformulating them into
a problem of checking the solvability of a linear system of
inequalities. This part of the work is described in more
detail in [De Schreye and Verschaetse 1992].

References
[Apt and Pedreschi 1990] K. R. Apt and D. Pedreschi.
Studies in pure Prolog: termination. In Proceedings
Esprit symposium on computational logic, pages 150176, Brussels, November 1990.
[Baudinet 1988] M. Baudinet.
Proving termination
properties of Prolog programs: a semantic approach.
In Proceedings of the 3rd IEEE symposium on logic
in computer science, pages 336-347, Edinburgh, July
1988. Revised version to appear in Journal of Logic
Programming.
[Bezem 1989] M. Bezem. Characterizing termination of
logic programs with level mappings. In Proceedings
NACLP'89, pages 69-80,1989.
[Bossi et al. 1991] A. Bossi, N. Cocco, and M. Fabris.
N onns on terms and their use in proving universal
termination of a logic program. Technical Report
4/29, CNR, Department of Mathematics, University
of Padova, March 1991.
[Bruynooghe et ai. 1991] M. Bruynooghe, D. De Schreye, and B. Martens. A general criterion for avoiding
infinite unfolding during partial deduction of logic programs. In Proceedings ILPS'91, pages 117-131, San
Diego, October 1991. MIT Press.
[Cavedon 1989] L. Cavedon. Continuity, consistency,
and completeness properties for logic programs. In
Proceedings ICLP'89, pages 571-584, June 1989.
[De Schreye and Verschaetse 1992] D. De Schreye and
K. Verschaetse. Termination analysis of definite logic

programs with respect to call patterns.
Technical Report CW 138, Department Computer Science,
K.U.Leuven, January 1992.
[Falaschi et al. 1989] M. Falaschi, G. Levi, M. Martelli,
and C. Palamidessi. Declarative modeling of the operational behaviour of logic languages. Theoretical Computer Science, 69(3):289-318,1989.
[Janssens and Bruynooghe 1990]
G. Janssens and M. Bruynooghe. Deriving descriptions of possible values of program variables by means
of abstract interpretation. Technical Report CW 107,
Department of Computer Science, K.U .Leuven, Mardi
1990. To appear in Journal of Logic Progranulling, ill
print.
[Lloyd and Shepherdson 1991] J. W. Lloyd and J. C.
Shepherdson. Partial evaluation in logic programming.
Journal of Logic Programming, 11(3 & 4):217-242, October/November 1991.
[Plumer 1990] L. Plumer. Termination proofs for logic
programs. Lecture Notes in Artificial Intelligence 446.
Springer- Verlag, 1990.
[Sohn and Van Gelder 1991] K. Sohn and A. Van
Gelder. Termination detection in logic programs using argument sizes. In Proceedings 10th symposium on
principles of database systems, pages 216-226. Acm
Press, May 1991.
[Ullman and Van Gelder 1988] J. D. Ullman and A. Van
Gelder. Efficient tests for top-down termination of
logical rules. Journal A CM, 35(2):345-373, April 1988.
[Vasak and Potter 1986] T. Vasak and J. Potter. Characterisation of terminating logic programs. In Proceedings 1986 symposium on logic programming, pages
140-147, Salt Lake City, 1986.
[Verschaetse and De Schreye 1991] K. Verschaetse and
D. De Schreye. Deriving termination proofs for logic
programs, using abstract procedures. In Proceedings
ICLP'91, pages 301-315, Paris, June 1991. MIT Press.
[Verschaetse and De Schreye 1992] K. Verschaetse and
D. De Schreye. Automatic derivation of linear size relations. Technical Report CW 139, Department Computer Science, K.U.Leuven, January 1992.
[Wang and Shyamasulldar 1990] B. Wang and R. K.
Shyamasundar. Towards a characterization of termination of logic programs. In Proceedings of international workshop PLILP'90, Lecture Notes in Computer Science 456, pages 204-221, Linkoping, August
1990. Springer- Verlag.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE
ON FIFTH GENERATION COMPUTER SYSTEMS 1992,
edited by ICOT. © ICOT, 1992

489

Automatic Verification of GHC-Programs:
Termination
Lutz Pliimer
Rheinische 'Friedrich-Wilhelms-UniversiHit Bonn, Institut fiir Informatik III
D-5300 Bonn 1, Romerstr. 164
lutz@uran.infonnatik.uni-bonn.de

Abstract
We present an efficient technique for the automatic generation of tennination proofs for concurrent logic programs,
taking Guarded Hom Clauses (GHC) as an example. In contrast to Prolog's strict left to right order of evaluation, termination proofs for concurrent languages are complicated by a
more sophisticated mechanism of sub goal selection. We introduce the notion of directed GHC programs and show that
for this class of programs goal reductions can be simulated
by Prolog-like derivations. We give a sufficient criterion for
directedness. Static program analysis techniques developed
for Prolog can thus be applied, albeit with some important
modifications.

1. Introduction
With regard to termination it is useful to distinguish between two types of software systems or programs: transformational and reactive [HAP85]. A transformational system receives an input at the beginning of its operation and yields an output at the end. If the problem at hand is decidable, termination of the process is surely a desirable property. Reactive systems, on the other hand, are designed to maintain some interaction with their environment. Some of them, for instance operating systems and database management systems, ideally never terminate and do not yield a final result at all. Based on the process interpretation of Horn clause logic, concurrent logic programming systems have been designed for many different applications, including reactive systems and transformational parallel systems. While for some of them termination is not a desirable property, for others it is. In this paper we discuss how termination proofs for concurrent logic programs can be achieved automatically.

Automatic proof techniques for pure Prolog programs have been described in several papers including [ULG88] and [PLU90a]. Prolog is characterized by a fixed computation rule which always selects the leftmost atom. Deterministic subgoal selection and strict left to right order of evaluation cannot be assumed for the concurrent languages.
Static program analysis techniques which are well established for sequential Prolog, such as abstract interpretation, inductive assertions and termination proof techniques, depend substantially on the strict left to right order of evaluation in most cases and thus cannot easily be applied to concurrent languages. Concurrent languages delay subgoals which are not sufficiently instantiated. Goals which loop forever when evaluated by a Prolog interpreter may deadlock in the context of a concurrent language. These phenomena may suggest that termination proofs for concurrent logic programs require a different approach. This paper, however, shows that techniques which have been established for pure Prolog are still useful in the context of concurrency.
Our starting point is the question under which conditions reductions of a concurrent logic program can be simulated by Prolog-like derivations. We take Guarded Horn Clauses (GHC, see [UED86]) as an example, but our results can easily be extended to other concurrent logic programming languages such as PARLOG, (Flat) Concurrent Prolog or FCP(:). Our basic assumptions are the restriction of unification to input matching, nondeterministic subgoal selection and resuming of subgoals which are not sufficiently instantiated. Since we consider all possible derivations, the commit operator does not need special attention.
In general simulation is not possible: if there is a GHC-derivation of g' from g, g' cannot necessarily be derived with Prolog's computation rule.
One could now try to augment simulation by program transformation. Let, for instance, P' be derived from P by including all clause body permutations. Although P' may be exponentially larger than P, there are still derivations which are not captured.
Example 1.1:
Program:  p ← q,r.    q ← s,t.    r ← u,v.    s.    v.
Goal:     ← p

This goal can be reduced to ← t,u by nondeterministic subgoal selection, but not by a Prolog-like computation, even after adding the following clauses:

p ← r,q.    q ← t,s.    r ← v,u.

The reason is that in order to derive ← t,u, the subderivations of ← q and ← r have to be interleaved.


The question arises whether there is an interesting subclass for which appropriate simulations can be defined. Such a class of programs will be discussed in Section 3. The main idea is to assume that if a subgoal p may produce some output on which evaluation of another subgoal q depends, then p is smaller w.r.t. some partial ordering. Whether a program maintains such a property, which we will call directedness, is undecidable. We will then introduce the stronger notion of well-formedness which can be checked syntactically. Well-formedness is related to directionality, which is discussed in [GRE87]. Well-formedness is sufficient but not necessary for directedness, and it will turn out that quite a lot of nontrivial programs (including for instance systolic programs as discussed in [SHA87a] and most of the examples given in [TIC91]) fall into this category. In Section 5 we will demonstrate how termination proof techniques which have been established for pure Prolog can be generalized such that they apply to well-formed GHC programs.
The rest of this paper is organized as follows. Section 2 provides basic notions. Section 3 introduces the notion of directed programs and shows that this property is undecidable. It provides the notion of well-formedness and shows that it is sufficient for directedness. Section 4 discusses oriented and data driven computation and shows that after some simple program transformation derivations with directed GHC-programs can be simulated by Prolog-like derivations. Using the notion of S-models introduced in [FLP89], Sections 5 and 6 show how termination proofs can be achieved automatically.

2. Basic Notions
We use standard notation and terminology of Lloyd [LLO87] or Apt [APT90]. Following [APP90] we will say LD-resolution (LD-derivation, LD-refutation, LD-tree) for SLD-resolution (SLD-derivation, SLD-refutation, SLD-tree) with the leftmost selection rule characteristic for Prolog.
Next we define GHC programs following [UED87] and [UED88].
A GHC program is a set of guarded Horn clauses of the following form:

    H ← G1, ..., Gm | B1, ..., Bn        (m ≥ 0, n ≥ 0)

where H, G1,...,Gm and B1,...,Bn are atomic formulas. H is called a clause head, the Gi's are called guard goals and the Bi's are called body goals. The part of a clause before '|' is called a guard, and the part after '|' is called a body. One predicate, namely '=', is predefined by the language. It unifies two terms.
Declaratively, the commitment operator '|' denotes conjunction, and the above guarded Horn clause is read as "H is implied by G1,...,Gm and B1,...,Bn". The operational semantics of GHC is given by parallel input resolution restricted by the following two rules:

Rule of Suspension:
• Unification invoked directly or indirectly in the guard of a clause C called by a goal G (i.e. unification of G with the head of C and any unification invoked by solving the guard goals of C) cannot instantiate the goal G.
• Unification invoked directly or indirectly in the body of a clause C called by a goal G cannot instantiate the guard of C or G until C is selected for commitment.

Rule of Commitment:
• When some clause C called by a goal G succeeds in solving (see below) its guard, the clause C tries to be selected for subsequent execution (i.e., proof) of G. To be selected, C must first confirm that no other clauses in the program have been selected for G. If confirmed, C is selected indivisibly, and the execution of G is said to be committed to the clause C.

An important consequence is that any unification intended to export bindings to the calling goal must be specified in the clause body and use the predefined predicate '='.
The operational semantics of GHC is a sound - albeit not complete - proof procedure for Horn clause programs: if ← B succeeds with answer substitution θ, then ∀(Bθ) is a logical consequence of the program.
Subsequently, we may find it convenient to denote a goal g by the pair ⟨G, θ⟩, i.e. g = Gθ. A single derivation step reducing the i-th atom of G using clause C and applying mgu θ' is denoted by ⟨G, θ⟩ → i;C;θ' ⟨G', θθ'⟩. Subscripts may be omitted.

3. Directed Programs
An annotation dp for an n-ary predicate symbol p is a function from {1,...,n} to {+,−} where '+' stands for input and '−' for output. We will write p(+,+,−) in order to state that the first two arguments of p are input and the last is output.
A goal atom A generates (consumes) a variable v if v occurs at an output (input) position of A. A is a generator for B if some variable v occurs at an output position of A and at an input position of B; in this case, B is a consumer of A.

Let r̄ denote a tuple of terms. A derivation ⟨p(r̄), ε⟩ →* ⟨G, θ⟩ respects the input annotation of p if vθ = v for every variable v occurring at an input position of p(r̄).
A goal is directed if there is a linear ordering among its atoms such that if Ai is a generator for Aj then Ai precedes Aj in that ordering. A program is directed if all its derivations respect directedness, i.e., all goals derived from a directed goal are directed. Note that directedness of a goal is a static property which can be checked syntactically. Directedness of a program, however, is a dynamic property.
Theorem 3.1: It is undecidable whether a program is directed.
Proof: Let tM(X) be a directed GHC simulation of a Turing machine M for a language L which binds X to halt if and only if M applied to the empty tape halts. Such a simulation is for instance described in [PLU90b]. Next consider the following procedures pM and q:
    pM(X,Y) ← tM(A), q(A,X,Y).
    q(halt,X,X).
and the (directed) goal
    ← r(X,Y), s(Y,Z), pM(X,Z).
The following annotations are given:
    tM(−). q(+,−,−). pM(−,−). r(+,−). s(+,−).
If M halts on the empty tape, tM(A) will bind A to 'halt', pM(X,Y) will identify X and Y and thus the given goal can be reduced to the undirected goal ← r(X,Y), s(Y,X). Decidability of program directedness would thus imply solvability of the halting problem: contradiction. •
Next we introduce the notion of well-formedness of a program w.r.t. a given annotation and show that this property is sufficient for directedness.
A goal is well-formed if it is directed, generators precede consumers in its textual ordering, and its output is unrestricted. Output of a goal is unrestricted if all its output arguments are distinct variables which do not occur (i) at an output position of another goal atom and (ii) at an input position of the same atom.
A program P is well-formed if the following conditions are satisfied by each clause H ← G1,...,Gm | B1,...,Bn in P:
• ← B1,...,Bn is well-formed
• the input variables of H do not occur at output positions of body atoms.
The predicate '=' has the annotation '− = −'. It is convenient to have two related primitives: '==' (test) and '⇐' (matching) which have the same declarative reading as '=' but different annotations, namely '+ == +' and '− ⇐ +'.
Note that the goal ← r(X,Y), s(Y,Z), pM(X,Z) is not well-formed because its output is restricted: Z has two output occurrences.
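Both conditions on goals are purely syntactic and can be checked mechanically. The following Python sketch is an illustration only and is not part of the original paper; the representation of atoms as (pred, args) tuples and of annotations as tuples of '+'/'−' characters is an assumption made for the sketch.

def term_vars(t):
    # Variables are represented as strings; compound terms as (functor, [subterms]).
    if isinstance(t, str):
        return {t}
    vs = set()
    for s in t[1]:
        vs |= term_vars(s)
    return vs

def well_formed_goal(goal, modes):
    """Minimal sketch of the syntactic well-formedness test for a goal.
    goal:  list of atoms (pred, args); modes: dict pred -> tuple of '+'/'-'."""
    producer = {}                                  # variable -> index of its generator
    for idx, (pred, args) in enumerate(goal):
        inputs = set()
        for pos, arg in enumerate(args):
            if modes[pred][pos] == '+':
                inputs |= term_vars(arg)
        for pos, arg in enumerate(args):
            if modes[pred][pos] == '-':
                if not isinstance(arg, str):       # output arguments must be variables
                    return False
                if arg in producer or arg in inputs:
                    return False                   # restricted output: a second output
                producer[arg] = idx                # occurrence or input of the same atom
    for idx, (pred, args) in enumerate(goal):      # generators must precede consumers
        for pos, arg in enumerate(args):
            if modes[pred][pos] == '+':
                for v in term_vars(arg):
                    if v in producer and producer[v] >= idx:
                        return False
    return True

With modes r(+,−), s(+,−), pM(−,−), the goal ← r(X,Y), s(Y,Z), pM(X,Z) from the proof of Theorem 3.1 is rejected by this check because Z has a second output occurrence.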
The next example is taken from [UED86]:
Example 1: Generating primes
primes(Max,Ps)      ← true | gen(2,Max,Ns), sift(Ns,Ps).
gen(N,Max,Ns)       ← N ≤ Max | N1 ⇐ N + 1, gen(N1,Max,Ns1), Ns ⇐ [N|Ns1].
gen(N,Max,Ns)       ← N > Max | Ns ⇐ [].
sift([P|Xs],Zs)     ← filter(P,Xs,Ys), sift(Ys,Zs1), Zs ⇐ [P|Zs1].
sift([],Zs)         ← Zs ⇐ [].
filter(P,[X|Xs],Ys) ← X mod P == 0 | filter(P,Xs,Ys).
filter(P,[X|Xs],Ys) ← X mod P ≠ 0 | filter(P,Xs,Ys1), Ys ⇐ [X|Ys1].
filter(P,[],Ys)     ← Ys ⇐ [].

primes(+,−). gen(+,+,−). sift(+,−). filter(+,+,−).

The call primes(Max,Ps) returns through Ps a stream of primes up to Max. The stream of primes is generated from a stream of integers by filtering out the multiples of primes. For each prime P, a filter goal filter(P,Xs,Ys) is generated which filters out the multiples of P from the stream Xs, yielding Ys.
In this example all input terms are italic and all output terms are bold. It can easily be seen that this program is well-formed.
Another example for a well-formed program is quicksort. The call qsort([H|L],S) returns through S an ordered version of the list [H|L]. To sort [H|L], L is split into two lists L1 and L2 which are themselves sorted by recursive calls to qsort.
Example 2: Quicksort
q1: qsort([],L)           ← L ⇐ [].
q2: qsort([H|L],S)        ← split(L,H,A,B), qsort(A,A1), qsort(B,B1), append(A1,[H|B1],S).
s1: split([],X,L1,L2)     ← L1 ⇐ [], L2 ⇐ [].
s2: split([X|Xs],Y,L1',L2) ← X ≤ Y | split(Xs,Y,L1,L2), L1' ⇐ [X|L1].
s3: split([X|Xs],Y,L1,L2') ← X > Y | split(Xs,Y,L1,L2), L2' ⇐ [X|L2].
a1: append([],L1,L1')     ← L1' ⇐ L1.
a2: append([H|L1],L2,L3)  ← append(L1,L2,L3'), L3 ⇐ [H|L3'].

split(+,+,−,−). qsort(+,−). append(+,+,−).

Theorem 3.2: Let P be a well-formed program, g a well-formed goal and g →* g' a GHC-derivation. Then g' is well-formed.
Proof: See [PLU92].
Well-formed programs respect input annotations:
Theorem 3.3: Let ⟨p(t̄), ε⟩ →* ⟨G, θ⟩ be a derivation and v an input variable of p(t̄). Then vθ = v.
Proof: Goal variables can only be bound by transitions applying '=' or '⇐', since in the other cases matching substitutions are applied. Since both arguments of '=' are output, and '⇐' also binds only output variables, input variables cannot be bound. •


4. Oriented and Data Driven Computations
Our next aim is to show that derivations of directed programs can be simulated by derivations which are similar to LD-derivations. In this context we find it convenient to use the notational framework of SLD-resolution and to regard GHC-derivations as a special case.
We say that an SLD-derivation is data driven if for each resolution step with selected atom A, applied clause C and mgu θ either C is the unit clause (X = X ← true.) or C is B ← B1,...,Bn and A = Bθ. Data driven derivations are the same as GHC derivations of programs with empty guards. The assumption that guards are empty is without loss of generality in this context.
Next we consider oriented computation rules. Oriented computation rules are similar to LD-resolution in the sense that goal reduction strictly proceeds from left to right. They are more general since the selected atom is not necessarily the leftmost one. However, if the selected atom is not leftmost, its left neighbors will not be selected in any future derivation step.
More formally, we define: A computation rule R is oriented if every derivation ⟨G0, θ0⟩ → ... → ⟨Gi, θi⟩ → ... via R satisfies the following property: if in Gi an atom Ak is selected, and Aj (j < k) is an atom on the left of Ak, no further instantiated version of Aj will be selected in any future derivation step.
Our next aim is to show that, for directed programs, any data driven derivation can be simulated by an equivalent data driven derivation which is oriented. To prove the following theorem, we need a slightly generalized version of the switching lemma given in [LLO87]. Here g → i;C;θ g' denotes a single derivation step where the i-th atom of g is resolved with clause C using mgu θ.
Lemma 4.1: Let gk+2 be derived from gk via gk → i;Ck+1;θk+1 gk+1 → j;Ck+2;θk+2 gk+2. Then there is a derivation gk → j;Ck+2';θk+1' gk+1' → i;Ck+1';θk+2' gk+2' such that gk+2' is a variant of gk+2 and Ck+1', Ck+2' are variants of Ck+2 and Ck+1.
Proof: [LLO87]. The difference between this and Lloyd's version is that the latter refers to SLD-refutations, while ours refers to (possibly partial) derivations. His proof, however, also applies to our version. •
Theorem 4.2: Let P be a directed program and ⟨G0, θ0⟩ a directed goal. Let D = ⟨G0, θ0⟩ → ... → ⟨Gk, θk⟩ be a data driven derivation using the clause sequence C1,...,Ck. Then there is another data driven derivation D': ⟨G0, θ0⟩ → ... → ⟨Gk', θk'⟩ using a clause sequence Cj1',...,Cjk', where ⟨j1,...,jk⟩ is a permutation of ⟨1,...,k⟩, each Ci' is a variant of Ci and Gk'θk' is a variant of Gkθk, and D' is oriented.

Proof: Let gj be the first goal in D where orientation is violated, i.e. there is the following situation:
    gi: ⟨..., R, ..., R', ...⟩
    gj: ⟨..., R, ...⟩
R' is selected in gi and R is selected in gj. Now we switch subgoal selection in gj-1 and gj and get a new derivation D*. In D* we look again for the first goal violating the orientation. After a finite number of iterations, we arrive at a derivation D' which is oriented. It remains to be shown that D* (and thus D') is still data driven.
Note that up to gj-1 both derivations are identical. Above, the switching lemma implies that, from gj+1 on, the goals of D' are variants of those of D.
Now let Q be the selected atom of Gj-1. Since orientation is violated for the first time in Gj, Q is to the right of R. (If i = j−1 then Q = R', and otherwise j−1 would have the first violation of orientation.) Since gj-1 = ⟨Gj-1, θj-1⟩ is directed, Qθj-1 is not a generator of Rθj-1 and thus Rθj-1 and Rθj are variants. Let H be the head of the clause applied to resolve R in D. Since D is data driven, Rθj-1 = Hσ for some σ, and so Rθj = Hσ' for some σ'. Thus D' is data driven. •
Corollary 4.3: Let P be a directed program and g a directed goal. Then g has an infinite data driven derivation if and only if it has an infinite data driven derivation which is oriented.
According to Corollary 4.3, in our context it is sufficient to consider data driven derivations which are oriented. Such derivations are still not always LD-derivations since the selected atom is not necessarily leftmost. If it is not, however, its left neighbors will never be reactivated in future derivation steps; thus w.r.t. termination they can simply be ignored. The same effect can be achieved by a simple program transformation proposed in [FAL88]:
Pro(P) = { p(X̄) ← | p is an n-ary predicate appearing in the body or the head of some clause of P and X̄ is an n-tuple of distinct variables }

Parto(P) = P ∪ Pro(P)
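The transformation is easy to compute mechanically. The following Python sketch is an illustration only, not taken from the paper; the clause representation, with clauses as (head, body) pairs of (pred, args) atoms, is an assumption.

def pro(program):
    """Pro(P): one unit clause p(X1,...,Xn) <- with an n-tuple of distinct variables
    for every predicate occurring in the head or body of some clause of P."""
    arities = {}
    for head, body in program:
        for pred, args in [head] + list(body):
            arities[pred] = len(args)
    return [((pred, tuple('X%d' % i for i in range(1, n + 1))), ())   # empty body
            for pred, n in sorted(arities.items())]

def parto(program):
    # Parto(P) = P u Pro(P)
    return list(program) + pro(program)

Applied to the quicksort program of Example 2, pro adds unit clauses for qsort, split and append; these correspond to the clauses q0, s0 and a0 used below.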

Simulation Lemma 4.4: Let D = G0 → ... → Gi-1 → Gi be an oriented SLD-derivation of G0 and P where
    Gi-1 = ← B1,...,Bj,...,Bn and
    Gi   = ← (B1,...,Bj-1,Cj+,Bj+1,...,Bn)θi.
Cj+ is the body of the clause Cj applied to resolve Bj. Then there is an LD-derivation
    D' = G0 → ... → Gk-1' → Gk' with Parto(P), where
    Gk-1' = ← Bj,...,Bn and
    Gk'   = ← (Cj+,Bj+1,...,Bn)θi.
Proof: Whenever an atom B is selected in D which is not the leftmost one, first the atoms to the left of B are resolved away in D' with clauses in Pro(P), and then D' resolves B in the same way as D. •
An immediate implication is the following:

Theorem 4.5: If g has a non-terminating data driven oriented derivation with P, then it has a non-terminating LD-derivation with Parto(P).
The converse, however, is not true. Consider, for instance, the quicksort example from above, extended by the following clauses:
    q0: qsort(_,_).
    s0: split(_,_,_,_).
    a0: append(_,_,_).
While the LD-tree for ← qsort([2,1],X) is finite in the context of the standard definition of qsort, this is no longer true for the extended program. Consider the following infinite LD-derivation:
    ← qsort([2,1],X)
    ← split([1],2,A,B), qsort(A,A1), qsort(B,B1), append(A1,[H|B1],S).
    ← qsort(A,A1), qsort(B,B1), append(A1,[H|B1],S).        by s0
    ← split(_,_,_,_), ...                                   by q2
    ← qsort(_,_), ...                                       by s0
This derivation, however, is not data driven: resolving qsort(A,A1) in the third goal with q2 yields an mgu which is not a matching substitution.
For data driven LD-derivations we get a stronger result:
Theorem 4.6: There is a nonterminating data driven oriented derivation for g with P if and only if there is a nonterminating data driven LD-derivation for g with Parto(P).
Proof: The only-if part is implied by the simulation lemma. For the if-part, consider a nonterminating data driven LD-derivation D. By removing all applications of clauses in Pro(P), one gets another derivation D'. D' is a nonterminating data driven oriented derivation. •
Restriction to LD-derivations which are data-driven enlarges the class of goal/program pairs which do not loop forever. In the general case, termination of quicksort requires that the first argument is a list. Termination of append requires that the first or the third argument is a list. Restriction to data-driven LD-derivations implies that no queries of quicksort or append (and many other procedures which have finite LD-derivations only for certain modes) loop forever. However, goals like ← append(X,Y,Z) or ← quicksort(A,B) deadlock immediately.

5. Termination Proofs
In this section we will give a sufficient condition for terminating data driven LD-derivations. We will concentrate on programs without mutual recursion. In [PLU90b] we have demonstrated how mutual recursion can be transformed into direct recursion. We need some further notions.
For a set T of terms, a norm is a mapping |...|: T → N. The mapping ||...||: A → N is an input norm on (annotated) atoms if for all B = p(t1,...,tn), ||B|| = Σ_{i∈I} |ti|, where I is a subset of the input arguments of B.
Let P be a well-formed program without mutual recursion. P is safe if there is an input norm on atoms such that for all clauses c = B0 ← B1,...,Bi,...,Bn the following holds: if Bi is a recursive literal (B0 and Bi have the same predicate symbol), σ a substitution the domain of which is a subset of the input variables of B0 and θ is a computed answer for ← (B1,...,Bi-1)σ, then ||B0σθ|| > ||Biσθ||.
We can now state the following theorem:
Theorem 5.1: If P is a safe program and G = ← A is well-formed, then all data driven LD-derivations for G are finite.
Proof: By contradiction. Assume that there is an infinite data driven LD-derivation D. Then there is an infinite subsequence D' of D containing all elements of D starting with the same predicate symbol p. Let di and di+1 be two consecutive elements of D' and
    di   = ← p(t1,...,tr), ...
    di+1 = ← p(t1',...,tr'), ...
and
    p(s1,...,sr) ← B1,...,Bk,p(s1',...,sr'), ...
be the clause applied to resolve the first literal of di, θi the corresponding mgu. Then there is a computed answer substitution θ' for ← (B1,...,Bk)θi such that p(t1',...,tr') = p(s1',...,sr')θiθ'.
Since D is data driven, θi is a matching substitution, i.e. p(t1,...,tr) = p(t1,...,tr)θi. Since P is well-formed, Theorem 3.3 further implies p(t1,...,tr) = p(t1,...,tr)θiθ'. We also have p(t1,...,tr)θiθ' = p(s1,...,sr)θiθ'.
Since P is a safe program,
    ||p(s1,...,sr)θiθ'|| > ||p(s1',...,sr')θiθ'|| and thus
    ||p(t1,...,tr)θiθ'|| > ||p(t1',...,tr')θiθ'||.
Since the range of ||...|| is a well-founded set, D' cannot be infinite. Contradiction. •
The next question is how termination proofs for data driven LD-derivations can be automated. In [PLU90b] and [PLU91], a technique for automatic termination proofs for Prolog programs is described. It uses an approximation of the program's semantics to reason about its operational behavior. The key concept is predicate inequalities which relate the argument sizes of the atoms in the minimal Herbrand model of the program. Now in any program Parto(P) for every predicate symbol p occurring in P there is a unit clause p(X̄). Thus the minimal Herbrand model of Parto(P) equals the Herbrand base of P, a semantics which is not helpful. To overcome this difficulty, we will consider S-models which have been proposed in [FLP89] in order to model the operational behaviour of logic programs more closely. The S-model of a logic program P can be characterized as the least fixpoint of an operator Ts which is defined as follows:

    Ts(I) = { B | ∃ B0 ← B1,...,Bk in P, ∃ B1',...,Bk' ∈ I, ∃ θ = mgu((B1,...,Bk),(B1',...,Bk')), and B = B0θ }.

We need some notions defined in [BCF90] and [PLU91]. Let Δ be a mapping from a set of function symbols F to N which is not zero everywhere. A norm |...| for T is said to be semi-linear if it can be defined by the following scheme:
    |t| = 0                           if t is a variable
    |t| = Δ(f) + Σ_{i∈I} |ti|         if t = f(t1,...,tn),
where I ⊆ {1,...,n} and I depends on f. A subterm ti is called selected if i ∈ I.
A term t is rigid w.r.t. a norm |...| if |t| = |tθ| for all substitutions θ. Let t[v(i)←s] denote the term derived from t by replacing the i-th occurrence of v by s. An occurrence v(i) of a variable v in a term t is relevant w.r.t. |...| if |t[v(i)←s]| ≠ |t| for some s. Variable occurrences which are not relevant are called irrelevant. A variable is relevant if it has a relevant occurrence. rvars(t) denotes the multiset of relevant variable occurrences in the term t.
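As a concrete instance (an illustration, not taken from the paper), the familiar list-length norm is semi-linear: Δ assigns 1 to the list constructor and 0 to every other functor, and only the tail argument of a list cell is selected. A minimal Python sketch, assuming variables are represented as strings and compound terms as (functor, [subterms]) pairs:

def list_length(t):
    """List-length norm |t|: a semi-linear norm with Delta('.') = 1 and I = {2}
    (only the tail of [Head|Tail] is selected); all other functors contribute 0
    and select no arguments; variables have norm 0."""
    if isinstance(t, str):                 # a variable
        return 0
    functor, args = t
    if functor == '.' and len(args) == 2:  # the list constructor [Head|Tail]
        return 1 + list_length(args[1])
    return 0

Under this norm and the input norm ||append(t1,t2,t3)|| = |t1|, the recursive clause a2 of Example 2 satisfies ||append([H|L1],L2,L3)|| = 1 + ||append(L1,L2,L3')||, which is the kind of decrease required by safety.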
Proposition 5.2: Let t be a term, tθ be a rigid term and V be the multiset of relevant variable occurrences in t. Then for a semi-linear norm |...| we have |tθ| = |t| + Σ_{v∈V} |vθ|.
Corollary 5.3: |tθ| ≥ |t|.
Proof: [PLU91].
For an n-ary predicate p in a program P, a linear predicate inequality LIp has the form Σ_{i∈I} pi + c ≥ Σ_{j∈J} pj, where I and J are disjoint sets of arguments of p, and c, the offset of LIp, is either a natural number or ∞ or a special symbol like γ. I and J are called input resp. output positions of p (w.r.t. LIp).
Let Ms be the S-model of P. LIp is called valid (for a linear norm |...|) if p(t1,...,tn) ∈ Ms implies Σ_{i∈I} |ti| + c ≥ Σ_{j∈J} |tj|.
Let A = p(t1,...,tn). With the notations from above we further define:
    F(A,LIp)    = Σ_{i∈I} |ti| − Σ_{j∈J} |tj| + c
    Vin(A,LIp)  = ∪_{i∈I} rvars(ti)
    Vout(A,LIp) = ∪_{j∈J} rvars(tj)
    Fin(A,LIp)  = Σ_{i∈I} |ti|
    Fout(A,LIp) = Σ_{j∈J} |tj|
F(A,LIp) is called the offset of A w.r.t. LIp.

Theorem 5.4: Let Σ_{i∈I} pi + c ≥ Σ_{j∈J} pj be a valid linear predicate inequality, G = ← p(t1,...,tn)σ a well-formed goal, V and W the multisets of relevant input resp. output variable occurrences of p(t1,...,tn) and θ a computed answer for G. Then the following holds:
    i)  Σ_{i∈I} |tiσθ| + c ≥ Σ_{j∈J} |tjσθ|.
    ii) Σ_{v∈V} |vσθ| + F(p(t1,...,tn),LIp) ≥ Σ_{w∈W} |wσθ|.
Proof: According to [FLP89], p(t1,...,tn)σθ is an instance of an atom p(s1,...,sn) in the S-model Ms of P. Since the output of G is unrestricted, tjσθ = sj for all j ∈ J. Proposition 5.2 implies |tiσθ| ≥ |si| for all i ∈ I. Thus
    Σ_{i∈I} |tiσθ| ≥ Σ_{i∈I} |si| and Σ_{j∈J} |tjσθ| = Σ_{j∈J} |sj|,
which proves the first part of the theorem. The second part is implied by Prop. 5.2. •
Theorem 5.4 gives a valid inequality relating variables occurring in a single literal goal. Next we give an algorithm for
the derivation of a valid inequality relating variables in a
compound goal.
Algorithm 5.5: goal_inequality(G, LI, U, W, Δ, b)
Input:  A well-formed goal G = ← B1,...,Bn, a set LI with one inequality for each predicate in G, and two multisets U and W of variable occurrences.
Output: A boolean variable b which will be true if a valid inequality relating U and W could be derived, and an integer Δ which is the offset of that inequality.
begin
    M := W; Δ := 0; V := U;
    For i := n to 1 do:
        If M ∩ Vout(Bi,LIp) ≠ ∅ then
            M := (M \ Vout(Bi,LIp)) ∪ (Vin(Bi,LIp) \ V);
            V := V \ Vin(Bi,LIp);
            Δ := Δ + F(Bi,LIp). fi
    If M = ∅ then b := true else b := false fi
end.
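The following Python transcription is an illustration only and is not part of the paper. It assumes atoms represented as (pred, args) tuples, inequalities as (I, J, c) triples of input positions, output positions and a numeric offset, Python Counters as multisets, and a term-size norm under which every variable occurrence is relevant.

from collections import Counter

def term_size(t):
    # a simple semi-linear norm: 1 per function symbol, every argument selected
    return 0 if isinstance(t, str) else 1 + sum(term_size(s) for s in t[1])

def rvars(t):
    # multiset of variable occurrences; under term_size every occurrence is relevant
    if isinstance(t, str):
        return Counter([t])
    c = Counter()
    for s in t[1]:
        c += rvars(s)
    return c

def goal_inequality(goal, LI, U, W):
    """Sketch of Algorithm 5.5: tries to derive a valid inequality relating the
    variable-occurrence multisets U and W over the well-formed goal B1,...,Bn.
    Returns (b, delta): b is True if the derivation succeeded, delta its offset."""
    M, V, delta = Counter(W), Counter(U), 0
    for pred, args in reversed(goal):                      # i := n down to 1
        I, J, c = LI[pred]
        vin = sum((rvars(args[i]) for i in I), Counter())
        vout = sum((rvars(args[j]) for j in J), Counter())
        if M & vout:                                       # M n Vout(Bi,LIp) nonempty
            M = (M - vout) + (vin - V)                     # M := (M \ Vout) u (Vin \ V)
            V = V - vin                                    # V := V \ Vin
            delta += (sum(term_size(args[i]) for i in I)
                      - sum(term_size(args[j]) for j in J) + c)   # delta += F(Bi,LIp)
    return len(M) == 0, delta

The sketch assumes numeric offsets; the symbolic offset γ manipulated by Algorithm 6.1 below is outside its scope.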

Next we show that the algorithm is correct:
Theorem 5.6: Assume that the inequalities in LI are valid and b is true, σ is an arbitrary substitution such that Gσ is well-formed and θ is a computed answer substitution for Gσ. Then Σ_{v∈U} |vσθ| + Δ ≥ Σ_{w∈W} |wσθ| holds.
Proof: See [PLU92].
Algorithm 5.5 takes time O(m) where m is the length of G.
[PLU90b] gives an algorithm for the automatic derivation of inequalities for compound goals based on and/or-dataflow graphs which has exponential runtime in the worst case. Algorithm 5.5 makes substantial use of the fact that G is well-formed: each variable has at most one generator, which makes the derivation of inequalities deterministic.


6. Derivation of inequalities for S-models
In Section 5 it has been assumed that linear inequalities are given for the predicates of a program P. We now show how these inequalities can be derived automatically. We assume that P is well-formed and free of mutual recursion. Let p ◁ q if p ≠ q and p occurs in one of the clauses defining q. Absence of mutual recursion in P implies that ◁ defines a partial order which can be embedded into a linear order. Thus there is an enumeration {p1,...,pn} of the predicates of P such that pi ◁ pj implies i < j. We will process the predicates of P in that order; thus in analyzing p we can assume that for all predicates on which the definition of p depends valid inequalities have already been derived. Note that a trivial inequality with offset ∞ always holds.
Let in(A) and out(A) denote the sets of input resp. output variables of an atom or a set of atoms according to the annotation of the given programs.
Algorithm 6.1: predicate_inequalities(P, LI)
Input:  A well-formed program P defining p1,...,pn.
Output: A set LI of valid inequalities for the predicates of P.
begin
    LI := ∅
    For i := 1 to n do:
    begin
        Let C1,...,Cm be the clauses defining pi,
        Let M, N be the input resp. output arguments of pi,
        li := Σ_{μ∈M} pμ + γ ≥ Σ_{ν∈N} pν.
        bi := true.
        For j := 1 to m do:
        begin
            Let Cj be B0 ← B1,...,Bk.
            goal_inequality((← B1,...,Bk), LI ∪ {li}, Vin(B0), Vout(B0), Δi, bj)
            c := Δi + Fout(B0,li) − Fin(B0,li).
            Wi := bi
                  If c contains '∞'                   then Wi := Wi ∧ false
            (*)   elseif c is an integer              then Wi := Wi ∧ (γ ≥ c)
            (**)  elseif c = γ + d ∧ d ≤ 0            then Wi := Wi ∧ true
                  elseif c = γ + d ∧ d > 0            then Wi := Wi ∧ false
            (***) elseif c = k·γ + n ∧ k > 1          then Wi := Wi ∧ (γ ≤ n/(1−k)).
        end
        If Wi is satisfiable then let δi be the smallest value for γ which satisfies Wi
        else let δi be '∞'.
        Replace γ in li by δi.
        LI := LI ∪ {li}
    end
end
Theorem 6.2: The inequalities derived by the algorithm are valid.
Proof: By induction on the number of predicates n in P. The case n = 0 is immediate. For the inductive case, assume that the derived inequalities for the predicates p1,...,pn-1 are valid. Let I0 be the minimal S-model of P restricted to the predicates p1,...,pn-1. In the context of the program which consists of the definition of pn only, let T^0 = I0 and T^{m+1} = Ts(T^m). Its limit equals the minimal S-model of P restricted to the predicates p1,...,pn. Now we have to show that the inequality li derived for pn is valid w.r.t. T^m. The proof is now by induction on m. The case m = 0 is implied by the induction assumption on n. Assume that the theorem holds for m−1. We have to show that the inequality for pn holds for the elements of T^m. Now let B ∈ T^m and B0 ← B1,...,Bk be the clause applied to derive B. We have B = B0θ, where θ is a computed answer substitution for ← B1,...,Bk, which is a well-formed goal. Let V = in(B0) and W = out(B0). Let LI be the set of inequalities derived by Algorithm 6.1, and Δ be the result of calling goal_inequality((← B1,...,Bk), LI, V, W, Δ, bi). Theorem 5.6 and the induction assumption imply
    (†)   Σ_{v∈V} |vθ| + Δ ≥ Σ_{w∈W} |wθ|.
Since B = B0θ, we have Fin(B,li) = Fin(B0,li) + Σ_{v∈V} |vθ| and Fout(B,li) = Fout(B0,li) + Σ_{w∈W} |wθ|. Let α be the offset of li. We have to show
    (††)  Fin(B,li) + α ≥ Fout(B,li).
If bi is false or Δ is ∞, we are done since in that case α is ∞. Three more cases remain. (*) and (**) immediately imply
    (†††) α ≥ Δ + Fout(B0,li) − Fin(B0,li).
(***) implies α ≤ n/(1−k) and thus α ≥ n + k·α for some n such that n + k·α = Δ + Fout(B0,li) − Fin(B0,li). Again (†††) follows. (†) and (†††) together now imply (††). •
Note that Algorithm 6.1 again has run-time complexity
O(n), where n is the length of the given program P.
Algorithm 6.1 is not yet able to derive p1 ≥ p2 for a unit clause like p(X,Y) with mode(p(+,−)). This inequality, however, holds since in a well-formed goal the output argument of p will always be unbound. To overcome this difficulty, we assume that before calling predicate_inequalities(P,LI), P will be transformed to P' in the following way:
Define freevars(B0 ← B1,...,Bn) =
    (out(B0) \ out(B1,...,Bn)) ∪ (in(B1,...,Bn) \ in(B0)).
Now for the clause c = B0 ← B1,...,Bn in P let freevars(c) = {Y1,...,Ym}. Replace c by B0 ← q(Y1,...,Ym),B1,...,Bn where a new predicate q is defined by the unit clause q(X1,...,Xm) with mode(q(+,...,+)). Note that, after that transformation, P' is well-formed if P is well-formed, and if an inequality is valid for P' it is valid for P as well. In the example mentioned above, input for Algorithm 6.1 will be the program P = {q(X)., p(X,Y) ← q(Y)} and the output will be {0 ≥ q1, p1 ≥ p2}.
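For illustration only (this form does not appear in the paper), the transformation can be written down directly over the clause representation assumed in the earlier sketches; the helper names below are hypothetical.

def term_vars(t):
    # variables are strings, compound terms are (functor, [subterms]) tuples
    if isinstance(t, str):
        return {t}
    vs = set()
    for s in t[1]:
        vs |= term_vars(s)
    return vs

def atom_vars(atom, modes, polarity):
    # variables occurring at input ('+') or output ('-') positions of an atom
    pred, args = atom
    vs = set()
    for pos, arg in enumerate(args):
        if modes[pred][pos] == polarity:
            vs |= term_vars(arg)
    return vs

def add_freevars_goal(clause, modes, new_pred):
    """Transform B0 <- B1,...,Bn into B0 <- q(Y1,...,Ym),B1,...,Bn where
    {Y1,...,Ym} = freevars(clause) and q is a fresh all-input predicate.
    Sketch only; the unit clause q(X1,...,Xm) must be added to the program separately."""
    head, body = clause
    out_head = atom_vars(head, modes, '-')
    in_head = atom_vars(head, modes, '+')
    out_body, in_body = set(), set()
    for b in body:
        out_body |= atom_vars(b, modes, '-')
        in_body |= atom_vars(b, modes, '+')
    ys = sorted((out_head - out_body) | (in_body - in_head))
    return (head, ((new_pred, tuple(ys)),) + tuple(body))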

Another improvement can be made by considering subsets of
the input arguments in order to achieve stronger inequalities.
This, however, makes the algorithm less efficient.



7. Example
We finally discuss how, with the techniques given so far, it can be shown that the GHC program for quicksort specified in Section 3 terminates for arbitrary goals.
Corollary 4.3 and Theorem 4.5 imply that it suffices to consider data-driven LD-derivations of the extended program for qsort including the clauses s0, a0 and q0. According to Theorem 5.1 we only have to show that the three predicates of the program are safe. This is easy to show for split and append; in fact these procedures are structurally recursive. It is more difficult to prove for qsort because in q2 both recursive calls contain the local variables A and B. For this reason we need a linear predicate inequality for split which has the form split1 + γ ≥ split3 + split4. After the transformation mentioned at the end of the last paragraph, s0 will have the following form:
    s0: split(L1,L2,L3,L4) ← q(L3,L4)
Now s0 and s1 give γ ≥ 0 (case * in Algorithm 6.1), while s2 and s3 give 'true' (case **). Thus we get split1 + 0 ≥ split3 + split4. In order to prove safety of qsort, we only have to consider q2. Using this inequality, Algorithm 5.5 immediately shows ||qsort([H|L],S)θ|| > ||qsort(A,A1)θ|| and ||qsort([H|L],S)θ|| > ||qsort(B,B1)θ|| for all answer substitutions θ for split(L,H,A,B). Thus qsort is safe.

Acknowledgment
Part of this work was performed while I was visiting CWI. K. R. Apt stimulated my interest in concurrent logic programming.

References
[APP90]  Apt, K. R., Pedreschi, D., Studies in Pure Prolog: Termination, Technical Report CS-R9048, Centre for Mathematics and Computer Science, Amsterdam, 1990.
[APT90]  Apt, K. R., Introduction to Logic Programming, in Leeuwen (ed.), Handbook of Theoretical Computer Science, North-Holland, 1990.
[BCF90]  Bossi, A., Cocco, N., Fabris, M., Proving Termination of Logic Programs by Exploiting Term Properties, Technical Report, Dip. di Matematica Pura e Applicata, Universita di Padova, 1990.
[FAL88]  Falaschi, M., Levi, G., Finite Failures and Partial Computations in Concurrent Logic Languages, Proc. of the Int. Conf. on Fifth Generation Computer Systems, ICOT, 1988.
[FLP89]  Falaschi, M., Levi, G., Palamidessi, C., Martelli, M., Declarative Modeling of the Operational Behavior of Logic Languages, Theoretical Computer Science 69, 1989.
[GRE87]  Gregory, S., Parallel Logic Programming in PARLOG, Addison-Wesley, 1987.
[HAP85]  Harel, D., Pnueli, A., On the Development of Reactive Systems, in Apt, K. R. (ed.), Logics and Models of Concurrent Systems, Springer, 1985.
[LLO87]  Lloyd, J., Foundations of Logic Programming, Springer-Verlag, Berlin, second edition, 1987.
[PLU90a] Plümer, L., Termination Proofs for Logic Programs Based on Predicate Inequalities, in Warren, D.H.D., Szeredi, P. (eds.), Proceedings of the Seventh International Conference on Logic Programming, MIT Press, 1990.
[PLU90b] Plümer, L., Termination Proofs for Logic Programs, Springer Lecture Notes in Artificial Intelligence 446, Berlin, 1990.
[PLU91]  Plümer, L., Termination Proofs for Prolog Programs Operating on Nonground Terms, 1991 International Logic Programming Symposium, San Diego, California, 1991.
[PLU92]  Plümer, L., Automatic Verification of GHC-Programs: Termination, Technical Report, Universität Bonn, 1992.
[SHA87]  Shapiro, E., Concurrent Prolog, Collected Papers, MIT Press, 1987.
[SHA87a] Shapiro, E., Systolic Programming: A Paradigm of Parallel Processing, in [SHA87].
[TIC91]  Tick, E., Parallel Logic Programming, MIT Press, 1991.
[UED86]  Ueda, K., Guarded Horn Clauses, in [SHA87].
[UED88]  Ueda, K., Guarded Horn Clauses: A Parallel Logic Programming Language with the Concept of a Guard, in Nivat, M., Fuchi, K. (eds.), Programming of Future Generation Computers, North-Holland, 1988.
[ULG88]  Ullman, J. D., Van Gelder, A., Efficient Tests for Top-Down Termination of Logical Rules, Journal of the ACM 35, 2, 1988.


Analogical Generalization
Takenao OHKAWA†   Toshiaki MORI†   Noboru BABAGUCHI†   Yoshikazu TEZUKA†

† Education Center for Information Processing, Osaka University
† Dept. of Communication Eng., Faculty of Eng., Osaka University
2-1, Yamadaoka, Suita, Osaka, 565 Japan

e-mail: ohkawa@oucom5.oucom.osaka-u.ac.jp

Abstract
Approaches to learning by examples have focused on generating general knowledge from a lot of examples. In this paper we describe a new learning method, called analogical generalization, which is capable of generating a new rule which specifies a given target concept from a single example and existing rules. Firstly, we formulate analogical generalization based on the similarity between a given example and existing rules from the logical viewpoint. Secondly, we give a new procedure of inductive learning with analogical generalization, called ANGEL. The procedure consists of the following five steps: (1) extending a given example, (2) extracting atoms from the example and selecting a base rule out of the set of existing rules, (3) generalizing the extracted atoms by means of the selected rule as a guide, (4) replacing predicates, and (5) generating a rule. Through an experiment with a system for parsing English sentences, we have clarified that ANGEL is useful for acquiring rules in knowledge-based systems.

1  Introduction

Machine learning has a great contribution to improving performance through automated knowledge acquisition and refinement, and so far, various types of machine learning paradigms have been considered. In particular, learning from examples, which can form general knowledge from specific cases given as input examples, has been well studied and a lot of concerned methods have been proposed [Mitchell 1977, Dietterich and Michalski 1983, Ohkawa et al. 1991].
Generally, in learning from examples, we have to give a lot of examples to the learner. Why are so many examples required? We think the reason for this is that the bias for restricting the generalization is relatively weak, because it is independent of the domain. However, when a human being acquires new knowledge, he would not always require a lot of examples. As the case may be, he can learn from one example. We think this is because he decides a strong bias for the generalization according to the domain, and generalizes the examples based on the bias. That is, in order to generalize a few examples appropriately, a strong bias which depends on the domain is indispensable.
It is necessary to consider how the strong bias should be provided. Let us recall the behavior of a human being again. When acquiring new knowledge, he often utilizes similar knowledge which is already known. In other words, the existence of similar knowledge may help him to associate new knowledge. This process is called analogy. Analogy is considered promising to realize learning from a few examples. Since analogy will be regarded as one of the most effective ways for restriction on generalization, modeling its process will make it possible to provide a domain dependent bias.
In this paper, we propose a new learning method, called ANGEL (ANalogical GEneraLization), which is capable of generating a new rule from a single example. In ANGEL, both the rules and the examples are represented as logical formulas. We introduce the notion of analogy [Winston 1980], namely, the similarity between the example and the existing rules as the bias for the generalization [Mori et al. 1991]. The similarity is determined by comparing the atoms of both the example and the existing rules. Based on the similarity, firstly, ANGEL extracts atoms from the example and selects a rule out of the existing rules; next, it generates a new rule by generalizing the extracted atoms by means of the selected rule as a guide.
The next section describes the definition of analogical generalization. In this section we consider analogical generalization from the logical viewpoint. Section 3 gives the procedure of ANGEL which is a method for learning based on analogical generalization. In this section, we also give consideration to the experimental result of learning by ANGEL. Finally in Section 4, we clarify the originality of ANGEL through its comparison to other related works.

2  Analogical generalization

To represent knowledge, we use the form which conforms to first order predicate logic. Two kinds of forms, called a fact and a rule, are provided. A fact is represented as an atom, while a rule is represented as a Horn clause, which is expressed in the form of

    α ← β1, ..., βn

where α, β1, ..., βn are atoms. Letting r be a rule α ← β1,...,βn, we denote the consequence of rule r, namely α, by cons(r), and denote the premise of rule r, namely β1,...,βn, by prem(r).
The underlying notion of analogical generalization is that a new rule is generated by generalizing an input example, which consists of facts, based on the similarity between the example and the existing rules. Before formulating analogical generalization, we define the similarity between two atoms, and next formalize the similarity between two finite sets of atoms.

2.1  Similarity between two atoms

First, we define some basic notations. A substitution is a finite set of pairs v/t, where v is a variable, t is a term, and the variables are distinct. Let θ = {v1/t1, ..., vn/tn} be a substitution and e be an expression, which is either a literal or a conjunction or disjunction of literals. Then eθ is the expression obtained from e by replacing each occurrence of the variable vi in e by the term ti. If S is a finite set of expressions and θ is a substitution, Sθ denotes the set {eθ | e ∈ S}.
Let θ be a substitution and S be a finite set of atoms. If Sθ is a singleton, S is unifiable by θ and we write unifiable(S).
Now, we give the following two functions, and define the similarity between atoms by means of these functions. Let R be a set of existing rules, and Q and Q' be atoms.
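These notions can be illustrated with a small Python sketch, which is not part of the paper; the term representation is an assumption. A substitution is a dictionary from variables to terms, and a finite set S of atoms is unifiable by θ when Sθ collapses to a singleton.

def apply_subst(theta, e):
    """Apply a substitution theta (dict: variable name -> term) to an expression.
    Strings are variables or constants; strings not in theta are left unchanged.
    Compound terms and atoms are (functor, args) tuples."""
    if isinstance(e, str):
        return theta.get(e, e)
    functor, args = e
    return (functor, tuple(apply_subst(theta, a) for a in args))

def unifiable_by(theta, S):
    # a finite set S of atoms is unifiable by theta iff S.theta is a singleton
    return len({apply_subst(theta, a) for a in S}) == 1

For instance, unifiable_by({'x': 'Jim', 'y': 'Betty'}, {('mother', ('x', 'y')), ('mother', ('Jim', 'Betty'))}) yields True.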

Let us consider the similarity of father(x, y) to mother(Jim, Betty) and brother(Tom, Joe). For each atom, the corresponding R-deducible sets are derived.
R-similar sets of father(x,y) for mother(Jim,Betty) and brother(Tom,Joe) are as follows:
    Ψ(R1, father(x, y), mother(Jim, Betty)) = {parent(x,y), family(x,y)}
    Ψ(R1, father(x, y), brother(Tom, Joe)) = {family(x, y)}
Accordingly, father(x, y) is more similar to mother(Jim, Betty) than to brother(Tom, Joe) with respect to R1. This result matches our intuition very well.

2.2

Definition 1 ( R-deducible set )

~(R, Q) ~ {fi I R

- [II(1") : II(1')O].

[A: B] !):. [A' : C].
Now, we assume C I is the

foll~wing set

of atoms.

C1 = {brother(Tom,Joe), strikes(Joe,Mark)}
A maximally preceding correspondence of Al to C I with
respect to Rl is shown as
{(fathe:!"(x, y), brother (Tom, Joe»,
(kills(y,z),strikes(Joe,Mark»},

and therefore,

3. For an arbitrary rule 1'" (E R) and an arbitrary set
of atoms A(~ E"), the following relation does not
hold.
[A: II(1'")]

R

>- [II(r)O : II{1")].

• Significance condition
For a r1..lle 1" which satisfies similarity condition 2, letting

P(x)). Preparations In this paper, we use standard formal logic and notations, while defining the following. An n-ary predicate U is generally expressed by AXQ, where x is a tuple of n object variables, Q is a formula in which no object variables except variables in x occur free. If t is a tuple of n terms, U(t) stands for the result of replacing each occurrence of (elements of) x in Q with (each corresponding element of) t simultaneously. For any formulas A and F, when A f- F and If F (that is, F is not valid), we say F is a genuine theorem of A and express it simply as A f-F. We will use a closed formula of first order logic A for a theory, (generally n) terms T for a tar-yet and (generally n) terms B for a base. A property is expressed by a predicate, for instance, a similarity and a p1'Ojected pr~perty are expressed by predicates, Sand P respectively. 2.2 Thus, the essential information newly obtained by analogy is F( x) in the above rather than the explicit projected property P. Making J (x) staud for the (,Ollj unction of the example-based information and F ( x). the above meta-sentence is transformed equivalently to Approach To A Seed of Analogy We can understand analogical reasoning as follows: (1) Example-based Information: "An object, x' (corresponding to a base), satisfies both properties Sand P (3x'.(S(x') 1\ P(x')))." (2) Similarity-based Information: "Another object, x (corresponding to a target), satisfies a shared property S with x' (S(x))." (3) Analogical Conclusion: "The object ,r would satisfy the other property P (P(x))." Then, .• Analogical reasoning is to reason (3) from A together with (1)+(2)." (A) Let this understanding be our starting point of analySIS. As analogy is not, generally, deductive, this starting point may, unfortunately, be expressed only as follows. In the notation of proof theory, (3) because A is closed. This implies that a rule must be a theorem of A and that the rule concludes any object which satisfies J(x) to satisfy P when it satisfies S. Once J ~s satisfied, (by reason of (S(x) :> 'P(x)),) the analogical conclusion ("an object satisfies P") can be deduced from the similarity-based information ("the object satisfies S). For this reason, this rule will be called the analogy prime rule (it will be specified in more detail later), J will be called the analogy justification. Moreover, it is improbable that the analogy prime rule is a valid formula, because, if so, any pair of predicates· can be an analogical pair of a similarity and a projected property independently of A. Thus, the analogical prime rule must be a genuine theorem of A, A ~Vx.(J(x) 1\ S(x) :::> P(x)). (4) Consequently, an object T which satisfies S is concluded to satisfy P from an analogy prime rule by analogical reasoning that assumes that T satisfies the analogy justification (J(T)). That is, our starting point (A) can be specified from two aspects. "An analogical conclusion can be obtained from an analogy prime rule together with examplebased information and similarity-based information." (B) "A non-deductive jump by analogy, if it occurs, is to assume that the analogy justification of the prime rule is satisfied." (C) In the following part of this paper, the analogy justification and non-deductivity will be further explored . Before beginning an abstract discussion, it may be useful to see concrete examples of analogical reasoning. The next section introduces ·'target" examples of analogical reasoning to be clarified here. 
2.3 Examples As analogy, however, infers P(x) from the premises, it implies that some knowledge is assumed in the premise part of (1). Let the assumed knowledge be F(x), providing that it depends on the x in general. That is, Examplel: Determination Rule[3]. "Bob's car (CBob ) and Sue's car (C Sue ) share the property of being 1982 Mustangs (Mustang). We infer that Bob's car is worth about $3500 just because Sue's car is worth about $3500. (We could not, however, infer that Bob's car is painted red just because Sue;s car is painted red.)" Example-based Information: A,3x'.(S(x') 1\ P(x')),S(x),F(x) f- P(x). Model(Cs ue , Mustang) 1\ Value(Cs ue ,$3500), A,3x'.(S(x') 1\ P(x'»,S(x) If P(x). (1) (2) (5) 507 Similarity-based Information: M odel( CBob, Mustang), (6) Example2: Brutus and Tacitus [1]. ~~ Brutus feels pain when he is cut or burnt. Also, Tacitus feels pain when he is cut. Therefore, if Tacitus is burnt. he will feel pain." Example-based Information: (Suffer(Brutus, Cut) =:l FeeIPain(Brutus)) I\(Suffer(Brutus,Burn) =:l FeeIPain(Brutus)) (7) (8) Similarity-based Information: Suffer(Tacitus, Cut) =:l FeeIPain(Tacitus) (9) Example3: Negligent Student l . "When I discovered that one of the newcomers (5tudentT) to our laboratory was a member of an orchestra club (Orch), remembering that another student (5tudentB) was a member of the same club and he was often negligent of study (Study), I guessed that the newcomer would be negligent of study, too." Exa.mple-based Information: Member ...of(StudentB, Orch) I\N egligenLof(StudentB, Study) (10) Similarity-based Information: Member_of(StudentT,Orch) 2.4 (11) Logical Analysis: a rule as a seed of analogy In treating analogy in a formal system, as the information of a base object being Sand P is projected into a target object, it is desirable to treat such properties as objects so that we can avoid the use of second order langua.ge. As an example, the fact that Bob's car is a Mustang is represented by "Model(C Bob , Mustang)" rather than simply as "Mustang(CBob )". In the remaining part, we rewrite S(x) to ~(x, S) and P(x) to I1(x, P). ~ will be called a similar attribute, II will be a projected attribute,S as an object will be a similar attribute value, and P as an object will be a projected attribute value. Then, (4) is rewritten .A ~'v'x,s,p.(J(x,s,p) 1\ I:(x,s) =:l II(x,p)), (12) considering the most general case that the analogy justification J depends on all of these factors. Again, when 3-tuple < object: X, similar attribute value: 5, projected attribute value: P > satisfies the analogy justification J, object X is conjectured to satisfy the projected property AX .I1( x, P) (analogical conclusion) just because X has the similarity Ax.~(x, 5). lThe author thanks Satoshi Sato (Hokuriku Univ.) for showing this challenging example. That is, J (x, s, p) can be considered a condition. where x could be concluded to be p from x being s by analogical reasoning. Now, recalling that an analogical conclusion is obtained from the analogy prime rule with example-based information and similarity-based information, consider what information can be added by the information in relation to the analogy prime rule. 1) Example-based Information: This shows that there exists an object as a base which satisfies a similarity and a projected property ( :l.T'.(~(;r'. S) 1\ I1(x'. P)) ). It seems to be adequate that the base. B. satisfying ~(x', S) can also be derived to satisfy I1(:r'. P) from the prime rule. 
because B can be considered a target which has similarity S. That is. 3-tuple < B, S, P > satisfies the analogy justification. Consequently, from arbitrariness in selection of an object as a base in this information, what is obtained from this information is :lx'. J(x', S, P). 2) Similarity-based Information: This shows that an object as a target, T, satisfies the same property S in the above. Just by this fact, an analogical conclusion is obtained, by assuming that the object satisfies J by some conjecture. That is, there exists some attribute value p' and 3-tuple < T, S. p' > satisfies J (:lp'. J(T,5,p')). 3) Analogical Conclusion: With the above two pieces of information, an analogical conclusion. "T satisfies I1(x, P)", is obtained from the analogy prime rule. Therefore. such 3-tuple < T. S, P > satisfies J ( J(T, S, P) ). In the above discussion, T, 5, and P are arbitrary. Therefore. the following relation about the analogy justification turns out to be true: Vx.s,p.( :Jx'.J(x',s,p) 1\ :Jp'.J(x,s,p') =:l J(x,s,p) ). (13) (13) is able to represent it equivalently as follows: J(x,s,p) = Jatt(s,p) 1\ Jobj(X,S), (14) where both J att and J obj are predicates, that is, each of them has no free variables other than its arguments . The point shown by this result is that any analogy justification can be represented by a conjunction in which variable .T and variable p occur separately in different conjuncts. By (12) and (14), the analogical prime rule can be defined as follows. Definition 1 Analogy Prime Rule A l'ule is called an analogy prime l'ule w.r.t. < E(x, s); I1(x,p) >, if it has the following form: 508 VX,s,p.(Jatt(s,p) 1\ Jobj(;r,S) 1\ L;(x,s):) II(.1:,p)), (15) where J att , J obj . 2: and II are predicates. (That is. each of Jatt(s,p), Jobj(J',s), ~(J:,s) and II(x,p) is a forrn.ula in which no variablt other than its arguments occm's free.) o In (15), Jatt(s,p) will be called the attribute justification and J obj ( X , s) will he called the object justification. Also, by the above discussion, the following two conjectures can be considered as causes which make analogy non-dedu~tive . • Example-based Conjecture (EC): An object shows a existing concrete combination of a similarity and a projected property. This specializes the prime rule and allows it to be applicable to a similar object. Assuming some generally non-deductive inference system under A, "~A" (we will propose such a system later), 3x.(L;(x,S) 1\ II(x,P)) f'vA Jatt(S,P). (16) • Similarity-based Conjecture (SC): Just" because an ob jed satisfies S', application of the specialized prime rule to the object is allowed. L;(x, S) r-- A Jobj(X, S). (17) In case that the attribution justification (Jatt (s, p)) is a valid formula, example-based information becomes unnecessary in yielding analogical conclusion. Thus, it could, in general, be essential in analogical reasoning to guess Jatt(s,p) which is not a valid formula. The objectjustification (Jobj (x,.5)) is, still, important in another sense, because it can be considered to express a really significant similarity. It is not an unusual case when a really significant similarity is not observable. Consider a case of Example 2. Having a nervous system will be a sufficient condition for an object to feel pain. thus, whether an object has a nervous system is a significant factor in making a conjecture on feeling pain. In this case, however, we could, without dissection. 
not obtain a direct evidence which shows that Tacitus and Btutus have nervous systems, while we obtain only a circurnstantial evidence that the both feel pain when they are cut. Thus, the similarity-based conjecture is to guess such a really significant but implicit similarity, the object justification (Jobj ( x, s)), from an observed similarity ~(x, s). To summarize, a logical analysis of analogy could draw conclusions as follows. Analogical reasoning is possible only if a certain analogical prime rule is a genuine theorem of a given theory and the process of analogical reasoning can be divided into the following 3 steps: 1) the attribute justification part of the rule is satisfied by EC from example- based information. 2) the object justification part of the rule is satisfied by SC from similarity-based information, and, 3) from similarity-based information and the analogy prime rule specialized by the two preceding steps, an analogical conclusion is obtained by deduction. A question remains unclear, that is, what inference is EC and what SC? Though we cannot identify the mechanism underlying each of the conjectures, we can propose a (generally) non-deductive inference system as their candidates. The next section shows this. 3 Non-deductive Inference for Analogy This section explores a type of generally non-deductive inference by which a conjecture G is obtained from a given theory A with additional information K. Generally speaking, what properties should be satisfied by a, generally, non-deductive inference? It might be desirable that a non-deductive inference satisfies at least the following conditions. First, it should subsume deduction, that is, any deductive theorem is one of its theorems, because any deductive conclusion would be desirable. Secondly, any conclusion obtained by it must be able to be used deductively, that is, from such a conclusion, it should be possible to yield more conclusions using, at least, deduction. And, thirdly, any conclusion obtained must be consistent with given information. We define a class of inference systems which satisfy the above three conditions. An inference system under a theory A (written ~A) is deductively expansible if the following conditions are satisfied. For any set of sentences A and f{ and any sentences G and H, Definition 2 i) Subsuming deduction: if A, f{ f-- G then K ~A G. ii) Deductive usefulness: if iii) f{ ~A G and A,K,G f-- H. then K ~A H. Consi.5tency: if K ~A G and AUK AuK U {G} is consistent. Z.5 consistent, then The following inference system is an example of a deductively expansible system. 509 Definition 3 G is a conjecture from A based on (atomic) circumstantial reasoning (written iff i) A,K r G, J{ ~~ J{ by G) 2. V:l' .( Job j ( X , 5') or r G if there exists a minimal set of atomic formulas 3 E s.t. A, E r K. and Au E i s consistent if AUK is consistent4 . ii) A,E Proposition 1 If K ~A G and K, G r.-~ H, then I{ Corollary 1 If K r.-: G. r.- A H. IT ( x , P )). ( 19) Even if similarity-based information 2:,(T, S) is introduced. to obtain analogical conclusion II(T, P) by circumstantial reasoning, some information apart from the prime rule turns out to be needed in A. And, both EC and SC are generally needed to accomplish analogical reasoning. which implies that multiple application of circumstantial reasoning is necessary. Even in such a case, circumstantial reasoning remains worthwhile (Proposition 1). then K ~A G. 
Corollary 1 shows that circumstantial reasoning is deductively expansible, and proposition 1 (together with the corollary) shows that inference done by multiple applications of circumstantial reasoning is also deductively expansible. Circumstantial reasoning (K ~: G) implies a very general and useful inference class in that so many types of inference used in AI can be considered as circumstantial reasoning. Deduction and abduction, for example, are obviously circumstantial reasoning. Moreover, if we loosen the condition "atomic formulas" to "clauses", inductive learning from examples is the case where A is empty in general, K is "examples" and G is inductive knowledge obtained by "learning,,5 6 Now, we assume that both EC and SC are circumstantial reasoning, but based on different information. Then, we can see analogical reasoning in more detaiL Let an analogy prime rule w.r.t. < ~(x,s);II(x,p) > be a theorem of A. Then, when example-based information, ~(B, S) /\ II(B, P), is introduced, by circumstantial reasoning from the prime rule, some justifications are satisfied, that is, 'L-(B,S) 1\ II(B,P) 1\ ~ ( x , 5') :) r-: Jatt(S,P) 1\ Jobj(B,S), 4 Classification of Analogy and Examples Each EC and SC has two cases; a deductive one and a non-deductive one. According to this measure, analogical inference can be divided into 4 types. A typical example is shown in each class and explored. 4.1 deductive EC + deductive SC Typical reasoning of this type was proposed by T .Davies and S.Russell [3J. They insisted that, to justify an analogical conclusion and to use information of the base case. a type of rule, called a dete1'mination rule, should be a theorem of a given theory. The rule can be written as follows: Vs.p.( 3x'.('L-(x',s) 1\ II(x',p)) :) Vx.('L-(x,s) :) IT(.1',p)) ) (20) Example 1 (continued). In this example, the following determination rule is assumed to hold under A. Vs,p.( 3x'.(Model(x',.s) 1\ ~raiue(;r',p)) :) Vx.(Model(x,s) :) l/alue(;r.p)) ) (18) which concludes a specialized prime rule, This rule is an analogy prime rule. because 2Circumstantial reasoning is essentially equivalent to "abduction" + deduction [13, 15]. However, "abduction" has many definitions and various usages in different contexts, so we like to introduce a new term for the type of inference in Definition 3 to avoid confusion. 3 Atoms, that is, formulas which contain only one predicate symbol. 4If there exists such a minimal set of atomic formulas E, the case ii) involves the case i) apparently. Thus, the case i) can often be neglected in a usual application, for instance, if J{ is a universal formula which has the form I:tx.F(x), where F is quantifier-free. Note that a clause is universal. 5In this case, G = E in Definition 3, which implies that G is a minimal set to explain "example" J{. Indeed, such minimality is very common in this field. . 6Such a unified aspect of various reasoning in AI was pointed out by Koich Furukawa (lCOT) in a private discussion and a similar and more intuitive view can be seen in [5]. Jobj(X,s) = 2:,(;1'.8) = Model(x.8), Jatt(s,p) = (:lx. Model(x,s) /\ Falue(x.p)), II(x,p) = Falue(x.p). (21) Moreover, EC: Model(Cs ue , Mustang) 1\ Faiue(Cs ue , $3500) f- J att (Mustang,$3500), (22) SC: This illustrates that reasoning based on determination rules belongs to the "deductive EC + deductive SC" type and that it can also be done by circumstantial reasoning. 
510 4.2 deductive EC SC + non-ded uctive 4.3 non~deductive EC + deductive SC This type of analogical reasoning was explored by the author [1]. It was concluded that, once we assumed the following two premises for analogical reasoning, it seemed to be an inevitablt conclusion that analogical reasoning which infers P(T) from S'(T), S(B), and P(B) satisfies the illustrative criter'ion. And if an inference system satisfies the criterion, the system is called an illustrative analogy. Premise 1: "Analogy is done by projecting properties (satisfied by a base) from the base onto a target." Premise 2: "The target is not a special object." Premise :2 is also assumed in this paper, it is translated into an arbitrary selection of a target object. Premise 1 was translated as follows: J(B), (where J is the justification in (4) and B stands for a base object) must be a theorem of A, because it is essential in analQgical reasoning to project J(B) onto a target object T. That is, the non-deductive part in this reasoning is just SC which conjectures the property of the target object, and EC must be deductive. Example 2 (continued). By illustrative analogy, a target is conjectured to satisfy properties used in an explanation of why a base satisfies a similarity. In this example, to explain the phenomena of the base case, "Brutus feels pain when he is cut or burnt", the following sentences must be in A. 'ix,i.( Nervous_Sys(x) 1\ Destructive(i) 1\ Suffer(x,-i} :::> FeelPain(x) ), (24) I\N ervousSys(Brutus) (25) I\Destructive(Cut) 1\ Destructive(Burn) (26) As far as the author knows, this type of analogy has never been discussed. Example:3 seems to show this type of analogy. Example 3 (continued). First, let us consider what we know from example-based information in this case. From the fact that a student (StudentB) was a member of the same club (Orch) and often neglected study (Study), we could find that "the orchestra club keeps its members very busy (BusyClub(Orch))" and that "activities of the club are obstructive to one's study (Obstructive_to( Orch, Study))". This implies that we knew some causal rule like "If it is a busy club and its activities are obstructive to something, then any member. of the club neglects the thing." 'ix,.s,p.( BusyClub(s) 1\ Obstructive_to(p,s) I\M ember_of(x, s) :::> NegligenLof(x,p) ) Using this rule, we found the above information. Thus. the above rule is assumed to be a theorem of A. BusyC lube Orch) and Obstructive_toe Orch, Study) are non-deductive conjectures and it can be obtained by circumstantial reasoning based on the above rule which is just an analogy prime rule, as follows: Jobj(x,s) = E(x,s) = Member_of(x,s), J att ( s, p) = BusyC lube s) 1\ Obstructive_to(p, s), IT(x,p) = Negligent-of(x,p). 4.4 From (24), the following follows: 'ix,s,p.( Nervous_Sys(x) (28) non-dedtlCtive deductive SC EC + nOD- I\Destructive( s) 1\ Destructive(p) I\(Suffer(x,s) :::> FeelPain(x)) :::> (Suffer(x,p) :::> FeelPain(x)) ), (27) which is an analogy prime rule, that is, Jobj(x,s) = Nervous_Sys(x), J att ( s, p) = Destructive( s) 1\ Destructil'e(p), E(x,s) = Suffer(x,s)::) FeeIPain(x), IT(x,p) = Suffer(x,p) ::) FeelPain(x). J att ( Cut, Burn) ("Both cut and burn are destructive") is a deductive theorem of A and a non-deductive conjecture, Jobj(Tacitus, Cut) ("Tacitus has a nervous system"), is obtained by circumstantial reasoning from (24) based on the similarity-based information, Suf fer(Tacitus, Cut) ::) FeelPain(Tacitus). As an example of this type, we can take Example 2 again. 
We might know neither "Brutus has a nervous system" nor "Both cut and burn are destructive", which corresponds to the case that (25) and (26) are not in A (nor any deductive theorem of A) in the previous Example 2. However, by circumstantial reasoning from (24) based on example- based information (" Bru t us feels pain when he is cut or burnt"), "Both cut and burn are destructive" (and "Brutus has a nervous system") can be obtain~d, and based on similarity-based information ("Tacitus feels pain when he is cut"), "Tacitus has a nervous system", a really significant but implicit similarity, is obtained similarly to the previous exampie. Consequently, the analogical conclusion ("Tacitus would feel pain when he is burnt") is derived from (27) (or (24)) together with the above conjectures. 511 5 Conclusion and Remarks • Through a logical analysis of analogy, it is shown to be reasonable that analogical reasoning is possible only if a certain analogy primt rult is a deductive theorem of a given theory. From the rule. together with an example-based conjecture and a similarity-based conjecture, the analogical conclusion is derived. A candidate is shown for a non-deductive inference system which adequately yields both conjectures. • Results shown here are general and do not depend on particular pragmatic languages like the purpost predicate [10] nor on some numeric similarity measure [20]. These results can be applied to any normal deductive data bases (DDB) which consist of logical sentences. Application of this analogical reasoning to DDB may be one of the most fruitful. It is. generally speaking, very difficult to build a DDB which involves perfect knowledge about an item. Analogical reasoning will increase the chance of answering queries adequately, even when its deductive operation fails to answer. In a DDB, it is very common to see inheritance rules and transitivity( -like) rules, which have the form of the analogy prime rule, for instance, in the area of artificial intelligence. Analogical reasoning differs from other reasoning, abductivt and deductive, in that analogical reasoning actually uses example-based information (the base information). Consider the difference from. this time. abduction in the above database case. Even if the database uses (ordinal) abductive reasoning in the query, it cannot specify an adequate grandparent of Tom. the possible answer will be x s.t. Gran_pa(x, Tom), Parent(x, Sue), (:3z. )(Parent(x, z), Parent(z. Tom)), or Sue assuming Parent(Sue, Sue), etc [2. 14, 18. 9]. The reason for this failure is that abduction tries to explain only the target case. Moreover. comparing with enumerative induction and cast-based reasoning (eBR) in which the use of examples are essential similarly to analogical reasoning, analogical reasoning has a salient feature in more strongly depending on a background knowledge (a given theory). Analogy can be seen as a singh instance generalization as Davies and Russell pointed out [3]. Take an example, Example 3. From the analogy prime rule (28) and example-based information of an base case (StudentB), some nondeductive inference (ex. circumstantial reasoning) yields a more specified analogy prime rule, 'v'x.( Member_of(x,Orch) :::J NegligenLof(x,Study) ), (33) Gran-pa(x, y) : -Parent(x,z),Parent(z, y). (29) This is an analogy prime rule w .1'. t. < Parent(z, y); Gran_pa(x, y) > (z is a variable for the similar attribute value and x is a variable for the projected attribute value). 
Assume that a query "7 -Gran_pa(x, Tom)" is given to a database A which involves the above rule and the following facts: Parent(Sue, Tom). Gran_pa( John, Bob). (30) Parent( Sue, Bob). (32) (31) The database cannot answer the query q.eductively, because it does not know who is a parent of Sue. If the database uses the proposed type of analogical reasoning, it is able to guess Gran_pa( John, Tom) from Bob's case just because Tom is similar to Bob in that their parents is the same. Interestingly, a method which discovers an analogy prime rule from knowledge data-base CYC is explored independently [1 7]. Such methods make analogical reasoning more common in DDB. • By the side effect of this analysis. it becomes possible to compare analogy with other reasoning formally which have been studied vigorously which is a generalization of the example-based information, Member _of(StudentB, Orch) AN egligenLof(StudentB, Study). (34) We should note that, in the process of this single instance generalization, an analogy prime rule in a background knowledge is used as an intermediary, and it might be considered the reason why analogy seems more plausible than a simple single instance generalization such that it yields (33) just from (34). In the research offormal inductive inference [16, 12], a back ground knowledge does not play such an important role. So, plenty of examples are needed until a plausible conclusion is obtained. Concerning eBR [19], though it uses base cases like analogical reasoning and, in order to retrieve their base cases, it uses an index which corresponds to the similarity S, the index is assumed to be given in spite of using background knowledge. Intuitively speaking, these methods will be very useful when a background knowledge is rather poor or difficult to formulate. and when the background knowledge is extremely strong or able to be formulated perfectly. deduction will be most usefuL on the other 512 hand, the proposed type of analogy will be useful when rather strong and difficult to formulate. • An implementation system for this type of analogy has been developed. Given a theory A, a target T and a prqjected attribute II(x,p) (from a query, "? - II( T, p)"), this system finds a base B, a similarity E(x, S) and a projected property II(x, P) (ie. "II(T, P)" is the answer of the query) by the process with backtracking, according to the following steps: 1) Find a separate rule SepR s.t. A f- SepR, where SepR = II(x,p) :- Gatt(s,p),Gobj(X,S). 2) Take a similar attribute E(:r,s) s.t. E(x,s) rv~ Gobj(X,S). 3) Obtain the similar attribute value S by the side effect of a proof A f- :ls.E(T,s). 4) Retrieve a base B and obtain the projected attribute value P by the side effect of a proof A f- ::Jx,p.(E(x, S) 1\ II(x,p)). Here, a separate rule (w.r.t. II(x,p)) is a Horn clause in which the head is II(x,p), and any variable of x and any variable of p does not appear in the same conjunct in the body. This system guesses successfully for the examples shown here, though each of them is translated into a set of Horn clauses. Significant restrictions are needed on the time complexity of this process. Details of this system will be reported elsewhere. Acknow ledgment I especially wish to thank Satoshi Sato for his frank comments and challenging problems. 
I am also grateful to Koichi Furukawa, Hideyuki Nakashima, Natsuki Oka, and five anonymous referees for their constructive comments, to Makoto Haraguchi and the members of ANR-WG, which was supported by ICOT, for discussions on this topic, to Katsumi Inoue and Hitoshi Matsubara for discussions on abduction and CBR respectively, and to Kazuhiro Fuchi for giving me the opportunity to do this work. References [1] Arima, J.: A logical analysis of relevance in analogy, in Proc. of Workshop on Algorithmic Learning Theory (ALT'91) (1991). [2] Cox, P.T. and Pietrzykowski, T.: Causes for events: their computation and applications, in: Proc. of Eighth International Conference on Automated Deduction, Lecture Notes in Computer Science 230 (Springer-Verlag, Berlin, 1986) pp. 608-621. [3] Davies, T. and Russell, S.J.: A logical approach to reasoning by analogy, in IJCAI-87, pp. 264-270 (1987). [4] Evans, T.G.: A program for the solution of a class of geometric analogy intelligence test questions, in: M. Minsky (Ed.), Semantic Information Processing (MIT Press, Cambridge, MA, 1968). [5] Falkenhainer, B.: A unified approach to explanation and theory formation, in: J. Shrager and P. Langley (Eds.), Computational Models of Scientific Discovery and Theory Formation (Morgan Kaufmann, San Mateo, CA, 1990). [6] Gentner, D.: Structure-mapping: A Theoretical Framework for Analogy, Cognitive Science, Vol. 7, No. 2, pp. 155-170 (1983). [7] Greiner, R.: Learning by understanding analogy, Artificial Intelligence, Vol. 35, pp. 81-125 (1988). [8] Haraguchi, M. and Arikawa, S.: Reasoning by Analogy as a Partial Identity between Models, in Proc. of Analogical and Inductive Inference (AII '86), Lecture Notes in Computer Science 265 (Springer-Verlag, Berlin, 1987) pp. 61-87. [9] Inoue, K.: Linear Resolution for Consequence-Finding, Artificial Intelligence (to appear). [10] Kedar-Cabelli, S.: Purpose-directed analogy, in Proc. of the 7th Annual Conference of the Cognitive Science Society, Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 150-159 (1985). [11] Kling, R.E.: A paradigm for reasoning by analogy, Artificial Intelligence 2 (1971). [12] Muggleton, S. and Buntine, W.: Machine Invention of First-Order Predicates by Inverting Resolution, in Proc. of 5th International Conference on Machine Learning, pp. 339-352 (1988). [13] Peirce, C.S.: Elements of Logic, in: C. Hartshorne and P. Weiss (Eds.), Collected Papers of Charles Sanders Peirce, Volume 2 (Harvard University Press, Cambridge, MA, 1932). [14] Poole, D., Goebel, R. and Aleliunas, R.: Theorist: a logical reasoning system for defaults and diagnosis, in: N. Cercone and G. McCalla (Eds.), The Knowledge Frontier: Essays in the Representation of Knowledge (Springer-Verlag, New York, 1987) pp. 331-352. [15] Pople, H.E., Jr.: On the mechanization of abductive logic, in: Proceedings of IJCAI-73, Stanford, CA (1973) pp. 147-152. [16] Shapiro, E.Y.: Inductive Inference of Theories From Facts, TR 192, Yale Univ. Computer Science Dept. (1981). [17] Shen, W.: Discovering Regularities from Knowledge Bases, in Proc. of Knowledge Discovery in Databases Workshop 1991, pp. 95-107. [18] Stickel, M.E.: Rationale and methods for abductive reasoning in natural-language interpretation, in: R. Studer (Ed.), Natural Language and Logic, Proceedings of the International Scientific Symposium, Hamburg, Germany, Lecture Notes in Artificial Intelligence 459 (Springer-Verlag, Berlin, 1990) pp. 233-252. [19] Schank, R.C.: Dynamic Memory: A Theory of Reminding and Learning in Computers and People (Cambridge University Press, London, 1982). 
[20] Winston, P.H.: Learning Principles from Precedents and exercises, Artificial Intelligence, Vol. 19, No. :3 (1982). Appendix Proposition 1. If K r.- A G and K, G r.-:t H, then K r.- A Proof of Pr L For any formula G s.t. K r.- A G and K. G ~~ H. case-i) A, K. G f- H (from K, G r.-~ H ) From the premises. A. K. G f- L. L. (from Definition :3 i)) Therefore. K. G r.-: case- ii) otherwise. for some minimal set of atomic formulas E S.t. A. E f- K A G. A. E f- K A H. (from]{, G ~~ H) Therefore. A, E f- L. Thus. K. G 1. r.-: Thus K. G ~~ 1. iii) Consistency: if K r.-~ H AuK U and AUK is consistent, then {H} is consistent. (proof) Au J{ is consistent. =} Au J{ u {G} is consistent. (from J{ ~A G) =} Au E is consistent. (from J\'. G ~~ H) =} Au]{ u {E}. (because A. E f- J{ A H) Corollary 1. If 1{ r.-~ G. then K ~A G. Proof of Corollary 1. K ~~ K (from subsuming deduction) If K ~A K and K. K ~~ G, then J{ ~A G. (from Proposition 1) Therefore. If K ~~ G, then K ~A G. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 514 CONSISTENCY-BASED AND ABDUCTIVE DIAGNOSES AS GENERALISED STABLE MODELS Chris Preist, Kave Eshghi Hewlett Packard Laboratories, Filton Road, Bristol, BS12 6QZ, Great Britain cwp@hplb.hpl.hp.com ke@hplb.hpl.hp.com Abstract If realistic systems are to be successfully modelled and diagnosed using model-based techniques, a more expressive language than classical logic is required. In this paper, we present a definition of diagnosis which allows the use of a nonmonotonic construct, negation as failure, in the modelling language. This definition is based on the generalised stable model semantics of abduction. Furthermore, we argue that, if negation as failure is permitted in the modelling language, the distinction between abductive and consistency-based diagnosis is no longer clear. Our definition allows both forms of diagnosis to be expressed in a single framework. It also allows a single inference procedure to perform abductive or consistency-based diagnoses, as appropriate. 1 Introduction Many different definitions of diagnosis have been used in an attempt to formalise and automate the diagnosis process. In the so-called 'logical' approach, two frameworks, namely the consistency-based [Reiter 1987] and abductive [Cox and Pietrzykowski 1986], have attracted a lot of attention. Typically, the modelling language used in these frameworks is first order logic (or some subset of it). In this paper we present a unified framework for diagnosis which brings together these two styles of diagnosis, as well as providing a non-monotonic modelling language. We were primarily motivated by the need to incorporate negation asfailure, the non-monotonic construct in logic programming, into the modelling language. We first show the need for this construct through some examples, and then argue that the incorporation of negation as failure in the modelling language necessitates the inclusion of both consistency-based and abductive diagnosis within the same framework. We then present our unified framework, which allows negation as failure in the modelling language and naturally incorporates both abductive and consistency-based diagnosis. We then show that in the special cases, our approach reduces to pure consistency and pure abductive diagnosis, i.e. it is a generalisation of both styles. Our work is similar in spirit to the work of Console and Torasso, [1990],[1991], but goes beyond it in many ways. 
We will compare our approach to that of Console and Torasso in a later section. Our proposed framework is based on the Generalised Stable Model semantics [Kakas and Mancarella 1990a] of generalised logic programs with abduction, strengthening the link between logic programming and diagnosis first explored in [Eshghi 1990]. 2 Consistency-based and abductive approaches to diagnosis In both consistency-based and abductive approaches, a set of axioms SO (called the system description) models the system under investigation, and a set of abnormality assumptions Ab={ab 1 ,ab2 , ... abn} represents the possible underlying causes of failure. A set of statements, Obs, represents observations of the behaviour of the system which are to be explained. In the consistency-based approach, a diagnosis is a set of abnormality assumptions, L\, such that (1) SOuOBSuL\u{ -,abkl abkE Ab-L\} is consistent. The consistency-based approach focuses primarily on a model of the system's correct behaviour. When the abnormality assumptions relate to the failure of the components of the system, it attempts to find a set of normality and abnormality assumptions which can be assigned to the system's components to give a theory consistent with the observations. In the abductive approach, a diagnosis is a set of abnormality assumptions, L\, such that (2) SOuL\ ~ OBS SOuL\ is consistent. The abductive approach primarily models the behaviour of a failing system, by using fault models in the system description, SO. The diagnosis process consists of look- 515 cl' d1 Figure 1: A pre-charged line ing for a set of abnormality assumptions which, when adopted, will logically predict the observed faulty behaviour given the system description and the context of the observation. In both approaches, a diagnosis 8 is defined to be minimal if there is no other diagnosis, 8', which is a proper subset of 8. 3 The Diagnosis Problem The system description used in model-based diagnosis takes one of two forms. It is either a causal model, or a model consisting of the system's structure and the behaviour of individual components. In general, work on abductive diagnosis has focused on the former, while work on consistency-based diagnosis has focused on the latter. For the purposes of this paper, we adopt a specification of a diagnosis problem based on those used in [deKleer and Williams 1987] and [Reiter 1987], which uses a component-based approach. However, the results hold equally for a causal model-based approach, and for this reason, we adopt slightly more general language in the definition. in cluster Cj are present, the 'good behaviour model' of this cluster; good_behaviour_model f-not ab( Cj). In the component-based approach, Cj represents a component, and each cause in cluster S represents a possible fault model of the component. Note that the effects of a cause need not be defined deterministically. For example, the 'arbitrary behaviour' mode of a component, proposed in [deKleer and Williams 1989], is consistent with any behaviour of the component, but predicts nothing. The logical language adopted to represent SO can vary with the definition of diagnosis adopted. In this paper, we focus on two possible languages; classical logic, as adopted by Reiter [1987], and hom clauses with negation as failure, as used in the logic programming community. 
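As a concrete reading of formulas (1) and (2), the following sketch contrasts the two styles on a one-component propositional example. It is our own illustration under simplifying assumptions: the system description is a set of ground Horn clauses with an explicit "false" atom standing for inconsistency, and the component, atom and fault-model names are invented.

from itertools import combinations

BOTTOM = "false"

def closure(clauses, facts):
    """Forward chaining over ground Horn clauses given as (body, head) pairs."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived

# One component c: ok_c is its normality assumption, ab_c its abnormality
# assumption, and the second clause is a fault model (needed for abduction).
SD = [
    ((), "in_high"),                            # the known input, treated as part of SD
    (("ok_c", "in_high"), "out_high"),          # correct behaviour of c
    (("ab_c", "in_high"), "out_low"),           # fault model of c
    (("out_high", "out_low"), BOTTOM),          # an output cannot be both high and low
    (("ok_c", "ab_c"), BOTTOM),                 # ok and ab are mutually exclusive
]
OBS = ["out_low"]
AB  = ["ab_c"]

def consistency_based(sd, obs, ab):
    """Formula (1): Delta with SD u OBS u Delta u {ok_k | ab_k not in Delta} consistent."""
    diagnoses = []
    for r in range(len(ab) + 1):
        for delta in map(set, combinations(ab, r)):
            normals = {a.replace("ab_", "ok_") for a in ab if a not in delta}
            if BOTTOM not in closure(sd, set(obs) | delta | normals):
                diagnoses.append(delta)
    return diagnoses

def abductive(sd, obs, ab):
    """Formula (2): Delta with SD u Delta |- OBS and SD u Delta consistent."""
    diagnoses = []
    for r in range(len(ab) + 1):
        for delta in map(set, combinations(ab, r)):
            derived = closure(sd, delta)
            if set(obs) <= derived and BOTTOM not in derived:
                diagnoses.append(delta)
    return diagnoses

print(consistency_based(SD, OBS, AB))   # [{'ab_c'}]: the empty set is inconsistent
print(abductive(SD, OBS, AB))           # [{'ab_c'}]: only the fault model predicts out_low

With the fault model present, both styles return {ab_c}; deleting the fault-model clause leaves the consistency-based diagnosis unchanged but makes the abductive diagnosis impossible, which is the usual trade-off between the two styles.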
4 The need for negation as failure in the system description The desire to integrate consistency-based and abductive diagnosis was motivated primarily by the need to include negation as failure in our models. The following two examples illustrate this need: RAM modelling Definition: A diagnosis problem consists of a triple, where; . (i) The system description, SO, specifies the behaviour of the system. (ii) The observation set, OBS, specifies a set of observations of the system as unit clauses. (iii) C consists of constants,"'cj, which represent causal clusters within the system. Causal clusters are groups of causes of abnormal system behaviour which it makes sense to consider together. Each cause, n, within the cluster, ch is modelled in SD with two clauses; eJfects_of_cause_n f-ab(cj, n). ab(Cj) f-ab(cj, n). Furthermore, if so desired, we can define emergent properties of the system which occur when none of the causes In order to model the behaviour of a random access memory cell, we needed an axiom that says: the content of a cell at time T is X if X was written to this cell at time T, and no other write operation has been performed between T and T. The most straightforward way of writing this is as the clause contents(Cell, X. T) f- written(Cell, X, I'), T be an abductive framework, and L\ k atomseA) be a set of abducibles. Then the set M(L\) of ground atoms is a generalised stable model (GSM) for iff it is a stable model for the logic program PuL\, it is a model for th~ integrity constraints Ie, and L\=AnM(L\). The above definition is an extension of that in [Kakas and Mancarella 1990a] to allow abducibles to appear in the head of a clause. As a result of this, the set of abducibles chosen as generators can be smaller than L\, the set of abducibles true in the generalised stable model. A unit clause, q, representing an observation, has an abductive explanation with hypothesis set ~ if there exists a generalised stable model, M(L\), in which q is true. Equivalently, we can say that q has an abductive explanation, L\, within the abductive framework
<P, A, IC>
if the abductive framework <P, A, IC ∪ {q}> has a generalised stable model M(Δ). Having q in the integrity constraints imposes the condition that q must be true in the generalised stable model, and hence must follow from the logic program together with the set of abducibles chosen. Here, we briefly recall their definitions. Definition 1 An abductive framework is a triple <P, A, IC> where 1) P is a set of clauses of the form H ← L1, ..., Lk (k ≥ 0), where H is an atom and each Li is a literal. 2) A is a set of predicate symbols, the abducible predicates. The abducibles, Ab, are then all ground atoms with predicate symbols in A. 3) IC, the integrity constraints, is a set of closed formulae. Hence an abductive framework extends a logic program to include integrity constraints and abducibles. The semantics of this framework is based on the stable model semantics for logic programs. Definition 2 Let P be a logic program, and M a set of atoms from the Herbrand base. Define P_M to be the set of ground Horn clauses formed by taking ground(P), in clausal form, and deleting: (i) each clause that has a negative literal not l in its body with l ∈ M; (ii) all negative literals not l in the bodies of the remaining clauses, where l ∉ M. M is a stable model for P if M is the minimal model of P_M. This definition is extended to give a semantics to abductive frameworks. 7 Generalised Stable Models and Diagnosis The generalised stable model semantics for abduction can be applied to diagnosis by mapping a diagnosis problem <SD, OBS, C> with multiple observations onto an abductive framework as follows. Represent the system description, SD, as a logic program with integrity constraints, <P, IC>. The integrity constraints will usually contain sentences stating that observation points cannot take multiple values at a given time. Let the abducibles represent the causes within the clusters, {ab(ci, n) | ci ∈ C}, hence A = {ab(X, N)}. Intuitively, given an observation set OBS, represented by a set of unit clauses, we have a choice of how to use it. We either wish to predict it, giving an abductive diagnosis, or make assumptions to restore the theory to consistency, giving a consistency-based diagnosis. By adding OBS to the integrity constraints, only models in which the observations are true, and hence explained by the system description together with selected abducibles, are legal generalised stable models. Hence we get an abductive diagnosis. If, instead, we add OBS to the logic program representing the system description, then a set of assumptions can only be made if it is consistent with the observations; i.e. the observations, system description and assumptions cannot derive anything which violates the integrity constraints. This gives us consistency-based diagnoses. Furthermore, we can partition OBS into two sets, and predict some observations, OBSp, while maintaining consistency with others, OBSe. We do this by placing OBSp in the integrity constraints, and OBSe in the logic program. This allows us to give a definition of unified diagnosis as follows. Definition 4 Let <SD, OBSp, OBSe, C> be a diagnosis problem, where: SD is a logic program with integrity constraints, <P, IC>; OBSp is the set of observations to be predicted by diagnoses; OBSe is the set of observations with which diagnoses need to be consistent; C is the set of causal clusters in the system. Then Δ is a GSM-diagnosis of <SD, OBSp, OBSe, C> iff there is a generalised stable model, M(Δ), of the abductive framework <P ∪ OBSe, A, IC ∪ OBSp>, where A = {ab(C, N)} represents the set of possible root causes of misbehaviour in SD. 
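The definitions above can be exercised mechanically on small ground programs. The sketch below is ours and makes simplifying assumptions: programs are propositional, integrity constraints are restricted to atoms that must hold (the OBSp part) plus denials, candidate models are enumerated by brute force, and all atom names are invented.

from itertools import chain, combinations

def subsets(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def minimal_model(definite):
    """Least model of a definite program given as (head, body) pairs."""
    m, changed = set(), True
    while changed:
        changed = False
        for head, body in definite:
            if head not in m and all(b in m for b in body):
                m.add(head)
                changed = True
    return m

def is_stable(program, m):
    """Definition 2: m equals the least model of the reduct P_m."""
    reduct = [(h, pos) for h, pos, neg in program if not (set(neg) & m)]
    return minimal_model(reduct) == m

def gsm_diagnoses(program, abducibles, must_hold, denials, atoms):
    """Enumerate Delta and candidate models; keep the generalised stable models."""
    results = []
    for delta in map(set, subsets(abducibles)):
        extended = program + [(a, [], []) for a in delta]     # P u Delta
        for m in map(set, subsets(atoms)):
            if not is_stable(extended, m):
                continue
            if not set(must_hold) <= m:                       # OBSp in the constraints
                continue
            if any(set(d) <= m for d in denials):             # a violated denial
                continue
            if set(abducibles) & m != delta:                  # Delta = A n M(Delta)
                continue
            results.append((delta, m))
    return results

# Clauses are (head, positive body, negative body); 'not' is negation as failure.
P = [
    ("ok_signal", ["power"], ["ab_sensor"]),
    ("alarm",     ["ab_sensor"], []),
    ("power",     [], []),
]
A = ["ab_sensor"]
atoms = ["ok_signal", "alarm", "power"] + A

# Abductive use: the observation 'alarm' is placed in the constraints (OBSp).
for delta, m in gsm_diagnoses(P, A, ["alarm"], [["ok_signal", "alarm"]], atoms):
    print(delta, sorted(m))     # {'ab_sensor'} is the only GSM-diagnosis here

Placing the observation in the constraints forces the abductive reading and yields the single diagnosis {ab_sensor}; adding it to the program instead would give the consistency-based reading described above.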
To demonstrate this, we consider a simple example from the medical domain, that of pericardiai tamponade. The heart consists of two parts, the myocardium is the muscle which beats, while the pericardium is the protective sac which surrounds this muscle. If this sac is pierced, instantaneous pain occurs, which can subside fairly quickly. However, blood slowly flows into the pericardium over a period of time, increasing the pressure on the myocardium. Later, the myocardium will become so compressed that blood does not flow round the arteries, even though the myocardium itself is functioning perfectly. The model of this phenomenon is given below. For simplicity, we treat time discretely, in units of hours. pulse_ok(T) f- normaLcardiac_contraction(T), not hearCcompressed(T). no-pulse(T) f- hearCcompressed(T) . ab(pericardium,pierced(T)), TUlse(12). Let us consider the generalised stable models of . If we place the observation in the logic program as a unit clause, any set of abducibles can be assumed as long as they do not violate the integrity constraints - i.e. they must not generate a stable model in which pulse_ok(12) is true. If we assume nothing, the resulting stable model contains pulse_ok(12) as true, resulting in a conflict. There are two possible (minimal) ways to restore consistency. We can assume ab(myocardium,failure(1 0» 1, and cease to contain normaLcardiac_contraction(12) in the stable model. Alternatively, we assume ab(pericardium,pierced(2» 1, which predicts heart compression at time 12. The resulting stable model will therefore not contain pulse_ok(12), and so be a legitimate generalised stable model of . If, instead, we place the observation in the integrity constraints, Ie, we are restricted to stable models which contain nOJ)ulse(12). In this case, only by assuming ab(pericardium,pierced(2» do we generate a stable model which contains nOJ)ulse(12). As this also satisfies IC, it is a legitimate GSM for . Hence, by making a choice of where to place the observation, we can generate either consistency-based or abductive diagnoses. Furthermore, if we have a second observation, ecg-1)ood(12), we can choose to treat it in a different way from the first. Let OBS p = {noJ)ulse(12)} and OBSe = {ecQ-1)ood(12)}. In this case, the only (minimal) GSM of is that generated by ab(pericardium, pierced(2». However, if we swap OBSp and OBSe, the only (minimal) GSM is that generated by ab(myocardium, failure(10». Note how the model uses negation-as-failure to handle the frame problem. If we used classical negation instead, it would be necessary to have extra clauses to predict nOCheart_compressed at all relevant times, resulting in a larger, less understandable, and less efficient model. 8 Abductive and consistency-based diagnosis as special cases If we restrict our attention to the traditional definitions of diagnosis, we can show that our definition is equivalent to these under certain conditions. 1 Or, of course, at any other appropriate time instant. 519 8.1 Abductive Diagnoses as Generalised Stable Models If all the observations are to be predicted in the abductive sense, and the system description contains only hom clauses, our definition of diagnosis reduces to the standard definition of abduction given in section 1. This is achieved as follows: Given an abductive diagnosis problem , where SO is a hom-clause theory, divide the system description into a set of definite clauses, P, and a set of denials, O. Let A be the set of abducibles. 
It is easy to show that abductive diagnoses of SO according to formula (2) correspond to generalised stable models of the framework ' 8.2 Consistency-Based Diagnoses as Generalised Stable Models For a certain class of theories, namely almost-horn theories, we show that our definition of diagnosis is equivalent to the traditional definition of consistency-based diagnosis given in [Reiter 1987]. An almost-hom theory is a theory in which negation is used only to represent the negation of certain predicates. In the context of our theorem, these correspond to the abnormality assumptions. A clause is said to be almost-Horn with respect to A, if, when in disjunctive normal form, it contains at most one positive literal with a predicate symbol not in A. Theorem Let be a consistency-based diagnosis problem, with SO a theory which is almost-hom with respect to A={ab}. Then define the logic program with integrity constraints, SO·=, as follows; E atoms{A), and P. qj 9 Comparison with Console & Torasso [2] Console & Torasso have defined a framework for a general abduction problem. This framework allows a spectrum of diagnosis styles to be represented within it, including the pure consistency-based and abductive styles described above. They divide the observations into two sets. One set, OBSa' is to be explained by the assumptions, while the other set, OBSe, must be consistent with the assumptions. They then define two sets; r=OBS a · 'I' = { -,f{x) I f{Y)E OBSeo x:;t:y} A diagnosis is then a set of abducibles which, when added to the theory, allows prediction of all observaand is consistent with the negative literals in tions in r, Definition 5 Let aj resent the normality assumptions in the system, -,ab, then the nonmonotonic definition of diagnosis given by us is equivalent to the monotonic definition given in [Reiter 1987]. However, if negation is used elsewhere in the theory, the two definitions diverge. The classical consistency-based definition requires explicit representation of all negative information. The GSM-diagnosis, however, will make the closed-world assumption, and assume information is false unless it can be proved otherwise. ~ atoms{A). 1. For every clause of the form pr -,al,-,a2 ...-,ak.ak+l .... am.ql,q2 ... 'qn in SO, there is a program clause pr not al,not a2 ... not ak.ak+l .... am.q1,q2, .. 'qn in P. 2. For every clause of the fctrm alva2 ... vakv-,ak+lv ... v-,am-,q1v-,q2v .. v-,qn in SO there is an identical clause in IC. Then; o is a consistency-based diagnosis of according to formula (1) ¢:> D is a GSM-diagnosis of The proof of this theorem is available in an extended version of this paper, available from the authors. This theorem shows that, if negation is used only to rep- '1'. Our definition is more powerful in several ways. It extends the definition of Console and Torasso from hom-clause theories to general logic programs with integrity constraints. This gives a sophisticated and expressive language for modelling, which includes negation as failure. The inclusion of the consistency-based observations in the object level, rather than their negations in the integrity constraints, means that these can be used easily during inference. This can reduce the time to find a conflict, by using 'backwards simulation' of components. In some cases, such as the example documented in [van Soest et al. 1990] , certain diagnoses cannot be found without access to the observations in this way. Within this framework, it is possible to define minimal diagnoses model-theoretically. 
We will expand on this in section 10. Placing the consistency-based observations at the object level potentially gives us more efficient inference. However, to do this in the context of joint diagnoses can lead to problems. It may be possible to conclude that an abductive obser- 520 vation is true, based on the adding of a consistencybased observation to the theory alone; SD: obs1 -7obs2 they must be mutually exclusive logically. This can easily be achieved by adding an integrity constraint forbidding a component to have two modes; false ~ ab(ci,mjl)' ab(ci,mj2), mjl"",mj2. OBSa :obs2 OBSc :obs1 By adding obs1 to the system description, we can conclude that obs2 is true. Whether this is legitimate depends on how we interpret the consistency-based obselVations. If we consider them true, but not necessarily explainable, then this is legitimate. This is the case in Reiter's formalisation of diagnosis, and also in the case of the setting factors of Reggia et al. [1983]. However, if we consider them not necessarily true, merely not false, then this is unacceptable. In such circumstances, it is necessary to restrict the model so that consistency-based obselVations do not appear in the body of clauses, or use the approach proposed by Console and Torasso. The framework provided by Console and Torasso satisfies the second of these conditions, but not the first. Because they work in a monotonic framework, it is not possible to represent the correct behaviour of a component as the default behaviour; instead, it must be explicitly assumed that a component behaves correctly. As a result of this, they must specify a semantic minimisation criterion; a diagnosis is minimal if it contains a minimal set of abducibles corresponding to faulty behaviour. We, however, can specify a model theoretic criterion; A diagnosis, .1, is minimal if its corresponding GSM, M(.1), is a minimal GSM. 11 10 Minimality We now focus attention on component-based diagnosis, and consider the problem of minimal diagnoses. We wish to restrict our attention to those diagnoses which contain a minimal number of failing components. To do this, we introduce minimal generalised stable models; Definition: A general stable model, M(.1), for an abductive framework,, is minimal if there is no other GSM, M(.1'), such that.1'c.1. Hence, a minimal general stable model contains a minimal set of assumptions which allow the consequences of the logic program P to satisfy the integrity constraints, IC. Note that, because abductive frameworks are nonmonotonic, this does not imply that any superset of .1, <1>, will have a GSM, M(<1». If, in our diagnosis framework, we have a 1-1 correspondence between a hypothesised failed component and an abducible being assumed in the abductive framework, then minimal general stable models will correspond to minimal diagnoses. To do this, we must impose two restrictions on the relationship between the frameworks; (i) There must be no abducible representing the correct behaviour of a component. This must instead be a default behaviour which is used in the absence of abducibles referring to the faulty behaviour of a component. (ii) It must be illegal to make more than one assumption about a component's behaviour at a time. Note that the second condition does not force fault modes to be mutually exclusive in real-life, merely that Calculating Diagnoses By providing a uniform model-theoretic framework for consistency-based, abductive and joint diagnoses, we have also provided a method for a uniform implenientation. 
We simply need an algorithm for generating the minimal generalised stable models of an abductive framework, and we can use this for performing a variety of diagnosis tasks. . Much work has been carried out on the generation of stable models, and several efficient algorithms exist. However, as general stable models are a newer innovation, these results have yet to be fully exploited and extended to the GSM case. Currently, the state of the art in GSM generation is provided by Satoh and Iwayama [1991]. This work, however, has the drawback that it does not produce minimal GSMs. Traditionally, in the abductive community, top-down algorithms have been used which tend to generate minimal solutions, as they avoid making irrelevant assumptions. (e.g. [Cox and Pietrzykowski 1986] [Kakas and Mancarella 1990b]) However, non-minimal abductive diagnoses are still acceptable in the model-theoretic semantics, and can be generated by the algorithms. Similarly, in the diagnosis community, generation of minimal diagnoses has tended to be a consequence of the algorithm selected (e.g. the ATMS in [deKleer and Williams 1987]) rather than a model-theoretic restriction. However, Eshghi [1990] proposes an alternative approach. He generates a theory in which minimal diagnoses correspond exactly to the stable models of the theory. This means that non-minimal diagnoses are excluded by the semantics, rather than the algorithm. By extending these results beyond the almost-horn case, we are able to transform an abductive framework into a 521 logic program. The stable models of this logic program correspond exactly to the minimal generalised stable models of the abductive framework. This means that minimality is brought into the theory as a necessary property of each solution, rather than being a selection criterion between solutions. This work is currently in progress. [deKleer and Williams 1989] J. deKleer & B. Williams. Diagnosis with Behavioural Modes. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit 1989. As a result of this, a wider variety of literature can be used to select appropriate and efficient algorithms, rather than being restricted to algorithms which have been developed specifically for the task of diagnosis. [Dressler 1990] O.Dressler. Computing Diagnoses as Coherent Assumption Sets. Proceedings of the First International Workshop on Principles of Diagnosis, Menlo Park 1990 12 [Eshghi 1990] K. Eshghi. Diagnoses as Stable Models. Proceedings of the First International Workshop on Principles of Diagnosis, Menlo Park 1990 Conclusions By moving to a nonmonotonic logical framework, it is possible to bring abductive and consistency-based diagnosis together, and use the same inference method to perform both. We have done this by using generalised stable models to provide the semantics, which provides us with a rich and expressive modelling language. It also gives a link between diagnosis and logic programming, allowing application of theoretical and practical logic programming results to the domain of diagnosis. Acnowledgements Thanks to Bruno Bertolino and Enrico Coiera for their assistance. References [Console et al. 1990] L.Console, D. Theseider Dupre & P.Torasso. A Completion Semantics for Object-level Abduction. Proc. AAAI Symposium in Automated Abduction, 1990. [Console et al. 1991] L.Console, D. Theseider Dupre & P.Torasso. On the relationship between abduction and deduction. Journal of Logic and Computation, 2(5), Sept. 1991. 
[Console and Torasso 1990] L.Console & P. Toras so. Integrating Models of the Correct Behaviour into Abductive Diagnosis. Proceedings of the 9th European Conference on Artificial Intelligence, 1990. [deKleer and Williams 1987] J. deKleer & B. Williams. Diagnosing Multiple Faults. Artificial Intelligence 32:97 -130, 1987. [Eshghi and Kowalski 1989] K. Eshghi & R Kowalski. Abduction compared with Negation as Failure. Proceedings of the 6th Int. Conf. on Logic Programming, Lisbon 1989, pp234-254. [Eshghi and Preist 1992] K. Eshghi and C. Preist. The Cachebus Experiment: Model Based Diagnosis applied to a Real Problem in Industrial Applications of Knowledge-Based Diagnosis, ed Guida and Stefanini, Elsevier 1992. [Gelfond and Lifshitz 1988] M. Gelfond & V. Lifshitz. The Stable Model Semantics for Logic Programming. Proceedings of the Fifth International Conference on Logic Programming, 1988. [Kakas and Mancarella 1990a] A. Kakas & P. Mancerella. Generalised Stable Models: A Semantics for Abduction. Proceedings of the 9th European Conference on Artificial Intelligence, 1990. [Kakas and Mancarella 1990b] A. Kakas & P. Mancarella. On the relation between Truth Maintenance and Abduction. Proceedings of PRICAI, 1990. [Reiter 1987] R Reiter. A theory of diagnosis fromfirsl principles, Artificial Intelligence Journal 32, 1987 [Reggia et al. 1983] J.A. Reggia, D.S. Nau & P.Y. Wang. Diagnostic Expert Systems based on a Sel Covering Model. Int. 1. of Man-Machine Studies 19, p437-460. (1983) [Console and Torasso 1991] L.Console & P.Torasso. A Spectrum of Logical Definitions of Model-Based Diagnosis. University of Torino Technical Report, 1991. [Satoh and Iwayama 1991] K. Satoh & N. Iwayama. Computing Abduction by using the TMS. Proceedings of the Eighth International Conference on Logic Programming, 1991. [Cox and Pietrzykowski 1986] P.T. Cox & T. Pietrzykowski. Causes for Events: their Computation and Application. Proc. 8th conference on Computer Aided Design and Engineering, 1986. [Shanahan 1989] M. Shanahan. Prediction is Deduction but Explanation is Abduction. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit 1989. [Davis 1984] R Davis. Diagnostic Reasoning based on Structure and Behaviour. Artificial Intelligence 24:347-410, 1984. [vanSoest et al. 1990] D.C. van Soest, RR Bakker, F. van Raalte & N.J.!. Mars. Improving effectiveness oj model-based diagnosis, Proc. 10th international workshop on expert systems and their applications, Avignon 1990. [deKleer et al.1990] J. deKleer, A. Mackworth & R Reiter. Characterizing DiagnOses. Proceedings of the Eighth National US Conference on Artificial Intelligence, Boston 1990. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 522 A Forward-Chaining Hypothetical Reasoner Based on Upside-Down Meta-Interpretation Yoshihiko Ohta Katsumi Inoue Institute for New Generation Computer Technology Mita'Kokusai Bldg. 21F, 1-4-28 Mita, Minato-ku, Tokyo 108, Japan {ohta, inoue}@icot.or.jp Abstract A forward-chaining hypothetical reasoner with the assumption-based truth maintenance system (ATMS) has some advantages such as avoiding repeated proofs. However, it may prove subgoals unrelated to proofs of the given goal. To simulate top-down reasoning on bottom-up reasoners, we can apply the upside~down meta-interpretation method to hypothetical reasoning. 
Unfortunately, when programs include negative clauses, it does not achieve speedups because checking the consistency of solutions by negative clauses should be globally evaluated. This paper describes a new transformation algorithm of programs for efficient forward-chaining hypothetical reasoning. In the transformation algorithm, logical dependencies between a goal and negative clauses are analyzed to find irrelevant negative clauses, so that the forward-chaining hypothetical reasoners based on the upside-down meta-interpretation can restrict consistency checking of negative clauses to those relevant clauses. The transformed program has been evaluated with a logic circuit design problem. 1 Introduction Hypothetical reasoning [Inoue 88] is a technique for proving the given goal from axioms together with a set of hypotheses that do not contradict with the axioms. Hypothetical reasoning is related to abductive reasoning and default reasoning. A forward-chaining hypothetical reasoner can be constructed by simply combining a bottom-up reasoner with the assumption-based truth maintenance system (ATMS) [de Kleer 86-1] (for example [Flann et al. 87, Junker 88]). We have implemented a forward-chaining hypothetical reasoner [Ohta and Inoue 90], called APRICOT /0, which consists of the RETE-based inference engine [Forgy 82] and the ATMS. With this architecture, we can reduce the total cost of the label computations of the ATMS by giving intermediate justifications to the ATMS at two-input nodes in the RETElike networks. On the other hand, hypothetical rea- soning based on top-down reasoning has been proposed in [Poole et al. 87, Poole 91]. Compared with top-down (backward-chaining) hypothetical reasoning, bottom-up (forward-chaining) hypothetical reasoning has the ad-' vantage of avoiding duplicate proofs of repeated subgoals and duplicate proofs among different contexts. Bottomup reasoning, however, has the disadvantage of proving unnecessary sub goals that are unrelated to the proofs of the goal. To avoid the disadvantage of bottom-up reasoning, Magic Set method [Bancilhon et al. 86] and Alexander method [Rohmer et al. 86] have been proposed for deductive database systems. Recently, it is shown that Magic Set and Alexander methods are interpreted as specializations of the upside-down meta-interpretation [Bry 90). The upside-down meta-interpretation has been extended to abduction and deduction with non-Horn clauses in [Stickel 91]. His abduction, however, does not require the consistency of solutions. Since the consistency requirement is crucial for some applications, we would like to make programs in'dude negative clauses for our hypothetical reasoning. When programs include negative clauses, however, the upsidedown meta-interpretation method does not achieve speedups because checking the consistency of solutions by ,negative clauses should be globally evaluated. We' present a new transformation algorithm of programs for efficient forward-chaining hypothetical reasoning based on the upside-down meta-interpretation. In the transformation algorithm, logical dependencies between a goal and negative clauses are analyzed to find irrelevant negative clauses, so that the forward-chaining hypothetical reasoners based on the upside-down metainterpretation can restrict consistency checking of negative clauses to those relevant clauses. The transformed program has been evaluated with a logic circuit design problem. In Section 2, our hypothetical reaso~ing is defined with the default proofs [Reiter 80T. 
In Section 3, the outline of the ATMS is sketched. Section 4 shows the basic algorithm for hypothetical reasoning based on the bottom-up reasoner MGTP [Fujita and Hasegawa 91) together with 523 the ATMS. Section 5 presents two transformation algorithms based on the upside-down meta-interpretation. One is a simple transformation algorithm, the other is the transformation algorithm with the abstracted dependency analysis. We have implemented the hypothetical reasoner and these program transformation systems, and Section 6 shows the result of an experiment for the evaluation of the transformed programs. In Section 7, related works are considered. Let ~ be the set of all ground instances of the normal defaults of D. A default proof [Reiter 80] of G with respect to (D, W) is a sequence ~o" .. ,~k of subsets of ~ if and only if 1. WU CONSEQUENTS(~o) f- G, 2. for 1 ~ i ~ k, Wu CONSEQUENTS(~i) fPREREQUISITES(~i_1)' 3. ~k = 0, 4. WUUf=oCONSEQUENTS(~i) is consistent, 2 Problem Definition where In this section, we define our hypothetical reasoning based on a subset of normal default theories [Reiter 80]. A normal default theory (D, W) and a goal G are given as follows: • W: a set of Horn clauses. A Horn clause is represented in an implicational form, PREREQUISITES(~i_d for (a : /3//3) E ~i-1 and CONSEQUENTS(~i) (2) Here, ai (1 ~ i ~ nj n 2:: 0) and /3 are atomic formulas, and 1.. designates falsity. Function symbols are restricted to O-ary function symbols. All variables in a clause are assumed to be universally quantified in front of the clause. Each Horn clause has to be range-restricted, that is, all variables in the consequent /3 have to appear in the antecedent a1 /\ •.. /\ an. A Horn clause of the form (2) is called a negative clause. • D: a set of normal defaults. A normal default is an inference rule, a:/3 73' • goal G: a conjunction of atomic formulas. All variables in G are assumed to be existentially quantified. ~d· k W U U CONSEQUENTS(~i) F GO, i=O where the sequence ~o"'" ~k is a default proof of If GO is an answer to G from (D, W), 0 is an answer substitution for G from (D, W). A support for an answer GO from (D, W) is Uf=o CONSEQUENTS(~i)' where the sequence ~o" .. ,~k is a default proof of GO with respect to (D, W). For an answer GO from (D, W), the minimal supports for GO from (D, W), written as MS(GO), is the set of minimal elements in all supports for GO from (D, W). The solution to G from (D, W) is the set of all pairs (GO, MS(GO)), where GO is an answer to G from (D, W) and MS(GO) is the minimal supports for GO. The task of our hypothetical reasoning is defined to find the solution to a given goal from a given normal default theory. G with respect to (D, W). (3) where a, called the prerequisite of the normal default, is restricted to a conjunction a1 /\ ... /\ an of atomic formulas and /3, called its consequent, is restricted to an atomic formula. Function symbols are restricted to O-ary function symbols. All variables in the consequent /3 have to appear in the prerequisite a. A normal default with free variables is identified with the set of its ground instances. The normal default can be read as " if a and it is consistent to assume /3, then infer /3". == {/3 I (a: /3//3) E A ground instance GO of the goal G is an answer to G from (D, W) if (1) or == /\ a 3 ATMS The ATMS [de Kleer 86-1] is used as one component of our hypothetical reasoner. The following is the outline . of the ATMS. In the ATMS, a ground atomic formula is called a datum. 
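As a concrete picture of the label bookkeeping that the following outline describes, here is a minimal sketch (ours; the assumption names are invented) of how a label is kept consistent and minimal:

def minimise_label(envs, nogoods):
    """Keep only environments that contain no nogood (consistency) and are not
    supersets of another consistent environment (minimality)."""
    consistent = [frozenset(e) for e in envs
                  if not any(ng <= frozenset(e) for ng in nogoods)]
    return [e for e in consistent if not any(other < e for other in consistent)]

nogoods = [frozenset({"A", "B"})]
label = [{"A"}, {"A", "C"}, {"A", "B"}, {"C"}]
print(minimise_label(label, nogoods))     # [frozenset({'A'}), frozenset({'C'})]

Given the nogood {A, B}, the label [{A}, {A, C}, {A, B}, {C}] is reduced to [{A}, {C}]: the nogood environment is removed for consistency and {A, C} for minimality.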
For some datum N, r N designates an assumption. The ATMS treats both 1.. and r N as special data. The ATMS represents each datum as an ATMS node: (datum, label, justifications). Justifications correspond to ground Horn clauses and are incrementally input to the ATMS. Each justification is denoted by: 524 where Ni and N are data. Each datum Ni is called an antecedent, and the datum N is called a consequent. In the slot justifications, the ATMS records the set of antecedents of justifications whose consequents correspond to the datum. Let H be a current set of assumptions. An assumption set E ~ H is called an environment. When we denote an environment by a set of assumptions, each assumption fN is written as N by omitting the letter f. Let J be a current set of justifications. An environment E is called nogood if JuE derives .1-. The label of the datum N is the set of environments {E1 , · · · , Ej, ... , Em} that satisfies the following four properties [de Kleer 86-1]: where ai(l :S; i :S; n;n ~ 0) and {3;(1 ~ j ~ m;m ~ 0) are atomic formulas and all variables in {31 V ... V {3m have to appear in al 1\ ... 1\ an. Each clause in P is . translated into a KL1 [Ueda and Chikayama 90] clause. Then, model candidates are generated from the set of KL1 clauses. The MGTP works as a bottom-up reasoner on the distributed-memory multiprocessor called MultiPSI. As shown in Figure 1, we can construct a hypothetical reasoner by combining the MGTP with the ATMS. The normal default theory (D, W) i~ translated into a program P, P == { al 1\ ... 1\ an ---+ assume({3) I (al 1\ ... 1\ an : 1. N holds in each E j (soundness), 2. every environment in which N holds is a superset of some E j (completeness), {3 / {3) ED} U W, where assume is a metapredicate not appearing anywhere in D and W. 3. each E j is not nogood (consistency), Infer~nce 4. no E j is a subset of any other (minimality). If the label of a datum is not empty, the datum is believed; otherwise it is not believed. A basic algorithm to compute labels [de Kleer 86-1] is as follows. When a justification is incrementally input to the ATMS, the ATMS updates the labels relevant to the justification in the following procedure. Step 1: Let L be the current label of the consequent N of the justification and Li be the current label of the i-th antecedent Ni of the justification. Set L' = L U {x I x = Ui:l E j , where Ei E Ld. Step 2: Let L" be the set obtained by removing nogoods and subsumed environments from L'. Set the new label of N to L". Step 3: Finish this updating if L is equal to the new label. Step 4: If N is -1, then remove all new nogoods from labels of all data other than -1. Step 5: Update labels of the consequents of the recorded justifications which contain N as their antecedents. 4 Hypothetical Reasoner with ATMS and MGTP The MGTP [Fujita and Hasegawa 91] is a model generation theorem prover for checking the unsatisfiability of a first-order theory P. Each clause in P is denoted by: Engine MGTP Justifications ATMS Beliefs Figure 1: Forward-Chaining Hypothetical Reasoner with ATMS and MGTP proced ure R( G, P) : begin Bo:= 0; Jo := { (:::} {3) I (---+ {3) E P } U { (f.6 :::} {3) I (---+ assume({3)) E P }; s:= 0; while J s -1= 0 do begin s := s + 1; Bs := UpdateLabels(Js_1 , AT MS); J s := GenerateJustifications(Bs, P, B s- 1 ) end; Solution := 0; for each () such that G() E Bs do begin LGe := GetLabel(G(),ATMS); Solution := Solution U {(G(), LGe)} end; return Solution end. 
Figure 2: Reasoning Algorithm with ATMS and MGTP The reasoning procedure R(G,P) for the MGTP with the ATMS is shown in Figure 2. The reasoning proce- 525 dure consists of the part for UpdateLabels - GenerateJustifications cycles and the part for constructing the solution. The UpdateLabels - GenerateJustifications cycles are repeated while Is is not empty. The ATMS updates the labels related to a justification set l s - 1 given by the MGTP. The ATMS returns the set Bs of all the data whose labels are not empty after the ATMS has updated labels with Is-I. The procedure UpdateLabels( Is-I, AT M S) returns a believed data set Bs. The MGTP generates each set Is of justifications by matching elements of Bs with the antecedent of every clause related to new believed data. The procedure Generate1ustifications(Bs , P, B s - 1 ) returns a new justification set Is. If any element in (Bs \ B s- 1 ) can match an element of the antecedent of any (0'.1 I\. ... I\. an ~ X) in P and there exists a ground substitution ~ for all ai such that ai~ E B s , then Is is as follows. • (al~'···' an~, f,Bu ~ f3~) E Is if X = assume(f3). The procedure GetLabel(GO,ATMS) returns the label of GO and is used in constructing the solution. Note that the label of GO corresponds to the minimal supports for GO. The hypothetical reasoner with the ATMS and the MGTP can avoid duplicate proofs among different contexts and repeated proofs of subgoals. However, there may be a lot of unnecessary proofs unrelated to the proofs of the goal. 5 5.1 Upside-Down Meta-Interpretation Simple Transformation Algorithm Bottom-up reasoning has the disadvantage of proving unnecessarily subgoals that are not related to proofs of the given goal. We introduce a simple transformation of a program P on the basis of the upside-down metainterpretation for speedups of bottom-up reasoning by incorporating goal information. A bottom-up reasoner interprets a Horn clause in such a way that the fact f3~ is derived if facts al~,· .. ,an~ are present for some substitution~. On the other hand, a top-down reasoner interprets it in such a way that goals al~,·· . ,an~ are derived if a goal f3~ is present, and fact f3~ is derived if both a goal f3~ and facts al~,···, an~ are present. We transform the Horn clause into goal(f3) for every ai ~ goal(ai) (1 ~ i ~ n) and goal(f3) I\. al I\. ... I\. an ~ 13, then a bottom-up reasoner can simulate top-down reasoning. Here, goal is a metapredicate symbol which does not appear in the original program P. After some facts related to the proofs of the goal have derived with the upside-down meta-interpretation, those facts may derive contradiction with bottom-up interpretation of the original program. Thus, we transform each negative cla~se into and ~ goal(ai) for every ai (1 ~ i ~ n). This means that every subgoal related to negative clauses is evaluated. Note that (goal(f3) ~ goal(ai)) or (~ goal(ai)) may not be satisfy the range-restricted condition. We have some techniques which make every clause in transformed programs range-restricted. Here, we take a very simple technique in which only the predicate symbols are used as the arguments of the metapredicate goal. When, is an atomic formula, we denote by 1 the predicate symbol of ,. The algorithm T1 as shown in Figure 3 transforms an original program P into the program P in which the top-down information is incorporated. 
The solution to G from T1 (G, P) is always the same as the solution to G from P because all subgoals related to negative clauses as well as the given goal are evaluated and every label of goal (;8) for any atomic formula 13 is {0}. For example, consider a program, Pb = { ~ penguin(a), penguin(X) ~ bird(X), bird(X) ~ assume(fly(X)), fly(X) I\. notfly(X) ~ .1.., penguin(X) ~ notfly(X) }. By the simple transformation algorithm, we get T1(fly, Pb ) = { goal(penguin) ~ penguin(a), goal(bird) I\. penguin(X) ~ bird(X), goal(bird) ~ goal(penguin), goal(fly) I\. bird(X) ~ assume(fly(X)), goal(fly) ~ goal(bird), fly(X) I\. notfly(X) ~ .1.., ~ goal(fly), ~ goal(notfly), goal(notfly) I\. penguin(X) ~ notfly(X), goal(notfly) ~ goal(penguin) } u { ~ goal(fly) }. 526 Next, consider the goal bird(X). Then, the transformed program Tl(bird, Pb ) is the program Tl(bird, Pb ) = { ... } U {-+ goal(bird) }, where only the last element (-+ goal(Jly)) of Tl(Jly, Pb ) is replaced with (-+ goal(bird)). Even if the goal is bird(X), both goal(Jly) and goal(notfly) are evaluated because { ... } includes (-+ goal(Jly)) and (-+ goal(notfly)) for the negative clause. Then, the computational cost of R( bi rd (X), Tl (bi rd, Pb )) is nearly equal to the cost of R(Jly(X),Tl(Jly,Pb )). U P:= 0; for each (al 1\ ... 1\ an -+ X) E P do begin if X =..L then begin P := P U {al 1\ ... 1\ an -+ ..L}; for j := 1 until n do P := P U {-+ goal(aj)} end else if X = assume(,8) then begin P := P U {goal(fJ) 1\ al 1\ .. . 1\ an -+ assume(,8)}; for j:= 1 until n do -+ goal( aj)} end else if X =,8 then begin P := P U {goal(fJ) 1\ al 1\ ... 1\ an -+ ,8}; for j:= 1 until n do P := P U {goal(fJ) J=. {(al, .. ·,an,f,a=}fJ) -+ goal(aj)} end end; P := P U {-+ goal( C)}; return P end. Figure 3: Simple Transformation Algorithm Tl Transformation Algorithm with .Abstracted Dependency Analysis In this subsection, we describe a static method to find irrelevant negative clauses to evaluation of the goal. If we can find such irrelevant negative clauses, for every antecedent ai of each irrelevant clause, we do not need to add (-+ goal(ai)) into the transformed program. We try to find them by analyzing logical dependencies between {(a!"", an =} false(C)) I C = (al 1\ ... 1\ an -+ ..L), C E P}. Let .ii be the set of propositions appearing in J. Note that .ii consists of all predicate symbols in P and all f alse( C) for C E P. For each proposition N in .ii, we compute a set of abstracted environments on which N depends. Now, we show an algorithm to compute the set of abstracted environments. This algorithm is obtained by modifying the label-updating algorithm shown in Section 3. The modified points are as follows. 1. Replace Step 2 with Step 2': Set the new label of N to L'. 2. Remove Step 4. Every proposition in .ii is labeled with the set of abstracted environments obtained by applying the modified algorithm to the abstracted justifications J. This label is called the abstracted label of the proposition. The system to compute the set of abstracted environments for each proposition is called an abstracted dependency analyzer. The reasons why we have to modify the label-updating algorithm are as follows. Firstly, in the abstracted justifications, every 1. is replaced with the proposition false(C) for the negative clause C, so that each abstracted label is always consistent. Thus, we do not need Step 4. Secondly, each abstracted label may not be minimal because we replace Step 2 with Step 2'. Suppose that every abstracted label is minimal. 
Then, the theorem we present below may not hold. For example, let Pe = 5.2 I (al 1\ ... 1\ an -+ assume(,8)) E P} U {( aI, ... , an =} fJ) I (al 1\ ... 1\ an -+ ,8) E P} procedure Tl(C, P) : begin P := P U {goal(fJ) the goal and each negative clause at the abstracted level. We do not care about any argument in the abstracted dependency analysis. When, is an atomic formula, we denote by the proposition i the predicate symbol of ,. For each negative clause C, the proposition false(C) is used as the identifier of C. For every (a -+ ass u me (,8) ), fJ is called an assumable-predicate symbol. For any environment E, its abstracted environment (denoted by E) is { f,B I f j3 E E}. The abstracted justifications with respect to P is defined as: { -+ p(a), -+ p(b), -+ q(b), q(X) p(X) -+ assume(r(X)), p(X) -+ assume(s(X)), r(a) -+ g, r(X) 1\ s(X) -+ g, r(X) 1\ s(X) 1\ t(X) -+ 1. } . -+ t(X), Consider the problem defined with the goal 9 and Pe. The abstracted label of 9 is { {r }, {r, s} } . The abstracted label of the negative clause is {{ r, s}}. The abstracted environment {r, s} cannot be omitted for 9 although the set of minimal elements in the abstracted label of 9 is {{r}}. 527 procedure T2(G, P) : begin P:= 0; J:= 0; k:= 0; for each (al /\ ... /\ an - t X)E P do begin if X = l.. then begin k := k + 1 P:= P U {al/\···/\ an - t l..}; J := J U {(aI,···, an =;. false(k))}; end else if X = assume(,8) then begin P:=PU {goal(j3) /\ al /\ ... /\ an - t assume(,8)}; J U {(al,···, an, rj3 =;. j3)}; J := for j:= 1 until n do P := P U {goal(j3) - t goal(aj)} end else if X =,8 then begin p := P U {goal(j3) /\ al /\ ... /\ an - t ,8}; J := J U {(al,···, an =;. j3)}; for j:= 1 until n do P := P U {goal(j3) -t goal(aj)} end end; UpdateAbstractedLabels(J, ADA); La := GetAbstractedLabel( G, ADA); for i:= 1 until k do begin Li GetAbstractedLabel(false(i) , ADA); for each Ea E La do for each Ei E Li do if Ei ~ Ea then for (aI,···, an =;. f alse( i)) E J do := for j := 1 until n do P := P U { - t goal(aj)} end; P := P U { - t goal(G)}; return end. P Figure 4: Transformation Algorithm T2 with Abstracted Dependency Analysis Theorem: Let P be a normal default theory and G a goal, J the abstracted justifications with respect to P , L(G) the abstracted label of G , L(false(C)) the abstracted label of f alse( C) where C E P. If no element in L(false(C)) is a subset of any element in L(G), then the solution to G from P is equivalent to the solution to G from P \ {C}. Sketch of the proof: Let C be (a - t l..) and pI be P \ {C}. Assume that ()m is any answer substitution for G from pI and ak is any answer substitution for a from P'. Let MS(aak) be the minimal supports for aak from pI and M S( G()m) be the minimal supports for G()m from P'. Suppose that no element in L(false(C)) is a subset of any element in L(G). From the supposition and similarity between ATMS labels and abstracted labels, no element in MS(aak) is a subset of any element in MS(GO m ). Therefore, the solution to G from pI U {C} is the same as the solution to G from P'. • On the basis of the theorem, we can omit consistency checking for a negative clause C if the condition of the theorem is satisfied. The transformation algorithm T2(G, P) with the abstracted dependency analysis is shown in Figure 4 for the program P and the goal G. 
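The relevance condition of the theorem is a plain subset test over abstracted labels. A minimal sketch (my own, not the paper's KL1 code), with abstracted environments kept as sorted lists of predicate symbols:

```prolog
:- use_module(library(lists)).
:- use_module(library(ordsets)).

% irrelevant_negative_clause(+FalseLabel, +GoalLabel) succeeds when the
% negative clause C may be dropped from consistency checking: no
% abstracted environment of false(C) is a subset of an abstracted
% environment of the goal.
irrelevant_negative_clause(FalseLabel, GoalLabel) :-
    \+ ( member(EC, FalseLabel),
         member(EG, GoalLabel),
         ord_subset(EC, EG) ).
```

For the program Pe above the test fails, since the environment {r, s} in the abstracted label of the negative clause is contained in an environment of the goal's label; for the bird(X) goal of the next example it succeeds, so that negative clause need not be evaluated.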
In Figure 4, UpdateAbstractedLabels(J, ADA) denotes the procedure which computes abstracted labels from the abstracted justifications J with the abstracted dependency analyzer ADA, and GetAbstractedLabel(G, ADA) denotes the procedure which returns the abstracted label of G from the abstracted dependency analyzer ADA. The procedure transforms an original program into a program in which the top-down information is incorporated and consistency checking is restricted to those negative clauses relevant to the given goal.

Consider the same example Pb, shown in the previous subsection, in the case that the goal is bird(X). The abstracted justifications Jb are

{ (⇒ penguin), (penguin ⇒ bird), (bird, Γfly ⇒ fly), (fly, notfly ⇒ false(1)), (penguin ⇒ notfly) }.

As the result of the abstracted dependency analysis, the abstracted label of false(1) is {{fly}} and the abstracted label of bird is {∅}. Then, no element in the abstracted label of false(1) is a subset of any element in the abstracted label of bird, so that we do not need to evaluate this negative clause. As a consequence, we have the transformed program:

T2(bird, Pb) = { goal(penguin) → penguin(a),
    goal(bird) ∧ penguin(X) → bird(X),
    goal(bird) → goal(penguin),
    goal(fly) ∧ bird(X) → assume(fly(X)),
    goal(fly) → goal(bird),
    fly(X) ∧ notfly(X) → ⊥,
    goal(notfly) ∧ penguin(X) → notfly(X),
    goal(notfly) → goal(penguin) }
  ∪ { → goal(bird) }.

Since the transformed program does not include (→ goal(fly)) and (→ goal(notfly)), the reasoner can omit solving both the goal fly(X) and the goal notfly(X).

6 Evaluation with Logic Design Problem

We have taken up the design of logic circuits to calculate the greatest common divisor (GCD) of two integers expressed in 8 bits by using the Euclidean algorithm. The solutions are circuits calculating GCD and satisfying given constraints on area and time [Maruyama et al. 88]. The program Pd contains several kinds of knowledge: datapath design, component design, technology mapping, CMOS standard cells, and constraints on area and time [Ohta and Inoue 90]. The design problem of calculators for GCD includes the design of components such as subtracters and adders.

Table 1 shows the experimental result, on a Pseudo-Multi-PSI system, for the evaluation of the transformed programs. The run time of a program P for a goal G is denoted by TR(G, P). The predicate symbol G of each goal G is adder (design of adders), subtracter (design of subtracters) or cGCD (design of calculators for GCD). The run time TR(G, Pd) of each goal G is equal to the others on the original program Pd.

Table 1: Run Time of Programs

  Goal G        TR(G, Pd) [s]   TR(G, P1) [s]   TR(G, P2) [s]
  adder              10.7            17.5             0.4
  subtracter         10.7            17.3             0.6
  cGCD               10.7            17.3            16.8

Let P1 be the simple transformed program of Pd. The experiment on the simple transformation time shows that it takes 6.35 [s] for making P1 from Pd. However, the run time TR(G, P1) for each goal G is nearly equal to the others because the constraints on the area and time of the GCD calculators are represented by negative clauses. Even if we want to design adders or subtracters, the hypothetical reasoner cannot avoid designing GCD calculators for consistency checking. Let P2 be the transformed program with the abstracted dependency analysis. The experiment on the transformation time with the abstracted dependency analysis shows that it takes 6.63 [s] for making P2 from Pd.
The transformation time with the abstracted dependency analysis is a little longer (by 0.28 [s]) than the simple transformation time. When G is adder or subtracter, the run time TR(G, P2) is much shorter than the run time for the design of GCD calculators. This is because the program can avoid consistency checks for the negative clauses representing constraints on the area and time of the GCD calculators when the design of adders or the design of subtracters is given as a goal. The results show that, for each goal, the total of the transformation time with abstracted dependency analysis and the run time of the transformed program is shorter than the run time of the original program when the problem does not need the whole of the program.

7 Related Work

An algorithm for first-order Horn-clause abduction with the ATMS is presented in [Ng and Mooney 91]. The system is basically a consumer architecture [de Kleer 86-3] introducing backward-chaining consumers. The algorithm avoids both redundant proofs, by introducing the goal-directed backward-chaining consumers, and duplicate proofs among different contexts, by using the ATMS. Their problem definition is the same as that of [Stickel 90], whose inputs are a goal and a set of Horn clauses without negative clauses. When there are negative clauses in the program, they briefly suggest that a forward-chaining consumer can be used for each negative clause to check consistency. On the other hand, since we only simulate backward chaining by the forward-chaining reasoner, we do not require both types of chaining rules. Moreover, when the program includes negative clauses, it is sometimes difficult to represent the clauses as a set of consumers. For example, suppose that the axioms are {a → c, b → d, c ∧ d → g, c → e, d → f, e ∧ f → ⊥} and the goal is g. Assume that the set of consumers is {(c ⇐ a), (d ⇐ b), (g ⇐ c, d), (e ⇐ c), (f ⇐ d), (e, f ⇒ ⊥)}, where ⇐ means a backward-chaining consumer and ⇒ means a forward-chaining consumer. Then, we get the solution {(g, {{g}, {a, b}, {a, d}, {c, b}, {c, d}})}. However, the correct solution is {(g, {{g}})} because {a, b}, {a, d}, {c, b} and {c, d} are nogood. To guarantee consistency when the program includes negative clauses, we have to add the corresponding forward-chaining consumer for every Horn clause. Such added consumers would cause the same problem as appeared in using the simple transformation algorithm.

In [Stickel 91], deduction and abduction with the upside-down meta-interpretation are proposed. This abduction does not require the consistency of solutions. Furthermore, rules may fire repeatedly in different contexts since it does not use the ATMS. This often causes a problem when it is applied to practical programs where heavy procedures are attached to rules. Another difference between the frameworks of [Ng and Mooney 91, Stickel 91] and ours is that their frameworks treat only hypotheses in the form of normal defaults without prerequisites, whereas we allow normal defaults with prerequisites.

8 Conclusion

We have presented a new transformation algorithm for programs for efficient forward-chaining hypothetical reasoning based on the upside-down meta-interpretation. In the transformation algorithm, logical dependencies between a goal and negative clauses are analyzed at the abstracted level to find irrelevant negative clauses, so that consistency checking of negative clauses can be restricted to those relevant clauses.
It has been evaluated with a logic circuit design problem on a Pseudo-Multi-PSI system. We can also apply this abstracted dependency analysis to transformed programs based on Magic Set and Alexander methods. Our dependency analysis with only predicate symbols may be extended to an analysis with predicate symbols and their some arguments. Acknowledgments Thanks are due to Mr. Makoto Nakashima of JIPDEC for implementing the ATMS and combining it with the MGTP. We are grateful to Prof. Mitsuru Ishizuka of the University of Tokyo for the helpful discussion. We would also like to thank Dr. Ryuzo Hasegawa and Mr. Miyuki Koshimura for providing us the MGTP, and Dr. Koichi Furukawa for his advise. Finally, we would like to express our appreciation to Dr. Kazuhiro Fuchi, Director of ICOT Research Center, who provided us with the opportunity to conduct this research. References [Bancilhon et al. 86] F. Bancilhon, D. Maier, Y. Sagiv and J.D. Ullman, Magic Sets and Other Strange Ways to Implement Logic Programs, Proc. of ACM PODS, pp.I-15 (1986). [Bry 90] F. Bry, Query evaluation in recursive databases: bottom-up and top-down reconciled, Data fj Knowledge Engineering, 5, pp.289-312 (1990). [de Kleer 86-1] J. de Kleer, An Assumption-based TMS, Artificial Intelligence, 28, pp.127-162 (1986). [de Kleer 86-2] J. de Kleer, Extending the ATMS, Artificial Intelligence, 28, pp.163-196 (1986). [de Kleer 86-3] J. de Kleer, Problem Solving with the ATMS, Artificial Intelligence, 28, pp.197-224 (1986) [Flann et al. 87] N.S. Flann, T .G. Dietterich and D.R. Corpron, Forward Chaining Logic Program- ming with the ATMS, Proc. of AAAI-87, pp.24-29 (1987). [Forgy 82] C.L. Forgy, Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem, Artificial Intelligence, 19, pp.17-37 (1982). [Fujita and Hasegawa 91] H. Fujita and R. Hasegawa, A Model Generation Theorem Prover in KLI Using a Ramified-Stack Algorithm, Proc. of ICLP '91, pp.494-500 (1991). [Inoue 88] K. Inoue, Problem Solving with Hypothetical Reasoning, Proc. of FGCS '88, pp.1275-1281 (1988). [Junker 88] U. Junker, Reasoning in Multiple Contexts, GMD Working Paper No.334 (1988). [Maruyama et al. 88] F. Maruyama, T. Kakuda, Y. Masunaga, Y. Minoda, S. Sawada and N. Kawato, coLODEX: A Cooperative Expert System for Logic Design, Proc. of FGCS '88, pp.1299-1306 (1988). [Ng and Mooney 91] H.T. Ng and R.J. Mooney, An Efficient First-Order Abduction System Based on the ATMS, Technical Report AI 91-151, The University of Texas at Austin, AI Lab. (1991). [Ohta and Inoue 90] Y. Ohta and K. Inoue, A ForwardChaining Multiple-Context Reasoner and Its Application to Logic Design, Proc. of IEEE TAl, pp.386392 (1990). [Poole et al. 87] D. Poole, R. Goebel and R. Aleliunas, Theorist: A logical Reasoning System for Defaults and Diagnosis, N. Cercone and G. McCalla (Eds.), The Knowledge Frontier: Essays in the Representation of Knowledge, Springer-Verlag, pp.331-352 (1987). [Poole 91] D. Poole, Compiling a Default Reasoning System into Prolog, New Generation Computing, 9, pp.3-38 (1991). [Reiter 80] R. Reiter, A Logic for Default Reasoning, Artificial Intelligence, 13, pp.81-132 (1980). [Rohmer et al. 86] J. Rohmer, R. Lescoeur and J.M. Kerisit, The Alexander Method - A Technique for The Processing of Recursive Axioms in Deductive Databases, New Generation Computing, 4, pp.273-285 (1986). [Stickel 90] M.E. 
Stickel, Rationale and Methods for Abductive Reasoning in Natural~Language Interpretation, Lecture Nodes in Artificial Intelligence, 459, Springer-Verlag, pp.233-252 (1990). [Stickel 91] M.E. Stickel, Upside-Down Meta-Interpretation of the Model Elimination Theorem-Prover Procedure for Deduction and Abduction, ICOT Technical Report TR-664, ICOT (1991). [Ueda and Chikayama 90] K. Ueda and T. Chikayama, Design of the Kernel Language for the Parallel Inference Machine, The Computer Journal, 33, 6, pp. 494-500 (1990). PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 530 Logic Programming, Abduction and Probability David Poole Department of Computer Science, University of British Columbia, Vancouver, B.C., Canada V6T lZ2 poole@cs.ubc.ca telephone: (604) 822 6254 fax: (604) 822 5485 Abstract Probabilistic Horn abduction is a simple framework to combine probabilistic and logical reasoning into a coherent practical framework. The numbers can be consistently interpreted probabilistically, and all of the rules can be interpreted logically. The relationship between probabilistic Horn abduction and logic programming is at two levels. At the first level probabilistic Horn abduction is an extension of pure Prolog, that is useful for diagnosis and other evidential reasoning tasks. At another level, current logic programming implementation techniques can be used to efficiently implement probabilistic Horn abduction. This forms the basis of an "anytime" algorithm for estimating arbitrary conditional probabilities. The focus of this paper is on the implementation. 1 Introduction Probabilistic Horn Abduction [Poole, 1991c; Poole, 1991b; Poole, 1992a] is a framework for logic-based abduction that incorporates probabilities with assump-· tions. It is being used as a framework for diagnosis [Poole, 1991c] that incorporates both pure Prolog and Bayesian Networks [Pearl, 1988] as special cases [Poole, 1991b]. This paper is about the relationship of proba..; bilistic Horn abduction to logic programming. This simple extension to logic programming provides a wealth of new applications in dia&nosis, recognition and evidential reasoning [Poole, 1992aJ. This paper also presents a logic-programming solution to the problem in abduction of searching for the "best" diagnoses first. The main features of the approach are: • We are using Horn clause abduction. The procedures are simple, both conceptually and computationally (for a certain class of problems) .. We develop a simple extension of SLD resolution to implement our framework. • The search algorithms form "anytime" algorithms that can give an estimate of the conditional probability at any time. We do not generate the unlikely explanatiolls unless we Ileed Lo. 'vVe have a boulld on the probability mass of the remaining explanations which allows us to know the error in our estimates. • A theory of "partial explanations" is developed. These are partial proofs that can be stored in a priority queue until they need to bf further expanded. We show how this is implemented in a Prolog interpreter in Appendix A. 2 Probabilistic Horn abduction The formulation of abduction used is a simplified form of Theorist [Poole et al., 1987; Poole, 1988] with probabilities associated with the hypotheses. It is simplified in being restricted to definite clauses with simple forms of integrity constraints (similar to that in [Goebel et al., 1986]). 
This can also be seen as a generalisation of an ATMS [Reiter and de Kleer, 1987] to be nonpropositional. The language is that of pure Prolog (i.e., definite clauses) with special disjoint declarations that specify a set of disjoint hypotheses with associated probabilities. There are some restrictions on the forms of the rules and the probabilistic dependence allowed. The language presented here is that of [Poole, 1992a] rather than that of [Poole, 1991c; Poole, 1991b]. The main design considerations were to make a language the simplest extension to pure Prolog that also included probabilities (not just numbers associated with rules, but numbers that follow the laws of probability, and so can be consistently interpreted as probabilities [Poole, 1992al). \Ve are also assuming very strong independence assumptions; this is not intended to be a temporary restriction on the language that we want to eventually remove, but as a feature. We can represent any probabilistic information using only independent hypotheses [Poole, 1992a]; if there is any dependence amongst hypotheses, we invent a new hypothesis to explain that dependency. 2.1 The language Our language uses the Prolog conventions, and has the same definitions of variables, terms and atomic symbols . Definition 2.1 A definite clause is of the form: a. or (l t - al 1\ .. . 1\ (In. where (l a.nd each (li are a.tomic symbols. 531 Definition 2.2 A disjoint declaration is of the form disjoint([hl : PI, .. " h n : Pn]). where the hi are atoms, and the Pi are real numbers Pi :::; 1 such that PI + ... + Pn = 1. Any variable appearing in one hi must appear in all of the h j (i.e., the hi share the same variables). The hi will be referred to as hypotheses. o :::; Definition 2.3 A probabilistic Horn abduction theory (which will be referred to as a "theory") is a collection of definite clauses and disjoint declarations such that if a ground atom h is an instance of a hypothesis in one disjoint declaration, then it is not an instance of another hypothesis in any of the disjoint declarations. Assumption 2.7 (acyclicity) If F' is the set of ground instances of elements of F, then it is possible to assign a natural number to every ground atom such that for every rule in F' the atoms in the body of the rule are strictly less than the atom in the head. This assumption is discussed in [Apt and Bezem, 1990]. Assumption 2.8 The rules in F' for a ground nonassumable atom are covering. That is, if the rules for a in F' are Given theory T, we define f- a f- a FT the facts, is the set of definite clauses in T together with the clauses of the form false a f- hi 1\ h j where hi and h j both appear in the same disjoint declaration in T, and i f. j. Let Ff be the set of ground instances of elements of FT. HT to be the set of hypotheses, the set of hi such that hi appears in a disjoint d~claration in T. Let Hfr be the set of ground instances of elements of HT. a f- some_other _reason_for_a and making "some_other _reason_for _a" a hypothesis [Poole, 1992a]. Lemma 2.9 [Console et al., 1991; Poole, 1988] Under assumptions 2.6, 2.7 and 2.8, if expl(g, T) is the set of minimal explanations of 9 from theory T: v 9 eiEexpl(g,T) disjoint declaration in T. Definition 2.4 [Poole et at, 1987; Poole, 1987] If 9 is a closed formula, an explanation of 9 from (F, H) is a set D of elements of H' such that • F U D 1= 9 and • F U D ~ false. The first condition says that D is a sufficient cause for g, and the second says that D is possible. 
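As a small illustration of these definitions (a toy theory of my own, not an example from the paper), written in the declaration syntax of the interpreter in Appendix A:

```prolog
% Two definite clauses and two disjoint declarations.
rule((wet :- rain)).
rule((wet :- sprinkler)).
disjoint([rain:0.2, no_rain:0.8]).
disjoint([sprinkler:0.1, no_sprinkler:0.9]).
```

Here FT consists of the two rules together with the induced clauses false ← rain ∧ no_rain and false ← sprinkler ∧ no_sprinkler, HT = {rain, no_rain, sprinkler, no_sprinkler}, and {rain} and {sprinkler} are the minimal explanations of wet.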
Definition 2.5 A minimal explanation of 9 is an explanation of 9 such that no strict subset is an explanation of g. 2.2 Assumptions about the rule base Probabilistic Horn abduction also contains some assumptions about the rule base. It can be argued that these assumptions are natural, and do not really restrict what can be represented [Poole, 1992a]. Here we list these assumptions, and use them in order to show how the algorithms work. The first assumption we make is about the relationship between hypotheses and rules: Assumption 2.6 There are no rules with head unifying with a member of H. Instead of having a rule implying a hypothesis, we invent a new atom, make the hypothesis imply this atom, and all of the rules imply this atom, and use this atom instead of the hypothesis. Bm if a is true, one of the Bi is true. Thus Clark's completion [Clark, 1978] is valid for every non-assumable. Often we get around this assumption by adding a rule PT is a function Hfr .- [0,1]. PT(hD = Pi where h~ is a ground instance of hypothesis hi, and hi : Pi is in a Where T is understood from context, we omit the subscript. f- BI B2 Assumption 2.10 The bodies of the rules in F' for an atom are mutually exclusive. Given the above rules for a, this means that Bi 1\ Bj => false is true in the domain under consideration for each i =1= j . We can make this true by adding extra conditions to the rules to make sure they are disjoint . Lemma 2.11 Under assumptions 2.6 and 2.10, minimal explanations of atoms or conjunctions of atoms are mutually inconsistent. See [Poole, 1992a] for more justification of these assumptions. 2.3 Probabilities Associated with each possible hypothesis is a prior probability. We use this prior probability to compute arbitrary probabilities .. The following is a corollary oflemmata 2.9 and 2.11 Lemma 2.12 Under assumptions 2.6, 2.7, 2.8, 2.10 and 2.13, iJ expl(g, T) is the set oj minimal explanations oj conjunction oj atoms 9 Jront probabilistic IIorn abduction theory T: P(g) p L"~(9 e;) ,T) 2:= eiEexpl(g,T) P(ei) 532 Thus to compute the prior probability of any 9 we sum the probabilities of the explanations of g. To compute arbitrary conditional probabilities, we use the definition of conditional probability: rule«h :- c, e». rule«h :- g, b». disjoint([b:O.3,c:O.7]). disjoint([e:O.6,f:O.3,g:O.1]). There are four minimal explanations of a, namely P( 1{3) = P( a: 1\ {3) a: P({3) {e,e}, {b,e}, {j,b} and {g,b}. The priors of the explanations are as follows: Thus to find arbitrary conditional probabilities P(a:\{3), we find P({3), which is the sum of the explanations of {3, and P( a:1\{3) which can be found by explaining a: from the explanations of {3. Thus arbitrary conditional probabilities can be c9mputed from summing the prior probabilities of explanations. It remains only to compute the prior probability of an explanation D of g. We assume that logical dependencies impose the only statistical dependencies on the hypotheses. In particular we assume: Assumption 2.13 Ground instances of hypotheses that are not inconsistent (with FT) are probabilistically independent. That is, different disjoint declarations define independent hypotheses. The hypotheses in a minimal explanation are always logically independent. The language has been carefully set up so that the logic does not force any dependencies amongst the hypotheses. If we could prove that some hypotheses implied other hypotheses or their negations, the hypotheses could not be independent. 
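In symbols (my notation; cf. Lemma 2.12 and Assumption 2.13), for a minimal explanation e = {h1, ..., hn} of a goal g:

$$P(e) \;=\; \prod_{i=1}^{n} P(h_i), \qquad\qquad P(g) \;=\; \sum_{e_i \in \mathit{expl}(g,T)} P(e_i).$$

Both identities depend on the fact that the logic cannot force dependencies among hypotheses drawn from different disjoint declarations.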
The language is deliberately designed to be too weak to be able to state such logical dependencies between hypotheses. Under assumption 2.13, if {hI, .. " h n } are part of a minimal explanation, then P(c 1\ e) = 0.7 x 0.6 = 0.42. Similarly P(bl\e) 0.03. Thus pea) = 0.42 + 0.18 + 0.09 + 0.03 = 0.72 There are two explanations of e 1\ a, namely {c, e} and {b, e}.Thus pee 1\ a) = 0.60. Thus the conditional probability of e given a is P(ela) = 0.6/0.72 = 0.833. What is important about this example is that all of the probabilistic calculations reduce to finding the probabilities of explanations. 2.5 1. Generate the explanations of some goal (conjunction of atoms), in order. 2. Determine the prior probability of some goal. This is implemented by enumerating the explanations of the goal. 3. Determine the posterior probabilities of the explanations of a goal (i.e., the probabilities of the explanations given the goal). 4. Determine the conditional probability of one formula given another. That is, determining P(a:I{3) for any a: and {3. i) 1=1 To compute the prior of the minimal explanation we multiply the priors of the hypotheses. The posterior probability of the explanation is proportional to this. The following is a corollary of lemmata 2.9 and 2.11 Len1l1.la 2.14 Under assumptions 2.6, 2.7, 2.8, 2.10 and 2.13, if exp/(g, T) is the set of all minimal explanations of 9 from theory T: C'YCg'T) e i P L ) P(ei) eiEexpl(g,T) 2.4 An example In this section we show an example that we use later in the paper. It is intended to be as simple as possible to show how the algorithm works. Suppose we have the rules and hypotheses: rule«a rUle«a rule( (q rule«q rule«h b, h». q,e». h». b,e». b, f». All of these will be implemented by enumerating the explanations of a goal, and estimating the probability mass in the explanations that have not been enumerated. It is this problem that we consider for the next few sections, and then return to the problem of the tasks we want to compute. 3 peg) Tasks The following tasks are what we expect to implement: n IT P(h = 0.18, P(J 1\ b) = 0.09 and P(gl\b) = A top-down proof procedure In this section 'we show how to carry out a best-first search of the explanations. In order to do this we build a notion of a partial proof that we can add to a priority queue, and restart when, necessary. 3.1 SLD-BF resolution In this section we outline an implementation based on logic programming technology and a branch and bound search. The implementation keeps a priority queue of sets of hypotheses that could be extended into explanations ("partial explanations"). At any time the set of all the explanations is the set of already generated explanations, plus those explanations that ca.n be generated from the pa.rtial explanations in the priority queue. 533 Q:= {(g <-- g, {})}; II := {}; repeat choose and remove best (g <-- C, D) from Q; if C true then if good(D) then II := II U {D} endif else Let C aAR for each rule(h <-- B) where mgu(a, h) = 0 Q := Q U {(g <-- BAR, D) O} ; if a E Hand good( {a} U D) then Q := Q U {(g <-- R, {a} U D)} endif endif until Q = {} where good(D) == (Vd 1 ,d2 E D fJ1J E NG3cjJ (d 1 ,d2 ) = 1JcjJ) A (fJ7r E II, 3cjJ D ~ 7rcjJ) = = Definition 3.2 A partial explanation (g valid with respect to (F, H) if Proof: This is proven by induction on the number of times through the loop. It is trivially true initially as q ~ q for any q. There are two cases where elements are added to Q. 
In the first case (the "rule" case) we know by the inductive assumption, and so F C, D) As a() (g <-- C, D) b1 A ... A bn A R, D) 0 and add it to the priority queue. The second operation is used when a E H. In this case we produce the partial explanation (g <-- R, {a} U D) and add it to Q. We only do this if {a} U D is consistent, and is not subsumed by another explanation of q. Here we assume the set N G of pairs of hypotheses that appear in the same disjoint declaration (corresponding to nogoods in an ATMS [Reiter and de Kleer, 1987]). Unlike in an ATMS this set can be built at compile time from the disjoint declarations. This procedure will find the explanations in order of likelihood. Its correctness is based on the meaning of a partial explanation =?- h)O F= (D A R A B =?- g)O. The other case is when a E H. By the induction step we choose an element in F, such that h and a have most general unifier 0, we generate the partial explanation F= (B = h(), by a simple resolution step we have F Figure 1 gives an algorithm for finding explanations of of the priority queue Q with maximum prior probability of D. We have an explanation when C is the empty conjunction (represented here as true). In this case D is added to the set II of already generated explanations. Otherwise, suppose C is conjunction a A R. There are two operations that can be carried out. The first is a form of SLD resolution [Lloyd, 1987], where for each rule h <-- b1 A ... A bn (DARAa=?-g)O F q in order of probability (most likely first). At each step <-- F= We also know where 9 is an atom (or conjunction of atoms), C is a conjunction of atoms and D is a set of hypotheses. (g 1S Lemma 3.3 Every partial explanation m the queue Q is valid with respect to (F, H). Definition 3.1 a partial explanation is a structure <-- C, D) FF=DAC~g Figure 1: SLD-BF Resolution to find explanations of 9 in order. (g <-- F F= D F F= (D A a) A R A (a A R) =?- 9 and so ~ g If D only contains elements of H and a is an element of H then {a}UD only contains elements of H. 0 It is now trivial to show the following: Corollary 3.4 Every element of II in figure 1 is an explanation of q. Although the correctness of the algorithm does not depend on which element of the queue we choose at any time, the efficiency does. We choose the best partial explanation based on the following ordering of partial explanations. Partial explanation (gl <-- C 1 , D 1) is better than (g2 <-- C2, D 2) if P(D 1) > P(D 2). It is simple to show that "better than" is a partial ordering. \"'hen we choose a "best" partial explanation we choose a minimal element of the partial ordering; where there are a number of minimal partial explanations, we can choose anyone. When we follow this definition of "best", we enumerat.e the minimal explanations of q in order of probability. 3.2 Our example III this section we show how the simple example in Section 2.4 is handled by the best-first proof process. The following is the sequence of values of Q each time through the loop (where there are a number of minimal explanations, we choose the element that was added 534 4 last): Discussion 4.1 Probabilities in the queue {(at-a,U)} {(a t- b /\ h, U) , (a t- q /\ e, U)} We would like to give an estimate for P(g) after having {(a t- q /\ e, U), (a t- h, {b})} generated only a few of the most likely explanations of g, {(a t- h /\ e, U), (a t- b /\ e /\ e, U), (a t- h, {b})} and get some estimate of our error. 
This problem reduces {(a t- b /\ I /\ e, {}) , (a t- C /\ e /\ e, U) , to estimating the probability of partial explanations in (a (- 9 /\ b /\ e, {}), (a t- b /\ e /\ e, U), (a t- h, {b})} the queue. {{a (- c /\ e /\ e, {}) , (a t- 9 /\ b /\ e, U) , If (g (- C, D) is in the priority queue, then it can pos(a (- b /\ e 1\ e, U), (a (- 1/\ e, {b}), (a (- h, {b})} sibly be used to generate explanations D I , ... , Dn. Each {(a (- 9 /\ b /\ e, {}) , (a (- b 1\ e /\ e, {}) , (a t- e /\ e, {c}) , Di will be of the form D U D~. We can place a bound on (a (- 11\ e, {b}), (a (- h, {b})} the probability mass of all of the Di, by {(a (- b 1\ e 1\ e, {}), (a (- e 1\ e, {c}) , (a (- 1/\ e, {b}) , P(D I V .. · V Dn) = P(D /\ (D~ V ... V D~» (a (- h, {b}) , (a t- b /\ e, {g})} ::; P(D) { (a t- e /\ e, {c}) , (a t- e /\ e, {b}) , (a t- I /\ e, {b}) , (a (- h, {b}), (a t- b /\ e, {g})} {(a (- e, {e,c}), (a t- e /\ e,{b}), (a t- 1/\ e, {b}), (a (- h, {b}), (a t- b /\ e, {g})} {(a (- true, {e, c}), (a t- e /\ e, {b}) , (a t- 1/\ e, {b}), (a (- h, {b}), (a (- b /\ e, {g})} Thus the first, and most likely explanation is {e, c}. {(a (- e 1\ e, {b}) , (a (- 1/\ e, {b}), (a (- h, {b}), (a (- b /\ e, {g})} (a (- I /\ e, {b}), (a (- h, {b}), (a (- e, {e, b}), {(a (- b /\ e, {g})} { (a t- h, {b }) , (a (- e, {e, b}) , (a (- b /\ e, {g}) , (a (- e, {I, b})} {(a (- b /\ I, {b}), (a (- c /\ e, {b}), (a (- 9 /\ b, {b}) , (a t- e, {e, b}), (a (- b /\ e, {g}) , (a (- e, {I, b})} {(a t- I, {b}), (a t- c /\ e, {b}), (a t- 9 /\ b, {b}), (a t- e, {e, b}), (a t- b /\ e, {g}) , (a (- e, {I, b})} {(a t- c /\ e, {b}) , (a (- 9 /\ b, {b}) , (a t- e, {e, b}) , (a t- b /\ e, {g}) , (a (- true, {I, b}), (a (- e, {I, b})} Here the algorithm effectively prunes the top partial explanation as (c, b) forms a nogood. Given this upper bound, we can determine an upper bound for P(g), where {el," . , en} is the set of all minimal explanations of g: P(g) P(el V e2 V ... V en) peel) + P(e2) + .,. + peen) L p(ei») + ( ( found ei L ej p(ej)~ to be generated We can easily compute the first of these sums, and can put upper and lower bounds on the second. This means that we can put a bound on the range of probabilities of a goal based on finding just some of the explanations of the goal. Suppose we have goal g, and we have generated explanations II. Let PIT = L P(D) DeIT PQ = L P(D) D:{g<-C,D}eQ where Q is the priority queue. {(a (- 9 /\ b, {b}), (a (- e, {e,b}), (a t- b /\ e, {g}), vVc then have (a t- true, {I, b}), (a t- e, {I, b})} PIT ::; peg) ::; PIT + PQ {(a - e, {e, b}) , (a - b /\ e, {g}) , (a t- true, {I, b}) , (a t- e, {I, b}), (a t- b, {g, b})}} As the computation progresses, the probability mass {(a - t1'ue, {e, b}), (a (- b /\ e, {g}), (a (- true, {I, b}), in the queue PQ approaches zero l and we get a better (a - e, {I, b}), (a (- b, {g, b})} refinement on the value of P(g). This thus forms the We have now found the second most likely explanation, namely {e, b}. {(a - b /\ e, {g}), (a t- true, {I, b}), (a - e, {I, b}), (a - b, {g, b})} {(a (- true, {I, b}), (a (- e, {I, b}), (a - e, {g, b}), (a-b,{g,b})} We have thus found the third explanation {I, b}. {(a (- e, {I, b}), (a (- e, {g, b}), (a (- b, {g, b})} {(a - e, {g, b}), (a (- b, {g, b})} {(a - b, {g, b})} {(a -true,{g,b})} The fourth explanation is {g, b}. There are no more partial explanations and the process stops. basis of an "anytime" algorithm for Bayesian networks. 4.2 Conditional Probabilities We can also use the above procedure to compute conditional probabilities. 
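Written out, the bound just derived is (with Π the explanations generated so far and Q the priority queue):

$$P_\Pi = \sum_{D \in \Pi} P(D), \qquad P_Q = \sum_{\langle g \leftarrow C,\, D\rangle \in Q} P(D), \qquad P_\Pi \;\le\; P(g) \;\le\; P_\Pi + P_Q .$$

The same queue mass P_Q is what bounds the error in the conditional estimates developed next.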
Suppose we are trying to compute the conditional probability P( aLB). This can be computed from the definition: P( 1,8) = P( a /\ ,8) a P(,8) We compute the conditional probabilities by enumerating the minimal explanations of a/\,8 and,8. Note that the minimal explanations of a 1\,8 are explanations (not 1 Note that the estimate given above does not always decrease. It is possible that the error estimate increases. [Poole, 1992b] considers cases where convergence can be guaranteed. ) 535 necessarily minimal) of (3. We can compute the explanations of a 1\ (3, by trying to explain a from the explanations of (3. The above procedure can be easily adapted for this task, by making the task to explain (31\ a, and making sure we prove (3 before we prove a, so that we can collect the explanations of (3 as a we generate them. Let pf3 be the sum of the probabilities of the explanations of (3 enumerated, and let pcx/l.f3 be the sum of the explanations of a 1\ (3 generated. Thus given our estimates of P( a 1\ (3) and P((3) we have pcx/l.f3 pcx/l.f3 + PQ pf3 + PQ :::; P(al(3) :::; pf3 The lower bound is the case where all of the partial descriptions in the queue go towards worlds implying (3, but none of these also lead to a. The upper bound is the case where all of the elements of the queue go towards implying a, from the explanations already generated for (3. 4.3 Consistency and subsumption checking One problem that needs to be considered is the problem of what happens when there are free variables in the hypotheses generated. When we generate the hypotheses, there may be some instances of the hypotheses that are inconsistent, and some that are consistent. We know that every instance is inconsistent if the subgoal is subsumed by a nogood. This can be determined by substituting constants for the variables in the the subgoal, and finding if a subset unifies with a nogood. We cannot prune hypotheses because all instance is inconsistent. However, when computation progresses, we may substitute a value for a variable that makes the partial explanation inconsistent: This problem is similar to the problem of delaying negation-as-failure derivations [Naish, 1986], and of delaying consistency checking in Theorist [Poole, 1991a]. We would like to notice such inconsistencies as soon as possible. In the algorithm of Figure 1 we check for inconsistency each time a partial explanation is taken off the queue. There are cases where we do not have to check this explicitly, for example when we have done a resolution step that did not assign a variable. There is a trade-off between checking consistency and allowing some inconsistent hypotheses on the queue 2 • This trade-off is beyond the scope of this paper. Note that the assumptions used in building the system imply that there can be no free variables in any explanation of a ground goal (otherwise we have infinitely many disjoint explanations with bounded probability). Thus delaying subgoals eventually grounds all variables. 4.4 Iterative deepening In many search techniques we often get much better space complexity and asymptotically the same time complexity by using an iterative deepening version of a search procedure [Korf, 1985]. An iterative deepening version of the best-first search procedure is exactly the 2We have to check the consistency at some time. This could be as late as just before the explanation is added to II. same as the iterative deepening version of A * with the heuristic function of zero [Korf, 1985]. 
The algorithm of procedure 1 is given at a level of abstraction which docs not preclude iterative deepening. For our experimental implementations, we have used an interesting variant of iterative deepening. Our queue is only a "virtual queue" and we only physically store partial explanations with probability greater than some threshold. We remember the mass of the whole queue, including the values we have chosen not to store. When the queue is empty, we decrease the threshold. We can estimate the threshold that we need for some given accuracy. This speeds up the computation and requires less space. Recomputing subgoals One of the problems with the above procedure is that it recomputes explanations for the same subgoal. If s is queried as a subgoal many times then we keep finding the same explanations for s. This has more to do with the notion of SLD resolution used than with the use of branch and bound search. . We are currently experimenting with a top-down procedure where we remember computation that we have computed, forming "lemmata". This is similar to the use of memo functions [Sterling and Shapiro, 1986] or Earley deduction [Pereira and Shieber, 1987] in logic programming, but we have to be very careful with the interaction between making lemmata and the branch and bound search, particularly as there may be multiple answers to any query, and jllst because we ask a query docs not mean we want to solve it (we may only want to bound the probability of the answer). 4.5 4.6 Bounding the priority queue Another problem with the above procedure that is not solved by lemmatisation is that the bound on the priority queue can become quite large (i.e., greater than one). Some bottom-up procedures [Poole, 1992b], can have an accurate estimate of the probability mass of the queue (i.e., an accurate bound on how much probability mass could be on the queue based on the information at hand). See [Poole, 1992b] for a description of a bottom-up procedure that can be compared to the top-down procedure in this paper. In [Poole, 1992b] an average case analysis is given on the bottom-up procedure; while this is not an accurate estimate for the top-down procedure, the case where the bottom-up procedure is efficient [Poole, 1992b] is the same case where the top-down procedure works well; that is where there are normality conditions that dominate the probability of each hypothesis (i.e., where all of the probabilities are near one or near zero). 5 COHlparison with other systen1S There are many other proposals for logic-based abduction schemes (e.g., [Pople, 1973; Cox and Pietrzykowski, 1987; Goebel et ai., 1986; Poole, 1987]). These, however, consider that we either find an arbitrary explanation or find all explanations. In practice there are prohibitively many of these. It is also not clear what to do with all of the explanations; there are too many to give to a 536 user, and the costs of determining which of the explanations is the "real" explanation (by doing tests [Sattar and Goebel, 1991]) is usually not outweighed by the advantages of finding the real explanation. This is why it is important to take into account probabilities. We then have a principled reason for ignoring many explanations. Probabilities are also the right tool to use when we really are unsure as to whether something is true or not. 
For evidential reasoning tasks (e.g., diagnosis and recognition) it is not up to us to decide whether some hypothesis is true or not; all we have is probabilities and evidence to work out what is most likely true. Similar considerations motivated the addition of probabilities to consistency-based diagnosis [de Kleer and Williams, 1989]. Perhaps the closest work to that presented here is that of Stickel [Stickel, 1988]. His is an iterative deepening search for the lowest cost explanation. He does not consider probabilities. 6 U sing existing logic programming technology In this section we show how the branch and bound search can be compiled into Prolog. The basic idea is that when we are choosing a partial explanation to explore, we can choose any of those with maximum probability. If we choose the last one when there is more than one, we carry out a depth-first search much like normal Prolog, except when making assumptions. We only add to the priority queue when making assumptions, and let Prolog do the searching when we are not. 6.1 Remaining subgoals Consider what subgoals remain to be solved when we are trying to solve a goal. Consider the clause: h b1 /\ b2 /\ ••• /\ bm . Suppose R is the conjunction of subgoals that remain to be solved after h in the proof. If we are using the leftmost reduction of subgoals, then the conjunction of sub goals remaining to be solved after subgoal bi is Suppose in our proof we select a possible hypothesis h of cost P( {h}) with U being the conjunction of goals remaining to be solved, and T the set of currently assumed hypotheses with cost peT). We only want to consider this as a possible contender for the best solution if P( {h} U T) is the minimal cost of all proofs being considered. The minimal cost proofs will be other proofs of cost peT). These can be found by failing the current subgoal. Before we do this we need to add U, with hypotheses {h} U T to the priority queue. When the proof fails we know there is no proof with the current set of hypotheses; we remove the partial proof with minimal cost from the priority queue, and continue this proof. We do a branch and bound search over the partial explanations, but when the priorities are equal, we use Prolog's search to prefer the last added. The overhead on the resolution steps is low; we only have to do a couple more simple unifications (a free variable with a term). The main overhead occurs when we reach a hypothesis. Here we store the hypotheses and remaining goals on a priority queue and continue or search by failing the current goal. This is quick (if we implement the priority queue efficiently); the overhead needed to find aU proofs is minimal. Appendix A gives code necessary to run the search procedure. 7 Conclusion This paper has considered a logic programming approach that uses a mix between depth-first and branch-andbound search strategies for abduction where we want to consider probabilities, and only want to generate the most likely explanations. The underlying language is a superset of pure Prolog (without negation-as-failure), and the overhead of executing pure Prolog programs is small. f- bi+1 /\ ... /\ bm /\ R The total information of the proof is contained in the partial explanation at the point we are in the proof, i.e., in the remaining subgoals, current hypotheses and the associated answer. The idea we exploit is to make this set of subgoals explicit by adding an extra argument to each atomic symbol that contains all of the remaining subgoals. 
6.2 Saving partial proofs There is enough information within each subgoal to prove the top level goal it was created to solve. When we have a hypothesis that needs to be assumed, the remaining subgoals and the current hypotheses form a partial explanation which we save on the queue. We then fail the current subgoal and look for another solution. If there are no solutions found (i.e., the top level computation fails), we can choose a saved subgoal (according to the order given in section 3.1), and continue the search. A Prolog interpreter This appcndix gives a brief overvicw of a lnctainterpreter. Hopefully it is enough to be able to build a system. Our implementation contains more bells and whistles, but the core of it is here. A.l Prove prove(G, To, T 1 , Go, G 1 , U) means that G can be proven with current assumptions To, resulting in assumptions Tl, where Gi is the probability of Ii, and U is the set of remaining subgoals. The first rule defining prove is a special purpose rule for the case where we have found an explanation; this reports on thc answer found. prove(ans(A),T,T,C,C,_) :- !, ans(A,T,C). The remaining rules are the real definition, that follow a normal pattern of Prolog meta-interpreters [Sterling and Shapiro, 1986]. prove(true,T,T,C,C,_) :- !. prove((A,B),TO,T2,CO,C2,U) :- !, 537 prove(A,TO,Ti,CO,Ci,(B,U», prove(B,Ti,T2,Ci,C2,U). prove(H,T,T,C,C,_) :hypothesis(H,PH), member(H, T), ! . prove(H,T,[HIT],C,Ci,U) hypothesis(H,PH), \+ (( member(Hi,T), makeground((H,Hi», nogood(H,Hi) », Ci is C*PH, add_to_PQ(process([HITJ,Ci,U», fail. prove(G,TO,Ti,CO,Ci,U) :rul(G,B), prove(B,TO,Ti,CO,Ci,U). A.2 Rule and disjoint declarations We specify the rules of our theory using the declaration rule(R) where R is the form of a Prolog rule. This asserts the rule produced. rule((H :- B» :- !, assert(rul(H,B». rule(H) :assert(rul(H,true». The disjoint declaration forms nogoods and declares probabilities of hypotheses. :- ope 500, xfx, : ). disjoint( [J). disjoint([H:PIRJ) assert(hypothesis(H,P», make_disjoint(H,R), disjoint(R). make_disjoint(_,[]). make_disjoint(H,[H2: _ I RJ) assert(nogood(H,H2», assert(nogood(H2,H», make_disjoint(H,R). A.3 Explaining To find an explanation for a subgoal C we execute explain( C). This creates a list of solved explanations and the probability mass found (in "done"), and creates an empty priority queue. explain(G) :assert(done([J,O», initQ, ex ( (G, ans (G) ) , [J ,1) , ! • We can report the explanations found, the estimates of the prior probability of the hypothesis, etc, by defining ans(C, D, C), which means that we have found an explanation D of C with probability C. ans ( G, [J , _ ) :llriteln( [G, I is a theorem. ,]), ! . ans(G,D,C) :allgood(D), qmass (QM) , retract(done(Done,DC», DCi is DC+C, assert(done([expl(G,D,C)IDone],nCi», TC is DCi + QM, llriteln(['Probabilityof I,G, ,= [1,DCi,',I,TC,IJ']), Pri is C / TC, Pr2 is C / DCi, llriteln( ['Explanation: I ,nJ), llriteln(['Prior = I,CJ), llriteln(['Posterior = [',Pri,', I,Pr2, IJIJ). more is a way to ask for more answers. It will take the top priority partial proof and continue with it. more :- ex(fail,_,_). A.4 Auxiliary relations used The following relations were also used. They can be divided into those for managing the priority queue, and those for managing the nogoods. We assume that there is a global priority queue into which one can put formulae with an associated cost and from which one can extract the least cost formulae. We assume that the priority queue persists over failure of subgoals. 
It can thus be implemented by asserting into a Prolog database, but cannot be implemented by carrying it around as an extra argument in a meta-interpreter [Sterling and Shapiro, 1986], for example. We would like both insertion and removal from the priority queue to be carried out in log n time where n is the number of elements of the priority queue. Thus we cannot implement it by having the queue asserted into a Prolog database if the asserting and retracting takes time proportional to the size of the objects asserted or retracted (which it seems to in the implementations we have experimented with). Four operations are defined: initQ initialises the queue to be the empty queue, with zero queue mass. add_to_PQ(process(D, C, U)) exeC, D, C) tries to prove C with assumptions D such that probability of Dis C. If G cannot be proven, a partial proof is taken from the priority queue and restarted. This means that ex( C, D, C) succeeds if there is some proof that succeeds. adds assumption set D, with probability C and remaining subgoals U to the priority queue. Adds C to the queue mass. ex (G, D, C) :prove(G,D,_,C,_,true). ex(_,_,_) :remove_from_PQ(process(D,C,U»,!, ex(U,D,C). if the priority queue is not empty, extracts the element with highest probability (highest value of C) from the priority queue and reduces the queue mass by C. remove_!1'om_PQ fails if the priority queue is empty. remove_from_PQ(process(D, C, U)) qmass(l\l) 538 returns the sum of the probabilities of elements of the queue. We assume the relation for handling nogoods: fails if L has a subset that has been declared nogood. [Poole et ai., 1987] D. Poole, R. Goebel, and R. Aleliunas. Theorist: A logical reasoning system for defaults and diagnosis. In N. Cercone and G McCalla editors r,he Knowledge Frontier: Essays i'n the Re~resenta~ tzon of Knowledge, pages 331-352. Springer-Verlag, New York, NY, 1987. Acknowledgements [Poole, 1987] D. Poole. A logical framework for default reasoning. Artificial Intelligence, 36(1):27-47, 1987. allgood(L) Thanks to Andrew Csinger, Keiji Kanazawa and Michael Horsch for valuable comments on this paper. This research was supported under NSERC grant OGP0044121, and under Project B5 of the Institute for Robotics and Intelligent Systems. References [Apt and Bezem, 1990] K. R. Apt and M. Bezem. Acyclic programs (extended abstract). In Logic Programming: Proceedings of the Seventh International Conference, pages 617-633. MIT Press, 1990. [Clark, 1978] K. L. Clark. Negation as failure. In H. Gallaire and J. Minkel', editors, Logic and Databases, pages 293-322. Plenum Press, New York, 1978. [Console et al., 1991] L. Console, D. Theseider Dupre, and P. Torasso. On the relationship between abduction and deduction. Journal of Logic and Computation, 1991. [Cox and Pietrzykowski, 1987] P. T. Cox and T. Pietrzykowski. General diagnosis by abductive inference. Technical Report CS8701, Computer Science, Technical University of Nove Scotia Halifax April 1987. ' , [de Kleer and Williams, 1989] J. de Kleer and B. C. Williams. Diagnosis with behavioral modes. In Proc. 11th International Joint Con/. on Artificial Intelligence, pages 1324-1330, Detroit, August 1989. [Goebel et al., 1986] R. Goebel, K. Furukawa, and D. Poole. Using definite clauses and integrity constraints as the basis for a theory formation approach to diagnostic reasoning. In E. Shapiro, editor, Proc. Third International Conference on Logic Programming, pages 211-222, London, July 1986. [Korf, 1985] K. E. Korf. 
Depth-first iterative deepening: an optimal admissable tree search. Artificial Intelligence, 27(1):97-109, September 1985. [Lloyd, 1987] J. W. Lloyd. Foundations of Logic Programming. Symbolic Computation Series. SpringerVerlag, Berlin, second edition, 1987. [Naish, 1986] L. Naish. Negation and Control in Pro10g.Lecture Notes in Computcr Scicnce 238. Springer ' Verlag, 1986. [Pearl, 1988] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988. [Pereira and Shieber, 1987] F. C. N. Pereira and S. M. Shieber. Prolog and Natural-Language Analysis. Center for the Study of Language and Information, 1987. [Pool~, 1988] D,. Pool~. Representing knowledge for logIc-based dIagnosIs. In International Conference on Fifth Generation Computing Systems, pages 12821290, Tokyo, Japan, November 1988. [Poole, 19~1a] D. Poole. Compiling a default reasoning system mto Prolog. New Generation Computing Journal, 9(1):3-38, 1991. [Poole, 1991b] D. Poole. Representing Bayesian networks within probabilistic Horn abduction. In Proc. Seventh Con/. on Uncertainty in Artificial Intelligence, pages 271-278, Los Angeles, July 1991. [Poole, 1991c] D. Poole. Representing diagnostic knowledge for probabilistic Horn abduction. In Proc. 12th International Joint Conf. on Artificial Intelligence, pages 1129-1135, Sydney, August 1991. [Poole, 1992a] D. Poole. Probabilistic Horn abduction and I3ayesian networks. Technical Report 92-2, Department of Computer Science, University of I3ritish Columbia, January 1992. [Poole, 1992b] D. Poole. Search for computing posterior probabilities in I3ayesian networks. Proc. Eighth Con/. on Uncertainty in Artificial Intelligence, submitted, Stanford, California, July 1992. [Pople, 1973] H. E. Pople, Jr. On the mechanization of abductive logic. In Proc. 3rd International Joint Conf. on Artificial Intelligence, pages 147-152, Stanford, August 1973. [Reiter and de Kleer, 1987] R. Reiter and J. de Kleer. Foundations of assumption-based truth maintenance systems: preliminary report. In Proc. 6th National Conference on Artificial Intelligence, pages 183-188, Seattle, July 1987. [Sattar and Goebel, 1991] A. Sattar and R. Goebel. Using crucial literals to select better theories. Computational Intelligence, 7(1):11-22, February 1991. [Sterling and Shapiro, 1986] L. Sterling and E. Shapiro. The Art of Prolog. MIT Press, Cambridge, MA, 1986. [Stickel, 1988] M. E. Stickel. A prolog-like inference system for computing minimum-cost abductive explanations in natural language interpretations. Technical Note 451, SRI International, Menlo Park, CA, September 1988. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 539 Abduction in Logic Programming with Equality P.T. Cox, E. Knill, T. Pietrzykowski Technical University of Nova Scotia, School of Computer Science P.O. Box 1000, Halifax, Nova Scotia Canada B3J 2X4 Abstract Equality can be added to logic programming by llsing surface deduction. Surface deduction yields interpretations of unification failures in terms of residual hypotheses needed for unification to succeed. It can therefore be used for abductive reasoning with equality. In surface deduction the input clauses are first transformed to a flat form (involving no nested terms) and symmetrized (if necessary). They are then manipulated by binary resolution, a restricted version of factoring and compression. 
The theoretical properties of surface deduction, including refutation completeness and weak deductive completeness properties (relative to equality), are established in [Cox et al. 1991]. In this paper we show that these properties imply that an enhancement of surface deduction will yield all parsimoniolls hypotheses when used as an abductive inference engine. The characterization of equational implication for goal clauses given in [Cox et al. 1991] is shown to yield a uniquely defined equationally equivalent residuum for every goal clause. The residuum naturally represents the corresponding abductive hypothesis. An example illustrating the use of surface deduction in abductive reasoning is presented. 1 Introduction In abductive reasoning, the task is to explain a given observation by introducing appropriate hypotheses ([Cox and Pietrzykowski 1987], [Goebel 1990]). Most presentations of abduction do not include reasoning with equality, nor do they allow the introduction of equality assumptions to explain an observation. A notable exception is E. Charniak's work on motivation analysis [Charniak 1988]. Charniak allows the introduction of certain restricted equality assumptions to determine motivations for observed actions. He shows that the introduction of such equality assumptions is required to successfully abduce motivations. In this paper we consider the problem of abductive reasoning with Horn clauses in the presence of equality. We show that surface deduction has the necessary properties for use in an abductive inference system provided that the input theory contains the function substitutivity axioms. In the presence of equality, an abduction problem consists of a theory T and a formula 0 (the observation). An explanation of (0, T) is a formula E consistent with T such tha.t E together with T equationally implies O. We will assume that 0 and E are existentially quantified conjunctions of facts and that T is a Horn clause theory. One way to obtain an explanation E, given an observation 0 and a theory T, is to deduce -,E from T and -,0. Since explanations with less irrelevant information are preferred (the pa1'simony principle), it is sufficient to deduce a clause -,E' such that -,E' implies -,E. Intuitively, E' is at least as good an explanation as E (see Section 4). It follows that a deduction system adequate for abductive reasoning should satisfy a weak deductive completeness: If the theory T implies a non-tautological clause -,E, then we must be able to deduce a clause -,E' from T such that -,E' implies -,E. In the absence of equality, SLD-resolution (see [Lloyd 1984]) satisfies this condition. The problem of introducing equality to Horn clause logic has been well-studied, see [Holldobler 1989] for an excellent overview. The simplest approach to this problem involves adding the equality axioms (which are Horn clauses) to the set of input clauses. However, unrestricted use of these axioms results in inefficiency. Furthermore, this approach does not yield any insights into the degree to which the equality axioms are needed. Paramodulation and other term rewriting systems do not explicitly introduce new equality assumptions into derivations and therefore do not satisfy the weak deductive completeness condition. Other approaches, such as the ones in [van Emden and Lloyd 1984] and extended in [Hoddinott and Elcock 1986] using the homogeneous form of clauses, require restricting the form of the input theory. Here, we use the results of [Cox et al. 
1991] to show that if equality is introduced to Horn clause logic via surface deduction with the function substitutivity axioms, then all preferred explanations for an abduction problem ca.n be obta.ined. The need for axioms of equality other than function substitutivity is thus eliminated. 540 In surface deduction, a set of input clauses is first transformed to a flat form and symmetrized. The deduction then proceeds using linear input resolution for Horn clauses (see [Lloyd 1984]) together with a limited use of factoring and a new rule called compression. The additional deduction rules are equivalent to those restricted uses of the reflexivity axiom (x ~ x :-) which preserve flatness. They are required only at the end of a deduction. A clause is flat if it has no nested functional expressions, and every variable which appears immediately to the right of an equality symbol (~) appears only in such positions. A stronger version of flatness requires that in addition the clause is separated. This means that every variable appears at most once in any given literal and has only one occurrence inside a functional or relational expression. Symmetrization affects only those clauses with equalities in their heads (see Section 3). The idea of using flattening to add equality to theorem proving is due to [Brand 1975] and is applied to logic programming in [Cox and Pietrzykowski 1986] Flattening is where surface deduction is defined. closely related to narrowing. In narrowing the process of flattening is implicit in the deduction rules. The relationship between the two methods is examined in [Bosco et al. 1988]. Separation of terms is implicit in the transformations to the homogeneous forms of [Hoddinott and Elcock 1986]. The symmetrization method used here is similar to the one introduced in [Chan 1986] and does not increase the number of clauses in the theory. In [Cox et al. 1991] it is shown that surface deduction satisfies a weak deductive completeness provided that the input clauses are first transformed to separated form. As an application of this result, equational implication for goal clauses is found to have a simple syntactic characterization analogous to subsumption. Once an explanation E is obtained by surface deduction, in what form should E be presented? For example if ,E (the actual clause deduced) is given by :- x ~ a, y ~ b, y ~ c, then :- y ~ b, y ~ c is equationally equivalent to ,E. Therefore the atom x ~ a is irrelevant and should be removed. In Section 4 it is shown that the cha.racterization of equational implication for goal clauses given in [Cox et al. 1991] implies that for every goal clause G there is a uniquely defined equational residuum RES( G) which cannot be further reduced without weakening the corresponding explanation. The notion of equational residuum is related to that of prime implicates used in switching theory [Kohavi 1978], truth maintenance systems [Reiter and de Kleer 1987] a.nd diagnoses [de Kleer et al. 1988]. RES( G) is an equational prime implicate of a flattening of C. In Section 2 the terminology is established; in Section 3 surface deduction is defined and the completeness results needed for abductive reasoning are given. In Section 4 the formalism of abductive reasoning with surface deduction is discussed; and finally in Section 5 an example is presented of an abductive problem solved by using surface deduction. 2 Preliminaries Familiarity with logic programming is assumed (see e.g. [Lloyd 1984]). 
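To make the informal description of flatness concrete before the formal definitions of the next section, here is a small hand-worked illustration. The clause and the predicate eq/2 (an ASCII stand-in for the equality symbol) are our own choices, not taken from the paper.

% Original clause with nested terms (illustrative):
%   p(f(a)) :- q(g(b)).
% One possible flattening, introducing a fresh variable for each nested subterm:
p(X) :- eq(X, f(Y)), eq(Y, a), q(Z), eq(Z, g(W)), eq(W, b).
% Every predicate and function symbol is now applied only to variables, and no
% variable occurs on its own as the right-hand side of an equality, so the
% clause is flat in the informal sense described above.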
As in [Holldobler 1990], let ~ denote the equality predicate symbol. The usual equality symbol = is used exclusively for syntactic equality. If L is an atom and C = {Ml , ... , Aln} is a set of atoms, then L :- C denotes the Horn clause L V ,Ml V ... ,Mn. In this expression, L is the head and G is the body of the clause. A clause of the form :- C is a goal clause. The atoms of C are the subgoals of :- G. A clause of the form L:- is a fact. If C l , ... ,Gn are sets of atoms and G is the union of the Gi , then L :- C l , ... , Cn means L :- C. \i\/hen possible, set notation is omitted for one-element sets. If OP is an operation which maps clauses to clauses and A is a set of clauses, then OP(A) = {OP(G) ICE A}. Let (Y be a substitution. If Xi(Y = ti for i = 1, ... ,n and X(Y = x for all other variables, then (Y is denoted by {Xl f - t}, ... Xn f - t n }. A substitution (Y is variable-pure iff X(Y is a variable for every variable x. The expression 'most general unifier' is abbreviated by 'mgu'. An equality is an atom of the form s ~ t. Let ['; be the set of equality axioms other than x ~ x :-. If A and B are sets of clauses, then A satisfies (or implies) B iff every model of A is a model of B. A equationally satisfies (or implies) B iff A u ['; u {x ~ x :- } satisfies B. A and Bare (equationally) equivalent iff each (equationally) satisfies the other. A is equationally inconsistent iff A equationally implies the empty clause. 3 Surface Deduction In surface deduction, a refutation of a set of input clauses proceeds by first transforming the input clauses to a flat form and then refuting the result using resolution, factoring and compression. The transformation subsumes the equality axioms other than reflexivity. The rules of factoring and compression subsume reflexivity. Definition. Let C be a clause and t a term. An occurrence of t on the left-hand side (right-hand side) of an equality t ~ s (s ~ t) in C is a root (surface) occurrence of t in C. Every other occurrence of t is an internal occurrence of t. The term t is a root term of C iff it has a root occurrence in G. Surface and internal terms are defined analogously. 541 Definition. A clause C is flat iff (i) every atom of C is of the form P(x}, ... , x n ), x == f(XI,''''X n ) or x == y, and (ii) no surface variable of C is a root or internal variable of C. Definition. Let C be a Horn clause. An elementary flattening of C is obtained by either (i) replacing some of the non-surface occurrences of a non-variable term t by a new variable y and adding the equality y == t to the body, or (ii) replacing some of the surface occurrences of a root or internal variable x of C by a new variable y and adding the equality x == y to the body. An elementary flattening of the set of clauses A is obtained by replacing a clause in A by an elementary flattening of that clause. Modifying a clause C by successive elementary flattenings eventually results in a flat clause (a flattening of C) which cannot be flattened any further (Theorem 2 of [Cox and Pietrzykowski 1986]). Definition. Let C be a clause. Then FLAT( C) denotes a (arbitrary but fixed) flattening of C. For any set of clauses A, FLAT(A) is equationally equivalent to A. In [Cox et al. 1991] it is shown that for refutation completeness the transformation FLAT subsumes the substitutivity axioms but not transitivity and symmetry. In order to subsume transitivity and symmetry, we need another transformation. Definition. Let C be a clause with an equality in its head. 
Then C is symmetric iff C is of the form x == u :- x == v, s == v, y == u, y == t, 111 for some terms sand t and set of atoms 111, where x, y, u and v do not occur in M, s or t. The set of clauses A is symmetrized iff every clause C of A with an equality in its head is symmetric. Definition. Let C be a Horn clause. If C does not have an equality in its head or if C is symmetric, then the symmetrization SYM( C) of Cis C. If C is not symmetric and of the form s == t :- 111, then SYM (C) is given by x == u :- x == v, s == v, y == 1l, Y == t, 111. Note that if A is a set of Horn clauses, then SYM(A) is equationally equivalent to A, and if A is flat, then SYM(A) is flat. In [Cox et al. 1991] it is shown that the transformation SYM subsumes transitivity and symmetry. In order to subsume substitutivity, transitivity and symmetry, the transformations SYM and FLAT are composed. Flattening and symmetrization followed by SLDresolution using resolution with x == x :- as an additional deduction rule is refutation complete for logic programming with equality. However, weak deductive completeness is not satisfied [Cox et al. 1991]. In order to obtain weak deductive completeness an additional transformation is required. Definition. A positive (negative) root occurrence of the term t in the clause C is a root occurrence in the head (body) of C. Definition. The flat clause C is separated in the variable x iff (i) every literal of C has at most one occurrence of x, (ii) C has at most one internal occurrence of x, and (iii) if x has an internal occurrence in C, then x has a negative root occurrence in C. The clause C is separated iff C is separated in all its variables. If A is a set of separated flat Horn clauses, then SYM( A) is separated. Separated clauses can be obtained from a given fla.t clause by using the transformation SEP: Definition. Let C be a flat clause and x a variable. The clause SEP( C) is the separated flat clause obtained by applying the following transformation to C: For every variable x such that C is not separated in x, replace each internal occurrence of x by a new variable Xi and add the equalities x == y, Xl == y, x2 == y, ... to the body of C (where y is a new surface variable). The rules of factoring and compression used in surface deduction are: (i) Root factoring. The clause C, is a root factor of C iff C, is obtained by factoring two equalities of C with the same root variable. (ii) Surface facto1'ing. The clause C' is a surface factor of C iff C, is obtained by factoring two equalities of C with the same surface term. (iii) Root compression. The clause C, is a root compression of C iff C' is obtained by removing an equality x == t from the body of C, where x has only one occurrence in C. (iv) Surface compression. The clause C' is a surface compression of C iff C' is obtained by removing an equality x == y from the body of C, where y has only one occurrence in C. 542 A compression is a root or surface compression. A compression of a clause C is a clause C' obta.ined [rom C by a sequence of applications of compression rules. The soundness of root and surface factoring and compression (in the presence of equality) is shown in [Cox and Pietrzykowski 1986]. Observe that binary resolution, surface and root factoring and compression preserve flatness. The relationship between factoring, compression and resolution with the reflexivity axiom is determined by the following result (proved implicitly in [Cox and Pietrzykowski 1986] and explicitly in [Cox et al. 
1991]; see also [Hoddinott and Elcock 1986]): Theorem 3.1 Let ;- C be a fiat goal clause. If ;- c' is a fiat goal clause obtained from ;- C by a sequence of binary resolutions with x ~ x ;-, then ;- C' can be obtained from ;- C by a sequence of root and sU1face facto rings and compressions. Definition. Let A be a set of fiat I-lorn clauses. The flat goal clause C is S -deducibl e from A iff C can be obtained from A by a sequence of binary resolutions, surface and root factorings and compressions. Note that we can assume that the deduction is linear. A is Srefutable iff the empty clause is S-deducible from A. Definition. Let :- C be a goal clause. An equa:- C is a minimal subclause of REDU(FLAT( :- C)) which is equationally equivalent to tional residuum of :-C. Every equational residuum of :- C is equationally equivalent to :- C. The fact that every subclause of a reduced clause is reduced implies that if :- C' is an equational residuum of :- C, then :- C' is reduced. The next theorem shows that the equational residuum is unique. Theorem 3.4 [Cox et al. 1991] Let ;- A' and ;- B' be equational 1'esidua of the goal clauses ;- A and :- B respectively. Then ;- A is equationally equivalent to :- B iff ;- A' is a variant of ;- B'. 4 Abduction USIng Surface Deduction As an application of this result, the following theorem is proved in [Cox et al. 1991]: An existential conjunction of facts is a conjunction of facts with all its free variables quantified existentially. The abduction problem for Horn clause logic with equality can be stated as follows: Abduction Problem: An abduction problem is a pair (A, 0), where A is a theory of Horn clauses and 0 (the observation) is an existential conjunction of facts. An explanation of the abduction problem (A,O) is an existentia.l conjunction of facts E consistent with A such that E and A equationally imply O. Let -,0 and -,E denote the disjunctions of the negations of the constituent facts of 0 and E respectively. Since E and A equationally imply 0 iff -,0 and A equationally imply -'E, a solution to an abduction problem can be obtained by deducing a clause C from A and -,0, and negating C to obtain E. In general, it is desirable for an explanation E of an abductive problem (A,O) to have certain additional properties (see [Cox and Pietrzykowski 1987]). For example, an explanation E should not contain any facts not required to yield the observation from A (the parsimony principle). Thus if E and E' are explanations of (A, 0) and E equationally implies E', E' is preferred over E. (Here 'preferred' is to be understood as 'at least as good as'.) For abduction, a desirable property of a deduction system is that for every explanation E of an abductive problem (A, 0), one can obtain an explanation preferred over E. The weak completeness result of Theorem 3.2 implies that surface deduction with separated clauses and the function substitutivity axioms has this property. Theorem 3.3 Let :- A and ;- B be goal clauses. Then Theorem 4.1 Let (A,O) be an abductive problem, :- A equationally implies ;- B iff there is a variable-pure substitution 0"" such that a compression of FLAT( :- A)O"" is included in REDU(FLAT( ;- B)). whe're A contains the function substitutivity axioms. Then for every explanation E of (A,O), there is an explanation E' preferred over E such To state the weak deductive completeness result for flat, separated and symmetrized clauses, we need the transformation defined next. Definition. Let :- C be a fiat goal clause. 
Then :- C is 1'educed iff :- C has no surface variables and no two equalities of :- C have the same right-hand sides. A fla.t reduced clause REDU( :- C) is obtained from :- C by factoring equalities with identical right-hand sides until all right-hand sides are distinct, and by removing all remaining equalities with surface va.ria.bles by surfa.ce compression. Note that for every fla.t goal clause :- C, REDV( :- C) is equationally equivalent to :- C. Let ;- C be a goal clause and A a set of Horn clauses which includes Then A equathe function substitutivity axioms. tionally implies ;- C iff the1'e is a fiat goal clause ;- C' such that for some variable-pure substitution (J", :- C'O"" ~ REDU(FLAT( ;- C)) and ;- C, is S-deducible from SYM(SEP(FLAT(A))). Theorem 3.2 [Cox et al. 1991] 543 that ,E' is S-deducible from SYM(SEP(FLAT(A))) U {SEP(FLAT( ,O))}. Proof. This follows by Theorem 3.2 and the fact that -,0 is a goal clause, so that it does not need to be sym• metrized. Fortunately, it appears that the function substitutivity axioms are rarely needed in abductive problems when using surface deduction with separated clauses. Flattenings of a clause can be viewed as alternate representations of the clause's term structure and are therefore essentially equivalent. Without loss of generality we restrict our attention to explanations E such that -,E is flat (flat explanations). If E and E' are explanations of (A, 0) such that E equationally implies E' but is not equationally equivalent to E', then E' is strictly preferred over E. Given an explanation E of (A, 0) there are many equationally equivalent existential conjunctions of facts, all of which are also explanations of (A,O). The preference criteria introduced so far do not distinguish among equationally equivalent explanations. Using the intuition that a "simpler" explanation should be preferred, we give a stronger definition of preference: Definition. Let E and E' be flat explanations. Then E' is strictly preferred over E iff either E equationally implies E' but is not equivalent to E', or E is equationally equivalent to E' and E' has fewer atoms. Given these preference criteria, we have the following theorem which determines the most preferred flat explanation among equationally equivalent ones: Theorem 4.2 For any explanation E, if E' is the negation of the equational residuum of -,E, then E' is the unique most preferred flat explanation among flat explanations equationally equivalent to E. inclusion of equality in abductive reasoning are given in [Charniak 1988]. Here we give an example from a different domain. Consider the following (imaginary, but realistic) situation. A researcher X experimentally determines the value of a quantity associated with a physical object (e.g. the mass of an isotope of an element) and sends us the result. We have independently obtained a value for the same quantity (by theory and/or experiment) and our value differs from X's value. We believe our value to be correct and we would like to explain the discrepancy. \Ve do not know the exact means by which X's value was obtained, but we know what kinds of experimental apparatus X might have used. One kind of apparatus (type A) is notorious for a hard- to-control drift in the settings which results in a systematic bias in the readings. Thus we can explain the discrepancy between our and X's values by hypothesizing that X used apparatus of type A with a systematic bias equal to the difference between the two values. 
The situation is formalized as follows: Let T A(x) mean that x is an apparatus of type A. Let Vt(y) be the true value of quantity y, Vm(z,y) the value of quantity y measured in experiment z, A(u) the apparatus used in experiment u and B( x) the systematic bias of apparatus x. The quantity measured by X is q, and the experiment performed by X is given the name e. With these definitions, our knowledge T consists of the clauses Tl: Vt(q)~O:- T2: Vm(XI' x 2) TA(A(XI)) Xl ~ 0 + Xl :- T3: where knowledge about other types of apparatus and theorems about real numbers other than T3 have been omitted. The observation 0 is given by 0: Proof. Let :- A be a flat clause equationally equivalent to ,E. If :- A is not reduced, then REDU( :- A) has fewer atoms than :- A and the corresponding explanation is therefore strictly preferred. Assume that :- A is reduced. If the equational residuum of :- A is not given by :- A, then the equational residuum of :- A has fewer atoms than :- A, so that the corresponding explanation is strictly preferred. The result now follows by the uniqueness theorem for equational residua, Theorem 3.4. • 5 An Application Examples from the domain of story comprehension and motivation analysis which demonstrate the need for the Vm(e,q)~2:- The first task is to obtain a flattening of T and the negation of the observation: IT3: Xl == 0 :- Xl == Vt(x 2), X2 == q. X4 ~ Xs + X6:- T A(X3)' X6 ~ B(X3)' X4 Vm(XI,X2), Xs ~ Vt(x 2), X3 ~ A(x l )· Xl ~ x2 + Xl :- X2 ~ O. fO: :- Xl == 2, Xl ~ Vm(X2' X3), X2 ~ e, X3 ~ q. ITl: IT2: The clauses ITl and fO are separated. clauses for IT2 and IT3 are given by Separated sIT2: X4 ~ Xs + x6:- T A(X3), X6 ~ B(X7)' X3 ~ XS, X7 ~ XS, X4 ~ Vm(xI' X2), Xs == Vt(XlO), X2 ~ Xg, XlO ~ Xg, X3 ~ A(x u ), Xl ~ X12, Xu ~ X12' sIT3: Xl ~ X2 + X3 :- X3 == X4, Xl == X4, X2 ~ O. 544 All clauses of T have equalities in their heads and need to be symmetrized. The fully transformed set of clauses is given by TI': X3 ~ X4:- ~ Xl T2': X3 ~ X s , Xl ~ Xs, X6 ~ X 4 , X6 ~ + X6, 0, XIS' X 4 - XIS, X16 - X 14 ' ~ B(X7), X3 ~ Xs, X7 ~ Xs, X4 == Vm(XI,X2), Xs ~ Vt(XlO), X2 ~ X9, XlO ~ X9, X3 == A(xu), Xl ~ X12, Xu ~ X12' Xl6 == Xs T A(x 3 ), root fact., surf. fact., and compr. X6 Xs X2 == X6:- + X3, X3 == X4, Xl X.j, X2 ~ 0. ~ X19 X31, X25 :- X9 e, X3 ~ 0' :- Xl ~ e, res. with T2' X3 ~ Xl ~ Vm(X2' X3), X2 ~ 2, ~ Xl9 X7 ~ XIS, + Xs X6 ~ Xu, ~ Vm(x4' x s ), :rs Xs - Xl surf. fact. followed by root fact. and compr. == == Xu, Xg ~ 0, Xg ~ ~ A(XI4)~~ XIS, X6 e, X3 ~ :- X9 == 2, T A(X6), X6 ~ X11' X 17 ~ X 21 ' ~ q, Xg ~ XIO q. X9 Xu, X12, :- Xl9 ~ 2, Xl9 ~ Xs ~ xu, X6 ~ XIS' + X9, T A(X6), X9 ~ B(XlO), X6 ~ X11, :rlO ~ Xll, Xs ~ Vt(x I3 ), :r3 ~ :r I2 , X l3 ~ X 12 ' ~ X2 ~ ~ A(X I4 ), X2 e, X3 :- X l9 ~ == 2, ~ XIS' X 14 XIS' q. x l9 ~ Xs ~ B(XlO), X6 ~ X9 + Xg, TA(X6), ~ X11' :1.:u, XlO ~ X24, X20 ~ X24, X2S ~ Vt(X I 3), X25 == 0, X 20 ~ Vt(X21 ), X21 ~ q, Xs X3 ~ X12' X l3 ~ x 12 ' X6 ~ A(x 14 ), x 2 ~ XIS, X l 4 ~ XIS, X2 and :- X l9 == 2, X I9 ~ Xs == e, X3 ~ q. + X g , T A(x6 ), ~ B(xlO)' X6 ~ :r 11 , XlO ~ Xu, Xs ~ X24, X20 ~ X2.j, X20 ~ 0, X20 ~ Vt(X3), X6 ~ A(x 14 ), X2 ~ XIS, X9 Xl4 == XI5, X2 ~ e, X3 ~ q. res. with T3' X3I, X32 ~ Xs X27 ~ T A(X6)' XlO ~ X2S' X9 Xu, + Xg, X32 ~ == X26 + X27, ~ 0, ~ B(XlO)' X6 ~ X 11 , X 2S X2S' Xs ~ X24, :1.: 26 X20 ~ X24, ~ Vt(X3), X6 ~ A(x 14 ), X2 ~ X15, X14 ~ XIS' :/:2 == e, X3 ~ q. X20 X9 ~ :X6 ~ X2 == :- X22 == 0, X l 7 ~ Vt(XIg), ~3)' X6 ~ A(X I4 ), 2, XI5, Xl4 == e, X3 ~ q. 
T A(X6), X9 ~ B(XlO), Xu, XlO X6 ~ B(XlO), Xs ~ X21, ~ == Xu, X6 XI5, X2 A(X2), X2 Xg ~ B(X6), X9 ~ ~ ~ A(XI4), e, X3 == e, == q. T A(X6), 2. == 0, X20 The last clause is the negation of the desired explanation. Note how two resolutions with TI' were used to simulate symmetry. Vt(X I3 ), X14 XIS, 2, ~ Vm(X2' X3), X2 ~ e, X3 ~ q. X6 res. with TI' ~ X4 ~ Xg ~ XlO ...:.. Xl3 X12, A(X14)' '&19 T A(x 6 ), X9, B(XlO), X7 reduction to the min. residuum q. q. Xl ~ XIS, :- surf. fact., root fact. and compI'. X31, ~ 2, T A(X6)' X9 ~ B(xlO), X2 ~ XIS, Xl4 ~ XI5, X2 ~ The negation of the desired explanation can now be deduced from 0'. In the deduction below, the literals involved in each step are underlined. As is usually the case, the function substitutivity axioms are not needed. ~ X6 ~ Xu, XlO XIS 0': surf. fact. compr. 2, Xl4 ~ X15, X2 ~ Xs ~ X7, Xl ~ X7, Xs ~ X6, Xs ~ == ~ X19 ~~25 == X2S, TA(X6), X9 ~ B(XlO), X6 == Xu, XlO ~ Xu, Xs ~ 0, Xs ~ Vt(x 3 ), X6 == A(X I 4), X2 - Vi(X3), res. with TI' T3': .- X9 XIS' X l4 ~ X 15 ' X 2 ~ == q. Vt(X2), X2 ~ X I4 :- XI3 ~ X l3 root fact., surf. fact. and compr. 6 Conclusion From a theoretical perspective, surface deduction is very appealing in its simplicity. We have seen how (at least in theory) surface deduction can be applied in situations such as abductive reasoning where deduction rather than refutation is the primary goal. If the equality theory of interest contains function substitutivity, a problem with using surface deduction for abduction is that in general the function substitutivity axioms are still required. Current research indicates that to a large extent, the function substitutivity axioms can be ignored in abductive problems when using surface deduction with symmetrized, separated and flat clauses. VVe do not know any practical example where this is not the case. From a practical point of view, one of the frequently recognized problems with flattening the clauses of the input theory is that one loses most of the advantages of unification, particularly if the input theory contains few equalities. One can regain some of these advantages in practice by interpreting the set of equalities in the body of a clause as a directed graph or hypergraph (with arcs from the root variables to the surface terms) which defines the set of possible definitions of the main terms and variables of the clause. Such a directed graph generalizes the usual tree representation of terms. Unification and more generally term rewriting can then be replaced by (hyper)graph rewriting rules. To implement 545 this idea, the deduction procedures must be substa.ntially enhanced. The types of graph rewriting rules and graph representations needed require further research. The preference criteria for explanations given in Section 4 are very weak. However, we believe that no matter what preference criteria are used, RES( C) is at least as good an explanation as C. One of the most important problems in abductive reasoning is to determine stronger preference criteria to avoid combinatorial explosion. These issues are discussed in [Poole and Provan 1990]. Many ofthe results used in this paper can be generalized to arbitrary clauses so that the restriction of abductive reasoning to Horn clause theories ca.n be removed. These generalizations will be the topic of a forthcoming paper. References [Baxter 1976] L. D. Baxter. The C01nple;rity of Unification. Ph.D. Thesis, University of 'Waterloo, 1976. [Bosco et al. 1988] P. G. Bosco, E. Giovannetti, and C. Moiso. Narrowing vs. 
SLD-Resolution. Theoretical Computer Science, Vol. 59 (1988), pp. 3-23.
[Brand 1975] D. Brand. Proving Theorems with the Modification Method. SIAM J. Comput., Vol. 4 (1975), pp. 412-430.
[Chan 1986] K. H. Chan. Equivalent Logic Programs and Symmetric Homogeneous Forms of Logic Programs with Equality. Technical Report 86, Dept. of Computer Science, Univ. of Western Ontario, London, Ont., Canada, 1986.
[Charniak 1988] E. Charniak. Motivation Analysis, Abductive Unification, and Nonmonotonic Equality. Artificial Intelligence, Vol. 34 (1988), pp. 275-295.
[Colmerauer et al. 1982] A. Colmerauer et al. Prolog II: Reference Manual and Theoretical Model. Groupe d'Intelligence Artificielle, Faculte des Sciences de Luminy, Marseilles, 1982.
[Cox and Pietrzykowski 1985] P. T. Cox and T. Pietrzykowski. Surface Deduction: a Uniform Mechanism for Logic Programming. In Proc. Symp. on Logic Programming, IEEE Press, Washington, 1985. pp. 220-227.
[Cox and Pietrzykowski 1986] P. T. Cox and T. Pietrzykowski. Incorporating Equality into Logic Programming via Surface Deduction. Ann. Pure Appl. Logic, Vol. 31 (1986), pp. 177-189.
[Cox and Pietrzykowski 1987] P. T. Cox and T. Pietrzykowski. General Diagnosis by Abductive Inference. In Proceedings of the Symposium on Logic Programming, IEEE Press, Washington, 1987. pp. 183-189.
[Cox et al. 1991] P. T. Cox, E. Knill, and T. Pietrzykowski. Equality and Abductive Residua for Horn Clauses. Technical Report TR-8-1991, School of Computer Science, Technical University of Nova Scotia, Halifax, NS, Canada, 1991.
[van Emden and Lloyd 1984] M. H. van Emden and J. W. Lloyd. A Logical Reconstruction of Prolog II. In Proc. 2nd Intl. Conf. on Logic Prog., Uppsala, 1984. pp. 35-40.
[Goebel 1990] R. Goebel. A Quick Review of Hypothetical Reasoning Based on Abduction. In AAAI Spring Symposium on Automated Abduction, Stanford University, 1990. pp. 145-149.
[Hoddinott and Elcock 1986] P. Hoddinott and E. W. Elcock. PROLOG: Subsumption of Equality Axioms by the Homogeneous Form. In Proceedings of the Symposium on Logic Programming, 1986. pp. 115-126.
[Holldobler 1989] S. Holldobler. Foundations of Equational Logic Programming. Lecture Notes in Computer Science 353, Springer-Verlag, Berlin, 1989.
[Holldobler 1990] S. Holldobler. Conditional Equational Theories and Complete Sets of Transformations. Theoretical Computer Science, Vol. 75 (1990), pp. 85-110.
[de Kleer et al. 1988] J. de Kleer, A. K. Mackworth, and R. Reiter. Characterizing Diagnoses. In Proceedings of the Eighth National Conference on Artificial Intelligence, 1990. pp. 324-330.
[Kohavi 1978] Z. Kohavi. Switching and Finite Automata Theory. McGraw-Hill, 1978.
[Lloyd 1984] J. W. Lloyd. Foundations of Logic Programming. Springer-Verlag, Berlin, 1984.
[Paterson and Wegman 1978] M. S. Paterson and M. N. Wegman. Linear Unification. J. Comput. Syst. Sci., Vol. 16 (1978), pp. 158-167.
[Poole 1988] D. Poole. A Logical Framework for Default Reasoning. Artificial Intelligence, Vol. 36 (1988), pp. 27-47.
[Poole and Provan 1990] D. Poole and G. M. Provan. What is an Optimal Diagnosis? In Conference on Uncertainty in AI, Boston, 1990.
[Reiter and de Kleer 1987] R. Reiter and J. de Kleer. Foundations of Assumption-Based Truth Maintenance Systems: Preliminary Report. In Proceedings of the National Conference on Artificial Intelligence, 1987. pp. 183-188.

Hypothetico-deductive Reasoning

Chris Evans* and Antonios C.
Kakas t *Department of Mathematical Studies, Goldsmiths' College, University of London New Cross, London SE14 6NW, UK. EMAIL: c.evans@gold.lon.ac.uk. tDepartment of Computer Science, University of Cyprus, 75 Kallipoleos Street, Nicosia, Cyprus. EMAIL: kakas@cyearn.earn (Part of the research for this paper was completed while both authors were at Imperial College, London SW7 2BZ) Abstract This paper presents a form of reasoning called "hypothetico-deduction", that can be used to address the problem of multiple explanations which arises in the application of abduction to knowledge assimilation and diagnosis. In a framework of hypothetico-deductive reasoning the knowledge is split into the theory T and observable relations S which may be tested through experiments. The basic idea behind the reasoning process is to formulate and decide between alternative hypotheses. This is performed through an interaction between the theory and the actual observations. The technique allows this interaction to be user mediated, permitting further information through the acquisition of experimental tests. Abductive explanations which have all their empirical consequences observed are said to be "fully corroborated". We set up the basic theoretical framework for hypothetico-deductive reasoning and develop a corresponding proof procedure. We demonstrate how hypothetico-deductive reasoning deals with one of the main characteristics of common-sense reasoning, namely incomplete information, through the use of partial corroboration. We study the extension of basic hypothetico-deductive reasoning applied to theories that incorporate default reasoning as captured by negation-as failure (NAF) in Logic Programming. This is applied to the domain of Temporal Reasoning, where NAP is used to formulate default persistence. We show how it can be used successfully to tackle typical problems in this domain. 1 Motivation Abduction is commonly adopted as an approach to diagnostic reasoning [Reggia & Nau, 1984], [Poole, 1988]. However, there are frequently many possible abductive explanations for a given observation. This is the problem of "multiple explanations". In order to choose between these explanations it becomes necessary to collect more information. Consider the Crime Detection example formalized below (Theory Tl). Suppose we arrive at the scene of the crime and the first observation we make is that someone is dead. We seek an explanation for this on the basis of the theory Tl above. Suppose we accept that there are only three possible causes of death: being strangled, being stabbed, or drinking arsenic (these are technically known as the abducibles). Simple abduction starting from the observation "dead" yields precisely these three possible explanations. In order to choose between these multiple explanations, we need to collect more information. For example, if we examined the corpse and discovered that there were marks on the neck, we Theory Tl strangled ~ blood_loss poisoned ~ ~ dead strangled dead stabbed dead ~ ~ neck_marks blood_loss drunk_arsenic ~ poisoned might take this as evidence for the first explanation over the others. Moreover, we know that drinking arsenic also has the consequence of leaving the victim with a blue tongue, so we might like to look for that. 
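As a rough sketch of how simple abduction enumerates these three explanations, Theory T1 can be encoded as rule(Head, Body) facts and the abducibles collected by backward chaining. The encoding and the names rule/2, abducible/1 and explain/2 are ours, for illustration only; member/2 and append/3 are the standard list predicates, and blue_tongue is included only because it is mentioned in the surrounding discussion.

% Theory T1, with rule bodies written as lists of atoms (illustrative encoding).
rule(dead, [strangled]).
rule(dead, [poisoned]).
rule(dead, [stabbed]).
rule(neck_marks, [strangled]).
rule(blood_loss, [stabbed]).
rule(poisoned, [drunk_arsenic]).
rule(blue_tongue, [drunk_arsenic]).   % consequence mentioned in the text

abducible(strangled).
abducible(stabbed).
abducible(drunk_arsenic).

% explain(+Goal, -Hypotheses): Hypotheses is a set of abducibles sufficient,
% together with the rules, to derive Goal.
explain(Goal, Hypotheses) :-
    prove_all([Goal], [], Hypotheses).

prove_all([], Delta, Delta).
prove_all([G|Gs], Delta0, Delta) :-
    (   abducible(G)
    ->  ( member(G, Delta0) -> Delta1 = Delta0 ; Delta1 = [G|Delta0] ),
        prove_all(Gs, Delta1, Delta)
    ;   rule(G, Body),
        append(Body, Gs, Goals),
        prove_all(Goals, Delta0, Delta)
    ).

% ?- explain(dead, E).
% E = [strangled] ;  E = [drunk_arsenic] ;  E = [stabbed].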
One approach to deciding between multiple explanations is through the performance of crucial experiments ([Sanar & Goebel, 1989]): pairs of explanations are examined for contradictory consequences, and an experiment is performed which refutes one of them whilst simultaneously corroborating the other. With n competing explanations we must thus perform at most (n-l) crucial experiments . The crucial experiment approach is, however, unable to choose between explanations when they fail to have contradictory consequences or when they have contradictory consequences that are not empirically determinable (e.g. Tychonic and Copernican world systems). In our example, for instance, the explanations "strangled" and "stabbed" are not incompatible. It is possible that the victim was both strangled and stabbed. As result, there can be no crucial experiment that will decide between the two. However, further evidence might lead us to accept one explanation, whilst tentatively rejecting the other. For example, knowledge that the person exhibits marks on the neck supports the "strangled" hypothesis. In fact we have all the theoretically necessary observations to conclude that the victim was strangled. On the other hand, the "stabbed" hypothesis implies "blood_loss", which if not observed might lead us to favour the "strangled" explanation. Note that later evidence of blood loss would lead us to return to the "stabbed" hypothesis (in addition to "strangled"). From our viewpoint, crucial experiments are the speCial case of general hypothetico-deductive reasoning when an hypothesis is refuted whilst simultaneously corroborating a second. The process of hypothetico-deductive reasoning allows the formation and testing of hypotheses within an interactive framework which is applicable to a wide 547 class of applications and is implementable using existing technology for resolution. The technique of hypothetico-deductive reasoning has its origin in the Philosophy of Science. It was primarily proposed by opponents of Scientific Induction. Its notable contributors were Karl Popper ([Popper, 1959],[Popper, 1965]), and Carl Hempel [Hempel, 1965]. In its original context, hypotheticodeduction is a method of creating scientific theories by making an hypothesis from which results already obtained could have been deduced and which entails new predictions that can be corroborated or refuted. It is based on the idea that hypotheses cannot be derived from observation, but once formulated can be tested against observation. The hypothetico-deductive mechanism we formulate, resembles this method in having the two components of hypothesis formation and corroboration. It differs from the accepted usage of the term in philosophy of science by the status of the hypothesis formation component. In the philosophy of the process of hypothesis formation is equivalent to theory formation: a creative process in which a complete theory is constructed to account for the known observations. By contrast, the method we describe here starts with a fixed generalized theory which is assumed to be complete and correct. The task is to construct some hypotheses which when added to the theory have the known observations as logical consequences. The process is more akin to that used by an engineer when they apply classical mechanics to a particular situation: they don't seek a new physical theory, but rather a set of hypotheses which would explain what they have observed. 
Since, for us, hypothesis formation can be mechanized, we do not have to tackle the traditional issues of the philosophy of science concerning the basis of theory formation. We thus avoid (like Poole before us [Poole, 1988, p.28]) one of the most difficult problems of science. This paper is organized as follows. We first describe the reasoning process and present the logical structure of the reasoning mechanism, indicating how it relates to classical deduction and model theory. Abductive and corroborative derivation procedures for implementing the reasoning process are then defined through resolution. We indicate how this reasoning technique relates to current work on abduction and diagnostic reasoning, and suggest some possible extensions. We illustrate the features and applicability of this reasoning method with several examples. We then describe the extension of hypothetico-deduction to apply to theories which include some form of default reasoning, using negation-as-failure as an example. We consider a typical application of defaults in causal reasoning, namely default persistence, and provide several further examples which illustrate this extension. 2 Hypothetico-deductive Framework Suppose we have a fixed logical theory T about the world. For example, it might be a medical model of the anatomy, or a representation of the connections in an electrical network, or a model of the flow of urban traffic in Madrid. Let us divide the relations in the theory into two categories: empirical and theoretical. How we make this distinction will depend on how we interpret these relations in the domain for the theory. An empirical relation is one which can be (or has been) observed. For example, the blood pressure of a patient, the status of a circuit-breaker (open or closed), or the number of cars passing some point. By contrast, a theoretical relation is in principle not observable. Examples of theoretical relations might be infection with an influenza virus, the occurrence of a short-circuit from the viewpoint of a control centre, or the density of traffic at some pOint. Suppose we want an explanation for G on the basis of the theory. By this, what we mean is "what relations (we will call them hypotheses) might be true in order to have given rise to G?". The answer to this question could involve either theoretical or empirical relations. In order to be confident that an explanation is the correct explanation it is useful to test it. Explanations in terms of empirical relations are directly testable. In the simplest case we just consider the other observations we have already made; in more complicated cases, we may need to "go and look" or even perform an "experiment". Explanations in terms of theoretical relations must be tested indirectly, by deducing their empirical consequences. and testing these. Unfortunately, not all hypotheses that might give rise to the observation G serve as explanations. regardless as to whether they pass any tests. Some are too trivial such as taking G as an explanation for itself. Others we rule out as unsuitably shallow. For example, suppose we sought an explanation for the observation "Jo laughed at the joke"; one possible hypothesis is because "the joke was funny". However, what we really wanted was a deeper explanation: Why was the joke funny? We therefore designate certain types of hypotheses as explanatory (or, more strictly, "abducible"). The problem of explanation. 
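Continuing the illustrative crime-scene encoding, the sketch below adds the observables and the recorded observations, together with a naive corroboration check that forward-chains from a candidate explanation and compares its observable consequences with what has been observed. It is only a sketch of the interleaving described next (it assumes an acyclic rule base, and observable/1, observed/1, corroborated/1 and derivable/2 are our names), not the authors' procedure.

% Observables S and the observations O recorded so far (illustrative).
observable(dead).        observable(neck_marks).
observable(blood_loss).  observable(blue_tongue).

observed(dead).
observed(neck_marks).

% corroborated(+Hypotheses): every observable consequence of the hypotheses
% occurs among the recorded observations.
corroborated(Hypotheses) :-
    forall(( derivable(Hypotheses, P), observable(P) ),
           observed(P)).

% derivable(+Hypotheses, -Atom): Atom follows from the hypotheses via rule/2.
derivable(Hypotheses, P) :-
    member(P, Hypotheses).
derivable(Hypotheses, P) :-
    rule(P, Body),
    forall(member(B, Body), derivable(Hypotheses, B)).

% ?- corroborated([strangled]).    succeeds: dead and neck_marks are observed.
% ?- corroborated([stabbed]).      fails: blood_loss has not been observed.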
as far as we are concerned in this paper, is the problem of constructing abducible hypotheses which when we add them to T will have G as a logical consequence. Furthermore, explanations must pass (direct or indirect) tests. The process of constructing hypotheses which have G as a deductive consequence is an example of hypothesis formation. It is this stage that corresponds to the "hypothetico-" component of hypotheticodeductive reasoning. The process of testing an explanation is an example of corroboration. It is this stage that corresponds to the "deductive" component of hypothetico-deductive reasoning. This is because we use deduction to determine the empirical consequences of a given explanation. The process of hypotheticodeductive reasoning can now be formulated as the construction of an explanation for an observation through interleaving hypothesis formation and corroboration. 3 The Hypothetico-deductive Mechanism Let us consider the mechanism for hypotheticodeductive reasoning in more detail. To simplify matters we shall require that our theory is composed of rules and no facts. In logical terms, an hypothesis (and thus an explanation) will be a set of ground atomic wellformed formulae. Suppose we have a (usually causal) theory T, an observation set 0, a set of abducible atomic formulae A, and a particular observation G from 0 which we wish to explain. Let 0' = O-G. In addition we define a set S, the observables, containing all the formulae that can occur in O. There are three components to the reasoning process: hypothesis formation, hypothesis corroboration, and explanation corroboration. In outline, we carry out hypothesis formation on G, and for each component formula in the resultant hypothesis. We repeat this process until all that remains 548 is a set of abducible relations constituting the explanation. We also carry out hypothesis corroboration at each formation point. Finally we reason forwards from the explanation to perform explanation corroboration. Hypothesis Formation From any ground atomic formula F we form an hypothesis for that formula. This is done by determining which rules in T might allow F as a conclusion, and forming an hypothesis from tIle antecedents of each such rule (after carrying out the relevant substitutions dictated by F). Each hypothesis is thus sufficient to allow the conclusion of F. Hypothesis Corroboration An hypothesis for an observation may contain instances of observables defined by S. For each such component we check to see whether it is an observation recorded in 0'. If it is a member of 0' then it is corroborated and we can retain it. However, where any component is not corroborated in this fashion, we reject the entire hypothesis. Explanation Corroboration An hypothesis H which is composed entirely of instances of abducible predicates defined by A is an explanatory hypothesis. To corroborate H, we use T to reason forwards from H as an assumption. Each logical consequence of H which is also an instance of an observable is checked against 0' for corroboration (similar to "hypothesis corroboration"). If it does not occur in 0' then the original hypothesis H is rejected. If all observable consequences are corroborated, then the explanation H is said to be corroborated. In general, rules may have more than one literal in their antecedent. We must also check the satisfaction of the other literals in a given rule by reasoning backwards until we reach either one of the observations in 0' or one of the other explanatory hypotheses. 
If neither of these two situations arise, the rule is discarded from the forward reasoning process. We make a distinction between corroboration failure, where an hypothesis or prediction does not occur in the observation set 0', and refutation, where the negation of an hypothesis or prediction occurs in 0'. Normally the form of and T means that refutation is impossible (see the next section for details of this form). Later we suggest an extension which allows the possibility of refutation in addition to corroboration failure. In cases where it is natural to apply the closed world assumption to 0, these two situations will coincide. ° 4 The Logical Structure of Hypothetico-deductive Reasoning Suppose we have a theory T composed of definite Horn clauses and an observation set of ground atomic well-formed formulae 0. Let the set of ground atomic be S, the observables. formulae which can occur in Similarly, let us define a set of distinguished ground atomic formulae A, the abducibles, in terms of which all explanations must be constructed. An explanation will be a member of the set A. We will assume that the theory T alone does not entail any empirical observation without some other empirical input i.e. there does not exist any formula <\> such that <\> E Sand T 1= <\>. Consider also a ground atomic formula G (a member of S) for which we seek an explanation. ° Given the 4-tuplc , a corroborated explanation 6. for G, is a set of ground atomic wellformed formulae, which fulfils all of the following criteria: (1) Each formula in ~ must be a member of A. (2) T v ~ 1= G ° (3) IfT v 6. 1= nand n ~ S ,then n ~ An explanation set ~ which satisfies (1) and (2) but not (3) is said to be uncorroborated. This formulation is easily generalized to explanation for multiple observations by simply replacing G with a conjunction of ground atomic formulae. We note that since at this stage we have taken our theories to be Horn, a simple extension to hypotheticodeductive reasoning allows us to distinguish between explanation refutation when a prediction is inconsistent with observation, and merely the failure of corroboration where a prediction is consistent with known observations but not present in them. Such an extension would allow a hypothetico-deductive system to deal with circumstances where our observations cannot ever be complete (where we know our faultdetection system is itself fallible, for instance). We could then discard only those explanations that are refuted, and order the remaining ones according to their degree of corroboration (corresponding to Popper's notion of versimilitude, [Popper, 1965]). A later section discusses the extension of hypotheticodeductive reasoning to theories which include ncgationas-failure. This extended version of hypothetico-deductive reasoning is non-monotonic because later information might serve to refute a partially corroborated explanation. To return to our first example for instance, the observation that the victim does not have a blue tongue would lead us to reject the hypothesis that they had drunk arsenic (even if previously this hypothesis had some observational consequences which had been observed). 5 Hypothetico-deductive Proof Procedure A resolution proof procedure which implements hypothetico-deductive reasoning is formally presented below. BaSically we define two types of derivation: abductive derivation and corroboration derivation which are then interleaved to define the proof procedure. 
Abductive derivation corresponds to the processes of hypothesis formation and corroboration, deriving hypotheses for goals. Corroboration derivation corresponds to the process of explanation corroboration, deriving predictions from goals. There are two different ways to interleave the abductive and deductive components of the reasoning mechanism. One approach is to derive all the abducible literals in the hypothesis for an observation, before any of them are corroborated. The second approach attempts corroboration as soon as an abducible literal is derived, postponing consideration of other (non-abducible) literals in the hypothesis. Here we present a proof procedure based on the second approach. Definition (safe selection rule) A safe selection rule R is a (partial) function which, given a goal ~ Li, ... , Lk k~l returns an atom Li, i=l, ... ,k such that: either i) Li is not abducible; Lj is ground. or ii) 549 Definition (Hypothetico-deductive proof procedure) An abductive derivation from (G I ~ I) to (G n ~n) via a safe selection rule R is a sequence (GI ~l), (G2 ~2), ... , (G n ~n) such that for each i> 1 Gi has the form ~ L 1, .. · ,L k> R(Gi)=Lj and (Gi+l ~i+r) is obtained according to one of the following rules: AI) If Lj is neither an abducible nor an observable, then Gi+l=C and ~i+l=~i where C is the resolvent of some clause in T with Gi on the selected literal Lj; A2) If Lj is observable, then Gi+l=C and ~i+l=~i where C is the resolvent ofC': ~ Ll', ... ,Lj', ... ,Lk' with some clause in T on Lj' where ~ LI', ... ,Ljl',Lj+l', ... ,Lk' is the resolvent of Gi with some clause (ground assertion) Lj' in a on the selected literal Lj; A3) If Lj is abducible and LjE ~ i, then Gi+l= ~LI, ... ,Lj-l,Lj+I, ... ,Lk and ~i+l=~i; A4) If Lj is abducible and Lje: ~ and there exists a corroboration derivation from ({ Lj} ~iU {Lj}) to ({} ~') then Gi+l = ~Ll, ... ,Lj-l, Lj+l, ... ,Lk and ~i+l = ~'. Step AI) is an SLD-resolution step with the rules of T. In step A2) under the assumption that observables and abducibles are disjoint we need to reason backward from the true observables in the goal to find explanations for them since the definition of an explanation requires that it logically implies G in the theory T alone without the set of observations O. Step A3) handles the case where an abductive hypotheses is required more than once. In step A4) a new abductive hypotheses is required which is added to the current set of hypotheses provided it is corroborated. A corroboration derivation from (FI ~l) to (Fn ~n) is a sequence (FI ~l), (F2 ~2) ... (Fn ~n) to (Fn ~n) such that for each i>1 Fi has the form {H~LI, ... ,Lk} U Fi' and (Fi+l ~i+l) is obtained according to one of the following rules: Cl) If H is not observable then Fi+l = C' U Fi' where C' is the set of all resolvents of clauses in T with H~LI, ... ,Lk on the atom Hand C2) If H is a ground observable, He: and LI, ... ,Lkis not empty then Fi+l = C' U Fi' ~i+l=~i; a where C' is ~LI, ... ,Lk and ~i+l=~i; IfHeO then Fi+l = Fi' and ~i+l=~i. C3) If H is a non ground observable, O~ 3 xH and L I, ... ,Lk is not empty then Fi+ 1 = C' u Fi' where C' is ~LI, ... ,Lk and ~i+l=~i; C4) If H is a non ground observable and Lj is any non observable selected literal from L I, ... ,Lk then Fi+ 1 = C' u Fi' where C' is the set of all resolvents of clauses in T U ~i with H ~ L 1, ... ,Lk on the selected literal Lj and ~i+ 1=~i; If Lj is observable the resolutions are done only with clauses in O. 
C5) If H is empty, Lj is any selected literal and Lj is not observable then Fi+ 1 = C' u Fi' where C' is the set of all resolvents of clauses in T u ~i with ~ L 1, ... ,Lk on the literal Lj and De: C', and ~i+l =~i; If Lj is observable the resolutions are done only with clauses in O. In step C I) we "reason forward" from the conclusion H trying to generate a ground observable at the head. Once this happens if this observable is not "true" steps C2), C3) give the denial of the conditions that imply this observable. Step C4) reasons backward from the conditio ns either fail i ng or try i ng to instantiate further the observable head. Step C5) reasons backward from the denials of steps C2), C3) until every possible such backward reasoning branch fails. Note that in the backward reasoning steps observables are resolved from the observations a and not the theory. More importantly notice that we do not reason forward from an observable that is true. Note that we have included the set of hypotheses ~i in the definition of the corroboration derivation although this does not get affected by this part of the procedure. The reason for this is that more efficient extensions of the procedure can be defined by adding extra abducible information in the ~ i duri ng the corroboration phase e.g.the required absence of some abducible A can be recorded by the addition of a new abducible A *. Theorem Let be a Hypothetico-Deductive framework and G a ground atomic formula. If (~G {}) has an adbuctive derivation to (0, Ll) then the set Ll is a corroborated explanation for G. Proof (Sketch) The soundness of the abductive derivations follows directly from the soundness of SLD resolution for definite Horn theories as every abductive derivation step of this procedure can be mapped into an SLD resolution step. To show that the explanation ~ is corroborated let AE S be any ground atomic logical consequence of T u ~ . Since T u ~ is a definite Horn theory A must belong to its minimal model which can be constructed in terms of the immediate consequence operator 'I[van Emden & Kowalski, 1976] . Hence there exists a finite integer n such that A E 'IT v Ll i n (0) and A does not follow from T alone by our assumption on the form of the theory T . The result then follows by induction on the length of the corroboration derivation. 6 Application of Hypotheticodeductive Reasoning In this section we will illustrate hypotheticodeductive reasoning with some examples. Before this it is worth pointing out that existing abductive diagnosis techniques (e.g. [Poole et aI., 1987], [Davis, 1984], [Cox & Pietrzkowski, 1987], [Genesereth, 1984], [Reggia et aI., 1983], [Sattar & Goebel, 1989]) can be accommodated within the HD framework. For example in the diagnosis of faults in electrical circuits hypothetico-deductive reasoning exhibits similar behaviour to [Genesereth, 1984], [Sattar & Goebel, 1989]. Problems and domains which are ideally suited to the application of hypothetico-deductive reasoning exhibit two characteristics. Firstly, they have a large number of 550 possible explanations in comparison to the number of empirical consequences of each of those explanations. Secondly, they have a minimal amount of observational data pertaining to a given explanation so that corroboration failure is maximized. 
To illustrate the manner in which general hypothetico-deductive reasoning deals with differing but compatible explanations, let us consider the example of abdominal pain first presented by [Pople, 1985] and axiomatized in [Sattar & Goebel, 1990]. The axioms are reproduced below. To allow the possibility of several diseases occurring simultaneously, the three expressions which capture the fact that the symptoms (nausea, irritation_in_bowel, and heartburn) are incompatible, have been omitted. Theory T2 abdominaCpain_symp(X) problem_is(indigestion) ~ ~ has_abdominal-pain abdominal_pain_symp(nausea) problem_is(dysentry) ~ abdominal-pain_symp(irritation_in_bowel) problem_is(acidity) ~ abdominal_pain_symp(beartbum) Now consider the following observations: Observations 0 has abdominal pain abdominal pain symp(nausea) Abducibles, A {problem_is(indigestion), problem_is(dysentry), problem_is(acidity) } Observables, S {has_abdominal_pain, abdominal_pain_symp(nausea), abdominal_pain_sympCirritation_in_bowel), abdominal_pain_symp(heartburn) } There are three possible potential explanations for the observation "has_abdominal_pain". Since they are not mutually incompatible (it is possible to have all three diseases, for example), there is no crucial literal which can help us distinguish between the three explanations. There is thus no "best" explanation from this pOint of view. From the point of view of hypothetico-deductive reasoning however, one of the explanations stands apart from the others. On the basis of all the currently available evidence "problem_is(indigestion)" is completely corroborated. The two remaining explanations remain possible but uncorroborated; that is to say there is no supplementary evidence in support of them. Experiments might be performed (testing for "abdominal_pain_symp(irritation_in_bowel)", and "abdominal_pain_symp(heartburn)") which could corroborate one or both of the others, which would lead us to extend our explanation. Since physical incompatibilities are rare in common-sense reasoning, hypothetico-deductive reasoning has an advantage in being able to offer a (revisable) "best" explanation based on the currently available evidence, in spite of the absence of possible crucial experiments. It is important to appreciate that it is usually impractical to simply construct the hypotheses by performing abduction on .all the observations in 0, since in general there may be an extremely large number of them. Moreover, only a few may be relevant to the particular observation for which we seek an explanation. It might be thought that the checking of all the observational consequences of some explanation might be equally impractical: there might be an infinite number of them as well. However, it must be borne in mind that we are only considering the representation of common-sense; we would normally ensure that there are only a small number of observable consequences in which we would be interested. We would define our set of observables, S, accordingly. So, for instance, in the fermentation example below we represent certain critical times (often referred to as "landmarks") at which we might perform observations. Similarly, in the "stolen car" example which we present later, we restrict observables to events that occurred at some specific pOint in time. One application area in which incomplete information is intrinsic, is that of temporal reasoning. Reasoning about time is constrained by the fact that factual information is only available concerning the past and the present. 
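Returning briefly to the abdominal-pain example, the same illustrative rule/2 encoding used earlier reproduces the behaviour just described: only the indigestion hypothesis is fully corroborated by the currently available observations. The encoding below is ours, not the authors' implementation.

% Theory T2 and the recorded observations, in the illustrative encoding above.
rule(has_abdominal_pain, [abdominal_pain_symp(_)]).
rule(abdominal_pain_symp(nausea), [problem_is(indigestion)]).
rule(abdominal_pain_symp(irritation_in_bowel), [problem_is(dysentry)]).
rule(abdominal_pain_symp(heartburn), [problem_is(acidity)]).

abducible(problem_is(_)).

observable(has_abdominal_pain).
observable(abdominal_pain_symp(_)).

observed(has_abdominal_pain).
observed(abdominal_pain_symp(nausea)).

% ?- explain(has_abdominal_pain, E).
% E = [problem_is(indigestion)] ; [problem_is(dysentry)] ; [problem_is(acidity)].
% ?- corroborated([problem_is(indigestion)]).  succeeds (nausea is observed).
% ?- corroborated([problem_is(dysentry)]).     fails (bowel irritation not observed).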
One application area in which incomplete information is intrinsic is that of temporal reasoning. Reasoning about time is constrained by the fact that factual information is only available concerning the past and the present. By its very nature we must perform temporal diagnosis with no knowledge about the future states of the systems we are trying to model. As an example of temporal diagnosis which illustrates this characteristic, consider an industrial process involving the fermentation of wine. Suppose we are faced with the task of diagnosing whether the fermentation process has proceeded normally, or whether the extremely rare conditions have occurred under which we will produce a vintage wine. To do this we must carry out a test at some time after the winemaking process has begun, such as measuring its pH, its relative density, or its alcohol content. Suppose further that we need to decide on this diagnosis before a certain time, e.g. the bottling time tomorrow. Let us refer to some property of the mixture which would be observed for vintage wine by the symbol p1, and that for ordinary wine as p2. These two properties might be entirely compatible: it is perfectly possible for ordinary wine to be produced under conditions which exhibit p1 (as well as p2), but in such a case it is not the fact that the mixture is ordinary wine that causes p1 to be observed. Now suppose we observe p1 before the bottling time, and suppose there are no further observational consequences for the "vintage wine" hypothesis that are observable before tomorrow. Then the "vintage wine" hypothesis is completely corroborated within the defined time-scale. On the other hand, the "ordinary wine" hypothesis remains at best only partially corroborated. Hypothetico-deductive reasoning would then prefer the "vintage wine" hypothesis over the "ordinary wine" one. The temporal dimension illustrates the ability of hypothetico-deductive reasoning to form diagnoses on the basis of incomplete information. Notice that an extension of the time scale would revise the status of the observable relations, and perhaps the "vintage wine" hypothesis would become only partially corroborated. The application of hypothetico-deductive reasoning to the temporal domain will be discussed in more detail in the next section as an important special case of the integration of hypothetico-deductive reasoning and default reasoning.

7 Hypothetico-deduction with Default Theories

As we discussed above, the aim of hypothetico-deductive reasoning has been to provide a framework in which we can tackle one of the main characteristics of common sense reasoning, namely incomplete information. More specifically, it addresses the fact that we are often forced to form hypotheses and explanations on the basis of limited information. Another important form of reasoning that deals with the problem of incomplete (or limited) information is default reasoning (see e.g. [Reiter, 1980]). We can then enhance the capability of each framework separately to deal with this problem of missing information by integrating them together into a common framework. So far we have only considered the application of hypothetico-deduction to classical theories. In this section we study its application to default theories incorporating negation-as-failure (NAF) from Logic Programming. We will then apply this adaptation of hypothetico-deduction to temporal reasoning problems formulated within the event calculus, where NAF is used to represent default persistence in time ([Kowalski & Sergot, 1987], [Evans, 1989]).
The approach we adopt is to consider only classical theories to which non-monotonic reasoning mechanisms such as default and hypothetico-deductive reasoning are applied (in contrast to non-monotonic logics). The motivation, as before, is to separate representation (classical logic) from reasoning (non-monotonic). Recent formalizations of the semantics of negation-as-failure [Eshghi & Kowalski, 1989], [Kakas & Mancarella, 1990], [Dung, 1991], [Kakas & Mancarella, 1991] have adopted a similar point of view. This approach means that hypothetico-deductive reasoning can be applied to the default theories of any system which separates these two components, e.g. circumscription [McCarthy, 1980].

Following this work, we associate to any general logic program, P, (Horn clauses extended with negation-as-failure) a classical theory, P', as follows. Each negative condition, not p, where not denotes the negation-as-failure operator, is regarded as a single new positive atom. This can be made explicit by replacing each such negative literal, not p, by a syntactic variant, say p*, to give the Horn theory P'. The model-theoretic extension of the new symbol is intended to be the complement of the old one, so that we can omit the not. To take a more meaningful example, we might replace "not alive" with "dead". These new symbols, "p*" or "dead", are then defined to be abducible predicates. The above authors show that with this view it is possible to understand (and generalize) the stable model semantics [Gelfond & Lifschitz, 1989] for NAF in logic programming. (Note that this is also the approach taken more generally in [Poole, 1988] for understanding default reasoning through abduction, by naming the defaults and considering these as assumptions.)

We can then apply an adapted formulation of hypothetico-deductive reasoning to these classical Horn theories P' corresponding to general logic programs P. As above, we have a 4-tuple in which the set, A, of abducibles has been extended with new abducibles, e.g. "p*", "dead", which name the different NAF default assumptions. Hence, given such a 4-tuple, a corroborated explanation Δ for an observation G is a set of ground atomic well-formed formulae which fulfils all of the following criteria:

(1) Each formula in Δ is a member of A. Let Δ = ΔD ∪ ΔH, where ΔD denotes the subset of abducibles corresponding to NAF.
(2) P' ∪ Δ ⊨ G.
(3) If P' ∪ Δ ⊨ n and n ∈ S, then n ∈ O.
(4) There exists a stable model M of P' ∪ ΔH ∪ O such that the negations corresponding to ΔD hold in M (i.e. are contained in the complement of M). (More generally, we can use recent extensions of stable models, e.g. preferred extensions or stable theories as defined in [Dung, 1991] and [Kakas & Mancarella, 1991] respectively.)

This is a direct extension of the previous definition of hypothetico-deductive reasoning. The extra condition (4) captures the default reasoning present in the theory P (or P'). This is clearly separated in this condition, although it does play an important role in the generation of explanations by rejecting explanations that do not satisfy it. This has the effect of adding extra abducibles to the Δ to make it acceptable. For example, in the theory

  G ← p*
  p ← q*
  q ← a

although {p*} is an explanation for G, it is not accepted until the abducible "a" is added to it, which ensures that this default assumption {p*} is valid. In addition, condition (4) also ensures that any default assumption (abducible) in Δ is compatible with the observations O. Note that we could have chosen to put together conditions (2) and (4) as "G is true in a stable model of P ∪ ΔH" for generating the explanations Δ, and use condition (4) solely for the purpose of ensuring that ΔD is compatible with the observations O.
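The renaming and the role of condition (4) can be seen on the three-clause example just given. The Python sketch below is illustrative only: programs are ground, the renaming produces the Horn theory P', and stable models are found by brute force through the Gelfond-Lifschitz reduct; the atom names follow the example.

    # The renaming "not p" -> "p*" and a brute-force stable-model check on the
    # example G <- p*, p <- q*, q <- a.  A rule is (head, pos_body, neg_body);
    # "not p" is what gets renamed.
    from itertools import chain, combinations

    P = [("g", [], ["p"]),       # G <- p*   (i.e. g <- not p)
         ("p", [], ["q"]),       # p <- q*   (i.e. p <- not q)
         ("q", ["a"], [])]       # q <- a

    def rename(program):
        """The Horn theory P': each 'not p' becomes the new positive atom 'p*'."""
        return [(h, pos + [n + "*" for n in neg], []) for h, pos, neg in program]

    def stable_models(program, atoms):
        def least_model(horn):
            m, changed = set(), True
            while changed:
                changed = False
                for h, pos, _ in horn:
                    if h not in m and all(b in m for b in pos):
                        m.add(h); changed = True
            return m
        models = []
        for guess in chain.from_iterable(combinations(sorted(atoms), r)
                                         for r in range(len(atoms) + 1)):
            m = set(guess)
            reduct = [(h, pos, []) for h, pos, neg in program if not (set(neg) & m)]
            if least_model(reduct) == m:
                models.append(m)
        return models

    atoms = {"g", "p", "q", "a"}
    print(rename(P))
    print(stable_models(P, atoms))                    # only {'p'}: p is true
    print(stable_models(P + [("a", [], [])], atoms))  # only {'a','q','g'}: p is false

Without the abducible a, p is true in the only stable model, so the NAF assumption p* in the explanation cannot hold; adding a yields a stable model in which p is false, matching the discussion of condition (4) above.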
Although at first sight it might seem appropriate to allow default reasoning during the corroboration of an explanation, this is not the case, as indicated by condition (3). The reason for this is clear: if we allow it, then the corroboration process will not be for the explanation Δ alone, but for Δ plus any additional default assumptions made in arriving at the observable test. In other words, we would not want to reject an explanation Δ by failure to corroborate an observation that is not a consequence of Δ alone but of Δ with some additional default assumptions.

Let us now indicate how the proof procedure for hypothetico-deductive reasoning, defined earlier, needs to be extended to deal with this more general formulation where our theories are general logic programs. The first thing to notice is that, as indicated by condition (3), the corroboration phase of the procedure remains unchanged apart from the fact that it will also be applied whenever a NAF hypothesis, "p*" (or "not p"), is added to the explanation. Similarly, the abductive derivation phase remains as before, with the set of abducibles enlarged to include the NAF default assumptions. The main extension of the procedure arises from the need to implement the new condition (4). This can be done by adopting the abductive proof procedure developed in [Eshghi & Kowalski, 1989], [Kakas & Mancarella, 1990b], [Kakas & Mancarella, 1990c] for NAF, which is an extension of SLDNF. A new type of derivation, called a consistency derivation, is introduced, interleaved with the abductive phase of the procedure, whenever a NAF hypothesis, "p*" (or "not p"), is required in the explanation. Its purpose is to ensure that "p*" (or "not p") is a valid NAF assumption by checking that p does not succeed. This involves reasoning backwards from p in all possible ways and showing that each such branch ends in failure. During this consistency check for some NAF hypothesis, "p*" (or "not p"), it is possible for new abductive phases to be generated whenever the failure of some consistency branch reduces to showing that some other NAF default assumption, e.g. "q*" (or "not q"), does not hold in the theory P' ∪ Δ. To ensure this, the procedure starts a new abductive phase to show that q holds, where it is possible that new hypotheses may be added to the explanation if this is needed to prove q. Then, with this enlarged explanation, "q*" (or "not q") is not a valid (default) NAF assumption (as q holds) and so the original consistency branch cannot succeed. In the example above, the abducible "a" in the explanation {p*, a} for G is generated during the consistency check of p* (or not p) as described here. More details about this extension of the proof procedure can be found in the references above.

8 Application of HD Reasoning to Temporal Reasoning

As an example of the application of the above extended hypothetico-deductive mechanism, let us consider temporal reasoning with the Event Calculus [Kowalski & Sergot, 1987], where NAF is used to express default persistence in time. The Event Calculus represents properties which hold over intervals of time. They are initiated and terminated by events which happen at particular instants of time.
NAF is used to conclude that a property is not "clipped" or "broken" over an interval of time, achieving default persistence. Variants of the two main axioms, which define when a property "holds" and when a property is "broken", are given below. holds-at(p,t2) f- happens-at(e,t 1) " initiates(e,p) " t1 < t2 " not broken-during(p, }. M is a stable model of P iff M is a minimal model of GL(P,M) [Pr2,GL]. It has been showed [DK] that each minimal Herbrand model of a positive disjunctive programs is a model of the Clark's completion of N(P). In this chapter, we are interested in the more general -question about the relationship between the stable models of P and N(P). We introduce now the acyclic disjunctive programs. The following theorem shows the equivalence between P and N(P) for acyclic disjunctive programs. Definition A disjunctive program P is acyclic if it is possible to decompose the Herbrand base of P into disjoint sets, called strata Ho,Hl' ... '~' .. where i is a natural number so that for each ground clause in Gp (i) all Ci belong to the same stratum, say ~. (ii) all Ai and Bi belong to U { Hj I j < r } II Since acyclic programs are locally stratified, their intended semantics is the perfect model semantics. 3. Transforming Acyclic Disjunctive Programs into Normal Programs Let us introduce some new notations. Let D be a disjunction of atoms. D is canonical if the atoms in D are pairwise different. For each disjunction D, the canonical Theorem 1 Let P be an acyclic disjunctive program P, and M be a Herbrand interpretation of P. Then M is a stable model of P iff M is a stable model of N(P). Proof "=>" Let Q = GL(Gp,M). Since M is a stable model of P, M is a minimal model of Q. Since M is a minimal model of Q, for each A E M, there is a clause A v Al v.. v An <- Body in Q such that for each i: Ai e. M and Body is true in M. Hence, for each A E M, there is a clause A <- Body' in GN(P) such that Body' is true in M. Thus, there exists a clause C' in GL(GN(P),M) such that head(C')=A and body(C') is true in M. Since P is acyclic, GL(GN(p),M) is acyclic, too. It follows, that M is the least Herbrand model of GL(GN(P),M). So M is a stable model of N(P). "<=" Let M be a stable model of N(P). Since GL(GN(P),M) = GL(N(GL(Gp,M)),M), M is also a stable model of N(GL(Gp,M)). Thus M is a minimal model of GL(Gp,M). Hence M is a stable model of P. II 557 Corollary Let P be an acyclic disjunctive program, N (P) be its normal fann. Then a Herbrand interpretation M is a perfect model of P iff M is a stable model of N(P). Summary Let P be an acyclic disjunctive program, and L be a ground literal. II 2) P U {L} is stable-consistent iff The following example shows that in general, the above theorem does not hold. N(P) U {L} is stable-consistent iff Example Let P: a <- b b <- a avb N(P): a <- b b <- a The question of basic interest to us now is: a <-l b b <-l a (*) It is clear that P is not acyclic. It is easy to see that N (P) has no stable model while the unique minimal model of P is {a,b}. II Since each locally stratified disjunctive program posseses at least one perfect model [Prl,Pr2], it is obvious that there exists at least one stable model for N(P). So Corollary comp(N(P» U {L} is consistent. II If P is acyclic, then N(P) posseses at least one stable model. II The following theorems give important characterizations of the normal form of a acyclic disjunctive program. "Given an acyclic disjunctive program P and a ground literal L, is P U {L} stable-consistent ?" 
Theorem 2 Let P be an acyclic disjunctive program. Then each stable model of N(P) is a Herbrand model of comp(N(P)) and vice versa, where comp(N(P)) denotes the Clark's predicate completion [Cla,Llo] of N(P).

Theorem 3 The three-valued semantics and the two-valued semantics of comp(N(P)) are equivalent in the sense that each three-valued model of comp(N(P)) can be extended into a two-valued one.

Let L be a ground literal. We say that L holds with respect to the stable semantics of P, written P ⊨ L, if L is true in each stable model of P. We say P ∪ {L} is stable-consistent if there exists one stable model of P in which L is true.

Eshghi and Kowalski have developed an abductive procedure [EK,Dun] which takes as input a query G and a normal program P, and delivers as output a set of ground negative literals H such that P ∪ H ∪ {G} is stable-consistent. From the results obtained above, it is clear that this abductive procedure can be used as a proof procedure for the question (*).

4. The Eshghi and Kowalski Abductive Procedure

Before presenting the formal definition of the abductive procedure, let us explain the algorithm informally by an example.

Example
  P:  p ← ¬q
      q ← ¬p
We want to check whether p belongs to some stable model of P, i.e. whether P ∪ {p} is stable-consistent. It is clear that SLDNF-resolution will not terminate for this goal due to the existence of a negative loop. To avoid getting trapped in this loop, the abductive procedure uses a loop check by "storing" all "encountered" negative literals in a set H. If a selected subgoal belongs to H, then the respective goal is simplified by deleting the selected subgoal from it.

[The derivation trees for this example, and parts of the formal definitions of the abductive and consistency derivations, are lost in this copy.]

An abductive derivation from (G1,H1) to (Gn,Hn) (wrt P) is a sequence such that, for each i, Gi+1 = ← l' and Hi+1 = H'. An abductive refutation is an abductive derivation to a pair ([],H). A consistency derivation from (F1,H1) to (Fn,Hn) (wrt P) is a sequence such that, for each i, if there is an abductive derivation from (← k, Hi) to ([],H') then Fi+1 = Fi' and Hi+1 = H'; if l' is not empty then Fi+1 = {← l'} ∪ Fi' and Hi+1 = Hi. A consistency derivation of the goal ({G}, ...

We say that the abductive procedure is complete with respect to the stable semantics if, for each ground literal L, if P ∪ {L} is stable-consistent then there exists a refutation for the goal (← L, {}).

Proof (Sketch) Let H0,...,Hk,... be the strata of P. Let Pi consist of those clauses A1 ∨ ... ∨ An ← Bd in Gp such that all Aj belong to Hi. By induction, we can prove that for each i the stable semantics and the preferential semantics [Dun] of Pi coincide. It follows then that the stable and preferential semantics of P coincide. The theorem follows immediately from the fact that the abductive procedure is sound wrt preferential semantics [Dun].

Using the Abductive Procedure for Skeptical Reasoning

The question of this chapter is: "Given a logic program P and a ground literal L, does L hold with respect to the stable semantics of P?"

The following lemma shows that if the abductive procedure is complete, then it can be used as a proof procedure for skeptical reasoning.

Lemma Let L be a ground literal and assume that the abductive procedure is sound and complete with respect to the stable semantics. If there exists no refutation for (← ¬L, ...
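Theorem 2 ties the stable models of N(P) to the Clark completion comp(N(P)) for acyclic P. A small sketch of the completion for ground normal programs, applied to the non-acyclic example N(P) of the previous section, shows why the acyclicity assumption matters; the formula formatting below is ad hoc.

    # Clark's completion of a ground normal program: each atom is made equivalent
    # to the disjunction of its rule bodies (atoms with no rule become false).
    # Applied to N(P) = {a <- b, b <- a, a <- not b, b <- not a}.

    def completion(rules, atoms):
        comp = {}
        for a in sorted(atoms):
            bodies = []
            for head, pos, neg in rules:
                if head == a:
                    lits = pos + ["~" + n for n in neg]
                    bodies.append(" & ".join(lits) if lits else "true")
            rhs = " | ".join("(" + b + ")" for b in bodies) if bodies else "false"
            comp[a] = a + " <-> " + rhs
        return comp

    NP = [("a", ["b"], []), ("b", ["a"], []), ("a", [], ["b"]), ("b", [], ["a"])]
    for formula in completion(NP, {"a", "b"}).values():
        print(formula)
    # a <-> (b) | (~b)
    # b <-> (a) | (~a)
    # Both equivalences are satisfiable (they force a and b to be true), so
    # comp(N(P)) has a model even though N(P) has no stable model; the example
    # program is not acyclic, which is why Theorem 2 requires acyclicity.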
It is not difficult to see that if P is an acyclic disjunctive program then N(P) is always p-acyclic. Note that positive .cyclicity is different to local stratifiability, i.e. there exists programs which are p-acyclic and not locally stratified and vice versa. The atom dependency graph of P is a graph with ground atoms as its nodes such that there exists a positive (resp. 560 negative) edge from A to B if A occurs in the head, and B occurs positively (resp. negatively) in the body of some clause C in Gp • An infinite path (AI'''.An , •• ) of pairwise different atoms in the atom dependency graph of P is said to be a negative infinite loop if the path contains infinitely many negative edges. P is said to be free of infinite negative loop, written INL-free, if there exists no negative infinite loop in the atom dependency graph of P. A program P is allowed [LIo] if each clause in P satisfies the condition that each variable appearring in the clause appears also in a positive subgoal in the clause body. References [ABW] [AB] [Bez] [Cav] Theorem 5 (Completeness of the Abductive Procedure) [CIa] Let P be an allowed, p-acyclic, and NIL-free normal program, and L be an arbitrary ground literal. Then the abductive procedure will terminate for the goal «-L,e true in any model of P, by the first rule, and so, the second rule cannot contradict the assigned meaning. Another way to understand this is that one may safely assume IVC using a form of CWA on c, since IVa may not be consistently assumed. = {}. However, when relying on the absence of present evidence about some atom A, we do not always want to assume that IV A holds, since there may exist consistent assumptions allowing to conclude A. Roughly, we want to define the notion of concluding for the truth of a negative literal IVA just in case there is no hard nor hypothetical evidence to the contrary, i.e. no consistent set of negative . assumptions such that IVA is untenable. Consider P = {a +-lVbj b +-lVaj C +- a}. If we interpret the meaning of this program as its WFM (which is empty), and as we do not have a, a naive CWA could be tempted to derive IVC based on the assumption IVa. There is however an alternative negative assumption IVb, that if made, defeats the assumption IVa, i.e. the assumption IVa may not be sustained since it can be defeated by the assumption IVb. We will define later more precisely the notions of sustainability and tenability. Both programs above have empty well founded 563 models. We argue that WFS is too careful, and something more can safely be added to the meaning of program, thus reducing the undefinedness of the program, if we are willing to adopt a suitable form ofCWA. We argue that a set CW A( P) of negative literals (assumptions) added to a program model MOD(P) by CWA must obey the four principles: 1. MOD(P) U CW A(P) ~ L for any ",L E CW A(P). This says that the program model added with the set of assumptions identified by the CWA rule must be consistent. 2. There is no other set of assumptions A such that MOD(P) U A F L for some ",L E CW A(P). I.e. CW A(P) is sustainable. 3. CW A(P) must be maximal. 4. CW A(P) must be unique. The paper is organized as follows: in the next section we present some basic definitions. In section 3 we introduce some new definitions, capturing the concepts behind the semantics, accompanied by examples illustrating them. Models are defined and organized into a lattice, and the class of sustainable A-Models is identified. 
In section 5 we define the O-Semantics of a program P based on the class of maximal sustainable tenable A-Models. A unique model is singled out as the O-Model of P. Afterwards we present some properties of the class of A-Models. Finally, we relate to other semantics and present conclusions. 2 Language Here we give basic definitions and establish notation ([Monteiro, 1991]). A program is a set of rules of the form: H +- B I , .•. ,i!n,"'CI , ... ,,,,Cm (n 2:: 0, m 2:: 0) or equivalently H +- {B I , ••• , Bn}U "'{ CI, ... ,Cm }, where ",{AI, ... ,An} is a shorthand for {",AI, ... , ",An}, and", C is short for "'{ CI, ... ,Cm }; H" Bi and Cj are atoms. The Herbrand Base B(P) of a program P is defined as usual as the set of all ground atoms. An interpretation I of P is denoted by TU ",F, where T and F are disjoint subsets of B(P). Atoms in T are said to be true in I, atoms in F false in I, and atoms in B(P) - (T U F) undefined in I. In an interpretation TU '" F a conjunction of literals {BI, ... , Bn}U '" {CI, ... ,Cm } is true iff {BI, ... , Bn} ~ T and {CI, ... , Cm} ~ F, is false iff {B I , ... ,Bn}nF # 0 or {CI, ... ,Cm}nF # 0, and is undefined iff it is neither true nor false. 3 Adding Negative Assumptions to a Program Here we show how to consistently add negative assumptions to a program P. Informally, it is consistent to add a negative assumption to P if the assumption atom is not among the consequences P after adding the assumption. We also define when a set of negative assumptions is defeated by another, and show how the models of a program, for different sets of negative assumptions added to it, are organized into a lattice. We begin by defining what it means to add assumptions to a program. This is achieved by substituting true for the assumptions, and false for their atoms, in the body of all rules. Definition 3.1 (P+A) The program P + A obtained by adding to a program P a set of negative assumptions A ~"'B(P) is the result of: • Deleting all rules H +- {B I , ••. , Bn}U '" C from P, such that some Bi E A • Deleting from the remaining rules all fVL E A Definition 3.2 (Assumption Model) An Assumption Model of a program P, or A-Model for short, is a pair (A; M) where A ~'" B(P) and M=WFM(P+A). Among these models we define the partial order (AI; M I ) :Sa (A 2 ; M 2 ) iff Al ~ A 2 • On the basis of set union and set intersection among the sets A of negative assumptions, the set of all A-Models becomes organized as a complete lattice. Having defined assumption models we next consider their consistency. According to the CWA principles above, an assumption '" A cannot be added to a program P if by doing so A is itself a consequence of P, or some other assumption is contradicted. :Sa in the following way: Definition 3.3 (Consistent A-Model) An AModel (A; M) is consistent iff A U M is an interpretation, i.e. there exists no assumption fVL E A such that L E M. 564 Example 1 Let P = {c +-"" b; b +-"" a; a +-"" a}, whose WFM is empty. The A-Model ({ "'a}; {a, b, ""c}) is inconsistent since by adding the assumption ""a then a E W F M(P + {""a}). The same happens with all A-Models containing the assumption ""a. The A-Model ({ ""b, ""c}; {c}) is also inconsistent. Thus the only consistent AModels are ({};{}), ({"'b};{c}) and ({""c};{}). 0 Lemma 3.1 If an A-Model AM is inconsistent then any AM' such that AM ~a AM' is incon~ sis tent. Proof;[sketch] We prove that for all ""a' E B(P), if (A; W F M(P + A) is inconsistent then (A U {""a'};W F M(P + A U {""a'}) is also inconsistent. 
By definition of consistent A-Model: 3 ",b E A I b E W F M(P + A), so it suffices to guarantee that: b ¢ W F M(P + A U {""a'}) --. a' E W FM(P + A U {"'a'}). Consider b ¢ W F M(P + A U {""a'}). Since P + A U {",a'} only differs from P + A in rules with a' or ""a', and since b is true in P + A, it can be shown a' is also true in P + A. As the truth of an atom in the WFM of any program may not rely neither on the truth of itself nor of its complementary, and because the addition of "" a' to P + A only changes rules with ""a' or a', the truth value of a' in P + A U {""a'} remains the same, i.e. a' E WFM(P+AU{""a'}). ¢ ({ ""b}; {c}), i.e. the assumption ""c is unsustainable since there is a set of consistent assumptions (namely {""b}) that leads to the conclusion c. 0 The assumptions part of maximal sustainable AModels of a program P are maximal sets of consistent Closed World Assumptions that can be safely added to the consequences of P without risking contradiction by other assumptions. Lemma 3.2 If an A-Model AM is defeated by another A-Model D, then all A-Models AM' such that AM ~a AM' are defeated by D. Proof: Similar to the proof of lemma 3.1 above. ¢ Lemma 3.3 The A-Model ({}; WFM(P) ways sustainable. is al- Proof: By definition of sustainable. ¢ Theorem 3.4 The set of all sustainable A-Models is nonempty. On the basis of set union and set intersection among its A sets, the A-Models ordered by ~a form a lower semi lattice. Proof: Follows directly from the above lemmas. ¢ A program may have several maximal sustainable A-Models. According to the CWA principles above, an assumption ",A cannot be sustained if there is some set of consistent assumptions that concludes A. We've already expressed the notion of consistency being used. To capture the notion of sustainability we now formally define how an A-Model can defeat another, and define sustainable A-Models as the nondefeated consistent ones. Example 3 Let P = {c +-""c, ""b; b +- a; a +-""a}. Its sustainable A-Models are ({}; {}), ({ ""b}; {}) and ({ ""c}; {}). The last two are maximal sustainable A-Models. We cannot add both ""b and ""c to the program to obtain a sustainable A-Model since ({ ""b, ""c}; {c}) is inconsistent. 0 Definition 3.4 (Defeating) A consistent A-Model (A; M) is defeated by a consistent (A'; M') iff 3 ",a E Ala E M'. 4 Definition 3.5 (Sustainable A-Models) An A-Model (A; M) is sustainable iff it is consistent and not defeated by any consistent A-Model. Equivalently (""S; M) is sustainable iff: S n Uconsistent (AjjMj) Mi = {} Example 2 The only sustainable models in example 1 are ({}; {}) and ({""b}; {c}). Note that the consistent A-Model ({ ""c}; {}) is defeated by The O-semantics This section is concerned with the problem of singling out, among all sustainable A-Models of a program P, one that uniquely determines the meaning of P when the CWA is enforced. This is accomplished by means of a selection criterium that takes a lower semilattice of sustainable A-Models and obtains a subsemilattice of it, by deleting AModels that in a well defined sense are less preferable, i.e. the untenable ones. Sustainability of a consistent set of negative assumptions insists that there be no other consistent 565 set that defeats it (Le. there is no hypothetical evidence whose consequences contradict the sustained assumptions). Tenability requires that a maximal sustainable set of assumptions be not contradicted by the consequences of adding to it another competing (nondefeating and nondefeated) maximal sustainable set. 
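Definitions 3.1 to 3.5 can be exercised mechanically on Example 1's program. The Python sketch below is an illustration under stated assumptions: programs are ground and finite, P + A follows Definition 3.1, the well-founded model is computed with the standard alternating-fixpoint construction (an implementation choice not spelled out in the paper), and candidate assumption sets are enumerated by brute force.

    # Ground sketch for Definitions 3.1-3.5 on Example 1:
    # P = {c <- not b,  b <- not a,  a <- not a}.
    from itertools import chain, combinations

    def add_assumptions(rules, assumed):
        """P + A (Definition 3.1): 'assumed' is a set of atoms assumed false."""
        return [(h, list(pos), [n for n in neg if n not in assumed])
                for h, pos, neg in rules
                if not any(b in assumed for b in pos)]

    def wfm(rules, atoms):
        """(true_atoms, false_atoms) of the well-founded model (alternating fixpoint)."""
        def gamma(s):                      # least model of the reduct of 'rules' wrt s
            m, changed = set(), True
            while changed:
                changed = False
                for h, pos, neg in rules:
                    if h not in m and all(b in m for b in pos) and not (set(neg) & s):
                        m.add(h); changed = True
            return m
        true = set()
        while gamma(gamma(true)) != true:
            true = gamma(gamma(true))
        return true, set(atoms) - gamma(true)

    P = [("c", [], ["b"]), ("b", [], ["a"]), ("a", [], ["a"])]
    atoms = {"a", "b", "c"}
    a_models = {a: wfm(add_assumptions(P, set(a)), atoms)
                for a in (frozenset(s) for s in chain.from_iterable(
                    combinations(sorted(atoms), r) for r in range(len(atoms) + 1)))}
    consistent = {a: tf for a, tf in a_models.items() if not (a & tf[0])}

    for a, (t, f) in a_models.items():
        if a not in consistent:
            status = "inconsistent"
        elif any(a & t2 for t2, _ in consistent.values()):
            status = "defeated"            # a consistent A-Model concludes an assumed atom
        else:
            status = "sustainable"
        print(sorted(a), "-> true:", sorted(t), status)
    # Only the assumption sets [] and ['b'] come out sustainable, with true parts
    # [] and ['c']: the A-Models ({}; {}) and ({not b}; {c}) of Examples 1 and 2.

The inconsistent and defeated rows reproduce the discussion in Examples 1 and 2; for instance, assuming not a makes a itself true in WFM(P + {not a}), and the assumption not c is defeated by the consistent A-Model ({not b}; {c}).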
The selection process is repeated and ends up with a complete lattice of sustainable AModels, which defines for every program Pits 0Semantics. The meaning of P is then specified by the greatest A-Model of the semantics, its 0Model. To illustrate the problem of preference among maximal A-Models we give an example. Example 4 Let P = {c ~rvc, rvb; b ~ a; a ~rva}, whose sustainable A-Models are ({};{}), ({rvb};{}), and ( { rvC }, {} ). Because we wish to maximize the number of negative assumptions we consider the maximal A-Models, which in this case are the last two. The join of these maximal A-Models, ({ rvb, rvc}; {c}), is per force inconsistent, in this case wrt c. This means that when assuming rvC there is an additional set of assumptions entailing c, making this A-Model untenable. But the same does not apply to rvb. Thus the preferred AModel is ({ rvb}, {}), and the A-Model ({ rvc}; {}) is said untenable. The rationale for the preference is grounded on the fact that the inconsistency of the join arises wrt c but not wrt b. 0 Definition 4.1 (Candidate Structure) A Candidate Structure CS of a program P is any subsemilattice of the lower semi lattice of all sustainable A-Models of P. Definition 4.2 (Untenable A-Models) Let {(AI; M 1 }, ••• , (An; Mn)} be the set of all maximal A-Models in Candidate Structure GS. Let J = (AJ; MJ) be the join of all such A-Models, in the complete lattice of all A-Models. An A-Model (Ai; Mi) is untenable wrt G S iff it is maximal in G S and there exists rva E Ai such that a E M J. The Candidate Structure left after removing all untenable A-Models of a CS, may itself have several maximal elements, some of which might not be maximal A-Models in the initial CS. If the removal of untenable A-Models is performed repeatedly on the retained Candidate Structure, a single maximal element is eventually obtained, albeit the bottom element of all the CSs. Definition 4.3 (Retained CS) The Retained Candidate Structure R( G S) of a Candidate Structure G Sis: • G S if it has a single maximal A -Model, i. e. G S is a complete lattice. • Otherwise, let U nt be the set of all untenable A-Models wrt GS. Then R(GS) = R(GS Unt). Definition 4.4 (The O-Semantics) The O-Semantics of a program P is defined by the Retained Candidate Structure of the semilattice of all sustainable A-Models of P. Let (A; M) be its maximal element. The intended meaning of P is A U M, the O-Model of P. Theorem 4.1 (Existence of O-Semantics) The Retained Candidate Structure of the semilattice of all sustainable A-Models is nonempty. Proof:[sketch] It suffices to guarantee that at each iteration with more than one maximal A-Model at least one is untenable. This is done by contradiction: suppose no maximal A-Model is untenable. Then their join would be the single maximal sustainable one, and so could not be untenable, in the previous and final iteration; accordingly the supposed models cannot be maximal. When there is a single maximal A-Model then the structure is a complete lattice, since at each iteration only maximal A-Models were removed. This lattice is nonempty since its bottom ({}; W F M(P) is always sustainable and can never be untenable. 0 Proposition 4.1 There exists no untenable AModel wrt a Candidate Structure with a single maximal element. Proof: Since the join coincides with the unique maximal A-Model, which is sustainable by definition of CS, then it cannot be untenable. 0 5 Examples In this section we display some examples and their O-Semantics. 
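Before the examples, the selection of the Retained Candidate Structure (Definitions 4.2 and 4.3) can itself be sketched on Example 4, P = {c ← ∼c, ∼b; b ← a; a ← ∼a}. In the Python fragment below, A-Models are identified with their assumption sets and the well-founded model of the join is supplied as data taken from the example's text rather than recomputed; it illustrates only the removal loop.

    # Iterated removal of untenable maximal A-Models on Example 4's sustainable
    # A-Models.  JOIN_TRUE gives the true part of WFM(P + join), taken from the
    # text: the join ({not b, not c}; {c}).
    JOIN_TRUE = {frozenset({"b", "c"}): frozenset({"c"})}

    def retained(candidates):
        """Remove untenable maximal A-Models until a single maximal one remains."""
        candidates = set(candidates)
        while True:
            maximal = [a for a in candidates if not any(a < b for b in candidates)]
            if len(maximal) == 1:
                return candidates
            join = frozenset().union(*maximal)
            untenable = {a for a in maximal if a & JOIN_TRUE[join]}
            candidates -= untenable        # Theorem 4.1: at least one is untenable

    sustainable = {frozenset(), frozenset({"b"}), frozenset({"c"})}
    print(retained(sustainable))
    # Keeps {} and {'b'}: the maximal retained A-Model assumes only "not b", so the
    # O-Model of Example 4 is {not b}, as stated in the text.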
Remark that indeed the O-Models obtained express the safe CWAs compatible with the WFMs (which are all {}). 566 Example 5 Let P = {a +-"'aj b +- aj C +-"'C, ",bj d +- c} The semilattice of all sustainable A-Models CS is: vice-versa. Thus the O-Semantics is determined by ({}j {}) and ({ ",p}j {}), and the meaning of P is {"'P}, its O-Model. 0 = The Jom of its maximal AModels is ({ ",b, "'c, ",d}j {c, "'d}). Consequently, the maximal A-Model on the right is untenable since it contains "'c in the assumptions, and c is a consequence of the join. So R( C 8) = R( C B') where C8' is: The join of all maximal elements in C 8' is the same as before and the only untenable A-Model is again the maximal one having'" c in its assumptions. Thus R( C 8) = R( C 8") where C 8" is: So the O-Model is {",b, "'d}. Note that if P is divided into PI = {c +-"'c, "'b; d +- c} and P2 = {a +-"'a; b +- a}, the O-models of PI and P2 both agree on the only common literal ",b. So ",b rightly belongs to the O-models of P. 0 Example 6 Let P = {q +-'" pj p +- a; a +-"'bj b +-'" C; C +-'" a}. Its only consistent A-Models are ({};{}), ({",p};{q}) and ({",q};{}). As this last one is defeated by the second, the only sustainable ones are the first two. Since only one is maximal, these two A-Models determine the 0Semantics, and the meaning of P is {"'p, q}, its O-Model. Note that if the three last rules, forming an "undefined loop", are replaced by another "undefined loop" a +-",a, the O-model is the same. This is as it should, since the first two rules conclude nothing about a. 0 Example 7 Let P = {p +- a, b; a +-",b; b +-",a}. The A-Models with '" b in their assumptions defeat A-Models with ",a in their assumptions and , Example 8 Let P {c +-'" C, '" b; b +-"'c, "'b; b +- aj a +-'" a}. Its sustainable A-Models are ({}j {}), ({ ",b}j {}) and ({ ",c}j {}). The join of the maximal ones is ({ ",b, ",c}j {b, c}), and so both are untenable. Thus the Retained Candidate Structure has the single element ({}j {}) and the meaning of P is {} 0 6 Properties of Sustainable A-Models This section explores properties of sustainable A-Models that provide a better understanding of them, and also give hints for their construction without having to previously calculate all AModels. We begin with properties that show how our models can be viewed as an extension to Well Founded Semantics (WFS). As mentioned in [Kakas and Mancarella, 1991a], negation in WFS is based on the notion of support, i.e. a literal ",L only belongs to an Extended Stable Model (XSM) if all the rules for L (if any) have false bodies in the XSM. In contradistinction, we are interested in negations as consistent hypotheses that cannot be defeated. To that end we weaken the necessary (but not sufficient) conditions for a negative literal to belong to a model as explained below. We still want to keep the necessary and sufficient conditions of support for positive literals. More precisely, knowing that XSMs must obey, among others, the following conditions d. [Monteiro, 1991]: • If there exists a rule p +- B in the program such that B is true in model M then p is also true in M (sufficiency of support for positive literals). • If an atom p E M then there exists a rule p +- B in the program such that B is true in M (necessity of support for positive literals). • If all rule bodies for p are false in M then "'p E M (sufficiency of support for negative literals). • If "'P E M then all rules for p have false bodies in M (necessity of support for negative literals). 
567 Our consistent A-models, when understood as the union of their pair of elements, assumptions A and W F M{P + A), need not obey the fourth condition. Foregoing it condones making negative assumptions. In our models an atom might be false even if it has a rule whose body is undefined. Thus, only false atoms with an undefined rule body are candidates for having their negation added to the WFM{P). Proposition 6.1 Let (A; M) be any consistent AModel of a program P. The interpretation A U M 'obeys the first three conditions above. Proof: Here we prove the satisfaction of the first condition. The remaining proofs are along the same lines. If 3p +- bt , ... , bn , I'V Ct, ... , I'V Crn E P I {b t , ... ,bn , I'VCt, ... ,I'VCrn } ~ AU M then bi E M (I ~ i ~ n) and I'VCj E M or I'VCj E A (1 ~ j ~ m). Let p +- b}, ... ,bn,I'VC1, ... ,l'Vck(1 ~ 1,k ~ m) be the rule obtained from an existing one by removing alll'Vcj E A, which is, by definition, a rule of P + A. Thus there exists a rule p +- B in P + A such that B ~ WFM{P+A) = M. Given that the WFM of any program must obey the first condition above, p E WFM{P+A). ~ Next we state properties useful for more directly finding the sustainable A-Models. Proposition 6.2 There exists no consistent AModel (A; M) of P with I'V a E A such that a E WFM(P). Proof: Let (A; M) be an A-Model such that l'Va E A and a E W F M(P). It is known that the truth of any a E WFM(P) cannot be supported neither on itself nor on l'Va. If A = {l'Va} then, lafter adding {l'Va} to the program, the rules supporting the truth of a remain unchanged, i.e. a E W F M{P + {l'Va}), and thus ({ l'Va}; W F M (P + {l'Va})),s inconsistent. It follows, from lemma 3.1, that all A-Models (A; M) such that {l'Va} ~ A are inconsistent. ~ Hence, A-Models not obeying the above restriction are not worth considering as sustainable. Proposition 6.3 If a negative literall'VL E W F M(P) then there is no consistent A-Model (A; M) of P such that LEM. Proof:[sketch] We prove that if L E M for a given A-Model (A; M) of P then (A; M) is inconsistent. If L E M there must exist a rule L +- B, I'Ve in P such that BU I'Ve ~ M U A and BU I'Ve is false in W F M(P), i.e. there must exist L +- B, I'Ve in P with at least one body literal true in M U A and false in W FM{P). If that literal is an element of I'Ve, by proposition 6.2, (A; M) is inconsistent (its corresponding atom is true in W F M (P) and false in M U A). If it is an element of B this theorem applies recursively, ending up in a rule with empty body, an atom with no rules or a loop without an interposing I'V/. By definition of W F M (P + A) the truth value of literals in these conditions can never be changed. ~ Theorem 6.1 If I'VL E W F M(P) then I'VL E M in every consistent A-Model (A; M) of P. Proof: Given proposition 6.3, it suffices to prove that L is not undefined in any consistent A-Model of P. The proof is along the lines of that of the proposition above. ~ Consequently, all supported negative literals in the W F M{P), which includes those without rules for their atom, belong to every sustainable AModel. Lemma 6.2 Let WFM{P) = TU I'VF. For any subset S of I'VF, W F M{P) = W F M(P + S). Proof: This lemma is easily shown using the definition of P + A and the properties of the WFM. ~ Theorem 6.3 Let WFM(P) = TU I'V F and (A; W F M(P + A)) be a consistent A-Model, and let A' = An I'V F. Then WFM(P + A) = WFM(P + (A - A')). Proof: Let pI = P + (A - A'), and W F M(P) = Tu I'VF. By theorem 6.1 I'VF ~ W F M(P I). 
So, by lemma 6.2, WFM(P I) = WFM(PI+ I'VF) = W F M([P+(A-(An I'VF) )]+ I'VF). By definition of P+A it follows that (P+At}+A 2 = P+(A 1 UA 2 ). Thus W F M(P I ) is: ~ = W F M(P + [(A - (An I'VF)U I'VF]) WFM(P+A) 568 This theorem shows that sets of assumptions including negative literals of W F M(P) are not worth considering since there exist smaller sets having exactly the same consequences AU M and, by proposition 6.3 the larger sets are not defeatable by reason of negative literals from the WFM(P). Another important hint for calculating the sustainable A-Models is given by lemma 3.1. According to it one should start by calculating A-Models with smaller assumption sets, so that when an inconsistent A-Model is found, by the lemma, sets of assumptions containing it are unworth considering. Example 9 Let P = {p f-"""a, """b; a f - c, d; c f-"""C; d}. The least A-Model is ({}; {d, """b}) where {d, """b} = W F M(P). Thus sets of assumptions containing """ d or """ b are not worth considering. Take now, for example, the consistent A-Model ({ """a}; {d, """b,p}), which we retain. Consider ({ """c}; {c, a, """p}); as this A-Model is inconsistent we do not retain it nor consider any other AModels with assumption sets containing """c. Now we are left with just two more A-Models worth considering: ({ """P }; {d, """b}) which is defeated by ({"""a}; {d,,,,,,,b,p}); and ({"""p,,,,,,,a};{d,,,,,,,b,p}) which is inconsistent. Thus the only two sustainable A-Models are ({}; {d, """b}) and ({"""a}; {d, """b,p}). In this case, the latter is the single maximal sustainable A-Model, and thus uniquely determines the intended meaning of P to be A U M = {"""a, d, """b, p}. 0 7 Relation to other work Consider the following program ([Van Gelder et al., 1980]): P = {p f - q,"""r,"""s; q f - r,"""p; r f - p, """q; s f-"""P, """q, """r} In [Przymusinska and Przymusinski, 1990] they argue that the intended semantics of this program should be the interpretation {s, """p, """q, """r} due to the mutual circularity of p, q, r. This model is precisely the meaning assigned to the program by the O-Semantics, its O-Model. Note that WFS identifies the (3-valued) empty model as the meaning of the program. This is also the model provided by stable model semantics [Gelfond and Lifschitz, 1988]. The weakly perfect model semantics for this program is undefined as noticed in [Przymusinska and Przymusinski, 1990]. The EWFS [Baral et al., 1990] is also an extension to the WFM based on the notion of GCWA [Minker, 1987]. Roughly EWFS moves closer than the WFM (in the sense of being less undefined) to being the intersection of all minimal Herbrand models of P [Dix, 1991]: EWFM(P) =def WFM(P) + (T(WFM(P)),F(WFM(P))) where: T(I) =def True(I - MIN - MOD(P)),F(I) =aef False(I - MIN - MOD(P)) and I - MIN - MOD ( P) is the collection of all minimal models consistent with the three valued interpretation I. For the program P = {a f-"""a} we have: WFM(P)={}, MIN-MOD(P) = {a} and EWFM(P) = {a} Note this view identifies the intended meaning of rule a f-""" a as the equivalent logic formula a f - -,a, i.e. a. The O-Model of P is empty. The difference between the O-Semantics and EWFS may be noticed in the intended meaning of the two rule program: {a f-"""b; b f-"""a}, which is behind the motivation of the extension EFWS of WFM based on GCWA. EWFS wants to identify a V b as the meaning of this program, which also justifies the identification of a f-"""a with the fact a. The O-Model is empty. 
A similar approach based on the notion of stable negative hypotheses (built upon the notion of consistency) is introduced in [Kakas and Mancarella, 1991b], identifying a stable theory associated with a program P as a "skeptical" semantics for P, that always contains the well founded model. One example showing that their approach is still conservative is: {p f-"""q; q f-"""r; r f-"""P; s f - p}. Stable theories identifies the empty set as the meaning of the program; however its O-Model is {"""s}, since it is consistent, maximal, sustainable and tenable. Kakas (personnal communication) now also obtains this model, as a result of the investigation mentioned in the conclusions of [Kakas and Mancarella, 1991b]. 8 Conclusions We identify the meaning of a program P as a suitable partial closure of the well founded model of 569 the program in the sense that it contains the well founded model (and thus always exists). The extension we propose reduces undefinedness (which some authors argue is a desirable property) in the intended meaning of a program P, by an adequate form of CWA based on notions of consistency, sustainability and tenability with regard to alternative negative assumptions. Sustainability of a consistent set of negative assumptions insists that there be no other consistent set that defeats it (i.e. there is no hypothetical evidence whose consequences contradict the sustained assumptions). Tenability requires that a maximal sustainable set of assumptions be not contradicted by the consequences of adding to it another competing (nondefeating and nondefeated) maximal sustainable set. Acknow ledgements We thank ESPRIT BRA COMPULOG (no. 3012), Instituto Nacional de Investiga~ao CientHica, Junta Nacional de Investiga~ao Cientlfica e Tecnol6gica and Gabinete de Filosofia do Conhecimento for their support. We are indebted to Anthony Kakas and Paolo Mancarella for their previous incursions and intuitions into a similar problem in the setting of their Stable Theories. LUIS Monteiro is thanked for helpful discussions. References [Baral et al., 1990] C. Baral, J. Lobo, and J. Minker. Generalized well-founded semantics. In M. Stickel, editor, CAD'90. Springer-Verlag, 1990. [Dix, 1991] J. Dix. Classifying semantics of logic programs. In A. Nerode, W. Marek, and V. S. Subrahmanian, editors, Logic Programming and NonMonotonic Reasoning'91. MIT Press, 1991. [Gelfond and Lifschitz, 1988] M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In R. A. Kowalski and K. A. Bowen, editors, 5th International Conference on Logic Programming, pages 1070-1080. MIT Press, 1988. [Kakas and Mancarella, 1991a} A. C. Kakas and P. Mancarella. Negation as stable hypothesis. In A. Nerode, W. Marek, and V. S. Subrahmanian, editors, Logic Programming and NonMonotonic Rea30ning'91. MIT Press, 1991. [Kakas and Mancarella, 1991bJ A. C. Kakas and P. Mancarella. Stable theories for logic programs . . In Ueda and Saraswat, editors, International Logic Programming Symposium'91. MIT Press, 1991. [Minker, 1987] J. Minker. On indefinite databases and the closed world assumption. Readings in Nonmonotonic Reasoning. Morgan Kaufmann, 1987. [Monteiro, 1991J L. Monteiro. Notes on the semantics of logic programs. Technical report, DIjUNL, 1991. [Pereira et al., 1991a] 1. M. Pereira, J. J. Alferes, and J. N. Aparicio. Contradiction Removal within Well Founded Semantics. In A. Nerode, W. Marek, and V. S. Subrahmanian, editors, Logic Programming and NonMonotonic Reasoning '91. MIT Press, 1991. 
[Pereira et al., 1991b] L. M. Pereira, J. J. Alferes, and J. N. Aparicio. The extended stable models of contradiction removal semantics. In P. Barahona, L. M. Pereira, and A. Porto, editors, 5th Portuguese AI Conference'91. Springer-Verlag, 1991. [Pereira et al., 1991cJ 1. M. Pereira, J. N. Aparicio, and J. J. Alferes. Counterfactual reasoning based on revising assumptions. In Ueda and Saraswat, editors, International Logic Programming Symposium'91. MIT Press, 1991. [Pereira et al., 1991d] L. M. Pereira, J. N. Aparicio, and J. J. Alferes. Hypothetical reasoning with well founded semantics. In B. Mayoh, editor, Scandinavian Conference on AI'91. lOS Press, 1991. [Pereira et al., 1991e] L. M. Pereira, J. N. Aparicio, and J. J. Alferes. Nonmonotonic reasoning with well founded semantics. In International Conference on Logic Programming'91. MIT Press, 1991. [Przymusinska and Przymusinski, 1990] H. Przymusinska and T. Przymusinski. Semantic Issues in Deductive Databases and Logic Programs. Formal Techniques in Artificial Intelligence. North Holland, 1990. [Przymusinski, 1990J T. Przymusinski. Extended stable semantics for normal and disjunctive programs. In International Conference on Logic Programming'90, pages 459-477. MIT Press, 1990. [Van Gelder et al., 1980] A. Van Gelder, K. A. Ross, and J. S. Schlipf. The well-founded semantics for general logic programs. Journ,al of ACM, pages 81132,1980. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 570 Contributions to the Semantics of Open Logic Programs A. Bossi l , M. Gabbrielli 2 , G. Levi 2 and M.e. Meo 2 2) Dipartimento di Informatica 1) Dipartimento di Matematica Pura ed Applicata Universita di Padova Universita di Pisa Via Belzoni 7, 1-35131 Padova, Italy Corso Italia 40, 56125 Pisa mat010@IPDUNIVX.UNIPD.IT {gabbri,levi,meo}@dipisa.di .. unipi.it Abstract The paper considers open logic programs originally introduced in [Bossi and Menegus 1991J as a tool to build an OR-compositional semantics of logic programs. We extend the original semantic definitions in the framework of the general approach to the semantics of logic programs described in [Gabbrielli and Levi 1991bJ. We first define an ORcompositional operational semantics On(P) modeling computed answer substitutions. We consider next the semantic domain of D-interpretations, which are sets of clauses with a suitable equivalence relation. The fixpoint semantics Fn(P) given in [Bossi and Menegus 1991J is proved equivalellt to the operational semantics, by using an intermediate unfolding semantics. From the model-theoretic viewpoint, an D-interpretation is mapped onto a set of Herbrand interpretation, thus leading to a definition of D-model based on the classical notion of truth. We show that under a suitable partial order, the glb of a set of D-models of a program P is an D-model of P. Moreover, the glb of all the D-models of P is equal to the usual Herbrand model of P \;vhile Fn(P) is a (non-minimal) D-model. 1 Introduction An D-open program [Bossi and Menegus 1991J P is a program in which the predicate symbols belonging to the set D are considered partially defined in P. P can be composed with other programs which may further specify the predicates in D. Such a composition is denoted by Un. Formally, if Pred(P) n Pred(Q) ~ fl then P Un Q = P U Q, otherwise P Un Q is not defined (Pred(P) denotes the predicate symbols in P). 
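The definedness condition for ∪_Ω just given is easy to state operationally. The following Python sketch is illustrative only (programs are kept as lists of clause strings and the predicate sets are supplied explicitly); it anticipates the ancestor/parent programs of Example 1.1 below.

    # Composition of Omega-open programs: P U_Omega Q is defined only when every
    # predicate shared by P and Q is declared open (i.e. belongs to Omega).

    def compose(p, preds_p, q, preds_q, omega):
        shared = preds_p & preds_q
        if not shared <= omega:
            raise ValueError(f"undefined: shared predicates {shared - omega} are not open")
        return p + q

    Q1 = ["anc(X,Y) :- parent(X,Y).",
          "anc(X,Z) :- parent(X,Y), anc(Y,Z).",
          "parent(isaac, jacob).",
          "parent(jacob, benjamin)."]
    Q2 = ["parent(anna, elizabeth).",
          "parent(elizabeth, john)."]

    print(compose(Q1, {"anc", "parent"}, Q2, {"parent"}, omega={"parent"}))
    # Defined: the only shared predicate, parent, is open in both programs.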
A typical partially defined program is a program where the intensional definitions are com- pletely known while extensional definitions are only partially known and can be further specified. Example 1.1 Let us consider the following program Ql = { anc(X, Y) : -parent(X, Y). anc(X, Z) : -parent(X, Y), anc(Y, Z). parente isaac, jacob). parent(jacob, benjamin). } New extensional information defining new parent tuples can be added to QI as follows Q2 = { parente anna, elizabeth). parente elizabeth, john). } The semantics of open programs must be flcompositional w.r.t. program union, i.e. the semantics of PI Un P2 must be derivable from the semantics of PI and P 2 • If D contains all the predicates in P, D-compositionality is the same as compositionality. The least Herbrand model semantics, as originally proposed [van Emden and Kowalski 1976] and the computed answer substitution semantics in [Falaschi et al. 1988,Falaschi et al. 1989a], are not compositional w.r.t. program union. For example, in example 1.1, the atom anc( anna, elizabeth) which belongs to the least Herbrand model semantics of QI U Q2 cannot be obtained from the least Herbrand model semantics of QI and Q2 (see also example 2.1). In this paper we will introduce a semantics for fl-open programs following the general approach in [Gabbrielli and Levi 1991bJ which leads to semantics definitions which characterize the program operational behavior. This approach leads to the introduction of extended interpretations C7rinterpretations) which are more expressive than Herbrand interpretations. The improved expressive power is obtained by accommodating more syn,tactic objects in 7r-interpretations, which are (possibly 571 infinite) programs. The semantics in terms of 7rinterpretations can be computed both operationally and as the least fixpoint of suitable continuous immediate consequence operators on 7r-interpretations. It can also be characterized from the model-theoretic viewpoint, by defining a set of extended models (7rmodels) which encompass standard Herbrand models. In the specific case of n-open programs, extended interpretations are called n-interpretations and are sets of conditional atoms (i.e. clauses such that all the atoms in the body are open). Each n-interpretation represents a set of Herbrand interpretations that could be obtained by composing the open program with a definition for the open predicates. n-interpretations of open programs are introduced to obtain a unique representative model, computable as the least fixpoint of a suitable continuous operator, in cases where no such a representative exists in the set of Herbrand models. The main contribution of this paper is the definition of an OR-compositional (i.e. compositional w.r.t. program union) semantics of logic programs in the style of [Falaschi et al. 1988, Falaschi et al. 1989b]. Other approachs to OR-compositionality can be found in [Lassez and Maher 1984, Mancarella and Pedreschi 1988, Gaifman and Shapiro 1989a, Gaifman and Shapiro 1989b]. An OR-compositional semantics corresponds to an important program equivalence notion, according to which two programs PI and P2 are equivalent iff for any program Q a generic goal G computes the same answers in PI U Q and P2 U Q. An OR-compositional semantics has also some interesting applications. 
Namely it can be used • to model logic languages provided with a module-like structure, • to model incomplete knowledge bases, where new chunks of knowledge can incrementally be assimilated, • for program transformation (the transformed programs must have the same OR-compositional semantics of the original program), • for semantics-based "modular" program analySIS. The paper is organized as follows. Subsection 1.1 contains notation and useful definitions on the semantics of logic programs. In section 2 we define an operational semantics On(P) modeling computed answer substitutions which is OR-compositional. Section 3 introduces a suitable $emantic domain for the On(P) semantics and defines n-interpretations which are sets of clauses modulo a suitable equivalence relation. In section 4 the fixpoint semantics Fn(P), is proved equivalent to the operational semantics by using an intermediate unfolding semantics. Section 5 is concerned with model theory. From the model-theoretic viewpoint, an n-interpretation is mapped onto a set of Herbrand interpretations, thus leading to a definition of n-model based on the classical notion of truth. We show that under a suitable partial order, the glb of a set of n-models of a program P is an n-model of P. Moreover, the glb of all the n-models of P is equal to the usual Herbrand model of P. Moreover, Fn(P) is a (non-minimal) nmodel, equivalent to the model-theoretic semantics defined in [Bossi and Menegus 1991] in terms of Somodels. A comparison between n-models and the So-models is made in section 6. Section 7 is devoted to some conclusive remarks. All the proofs of the results given here can be found in [Bossi et al. 1991]. 1.1 Preliminaries The reader is assumed to be familiar with the terminology of and the basic results in the semantics of logic programs [Lloyd 1987,Apt 1988]. Let the signature S consist of a set F of function symbols, a finite set P of predicate symbols, a denumerable set V of variable symbols. All the definitions in the following will assume a given signature S. Let T be the set of terms built on F and V. Variable-free terms are called ground. A substitution is a mapping {) : V ---7 T such that the set D({)) = {X I {)(X) =I- X} (domain of f)) is finite. If W c V, we denote by {)Iw the restriction of {) to the variables in W, i.e. {)lw(Y) = Y for Y w. c denotes the empty substitution. The composition {)(J of the substitutions {) and (J is defined as the functional composition. A renaming is a substitution p for which there exists the inverse p-l such that pp-l = p-l P = c. The pre-ordering:::; (more general than) on substitutions is such that {) :::; (J iff there exists {)' such that {){)' = (J. The result of the application of the substitution {) to a term t is an instance of t denoted by tf). We define t :::; t' (t is more general than t') iff there exists {) such that t{) = t'. A substitution {) is grounding for t if t{) is ground. The relation :::; is a preorder. ~ denotes the associated equivalence relation (variance). A substitution {) is a unifier of terms t and t' if tf) == t'{). The most general unifier of tl and t2 is denoted by rngu( tll t2)' All the above definitions can be extended to other syntactic expressions in the obvious way. An atom is an object of the form P(tl" .. ,t n) where pEP, t ll ... ,tn E T. rt 572 A clause is a formula of the form H : -L 1 , . . . , Ln with n ;::: 0, where H (the head) and L 1 , ..• ,Ln (the body) are atoms. 
": -" and "," denote logic implication and conjunction respectively, and all variables are universally quantified. If the body is empty the clause is a -unit clause. A program is a finite set of clauses. A goal is a formula L 1 , ... , Lm, where each Li is an atom. By V m'( E) and P1'ed(E) we denote respectively the sets of variables and predicates occurring in the expression E. A Herbrand interpretation I for a program P is a set of ground atoms. The intersection M(P) of all the Herbrand models of a program P is a model (least Herbrand model). M(P) is also the least fixpoint of a continuous transformation Tp (immediate consequences operator) on the complete lattice of Herbrand interpretations. If G is a goal, G ~ p B I , ... , Bn denotes an SLD derivation with fair selection rule of B I , ... ,Bn in the program P where 13 is the composition of the mgu's used in the derivation. G ~ p 0 denotes the refutation of G in the program P with computed answer s'ubstitution 13. A computed answer substitution is always restricted to the variables occurring in G. The notations i, X will be used to denote tuples of terms and variables respectively, ,,,,,hi Ie iJ denotes a (possibly empty) conjunction of atoms. 2 Computed answer substitution semantics for D-open programs The operational semantics is usually given by means of a set of inference rules which specify how derivations are made. From a purely logical point of view the operational semantics is simply defined in terms of successful derivations. However, frol11 a programming language viewpoint, the operational semantics must be concerned with additional information, namely observable properties. A given program in fact may have different semantics depending on which of its properties can be observed. For instance in pure logic programs one can observe successes, finite failure, computed answer substitutions, partial computed answer substitutions or any combination of them. A given choice of the observable induces an equivalence on programs, namely two programs are equivalent iff they are observationally indistinguishable. "\iVhen the semantics correctly captures the observable, two programs are equivalent if they have the same semantics. "\iVhen also compositionality is taken into account, for a given observable property we can obtain different seman- tics (and equivalence relations) depending on which kind of program composition we consider. Indeed, the semantics of logic programs is usually concerned with AND~composition (of atoms in a goal or in a clause body). Consider for example logic programs with computed answer substitutions as observable [Falaschi et al. 1989a]. The operational semantics can be defined as O(P) = {p(X)8 IX are distinct var, p(X) ~p D} where the denotation of a program is a set of nonground atoms, which can be viewed as a possibly infinite program [Falaschi et al. 1989a]. Since we have syntactic objects in the semantic domain, we need an equivalence relation in order to abstract from irrelevant syntactic differences. If the equivalence is accurate enough the semantics is fully abstract. According to [Gabbrielli and Levi 1991b], Herbrand interpretations are generalized by 7r-interpretations which are possibly infinite sets of (equivalence classes of) clauses. The operational semantics of a program P is then a 7r-interpretation I, which has the following property. P and I are observationally equivalent with respect to any goal G. 
This is the property which allows to state that the semantics does indeed capture the observable behavior [Falaschi et al. 1989a]. The following example shows that when considering OR-composition (i.e. union of sets of clauses), non-ground atoms (or unit clauses) are not sufficient any longer to define a compositional semantics. Exalnple 2.1 Lei grams PI ={ U8 con8ider the following pro- q(X): -p(X). 1'(X) : -.s(X). .s( b). pea). P2 = { p( b). According to the prevzous definition of O( P), O(Pd = {pea), q(a), reb), s(b)} and O(P2 ) = {pCb)}. Since O(PI U P2 ) = {p(a),p(b), q(a), q(b), reb), s(b)}, the semantics of the ~mion of the two programs cannot be obtained from the semantics of the programs. In order for a semantics to be compositional, it must contain information in the form of a mapping from sets of atoms to sets of atoms. This is indeed the case of the semantics based on the closure operator [Lassez and Maher 1984] and on the Tp operator [Mancarella and Pedreschi 1988]. If we want a semantics expressed by the program syntax, ORcompositionality can only be obtained by choosing as semantic domain a set of (equivalence classes of) clauses. In example 2.1, for instance, the semantics of PI should contain also the clause q(X) : -p(X). 573 Let us formally give the definition of the program composition we consider. Definition 2.2 Let P be a program and Q be a set of predicate symbols. P is open w.r.t. Q (or Q-open) if the information on the predicates in Q is considered to be partial. Moreover if P, Q are Q-open progra,ms and (Pred(Q) n Pred(P)) ~ Q then P Uo Q is the Q-open program PUQ. If(Pred(Q)nPred(P)) Cl Q then P Uo Q is not defined. Note that when considering an Q-open program P and an Q'-open program Q, the composition of P and Q is defined only if (Pred(Q) n Pred(P)) ~ (Q n Q'). Moreover, the composition of P and Q is a W-open program, where W = Q U Q'. The definition of any predicate symbol p E Q in an Q-open program P can always be extended or refined. For instance in example 1.1 program Ql is open w.r.t. the predicate parent and this predicate is refined in program Q2. Therefore, a deduction concerned with a predicate symbol of an Q-open program P can be either complete (when it takes place completely in the program P) or partial (when it terminates in P with an atom p(i) such that p E Q and p( i) does not unify with the head of any clause in P). A partial deduction can be completed by the addition of new clauses. Thus we have an hypothetic deduction, conditional on the extension of predicate p. Let us consider again the program PI of example 2.1 and assume = {pl. Then, the goalr'(X) produces a complete deduction onl;y, comput.ing the answer substitution {Xjb}. The goal q(X) produces a complete deduction, computing t.he answer substitution {X j a} and an hypothetical deduction returning any answer that could be computed by a definition of p external to Pl' The goal q( b) instead has one hypothetical deduction only, conditional on the provability (outside PI) of p( b). V\Te want to express this hypothetical reasoning, i.e. that. q( b) is refutable if p( b) is refutable. Hence we will consider the following operational semantics (recall that by B we denote B l , ... , Bn with n ~ 0). . n Definition 2.3 Let Q be a set of predicate syrnbol8. We define Id(Q) = {p(-X-) : -p(_Y) I p E D, ~t are distinct variables } Definition 2.4 (Q-compositional computed answer substitutions semantics) Let P be a program and let P* = P U Id(Q). Then we define Oo(P) = {A: -B2 I p(X) ~p Bl -0.p. 
B2 X distinct variables, A = p( _Y ){J'y, {Pred( .Hz)} ~ Q} The set of clauses I d( Q) in the previous definition is used to delay the evaluation of open atoms. This is a trick which allows to obtain by using a fixed fair selection rule R, all the derivations P(Xl' ... ,Xn ) ~ P B l , ... , En which use any selection rule R', for P1'ed(B l , ... ,Bn) ~ D. Note that t.he first step of the derivation uses a clause in P (instead than in P*) because we want Oo( P) to contain a clause p(-X) : -p(X) if and only if p(_Y) ~p p(~Y). Example 2.5 Let PI, P 2 be the Q-open programs of example 2.1 where Q = {p}. Then On(P2) = {p( b)} and 0dPl ) = {q(X) : -p(X), p(a), q(a), r(b), s(b)}. 0 0 contains eno'ugh information to compnte the semantics of compositions. Indeed O(PI U P2) ~ On(Pl UP2) and On(Pl UP2) = Oo( On(PdUOo(P2)) (see theorem 2.9). Example 2.6 Lei Q = {q,r} and lei Ql, Q2 be the following programs Ql = {p(X, Y) : -1'(X), q(Y). Q2 = { 1'(b). } r(a). Then 00(Q2) = {1'(b)}, 00(Q1)= {p(X, Y) : -r(X), q(Y), p(o., Y) : -q(Y), dan and On(Q1 U Q2) = Oo(Oo(Qd U On(Q2)) = {p(X, Y) : -1'(X), q(Y), p(a, Y) : -q(Y), p(b, Y) : -q(Y), r(a), r(b)} (see theorem 2.9). Note that Oo(P) is essentially the result of the partial evaluation [Lloyd and Shepherdson 1987] of P, where derivations terminate at open predicates. This operational semantics fully characterizes hypothetic deductions, conditional on the extension of the predicates in n. Indeed the semantics of a program P can be viewed as a possibly infinite set of clauses and the partial computed answer substitutions can be obtained by executing the goal in the "program". The equivalence (~n) on programs induced by the computed ansvver substitution observable when considering also programs union, can be formally defined as follows. Definition 2.7 Let PI, P2 be Q-open programs. Then PI ~n P 2 if for e'very goal G and for every program Q s.t. Pi Un Q, i = 1,2) is defined) {} P UnQ 0 Z.'ff G' 1----+ {}p 1 1p, '1,S a renamG 1----+ P U nQ 0 'were j T 2 mg. On allows to characterize a notion of answer substitution which enhances the usual one, since also (unresolved) atoms, with predicate symbols in Q, are considered. Therefore it is able to model computed answer substitutions in an OR compositional way. The following results show that On(P) is compositional w.r.t. Un and therefore it correctly captures 574 the computed answer substitution observable notion when considering also programs union. where E is the empty renamzng. However" PI ~o P 2 since by considering Q = {q(X,b),q(a,Y)}, Theorem 2.8 Let P be an rl-open program. Then P ~o Oo(P). p(X,Y) ~PIUQ 0 where () = {X/a,Y/b}, while the goal p(X, Y) in the program P 2 U Q can compute either {X/a} or {Y/b} only. Theorem 2.9 Let Pl. P2 be n-open programs and let PI Uo P2 be defined. Then Oo(Oo(Pd Un 00(P2 )) = Oo(Pl Uo P2 ). Corollary 2.10 Let PI, P2 be n-open programs. If Oo(Pd = On(P2 ) then PI ~o P 2 . 3 ,Semantic domain for n-open programs In this section we formally define the semantic domain which characterizes the above introduced operational semantics 0 0 . Since 0 0 contains clauses (whose body predicates are all in n), we. have to accommodate clauses in the intei'pretations we use. Therefore we will define the notion of ninterpretation which extends the usual notion of interpretation since an n-interpretation contains conditional atoms. As usual, in the following, n is a set of predicates. Definition 3.1 (Conditional atoms) An n-conditional atom is a clause A: -B l , .. . , Bn such that Pn;:d(B l , . 
..• Bn) ~ n. In order to abstract fro111 the purely syntactical details, we use the following equivalence ~ on conditional atoms. Definition 3.2 Let Cl = Al : -B l , ... , B n , C2 = A2 : -D l , ... , Dm be cla1tSes. Then Cl :s; C2 iff:::h9 such that 3{i l , . . . , in} ~ {I, ... , rn} such that A(z9 = A 2 , i h =I- i k for h =I- k, and (BliJ, ... , Bn iJ ) = (Dil' ... ,Din ). Moreover we define C1 ~ C2 iff C1 :::; C2 and C2 :::; C1· Note that in the previous definition bodies of clauses are considered as multisets (considering sets would give the standard definition of subsumption). Equivalent clauses have the same body (considered as a multi set ) up to renaming. Considering sets instead of multisets (subsumption equivalence) is not correct when considering computed answer substitutions. The following is a simple counterexample. Example 3.3 Let = p(X, Y) : -q(X, Y), q(X, Y) and C2 = p(X, Y) : -q(X, Y). Let PI = {cd and P 2 = {C2} be D.-open programs 'where D. = {q}. Obviously, considering bodies of cl(J.1I,8es (l.S sets. C1 = C2E C1 Definition 3.4 The n-conditional base, Cn , is the quotient set of all the rl-conditional atoms w. r. t. ~. In the following we will denote the equivalence class of a conditional atom c by G itself, since all the definitions which use conditional atoms are not dependent on the element chosen to represent an equivalence class. Moreover, any subset of Cn will be considered implicitly as an D.-open program. Before giving the formal definition of rl-interpretation, we need the notion of 'u-closed subset of Cn . Definition 3.5 A subset I of Cn is u-closed iff VH : - Bll ... ,Bn E I and VB : - AI, ... ,Am E I s'uch that ~() = mgu(Bi' B), for 1 :::; i :::; n, (H : -B l , ... , B i - l , AI,"" Am, B i + b . · . , Bn){) E I. Moreover if I ~ Cn , 'we denote by j its 'n-closure defined as the least (w. r. t. ~) l' ~ Co 1t-closed such tha.t I ~ 1'. Proposition 4.5 will show that the previous notion of u-closure is well defined. A u-closed interpretation I is an interpretation which, if viewed as a program, is closed under unfolding of procedure calls. Interpretations need to be u-closed for the validity of the model theory developed in section 5. Therefore, in order to define rl-interpretations we will consider uclosed sets of conditional atoms only. Let us now give the formal definition of D.-interpretation. Definition 3.6 An rl-interpretation I is any subset of Co which is tt-closed. The set of all the D.interpretations is denoted by S'. Lemma 3.7 (S',~) is a complete lattice where the minimal element is X ~ S'. 0 and glb(X) = Ux~ x for any In the following the operational semantics On will be formally considered as an D.-interpretation. 4 Fixpoint semantics In this section we define a fixpoint semantics Fn( P) which in the next subsection is proved to be equivalent to the previously defined operational semantics On(P). This can be achieved by defining an immediate consequence operator TJ1 on the lattice (S', ~) of D.-interpretations. Fn( P) is the least fixpoint of TJ1. 575 The immediate consequences operator TJl is strongly related to the derivation rule used for D.open programs and hence to the unfolding rule. Therefore TJl models the observable properties in an OR compositional way, and may be useful for modular (i.e. OR compositional) bottom-up program analysis. 4.1 Unfolding semantics and equivalence results {(A : -LI, . .. ,Ln){} E Cn I 3A: -Bb'" ,Bn E P, 3Bi: -Li E I U Id(D.), i = 1, ... ,11., mi ~ 0 s.t. {} = mgu((Bb"" B n), (B~, ... 
, B~))} To clarify the relations between the operational and the fixpoint semantics, before proving their equivalence, we introduce the intermediate notion of unfolding semantics Un(P) [Levi 1988, Levi and Mancarella 1988]. Un(P) is obtained as the limit of the unfolding process. Since the unfolding semantics can be expressed top-down in terms of the r~ operator, the unfolding semantics can be proved equal to the standard bottom-up fixpoint semantics. On the other hand, since Un(P) and On(P) are based on the same inference rule (applied in parallel and in sequence respectively) Un(P) and Oo(P) can easily be proven equivalent. Proposition 4.2 TJl is contin1W'lls in the complete lattice (~, ~). Definition 4.7 Let P and Q be D.-open programs. Then the unfolding of P w. r. t. Q is defined as Definition 4.1 Let P be an D.-open program. Then TfJ(I) = r~(I) where r~(I) is the operator defined in [Bossi and M enegus 1991 j as follows. r~(I) = The notion of ordinal powers for TJl is defined as usual, namely TJl iO = 0, TJl in+1 = Tj}( TJl in ) and TJl iw = Un>O ( TJl in). Since Tj} is continuous on (8', ~), ~ell known results of lattice theory allow to prove proposition 4.3 and hence to define the fixpoint semantics as follows. Proposition 4.3 Tj} iw is the least fixpoint of Tj} in the complete lattice (~, ~). Definition 4.4 Let P be an D.-open program. The fixpoint semantics Fn(P) of P is defined as Fn(P) =Tj} i~', Remark The original definition of r~( I) does not require D.interpretations to be u-closed subsets of Cn . If we consider an D.-interpretation as any subset of Cn and the r~ operator, even if the intermediate results r~ i 11. are different, the following proposition 4.5 and theorem 4.6 show that the least fixpoint r~ i w is a u-closed set and it is equal to Fn(P) (r~ is continuous on (~(Cn), ~)). Therefore, when considering the fixpoint semantics we can use the r~ operator. Moreover, proposition 4.5 ensures us that the previous notion of u-closure is well defined. Proposition 4.5 Let I ~ Cn and let r9(I) be defined as in definition 4.7. Then the following hold 1. I is u-closed iff 1= r9(I)) 2. for any program p) r~ i w is 'u-closed) 9. I' = r9 i w is the least (w. r. t. set incl1tsion) subset of Cn such that it is ,It-closed and I ~ 1'. Theorem 4.6 Lei P an D.-ope~ program. r~ Fn(P). i w = unffJ(Q)= {(A: -Ll'''' , Ln){} I 3A : -Bl' ... , Bn E P, 3Bi : -Li E I U !d(D.), i = 1, ... ,11., mi ~ 0 s.t. {} =n/'gU((Bl, ... ,Bn),(B~, ... ,B~))} Note that the only difference between unffJ(Q) and r~( Q) is that the second restricts to clauses in Cn the set resulting from the definition. Therefore if I is an D.-interpretation (i.e. I ~ Cn), r~(I) = unffJ(1) holds. In general, r~(I) = tn(unffJ(1)) where tn(P) extracts from a program P an D.-interpretation. Definition 4.8 Let P be an D.-open program. Then we define tn(P)={cEPlcECn }. Definition 4.9 Let P be an D.-open program and let tn(P) be as defined in definition 4,8. Then we define the collection of programs Po =P Pi = Unfpi_l (P) The unfolding semantics Un(P) of the program P is defined as Un(P) = Ln(Pi ). U i=1,2, ... The following theorem states the equality of the unfolding and the operational semantics. Theorem 4.10 Let P be an D.-open program. Then On(P) = Un(P). Note that r~ i 11. + 1 = unf;;, (0), where P~ P and P!+! = 1mffJ(Pf). Therefore we have the following t.heorem. 576 Theorem 4.11 Let P be a program. Then Fo(P) = Uo(P). Corollary 4.12 Let P Fo(P) = Oo(P}. 5 be a program. 
Then Example 5.6 Let us consider the D-open program P = {pea) :' -q(a)} where D = {q}. Then 0 is a (the least) H erbrand model of P. If, by violating the J ~ Atomo(!) condition, {q(a)} E H(0), since {q( a)} is not a H erbrand model of P, 0 would not be an D-model of P. Model Theory As we have shown, the operational and fixpoint semantics of a program P define an D-interpretation I p , which can be viewed as a syntactic notation for a set of Herbrand interpretations denoted by H(Ip). Namely, H( Ip) represents the set of the least Herbrand models of all programs which can be obtained by closing the program Ip with a suitable set of ground atoms defining the open predicates. Our aim is finding a notion of D-model such that Oo(P) (and Fo(P)) are D-models and every Herbrand model is an D-model. This can be obtained as follows. Definition 5.1 Let J be an D-interpretation .. Then we define Atomo( J) = {pC i) I p E D and p( i) is a ground instance of a.n a.tom in J}. Example 5.2 Let D = {p, q} and J = {pea) : -q(b)}. Then Atorno(J) = {p(o.),q(b)}. Definition 5.3 Let I be an D-interpreta.tion for an D-open program. Then we define H(!) = {M(I U J) I J ~ Atomo(I)} where M( K) denotes the least H erbrand model of J(. Example 5.4 Let I interpretation. Then = {p(o.) : -q(b)} be an D- 1) for D = {q} Atomo (I) = {q( b)} and H(!) = {0. {pC a.), q(b)}}, 2) for D = {p, q} Atomo(I) = {p(a),q(b)} and H(!) = {0, {p(o.)}, {p(o.),q(b)}}. Definition 5.5 Let P be an D-open program and_ I be an D-interpretation. I is an D-model of P iff 'If J E H(I), J is a Herbrand model of P. Obviously, in general given a Herbrand modellvI of a program P, M U N is not anymore a model of P for an arbitrary set of ground atoms N. Since we want a notion of D-model which encompasses the standard notion of Herbrand model, the "closure" of the interpretation I can be performed by adding only ground atoms which unify with atoms already in I. The following example 5.6 shows that if such a condition is not satisfied, a standard Herbrand model would not any more be an D-p10del. Example 5.7 Let us consider the program PI where D = {p} of the example 2.1. Then Oo(Pd = {q(X) : -p(X), pea), q(a), reb), s(b)} is an D-model of PI since H(Oo(Pl )) = {H l ,H2 ,H3 , .. • }' where, denoting by [p(X)] the set of ground instances of p(Xo), HI = {p(a),q(a),r(b),s(b)} H2 = {p(a),p(b),q(a),q(b),r(b),s(b)} Hw = {reb), s(b)} U [p(X)] U [q(X)]} and HI, H 2 , ••• ,Hw a.re Herbrand models of Pl' The following proposition states the mentioned properties of D-models. Proposition 5.8 Let P open program. Then 1. every Herbrand model of P is an D-model of P, 2. Oo(P) is an D-model of P. A relevant property of standard Herbrand models states that the intersection of a set of models of a program P is always a model of P. This allows to define the model-theoretic semantics of P as the least Herbrand model obtained by intersecting all the Herbrand models of P. The following example shows that this important property does not hold any more when considering D-models with set theoretic operations. Example 5.9 Let D = {q} and P be the following D-open program P = {pCb) : -q(b), p(X), q(a)}. Then Oo(P) = {pCb) : -q(b), p(x), q(a)} and M(P) = {q(a)} U {pel) I l is a ground term }. By proposition 5.8 Oo(P) and M(P) are D~models ofP. HoweverOo(P)nM(P) = {q(a)} is not an D-model of P. The D-model intersection property does not hold because set theoretic operations do not adequately model the operations on conditional atoms. 
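The construction of H(I) can be made concrete for finite interpretations. In the following minimal sketch (SWI-Prolog assumed; the gcl/2 encoding and the predicate names are ours), the Ω-interpretation I = {p(a) :- q(b)} of Example 5.4 is treated as a ground program, and M(I ∪ J) is computed by naive bottom-up iteration, so that H(I) is obtained by letting J range over the subsets of Atom_Ω(I):

gcl(p(a), [q(b)]).                   % the interpretation I of Example 5.4

% least_model(+J, -M): the least Herbrand model M(I u J), J a list of ground atoms.
least_model(J, M) :- lm_iter(J, [], M).
lm_iter(J, M0, M) :-
    findall(H, ( gcl(H, B), subset(B, M0) ), Hs),
    append(J, Hs, M1u), sort(M1u, M1),
    ( M1 == M0 -> M = M0 ; lm_iter(J, M1, M) ).

% For Omega = {q}, Atom_Omega(I) = {q(b)} and H(I) = { M(I u J) | J subset of {q(b)} }:
% ?- least_model([], M).        M = [].
% ?- least_model([q(b)], M).    M = [p(a), q(b)].
% For Omega = {p, q}, Atom_Omega(I) = {p(a), q(b)}, which adds for instance:
% ?- least_model([p(a)], M).    M = [p(a)].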
Namely, the information of an D-interpretation II may be contained in 12 without II being a subset of 12. In order to define the model-theoretic semantics for Dopen programs as a unique (least) D-model, we ·then need a partial order ~ on D-interpretations which 577 allows to restore the model intersection property. G should model the meaning of D-interpretations, in such a way that (SS, G) is a complete lattice and the greatest lower bound of a set of D-models is an Dmodel. As we will show in the following, this can be obtained by considering G as given: in definition 5.10. According to the above mentioned property, there exists a least D-model. It is worth noting that such a least D-model is the standard least Herbrand model (proposition 5.21). Moreover note that, the most expressive D-model Oo(P) is a non-minimal Dmodel. The following definitions extend those given in [Falaschi et a1. 1989b] for the non compositional semantics of positive logic programs. Definition 5.10 Let II, 12 be D-interpretations. We define • II S 12 UJ VC1 E II 3C2 E 12 s'uch that e2 • II G 12 iff (II S 12) and (12 S II II ~ 12)' SCI' implies Proposition 5.11 The relation S is a preorder (f.nd the relation G is an ordering. Note that if II ~ 12 , then II G 12 , since II ~ [;2 implies II S 12 . The following definitions and propositions will be used to define the model-theoretic semantics. Definition 5.12 Let I be an D-interpretation. We define Min'(I) = {c E I I Ve' E I, c'S c :::} c' = c} and Min(I) = 1I1in'(I). Example 5.13 We show "~1in and jIlin' for the following D-interpretations I and J. Let It is worth noting that V I Min( I) ~ I (recall that I is u-closed) and "~1in(A) = "~lin.(U A). Proposition 5.15 For any set A of D-interpretations there exists the least tlpper bound of A , Iv.b(A). and [ub(A) = UA holds. Proposition 5.16 The set of all the D-interpretations SS with the ordering G is a complete lattice. Co is the top element and 0 is the bottom element. The model-theoretic construction is possible only if D-interpretations can be viewed as representations of Herbrand interpretations. Notice that every Herbrand interpretation is an D-interpretation. The following proposition generalizes the standard intersection property of Herbrand models to the case of Dmodels . Proposition 5.17 Let M be (J. non-empty 8et of Dmodels of an D-open pTOgram P. Then glb(M) is an D-model of P. Corollary 5.18 The set of all the D-rnodels of a program P with the ordering G 7:" (J. complde lattice. Vve are now in the position to formally define the model- theoretic semant.ics. Definition 5.19 Let P be a program. n~ modeltheoretic semantics 1:S the greak~t lower bound of the set of its modeL~: i.e., Alo (P) = glb( {I E '::s' I I is a D-m.odel of P}). 1= {p(x), q(b), p(a), p(a) : -q(b) J={ q(;1.;) : -p(;r), 7'(;1') Proposit.ion 5.21 shovvs t.hat the above defined model-t.heoret.ic semantics is t.he standarclleast. Herbrand model. This fact just.ifies om choice of t.he ordering relat.ion. q(b) : -p(b) q(b) : -p(T) db) Proposition 5.20 For any D-morlel I there exists a standard H erbrand model I' ,,·u.ch that I' G I. Proposition 5.21 The least "ta.ndo.rd model is the lea,8t D-model. Then Min'(I) = lvIin(I) = {p(x), q(b)}. Min'(J) = {r(b), q(x) : -p(;r),r(x). q(b) : -p(.?,)}, Min(J) = J. Definition 5.14 Let A be a set of D-interpreta.tions. the following notations. 
6 We introd'uce • VA = UIEA I • Min(A) = lvIin(vA) • UA = ,4 'where ...-l = JlIin(~\) U v{I Min(A) ~ I} ) E .:\ I Herbrand SD-models vVe will now consider t.he relat.ion between Dmodels (definition 5.5) ancl the So-models defined in [Bossi and Menegus 1991] on the same set of interpretat.ions. Bot.h t.he D-models and the So-models are intended to capt.ure specific operat.ional properties, from a model-t.heoret.ic point of view. However, So-models are based on an ad hoc notion of t.ruth (So-trut.h) and t.he least. So-modd is eX(l('tly Fo(P). 578 Conversely, n-models are based on the usual notion of truth in a Herbrand interpretation through the function 1t. Moreover the least n-model is the usual least Herbrand model, while Fo(P) is a non-minimal n-model. Definition 6.1 (Bossi and Menegus 1991] (So-Truth) Let n be a set of predicate symbols and I be an n-interpretation. Then (a) An atom A is n-true in I iff AEI. (b) A definite clause A:-B I , ... ,Bmis n-true in I iff VB~, ... , B~ such that B~ : - L1 , ••• ,B~ : - Ln E I U I d( n) if 319 = mg·u((B ll ... , B n ), (B~, ... , B:J) then (A : -LI" .. , LnW EI. So-models are defined in the obvious way. It is worth noting that, Slllce Oo(P)= Fo(P) = MSf)(P), theorem 2.9 shows that the model- theoretic semantics Msn(P) is compositional w.r.t. n-union of programs when considering computed answer substitutions as observahles. This result was already proved in [Bossi and Menegus 1991] for the Msn(P) model. Finally note that, as shownby the following example, Tj1 is not monotonic (and therefore it is not continuous) on the complete lattice (S', ~). However, proposition 6.10 ensures us that Fo(P) is still the least fixpoint of Tj1 on (S', ~). Exalnple 6.9 Consider the program P = {reb) p(x): -q(x)}. Let n = 0, II = {q(a),q(x)} and 12 = {r(b),q(x)}. Then II ~ 12 while Tj1(Id={p(:r),p(a),r(b)} ~ Tj1(I2 ) ={p( x), r( b)}. Proposition 6.2 E-very So-model is an n-model (according to definition 5.5). Proposition 6.10 Tj1 jw is the least fixpoint of Tj1 on the complete lattice CS', ~). Proposition 6.3 (Bossi and Meneg'lls 1991] If A i8 a non-empty set of So-models of an n-open program P, then nMEA k! is an So-model of P. 7 The previous proposition allows to define the model theoretic semantics M Sf) (P) for Ct prograrn P in terms of the So-models as follows. Definition 6.4 (Bossi and M encgns 1991] Let P be an n-open p7'Ogram and let S be the set of a.ll the So-modeL~ of P. Then j'vlsn(P) = nIlES j\I. Corollary 6.5 Let A be a non-empty sct of Snmodels of an n-open program P. Then nUEA JI is an n-model of P. By definition and by proposition 6.:3., j'vlSf)(P) is the least So-model in the lattice (:s. <:;;;:) (recall that :s is the set of all the n-interpretations). The following proposition shows that j'vlsn(P) is also the least Somodel in the lattice (S, ~). Proposition 6.6 Let P be a p'T'ogram and let S be the set of all the SwmodeL~ of P. Then j'vl 8'-1 (P) = glb(S) (acco'Nling to ~ oTdeTing). The following theorem shows the equivalence of the fixpoint semantics (definition 4.4) and the model-theoretic semantics j'vl S'n (P). Theorem 6.7 (Bossi and Meneg'l/.s 1991] Let P be an n-open program. Then Fo(P) = .i'vlS'I(P), Corollary 6.8 Let P be an n-open pTogT(J:rn. Then Fo(P) is an n-model of P. Related work and conclusions The result of our semantic construction has several sirnilarities with the proof-theoretic semantics defined in [Gaifman and Shapiro 1989a, Gaifman and Shapiro 1989b]. 
Our construction however is closer to the usual characterization of the semantics of logic programs. Namely we define a topdown operational and bottom-up fixpoint semantics, and, last but not least a model-theoretic semantics which allows us to obtain a declarative characterization of syntactically defined models. The semantics in [Gaifman and Shapiro 1989a] does not characterize computed ansvver substitutions, while the denotation defined by the fully abstract semantics in [Gaifman and Shapiro 1989b] is not a set of clauses (i.e. a program). The framework of [Gaifman and Shapiro 1989a, Gaifman and Shapiro 1989b] can be useful for defining a program equivalence notion, even if our more declarative (model-theoretic) characterization is even more adequate. Moreover, the presence of an operational or a fixpoint semantics makes our construction useful as a formal basis for program analysis. Another related paper is [Brogi et al. 1991]' where n-open logic programs are called open theories. Open theories are provided with a model-theoretic semantics v"hich is based on ideas very similar to those underlying our definition 5.3. [Brogi et al. 1991] however does not consider semantic definitions in the style of our OoCP) which gives a. unique denotation to any open program. 579 Let us finally remark some interesting properties of the n-model On(P). • By means of a syntactic device, we obtain a unique representation for a possibly infinite set of Herbrand models when a unique representative Herbrand model does not exist. A similar device was used in [Dung and Kanchanasut 1989, Kanchanasut and Stuckey 1990, Gabbrielli et a1. 1991J to characterize logic programs with negation. • Operators, such as Un are quite easy and natural to define on On(P). • On(P) can be used for modular program analysis [Giacobazzi and Levi 1991J and for studying new equivalences of logic programs, based on computed answer substitutions. which arf' not considered in [Maher 1988]. • It is strongly related to abd'uction [Eshghi and Kowalski 1989]. If n is the set of abducible predicates, the abductive consequences of any goal G can be found by executing G in Oll(P). • The delayed evaluation of open predicates which is typical of Oll(P) can easily be generalized to other logic languages, to achieve compositionality w.r.t the union of programs. In particular this matches quite naturally the sem.antics of CLP and concurrent constraint programs given in [Gabbrielli and Levi 1990. Gabbrielli and Levi 1991aJ. References [Apt 1988] 1\:. R. Apt. Introduction to Logic Programming. In J. van Leeuvven, editor, Handbook of Theoretical Computer Science, volume B: Formal Models and Semantics. Elsevier, Amsterdam and The MIT Press, Cambridge. 1990. [Bossi et al. 1991] A. Bossi, M. Gabbrielli, G. Levi, and M. C. Meo. Contributions to the Semantics of Open Logic Progratns. Technical Report TR 17/91, Dipartimento di Informatica. Universita di Pisa, 1991. [Bossi and Menegus 1991] A. Bossi and M. IVIenf'gus. Una Semantica Composizionale per Programmi Logici Aperti. In P. Asirelli, editor, Proc. Sixth Italian Conference on Logic Programming, pages 95-109. 1991. [Brogi et a1. 1991J A. Brogi, E. Lamma. and P. Mello. Open Logic Theories. In P. Krnf'ger L.-H. Eriksson and P. Shroeder-Heister, editors, Proc. of the Second WorA.:8hop on Exten8ion8 to Logic Programming, Lecture Notes in Artificial Intelligence. Springer-Verlag, Berlin, 1991. [Dung and Kanchanasut 1989] Phan Minh Dung and K. Kanchanasut. 
A Fixpoint Approach to Declarative Semantics of Logic Programs. In E. Lusk and R. Overbeck, f'ditors, Proc. North American Conf. on Logic Programming)89, pages 604-625. The MIT Press, Cambridge, Mass., 1989. [Eshghi and Kowalski 1989J K. Eshghi and R. A. Kowalski. Abduction compared with Negation by Failure. In G. Levi and M. Martelli, editors, Proc. Sixth Int)l Conf. on Logic Programming, pages 234-254. The T\IIT Press, Cambridge, Mass., 1989. [Falaschi et a.l. 1988] M. Falaschi, G. Levi, M. Martelli, and C. Palamidessi. A new Declarative Semantics for Logic Languages. In R. A. Kowalski and K. A. Bow·en, editors, Proc. Fifth Int'l Conf. on Logic Programming, pages 9931005. The NIIT Press, Cambridge. :r-dass., 1988. [Falaschi et a.l. 1989a] M. Falaschi. G. Levi, DeclaraM. Martelli, and C. Palamidessi. tive Modeling of the Operational Behavior of Logic Languages. Theoretical Computer Science, 69(3):289-318,1989. [Falaschi et a.l. 1989b] lvI. Falaschi, G. Levi, M. Martelli, and C. Palamidessi. A ModelTheoretic Reconstruction of the Operational Semantics of Logic Programs. Tf:'chnical Report TR 32/89, Dipartimento di Informatica, Universita di Pisa, 1989. To appecu- ill Information and Computation. [Gabbrielli and Levi 1990J M. Gabbrielli and G. Levi. Unfolding and Fixpoint Semantics of Concurrent Constraint Programs. In H. Kirchner and \V. \Vechler, editors, Proc. Second Inn Conf. on Algebra1:c and Logic Pmgrarnming, volume 463 of Led-ure N ote8 in Computer Science, pages 204-216. Springer-Verlag. Berlin, 1990. Extended version to appear ill Thr:oretica.l Computer Science. [Gabbrielli and Levi 1991a] M. Gabbrielli and G. Levi. Modeling Answer Constraints in Constraint Logic Programs. In K. Furukmva, editor, Proc. Eighth Int'l Conf. on Logic Programming, pages 238- 252. The MIT Press. Cambridge, Mass., 1991. 580 [Gabbrielli and Levi 1991 b] M. Gabbrielli and G. Levi. On the Semantics of Logic Programs.· In J. Leach Albert, B. Monien, and M. Rodriguez-Arta.lejo, editors, A-utorrwta,. Lang'uages and Programming, 18th International Colloquium, volume 510 of Ledllre Notes in Comp'llter Science, pages 1-19. SpringerVerlag, Berlin, 1991. [Gabbrielli et al. 1991] M. Gabbrielli, G. Levi, and D. Turi. A Two Steps Semantics for Logic Programs with Negation. Technical report, Dipartimento di Informatica, Universita di Pisa, 1991. [Gaifman and Shapiro 1989a] H. Gaifman and E. Shapiro. Fully abstract compositional semantics for logic programs. In Pr"()(;. Sixtr.enih Annual ACM Symp. on Principles of PTogTa7f/,ming Lang'llages, pages 134-142. AC~I. 1989. [Gaifman and Shapiro 1989b] H. Gaifman and E. Shapiro. Proof theory and semantics of logic programs. In Proc. Fourih IEEE Symp. on Logic In Cornputer Science, pages 50-62. IEEE Computer Society Press. 1989. [Giacobazzi and Levi 1991] R. Giacobazzi anel G. Levi. Compositional Abstract Interpret.ation of Constraint Logic Programs. Technical report, Dipartimento di Informatica. lTni\"(~rsit;\ di Pisa. 1991. [Kanchanasut and Stuckey 1990] E. Eanchanasut and P. Stuckey. Eliminating Negation from Normal Logic Progrmns. In H. Eirchner and W. \Vechler, editors, Pmt. Su;ond Iui'l Conf. on Algebraic and Logic Progra:mming. volume 463 of Leci'llre Note8 in Comp-u.icr Sciencf:. pages 217-231. Springer-Verlag. Berlin. 1990. [Lassez and r-v'Iaher 1984] J.-L. Lassez €, M = A + (B - A) /2, x = Xl + X2 o int(A, M, Xl)' int(M, B, X2), E(€). E(x) : - x = l/nON(n). N(n') n' = n + 10N(n). N(n') : - n' = 10. 
Let, for instance, c+, c* and Cf be the costs of addition, multiplication/division and f respectively. Variables and constants have a zero cost. Thus, denoting r T such constraint system: F1T(P) = { int(A, B, x) : E(n'): - c*' 4c+ + 3c* + Cf, } N(n') :- 0 o A space of approximate constraints can be specified by defining an auto-weak morphism p which is an upper closure operator (i.e. an idempotent, monotonic and extensive operator) on (e, ~I'f)). As shown in [Cousot and Cousot 79] the approximation process essentially consists in partitioning the space of constraints so that no distinction is made between equivalent constraints, all approximated by a representant of their equivalence class. The equivalence relation is induced by an upper closure operator p: Cl C2 iff P(CI) = P(C2). In [Cousot and Cousot 79] different equivalent methods for specifying abstract domains (i.e. upper closure operators) are presented. However, there are standard techniques in algebraic specifications that allow the definition of abstract constraint systems. For example, cylindrifications can be interpreted as abstractions on the algebra of constraints. =p Proposition 4.3 Let ~ ~ V; :J~ is an auto-weak morphism and upper closure operator on (e, ~I'f)). Existential quantification is then a way to define abstract domains. The space of approximate constraints can also be specified by adding axioms to 588 the underlying constraint system A. These additional axioms extend the meaning of the diagonal elements dt,t' of the algebra, in effect specifying which objects are to be considered "equivalent" from the perspective of the analysis. This is illustrated by the following example: Example 4.2 Consider the logic program P p(O). p(s(x)) q(s(x)) and a simple type (parity) analysis for P. Interpreting P as a constraint logic program on the Herbrand constraint system A H , the type analysis can be specified by extending the axioms specifying the constraint system with the additional axiom: s(s(x)) = x. The resulting constraint system, denoted by A1J, is trivially Noetherian. The semantics of P in AH is {p(x) :- x = 0 V x = s2n(0); q(:r):- V x = n>l n>l s2n-l(0)}; whereas the interpretation in A1J l~turns {p(x) : - x = 0; q(x) : - x = s(O)}. The meaning of P in A1J captures the type of the predicate p and 0 A very useful analysis on the relationships among variables of a program can be specified in our framework [Cousot and Halbwachs 78]. The automatic derivation technique in [Verschaetse and De Schreye 91] for linear size relations among variables in logic programs can be suitably specified as a constraint computation. A constraint system of affine relationships (i.e. linear equalities of the form Co = CIX 1 + ... + cnXn) can be defined by specifying intersection, disjunction and cylindrification (restriction) as. given in [Verschaetse and De Schreye 91]. Generalizations considering linear inequalities, as proposed in [Cousot and Halbwachs 78], can still be defined in our framework, thus making explicit the strong connection between automatic detection of linear relationships among variables and C LP( 3?) computations. Applications of this analysis are: compile time overflow, mutual exclusion, constraint propagation, termination etc. [J¢rgensen et al. 91J. 4.1 1. . Example 4.3 The following weighting map is a norm on the Herbrand term system: Itlsize = 0 if t is avariableort = [], Itlsize = 1+ltaill size ift = [hltail]. o q(x). p(x). q, computing even and odd numbers respectively. Definition 4.3 Let T ba a term system. 
A norm on T is a function 1< : T ~ N, mapping any term t E T into a natural number. I Generalized Rigidity Analysis There exists a wide class of abstract interpretation techniques for the analysis of ground dependences (also named covering) of pure logic programs [Barbuti et al. 91,Cortesi et al. 91J. In this section we extend the ground dependence notion by means of the notion of rigidity. A norm is a function weighting terms. Let us recall some basic concepts about norms. For a more accurate treatment on this subject see [Bossi et al. 90]. In order to extend the notion of groundness and ground dependences [Barbuti et aI. 91, Cortesi et al. 91] to deal with a more refined one, able to take into account only the relevant subterms of a given (possibly non-ground) term t, we address the notion of rigidity as introduced in [Bossi et al. 90J. Definition 4.4 Let I.. k be a norm on the term system T. A term t E T is rigid with respect to I.. k iff for any substitution of variables a: latk = Itk. I The rigidity of terms turns out to be important in simplifying termination proofs. If a term is rigid, its weight will not be modified by further substitutions. Rigidity is then strongly related to groundness. Any ground term can not change its weight by instantiation, thus it is always rigid. This notion allows to identify those sub terms which are relevant for the analysis purposes. Notice that given a norm I.. k, and a non-rigid term t E T, there must exist some variable in t whose instantiation affects the weight of t. In the Herbrand case, results in [Bossi et al. 90] allow to restrict our attention to a particular class of norms: semilinear norms on Herbrand. Definition 4.5 A norm on 1(~,V) is semilinear iff it may be defined according to the following structure: Itk = 0 if t is a variable; Itk = Co + Itil k + ... + Itimlc if t = f(tl, ... ,t n ), where Co 2: a and {il, ... ,i m } ~ {1, ... ,n}. I Note that the position of the subterms which allow the principal term to change its weight by instantiation depends on the outermost term constructor only (i.e. 1). These subterms are then relevant from the analysis viewpoint. All the non-relevant subterms are discarded by the analysis. Semilinear norms allow to reduce the rigidity notion to a syntactical property of terms. Let Vrel«(t) = { v E V I :3 a such that latk =I- Itk }. As shown in [Bossi et al. 90], given a semilinear norm I.. k, a term t E 1(~,V) is rigid iff Vrel«(t) = 0. The notion of semilinear norms can be generalized to 589 arbitrary term systems in a straightforward way, as follows: given a term system T, we define a function w : T ---+ N; for each t E T, an associated finite set of functions Ft : t ---+ T; and an associative and commutative function ~: N x N ---+ N. Intuitively, for any term t, the value of w(t) is the "initial weight" of the term t, the set of functions F t correspond to the set of selectors for the "relevant" subterms, and ~ indicates how the sizes of the sub terms of a term are to be combined. Then, generalized semilinear norms can be defined as follows: It I = w(t)+ ~fEFt If(t)l· Example 4.4 The "usual" notion of semilinear norms for Herbrand constraint systems can now be generalized as follows, let Co E N: w(t) = 0 if t is a variable, Co otherwise; if t is a variable then F t = 0; otherwise Ft consists of selectors for the relevant positions of t; ~ is summation. The "depth norm", which could not be expressed as a semilinear norm in the development of [Bossi et al. 
90], can be defined as follows: w( t) = 0 if t is a variable, 1 otherwise; if t is a variable then F t = 0; otherwise if t = f(tl, ... , tn) then F t = {fill ::s; i ::s; n}, where fi(t) = ti, i.e. fi is the selector for the subterm at the ith position; and ~ is max. 0 Let us consider the set C(V) of finite conjunctions of variables in V (the empty conjunction is denoted €) and a term abstraction map O'.T : T --+ C(V) such that, given a semilinear norm I·· Ie and t E T, O'.T(t) = { Xl /\ ... /\ Xm Vrel((t) = {Xl, ... , x m } } . Let 7( be the corresponding abstract term system where substitutions are performed as usual. Marriott and Spndergaard have proposed an elegant domain, named Prop, further studied in [Cortesi et al. 91]' to represent ground dependences among arguments in atoms. In [Codognet and File 91] an interesting application is introduced. Prop is formalized as a constraint system, and both groundness and definiteness analysis are specified by executing programs in CLP(Bool). The corresponding constraint system does not allow disjunctions of vara.bles, without fully exploiting the expressive power of Prop. The general notion of ground dependence corresponding with any Prop formula (including disjunctions) ca.nnot be specified. I Let A( = (Prop(, V,/\,T,F,3x,t <-t t')xr;,V;t,t'E1{ be the algebra of possibly existentially quantified formulas defined on the term system 7(; including the set of connectives V, /\, <-to Intuitively, the formula X /\ Y /\ z <-t W /\ t = t' where {w,v}; X /\ Y v represents an equation VreZ,(t) = {x,y,z} and Vrel,(t') = represents a term whose rigidity depends upon variables X and y; while X V Y represents a set of terms whose rigidity depends upon variables X or y. Local variables are hidden by existential quantification, projecting away non-global variables in the computation [Codognet and File 91]. Let Bool be a boolean algebraic structure; c ~Bool c' iff B 001 1= c <-t c'. It is easy to prove that Ad';::, Boo! is an abstract constraint system. Exalnple 4.5 Let us consider the semi linear norm "size" and the following constraint logic program on the Herbrand constraint system append(XI' X2, X3) append(xI,x2,X3) Xl = [] /\ X2 = X3· Xl = [hly] /\ X3 = [hlz] Dappend(y, X2, z). The corresponding abstract model is: {append(xI,x2,X3) :- Xl <-t € /\ X2 <-t X3}, generalizing the standard ground behavior (where Vrel(t) var(t): and the abstract model is {append( Xl, X2, X3) : - X3 <-t Xl /\ X2}) vs. sizerigidity behavior: "the second argument list-size can change iff the third argument does". 0 5 Machine-level Traces In this section, we consider an example non-standard semantics for constraint logic programs, that of machine-level traces (for a discussion of similar nonstandard semantics in a denotational context, see [Stoy 77]). Such a semantics is essential, for example, if we wish to reason formally about the correctness of a compiler (e.g. see [Hanus 88]) or the behavior of a debugger or profiler. In this section, we show how the semantics described in earlier sections may be instantiated to describe such low-level behaviors. Instead of constrained atoms where each atom is associated with a constraint, this semantics will associate each atom with a set of machine states (equivalently, instruction sequences) that may be generated on an execution of that atom. The code generated by a compiler for a constraint language must necessarily depend on both the constraint system and the target machine under consideration. 
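Returning to the rigidity analysis of Section 4.1, the list-size norm of Example 4.3 and the induced rigidity check of Definition 4.4 can be sketched directly. This is a minimal sketch (SWI-Prolog assumed); rel_vars/2 collects the variables whose instantiation can change the weight, i.e. Vrel, and non-list terms are simply given weight 0 and no relevant subterms here:

% |t|size = 0 if t is a variable or t = [];  |[H|T]|size = 1 + |T|size.
list_size(T, 0)      :- var(T), !.
list_size([], 0)     :- !.
list_size([_|T], N)  :- !, list_size(T, N0), N is N0 + 1.
list_size(_, 0).

% relevant variables w.r.t. the list-size norm (only the tail positions count).
rel_vars(T, [T])     :- var(T), !.
rel_vars([], [])     :- !.
rel_vars([_|T], Vs)  :- !, rel_vars(T, Vs).
rel_vars(_, []).

rigid(T) :- rel_vars(T, []).          % Definition 4.4: no instantiation changes |t|

% ?- list_size([a, b|X], N).    N = 2.
% ?- rigid([a, b|X]).           false (X is a relevant variable).
% ?- rigid([a, b, c]).          true.

For the standard groundness analysis one instead takes Vrel(t) = var(t), as recalled in Example 4.5, so that rigidity collapses to groundness.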
Suppose that each "primitive" constraint op( t l , ... , t n ) in the language under consideration corresponds to (an instance of) a (virtual) machine instruction op(t l , ... , t n ).2 For example, correspond2In an actual implementation, each such virtual machine instruction may, of course, "macro-expand" to a sequence of lower-level machine instructions. 590 ing to a constraint 'X = Y + 5' in the language under consideration, we might have a virtual machine instruction 'eq(X, Y + 5)'. Each such machine instruction defines a transformation on machine states, representing the changes that are performed to the heap, stack, :registers, etc. of the machine by the ex. ecution of that instruction (e.g., see [Hanus 88] for a discussion of the WAM along these lines). In other words, let S be the set of all possible states of the machine under consideration, then an instruction I denotes a function I : S --+ S U {fail}, where fail denotes a state where execution has failed. Given a set S, let Soo denote the set of finite and infinite sequences of S. Intuitively, with each execution we want to associate a set of finite and infinite sequences of machine states, that might be generated by an OR-parallel interpreter. Thus, we want the universe of our algebra to be 2s "", the set of sets of finite and infinite sequences of machine states. One subtlety, however, is that instructions may "fail" at runtime because som~ constraints may be unsatisfiable. To model this, it is necessary to handle failure explicitly, since "forward" execution cannot continue on failure. To deal with this, we define the notion of concatenation of sequences of machine states as follows: given any two sequences 81 and 82 of states in S U {fail}, their concatenation 81 8 82 is given by 81 8 82 = if 81 contains fail then 81 else concat (81,82), where concat (81,82) d~notes the "usual" notion of concatenation of finite and count ably infinite sequences. Thus, the cylindric closed semiring in this case is (e, 0 , EEl ,1, 0, 3~, dt,t ' )~~V;t,t'ET where: C = 2(su{fail})OO is the set of finite and infinite sefor any S1, 52 E C, quences of machine states; S10S2 = {818 8 2I 81 E S1,82 E 52}; EEl = U; 1 = {c:}, where c: is the empty sequence; 0 = 0; 3~ corresponds to the function that, given any machine state S, yields the machine state obtained by discarding all information about the variables in ~; and for any t, t' E T, dt,t ' corresponds to the function that, given any machine state S, yields the machine state resulting from constraining t and t' to be equal, and fail if this is not possible. A simple variation on this semantics is one where failed execution sequences are discarded silently. To obtain such a semantics, it suffices to redefine the operation EEl as follows: S1 EEl S2 = { 8 8 E S1 U S2 !\ fail is not in s }. I 6 Related Work A related framework is considered in [Codognet and File 91] where an algebraic definition of constraint systems is given. Program analysis based on ab- stract interpretation techniques are considered, like groundness dnalysis and definiteness analysis for CLP programs. Only 0-composition is considered. The notion of "computation system" is introduced but it is neither formalized as a specific algebraic structure nor extended with the join-operator. In particular, because of the underlying semantics construction, mainly based on a generalization of the top-down SLD semantics, a loop-checker consisting in a "tabled" -interpreter is introduced. 
The use of tabled interpreters allows to keep separate the notion of abstraction from the finiteness required by any static analysis. As a consequence, static analysis can be performed by "running" the program in the standard CLP interpreter with tabulation. In our framework, no tabulation is considered. This makes the semantics construction more general. Finiteness is a specific property of the constraint system (expressed in terms of EEl-chains), thus allowing to specify nonstandard computations as standard CLP computations over an appropriate non-standard constraint system. Both the traditional top-down and bottomup semantics can then be specified in the standard way thus allowing the definition of goal-independent static analysis as an abstract fixpoint computation, without loop-checking. If the constraint system is not Noetherian, a widening/narrowing technique [Cousot and Cousot 91] can be applied in the fixpoint computation to get a finite approximation of the fixpoint. T; In a related paper, Marriott and S¢ndergaard consider abstract interpretation of CLP. A metalanguage is defined to specify, in a denotational style, the semantics of logic languages. Abstract interpretation is performed by abstracting such a semantics [Marriott and S¢ndergaard 90]. In this framework, both standard and non-standard semantics are viewed as an instance of the meta language specification. Acknowledgment The stimulating discussions with Maurizio Gabbrielli, Michael Maher and Nino Salibra are gratefully acknowledged. References [Aho et al. 74] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The Design and Analysis of Computer Algorithms. Addison Wesley Publishing Company, 1974. [Barbuti et al. 92] R. Barbuti, M. Codish, R. Giacobazzi, and G. Levi. Modelling Prolog Control. In Proc. Nineteenth Annual ACM Symp. on Principles of Programming Languages, pages 95-104, 1992. 591 [Barbuti et aZ. 91] R. Barbuti, R. Giacobazzi, and G. Levi. A General Framework for Semantics-based Bottom-up Abstract Interpretation of Logic Programs. Technical Report TR 12/91, Dipartimento di Informatica, Universita. di Pisa, 1991. To appear in ACM Transactions on Programming Languages and Systems. [Barbuti and Martelli 83] R. Barbuti and A. Martelli. A Structured Approach to Semantics Correctness. "Science of Computer Programming", 3:279-311, 1983. [Bossi et aZ. 90] A. Bossi, N. Cocco, and M. Fabris. Proving Termination of Logic Programs by Exploiting Term Properties. In S. Abramsky and T. Maibaum, editors, Proc. TAPSOFT'91, volume 494 of Lecture Notes in Computer Science, pages 153-180. Springer-Verlag, Berlin, 1991. [Cirulis 88] J. Cirulis. An Algebraization of First Order Logic with Terms. Colloquia Mathematica Societatis Jimos Bolyai, 54, 1991. [Codognet and File 91] P. Codognet and G. File. Computations, Abstractions and Constraints. Technical Report 13, Dipartimento di Matematica Pura e Applicata, Universita di Padova, Italy, 1991. [Cortesi et al. 91] A. Cortesi, G. File, and W. Winsborough. Prop revisited: Propositional Formulas as Abstract Domain for Groundness Analysis. In p.roc. Sixth IEEE Symp. on Logic In Computer Science, pages 322327. IEEE Computer Society Press,1991. [Cousot and Cousot 77] P. Cousot and R. Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In Proc. Fourth A CM Symp. Principles of Programming Languages, pages 238-252, 1977. [Cousot and Halbwachs 78] P. Cousot and N. Ha.lbwa.chs. 
Automatic Discovery of Linear Restraints Among Variables of a Program. In Proc. Fifth A CM Symp. Principles of Programming Languages, pages 84-96, 1978. [Cousot and Cousot 79] P. Cousot and R. Cousot. Systematic Design of Program Analysis Fra.meworks. In Proc. Sixth A CM Symp. Principles of Programming Languages, pages 269-282, 1979. [Cousot and Cousot 91] P. Cousot and R. Cousot. Comparing the Galois Connection and Widening/Narrowing Approaches to Abstract Interpretation. Preliminary draft, ICLP'91 Pre-conference workshop, Paris, 1991. [Debray and Ramakrishnan 91] S. Debray and R. Ramakrishnan. Generalized Horn Clause Programs. Technical report, Dept. of Computer Science, The University of Arizona, 1991. [Falaschi et al. 89] M. Falaschi, G. Levi, M. Martelli, and C. Palamidessi. Declarative Modeling of the Operational Behavior of Logic Languages. Theoretical Computer Science, 69(3):289-318, 1989. [Gabbrielli and Levi 91] M. Gabbrielli and G. Levi. Modeling Answer Constraints in Constraint Logic Programs. In K. Furukawa, editor, Proc. Eighth Int'l Conf. on Logic Programnling, pages 238-252. The MIT Press, Cambridge, Mass., 1991. [Hanus 88] M. Hanus. Formal Specification of a Prolog Compiler. In P. Deransart, B. Lorho, and J. Maluszynski, editors, Proc. International Workshop on Programming Languages Implementation and Logic Programming, volume 348 of Lecture Notes in Computer Science, pages 273-282. Springer-Verlag, Berlin, 1988. [Henkin et al. 85] L. Henkin, J.D. Monk, and A. Tarski. Cylindric Algebras. Part I and II. North-Holland, Amsterdam, 1971. (Second edition 1985) [Jaffar and Lassez 87] J. Jaffar and J.-1. Lassez. Con. straint Logic Programming. In Proc. Fourteenth Annual ACM Symp. on Principles of Programming Languages, pages 111-119, 1987. [J0rgensen et al. 91] N. J0rgensen, K. Marriott, and S. Michaylov. Some Global Compile-Time Optimizations for CLP(~). Technical report, Department of Computer Science, Monash University, 1991. [Kemp and Ringwood 90] R. Kemp and G. Ringwood. An Algebraic Framework for the Abstract Interpretation of Logic Programs. In S. Debray and M. Hermenegildo, editors, Proc. North American ConI. on Logic Programming'90, pages 506-520. The MIT Press, Cambridge, Mass., 1990. [Marriott and S0ndergaard 90] K. Marriott and H. S(Ilndergaard. Analysis of Constraint Logic Programs. In S. Debray and M. Hermenegildo, editors, Proc. North American Conf. on Logic Programming'90, pages 531547. The MIT Press, Cambridge, Mass., 1990. [Saraswat et al. 91] V.A. Saraswat, M. Rinard, and P. Panangaden. Semantic foundation of concurrent constraint programming. In Proc. Eighteenth Ann.ual A CM Symp. on Principles of Programming Languages, 1991. [Scott 82] D. Scott. Domains for Denotational Semantics. In Proc. ICALP, volume 140 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, 1982. [Stoy 77] J .E. Stoy. Denotational Semantics: The ScottStrachey Approach to Programming Language Theory. MIT Press, 1977. Verschaetse [Verschaetse and De Schreye 91] K. and D. De Schreye. Automatic Derivation of Linear Size Relations. Technical report, Dept. of Computer Science, K.U. Leuven, 1991. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. 
Extended Well-Founded Semantics for Paraconsistent Logic Programs

Chiaki Sakama
ASTEM Research Institute
17 Chudoji Minami-machi, Shimogyo, Kyoto 600 Japan
sakama@astem.or.jp

Abstract

This paper presents a declarative semantics of logic programs which possibly contain inconsistent information. We introduce a multi-valued interpretation of logic programs and present the extended well-founded semantics for paraconsistent logic programs. In this setting, meaningful information is still available in the presence of inconsistent information in a program, and any fact which is affected by inconsistent information is distinguished from the others. The well-founded semantics is also extended to disjunctive paraconsistent logic programs.

1 Introduction

Recent studies have greatly enriched the expressive power of logic programming as a tool for knowledge representation. Handling classical negation as well as negation by failure in a program is one such extension. An extended logic program, introduced by Gelfond and Lifschitz [GL90], distinguishes two types of negation and enables us to deal with explicit negation as well as default negation in a program. An extended logic program is, however, possibly inconsistent in general, since it contains negative heads as well as positive ones in program clauses. Practically, an inconsistency is likely to happen when we build a large-scale knowledge base in such a logic program. A knowledge base may contain local inconsistencies that would make the program contradictory, and yet it may have a natural intended global meaning. However, in an inconsistent program, the answer set semantics proposed in [GL90] implies every formula from the program. This is also the case for most traditional logics, in which a piece of inconsistent information might spoil the rest of the whole knowledge base. To avoid such a situation, the so-called paraconsistent logics have been developed, which are not destructive in the presence of inconsistent information [Co74]. From the point of view of logic programming, a possibly inconsistent logic program is called a paraconsistent logic program. Blair and Subrahmanian [BS87] first developed a fixpoint semantics of such programs by using Belnap's four-valued logic [Be75]. Recent studies such as [KL89, Fi89, Fi91] have also developed logics for possibly inconsistent logic programs and provided a framework for reasoning with inconsistency. However, from the point of view of logic programming, negation in these approaches is classical in its nature, and the treatment of default negation as well as classical negation in paraconsistent logic programming is still left open. In this paper, we present a framework for paraconsistent logic programming in which classical and default negation are distinguished.

The rest of this paper is organized as follows. In section 2, we first present an application of Ginsberg's lattice-valued logic to logic programming and provide a declarative semantics of paraconsistent logic programs by extending the well-founded semantics of general logic programs. Then we show how the extended well-founded semantics isolates inconsistent information and distinguishes meaningful information from the rest of a program. In section 3, the well-founded semantics is also extended to paraconsistent disjunctive logic programs.
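As a concrete companion to the seven-valued lattice introduced in Section 2.1 below (the ordering of Figure 2, generated by ⊥ ⪯ x ⪯ ⊤ and dx ⪯ * ⪯ x for x in {t, f}), the following minimal sketch renders the ordering and its least upper bounds executable. SWI-Prolog is assumed, and the atoms bot, dt, df, star, t, f and top are our encoding of the seven truth values.

% covering relation of the knowledge ordering of Figure 2
covers(bot, dt).  covers(bot, df).
covers(dt, star). covers(df, star).
covers(star, t).  covers(star, f).
covers(t, top).   covers(f, top).

leq(X, X).
leq(X, Y) :- covers(X, Z), leq(Z, Y).

% least upper bound: an upper bound with no strictly smaller upper bound.
lub(X, Y, L) :-
    leq(X, L), leq(Y, L),
    \+ ( leq(X, L2), leq(Y, L2), L2 \== L, leq(L2, L) ).

% ?- lub(dt, df, L).   L = star.
% ?- lub(t, f, L).     L = top.
% ?- lub(dt, t, L).    L = t.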
2 2.1 Well-Founded Semantics for Paraconsistent Logic Programs Multi-valued Logic To present the semantics of possibly inconsistent logic programs, multi-valued logics are often used instead of the traditional two-valued logic. Among them, Belnap's four-valued logic [Be75] is well-known and several researchers have employed this logic to give the semantics of paraconsistent logic programs [BS87, KL89, Fi89, Fi91]. In Belnap's logic, truth values consist of {t, f, T, 1..} in which each element respectively denotes true, false, contradictory, and undefined. Each element makes a complete lattice under a partial ordering defined over these truth values (figure 1). To represent nonmonotoriic aspect of logic programming, however, we need extra truth values which represent default assumption. Such a logic is firstly introduced by Ginsberg [Gi86] in the context of bilattice for 593 T T t t ...L Figure 1. Four-valued logic Figure 2. The logic VII default logic. We use this logic to give the semantics of paraconsistent logic programs. 1 Definition 2.1 Let P be a program and I be its interpretation. Suppose IFF denotes that I satisfies a formula F, then: The set VII = {t, f, dt, df, *, T,.l} is the space of truth values in our seven-valued logic. Here, additional elements dt, df, and *, are read as true by default, false by default, and don't-care by default, respectively. In VII, each element makes a complete lattice under the ordering ~ such that: \Ix E VII, x ~ x and .1 ~ x ~ T; and for x E {t, f}, dx ~ * ~ x (figure 2). A program is a (possibly infinite) set of clauses of the form: A ~ Bl A ... A Bm AnotC1 A ... A notCn where m, n ~ 0, each A, Bi(1 ~ i ~ m) and Cj (1 ~ j ::; n) are literals and all the variables are assumed to be universally quantified at the front of the clause. In a program, two types of negation are distinguished; hereafter, ..., denotes a monotonic classical negation, while not denotes a nonmonotonic default negation. A ground clause (resp. program) is a clause (resp. program) in which every variable is instantiated by the elements of the Herbrand universe of a program. Also, such an instantiation is called H erbrand instantiation of a clause (resp. program). An interpretation I of a program is a function such that I : HB -+- VII where Hn is the Herbrand base of the program. (Throughout of this paper, HB denotes the Herbrand base of a program.) A formula is defined as usual; (i) any literal L or ...,L ,is a formula, (ii) for any literal L, notL and not...,L are formulas, and (iii) for any formula F and G, \IF, 3F, F V G, FAG and F ~ G are all formulas. A formula is closed if it contains no free variable. Satisfaction of a formula is also defined as follows. 1 [KL89] has also suggested the extensibility of their logic for han- dling defaults by using Ginsberg's lattice-valued logic. 1. For any atom A E H B , (a) I FA ift ~ I(A), (b) I F ...,A if f (c) I F notA if df ~ I(A) (d) I F not...,A if dt ~ I(A) ~ I(A), *, ~ ~ *. F 3F (resp. I F \IF) if IFF' for some (resp. every) Herbrand instantiation F' of F. 2. For any closed formula 3F (resp. \IF), I 3. For closed formulas F and G, (a) IFF V G if IFF or I (b) IFF A G if IFF and (c) IFF ~ G F G, I F G, if IFF or I ~ G. 0 The ordering ~ on truth values is also defined between interpretations. For interpretations 11 and 12, 11 ~ 12 iff \lA E H B , Il(A) ~ I2(A). An interpretation I is called minimal, if there is no interpretation J such that J i= I and J ~ I. 
An interpretation I is also called least, if I ~ J for every interpretation J. An interpretation I is called a model of a program if every clause in a program is satisfied in I. Note that in our logic, the notion o( model is also defined for an inconsistent set of formulas. For example, a program {p, ...,p} has a model I such that I(p) = T. Especially, an interpretation I of a program is called consistent if for every atom A in H B , I(A) i= T. A program is called consistent if it has a consistent model. 594 2.2 lin + 1 = a(I in); Mp = Un<(oI lin. 0 Extended Well-Founded Semantics The well-founded semantics is known as one of the most powerful semantics which is defined for every generallogic program [VRS88, Pr89]. The well-founded semantics has also extended to programs with classical negation in [Pr90], however, it is not well-defined for inconsistent programs in which inconsistent models are all thrown away_ In this section, we reformulate the wellfounded semantics for possibly inconsistent logic programs. To compute the well-founded model, we first present an interpretation of a program by a pair of sets of ground literals. Definition 2.2 For a program P, a pair of sets of ground literals I =< Crj 6 > presents an interpretation of P in which each literal in I is interpreted as follows: For a positive literal L, (i) if L (resp. ,L) is in a, L is true (resp. false) in I; (ii) else if L (resp. ,L) is in 6, L is false by default (resp. true by default) in I; (iii) otherwise, neither L nor ,L is in a nor 6, L is undefined. Especially, if both L and ,L are in a (resp. 6), L is contradictory (resp. don't-care by default) in I. 0 Intuitively, a presents proven facts while 6 presents default facts, and an interpretation of a fact is defined by the least upper bound of its truth values in the pair. Now we extend the constructive definition of the wellfounded semantics for general logic programs [Pr89] to paraconsistent logic programs. Definition 2.3 Let P be a program and I =< a; 6 > be an interpretation of P. For sets T and F of ground literals, the mapping cI> I and WI are defined as follows: cI> I(T) = {A I there is a ground clause A ~ BI /\ ... /\ Bm /\ notCI /\ ... /\ notCn from P s.t. VBi (1 ::; i ::; m) (1 ::; j ::; n) Cj E 6}, Bi E aUT and VCj wI(F) = {A I for every ground clause A ~ BI/\ ... /\ Bm /\ notCI /\ ... /\ notCn from P, either 3Bi (1 ::; i ::; m) 0 s.t. Bi E 6 U F or 3Cj (1 ::; j ::; n) s.t. Cj E a}. Definition 2.4 Let I be an interpretation. Then, TI i 0 = 0 and FI! 0 = HB U,HB (where ,HB = {,A I A E H B } ); TIin+1=cI>I(TIin) and Fj !n+1=wI(Fj ! n); TI = Un I and WI, respectively. Definition 2.5 For every interpretation I, an operator e is defined by: e(I) = IU < T j ; FI >; Ii 0 =< 0;0 >; Lemma 2.1 Mp is the least fixpoint of the monotonic operator e and also a model of P. 0 By definition, Mp is uniquely defined for every paraconsistent logic program. We call such an Mp the extended well-founded model of a program and the meaning of a program represented by such a model is called the extended well-founded semantics of a program. Note that the original fixpoint definition of the wellfounded semantics in [Pr89] is three-valued and defined for general logic programs, while our extended wellfounded semantics is seven-valued and defined for extended logic programs. Compared with the three-valued well-founded semantics, the extended well-founded semantics handles positive and negative literals symmetrically during the computation of the fixpoint. 
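For finite ground programs, the construction of Definitions 2.3 to 2.5 can be prototyped directly. The Python sketch below is only an illustration under assumptions of its own (none of which come from the paper): a literal is a string such as 'p' or '-p', with '-' standing for classical negation ¬, and a ground clause A ← B1 ∧ ... ∧ Bm ∧ not C1 ∧ ... ∧ not Cn is a triple (A, [B1,...,Bm], [C1,...,Cn]). It computes T_I as the limit of Φ_I starting from the empty set, F_I as the limit of Ψ_I starting from H_B ∪ ¬H_B, and iterates Θ until the pair <σ; δ> stops growing, which is the extended well-founded model M_P.

    def neg(lit):
        # Classical negation: 'p' <-> '-p'.
        return lit[1:] if lit.startswith('-') else '-' + lit

    def herbrand_literals(program):
        # All ground literals mentioned in the program, closed under classical negation.
        lits = set()
        for head, body, defaults in program:
            lits.update([head, *body, *defaults])
        return lits | {neg(l) for l in lits}

    def phi(program, sigma, delta, T):
        # Phi_I(T): heads of clauses whose positive body lies in sigma ∪ T and
        # whose default-negated body lies in delta (Definition 2.3).
        return {head for head, body, defaults in program
                if all(b in sigma or b in T for b in body)
                and all(c in delta for c in defaults)}

    def psi(program, sigma, delta, F, lits):
        # Psi_I(F): literals every clause of which is blocked, i.e. has some body
        # literal in delta ∪ F or some default-negated literal in sigma (Definition 2.3).
        blocked = set()
        for a in lits:
            rules = [(body, defaults) for head, body, defaults in program if head == a]
            if all(any(b in delta or b in F for b in body) or any(c in sigma for c in defaults)
                   for body, defaults in rules):
                blocked.add(a)
        return blocked

    def extended_wf_model(program):
        # Least fixpoint M_P = <sigma; delta> of the operator Theta (Definitions 2.4 and 2.5).
        lits = herbrand_literals(program)
        sigma, delta = set(), set()
        while True:
            T = set()                      # T_I: iterate Phi_I upward from the empty set
            while True:
                step = phi(program, sigma, delta, T)
                if step <= T:
                    break
                T |= step
            F = set(lits)                  # F_I: iterate Psi_I downward from H_B ∪ ¬H_B
            while True:
                step = psi(program, sigma, delta, F, lits)
                if step == F:
                    break
                F = step
            if T <= sigma and F <= delta:  # Theta adds nothing more: M_P has been reached
                return sigma, delta
            sigma |= T
            delta |= F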
Further, the extended well-founded model is the least fixpoint of a program under the ordering ~, while the three-valued well-founded model is the least fixpoint with respect to the ordering f < .1 < t, which is basically different from ~.2 Example 2.1 (barber's paradox) Consider the following program: shave(b, X) ~ not shave(X, X) Then shave( b, b) is undefined under the three-valued well-founded semantics, while Mp =< 0; {,shave(b,b)} > then shave(b,b) is true by default under the extended well-founded semantics. In another words, the extended well-founded semantics assumes the fact 'the barber shaves himself' without conflicting the sentence in the program. 0 Also it should be noted that the extended wellfounded model is the least fixpoint of a program, but not necessarily the least model of the program in general. Example 2.2 Let P = { ,p ~ not p, ,q ~ ~}. Then Mp =< {,p,q"q}j{p} > and the truth value of each predicate is {p ---+ f, q ---+ T}. While, the least model assigns truth values such as {p ---+ .1, q --+ ,p, q t}. 0 In fact, the above least model is not the fixpoint of the program. In this sense, our extended well-founded semantics is different from the least fixpoint model semantics of [BS87] (even for a program without nonmonotonic negation). The difference is due to the fact that in their least fixpoint model semantics each fact which cannot be proved in a program is assumed to be undefined, while it possibly has a default value under the extended well2This point is also remarked in [Pr89, Pr90j. In terms of the bilattice valued logic [Gi86, Fi91j, the ordering < is called a truth ordering, while the ordering ::S is called a knowledge ordering. 595 founded semantics. The above example also suggests the fact that for. a consistent program P, Mp is not always consistent. The extended well-founded semantics is also different from Fitting's bilattice-valued semantics [Fi89, Fi91]. Example 2.3 Let P = {p - q, p - 'q, q }. Then, as is pointed out in [Su90], p is unexpectedly contradictory under Fitting's semantics, while Mp =< {p, q}; {'p, .q} > then both p and q are true under the extended well-founded semantics. D Now we examine the behavior of the extended wellfounded semantics more carefully in the presence of an inconsistent information. Example 2.4 Let P be the following program: innocent - .guilty .guilty - charged A not guilty charged Then Mp is < {charged, innocent, .guilty}; {guilty, .innocent, .charged} >. Then the truth values of charged and innocent are true, while guilty is false. D In the above example, when we consider the program P' = P U {.innocent -}, the truth value of innocent turns contradictory, while truth values of charged and guilty are unchanged. That is, a meaningful information is still available from the inconsistent program. On the other hand, when we consider the program P" = P U {.charged -, man -}, the truth value of charged is now contradictory, while man, innocent are true and guilty is false. Carefully observing this result, however, the truth of innocent is now less credible than the truth of man, since innocent is derived from the fact .guilty which is now supported by the inconsistent fact charged in the program. Such a situation also happens in Blair and Subrahmanian's fixpoint semantics [BS87], in which a truth fact is not distinguished even if it is supported by an inconsistent fact in a program. In the next section, we refine the extended well-founded semantics to distinguish such suspicious truth facts from others. 
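Under the same representational assumptions as the sketch above, Example 2.4 and its inconsistent extension P'' can be run directly; the small reader below turns the pair <σ; δ> back into the truth values of Definition 2.2. The expected results, noted in the comments, agree with the discussion in the text: for P, charged and innocent come out true and guilty false, while in P'' only charged becomes contradictory. This is an illustration only; the function and variable names are not from the paper.

    def truth_value(atom, sigma, delta):
        # Read off the truth value of a positive atom from <sigma; delta> (Definition 2.2).
        if atom in sigma and neg(atom) in sigma:
            return 'top'     # contradictory
        if atom in sigma:
            return 't'
        if neg(atom) in sigma:
            return 'f'
        if atom in delta and neg(atom) in delta:
            return 'star'    # don't-care by default
        if atom in delta:
            return 'df'      # false by default
        if neg(atom) in delta:
            return 'dt'      # true by default
        return 'bot'         # undefined

    # Example 2.4:  innocent <- -guilty,   -guilty <- charged and not guilty,   charged <-
    P = [('innocent', ['-guilty'], []),
         ('-guilty',  ['charged'], ['guilty']),
         ('charged',  [],          [])]
    sigma, delta = extended_wf_model(P)
    print({a: truth_value(a, sigma, delta) for a in ('charged', 'innocent', 'guilty')})
    # {'charged': 't', 'innocent': 't', 'guilty': 'f'}

    # P'' adds the conflicting fact -charged and the unrelated fact man.
    sigma2, delta2 = extended_wf_model(P + [('-charged', [], []), ('man', [], [])])
    print({a: truth_value(a, sigma2, delta2) for a in ('charged', 'man', 'innocent', 'guilty')})
    # {'charged': 'top', 'man': 't', 'innocent': 't', 'guilty': 'f'}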
2.3 Reasoning with Inconsistency

When a program contains inconsistent information, it is important to detect the facts affected by such information and to distinguish them from the other, meaningful information in the program. In this section, we present such skeptical reasoning under the extended well-founded semantics.

First we introduce one additional notation. For a program P and each literal L from H_B, L^Γ is called a suffixed literal, where Γ is a collection of sets of ground literals (possibly preceded by not). Informally speaking, each element of Γ presents a set of facts which are used to derive L in P (it is defined more precisely below). An interpretation of such a suffixed literal L^Γ is supposed to be the same as the interpretation of L.

Definition 2.6 Let P be a program and I = <σ; δ> be an interpretation in which σ (resp. δ) is a set of suffixed literals (resp. a set of ground literals). For a set T (resp. F) of suffixed literals (resp. ground literals), the mappings Φ′_I and Ψ′_I are defined as follows:

Φ′_I(T) = {A^Γ | there are k ground clauses A ← B_l1 ∧ ... ∧ B_lm ∧ not C_l1 ∧ ... ∧ not C_ln (1 ≤ l ≤ k) from P s.t. ∀B_li (1 ≤ i ≤ m) B_li^Γ_li ∈ σ ∪ T and ∀C_lj (1 ≤ j ≤ n) C_lj ∈ δ, and Γ = ∪_l { {B_l1, ..., B_lm, not C_l1, ..., not C_ln} ∪ γ_l1 ∪ ... ∪ γ_lm | γ_li ∈ Γ_li } },

Ψ′_I(F) = {A | for every ground clause A ← B_1 ∧ ... ∧ B_m ∧ not C_1 ∧ ... ∧ not C_n from P, either ∃B_i (1 ≤ i ≤ m) s.t. B_i ∈ δ ∪ F or ∃C_j (1 ≤ j ≤ n) s.t. C_j ∈ σ}. □

The least fixpoint M′_P of a program is defined in the same way as in the previous section, using the mappings Φ′_I and Ψ′_I instead of Φ_I and Ψ_I. Clearly, M′_P is also a model of P, and we call such an M′_P the suspicious well-founded model.

Example 2.5 Let P = {p ← q ∧ not r, p ← ¬r, q ← s, ¬r ←, s ←}. Then M′_P = < {p^{{q,s,not r},{¬r}}, q^{{s}}, ¬r^{∅}, s^{∅}}; {¬p, ¬q, r, ¬s} >. □

Definition 2.7 Let P be a program and M′_P be its suspicious well-founded model. For a suffixed literal L^Γ in M′_P, if every set in Γ contains a literal L′ or ¬L′ such that L′ is contradictory in M′_P, L is called suspicious. □

We consider a proven fact to be suspicious if every proof of the fact includes inconsistent information. In other words, if there is at least one proof of the fact which contains no inconsistent information, we do not consider the fact to be suspicious. A proven fact which is not suspicious is called sure. Note that we do not consider any fact derived from true by default or false by default information to be suspicious, since such don't-care information just indicates that both the positive and the negative fact have failed to be proven in the program, and it does not present any inconsistency by itself.

The following lemma presents that a fact which is derived using a suspicious fact is also suspicious.

Lemma 2.2 Let P be a program and L^Γ be a suffixed literal in M′_P. If each set in Γ contains a suspicious fact, then the truth value of L is also suspicious.
Proof Suppose that each set γ in Γ contains a suspicious fact A. Then A has its own derivation histories Γ′ such that each γ′ in Γ′ contains a literal which is contradictory in M′_P. By definition, γ′ ⊆ γ, so γ also contains the contradictory literal. □

Now reasoning under the suspicious well-founded semantics is defined as follows.

Definition 2.8 Let P be a program and M′_P be its suspicious well-founded model. Then, for each atom A such that A^Γ (resp. ¬A^Γ) is in M′_P, A is called true with suspect (resp. false with suspect) if A (resp. ¬A) is suspicious and ¬A (resp. A) is not sure in M′_P. On the contrary, if A (resp. ¬A) is suspicious but ¬A (resp. A) is sure in M′_P, then A is false (resp. true) in M′_P without suspect. □

Especially, if A is both true and false with suspect, A is contradictory with suspect.

Example 2.6 Let P be the following program:
innocent ← ¬guilty
¬guilty ← charged ∧ not guilty
charged ←
¬charged ←
man ←
where M′_P is < {charged^{∅}, ¬charged^{∅}, man^{∅}, innocent^{{¬guilty, charged, not guilty}}, ¬guilty^{{charged, not guilty}}}; {guilty, ¬innocent, ¬man} >. Then, man is true and charged is contradictory, while innocent and guilty are true with suspect and false with suspect, respectively. □

In the above example, if a new fact guilty ← is added to P, this fact now holds for sure, and guilty becomes true without suspect.
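A prototype of the suspicious well-founded model can be obtained by first computing <σ; δ> with extended_wf_model from the sketch in Section 2.2 and then accumulating derivation histories for the derived literals. The sketch below is a simplification of Definition 2.6 and is an illustration only, using the same string representation as before, with a default literal not C recorded as the pair ('not', C); run on the program of Example 2.6 it reports innocent and ¬guilty as suspicious, while charged, ¬charged and man, whose histories are empty, remain sure.

    from itertools import product

    def histories(program, sigma, delta):
        # Derivation histories Gamma for the literals derived in <sigma; delta>
        # (a simplification of Definition 2.6): each history collects the body
        # literals of a firing clause, its default literals tagged ('not', C),
        # and, recursively, one history of every positive body literal.
        hist = {}
        changed = True
        while changed:
            changed = False
            for head, body, defaults in program:
                if not (all(b in sigma for b in body) and all(c in delta for c in defaults)):
                    continue                               # clause does not fire in M_P
                base = set(body) | {('not', c) for c in defaults}
                choices = [list(hist.get(b, [])) for b in body]
                if any(not c for c in choices):
                    continue                               # wait until the body has histories
                for combo in product(*choices):
                    h = frozenset(base.union(*combo))
                    if h not in hist.setdefault(head, set()):
                        hist[head].add(h)
                        changed = True
        return hist

    def suspicious(lit, hist, sigma):
        # Definition 2.7: every history of lit mentions a literal whose atom is contradictory.
        def contradictory(l):
            atom = l[1:] if l.startswith('-') else l
            return atom in sigma and '-' + atom in sigma
        hs = hist.get(lit)
        return bool(hs) and all(any(isinstance(e, str) and contradictory(e) for e in h) for h in hs)

    # Example 2.6:
    P = [('innocent', ['-guilty'], []),
         ('-guilty',  ['charged'], ['guilty']),
         ('charged',  [], []), ('-charged', [], []), ('man', [], [])]
    sigma, delta = extended_wf_model(P)
    hist = histories(P, sigma, delta)
    print(sorted(l for l in hist if suspicious(l, hist, sigma)))   # ['-guilty', 'innocent']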
2.4 Related Work

Alternative approaches to paraconsistent logic programming based upon the stable model semantics [GL88] have recently been proposed in [PR91, GS92a]. These approaches improve the result of [GL90] in the sense that stable models are well-defined for inconsistent programs. However, these semantics still inherit the problem of the stable model semantics, and there exists a program which has no stable model and yet contains meaningful information. For example, the program {p ←, q ← not q} has no stable model, while it has an (extended) well-founded model in which p is true.

Wagner [Wa91] has also introduced a logic for possibly inconsistent logic programs with two kinds of negation. His logic is paraconsistent and not destructive in the presence of inconsistent information, but it is still restricted and different from our lattice-valued logic.

Several studies have also been done from the standpoint of contradiction removal in extended logic programs. Kowalski and Sadri [KS90] have extended the answer set semantics of [GL90] for inconsistent programs by giving higher priorities to negative conclusions in a program. This solution is rather ad hoc and is also easily simulated in our framework by giving higher priorities to negative facts in a program. Other approaches, such as [PAA91] and [DR91], consider removing contradictions brought about by default assumptions. For instance, consider the program {p ← not q, ¬p ← r, r ←}. This program has an inconsistent well-founded model; however, it often seems legal to prefer the fact ¬p to p, since p is derived by the default assumption not q, while its negative counterpart ¬p is derived by the proven fact r. They then present program transformations for taking back such a default assumption to generate a consistent well-founded model. In our framework, such a distinction is also achieved as follows. Consider the suspicious well-founded model of the program, < {p^{{not q}}, ¬p^{{r}}, r^{∅}}; {q, ¬q, ¬r} >, in which the fact p has a default fact in its derivation history while ¬p does not; then we can prefer the fact ¬p as the more reliable one. These approaches [PAA91, DR91] further discuss contradiction removal in the context of belief revision or an abductive framework, but from the point of view of paraconsistent logic programming they provide no solution for an inconsistent program such as {p, ¬p, q}. Other approaches in this direction are [In91, GS92b], in which the meaning of an inconsistent program is taken to be a collection of maximally consistent subsets of the program.

3 Extension to Disjunctive Programs

The semantics of logic programs has recently been extended to disjunctive logic programs, which contain incomplete information in a program.
The well-founded semantics is also extended to disjunctive logic programs by several authors [Ro89, BLM90, Pr90]. In paraconsistent logic programming, [Su90] has also extended the fixpoint semantics of [BS87] to paraconsistent disjunctive logic programs. In this section, we present the extended wellfounded semantics for paraconsistent disjunctive logic programs. A disjunctive program is a (possibly infinite) set of the clauses of the form: Al V .. , V Al f- BI 1\ ... 1\ Bm 1\ notCI 1\ ... 1\ notCn where l > 0, m, n ;::: 0, each Ai, B j and C k are literals and all the variables are assumed to be universally quantified at the front of the clause. The notion of a ground clause (program) is also defined in the same way as in the previous section. Hereafter, we use the term normal program to distinguish a program which contains no disjunctive clause. As in [Sa89], we consider the meaning of a disjunctive program by a set of its split programs. Definition 3.1 Let P be a disjunctive program and 597 G be a ground clause from P of the form: Al V ... V A, -+- BI A ... A Bm A notel A ... A noten (1 ~ 2) Then G is split into 2' - 1 sets of clauses G}, .. , G 2 1-1 such that for each non-empty subset Si of {A I, .. , A, } j Gi = {Aj -+- Bl A ... A Bm A notel A ... A noten I Aj E Silo A split program of P is a ground normal program which is obtained from P by replacing each disjunctive clause G with its split clauses G i . 0 Example 3.1 Let P = {p V -'q -+- not r, s -+p, s -+- -,q}. Then there are three split programs of Pj PI = {p -+- not r, s -+- p, S -+- -,q}, P2 = {-,q -+- not r, s -+- p, S -+- -,q}, P3 = {p -+- not r, -,q -+- not r, s -+- p, S -+- -,q}. o Intuitively, each split program presents a possible world of the original program in which each disjunction is interpreted in either exclusive or inclusive way. The following lemma holds from the definition. Lemma 3.1 Let P be a disjunctive program and P;, be its split program. If I is a model of P;" I is also a model of P. 0 The extended well,-founded models of a disjunctive program are defined by those of its split programs. Definition 3.2 Let P be a disjunctive program. Then Mp is called the extended well-founded model of P if Mp is the extended well-founded model of some split program of P. 0 Clearly, the above definition reduces to the extended well-founded model of a normal program in the absence of disjunctive clauses in a program. A disjunctive program has multiple extended wellfounded models in general and each atom possibly has different truth value in each model. In classical twovalued logic programming, a ground atom is usually assumed to be true (resp. false) if it is true (resp. false) in every minimal model of a program. In our multi-valued setting, we define an interpret.tion of an atom under the extended well-founded semantics as follows. Definition 3.3 Let P be a disjunctive program, .. , Mp be its extended well-founded models and M~(A)(i = 1, .. , n) be the truth value of an atom A in M~. Then an atom A in P has a truth value J-L under the extended well-founded semantics if M~(A) = ... = M~, Mp(A) = J-L. 0 Example 3.2 For the program P in example 3.1, there are three extended well-founded models such that M~ =< {p, s}j {-,p, q, -'q, r, -'r, -,s} >, M~ =< {-,q, s}; {p, -,p, q, r, -'r, -,s} > and M~ =< {p,-,q,s};{-,p,q,r,-,r,-,s} >. Then s is true and r is don't-care by default in P under the extended wellfounded semantics, while truth values of p and q are not uniquely determined. 
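Definition 3.1 is straightforward to prototype for finite ground programs. The sketch below is an illustration only, reusing the representation and the functions extended_wf_model and truth_value from Section 2, with a disjunctive clause written as a list of head literals. split_programs enumerates every way of replacing each disjunctive clause by the clauses for one non-empty subset of its head, and disjunctive_truth_value returns a value only when all extended well-founded models of the split programs agree, as required by Definition 3.3; on the program of Examples 3.1 and 3.2 it reports s true, r don't-care by default, and no unique value for p.

    from itertools import chain, combinations, product

    def nonempty_subsets(heads):
        # The 2^l - 1 non-empty subsets S_i of a disjunctive head A1 v ... v Al.
        return [list(c) for r in range(1, len(heads) + 1) for c in combinations(heads, r)]

    def split_programs(dprogram):
        # Every split program of Definition 3.1: each clause (heads, body, defaults) is
        # replaced by the normal clauses {A <- body | A in S} for one chosen subset S.
        options = [[[(a, body, defaults) for a in s] for s in nonempty_subsets(heads)]
                   for heads, body, defaults in dprogram]
        return [list(chain.from_iterable(choice)) for choice in product(*options)]

    def disjunctive_truth_value(atom, dprogram):
        # Definition 3.3: the value of atom, provided every split program agrees on it.
        values = {truth_value(atom, *extended_wf_model(p)) for p in split_programs(dprogram)}
        return values.pop() if len(values) == 1 else None

    # Example 3.1:  p v -q <- not r,   s <- p,   s <- -q
    P = [(['p', '-q'], [], ['r']),
         (['s'], ['p'], []),
         (['s'], ['-q'], [])]
    print(len(split_programs(P)))            # 3 split programs P1, P2, P3
    print(disjunctive_truth_value('s', P))   # 't'
    print(disjunctive_truth_value('r', P))   # 'star'  (don't-care by default)
    print(disjunctive_truth_value('p', P))   # None    (not uniquely determined)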
0 When a program has inconsistent models as well as consistent ones, however, it seems natural to prefer consistent models and consider truth values in such models. Example 3.3 Let P = {p -+-, -,p V q -+-}. Then the extended well-founded models of P are M~ =< {p,-,p}j{q,-,q} >, M~ =< {p,q}j{-,p,-,q} > and M~ =< {p, -,p, q}; {-,q} > where only M~ is consistent. o In the above example, a rational reasoner seems to prefer the consistent model M~ to M~ and M~, and interprets both p and q to be true. The extended wellfounded semantics for such a reasoner is defined bellow. Definition 3.4 Let P be a disjunctive program such that M~, .. , Mp (n =F 0) are its consistent extended wellfounded models. Then an atom A in P has a truth value J-L under the rational extended well-founded semantics if M}(A) = ... = Mp(A) = J-L. 0 Lemma 3.2 Let P be a disjunctive program such that it has at least one consistent extended well-founded model. If an atom A has a truth value J-L under the extended well-founded semantics, then A has also the truth value J-L under the rational extended well-founded 0 semantics, but not vice versa. The suspicious well-founded semantics presented in section 2.3 is also extensible to disjunctive programs in a similar way. 4 Concluding Remarks In this paper, we have presented the extended wellfounded semantics for paraconsistent logic programs. Under the extended well-founded semantics, a contradictory information is localized and a meaningful information is still available in an inconsistent program. Moreover, a suspicious fact which is affected by an inconsistent information can be distinguished from others by the skeptical well-founded reasoning. The extended well-founded semantics proposed in this paper is a natural extension of the three-valued well-founded semantics and it is well-defined for every possibly inconsistent extended logic program. Compared with other paraconsistent logics, it can treat both classical and default negation in a uniform way and also simply be extended to disjunctive paraconsistent logic programs. This paper has centered on a declarative semantics 598 of paraconsistent logic programs, but a proof procedure of the extended well-founded semantics is achieved in a straightforward way as an extension of the SLSprocedure [Pr89]' That is, each fact which is true/false in a program have a successful SLS-derivation in a program, while a default fact in a program has a failed derivation. A fact which is inconsistent in a program has a successful derivation from its positive and negative goals. The proof procedure for the suspicious well-founded semantics is also achieved by checking consistency of each literal appearing in a successful derivation. These procedures are sound and complete with respect to the extended wellfounded semantics and also computationally feasible. Acknowledgments I would like to thank V. S. Subrahmanian and John Grant for useful correspondence on the subject of this paper. References [Be75] Belnap, N. D., A Useful Four-Valued Logic, in Modern Uses of Multiple- Valued Logic, J. M. Dunn and G. Epstein (eds.), Reidel Publishing, 8-37, 1975. [BLM90] Baral, C., Lobo, J. and Minker, J., Generalized Disjunctive Well-Founded Semantics for Logic Programs, CS-TR-2436, Univ. of Maryland, 1990. [BS87] Blair, H. A. and Subrahmanian, V. S., Paraconsistent Logic Programming, Proc. Conf. on Foundations of Software Technology and Theoretical Computer Science (LNCS 287), 340-360, 1987. [Co74] Costa, N. C. A. 
da, On the Theory of Inconsistent Formal Systems, Notre Dame J. of Formal Logic 15, 497-510, 1974.
[DR91] Dung, P. M. and Ruamviboonsuk, P., Well-Founded Reasoning with Classical Negation, Proc. 1st Int. Workshop on Logic Programming and Nonmonotonic Reasoning, 120-132, 1991.
[Fi89] Fitting, M., Negation as Refutation, Proc. 4th Annual Symp. on Logic in Computer Science, 63-69, 1989.
[Fi91] Fitting, M., Bilattices and the Semantics of Logic Programming, J. of Logic Programming 11, 91-116, 1991.
[Gi86] Ginsberg, M. L., Multivalued Logics, Proc. of AAAI'86, 243-247, 1986.
[GL88] Gelfond, M. and Lifschitz, V., The Stable Model Semantics for Logic Programming, Proc. 5th Int. Conf. on Logic Programming, 1070-1080, 1988.
[GL90] Gelfond, M. and Lifschitz, V., Logic Programs with Classical Negation, Proc. 7th Int. Conf. on Logic Programming, 579-597, 1990.
[GS92a] Grant, J. and Subrahmanian, V. S., Reasoning in Inconsistent Knowledge Bases, draft manuscript, 1992.
[GS92b] Grant, J. and Subrahmanian, V. S., The Optimistic and Cautious Semantics for Inconsistent Knowledge Bases, draft manuscript, 1992.
[In91] Inoue, K., Extended Logic Programs with Default Assumptions, Proc. 8th Int. Conf. on Logic Programming, 490-504, 1991.
[KL89] Kifer, M. and Lozinskii, E. L., RI: A Logic for Reasoning with Inconsistency, Proc. 4th Annual Symp. on Logic in Computer Science, 253-262, 1989.
[KS90] Kowalski, R. A. and Sadri, F., Logic Programs with Exception, Proc. 7th Int. Conf. on Logic Programming, 598-613, 1990.
[PAA91] Pereira, L. M., Alferes, J. J. and Aparicio, N., Contradiction Removal within Well-Founded Semantics, Proc. 1st Int. Workshop on Logic Programming and Nonmonotonic Reasoning, 105-119, 1991.
[Pr89] Przymusinski, T. C., Every Logic Program has a Natural Stratification and an Iterated Least Fixed Point Model, Proc. 8th ACM Symp. on Principles of Database Systems, 11-21, 1989.
[Pr90] Przymusinski, T. C., Extended Stable Semantics for Normal and Disjunctive Logic Programs, Proc. 7th Int. Conf. on Logic Programming, 459-477, 1990.
[PR91] Pimentel, S. G. and Rodi, W. L., Belief Revision and Paraconsistency in a Logic Programming Framework, Proc. 1st Int. Workshop on Logic Programming and Nonmonotonic Reasoning, 228-242, 1991.
[Ro89] Ross, K., The Well-Founded Semantics for Disjunctive Logic Programs, Proc. 1st Int. Conf. on Deductive and Object Oriented Databases, 352-369, 1989.
[Sa89] Sakama, C., Possible Model Semantics for Disjunctive Databases, Proc. 1st Int. Conf. on Deductive and Object Oriented Databases, 337-351, 1989.
[Su90] Subrahmanian, V. S., Paraconsistent Disjunctive Deductive Databases, Proc. 20th Int. Symp. on Multiple-Valued Logic, 339-345, 1990.
[Su90] Subrahmanian, V. S., V-Logic: A Framework for Reasoning about Chameleonic Programs with Inconsistent Completions, Fundamenta Informaticae XIII, 465-483, 1990.
[VRS88] Van Gelder, A., Ross, K. and Schlipf, J. S., Unfounded Sets and Well-Founded Semantics for General Logic Programs, Proc. 7th ACM Symp. on Principles of Database Systems, 221-230, 1988.
[Wa91] Wagner, G., A Database Needs Two Kinds of Negation, Proc. 3rd Symp. on Mathematical Fundamentals of Database and Knowledge Base Systems (LNCS 495), 357-371, 1991.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT.
© ICOT, 1992 600 Formalizing Database Evolution in the Situation Calculus Raymond Reiter Department of Computer Science University of Toronto Toronto, Canada M5S lA4 and The Canadian Institute for Advanced Research email: reiter@ai.toronto.edu Abstract vVe continue our exploration of a theory of database updates (Reiter [21, 23]) based upon the situation calculus. The basic idea is to take seriously the fact that databases evolve in time, so that updatable relations should be endowed with an explicit state argument representing the current database state. Database transactions are treated as functions whose effect is to map the current database state into a successor state. The formalism is identical to that arising in the artificial intelligence planning literature and indeed, borrows shamelessly from those ideas. Within this setting, we consider several topics, specifically: 1. A logic programming implementation of query evaluation. 2. The treatment of database views. 3. State constraints and the ramification problem. 4. The evaluation of historical queries. 5. An approach to indeterminate transactions. 1 1. We sketch a logic programming implementation of the axioms defining a database under updates. While we give no proof of its correctness, we observe that under suitable assumptions, Clark completion axioms (Clark [3]) should yield such a proof. 2. We show how our approach can accommodate database views. 3. The so-called ramification problem, as defined in the AI planning literature, arises in specifying database updates. Roughly speaking, this is the problem of incorporating, in the axiom defining an update transaction, the indirect effects of the update as given by arbitrary state constraints. We discuss this problem in the database setting, and characterize its solution in terms of inductive entailments of the database. 4. An historical query is one that references previous database states. We sketch an approach to such queries which reduces their evaluation to evaluation in the initial database state, together with conventional list processing techniques on the list of those update transactions leading to the current database state. Introduction Elsevlhere (Reiter [21, 23]), we have described how one may represent databases and their update transactions vvithin the situation calculus (McCarthy [13]). The basic idea is to take seriously the fact that databases evolve in time, so that updatable relations should be endowed with an explicit state argument representing the current database state. Database transactions are treated as functions, and the effect of a transaction is to map the current database state into a successor state. The resulting formalism becomes identical to theories of planning in the AI literature (See, for example, (Reiter [18])). Following a review of some of the requisite basic concepts and results, we consider several topics in this paper: 5. The database axiomatization of this paper addresses only determinate transactions; roughly speaking, in the presence of complete information about the current database state, such a transaction determines a unique successor state. By appealing to some ideas of Haas ([7]) and Schubert ([24]), we indicate how to axiomatize indeterminate database transactions. 2 Preliminaries This section reviews some of the basic concepts and results of (Reiter [23, 21, 19]) which provide the necessary Qackground for presenting the material of this paper. 
601 These include a motivating example, a precise specification of the axioms used to formalize update transactions and databases, an induction axiom suitable for proving properties of database states, and a discussion of query evaluation. 1. register(st, course): Register student st in course course. 2. change(st,course,grade): Change the current grade of student st in course course to grade. 3. drop(st, course): Student st drops course course. 2.1 The Basic Approach: An Example In (Reiter [23]), the idea of representing databases and their update transactions within the situation calculus was illustrated with an example education domain, which we repeat here. Relations The database involves the following three relations: 1. enrolled(st,course,s): Transaction Preconditions Normally, transactions have preconditions which must be satisfied by the current database state before the transaction can be "executed". In our example, we shall require that a student can register in a course iff she has obtained a grade of at least 50 in all prerequisites for the course: Student st is enrolled in course course when the database is in state s. Poss(register(st, c), s) == {(Vp).prerequ(p,c)::J (3g).grade(st,p,g,s) Ag 2 50}.l 2. grade( st, course, grade, s): The grade of student st in course cou'rse is grade when the database is in state s. It is possible to change a student's grade iff he has a grade which is different than the new grade: 3. prerequ(pre, course): pre is a prerequisite course for course course. Notice that this relation is 'state independent, so is not expected to change during the evolution of the database. Poss(change(st,c,g),s) == (3g') .grade( st, c, g', s) A g' =J g. A student may drop a course iff the student is currently enrolled in that course: Poss(drop(st,c),s) == enrolled(st,c,s). Initial Database State We assume given some first order specification of what is true of the initial state So of the database. These will be arbitrary first order sentences, the only restriction being that those predicates which mention a state, mention only the initial state So. Examples of information which might be true in the initial state are: enrolled(Sue, ClOO, So) V enrolled(Sue, C200, So), (3c)enrolled(Bill, c, So), (Vp).prerequ(p, P300) == p = PIOO V P = MIOO, Update Specifications These are the central axioms in our formalization of update transactions. They specify the effects of all transactions on all updatable database relations. As usual, all lower case roman letters are variables which are implicitly universally quantified. In particular, notice that these axioms quantify over transactions. In what follows, do( a, s) denotes that database state resulting from performing the update transaction a when the database is in state s. Poss(a,s):J [enrolled(st,c,do(a,s)) == a = register(st,c) V enrolled(st,c,s) A a =J drop(st, c)], (Vp)-,prerequ(p, CIOO), (Vc).enrolled(Bill, c, So) == c = MIOO V c = ClOO V c = P200, enrolled(M ary, ClOO, So), -,enrolled(John, M200, So), ... Poss(a, s) :J [grade(st, c,g, do(a, s)) == a = change(st, c,g) V grade(st,c,g,s) A (Vg')a =J change(st,c,g')]. grade(Sue, P300, 75, So), grade(Bill, 111200, 70, So), . . . prerequ(M200, MIOO), -'prerequ(MIOO, CIOO), .. . 2.2 An Axiomatization of Updates Database Transactions The example education domain illustrates the general principles behind our approach to the specification of Update transactions will be denoted by function symbols, and will be treated in exactly the same way as actions are in the situation calculus. 
For our example, there will be three transactions: lIn the sequel, lower case roman letters will denote variables. All formulas are understood to be implicitly universally quantified with respect to their free variables whenever explicit quantifiers are not indicated. 602 database update transactions. In this section we precisely characterize a class of databases and updates of which the above example will be an instance. where, for not.ational convenience, we assume that F's last argument is of sort state, and where F is a simpl~ formula, all of whose free variables are among Unique Names Axioms for Transactions a, s, Xl, . .. ,X n . For distinct transaction names T and T', T(x) i= T'(f}). Identical transactions have identical arguments: T(X1' ... , xn) = T(y!, ... , Yn) => Xl = Y1 A ... A Xn = Yn for each function symbol T denoting a transaction. U nique Names Axioms for States i= do(a,s), = do(a',s') => a = a' As = s'. (Va,s)So (Va,s,a', s').do(a, s) Definition: The Simple Formulas The simple formulas are defined to be the smallest set such that: . 1. F(~s) and F(~So) are simple whenever F is an updatable database relation, the are terms, and s is a variable of sort state. 2 t 2. Any equality atom is simple. 3. Any other atom with predicate symbol other than Pass is simple. 4. If Sl and S2 are simple, so are --,S1, Sl A S2, Sl 'l.fS2, Sl => S2, Sl == S2. 5. If S is simple, so are (:lx)S and (Vx)S whenever X is an individual variable not of sort state. In short, the simple formulas are those first order formulas whose updatable database relations do not mention the function symbol do, and which do not quantify over variables of sort state. Definition: Transaction Precondition Axiom A transaction precondition axiom is a formula of the form (Vx, S).PoSS(T(X1, ... ,xn), s) == fIr, where T is an n-ary transaction function, and fIr is a simple formula whose free variables are among X1,···,X n,S. Definition: Successor State Axiom A successor state axiom for an (n + 1)-ary updatable database relation F is a sentence of the form 2.3 An Induction Axiom There is a close analogy between the situation calculus and the theory of the natural numbers; simply identify So with the natural number 0, and do(Add1, s) with the successor of the natural number s. In'effect, an axiomatization in the situation calculus is a theory in which each "natural number" s has arbitrarily many successors.3 Just as an induction axiom is necessary to prove anything interesting about the natural numbers, so also is induction required to prove general properties of states. This section is devoted to formulating an induction axiom suitable for this task. We begin by defining an ordering relation < on states. The intended interpretation of s < s' is that state s' is reachable fTom state s by some sequence of transactions, each action of which is pos.sible in that state resulting from executing the transactions preceeding it in the sequence. Hence, < should be the smallest binary relation on states such that: 1. 2. (7 < doe a, (7) whenever transaction a is possible in state (7, and (7 < doe a, (7') whenever transaction a is possible in (7' and (7 < (7'. state This can be achieved with a second order sentence, as follows: Definitions: s < s', s :::; s' (Vs, s').s < s' == (VP).{[(Va,sl).Poss(a,sl) => P(sl,do(a,sl))] A [(Va, Sl, S2)'POSS( a, S2) A P( Sl, S2) => pes!, do(a, S2))]} => pes,s'). (1) (Vs,s')s :::; s' == s < s' V s = s'. 
(2) Reiter [20] shows how these axioms entail the following induction axiom suitable for proving properties of states s when So :::; s: (VW).{W(So) A [(Va, s).Poss(a, s) A So:::; s A W(s) => W(do(a,s))]} :) (Vs ).So :::; s => W(s). (3) (Va,s).Poss(a,s) => (Vx!, ... ,xn).F(X1, ... ,xn,do(a,s)) == F This is our analogue of the standard second order induction axiom for Peano arithmetic. 2For notational convenience, we assume that the last argument of an updatable database relation is always the (only) argument of sort state. 3There could even be. infinitely many successors whenever an action is parameterized by a real number, as for' example move(block, location). '603 Reiter [23, 20J provides an approach to database integrity constraints in which the concept of a database satisfying its constraints is defined in terms of inductive entailment from the database, using this and other axioms of induction for the situation calculus. In this paper, we shall find other uses for induction in connection with database view definitions (Section 4), the so-called ramification problem (Section 5), and historicaf queries (Section 6). 2.4 Databases Defined In the sequel, unless otherwise indicated, we shall only consider background database axiomatizations 'D of the form: D = less-axioms U Dss U Dtp U'Duns U 'Dunt U Dso where • less-axioms are the axioms (1), (2) for < and ~. • 'Dss is a set of successor state axioms, one for each updatable database relati'on. • Dtp is a set of transaction precondition axioms, one for each database transaction. • 'Duns is the set of unique names axioms for states. • 'Dunt is the set of unique names axioms for transactions. Querying an evolving database is precisely the temporal projection problem in AI planning [8J.4 Definition: A Regression Operator R Let W be first order formula. Then R[WJ is that formula obtained from W by replacing each atom F (~ do( 0:,0-)) mentioned by W by F(~ 0:, 0-) where F's successor state axiom is (Va, s).Poss(a, s) :J (Vx).F(x,do(a,s)) == F(X, a,s). All other atoms of W not of this form remain the same. The use of the regression operator R is a classical plan synthesis technique (Waldinger {25]). See also (Pednault [16, 17]). Regression corresponds to the operation of unfolding in logic programming. For the class of databases of this paper, Reiter [23, 19J provides a sound and complete query evaluator based on regression. In this paper, we shall have a different use for regression, in connection with defining database views (Section 4). 3 Updates in the Logic Programming Context It seems that our approach to database updates can be implemented in a fairly straightforward way as a logic program, thereby directly complementing the logic programming perspective on databases (Minker [15]). For example, the axiomatization of the education example of Section 2.1 has the following representation as clauses: Successor State Axiom Translation: • Dso is a set of first order sentences with the property that So is the only term of sort state mentioned by the database updatable relations of a sentence of Dso' See Section 2.1 for an example Dso' Thus, no updatable database relation of a formula of Dso mentions a variable of sort· state or the function symbol do. Dso will play the role of the initial database (i.e. the one we start off with, before any transactions have been "executed"). 2.5 Querying a Database Notice that in the above account of database evolution, all updates are virtual; the database is never physically changed. 
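The example education domain can be simulated with a few lines of Python. The sketch below is only an illustration and departs from the paper in one important respect: instead of the regression-based evaluator of (Reiter [23, 19]), it progresses a complete initial state through the transaction sequence, so it presupposes complete information about S0, which the axiomatization itself does not require. The particular prerequisite table and initial state chosen below are hypothetical.

    # A database state is a pair (enrolled, grade): a set of (student, course) pairs and a
    # dictionary {(student, course): grade}.  prerequ is state independent.
    PREREQU = {('M100', 'M200'), ('P100', 'P300')}       # hypothetical (prerequisite, course) pairs

    def poss(action, state):
        # Transaction precondition axioms of Section 2.1 (sketch).
        enrolled, grade = state
        kind = action[0]
        if kind == 'register':                           # every prerequisite passed with at least 50
            _, st, c = action
            return all(grade.get((st, p), -1) >= 50 for p, c2 in PREREQU if c2 == c)
        if kind == 'change':                             # an existing grade, different from the new one
            _, st, c, g = action
            return (st, c) in grade and grade[(st, c)] != g
        if kind == 'drop':                               # currently enrolled
            _, st, c = action
            return (st, c) in enrolled
        return False

    def do(action, state):
        # The successor state axioms for enrolled and grade, read as one progression step (sketch);
        # change overwrites any previous grade of the student in that course.
        enrolled, grade = set(state[0]), dict(state[1])
        kind = action[0]
        if kind == 'register':
            enrolled.add((action[1], action[2]))
        elif kind == 'drop':
            enrolled.discard((action[1], action[2]))
        elif kind == 'change':
            grade[(action[1], action[2])] = action[3]
        return enrolled, grade

    def do_sequence(actions, state):
        # do([a1,...,an], s): perform the transactions in order, insisting each is possible.
        for a in actions:
            assert poss(a, state), ('transaction not possible', a)
            state = do(a, state)
        return state

    # A complete, hypothetical initial state, and the query of this section:
    # is John enrolled in any course after drop(John, C100), register(Mary, C100)?
    S0 = ({('John', 'C100'), ('Mary', 'C100'), ('Bill', 'M100')},
          {('Sue', 'P300'): 75, ('Bill', 'M200'): 70})
    s1 = do_sequence([('drop', 'John', 'C100'), ('register', 'Mary', 'C100')], S0)
    print(any(st == 'John' for st, _ in s1[0]))          # False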
To query the database resulting from some sequence of transactions, it is necessary to refer to this sequence in the query. For example, to determine if John is enrolled in any courses after the transaction sequence drop(J ohn, ClOO), register(M ary, ClOO) has beeri 'executed', we must determine whether Database F (3c).enrolled(John,c, do(register(M ary, ClOO), do(drop(John, ClOO), So))). enrolled( st, c, do( register( st, c), s)) t - Poss(register(st,c),s). enrolled( st, c, do( a, s)) t - a=j:. drop(st,c),enrolled(st,c,s),Poss(a,s). grade( st, c, g, do( change(st, c, g), s)) t - Poss(change(st,c,g),s). grade( st, c, g, do( a, s)) t - a =j:.change( st, c, g'), grade( st, c, g, s), Poss( a, s ).5 Transaction Precondition Axiom Translation: Poss(register(st, c), s) t - not P(st, c, s). Q(st,p,s) t - grade(st,p,g,s),g 2': 50. 6 Poss(change(st,c,g),s) t - grade(st,c,g',s),g =j:. g'. Poss(drop(st,c),s) t - enrolled(st,c,s). 4This property of our axiomatization makes the resulting approach quite different than Kowalski's situation calculus formalization of updates [9], in which each database update is accompanied by the addition of an atomic formula to the theory axiomatizing the database. 5This translation is problematic because it invokes negationas-failure on a non-ground atom. The intention is that whenever . a is bound to a term whose function symbol is change, the call should fail. This can be realized procedurally by retaining the clause sequence as shown, and simply deleting the inequality a =f. change(st, c, g'). 604 With a suitable clausal form for Dso , it would then be possible to evaluate queries against updated databases, for example enrolled(John, C200, do( register(M ary, ClOO), do( drop( John, ClOO), So))). not mention V and whose free variables are among x, s. Suppose further that D BS contains the successor state axiom (6) for V, and that Dso contains the initial state axiom (5). Then, DU {3} F= (Vs).So f- Presumably, all of this can be made to work under suitable conditions. The remaining problem is to characterize what these conditions are, and to prove correctness of such an implementation with respect to the logical specification of this paper. In this connection, notice that the equivalences in the successor state and transaction precondition axioms are reminiscent of Clark's [3J completion semantics for logic programs, and our unique names axioms for states and transactions provide part of the equality theory required for Clark's semantics (Lloyd [12], pp.79, 109). Views In our setting, a view is an updatable database relation V(x,s) defined in terms of so-called base predicates: (Vx,s).V(x,s) == B(x,s), (4) x where B is a simple formula with free variables among and s, and which mentions only base predicates. 7 Unfortunately, sentences like (4) pose a problem for us because they are precluded by their syntax from the databases considered in this paper. However, we can accommodate nonrecursive views by representing them as follows: (Vx).V(x, So) == B(x, So), (Va, s ).Poss( a, s) ::) (Vx). V( x, do( a, s)) == R.[B( x, do( a, s) )J.8 (5) (6) Sentence (5) is a perfectly good candidate for inclusion in D so ' while (6) has the syntactic form of a successor state axiom and hence may be included in Dss. 
This representation of views requires some formal justification, which the following theorem provides: Theorem 1 Suppose V(x, s) is an updatable database relation) and that 8(x,s) is a simple formula which does 6We have here invoked some of the program transformation rules of (Lloyd [12], p.113) to convert the non-clausal formula {('v'p).prerequ(p, c) ::) (3g).grade(st, c, g, s) 1\ 9 ~ s::) (Vx).V(x,s) == 8(x,s). Theorem 1 informs us that from the initial state and successor state axioms (5) and (6) we can inductively derive the view definition (Vs).So ~ s::) (Vx).V(x,s) == B(x,s). This is not quite the same as the view definition (4) with which we began this discussion, but it is close enough. It guarantees that in any database state reachable from the initial state So, the view definition (4) will be true. We take this as sufficient justification for representing views within our framework by the axioms (5) and (6). 5 4 ~ State Constraints and R~mification Problem the Recall that our definition of a database (Section 2.4) does not admit state-dependent axioms, except those of Dso referring only to the initial state So. For example, we are prevented from including in a database a statement requiring that any student enrolled in C200 must also be enrolled in C 100. (Vs, st).So ~ sA enrolled(st, C200, s) ::) enrolled( st, ClOD, s). (7) In a sense, such a state-dependent constraint should be redundant, since the successor state axioms, because they are equivalences, uniquely determine all future evolutions of the database given the initial database state So. The information conveyed in axioms like (7) must already be embodied in Dso together with the successor state and transaction precondition axioms. We have already seen hints of this observation. Reiter [20J proposes that dynamic integrity constraints should be viewed as inductive entailments of the database, and gives several examples of such derivations. Moreover, Theorem 1 shows that the view definition (Vs).So ~ s::) (Vx).V(x,s) == 8(i,s). is an inductive entailment of the database containing the initial state axiom (5) and the successor state axiom (6). These considerations suggest that a state constraint can be broadly conceived as any sentence of the form 50} ::) Poss(register(st, c), s) to a Prolog executable form. P and Q are new predicate symbols. 7We do not consider recursive views. Views may also be defined in terms of other, already defined views, but everything eventually "bottoms out" in base predicates, so we only consider this case. 8Notice that since we are not considering recursive views (i.e., f3 does not mention V), the formula n[f3(x, do(a, s))] is well defined. (Vs 1 , •.. , sn).SO ~ Si A Si ~ Sj A··· ::) W(Sl,"" sn), and that a database is said to satisfy this constraint iff the database inductively entails it. 9 9See Section 2.3 for a brief discussion of inductively proving properties of states in the situation calculus. 605 The fact that state constraints like (7) must be inductive entailments of a database does not of itself dispense with the problem of how to deal with such constraints in defining the database. For in order that a state constraint be an inductive entailment, the successor state axioms must be so chosen as to guarantee this entailment. For example, the original successor state axiom for enroll (Section 2.1) was: Poss(a,s)::) {enTollecl(st,c,clo(a,s)) == a = TegisteT(st,c)V enTollecl(st,c,s) /\ a i- clrop(st, c)}. 
Historical Queries Using the relations < and ::; on states, as defined in Section 2.3, it is possible to pose hi.stoTical queries to a database. First, some notation. Notation: do([al, ... ,n],s) Let aI, ... ,an be transactions. Define clo( [ ], .s) = .s, (8) and for As one would expect, this does not inductively entail (7). To accommodate the state constraint (7), this Sllccessor state axiom mllst be changed to: Po.ss(a,s)::) {enTollecl(st,c,clo(a,.s)) == a = regi.steT( .st, c) /\ [c = C200 ::) enTollecl( .st, ClOO,.s)] V enrolled(.st,c,.s) /\ a i- clTOp(.st,c)/\ [c = C200 ::) a i- dTop(.st, ClOD)]}. (9) It is now simple to prove that, provided 'Dsa contains the unique names axiom ClOD i- C200 and the initial instance of (7), enTolled( .st, C200, So) ::) enTolled( .st, ClOD, So), then (7) is an inductive entailment of the database. The example illustrates the subtleties involved in getting the successor state axioms to reflect the intent of a state constraint. These difficulties are a manifestation of the so-called ramification problem in artificial intelligence planning domains (Finger [4]). Transactions might have ramifications, or indiTect effect.s. For the example at hand, the transaction of registering a student in C200 has the direct effect of causing the student to be enrolled in C200, and the indirect effect of causing her to be enrolled in ClOD (if she is not already enrolled in ClOD). The modification (9) of (8) was designed to capture this indirect effect. In our setting, the ramification problem is this: Given a static state constraint like (7), how can the indirect effects implicit in the state constraint be embodied in the successor state axioms so as to guarantee that the constraint will be an inductive entailment of the database? A variety of circumscriptive proposals for addressing the ramification problem have been proposed in the artificial intelligence literature, notably by Baker [1], Baker and Ginsberg [2], Ginsberg and Smith [5], Lifschitz [10] and Lin and Shoham [11]. Our formulation of the problem in terms of inductive entailments of the database seems to be new. For the databases of this pa. per, Fanghzen Lin lo appears to have a solution to this problem. lOPersonal communication. 6 17, = 1,2, ... do( [aI, ... , an], s) is a compact notation for the state term clo(a n , do(an-I,'" clo(al' s) .. .)) which denotes that state resulting from performing the transaction aI, followed by a2, ... , followed by an, beginning in state .s. Now, suppose T is the transaction sequence leading to the current database state (i.e., the current database state is clo(T, So)). The following asks whether the database was ever in a state in which John was simultaneously enrolled in both ClOD and 1I1100? (3s).50 ::; s /\.s ::; clo(T, 5 0 )/\ em'ollecl( John, ClOD, s) /\ enTollecl( John, 1I1100,.s). (10) Has Sue always worked in department 13? (\ls).5 0 ::; s /\ s ::; clo(T, So) ::) emp(5ue, 13, s). (11) The rest of this section sketches an approach to answering historical queries of this kind. The approach is of interest because it reduces the evaluation of such queries to evaluations in the initial database state, together with conventional list processing techniques on the list of those transactions leading to the current database state. Begin by considering two new predicates, last and mem-dZtf. The intended interpretation of last(s, a) is that the transaction a is the last transaction of the sequence.s. 
For example, last( clo( [clTOp( 111 w'y, ClOD), registeT( John, ClOD)]' 50), TegisteT( John, ClOD)). is true, while la.st( do([ clrop(M ary, ClOD), clrop( John, ClOD)]' 50)' 1'egisteT( John, ClOD)) is false, assuming unique names axioms for transactions. The following two axioms are sufficient for our purposes: .(last(50 , a). la.st( do( a,.s), a') == a = a'. 606 The intended interpretation of rnern-d1jJ( a, s, Sf) is that transaction a is a member of the "list difference" of s and 5', where state s' is a "su blist" of .5. For example, mern-diff( drop(JVJ ary, C100), dO([Tegister( John, C100), dTOp(Bill, ClOO), dTOp(1I1 aTY, ClOO), drop( John, 111100)], So), dO([TegisteT(John, ClOO)J, So)) This form of the original query is of interest because it reduces query evaluation to evaluation in the initial database state, together with simple list pTocessing on the list T of those transactions leading to the current database state. We can verify that Sue has always been employed in department 13 in one of two ways: 1. Verify that she was initially employed in department 13, and that neither jire(Sue) nor quit(Sue) are members of list T. is true, whereas 2. Verify that T has a sublist ending with hire(Sue,13), and that neither jire(Sue) nor quit(Sue) are members of the list difference of T and this sublistY rnern-diff( registerUvI aTY, C1 00), do([register( John, ClOO), drop(Bill, ClOO), drop(M ary, ClOO), drop(J ohn, 111100)], So), do([register( John, ClOO)J, So)) is false (assuming unique names axioms for transactions). The following axioms will be sufficient for our needs: ,'mern-diff(a, s, s). s S s':::) rnern-diff(a,do(a,s'),s). We now consider evaluating the first query (10) in the same list processing spirit. We shall assume that (8) is the successor state axiom for enrolled. Using the above sentences for last and rnern-diff, together with (8) and the induction axiom (3), it is possible to prove: So S s :::) enrolled(st, c, s)) == enrolled(st,c,So) /\ ,rnern-diff(drop(st,c),s,So) V (3s').So S s' S s /\ last(s',regi5ter(st, c)) /\ ,rnern-diff(drop(st, c), s, 5'). mern-diff( a, s, Sf) :::) rnern-diff( a, do( a', s), s'). mem-diff(a,do(a',s),s') /\ a =I- a':::) rnern-diff(a,s,s'). 'vVe begin by showing how to answer query (11). Suppose, for the sake of the example, that the successor state axiom for emp is: Poss(a,s):::) emp(p,d,do(a,5)) == a = hiTe(p,d) V emp(p, d,s) /\ a =I- jire(p) /\ a =I- quit(p). Then, on the assumption that the transaction sequence T is legal, it is simple to prove that the query (10) is equivalent to: (3s).So S s S do(T, So) /\ enrolled( John, ClOO, So) /\ enrolled(John, MlOO, So) /\ ,rnern-diff( drop( J olm, ClOO), s, So) /\ ,rnern-diff( drop( John, M100), 5,5'0) Using this, and the sentences for last and rnern-diff together with the induction axiom (3), it is possible to prove: So S s :::) emp(p, d, s) == emp(p, d, So) /\ ,rnern-diff(fire(p) , s, So) /\ ,rnern-diff(quit(p) , s, So) (3s').So S s' S s/\ last(s', hire(p, d)) /\ ,rnern-diff(fi1'e(p) , 5, Sf) /\ ,rnern-diff( quit(p), 5, s'). 
{ } V V Using this and the (reasonable) assumption that the transaction sequence T is legal,l1 it is simple to prove that the query (11) is equivalent to: enrolled(John, ClOO, 5'0) /\ ,rnern-diff(drop(J ohn, ClOO), s, 5'0) /\ (35').So Ss' S s /\ last(s', register(John, MlOO)) /\ ,rnern-diff( drop( John, MlOO), s, Sf) V 1 enroll ed( John, Ml 00, 5'0) /\ (3s").5'0 S S" S s /\ last( S", register( John, ClOO)) /\ ,rnern-diff( drop( John, ClOO), s, S") { emp(Sue, 13, So) /\ } ,rnern-diff(fire(Sue) , do(T, So), So) /\ { ,rnern-diff( quit(Sue), do(T, So), So) ) } V V (3s').So S s' S do(T, So) /\ } last( s', hire( Sue, 13)) /\ { ,rnern-diff(fire(Sue), do(T, So), Sf) /\ ,rnern-diff(quit(Sue) , do(T, So), Sf). 11 Intuitively, T is legal iff each transaction of T satisfies its preconditions (see Section 2,1) in that state resulting from performing all the transactions preceeding it in the sequence, beginning with state So' See (Reiter [19)) for details, and a procedure for verifying the legality of a transaction sequence. (3s', S") .5'0 S s' S s /\ So S S" S s /\ last(s', register(John, MlOO)) /\ last( S", register( John, C100)) /\ ,rnern-diff( drop( John, M100), s, Sf) J\ ,rnern-diff( drop( John, ClOO), s, S") 1 ) 12The correctness of this simple-minded list processing procedure relies on some assumptions, notable suitable unique names axioms. 607 Despite its apparent complexity, this sentence also has a simple list processing reading; we can verify that John is simultaneously enrolled in ClOO and All00 in some previous database state as follows. Find a sublist (loosely denoted by s) of T such that one of the following four conditions holds: 1. John was initially enrolled in both ClOO and 1\IIlOO and neither drop(John, ClOO) nor clrop( John, All00) are members of list s. 2. John was initially enrolled in ClOO, d7'Op( John, ClOO) is not a member of list s, s has a sublist s' ending with register( John, MlOO) and drop(John, All00) is not a member of the list difference of sand s'. 3. John was initially enrolled in MlOO, drop(John,All00) is not a member of list s, s has a sublist s' ending with register( John, ClOO) and clrop( John, ClOO) is not a member of the list difference of sand s'. 4. There are two sublists s' and s" of s, s' ends with registe7'( J o/m, MlOO), s" ends with register(John,ClOO), clrop(John,All00) is not a member of the list difference of sand s', and dr'op( John, ClOO) is not a member of the list difference of sand s". \Ve can even pose queries about the future, for example, is it possible for the database ever to be in a state in which John is enrolled in both ClOO and C200? (::Is ).So :::; s /\ enrolled( John, ClOO, s) /\ enrolled( John, C200, s). Answering queries of this form is precisely the problem of plan synthesis in AI (Green [6]). For the class of databases ofthis paper, Reiter [22, l8J shows how regression provides a sound and complete evaluator for such queries. 7 Indeterminate Transactions A limitation of our formalism is that it requires all transactions to be determinate, by which we mean that in the presence of complete information about the initial database state a transaction completely determines the resulting state. One way to extend the theory to include indeterminate transactions is by appealing to a simple idea due to Haas [7J, as elaborated by Schubert [24J. As an example, consider the indeterminate transaction drop-astudent(c), meaning that some student - we don't know .whom - is to be dropped from course c. 
Notice that we cannot now have a successor state axiom of the form Poss(a,s) :J {enrollecl(st,c, do(a, s)) == (st,c, a,s)}. To see why, consider the following instance of this axiom: Poss( drop-a-student(Cl 00), So) :J {enrolled( John, ClOO, clo( drop-a-student(Cl 00), So)) == (John, ClOO, drop-a-student(C100), So)}. Suppose L;o is a complete description of the initial data.base state, and suppose moreover, that L;o 1= Poss( drop-a-student(Cl 00), So) /\ enrolled(John, ClOO, So). By the completeness assumption, L;o 1= ±(John, ClOO, drop-a-student(C100), So), in which case L;o 1= ±enrollecl(John, ClOO, do( drop-a-student (C100), So)). In other words, we would know whether John was the student dropped from ClOO, violating the intention of the drop-a-student transaction. Despite the inadequacies of the axiomatization of Section 2.2 (specifically the failure of successor state axioms for specifying indeterminate transactions), we can represent this setting.with something like the following axioms: (3st)enrolled( st, c, s) :J Poss( drop-a-student( c), s). enrolled(st, c, s) :J Poss(drop(st, c), s). Poss(a, s) :J {a = drop(st,c):J ,enrolled(st,c,do(a,s))}. Poss(a,s):J {a = drop-a-student(c):J (::I!st)em'olled( st, c, s) /\ ,enrolled( st, c, do( a, s)}Y Poss(a, s) :J {-.enrolled(st, c, s) /\ enrolled(st, c, do(a, s)) :J a = register(st, c)}. Poss( a, s) :J {enrollecl(st, c, s) /\ -.enrolled(st, c, do(a, s)) :J a = clrop(st, c) Va = drop-a-student(c)}. The last two formulas are examples of what Schubert [24] calls explanation closure axioms. For the example at hand, the last axiom provides an exhaustive enumeration of those transactions (namely clrop( st , c) and drop-a-student(c)) which could possibly explain how it came to be that st is enrolled in c in the current state s and is not enrolled in c in the successor state. Similarly, the second last axiom explains how a. student could come to be enrolled in a course in which she was not enrolled previous to the transaction. 14 The feasibility of . 13(3!st) denotes the existence of a unique st . is these explanation closure axioms which provide a succinct alternative to the frame axioms (McCarthy and Hayes [14]) which would normally be required to represent dynamically changing worlds like databases (Reiter [23]). 14It 608 such an approach relies on a closure assumption, namely that we, as database designers, can provide a finite exhaustive enumeration of such explaining transactionsY In the "real" world, such a closure assumption is problematic. The state of the world has changed so that a student is no longer enrolled in a course. What can explain this? The school burned down? The student was kidnapped? The teacher was beamed to Andromeda by extraterrestrials? Fortunately, in the database setting, such open-ended possible explaining events are precluded by the database designer, by virtue of her initial choice of some closed set of transactions with which to model the application at hand; no events outside tflis closed set (school burned down, student kidnapped, etc.) can be considered in defining the evolution of the database. This initial choice of a closed set of transactions having been made, explanation closure axioms provide a natural representation of this closure assumption. By appealing to explanation closure axioms, we can now specify indeterminate transactions. The price we pa.y is the loss of the simple regression-based queryevalua.tor of (Reiter [23, 21]); we no longer have a simple sound and complete query evaluator. 
Of course, conventional first order theorem-proving does provide a query evaluator for such an axiomatization. For example, the following are entailments of the above axioms, together with unique names axioms for transactions and for John and Mary:

  enrolled(John, C100, S0) ∧ enrolled(Mary, C100, S0)
    ⇒ enrolled(John, C100, do(drop(Mary, C100), S0)) ∧ ¬enrolled(Mary, C100, do(drop(Mary, C100), S0)).

  {(∀st).enrolled(st, C100, S0) ≡ st = John}
    ⇒ (∀st)¬enrolled(st, C100, do(drop-a-student(C100), S0)).

  {(∀st).enrolled(st, C100, S0) ≡ st = John ∨ st = Mary}
    ⇒ enrolled(John, C100, do(drop-a-student(C100), S0)) ⊕ enrolled(Mary, C100, do(drop-a-student(C100), S0)).

Notice that the induction axiom (3) of Section 2.3 does not depend on any assumptions about the underlying database. In particular, it does not depend on successor state axioms. It follows that we can continue to use induction to prove properties of database states and integrity constraints in the more generalized setting of indeterminate transactions. The fundamental perspective on integrity constraints of (Reiter [20]), namely that they are inductive entailments of the database, remains the same.

15 This assumption is already implicit in our successor state axioms of Section 2.2.

Acknowledgements

Many of my colleagues provided important conceptual and technical advice. My thanks to Leo Bertossi, Alex Borgida, Craig Boutilier, Charles Elkan, Michael Gelfond, Gosta Grahne, Russ Greiner, Joe Halpern, Hector Levesque, Vladimir Lifschitz, Fangzhen Lin, Wiktor Marek, John McCarthy, Alberto Mendelzon, John Mylopoulos, Javier Pinto, Len Schubert, Yoav Shoham and Marianne Winslett. Funding for this work was provided by the Natural Sciences and Engineering Research Council of Canada, and by the Institute for Robotics and Intelligent Systems.

References

[1] A. Baker. A simple solution to the Yale shooting problem. In R. Brachman, H. J. Levesque, and R. Reiter, editors, Proceedings of the First International Conference on Principles of Knowledge Representation and Reasoning (KR'89), pages 11-20. Morgan Kaufmann Publishers, Inc., 1989.

[2] A. Baker and M. Ginsberg. Temporal projection and explanation. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, pages 906-911, Detroit, MI, 1989.

[3] K. L. Clark. Negation as failure. In H. Gallaire and J. Minker, editors, Logic and Data Bases, pages 292-322. Plenum Press, New York, 1978.

[4] J. Finger. Exploiting Constraints in Design Synthesis. PhD thesis, Stanford University, Stanford, CA, 1986.

[5] M. L. Ginsberg and D. E. Smith. Reasoning about actions I: A possible worlds approach. Artificial Intelligence, 35:165-195, 1988.

[6] C. C. Green. Theorem proving by resolution as a basis for question-answering systems. In B. Meltzer and D. Michie, editors, Machine Intelligence 4, pages 183-205. American Elsevier, New York, 1969.

[7] A. R. Haas. The case for domain-specific frame axioms. In F. M. Brown, editor, The Frame Problem in Artificial Intelligence: Proceedings of the 1987 Workshop, pages 343-348, Los Altos, California, 1987. Morgan Kaufmann Publishers, Inc.

[8] S. Hanks and D. McDermott. Default reasoning, nonmonotonic logics, and the frame problem. In Proceedings of the National Conference on Artificial Intelligence, pages 328-333, 1986.

[9] R. Kowalski. Database updates in the event calculus. Journal of Logic Programming, 12:121-146, 1992.

[10] V. Lifschitz. Toward a metatheory of action. In J. Allen, R. Fikes, and E.
Sandewall, editors, Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning (KR '91), pages 376-386, Los Altos, CA, 1991. Morgan Kaufmann Publishers, Inc. [11] F. Lin and Y. Shoham. Provably correct theories of action. In Proceedings of the National Conference on Artificial Intelligence, 1991. [12] J.W. Lloyd. Foundations of Logic Programming. Springer Verlag, second edition, 1987. [13] J. McCarthy. Programs with common sense. In lV1. Minsky, editor, Semantic Information Processing, pages 403-418. The MIT Press, Cambridge, MA,1968. [14] J. McCarthy and P. Hayes. Some philosophical problems from the standpoint of artificial intelligence. In B. Meltzer and D. Michie, editors, Machine Intelligence 4, pages 463-502. Edinburgh University Press, Edinburgh, Scotland, 1969. [15] J. Minker, editor. Foundations of Deductive Databases and Logic Programming. Morgan Kaufmann Publishers, Inc., Los Altos, CA, 1988. [16] E.P.D. Pednault. Synthesizing plans that contain actions with context-dependent effects. Computational Intelligence, 4:356-372, 1988. [17] E.P.D. Pednault. ADL: Exploring the middle ground between STRIPS and the situation calculus. In R.J. Brachman, H. Levesque, and R. Reiter, editors, Proceedings of the First International Conference on Principles of Knowledge Representation and Reasoning (KR '89j, pages 324-332. Mor- gan Kaufmann Publishers, Inc., 1989. [18] R. Reiter. The frame problem in the situation calcuIus:. a simple solution (sometimes) and a completeness result for goal regression. In Vladimir Lifschitz, editor, Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of John McCarthy, pages 359-380. Academic Press, San Diego, CA, 1991. [19] R. Reiter. The projection problem in the situation calculus: A soundness and completeness result, with an application to database updates. 1992. submitted for publication. [20] R. Reiter. Proving properties of states in the situation calculus. 1992. submitted for publication. [21] R. Reiter. On specifying database updates. Technical report, Department of Computer Science, University of Toronto, in preparation. [22] R. Reiter. A simple solution to the frame problem (sometimes). Technical report, Department of Computer Science, University of Toronto, in preparation. [23] R. Reiter. On formalizing database updates: preliminary report. In Proc. 3rd International Conference on Extending Database Technology, Vienna, March 23 - 27, 1992. to appear. [24] L.K. Schubert. Monotonic solution of the frame problem in the situation calculus: an efficient method for worlds with fully specified actions. In H.E. Kyberg, R.P. Loui, and G.N. Carlson, editors, Knowledge Representation and Defeasible Reasoning, pages 23-67. Kluwer Academic Press, 1990. [25] R. Waldinger. Achieving several goals simultaneously. In E. Elcock and D. Michie, editors, Mach1:ne IntelligerLce 8, pages 94-136. Ellis Horwood, Edinburgh, Scotland, 1977. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 610 Learning Missing Clauses by Inverse Resolution Peter Idestam-Ahnquist* Department of Computer and Systems Sciences Stockhohn University Electrum 230, 16440 Kista, Sweden pi@dsv .su.se Abstract The incomplete theory problem has been of large interest both in explanation based learning and more recently in inductive logic programming. 
The problem is studied in the context of Hom clause logic, and it is assumed that there is only one clause missing for each positive example given. Previous methods have used either top down or bottom up induction. Both these induction strategies include some undesired restriction on the hypothesis space for the missing clause. To overcome these limitations a method where the different induction strategies are completely integrated is presented. The method involves a novel approach to inverse resolution by using resolution, and it implies some extensions to the framework of inverse resolution which makes it possible to uniquely determine the most specific result of an inverse resolution step. 1 Introduction Completion of incomplete theories has been of large interest in machine learning, particularly in the area of explanation based learning, for which a complete theory is crucial [Mitchell et al. 1986, Dejong and Mooney 1986]. Research on augmenting an incomplete domain has been reported in [Hall 1988, Wirth 1988, Ali 1989]. A new framework for inductive learning was invented by inverting resolution [Muggelton and Buntine 1988]. Papers considering augmentation of incomplete theories in this framework are [Wirth 1989, Rouveirol and Puget 1990, RouveiroI1990]. We only consider Hom clause logic, which is a subset of first order logic, and we follow the notation in logic programming [Lloyd 1987]. The incomplete theory problem can then be formulated as follows. Let P be a definite program (an incomplete theory) and E a definite program clause which should but does not follow from P (P 1* E). * This research was supported by NUTEK, the Swedish National Board for Industrial and Technical Development. Then find a definite program clause H such that: (a) Pu{E} 1* H (b) Pu{H} 1= E H is an inductive conclusion according to [Genesereth and Nilsson 1987]. Let E=(Af-Bl. ... ,Bn). Then by top down induction we mean any reasoning procedure, to infer an inductive conclusion, that starts from A. By bottom up induction we mean any inductive reasoning procedure that starts from Bl, ... ,B n . Most previous methods use either top down [Hall 1988, Wirth 1988, Ali 1989] or bottom up induction [Sammut and Banerji 1986, Muggelton and Buntine 1988, Rouveirol and Puget 1990]. Both these induction strategies have some undesired restrictions on the hypothesis space of H. In [Wirth 1989] a method that combines top down and bottom up induction is presented, while in this paper a method where they are completely integrated will be described. In the previous methods there are also other undesired restrictions, namely that the input clause E must be fully instantiated [Hall 1988, Wirth 1988, Wirth 1989, Ali 1989, Sammut and Banerji 1986] or a unit clause [Muggelton and Buntine 1988]. Our method works for full Horn clause logic. Logical entailment is used as a definition of generality. Let E and F be two expressions. Then E is more general than F, if and only if E logically entails F (E 1= F). We also say that F is more specific than E. In the examples, predicate symbols are denoted by p, q, r, s, t and u. Variables (universally quantified) are denoted by x, y, z and w. Constants are denoted by a, band c. Skolem functions are denoted by k. In section 2 the inductive framework of inverse resolution is given. In section 3 some extensions to this framework, which make it possible to determine the most specific inverse resolvent, are described. 
In section 4 a new 611 inverse resolution method is presented, and finally in section 5 related work and contributions is discussed. condition (c) can be rewritten as: (c) R is the clause (C-{A})8AU(D-{B})8B, where 8=8AU8B and A8A= B8B. 2 The Framework of Inverse Resolution The inductive framework of inverse resolution was first presented in [Muggelton and Buntine 1988]. First, as a background, resolution will be described. Then inverse resolution will be definied, and some problems considering inverse reslution will be pointed out. Let RO be a definite program clause and P a definite program. A linear derivation from RO and P consists of a sequence RO,R 1,... of definite program clauses and a sequence C1,C2, ... of variants of definite program clauses in P such that each Ri+1 is resolved from Ci+1 and Ri. A linear derivation of Rk from RO and P is denoted: (RO;C1) I-R (R1;C2) I-R ... I-R Rk or for short (RO;P) I-R* Rk. 2.2 Inverse Resolution 2.1 Resolution A substitution is a finite set of the form {V1/t1, ... ,Vn/tn}' where each Vi is a variable, each ti is a term distinct from Vi, and the variables Vb ... ,Vn are distinct. Each element Vilti is called a binding for Vi. A substitution is applied by simultaneously replacing each occurence of the variable Vi, in an expression, by the term ti. An expression is either a term, a literal, a clause or a set of clauses. (A fixed ordering of literals in clauses and a fixed ordering of clauses in sets of clauses are assumed.) Let E be an expression and V be the set of variables occurring in E. A renaming substitution for E is a substitution {X1/Yl, ... ,xn/Yn} such that Y1, ... ,Yn are distinct variables and (V-{X1, ... ,Xn})n{Y1, ... ,Yn}=0. Let E and F be expressions. Then E is a variant of F if there exists a renaming substitution 8 such that E=F8. A unifier for two terms or literals t1 and t2 is a substitution 8 such that t18=t28. A unifier 8 for t1 and t2 is called a most general unifier (mgu) for t1 and t2, if for each unifier 8' of t1 and t2 there exists a substitution 8" such that 8'=88". Let C and D be two clauses which have no variables in common. Then the clause R is resolved from C and D, denoted (C;D) I-R R, if the following conditions hold: (a) A is a literal in C and B is a literal in D. (b) 8 is an mgu of A and B. (c) R is the clause «C-{A})u(D-{B })8. The clause R is called a resolvent of C and D. Since C and D have no variables in common, the mgu 8 can uniquely be divided into two disjunct!ve parts 8A and 8B such that 8=8AU8B and A8A= R8B. Consequently A place within an expression is denoted by an n-tuple and defined recursively as follows. The term, literal or clause at place within f(t1, ... ,tn) or {t1, ... ,tn } is tal' The term or literal at place (m>1) within f(tl, ... ,t n) or {t1, ... ,tn} is the term or literal at place in tal' Let E be an expression. Then for each substitution 8 there exists a unique inverse substitution 8- 1 such that E88- 1=E. Whereas the substitution 8 maps variables in E to terms, the inverse substitution 8- 1 maps terms in E8 to variables. An inverse substitution is a finite set of the form {(t1, {P1,1, ... ,P1,ml }/Vl, ... ,(t n,{ Pn,l, .. ·,Pn,m n } )/vn} where each Vi is a variable distinct from the variables in E, each ti is a term distinct from Vi, the variables V1, ... ,Vn are distinct, each Pi,j is a place at which ti is found within E and the places P1,1, ... ,Pn,m n are distinct. An inverse substitution is applied by replacing all ti at places {Pi,l, ... ,Pi,m) in the expression E by Vi. 
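The place machinery can be made concrete in a few lines of Python. This is only my illustration: the nested-list encoding of expressions (a tag or functor followed by its arguments, so that a place <a1, ..., an> simply indexes successive argument positions) is an assumption, not the paper's notation. It reproduces the worked example that follows.

```python
# Illustrative sketch (encoding assumed): an expression is a list whose first
# element is a tag or functor and whose remaining elements are its arguments;
# a place <a1,...,an> selects argument a1, then argument a2 of that, and so on.

def replace_at(expr, place, new):
    """Return a copy of expr with the subexpression at the given place replaced."""
    if not place:
        return new
    head, rest = place[0], place[1:]
    copy = list(expr)
    copy[head] = replace_at(copy[head], rest, new)
    return copy

def apply_inverse_substitution(expr, inv_sub):
    """inv_sub maps each new variable to the (term, places) pair it replaces."""
    for var, (term, places) in inv_sub.items():
        for place in places:
            expr = replace_at(expr, place, var)
    return expr

# {(a, {<1,1,2>, <1,2,1,1>, <2,2,1>}) / x} applied to {(p(a,a) <- p(f(a))), (q(a) <- r(a))}
E = ['set', ['clause', ['p', 'a', 'a'], ['p', ['f', 'a']]],
            ['clause', ['q', 'a'], ['r', 'a']]]
print(apply_inverse_substitution(E, {'x': ('a', [(1, 1, 2), (1, 2, 1, 1), (2, 2, 1)])}))
# -> {(p(a,x) <- p(f(x))), (q(a) <- r(x))}, as in the worked example that follows
```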
Example: If the following inverse substitution {(a,{ <1,1,2>,<1,2,1,1>,<2,2,1> ))/x} is applied on the expression {(p(a,a)~p(f(a»,(q(a)~r(a»}, the expression {(p(a,x)~p(f(x»,(q(a)~r(x»} is obtained. Let R, C and D be three clauses. If R can be resolved from C and D, then D can be inverse resolved from Rand C. The clause 0 is inverse resolved from Rand C, denoted (R;C) I-IR D, if the following conditions hold: (a) A is a literal in C. (b) 8A is a substitution whose variables are variables that occur in A. (c) (C-{A})8A is a subset of R. (d) r is a subset of (C-{A})8A. (e) 8B- 1 is an inverse substitution whose terms are terms that occur in A. (f) D is the clause «R-r)u{ A}8A)8B-1. 612 The clause D is called an inverse resolvent of Rand C. Given R and C there are four sources of indeterminacy for D, namely: A, SA, rand SB- l . If A is a positive literal then D is forwardly inverse resolved, and if A is a negative literal then D is backwardly inverse resolved. Example: Suppose we have R=(s(a,z)f-q(a),r(b», C=(P(a,x)f-q(a),r(x» and D=(s(y,z)f-p(y,b». The clause D can be forwardly inversed resolved from Rand C, (R;C) I-IR D, if A=p(a,x), SA={x/b}, r=(f-q(a),r(b» and SB- l ={(a,{,<2,1>})/y}. The clause C can be backwardly inverse resolved from Rand D, (R;D) I-IR C, if A=-.p(y,b), SA={y/a}, r=(s(a,z)f-) and SB- 1 = {(b,{<1,2>,<3,1>})/x}. (It is assumed that the positive literal is first in the ordering of literals in a clause.) Unfortunately there are examples when the choice of SA2 is crucial. Example: Let R=(rf-q), Cl =(p(x)f-q), C2=(Sf-p(a» and D=(rf-s). Then there is a linear derivation of R from D and {Cl,C2}: (D;C2) I-R «rf-p(a»;Cl) I-R (rf-q). Consequently, there is an inverse linear derivation of D from Rand {Cl,C2}: (R;Cl) I-IR «rf-p(a»;C2) I-R (rf-s). In the first inverse resolution step SA2 is chosen as {x/a}. With any other choice of SA2 the inverse linear derivation of D would not have been possible. If R, C, A and SAl are given, then it is desirable that a unique most specific inverse resolvent can be determined. Unfortunately, in Horn clause logic, it is not possible due to the substitution SA2. Let DO be a definite program clause and P a definite program. An inverse linear derivation from DO and P consists of a sequence DO,Dl, ... of definite program clauses and a sequence Cl,C2, ... of variants of definite program clauses in P such that each Dj+l is inverse resolved from Cj+l and Dj. An inverse linear derivation of Dk from Do and P is denoted: (DO;Cl) I-IR (Dl;C2) I-IR ... I-IR Dk or for short (DO;P) I-IR* Dk· Example: Let R=(rf-q) and C=(p(x)f-q). If we seek the most specific clause D such that (R;C) I-IR D, then we let r=0 and SB- l =0 but what should SA2 be? If we let SA2=0, the clause Dl=(rf-p(x),q) is obtained. For example the clauses D2=(rf-p(a),q) and D3=(rf-p(b),q) are more specific than Db but neither D2 nor D3 is more specific than the other. Consequently, there is no unique most specific inverse resolvent. A backward inverse linear derivation is an inverse linear derivation where each Dj is backwardly inverse resolved, and a forward inverse linear derivation is an inverse linear derivation where each Dj is forwardly inverse resolved. 3 Extended Inverse Resolution 2.3 Some Problems Consider the definition of inverse resolved. The substitution SA can be divided into two disjunctive parts, SAl including the variables that occur both in A and (C-{ A}), and SA2 including the variables that only occur in A (SA=SA1USA2). 
Then, to determine an inverse resolvent D, we have to choose A, SAl, SA2, rand SB- I . Only in some special cases there are more than one alternative for A and SAl. Example: Let R=(pf-q(a),r(b» and C=(Pf-q(x),r(x». Then we have either A=-.q(x) and SAl ={ x/b}, or A=-.r(x) and SA1={ x/a}. For rand SB- l there are limited numbers of alternatives, but for S A2 there is not. The terms in S A2 can be any possible terms. Consequently, it is hard to choose SA2. Our inverse resolution method (see section 4) implies some extensions to the framework of inverse resolution. After these extensions the choices of SA2, rand SB- l in inverse linear derivations can be postponed, and the most specific inverse resolvent can be determined. 3.1 Existentially Quantified Variables To postpone the choice of SA2, existentially quantified variables will temporarily be introduced. Any sentence, in which the existentially quantified variables are replaced by Skolem functions, is equal to the original sentence with respect to satisfiability [Genesereth and Nilsson 1987]. Therefore the existentially quantified variables will be represented by Skolem functions. As a consequence of the introduction of existentially quantified variables (Skolem functions), some additional types of substitutions are needed. 613 A Skolemfunction is a term f(Xl, ... ,X n) where f is a new function symbol and Xl, ... ,Xn are the variables associated with the enclosing universal quantifiers. A S kolem substitution is a finite set of the form {Vl/kl, ... ,vn/k n }, where each Vj is a variable, each kj is a Skolem function, and the variables Vl, ... ,Vn are distinct. An inverse Skolem substitution is a finite set of the form {kl/Vl, ... ,kn/vn}, where each ki is a Skolem function, each Vj is a new variable, and the Skolem functions kl, ... ,k n are distinct. Let o'={ xl/kl, ... ,Xn/kn} be a Skolem substitution and o-l={kl/Yl, ... ,kn/Yn} an inverse Skolem substitution such that the Skolem functions in 0' and 0- 1 are exactly the same. Then the composition 0'0'-1 of 0' and 0'-1 is a renaming substitution {Xl/Yl, ... ,Xn/Yn} for any expression E. An existential substitution is a finite set of the form {kl/tl, ... ,kn/t n }, where each kj is a Skolem function (existentially quantified variable), each tj is a term (possibly a Skolem function) distinct from kj, and the Skolem functions kl, ... ,kn are distinct. While a substitution or a Skolem substitution corresponds to a specialization an existential substitution corresponds to a generalization. As an inverse substitution, an inverse existential substitution is specified with respect to an expression E. An inverse existential substitution is a finite set of the form {(tl, {Pl,l, ... ,Pl,ffil }/kl, ... ,(t n,{ Pn,l, ... ,Pn,ffi n } )/k n } where each kj is a Skolem function distinct from the Skolem functions in E, each tj is a term distinct from kj, the Skolem functions kl, ... ,k n are distinct, each Pj,j is a place at which ti is found within E and the places Pl,l, ... ,Pn,ffi n are distinct. An inverse existential substitution is applied by replacing all ti at places {Pj,l, ... ,Pi,ffij} in E by kj. Let o'={ vl/kl, ... ,Vn/kn} be a Skolem substitution and Tl={kl/tl, ... ,kn/tn } an existential substitution such that the Skolem functions in 0' and Tl are exactly the same. Then the composition O'Tl of 0' and Tl is the substitution {Vl/tl, ... ,Vn/tn}. In this way Skolem substitutions and existential substitutions can be used to postpone the choice of SA2. 
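The effect of postponing the choice of theta_A2 can be illustrated with a small Python sketch. The names and the flat-literal encoding are mine, not the paper's: the variables that occur only in A are replaced by fresh Skolem symbols, and an existential substitution may later map each Skolem symbol to any term.

```python
# Illustrative sketch (names and encoding assumed): Skolem symbols stand for
# terms whose choice is deferred; an existential substitution fixes them later.

import itertools
_fresh = itertools.count()

def skolem_substitution(variables):
    """Skolem substitution: each listed variable gets a fresh Skolem symbol."""
    return {v: f'k{next(_fresh)}' for v in variables}

def apply_sub(literal, sub):
    """Apply a substitution (or an existential substitution) to a flat literal."""
    name, *args = literal
    return (name, *[sub.get(a, a) for a in args])

A = ('p', 'x')                      # x occurs only in A, so theta_A2 would have to guess a term
sigma = skolem_substitution(['x'])
lit = apply_sub(A, sigma)
print(lit)                          # ('p', 'k0'): the commitment is postponed
print(apply_sub(lit, {'k0': 'a'}))  # ('p', 'a'): an existential substitution chosen later
```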
3.2 Most Specific Inverse Resolution To postpone the choice of r, the notion of optional literals will be used. A clause {Bl, ... ,Bk,Bk+I, ... ,B n }, in which the literals {Bk+ I , ... ,Bn} are optional, is denoted C[c]={BI, ... ,Bk,[Bk+l, ... ,B n]} where C={B}, ... ,Bd and C={Bk+l, ... ,B n }. Consequently, if c=0 then C[c]=C. Example: Let R=(pf-q,r,s) and C=(tf-q,r,s) be two clauses. Then (R;C) I-IR D, where D=(pf-t,q,r,s)-r and r~{-,q,-,r,-,s}. All these alternatives for D can be described in a compact way by using optional literals. Thus, D[d]=(pf-t,[ q,r,s]). The definition of inverse resolved can now be modified in such a way that the choices of SA2, rand SB- 1 are postponed. The clause D is most specific inverse resolved from R[r] (which may include Skolem functions) and C, denoted (R[r];C) 1-.tIR D, if the following conditions hold: (a) A is a literal in C. (b) SAl is a substitution whose variables are variables that occur both in A and (C-{A}). (c) Tl- l is an inverse existential substitution whose terms are terms that occur in (C-{ A}). (d) (C-{A})SAlTl- 1 is a subset of R[r]. (e) 0' is a Skolem substitution whose variables are all the variables that only occur in A. (f) D[d] is the clause D=«R-r)u{ A}SA1O'), d=rur, where r=(C-{A })SAlTl- I . The clause Du d is called a most specific inverse resolvent of Rand C. Given Rand C, there are only two sources of indeterminacy, namely: A and SAl. Consequently, given R, C, A and SAl there is a unique most specific inverse resolvent Dud. Example: Let R=(rf-q) and C=(p(x)f-q). Then the unique most specific inverse resolvent of Rand C is the clause Dud=(rf-p(k),q) where k is a Skolem functions (representing an existentially quantified variable). This is true, since 'v'x(rf-p(x),q) 1= (rf-p(t),q), and (rf-p(t),q) 1= 3x(rf-p(x),q) for any term t. Let DO[do] a be definite program clause and P a definite program. A most specific inverse linear derivation from DO[dO] and P consists of a sequence DO[dO],D}[dI1, ... of definite program clauses and a sequence CI,C2, ... of variants of definite program clauses in P such that each Di+l[dj+I1 is most specific inverse resolved from Cj+l and Dj[dj]. A most specific inverse linear derivation of Dk[dkl from Do[do] and P is denoted: (DO[dO];Cl) 1-.tIR (D}[dl];C2) 1-.tIR ... 1-.tIR Dk[dkl or for short (DO[dO];P) 1-.tIR* Dk[dk]· 614 Each result of an inverse linear derivation can be obtained from the result of some most specific inverse linear derivation, if we apply an inverse substitution, an existential substitution, and drop a subset of the optional literals. Example: Suppose we have the following clauses R=(rf-q), Cl=(P(X)f-q), C2=(Sf-p(a)), C3=(t(b)f-p(b)), D=(rf-s,t(x),p(c» and D'[d']=(rf-s,t(b),[p(k),q]). Then (R;{Cl,C2,C3}) -IR* D and (R; {Cl,C2,C3}) I-J-IR* D'[d']. The clause D can be obtained from D'[d'] by application of the inverse substitution {(b,{<3,1>})/x} and the existential substitution {klc}, and by dropping the optional literal q. The most specific inverse linear derivation of D'[d'] looks as follows: . (R;Cl) I-J-IR «rf-p(k),[q]);C2) I-J-IR «rf-s,[p(k),q]);C3) I-J-IR (rf-s,t(b),[p(k),q]). That TJ-l, in the two last steps are {(a,)/k}, and that k then can be replaced by a third term c, may seem inconsistent, but it is not. Consider the corresponding inverse linear derivation of D from Rand {Cl,C2,C3}: «rf-q);Cl) I-IR «rf--q,p(c»;Cl) I-IR «rf--q,p(b),p(C»;Cl) I-IR «rf--p(a),p(b),p(C));C2) I-IR «rf--s,p(b),p(C»),C3) I-IR (rf--s,t(x),p(c)). 
Note that since k has been used as three different terms (a, b and c) in the most specific inverse linear derivation, three inverse resolution steps are needed to compensate for the step where k is introduced. Note also that 8A2={X/C} in the first, 8A2={x/b} in the second and 8A2={x/a} in the third inverse resolution step. To choose exactly those substitutions is hard, but in a most specific inverse linear derivation it is not necessary. 3.3 Truncation Generalization A clause Cl 01]-subsumes a clause C2 if there exists a substitution 8 and an existential substitution rt such that C18~C2TJ· If Cl 8rt-subsumes C2 then CI 1= C2. To perform a 01]-truncation is to apply some arbitrary existential substitution rt, apply some arbitrary inverse substitution 8- 1, and drop some arbitrary literals. The generalizarion technique 8rt-truncation corresponds to 8rtsubsumption. Let P be a definite program (an incomplete theory) and E a definite program clause which should but does not follow from P (P 1# E), let D be the set of definite program clauses D such that (E;P) I-IR* D, and let lHl be the set of definite program clauses H such that Pv{H} 1= E. Since resolution is not complete [Rob65] D is a subset of·lHl (D ~ lHl). In particular each definite program clause D' that 8TJ-subsumes some clause D, where D E D, will be in lHl. This is true since Pv{D} 1= E, and D' 1= D, gives us Pv{D'} 1= E. Consequently, we can perform any 8rt-truncation on the result D of a most specific inverse linear derivation and still have an inductive conclusion. 4 The Method In this section a method, which in an easy way realizes inverse linear derivations, will be described. Instead of performing an inverse linear derivation from the example clause E, a variant of ordinary resolution derivation is performed from the complement E of E. 4.1 Complement A definite program clause complement set (dpcc-set) is set of clauses containing exactly one unit goal and a number of unit clauses. Let C be a definite program clause (Af-Bt, ... ,B n), as- l an inverse Skolem substitution including all Skolem functions in C, and as a Skolem substitution including all the universally quantified variables in C. Then the complement C of C is the definite program clause complement set {( f--A),(B 1f- ), ... ,(Bnf-) las-lac. Let S be a dpcc-set {(f--A),(Blf--), ... ,(Bnf--)}, ac- l an inverse Skolem substitution including all Skolem functions in S, and as a Skolem substitution including all the universally quantified variables in S. Then the complement S of S is the definite program clause (Af--Bl, ... ,Bn)ac-las. Thus, the complement of a dpcc-set is a definite program clause and vice versa. Example: Let C be the clause (p(a,x)f--q(k,x,y». Then the complement C of C is the definite program clause complement set {(f--p(a,kx»,(q(Xk,kx,ky)f-)}, which is obtained by application of the inverse Skolem substitution {klXk} and the Skolem substitution {x!kx,ylky} on the set of clauses {(f-p(a,x»),(q(k,x,y)f--)}. The complement C' of C is the definite program clause (p(a,x')f--q(k',x',y'), which is obtained by application Qf the inverse Skolem substitution {kx/x',ky/Y'} and the Skolem substitution 615 {xidk'} on the clause (p(a,kx)f-q(Xk,kx,ky)). The clause C is a variant of C, since C=C8 where 8 is the renaming substitution {x/x',y/y'}. 4.2 Clause Set Resolution The notion of optional clauses will be used a similar same way as optional literals. A set of clauses {Cl, ... ,Ck,Ck+l, ... ,Cn}, in which the clauses { C k + 1 , ... 
, C n } are optional, is denoted S[s]={ Cl, ... ,Ck,[Ck+l, ... ,C n]} where S={Cl. ... ,Ck} and S={Ck+I. ... ,Cn }. Consequently, if s=0 then S[s]=S. An elementary clause set L is a set of clauses containing at most one clause, that is L=0 or L={ C} where C is a clause. Let Si[Si] be a clause set and L an elementary clause set. Then Si+l[Si+11 is clause set resolved from Si[Si] and L, denoted (Si[Si];L) I-CSR Si+l[Si+l]' if the following conditions hold: (a) C is a variant of a clause C in Si[Si]UL. (b) D is a clause in Si[SJ. (c) R is a resolvent of C' and D. (d) 1:1 is the elementary clause set of unit clauses in {C,D}. (e) Si+l[Si+Il is the clause set Si+l=(Si-{C,D})u{R}, Si+l=Siul:1. If D is a definite goal then R will also be a definite goal, and we say that Si+dsi+l] is backwardly clause set resolved from Si[Si] and L. If both C and D are definite program clauses then R will also be a definite program clause, and we say that Si+ 1[Si+ 11 is forwardly clause set resolved from Si[Si] and L. Let SO[SO] be a clause set and P a definite program. A clause set derivation from So[SO] and P consists of a sequence SO[so],Sl[sIl, ... of clause sets, and a sequence Ll.L2, ... of elementary clause sets, such that each Li is a subset of P and each clause set Si+dsi+Il is clause set resolved from Si[Si] and Li+l. A clause set derivation of Sk[Sk] from SO[SO] and P is denoted: (SO[SO];Ll) I-CSR (Sl[SI];L2) I-CSR ." I-CSR Sk[skl or (SO[SO];P) I-CSR* Sk[skl. A backward clause set derivation is a clause set derivation where each Si[Sj] is backwardly clause set resolved, and a forward clause set derivation is a clause set derivation where each Si[Si] is forwardly clause set resolved. Example: Let So={ (f-p(k)),(q(k)f- ),(r(k)f-)} and C=(p(x)f-r(x),s(x). Then we have the following backward clause set derivation: (SO;{C}) I-CSR ({(f-r(k),s(k),(q(k)f-),(r(k)f-)};0) I-CSR {(f-s(k»,(q(k)f- ),[(r(k)f-)]}. 4.3 The Algorithm Let P be a definite program and E a definite program clause which should but does not follow from P (P I:;t: E). Our algorithm to produce an inductive conclusion H looks as follows. Completion of Refutation Proof Algorithm: 1. Compute the complement E of E, which is a dpcc-set. 2. Perform a clause set derivation from P and E of a dpccset H'[h']. 3. Compute the complement H[h'] of H[h'], which is a definite program clause. 4. Perform a 8Tl-truncation of H[h'] to obtain H. The generalization performed in steps 1-3, is called a generalization, which in fact is equivalent to performing a most specific inverse linear derivation. Reconsider the definition of most specific inverse linear resolved in section 2. Let {AI, ... ,Am}=C-{A} and {Bl, ... ,B n }=R-{Al, ... ,A m}8A11l- 1. Then the clause D[d]={ Bl, ... ,B n }u{ A} 8AI aS2U[ {AI, ... ,Am }8AITl- 1] is most specific inverse resolved from R={Al, ... ,A m}9AlTl- 1u{B}, ... ,B n} and C={A }u{Al. ... ,A m}. The corresponding reformulation generalization looks as follows: 1. The complement R of R is the dpcc-set ({ { Ad, ... ,{ Am} }9A11l- lu {{ Bd, ... ,{ Bn} DaSI-laR where aSl- 1 is an inverse Skolem substitution including all Skolem functions in Rand aR is a Skolem substitution including all universally quantified variables in R. 2. The following clause set derivation is performed: ( R;{C}) I-CSR* D[d] where D=({{ Bl}, ... ,{ Bn}}u{{A}9All and d=[{ { Ad, ... ,{ Am} }8AlTl- l ])asl-laR. 3. The complement D[d] of D[d] is the definite program clause ({Bl, .. ·,Bn}u{ A }9AlaS2U[ {Al, ... 
,A m}9AlTl- l ])9 where aS2 is a Skolem substitution including all universally quantified variables in D[d]aSl and 8 is the renaming substitution 8=aSl- 1 })/x} and the existential substitution {kw/c} and by dropping the optional literals, DI'=(r(x,z')~s(c,z')) is obtained. If the last negative literal in DI' also is dropped then D2'=(r(x,z')~) is obtained. Steps 2 and 4 in the completion of refutation proof algorithm are indeterministic. The use of a preference bias can make them deterministic. Such a preference bias must specify which clause set is the most preferable result of the clause set derivation (reformulation bias), and which generalization should be done in the STl-truncation (truncation bias). The algorithm is implemented in a system, called CRPl, in which a depth first search is used to find the best dpcc-set H[h'] according to some given preference bias. 4.4 Integrating Top down and Bottom up Induction Backward inverse linear derivations correspond to top down induction, and forward inverse linear derivations correspond to bottom up induction. In our method, backward clause set derivations correspond to top down induction, and forward clause set derivations correspond to bottom up induction. Each step in a clause set derivation can be either backwardly or forwardly clause set resolved. Consequently, in our method (and in the system CRP1) top down and bottom up induction are completely integrated. Example: Let E=(p~q,t,u) and P={ (p~q,r),(s~t,u)}. Then the inductive conclusion HI =(r~t,u) is inferable by top down induction (backward inverse linear derivation), but not by bottom up induction (forward inverse linear derivation). The inductive conclusion H2=(P~q,s) is inferable by bottom up induction, but not by top down induction. The inductive conclusion H3=(r~s) can only be inferred by a method that combines top down and bottom up induction. With our algorithm the clause H3 is constructed as follows: E of E is the dpcc-set 1. The complement {(~p),(q~ ),(t~),(u~)}. 2. The following clause set derivation is performed: ( E; {(p~q,r)}) I--CSR ({ (~q,r),(q~ ),(t~ ),(u~) };0) I--CSR ({ (~r),(t~ ),(u~),[(q~ )]};{ (s~t,u)}) I--CSR ({(~r),(s~u),(u~),[(t~),(q~)]};0) I--CSR H3[h3] where H3[h3]={ (~r),(s~ ),[(u~ ),(t~ ),(q~)]} 3. The complement H3[h3] of H3[h3] is the definite program clause (r~s,[u,t,q]). 4. By dropping the optional literals H3=(r~s) is obtained. The first two steps in the clause set derivation are backwardly clause set resolved (top down induction) and the last two steps are forwardly clause set resolved (bottom up induction). 5 Concluding Remarks Some extensions to the inverse resolution framework and a new inverse resolution method have been presented. This method subsumes the previous methods based on inverse resolution and completely integrates top down and bottom up induction. 617 Reconsider the definition of inverse resolved in section 2. References If we let A be a positive literal, SA2=0 and r=(C-{A})SA then it is a definition of the absorption operator [Muggelton and Buntine 1988]. If we let A be a positive literal, SA2=0, r=0 and SB- 1=0 then it is a definition of elementary saturation [Rouveirol and Puget 1990]. The saturation operator [Rouveirol and Puget 1990] is equal to an exhaustive forward inverse linear derivation, in which each step is restricted according to elementary saturation. If we let A be a positive literal, SA2=0, r=(C-{A})SA and SB- 1=0 then it is a definition of the learning procedure called generalize in [BaneIji 1991]. 
If we let A be a negative literal, SA2=0 and r=(C-{A})SA then it is a definition of the identification operator [Muggelton and Buntine 1988]. Since our method performs inverse linear derivations without any restrictions on A, SA2, r or SB- 1, all the methods mentioned above can be seen as special cases of our method. Our notion of optional literals is the same as in [Rouveirol and Puget 1990]. Our Srt-truncation is similar to the truncation generalization in [Rouveirol and Puget 1990] and the truncation operator in [Muggelton and Buntine 1988], which both correspond to S-subsumption. Wirth [Wirth 1989] and Rouveirol [RouveiroI1991] have both pointed out the advantages of combining top down and bottom up induction. In [Wirth 1989], a system called LFP2, which uses both top down and bottom up induction is presented. However, the different induction strategies are separated into different parts of the system. The first part (top down) is based on completion of partial proof trees, while the second part (bottom up) is based on operators performing inverse resolution. The second part uses the result from the first part, and different types of bias are used in the different parts. Our method has the major advantage that the two different induction strategies are completely integrated, which not only eliminates the restrictions that they imply when separated, but also makes possible the use of an overall preference bias. The main contributions of this research are: 1. A complete integration of top down and bottom up induction. 2. Introduction of existentially quantified variables, which makes it possible to uniquely determine the most specific inverse resolvent. 3. A method to perform inverse resolution for full Hom clause logic by using resolution. [Ali 1989] K. M. Ali, "Augmenting Domain Theory for Explanation Based Generalization" in Proceedings of the 6th International Workshop on Machine Learning, Morgan Kaufmann, 1989. [Banerji 1991] R. B. Banerji, "Learning Theoretical Terms" in Proceedings of International Workshop on Inductive Logic Programming, 1991. [Dejong and Mooney 1986] G. Dejong and Mooney, "Explanation-Based Learning: An Alternative View" in Machine Learning 1: 145-176, 1986. [Genesereth and Nilsson 1987] Nilsson and Genesereth, Logic Foundations of Artificial Intelligence, Morgan Kaufmann, 1987. [Hall 1988] R. J. Hall, "Learning by Failing to Explain: U sing Partial Explanations to Learn in Incomplete or Intractable Domains" in Machine Learning 3: 45-77, 1988. [Lloyd 1987] J. W. Lloyd, Foundations of Logic Programming (second edition), Springer-Verlag, 1987. [Mitchell et al. 1986] T. M. Mitchell, S. Kedar-Cabelli and R. Keller, "Explanation-Based Generalization: A Unifying View" in Machine Learning 1: 47-80, 1986. [Muggleton and Buntine 1988] S. Muggleton and W. Buntine, "Machine Invention of First-order Predicates by Inverting Resolution" in Proceedings of the 5th International Conference on Machine Learning, Morgan Kaufmann, 1988. [Robinson 1965] J. Robinson, "A Machine-oriented Logic Based on the Resolution Principle" in Journal of ACM 12(1), 1965. [Rouveirol 1990] C. Rouveirol, "Saturation: Postponing Choices when Inverting Resolution" in Proceedings of the 9th European Conference on Artificial Intelligence, Pitman, 1990. [Rouveirol 1991] C61ine Rouveirol, "ITOU: Induction of First Order Theories" in Proceedings of International Workshop on Inductive Logic Programming, 1991. [Rouveirol and Puget 1990] C. Rouveirol and J. F. 
Puget, "Beyond Inversion of Resolution" in Proceedings of the 7th International Conference on Machine Learning, Morgan Kaufmann, 1990. [Sammut and Banerji 1986] C. Sammut and R. Banerji, "Learning concepts by asking questions" in Michalski, Carbonell and Mitchell (eds), Machine Learning: an artificial intelligence approach volume 2, Morgan Kaufmann, 1986. [Wirth 1988] R. Wirth, "Learning by Failure to Prove" in Proceedings of the 3rd European Working Session on Learning, Pitman, 1988. [Wirth 1989] R. Wirth, "Completing Logic Programs by Inverse of Resolution" in Proceedings of the 4th European Working Session on Learning, Pitman, 1989. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 618 A Machine Discovery from Amino Acid Sequences by Decision Trees over Regular Patterns Setsuo Arikawat t + tt ++ Satoru Kuhara+ Ayumi Shinoharat Research Inst.it.ut.e of Fundamental Information Science, Kyushu University 33, Fukuoka 812, Japan. Graduate School of Genetic Resources Technology, Kyushu University 46, Fukuoka 812, Japan. Depart.ment of Information Systems, Kyushu Universit.y 39, Kasuga 816, Japan. Depart.ment of Art.ificial Intelligence, Kyushu Institute of Technology, Iizuka 820, Japan. Abstract This paper describes a machine learning syst.em that discovered a "negat.ive mot.if", in transmembrane domai~ ident.ificat.ion from amino acid seqnences, and report.s it.s experiments on protein dat.a using PIR database. We int.roduce a decision tree whose nodes are labeled wit.h regular pat.terns. As a hypothesis, t.he system produces such a decision tree for a small number of randomly chosen posit.ive and negat.ive examples from PIR. Experiments show t.hat our syst.em finds reasonable hypotheses very successfully. As a theoret.ical foundat.ion, we show t.hat. the class of languages defined by decision trees of depth at most dover k-variahle regular patterns is polynomialtime learnable in the sense of probably approximately correct (PAC) learning for any fixed d, k ~ O. 1 Satoru Miyanot Yasuhito Mukouchi tt Takeshi Shinohara++ Introduction Hydrophobic transmembrane domains can be ident.ified by a very simple decision tree over regular patterns. This result was discovered by the machine learning system we developed. The system takes some training sequences of positive and negative examples, and produces a hypothesis explaining them. When a small number of positive and negative examples of transmembrane domains were given as input, our system found a small decision tree over regular patterns as a hypothesis. Although the hypothesis is made from just 10 positive and 10 negative examples, it can explain all data in PIR database [PIR] with high accuracy more than 90%. The hypothesis exhibits that "two consecutive polar amino acids" (Arg, Lys, His, Asp, Glu, GIn, Asn) are not included in the tl'ansmembrane domains. This indicates that significant Email addresses: arikawaOrifis.sci.kyushu-u.ac.jp kuharaOgrt.kyushu-u.ac.jp miyanoOrifis.sci.kyushu-u.ac.jp mukouchiOrifis.sci.kyushu-u.ac.jp ayumiOrifis.sci.kyushu-u.ac.jp shinoOdonald.ai.kyutech.ac.jp motifs are not in the inside of the transmembrane domains but in the outside. We call such motifs "negative motifs." This paper describes a machine learning system t.ogether wit.h a background theory that discovered such negative motifs, and reports its experiments on knowledge acquisition from amino acid sequences that reveal the importance of negative data. 
Traditional approaches to motif-searching are to find subsequences common to functional domains by various alignment techniques. lIenee the eyes are focused only on positive examples, and negative examples are mostly ignored. Our approach by decision trees over regular patterns provides new direction and method for discovering motifs. A regular pattern [Shinohara 1982, Shinohara 1983] is an expression WOXI WI X2 ••• Xn Wn that defines the sequences containing Wo, Wt, ... , Wn in this order, where each 'IDi is a sequence of symbols and Xj varies over arbitrary sequences. Regular patterns have been used to describe some features of amino acid sequences in PROSITE database [Bairoch 1991] and DNA sequences [Arikawa et al. 1992, Gusev and Chuzhanova 1990]. Our view to these sequences is through such regular patterns. A decision tree over regular patterns is a tree which describes a decision procedure for determining the class of a given sequence. Each node is labeled with either a class name (lor 0) or a regular pattern. At a node with a regular pattern, the decision tree tests if the sequence matches the pattern or not. Starting from the root toward a leaf, the decision procedure makes a test at each node and goes down by choosing the left or right branch according to the test result. The reached leaf answers the class name of the sequence. Such decision trees are produced as hypotheses by our machine learning system. Since the system searches a decision tree of smaller size, regular patterns on the resulting decision tree exhibit motifs which play a significant role in classification. Hence, compared with neural network approaches [Holly and Karplus 1989, Wu et al.]' our system shows important motifs in a hypothesis more explicitly. 619 We employ the idea of ID3 algorithm [Quinlan 1986, Utgoff 1989] for constructing a decision tree since it is sufficiently fast and experiments show that small enough trees are usually obtained. We also devise a new method for constructing a decision tree over regular patterns using another evaluation function. Given two sets of positive and negative examples, our machine learning system finds appropriate regular patterns as node attributes dynamically during the construction of the decision tree. Hence, unlike ID3, we need not assume any concrete knowledge about attributes and can avoid struggles from defining the attributes of a decision tree beforehand. Our system makes a decision tree just from a small number of training sequences, which we also guarantee with the PAC learning theory [Valiant 1984] in some sense. Therefore it may cope with a diversity of classification problems for proteins and DNA sequences. We made an experiment on raw sequences from twenty symbols of amino acid residues. The system discovered a small decision tree just from 20 sequences with more than 85% accuracy that show if a sequence contains neither E nor D (both are polar amino acids) then it is very likely to be a transmembrane domain. A hydropathy plot [Engelman et al. 1986, Kyte and Doolittle 1982, Rao and Argos 1986] has been used generally to predict transmembrane domains from primary sequences. With this knowledge, we first transform twenty amino acids to three categories (*, +, -) according to the hydropathy index of Kyte and Doolittle [1982]. From randomly chosen 10 positive and 10 negative training examples, our system has successfully produced some small decision trees over regular patterns which are shown to achieve very high accuracy. 
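As a concrete, if simplified, illustration: when variables are also allowed at both ends of a pattern (the shape of the patterns such as x a y used in the experiments later in the paper), matching reduces to an ordered substring search. The following Python sketch is mine, not the authors' system.

```python
# Illustrative sketch (simplified pattern form assumed): a pattern with variables
# at the ends and between the constant parts matches a string that contains the
# constant parts in the given order, the variables absorbing arbitrary gaps.

def matches(words, s):
    """True iff s contains words[0], words[1], ... in this order."""
    pos = 0
    for w in words:
        i = s.find(w, pos)
        if i < 0:
            return False
        pos = i + len(w)
    return True

print(matches(['a', 'b', 'a'], 'zzaybxa'))   # True
print(matches(['a', 'b', 'a'], 'zzbaya'))    # False: no 'b' occurs after the first 'a'
```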
The regular patterns appearing in these decision trees indicate that two consecutive polar amino acid residues are important negative motifs for transmembrane domains. From the view point of Artificial Intelligence, it is quite interesting that the polar amino acid residues D and E were found by our machine learning system without any knowledge on the hydropathy index. 2 Let E be a finite alphabet and X = {x, y, Z, Xl, X2,' .. } be a set of variables. We assume that E and X are disjoint. A pattern is an element of (E U X)+, the set of all nonempty strings over E U X. For a pattern 7r, the language L( 7r) is the set of strings obtained by substituting each variable in 7r for a string in E*. We say that a pattern 7r is regular if each variable occurs at most once in 7r. For example, xaybza is a regular pattern, hut xx is not. Obviously, regular patterns define regular languages, but not vice versa. In this paper we consider only regular patterns. A regular pattern containing at most k variables is called a k-variable regular pattern. A decision tree over regular patterns is a binary tree such that the leaves are labeled with 0 or 1 and each internal node is labeled with a regular pattern (see Figure 1). For an internal node v, we denote the left and right children of v by left (v) and right( v), respectively. We denote by 7r( v) the regular pattern assigned to the internal node v. For a leaf tt, value( u) denotes the value o or 1 assigned to u. The depth of a tree T, denoted by depth(T), is the length of the longest path from the root to a leaf. For a decision tree T over regular patterns, we define a function fT : E* -+ {O, I} as follows. For a string w in E*, we determine a path from the root to a leaf and define the value fT( w) by the following algorithm: begin /* Input: w E E* */ v +- root; while v is not a leaf do if w E L( 7r( v)) then v +-right( v) else v +-left( v); fT( w) +- value( v) end For a decision tree T over regular patterns, we define = I}. It is easy to see that L(T) is also a regular language. But the converse is not true. Let L = {a 2n I n ~ I}. It is straightforward to show that there is no decision tree T over regular patterns with L = L(T). The same holds for the language {a 2n b I n ~ I}. L(T) = {w E E* I fT(W) 3 After knowing the importance of negative motifs, we examined decision trees with a single node with regular patterns XI-X2-' • '-X n for n ~ 3. The best is the pattern XI-X2-X3-X4-XS-X6 that gives the sequences containing at least five polar amino acids. The result is very acceptable. The accuracy is 95.4% for positive and 95.0% . for negative examples although it has been believed to be difficult to define transmembrane domains as a simple expression when the view point was focussed on positive examples. Decision Trees over Regular Patterns Constructing Decision Trees This section gives two kinds of algorithms for constructing decision trees over regular patterns that are used in our machine learning system. The first algorithm employs the idea of ID3 algorithm [Quinlan 1986] in the construction of decision trees. The ID3 algorithm assumes data together with explicit attributes in advance. On the other hand, our approach assumes a space of regular patterns which are simply generated by given positive and negative examples. 
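Before the two construction algorithms, the classification procedure fT defined above can be written out in a few lines of Python. The data layout is my assumption, and a plain substring test stands in for the general membership test w ∈ L(π(v)).

```python
# Illustrative sketch of f_T (layout assumed): an internal node is
# (substring, left_subtree, right_subtree); a leaf is the class value 0 or 1.

def f_T(tree, w):
    """Walk from the root to a leaf, branching right on a match, left otherwise."""
    while not isinstance(tree, int):
        a, left, right = tree
        tree = right if a in w else left
    return tree

# A single-node tree labelled with the negative motif "--": class 1
# (transmembrane) only when the sequence has no two consecutive polar residues.
tree = ('--', 1, 0)
print(f_T(tree, '+*+**-**-+********+*+++'))  # 1
print(f_T(tree, '---+-----'))                # 0
```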
No extra knowledge about data is required. Although the space may be large and contain meaningless attributes, our algorithm finds appropriate regular patterns from this space dynamically during the construction of a decision tree in a feasible amount of time. This is a point which is very well suited to our empirical research.

Figure 1: Decision tree over regular patterns defining the language {a^m b^n a^l | m, n, l ≥ 1} over Σ = {a, b}.

Let P and N be finite sets of strings with P ∩ N = ∅. Using P and N, we deal with regular patterns of the form w0 x1 w1 x2 ... xk wk such that w0, ..., wk are substrings of some strings in P ∪ N. Let Π(P, N) be some family of such regular patterns made from P and N. The family Π(P, N) is appropriately given and used as a space of attributes. For a regular pattern π ∈ Π(P, N), the cost E(π, P, N) is the one defined in [Quinlan 1986] by

  E(π, P, N) = ((p1 + n1) / (p1 + n1 + p0 + n0)) · I(p1, n1) + ((p0 + n0) / (p1 + n1 + p0 + n0)) · I(p0, n0),

where p1 (resp. n1) is the number of positive examples in P (resp. negative examples in N) that match π, i.e., p1 = |P ∩ L(π)| and n1 = |N ∩ L(π)|; p0 (resp. n0) is the number of positive examples in P (resp. negative examples in N) that do not match π, i.e., p0 = |P ∩ L̄(π)| and n0 = |N ∩ L̄(π)|, with L̄(π) = Σ* − L(π); and

  I(x, y) = 0, if x = 0 or y = 0;
  I(x, y) = −(x/(x+y)) log(x/(x+y)) − (y/(x+y)) log(y/(x+y)), otherwise.

The first algorithm DT1(P, N) (Algorithm 1) sketches our decision tree algorithm for Π(P, N), where CREATE(π, T0, T1) returns a new tree with a root labeled with π whose left and right subtrees are T0 and T1, respectively.

  function DT1(P, N : sets of strings): node;
  begin
    if N = ∅ then return(CREATE("1", null, null))
    else if P = ∅ then return(CREATE("0", null, null))
    else begin
      find a shortest pattern π in Π(P, N) that minimizes E(π, P, N);
      P1 ← P ∩ L(π); P0 ← P − P1;
      N1 ← N ∩ L(π); N0 ← N − N1;
      return(CREATE(π, DT1(P0, N0), DT1(P1, N1)))
    end
  end

  Algorithm 1

The second algorithm uses a different evaluation function. For a decision tree T over regular patterns, let nodes(T) be the number of nodes in T, and T(T) be the set of trees constructed by replacing a leaf v of T by the tree of Fig. 2 (a) or Fig. 2 (b) for some pattern π.

Figure 2: A leaf is replaced by (a) or (b) for some pattern π.

The score function Score(T, P, N) balances the information gains in classification and is defined as

  Score(T, P, N) = (|P ∩ L(T)| / |P|) · (|N ∩ L̄(T)| / |N|).

The second algorithm DT2(P, N, MaxNode) (Algorithm 2) checks all leaves at each phase of a node generation using the evaluation function Score(T, P, N).

  function DT2(P, N : sets of strings, MaxNode : int): tree;
  begin
    if N = ∅ then return(CREATE("1", null, null))
    else if P = ∅ then return(CREATE("0", null, null))
    else begin
      T ← CREATE("1", null, null);
      while (nodes(T) < MaxNode and Score(T, P, N) < 1) do begin
        find Tmax ∈ T(T) that maximizes Score(Tmax, P, N);
        T ← Tmax
      end
    end;
    return(T)
  end

  Algorithm 2

Algorithm 2 is slower than Algorithm 1 since all leaves are checked at each phase of a node generation. However,
4 Transmembrane Identification Domain The problem of transmembrane domain identification is one of the most important protein classification problems and some methods and experiments have been reported. For example, Hartman et a1. [1989] proposed a method using the hydropathy index for amino acid residue'S in [Kyte and Doolittle 1982]. The reported success rate is about 75%. Most approaches deal with positive examples, i.e., sequences corresponding to transmembrane domains, and try to find properties common to them. The sequence in Figure 3 is an amino acid sequence of a membrane protein. There is a tendency to assume that a membrane protein contains several transmembrane domains each of which consists of 20 '" 30 amino acid residues. Therefore, if a sequence corresponding to a transmembrane domain is found in an amino acid sequence, it is very likely that the protein is a membrane protein. Our idea for transmembrane domain identification is to use decision trees over regular patterns for classification. Algorithm 1 and 2 introduced in Section 3 are used to find good decision trees from positive and negative training examples. In order to avoid combinatorial explosion, we restrict the space of attributes to the regular patterns of the form xay, where x and yare variables and a is a substring taken from given examples. In our experiments, a positive example is a sequence which is already known to be a transmembrane domain. A negative example is a sequence of length around 30 cut out from the parts other than transmembrane domains. The length 30 is simply due to the reasonable length of a transmembrane domain. From PIR database our machine learning system chooses randomly two small sets P and N of positive and negative training examples, respectively. Then, at each trial, by using Algorithm 1 or Algorithm 2, the system tries to construct a small deci-' sion tree over regular patterns which classifies P and N exactly. We have evaluated the performance ratio of a produced decision tree in the following way. As the total space of positive examples, we use the set POS of all . transmembrane domain sequences (689 sequences) from PIR database. The total space NEG of negative examples consists of 19256 negative examples randomly chosen from all proteins from PIR. The success rate of a decision tree for positive examples is the percentage of the positive examples from POS recognized as positive (class 1). The success rate for negative examples is counted as the percentage of the negative examples from NEG recognized as negati\'f~ (class 0). Fignre 5 (a) is one of the sma llest df'cision tref'S discovf'rf'd hy our system just from 10 posit.ivc and 10 lwgative raw sequences that achif've good accuracy. The performance ratio is (81.8%,89.6%) for all data in POS and NEG, respectively. This decision trf'e suggests that if a seque'nce of length around :30 contains neitllfT D not E then it is Vf'ry likely to be a part of transmemhrane domain. The alphahet of amino acid sequences consists of tWf'nty symhols. It has bcen shown that the use of the hydropat hy indf'x for amino acids is Vf'ry successful [Arikawa et al. 1992, Hartmann et al. 1989]. According to the hydropathy indf'x of [Kyte and Doolittle 1982], we transform t l1<'se twent.y symbols to three symbols as shown in Tahle 1. This t.ransformation reduces the size of a search spa,e drastically small while less information is, fortunately, lost in classification. 
Then by this transformation table, the sequence in Figure 3 becomes the sequence in Figure 4. Figure 5 (b), (c) show two of the best decision trees over regular patterns that our machine learning system found from 10 positive and 10 negative training examples. The decision tree (b) recognizes 91.4% of positive examples and 94.8% of negative examples. Even the decision tree of (c) can recognize 92.6% of the positive examples and 91.6% of the negative examples. The negative motif "--" which indicates consecutive polar amino acid residues plays a key role in classification. This may have a close relation to the signal-anchor structure that consists of two parts, the hydrophobic part of a membrane-spanning sequence and the charged residues around the hydrophobic part [Lipp et al. 1989, Von Heijine 1988]. The decision tree (a) also shows the importance of a cluster of polar amino acids in transmembrane domain identification although our machine learning system assumed no knowledge ahout the hydropathy. VVe examined how the performance of our machine learning system changes with respect to the number of training examples. The training examples are chosen randomly ten times in each case and a point of the graph of Figure 6 is the average of these ten results for each case. Figure 6 shows the results. We may observe the following facts: 1. The hydropathy index of Kyte and Doolittle [Kyte and Doolittle 1982] is very useful. \Vhen indexed sequences are used, the system can produce from 40 positive and 40 negative examples a decision tree with only several nodes whose accuracy is more 622 MDVVNQLVAGGQFRVVKE(PLGFVKVLQWVFAIFAFATCGSY)TGELRLSVECANKTESALNIEVEFEYPFRLHQVYFDA PSCVKGGTTKIFLVGDYSSSAE(FFVTVAVFAFLYSMGALATYIFL)QNKYRENNK(GPMMDFLATAVFAFMWLVSSSAW A)KGLSDVKMATDPENIIKEMPMCRQTGNTCKELRDPVTS(GLNTSVVFGFLNLVLWVGNLWFVF)KETGWAAPFMRAPP GAPEKQPAPGDAYGDAGYGQGPGGYGPQDSYGPQGGYQPDYGQPASGGGGYGPQGDYGQQGYGQQGAPTSFSNQM Figure 3: An amino acid sequence which contains four transmembrane domains shown by the parenthesized parts. Amino Acids A MC F L V I P Y WS T G R K DEN Q H Hydropathy Index 1.8 4.5 -1.6 -0.4 -4.5 -3.2 New Symbol I"V -+ I"V -+ I"V -+ * + Table 1: Transformation rules *-**--***++-*-**--(+*+**-**-+********+*+++)++-*-*+*-**--+-+**-*-*-*-++*-*--*+*-* ++**-++++-****+-++++*-(***+*******++*+***++***)---+-----(++**-***+******+**+++*+ )-+*+-*-**+-+--**--*+**--++-+*--*--+*++(+*-++***+**-***+*+-*+***)--+++**+**-*++ +*+---+*++-*++-*+++-+++++++--++++-+++-+-++-+*++++++++-+-++--+++--+*+++*+--* Figure 4: The sequence obtained by the transformation no~yes [6.5%,3.9%] [4.4%,13.0%] (a) (84.8%, 89.6%) [91.4%,5.2%] [1.2%,3.2%] (b) (91.4%,94.8%) cD ® [92.6%,8.4%]. [7.4%,91.6%] (c) (92.6%, 91.6%) Figure 5: The node label, for example, -- is an abbreviation of XI--X2 that tests if a given sequence contains the sequence --. The leaf label 1 (resp. 0) is the class name of transmembrane domains (resp. non-transmembrane domains). The total space consists of 689 positive examples and 19256 negative examples. Each of the decision trees (a)-( c) is constructed from 10 positive and 10 negative training sequences. The pair [P%, n%] attached to a leaf shows that p% of positive examples and n% of negative examples have reached to the leaf. The pair (p%,n%) means that p% of 689 positive (resp. n% of 192.56 negative) examples are recognized as transmembrane domains (resp. non-transmembrane domains). than 90% for the total space in average. 
On the other hand, for raw sequences the accuracy is not as good, but both accuracies approach the same level as the number of training examples increases.

2. The number of nodes of a decision tree is reasonably small. But when the number of training examples is larger, the number of nodes in a decision tree becomes larger while the accuracy is not improved very much. The problem of overfitting may arise here.

A new discovery obtained from these decision trees is that the motif "--" drastically rejects positive examples. After finding the negative motif "--", we examined decision trees consisting of a single node whose pattern contains "-" n times, for n ≥ 3. The best is the pattern containing "-" five times. The result, shown in Table 2, is quite acceptable.

Figure 6: Relations between the number of training examples, the accuracy, and the number of nodes in a decision tree (plot omitted). Accuracy curves are shown for indexed and raw sequences, on positive and negative examples, against 20 to 100 training examples, together with the number of nodes of the resulting decision trees for raw and indexed sequences.

Table 2: Results of the single-node patterns on POS (689 sequences) and NEG (19256 sequences). 18296 (95.0%) of the negative sequences are recognized correctly.

With these decision trees over regular patterns, we have developed a transmembrane domain predictor that reads an amino acid sequence of a protein as input and predicts, symbol by symbol, whether each location is in a transmembrane domain or not. Experiments on all protein sequences in PIR show that the success rate is 85%~90%.

5 PAC-Learnable Class

This section provides a theoretical foundation for the classes of sets classified by decision trees over regular patterns, from the viewpoint of algorithmic learning theory [Valiant 1984]. For integers k, d ≥ 0, we consider decision trees over k-variable regular patterns whose depth is at most d. We denote by DTRP(d, k) the class of languages defined by decision trees over k-variable regular patterns with depth at most d.

Theorem 1 DTRP(d, k) is polynomial-time learnable for all d, k ≥ 0.

We need some terminology for the above theorem. When we are concerned with learning, we call a subset of Σ* a concept. A concept class C is a nonempty collection of concepts. For a concept c ∈ C, a pair (x, c(x)) is called an example of c for x ∈ Σ*, where c(x) = 1 if x is in c and c(x) = 0 otherwise. For an alphabet Σ and an integer n ≥ 0, Σ^{≤n} denotes the set {x ∈ Σ* : |x| ≤ n}. A concept class C is said to be polynomial-time learnable [Blumer et al. 1989, Natarajan 1989, Valiant 1984] if there is an algorithm A which satisfies (1) and (2).

(1) A takes a sequence of examples as input and runs in polynomial time with respect to the length of the input.

(2) There exists a polynomial p(·,·,·) such that for any integer n ≥ 0, any concept c ∈ C, any real numbers ε, δ (0 < ε, δ < 1), and any probability distribution P on Σ^{≤n}, if A takes p(n, 1/ε, 1/δ) examples generated randomly according to P, then A outputs, with probability at least 1 − δ, a representation of a hypothesis h with P(c ⊕ h) < ε.

Theorem 2 [Blumer et al. 1989, Natarajan 1989] A concept class C is polynomial-time learnable if the following conditions hold.

1. C is of polynomial dimension, i.e., there is a polynomial d(n) such that |{c ∩ Σ^{≤n} : c ∈ C}| ≤ 2^{d(n)} for all n ≥ 0.

2. There is a polynomial-time algorithm, called a polynomial-time hypothesis finder for C, which produces from a sequence of examples a hypothesis that is consistent with the given examples.
Moreover, the polynomial-time hypothesis finder for C is itself a learning algorithm satisfying (1) and (2) if C satisfies condition 1.

The following lemma is easily shown.

Lemma 1 Let T be a decision tree over regular patterns and Tv the subtree of T at node v. We denote Tv by π(T0, T1), where π is the label of node v and T0, T1 are the left and right subtrees of Tv, respectively. Let S be a set of strings and let T' be the tree obtained from T by replacing Tv with T0 at node v. If no string in S matches π, then L(T) ∩ S = L(T') ∩ S.

Proof of Theorem 1. First we show that the concept class DTRP(d, k) is of polynomial dimension. Let DTRP(d, k)_n = {L ∩ Σ^{≤n} : L ∈ DTRP(d, k)} for n ≥ 0. We evaluate the cardinality of DTRP(d, k)_n. Let π be a regular pattern with |π| > n + k; then no string of length at most n matches π. By Lemma 1, we need to consider only regular patterns of length at most n + k. The number of such patterns is roughly bounded by (|Σ| + 1)^{n+k}. Since a tree of depth at most d has at most 2^d − 1 internal nodes and at most 2^d leaves, |DTRP(d, k)_n| ≤ ((|Σ| + 1)^{n+k})^{2^d − 1} · 2^{2^d}. This shows that the dimension of DTRP(d, k)_n is O(n).

Next we show that there is a polynomial-time hypothesis finder for DTRP(d, k). Let P and N be the sets of strings which appear in the positive and negative examples, respectively. Let Π(k, P, N) be the set of regular patterns π, up to renaming of variables, such that π contains at most k variable occurrences and the string obtained from π by erasing its variables is a substring of some s in P ∪ N. By Lemma 1, we need to consider only patterns in Π(k, P, N) in order to find a decision tree over regular patterns which is consistent with P and N. Then |Π(k, P, N)| is at most the sum over s ∈ P ∪ N of (|s|^2)^{k+1}. Therefore the number of possible trees is bounded by (|Π(k, P, N)|)^{2^d − 1} · 2^{2^d}, which is bounded by a polynomial in the total input length (the sum of |s| over s ∈ P ∪ N).

It is known that, given a regular pattern π and a string w, we can decide in polynomial time whether w matches π. Therefore, given a string w and a decision tree T over k-variable regular patterns whose depth is at most d, we can decide in polynomial time whether w ∈ L(T). The required polynomial-time algorithm enumerates the decision trees T over regular patterns in Π(k, P, N) with depth at most d, and checks whether s ∈ L(T) for each s ∈ P and t ∉ L(T) for each t ∈ N. If such a tree T is found, the algorithm outputs T as a hypothesis. □

We should say that the polynomial-time learning algorithm in the proof of Theorem 1 consumes an enormous amount of time and is not suited for practical use. We may understand the relationship of Algorithms 1 and 2 in Section 3 to Theorem 1 in the following way. When we take Π(k, P, N) to be the family of k-variable regular patterns made from P and N, Algorithms 1 and 2 run sufficiently fast in practical use (of course, in polynomial time) and produce a decision tree over k-variable regular patterns which classifies the given positive and negative examples. But the produced decision tree is not guaranteed to be of depth at most d. Hence, these algorithms are not learning algorithms in the exact sense of (2). However, experience tells us that these algorithms usually find small enough decision trees over regular patterns in our experiments on transmembrane domains.
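As an aside, the polynomial-time matching step used in this proof is easy to picture. The following Prolog sketch is ours, not the paper's; it assumes a representation of a regular pattern x1 a1 x2 a2 ... ak x(k+1) simply as the list [a1, ..., ak] of its constant parts, and checks a match by locating those parts in the string from left to right.

% matches(+ConstParts, +Seq): Seq matches x1 a1 x2 a2 ... ak x(k+1),
% where ConstParts = [a1,...,ak] is the list of constant parts (atoms).
matches(ConstParts, Seq) :-
    matches(ConstParts, Seq, 0).

matches([], _Seq, _From).
matches([A|As], Seq, From) :-
    sub_atom(Seq, Start, Len, _, A),   % an occurrence of A in Seq ...
    Start >= From,                     % ... to the right of the previous part
    Next is Start + Len,
    matches(As, Seq, Next).

Backtracking over sub_atom/5 explores the possible placements of the constant parts; taking the leftmost feasible occurrence at each step already suffices for a yes/no answer.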
For the class DTRP(d, k), Theorem 2 asserts that if a polynomial-time algorithm A produces a decision tree over k-variable regular patterns with depth at most d which classifies the given positive and negative examples, then A is a polynomial-time learning algorithm. In this sense, we may say that Algorithms 1 and 2 are polynomial-time algorithms for DTRP(d, k) which often produce reasonable hypotheses, although there is no mathematical proof showing how often such small hypotheses are obtained. This aspect is very important and useful when we are concerned with machine discovery.

Ehrenfeucht and Haussler [1989] have considered learning of decision trees of a fixed rank. For learning decision trees over regular patterns, the restriction by rank can be shown to be of no use; instead, we consider the depth of a decision tree. It is also reasonable to put a restriction on the number of variables in a regular pattern. It has been shown that the class of regular pattern languages is not polynomial-time learnable unless RP = NP [Miyano et al. 1991]. Therefore, unless restrictions such as a bound on the number of variables in a regular pattern are given, we may not expect any positive results for polynomial-time learning.

6 Conclusion

We have shown that the idea of combining regular patterns and decision trees works quite well for transmembrane domain identification. The experiments have also shown the importance of negative motifs. A union of regular patterns can be regarded as a special form of a decision tree called a decision list. We have reported in [Arikawa et al. 1992] that a union of a small number of regular patterns can also recognize transmembrane domains with high accuracy. However, the time spent in finding hypotheses in [Arikawa et al. 1992] is much larger than that reported in this paper.

Our system constructs a decision tree over regular patterns just from strings, called positive and negative examples. We need not take care of which attributes to specify, as in ID3. Therefore the method can be applied to other classification problems for proteins and DNA sequences. We believe that our approach provides a new application of algorithmic learning to Molecular Biology. We are now examining our method on other related problems such as predicting the secondary structure of proteins.

References

[Arikawa et al. 1992] S. Arikawa, S. Kuhara, S. Miyano, A. Shinohara and T. Shinohara. A Learning Algorithm for Elementary Formal Systems and its Experiments on Identification of Transmembrane Domains. In Proc. 25th Hawaii Int. Conf. on System Sciences, IEEE, Hawaii, 1992, pp. 675-684.

[Bairoch 1991] A. Bairoch. PROSITE: A Dictionary of Sites and Patterns in Proteins. Nucleic Acids Res., Vol. 19 (1991), pp. 2241-2245.

[Blumer et al. 1989] A. Blumer, A. Ehrenfeucht, D. Haussler and M.K. Warmuth. Learnability and the Vapnik-Chervonenkis Dimension. JACM, Vol. 36 (1989), pp. 929-965.

[Ehrenfeucht and Haussler 1989] A. Ehrenfeucht and D. Haussler. Learning Decision Trees from Random Examples. Inform. Comput., Vol. 82 (1989), pp. 231-246.

[Engelman et al. 1986] D.M. Engelman, T.A. Steitz and A. Goldman. Identifying Nonpolar Transbilayer Helices in Amino Acid Sequences of Membrane Proteins. Ann. Rev. Biophys. Biophys. Chem., Vol. 15 (1986), pp. 321-353.
[Gusev and Chuzhanova 1990] V. Gusev and N. Chuzhanova. The Algorithms for Recognition of the Functional Sites in Genetic Texts. In Proc. 1st Workshop on Algorithmic Learning Theory, Tokyo, 1990, pp. 109-119.

[Hartmann et al. 1989] E. Hartmann, T.A. Rapoport and H.F. Lodish. Predicting the Orientation of Eukaryotic Membrane-Spanning Proteins. Proc. Natl. Acad. Sci. U.S.A., Vol. 86 (1989), pp. 5786-5790.

[Holly and Karplus 1989] L.H. Holly and M. Karplus. Protein Secondary Structure Prediction with a Neural Network. Proc. Natl. Acad. Sci. U.S.A., Vol. 86 (1989), pp. 152-156.

[Kyte and Doolittle 1982] J. Kyte and R.F. Doolittle. A Simple Method for Displaying the Hydropathic Character of a Protein. J. Mol. Biol., Vol. 157 (1982), pp. 105-132.

[Lipp et al. 1989] J. Lipp, N. Flint, M.T. Haeuptle and B. Dobberstein. Structural Requirements for Membrane Assembly of Proteins Spanning the Membrane Several Times. J. Cell Biol., Vol. 109 (1989), pp. 2013-2022.

[Miyano et al. 1991] S. Miyano, A. Shinohara and T. Shinohara. Which Classes of Elementary Formal Systems are Polynomial-Time Learnable? In Proc. 2nd Workshop on Algorithmic Learning Theory, Tokyo, 1991, pp. 139-150.

[Natarajan 1989] B.K. Natarajan. On Learning Sets and Functions. Machine Learning, Vol. 4 (1989), pp. 67-97.

[PIR] Protein Identification Resource, National Biomedical Research Foundation.

[Quinlan 1986] J.R. Quinlan. Induction of Decision Trees. Machine Learning, Vol. 1 (1986), pp. 81-106.

[Quinlan and Rivest 1989] J.R. Quinlan and R.L. Rivest. Inferring Decision Trees Using the Minimum Description Length Principle. Inform. Comput., Vol. 80 (1989), pp. 227-248.

[Rao and Argos 1986] J.K.M. Rao and P. Argos. A Conformational Preference Parameter to Predict Helices in Integral Membrane Proteins. Biochim. Biophys. Acta, Vol. 869 (1986), pp. 197-214.

[Shinohara 1982] T. Shinohara. Polynomial Time Inference of Pattern Languages and its Applications. In Proc. 7th IBM Symp. on Mathematical Foundations of Computer Science, 1982, pp. 191-209.

[Shinohara 1983] T. Shinohara. Polynomial Time Inference of Regular Pattern Languages. In Proc. RIMS Symp. on Software Science and Engineering (Lecture Notes in Computer Science, Vol. 147), 1983, pp. 115-127.

[Utgoff 1989] P.E. Utgoff. Incremental Induction of Decision Trees. Machine Learning, Vol. 4 (1989), pp. 161-186.

[Valiant 1984] L. Valiant. A Theory of the Learnable. Commun. ACM, Vol. 27 (1984), pp. 1134-1142.

[Von Heijine 1988] G. von Heijne. Transcending the Impenetrable: How Proteins Come to Terms with Membranes. Biochim. Biophys. Acta, Vol. 947 (1988), pp. 307-333.

[Wu et al. 1990] C.H. Wu, G.M. Whitson and G.J. Montllor. PROCANS: A Protein Classification System Using a Neural Network. In Proc. IJCNN Int. Joint Conf. on Neural Networks, Vol. 2 (1990), pp. 91-96.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

Efficient Induction of Version Spaces through Constrained Language Shift

Claudio Carpineto
Fondazione Ugo Bordoni
Via Baldassarre Castiglione 59, 00142 - Rome, ITALY
fubdpt5@itcaspur.bitnet

Abstract

A large hypothesis space makes the version space approach, like any other concept induction algorithm based on hypothesis ordering, computationally inefficient. Working with smaller composable concept languages rather than one large concept language is one way to attack the problem, in that it allows us to do part of the induction job within the more convenient languages and move to the less convenient languages when necessary. In this paper we investigate the use of multiple concept languages in a version space approach.
We define a graph of languages ordered by the standard set inclusion relation, and provide a procedure for efficiently inducing version spaces while shifting from small to larger concept languages. We apply this method to the attribute languages of a typical conjunctive concept language (i.e., a conjunctive concept language defined on a tree-structured attribute-based instance space) and compare its complexity to that of a standard version space algorithm applied to the full concept language. Finally we contrast our approach with other work on language shift, outlining an alternative highlyconstrained strategy for searching the space of new concepts which is not based on constructive operators. 1 Introduction Of all the algorithms for incremental concept induction that are based on the partial order defined by generality over the concept space, the candidate elimination (CE) algorithm [Mitchell 1982] is the best known exemplar. The CE algorithm represents and updates the set of all concepts that are consistent with data (Le. the version space) by maintaining two sets, the set S containing the maximally specific concepts and the set G containing the maximally general concepts. The procedure to update the version space is as follows. A positive example prunes concepts in G which do not cover it and causes all concepts in S which do not cover the example to be generalized just enough to cover it. A negative example prunes concepts in S that cover it and causes all concepts in G that cover the example to be specialized just enough to exclude it. As more examples are seen, the version space shrinks; it may eventually reduce to the target concept provided that the concept description language is consistent with the data. This framework has been later improved along several directions. The first is that of incorporating the domain knowledge available to the system in the algorithm; this has resulted in feeding the CE algorithm with analytically- generalized positive examples (e.g., [Hirsh 1989], [Carpineto 1990]), and analytically-generalized negative examples (e.g., [Carpineto 1991]). Another research direction is to relax the assumption about the consistency of the concept space with data. In fact, like many other learning algorithms, the CE algorithm uses a restricted concept language to incorporate bias and focus the search on a smaller number of hypotheses. The drawback is that the target concept may be contained in the set of concepts that are inexpressible in the given language, thus being unleamable. In this case the sets Sand G become empty: to restore consistency the bias must be weakened adding new concepts to the concept language [Utgoff 1986]. Thirdly, the CE algorithm suffers from lack of computational efficiency, in that the size of S and G can be exponential in the number of examples and the number of parameters describing the examples [Haussler 1988]. Changes to the basic algorithm have been proposed that improve efficiency for some concept language [Smith and Rosenbloom 1990]. In this paper we investigate the use of multiple concept languages in a version space approach. By organizing the concept languages into a graph corresponding to the relation larger-than implicitly defined over the sets of concepts covered by the languages, we have a framework that allows us to shift from small· to larger concept languages in a controlled manner. 
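To fix ideas, the following is a rough Prolog sketch of a single candidate-elimination update in the spirit of the description above. It is our own simplification, not the algorithm studied in this paper: the predicates covers/2, min_generalise/3 and min_specialisations/3 are placeholder names assumed to be supplied by the concept language at hand, the usual boundary clean-up steps are omitted, and the list predicates are those of SWI-Prolog's library(apply).

:- use_module(library(apply)).   % include/3, exclude/3, maplist/3, foldl/4

% ce_update(+Example, +vs(S0,G0), -vs(S,G)): one version-space update.
ce_update(pos(X), vs(S0, G0), vs(S, G)) :-
    include(covers_instance(X), G0, G),          % keep G-concepts that cover X
    maplist(min_generalise(X), S0, S).           % lift S-concepts just enough
ce_update(neg(X), vs(S0, G0), vs(S, G)) :-
    exclude(covers_instance(X), S0, S),          % drop S-concepts that cover X
    foldl(specialise_if_covers(X), G0, [], G).   % split G-concepts just enough

covers_instance(X, Concept) :- covers(Concept, X).

specialise_if_covers(X, C, Acc0, Acc) :-
    (   covers(C, X)
    ->  min_specialisations(C, X, Cs),           % maximal specialisations excluding X
        append(Cs, Acc0, Acc)
    ;   Acc = [C|Acc0]
    ).

In the factored approach developed below, the same kind of update is simply run in parallel over every language currently active in the graph just described.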
This provides a powerful basis to apply a general divide-and-conquer strategy to improve the efficiency of a standard version space approach in which the concept description language is factorizable. The idea is to start out with the smallest concept languages (Le., the factor languages) and, once the version spaces induced over them have become inconsistent with the data, to move along the graph of product languages to the maximally small concept languages that restore consistency. Working with smaller concept languages may greatly reduce the size of S and G, thus resulting in a neat improvement in efficiency. On the other hand, use of several languages in parallel and language shifts negatively affect complexity. Therefore the two main objectives of the paper are : (1) define a set of languages and a procedure for inducing version spaces after any language shift efficiently, (2) show that in some cases this method may be applied to reduce the complexity of the standard CE algorithm. Since this framework supports version-space induction over a set of concept languages, it can also be suitable to handle inconsistency when the original concept language is too small. More generally, it suggest an alternative approach to inductive language shift in which the search for useful concepts to be added to the concept language is not based on constructive operators. This aspect is also discussed in the paper. 627 any suit anyrank "\ ../"\ /"\ / black / • • • "\ face red J numbered //"\ //.. ~ Q K 1 2 10 Fig. 1. Two concept languages in the playing cards domain. The rest of the paper is organized as follows. In the next section we define a graph of conjunctive concept languages and describe the learning problem with respect to it. Then we present the learning method. Next, we apply the method to the factor languages of a conjunctive concept language defined on a tree-structured attribute-based instance space, and evaluate its utility. Finally we compare this work to other approaches to factorization in concept induction and to inductive language shift. Incrementally Find The version spaces in the set of product concept languages that are consistent with data and that contain the smallest number of factors. 2 The learning problem We first introduce the notions that characterize our learning problem. In the following concepts are viewed as sets of instances and languages as sets of concepts. A concept c 1 is more general than a concept c2 if the set of instances covered by c 1 is a proper superset of the set of instances covered by c2. A language LI is larger than a language L2 if the set of concepts expressible in LI is a proper superset of the set of concepts expressible in L2. In the playing cards domain, which we shall use as an illustration, two possible concept languages are: LI = {anysuit, black, red, "', ... , ., • } and L2 = {anyrank, face, numbered, J, Q, K, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1O}. The relation more-general-than over the concepts present in each language is shown in fig 1. The product L1,2 of two factor languages LI and L2 is the set of concepts formed from the conjunctions of concepts from Ll and L2 (examples of product concepts are 'anyrank-anysuit', 'anyrank-black', etc). The number of concepts in the product language is therefore the product of the number of concepts in its factors. Also, a concept cel' c2' in the language LI2 is more general than (» another co~cept ccl",c2" if and on'ly if CI' > Cll! and c2'> c21!. With n initial languages it is possible to generate Lk=l,n n! 
/ (n - k)! k! = 2 0 - 1 product languages (see fig. 2). Moreover, given that the superconcept 'any' can always be added to each factor language, the relation larger than over this set of languages can be immediately established, for each product language is larger than any of its factor languages. The learning problem can be stated as follows. A set of factor concept languages A set of positive instances. A set of negative instances. Fig. 2. The graph of product languages with three factor languages. 3. The learning method In this approach concept learning and language shift are interleaved. We process one instance at a time, using a standard version space approach to induce consistent concepts over each language of the current set (initially, the n factor languages). During this inductive phase some concept languages may become inconsistent with the data. When every member of the current set of languages has become inconsistent with data, the language shifting algorithm is invoked. It iteratively selects the set of maximally small concept languages that are larger than the current ones (i.e. the two-factored languages, the threefactored languages, etc.) and computes the new version spaces in these languages. It halts when it finds a consistent set of concept languages (i.e. a set in which there is at least one consistent concept language); then it returns control to the inductive algorithm to process additional examples.The whole process is iterated as long as the set of current languages can be further specialised (i.e. until the n-factored language has been generated). We call this algorithm Factored Candidate Elimination (FCE) algorithm. The top-level FCE algorithm is presented in table 1. The core of the algorithm is the procedure to find the new consistent version spaces in the product languages (in italics in table 1). The difficulty is that the algorithm for inducing concepts over a language (the inductive algorithm) is usually distinct from the algorithm for adding new terms to the language itself (the language-shifting algorithm). In 628 Table 1: The top-level FCE algorithm Input: Output: Variables: Function: An instance set {I}. A set of partially ordered concept languages {L} formed by n given one-factored languages and their products. The version spaces in the set of languages {L} that are consistent with {I} and that contain the smallest number of factors. {LS}k is the subset of (unordered) languages in {L} which have k factors. {VS} k is a set of version spaces, with 1VSki = ILSkl. {Ls,VS}k is the set of pairs obtained pairing the corresponding elements in {LS}k and {VS}k. CE(i,l,vs) takes an instance, a concept language and a version space and returns the updated version space. FCE({I},{L}) K=1. {VSh ={Lsh. For each instance i in {I}, For each (ls,vs) in {LS,VS}k vs = CE(i,ls,vs). If all the version spaces in {VS} k are empty Then Repeat IfK=n Then Returnfailure K=K+1. For each Is in {LS}k, find the new version space vs associated with it. Until at least one vs is not empty. general, the inductive algorithm has to be run again over the instance set after any change made by the languageshifting algorithm ([Utgoff 1986], [Matheus and Rendell 1989], [Pagallo 1989], [Wogulis and Langley 1989]). In this case, however, in defining the procedure to induce the new consistent concepts after any language shift, we take advantage of the features of the particular inductive learning algorithm considered (i.e. the CE algorithm) and of the properties of language "multiplication". 
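One of the properties of language "multiplication" referred to above — that the ordering in a product language is just the factor-wise ordering defined in Section 2 — can be spelled out in a few lines of Prolog. The sketch below uses the playing-cards example; the encoding, the atom names for the suits and the predicate names are our assumptions, not the paper's.

% The two factor taxonomies as parent/2 facts (suits written as atoms here).
parent(anysuit, black).   parent(anysuit, red).
parent(black, spades).    parent(black, clubs).
parent(red, hearts).      parent(red, diamonds).
parent(anyrank, face).    parent(anyrank, numbered).
parent(face, j).  parent(face, q).  parent(face, k).
parent(numbered, N) :- between(1, 10, N).

at_least_as_general(C, C).
at_least_as_general(C1, C2) :- parent(C1, X), at_least_as_general(X, C2).

% A product concept Rank-Suit is at least as general as another product
% concept exactly when it is so in each factor.
product_at_least_as_general(R1-S1, R2-S2) :-
    at_least_as_general(R1, R2),
    at_least_as_general(S1, S2).

For instance, product_at_least_as_general(face-black, j-spades) succeeds, mirroring the factor-wise definition of the ordering on L12 given in Section 2.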
The two key facts are that the CE algorithm makes an explicit use of concept ordering and that concepts in any product language preserve the order of concepts in its factors. This makes it possible to modify the basic CE algorithm with the aim of computing the set of consistent concepts in a product language as a function of some appropriate concept sets induced in its factors. The concept sets computed in each factor language which will be utilized during language shift are the following. First, for each language we compute the set S*. S* contains the most specific concepts in the language that cover all positive examples, regardless of whether or not they include any negative examples. Second, for each language and each negative example, we compute the set 0*. 0* contains the most general concepts in the language that do not cover the negative example, regardless of whether or not they include all positive examples. These operations can be better illustrated with an example. Let us consider again the playing cards domain and suppose that we begin with the two concept languages introduced above - rank (L 1) and suit (L2). Let us suppose the system is given one positive example - the Jack of spades - and two negative examples - the Jack of hearts and the Two of spades. We compute the two corresponding version spaces (one for each language), the sets S* (one for each language), and the sets 0* (one for each language and for each negative example) in parallel. In particular, the sets S* and 0* can be immediately determined, given the ordering over each language's members. The inductive phase is pictured in fig.3 (f stands for face, b for black,etc). The three instances cause both of the version spaces to reduce to the empty set. The next step is therefore to shift to the set of maximally small concept languages that are larger than Ll and L2 (in this case the product L 12) and check to see if it contains any concepts consistent with data. The problem of finding the version space in the language L12 can be subdivided into the two tasks of finding the lower boundary set S 12 (i.e. the set of the most specific concepts in L12 that are consistent with data) and the upper boundary set 0 12 (i.e. the set of the most general concepts in L12 that are consistent with data). Computation of S 12 Because a product concept contains an instance if and only if all of its factor concepts contain the instance, the product of S 1* and S2 * returns the most specific factor concepts that include all positive instances. By discarding those that also cover negative examples, we get just the set S 12' If the set becomes empty, then the product language is also inconsistent with the data. More specific concepts, in fact, cannot be consistent because they would rule out some positive example. More general concepts cannot be consistent either, for they would cover some negative examples. In our example, as there is only one positive example, the result is trivial: S 12 = {J'" }. 629 vers-s P 2 vers-sP1 /n + J. y G* any", b" • b /f J {} " • S~ = {J} s;={.} * ~={n,Q,K} * ~={b, +} * {} {} 2. 5 * G={f,1,3 .. 10} 1 * ~={r, ... } Fig. 3. Concept sets computed during the inductive phase. Computation of 012 The simplification with the set S12 returns: Rather than generating and testing for consistency all the product concepts more general than the members of S12, the set 0 12 is computed using the sets 0* As for each negative example there must be at least. 
one factor concept in each consistent product concept whIch does not cover the negative example, and because we se~k t~e maximally general consistent product concepts, the Idea IS to use the members of the sets 0* as upper bounds to find the factor concepts present in such maximally general product concepts. . . The algorithm is as follows. It begms by droppmg from the sets 0* the elements that cannot generate factor concepts that are more general than those contained in S12' Then, it (a) finds all the conjunctions o~ concept~ in the reduced sets 0* such that each negative mstance IS ruled out by at least one concept, and (b) checks if there are ~ore general consistent conjunc.tions. SteI? (~) reqUIres conjoining each factor concept I~ each 0* (It wII~ rul~ out at least one negative example) wIth all the combmatIOns of factor concepts in the other O*'s which rule out. the remaining negative examples. Step (b) reqUIr~s generalising (with the value 'any') the factor c<:mcepts m the conjunctions found at the en.d of step (a) WhICh do ~ot contribute to rule out any negatIve example. The resultmg set of conjunctions, if any is found, coincides with the set 0 12, in that there cannot be more. general product co?cepts consistent with data. However, It may not be possIble to find a consistent concept conjoining the members of the O*'s. In this case we are forced to specialise the members of the O*'s to the extent required so that they rule out more negative instances, and to iterate the procedure (in the limit, we will get the set S12)' In our example there are just two factors and only two negative instances. The initial sets 0* are: tn, Q, K} {f, 1, 3,00, 10} {b, + } {r, ... } { } {b} {f} { } Step (a) in this case reduces to the union of ~he conjunction of 01 * relative t? inst~nce 1 and 02* r~latIve to instance 2 and the conjunctIon of 01 * relatIve to instance 2 and 02* relative to instance 1. The resull ({ fb }) does not need be generalized (step (b)) for both 'f and 'b' contribute to rule out (at least) one negative example. Also, in this case, the specialisation procedure is not needed because we have been able to find a consistent conjunction: 0 12={fb }. The overall version space in the language 1.,12 is shown in fig.4. ~ t"'M G 12 Jb Fig.4. The version space in the product language after the constructive phase. 630 4 Evaluation There are two ways in which the factored CE algorithm (FeE) can be used to reduce the complexity of the standard CE algorithm. Either we use a graph-factoring algorithm [Subramanian and Feigenbaum, 1986] to find the factors of a given concept space (provided that it is factorable), or we choose a concept language that can be naturally decomposed into factor languages. Here we evaluate the utility of the FCE algorithm with respect to a simple but widely used concept language that has this property. We consider a conjunctive concept language defined on a treestructured attribute-based instance space. We assume the number of attributes be n, each attribute with I levels and branching factor b (the case can be easily extended to nominal and linear attributes, considering that a nominal attribute can be converted in a tree-structured attribute using a dummy root 'any-value', and that a linear attribute can be considered as a tree-structured attribute with branching factor = 1). Each term of the concept space is a conjunction of n values, one for each attribute; the total number of terms in the concept space is [(bI - 1) / (b - 1)]n. 
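Spelled out, that count is just a geometric series per attribute tree raised to the number of attributes (our restatement of the formula above, with b the branching factor, l the number of levels and n the number of attributes):

1 + b + b^{2} + \cdots + b^{l-1} \;=\; \frac{b^{l}-1}{b-1},
\qquad\text{so the concept space contains}\qquad
\left(\frac{b^{l}-1}{b-1}\right)^{n} \text{ terms.}

For the suit tree of Figure 1 (b = 2, l = 3) this gives 7 admissible values, i.e., exactly the seven concepts of L1.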
It is worth noting that with such a concept language the set S of the version space will never contain more than one element [Bundy et al. 1985]. Even in this case, however, Haussler [1988] has shown that the size of the set G can still be exponential, due to its fragmentation. In the following we compare the CE algorithm applied to this full conjunctive concept language to the FCE algorithm applied to its attribute languages. While their relative performances are equivalent, in that in order to find all the concepts consistent with data in the full concept language it suffices to eventually compute the boundaries of the n-factored version space, their time complexities may differ strongly. The gain or loss in efficiency ultimately depends on the number of instances that each intermediate language is able to account for before it becomes inconsistent. In the best case all the induction is done within the smallest languages, and a language shift to larger languages is not necessary. In the worst case no consistent concepts are induced in the smaller languages, so that all the induction is eventually done within the full concept language.

To make a quantitative assessment we have to make assumptions about a number of factors in addition to the structure of factor and product languages, including the location of the target concept, the distribution of training instances, and the cost of matching concepts against training instances. We consider the worst-case convergence to the target concept in the full concept language. This amounts to saying that after the first positive instance (the first instance must be positive in the CE algorithm) there are only negative instances, and that each of them causes only one concept to be removed from the version space until it shrinks to the target concept (i.e., the first positive instance). In terms of the ordering of the full concept language this means that general concepts are removed earlier than any of their more specific concepts. Furthermore, we assume that the generality of the attribute values in the concepts dropped from the version space decreases uniformly. More precisely, we assume that if an attribute value in a dropped concept is placed at level k in the corresponding attribute tree, then the values of that attribute in the remaining consistent concepts are placed at most at level k+1. This presentation of training instances has the effect of maximizing the number of instances that each intermediate language can take in before it becomes inconsistent. As for the cost of matching concepts to instances and to other concepts, we assume that it is the same in all languages.

We can now analyse the complexity of the two approaches. As done in [Mitchell 1982], the time complexity bounds indicate bounds on the number of comparisons between concepts and instances, and comparisons between concepts.

CE algorithm with the full conjunctive concept language. Let q be the number of negative instances and g the largest size of G. Following [Mitchell 1982], in our case the key term is O(g^2 q). The maximum size of G is given by the largest number of unordered concepts that can be found in the version space after the first positive instance. This number turns out to be O(n^2 l).
To illustrate, first note that the version space after the first positive instance contains the concepts more general than the instance; therefore the admissible values for each attribute are the l values in the attribute tree that lie on the chain linking the attribute value in the instance to the root of the attribute tree. When n = 2 there are at most l ways to choose a pair of values from two ordered sets of size l in such a way that the pairs are unordered. When n increases, this number is multiplied by n!/((n-2)! 2!). In fact, considering that two n-factored concepts are unordered if they contain at least two factor concepts with different orderings, all the possible unordered n-factored concepts can be obtained by considering the same combinations as in the l original unordered concepts for each possible way of choosing a pair of attributes from among the n attributes. The maximum size of G is therefore O(n^2 l). The complexity of the CE algorithm is O(n^4 l^2 q).

FCE algorithm with the attribute languages. In this case several concept languages are active at once. For each negative instance we have to update in parallel at most max_k [n!/((n-k)! k!)], that is O(n^2), version spaces. Given our hypothesis on instance distribution, the g value of the intermediate version spaces will be 1 for the one-factored languages, 2 for the two-factored languages, ..., n for the n-factored languages. The largest value of g is n, and the relative complexity factor for each version space is therefore O(n^2). Thus the time taken to induce version spaces within the set of active languages is at most O(n^2 · n^2 · q) = O(n^4 q).

The total time complexity can be calculated by adding the time taken by language shift to the time taken by concept induction alone. The cost of shifting the concept languages is given by the number of language shifts (2^n) multiplied by the cost of any single language shift. The time taken by any single language shift becomes constant if we modify the FCE algorithm's inductive phase by labelling each member of each G*, and any of its more specific concepts, with all the negative instances it does not cover. In this way, in fact, the operations described in the procedure to compute the G set in any product language no longer involve any matching between concepts and instances. On the other hand, the cost of labelling must now be added to the cost of language shift. The labelling we introduced requires matching each negative instance against the members of the n G*'s (we keep only the G*'s relative to the initial factor languages), where each G* contains only one member (in our case, in fact, as there is only one positive instance, we can immediately remove the concepts that are not more general than the positive instance from the G*'s, at an additional cost of O(qnbl)), and repeating this for all the l more specific concepts of each member of G* (i.e., the concepts contained in the chain of admissible values relative to that G*'s factor language). Therefore labelling takes in all O(qnl) + O(qnbl) = O(qnbl). The time complexity of language shift is O(2^n) + O(qnbl). The overall time complexity is therefore O(n^4 q) + O(2^n) + O(qnbl), which, for practical values of n, b, and l, approximates to O(n^4 q).

In sum, we have O(n^4 l^2 q) for the CE algorithm versus O(n^4 q) for the FCE algorithm. The effect of using the FCE algorithm with the chosen instance distribution appears to be that of blocking the fragmentation of G due to l. It is also worth noting that the factor O(n^2) in the FCE algorithm due to the presence of multiple languages can be reduced by reducing the number of intermediate product languages employed. This would, on the other hand, be counteracted by an increase of the factor O(n^2) due to the g of the intermediate languages. There is thus a trade-off between using few concept languages and using many concept languages in a given range. The fewer the concept languages, the smaller the amount of computation devoted to parallel induction and language shift. The more the concept languages, the more likely it is that a smaller amount of induction will be done within the largest concept languages, which are the least convenient. Experimentation might help investigate this kind of trade-off.

5 Relation to factorization in concept induction

Factorization with smaller concept languages in the CE algorithm was first explored in [Subramanian and Feigenbaum 1986] and [Genesereth and Nilsson 1987]. Although we were inspired by their work, our goals, methods and assumptions are different. First, in [Subramanian and Feigenbaum 1986] and [Genesereth and Nilsson 1987] language factorization has been used with the aim of improving efficiency during the phase of experiment generation, whereas we have investigated its utility during the earlier and more important stage of version-space induction from given examples and counterexamples. Second, while they have primarily addressed the problem of factoring a version space and assessing credit over its factors, we have focussed on language shift during version-space induction over a set of available factor and product languages. Third, their approach relies on the assumption that the given concept language is factorable into independent concept languages (footnote 1). By contrast, when applying the FCE algorithm directly to the attribute languages of a conjunctive concept language it is not necessary that the attribute languages be independent.

Footnote 1: Two concept languages LA and LB are independent if membership in any of the concepts from LA does not imply or deny membership in any of the concepts in LB. This definition implies that for every concept a in LA and every concept b in LB the intersection of a and b is neither empty nor equal to either concept. Two independent concept languages are unordered with respect to the larger-than relation.

For example, the two factor languages we have used as an illustration throughout the paper (L1 and L2) happen to be
It is also worth noting that the factor 0(n2) in the FCE algorithm due to the presence of multiple languages can be reduced by reducing the number of intermediate product languages employed. This would, on the other hand, be counteracted by an increase of the factor 0(n2) due to the g of the intermediate languages. Here is a trade-off between using few concept languages and using many concept languages in a given range. The fewer the concept languages, the less the amount of computation devoted to parallel induction and language shift. The more the concept languages, the more likely it is that a smaller amount of induction will be done within the largest concept languages, which are the least convenient. Experimentation might help investigate this kind of trade-off. 5 Relation to factorization in concept induction Factorization with smaller concept languages in the CE algorithm has been first explored in [Subramanian and Feigenbaum 1986] and [Genesereth and Nilsson 1987]. Although we were inspired by their work, our goals, methods and assumptions are different. First, in [Subramanian and Feigenbaum 1986] and [Genesereth and Nilsson 1987].1anguage factorization has been used with the aim of improving efficiency during the phase of experiment generation, whereas we have investigated its utility during the earlier and more important stage of version-space induction from given examples and counterexamples. Second, while they have primarily addressed the problem of factoring a version space and assessing credit over its factors, we have focussed on language shift during version-space induction over a set of available factor and product languages. Third, their approach relies on the assumption that the given concept langage is factorable into independent concept languages l . By contrast, when applying the FCE algorithm directly to the attribute languages of a conjunctive concept language it is not necessary the attribute languages be independent. For example, the two factor languages we have used as an illustration throughout the paper (Ll and L2) happen to be ITwo concept languages LA and LB are independent if membership in any of the concepts from LA does not imply or deny membership in any of the concepts in LB. This definition implies that for every concept a in LA and every concept bin LB the intersection of a and b is neither empty nor equal to either conceptTwo independent concept languages are unordered with respect to the larger-than relation. two independent languages 2 ; however, we could well apply the FeE algorithm to the concept language LB we introduced earlier along with the concept language Lc = {anyrank, odd, even, 1, 3, 5, 7, 9, J, K, 2, 4, 6, 8, 10, Q}, these two languages being not independent (the intersection of the concept "2" in LB and the concept "odd" in Lc is empty, for instance). Using non-independent factor languages, as their product may contain a large number of empty or redundant concepts, may badly affect the performance when the FeE algorithm is applied to recover from inconsistency due to use of small concept languages. But it does not seem to affect the result when the FeE algorithm is used to improve efficiency with respect to the full conjunctive concept language. 6 Relation to inductive language shift As mentioned earlier, the FCE algorithm can also be seen as a method for introducing new concepts to overcome the limitations of a set of restricted concept languages (Le., the factor languages). 
It does so by creating another set of larger concept languages (Le., the product languages) to constrain the search for new useful concepts. This is a significant departure from the search strategy usually employed in most approaches to inductive language shift. Regardless of the specific goal pursued many systems deal with improvement of some quality measures of the learned descriptions rather than with their correctness - "the problem of new terms" [Dietterich et al. 1982] or "constructive induction" [Michalski 1983] is in general tackled by defining a set of appropriate constructive operators and carrying out a depth-first search through the space of the remaining concepts to find useful (e.g., consistent, more concise, more accurate) extensions to be added to the given language. Furthermore, since the number of admissible extensions is generally intractably large, most of the approaches to constructive induction rely on various heuristics to reduce the number of candidate additional concepts and/or to cut down the search (e.g, [Matheus and Rendell 1989], [Pagallo 1989], [Wogulis and langley 1989]). By contrast, we compute and keep all the admissible language extensions (in a given set of extensions) that restore consistency with data, rather than considering one or few plausible language extensions at a time. Just as the relation more general than that is implicitly defined over the terms of a concept language may allow efficient representation and updating of all consistent concepts [Mitchell 1982], so too the relation larger than that is implicitly defined over a set of languages may provide the framework to efficiently organize the small-to-Iarge breadth-first search of useful languages. These considerations suggest that an alternative abstract model for language shift can be formulated, in which the search for new concepts, rather than being based on the use of constructive operators, is driven by the ordering of a set of candidate concept languages (work in preparation). 2 It is ~ften the case that attribute choice reflects independencies in the world, thus giving rise to actual independent factor languages. 632 7. Conclusion We have presented the FCE algorithm for efficiently inducing version spaces over a set of partially-ordered concept languages. The utility of this algorithm is twofold: improving the efficiency of version-space induction if the initial concept language is decomposable into a set of factor languages, and inducing consistent version spaces if a set of concept languages inconsistent with data is initially available. In this paper we have focussed on the former. We have applied theFCE algorithm to the task of inducing version spaces over a conjunctive concept language defined on a tree-structured attribute-based instance space, and we have evaluated when it leads to a reduction in complexity. Acknowledgements Part of this work was done while at the Computing Science Department of the University of Aberdeen, partially supported by CEC SS project SC1.0048.C(H). I would like to thank Derek Sleeman and Pete Edwards for their support and for useful discussions on this topic. The work was carried out within the framework of the agreement between the Italian PT Administration and the Fondazione Ugo Bordoni 633 References 107-148. [Bundy et al. 1985] A. Bundy, B. Silver, D. Plummer. An analytical comparison of some rule-learning problems. Artificial Intelligence, Vol. 27, No.2 (1985), pp. 137-181. [Wogulis and Langley 1989] J. Wogulis, P. Langley. 
Improving efficiency by learning intennediate concepts. In Proc. 11 th IJCAI, Morgan Kaufmann, Los Altos, pp. 657662. [Carpineto 1990] C. Carpineto. Combining EBL from success and EBL from failure with parameter version spaces. In Proc. 9th ECAI, Pitman, London, 1990, pp. 138-140. [Carpineto 1991] C. Carpineto. Analytical negative generalization and empirical negative generalization are not cumulative: a case study. In Proc. EWSL-1991, Lecture Notes on Artificial Intelligence, Springer-Verlag, Berlin, 1991, pp. 81-88. [Dietterich et al 1982] T. Dietterich, B. London, K. Clarkson, R. Dromey. Learning and inductive inference. In Cohen & Feigenbaum (Eds.) The Handbook of Artificial Intelligence, Morgan Kaufmann, Los Altos, 1982. [Genesereth and Nilsson 1987] M. Genesereth, N. Nilsson. Logical Foundations of Artificial Intelligence. Morgan Kaufmann, Los Altos, 1987. [Haussler 1988] D. Haussler. Quantifying inductive bias: Artificial Intelligence learning algorithms and Valiant's learning framework. Artificial Intelligence, Vol. 36, No.2 (1988), pp. 177-221. [Hirsh 1989] H. Hirsh. Combining Empirical and Analytical Learning with Version Spaces. In Proc. 6th Int.Workshop on Machine Learning. Morgan Kaufmann, Los Altos, 1989, pp. 29-33. [Matheus and Rendell 1989] C. Matheus, L. Rendell. Constructive induction on decision trees. In Proc. 11th IJCAI, Detroit, Morgan Kaufmann, Los Altos, 1985, pp. 645-650. [Michalski 1983] R. Michalski. A theory and methodology of inductive learning. Artificial Intelligence, Vol. 20, 1983, pp.111-161. [Mitchell 1982] T. Mitchell. Generalization as Search. Artificial Intelligence, VoL 18, 1982, pp. 203-226. [Pagallo 1989] G. Pagallo. Learning DNF by Decision Trees. In Proc .11 th IJCAI, Morgan Kaufmann, Los Altos, pp. 639-644. [Smith and Rosenbloom 1990] B. Smith, P. Rosenbloom. Incremental Non-Backtracking Focusing: A Poliniomally Bounded Generalization Algorithm for Version Spaces. In Proc. 8thAAAI, Morgan Kaufmann, Los Altos, pp. 848853. [Subramanian and Feigenbaum 1986] D. Subramanian, J. Feigenbaum. Factorization in Experiment Generation. In Proc. 5th AAAI, Morgan Kaufmann, Los Altos, 1986, pp. 518-522. [Utgoff 1986] P. Utgoff. Shift of bias for inductive concept learning. In Michalski et al. (Eds), Machine Learning II. Morgan Kaufmann, Los Altos, 1986, pp. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 634 Theorem Proving Engine and Strategy Description Language Massimo Bruschi State University of Milan - Computer Science Department Via Comelico 39, 20135 Milan, Italy e-mail: mbruschi~imiucca.csi.unimi.it Abstract The concepts of strategy description language (SDL) and theorem proving engine (T PE) are introduced as architectural and applicative tools in the design and use of an automated theorem proving system. Particular emphasis is given to the use of an SDL as a research tool as well as a way to use a prover both as a batch or as an interactive program. In fact, the availability of an interpreter for such a language offers the possibility of having a system able to cover both of these usages, giving to the user some way of choosing the granularity of the steps the prover must take. Three examples are given to show possible applications. Their purpose is to show its usefulness for expressing and testing new ideas. Some interesting capabilities of an SDL are applied to highlight how it allows the treatment of self-analysis on the state of the search space. 
Examples of these are the definition of a self-adaptive search and a tree pruning strategy. All the definitions we give reflect a running Prolog prototype and inherit much from the Prolog style and structure. 1 Introd uction The uses of and the interest in automated theorem proving have grown markedly in the preceding decade. The cause rests in part with faster computers, easy access to workstations, portable and powerful automated theoremproving programs, and successes with answering open questions. Various researchers in the field conjecture that far more power is needed to attack the deep problems of mathematics and logic that are currently out of reach. Although some of the needed increase in effectiveness will result from even faster computers, many state that the real advances will result from the formulation of new and diverse strategies. Because we feel that the ease of comparing, analyzing, and formulating such strategies would be enhanced if an appropriate abstract language and theory were available, we undertake here the development of such a language. Perhaps the abstraction and language will lead to needed insights into the nature of strategy of diverse types. In addition, because of its relation to this language, here we also provide an abstract treatment of theorem-proving programs as engines. This abstraction may enable researchers to analyze the differences, similarities, and sources of power among the radically diverse program designs. The idea for developing a strategy description language (SDL) usable to define search strategies for a theorem prover was born when we began to study the application of parallelism to ATP. One proposal was to run many theorem provers on the same problem but with different search strategies. Having different strategies expressed as programs would mean having, as input of each prover process, the couple < theorem, search algorithm >. The development of a language requires the definition of an abstract machine to execute its programs, requiring an interpreter for the language. Our experiences and previous work with Prolog has suggested its use for the realization of a prototype. One simple way to build an interpreter is to define a kernel module offering the basic services. This led us to the definition of a theorem-proving engine (TPE). Next, we developed a theorem prover having an SDL interpreter and a TPE as basic modules. zSDL is the name of our SDL. Generally, we conjecture that an SDL might benefit by having one (or more) of the basic attitudes and of being procedural, functional, and logical. It should also be able to focus on the operations with different granularity as well as directing the prover process, controlling details of different level of complexity. As a sample model we can think at production systems in AI, and say that an SDL could be used to describe the control side of such a system. There can be as many SDL languages as different production systems. The language we defined did not result from a deep analysis of the cited aspects; instead, it has been driven by the underlying structure of the TPE we developed, by the fact that is realized in Prolog, and by the wish to define the language on the field so that it could he run. One of the nice things about Prolog is that you can develop executable meta-languages. 635 2 A theorem proving engine A TPE is a program module devoted to maintain and operate a knowledge base (K B) of logical formulas and a set of indexes on them. 
We think of these indexes as sets of references (or ids) to the formulas. The sets are distinguished by name. Each formula is retained together with various information about it. A TPE can perform two basic activities: inference and reduction. The object of the first activity is to deduce new knowledge, gathering it by considering various subsets of the formulas in the KB. The object of the second activity is to keep the size (or the weight) of the KB as small as possible, by discarding redundant information. We require that every successful call to the inference process (IP) also calls the reduction process (RP). To better define the activities of a TPE, we focus on a possible minimal interface to such a module. We assume that the TPE finds the KB initialized with a given input set of formulas and that each operation maintains appropriately the indexes. We shall extend this interface gradually in the paper. The kernel functions of a TPE can be: set {N1 ,N2 , ... ,Nm } will refer to their resolvents (if any). Rules with single premises are called with 8uperpose(Id,Id). • (TPE.4) - delete(+Id) : It is used to delete the formula referred to by Id from the KB and from the indexes. This operation, combined with a superposition call, can be used to realize transformation processes on the KB. Consider for example the standard CNF transformation. It replaces a formula with a (satisfiability) equivalent set of clauses. We can model this by calling an inference rule with only one premise to generate the set of clauses and then delete the premise. As a matter of fact we think of this operation as reversible. See the next operation. • (TPE.5) - undelete(+Id) It is called to recover an earlier deletion of a formula. We can think of it as a special inference rule that uncovers a formula. It can be useful in adaptive searches. Suppose for example we are using a weighting strategy to discard newly generated formulas if they exceed a fixed weight. Using the delete/1 call we can simply hide the formula from the KB and the indexes and later recover it if, for example, the search ends with a consistency status. • (TPE.l) - enable(+Rule) • (TPE.2) - disable(+Rule) A TPE is thought to offer a set of inference and reduction rules, each referred with a name. An IP will then apply the set of all the active inference rules, and the RP will only use the active reduction rules. These two functions are used to control the activity sets. For simplicity we assume the calls can also accept a list of rule names. As a matter of fact the indexes on the formula KI3 have a dominant role for understanding the entire idea. In the next section we will make clearer this role. 3 Its purpose is to activate the IP. It will superpose the formula referred to by I d} on the one referred to by Id 2 using all the active inference rules. We use the concept of superposition because it implies the ordering of the arguments, which is sometimes required. In this respect the general form of a single inference (as well as reduction) rule is thought of as meaning that this rule takes as premises two formula references and produces a set of new references associated to the formulas resulting from the actual application. Consider for example the binary resolution inference rule. It takes two clauses and generates a set of resolvents. 
So, if we consider the clausal formulas referred to by Id 1 and by Id 2 , the reference zSDL: a strategy description language Indexes, as sets of references to KB's formulas, are the basic objects of the language zSDL, which uses id-sets as the basic elements to refer to nodes and to describe the visit of the search tree. The underlying idea is that an SDL requires some mechanism to represent a proof tree, for the ideal search strategy for proving a given theorem is the description of the precise steps the reasoning module must follow to reach the proof nodes in the search tree. Wi th an SDL we must be able to speak about the nodes of the tree (the formulas) and the relations between them (how to reach the parents of each node, following the ancestor relation, as well as how to reach the children of a node, following the descendants relation). Another useful property might be the ability to know the level of a tree node, in order to define a (partial) ordering between the steps made to reach the proof (a sequence of parallelizable steps). 636 From these observations we chose to use sets of nodes as the basic description objects. And zSDL turned out to be, in some sense, a sets-operations oriented language. We will refer to a generic zSDL set of references to formulas to mean either an id-set or an index. A set is referred by a (unique) name. It is something like a variable of type id-set. In zSDL we can apply to the id-sets all of the common operations and relations on sets, plus some special (procedural) ones like assignments, evaluation, etc. The following is a list of these functions, giving in addition some of the syntax of zSDL (recall that it is a Prolog byproduct). In zSDL an id-set is represented as a Prolog list. The Prolog variable names implicitly define the types of the operators in the following way: SelName: the name of the variable that refers to the set. SelExpr: an expression on sets, which can be an explicit set (list), a SetNarne or an expression built up using the defined operations. Var: a Prolog non-instantiated variable. ElemOr Var: a Prolog variable (Var) eventually instantiated (Elem). Notice that the SetExpr are evaluated. As an example, in a zSDL-Prolog session we could have: I ?- a :- [1,2,3], b .• a .- [3,4,5]. yes I ?- A B X a, b, b .• [6]. A" [1.2.3]. B .. [1.2]. X [1,2,6] in which you see how zSDL sets are permanent objects, contrary to the classical Prolog variables. This level of basic operations on (id- )sets must be enriched by statements to permit interaction with the TPE. We will show the basic calls zSDL defines to run an IP by developing the Prolog code that can realize it. We are looking for a statement responsible for executing the actual inference steps applicable on some given id-sets. Consider the zSDL syntax o (zSDL.4) - directed superposition +SetExprA ++> +SetExprB After the evaluation of the id-set expressions the general form of a call can be thought of as o (zSDL.l) - set operations: •• +5'etExpr A +SetExprB % union +5'etExprA .+ +SetExprB % weak union +5'etExpr A .* +SetExprB % intersection +5'etExpr A - +SetExprB % difference Obviously we expect this search to consider all the pairs, i.e. the TPE must be directed to try all the following superpositions: The weak union makes no checks on repetitions. , < A},B2 >, ... , < A 2 , B} >, < A 2 , B2 >, ... , o (zSD L.2) - ,'et relations : ? ElernOrVar .? +SetExpr +5'etExpr A . -< +SetExprB +5'etExpr A . < +SetExprB +5'etExprA . 
+5'etExprB II: % membership % strict containment % equality Notice that, using the Prolog negation, we also have the negations of these relations o (zSDL.3) - set procedures: +5'etN arne : +5'etExpr - Var . +SetN arne - Var " +SetExpr .. +5'etN arne II: ... , % containment % assignment % extract 1st element % evaluate % destroy the set The pop operation treats the set as a stack. This can be realized by the following straightforward Prolog code: SetExprA ++> SetExprB Ai .? SetExprA, Bj .? SetExprB, superpose(Ai,Bj), stop_search. SetExprA ++> SetExprB. The only new predicate we used is stop_search/O. In fact, one omitted item in the TPE interface we have observed is a test to control the status of the KB. Therefore, we extend the TPE interface with 637 • (TPE.6) - prooLfoundC-Int) : Used to ask the status of the KB. The number of found proof(s) is given. You can think of stop_search/O as built from a proof _found/ 1 call followed by an appropriate comparison and by any other (eventually) necessary operations. In addition to the ++>/2 operator, zSDL also defines the syntax <> It asks the TPE to release a new dynamic index that will be updated during the execution of the given TPE_Goal to hold the result of the evaluation. This result is then properly assigned to the input Given argument and finally the dynamic index is cleared. This asks for the extension of the TPE interface with the two following calls • (TPE.7) - new_dynamicjndexC-SetName) Ask the TPE to extend the sets of active indexes. SetName will be used to refer to this new dynamic id-set. The complementary call is (zSDL.5) - superposition +SetExprA <+> +SetExprB With the <+>/2 operator each couple is also reversed (except for the ones). As we commented, the general form of an inference rule in zSDL is thought to be • (TPE.8) - deLdynamicjndexC+SetName) It is used to remove the index referred by SetName from the set of the dynamic indexes known by the TPE. Id},Id 2 The actual application of such a rule is called by With the new zSDL operator we can now use the following statement to sketch the application of an inference rule: NewIds ::- [Id}] ++> [Id 2 ]. The first missing item is a way to get, in a zSDL program, the id-set of the generated formulas. With a typical Prolog attitude, we can generalize this problem. A superposition goal on id-sets is like evaluating a high-level function on a set. The relation that links the input and the output sets is different from the classical ones, for it is related to some properties of the objects in the sets and not to the sets themselves. This simply implies that the actual module responsible of the evaluation of these relations is not the classical one. And we know that that module must be the TPE. So we are looking for a syntax like <> (zSDL.6) : ?Index ::- +TPE_Goal, where aT P E_Goal can be, as an example, a superposition call. Notice that we defined the new operator: : =/2 in order to switch the evaluation to the right module. The call also suggests a possible model for the computation of the goal. In fact a goal of the TPE is generally requested to produce a new index (say a dynamic index) that is updated during the actual evaluation of the goal. Consider the following code. Given ::- TPE_Goal :new_dynamic_indexCNewSet), callCTPE_Goal), ( var(Given), Given .. NewSet Given :- NewSet ), del_dynamic_indexCNewSet). where NewIds will be instantiated to the right instance of [N}, N 2 , ••• , N m ], even possibly the empty idset. Notice that the: : =/2 operator works for each TPE goal. 
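The set operations and the ::=/2 evaluation operator described above lend themselves to a compact Prolog rendering. The sketch below is ours: named id-sets are kept as idset/2 facts, the zSDL operator syntax is replaced by plain functors (union/2, wunion/2, inter/2, diff/2), and the last clause is a cleaned-up reading of the ::=/2 wrapper printed above, assuming the (TPE.7)/(TPE.8) calls new_dynamic_index/1 and del_dynamic_index/1 and assuming the dynamic index they create can be read back through the same idset/2 store.

:- use_module(library(lists)).
:- dynamic idset/2.
:- op(700, xfx, '::=').

% evaluate a set expression to a plain list of ids
set_eval(Name, L)        :- atom(Name), !, ( idset(Name, L0) -> L = L0 ; L = [] ).
set_eval(union(A,B), L)  :- !, set_eval(A, LA), set_eval(B, LB), union(LA, LB, L).
set_eval(wunion(A,B), L) :- !, set_eval(A, LA), set_eval(B, LB), append(LA, LB, L).  % weak union: no repetition check
set_eval(inter(A,B), L)  :- !, set_eval(A, LA), set_eval(B, LB), intersection(LA, LB, L).
set_eval(diff(A,B), L)   :- !, set_eval(A, LA), set_eval(B, LB), subtract(LA, LB, L).
set_eval(L, L)           :- is_list(L).

% assignment: a named set is a permanent object, as in the session shown earlier
set_assign(Name, Expr) :-
    set_eval(Expr, L),
    retractall(idset(Name, _)),
    assertz(idset(Name, L)).

% extract the first element, treating the set as a stack
set_pop(Name, X) :-
    retract(idset(Name, [X|Rest])),
    assertz(idset(Name, Rest)).

% ::=/2: run a TPE goal, collecting its results in a fresh dynamic index
Given ::= TPE_Goal :-
    new_dynamic_index(NewSet),
    call(TPE_Goal),
    (   var(Given)
    ->  set_eval(NewSet, Given)       % return the ids to a Prolog variable
    ;   set_assign(Given, NewSet)     % or store them in a named id-set
    ),
    del_dynamic_index(NewSet).

In this reading, the permanent named sets of the earlier session would be created with calls such as set_assign(a, [1,2,3]).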
The last extension we will give before going through some examples of an application of zSDL focus on a way to have a local specification of the inference rules we wish to apply in a TPE goal. The zSDL syntax is: <> (zSDL.7) : +TPE_Goal ./ +Inferences, defines a TPE evaluation modulo a given set of inference rules. Suppose for example we wish to superpose clauses 3 and 15 only by binary resolution (binary _res). Consider the following code TPE_Goal .f Inferences Active .. enabled_inferences, disable(Active), enable(Inferences). call(TPE_Goal) • disable(Inferences), enable(Active). With this new operator we can express the preceding problem as Resolvents :: - [3] ++> [15] ./ binary _res. 638 The code we have given assumes that the enable/1 and disable/1 calls in the TPE interface maintain one set, called enabled_inferences, collecting the names of the active inference rules. So in zSDL the more general IP activation call to the TPE is NewIds ::- EXprA <+> EXprA ./ Infs. which will give in NewIds the id-set of all the formulas derivable by applying the chosen inferences to all the pairs of formulas implicitly referred to by the id-set expressions. 4 A simple zSDL program: the breadth-first strategy Time has come to give the first example of the use of zSDL to describe a classical strategy: the breadth-first search. We suppose that the TPE is already active and some input formulas are present in the KB. An index called input collects the references to those statements. In the breadth-first search the next level of the tree is filled with all the conclusions given by superposing the last level with all the existing levels. The search stops with complete search or, for example, with a proof. The zSDL program is breadth_first .levels :- input, last :- input, while( ( \+ stop_search, \+ last . = [] ), ( Next ::- last <+> levels, last :- Next, levels .- last) ). The two indexes, levels and last, refer to the entire tree and to its last level, respectively. The vhile/2 is the classical cyclic structure you found in each procedural language. Its syntax is o {zSDL.8} - while(+Condition,+Goal) After the initialization of the values to the input references, the program repeatedly fills the Next level of the search tree, superposing the last level with all the nodes. Then the Next level becomes the last and is also added to the references of the entire tree. The.- notation resemble the C language style assignments. Similarly zSDL accepts the operators +-, -a and .11:. Notice also that the instances of the Prolog variable(s) in the vhile/2 statement are released between the cycles. The preceding algorithm can be improved by thinking of the cases it generates. When we superpose the last level with the entire tree, we must note that all of the nodes in last are already in levels. Furthermore, if we apply the <+> operator to superpose an id-set on itself, we try all of the pairs twice. So, a better program is breadth_first :last :- input, others :- [], while( ( \+ stop_search, \+ last . - [] ), LL ::- last ++> last, LO ::- last <+> others, others :- last .+ others, last :- LL .+ LO ) ). In this definition the last index refers again to the last level of the tree while others refers to the rest of the tree. At each step last is superposed on itself (with the oriented operation ++» and then with the upper levels of the tree. You might also note that in this way we can substitute the use of the standard union with the weak one (append) as no repetitions are possible in the references in the indexes. 
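Since the listing above suffers from scanning artifacts, the improved breadth-first program can also be restated in plain Prolog terms. The version below is a sketch: it uses the set helpers introduced earlier (set_assign/2, set_eval/2), the while/2 construct of (zSDL.8), and two stand-in predicates, superpose_ordered/3 and superpose_all/3, for the ++> and <+> superposition statements extended with a result argument; the index input is assumed to hold the references to the input formulas.

% breadth-first search, avoiding duplicate pairs as discussed above
breadth_first :-
    set_assign(last, input),
    set_assign(others, []),
    while(( \+ stop_search, \+ set_eval(last, []) ),
          ( superpose_ordered(last, last, LL),    % last ++> last
            superpose_all(last, others, LO),      % last <+> others
            set_assign(others, wunion(last, others)),
            set_assign(last, wunion(LL, LO))
          )).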
In addition to the while statement, zSDL defines some other basic control structure: o {zSDL.9} - foreach(+Generator, +Goal) : Goal is executed for all the solutions of the given Generator. o {zSDL.I0} - repeat (+Goal, +Condition) Goal is executed at least once and re-executed while Condition fails. o (zSDL.l1) - iF( +Condition, +Goal) Goal is executed only if the Condition holds. It always succeeds. This list is given only for completeness: the reader might note that zSD L programs are basically extended Prolog programs and that all the structures definable on the underlying Prolog machine can be used by zSDL programs. However, we think that one real important aspect of the < TPE,zSDL > Prolog-based architecture comes from its direct executabilty on a Prolog machine. The global proving system loses the property to be batch or interactive: a proof search is directed by the execution of goals, and the granularity of these steps can vary from the single superposition to the entire search. 639 5 More complex applications The availability of a language like zSDL adds to the ease of implementing and experimenting with new ideas, for example, non-standard search strategies. To illustrate the value of using of zSDL, and to introduce some additional features of this language, we now focus on three somewhat complex programs. The first defines an adaptive, weighting-based, search strategy. The second introduces some atypical deletion strategy into the search. The last one shows how to define a strategy (oriented) tailored to a given inference rule. 5.1 A weight-based adaptive strategy By weighting (w) strategies we refer to those algorithms structured to consider the length, or weight, of the formulas. Examples of w-functions are: the number of symbols in a formula, the number of (positive, negative, total) literals in a clause, as well as linear functions built on these or other values. The general behavior of a wstrategy is to filter the retention in the KB of a newly generated formula, according to the given w-function. Formulas that are too heavy are discarded. The underlying intuitive idea is that if a proof can be obtained without the use of heavy formulas, then such formulas can be discarded. We shall not consider the well-known subproblems that the subsumption operation can lead to, which vary with the w-function adopted. Instead, we consider one of the practical difficulties in the application of these strategies, namely, choosing the appropriate threshold (upper bound on weights) to use for deciding which formulas to discard. The solution we propose follows this simple idea: the threshold can be increased, when the search stops generating formulas, and set to the lightest weight template in the set of the w-deleted formulas. In this sense the search is adaptive: it adapts to the performance of the program. Let us first show the mechanisms provided by the TPE to support w-strategies. Each formula is stored with a weight template. An internal function, namely, weight (+Formula ,-W _Template), is used by the TPE to calculate it. Such a template consists of a 4-integers tuple (N -P-T-S) that counts Negative.Literals, Positive.Literals, Total.Literals and Symbols, where the first three values are "0" if the formula is not a clause. The TPE offers some calls in order to define weighting-based strategies: • (TPE.9) - max_weights(?W _Template) The call can be used both to access the current reference w-template (if W_Template is an uninstantiated variable at the call) or to set a new value for it. 
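One possible Prolog rendering of the control constructs (zSDL.8)-(zSDL.11) listed earlier in this section is sketched below; the copy_term/2 calls mirror the remark that bindings made inside a while/2 body are released between cycles. This is our own rendering, not the zEN2 source.

while(Cond, Goal) :-
    copy_term(Cond-Goal, C-G),
    (   call(C)
    ->  call(G),
        while(Cond, Goal)
    ;   true
    ).

% (zSDL.9) execute Goal for every solution of Generator
foreach(Generator, Goal) :-
    forall(Generator, Goal).

% (zSDL.10) execute Goal at least once, re-executing while Condition fails
repeat(Goal, Cond) :-
    copy_term(Goal-Cond, G-C),
    call(G),
    (   call(C) -> true ; repeat(Goal, Cond) ).

% (zSDL.11) execute Goal only if Condition holds; always succeed
if(Cond, Goal) :-
    (   call(Cond) -> ( call(Goal) -> true ; true ) ; true ).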
The new given W_Template will be used by the w-filter operation to decide which new formulas to accept or discard. All the values for the new formulas must be less or equal to the threshold ones fixed by the given W_Template. The value of a variable will be considered greater than each integer. • (TPE.I0j - lormula_weight(+Id, -W..Template) : Accesses the given formula(s) to get their weight tern plate( s). The basic behavior of the strategy we are going to write is straightforward. At each time we choose the lightest not yet used formula in the KB to be superposed with all the already used ones. Than we move the given formula to the set of the used ones (say "done") while the new generated formulas are added to the first set (say "to_do"). We can express this with the following zSDL program: to_do :- [], done .• [], Input input, add_ordered (Input ,to_do) , while( ( \+ stop_search, \+ to_do .- [] ), Lightest .- to_do, add_ordered([Lightest] ,done), New ::- [Lightest] <+> done, add_ordered (New ,to_do) ) ). As one sees, we solved the problem of getting the lightest formula in a set by extracting the first element from an ordered set. The expected side effect of an add_ordered(Set,SetName) call is to build an ordered union of Set and SetName (into SetName) according to the weight of the corresponding formulas. We can obtain this with: add_ordered([] ,_SetName). add_ordered(Set,SetName) :Xet .. SetName, XX gets a list of Count-Id pairs get_counts(Set,SetCs), get_counts(Xet,XetCs), append(SetCs,XetCs,YetCs), XX sorts by counts keysort(YetCs,ZetCs), XX removes the counts pop_counts(ZetCs,Zet), Set Name :- Zet . where the get_counts/2 call accesses the weightstemplate of the formulas to get the symbol counts (obviously, one can choose different approaches). To extend our strategy to be self-adaptive we have to solve certain problems: 640 * * how to get information on the deleted formulas; how to choose some initial value for the reference w-template. The first problem rests entirely on the TPE behavior, as the "over-weight" deletions are embedded into its operations. Our system maintains a set of structures, indexed by weights-template, to have the references to the deleted formulas. The call • (TPE.ll) - queue(wdel(?W _Template), ?Queue) Queue holds the ids of all the deleted formulas shar- ing the same it W _Template. We first give the extended program that realizes the self-adaptive search, and then we discuss its main steps. self_adaptive :input_weighting, to_do .• [], done :- [], Input .. input, add_ordered(Input,to_do) , while( ( \+ stop_search, ( \+ to_do .= [] q_exists(wdel(_» », once ( to_do .= [], lightest_deleted(Count) , closest_wtemplate(Count,NewWT), max_weights (NewWT), deleted :- [], add_deleted(Count,deleted), Unhide .. deleted, Restored ::= undelete (Unhide), add_ordered(Restored,to_do) Lightest .- to_do, add_ordered([Lightest] ,done), New ::= [Lightest] <+> done, add_ordered(New,to_do) ) ). The first difference concerns the while condition: it now considers the possible presence of formulas deleted by weight, so the search is complete only if no deleted formulas remain. The lightest_deleted/l call accesses the deletion queue, searching for the lightest-weight formula. Its definition can be: lightest_deleted(Count) setof( SymCount, queue(wdel(N-P-L-SymCount),Q), Deleted ), sort (Deleted, [Countl_Others]). The closest_wtemplate/2 call is responsible for deciding the value for the new reference weights-template, or, in other words, for the "size of the adaptation-step". 
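Returning for a moment to the add_ordered/2 code above: the two helpers it relies on, get_counts/2 and pop_counts/2, are not shown. Under the assumption that the (TPE.10) call formula_weight/2 yields an N-P-T-S template for each id and that the ordering key is the symbol count S, they could read as follows (a sketch; other keys are possible, as the text notes).

% build Key-Id pairs suitable for keysort/2, keyed by symbol count
get_counts([], []).
get_counts([Id|Ids], [S-Id|Pairs]) :-
    formula_weight(Id, _N-_P-_T-S),
    get_counts(Ids, Pairs).

% strip the keys again after sorting
pop_counts([], []).
pop_counts([_Key-Id|Pairs], [Id|Ids]) :-
    pop_counts(Pairs, Ids).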
The following definition builds the new template in order to accept all the formulas with the given deleted smallest symbol count. closest_wtemplate(Count,Template) .setof( N-P-L-Count, queue(wdel(N-P-L-Count),Q), Deleted ), max_weights (CurrentWT), max4([CurrentWTIDeleted] ,Template) . where the call to max4/2 builds the Template given by the maximal values for each count. The add_deleted/2 call is conceptually similar to the add_ordered/2 call, but works with the deletion queue. Its definition can be: add_deleted(Count,SetName) ( queue(wdel(N-P-L-Count),Queue), SetName += Queue, q_del(wdel(N-P-L-Count», fail true ). It collects into SetName all the references to the deleted formulas with the given symbol Count and deletes the corresponding queue (q_del/l). So, in the while loop of our program, the to_do idset is extended either by newly inferred formulas or by reactivating the lightest deleted ones (if any). A last point addresses the choice of the initial values for the reference w-template. A strategy that has given us interesting results fixes the values by looking at the counts of the input formulas and choosing the lowest values among them. Its definition is: input_weighting :Input .. input, formula_weight(Input,WTs), max4(WTs,Template) , max_weights(Template). 5.2 A pruning strategy This second example of the applications of the zSDL language is given to show how it can be used to define some self-analytical activity for the proving process. In other words we can use it to reason about the current state of the search during the execution. A well-known problem each ATP program must face is the possible explosion of the search space, which can occur for various reasons. Here we do not study this topic, nor do we suggest that our program has a deep impact on the solution of the general problem. Our goal is only to show how an SDL language can be useful in different research areas of ATP. 641 We observe that our pruning strategy is based on the addresses of the undeterminism in the order of application of the inference steps. On the other hand, the use of reduction rules comes from the wish to have a KB capture the same logical consequences with a smaller possible representation "size". Consider now a generic search process and suppose a reduction step occurs. With "reduction" we will refer to the results of an operation able to change the structure of a formula, maintaining its logical value. Generally speaking a reduction step will reformulate a formula by "reducing" its complexity and/or size. This transformation will in general involve other formulas used as a base for the logical reformulation. As an example, consider the following steps on two generic clauses [1] -,A I B, [2] -,A I -,B I C binary resolution [3] -,A I C subsumption delete 2 We can view this step as the application of a reduction rule that uses [1] to transform [2] into [3]. We note that the satisfiability of the overall KB is preserved, i.e. the operation maintains the logical truth of the set of formulas. Suppose next that such a reduction has occurred during a search, say a formula F has been reduced to F'. There now exists a potential set of formulas whose generation depends on the order in which the search process has been executed: this set consists of all of the descendants of F that have not contributed to the generation of F', or, more precisely, the set by_inference{ descendants{F)) - ancestors{F'). 
(We note that we must leave all the descendants of F given by reduction as those are formulas originally not in the set generated by F). Pruning this set (if not empty) could perhaps make the proof longer, as the proof could be reachable rapidly by using one of the formulas we deleted, but it will not preclude the possibility of finding the proof if there is one. The effectiveness of this pruning strategy depends mainly on the effectiveness and the applicability of reduction steps in a proof, and so it relies directly on the structure of the search space (given by the formulas asserting the theorem). Let us now see how we can implement this operation by using zSDL and the mechanisms of the TPE. First of all we formalize the calls the TPE defines (and zSDL inherits) to access various relations on the content of the KB. We already announced some of them in section 2. • (TPE.12) - parentsC+ld,-Parents) • (TPE.13) - ancestorsC+ld,-Ancestors) • (TPE.14) - childrenC+ld,-Children) • (TPE.15) - descendantsC+ld, -Descendants) Being I d the reference to a formula, these calls will respectively return the id-set of its parents, ancestors, children, and descendants, with respect to the current KB. We note that the given id-set may contain references to currently inactive formulas (deleted for some reason). All these relations will consider both inference as well as reduction steps. • (TPE.16) - by..reductionC+ldSet,-ByRed) : Given an IdSet this call selects which referred formulas have been produced by application of a reduction rule, building the id-set ByRed with their ids. • (TPE.17) - replaceC?Newld, ?Id) The call succeeds if New I d refers to a formula that replaces an old one (referred by I d) following a reduction step. Otherwise the call fails. The proposed pruning strategy acts like a filter on the result of a superposition call: at each step it checks if the new formulas are given by reduction, in which case it tries to apply the deletion. So, we are going to extend the superposition control level of zSDL with a meta-call realizing the pruning. pruning_deriveCSetA,Mode,SetB) XA .? SetA, XB .? SetB, once C Given::- deriveC[XA] ,Mode,[XB]), by_reductionCGiven,ByRed), foreach( Nld .? ByRed, C replace(Nld,Id), ancestors(Nld,NldAnc), descendants(Id,IdDes), by_reduction(IdDes, IdDesByRed), IdDesBylnf .. IdDes .- IdDesByRed, DelSet .. IdDesBylnf .- NldAnc, delete(DelSet) ) ) ), stop_search. pruning_derive(SetA,Mode,SetB). derive(SetA,«+»,SetB) derive(SetA,(++»,SetB) SetA <+> SetB. SetA ++> SetB. The schema is quite simple. Each by-reduction child (Nld) of a superposition call is related to the formula it replaces (Id). Then the set of the by-inference descendant of Id is reduced by the set of the Nld ancestors. Notice how the byjnference(descendant(F)) set is evaluable a.'> desceTldants(F) - by_reductions( descendants{F )). 642 5.3 A hyperresolution-oriented search strategy Our last example uses zSDL to define a strategy specifically oriented to work with a given inference rule, namely, the inference rule hyperresolution. The efficiency of an ATP system comes from the efficiency of all of the different components of the program, from the basic unification and match algorithms to the KB management, and so on. With some "tough" inference rule, it also heavily relies on the ability of the search strategy to control its application ensuring a complete search without repeating steps. Hyperresolution is one such inference rule. . 
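As a small footnote to the pruning schema of the previous subsection: the set by_inference(descendants(F)), evaluated there as descendants(F) - by_reduction(descendants(F)), can be packaged as a helper built only on the (TPE.15) and (TPE.16) calls. The name and packaging are ours; subtract/3 is the usual list difference.

:- use_module(library(lists)).

by_inference_descendants(Id, ByInf) :-
    descendants(Id, Des),
    by_reduction(Des, ByRed),
    subtract(Des, ByRed, ByInf).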
Hyperresolution considers a basic clause (called nucleus) that has one or more negative literals. An inference step occurs when a set of positive unit clauses (called satellites) is found that simultaneously unify with all of the negative literals of the nucleus. It is simple to see how hyperresolution will not generate new nuclei (for the rule cannot produce a clause containing negative literals) while it can generate new satellites. So, the set of potential satellites change dynamically during the search, and a good strategy must ensure a complete covering partition (with multiple occurrences) of this set without repeating trials. We first explain how we implemented the hyperresolution inference rule in our system (we call it hy_p). As usual the. rule has two arguments: the first must be a satellite and the second a nucleus. If a unification is found between the given satellite and one of the negative literals in the nucleus, then the set of the current active satellites is partitioned and superposed on the remaining negative literals. This behavior suggests the development of a search strategy driven by the generation of new satellites. In fact, we can visit the search space by levels, generate all the possible hyperresolvents, choose from them the new satellites, and use those to drive the search in the next level. As those satellites are new, the partitions we will try are new too, and no repetition in the trials occur. The basic shape of the strategy can be: hyper_strategy :Input .. input, get_satellites(Input,Sats), get_nuclei(Input,Nucs), last_sats := Sats, nucs :- Nucs, while( ( \+ stop_search, \+ last_sats .K [ ] ) , ( New ::- last_sats ++> nucs ./ hy_p, get_satellites(New,NewSats), last_sats := NewSats ) ). The get_satellites/2 and get..lluclei/2 calls are used to choose from an id-set the subset of formula- references corresponding, respectively, to valid satellites and nuclei. Notice how these calls can be defined by using the formula_weight/2 call and testing the negative and positive literal counts accordingly. As a matter of fact, the algorithm we have given follows closely the general schema of a breadth-first search. So, it can be simply extended to consider the application of more inference rules, intermixing the searches with the control, the enable/i, and the disable/i operations permitted. 6 Conclusions This work introduces the concepts of Theorem Proving Engine and Strategy Description Language as architectural and applicative tools in the design and use of an automated theorem-proving program. The definitions we give reflect a running Prolog system, named zEN2, and" because of this fact, they inherit a Prolog style structure. Particular emphasis is given to the use of an SDL as a research tool as well as a way to reinterpret the use of a theorem prover as a batch or as an interactive program. In fact, the availability of an interpreter for such a language offers the possibility of having a system able to cover both of these usages, giving to the user some way of choosing the granularity of the steps the prover must take. Three examples are given to show the possible application of an SOL. Their purpose is to show its usefulness for expressing and testing new ideas. Some interesting capabilities of zSDL are applied to highlight how it allows the treatment of self-analysis on the state of the search space. Examples of these are the definition of the self-adaptive search and the pruning strategy. 
Acknowledgments

The author is very grateful to Larry Wos, Bill McCune and Gianni Degli Antoni for their comments. This work was partially supported by the CEE ESPRIT2 KWICK Project and partially by a grant of the Italian Research Council. Most of the work was done while the author was visiting the Mathematics and Computer Science Division of the Argonne National Laboratory.

A New Algorithm for Subsumption Test

Byeong Man Kim*, Sang Ho Lee**, Seung Ryoul Maeng*, and Jung Wan Cho*

* Department of Computer Science & Center for Artificial Intelligence Research, Korea Advanced Institute of Science and Technology, Dae-Jeon, Korea
** Database Section, Electronics and Telecommunications Research Institute, Dae-Jeon, Korea

Abstract

To reduce the number of generated clauses in resolution-based deduction systems, subsumption has been around for quite a long time in the automated reasoning community. It is well known that the use of subsumption sharply improves the effectiveness of theorem proving. However, subsumption tests can be very expensive because they must be applied repeatedly and are relatively slow. There have been several research efforts to overcome the expensiveness of subsumption. One of them is the s-link test based on the connection graph procedure. In the s-link test, it is essential to find a set of pairwise strongly compatible matching substitutions between literals in two clauses. This paper presents an improved algorithm for the s-link test based on a new object, called the strongly compatible list. By using strongly compatible lists and appropriate bit operations on them, the proposed algorithm reduces the possible combinations of matching substitutions between literals as well as improving the pairwise strongly compatible test itself. Two other subsumption algorithms and our algorithm are analyzed in terms of the estimated maximal number of string comparisons. Our analysis shows that the worst-case time complexity of our algorithm is much lower than that of the other algorithms.

1 Introduction

Logical reasoning (or theorem proving) is the key to solving many puzzles, to solving problems in mathematics, to designing electronic circuits, to verifying programs, and to answering queries in deduction systems. Logical reasoning is a process of drawing conclusions that follow logically from the supplied facts. Since first-order predicate logic is generally sufficient for logical reasoning and offers the advantage of being partially decidable, it is widely used in automated reasoning. There have been a number of approaches to showing that a formula is a logical consequence of a set of formulas. Notable among them is Robinson's resolution principle [Robinson 1965], which is very powerful and uses only one inference rule. Many refinements of the resolution principle based on graphs have been proposed to increase efficiency [Kowalski 1975, Sickel 1976, Andrew 1981, Bibel 1981, Kowalski 1979]. One of them is Kowalski's connection graph proof procedure [Kowalski 1975, Kowalski 1979], which has some distinct advantages over previous approaches based upon resolution.

1.
Once an initial connection graph is constructed all information is present as to which literals are potentially resolvable so that no further search for unifiable complementary literals is needed. 2. Application of a deletion operation can result in further deletion operations, thus potentially leading. to a snowball effect which reduces the graph rapIdly. The probability of this effect rises with the number of deletion rules available. 3. The presence of the complete search space during connection graph proof procedure suggests the opportunity to use parallel evaluation strategies [Loganantharaj 1986,Loganantharaj 1987,Juang 1988] to improve the efficiency. Various deletion strategies [Munch 1988 Gottlob and Leitsch 1985,Chang and Lee 1973] are suggested to re~uce the number of cla?ses generated in theorem provmg (automated reasomng). A very powerful deletion rule in resolution-based deduction systems is the subsumption [Eisinger 1981, Wos 1986]. The subsumption is used not only to discard a newly deduced clause when a copy already has been retained, but also to discard other types of unneeded information. The use of subsumption sharply improves the effectiveness of theorem proving, as illustrated by the benchmark problem, Sam's Lemma [Wos 1986]. However, the use of subsumption can be quite expensive because it must be repeated very often and is relatively slow [Wos 1986]. There have been two approaches for overcoming the expensiveness of subsumpt~on. One is t.o.reduce the number of necessary subsumptIon tests [ElSlnger 1981], and the other is to improve the subsumption test itself [Gottlob and Leitsch 1985 Stillman 1973]. Eisinger [Eisinger 1981] proposes the s~ link test which is based on the principal ideas of the connection graph proof procedure. His method provides an efficient preselection which singles out clauses D that do not possess the appropriate links to the clause C. Having preselect~d t~e candida~es, we need to compose matchmg substItutIOns from lIterals in clause C to literals in clause D to find a matcher () from C to D. In some cases many compositions are possible and hence the search for () becomes quite expensive. Socher [Socher 1988] improves the search procedure by imposing restrictions on the possible matching substitutions. In this paper we propose an improved s-link test with a new object, called strongly compatible list. By use of the strongly compatible lists and appropriate bit operations on them, the proposed algorithm reduces the 644 possible combinations of matching substitutions between literals as well as improves the pairwise strongly compatible test itself. Two subsumption algorithms (Eisinger, Socher) and our algorithm are analyzed in terms of the estimated maximal number of string comparisons. Our analysis shows that the worst-case time complexity of our algorithm is much lower than the other algorithms. In the next chapter, preliminary definitions and the s-link test are presented. A new subsumption algorithm based on strongly compatible lists and its related works and analysis are given in Chapter 3 and Chapter 4, respectively. In Chapter 5, our works are summarized. 2 Preliminaries We assume that the readers are familiar with materials in [Chang and Lee 1973]. A variable starts with an upper case letter and a constant starts with a lower case letter. Definition 2.1 A substitution variables to terms. (7 IS a mapping from We represent a substitution (7 with Si(7 = ti for each i (1 :::; i :::; n) by the set of pairs {td SI,' .. 
,tn/ sn}, and represent the composition of substitution of (7 and T by 0' - T. For convenience, we denote (71 •••• - (7 n by -£=1 (7i. Definition 2.2 Two substitutions compatible, if 0' • T = T • (7. (7 and T are strongly Definition 2.3 Substitutions (7I,' •. ,(7n are pairwisestTongly compatible, if any two substitutions (7i,O'j E {0'1' ... ,(7n} are strongly compatible. Definition 2.4 A matching substitution from a term (or a literal) s to a term (or a literal, respectively) t is a substitution Il such that SIl = t. Definition 2.5 uni( C, Ii, D) is a set of all matching substitutions mapping a literal Ii in clause C onto some literal in clause D. en For example, given C = {p(X, Y), q(Y, and D = P(a,b),p(b,a),q(a,c we have uni(C,p(X,Y),D) = {a/X,b/Y},{b/X,a/Y}} and uni(C, q(Y,c), D) = {a/Y}}. l n, Definition 2.6 If there is a T with {} = (7 _ T for any other unifier {} for sand t, (7 is a most general unifier (mgu) for sand t. To reduce the search space in theorem proving, redundant clauses must be removed. The redundant clause means a clause whose removal does not affect the unsatisfiability. The redundant clause includes a tautology or a subsumed clause. The subsumption can be defined in two ways. Definition 2.7 A clause C l subsumes another clause C2 if C l logically implies C2 . Definition 2.8 A clause C 1 {}-subsumes another clause IGII :::; IG2 1 and there is a substitution {} such that G1 {} ~ C 2 • G2 if It has been shown [Gottlob and Leitsch 1985,Loveland 1978] that these two definitions are not equivalent. If we use the first definition, then most of the resolution-based proof procedures are not complete because a clause always subsume its factors. In this paper we are concerned only with the {}-subsumption. In order to perform a subsumption test on given two clauses, we must find a matcher 0 such that CO ~ D. It is well known that finding such {} is NPcomplete [Gottlob and Leitsch 1985] and the search for {} may become expensive. There have been some efforts to reduce the cost of finding a matcher {} [Gottlob and Leitsch 1985,Socher 1988,Chang and Lee 1973,Eisinger 19S1,Stillman 1973]. One of them is the s-link test based on the connection graph procedure. The subsumption test based on the s-link is provided by the following theorem [Eisinger 1981]: Theorem 2.1 Let C = {/I, ... , In} and D be clauses. Then C {}-subsumes D if and only if ICI S IDI and there is an n-tuple ((7I,"" O'n) E x£=luni(C, Ii, D) such that all (7i (1 :::; i :::; n) are pairwise strongly compatible. Example 2.1 (of Theorem 2.1 [Socher 1988]) Given a set {C, D l , D2 , D3} of clauses with C = {p(X, Y), q(Y, en, Dl = {p(a, c), r(b, en, en D2 = {p(U, V), q(V, Wn and D3 = {p(a, b), p(b,a), q(a, one want to find out, which clauses are subsumed by C. Dl can be excluded because the literal q(Y, c) in C is not unifiable with any literal in DI, that is, there is no s-link from q(Y, c) to a literal in D l . D2 cannot be a candidate because uni( C, q(Y, c), D2) = D. For D3 we obtain the two pairs ((7I, T) and ((72, T), where (71 = {a/X, b/Y}, (72 = {b/X, a/Y} and T = {a/Y}. From these two pairs only ((72, T) is strongly compatible and thus C subsumes D 3. 0 As shown in Example 2.1, in order to find clauses that are subsumed by a clause C = {II, ... , 1m}, first we have to preselect clauses that are connected to every literals in C by s-links of a connection graph. If D is such clause then each literal in C is unifiable with some literals in D. 
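As a concrete counterpart to Definition 2.8 and Theorem 2.1, the following is a small, deliberately naive Prolog sketch of the theta-subsumption test (our own illustration, not the algorithm of this paper): it backtracks over all ways of matching the literals of C onto literals of D, which is exactly the combinatorial search that the s-link test and the strongly compatible lists introduced below are designed to tame. Clauses are represented as lists of literals.

:- use_module(library(lists)).

% naive theta-subsumption test of Definition 2.8
theta_subsumes(C, D) :-
    length(C, LC),
    length(D, LD),
    LC =< LD,
    \+ \+ ( copy_term(C, C1),      % keep the caller's variables unbound
            match_all(C1, D) ).

match_all([], _D).
match_all([L|Ls], D) :-
    member(M, D),
    subsumes_term(L, M),           % one-way matching: M stays as it is
    L = M,                         % commit the bindings for later literals
    match_all(Ls, D).

% Example 2.1: theta_subsumes([p(X,Y), q(Y,c)], [p(a,b), p(b,a), q(a,c)])
% succeeds, while the same query against [p(a,c), r(b,c)] fails.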
For such candidate D, we need to perform a pairwise strongly compatible test on all elements of X~l uni( C, 3 ii, D). A New Subsumption Algorithm Based on Strongly Compatible Lists The s-link test [Eisinger 1981] for long clauses with more than one matching substitution for each literal may require an expensive search of all elements of the Cartesian product. We define the strongly compatible list of matching substitutions in order to improve the s-link test. With the strongly compatible lists, we can single out useless matching substitutions and improve the pairwise strongly compatible test itself. The following three bit operations are used in this paper. bitwise disjunction of WI and W2 bitwise conjunction of WI and W2 bitwise complementation 645 where WI by Wi is a bit sequence. For convenience, we denote Similarly, we denote WI *... *'Wn +... +'W n by +i=l 'Wi. *i=l'Wi. To test whether the given two matching substitutions are strongly compatible, we need the following definition. Definition 3.1 Let {VI,'" ,vn } be an ordered set of variables in clause C, and a matching substitution 0" between literals in clauses C and D be {tIi Sl,' .. ,tmj Sm}. 8(0") is an n-length list such that the ith element is tj if Vi = Sj, , a), 8(U4) = (, ). We can calculate the followmg strongly compatible lists of matching substitutions: f3IUIj = PIla) * P2(.unctured-tube. flat-tyre ~ leaky-valve. } where the predicates broken-spokes, puuctured-tube and leaky-valve are the abducibles. Given a query Q = +wobbly-wheel, abductive reasoning allows to infer the asswuptions: 51 = { pWlctured-tube }, 52 = { leaky-valve}, and 53 = { broken-spokes} . ·supported by the Belgian "Diensten voor Programmatie van Wetenschapsbeleid", under the contract RFO-AI-03 t supported by the Belgian National Fund for Scientific Research These sets of assumptions are abductive solutions to the given query +-Q in the sense that for each 5i, we have that P U Si 1= Q. Kowalski points out that we can equally well obtain these solutions by deduction, if we first transform the abductive program P U {Q} into a new logic theory T. The transformation consists of taking the only-if part of every defuution of a non-abducible predicate hI the Clark-completion of P and by adding the negation of Q. In the example, we obtain the (non-Horn) theory T: T = { wobbly-wheel -+ flat-tyre, broken-spokes. flat-tyre -+ punctured-tube, leaky-valve. wobbly-wheel +} Minimal models for this new theory Tare: Ml = { wobbly-wheel, flat-tyre, puucturedtube }, M2 = { wobbly-wheel, flat-tyre, leaky-valve }, and M3 = { wobbly-wheel, broken-spokes }. Restricthlg these models to the atoms of the abducible predicates only, we precisely obtain the three abductive solutions 51, 52 and 53 of the original problem. The above observation points to an hlteresting issue; namely the possibility of linking these dual declarative semantics by completely equivalent dual procedures. Figure 1 shows this duality between an 5LD+ A~duction tree (see [Cox and Pietrzykowski, 1986]) and the exectution tree of Satcluno, a theorem prover based on model generation ([Manthey and Bry, 1987]). f-wobbly-wheel ~roken-SPOkes f-flat-tyre !;leaky-valve f-Qunctured-tube SO=0 ~ SI =so u (wobbly-wheel} Abrok""."""'" s2=slu(flat-tyre} J ~eaky-valve} S3=S2u (punctured-tube } . 
Figure 1: Procedural Duality of Abduction and Satchmo 651 Although this example illustrates the potential of using deduction or more precisely, model generation, as a formalisation of abductive reasoning, an obvious restriction of the example is that it is only propositional. Would this approach also hold for the general case of definite abductive programs? An example of a non-propositional program and its only-if part is given in figure 2. Abd = {q!2}; P = { p(a,b)~ p(a,X) ~ q(X,V). } only-ifCP) =FEQ U {p(y,Z) ~ (y=a&Z=b), (3V: Y=a & q(Z,V»} Q=~p(x.X). notCQ)= 3 X: p(X,X). details of the computation, figure 5 presents the computation tree. transitivity ~a=b Figure 2: A predicate example The theory only-if(P) consists not only of the only-if part of the definitions of the predicates but comprises also the axioms of Free Equality (FEQ), also known as Clark Equality ([Clark, 1978]). The abductive solutions and models of only-if(P) are displayed in figure 3. M= {p(a,a), ~} M= {p(a,a), ~} M= {p(a,a),~} Figure 3: Abductive solutions and models The duals of the abductive solutions are again identical to models of only-if(P). This example suggests that at least the duality on the level of declarative semantics is maintained. However, on the level of procedural semantics, some difficulties arise. The SLD+Abduction derivation tree is given in figure 4. . / ' \ e = {X/a} fails substitution p(a,sk ) 1 p(sk ,a) 1 ~ ~21 success \ f- Globally, the structure of the SLD+abduction tree of figure 4 can still be seen in the Satchmo-1-tree. Striking is the duality of variables in the abductive derivation and skolem constants in the model generation. However, one difference is that the Satchmo-1 tree comprises many additional inference steps due to the application of the axioms of FEQI. In the abductive derivation these additional steps correspond to the unification operation (e.g. on both left-most branches, the failure of the unification of {X=a, X=b} corresponds to the derivation of the inconsistency of the facts {" kl =a, "k 1 = b } ). Another difference is that the generated model {p(a, a), q(a, "k1 ), p("k 1 , a), p(a, "k 1 ), p( $kl' "k 1 ), q($kl! $kl ), $kl =a, a="k 1 , a=a, "k 1 =$k 1 , "k 1 ="kl p(X,X) unifiiation {sk 1=a. q(sk l'sk2 )} reflexivity symmetry Figure 5: Execution tree of Satchmo-1 1l ={q(a,a)}, e ={X/a} 1l = {q(a,b)}, e = {X/a} 1l = {q(a,sk)}, e = {X/a} f- failure St v q(a, V) Figure 4: Abductive derivation tree After skolemisation of the residue +-q( a, V), we obtain the third abductive solution. With respect to the model generation, the theory only-if(P) is not clausal, however the extension of Satchmo, Satchmo-1 ([Bry, 1990]), can deal with such formulas directly (without normalisation to clausal form). Without dealing with the technical } is much larger than the model which is dual to the abuctive solution. Satchmo_1 generates besides the atoms of this model also all logical implications of FEQ, comprising all substitutions of a by Ski' It is clear that in general this will lead to an exponential explosion. However, observe that we obtain the desired model by contracting $k1 and a in the generated model. Therefore, extending Satchmo-1 with methods for dynamic contraction of equal elements would solve the efficiency problem and would restore the duality on the level of declarative semantics. Contraction of a model is done by taking one unique witness out of every equivalence class of equal terms and Ihnproper use or Satchmo_l: Equality in head or rule. 
652 replacing all terms in the facts of the model by their witnesses. Techniques from Term Rewriting can be used to implement this. The procedural solution is to consider the set of inferred equality facts as a Term Rewriting System (TRS), to transform the set to an equivalent complete TRS, and to normalise all facts in the model using this complete TRS, and this after each forward derivation step in Satchmo_1. This procedure may seem alien to Logic programming, but the contrary is true. As a mather of fact, the proposed procedure appears to be exactly the dual of techniques used in SLD+Abduction: • the completion procedure corresponds dually to unification. The dual of the mgu (by replacing variables by skolem constants) is the completion of the set of equality atoms. related work. Due to space restrictions, all proofs are omitted. We refer to [Denecker and De Schreye, 1991] for the explicit proofs. 2 Extended programs. In this section we int.roduce the formalism for which the model generation will be designed. This formalism should at least contain any theory that can be obtained as the only-if part of the definition in the Clarkcompletion of definite logic programs. The extended clause formalism introduced below, generalises both this kind of formulas and the clausal form. Definition 2.1 Let L be a first order language. A 11. extended clause or rule is a closed formula of the type: V(Gl, ... ,G k -+ El, ... ,EI ) where Ei has the general form: • the normalisation corresponds dually to applying the mgu. Therefore, incorporating these techniques in Satchmo.l would also restore the duality on the level of procedural semantics. The research reported in this paper started as a mathematical exercise in duality. However, there are clearly spinoffs. One application is the extension of Satchmo.l with efficient treatment of equality. We propose a framework for model generation under an arbitrary equality theory and we formally proof the duality of SLD+abduction in the instance of the framework, obtained by taking FEQ as the equality theory. Also for abduction there are spinoffs. An illustration of this is found in the context of planning as abduction in the event calculus. The event calculus contains a clause, saying that a property holds at a certain moment if there is an earlier event which initiates this property, and the property is not terminated (clipped) in between: holds.at(P, T)~happens(E), iniiiates(E, P), E < T, -,clipped(E, P, T). A planner uses this clause to introduce new events which initialise some desired property. Technically this is done by first skolemising and then abducing the happens goal. However, skolemisation requires explicit treatment of the equality predicate as an abducible satisfying FEQ ([Eshghi, 1988]). The techniques proposed in this paper allow efficient treatment of the abduced equality atoms, and provide a declarative semantics for it. The paper is structured as follows. III section: 2, we present the class of theories for which the model generation is designed. Section 3 recalls basic concepts of Term Rewriting. In section 4, the framework for model generation is presented and inlportant semantic results are formulated. In section 5, the duality with abductive reasoning is formalised. Section 6 discusses future and' such that all Gi are atoms based on L, all Fi are equality atoms based on L 11.011.- Definition 2.2 An extended program is a set of extended clauses. 
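The contraction step discussed at the beginning of this section, normalising every fact of the generated model with a complete, solved-form set of ground equations, can be pictured in a few lines of Prolog. The representation below (rules as Left-Right pairs, the predicates rewrite/3 and contract/3) is ours and only illustrates the idea on the sk1 = a situation of the introductory example.

:- use_module(library(lists)).
:- use_module(library(apply)).

% rewrite a ground term with solved-form rules: each left-hand side is a
% skolem constant that does not occur in any right-hand side
rewrite(Rules, T0, T) :-
    (   member(L-R, Rules), T0 == L
    ->  T = R
    ;   T0 =.. [F|Args0],
        maplist(rewrite(Rules), Args0, Args),
        T =.. [F|Args]
    ).

% contract a set of ground facts: normalise each fact and drop duplicates
contract(Rules, Facts0, Facts) :-
    maplist(rewrite(Rules), Facts0, Facts1),
    sort(Facts1, Facts).

% ?- contract([sk1-a], [p(a,sk1), p(sk1,a), p(a,a), q(a,sk1)], M).
% M = [p(a,a), q(a,a)].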
Interestingly, the extended clause formalism can be proven to provide the full expressivity of first order logic. Any first order logic theory can be translated to a logically equivalent extended program, in the sense that they share exactly the same models. (Recall that the equivalence between a theory and its clausal form is much weaker: the theory is consistent iff its clausal form is consistent. ) In the sequel, the theory of general equality (resp. the theory of Free Equality), for a first order language L will be denoted EQ(L) (resp. FEQ{L)). A theory T, based on L, is called a theory with equality if it comprises EQ( L). A theory T, based on L is called an equality theory if it is a theory with equality in which" =" is the only predicate symbol in all formulas except for the substitution axioms of EQ(L). 3 Concepts of Term Rewriting. The techniques we intend to develop for dealing with equality, are inspired by Term Rewriting. However, work in this area is too restricted for our purposes, because the concepts and techniques assume the general equality theory EQ underlying the term rewriting. To be able to deal with FEQ, we extend the basic concepts for the case of an arbitrary underlying equaHty theory E. In the sequel, equality and identity will be denoted distinctly when ambiguity may occur, resp. by "=" and "=". We assume the reader to be familiar with basic notions of TRS's (see 653 e.g. [Dershowitz and Jouannautl, 1989]). We just recall some general ideas. A TRS I associates to each term s a reduction tree in which each branch consists of successive applications of rewrite rules of I' If I is noetherian, these trees are all finite. If moreover I is Church-Rosser or confluent, all leaves of the reduction tree of any term t contain the same term, called the normalisation of t and denoted t. , . In Term Rewriting, such a TRS is called complete. Below we extend tIus concept. Definition 3.1 Let E be an equality theory based on a language L, I a Term Rewriting System based on L. I is complete wrt to < L, E> iff I is noetherian and Church-Rosser and, moreover, has a Least Herbrand Model, which consists of all ground atoms s = t constructed from terms in HU{L) such that S,,=t. , . This definition extends the normal definition in Term Rewriting by the tIlird condition. However, for E = EQ, it has been proved that this property is implied by the noetherian and Church-Rosser properties (for a proof see [Huet, 1980]). Of course this is not the case for an arbitrary equality theory (as FEQ). 4 A framework for Model Generation Informally a model generator const-ructs a sequence 1 (C'ld,jd)~' where Cld is the ground instance of a rule applied after d steps, and jd thl! index indicating the conclusion of Cld that was selected, an increasing sequence of sets of asserted ground facts (Md)~ of non-equality predicates, a sequence of complete Term Rewriting Systems ((d)O' each of which is equivalent with the set of asserted equality facts, and an increasing sequence of sets of skolem constants (S kd)~, obtained by skolemizing the existentially quantified variables. Formally: Definition 4.1 Let L be a language, L,/c an infinite countable alphabet of skolem constants, T an e:xtended program based on L consisting of an equality theory with completion E with completion function TRS-comp and P an extended program. An Nondeternunistic Model Generator with Equality {NMGE} J( is a tuple of jour sequences (Skd)o' (Md)o, b d)O and -( C ld' jd)~ where n E IN U {oo}. 
The sequences satisfy the following conditions: 1. Mo Definition 3.2 A completion of a TRS I wrt is: = Sko = Uj 10 = TRS-comp( {}) 2. for each d such that 0 < d :::; n, Cld, jd, Sk d, lvld and Id are obtained from Sk d- l , M d - 1 and Id-1 by applying the following steps: • {o} if is inconsistent (a) Selection of rule and conclusion • a complete TRS Ie' such that FI ~ Ie Our framework for model generation is developed for logical theories consisting of two components, an extended program P and an underlying equality theory E. TIlls distinction reflects the fact that the model generation mechanism applies only to the extended clauses of P, wIllIe E is dealt with in a procedural way, using completion and normalisation. However, in order to make this possible, E should satisfy severe conditions, which are formulated in the following definition. Definition 3.3 An equality theory with completion, E, based on a language L, is a clausal equality theory equipped with a language independent completion procedure. The latter condition means that if I is a ground Term Rewriting System based on an extension L' of L by skolem constants, and Ie is the completion of 'Y wrt to , then for any further extension L" of L' by skolem constants, 'Ye is still the completion of I wrt . We denote 'Ye as TRS-comp((). Define LHMd_l as: LHM{ is a tree such that: • Each node is labeled with a tuple (Sk,M,,) where Sk is a skolem set, M a set of non-equality facts baud on L+Sk, and I is a ground TRS based on L+Sk. • To each non-leaf N, a ground instance Cl of a rule of P is associated. For each conclusion with index j in the head of GI, there is an arc leaving from N which is labeled by (GI,i). • The sequence of labels on the nodes and arcs on each branch of T constitute an NMGE . Definition 4.5 An NMGET is fair if each branch tS fair. Definition 4.6 An NMGET is failed if each branch is failed. (LHMd)'O is a monotonically increasing se- An NMGE performs a fixpoint computation, the result of which can be seen as an interpretation of the language L and, as we later show, a model of . Definition 4.3 The fixpoint of an NMGE K is UoLHMd and is denoted by Kj. The skolem set used by K is U'OSk d and is denoted by Sk(K). Kj defines an interpretation of L in the following way: • domain: HU(L+Sk(K)) • for each constant c of L: KT( c )=c • for each functor fin of L: KTUln) is the function which maps terms t l , ... , tn of HU(L + Sk(K)) to f(tl, ... ,t n ). • for each predicate of L: Kj(pln) is the set of P(tl'" .,t n ) facts in Kj. Corollary 4.1 If K is a finite successful NMGE [( of length n, then Kj = LHMn Theorem 4.1 (Soundness) If K is a fair NMGE, then KT i.s a model for and P+E is con.si.stent (a fortiori). We say that [(j is the model generated by K. To state the completeness result, we require an additional concept: the NMGE-Tree. Analogously with the concept of SLD-Tree, anNMGE-Tree is a tree of NMGE's obtained by applying all different conclusions of one rule in the descendents of a node .. Observe that a failed NMGET contains only a finite number of nodes. Also if T is inconsistent then because of the soundness Theorem 4.1, each fair NMGET is failed. As a completeness result, we want to state that for any model of P+E, the NMGE contains a branch generating a smaller model. In a context of Herbrand models, the smaller-than relation can be expressed by set inclusion. However, because of the existential quantifers and the resulting skolem constants, we cannot restrict to Herbrand models only. 
In order to define a smaller-than relation for general models, we must have a mechanism to compare models with a different domain. A solution to this problem is provided by the concept of homomorphism. Definition 4.7 Let II, 12 be interpretations of a language L with domains' D 1 , D 2 • A homomorphism from 11 to 12 is a mapping h: Dl ~Dz which satisfies the following conditions: • For each functor fin (n 2: 0) of Land:e, :el, ... ,:e n E D 1 : :e::I1 (fIn) ( :el, . ", :en) => h(:e)::Iz{f/n)(h(:ed, ... , h(:e n )) • For each predicate symbol pin (n 2: 0) of Land :ell ... , :en E D l : I1(p/n)(:Cl, ... ,:e n ) ~ 12(pln)(h(:ed,· .. ,h(:e n )) 655 Intuitively a homomorphism is a mapping from one domain to another, such that all positive information in the first model is maintained under the mapping. Therefore the homomorphsinlS in the class of models of a theory can be used to represent a " .. .contains less positive information than ..." relation. We denote the fact that there exists a homomorphism from interpretation Ii to 12 by Ii ::S 12, This notation captures the intuition that Ii contains less positive information than 12 , For NMGET's we can proof the following powerful completeness result. Theorem 4.2 (Completeness) Let E be an equality theory with completion, P an extended program, both based on L. Let L,/c be an alphabet of skolem constants. Theorem 4.3 (Minimal Herbrand models) If P is clausal, then for each fair NMGET T, each minimal Herbrand model is generated by a branch in T . We have extended the concept of mininlal model for general logic theories and proved the completeness of NMGE in the sense that each fair NMGET We refer to T generates all minimal models. [Denecker and De Schreye, 1991]. 5 Duality of SLD+Abduction and Model Generation. 1. There exists a fair NMGET for . The NMGE framework allows to formalise the observations that were made in the introduction. We first introduce the notion of a dualisation more formally. 2. For each model M of and each fair NMGET T, there exists a succesful branch K of T wch that KT ::S M. Definition 5.1 Let L be a first order language, L ,/c an We refer to [Denecker and De Schreye, 1991] for a constructive proof of this strong result. As a corollary we obtain the following reformulation of a traditional completeness result. Corollary 4.2 If is consistent then in each fair NMGET there exists a succesjul branch. If there exists a failed NMGET for , then is inconsistent, and all fair NMGET's are failed. The completeness result does not imply that all models are generated. For example for P = {pt-q}, the model {p,q} is not generated by an NMGE. The following example shows that different NMGET's for the same theory might generate different models. Example P ={ p, qt- pt-} Depending on which of these clauses is applied first, we get two different nonredwldant NMGET's. If pt- is applied first, then p, qt- holds already and is not applied anymore. So we get an NMGET with one branch of length 1. On the other hand if p, qtwas selected first, then two branches exist and we get the solutions {p} and {p,q}. Therefore it would be interesting if we could characterize a class of models which are generated by each NMGET. The second item of the completeness Theorem 4.2 gives some indication: for any given model M, some succesful branch of the NMGET generates a model with less positive information than M. For the clausal case, models with no redundant positive information are minimal Herbrand Models. 
From this observation one would expect that for a clausal program, each fair NMGET generates all minimal models. Indeed, the following completeness theorem holds: alphabet of skolem constants, V,/C a dual alphabet of variables such that a bijection D : L ,,.---+ V,k exists. The dualisation mapping D can be extended to a mapping from HU(L+L,/.:) U HB{L+L'k) to the set of terms based on L+ V,k by induction on the depth of terms: • for each constant c of L : D(c) == c • for each term t = f(t 1 , ••• , tn) : D(f(t 1 , ••• , tn))==f(D(td, ... , D(t n)) D can be further extended to any formula or set of formulas. Under dualisation, a ground TRS , based on L+L'k corresponds to an equation set Dr,) with terms based on L +V,/.:. , is said to be in solved form iff Dr,) is an equation set in solved form. An equation set is in solved form iff it consists of equations :Vi = ti, such that the :Vi'S are distinct variables and do not occur in the right side of any equation. So a TRS is in solved form if the left terms a.re distinct skolem constants of L ,/c which do not occur at the right. A TRS in solved form can also be seen as the dual of a variable substitution. Property 5.1 Let, be a TRS in solved form. Then 'Y is complete wrt to . Theorem 5.1 (Duality completion - unification) FEQ{L) is an equality theory with completion. The completion procedure is dual to unification. The dual of the com.pletion of a ground TRS, based on L+Sk, is the mgu of D(r). Or D{TRS-comp(r)) = mgu(D(r)). As was observed in the int.roduction, this duality can be extended further to the complete process of SLD+abduction. On a procedural level, each resolution step corresponds dually to a model generation step. The selection of a goal for resolution corresponds dually to 656 the selection of the extended rule with its condition instantiated with the dual of the goal. The selection of the clause in the resolution corresponds dually to the selection of the corresponding conclusion in the extended rule. The unification of goal with the head of the clause and the subsequent application of the mgu, corresponds to the completion of the dual equations in the conclusion and the subsequent normalisation. Now we can formulate the duality theorem for SLD+Abduction ([Cox and Pietrzykowski, 1986]) and Model Generation. Theorem 5.2 Let L be a first order language, with an alphabet of variables L", L,/c an alphabet of skolem constants, and D: L,k-+L" a duality bijection between skolem con3tants and variables. Let P be a definite abductive program baud on L. For any definite query ~Q, an abductive derivation for t-Q and P can be dually interpreted as a fair NMGE for only-ij(P}+3(Q). The set of atoms of the generated model, re3tricted to the abducible predicate,s is the dual of the abductive solution. The dual of the answer substitution is the re3triction of'n to the skolem constants dual to the variables in the query. The following corollary was proved first by Clark ([Clark, 1978]) for normal programs. For the definite case it follows immediately from the theorem above. Corollary 5.1 An SLD-refutation for a query t-Q, and a definite program P without abducibles is a consistency proof of 3( Q) +only-if{P}. A failed SLD-tree for a ground query ~Q and P is an inconsistency proof of:J( Q) +onlyij(P}, and therefore of 3(Q)+comp{P}. 6 Discussion A current limitation of the duality framework is its restriction to definite abductive programs. In the future we will extend it to the case of normal abductive prQcedures. 
6 Discussion

A current limitation of the duality framework is its restriction to definite abductive programs. In the future we will extend it to the case of normal abductive procedures. The extended framework will then describe a duality between an SLDNF+Abduction procedure and a form of model generation. The SLDNF+Abduction procedure can be found by proceeding as for the definite case. There we started from pure SLD and definite programs without abduction; we dualised it and obtained the NMGE method, which under dualisation yields an SLD+Abduction procedure. At present we have performed (on an informal basis) the dualisation of SLDNF for normal programs without abduction. Under dualisation, the resulting model generation procedure gives a natural extension of SLDNF for abductive programs. The abductive procedure incorporates skolemisation for non-ground abducible goals and efficient treatment of abduced equality atoms by the methods presented earlier. Integrity constraints can be represented by adding, for any integrity constraint IC, the rule "false ← not(IC)", transforming these rules to a normal program using the transformation of Lloyd-Topor ([Lloyd and Topor, 1984]), and adding the literal not false to the query. A prototype of this method has been implemented. An interesting experiment was its extension to an abductive planner based on the event calculus. Our prototype planner was able to solve some hard problems with context dependent events, problems that are not properly solved by existing systems ([Shanahan, 1989], [Missiaen, 1991]). In [Denecker and De Schreye, 1992], we proved the soundness of the procedure, for any query ←Q and generated solution Δ, with respect to completion semantics. This implies the soundness of the procedure with respect to the Generalised Stable Model semantics of [Kakas and Mancarella, 1990b]: a generated solution can be extended in a natural way to a generalised stable model of the abductive program. As a completeness result we proved that the procedure generates all minimal solutions when the computation tree is finite.

Related to our work, [Bry, 1990] also indicates a relationship between abduction and model generation. However, while we propose a relationship on the object level, there it is argued that abductive solutions can be generated by model generation on the abductive program augmented with a fixed metatheory. In [Console et al., 1991], another approach is taken for abduction through deduction. An abductive procedure is presented which, for a given normal abductive program P and query ←Q, derives an explanation formula E equivalent to Q under the completion of P: comp(P) ⊨ (Q ⇔ E). The explanation formula is built of abducible predicates and equality only. It characterises all abductive solutions in the sense that for any set Δ of abducible atoms, Δ is an abductive solution iff it satisfies E. Although this approach also departs from the concept of completion, it is of a totally different nature. In the first place, our approach aims at contributing to the procedural semantics of abduction. This is not the case with the work in [Console et al., 1991]. Another difference is that this approach is restricted to queries with a finite computation tree. If the computation tree contains an infinite branch, then the explanation formula cannot be computed. In [Kakas and Mancarella, 1990a], an abductive procedure for normal abductive programs has been defined. A restriction of this method is that abducible goals can only be selected when they are ground. As argued in section 1, this poses a serious problem for applications such as planning. The methods presented here allow us to overcome the problem by skolemisation of non-ground goals and efficient treatment of abduced equality facts.
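For concreteness, skolemising a non-ground abducible goal amounts to replacing each free variable by a skolem constant, with repeated occurrences of the same variable mapped to the same constant. Below is a small Haskell sketch under an assumed term representation (Var, Sk, Fun, Atom and the predicate name "holds" are ours; a real procedure would thread a supply of globally fresh constants through the derivation).

  import qualified Data.Map as M

  data Term = Var String | Sk Int | Fun String [Term] deriving (Eq, Show)
  data Atom = Atom String [Term] deriving (Eq, Show)

  -- Replace every free variable of an abducible goal by a skolem constant,
  -- using one constant per distinct variable of the goal.
  skolemise :: Atom -> Atom
  skolemise (Atom p ts) = Atom p (snd (goList M.empty ts))
    where
      goList env []       = (env, [])
      goList env (t : us) = let (env1, t')  = go env t
                                (env2, us') = goList env1 us
                            in (env2, t' : us')
      go env (Var v)      = case M.lookup v env of
                              Just k  -> (env, Sk k)
                              Nothing -> let k = M.size env
                                         in (M.insert v k env, Sk k)
      go env (Fun f us)   = let (env', us') = goList env us in (env', Fun f us')
      go env t            = (env, t)

  -- skolemise (Atom "holds" [Var "E", Var "T", Var "T"])
  --   ==  Atom "holds" [Sk 0, Sk 1, Sk 1]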
Recently, a planning system based on abduction in the event calculus has been proposed in [Missiaen, 1991]. The underlying abductive system incorporates negation as failure, skolemisation for non-ground abducible goals and efficient treatment of abduced equality facts. However, the system shows some problems with respect to soundness and completeness. Experiments indicated that these problems are solved by our prototype planner.

Finally, we want to draw attention to an unexpected application of the duality framework. In current work on abduction, the theory of Free Equality is implicitly or explicitly present. What happens if FEQ is replaced by general equality EQ and the equality predicate is abducible? The result is an uncommon form of abduction, illustrated below. Take the program P = {r(a) ←}. For this program, the query ←r(b) has a successful abductive derivation:

←r(b)   Δ = {}
□       Δ = {b = a}

←r(b) succeeds under the abductive hypothesis {b = a}. The duality framework provides the technical support for efficiently implementing this form of abduction. The only difference with normal abduction is that the completion procedure for FEQ (the dual of unification) must be replaced by a completion procedure for EQ, for example Knuth-Bendix completion.

To conclude, we have presented a duality between two computation paradigms. This duality allows technical results to be transferred from one paradigm to the other and vice versa. One application that was obtained was an efficient extension of model generation with equality. Transferring these methods back to abduction, we obtained techniques for dealing with non-ground abducible goals and efficient treatment of abduced equality atoms. We discussed experiments indicating that the extension of the duality framework to the case of normal programs is extremely useful for obtaining an abductive procedure for normal abductive programs.

7 Acknowledgements

We thank Krzysztof Apt, Eddy Bevers, Maurice Bruynooghe and Francois Bry for helpful suggestions.

References
[Bry, 1990] F. Bry. Intensional updates: Abduction via deduction. In proc. of the intern. conf. on Logic Programming 90, pages 561-575, 1990.
[Clark, 1978] K.L. Clark. Negation as failure. In H. Gallaire and J. Minker, editors, Logic and Databases, pages 293-322. Plenum Press, 1978.
[Console et al., 1991] L. Console, D. Theseider Dupre, and P. Torasso. On the relationship between abduction and deduction. Journal of Logic and Computation, 1(5):661-690, 1991.
[Cox and Pietrzykowski, 1986] P.T. Cox and T. Pietrzykowski. Causes for events: their computation and application. In proc. of the 8th intern. conf. on Automated Deduction, 1986.
[Denecker and De Schreye, 1991] Marc Denecker and Danny De Schreye. A framework for indeterministic model generation with equality. Technical Report 124, Department of Computer Science, K.U.Leuven, March 1991.
[Denecker and De Schreye, 1992] Marc Denecker and Danny De Schreye. A family of abductive procedures for normal abductive programs, their soundness and completeness. Technical Report 136, Department of Computer Science, K.U.Leuven, 1992.
[Dershowitz and Jouannaud, 1989] N. Dershowitz and J.-P. Jouannaud. Rewrite systems. In Handbook of Theoretical Computer Science, vol. B, chapter 15. North-Holland, 1989.
[Eshghi, 1988] K. Eshghi. Abductive planning with event calculus. In R.A. Kowalski and K.A.
Bowen, editors, proc. of the 5th ICLP, 1988.
[Huet, 1980] G. Huet. Confluent reductions: Abstract properties and applications to term rewriting systems. Journal of the Association for Computing Machinery, 27(4):797-821, 1980.
[Kakas and Mancarella, 1990a] A.C. Kakas and P. Mancarella. Database updates through abduction. In proc. of the 16th Very Large Database Conference, pages 650-661, 1990.
[Kakas and Mancarella, 1990b] A.C. Kakas and P. Mancarella. Generalised stable models: a semantics for abduction. In proc. of ECAI-90, 1990.
[Kowalski, 1991] R.A. Kowalski. Logic programming in artificial intelligence. In proceedings of the IJCAI, 1991.
[Lloyd and Topor, 1984] J.W. Lloyd and R.W. Topor. Making Prolog more expressive. Journal of Logic Programming, 1(3):225-240, 1984.
[Manthey and Bry, 1987] R. Manthey and F. Bry. A hyperresolution-based proof procedure and its implementation in Prolog. In proc. of the 11th German Workshop on Artificial Intelligence, pages 221-230, Geseke, 1987.
[Missiaen, 1991] L. Missiaen. Localized abductive planning with the event calculus. PhD thesis, Department of Computer Science, K.U.Leuven, 1991.
[Shanahan, 1989] M. Shanahan. Prediction is deduction but explanation is abduction. In IJCAI-89, page 1055, 1989.

Defining Concurrent Processes Constructively *
Yukihide Takayama
Kansai Laboratory, OKI Electric Industry Co., Ltd.
Crystal Tower, 1-2-27 Shiromi, Chuo-ku, Osaka 540, Japan
takayama@kansai.oki.co.jp, takayama@icot.or.jp

Abstract

This paper proposes a constructive logic in which a concurrent system can be defined as a proof of a specification. The logic is defined by adding stream types and several rules for them to an ordinary constructive logic. The unique feature of the obtained system is the (MPST) rule, which is a kind of structural induction on streams. The (MPST) rule is based on the idea of largest fixed point inductions, but the formulation of the rule is quite different, and it allows a concurrent process to be defined as a Burge's mapstream function with a good intuition on computation. This formulation is possible when streams are viewed as sequences, not infinite lists. Also, our logic has explicit nondeterminacy, but we do not introduce any extralogical device. Our nondeterminacy rule, (NonDet), is actually a defined rule which uses the inherent nondeterminacy in traditional intuitionistic logic. Several techniques of defining stream based concurrent programs are also presented through various examples.

1 Introduction

Constructive logics give a method for formal development of programs, e.g., [C+86, HN89]. Suppose, for example, the following formula: ∀x : D1. ∃y : D2. A(x, y). This is regarded as a specification of a function, f, whose domain is D1 and whose codomain is D2, satisfying the input-output relation, A(x, y); that is, ∀x : D1. A(x, f(x)) holds. This functional interpretation of formulas is realized mechanically. Namely, if a constructive proof of the formula is given, the function, f, is extracted from the proof with q-realizability interpretation [TvD88] or with the Curry-Howard correspondence of types and formulas [How80]. This programming methodology will be referred to as constructive programming [SK90] in the following. Although constructive programming has been studied by many researchers, the constructive systems which can handle concurrency are rather few.
This is mainly be*This work was supported by ICOT as a joint research project on theorem proving and its application. cause most of the constructive logics. have been formalized as intuitionistic logics, and the intuitionism itself does not have explicit concurrency besides proof normalization corresponding to the execution of programs [Got85]. For example, QJ [Sat87] is an intuitionistic programming'logic for a concurrent language, Quty. However, when we view QJ as a constructive programming system, concurrency only appears in the operational semantics of Quty. Linear Logic [Gir87] gives a new formulation of constructive logic which is not based on intuitionism. This is the first constructive logic which can handle concurrency at the level of logic. The logic was obtained by refining logical connectives of traditional intuitionistic or classical logic to introduce drastically new connectives with the meaning of parallel execution. In Linear Logic, formulas are regarded as processes or resources and every rule of inference defines the behavior of a concurrent operation. Linear Logic resembles Milner's SCCS [Mi189] in this respect. We take intermediate approach between QJ and Linear Logic in the sense of not throwing away but extending intuitionistic logic. The advantage of this approach is that the functional interpretation of logical connectives in the traditional constructive programming based on intuitionism is preserved, and that both the sequential and concurrent parts of programs are naturally described as constructive proofs. To this end, we take the stream based concurrent programming model [KM74]. We introduce stream types and quantification over stream types. A formula is regarded as a specification of a process when it is a universally or an existentially quantified over stream types, and otherwise it represents a specification of a sequential function, properties of processes 9r linkage relation between processes. A typical process, \:IX.::lY.A(X, Y) where X and Yare stream variables, is regarded as a stream transformer. Most of the rules of inference are those of ordinary constructive programming systems, but rules for non determinacy and for stream types are also introduced. Among them, a kind of structural induction on stream types called (M PST) is the heart of our extended system: With (M PST), stream transformers can be defined as Burge's mapstream functions [Bur75]. 659 T. Hagino [Hag87] gave a clear categorical formalization of stream types (infinite list types or lazy types) whose canonical elements are given by a schema of mapstream functions, but relation between his formulation and logic is not investigated. N. Mendler and others [PL86] introduced lazy types and the type checking rules for them into an intuitionistic type theory preserving the propositions-as-types principle in the sense that an empty type can exist even in the extended type theory. However, they do not give sufficient rules of inference for proving specification of stream handling programs. Reasoning about stream transformer can be handled with a largest fixed point induction as was demonstrated by P. Dybjer and H. P. Sander [DS89]. However, their system is designed as a program verification system not as a constructive programming system. Although q-realizability interpretation for program extraction can be defined for the coinduction rule [KT91]' the rule seems rather difficult to use for proving specifications. 
The reason is that the coinduction rule deeply depends on the notion of bisimulation, so that in the proof procedure one must find a stronger logical relation included in the more general logical relation and that is not always an easy task. The (~1 PST) rule is based on a similar idea to the coinduction rule: one must find a new logical relation and a new function to prove the conclusion. However, what one must find has a clear intuitive meaning as the components of a concurrent process. Therefore, the (111 PST) rule shows an intuitive guideline on how to construct a concurrent process. Section 2 explains how a concurrent system is specified in logic. A process is specified by the VX.::lYA(X, Y) type formula as in the traditional constructive programming. The rest of the sections focus on the problem of defining processes which meet the specifications. Section 3 formulates streams and stream types. Streams are viewed as infinite lists or programs which generate infinite lists at the level of underlying programming language. At the logical reasoning level, streams are sequences, namely, total functions on natural numbers. This two level formulation of streams enables to introduce (JIll PST) which will be given in section 4. Section 5 presents the rest of the formalism of the whole system. The realizability interpretation which gives the program 'extraction algorithm from proofs will be defined. Several examples will be given in section 6 to demonstrate how stream based concurrent programming is performed in our system. Notational preliminary: 'Ne assume first order intuitionistic natural deduction. Equalities of terms, typing relations (}vI : 0'), and T (true) are atomic formulas. The domain of the quantification is often omitted when it is clear from the context. Sequences of variables are denoted as x or X. ~lx[N] denotes substitution of N to the variable, x, occurring freely in 111. 1I1x[N] denotes simultaneous substitution. FV(M) is the set of free variables in M. (::) denotes the (infinite) list constructor. Function application is denoted ap(M, N) or 1I1(N). Mn(N) denotes M(· .. ~1( N) ... ). '-......--' n 2 Specifying Concurrent tems in Logic Sys- The model of concurrent computation in this paper is as follows: A concurrent system consists of processes linked with streams. A process interacts with other processes only through input and output streams. The configuration of processes in a concurrent system is basically static and finite, but in some cases, which will be explained later, infinitely many new processes may be created by already existing processes. A process is regarded as a transformer (stream transformer) of input streams to an output stream, and it is specified by the 'IX : lO'l,oo',O'n. ::lY : IT". A(X, Y) type of formula where 10' denotes the type of streams over the type 0', but its definition will be given later. l u1 ,oo.,O'n is an abbreviation of 10'1 x· .. x l un , X and l' are input and output streams, and A(X, 1') is the relation definition of input and output streams. The combination of two processes, VX.::lY A(X, Y) and VP.::lQ. B(P, Q), by linking the stream l' and P is described by the following proof procedure: ~l VX.::lY A(X,1') ('IE) ::l1'. A(X, 1') IIo (::lE)(l) ::l1'.::la. A(X, a) & B(a, 1') (VI) VX.::l1'.3a. A(X, a) & B(a, Y) where ITo ~ ~2 VP.::lQ. B(P, Q) ('IE) ::lQ. B(1",Q) III (::lE)(2) 31'.3a. A(X,a) & B(a,Y) and III ~ [A(X, y,)](l) [B(y, Q,)](2) A(X, 1") & B(1" , Q') (& I) (3I) ::la. A(X,a) & B(a,Q') (31) ::l1'.::la. 
A(X, a) & B(a, 1') and ~l and ~2 are the definition of process VX.::l1'.A(X,1') and VP.::lQ.B(P, Q). This is a typical proof style to define a composition of two functions. Thus, a concurrent system is also specified by VX.3Y A(X, 1') type formula. X and l' are input and output streams of the whole concurrent system, and a is an internal stream. All these things just realize the idea that functions can be viewed as a special case of processes. In the following, we focus on the problem of how to define a process (stream transformer) as a constructive proof. 660 3 r, n : nat I- hd(tln(s)) Formulation of Streams Two Level Stream Types A stream can be viewed at least in three ways: an infinite list, an infinite process, and an output sequence of a.n infinite process, namely, a total function on natura.! numbers. The formal theories of lazy functional programming such as [PL86] and [Hag87] can be regarded as the theories of concurrent functional programming based on the first two points of view on streams .. Our system uses a lazy typed lambda calculus as the underlying programming language and has lazy types as computational stream types. Computational stream types are only used as the type system for the underlying language. In proving specifications of stream transformers, we use logical stream types which are based on the third point of view on streams. In other words, we have two kinds of streams: computational streams at the programming language level, and logical streams at the logical reasoning level. Vve denote a computational stream type Cu and a logical stream type Iu' The following is the basic rules for computational stream types. The idea behind them is similar to that behind the lazy type rules in [PL86]. We confuse the meaning of the infinite list constructor, (::), and will use this also as an infinite cartesian product constructor. Vie abbreviate 111 ~ N for 111 = N in CT in the following. r I- 111 : CT r I- S : Cu r I- (111 :: S) : Cu r1-1I1~N r rl-S~T I- (111 :: S) ~ (N :: T) r I- A1 %:: N r I- (111 :: S) ~ (N :: T) r I- (111 :: S) ~ (N :: T) r,Z: T I- M: T r I- liZ. 111: T' rl-S~T where T is Cu or 7 - t Cu' // is the fixed point operator only used for describing a stream as an infinite process (infinite loop program). The reduction rule for II-terms is defined as expected. hd and tl are the primitive destructor functions on streams. I- M: Cu r I- 111 : Cu I- hd( 111) : CT r r r r I- X: Cu r I- X ~ (hd(X) :: tl(X)) r I- (111 :: S) : Cu r I- hd((l11:: S)) ~ 111 r, n : nat, tln(s) r I- (M :: S) : Cu r I- tl((111 :: S)) ~S ~ tln(T) I- S Un~C<7 T rl-S~T r I- A1z ~ Nw[z] r I- liZ. Mz ~ IIW. Nw Before giving the definition of logical stream types, note that the type, nat - t CT, is isomorphic to Cu , namely, Proposition 1: Let CT be any type, then Let

0 Note that X(n) = hd(tln(x)) for arbitrary X : Iu and n : nat. All the rules for hd, tl and (::) in computational streams also hold for these defined functions and the constructor for logical streams. 3.2 Quantification over Logical Stream Types There is a difficulty in defining the meaning of quantification over (logical) stream types. The standard intuition- 661 is tic interpretation of, say, existential quantification over a type, CT, :lx : CT.A(x) is that "we can explicitly give the object, a, of type CT such that A(a) holds". However, as a stream is a partial object we can only give an approximation of the complete object at any moment. Therefore we need to extend the familiar interpretation of quantification over types. In fact, Brouwer's theory of choice sequences [TvD88] in intuitionism provides us with the meaning of quantification over infinite sequences. There are two principles in Brouwer's theory, the principle of open data and the principle of function continuity. The principle of open data, which informally states that for independent sequences any property which can be asserted must depend on initial segments of those sequences only, gives the meaning of the quantification of type \lX.:ly.A(X, y). That is, for an arbitrary sequence, X, there is a suitable initial finite segment, X o, of X such that :ly. A(Xo, y) holds. The principle of function continuity gives the meaning of the quantification of type \lX.:lY.A(X, Y). Assume the case of natural number streams (total functions between natural number types). The function continuity is stated as follows: \lX.:lY. A(X, Y) =? :lj : K. \IX. A(X, fiX) where fiX = Y is an abbreviation of \Ix : nat. f(x .. X) = Y (x) and J( is the class of functions that take initial finite segment of the input sequences and return the values. This means that every element of Y is determined with a suitabl~ initial finite segment of X. These principles meet our intuition of functions on streams and stream transformers very well. \IX : 1"..:ly : r.A(X, y) represents a function on streams over CT, but we would hardly ever try to define a function which returns a value after taking all the elements of an input stream. Also, we would expect a stream transformer, \IX : 1"..:lY : I.r.A(X, Y), calculate the elements of the output stream, Y, gradually by taking finitely many elements of the input stream, X, at any step of the calculation. Note that this semantics also meets the proof method used in [KM74j: To prove a property P(X) on a stream X, we first prove P for an initial finite subsequence, X o, of X (I- P(Xo)) and define I- P(X) to be limxo--+x P(Xo). 4 Structural Inductiori on Logical Streams As streams ca.n be regarded as infinite lists, we would expect to extend the familiar structural induction on finite lists to streams. However, a naive extension of the structural induction on finite lists does not work well. If we allow the rule below, f, A(tl(X)) I-jA(X) (S1) f I- \IX : 1".. A(X) the following wrong theorem can be proved: WrongTheorem: \IX : 1nat . B(X) where B(X) ~:ln : nat. X(n) = 100. Proof: By (S1) on X : 1nat . Assume B(tl(X)). Then, there is a natural number k such that tl(X)(k) = X(k + 1) = 100. Then B(X).I This proof would correspond to the following uninteresting program: foo = )"X. foo(tl(X)). This is because the naive extension of the structural rule on finite lists does not maintain the continuity of the function on streams. Therefore, we need a drastically different idea in the case of infinite lists. 
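In lazy functional terms, the program foo above is the standard example of a non-productive recursion: it consumes its input forever without ever committing to an output element. The following Haskell illustration is ours and contrasts it with a guarded, productive definition.

  -- Non-productive: consumes the input forever, never emits an element,
  -- so even head (foo [0..]) diverges.
  foo :: [a] -> [b]
  foo x = foo (tail x)

  -- Productive (guarded): an element is emitted before the recursive call,
  -- so every finite prefix of the output is computable.
  ones :: [Integer]
  ones = 1 : ones          -- take 3 ones == [1,1,1]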
One candidate is the coinduction rule (a largest fixed point induction) as in [DS89]:

(B ⇒ Φ[B]) ⇒ (B ⇒ νP.Φ)

where νP.Φ denotes the largest fixed point of the equation P = Φ. The ∀X : I_σ. A(X) part will be described with νP.Φ type formulas, and one must find a suitable logical relation B to prove the conclusion. But finding B will not always be an easy task: we would like the search to be decomposed into smaller tasks, each of which has a clear and intuitive computational meaning. Therefore, we take another approach: the (MPST) rule.

4.1 Mapstream Functions as Stream Transformers

Recall that the motivation for pursuing a kind of structural induction on streams is to define stream transformers as proofs, and stream transformers can be realized as Burge's mapstream functions. A schema of mapstream functions is described in typed lambda calculus as follows:

P = λM^{τ→σ}. λN^{τ→τ}. λX^{τ}. ((M X) :: (((P M) N)(N X)))

If we give the procedures M and N, we obtain a mapstream function. Note that, from the viewpoint of continuity, these procedures should be as follows:

M = "Fetch an initial segment, X_0, of the input stream, X, to generate the first element of the output stream."
N = "Prepare for fetching the next finite segment of the input stream, interleaving, if necessary, another stream transformer between the original input stream and the input port."

This suggests that if a way to define M, N, and P as proof procedures is given, one can define stream transformers as constructive proofs.

4.2 A Problem of Empty Streams

Before giving the rule of inference for defining stream transformers, a little more observation of stream based programming is needed. Assume a filter program on natural number streams realized as a mapstream function:

flt_a = λX. if (a|hd(X)) then flt_a(tl(X)) else (hd(X) :: flt_a(tl(X)))
      = λX. ((M X) :: (((P M) N)(N X)))

where (a|hd(X)) is true when hd(X) can be divided by a (a natural number) and

M ≡ λX. if (a|hd(X)) then M(tl(X)) else hd(X)
N ≡ λX. if (a|hd(X)) then N(tl(X)) else tl(X)

For example, flt_5((5 :: 5 :: 5 :: 5 :: ...)) is an empty sequence because the evaluation of M(5 :: 5 :: 5 :: 5 :: ...) does not terminate. This contradicts the principle of open data explained in 3.2. To handle such a case, we introduce the notion of complete stream. The idea is to regard flt_5, for example, as always generating some element even if the input stream is (5 :: 5 :: ...).

Def. 1: Complete types. Let σ be any type other than a stream type; then σ⊥ denotes the type σ together with the bottom element ⊥_σ (often denoted just ⊥), and it is called a complete type.

Def. 2: Complete stream types. A stream type, I_σ or C_σ, is called complete when σ is a complete type.

flt_5 is easily modified to a function from C_nat to C_nat⊥, and then flt_5((5 :: 5 :: ...)) will be (⊥ :: ⊥ :: ...), which is practically an empty stream.
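Read with lazy Haskell lists in place of streams, the schema of 4.1 and the filter of 4.2 look as follows. This is a sketch of ours, not the system's own notation; the last comment records exactly the failure that motivates complete streams.

  -- Burge-style mapstream: m produces the next output element from the
  -- current input, n prepares the input for the next step.
  mapstream :: (s -> a) -> (s -> s) -> s -> [a]
  mapstream m n x = m x : mapstream m n (n x)

  -- The filter flt_a of 4.2 as an instance of the schema.
  flt :: Integer -> [Integer] -> [Integer]
  flt a = mapstream m n
    where
      m ys | a `divides` head ys = m (tail ys)   -- the M procedure
           | otherwise           = head ys
      n ys | a `divides` head ys = n (tail ys)   -- the N procedure
           | otherwise           = tail ys
      divides d x = x `mod` d == 0

  -- take 4 (flt 5 (cycle [5,7,10,3]))  ==  [7,3,7,3]
  -- head (flt 5 (repeat 5))  never returns: the M procedure searches the
  -- all-fives stream forever, which is what the complete stream types
  -- above are introduced to account for.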
A(n,X, Y) where Al is a suitable predicate and A(n, X, Y) must be a rank 0 formula [HNS9]. We can easily extend the rule to the multiple input stream version. \lve do not give the precise definition of rank 0 formulas here, but the intention is that we should not expect to extract any computational meaning from A(n, X, Y) part. This restriction comes from purely technical reason, but does not degenerate the expressive power of the rule from the practical point of view because we usually need only to define a stream transformer program but not the verification code corresponding to A(n, X, Y) part. The technical reason for the side condition of (AlP ST) is as follows: (AIPST) is in fact a derived rule with (ST) and (CON), so that q-rea.lizability interpretation defined in the next section is carried out using the interpretation of those rules. The difficulty resides in the interpretation of the (CON) rule, but if we restrict the formula A(n, X) in (CON) to be rank 0, the interpretation is trivial. This condition corresponds to to side condition of (M PST). The intuitive meaning of (.N! PST) is as follows. As explained in 4.1, a mapstream function is defined when Al and N procedure are given. (a) is the specification of the M procedure, fM' and (b) means that fM certainly generates the right elements of the output stream. The N procedure, fN, is defined as the value of existentially quantified variable, f, in (c). (c) together with (b) intuitively means the following: for X : I" (input stream) and Y : IT (output stream), let us call a pair, (fN(X), tln(y)), the nth fN-descendant of (X, Y). Then, for arbitrary n : nat, A(n, X, Y) speaks about nth fN-descendant of (X, Y), and A(n, fN(X), tl(Y)) actually speaks about n + lth fN-descendant of (X, Y). If fN is a stream transformer, this means that the process (stream transformer) defined by (M PST) generates another processes dynamically. Note that, as we must give a suitable formula, M, to prove the conclusion, (M PST) is essentially a second order rule. 5 The Formal System This section presents the rest of the formalization of our system briefly. 5.1 Non-deterministic 'x-calculus The non-deterministic A-calculus is a typed concurrent calculus based on parallel reduction and this is used as the underlying programming language. The core part is almost the same as that given in [Tak91]. It has natural numbers, booleans (T and F), Land R as constants. Individual variables, lambda-abstractions, application, sequences of terms ((MI, ... , Mn) where Mi are terms), if-then-else, and a fixed point operator (f.L) are used as terms and program constructs. The reduction rules for terms are defined as expected, and if a term, M, is reducible to a term, N, then AI and N are regarded as equal. Also, several primitive functions are provided for arithmetic operations and for the handling of sequences of terms such as projection of elements or subsequences from a sequence of terms. The type structure of the calculus is almost that of simply typed A-calculi. nat (natural number type), bool (boolean types), and 2 (type of Land R) are primitive types and x (cartesian product) and -+ (arrow) are used as type constructors. The type inference rules for this fragment of the calculus are defined as expected. In addition to them, computational streams, computational stream types and a special term called coin flipper is introduced to describe concurrent computation of streams. For the reduction strategy, /1- 663 terms in section 3.1 are lazily evaluated. 
The coin flipper is a device for simulating nondeterminacy. It is a term, ., whose computational meaning is given by the following reduction rule: • t> Lor R That is, • reduces to L or R in a nondeterministic way. This is like flipping a coin, or can be regarded as hiding some particular decision procedure whose execution may not always be explained by the reduction mechanism. • is regarded as an element of 2+, a super type of 2. The elements of 2 have been used to describe the decision procedure of if-then-else programs in the program extraction from constructive proofs in [Tak91) as if T = L then A1 else N. Nondeterminacy arises when T is replaced by •. The intentional semantics of • IS undefined. 2+ enjoys the following typing rules: L : 2+ R : 2+ • : 2+ Let A be a formula. defined as follows: Then, a type of A, type(A), IS 1. type(A) is empty, if A is rank 0; 2. type(A & B) ~ type(A) x type(B); 3. type(A V B) ~ 2+ X type(A) x type(B); 4. type(A ~ B) ~f type(A) 5. type(Vx: 0". A) ~ 0" 6. type(3x : 0". A) ~ -t 0" X -t type(B); type(A); type(A); Proposition 2: Let A be a formula with a free variable x. Then, type(A) = type(Ax[M)) for any term 111 of the same type as x. Def. 5: q-rea.lizability 1. If A is a: rank 0 formula, then () q A ~ A; 5.2 Rules of Inference 2. A 1\1 : 0" 5.3 a: 0" n: nat ap(Mn,a) : 0" - t 0" 4. a q Vx : g: A ~ Vx : 0". (a(x) q A); rank 0; 6. • q A V A ~ A if A is rank 0; 7. (a,b) q A & B ~f a q A & b q B. Proposition 3: Let A be any formula. Ifa q A, then a: type(A). 0"2 - t T2 0"2 - t Tl X T2 Realizability Interpretation The realizability defined in this section is a variant of q-realizability [TvD88). A new class of formulas called realizability relations is introduced to define q-realizability. Def. 3: Realizability relation A 'realizability Telation is an expression in the form of a q A, where A is a formula and a is a finite sequence of variables which does not occur in A. a is called a Tealizing vaTiables of A. For a term A1, A1 q A, which reads "a term 1\1 realizes a formula A", denotes (a q A)a[A1], and A1 is called a TealizeT of A. Theorem: Soundness of realizability: Assume that A is a formula. If A is proved, then there is a term, T, such that T q A can be proved in a trivially extended logic in which realizability relations are regarded as formulas, and FV(T) C FV(A). The proof of the theorem gives the algorithm of program extraction from constructive proofs. The program extracted from (NonDet) is if • = L then Meise N where M and N are the program extracted from the subproofs of two premises. From a proof by (MPST), the program AX.Am.apUM, f'N(X)) is extracted where fM and fN are as explained in section 4.3. Other part of the extraction algorithm can be seen in [Tak91). 6 A type is assigned for each formula, which is actually the t.ype of the realizer of the formula. Def. 4: type(A) 0". 5. (z,a,b) q A VB ~f (z = L & A & a q A & b: type(B)) V (z = R & B & b q B & a : type(A)) provided that A and B are distinct or A = B with A and B not A f : 0"1 - t T1 f X g : 0"1 X B ~ Vb : type(A).(A & b q A ~ a(b) q B); 3. (a, b) q 3x : 0'. A ~ a : 0" & Ax[a) & b q Ax[a); (1) Logical Rules The rules for logical connectives and quantifiers are those of first order intuitionistic natural deduction with mathematical induction. (2) Rules for Nondeterminacy • = Lv. = R A (NonDet) (N onDet) is actually a derived rule: This is obtained by proving A by divide and conquer on TVT. 
(NonDet) means that if two distinct proof of A are given, one of them will be chosen in a nondeterministic way. This is the well-known nondeterminacy both in classical and intuitionistic natural deduction. (3) Auxiliary Rules aq A ~ Examples The basic programming technique with (A1 PST) is demonstrated in this section. In the following, we write Xn for X(n) when X is a stream. 664 6.1 6.3 Simple Examples A process which doubles each element of the input natural number stream is defined as follows: SPEC 1: VX : I nat .3Y : Inat.Vn : nat. Yn = 2· Xn The proof is continued by (!I1PST). Let M(X, a) ~ a = 2 . hd(X), and (a) and (b) are easily proved. (c) is proved by letting f = )..X. tl(X) .• Proof: The following example, a program which extracts only prime numbers in the input stream, is one of the typical examples of dynamic creation of new prqcesses. SPEC 4: VX: I nat .3Y: Inatl..Vn: nat. OA(n,X,Y) where A(n,X,Y) The program extracted from the proof is )"X.)..m. 2 . hd(tlm(x)) which is, by the isomorphism cp, extensionally equal to l/z.)..X. (2· hd(X) :: z(tl(X))). A process which takes the successive two elements at once from the input stream and outputs the sum of them is defined as follows: SPEC 2: VX: IO'.3Y: Iq.Vn : nat. Yn = X 2 .n + X 2 ' n+1 (MPST). Let !I1(X,a) ~ a = hd(X) + hd(tl(X)) and (a) and (b) are easily proved. (c) is proved by letting f 'X.hd(X) and fN ~ >'(X, Y). if • L then (tl(X), Y) else (Y, tl(X)). 7 Conclusion and Future Works An extension of constructive programming to stream based concurrent programming was proposed in this paper. The system has lazy types at the level of programming language and logical stream types, which are types of sequences viewed as streams, at the level of logic. This two level formulation of streams enables to formulate a purely natural deduction style of structural induction on streams (lv[ PST) in which concurrent processes (stream transformers) are defined as proofs. The (MPST) rule allows to develop the proof of a specification with a good intuition on the concurrent process to be defined, and the rule seems to be easier to handle than the largest fixed point induction. Also, nondeterminacy was introduced at the level of logic using the inherent nondeterminacy of proof normalization in intuitionistic logic. For the future work, as seen in the example of a merger process, the side condition for (M PST) should be relaxed to handle larger varieties of concurrent processes. References [Bur75] W. H. Burge. Recursive Programming Techniques. Addison-V\lesley, 1975. [C+86] R. 1. Constable et al. Implementing Mathematics with the Nuprl Proof Development System. Prentice-Hall, 1986. [DS89] P. Dybjer and H. P. Sander. A Functional Programming Approach to the Specification and Verification of Concurrent Systems. Formal Aspects of Computing, 1:303 - 319, 1989. [Gir87] J.- Y. Girard. Linear logic. Theoretical Computer Science, 50, 1987. North-Holland. [Got85] S. Goto. Concurrency in proof normalization and logic programming. In Internatioinal Joint Conference on Artificial Intelligence '85, 1985. [Hag87] T. Ragino. A Typed Lambda Calculus with Categorical Type Constructors. In Category Theory and Computer Science, LNCS 283, 1987. [HN89] S. Hayashi and H. Nakano. PX : A Computational Logic. The MIT Press, 1989. [How80J W. A. Howard. The formulas-as-types notion of construction. In Essays on Combinatory Logic, Lambda Calculus and Formalism, eds. J . P. Seldin and J. R. Hindley. Academic Press, 1980. [KM74] G. Kahn and D. B. MacQueen. 
The Semantics of a Simple Language for Parallel Programming. In IFIP Congress 74. North-Holland, 1974. [KT91] S. Kobayashi and M. Tatsuta. private communication. 1991. [Mi189] R. Milner. Communication- and Concurrency. Prentice Hall, 1989. [PL86] N. Mendler P. Panangaden and R. L.Constable. Infinite Objects in Type Theory. In Symposium on Logic in Computer Science'86, 1986. [Sat87] M. Sato. Quty: A Concurrent Language Based on Logic and Function. In Fourth International Conference on Logic Programming, pages 10341056. The MIT Press, 1987. [SK90] M. Sato and Y. Kameyama. Constructive Programming in SST. In Proceedings of the Japanese-Czechoslovak Seminar on Theoretical Foundations of [(nowledge Information Processing, pages 23-30, INORGA, 1990. [Tak91] Y. Takayama. Extraction of Redundancy-free Programs from Constructive Natural Deduction Proofs. Journal of Symbolic Computation, 12(1):29-69, 1991. [TvD88] A. S. Troelstra and D. van Dalen. Constructivism in A1athematics, An Introduction. Studies in Logic and the Foundation of Mathematics 121 and 123. North-Holland, 1988. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 666 Realizability Interpretation of Coinductive Definitions and Program Synthesis with Streams Makoto Tatsuta Research Institute of Electrical Communication, Tohoku University, 2-1-1 Katahira, Sendai 980, JAPAN e-mail: tatsuta@riec.tohoku.ac.jp Abstract The main aim of this paper is to construct a logic by which properties of programs can be formalized for verification, synthesis and transformation of programs. This paper has 2 main points. One point is realizability interpretation of coinductive definitions of predicates. The other point is an extraction of programs which treat streams. An untyped predicative theory TID 1I is presented, which has the facility of coinductive definitions of predicates and is based on a constructive logic. Properties defined by the greatest fixed point, such as streams and the extensional equality of streams, can be formalized by the facility of coinductive definitions of predicates in TIDI/' q-realizability interpretation for TIDI/ is defined and the realizability interpretation is proved to be sound. By the realizability interpretation, a program which treats streams can be extracted from a proof of its specification in TIDI/' General program extraction theorem and stream program extraction theorem are presented. 1 Introduction Our main aim is to construct a logic by which we can formalize properties of programs for verification, synthesis and transformation of programs. In this paper, we concentrate on formalization of programs with streams and present a theory TIDI/' Coinductive definitions are very important for this purpose. Properties of streams are represented semantically by the greatest fixed point. The predicate representing what a stream is and the extensional equality of streams are defined semantically by the greatest fixed point. These properties defined by the greatest fixed point can be formalized by coinductively defined predicates and coinduction. It-calculus has been studied to formalize programs with streams for verification [3]. {L-calculus has the facility of coinducti;e definitions of predicates and coinduction and is based on classical logic. In this paper, we present a theory T1DI/' which has the facility of coinductive definitions of predicates and coinduction and is based on a constructive logic. 
By these facilities we can formalize properties of programs with streams in TIDI/' Our theory T1DI/ is based on a constructive logic because we want to use the facility of program extraction by realizability for TIDI/' Program extraction is one of the benefits we get when we use a constructive formal theory to formalize properties of programs. Program extraction is to get a program from a constructive proof of its specification formula. One method of program extraction is to use realizability interpretation. In PX[4], for example, a LISP program is extracted from a proof of its specification formula by realizability interpretation. By the facility of coinductive definitions of predicates and realizability interpretation, we can synthesize programs with streams naturally in TIDI/ using theorem proving techniques. This paper has 2 main points. One point is realizability interpretation of coinductive definitions. The other point is an extraction of programs with streams. We present an untyped predicative theory T1DI/' which has coinductive definitions of predicates and is based on a constructive logic. We define q-realizability interpretation of TIDI/' We show that the realizability interpretation is sound. We present general program extraction theorem and stream program extraction theorem. The soundness proof is based on the early version of this paper [8]. The soundness theorem was proved also in [5]. Both works are independent. In Section 2, we define a theory TIDI/' In Section 3, we briefly explain how useful the facility of coinductive definitions of predicates is to formalize streams. In Section 4, we discuss a model of T1DI/ and prove its consistency. In Section 5, we present q-realizability interpretation of TIDI/ and prove the soundness theorem. In Section 6, we give general program extraction theorem, stream 667 program extraction theorem for T1Dv and an example of program synthesis. 2 Theory T1Dv 'vVe present a theory T1Dv in this section. It is the same as Beeson's EON [1] except for the axioms of coinductive definitions of predicates. In this paper, we choose combinators as the target programming language for simplicity since we want to concentrate on the topic of coinductive definitions of predicates. We suppose that the evaluation strategy of combinators is lazy or call by name because we represent a stream by an infinite list, which is a non-terminating term. We omit also the formalization of the lazy or call by name evaluation strategy in T1Dv for simplicity. Definition 2.1. (Language of T1Dv) The language of T1Dv is based on a first order language but extended for coinductive definitions of predicates. The constants are: K, S, p, Po, PI' 0, SN, PN, d. We choose combinators as a target programming language for simplicity. K and S mean the usual basic combinators. We have natural numbers as primitiyes, which are given by 0, a successor function SN and a predecessor function PN. We also have paring functions p, Po and PI as built-in, which correspond to cons, car and cdr in LISP respectively. d is a combinator judging equality of natural numbers and corresponds to an if-then-else statement in a usual programming language. We have only one function symbol: App whose arity is 2. It means a functional application of combinators. Terms are defined in the same way as for a usual first order logic. For terms s, i, we abbreviate App(s, i) as si. For terms s, i, we also use an abbreviation (s, i) == psi, to == Pot and tl == PIt. The predicate symbols are: 1.., N, -. 
V.,re have predicate variables, which a first order language does not have. The predicate variables are: X, Y, Z, ... , X*, Y*, Z*, .... Each predicate variable has a fixed arity. We use an abbreviation Ax.i which is constructed by combinators in the usual way. We also abbreviate Y(>..x.t) as j.lx.i where Y == Af.(Ax.f(XX ))(>..x.f(xx )). Definition 2.2. (Formula) We define a formula A, a set S+(A) of predicate variables which occur positively in A and a set S_(A) of predicate variables which occur negatively in A. 1. If a, b are terms, .1, N(a), a = b are formulas. Then S+(1..) = S_(1..) S+(N(a)) S+(a = b) = , = S_(N(a)) = , = S_(a = b) = . 2. If X is a predicate variable whose arity is n, X(XI"" ,x n) is a formula and S+(X(XI,'" ,x n)) = {X}, S-CX(XI"" ,xn )) = . 3. A & B, A V B, A ~ B, VxA, :lxA are formulas if A and B are formulas in the same way as a first order language. Then S+CA & B) = S+(A V B) = S+(A) U S+(B), S_CA & B) = S_(A V B) = S_(A) U S_CB), S+(A ~ B) = S_(A) U S+(B), S_(A ~ B) = S+(A) U S_(B), S+CVxA) = S+(:lxA) = S+(A), S_(VxA) = S_(:lxA) = S_(A). 4. (VX.AXI'" xn.A)(t l , ... , in) is a formula where X is a predicate variable whose arity is n, A is a formula, i l , ... , tn are terms and X is not in S_(A). Then S+((VX.AXI ... xn.A)(i l , ... , tn)) = S+(A) - {X}, S_((VX.AXI'" xn.A)(i b ···, in)) = S_(A). .1 means contradiction. N(a) means that a is a natural number. a = b means that a equals to b. The last case corresponds to coinductively defined predicates. Remark that X and Xl, ... , Xn may occur freely in A. The intuitive meaning of a formula (VX.AXI ... xn.A(X, XI, ... ,xn))(i l , .. . ,in) is as follows: Let P be a predicate of arity n such that P is the greatest solution of an equation P(XI,"" Xn) f-4 A(P, Xb"" xn). Then (VX.AXI'" xn.A(X, Xl, ... , Xn))(t l , ... , tn) means P(tl,' .. ,tn ) intuitively. We abbreviate a sequence as a bold type symbol, for example, Xl, ... ,X n as x. Example 2.3. We give an example of a formula. We assume the arity of a predicate variable P is 1. Then (VP.AX.X = (Xo, Xl) & Xo = & P(XI))(X) is a formula. ° Among many axioms and inference rules of TID v, we discuss only inference rules of coinductive definitions of predicates here. The rest of axioms and inference rules are almost the same as EON [1] and we only list them in Appendix A. 668 Definition 2.4. (Coinductive Defi.nitions) Let v == vP.Ax.A(P) where x is a sequence of variables whose length is the same as the arity of a predicate variable P and A(P) is a formula displaying all the occurrences of P in a formula A. Suppose that C(x) is a formula displaying all the occurrences of variables x in the formula. Vie have the following axioms: Vx(v(x) -+ A(v)), (vI) Vx(C(x) -+ A(C)) -+ Vx(C(x) -+ vex)). (v2) v P.Ax.A(P) means the greatest fixed point of the function from a pr~dicate P to a predicate Ax.A(P). We define a theory TID- as a theory T1Dv except for the 2 axioms of coinductive definitions of predicates. 3 Coinductive Predicates Definitions of We explain coinductive definitions of T1Dv and show some examples of formalization of streams by coinductive defini tions. or the fixed point of the function AP.AX.X = (Xo, Xl) & (Xo = 0 V Xo = 1) & P(Xl). (2) There may be many solutions P for (1). For example, AX.l. is one solution of (1), though it is not our intended solution. AX.l. is the least solution. Our intended solution is the greatest solution of (1) or the greatest fixed point of (2). 
Hence we have the solution in TIDv and it is represented as follows: BS == v P.AX.X = (Xo, Xl) & (Xo = 0 V Xo = 1) & P(Xl). Let 0 be f-Ls.(O, s). 0 represents the zero stream whose elements are all o. We can show BS(O) by coinduction (v2). Let C be AX.(X = 0) in (v2), then we have Vx(x = 0-+ X = (xo, Xl) & (xo = 0 V Xo -+ Vx(x = 0 -+ BS(x)). = 1) & Xl = 0) By definition of 0, Vx(x = 0-+ X = (xo, Xl) & (xo = 0 V Xo = 1) & Proposition 3.1. Let v be vX.Ax.A(X). Then Vx(v(x) f-t A(v)) holds. Xl = 0) holds and we have Vx(x = 0 -+ BS(x)). ( vI') Proof 3.2. By (vI), we get vex) -+ A(v). By letting C be Ax.A(v) in (v2), A(v) -+ vex) holds. 0 This proposition shows that vP.Ax.A(P) is the solution of the following recursive equation of a predicate P: P(x) f-t A(P). (v2) says that vP.Ax.A(P) is the greatest solution of Let X = 0" then we get BS(O"). The coinductive definitions of predicates play an important role also to represent predicates of properties of streams [3, 6]. We will define the extensional equality s ~ t for streams sand t. This equality can be represented by the coinductive definitions of predicates. ~ is the greatest solution of the following equation for a predicate P: P(x, y) f-t Xo = Yo & P(Xb Yd. Therefore ~ can be formalized in TIDII as follows: ~ == VP.AXY·Xo = Yo & P(Xl,Yl). this equation or the greatest fixed point of the functi~n AP.AX.A(P). Streams can be formalized by coinductive definitions [3]. Therefore we can formalize streams in TID II . We represent a stream by an infinite list (a, s) constructed by pairing where a is the first element of the stream, s is the rest of the stream. In this representation, if s is a stream, we can get the first element of s by So and the rest by Sl. We present an example of bit streams. A bit stream is a stream whose elements are 0 or 1. We will define a predicate BS(x) which means that x is a bit stream. When we write down a formula BS(x) in a naive way, BS itself occurs in the body of the definition as follows: BS(x) f-t x = (xo, Xl) & (xo = 0 V Xo = 1) & BS(Xl). BS is a solution P of the following equation for a predicate P P(x) f-t 4 Model of TIDv We will briefly explain semantics of TIDII by giving its intended model. We will use classical set theory and the well-known greatest fixed point theorem for model construction in this section. Theorem 4.1. (Greatest Fixed Point) Suppose S be a set, p( S) be a power set of S. If f : p( S) -+ p( S) is a monotone function, there exists a such that a E p( S) and 1. f(a) = a, 2. For any bE peS), if b c feb), then be a. 669 a is abbreviated as gfp(f). We will construct a model M' of TIDv extending an arbitrary model M of TID-. Our intended model of TID- is the closed total term model whose universe is the set of closed terms [1]. We denote the universe by U. We will define p 1= A in almost the same way as for a first order logic where A is a formula and p is an environment which assigns a first order variable to an element of U and a predicate variable of arity n to a subset of un and which covers all the free first order variables and all the free predicate variables of A. We present only the definition for the case (vP.>.x.A(P))(t). Define F as follows: Ixl = n, F : p(U n ) - t p(U n ), F(X) = {x E un I p[P := X] 1= A(P)}, where p[P := X] is defined as follows: p[P := X](P) = X, p[P := X](x) = p(x) if x is not P. Then p 1= (vP.>.x.A(P))(t) is defined as t E gfp(F). Note that F is monotone since a predicate variable P occurs only positively in A(P). Theorem 4.2. 
If TIDv f- A, then p 1= A for any environment p which covers all the free variables of A. Theorem 4.3. T1Dv is consistent. 5 q-Realizability Interpretation of TIDv We will explain motivation of our realizability. We start with a usual q-realizability and try to interpret (vP.>.x.A(P))(x). Let v be vP>.x.A(P) and then v(x) f-+ A(v, x) holds. We want to treat v(x) and A(v, x) in the same manner. So we require (e q v( x)) f-+ (e q A(v,x)). Therefore it is very natural to define (e q v(x)) as v*(e,x) where v*(e,x) is the greatest solution of a recursive equation for a predicate variable X*: X*(e, x) f-+ (e q A(v, x))[(r q v(y)):= X*(r, y)]. where [(I" q v(y)):= X*(r, y)] of the right hand side means replacing each subformula (I" q v(y)) by a subformula X*(r,y) in a formula (e q A(v,x)). We get the following definition of our realizability by describing syntactically this idea. Our realizability in this paper is an extension of Grayson's realizability. We can also define usual qrealizability of coinductively defined predicates in the same way as in this paper. Definition 5.1. (Harrop formula) 1. Atomic formulas ..1, N(a) and a = b are Harrop. 2. If A and B are Harrop, then A & B, C and (vP>.x.A)(t) are also Harrop. -t B, VxA Since a Harrop formula does not have computational meanings, we can simplify the q-realizability interpretation of them. Definition 5.2. (Abstract) 1. A predicate constant of arity n is an abstract of arity n. 2. A predicate variable of arity n is an abstract of arity n. 3. If A is a formula, >'Xl ... xn-A is an abstract of arity n. We identify (AXl ... xn.A)(tl , ... , t n) with A[XI t l , .. ·, Xn := t n] where [Xl := t l , ... , Xn := t n] denotes a substitution. Definition 5.3. ( q-realizability Interpretation) Suppose A is a formula, PI,".' Pn is a sequence of predicate variables whose arities are ml, ... ,mn respectively and Fl , Gi, ... , Fn , Gn is a sequence of abstracts whose arities are ml, ml + 1, ... ,mn , mn + 1 respectively. (e qPl, ... ,Pn[Fl , Gl, ... , Fnl Gn] A) is defined by induction on the construction of A as follows. We abbreviate qPl, ... ,pJFl , Gl , ... , Fn, Gn] as q', qP1, ... ,Pn,p[Fl, Gl , ... , Fn, Gn, F, G] as qp[F, G], Fl, ... ,Fn as F and Pl"",Pn as P. 1. (e q' A) == e = O&Ap[F] where A is Harrop. 2. (e q' Pi(t)) == Fi(t) & Gi(e, t). 3. (e Pi Q(t)) (l:::;i:::;n). q' Q(t) & Q*(e, t) where Q :t 4. (e q' A & B) == (eo q' A) & (el q' B). 5. (e q' A V B) == N(eo) & (eo = 0 - t (el q' A)) & (eo =/:. 0 - t (el q' B)). 6. (e q' A-tB) == (A-tB)p[F]&Vq((q q' A)-t (eq q' B)). 7. (e q' VxA(x)) == Vx(ex q' A(x) ). 8. (e q' :3xA(x)) == (el q' A(eo)). 9. (e q' (vX.>.x.A(X))(t)) == (vX*.>.ex.(e q'x[vp[F],X*] A(X)))(e, t) where v == vX.>.x.A(X). In the above definition, Pl, ... ,PJFl, Gl , ... , Fn, Gn] means a substitution. Our realizability interpretation is something like a realizability interpretation with a substitution. 670 6 Proposition 5.4. Let v vP.Ax.A(P). = 1. Vxr((r q vex)) f-+ (r q A(v))). 2. Axr.r q Vx(v(x) -+ A(v)). Proof 5.5. By the definition of q-realizability and (vI'). 0 Definition 5.6. For a formula A, a predicate variable P and a term I, we define a term (]'~,j by induction on the construction of A as follows: 1. A is a Harrop formula, then (]'~,j =Ar.O. 2. A = P(t), then (]'~,J = Ar.ltr. 3. A = Q(t), then (]'~,J = Ar.r if Q ¢ P. (]'~,j Vx(A(x) \ p,j( r ( (]'Al P,j q)) . = Arq.(]' A2 -+ B(x, Ix)) then P. = -+ 3yB(x,y)) effectively from the proof of the specification formula. Proposition 5.7. Let v vP.Ax.A(P). 
Then Aq'f.-lf.Axr.(]'~(~)(qxr) q Vx(C(x) -+ A(C)) -+ Vx(C(x) -+ Theorem 6.1. (Program Extraction) Suppose that we prove a specification formula Vx(A(x) -+ 3yB(x, y)) of a program in TIDv and we have a realizer j such that VX(A(x)-+(jx q A(x))). Then we can get a program I and a proof of - = (vQ.Ax.A1)(t), = (f.-lg.Axr.(]'~~g((]'~/r))t where Q ¢ with In this section, we give general program extraction theorem, stream program extraction theorem for TIDv and an example of program synthesis. Program synthesis by theorem proving techniques has been studied both in typed theories [2] and untyped theories [4]. For untyped theories, realizability interpretation is used as the foundation of program synthesis by theorem proving techniques. In Section 3, we showed that streams and programs which treat streams can be formalized in TIDv by the facility of coinductively definitions of predicates. In Section 5, we showed that realizability interpretation can be defined for TIDv and the interpretation is sound. Hence we can synthesize programs which treat streams by theorem proving techniques in TIDv using realizability interpretation. . We represent streams by infinite lists constructed by pairing. We represent a specification of a program by a formula: VX(A(x) 9. A Synthesis where x is an input, y is an output, A(x) is an input condition and B(x, y) is an input output relation. - \ (p,j P,J) 4 . A -A = 1 & A 2, th en(]'AP,J =Ar·(]'A1rO,(]'A2r1. 6 . A -A = 1 -+ A 2, then (]'AP,J Program Streams vex)) holds. We prove it in Appendix B. Proof 6.2. Since the specification formula is proved in TID v, by soundness theorem of q-realizability interpretation we have a realizer e such that e q VX(A(x) -+ 3yB(x,y)) holds. Let I be Ax.(ex(jx))o. Then the claim holds. 0 We can synthesize programs in the following steps: 1. We write down a specification formula. 2. We prove the specification formula in TIDv. Theorem 5.B. (Soundness Theorem) If TIDv f- A, we can get a term e from the proof of f- A and TIDv f- (e q A) holds where all the free variables of e are included in all the free variables of A. Proof 5.9. By induction on the proof of f- A. The case of the axiom (vI) is proved by Proposition 5.4. The case of the axiom (v2) is proved by Proposition 5.7. 0 3. We extract a program from the proof. The program extraction theorem says that the third step can be automated completely. Example 6.3. We show an example of the program which gets a stream of natural numbers and returns a stream whose each element is the element of the input stream plus one. 671 The predicate NS( x) which says that x is a stream of natural numbers can be represented in TIDv by the facility of coinductive definitions of predicates as follows: NS == VX.AX.X = (Xo, Xl) & N(xo) & X(XI). The input condition of the specification is a formula NS(x). The input output relation of the specification is a formula ADD1(x,y) which is defined as follows: ADDl == VX.AXY.Yo = Xo + 1 & X(XI,YI). The specification formula is: Vx(NS(x) -t 3yADD1(x, y»). We have one problem for this program synthesis method. The coinduction cannot be applied to the part Vx(NS(x) - t ... ) in the above example. We cannot prove 3y AD D1( x, y) by the coind uction in general. Therefore the realizer of the coinduction cannot give a loop structure for the program. On the other hand, a realizer of the induction principle plays an important role for this approach of program synthesis since the realizer corresponds to a loop structure of a program [4, 7]. 
Therefore we need the new method by which a realizer of the coinduction also corresponds to a loop structure and is useful. Then we need more specialized program extraction method for programs with streams in which the coinduction is useful. We give one solution for this problem by the next theorem. We put 2 restrictions on the theorem: One is that the input condition A(x) must be the form (VX.AX.X = (Xo, Xl) & A(xo) & X(Xl)(X) for some A. The other is that the input output relation B(x, y) must be the form (vX.Axy.B(x, Yo) & X(XI' Yl)(X, y) for some B. Th~se restrictions require an input condition and an input output relation are uniform over data and they are natural when we suppose that an input X and an output yare both streams. Theorem 6.4. (Stream Program Extraction) Suppose that the specification formula is Vx(A(x) Then we define BO == vX.Ax.3zB(x, z) & X(Xl). If we have e such that -t BO(x)), we can get a term F such that Vx(A(x) -t -t 2. We prove the corresponding formula Vx(A(x) BO(x» in TIDv. -t 3yB(x, y»). 3. We extract a program Ax.filter(ex(jx)) from the proof where e is a realizer of the corresponding formula Vx(A(x) - t BO(x»). In the second step, we can apply the coinduction to prove the part BO(x) since BO(x) is defined by coinductive definitions. Therefore a realizer of the coinduction can correspond to a loop structure of the program. Example 6.5. We treat the same example as above again. The specification formula is a formula Vx(NS(x)-t3yADD1(x, y)). Hence the formula ADD10(x) is: ADD10 == vX.Ax.3z(z B(x, Fx)) where filter == Jlf.AX.(Xoo, fXI), F == Ax.filter(ex(jx)). We prove it in Appendix C. By this theorem, we can synthesize programs in the following steps: = Xo + 1) & X(XI). (3) Therefore the corresponding formula we must prove is: Vx(NS(x) -t ADD10(x )). (4) If we prove this formula in TID v, we can get the program which satisfies the specification by stream program extraction theorem. The conditions of the theorem hold for this case. We can put j == AX.JlS.(O, s) since Vx(NS(x) -t (Jls.(O,s) q NS(x))). We prove (4) in the following way here: Firstly, we prove Vx(NS(x) -t 3z(z = Xo + 1) & NS(XI)). (5) This is proved by letting z be Xo + 1. Secondly, by letting C be NS in (v2) for ADDlo, we have Vx(NS(x) - t 3z(z = xo + 1) & NS(XI)) - t -t 3yB(x, y», A == VX.AX.X = (xo, Xl) & A(xo) & X(XI), B == vX.Axy.B(x, Yo) & X(Xb yd and we have a termj such that Vx(A(x)-t(jx q A(x))). e q Vx(A(x) Vx(A(x) 1. We write down a specification formula Vx(NS(x) -t ADD10(x)). (6) Finally, by (5) and (6), we get (4). We calculate realizers corresponding to the above proofs as follows: The realizer corresponding to the proof of (5) is: el == Axr.( (xo + 1,0), Tll)', el q Vx(NS(x) - t 3z(z = Xo + 1) & NS(XI)). The realizer corresponding to the proof of (6) is: e2 e2 == Aq.Jlf.AXT.(J"(qxr), q Vx(NS(x) Vx(NS(x) -t 3z(z = Xo ADD10(x» -t + 1) & NS(XI)) - t where (J" == Ar.((Too, TOI), fXITI). The realizer corresponding to the proof of (4) is: e == e2ell e q Vx(NS(x) - t ADD10(x)). 672 Appendix We get e = p,j.Axr.((xo + 1,0),jxlrn). The extracted program is: Fx filter(ex(jx)) A = filter(Jx(p,s.(O,s))) = (p,g.AX.(Xo + 1, gXl))X where j == p,f.Axr.((xo + 1,0),jxlrn). This is the program we expect. Remark that the realizer e2 of the coinduction (6) gives a loop structure of the program F. Axioms and Inference Rules of TIDv The logical axioms and inference rules are the same as the ones of a usual intuitionistic logic. 
Axioms for Equality: Vx(x=x) (El) Vx, y(x = y & A(x) --+ (E2) A(y)) Axioms for Combinators: Axioms for Pairing: I would like to thank Mr. Satoshi Kobayashi and Mr. Yukiyoshi Kameyama for careful comments. I'm deeply grateful to Prof. Masahiko Sato for invaluable discussions and comments. Vx,y(Po(pxy) VX,y(Pl(PXY) [1] M. Beeson, Foundations of Constructive Mathematics (Springer, 1985). N(O) (Nl) Vx(N(x) --+ N(SNX)) Vx(N(x) --+ PN(SNX) = x) Vx(N(x) --+ SNX =I- 0) A(O) & Vx(N(x) & A(x) --+ A(SNX))--+ Vx(N(x) --+ A(x)) (N2) (N3) (N4) Vx, y, a, b(N(x) & N(y) & x = y --+ dxyab Vx, y, a, b(N(x) & N(y) & x =I- y --+ dxyab ics with the Nuprl Proof Development System (Prentice-Hall, 1986). B (Dl) (D2) (r qp[F, AyX.(y q C(x))] A)--+ ((7~,fr qp[F,Ayx.3r((r q C(x)) & y = jxr)] A). (2) If a predicate variable P occurs only negatively in [6] R. Milner, Communication and Concurrency (Prentice Hall, 1989). Proof B.2. [9] M. Tatsuta, Monotone Recursive Definition of Predicates and Its Realizability Interpretation, Proceedings of Theoretical Aspects of Computer Software, LNCS 526 (1991) 38-52. = a) = b) Lemma B.I. (1) If a predicate variable P occurs only positively in a formula A, a formula A, [8] M. Tatsuta, Realizability Interpretation of Greatest Fixed Points, Manuscript (1991). (N5) Proof of Soundness Theorem [5] S. Kobayashi, Inductive/Coinductive Definitions and Their Realizability Interpretation, Manuscript (1991). [7] M. Tatsuta, Program Synthesis Using Realizability, Theoretical Computer Science 90 (1991) 309-353. (PI) (P2) . Axioms for d: [2] R.L. Constable et al., Implementing Mathemat- [4] S. Hayashi and H. Nakano, PX: A Computational Logic (MIT Press, Cambridge, 1988). = x) = y) Axioms for Natural Numbers: References [3] P. Dybjer and H.P. Sander, A Functional Programming Approach to the Specification and Verification of Concurrent Systems, Formal Aspects of Computing I (1989) 303-319. (Cl) (C2) Vx,y(Kxy = x) Vx, y, z(Sxyz = xz(yz)) Acknow ledgments (r qp[F, Ayx.3r((r q C(x)) & y = jxr)] A)--+ ((7~,J r qp[F, AyX.(y q C(x))] A). We prove (1) and (2) simultaneously by induction on the construction of A. 0 Proof B.3. (of 5.7) Let v == vP.Ax.A(P). Suppose Vx(C(x) --+ A(C)), q q Vx( C(x) --+ A( C)) and let j == p,j.Axr.(7~(~)(qxr). We show j q Vx(C(x) --+ v(x)). 673 Let v*(r, x) == (r q v(x». It is sufficient to show Vxr((r q C(x» --t v*(fxr, x». This is equivalent to Vxy(3r((r q C(x» & y = jxr) --t v*(y,x». By (v2), it is sufficient to show Vxy(3r((r q C(x» & y = jxr) --t (y qp[v, Ayx.3r((r q C(x» & y = jxr)] A(P)). This is equivalent to Vxr((r q C(x»--t (fxr qp[v, Ayx.3r((r q C(x» & y = jxr)] A(P»). Fix x and r and assume r q C(x). We show A(XI) & (filter(fx)h = filter(gxI»)' We will prove it. Fix x and j and assume that Vx(A(x) --t (fx q BO(x», (7) (8) (9) A(x). By (8) and (9), (fx q BO(x» holds. Hence holds. Therefore (filter(fx»o 13(x, (filter(fx»o) holds since = (fx)oo. Put 9 be Ay.(f(XO, y) h. We will show Vy(A(y) --t (gy q BO (y )) ). Fix y and assume that A(y). By the By the assumption about q, qxr q A(C). Hence definition of A, A(x) qxr qp[C, AyX.(y q C(x»] A(P). By positivity and Vx(C(x) --t v(x», qxr qp[v, AyX.(y q C(x»)] A(P). A(P). = jxr)] f-7 X = (xo, Xl) & A(xo) & A(XI) and By Lemma B.l, O"~(~)(qxr) qp[v, Ayx.3r((r q C(x» & y = jxr)] A(P). D C 13(x, (filter(fx»o) & 3g(Vx(A(x) --t (gx q BO(x») & ((fX)OI q 13(x, (fx)oo» & ((fxh q BO(XI») (10) jxr qp[v, Ayx.3r((r q C(x») & y = jxr)] A(P). 
By jxr = O"~(~)(qxr), we have jxr qp[v, Ayx.3r((r q C(x) & y By only rules of NJ, it is equivalent to Vxj(Vx(A(x) --t (fx q BO(x») & A(x)--t Proof of Stream Extraction Theorem Lemma C.l. Suppose that A == VX.AX.X = (xo, Xl) & A(xo) & X(XI), B == VX.AXy.13(X, Yo) & X(Xll YI), BO == vX.Ax.3z13(x, z) & X(XI)' Then Vj(Vx(A(x) --t (fx q BO(x»)--t Vx( A( x) --t B( x, filter(f x»») holds. Proof C.2. By only rules of NJ, the above goal is equivalent to Vxy(3j(Vx(A(x) --t (fx q BO(x») & A(x) & y = filter(f x» --t B(x, y). By (v2), it is sufficient to show Vxy(3j(Vx(A(x) --t (fx q BO(x») & A(x) & y = filter(fx»--t 13(x, Yo) & 3g(Vx(A(x) --t (gx q BO(x») & A(XI) & YI = filter(gxl»)' A( (xo, y)) f-7 A( xo) & A(Y) hold. By this and (9), A( xo) holds. Hence A( (xo, y) holds. Combined it with (8), we get (f(xo, y) q BO((xo,Y)). Hence ((f(xo,y)h q BO(y) and (gy q BO(y») hold. Therefore we get Vy(A( x) --t(gy q BO(y»). By (9), A( Xl) holds. Since, in general, (filter( s) h = filter(sl) holds, we get (filter(fx»)l = filter((fxh) = filter(gxl)' Therefore (7) holds. D Proof C.3. (of Theorem 6.4) By the aElsumptions and the definition of qrealizability, Vx(A(x) --t (exUx) q BO(x») holds. Letting j be Ax.ex(jx) in Lemma C.l, we get Vx(A(x)--t B(x,Fx). D PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 674 MLOG: A STRONGLY TYPED CONFLUENT FUNCTIONAL LANGUAGE WITH LOGICAL VARIABLES Vincent Poirriez* Universite de Reims INRIA-Ecole Normale Superieure PARIS, FRANCE Abstract A new programming language called MLOG is introduced. MLOG is a conservative extension of ML with logical variables. To validate our concepts, a compiler named CAML Light FL UO was implemented. Numerous examples are presented to illustrate the possibilities of MLOG. The pattern-matching of ML is kept for Acalculus bindings and an unification primitive is introduced for the logical variables bindings. A suspension mechanism allows cohabitation of pattern-matching and logical variables. Though the evaluation strategy for the application is fixed, the order for evaluation of the parts of pairs and application remains free. MLOG programs can be evaluated in parallel with the same result obtained irrespective of the particular order of evaluation. This is guaranteed by the Church Rosser property observed by the evaluation rules. As a corollary, a strict A-calculus with explicit substitutions on named variables is shown to be confluent. A completely formal operational semantics of MLOG is given in this paper. 1 Introduction Many attempts have been made at integrating functional and logical tools in the same language. It actually seems worthwile to combine the strengths of the two paradigms, allowing the programmer to choose the most appropriate tool to resolve his problem. The approach we have followed is to add "logical" tools to a well-known strongly typed functional language: ML. To validate our ideas and to demonstrate that MLOG is a realistic proposal, we have implemented a compiler for MLOG named "CAML Light FL UO" . It is an extension of the CAML Light system of X.Leroy[Leroy 90]. Logical variables and unification serve two goals in logical languages: to handle partially defined values, and to provide a resolution mechanism. The implementation of logical variables and unification is a required step to "Projet Forme! BP 105 Domaine de Voluceau 78153 Rocquencourt Cedex, FRANCE. 
poirriez@margaux.inria.fr implement a resolution mechanism, so we bypass that second goal and focus on the first one. MLOG is an extension of ML with built-in logical variables, instantiable once, and unification. We allow a fruitful cohabitation of logical variables and ML pattern matching by introducing a suspension mechanism: when an application cannot be evaluated for lack of information, the application is suspended. In designing MLOG we strove to obtain a conservative extension of ML: pure ML programs are not penalized by the extension. This result is obtained by limiting the domain of logical variables and suspensions to specially declared logical types. Moreover, MLOG inherits from ML a strong type system and a safety property for the execution of well-typed programs, so the programmer does not waste energy checking types. In this article we trace the execution of programs illustrating that synchronisation algorithms, demand-driven computation, and algorithms using potentially infinite data structures or partially instantiated values are easily written in MLOG. We then focus on the confluence property. In MLOG the evaluation strategy for an application is strict: we impose the evaluation of the argument before reducing the application. Nevertheless, some freedom remains in the order of evaluation of a term, for example between the two parts of an application or of a pair. MLOG is therefore independent of such implementation choices and can be implemented on a parallel machine. As we fix the strategy for the evaluation of applications, we can name variables without risking clashes. A complete operational semantics is given in the appendix. The subset of these rules limited to the functional part is a strict λ-calculus with explicit substitutions and named variables that satisfies the Church-Rosser property. That calculus is a very simple formalism and, being confluent, is a good candidate for describing any implementation of a strict λ-calculus, even a parallel one.

2 MLOG syntax and examples

We describe here the syntax added to ML. As MLOG is an extension of ML, all programs of ML are programs of MLOG. For clarity, we limit ourselves to a mini-ML. All examples are produced by a session of our system CAML Light FLUO. Note that # is the prompt and ;; the terminator of our system.

2.1 Syntax

The language we consider is λ-calculus with pattern matching, concrete types (either built-in, such as int or string, or declared by the user), constructors, the let construct and the conditional. We first define the set P of programs of MLOG. We assume the existence of a countable set Var of term variables, with typical elements x, y, and a disjoint countable set C of constructors, with typical elements c. Some constructors are predefined: integers, strings, booleans (true, false) and (), the element of type unit. In the following, i ranges over integers and s over strings. The syntax of patterns, with typical element p, is:

p ::= x | c | (p1, ..., pn) | c p

As in ML, we limit ourselves to linear patterns. The syntax of programs, with typical elements a, b, is:

a ::= x | c | a b | (a1, ..., an) | let x = a in b | a; b | (function p1 -> a1 | ... | pn -> an) | undef | unif

a; b is the ML notation for a sequence: it means evaluate a, then evaluate b and return the value of b. The last two constructs are specific to MLOG: undef is a generator of fresh logical variables; unif is the unification primitive. let_var u in ... is syntactic sugar for let u = undef in ....
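For readers coming from plain ML, the flavour of undef and unif on a single logical variable can be approximated by a write-once cell, as in the rough OCaml sketch below. This is only an analogy with hypothetical names (lvar, value_of): it has no real unification, no rational trees and, crucially, none of MLOG's suspension mechanism.

exception Unify

(* A write-once cell: created unbound, bound at most once, readable. *)
type 'a lvar = { mutable contents : 'a option }

let undef () : 'a lvar = { contents = None }

let unif (v : 'a lvar) (x : 'a) : unit =
  match v.contents with
  | None -> v.contents <- Some x
  | Some y -> if y <> x then raise Unify   (* a second, incompatible binding fails *)

let value_of (v : 'a lvar) : 'a option = v.contents

let () =
  let u = undef () in
  unif u true;
  assert (value_of u = Some true);
  (try unif u false; assert false with Unify -> ())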
2.2 Types

In MLOG, the programmer has to declare specially the types that may contain undefined objects (that is, logical variables and suspensions); the notion of logical type is introduced. We assume given a countable set of type variables TVar, with typical elements 'a, 'b, a disjoint countable set of variables over logical types LTVar, with typical elements 'a?, 'b?, and two countable sets of type constructors with typical elements ident and lident. The sets of logical types L, with typical element τi, and of types T, with typical element ti, are recursively defined by:

τi ::= 'a? | [ti] lident
ti ::= τi | bool | int | string | unit | ti -> tj | ti * tj | [ti] ident

Note that L is a strict subset of T. Expressions to declare new types are:

type ['a, ..., 'k] ident = c [of ti] | ... | c' [of tj]
type logic ['a, ..., 'k] lident = c [of ti] | ... | c' [of tj]

where [ ] surround optional expressions. A logical type is declared with the new keyword type logic. The type void below has a unique value void, and logical variables of type void may be declared; void is isomorphic to the type unit except that no logical variable can be declared in unit. A value of the type Bool below is True, False, or a free logical variable that may later be instantiated to either True or False.

#type logic void = void;;
Type void defined.
#type logic Bool = True | False;;
Type Bool defined.

The following rules govern type variable instantiations: (1) 'a may be instantiated by any type (including 'b?); (2) 'a? may be instantiated by any logical type; (3) 'a? may not be instantiated by a non-logical type. We write "a : ti" when the program a has type ti. Thus the set of MLOG programs is in fact the subset of well-typed programs of P defined by the familiar ML type system; we just have to specify that (1) undef : 'a? and (2) unif : 'a -> 'a -> void. Fortunately, as far as types are concerned, logical variables and assignable constructs are quite close, so we have adapted to logical variables previous work done for typing assignable objects in ML. We have directly applied the idea of Pierre Weis and Xavier Leroy [LeroyWeis 91] and, using their notion of cautious generalization, we obtain an extension of the ML type system to logical variables that is sound:

Theorem 1. No evaluation of a well-typed program can lead to a run-time type error.

Thus CAML Light FLUO has a type-checker that infers and checks the types of programs.

2.3 Examples

We give below very simple examples to illustrate the semantics of unification and logical variables in MLOG. First, logical variables are instantiable once; when a unification fails, the exception Unify is raised:

#let (u:Bool) = undef;;
Value u : Bool
u = ?
#unif u True; unif u False;;
- : void
Uncaught exception: Unify
#u;;
- : Bool
- = True

CAML Light FLUO prints "?" for a free logical variable. Rational trees are allowed: unif does not perform any occur-check, and it does not loop when unifying rational trees. The type 'a stream below implements potentially infinite lists.

#type logic 'a stream = Nil | St of 'a * 'a stream;;
Type stream defined.
#let (u:int stream) = undef;;
Value u : int stream
u = ?
#unif u (St(1,u)); u;;
- : int stream
- = St (1, St (1, St (1, St (1,Interrupted.

The printing of u was interrupted by a system break.
At that point we can use classical technics used in the logical languages, see for example in the appendix the classical functional quicksort program, except that difference lists are used instead of lists to improve the concatenation of sorted sublists. 2.4 Suspensions: an intuitive semantics Consider first the example below: #let neg = function True -) False IFalse -) True;; Value neg : Bool -> Bool #let b,exp = let_var u in (u, neg u);; Value b : Bool Value exp : Bool b = ? exp b is a new free logical variable of type Bool. The application cannot match u with True or False: u is free. So what is the meaning of exp? The answer is: the application neg u is suspended. Thus, exp is a suspension of type Booll. A suspension is a first class citizen in MLOG. It may be handled in data structures, and used in other expressions. #let exp' = unif exp False;; Value exp' : void exp' = ... Since exp is a suspension, MLOG cannot perform the unification of exp with False. Therefore this unification is also suspended 2 • Let us now instantiate b with True, and look at exp and exp' . #unif b True; exp,exp';; Value - : Bool * void #let (a,b,e) = let_var a,b in (a,b,(function True ->(unif a True))b);j Value a : Bool Value b : Bool Value e: void b =? e = ... e is suspended waiting for the instantiation of b. #unif b True;; Value - : void - #a;; Value - : Bool - = void INote that CAML Light FLUO prints suspensions as " ... ". 2That is why the'type of the result of unif has to be a logical type. We do not want to have suspension in a non logical type. = True The example above illustrates the fine control on evaluation allowed by the suspension mechanism. The application is performed and then a is instantiated only when b is instantiated. A confluence result 3 To give an operational semantics for MLOG we have to deal with bindings of .A.-calculus variables, bindings of logical variables and suspensions. We give here a simple formalism that allows us to keep named parameters and we show that this calculus is strongly confiuent 3 . In this section we neglect types. 3.1 A strict calculus with environment We store bindings of parameters in environments. We call EA the set of terms with environments. As our calculus is strict, we specialize a subset Val of EA which is the set of the values handled by the language. Typical elements of Val and EA are respectively noted v and t. e ::= [] I (x,v)::e v ::= c I c(v) I (v,v') t ::= 3.2 (False,void) We have to clarify when a suspension is awakened. Awakening a suspension could be delayed until it is actually needed. We must define when such an evaluation is needed: a =? As b is instantiated, e can be awakened. If we choose to wake up a suspension only if its value is needed, e remains suspended and then a remains free. If the value of a is needed, nothing indicates that the evaluation of e will instantiate a. This motivates our choice to wake up all suspended evaluations that can be awakened. Another motivation is that, if an expression is suspended, it is because its evaluation was needed and unfortunately was stopped by lack of information. So if we look at a: I (function ... ).e c I c(t) I (t,t') I t(t') la.e Logical variables, substitutions and suspensions Now we have to extend the set Val with logical variables. We assume the existence of a countable set U disjoint with V and C with typical element u( i), distinct logical variables have distinct indexes. We call LVal and ELA the obtained sets of values and terms with environments. 
To manage the bindings of logical variables we define substitutions as functions from U to ELA. We will use greek letters to note substitutions. We call the domain of (J' and note dom( (J') the set {u(i) s.t. (J'(u(i)) -=I u(i)}. We will note (J' 0 a the composition of substitutions. The MLOG pattern matching algorithm has to deal with logical variables. It has to 3Recall that if no strategy for application is imposed, name clash may occurs. To avoid that problem, the names of variables can be replaced by numbers "iI. la De Bruijn"[AbadiCaCuLe 90, HardinLevy 90] 677 access to the pointed value when it checks a bound variable, it fails with Unknown when it tries to match a free logical variable with a construct pattern. We define the match of a term t with a pattern pat in the substitution 0- and note ~cr(pat, t) as the list of appropriate bindings of parameters of pat. Recall that patterns are linear. We define now a sequential pattern matching without entering into the optimization of the algorithm4. if where t is the term to reduce. The substitution 0- stores the bindings of unified logical variables and updated suspensions. The valuation a stores the suspensions (recall they are bound to u(j) with j < 0). The substitution r stores the suspensions of which evaluations are running. We use the classical notation ~ and .!!:., for reflexive transitive closure of - t and for derivations of length n. We first have two lemmas that say that no term of the form (a.e).e' is produced and that the term component of a normal form is a value. Lemma 1 Let a be a program and t, 0-, a, r >. For all subterms of t oj the form t'.e, i' is a program. < a.[], 0, 0, 0 >.!!:.,< Lemma 2 Let a be a program and < a.[], 0, 0, 0 >~< i, 0-, a, r > such that < i, 0-, a, r > is a normal form. Then t is a value. We can deduce from these lemmas that all bindings in 0- bind a variable with a value. Let us remark now that if no suspension rule is applied, as we do not reduce under a A and we impose a strict calculus we have strong confluence for our reduction rules. 678 < t, 0-, a, r >-t< tll 0-1, a, rl > and < t,o-,a,r >-t< t2,0-2,a,r2 > two reduction using respectively rules r 1 and r2 with ri not a suspension rule. Then we have by the application of respectively r2 and r1: < tl, 0-1, a, rl >-t< t3, 0-3, a, r3 > and < t2, 0-2, a, >-t< t3, 0-3, a, > Proposition 1 Let r2 r3 An important corollary of that result is that if we restrict ourselves to the functional subset of MLOG, we have describe a strong confluent calculus with explicit substitutions and named variables. That calculus is rather simple (all that concerns logical variables and suspensions is unnecessary) and describes all implementations of a strict >.-calculus, even a parallel one. Remark that -t is not strongly confluent on the whole language. That is illustrated by the example below where the choice is between UniIT and Susp and the diagram cannot be closed in one step as even if U niIT is chosen after Susp waking up the suspension remains to be done. < ((fun c -t c').[] u(l), unif u(l) c), 0, 0, 0 > We can see the use of a rule Susp, ASusp or USusp as the translation of subterm from the term to r. From a reduction point of view we can say that these rules do not work. Thus the idea is to define an equivalence between four_uples < t, 0-, a, r > which is stable for these suspension rules and then show the strong confluence of -t up to that equivalence. Definition 1 < t, 0-, a, r >==< t', 0-', a', r' > iff 1. 
there exists a permutation P over positive variable index such that (0- 0 a 0 r)*(t) = P(o-' 0 a' 0 r')*(t') 2. and for all u(i) in dom(o-) with i > 0, (0r)*( u(i)) = P( 0-' 0 a' 0 r')*( u(P( i))) 0 a 0 3. and for all u( i) in dome a) U dom(r) or there exists j < 0 such that u(j) in dome a') U dom(r') and (o-oaof)*(u(i)) = P(O"'oa'or')*(u(j)), either there exists a subtermt~ oft' such that (o-oaor)*(u(i)) = pea' 0 a' 0 r')*(tD and vice versa for all u( i) in dome a') U dom(f') or t = t' = failwith(s) Thus we have verified the Church Rosser property (the proof is in appendix C): Theorem 3 If < t, 0-, a, r then it is unique up to > has a normal form for-t == Remark that if we add types as defined in the section above, the rules have not to be modified and the result holds. 4 MLOG: a conservative extension ofML The fact that the type of undef is ' a? ensures that no logical variable occurs in a non-logical type. That is not enough to ensure that no suspension of a non-logical type is built. Fortunately, we handle type information when we compile the pattern matching. Thus we have the following rules for the application: Let f be a function of type tl -t t2: (1) if type tl is a non-logical type, then do not do any test to check if the argument is a free variable or a suspension. (2) if type tl is a logical type, then (21) first, test if the argument is a bound logical variable or an updated suspension, and access the bound value. (22) if type t2 is a nonlogical type, test if the argument is a free variable or a suspension. If so, raise failure Unknown. (23) if type t2 is a logical type, test if the argument is a free variable or a suspension. If so build and return the appropriate suspension. Example: #type logic 'a partial = P of 'a;; Type partial defined. #(function (P x) ->x) undef;; uncaught exception Unknown Theorem 4 Let a be a well-typed program. The evaluation of a cannot build a logical variable or a suspension of a non-logical type. We can now deduce that MLOG is a conservative extension of ML as pure ML programs need not know for the extension. However, it is clear that with that rule of failure, our calculus is no longuer Church Rosser. To keep that property, we must not use functions from a logical type to a non-logical type. Let call M LOG* the subset of MLOG that does not contain such functions. Thus, we have the following result. Proposition 2 The relation -t is confluent on MLOG*. Remark: The counterpart of the conservative property of MLOG is the need to be cautious with logical variables and "functional types". First, for any instances of' a and 'b the type' a -t' b cannot include a logical variable as it is a "pure ML" type. Anyway, it is correct to have logical variables of type (int -t int)partial as illustrated below. #let app (P h) (P x) = P (h x);; Value app:('a->'b)partial->'a partial->'b partial #let (g: (int -> int)partial)=undef;; Value g : (int -> int) partial g = ? #let e2 = app g (P 2);; Value e2 : int partial e2 = ... 679 #unif g (P (fun x -> x*x»;; - : void - = void #e2; ; - : int partial 5 P 4 Conclusion We have defined MLOG as an extension ofML. We have shown that it verifies a Church Rosser property and then it may be parallelized or used to simulate parallel processes. Such processes can communicate with each other through shared logical variables and the suspension mechanism allows synchronization. 
Partial data are handled by MLOG, for example potentially infinite lists can be implemented by the use of free logical variables for the tail of the structure (see example in appendix). MLOG includes a suspension mechanism, let us now compare it to some other proposals of integration that have made a similar choice. MLOG is close to the language Qute defined by M.Sato and T.Sakurai in [SatoSakurai 86]. However, it differs from it in the following points: (1) its evaluation strategy ensures that the evaluation of a suspended expression will be tried only when needed information is provided; (2) the reduction of an application is allowed even if a subexpression of the argument is suspended, the only condition is that pattern matching succeeds, in that case the binding of the suspension by a logical variable and the storage in a avoid duplication of that suspension. MLOG is also close to GHC ofK.Veda [Ueda 86], the main difference (except for typing point of view) is that MLOG does not have non-determinism for rule selection and that we have preferred to keep the functional formalism in place of the predicate one as selection of rules is done by pattern matching. However, determinist GHC programs are easily translated in MLOG6. The use of a suspension mechanism and the cohabitation of logical variables and functions are common to Le Fun of H.Ait Kaci[Ait Kaci 89] and MLOG. Here the main differences are that Le Fun provides a resolution mechanism based on backtracks and that MLOG is strongly typed. Perhaps the main difference between MLOG and these related works is that MLOG is a conservative extension of ML. We demonstrate that the type system of ML can be extended to MLOG and we gave a safety property for well typed programs. As a side effect, we have described an operational semantics for strict Acalculus which uses names for parameters and verifies the Church Rosser property. Therefore it can be used to 6The author has traduced all programs given by G.Huet in [Huet 88], he found that the use of types and of a functional formalism lead to more clear programs. describe any interpreter of strict A-calculus, even parallel one. If it seems desirable, further work can be done to provide a resolution mechanism in MLOG. Note that the exhaustive search transformation described by K.Ueda in [Ueda 86] is applicable. We hope that MLOG is an attractive extension of ML as from a "logical paradigm" point of view it allows handling incomplete data structures and controlled parallel evaluation with the improvement of the ML type system. And from a "functional paradigm" point of view, it respects functional programs with the improvement of partial data and a fair control mechanism. Acknowledgments: We would like to thanks all members of LIENS-INRIA Formel project for helpful discussions. In particular Therese Hardin for her accurate suggestions to improve our formalism and demonstration. A Appendix: MLOG programs The program below is the classical functional quicksort program, except that difference lists are used instead of lists to improve the concatenation of sorted sublists. This is done by the use of the same variable r in both recursive calls of qsortrec. 
#let partition order x = let rec partrec = function Nil -> Nil,Nil ISt(h,t) -> let infl,supl = partrec t in if order(h,x) then St(h,infl),supl else infl,St(h,supl) in partrec ;; Value partition ('a*'b->bool)->'b->'a stream->'a stream*'a stream #let quicksort order 1 = let rec qsortrec = function (Nil,result,sorted) -> (unif result sorted); result I(St(h,t),presult,sorted) -> let infl,supl = partition order h t in let_var r in (qsortrec(supl,r,sorted); qsortrec(infl,presult,St(h,r») in qsortrec (l,undef,Nil) ;; Value quicksort: ('a*'a->bool)->'a stream->'a stream The following example illustrates the use of potentially infinite lists and demand driven computation. The confluence property allows to parallelize the evaluation of nested applications in the definition of the Hamming sequence of integers of the form 2i * 3j * 5k [Dijkstra 76J. #let mult (P X,P y) = P(x*y)jj Value mult : int partial * int partial -> int partial #let rec times (u,St(v,r» = St(mult(u,v),times(u,r);; Value times: int partial*int partial stream->int partial stream #let rec merge (St(P x,s),St(P y,r») = if xy then Step y,merge (St(P x,s),r» else Step x, merge(s,r»jj Value merge: int partial stream*int partial stream -> int partial stream #let rec copy_stream (St(a,b)as s) (St(h,t» = unif a hj copy_stream b tj Sjj Value copy_stream: 'a stream -> 'a stream -> 'a stream 680 #let Hamming = let_var r in copy_stream (St(P l,merge(merge(times(P 2,r),times(P 3,r», times(P 5,r»» rj r· . " Value Hamming : int partial stream Hamming = ? #let rec increase_stream st = function o -> st I n -> let_var tail in unif st St(undef,tail)j increase_stream tail (n-l) jj Value increase_stream : 'a? stream -> int -> 'a? stream #increase_stream Hamming 9j Hamming;; Value - : int partial stream - =St(P l,St(P 2,St(P 3,St(P 4,St(P 5,St(P 6,St(P 8, Step 9,St(P 10,?»»))) B Env < x.(x, t) :: _, a, 0:, r >~< t, a, 0:, r > EnvO < x.(y, t) >~< x.e, a, 0:, r > Const < c.e,a,o:,r AEnv < (t t').e,a,o:,r UEnv < (unif t t').e,a,o:,r PEnv < (t,t').e,a,o:,r PairlF Pair2F < t',a,o:,r >~< failwith(s), a, 0:, r > < (t,t'),a,o:,r >~< failwith(s),a,o:,r > > Pairl Lemma 3 If < t,CT,a,r of a suspension rule then Proposition 3 If < t',CT',a',r' > by application < t,CT,a,r >=< t',CT',a',r' > tl,CT1,al,rl >~< t~,CT~,a~,r~ > Proof: We carefully discuss one case, others are similar: >~< >~< > (unif t.e t'.e),a,o:,r (t.e,t'.e),a,o:,r > > Susp < t, a, 0:, r > is in ~ normal form. Cs =k a*(f) = (fun PI ~ aII···1 pn ~ an).e, ~< u(-n),a, (u(-n), a*(f) t):: o:,r > and C s +- (k + 1) ASusp < t, a, 0:, r > is in ~ normal form. c.=n a*(f) = u(i) < f t, a, 0:, r >~< u(-n), a, (u( -n), u(i) t) :: 0:, r > and c. +- (n + 1) Fail < t, a, 0:, 0> is in ~ normal form. ~< failwith(Pattern), a, 0:, r > UnifT < t, a, 0:, 0 > and < t', a, 0:, 0 >are in ~ normal form uniju(t, t') a' Let L = 0 if a' = a or a'(u(i» = u(j) for all u(i) E dorn(a')\dom(a) and L = queueu,,,,(u(i» in other cases < unif t t', a, 0:, r >~< void, a' , o:\L >, L U r > UnifF < t,a,0:,0 > and < t',a,0:,0 > are in ~ normal form unifu(t, t') = fail < unif t t',a,o:,r >-.< failwith(Unif),a,o:,r >~< u(c),a,o:,r > = USusp < t, a, 0:, 0 > and < t', a, 0:,0 > are in ~ normal form unifu(t, t') = 8usp(u(i», C s = n < unif t t', a, 0:, r >~ < u(-n),a,(u(-n),uniJ t t'):: o:,r > and c. 
+- (n + 1) Aw u(i) E dorn(r) and r(u(i» = t < t,a,0:,0 >~< t',a',o:',0 > and < t' , a' , 0:' , 0 > not in normal form < to,a,o:,r >~< to,a',o:',r[u(i) +- t'] > >~< by application of a rule distinct of a suspension rule, and if < t l , CTI, al, r 1 >=< t 2, CT2, a2, r 2 > then we have < t~,CT~,a~,r~,> such that < t2,CT2,a2,r2 >~< t~,CT~,a~,r~ > and < t~,CTLa~,n >=< t~,CT~,a~,r~ > (t.e t'.e),a,o:,r < t, a, 0:, 0 > is in ~ normal form a*(f) = (fun PI ~ aII···1 pn ~ an).e, ~< ai.ei @ e,a,o:,r > Demonstration of theoreme 3 Let us give preliminary results. >~< (3 Figure 1: Structural rules C c,a,o:,r > < undef.e,a,o:,r and c +- (c + 1) Pair2 We assume that we ha.ve a function queue such that queueu,cxu(i) returns all the suspensions in a waiting for instantiation of u(i). The rule DVar uses a counter c that is increased each time a new logical variable is created. c is initially at 1. The rules Susp and USusp use an other counter C s dedicated to suspensions also initially at 1, they increase a with the new suspension. The rules UniIT and AwUpd increase CT with the new bindings and increase r with the suspensions waiting for these instantiations or update. Note that we remain free to choose the order of evaluation of binary constructs as for Ell.. (We give in figure 1 the rules for pairs, rules for unification and application are similar.). Moreover, the order of evaluation of terms bound in r is also free (see rule Aw). >~< DVar Reduction rules < t,a,o:,r >~< failwith(s),a,o:,r > < (t,t'),a,o:,r >~< failwith(s),a,o:,r :: e, a, 0:, r u(i) E dorn(r) and r(u(i» = t < t,a,0:,0 >~< t',a',o:',r" > and < t' , a' , 0:' , 0 > is in normal form AwUpd r' = queueu,,,,(u(j» < to,a,o:,r >~ < to, (u(j), t') :: a', o:'\r', r" u r' u r\ {( u(j), tn AwFail u(i) E dom(r) and r(u(i» = t < t,a,0:,0 >~< failwith(s),u,0:,0 > < to,a,o:,r >~< jailwith(s) , u, 0:, r > > > 681 Let < h,O"I,al,r l > be reduced by f3 applied on a subterm of t l . Let note that subterm (fun PI -+ al I ... I Pn -+ an).e v. By the hypothesis of == we have (0"20 a2 0 r 2)'"(t2) = tl, thus the corresponding subterm of t2 is of one of the following forms: u(i); u(i) u(j); (fun PI -+ al I·.· I Pn -+ an).e w. We examine the first two forms: (1): u(i). First as 0"2 binds variable with values, we have O"z(u(i)) = u(j) and u(j) ¢ dom(0"2). The == hypothesis ensures that u(j) ¢ dom(a2) as in that case the application would be suspended when the rule f3 applies on tl' Thus we have: O"z(r 2(u(j))) = (fun PI -+ al I ... I Pn -+ an).e v. The == hypothesis ensures that the same pattern matchs in both reduction and then application of Aw with the rule f3 on that term clearly leads to an equivalent four_uple. (2) u(i) u(j). The fact that bindings in a2 and r 2 are bindings of logical variable to non value terms ensure that O"2{u(i)) = (fun PI -+ a1 I··· I Pn -+ an).e and O"2'(u(j)) = V; then f3 applies on u( i) u(j) and leads to an equivalent four_uple.¢ We have now the result of strong confluence of -+ up to ==, all < t, 0", a, r > such that: < t,O",a,r >-+< tl,O"l,al,r l > < t,O",a,r >-+< t2,0"2,a2,r 2 > There exists < t~, O"~ , a~ , r~ > and < t~, O"~, a~ , r~ > such Theorem 5 For that < t1, 171, al, r 1 >~ < ti, O"~ , ai, ri > < t2,a2,a2,r 2 >~< t~,a~,a~,r~ > < ti, a~ , ai , r~ > == < t~, O"~ , a~ , r~ > /e~ ;/"\: . 
[Figure 3: Strong confluence — diagrams for the cases with two suspensions, one suspension, and no suspension.]

Proof: It is illustrated in figure 3. The cases where at least one reduction uses a suspension rule are handled as follows: if both r1 and r2 use suspension rules, then Lemma 3 is enough to conclude; if only one ri uses a suspension rule, then we conclude with Proposition 3 and Lemma 3. ◇

[Figure 4: Church-Rosser property — the two derivations d1 and d2 from e close at equivalent four-tuples.]

Proof of the theorem: We show that the diagram of figure 4 holds, using the theorem above and successive inductions on the lengths of d1 and d2. ◇

Remark that the limitation to a strict calculus is necessary. If we permit reducing an application without reducing its argument then, as some unification may occur in that argument, different normal forms are possible. Example: < (fun (x,y) -> unif x True).[] (u(1), unif u(1) False), ∅, ∅, ∅ > has two normal forms: < void, {(u(1), True)}, ∅, ∅ > and < failwith(Unif), {(u(1), False)}, ∅, ∅ >.

References

[AbadiCaCuLe 90] M. Abadi, L. Cardelli, P.-L. Curien, J.-J. Lévy, "Explicit Substitutions", Proc. Symp. POPL 1990.
[Ait Kaci 89] H. Ait-Kaci, R. Nasr, "Integrating Logic and Functional Programming", Lisp and Symbolic Computation, 2, 51-89 (1989).
[DeGrootLindstrom 86] D. DeGroot, G. Lindstrom (eds), "Logic Programming - Functions, Relations and Equations", Prentice-Hall, New Jersey, 1986.
[Dijkstra 76] E.W. Dijkstra, "A Discipline of Programming", Prentice Hall, New Jersey, 1976.
[HardinLevy 90] T. Hardin, J.-J. Lévy, "A Confluent Calculus of Substitutions", Third symposium (Izu).
[Huet 76] G. Huet, "Résolution d'équations dans les langages d'ordre 1, 2, ..., ω", Thèse d'état de l'Univ. de Paris 7, 1976.
[Huet 88] G. Huet, "Experiments with GHC prototypes", May 1988, unpublished.
[Laville 88] A. Laville, "Implementation of Lazy Pattern Matching Algorithms", ESOP'88, LNCS 300.
[Leroy 90] X. Leroy, "The ZINC experiment: an economical implementation of the ML language", INRIA technical report 117, 1990.
[LeroyWeis 91] X. Leroy, P. Weis, "Polymorphic type inference and assignment", Principles of Programming Languages, 1991.
[Poirriez 91] V. Poirriez, "Intégration de fonctionnalités logiques dans un langage fonctionnel fortement typé: MLOG une extension de ML", Thèse, Univ. Paris 7, 1991.
[Poirriez 92a] V. Poirriez, "FLUO: an implementation of MLOG", Fifth Nordic Workshop on Programming Languages, Tampere, 1992.
[SatoSakurai 86] M. Sato, T. Sakurai, "QUTE: a Functional Language Based on Unification". In [DeGrootLindstrom 86], pp. 131-155.
[PuelSuarez 90] A. Suarez, L. Puel, "Compiling pattern matching by term decomposition", LFP'90.
[Ueda 86] K. Ueda, "Guarded Horn Clauses", Ph.D. Thesis, Information Engineering Course, Univ. of Tokyo, 1986.

A New Perspective on Integrating Functional and Logic Languages

John Darlington, Yi-ke Guo, Helen Pull
Department of Computing, Imperial College, University of London, 180 Queen's Gate, London SW7 2BZ, U.K.
E-mail: jd.yg.hmp@doc.ic.ac.uk
February 1992

Abstract

Traditionally the integration of functional and logic languages is performed by attempting to integrate their semantic logics in some way. Many languages have been developed by taking this approach, but none manages to exploit fully the programming features of both functional and logic languages and provide a smooth integration of the two paradigms.
We propose that improved integrated systems can be constructed by taking a broader view of the underlying semantics of logic programming. A novel integrated language paradigm, Definitional Constraint Programming (DCP), is proposed. DCP generalises constraint logic programming by admitting user-defined functions via a purely functional subsystem and enhances it with the power to solve constraints over functional programs. This constraint approach to integration results in a homogeneous unified system in which functional and logic programming features are combined naturally.

1 Introduction

During the past ten years the integration of functional and logic programming languages has attracted much research. An extensive survey and classification of their results can be found in [GLDD90]. Traditionally this integration is performed by attempting to integrate the respective semantic logics of functional and logic languages in some way, resulting in a "super logic language". The conventional understanding is that a logic program defines a logical theory and that computation attempts to prove that a query is a logical consequence of this theory. Taking this view, integration is regarded as enhancing the original logic to cope with functional programming features, and it results in a new logic programming system. In section 2 we survey the main results of this approach. It seems to us that this approach fails to deliver all the features of both functional and logic programming. The main source of inadequacy appears to stem from the respective "intended semantics" assumed for logic and functional languages. It is this intended semantics which we question, motivating our search for a new way of approaching the problem of integrating functional and logic languages. We show in later sections that if we regard functional programming as defining a higher-order value space, we can extend the conventional constraint logic programming (CLP) framework by using a functional programming language to define the domain over which relations are defined. Thus we combine functional programming with a general CLP framework rather than with the conventional Prolog-like system. We call the resulting language paradigm Definitional Constraint Programming (DCP). We claim that DCP provides a uniform and elegant integration of functional, constraint and logic programming, while preserving faithfully the essence of each of these language paradigms. In section 3, constraint systems and constraint programming are investigated at a very general level. A constraint logic programming model is then presented in section 4 as a particular constraint programming paradigm. Section 5 presents constraint functional programming (CFP) as a framework which superimposes a solving capability on the functional programming paradigm. The definitional constraint programming paradigm is developed in section 6. We discuss future work in section 7 and make some concluding comments in section 8.

2 Background and Motivation

From the traditional view of logic programming, integrating functional and logic languages is viewed as enhancing the original logic to cope with functional programming features. Most approaches take first-order equational logic as the semantic logic of functional languages and combine it with Horn clause logic. A comprehensive presentation of the theory of Horn clause logic with equality may be found in [GM87] and [Yuk88].
This shows that for every theory in Horn clause logic with equality, its initial model (called the least Herbrand E-model in [Yuk88] and the least Herbrand model in [Sny90]) always exists. Crucially, the initial model is the intended model of a logic programming system since, according to the Herbrand theorem, the model is complete with respect to solving a query. For a Horn-clause-with-equality program Γ and a query ∃x1, ..., xn. A1, ..., An, where each Ai is an atom or an equation, a computational model must verify Γ ⊨ ∃x1, ..., xn. A1, ..., An by computing an answer substitution θ such that Γ ⊨ ∀(θA1 ∧ ... ∧ θAn). Such models integrate SLD-resolution with some form of equational deduction such as paramodulation. A complete computational model was proposed recently by Snyder et al. [Sny90] as a goal-directed inference system. Systems which aim to support the full power of Horn clause logic with equality include Eqlog [GM84], which fully exploits the order-sorted variation of the logic, SLOG [Fri85], in which a completion procedure is used as the computational model, and Yukawa's system [Yuk88], which uses an explicit axiomatization of equality. The computational difficulties of constructing a practical programming language based on the full Horn clause logic with equality lead us to conclude that this approach is not appropriate. Alternative languages overcome these problems by imposing syntactic and semantic restrictions on the paradigm. They all aim either to restrict the use of, or to weaken, defined equality. An example of the first approach is Jaffar and Lassez's Logic Programming Scheme [JL86], in which the equality part of a program is defined separately from the predicate definitions. A program uses a first-order equational sublanguage to define abstract data types over which a definite clause subprogram is imposed. Operational models are based on SLD-resolution together with an E-unification procedure which solves equations over the equality defined by the equational subprogram. Another way to restrict the computational explosiveness of general equational deduction is to use equational clauses as directed rewrite rules; a full discussion may be found in [DO88]. Narrowing [Hul80] (resp. conditional narrowing [DO88]) is employed to solve equations in a rewriting system (resp. conditional rewriting system). Many languages have been developed along this line, e.g. RITE [DP86b] and K-Leaf [EGP86]. They represent enhanced Prolog systems in which a "rewrite" relation is defined over the Herbrand space. Syntactic restrictions guarantee the confluence of this rewrite relation so that equational logic can mimic first-order functional programming. In the case of K-Leaf, the Herbrand space is enhanced to include partial terms, so the lazy evaluation of functional languages may be modelled. These endeavours have led to the development of several very successful languages and have significantly enriched the state of the art of declarative language design, semantics and implementation. However, we believe that the benefits of this combination are arguable, and we question how much is gained by enhancing a first-order logic with a weakened higher-order logic. Moreover, even with only first-order equational logic added, the inefficiencies of equational deduction mean that the resulting system is far from practical. This approach to language integration results in a sophisticated theorem prover, which we find unsatisfactory.
We suggest, therefore, some fundamental rethinking of the purpose of integrating functional and logic languages. In fact, the conventional assumption that a logic program defines a logical theory has been criticized in many circumstances because "there is no reference to the models that the theory is a linguistic device for" [Mes89]. A logical theory may have many models; however, when we are programming we always have a particular intended model in mind. This alternative school of thought regards a program as a linguistic description of the intended model, but the model itself is primary. For a Horn clause program, its least Herbrand model is taken as the intended model. Therefore, if a program is regarded as a linguistic description of this model, the canonical denotation of a program is not a first-order theory but a set of relations over the Herbrand space. This view of logic programming has also been taken by researchers wishing to extend Prolog-like systems. Hagiya and Sakurai [MT84] present a formal system for logic programming based on the theory of iterative inductive definitions. A similar approach is taken by Hallnäs and Schroeder-Heister to develop the framework of General Horn Clause Programming [AEHK89]. Paulson and Smith proposed an integrated system in which a logic subprogram is regarded as an inductive definition of relations [PS89]. This definitional view of logic programming suggests the flexibility to define Horn clauses over arbitrary domains. Relations become constraints over the domain of discourse, which coincides with the general framework of Constraint Logic Programming [Smo89]. In this paper we take this idea one step further by using a functional programming language to define the domain over which relations are defined. A novel definitional constraint programming system is induced in which functions and relations are used together to define constraint systems.

3 Constraint Programming

In this section we present a framework for constraint programming which has its origins in the seminal work of Steele [Ste80]. From the mathematical point of view, constraints are associated with well-studied domains in which some privileged predicates, such as equality and various forms of inequalities, are available. Relations formed by applying these predicates are regarded as constraints. A constraint may be regarded as a statement of properties of objects; its denotation is the set of objects which satisfy these properties. Therefore, constraints provide a succinct finite representation of possibly infinite sets of objects. We present a simple definition of constraint systems to capture these characteristics.

3.1 Constraint System

Definition 3.1 (Constraint System) A constraint system is a tuple < A, V, Φ, I > where
• A is a set of values called the domain of the system.
• V is a set of variables. We define an A-valuation as a mapping V → A; Val_A denotes the set of all A-valuations.
• Φ is a set of constraints. A computable function V is used to assign to every constraint φ a finite set V(φ) of variables, which are the variables constrained by φ.
• I is an interpretation, which consists of a solution mapping [·]^I mapping every basic constraint φ to [φ]^I, a set of A-valuations called the solutions of φ; [φ]^I is solution closed in the sense that whether a valuation is a solution of φ depends only on its values on V(φ).

We now present some examples of constraint systems.
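Before turning to the examples, Definition 3.1 can be read as an abstract interface. The sketch below (OCaml; a hypothetical signature of ours, not something the paper proposes) records the pieces a concrete constraint system must supply, with the solution sets [φ]^I represented intensionally by a satisfaction test on valuations.

module type CONSTRAINT_SYSTEM = sig
  type value                      (* the domain A *)
  type var                        (* the variables V *)
  type constr                     (* the constraints in Phi *)

  (* V(phi): the finitely many variables constrained by phi. *)
  val vars : constr -> var list

  (* An A-valuation assigns a domain value to every variable. *)
  type valuation = var -> value

  (* alpha is a solution of phi iff satisfies phi alpha holds; it should
     depend only on alpha's values at vars phi (solution closedness). *)
  val satisfies : constr -> valuation -> bool
end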
The most familiar constraint system in the context of program- 684 ming languages is perhaps the Herbrand system which is a constraint system over finite labelled trees. Example 3.1.1 (Herbrand System) Let E be a set of ranked signatures of function symbols and V be a set of constant symbols treated as variables. T(E) is the ground term algebra consisting of the smallest set of inductively generated E-terms. A Herbrand system is a constraint system < T(E), V, <1>, I> where consists of all term equations of the form tl = t2 for tl, t2 E T(f' V), where T(E, V) is the free term algebra, and [tl = t2] = {a 1 atl == at2} where == denotes the identity of two terms. Example 3.1.2 (Herb rand E-System) Let E, V be as above and E an equational theory over T(E, V). Then T(E)j E denotes the quotient term algebra consisting of the finest E-congruences over T(E) generated by E. The constraint system < T (E) j E, V, <1>, I > is called the Herbrand E-System where consists of all term equations of the form 1 tl = t2 for tl, t2 E T(E, V) and [tl = td = {[alE 1 [atl]E = [at2]E}, where [t]E stands for the equivalence class of t in T(E) and [alE : V --t T(E)j E stands for the corresponding equivalence class of ground term substitutions a : V --t T(E). Constraint systems on various term structures can be regarded as cases of the following general definition of an algebraic constraint system. Example 3.1.3 (Algebraic Constraint System) Let A be an algebra equipped with a set of operators E and a set of predicates II. Then the algebra is associated with a constraint system SA : , I > where 1A 1is the carrier of the algebra and evey constraint in is of the form p( el, ... , en) where every ei is an A -expression and p E II is an n-ary predicate in the algebra. [p( el, ... , en) = {a 1 A, a 1= p( el,"" en)}. Examples of algebraic constraints are constraints over term algebras, constraints over arithmetic expressions and constraint systems in boolean algebra. f Following the idea of associating constraint systems with algebras, predicate logic can be viewed from the constraint system perspective. [ is valid iff [2, denoted where T(I;) is the ground term algebra for the signature I; of function symbols (Herbrand space) and

be a constraint system closed under conjunction, renaming and existential quantification. Given a signature R as a family of user-defined predicates indexed by their arities, a constraint logic program rover C is a set of constrained defining rules of the form : P f- Cl, . . . , Cj, B 1 , Bm ••• , P is an R-atom of form p(xt, ... , xn) where p E Rn is an n-ary user-defined predicate, Ci E

be a constraint system and r a constraint logic program over C. The sequence of interpretations In represents a chain in the complete lattice of interpretations of r. The limit of the chain is the minimal model of rover C. 687 In this least model semantics of a CLP program the underlying constraint system is extended to a new constraint system via user-defined constraints. We call this a relational extension of a constraint system. Definition 4.1 (Relational Extension) Let r be a constraint logic program and R be the signature of user-defined predicates in r. r constructs the constraint system : R( C) :< A, V, as a relational extension of the underlying constraint system C :< A, V, by extending fc {a I (a(xl), ... ,a(xn )) E pIr} where Ir is the minimal model of rover C. A solver for a relationally extended constraint system can be constructed by integrating SLD-resolution with the constraint solver of the underlying constraint system to give constrained SLD-resolution. Constrained SLDresolution rewrites a goal of the form G = Gc U Gn, where Gn is a finite subset of atoms in G':3Xu Y < GnU{B1, ... ,Bm}UGcU{ q""'Ck }U{Xl-Sl , ... ,xn-Sn}> where VY.P(Xl ... Xn ):- CI, ... ,ck,BI, ... ,Bm is a variant of a clause in a program InCap ([m, 2m, 3m], 1000) InCap (x,1.1c-i), x=[2m, 3m], i=m, c=1000 --+R InCap(x',1.1c'-i'), x=i':x', c'=1.1c-i, x=[2m, 3m], i=m, c=1000 --+c InCap (x',1.1c'-i'), i'=2m, x'=[3m],i=m, c'=1100-m --+R InCap(x",1.1c"-i"), x'=i":x",1.1c'- i'=c" , i'=2m, x'=[3m],i=m,c'=1100-m --+c InCap (x",1.1c"-i"), x"=[], i"=3m,i'=2m, i=m,x'=[3m], c'=1100-m, c"=1210-3.1m --+R x"=[],1.1c"-i"=0, i"=3m, i'=2m, i=m, x'=3m, c'=1100-m, c"=1210-3.1m --+c 1.1(1210-3.1m)=3m --+c m =207+413/641 --+R Constrained SLD-resolution is a sound solver for a relationally extended constraint system and, as proved in [Sm089], it is also well-founded. Therefore, any consistent goal can be simplified to a set of solved forms. Let

l G~ as its solved forms, given a goal G~ ~f

G~ I- Vi>l G~. Moreover, if the underlying constraint system is compact, then G~ I- Vi=l G~ for some n, i.e. the model has the stronger completeness of section 3.2. r. Constraint Simplification' G:3X • G':3X if 3X.G c ----tc 3X.G~ and ----tc is the simplification derivation realised by the solver in the underlying system. Finite Failure: G:3ia~~~Gc if 3X.G c ----tc false. In this model, semantic resolution generates a new set of constraints whenever a particular program rule is applied. The unification component of SLD-resolution is replaced by solving a set of constraints via the underlying solver. Whenever it can be established that the set of constraints is unsolvable, finite failure results. For example, the following CLP program [CoI87]: InCap ([], 0) InCap (i :X~ c) we compute the solved form of the goal constraint: InCap ([m, 2m, 3m), 1000). One execution sequence is illustrated below, in which --+ R denotes a semantic resolution rewrite step: InCap (x, 1.1*c - i) can be used to compute a series of instalments which will repay capital borrowed at a 10% interest rate. The first rule states that there is no need to pay instalments to repay zero capital. The second rule states that the sequence of N+1 instalments needed to repay capital c consists of an instalment i followed by the sequence of N instalments which repay the capital increased by 10% interest but reduced by the instalment i. When we use the program to compute the value of m required to repay $1000 in the sequence [m, 2m, 3m}, 5 Constraint Functional Programming Constraint functional programming (CFP) is characterized as functional programming, enhanced with the capability to solve constraints over the value space defined by a functional program. An intuitive construction of this language paradigm is presented below. 5.1 Informal CFP A data type D in a functional program, r, can be associated with a constraint system CD. CD may contain privileged predicates over D. A CFP system may be formed to extend the constraint solver so that any D-valued expression, which may involve user-defined functions, can be admitted in constraints. A D-valued expression must be evaluated to its normal form with respect to r to enable the constraint solver to handle that value. We give a simple example of this paradigm. We assume a constraint system over lists in which atomic constraints are equations asserting identity over finite lists. A unification algorithm is used as the basic solver for the system. Given a functional program defining the function ++ which concatenates two lists and the function length which computes the length of a list: 688 data [alpha] = [] I alpha : [alpha] functions ++ :: [alpha] X [alpha] ---t [alpha] length :: [alpha] ---t Num [] ++ (x:y) z = z z = x: (y++z) + length y An extension to the basic solver may be used to solve the constraint: 11 ++ 12 = [al, a2, ... , an], length 11 = 10 to compute the first 10 elements of a the list [aI, a2, ... , an]. The solver must apply the function definitions of ++ and length and must guess appropriate instances of the constrained variables. We will show that this procedure itself may be modelled by some new constraints generated during rule application. Solving constraints over a functional program significantly enhances the expressive power of functional programs to incorporate logic programming features. This idea was central to the absolute set abstraction construct which was originally proposed in [DAP86,DG89] as a means to invoke constraint solving and collect solutions. 
Using the absolute set abstraction notation, the above constraint may be represented as the set-valued expression:

    { l1 | l1 ++ l2 = [a1, a2, ..., an], length l1 = 10 }

Reddy's proposal of "Functional Logic Programming" languages [Red86] also exploits this solving capability in functional programs. However, his description of functional logic programming as functional syntax with logic operational semantics fails to capture the essential semantic characteristics of the paradigm. The constraint programming approach, as we will show in the following, presents a concise semantical and operational model for the paradigm.

We assume a functional language that is strongly typed, employs a polymorphic type system and algebraic data types, and supports higher-order functions and lazy evaluation. Examples of such languages are Miranda [Tur85] and Haskell [Com90]. To investigate constraint solving we put aside the statical features of a functional language, such as its type system, and concentrate on its dynamic semantics. We use a kernel functional language with recursion equation syntax for defining functions. We assume variables ranged over by x and y, a special set of functional variables (identifiers) ranged over by f and g, constructors ranged over by d, constants ranged over by a and b, patterns ranged over by t and s, and expressions ranged over by e. A pattern is assumed to be linear, i.e. having no repeated variables. Data terms comprise only constants, constructors and first-order variables. The following syntax defines this tiny functional language:

    Program ::= Decl in Exp
    Decl    ::= f t = e  |  Decl ; Decl
    Exp     ::= x  |  a  |  e1 e2  |  e1 op e2  |  if e1 then e2 else e3
    Pattern ::= x  |  a  |  d t1, ..., tn

The language can be regarded as sugared λ-calculus and a program as a λ-expression. The program shown above is an instance of this formalism, in which the data statement introduces a list structure with a nullary constructor [] and a binary constructor :, and the functions length and ++ are defined by recursion equations.

The semantics of a functional program is given in the standard way [Sco89]. The semantic domain D of the program is an algebraic CPO which is the minimal solution of the domain equation $D = B_{\bot} + C(D) + (D \rightarrow D)$. D contains the domain $B_{\bot}$ of basic types (real numbers, boolean values, etc., lifted by $\bot$, which denotes undefinedness), the domain C(D) of constructed data structures, which consists of partial terms ordered with respect to the monotonicity of constructors, and the domain D → D of all continuous functions. A subdomain A of C(D), $A = B_{\bot} + C(A)$, is distinguished as the domain of data terms in the language (which is defined by the eq-type of ML [Mil84]). We use T to denote all complete objects of A.

For a functional program, the semantic function P[·] computes the value of the program in terms of the function D[·] : Decl → (Var → D) → (Var → D), which maps function definitions to an environment that associates each function name with its denotation. The function E[·] : Exp → (Var → D) → D maps an expression together with an environment η : Var → D (a D-valuation) to an element of D.

5.3 Evaluating Nonground Expressions

Conventional functional programming involves evaluating a ground expression to its unique normal form by taking a program as a rewriting system.
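For readers who prefer a concrete rendering of the grammar presented above, the following Haskell data types sketch one possible abstract syntax for the kernel language; the constructor names are ours and carry no significance beyond fixing the shape of declarations, expressions and (linear) patterns.

    -- One possible abstract syntax for the kernel language (names are ours).
    type Name = String

    data Decl
      = Define Name [Pattern] Exp        -- a recursion equation  f t1 ... tn = e
      | Seq Decl Decl                    -- Decl ; Decl

    data Exp
      = EVar Name                        -- x
      | EConst Name                      -- a
      | EApp Exp Exp                     -- e1 e2
      | EOp Name Exp Exp                 -- e1 op e2
      | EIf Exp Exp Exp                  -- if e1 then e2 else e3

    data Pattern
      = PVar Name                        -- x   (patterns are linear)
      | PConst Name                      -- a
      | PCon Name [Pattern]              -- d t1 ... tn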
To superimpose a solving capability on the functional programming paradigm, we consider first the extension of functional programming to handle non-ground expressions. The meaning of a non-ground expression is a set of values corresponding to every correctly typed instantiation of its free variables. Narrowing has been proposed as the operational model for computing all possible values of a nonground expressions [Red84]. In the theorem proving context, enumerating narrowing derivations provides a complete E-unification procedure for equational theories defined by convergent rewriting systems. This use of narrowing must be refined for the functional programming context. Due to the lazyness of functional languages, only those narrowing derivations whose corresponding reduction derivations are lazy should be enumerated. This notion of lazy narrowing is mentioned by Reddy in [Red84]. A lazy narrowing procedure, pattern-driven narrowing, is proposed by Darlington and Guo in [DG90] for evaluating absolute set abstractions. A similar procedure was indepen- 689 dently developed by You for constructor based equational programming systems [You88]. Here we present a lazy narrowing model following the constraint solving approach. The model is central to the CFP paradigm. Consider reducing a non-ground expression of form fe by a defining rule ft = e'. The environment 11 should be enhanced to satisfy £[ e]1] = £[ t]ry, i.e. ry is a solution ofthe rewriting constraint e = t. This equality is the so called semantic equality since it is determined by the identity of denotations of components. It is not even semidecidable since it involves verifying the equivalence of partial values. However, since in our problem t is always a linear pattern, a semidecidable solver exists. a standard reduction in [Hue86j. Enumerating patterndriven derivations is optimal and complete in the sense that any other derivation is subsumed by a pattern-driven derivation. Definition 5.1 The solved form of a rewriting constraint e = t is of the form {Xl = tl,· .. ,Xn = tn, YI = el, ... ,Ym = em} where the Xi E V( e) are output variables and the Yi E V( t) are input variables. The equation set 8 : {Xl = tl , ... ,Xn = t n } is an output substitution equation and B: {YI = el, ... , Ym = em} is an input substitution equation. We conclude that pattern-driven narrowing ·provides a realisation of lazy narrowing. Lazy narrowing extends functional programming with the capability to find for which values of variables in a nonground expression the expression evaluates to a given value. Thus, it introduces the essential solving feature to functional languages. However, on its own it is not enough because "built in" predicates may exist in functional languages, for example equality and various boolean valued primitive functions, for which a dedicated constraint solver is required. If we integrate lazy narrowing with a constraint solver over data terms, the solver is then extended to allow general expressions containing userdefined functions. Therefore, querying a functional program becomes possible. This enhanced functional programming framework may be formalized as the paradigm of constraint functional programming. The substitutions 8 and (j corresponding to 8 and Bare called output substitutions and input substitutions respectively. 5.4 The constraint solver presented below simplifies a rewriting constraint to its solved form. 
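Definition 5.1 can be mirrored directly as data: a rewriting constraint pairs an expression with a (linear) pattern, and a solved form splits into its output and input substitution equations. The Haskell sketch below uses names of our own choosing and is only a representation, not a solver.

    -- Representation of a rewriting constraint e = t and its solved form
    -- (Definition 5.1); all names are ours.
    type Var = String

    data DataTerm = TVar Var | TCon String [DataTerm]                   -- patterns / data terms
    data Expr     = XVar Var | XCon String [Expr] | XApp String [Expr]  -- expressions

    data Rewriting = Expr :=: DataTerm                                  -- the constraint e = t

    data SolvedForm = SolvedForm
      { outputEqs :: [(Var, DataTerm)]   -- theta:  x_i = t_i  with  x_i in V(e)
      , inputEqs  :: [(Var, Expr)]       -- sigma:  y_j = e_j  with  y_j in V(t)
      }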
Solving a rewriting constraint realises the bidirectional parameter passing mechanism for narrowing an outmost function application. The algorithm is called pattern-fitting [DG89]. Substitution: {x = r} U G ::} {x = r} U pG where p = {x 1--+ r}. Decomposition: {de = dt} U G ::} {e = t} U G Formalizing CFP We assume a constraint system Cy : (7, V, cI> Cl Ie) over firstorder values, where V is the set of variables over first-order types and cI> e are constraints consisting of privileged predicates R. Computing the truth value of a ground relation of data terms with respect to R is decidable. Thus, a predicate w in can always correspond to a boolean valued function fw in the language. A functional program may be applied to Cy. This introduces a new syntactic category in the functional program for constraints: n Constraint ::= w( el, ... , en) I Constraint, Constraint Removing: {a = a} U G::} G Failure: {dl el = d2e2} U G ::} false if dl =I d2 • Constrained Narrowing: {fe where ft =r E = ds} u G ::} {r = ds, e = t} u G r Lemma 5.1.1 The pattern-fitting algorithm is a complete solver for simplifying a rewriting constraint to its solved form. For any rewriting constraint e = t, a solved form corresponds to a pattern-driven narrowing step fe ~8 (je' with respect to a defining rule ft = e' where 8 is the output substitution and (j is the input substitution associated with the solved form. A pattern-driven narrowing derivation is defined in a standard way by composing the output substitutions of each of its component steps. Note that a one step pattern-driven narrowing derivation contains many narrowing steps due to the need to solve rewriting constraints. Each narrowing step is demand driven and affects an outermost function application. Therefore, we have the following theorem: Theorem 5.1.1 For any expression e and term t, if e ~8 t, then the corresponding reduction derivation 8e --+ * t is always a lazy derivation . . Such a reduction derivation is called where W(XI, ••. , xn) E cl>e. We use c to range over constraints. Constraints in Cy are now enriched to admit general expressions defined by the functional program. A constraint system is admissible if it is closed under negation. In the following, we assume the underlying constraint system is admissable. A CFP program is an extension of a functional program with the syntax: Program ::= Decl in e I Decl in c The semantic function C[] : Constraint constraints to their solution sets: C[ CI, C2] C[w( ell ... , en)] --+ P( Env) maps C[ CI] n C[ C2] {ryIUV( ei) I 7 1= w( £[ el]ry, ... , £[ en]ry)} This semantics reveals constraint solving over a functional language as "computing the environments" in which expressions, when evaluated, satisfy constraints. The constraint solving mechanism is formed by integrating the solver of Cy with lazy narrowing, thus enhancing Cy to handle constraints in the more general universe constructed by a functional program. A scheme for such an integration is presented below. We use the pair (G, C) to represent a goal G U C in which C contains rewriting constraints and G contains constraints from the underlying constraint solver G. 690 Constrained Narrowing: where ft = r E r . Simplification 1: (Gu{w( ... ,je, ... )},C) ( GU{w(' .. ,r, ... )),C)u{e==t} (g,',~) if G ~ G' where ~ is a simplification derivation computed by the under lying solver. Simplification 2: some algebra, it is perfectly reasonable to define relations over the system following the philosophy of general constraint logic programming. 
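The first four pattern-fitting rules act only on data terms, so they can be sketched as ordinary first-order simplification; Constrained Narrowing is omitted because it needs the program rules Γ. The sketch below is ours: it treats output and input variables uniformly, keeps the substitution in triangular form and skips the occurs check, so it is a simplification of the algorithm in the text rather than a faithful implementation.

    import qualified Data.Map as M

    -- first-order terms: variables and constructor applications
    -- (constants are nullary constructors); names are ours
    data Term = V String | C String [Term]
      deriving (Eq, Show)

    type Equation = (Term, Term)

    -- simplify a set of equations to a substitution, or Nothing for "false"
    fit :: [Equation] -> Maybe (M.Map String Term)
    fit = go M.empty
      where
        go sub []                = Just sub
        go sub ((V x, r) : rest) =                                  -- Substitution
          go (M.insert x r sub) (map (both (replace x r)) rest)
        go sub ((C d es, C d' ts) : rest)
          | d == d' && length es == length ts
                                 = go sub (zip es ts ++ rest)       -- Decomposition / Removing
          | otherwise            = Nothing                          -- Failure
        go sub ((l, V y) : rest) = go sub ((V y, l) : rest)         -- orient towards the variable

        both f (a, b) = (f a, f b)
        replace x r (V y)
          | y == x    = r
          | otherwise = V y
        replace x r (C d ts) = C d (map (replace x r) ts)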
Therefore, CFP is a "building block" for deriving a fully integrated Definitional Constraint Programming system in which both constraints and the domain of discourse are user-definable. 19,'g1 if C ~ C' where ~ stands for a simplification derivation computed by the solver of rewriting constraints. Failure: (G,C) 6 if G ~ false or C ~ false (G,CU{x==e}) S U b S t I't U toIon 1 : (pG,CU{x==e}) where p = {x f---f e} and x E V(G) and C u {x = e} is in solved form. (GU{x==t},Ci S U b S t IOt U toIon 2 : (G,pCu{x==t ) where p = {x f---f t} and G U {x = t} is in solved form. We are now in a position to present a unified definitional constraint programming (DCP) framework. A DCP program defines a constraint system by defining its domain of discourse and constraints over this domain. As discussed above, CFP and CLP exhibit, respectively, the power to define domains, and the power to define constraints. Therefore we would expect the unification of these two paradigms to result in a full definitional constraint programming system. We start by superimposing a functional program onto a privileged constraint system. As shown in the previous section, the functional program defining functions ++ and length can be queried to compute the initial segment of a given list. A further abstraction is possible if we take this CFP enriched constraint system as the underlying constraint system for a CLP language. Thus, CFP queries can be used to define relations as new constraints. For example we can define the relation front: Definitional ming Constraint Program- false Positive Accumulating: ( G, cU{Jw e==true}) (GU{w(e)},C) Ifw(X)E~c. A I to (G,CU{Jwe=false}) · N egat Ive ccumu a mg: (Gu{-'w(e)},C) if w( x) E ~ c and the constraint system is admissable. An initial goal takes the form (G, {}). Its solved form is of the form (G n, Cn) where Gn is in solved form with respect to the underlying solver and V( G) rf. V( C) and C are solved form rewriting constraints. The soundness of lazy narrowing guarantees that the enhanced solver is sound. However, it is not in general com plete because a functional program may define some boolean-valued functions which have no corresponding constraints in CT. This problem is similar to that of solving "hard constraints" in general constraint programming. Some ways exist to resolve this problem such as the "waitingresuming" approach in which the solving of a hard constraint is delayed until its variables are sufficiently instantiated [JL87], or by defining special simplification rules for such constraints. However, for a program in which all booleanvalued functions are consistent with the underlying constraint system, the scheme provides a complete enhanced solver. The scheme provides a generic model to enhance a constraint system to solve constraints in functional languages. In [Pulga], Pull uses unification on data terms as the underlying solver and combines it with lazy narrowing to solve equational constraints in lazy functional languages. In [JCGMRA91], a more general constraint system over data terms is adopted in which disunification is also exploited to deal with negative equational constraints. This model can be regarded as an instantiation of the scheme by providing unification and disunification as the "built-in" solvers. CFP represents a constraint programming system of the "domain construction" approach of section 3.3. This means that constraints appear only as computational goals; it is not possible to define new constraints in the system. 
However, the framework significantly enhances the expressive power of both functional programs and the basic constraint system. Moreover, since a CFP program provides a constraint system in which defined functions behave as operators in front (n, 1, 11) 11 ++ 12 = 1, length 11 =n to compute the initial segment with length n of an input list 1. This systematic integration of CFP and CLP results in a definitional constraint programming system and therefore, can be expressed by the formula DCP = CLP( CFP). It is straightforward to construct the semantic model of a DCP program. The semantics for its functional component are traditional functional language semantics. The intended model of the relational component is its least model. This may be constructed by computing all ground atoms generated by the program using the "bottom up" iterative procedure presented in theorem 4.0.1 and taking the functionally enhanced constraint system as the underlying constraint system. In terms of the semantic functions defined above the denotation of a defined predicate p in a program r can be computed by enumerating the inductive closure of r as follows: pO 0 pIn+l {a( xl, ... , xn) I a E ni==l C[ Ci] n nj==l [Bj fn+l } for each P(Xb.'.'Xn ):- cl, ... ,cn,Bl, ... Bm E r. [B]I maps B to all solutions of B under the interpretation I for the predicates in B. That is : [p( el, . .. , en)f = {1] I (£[el]1], . .. ,£[ en]1]) E pI Compared with other functional logic systems, this general notion of constraint satisfaction permits us, not only to define equational constraints over finite data terms, but also to introduce more general domain specific constraints. Moreover, partial objects as introduced by lazy functional programming are admissible for constraint solving in the system 691 as approximations of complete objects. This gives uniform support for laziness in a fully integrated functional logic programming system. The computational model of the DCP paradigm is simply the instantiation of the underlying constraint solver in constrained SLD-resolution to the CFP solver. Soundness and completeness are a direct result of the properties of these two components. Clearly then, DCP represents a supersystem of both these paradigms. Both the CLP InCap program and the CFP query which computes the initial segment of a list are valid DCP programs and queries. Moreover, the expressive power of each of these individual paradigms is enhanced in the DCP framework. We will demonstrate this with reference to some programming examples. The "built-in" solver manipulates only first-order objects. In any correctly-typed DCP program, a function-typed variable will never become a constrained variable. Thus, higherorder functional programming features safely inherit their intended use in functional computation without introducing computability problems. The following examples illustrate some of the attractive programming features of this rich language paradigm. The quicksort algorithm is defined below as a relation which uses difference lists (which appear as pairs of lists (x, y)) to perform list concatenation in constant time. The partitioning of the input list is specified naturally as a function, while the ordering function is passed as an argument to the quicksort relation. Within the semantics of DCP, such a functional parameter can be treated as special constant in relation definitions. A primitive function apply is assumed which is responsible for the application of such function names to arguments. 
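The "bottom-up" construction of the least model described above can be phrased generically as iterating an immediate-consequence step from the empty interpretation. The sketch below is ours; it assumes the step operator is monotone and generates only finitely many ground atoms, and it does not reproduce the concrete operator of a DCP program, which consults the functionally enhanced constraint solver.

    import qualified Data.Set as S

    -- generic bottom-up iteration: apply the (assumed monotone) immediate
    -- consequence operator until the interpretation stops growing
    leastModel :: Ord atom => (S.Set atom -> S.Set atom) -> S.Set atom
    leastModel step = go S.empty
      where
        go i
          | next == i = i
          | otherwise = go next
          where next = step i `S.union` i   -- keep previously derived atoms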
    functions
        partition : (alpha → alpha → boolean) × alpha × [alpha] → ([alpha], [alpha])
    relations
        quicksort : (alpha → alpha → boolean) × [alpha] × ([alpha], [alpha])

    partition (f, n, m : l) = if f (n, m) then (m : l1, l2) else (l1, m : l2)
                              where (l1, l2) = partition (f, n, l)
    partition (f, n, [])    = ([], [])

    quicksort (f, n : l, (x, y)) :-
        partition (f, n, l) = (l1, l2),
        quicksort (f, l1, (x, n : z)),
        quicksort (f, l2, (z, y))
    quicksort (f, [], (x, x))

The relation perms below shows an interesting and highly declarative way of specifying the permutations problem in terms of constraints over applications of the list concatenation function ++.

    relations
        perms : [alpha] × [alpha]

    perms (a : l, l1 ++ (a : l2)) :- perms (l, l1 ++ l2)

The final example shows how the recursive control constructs of higher-order functions may be used to solve problems in the relational component of a DCP language. We use a reduce function over lists, together with the "back substitution" technique familiar in logic programming, to find the minimal value in a list and propagate this value to all cells of the list. This is shown via the relation propagatemin below, which uses the standard list reduce function to find the minimum value, y, in the input list and to construct a list, l1, isomorphic to the input list, in which each element is a logical variable x.

    relations
        propagatemin : [Int] × [Int]

    propagatemin l l1 :-
        reduce (f x, l, (MaxInt, nil)) = (y, l1), x = y
        where f z n (m, l2) = (min (n, m), z : l2)

These examples show that, as well as being a systematic and uniform integration of constraint, logic and functional programming with a sound semantics, the DCP paradigm displays a significant enhancement of programming expressive power over other integrated language systems. We believe that this pleasing outcome is a direct result of our strenuous effort to identify clearly the essential characteristics of the component language paradigms and to preserve them faithfully in the DCP language construction. We have defined a concrete DCP language, Falcon [GP91]. Many Falcon programming examples appear in [DGP91].

7 Future Work

A very promising area of future research is the use of DCP as the foundation for studying declarative parallel programming. The idea is quite simple. If we keep strictly to the functional computational model for the functional sublanguage of a DCP language, synchronization between functional computation and constraint solving over logic variables becomes possible. Within this concurrent DCP framework, both the logical and the functional sublanguages cooperate to construct objects. The logical component approximates objects by imposing constraints and the functional component constructs objects explicitly. At each step of the construction, the functional part asks for more information and continues the construction if and when that information is available. Otherwise, it suspends and waits until other concurrently executing agents provide the required information. This behaviour is an important generalization of the traditional local propagation model for constraint-based computation [Ste80]. The synchronization mechanism for functional computation obviously follows the data flow school, but the use of constraint computation to enhance incrementally the information held in logical variables provides a very attractive general data flow model, i.e. bi-directional data flow.
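The suspend-until-bound behaviour described above can be illustrated, purely as a metaphor and under no claim of being the concurrent DCP mechanism itself, with an ordinary Haskell MVar playing the role of an unbound logical variable: the "functional" thread blocks on it until the "constraint" thread supplies enough information to bind it.

    import Control.Concurrent
    import Control.Concurrent.MVar

    -- Toy illustration only: an empty MVar stands in for an unbound logical
    -- variable; takeMVar models suspension of the functional component.
    main :: IO ()
    main = do
      x <- newEmptyMVar                 -- the logical variable, initially unbound
      _ <- forkIO $ do                  -- "constraint" component: binds x later
             threadDelay 100000
             putMVar x (42 :: Int)
      v <- takeMVar x                   -- "functional" component: suspends until x is bound
      print (v + 1)                     -- then continues the construction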
This idea originated from the data flow language JdNouveau [NPA86] in which an array of logical variables is a special structure for synchronising functional computation and constraint solving. This feature is generalised by the concurrent DCP model as the basic principle of programming. Concurrent DCP may be understood as a further development of the concurrent constraint programming framework proposed by Saraswat et. al. [SR90] by exploiting the 692 elegant concurrent cooperation between functional and logic computation. Since computation in its functional sublangauge is deterministic, we would expect the efficiency of the system to be much better than a logic programming system. Moreover, since the functional component provides a powerful synchronization mechanism for deduction, with such a "control" mechanism the overall efficiency of the paradigm is promising. This idea of exploiting deterministic computation in a non-deterministic system by constraint propagation is also central to the Andorra model [S.H90] which has been widely accepted recently in the logic programming community. The development of concurrent DCP has led to a very interesting convergence of research on language integration, constraint programming and declarative parallel programming in [GF91]. 8 M. Aronsson, L-H Eriksson, L. Hallnas, and P. Kreuger. A Survey ofGCLA: A DefinitionalApproach to Logic Programming. In Proc. of the International Workshop on Extensions of Logic Programming, volume 475 of Lecture Notes in Gomputer Science, Springer Verlag. Springer, 1989. [Co187] A. Colmerauer. Opening the Prolog III universe. Byte, July, 1987. [Com90] Haskell Committee. Haskell: A non-strict, purely functional language. Technical report, Dept. of Computer Science, Yale University, April 1990. [DAP86] J. Darlington, Field. A.J., and H. Pull. The unification of functional and logic languages. In D. DeGroot and G. Lindstrom, editors, Logic Programming, pages 37-70. Prentice-Hall, Englewood Cliffs, New Jersey, 1986. [DG89] J. Darlington and Y.K. Guo. Narrowing and Unification in Functional Programming. In Proc. of RTA 89, pages 292-310, 1989. [DG90] J. Darlington and Y. Guo. Constraint equational deduction. Technical report, Dept. of Computing, Imperial College, March 1990. will be presented in CTRS' 90. [DGP91] J. Darlington, Y.K. Guo, and H. Pull. A new perspective on integrating functional and logic languages. Technical report, Dept. of Computing, Imperial College, December 1991. [D088] N. Dershowitz and M. Okada. Conditional equational programming and the theory of conditional term rewriting. In Proc. of the FGGS '88, ed. by IGOT,1988. [DP86] N. Dershowitz and D.A. Plaisted. Equational programming. Machine Intelligence (Mitchie,Hayes and Richards, eds.), 1986. [EGP86] C. Moiso E. Giovannetti, G. Levi and C. Palmidessi. Kernel Leaf: An experimental logic plus functional language - its syntax, semantics and computational model. ESPRIT Project 415, Second Year Report, 1986. [Fri85] Laurent Fribourg. SLOG: A logic programming language interpreter based on clausal superposition and rewriting. In Proceeding of the 2nd IEEE Symposium on Logic Programming, Boston, 1985. [GF91] Y. K. Guo and M. Fereira. Constraints, Functions and Concurrency. Technical report, Dept of Computing, Imperial College, Sept. 1991. Working Research Notes. Y. Guo, H. Lock, J. Darlington, and R. Dietrich. A classification for the integration of functional and logic languages. Technical report, Dept. 
of Computing, Imperial College and GMD Forchungsstelle an der Universitat Karlsruhe, March 1990. Deliverable for the ESPRIT Basic Research Action No.3147. Conclusion This paper set out to provide an answer to the question of how and why we should integrate functional and logic programming languages. We believe that this should be done not only with the goal of building a more powerful programming system but also aiming at diminishing the drawbacks of the individual language paradigms. An integrated system should not only inherent the features of its components but also, and equally importantly, it should exhibit new distinguishing features as a result of their combination. We have developed a methodology for integration which demonstrates how the essential relational and functional features may be preserved, and have explored the new programming features which arise. The main idea underpinning this work comes from clarification of the intended semantics of logic and functional languages which motivated the insight to use constraints as the glue for their integration. This led us to develop the new language paradigm of definitional constraint programming. We believe that the declarative constraint programming model is a promising language paradigm for the design of future programming languages. 9 [AEHK89] Acknowledgements We are indebted first and foremost to Sophia Drossopoulou and Ross Paterson, our two colleagues on the Phoenix project at Imperial College, for many valuable discussions. We also thank our other colleague on the Phoenix project at Nijmegen University and at GMD Kahlsruhe, particularly Maria Fereira for her cooperation and significant contribution to the recent work on concurrent DCP, and Hendrick Lock for his enlightening discussions on the philosophy of language integration. Many thanks are due to Dr. Hassan Ait-Kaci, Prof. J-L Lassez, Dr. J. Jaffer and Dr. Meseguer for their helpful insights and to all the people in the Advanced Languages and Architectures Section at Imperial College who provide a stimulating working environment. This work was carried out under the European Community ESPRIT funded Basic Research Action 3147 (Phoenix). References [GLDD90] [GM84] Joseph A. Goguen and Jose Meseguer. Equality, types, modules, and (why not?) generics for logic programming. Journal of Logic Programming, 2:179-210, 1984. 693 [GM87] Joseph Goguen and Jose Meseguer. Models and equality for logical programming. In Proc. of TAPSOFT 87, volume 250 of Lecture Notes in Computer Science, Springer Verlag. Springer, 1987. [GP91] Y.K. Guo and H. Pull. Falcon: Functional And Logic language with CONonstraints-language definition. Technical report, Dept. of Computing, Imperial College, February 1991. [Hue86] G. Huet. Formal structure for computation and deduction. Technical report, Dept. off Computer Science, Carnegie-Mellon University, May 1986. [HuI80] Jean-Marie Hullot. Canonical forms and unification. In 5th Con! on Automated Deduction. LNCS 87, 1980. [JCGMRA91] M.T. Hortala-Gonzalez J Carlos Gonzalez-Moreno and Mario Rodriguez-Artalejo. A Functional Logic Language with Higher Order Logic Variiables. Technical Report, Dpto. de Informatica y Automatica UCM, 1991. [JL86] [JL87] [LM89] Joxan J affar and Jean-Louis Lassez. Logical programming scheme. In D. DeGroot and G. Lindstrom, editors, Logic Programming, pages 441-467. Prentice-Hall, Englewood Cliffs, New Jersey, 1986. Joxan Jaffar and Jean-Louis Lassez. Constraint logic programming. In Prod. of POPL 87, pages 111-119, 1987. J-L. 
Lassez and K. McAloon. A constraint sequent calculus. Technical report, IBM T.J. Watson Research Center, 1989. [Mes89] Jose Meseguer. General logics. Technical Report SRI-CSL-89-5, SRI International, March 1989. [Mil84] Robin Milner. A proposal for Standard ML. In ACM Conference on Lisp and Functional Programming, 1984. [MT84] [NPA86] [PS89] M.Hagiya and T.Sakurai. Foundation of Logic Programming Based on Inductive Definition. New Generation Computing, 2(1), 1984. R. Nikhil, K. Pingali, and Arvind. Id nouveau. Technical report, M.I.T. Laboratory for Computer Science, 1986. CSG Memo 265. L.C. Paulson and A.W. Smith. Logic Programming, Functional Programming and Inductive Definitions. In Proc. of the International Workshop on Extensions of Logic Programming, volume 475 of Lecture Notes in Computer Science, Springer Verlag. Springer, 1989. [PuI90] Helen M. Pull. Equation Solving in Lazy Functional Languages. PhD thesis, Dept. of Com- puting, Imperial College, University of London, November 1990. [Red84] Uday S. Reddy. Narrowing As the Operational Semantics of Functional Languages. In Proc. of Intern. Symp. Logic Prog. IEEE'. IEEE, 1984. [Red86] Uday S. Reddy. Functional Logic Languages, Part 1. In J.H. Fasel and R.M. Keller, editors, Poceedings of a Workshop on Graph Reduction, Santa Fee, number 279 in Lecture Notes in Computer Sci- ence, Springer Verlag, pages 401-425, 1986. [Sco89] Dana Scott. Semantic domains and denotational semantics. Lecture Notes of the International Summer School on Logic, Algebra and Computation, Marktoberdorf, 1989. to be published in LNCS series by Springer Verlag. [S.H90] S.Haridi. A logic programming language based on andorra model. In New Generation Computing. 1990. [Smo89] Gert Smolka. Logic Programming over Polymorphically Order-Sorted Types. PhD thesis, Vom Fachbereich Informatik der Universitat Kaiserlautern, May 1989. [Smo91] Gert Smolka. Residuation and Guarded Rules for Constraint Logic Programming. Research Report RR-91-13 DFKI, 1991. [Sny90] W. Snyder. The Theory of General Unification. Birkhauser, Boston, 1990. [SR90] V.A. Saraswat and M. Rinard. Concurrent Constraint Programming. In Proc. 17th Annual ACM Symp. on Principles of Programming Languages. ACM, 1990, 1990. [Ste80] G .L. Steele. [Tur85] David A. Turner. Miranda: A non-strict language with polymorphic types. In Conference on Func- The Definition and Implementation of a Computer Programming Language Based on Constraints. PhD thesis, M.I.T. AI-TR 595, 1980. tional Programming Languages and Computer Architecture, LNCS 201, pages 1-16,1985. [You88] Jia-Huai You. Outer Narrowing for Equational Theories Based on Constructors. In Timo Lepisto and Arto Salomaa, editors, 15th Int. Colloqium on Automata, Languages and Programming, LNCS 317, pages 727-741,1988. [Yuk88] K. Yukawa. Applicative logic programming. Technical Report LP-5, Logic programming Laboratory, June 1988. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 694 A Mechanism for Reasoning about Time and Belief Hideki Isozaki NTT Basic Research Laboratories 3-9-11, Midoricho, Musashino-shi Tokyo 180, Japan Abstract Yoav Shoham Computer Science Department Stanford University Stanford, CA 94305, U.S.A. information about the mental states of other agents at various times. 
Several computational frameworks have been proposed A statement referring to an agent's temporal belief to maintain information about the evolving world, which about another agent's temporal belief will be called a embody a default persistence mechanism; examples in- nested temporal belief statement. An example of it is the clude time maps and the event calculus. In multi-agent sentence "On Wednesday John believed that on the pre- environments, time and belief both play essential roles. vious Monday Jane believed that on the following Sat- Belief interacts with time in two ways: there is the time urday they would clean the house." Nested temporal be- at which something is believed, and the time about which liefs pose a number of interesting problems, both seman- it is believed. tical and algorithmic. In this paper we concentrate on We augment the default mechanisms proposed for the the latter kind; we propose a computational mechanism purely temporal case so as to maintain information not called a Temporal Belief Map, which functions as a data only about the objective world but also about the evo- base of nested temporal beliefs. lution of beliefs. In the simplest case, this yields a two- Consider a formal language for expressing nested tem- dimensional map of time, with persistence along each di- poral beliefs. A standard construction would extend clas- mension. sicallogic with a modal operator B:rp for each agent des- Since beliefs themselves may refer to other beliefs, ignator a and time point symbol t,meaning intuitively we have to think of a statement referring to an agent's that at time t the agent a believes rp. To ensure that the temporal belief about another agent's temporal belief (a modal operator respects the properties of belief (or, more nested temporal belief statement). It poses both semanti- exactly, its crude approximation that has been employed cal and algorithmic problems. In this paper, we concen- in computer science and AI), various restrictions on this trate on the algorithmic aspect of the problems. The gen- operator have been suggested, and then extensively ex- eral case involves multi-dimensional maps of time called plored, debated and modified[Hintikka 1962, Griffiths Temporal Belief Maps. 1967, Konolige 1986]. These include properties such as 1 Introduction: Time Maps and Temporal Belief Maps B:(rp:::) '¢) I\B:rp:::) B:,¢ (the 'K' axiom), B:rp:::) ,B:,rp Bta BtaT and ,BtaT Bta ,Bta rp (the (the 'D' axiom) ' aBtT (f) :::) (f) (f) :::) '4' and '5' axioms)[Chellas 1980], and others. In addition, although these have been less well studied, further In multi-agent environments, time and belief both play constraint may be imposed on the change in belief over essential roles. Belief interacts with time in two ways: time. there is the time at which something is believed, and We will briefly return to these properties in the next the time about which it is believed. As in the atemporal section, but they are not the focus ofthis paper. Instead, treatment of belief, beliefs themselves may refer to beliefs we concentrate on algorithmic issues. Consider first the (of other agents, or even the same one). For example, in purely temporal case, without an explicit notion of be- the framework of Agent Oriented Programming [Shoham lief. 
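For reference, the standard formulations of the axioms named in this section, written with the paper's time-indexed belief operator, are as follows; these are the textbook forms [Chellas 1980] rather than a transcription of the paper's own display.

    \begin{align*}
    \text{(K)} \quad & B_a^t(\varphi \supset \psi) \wedge B_a^t\varphi \;\supset\; B_a^t\psi \\
    \text{(D)} \quad & B_a^t\varphi \;\supset\; \neg B_a^t\neg\varphi \\
    \text{(4)} \quad & B_a^t\varphi \;\supset\; B_a^t B_a^t\varphi \\
    \text{(5)} \quad & \neg B_a^t\varphi \;\supset\; B_a^t\neg B_a^t\varphi
    \end{align*}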
In principle, capturing the truth of facts over time 1990], at any time the mental state of an agent contains 695 should pose no problem; we can use standard data base techniques to capture the fact true at a single point in time, and repeat it for all point. In practice, though, it is impossible, and we will need to use some shortcuts. Figure 1: A simple persistence The representational aspect of the problem appears in the form of the well-known frame problem [McCarthy and Hayes 1969]: when you buy a red bicycle, how you conclude that a year later it will still be red, regardless of what happens in the meanwhile - the bike is ridden, the tire is fixed, elections are held - Figure 2: Clipping a persistence unless it is painted. An axiom stating explicitly that the color does not change after each action is called a frame axiom; the problem is to capture the persistence of facts without including the numerous possible frame axioms. The frame problem and related problems have been investigated in detail from the logical point of view (d. [Shoham 1992]), and most solutions proposed have made use of nonmonotonic logic. Adding belief yields a qualitative increase in difficulty, since beliefs (and lack thereof) tend to persist as well: once you learn something, you will keep it in mind until you forget it or learn incompatible facts. The formal details of the persistence of mental state have not yet been studied as deeply; an initial treatment of it appears in [Lin and Shoham 1992]. As was said, we are interested in the algoirthmic aspects of the problem. Computational complexity of knowledge and belief without time was discussed by [Halpern and Moses 1985]. In the purely temporal case, the question is how to efficiently implement the following persistence principle (throughout this article we will assume discrete time, but the discussion can be adapted to the continuous case as well; we also assume propositional facts, with no variables): determine the truth value of all other points. Each event gives rise to a default persistence, which ends at the first future point about which a contradictory fact is believed. For example, if an event which causes p occurs at time t[l] (the superscript in [ ] identifies a given time point), and no other information about p is yet present in the time map, then the two points t[l] and 00 are associated with p, with a default persistence of p from the first to the second. This may be depicted graphically by Figure 1. If it is subsequently added that at time t[2] (> t[l]) an event happened that causes -'p, t[2] is associated in ad- dition with p; a default persistence of -'p is assumed between t[2] and 00, is "clipped" at t[2] and the persistence of p starting at t(1] (Figure 2). This is a crude description of the operation of time maps, but it suffices to explain the transition to temporal belief maps (TBM's), which incorporate an explicit notion of belief. (Note that we have discussed only persistence into the future. Most of the literature in AI does that, and we too will in this paper. However, persistence into the past can make as much sense, especially when one adds an explicit pHI holds iff either an event which causes p notion of belief. For example, if you find a book on a occurred at time t, or else pi holds and no event desk, you will believe that the book was on the desk a few which causes -'p occurred at time t. minutes ago. 
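A one-dimensional time map of the kind just described fits in a few lines: store only the change points and answer a query with the most recent change strictly before the queried instant, so that an event at time t takes effect from t+1, as in the persistence principle above. The Haskell rendering and all names below are ours.

    import Data.List (sortBy)
    import Data.Ord  (comparing, Down (..))

    type Time = Int

    -- a recorded event: at time `at`, something causes p (True) or ~p (False)
    data Change = Change { at :: Time, causes :: Bool }

    -- truth value of p at time t under default persistence,
    -- or Nothing if no earlier event is recorded
    holdsAt :: [Change] -> Time -> Maybe Bool
    holdsAt changes t =
      case sortBy (comparing (Down . at)) [c | c <- changes, at c < t] of
        []      -> Nothing
        (c : _) -> Just (causes c)

    -- the clipping example: p caused at t[1] = 1, ~p caused at t[2] = 5
    --   holdsAt [Change 1 True, Change 5 False] 3  ==  Just True
    --   holdsAt [Change 1 True, Change 5 False] 8  ==  Just False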
Most researchers manage to avoid this is- Straightforward embodiment of this rule backward sue by limiting the form of temporal information. In par- In order to determine the ticular, both time maps and the event calculus embody truth value of pi, you do not want to have to check pi-I, a certain causality principle: the only way new temporal pi-2 ,an d so on unt!·1 you d·IS cover th a t p i-2l3857.IS t rue. information is added is by a preceding event which causes chaining is too inefficient. III Both time maps [McDermott 1982, Dean and McDer- it. Since an explicit cause is known, there is no reason to mott 1987] and event calculus [Kowalski and Sergot 1986] posit backward persistence, past the cause. For example, provide better alternatives. In particular, time maps rely we cannot represent the simple fact that the book was on keeping track of only the points at which the truth on the table; we must represent a specific event or ac- value of the proposition changes, which are sufficient to tion that resulted in that state (such as placing the book 696 there). The closest one gets to backward persistence is as shorthand for the English sentence). The input to through abductive reasoning, "what would have to be the previous events. In applications such as planning[Allen our problem is assumed to be a collection of data points t[i] t[i] t[i] t[i] t[i] P] tfi] P] of the form Lall La22 ••• La:~llPt and Lall La~ ... La:~ll -'Pjn In other words, the sequences of agent indices are iden- et al. 1991], this is a reasonable assumption, as in those tical in all the input data, but the time indices are un- one is constructing a map of the future based on spe- constrained (we will see in section 5 why assuming a case previously in order for this fact to hold," positing cific planned events. However, if one is trying to use the fixed sequence of agent indices is not limiting). We also mechanism to piece together a map of time on the ba- assume that the data is consistent, that is, it does not sis of spotty data, this may prove inappropriate. For ex- • t[k] t[k] P] P] t[k] t[k] ample in a framework such as Agent Oriented Program- contam both Lall ... La:~llPkn and Lall ... La:~ll -'Pkn for any k. The problem is to define the rules of persis- ming [Shoham 1990], a major source of new temporal in- tence in this n-dimensional space, that is, to define for formation are INFORM messages from other agents. As a result of these messages, the agent may possess a rich any (tl' tz' ... ,tJ in the space and each fact p, which (if either) of Btl Bt2 ••• Btn-l ptn and Btl Bt2 ••• Btn-l-'pt n are sample of what is true and false over time, but no causal supported by the data. (In all of the above, both the al a2 an al a2 an knowledge of the precipitating events. Nevertheless, we agent indices and the time indices may contain repeti- will ignore backward persistence in most of the paper. tions.) Furthermore, we will want our definition to sup- Unless we explicitly state otherwise, we will use the term port an efficient mechanism for answering such a query persistence to mean forward persistence.) about any point in the space. Suppose we now wish to represent the evolution of Note that both the input form and query form are an agent's beliefs. Let us first introduce the notion of quite constrained. For example, the input form precludes learning, which will playa role that is analogous to that facts such as "John learned that Mary did not believe c.p," of an event in time maps. 
Given this notion, beliefs too (L John -,BMaryc.p) without making the stronger statement will be subject to a persistence rule: "John learned that Mary learned -'c.p." (LJohnLMary -,c.p) "The agent believes a fact at time t + 1 iff he Similarly, a query "Does John believe that Mary does learned it at time t, or else at time t he believed not believe c.p?" the fact and did not at that time learn that it the stronger query about Mary's believing the negated became false." fact (BJohnBMary -'c.p?). A positive answer to the second (This rule embodies the assumption that agents have perfect memory.) If, in addition, the "fact" itself is temporal, we end up with persistence along two orthogonal dimensions: the time of belief and the time of the property. This is the simple case of a 2-dimensional TBM. The extension to higher-dimensional TBM's is natural. Such TBM's are obtained by nested belief statements, such as as "John believes today that yesterday he did not believe ... " and "John believes today that tomorrow Mary will believe ... "); both of these example statements induce a 3-dimensional TBM. It turns out that resolving contradictions in a multi-dimensional TBM is somewhat more subtle than in standard time maps, as the following sections will describe. Here then is the problem we will address. Let us use the notation L: c.p to mean that agent a learned c.p at t (actually formalizing this notion is tricky, but that is not the concern of this paper; we use the notation merely (B John -,BMaryc.p?) are disallowed, only (BJohnBMary -,c.p) would entail a positive one to the first (B John -,BMaryc.p), but a negative answer (-,BJohnBMary -,c.p) would shed no light on the first query (BJohn -,BMaryc.p?). These are extensions we plan to look at in the future. In the remainder of this paper we will elaborate on this picture. We will explicate the assumptions made about agents, and discuss the multi-dimensional persistence in more detail. The organization is as follows. In section 2 we state the assumptions we make about agents' beliefs, both at single points in time and over periods of time. In section 3 we look closely at persisten.ce in a TBM's with a single datum point. In section 4 we look at TBM's with multiple data points. In section 5 we discuss the extension to data with multiple sequences of agent indices. In section 6 we briefly mention the complexity of the query answering, and in section 7 we briefly mention implementation efforts. We conclude with discussion of related and future work. 697 2 Assumptions about Belief We mentioned before that various idealizing assumptions about belief have been made and debated by other researchers, and that the focus of this paper is different from them. Nonetheless a few basic assumptions are es- tl: belief sential, and we discuss them here. In the spirit of this paper, we discuss these properties in commonsense terms, tl: belief Figure 3: Default region (left) and causal region (right) rather than in a formal logic. We have already listed some of the more common restriction on belief: closure of beliefs under tautological implication (as captured by the 'K' axiom), consistency (as captured by the 'D' axiom), and positive and negative introspection (as captured by the '4' and '5' axioms). Since among objective properties (those without a belief operator) we will consider only literals (atomic properties and their negations), the closure property will be irrelevant. 
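For reference, the persistence rule for belief quoted above can be written compactly with the learning and belief operators; the formula below is our transcription of the English statement, with the first disjunct corresponding to learning and the second to memory in the absence of contrary learning.

    \[
    B_a^{t+1}\varphi \;\equiv\; L_a^t\varphi \,\vee\, \bigl(B_a^t\varphi \wedge \neg L_a^t\neg\varphi\bigr)
    \]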
Positive and negative introspection will also Assumption 4 (Common knowledge) Every agent believes that every agent believes the above properties) that every agent believes that every agent believes them) and so on. Multi-Dimensional Persistence of a Single Datum 3 turn out to impact our results only minimally, as will be discussed in section 5. However, consistency will lie In this section, we consider TBM's induced by a single at the heart of the TBM mechanism, and is our first as- datum point. We start by considering the non-nested sumption. case, in which the datum has the form Assumption 1 (Consistency) B:'P and both hold. B:''P cannot p] p] Lal p 2 (at time t~] agent a learns that at time t~l] property p was (is, will be) true). This induces a 2D TBM, in which the This is the only assumption we will make about a belief persistences along both axes are uninterrupted and thus at an instance of time. do not terminate at all. This situation is represented In addition we have constraints on how beliefs change graphically in Figure 3. over time. We first assume that agents do not come to The hatched quarter plane in the left picture, rooted in believe facts without explicitly learning them, but that the point (til], t~l]), is called the default region of (t~l], t~]). once they learn them, they do not forget them. The meaning of this region is that, given only the datum Assumption 2 (Causality and Memory) If at t agent point Lal p 2 ,B al p 2 holds by default iff (iI' t 2) lies in that . (.I.e., 1·ff t[l] t d i[l] i) regIOn I < I an 2 < 2 . t[l] a does not learn ''P) then B!+1'P holds iffB:'P holds. Our next assumption is that agents are extremely receptive to new information[Gardenfors 1988]. Assumption 3 (Gullibility) If at time t agent a learns 'P) then B!+I'P holds. t[l] t t Similarly, if we focus on an affected point (*), all data points affecting it by their forward persistence are distributed in the opposite quarter plane. This is the dual concept of the default region and is called a causal re- gion of the affected point. It is depicted graphically in (Of course, in an environement in which agents are sup- the right picture of the above figure. In this paper we plied with unreliable or dishonest information, this last will be concerned mostly default regions. assumption would be unacceptable, and we would need Finally, although it is only the 2-dimensional case that a more sophistiated criterion to determine which of the is so amenable to graphical representation, these con- two contradictory facts, the previously believed one and cepts extend naturally to the multi-dimensional case. t[l] P] tf1] Specifically, given only the datum Lall •.. La:-=-llP n , we ... Btn-l ptn holds iff it is the case that i > have that Btl l al an-l t[l] ... t > i[l] the newly learned one, should dominate.) Our last assumption is that all these properties are 'common knowledge': I' 'n n . 698 Mutiple Data with Incompatible Beliefs 4 We have so far considered only TBM's induced by a single datum. We now look at the general case in which we have mutiple data. We still assume that all data have the .c t['] t[i] t[i] P] t[)] t[)] lorm L 1 ••• L n-l p.n or L 1 ••• L n-l ' p n for some fixed al an-l z al an-l t[2] 2 .. till 2 .. elief J' aI' ... ,a n _ l (again, see section 5 in this connection), but nothing beyond that. If for any Pk the collection does not contain more than Figure 4: Overlapping default regions (t~l] ::f. t~2], t~l] ::f. 
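Assumptions 1-3 and the single-datum default region admit compact formulations; the formulas below are our transcription of the English statements in the text (Assumption 2 follows the text's wording literally, and Assumption 4, common knowledge, has no comparably short formula and is omitted).

    \begin{align*}
    \text{Assumption 1 (Consistency):} \quad &
      \neg\bigl(B_a^t\varphi \wedge B_a^t\neg\varphi\bigr) \\
    \text{Assumption 2 (Causality and Memory):} \quad &
      \neg L_a^t\neg\varphi \;\supset\; \bigl(B_a^{t+1}\varphi \equiv B_a^t\varphi\bigr) \\
    \text{Assumption 3 (Gullibility):} \quad &
      L_a^t\varphi \;\supset\; B_a^{t+1}\varphi \\
    \text{Default region (single datum):} \quad &
      B_{a_1}^{t_1}\cdots B_{a_{n-1}}^{t_{n-1}} p^{t_n}
      \ \text{holds by default iff}\ t_i > t_i^{[1]} \ \text{for all } i,
    \end{align*}
    given only the datum $L_{a_1}^{t_1^{[1]}}\cdots L_{a_{n-1}}^{t_{n-1}^{[1]}} p^{t_n^{[1]}}$.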
t~2]) one occurrence of Pk (whether preceded by , or not), the situation is simple: the persistence of each fact is independent of the others, and so we construct an independent TBM for each one. t[2] 2 The situation in which multiple occurrence of a Pk ex- .. till 2 .. ist, but all with the same polarity (that is, either in all elief data containing Pk the Pk is preceded by" or in none), the situation is also simple: the default region is simply the union of the individual regions for each datum containing Pk' It is the presence of contradictory data that makes the Figure 5: Consistent default regions 4.1 (4 1 ] ::f. t~2], 4 ::f. t~2]) 1 ] The 2D Case story more interesting. Our assumption of consistency dictates that persistences of contradictory beliefs may not overlap. Without the strong limitations on the form of input data and queries, we would have two problems - to determine which sets of persistences are contra- dictory, and to resolve the contradiction. For example, we would have to notice that the three sentences B: B: B: (p V q), 'P and 'q are jointly inconsistent, even though all pairs are· consistent. Our restrictions remove this first problem. Since we only consider facts of the form Btl Btn-l tn d Btlal' .. Bta:-=-l1 , pt / ,t h eon1y f act contraal ... an-l Pi an dicting tn Btl ... Btn-l Pk al an-l will be Btl .•. Btn-l ,ptn al an-l k' and vice versa. When in future work we relax the restrictions on input and queries, we will need a new criterion for deter- dimensional case is derived in a straightforward fashion from the assumptions stated in section 2, and is analogous to the clipping of persistences in simple time maps. We will discuss the case of two data points, but the discussion extends easily to multi pIe points. Consider the input data consisting of the two points P] P] • : Lal P 2 and t[2] t[2] • Without loss of generality, assume that t~2] ~ til] holds. We consider the two cases - 0 : Lal 'P 2 t~2] ~ t~l] and t~] < t~l] - and assume for now that neither ti2] = t~l] nor t~2] = t~l] hold. The default regions of the two points in both cases are shown in the left and right portions of Figure 4, respectively. In both cases the default regions overlap, which is for- mining incompatibility. Our restrictions do not only render the problem of determining incompatibility trivial, they also simplify the task of resolving it. Since we always have exactly two beliefs contradicting one another, our task reduces to removing one of them; the question is which. The rule for resolving contradictory beliefs in the two l bidden, and one of them must be trimmed. In deciding which, we recall the assumption of gullibility: right after learning a fact, the agent must believe it. furthermore, the assumption of memory and causality dictates that the agent must continue to believe it until the next point about which he learns that the fact is false there. This produces the consistent default regions in Figure 5. Example. If John learns on Monday that on lor cours~, removing both would also restore consistency, but that would VIOlate our assumption about causality and memory. Thursday his house will be painted white (.) and on Tuesday he learns that on Friday it will 699 be painted blue (0), then from Monday until Tuesday John will believe that his house will be white from Thursday until the end of time, and from Tuesday on he will believe that his house will be white from Thursday until Friday (+45 0 0 shading), and blue afterwards (-45 shading) tl: belief (the left picture). 
(Of course, on Thursday he will learn that the painter had a wedding in t1: belief Figure 6: Default regions (left: t~l] = ti2], right: t~l] = t~]) Chicago and couldn't come.) On the other hand (the right picture), if John learns on Monday that on Thursday his In this case the agent first learned that p became true house will be painted white (.) and on Tuesday at some point, and later learned that p became false at he learns that on Wednesday it will be painted that very point. Now in principle we could imagine quite blue (0), then from Monday until Tuesday John sophisticated criteria to decide which evidence should will still believe that his house will be white from Thursday until the end of time (+45 0 be given greater credence. However, our assumption of gullibility forces a "recent is better" policy, leading us shading), but from Tuesday he will believe that to accept the later information and abandon the older his house will be blue from Wednesday until one. The resulting default regions are shown in the right 0 Thursday (_45 shading), and leave unaltered his belief that it will be white afterwards (+45 figure. 0 4.2 The General Case shading). (That will change when the painter, back from Chicago a week later, paints John's We now extend the previous discussion to higher TBM's. house turquoise, since neither white nor blue We will unfortunately have to do so without the aid of really go well with olive tree in the yard.) graphics; instead, we will use the following example. Note that in either case, the beliefs from Example. At t~l] you learn that at time t~l] Tuesday onwards would not change even if the your son learned that your son's teacher moved the two pieces of information were acquired in to Japan at time t3[ the opposite order. This is no accident; this Church-Rosser property is true in general of our system. We now turn to the limiting cases, in which either t~2] = 1] holds or t~2] = 1] holds. Note that from our 4 4 1] [1] [1] [1] (Lt1 Lt2 pt3). you son At time t~2] you learn that at time t~2] your son learned t1 2 that his teacher moved to the US at time ] P] t[2] t[2] [2] [1] (Ly~uLs~n ""p 3 ) where t3 > t3 . Let t > max(t[1] t[2]) t > max(t[l] t[2]) 1 and t3 1'1'2 > t~2](> t~1]). Then at 2'2' t1 you believe assumption about the consistency of the input, at most that at t2 your son believes that his teacher is one of them can hold. Therefore, if t~2] = t~1] holds, we living in the US at t3' This is true regardless of the relationship between t~l] and t~2], or the may assume without loss of generality that t~2] > t~1]. This means that at time t~1] (= t~2]) the agent learned that relationship between t~l] and t~2]. agent will th~refore believe at time t~l\ = t~2]) that p will Now consider the same scenario, except that t~2] = t~I]. This means that you believe that be true from the first point until the second, and false your son learned two contradictory facts. How- afterwards. There will be nothing later to change that ever, from the assumption that rules of belief belief, and thus the default region of p forms an infinite change are common knowledge , you know that p first became true (.) and later became false (0). The 2 horizontal strip, and the default region of ""p occupies your son will adopt the latest information (as the quadrant above it (Figure 6). The case in which t~2] = t~l] holds is more interesting, illustrated in the previous figure). since it provides insight into the higher dimensional case. 
Therefore 2Note that this is our first use of the common knowledge assumption! 700 your beliefs about your son's beliefs will de- but they also have perfect memory of past beliefs: Assumption 7 (Introspection about past beliefs) pend on the relationship between t~11 and t~21; if t~l > t~ll then you will believe that your son believes that the teacher lives in the US; other- if T > O. wise you will believe that your son believes that The last assumption states that agents do not expect the teacher lives in Japan. their beliefs to change: Assumption 8 (Belief about stability of beliefs) Finally, what will you believe if t~21 = t~l t~ll? In this case, you will need to break the tie by comparing t~ll and ti21 . Note and t~21 = if that they cannot also be equal, as that would T > O. (Notice ,that assumptions 5, 7, and 8 can be unified into violate the assumption that the input data is BtlBt2 a a r.p consistent. == B min (h,t2 ) .) a r.p We are not arguing on behalf of these assumptions. We The lesson from this example is clear. To determine list them merely as examples of plausible assumptions whether a point in the hyper-space lies in a particular one might want to make. The reason we mention them at default region, you should compare the associated time all is that they violate the property that nested temporal vectors. This ordering is a reverse lexicographical order- beliefs with different agent indices are independent of ing, the innermost time being the most significant and one another. For example, under assumption 8, B~B!p8 the outermost time the least significant. is contradictory with B~ Multiple Sequences of Agent Indices 5 We have all along assumed one fixed sequence of agent indices in the data: a l , " ' , an_I' However, relaxing this limitation is quite simple. Consider data points with multiple sequences of agents indices. Unless we make further assumptions about belief, data with different index sequences will simply not interact. For example, the t[l] t[l] t[l] truth of Ba1 Bb2 p 3 is completely independent from the t[2] t[2] truth of any statement that is not of the form Ba1 Bb2 x, where x is an objective sentence (containing no belief oppl pl -,l. Fortunately, these four assumptions allow an easy solution. We simply keep simplifying the sentences by substitution, until no further simplifications are possible. It turns out that no matter what subset of these four we choose, the result of this substitution process is unique (the Church-Rosser property again). More generally, whenever our assumptions allow us to derive a unique canonical form, we convert the query and the input data to this canonical form, and then revert to our usual procedure. We have not yet investigated the more complex case in which the canonical form is hard to derive or nonexistent. t[ll erator); in particular, it is consistent with Ba1 Bc2 -,p 3 • Thus we may simply construct separate TBM's for these different sentences, each obeying our restriction. 6 Complexity Our definition of default regions was constructive, and However, if we do make further assumptions about allows efficient query answering. We briefly discuss the belief, we must take greater care. We consider here four complexity here. If we assume that comparison of a pair possible further assumptions about belief. 
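The machinery developed so far, default regions plus the reverse lexicographic tie-break, already supports a naive query answerer, and the complexity remarks of this section refer to exactly this linear scan over the data. The Haskell sketch below summarises it; all names are ours, and the explicit region trimming of Figures 4-6 is replaced for point queries by the dominance test.

    import Data.List (maximumBy)
    import Data.Ord  (comparing)

    type Time  = Int
    type Point = [Time]            -- (t1, ..., tn), outermost learning time first

    -- a datum L^{t1}_{a1} ... L^{t_{n-1}}_{a_{n-1}} [~]p^{tn}, with the fixed
    -- agent sequence left implicit; names are ours
    data Datum = Datum { stamps :: Point, positive :: Bool }

    -- the default region of a datum contains a query point iff every
    -- coordinate of the point lies strictly beyond the datum's stamp
    inDefaultRegion :: Point -> Datum -> Bool
    inDefaultRegion q d = length q == length (stamps d)
                       && and (zipWith (>) q (stamps d))

    -- Just True / Just False if some datum supports p / ~p at the query point,
    -- Nothing if no default region covers it; ties cannot arise because the
    -- input data is assumed consistent
    believedAt :: [Datum] -> Point -> Maybe Bool
    believedAt ds q =
      case filter (inDefaultRegion q) ds of
        []   -> Nothing
        cand -> Just (positive (maximumBy (comparing (reverse . stamps)) cand))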
6 Complexity

Our definition of default regions was constructive, and allows efficient query answering. We briefly discuss the complexity here. If we assume that comparison of a pair of one-dimensional time points is done in one operation, then comparing two n-dimensional time points requires at most n operations. In ordinary applications, n will be a very small integer. Ordinary people will not think of n = 5 cases in their everyday life. If we have N data points, we can get a sorted list of the data points by the priority based on the reverse lexicographical ordering, as explained. This requires only O(n · N log₂ N), i.e. O(N log N), operations. Since each agent learns information gradually, it is useful to use a heap, a well-known balanced tree data structure which can easily be modified to keep the ordering. If we need to identify only the dominant data point in the causal region, even a naive implementation gives it in O(nN) ≈ O(N) operations.
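The comparison scheme can be made concrete with a short Prolog sketch (ours, for illustration only): a time vector is written as a list with the outermost time first, comparison proceeds from the innermost (most significant) time outwards, and the dominant data point is the maximum under this ordering, found with the naive O(nN) scan mentioned above.

rev_lex_compare(Order, V1, V2) :-        % Order is (<), (>) or (=)
    reverse(V1, R1),                     % innermost time first
    reverse(V2, R2),
    compare_significant(Order, R1, R2).

compare_significant((=), [], []).
compare_significant(Order, [T1 | R1], [T2 | R2]) :-
    (  T1 < T2 -> Order = (<)
    ;  T1 > T2 -> Order = (>)
    ;  compare_significant(Order, R1, R2)
    ).

dominant([V], V).
dominant([V1, V2 | Vs], Dom) :-          % naive linear scan over the N vectors
    rev_lex_compare(Order, V1, V2),
    (  Order == (<) -> dominant([V2 | Vs], Dom)
    ;  dominant([V1 | Vs], Dom)
    ).

% ?- dominant([[1,4], [3,2], [2,4]], D).
% D = [2,4]: the innermost times 4 win first; the outer time 2 then breaks the tie with [1,4].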
7 Implementation

Our framework can easily be implemented in logic programming languages such as Prolog as well as ordinary procedural languages such as C. We implemented various versions of this framework in both languages. The backward reasoning mechanism implemented in Prolog employed simplified versions of Kowalski and Sergot's Event Calculus. The forward reasoning mechanism implemented in C employed sorting of an array. As we described before, our algorithm is very fast in simple cases. We intend to implement more complex cases and evaluate their complexity. As for 2D cases, we have a program which draws a map from a set of data points whose time stamps are given in hours/minutes, minutes/seconds or year/month/day.

Finally, this work has been carried out as part of the research on Agent Oriented Programming. The current simple interpreter, AGENT0 [Shoham 1990], only has a simple version of standard time maps. We have implemented an experimental agent interpreter which incorporates the ideas of this paper, and hope to report on it in the future.

8 Related Work and Conclusions

The only closely related work of which we are aware, other than the work on time maps and event calculus which we have discussed at length, is Sripada's [Sripada 1991], which was independently developed. Both systems can deal with nested temporal beliefs. Sripada represents a nested temporal belief by a Cartesian product of time intervals, and like us assumes that nested temporal beliefs are consistent. However, he does not consider the notion of default persistence, and therefore does not deal with the resolution of competing default persistences. It would seem that the result of our system could serve as input to his, but we would like to understand his work better before making stronger claims about the relationship to his work.

As should be clear, much more needs to be done. We made it clear that in this work we did not undertake a logical treatment of time, belief and nonmonotonicity. We were also explicit about the limitations of our framework. We hope to do both in the future, as well as demonstrate the practical utility of this work.

Acknowledgments

We would like to thank the AOP group members at Stanford University and the referees of this paper who gave us useful comments. The first author would like to thank Koichi Furukawa at ICOT, Shigeki Goto, Hirofumi Katsuno, and other colleagues at NTT, too.

References

[Allen et al. 1991] J. F. Allen, H. A. Kautz, R. N. Pelavin, J. D. Tenenberg. Reasoning about Plans, Morgan Kaufmann Publishers, 1991.
[Chellas 1980] B. F. Chellas. Modal Logic: An Introduction, Cambridge University Press, 1980.
[Dean and McDermott 1987] T. L. Dean, D. V. McDermott. Temporal Data Base Management, Artificial Intelligence, Vol. 32, pp. 1-55, 1987.
[Gardenfors 1988] P. Gardenfors, D. Makinson. Revisions of Knowledge Systems Using Epistemic Entrenchment, Proc. of the Second Conference on Theoretical Aspects of Reasoning about Knowledge, pp. 83-95, 1988.
[Griffiths 1967] A. P. Griffiths (Ed.). Knowledge and Belief, Oxford University Press, 1967.
[Halpern and Moses 1985] J. Y. Halpern, Y. Moses. A Guide to the Modal Logics of Knowledge and Belief: Preliminary Draft, Proc. of IJCAI, pp. 480-490, 1985.
[Hintikka 1962] J. Hintikka. Knowledge and Belief: An Introduction to the Logic of the Two Notions, Cornell University Press, 1962.
[Konolige 1986] K. Konolige. A Deduction Model of Belief, Morgan Kaufmann Publishers, 1986.
[Kowalski and Sergot 1986] R. Kowalski, M. Sergot. A Logic-Based Calculus of Events, New Generation Computing, Vol. 4, pp. 67-95, 1986.
[Lin and Shoham 1992] F. Lin, Y. Shoham. Persistence of Knowledge and Ignorance (in preparation), 1992.
[McCarthy and Hayes 1969] J. McCarthy, P. J. Hayes. Some Philosophical Problems from the Standpoint of Artificial Intelligence, in B. Meltzer and D. Michie (Eds.), Machine Intelligence 4, Edinburgh University Press, pp. 463-502, 1969.
[McDermott 1982] D. V. McDermott. A Temporal Logic for Reasoning about Processes and Plans, Cognitive Science, Vol. 6, pp. 101-155, 1982.
[Shoham 1990] Y. Shoham. Agent-Oriented Programming, Stanford Technical Report CS-1335-90, 1990.
[Shoham 1992] Y. Shoham. Nonmonotonic Temporal Reasoning, in D. Gabbay (Ed.), The Handbook of Logic in Artificial Intelligence and Logic Programming (to appear), 1992.
[Sripada 1991] S. M. Sripada. Temporal Reasoning in Deductive Databases, PhD thesis, University of London, 1991.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

Dealing with Time Granularity in the Event Calculus

Angelo Montanari (+)(*), Enrico Maim (−), Emanuele Ciapessoni (+), Elena Ratto (+)
(+) CISE, Milano, Italy
(−) SYSECA, Paris, France
(*) Current Affiliation: University of Udine, Udine, Italy

Authors' addresses: Angelo Montanari, University of Udine, Mathematics and Computer Science Department, Via Zanon 6, 33100 Udine, ITALY, email: montanari@uduniv.cineca.it; Enrico Maim, SYSECA Temps Reel, Constraint Resolution Research Group, 315 Bureaux de la Colline, 92213 Saint-Cloud Cedex, FRANCE, email: enrico.maim@eurokom.ie; Emanuele Ciapessoni and Elena Ratto, CISE, Artificial Intelligence Section, Division of Systems and Models, Via Reggio Emilia 39, Segrate (Milano), ITALY, email: kant@sia.cise.it and elena@sia.cise.it. Most work of the first author was done while he was employed at CISE.

Abstract

The paper presents a formalization of the notion of time granularity in a logic-based approach to knowledge representation and reasoning. The work is based on the Event Calculus [Kowalski and Sergot 1986], a formalism for reasoning about time, events and properties using first-order logic augmented with negation as failure. In the paper, it is extended to include the concept of time granularity. With respect to the representation, the paper defines the basic notions of temporal universe, temporal decomposition and coarse grain equivalence. Then, it specifies how to locate events and properties in the temporal universe and how to pair event and temporal decompositions. With respect to the reasoning mechanisms, the paper defines two alternative modalities of performing temporal projection, namely upward and downward projections, that make it possible to switch among coarser and finer granularities.

1 Introduction

The paper presents a formalization of the notion of time granularity in a logic-based approach to knowledge representation and reasoning.
The work is based on the Event Calculus, a formalism for reasoning about time, events and properties using first-order logic augmented with negation as failure [Kowalski and Sergot 1986]. In the paper, it is extended to include the concept of time granularity. Informally, granularity can be defined as the resolution power of a representation. In general, each level of abstraction at which knowledge can be represented is characterized by a proper granularity. Providing a formalism with the concept of granularity allows it to embed different levels of knowledge in a representation. In such a way, each reasoning task can refer to the representational level that abstracts from the domain only those aspects relevant to the actual goal.

We are interested in time granularity. With respect to the expressive power, it allows one to maintain the representations of the dynamics of different processes of the domain that evolve according to different time constants as separate as possible [Corsetti et al. 1990]. It also allows one to model the dynamics of a process with respect to different time scales. In such a case time granularity has to be paired with other refinement mechanisms such as process decomposition [Allen 1984], [Kautz and Allen 1986], [Corsetti et al. 1991a], [Evans 1990]. Finally, time granularity increases both the temporal distinctions that a language can make and the distinctions that it can leave unspecified. This means that considering two events as simultaneous or temporally distinct, or two time-dependent relations as temporally overlapped or disjoint, depends on the granularity one refers to.

With respect to the computational power, it supports different grains of reasoning to deal with incomplete and uncertain knowledge [Allen 1983], [Dean and Boddy 1988]. It also allows one to tailor the visibility of the knowledge base and the reasoning process to the needs of the actual task [Fum et al. 1989]. Secondly, it allows one to alternate among different time granularities during the execution of a task in order to solve each incoming problem at a time granularity as coarse as possible [Dean et al. 1988]. An example of a limited use of time granularity to expedite the search of large temporal databases is provided by [Dean 1989]. Finally, it allows one to solve a problem at a time granularity coarser than the required one to cope with the complexity of temporal reasoning. Such a simplification speeds up the reasoning, but implies a relaxation of the precision of the solution. The ratio between the time granularities provides a measurement of the approximation of the achieved result.

Despite the widespread recognition of its relevance for knowledge representation and reasoning, there is a lack of a systematic framework for temporal granularity.
The main references are the paper of Hobbs [1985] on the general concept of granularity and the works of Plaisted [1981], Giunchiglia and Walsh [1989] on abstract theorem proving. Hobbs defines a concept of granularity that supports the construction of simple theories out of more complex ones. He formally introduces the basic notions of relevant predicate set, indistinguishability relations, simplification, idealization and articulation. Such notions are extended and refined by Greer and McCalla [1989], which identify two orthogonal dimensions along which granularity can be interpreted, namely abstraction and 'aggregation. However, the one and the others reserve little or no attention to time granularity. In particular, Hobbs only sketches out a rather restrictive mapping of continuous time into discrete times using the situation calculus formalism. Conversely, a set-theoretic formalization of time granularity is provided by Clifford and Rao [1988], but they do not attempt to relate the truth value of assertions to time granularity. Finally, Galton [1987] and Shoham [1988] give significant categorizations of assertions based on their temporal properties. These categorizations are strictly related to the concept of time granularity even if it is not explicitly considered. A first attempt to introduce the notion of time granularity in the Event Calculus is reported in [Evans 1990]. Evans defines a macro-events calculus for dealing with time granularity whose limitations are discussed in section 4.1. Our paper proposes a framework to represent and reason abou t time granularity in the Event Calculus that generalizes these previous results. It significantly benefits by the work done to formalize the concept of time granularity in TRIO, a logic formalism for specifying realtime systems [Corsetti et al. 1991b], [Corsetti et al. 1991c], [Montanari et al. 1991], and [Ciapessoni et al. 1992]. [Maim 1991] and [Maim 1992a] present an alternative approach where the granularity problem is seen as an issue of dealing with ranges and intervals in constraintbased reasoning. The paper is organized as follows: section 2 presents the original Event Calculus together with its basic extensions, namely types, macro-events and continuous change; section 3 focuses on the representation of time granularity; section 4 details the modalities of reasoning about time granularity. 2 The Event Calculus The Event Calculus proposes a general approach to represent and reason about events and their effects in a logic framework [Kowalski and Sergot, 1986]' [EQUATOR 1991]. From a description of events t.hat occur in the real world, it allows one to derive va.riolls relationships and the time periods for which they hold. It also embodies a notion of default persistence, that. is, relationships are assumed to persist until an event occurs which terminates them. As an example, if we know that an aircraft enters a given sector at 10:00hrs and leaves at 10:20hrs, the Event Calculus allows us to infer that it is in that sector at 10:15hrs. More precisely, the Event Calculus takes the notions of event, property, time-point and time-interval as primitives and defines a model of change in which et!ent.~ happen at time-points and initiate and/or terminate timf'intervals over which some property holds. So, for instance, the events of entering and leaving the sector initiate and terminate the aircraft's property of being in the sector, respectively. Time-points are unique points in time at which events take place instantaneously. 
In the previous example, the event of entering the sector occurs at 10:00hrs, while the event of leaving the sector occurs at 10:20hrs. They can be specified at different degrees of explicitness, e.g. "91/5/24:10:00hrs" to include the full date or just "10:00hrs", but belong to a unique domain. Time-intervals are represented by means of tuples of two time-points. With the same example, we can deduce that the aircraft is in the sector during the time-interval starting at 10:00hrs and ending at 10:20hrs.

Formally, Event Calculus represents domain knowledge by means of initiates and terminates predicates that express the effects of events on properties¹:

initiates(Event, Property)
terminates(Event, Property)

In such a way, domain relations are intensionally defined in terms of event and property types [EQUATOR 1991]. Weak forms of the initiates and terminates predicates, namely weak-initiates and weak-terminates, have been introduced in [Sergot 1990]. The predicate weak-terminates states that a given event terminates a given property unless this property has already been terminated. In a similar way, the predicate weak-initiates states that a given event initiates a given property unless this property has already been initiated. Instances of events and properties are obtained by attaching a time-point (event, time-point) and a time-interval (property, time-interval) to event and property types, respectively.

¹ We adopt the variable convention of the original Event Calculus, where constants are distinguished from variables by being denoted by names beginning with upper-case characters.

The first Event Calculus axiom we introduce is Mholds-for. It allows us to state that the property p holds maximally (i.e. there is no larger time-interval for which it also holds) over (start, end) if an event e occurs at the time start which initiates p, and an event e' occurs at time end which terminates p, provided there is no known interruption in between:

Mholds-for(p, (start, end)) ←
    happens_at(e, start) ∧ initiates(e, p) ∧
    happens_at(e', end) ∧ terminates(e', p) ∧
    end > start ∧
    not broken-during(p, (start, end))

In the above axiom, the negation involving the broken predicate is interpreted using negation-as-failure. This means that properties are assumed to hold uninterrupted over an interval of time on the basis of failure to determine an interrupting event. Should we later record a terminating event within this interval, we can no longer conclude that the property holds over the interval. This gives us the non-monotonic character of the Event Calculus which deals with default persistence². The predicate broken-during is defined as follows:

broken-during(p, (start, end)) ←
    happens_at(e, t) ∧ start < t ∧ end > t ∧ terminates(e, p)

This states that a given property p ceases to hold at some point during the time-interval (start, end) if there is an event which terminates p at a time t within (start, end).
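The two axioms above, together with the aircraft example, can be run directly as a small Prolog program. The sketch below is ours (it uses standard Prolog conventions, with variables in upper case rather than the convention of footnote 1, hyphens replaced by underscores, and clock times encoded as plain hhmm integers):

happens_at(enter_sector, 1000).          % the aircraft enters the sector at 10:00hrs
happens_at(leave_sector, 1020).          % and leaves it at 10:20hrs

initiates(enter_sector, in_sector).
terminates(leave_sector, in_sector).

mholds_for(P, (Start, End)) :-
    happens_at(E, Start),  initiates(E, P),
    happens_at(E1, End),   terminates(E1, P),
    End > Start,
    \+ broken_during(P, (Start, End)).   % negation as failure

broken_during(P, (Start, End)) :-
    happens_at(E, T),
    Start < T,  End > T,
    terminates(E, P).

% ?- mholds_for(in_sector, I).   % I = (1000, 1020): in the sector from 10:00 to 10:20

Recording a further terminating event inside the interval, e.g. happens_at(forced_exit, 1010) with terminates(forced_exit, in_sector), makes broken_during succeed and withdraws the conclusion over (1000, 1020), which is exactly the default-persistence behaviour described above.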
Event Calculus also defines an Iholds-for predicate in terms of Mholds-for to state that a property holds over each time-interval included in the maximal one: Iholds-for(p, (start,end») +0Mholds-for(p, (a,b») /\ start ~ a A end ~ b Finally, Event Calculus defines the holds-at predicate which is similar to Iholds-for except that it relates a property to a time-point rather than a time-interval: holds-at(p,t) +0Mholds-for(p, (start,end») A t > start A t < end In particular, the holds-at predicate states that a property is not valid at the time points at which occur the events that initiate and terminate it. This negative conclusion about the validity of properties at the left and right ends of time-intervals properly stands for ignorance. Time granularity will allow us to refine descriptions with respect to finer temporal domains. 2To deal with default persistence, [Maim 1992b] presents an approach to constructive negation in constraint-based reasoning. 2.1 Macro-events to Model Discrete Processes To model discrete processes the basic Event Calculus has been extended with an event decomposition mechanism that allows us to refine event represent.ations [Evans 1990], [EQUATOR 1991]. Evans introduced the notion of macro-event, which is a finite event. decomposed into a number of sub-events. The connections between a macro-event and its components are formalized in the Event Calculus as follows: happens_at(e,t) +0happens_at(e1,tJ) happens_at(e2,t2) happens_at(e3,t3) happens_at(e4,t4) A parLoJ(e1,e) A /\ parLoJ(e2,e) /\ /\ parLoJ(e3,e) /\ A parLoJ(e4,e) where the predicate parLof is defined by means of appropriate domain axioms. This axiom allows us to derive the occurrence of a macro-event from the occurrences of it.s sub-events. It. can also be used to abduce the occurrence of sub-events from the occurrence of the macro-event. 2.2 Continuous Change to Model Continuous Processes The basic Event Calculus is well-equipped to represent discrete processes, but is not so good for representing continuous processes, i.e. processes characterized by a continuous variation in a quantity such a.s t.he height of a falling object or the angular position of a crankshaft. Modelling a continuous process in terms of its temporal snapshots, in fact, can be seen a partiClllar case of event decomposition, but cannot he directly done by means of macro-events. To model cont.inuous processes, Event Calculus has been extended wit.h the idea of the trajectory of a continuously changing property through a space of values [Shanahan 1990], [Shanahan 1991], [EQUATOR 1991]. Shanahan introduced the notion of 'dynamic' propert.ies, like motJing of a train. When such a property holds, another property is continuously changing, such as position of the train. Continuously changing properties are modelled as trajectories. Formally, the holds-at axiom which gives value to a continuously changing property is: holds_at (p, t2) +0happens_at(e,t1) A initiates(e,q) /\ t1 < t2 A not broken_during(q, (tt, t2»/\ trajectory(q,t1,p,t2) In this axiom, the continuously changing propert.y p can be assigned a given value at a time point t2 if an instance of the relevant dynamic property q is initiated at a time point t1 (before t2) and not. broken 705 at some point between t1 and t2. The predicate trajectory describes the functional relationship between the continuously changing property and the time that has elapsed since it started to change. It can be seen as a path plotted against time through the corresponding quantity space. 
The formula trajectory(q, t1, p, t2) represents that property p holds at time t2 on the trajectory of the period of continuous change represented by q which starts at time t1. Such a property p holds only instantaneously and represents that some quantity varying continuously has a particular value. Its definition is domain specific. That is, a set of trajectory clauses is also part of the description of the domain, along with the domain's initiates and terminates clauses. For example, suppose that the angular position of a crankshaft increases linearly with time whilst the shaft is rotating. If w is the angular velocity of the crankshaft, we have the following domain axiom:

trajectory(rotating, t1, angle(a2), t2) ←
    holds_at(angle(a1), t1) ∧ a2 = w(t2 − t1) + a1

3 Representing Time Granularity

This section first introduces the notion of temporal universe as a set of related, differently grained temporal domains. Such a notion supports the definition of the relations of indistinguishability and distinguishability among the time-points of the domains. Then, it precisely states the linkage between events and properties, and time granularity.

3.1 The Temporal Universe

Providing a representation with time granularity requires introducing a finite set of disjoint temporal domains that constitutes the temporal universe of the representation:

T = ∪_{i=1,…,n} T_i

The set {T_1, T_2, …, T_n} is totally ordered on the basis of the degree of fineness (coarseness) of its elements and, for each i, with 1 ≤ i < n, T_{i+1} is said to be of a finer granularity than T_i. Each domain is discrete, with the possible exception of the finest domain, which may be dense. The temporal universe includes at most one dense domain because each dense domain is already at the finest level of granularity, since it allows any degree of precision in measuring time distances. As a consequence, for dense domains we must distinguish granularity from metric, while for discrete domains we can define granularity in terms of set cardinality and assimilate it to a natural notion of metric³. For the sake of simplicity, we assume that each domain is denumerable.

³ Mapping, say, a set of reals into another set of reals would only mean changing the unit of measure with no semantic effect. Just in the same way one could decide to describe geometric facts by using, say, kilometres and centimetres. However, if kilometres are measured by real numbers, the same level of precision as with centimetres can be achieved. Instead, the key point in time granularity is that saying that something holds for all days in a given interval does not imply that it holds every second within the 'same' interval [Corsetti et al. 1991c].

For each pair of domains T_i, T_{i+1}, a mapping is defined that maps each time-point of T_i into a time-interval of T_{i+1} (totality). It maps contiguous time-points into contiguous, disjoint time-intervals (contiguity), preserving the ordering of the domains (order preserving). Moreover, the union of the time-intervals of T_{i+1} belonging to its range is equal to T_{i+1} (coverage). Finally, we assume that the length of the time-intervals into which it maps the time-points of T_i is constant (homogeneity). This constant, denoted by Δ_{i,i+1}, defines the conversion factor between T_i and T_{i+1}, which provides a relative measurement of the granularity of T_i and T_{i+1} with respect to each other. A general mapping between T_i and T_j, with T_i coarser than T_j, can easily be obtained by a suitable composition of a number of elementary mappings. It is formally defined in a recursive way in [Corsetti et al. 1991a], where it is also shown that the properties of totality, contiguity, order preserving, coverage and homogeneity are preserved. In general, there are several ways to define these mappings, each one satisfying the required properties.
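As a small worked illustration of composing elementary mappings (our own sketch, using the hour/minute/second universe introduced just below), the conversion factor of a composed mapping is obtained by multiplying the Δ factors of the elementary mappings it is built from, e.g. Δ(hour, second) = 60 × 60 = 3600:

delta(hour,   minute, 60).        % elementary conversion factors Δ(i, i+1)
delta(minute, second, 60).

conversion(D, D, 1).              % a domain maps onto itself with factor 1
conversion(Coarse, Fine, F) :-
    delta(Coarse, Mid, F1),
    conversion(Mid, Fine, F2),
    F is F1 * F2.

% ?- conversion(hour, second, F).   % F = 3600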
According to the intended meaning of the mappings as decomposition functions, each time-point of T_i is mapped into the set of time-points of T_{i+1} that compose it. Nevertheless, we are faced with a number of alternative possibilities in settling the reference time-point of each domain. Choosing one or the other is merely a matter of convention, but it determines the actual form of the mappings. In the following, we assume that, for each pair T_i, T_j, the relevant function maps the reference time-point of T_i into a time-interval of T_j whose first element is the reference time-point of T_j (reference time-points alignment assumption).

To include the notion of temporal universe in the Event Calculus, we introduce the predicate value-metric which splits each time-point (1st argument) into a metric (2nd argument) and a value (3rd argument) component. Moreover, we express metrics as a subset of the integers. Let us consider a temporal universe consisting of hours, minutes and seconds, and assign by convention the metric 1 to the domain of seconds (in general, metric 1 is assigned to the finest domain), the metric 60 to the domain of minutes (1 minute corresponds to 60 seconds) and the metric 3600 to the domain of hours (1 hour corresponds to 3600 seconds). As an example, value-metric(2hrs30m, 60, 150) holds, since there are 60 minutes in an hour. Using the predicate value-metric, decomposition functions can be defined as follows:

fine_grain_of((t1, t2), t) ←
    value_metric(t, m, v) ∧
    value_metric(t1, m1, v1) ∧ value_metric(t2, m1, v2) ∧
    m1 ≤ m ∧
    v1 = v * (m/m1) ∧ v2 = (v + 1) * (m/m1) − 1

Given a pair of domains T_i, T_j, with T_i a coarser grain of T_j, for each time-point t_j of T_j we also define as its coarse grain equivalent on T_i the time-point t_i of T_i such that t_j belongs to the time-interval obtained by applying the corresponding decomposition function to t_i. The uniqueness of the coarse grain equivalents can easily be deduced from the definition of the decomposition functions. Coarse grain equivalent functions can also be defined using the predicate value-metric as follows:

coarse_grain_of(t2, t1) ←
    value_metric(t1, m1, v1) ∧ value_metric(t2, m2, v2) ∧
    m1 ≤ m2 ∧
    v2 = (v1 * m1) // m2

where (v1 * m1) // m2 denotes the integer division of (v1 * m1) by m2. The relationships of temporal ordering can be generalized to make it possible to compare two time-points belonging to different temporal domains as follows:

is_after(t2, t1) ←
    value_metric(t1, m, v1) ∧
    coarse_grain_of(t, t2) ∧ value_metric(t, m, v) ∧
    v1 < v

is_after(t2, t1) ←
    value_metric(t2, m, v2) ∧
    coarse_grain_of(t, t1) ∧ value_metric(t, m, v) ∧
    v < v2

The is_before predicate can easily be defined in a similar way. The coarse grain equivalent and the decomposition functions can be viewed as forms of simplification and articulation along the dimension of temporal aggregation, i.e. shifts in focus through part-whole relationships among time-points, respectively. (An executable sketch of these predicates is given below.)
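These definitions can be executed once a concrete representation of time-points is fixed. In the sketch below (ours, not the paper's: a time-point is written t(Domain, Value), value_metric is simply tabulated for the hour/minute/second universe, and // is Prolog's integer division):

value_metric(t(hour,   V), 3600, V).
value_metric(t(minute, V),   60, V).
value_metric(t(second, V),    1, V).

fine_grain_of((T1, T2), T) :-          % T decomposes into the interval (T1, T2)
    value_metric(T,  M,  V),
    value_metric(T1, M1, V1),
    value_metric(T2, M1, V2),
    M1 =< M,
    V1 is V * (M // M1),
    V2 is (V + 1) * (M // M1) - 1.

coarse_grain_of(T2, T1) :-             % T2 is the coarse grain equivalent of T1
    value_metric(T1, M1, V1),
    value_metric(T2, M2, V2),
    M1 =< M2,
    V2 is (V1 * M1) // M2.

is_after(T2, T1) :-                    % comparison across domains, first clause
    value_metric(T1, M, V1),
    coarse_grain_of(T, T2),
    value_metric(T, M, V),
    V1 < V.
is_after(T2, T1) :-                    % second clause
    value_metric(T2, M, V2),
    coarse_grain_of(T, T1),
    value_metric(T, M, V),
    V < V2.

% ?- coarse_grain_of(t(hour,H), t(minute,150)).             % H = 2: 2hrs30m lies in hour 2
% ?- fine_grain_of((t(minute,A), t(minute,B)), t(hour,2)).  % A = 120, B = 179
% ?- is_after(t(minute,190), t(hour,2)).                    % succeeds: minute 190 falls in hour 3
% ?- is_after(t(minute,130), t(hour,2)).                    % fails: indistinguishable at the hour grain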
They define distinguishability and undistinguishability relations between any pair of time-points with respect to each domain of the temporal universe. 3.2 Events and Properties In the Temporal Universe Let us now locate events and properties in t.he temporal universe. The idea is to directly associate a time granularity with events and to derive the granularity of propert.ies on the basis of the initiates and termirwtc relations. First of all, we give a characterization of events with respect to the temporal universe. With respect to a given domain, we distinguish instantaneou,'1 events, that. happen at. a time-point., and events with duration, that take place over a nonpoint time-interval. Such a distinction among events is a relative one, so, for example, passing from a given domain to a finer (coarser) one an instantaneous (with duration) event may become an event with duration (instantaneous). With respect to the temporal universe, we distinguish finite and infinitesimal events. An event is said finite if there exists a domain with respect to which it has duration. A finite event thus identifies an implicit level of time granularity: at this level and coarser ones, it is an inst.antaneous event; at finer levels it is of finit.e duration. We define such a threshold the intrinsic time granularity of the event. An event is said infinitesimal if it is instantaneolls with respect to every doma.in 4 . Infinit.esimal event.s are needed for dealing with continuous change [Shanahan 1990]. Let us consider, for instance, a process of continuous change such as sink filling with wat.er. We might. associate the occurrence of an event. with each new level reached by the filling fluid. If we did t.his, t.hen there would be no limit to how fine we might choose our temporal grain in order for the events to remain instantaneous. Thus t.a.king this approach we have a need for infinitesimal event.s. Differently from the previous one, such a dist.inction among events is an absolute one. To be able to deal with instantaneous events only, we impose that every event is associated with a domain whose granularity is equal to or coarser than the intrinsic one of the event. In such a way, Event Calculus axioms can be still used to reason within domains. On the contrary, they are insufficient by themselves to deal wit.h events associated with different domains (differently grained events). However, reasoning across domains can be brought back to reasoning within domains provided that there exist some rules t.o relate differently grained events to the same domain. The idea is to integrate macro-events (section 3.3) and continuous change (section 3.4) mechanisms with time granularity, and to define general temporal project.ion 4The absolute instantaneousness of infinitesimal event.s copes with the same representational problems t.hat suggested to Hayes and Allen the introduction of short time periods (moments) in Allen's Interval Logic [Hayes and Allen 1987J. 707 rules (section 4) that are used by default when neither macro-events nor continuous change decompositions are explicitly given. 3.3 Refining Macro-Events We define a unifying framework for the packaging of events and the granularity of time to describe the temporal relationships between a macro-event and its components. We require that the intrinsic time granularity of a macro-event is coarser than the ones of its subevents and that its occurrence time is a coarse grain equivalent of the occurrence time of all its sub-events. 
We also define a number of general operators, called macro-event constructors, for specifying temporal relationships among sub-events [EQUATOR 1991](we use the infix notation for macro-event constructors for the sake of simplicity)5: sequence ;delay(min,max) minimum and maximum delay between two events alternative I parallelism II sequential repetition (n is optional) parallel repetition (n is optional) composition [] Let us report here the Event Calculus axiomatization of the basic operators expressing sequence, alternative, and parallelism. happens..at(e1; e2, t)happens..at( e1, t1) " happens_at( e2, t2)" coarsf-grain_of(t, tl)" coarse_grain_of(t, t2) is_after(t2, t1) The operator expressing sequence deserves further consideration. It allows us to deduce the occurrence of a macro-event at a time-point of a domain coarser than the domain(s) the occurrence times of its component events (possibly macro-events in their turn) belong to. Such a time-point is a coarse grain equivalent of both the occurrence times of components. Then, the rule for sequential macro-events first executes a comparison of time-points with respect to the finer domain and then it abstracts them into a time-point of the coarser one. The presence of this switching to a coarser domain makes the definition of sequential macro-events incomplete. Consider the following example. Given the occurrences of three events e1, e2 and e3 at time-point.s 2hrs15m, 2hr42m and 2hrs50m, respectively, we are not able to deduce the occurrence of a sequential event. e1; [e2; e3] at time-point 2hr s when the temporal universe is { ... , hours, minutes, ... }. In fact, there is no way of strictly ordering e1 and the macro-event. into which e2 and e3 can be abstracted, because the occurrence time of the macro-event is a coarse grain equiva.lent of the occurrence time of d. To make it possible to derive the occurrence the macro-event el; [e2; e3], the temporal universe has to be extended wit.h the domain of 3D-minutes (similar considerations hold for the macro-event [eli e2]; e3). However, it is easy t.o find another sequence that cannot be abstracted into a sequential macro-event with respect to the extended temporal universe too. Such an incompleteness is due to the fact that mappings between temporal domains are fixed once and for all and then is inherent to t.he upward tempora.l projection involved in macro-event. derivation rules (section 4.1). happens..at(e1Ie2, t)happens..at( e1, t)" not happens..at( e2, t) 3.4 happens..at(e1Ie2, t)not happens..at{e1, t)" happens..at(e2, t) The original approach to con tin uous change makes the assumptions that the parameters of the trajectory function are set not after tl and are not reset between tl and t2. In general, these assumptions are too restrictive. Mechanisms are requested for resetting t.he parameters of the trajectory function. This allows it to be initiated with parameters values at the start of the property, but also allows the parameters to be changed during the interval of validity of the property_ In this way, the trajectory may model 'non-linearities' (e.g. a. change in the rate of a linear increase of a temperature) without interrupting the relevant dynamic property (e.g. by splitting a 'temperature rising' property when the rate of rise changes). happens..at(e11Ie2, t)happens..at( e1, t)" happens..at( e2, t) In general, domain axioms include definition of macroevents in terms of a suitable composition of su b-events. 
An example of these domain axioms is: happens..at( e, t)happens..at([e1; [e21Ie3]], t)" parLof(e1, e) AparLof(e2, e)" parLof(e3, e) 5Dealing with the repetition operators may require the addition of a domain composed of a single time point to the temporal universe (absolutely coarsest domain). Refining Continuous Change To take into account the resetting of parameters the original axiom can be replaced by the following one: 708 holds_at(p,t) +value_metric(t, m, v)A happens..at(e, tl) A value_metric(tl, m, vl)A ~vl < v A initiates( e, q)A not broken..during( q, (tl, t»)A happens..at(e', t2) A value_metric(t2, m, v2)A v2 < v A initiates(e',par)A not broken..during(par, (t2, t»)A max(tl,t2,ti) A trajectory(q,par,ti,p,t) A continuously changing property p can be assigned a given value at a time point t if an instance of the relevant dynamic property q is initiated at a time point tl (before t) and not broken at some point between tl and t, the relevant parameter par is (re)set at a time point t2 (before t) and not broken at some point between t2 and t, and the initial value of p is calculated (by the trajectory predicate) at the time point ti which is the maximum between tl and t2 and the 'max' predicate has the obvious definition. The crankshaft example of section 2.2 must be rewritten according to the revisited axiom as: trajectory(rotating, velocity(w), ti, angle(a), t) value_metric( ti, m, vi) A value_metric( t, m, v)A holds - at(angle(ai), ti) A a = w(v - vi) + ai The indirect recursion on the predicate trajectory (or, equivalently, on the predicate holds_at) stops when the initial values of the configuration variables, e.g. the angular position, are reached. They can be explicitly asserted or derived from the occurrence of independent events. The application of the refined axiom for continuous change is not restricted to discrete resetting of parameters; it can be used to deal with continuously changing parameters too. In such a case, the occurrence of the continuous events of resetting can be derived from the continuous change of the configuration variables by means of appropriate domain axioms. Continuous events can be either acquired by the external environment or computed according to explicit laws. In both cases, we generally need to plot them at regular time intervals to make the model computable. Choosing the width of the time interval is equivalent to choosing the time granularity at which describing the process. Then, a change in the frequency of plotting is equivalent to the switching of a continuous process from one time granularity to another. 4 Reasoning Granularity with Time We distinguished two basic modalities of relating differently grained events, namely upward and downward temporal projections. Upward (downward) projection determines the temporal relat.ions set up by two events ei and ej which occur at the time-points ti E Ti and tj E Tj, respectively, wit.h Ti coarser than Tj, by upward (downward) projecting ej (ej) on Tj (Tj). 4.1 'Naive' Upward Projection The 'naive' upward projection is a quite straightforward approach to abstractive t.emporal reasoning. It. states that the upward projection of an event e that occurs at a time-point t of a domain Tj on a domain Ti, coarser than Tj, is accomplished by simply replacing t with its coarse grain equivalent on Ti [Evans 1990]. 
Then the temporal ordering and distance between two events ei and ej which occur at the time-points ti E Ti and tj E Tj, respectively, are determined on the basis of the relation between tj and the coarse grain equivalent of tj on Tj. Moreover, if ej (ej) precedes ej (ed then the properties initiated by ej (e j) and terminated by e j (ed hold over the time-interval of identified by ti and the coarse grain equivalent of tj. To formalize upward projection in the Event Calculus, we first extend the definition of the occurrence time of an event. as follows: n happens..at( e, tl)happens_at(e, t2) A coarse - grain - ol(tl, t2) , In this way, each event is endowed with several occurrence times belonging to different domains, i.e. the time-point at which it originally occurs and all the coarse grain equivalents of such a point. Combined with the macro-event derivation rules, upward projection allows us to deduce the occurrence of parallel and alternative macro-events at time-points of domains coarser than the domains at which occur their components. Upward projection can be seen as a simplification rule [Hobbs 1985], because it allows us to derive a relation of temporal indistinguishability, i.e. simultaneity, among events from the relation of indistinguishabilit.y among time-points defined by coarse grain equivalent functions. Then, the Mholds-for predicate is redefined to constrain the starting and the ending time-points of the time-interval to belong to the same domain: Mholds-for(p, (start,end») +happens_at(e,start) A initiates(e,p) A value-metric(start, m, vs) A happens_at(e' ,end) A terminates(e' ,p) A value-metric(end,m,ve) A vs < ve 1\ not broken-during(p, (start,end») together with a similar axiom for the predicate brokenduring. In such a way, the predicate Mholds-for identifies several time-intervals of ditferents domains over which the 709 properties initiated and terminated by two differently grained events hold. In despite of its apparent simplicity upward projection involves a number of semantic assumptions. The most relevant one is related to its application to contradictory events, i.e. events that cannot occur simultaneously. We formally define two events as contradictory if they initiate or terminate incompatible properties. The definition of the relation of incompatibility among properties depends on domain-specific knowledge [Kowalski and Sergot 1986]. Upward projection maintains the weak temporal ordering between events, but it does not always preserve the strict one. Then the logical consistency of the upward projection cannot be guaranteed in the general case, because it may enforce contradictory events to occur at the same time-point in a coarser domain. As a consequence, if two differently grained events are contradictory the coarse grain equivalent of the occurrence time of the fine grained event must be different from the occurrence time of the coarse-grained event. This is guaranteed by the following integrity constraint 6 : +- happens.JLt(e1, t) /\ happens_at(e2, t)/\ cont'radictory( e1, e2) Moreover, upward projection may change the ratio between the width of time-intervals. That is, given two domains Ti and TJ', with Ti coarser than T i , the coarse grain equivalents on Ti of two pairs of time-points of Tj which are at the same temporal distance may be at a different one, while the coarse grain equivalents on Ti of two pair of time-points of TJ that are at a different temporal distance may be at the same one. 
Such a weakness of the 'naive' upward projection will be overcome refining upward projection according to the downward projection schema we are going to define. 4.2 Downward Projection The downward projection of an event e that occurs at a time-point t of a domain Ti on a domain TJ finer than Ti is accomplished by applying the following decomposition scheme: for each event e that occurs at a time-point t of Ti there exist two infinitesimal event.s ei and e f that occur at the time-points ti and t J of 6This solution can be generalized by making cont.radiction dependent on granularities or even on time instants. In such a way, simultaneous occurrence of two events can be classified as contradictory in certain domains, or even in certain time instants of them, only. The relevant integrity constraint becomes: +- happens_at(el, t) /\ happens_at(e2, t)/\ contradictory(el, e2, t) Tj, respectively, and such that (i) ti ~ t J; (ii) t i~ tllp coarse grain equivalent on Ti of both ti and t,; (iii) for each property p such that p is terminated hy (', there exist an event e1' that occurs at t1' of Tj such that e1' terminates p and ti ~ t1' ~ t J; (iv) for each property q such that q is initiated bye, there exist. an event e q . that occurs at tq of Tj such that eq initiates q and ti ~ tq ~ t f; (v) the (type of the) event e becomes a dynamic property that is initiated by fi and terminated bye, with respect to Tj. Because of an event is defined by the properties that it initiat.es and terminates, such rules provide the definition of the component events ei, e" e 1' and e/. Downward projection can be seen as an articulation rule [Hobbs 1985]. From the relation of distinguishability among time-points of the finer domain introduced by the decomposition function, in fact, it. derives a relation of temporal distinguishability among the sub events of a given finite event. Let us formalize this scheme in the Event Calculus. First of all, we define two functions begin and end that map a given instance of a macro-event. int.o its initiating and terminating events, respectively. The occurrence of such events can be deduced from the occurrence of the macro-event by means of the following aXIOms: happens_at{begin( e, t), time(begin( e, t)))+happens_at( e, t) /\ coarse - grain - of(t, time(begin{e, t))) happens_at(end(e, t), time(end(e, t)))+happens_at(e, t) /\ coarse - grain - of(t, time(end(e, t))) together with (condition (ii)): coarse - grain - of(t, time(begin(e, t))) coarse - grain - of(t, time(end(e, t))) where time(begin(e, t)) and time(end(e, t)) denote t.he occurrence times of begin( e, t) and end( e, t), resp~c­ tively. Condition (i) is expressed by the following integrity constraint: +- iSJlfter(time(begin( e, t)), time( end( e, t))) Let us now represent e1' and e q by means of two functions term and in. For each property p (q), term (in) maps each instance of a given macro-event int.o the component event that terminates (initiat.es) sl1ch a property. Using these functions, conditions (iii) and (iv) are codified by the following axioms: e, 7When ti and t J coincide, the events ei, €1" €q and are merged into the original single event e. This is always the case of the downward projection of infinitesimal events. For instance, the infinitesimal event of swit.ching on t.he light remains instantaneous with respect to all the domains of the temporal universe composed of {Day, Hour, Minute}. 
710 terminates(term(e, t,p),p) - terminates(e,p) initiates(in(e, t, q), q) - initiates(e, q) togetlfer with: - is-D./ter(time(begin( e, t)), time( term( e, t, p)))V is_be/ore(time( end( e, t)), time(term( e, t, p») - is-D./ter(time(begin(e, t»), time(in(e, t, q»)V is_be/ore(time(end(e, t)), time(in(e, t, q))) for each property p and q. Finally, condition (v) is expressed by the following axioms: initiates(begin( e,t) ,e) terminates( end( e,t ),e) They allow us to state that the property e holds over (time(begin(e, t)), time(end(e, t») by means of the Mholds-for axiom. These last axioms provide each temporal object with a twofold event/property characterization. That is, (the type of) an event e, associated with a given domain, may become a dynamic prop~rty with respect to a finer domain, and vice versa. Let us consider, as an example, the event of flying from Milan to Venice. With respect to the domain TH of hours it can be modeled as an instantaneous event that occurs at a time-point t of TH. Such an event terminates the property of being in Milan and initiates the property of being in Venice. With respect to the domain TM of minutes, it can be decomposed into a pair of infinitesimal events /lyingi and flying, that occur a~ the time-points ti and t f of TM, respectively, with ti ::; t f, and such that t is the coarse grain eq ui valent of both. Moreover, flyingi terminates the property of being in Milan and initiates the property of flying, while flying, terminates the property of flying and initiates the property of being in Venice. 4.3 'Revised' Upward Projection The event/property duality introduced by downward projection suggests an extension of the upward projection rules to cope with contradictory events without restrictions. When the coarse grain equivalents of two contradictory events coincide the downward projection schema suggests to merge and replace the events by a macro-event corresponding to the conjunction of the properties initiated by the first one and terminated by the second one. Moreover, such a macro-event terminates (initiates) all the properties terminated (initiated) by its first (second) component and every property terminated (initiated) by the second (first) component which is not initiated (terminated) by the first (second) component. Let us consider, as an example, the events of leaving station A and arriving at station B of a train. The first one terminates the property of the train of being at station A and initiates the property of moving, while the second one terminates the property of moving and initiates the property of being at station B. Let be T a domain with respect to which the two events are simultaneous. According to the revised upward projection rules they are merged and replaced by the event of moving that terminates the property of being at station A and initiates the property of being at station B. The actual structure of the corresponding macroevent can be given in terms of a suitable composition of the component events using macro-event constructors. Consider two contradictory events e1 and e2. If their temporal ordering is known and meaningful, e.g. 
el precedes e2, then the corresponding macro-event e is a sequential one, that is, el; e2; if their temporal ordering is meaningless (their global effect does not change even if their ordering changes), and possibly unknown, then the corresponding macro-event is a parallel one, that is ellle2; if their temporal ordering is meaningful and unknown, then the corresponding macro-event is [ellle2]l[[el; e2]I[e2; ell]; and so on. The last one is the case, for instance, of events of rotation around orthogonal axes in the three dimensional space which are not commutative, that is, the final configuration of the rotating system depends on the ordering of their occurrences. 5 Conclusion The paper made a proposal for embedding the notion of time granularity into a logic-based representat.ion language. Firstly, it enumerated a number of notational and computational reasons that motivate the introduction of time granularity and briefly surveied and discussed the existing relevant literature. Successively,it extended the Event Calculus to deal with time granularity by introd ucing the concepts of temporal universe, finite and infinitesimal events, macro-event., and continuously changing events and properties. Finally, it provided Event Calculus with the axioms supporting upward and downward temporal projection. Acknow ledgements We would like to thank Chris Evans of Goldsmiths' College, University of London, and Murray Shanahan of Imperial College for the useful discussions we had with them. The research for this paper was partially funded by t.he European Community ESPRIT Program, EQUATOR Project no. 2409 [EQUATOR 1991]. Collaborat.ing organizat.ions are CENA (France), CISE (Italy), EPFL (Switzerland), ERIA (Spain), ETRA (Spain), Ferrant.i 711 Computer Systems Ltd. (UK), Imperial College (UK), LABEN (Italy), Politecnico di Milano (Italy), SWIFT (Belgi~m), SYSECA (France), UCL (UK). The work of CISE was partially funded by the Automatica Research Center (CRA) of the Electricity Board of Italy (ENEL) within the VASTA project too. References [Allen 1983] Allen, J., Maintaining Knowledge about Temporal Intervals; Communications of the ACM, 26, 11, 1983. [Allen 1984] Allen, J., Toward a General Theory of Action and Time, Artificial Intelligence, Vol. 23, No.2, July 1984. [Clifford and Rao 1988] Clifford,J., Rao,A., A Simple, General Structure for Temporal Domains in Temporal Aspects in Information Systems, Rolland, C., Bodart, F., Leonard, M. (Eds.), IFIP 1988. [Ciapessoni et al. 1992] Ciapessoni,E., Corsetti, E., Montanari,A., San Pietro,P., Embedding Time Granularity in a Logical Specification Language for Synchronous Real- Time Systems; submitted to Science of Computer Programming, North-Holland, January 1992. [Corsetti et al. 1990] Corsetti, E., Montanari, A., Ratto, E., A Methodology for Real- Time System Specifications based on Knowledge Representation; in Computational Intelligence, III, N. Cercone, F. Gardin, G. Valle (Eds.), NorthHolland, Proc. of the International Symposium, Milan, Italy, 24-28 September, 1990. [Corsetti et al. 1991a] Corsetti, E., Montanari, A., Ratto, E., Dealing with Different Time Granularities in Formal Specification of Real- Time Systems; The Journal of Real-Time Systems, Vol. III, Issue 2, June 1991. [Corsetti et al. 1991 b] Corsetti, E., Montanari, A., Ratto, E., Time Granularity in Logical Specifications, Proc. 6th Italian Conference on Logic Programming, Pisa, Italy, June 1991. [Corsetti et al. 
1991c] Corsetti, E., Crivelli, E., Mandrioli, D., Montanari, A., Morzenti, A., San Pietro, P., Ratto, E., Dealing with Different Time Scales in Formal Specifications; Proc. 6th International Workshop on Software Specification and Design, Como, Italy, October 1991. [Dean and Boddy 1988] Dean, T., Boddy, M., Reasoning about partially ordered events; Artificial Intelligence, 36, 1988. [Dean et al. 1988] Dean, T., Firby, R., Miller, D., Hierarchical Planning involving deadlines, travel time and resource,,!; gence, 4, 1988. Computational Intelli- [Dean 1989] Dean, T., Using Temporal Hierarchies to Efficiently Maintain Large Temporal Databases; Journal of ACM, 36, 4, 1989. [Evans 1990] Evans, C., The Macro-Event Calculus: Representing Temporal Granularity; Proe. PRICAI, Japan 1990. [Fum et al. 1989] Fum, D., Guida, G., Montanari, A., Tasso, C., Using Levels and Viewpoints in Text Rep"esentation; in Artificial Intelligence and Information-Control Systems of Robot.s 89, North-Holland, I. Plander (Ed.), Proc. 5th International Conference, Strbske Pleso, Czechoslovakia, 6-10 November, 1989. [Galton 1987] Galton, A., The Logic of Occurrence; in Temporal Logics and their applications, Galton A., (Ed.), Academic Press, 1987. [Giunchiglia and Walsh 1989] Giunchiglia, F., Walsh, T., Abstract Theorem Proving; Proc. 11th TJCAl, Detroit, USA, 1989. [Greer and McCalla 1989] Greer, J., McCaHa, G., A Computational Framework for Granularity and its Application to Educntional Diagnosis; Proc. 11th IJCAI, Detroit, USA 1989. [EQUATOR 1991] Formal Specification of the GRF and CRL; CISE, FERRANTI, SYSECA (Eds.), ESPRIT Project no. 2409 EQUATOR, Deliverable D123-1, 1991. [Hayes and Allen 1987] Hayes, P., Allen, J., Shod Time Pe"iods; Proc. 10th IlCAI, Milano, Italy 1987. [Hobbs 1985] Hobbs, J., Granularity; Proc. 9th lJCAl, Los Angeles, USA 1985. [Kautz and Allen 1986] Kautz, II., Allen, J., Generalized Plan Recognition; Proc. AAAI, 1986. [Kowalski and Sergot 1986] Kowalski, R., Sergot, M., A Logic-based Calculus of Events; New Generation Computing, 4, 1986. [Maim 1991] Maim, E., Reasonig with Different Granularities in CRL; Proc. IMACS '91, Dublin, Ireland, July 1991. [Maim 1992a] Maim, E., Uniform Event Calculu. .t; 2 IQ GI C) e ~ IQ o append 100 nreverse 30 qsort 50 primes 100 8 queen Figure 7: average number of active contexts in the sample programs gram, and this is over twice as big as in other programs such as quick sort 50 and 8 queens. This is because it takes about 120 instructions to perform an integer division which is required in primes 100. For other similar programs which require multiplication and/or division of integer and/or floating point, low performance is also expected. But, because the management processor has its own FPU (floating point unit) in the IUs of PIE64, UNIREDll can pass such calculation to the MP and can concentrate on reducing goals. However, the evaluation has not been done yet. 4.3 Tolerance of Remote Access Latency To evaluate tolerance of remote memory access latency, we incorporated a pseudo-remote access mechanism in 721 0 Cl 2500000 r.a pipeline sleep 2500000 pipeline hold 0 D D invalidated insts. 8 2000000 2000000 VI VI CP CP u>- 1500000 U >U .:i! U 0 m 1500000 u pipeline sleep pipeline hold invalidated insts. executed inslS. .:i! U 0 1000000 U U 500000 500000 o 1000000 o o 20 40 60 remote pointer ratlo[%] 80 100 Figure 8: all sorts of clock cycles vs. 
remote memory access (8 queens, the maximum number of contexts = 1) EI 2000000 IZl EI 20 40 60 80 remote pointer ratio[%] 100 Figure 10: all sorts of clock cycles vs. remote memory access (8 queens, the maximum number of contexts = 4) 2500000 "T"'"'"---r-----r----T"'"'"'"'--r----: o o 1.6,--------------------------,---nreverse 30 pipeline sleep pipeline hold 1.5 invalidated insts. executed inslS. VI c.. CP 1.4 :::> U >U "0 CI) CI) .:i! U 1.3 qsort 50 •• primes 100 6 8 queen c.. o (/) U o 20 60 40 remote pointer ratlo[%] 80 1.0....L...-. .=---+----1-----i-none +derf +dfclldfcc +dcll/exll condition 100 Figure 11: effects of dereference instructions Figure 9: all sorts of clock cycles vs. remote memory access (8 queens, the maximum number of contexts = 2) the simulator in spite of the single processor model of it as shown in figure 5. In more detail, we change the value of the IU-identifier field of the pointers included in every goal when reduction of the goal starts or resumes after suspension, with the probability which we call remote pointer ratio. Remote memory access commands issued by UNIREDII are emulated by the command processor shown in figure 5 with cycles listed in table 3. Under these conditions, we varied the maximum number of the contexts from one to four, and measured the clock cycles required by all sorts of the pipelined execution of instructions using the 8 queens program. Results are shown in figure 8 to 10. In these figures, the lowest part (shadowed) of the graph represents the number of executed instructions, the second part (hatched) represents the number of invalidated instructions by some jumps, the third part (lightly shadowed) the number of cycles while the internal pipeline of UNIREDII holds, and the fourth, uppermost part (white) the number of cycles while the pipeline are sleeping because, waiting for some replies, no contexts can be executed. In figure 8, the multi-context processing mechanism of UNIREDil is not activated because the maximum number of the active contexts is set to one. Therefore the pipeline sleeping time (the white part of the graph) can not be hidden and becomes longer and longer as the remote memory access increases. Moreover, the pipeline hold time and the amount of invalidated instructions are great because the pipeline interlock occurs frequently. In the other two figures (figure 9 and 10), the multicontext processing mechanism works and works O1ol'f' effectively as the number of contexts increase. The pipeline sleeping time is least in the figure 10 and the pipeline interlock (the pipeline hold and the instruction invalidation) hardly occurs in that figure. They become a little longer as the remote memory access increases because the average number of the active contexts decreases. Figure 9 shows an intermediate state between figure ~ a.nd 10. 4.4 Effects of Dedicated Instructions Finally, we present the effect of the dereference instructions, which are most characteristic of the instruction set of UNIREDII. Figure 11 shows the speed up about four sample programs (naive reverse 30, quick sort 50, primes 100, 8 queen) without the dereference instructions (the dereference instructions are resolved into more basic instructions), with only the basic dereference (derf) 722 instruction, with the dereference-and-check-listj constant (dfcl j dfcc) instruction, and with the all combined instructions such as the dereference-and-check-list-Ioad-car j execute-on-list-Ioad-car (dclljexll) instruction, respectively. 
In the figure, the speedup from the basic dereference instruction is about 10% except in the primes 100 program, in which the majority of the executed instructions are arithmetic ones. In addition, the combined instructions have the effect shown, and the total effect of these instructions is about 30% except for primes 100. Therefore it can be said that the dereference instructions have a great effect.

5 Discussion

In the previous subsection, we presented the effect of the dereference instructions and of the combined ones. One point is that they are not especially complicated instructions. In the hardware design, the instruction decoder does not contain the critical path which actually determines the maximum clock rate of UNIREDII; the critical path lies in reading the general-purpose register file and in the ALU calculation. Moreover, all of the instructions of UNIREDII are single-cycle instructions, because they jump to themselves recursively when they need more cycles to complete their action, as described before in Section 3.4.1. Owing to these dedicated instructions, we can compile Fleng programs so that the number of executed instructions is minimized. As a result, we can achieve high performance even though the clock rate is comparatively slow, 10 MHz.

Finally, we mention the effect of the multi-context processing of UNIREDII. As well as reducing the overhead of inter-processor synchronization, we can use it to reduce pipeline interlocks, so that the pipeline of UNIREDII becomes effectively interlock-free.

6 Conclusion

We have described the architecture of the inference processor UNIREDII and evaluated some aspects of it. We obtained a performance of about 1 MRPS with a 10 MHz clock, and confirmed that the multi-context processing of UNIREDII has a large effect on reducing pipeline interlocking and on reducing the overhead of remote memory access latency. In the future, we will evaluate it with larger, real application programs. And, of course, we will make the real UNIREDII chip work as a PIE64 system element.

Acknowledgements

We specially thank Prof. J. A. Robinson for much helpful advice. We also thank the members of the group SIGLE in our laboratory, namely Tadashi Saito, Eiichi Takahashi, Minoru Yoshiada, Takeshi Shimizu, Yasuo Hidaka, Jun'ichi Tatemura, Hidemoto Nakada, Kei Yamamoto, Hajime Maeda, Shougo Shibauti, and Takashi Matsumoto. This work was supported by a Grant-in-Aid for Specially Promoted Research (No. 62065002), and is now supported by a Grant-in-Aid for Encouragement of Young Scientists (No. 03001269) of the Ministry of Education, Science and Culture.

References

[Jordan 1983] Jordan, H. F.: Performance Measurements on HEP - A Pipelined MIMD Computer, Proc. of the 10th Annual International Symposium on Computer Architecture, pp. 207-212, ACM (1983)

[Halstead and Fujita 1988] Halstead, R. and Fujita, T.: MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing, Proc. of the 15th International Symposium on Computer Architecture, pp. 443-451, IEEE (1988)

[Shimizu et al. 1989] Shimizu, K., Goto, E., and Ichikawa, S.: CPC (Cyclic Pipeline Computer) - An Architecture Suited for Josephson and Pipelined-Memory Machines, IEEE Transactions on Computers, Vol. 38, No. 6, pp. 825-832 (1989)

[Kimura and Chikayama 1987] Kimura, Y. and Chikayama, T.: An Abstract KL1 Machine and Its Instruction Set, Proc. of the 1987 Symposium on Logic Programming, pp. 468-477 (1987)

[Nilsson and Tanaka 1988] Nilsson, M. and Tanaka, H.: Massively Parallel Implementation of Flat GHC on the Connection Machine, Proc. of Fifth Generation Computer Systems 1988, pp. 1031-1040, ICOT (1988)
[Koike and Tanaka 1988] Koike, H. and Tanaka, H.: Multi-Context Processing and Data Balancing Mechanism of the Parallel Inference Machine PIE64, Proc. of Fifth Generation Computer Systems 1988, pp. 970-977, ICOT (1988)

[Takahashi et al. 1991] Takahashi, E., Shimizu, T., Koike, H., and Tanaka, H.: A Study of a High Bandwidth and Low Latency Interconnection Network in PIE64, Proc. of Pacific Rim Conference on Communications, Computers and Signal Processing, pp. 5-8, IEEE (1991)

[Shimizu et al. 1991] Shimizu, T., Koike, H., and Tanaka, H.: Details of the Network Interface Processor for PIE64 (in Japanese), SIG Reports on Computer Architecture, 87-5, IPSJ (1991)

Hardware Implementation of Dynamic Load Balancing in the Parallel Inference Machine PIM/c

T. NAKAGAWA, N. IDO, T. TARUI, M. ASAIE and M. SUGIE
Central Research Laboratory, Hitachi Ltd.
Higashi-Koigakubo, Kokubunji, Tokyo 185, Japan

ABSTRACT

This paper proposes and evaluates the hardware implementation required for dynamic load balancing in the prototype PIM/c of the Parallel Inference Machine (PIM). In fine grain multiprocessing, dynamic load balancing suffers from high overhead due to frequent access to load information. The proposed hardware can reduce this overhead by speeding up access to the load information. In order to utilize the high locality of logic programs, PIM/c is configured along a hierarchical structure of network-connected clusters, each of which is a bus-connected multiprocessor. Therefore two kinds of hardware, suitable for each hierarchy, are implemented for dynamic load balancing. First, in the clusters, we propose a register with a broadcast write feature. The evaluation determines the reduction of the overhead due to the memory polling which detects a load request. The proposed hardware reduces the execution time of logic programs by 15%. Second, in the network, we propose the use of a shortcut path to request the value of the total load within a cluster. The evaluation shows that the overhead due to the request of that value is reduced as a result of introducing the shortcut path. The proposed hardware reduces the execution time by 50%. The results obtained confirm that the use of hardware can reduce the high overhead of dynamic load balancing.

1. INTRODUCTION

Japan's Fifth Generation Computer project [1] has been centered around ICOT (the Institute for New Generation Computer Technology). ICOT has developed the parallel logic programming language KL1 (Kernel Language-1) [2] to describe knowledge and information processing systems. ICOT has also produced software in KL1, including the PIM operating system [3]. We are currently developing the PIM/c [4] as a KL1-based machine. A hierarchical structure of network-connected clusters, each of which is a bus-connected multiprocessor, is introduced to utilize the high access locality of KL1 programs in PIM [5]. Use of locality could restrict the interactions to clusters of several processors and thus reduce the communications among clusters. Therefore, a double hierarchical organization is used in PIM/c.

Dynamic load balancing is one of the main research areas for PIM. Because the logical relations present in a KL1 program never define their process of execution with determinacy, dynamic load balancing must be used. For dynamic load balancing it is necessary to acquire load information, for example, information about the existence of idle processors or the value of the total load within a cluster. The load information is updated and referenced by distributed processors. In other words, the load information is global, and therefore it has no locality.

A problem exists in that the hardware for normal process execution in PIM/c is optimized for accesses with locality. With this type of hardware the latency in accessing global information is large. In fine grain multiprocessing in KL1 programs, the high frequency and large latency of accesses to load information produce high overhead. Therefore, extensions in hardware are introduced in order to reduce the latency of load information in PIM/c.

In shared bus multiprocessors, snooping caches are known to reduce the memory latency observed by the processors [6,9]. There are two types of cache coherency protocols for rewriting shared data with
The PIM/c network unit has message queues to increase the throughput, although they produce an II • I I PE1 CC I gache increase in latency. For dynamic load balancing, use of II the old information may cause wasteful load I dispatching. Therefore, a shortcut path to the message queues is introduced to reduce the latency in accessing load information through the network of PIM/c. 32 Clusters / System J I 1 I PE8 ••• Cache 1 11 .II • • • 11 Interleaved shared-bus ..l Shared memory Hardware extensions in PIM/c require only a small I~aChe I P 1- 8PEs / Cluster t- CC:Cluster Controller PE: Processor Element amount of hardware because the addressable space for broadcasting is limited in the shared bus, and because the increase in the number of interconnections among Fig. 1. The configuration of PIM/c. Each clusters is less than that of a system with a special purpose network [10]. cache has a capacity of 80 Kbytes and consists of 20 byte blocks. 725 B. Broadcast registers in the shared bus hierarchy. In order to reduce the access latency of load information in the shared bus hierarchy, registers with broadcast feature are introduced in PIM (Fig. 2) [12]. We denote these registers as EFR's (Event Flag Register). They have the following features: The register should be written with the load information by its corresponding cluster controller. As the load information is required without waiting at message queues and without waiting for the cluster controllers toreceive, specified registers can always be read in 11 cycles. • one-bit wide to indicate an event, and a fast Network detection feature for control jumps which checks the un~ 1 existence of events . • feature of broadcast write; therefore, registers indicating the same request event to any processor can be written simultaneously. The reference and jump can be done within a cycle. Send Recv. msg. msg. queue queue • When using registers, there is no overhead due to cache misses. Each PIM/c processor has 16 EFRs. PE0 PEl PE7 Shared eeo ee1 CC: Cluster Controller CIA: Cluster Info Register Fig. 3. Shortcut path in the network. The shortcut paths and the registers exist in the router board of the packet switching network. Broken lines show the normal path through the message queues to increase the .... _-.-#-. -:'- ... I network throughput and the bold lines show the shortcut path to bypass the queues. PE: Processing Element EFR: Event Flag Register Fig. 2. Broadcast registers in the cluster. Bold 3. EVALUATION STRATEGY lines show the propagation path of a request event to broadcast registers and the broken lines show the We defined the following two strategies to evaluate memory polling path without hardware support. The the effectiveness of the proposed load balancing thin lines show the reset action of that event. hardware. C. Shortcut path in the network hierarchy. In order to reduce the access latency of load information in the network hierarchy, two kinds of features are introduced; a shortcut path for the specific 3.1 Evaluation on the Real Hardware Real hardware was used for evaluation as the software simulation is almost impossible for the following reasons: • The presence of the cache and the network introduce messages (Fig. 3) [13] and the registers that hold the more parameters. load information are called CIR's (Cluster Information There are many hardware parameters related to the Register). The hardware has the following features: internal states of the cache and the network. The • a shortcut path to message queues. 
common bus arbitration time, and the message • eight-bit wide registers to indicate load information packet switching time are examples. The overhead in a corresponding cluster. of cache misses and the network latency is important 726 in this evaluation. Thus, simulating the cache and network effects concurrently with processor processor using common load pool [14]. Consequently, an explicit load balancing activities would have taken a great deal of time in communication for the distributed load pools is software simulation. required. • Receiver-initiated load balancing. 3.2 Evaluation using an Artificial Load Model With an aim toward further improvement, we evaluated an artificial load model for the following reasons: • to separate the effect of hardware alone. An evaluation independent of the specific application is necessary in order to isolate the speedup produced by the proposed hardware mechanisms. • to separate the effect of load balancing. The explicit load balancing communication for the distributed load pools should be initiated by fully idle processors in order to avoid wasteful dispatching. Thus the communication is request based. • Communication with arbitrary responder. In order to reduce the response time without interrupting busy processors, a new type of communication, the AR (Arbitrary Responder) communication is introduced in PIM/c [12]. The request is sent to any processor which has more The real KLI execution environment involves many than one load in its load pool. In order to avoid the new control sequences in addition to load high overhead of context switching, every balancing. For example, handling the priority of loads needs another polling action using EFR registers. The total performance depends on the usage of the proposed hardware in other control processor polls the request at intervals where the context switch overhead is low. Thus any processor which detects the request rust responds to it. As the timing to detect requests differs in sequences. each PIM/c processor, this communication method is expected to reduce the response time 4. EVALUATION RESULTS proportionally to the number of processors in a cluster. We carried out the evaluation of the proposed hardware in both shared bus and network-based hierarchies. 4.1 Evaluation of broadcast registers in the shared-bus hierarchy We carried out this evaluation by focusing on the reduction of the latency to access the information about the existence of the idle processors. B. The load model. This model reflects the following characteristics of KLI program execution: • Unit load. We denote the unit as the reduction. The unit is assumed to be 200 cycles in PIM/c (Fig. 4). • Indeterminacy in the granularity of loads. In order to simulate "Tail Recursion Optimization" [17], we define the goal as consisting of an A. The load balancing scheme. The load balancing scheme is explained below: • Distributed load pool. Each processor has its own load pool in order to avoid implicit data transfers between caches due to updating a serial link in case of the generator processor of the load differs from its consumer arbitrary number of reductions (1 to 16). • Indeterminacy in the number of goals . In order to simulate the indeterminacy, we assume that each processor generates an arbitrary number of goals (1 to 4096). • A high write ratio and a high share ratio. 
Accesses performed within the reductions have the 727 following parameters: write ratio is 0.5, share ratio is 0.5, where write ratio is defined as the ratio of processor by updating their communication areas. The evaluation measures are i and t, and the reduction cost write references to total memory references, and is defmed as follows: Reduction cost = (T - I - t ) / R share ratio is defmed as the ratio of references to shared data area to total memory references . • A high access locality. Figure 5 shows the performance increase in reduction We define the locality as the number of successive accesses to the same address. The value is set to 4 in order to simulate free-list manipulation, which consists of allocating, instantiating, referring and deallocating a memory cell. using registers. The total reduction cost and the load request count are varied in 14 simulation cases. In this figure, request ratio is introduced, which is defmed as the ratio of the load request count r to the total reduction count R. The reduction cost is almost independent of the request ratio. This fact indicates that the memorypolling overhead caused by checking request N PEi occurrences is larger than the overhead due to cache misses using invalidation protocol. The speedup obtained is 15% due to the use of EFRs. 300 o :Unit load Fig. 4. A Load model with varying granularity. : 11.-- Ii) Q) ~ ltll 280 U Z C. Results o/the evaluation in a cluster. 0 We control the initial load amount in each processor ~ :::> 270 to vary load balancing conditions. According to the w 260 0 a: ··········l·············t:i;~··t~~r········· ···········t··~········ deviation of the initial load amounts within processors, 14 cases are simulated with an 8-processor cluster. The 250 0 resulting data are the total elapsed time (1'), the total idle time (I), the total wait time after requesting for load (i), : ·············t·············fWithort::EF~........... . 290 ~ C/) 0 : -61-- A._A_~-tAA~,A i 1 i i 0.01 0.02 0.03 0.04 0.05 REQUEST RATIO Fig. 5. The increase in speed using regis- the total dispatching time (t), the total reduction count ters. The reduction cost is defined as the number of (R) and the load request count (r). The total idle time execution cycles per unit load. The result involves extra includes the time spent waiting for load dispatching cycles for probing. The request ratio is defined as the since requesting a load by updating a bit-map word number of request per reduction. Using memory until receiving a load by reading a non-zero value from its communication area, and the time to wait for polling the reduction cost is high due to the serial execution of a memory access and a branch. Using termination of the whole program. The bit-map word is EFR, both the access and the branch can be done within a data array in which each bit corresponds to a a cycle. The polling is done for three kinds of events; processor requesting load. The total dispatching time load request, load dispatching and termination of the includes the time to select an idle processor by encoding whole program. the bit-map word to the address of its communication area, and the time to dispatch a load to each idle 728 Figure 6 shows the wait time i and the dispatching wasteful dispatching. In this scheme, the cluster to time t as a function of request count. It is confirmed that which goals are dispatched is determined at random the use of EFR with broadcast feature reduces both the wait time and dispatching time. 
The use of EFR reduces the dispatching time by 20%, and reduces the wait time by 15%. and then this goal dispatch is aborted on the condition that the dispatch target has more loads in the pool than the dispatching cluster. B. The Load model. 2.5 HI The load model among clusters is defmed in such a 2.510' Wllhou,-EFR WAIT TIME --Wllh_EFR 0 • -0 - en way as to reflect the changes in the amount of loads in 2.01r1 2.010' 1.51r1 1.5 10' the load pool. The load model is as follows: ~ ~ w ~ ~ t= -I §: (!) m Z I () =i 1.01r1 1.010' ~ :Q 0 CD .!!1 a.. C/) 5 5.01rJ 5.010' -.(>o-Wllhoul_EFR DISPATCHING TIME --With_EFR 0 2000 3000 4000 5000 0000 0 7000 REQUEST COUNT • An initial goal is denoted by L( 16) (Fig. 7 shows L(5)). • The execution of goal L(i) produces (i-I) subgoals, L(i-l), ... , L(2), L(1). Thus, the goal L(i) has 2i-l reductions. • Each reduction takes 300 cycles to execute using network messages. • The message length required for the load dispatching is 27 bytes long. Thus, it takes 27 cycles to send this message through the one-byte- Fig. 6. The increase in speed using broad- wide network interface. The length of the message casting. The dispatching time and the wait time requesting the load amount is 2 bytes. increase due to the cache misses using an invalidationtype snooping cache. The use of broadcast feature eliminates the overhead due to the cache misses. 4.2. Evaluation of shortcut paths in the network-based hierarchy We carried out this evaluation by focusing on the reduction of the latency in accessing the value of the total load in a cluster. A. The load balancing scheme. The load balancing scheme is described below: • Sender-initiated load balancing. A study of the Multi-PSI system disclosed a problem of the receiver-initiated load balancing sch~eme in large-scale machines, namely that a load request contention may arise at busy processors [15]. In order to avoid this contention, an improved sender-initiated scheme, named "Smart Random Load Dispatching" [5] is efficient in reducing © :Unitload Fig. 7. A load model with floating amount of load. C. Results o/the evaluation among clusters. We control the dispatching rate, which is defmed as the ratio of all goals dispatched to other clusters to all executed goals, by changing the interval of the dispatching control. In order to determine the efficiency of load dispatching, the total elapsed time (T), the total idle time (I) and the dispatching rate (d) are measured. Differences result from the latency of load information. 729 Figure 8 shows the results obtained by applying the smart random load dispatching scheme to 8 cluster approximately 5.5 in an 8-cluster system at a dispatching rate of 0.2. Comparing the two results, the use of the proposed system without support hardware. The normalized elapsed time, which is defmed as the ratio of elapsed hardware halves the normalized elapsed time at 0.2 time by 8 cluster system to elapsed time by single dispatching rate, where the control of dispatching rate cluster, and the utilization of processors are plotted as a seems to be possible. function of the dispatching rate. In order to compare the results in the two cases, we assume that the dispatching rate is controlled to be 0.2, because safe control occurs only at the upper side of the minimum point. Without the support hardware, the resulting increase in speed is It should be noted that the shortcut path can also be used for other load balancing schemes, including the minimum load distribution scheme [16]. 
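A rough sketch of the smart random dispatching rule described above is given below. The cluster count, the load counters and the random-number handling are invented for illustration; on the real PIM/c the remote load value is obtained through the CIR registers and the shortcut path rather than from shared variables.

```c
/* Sketch of "smart random" load dispatching: pick a target cluster at random,
 * then abort the dispatch if the target already holds more load than the
 * sender.  Cluster count and load counters are invented for illustration. */
#include <stdio.h>
#include <stdlib.h>

#define NUM_CLUSTERS 8

static int load_pool[NUM_CLUSTERS];   /* goals queued in each cluster's load pool */

/* Returns the receiving cluster, or -1 if the dispatch is aborted and the
 * goal stays local.  Reading load_pool[target] stands in for the load
 * request that the shortcut path makes cheap. */
int smart_random_dispatch(int self) {
    int target = rand() % NUM_CLUSTERS;
    if (target == self || load_pool[target] > load_pool[self])
        return -1;                     /* abort: the target has more load than we do */
    load_pool[self]--;
    load_pool[target]++;
    return target;
}

int main(void) {
    srand(1);
    load_pool[0] = 10;                 /* cluster 0 is overloaded */
    for (int i = 0; i < 5; i++) {
        int t = smart_random_dispatch(0);
        printf("attempt %d: %s (target %d)\n", i, t >= 0 ? "dispatched" : "aborted", t);
    }
    return 0;
}
```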
These schemes will be evaluated in future work. approximately 3.3 in an 8-cluster system at a dispatching rate of 0.2. w ::2 d-- w ::2 ~ o w en a.. ~ o w ~ ~ a: o z ,. ----- UTILIZATION ··············(············(············1···.... 0..+ 0.2 w ~ w I'• 0811lt 0.6 F o ···1,············· ! till :s 0.8 o W N C 0.6 -i j= :J ~ -i a: 0.4 6 Z « ~ oz - - NORMALIZED ELAPSED TIME 1----+---1--+---+--1 0.2 0.2 0.4 0.6 0.8 0 DISPATCHING RATE - - NORMAUZED ELAPSED TIME 01----+---+--+---+----1 o 0.2 0.4 0.6 0.8 DISPATCHING RATE Fig. 9. Smart random dispatching with support hardware. The normalized elapsed time varies near 0.125 using 8 clusters connected via a Fig. 8. Smart random dispatching without support hardware. The dispatching rate is defined as the ratio of all goals dispatched to other network because the overhead for message handling is quite low. clusters to all executed goals. The normalized elapsed time varies considerably from 0.125 using 8 clusters connected via a network because the overhead for message handling is visible. 5. CONCLUSION Hardware for dynamic load balancing is implemented in both shared-bus and network-based mUltiprocessors. We propose a register with broadcast write feature in Figure 9 shows the results after applying the smart shared-bus multiprocessors. Also, in network-based random load dispatching scheme with hardware multiprocessors, the network unit uses a shortcut path. support. The normalized elapsed time and the utilization The evaluation was carried out using real hardware and of processors are plotted as a function of the an artificial load model. dispatching rate. With the support hardware active, the The evaluation results in the shared bus hierarchy determine the overhead due to memory polling which detects a load request. The proposed hardware reduces processor can reduce the overhead due to requesting the load amount. The resulting increase in speed is 730 the execution time of logic programs by 15%. The evaluation results in the network-based hierarchy Performance of four Snooping Cache Coherency Protocols," Proc. of the 16th ISCA, 1989. show that the overhead due to requesting the load [8] A. R. Karlin, M.S. Manasse, L. Rudolph and D. amount is reduced as a result of introducing the shortcut D. Sleator, "Competitive Snoopy Caching," Proc. path. The proposed hardware reduces the execution time by 50%. Computer Science, Toronto, October, 1986. It is confirmed that the proposed hardware reduces the access latency of load infon~ation, of the 27th Annual Symposium on Foundation of [9] A. Gupta and J. Hennessy, "Comparable and subsequently the overhead produced by dynamic load balancing. Evaluation of Latency Reducing and Tolerating Techniques," Proc. of the 18th ISCA, IEEE, 1991. [10] H. Koike and H. Tanaka, "Multi Context Processing and Data Balancing Mechanism of the Parallel Inference Machine PIE64," Proc. of ACKNOWLEDGEMENTS The authors would like to thank Dr. Shun'ichi FGCS, Vo1.3, 1988. [11] L. Rudolph and Z. Segall, "Dynamic Decentralized Uchida, the manager of the research department of ICOT, for his guidance and support, Dr. Kazuo Taki, Cache Schemes for MIMD Parallel Processors," chief of 1st ICOT laboratory, and Mr. Marius Hancu for helpful discussions. This research was sponsored by ICOT. [12] T. Nakagawa, A. Goto, T. Chikayama, "Slit- REFERENCES Proc. of the 11th ISCA, June, 1984. Check Features to Speedup Interprocessor Software Interruption Handling," IEICE SIG Reports, July, 1989, pp 17-24, (in Japanese). [13] N. Ido, H. Maeda, T. Tarui, T. 
Nakagawa, M. Sugie, "Parallel Inference Machine PIM/c -Load programming in the fifth generation computer Balancing Support-," the 40th Annu. Convention IPS Japan, 2L-4, (in Japanese) . project," Springer-Verlag, 1987, 1(5) pp 3-28. [2] K. Ueda, "Guarded Horn Clauses: A Parallel Logic [14] M. Sato and A. Goto, "Evaluation of the KLI Parallel System on a Shared Memory Programming Language with the Concept of a Multiprocessor," Proc. of IFIP Working Conf. on Parallel Processing, Pisa, April, 1988. [1] K. Fuchi and K. Furukawa, "The role of logic Guard," TR208, ICOT, 1986. [3] T. Chikayama, H. Sato, T. Miyazaki, "Overview of the Parallel Inference Machine Operating System (PIMOS)," Proc. of the FGCS, voU, 1988. [4] A. Goto, M. Sato, K. Nakajima, K. Taki, A. [15] M. Furuichi, K. Taki, and N. Ichiyoshi, " A Multi-Level Load Balancing Scheme for ORParallel Exhaustive Search Programs on the MultiPSI," In Proc. 9f the 2nd SIGPLAN Sympo. on Principles and Practice of Parallel Programming, pp Matsumoto, "Overview of the Parallel Inference Machine Architecture (PIM)," Proc. of the FGCS, vo1.1, 1988, pp 208-229. [16] S. Sakai, H. Koike, H. Tanaka, T. Motooka, [5] M.' Sugie, M. Yoneyama, N. Ido, T. Tarui, "Load Dispatching Strategy on Parallel Inference Machines," Proc. of FGCS, Vol.3, 1988. "Interconnection network with dynamic load balancing facility," Trans. of Information Processing, Vol. 27, No.5, pp 518-524, 1986, (in [6] J ... Archibald and J. Baer, "Cache Coherence Protocols: Evaluation using a Multiprocessor 50-59, Mar. 1990. Japanese). [17] D. H. D. Warren, " An Improved' Prolog Simulation Model," ACM Trans. on Compo Implementation which Optimises Tail Recursion," Systems, Vol.4, No.4, 1986, pp 273-298. Research paper 156, Dept. o.f Artificial Intelligence, [7] S. J. Eggers and R. H. Katz, "Evaluating the Univ. of Edinburgh, Scotland, 1980. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 731 Evaluation of the EM-4 Highly Parallel Computer using a Game Tree Searching Problem Yuetsu KODAMA Shuichi SAKAI Yoshinori YAMAGUCHI Electrotechnical Laboratory 1-1-4, Umezono, Tsukuba-shi, Ibaraki 305, Japan kodama~etl.go.jp Abstract EM-4 is a highly parallel computer whose eventual target implementation has more than 1,000 processing elements(PEs). The EM-4 prototype consists of 80 PEs and has been fully operational at the Electrotechnical Laboratory since April 1990. EM-4 was designed to execute in parallel not only static or regular problems, but also dynamic and irregular problems. This paper presents an evaluation of the EM-4 prototype for dynamic and irregular problems. For this evaluation, we chose a checkers program as an example of the game tree searching problem. The game tree is dynamically expanded and its structure is irregular because the number and the depth of subtrees of each node depend heavily upon the status of the game. We examine effects of the load balancing by function distribution, data transfer, control of parallelism, and searching algorithms on the EM-4 prototype. The results show that the EM-4 is effective in dynamic load balancing, fine grain packet communication and high performance of instruction execution. 1 Introduction Parallel computing has been effective for static or regular problems such as scientific computing and database systems. Parallel computing is, however, still an active research topic for dynamic or irregular problems. 
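For reference, the alpha-beta pruning mentioned above can be sketched sequentially as follows. The node representation and the tiny hand-built tree are invented purely for illustration; the checkers program evaluated in this paper is not written this way.

```c
/* A minimal sequential alpha-beta sketch over a small hand-built game tree,
 * to make the branch-cutting idea concrete.  Node layout and leaf values
 * are invented for illustration. */
#include <stdio.h>
#include <limits.h>

typedef struct Node {
    int n_children;
    struct Node **children;
    int value;                /* used only at leaves */
} Node;

int alphabeta(const Node *n, int alpha, int beta, int maximizing) {
    if (n->n_children == 0) return n->value;
    if (maximizing) {
        int best = INT_MIN;
        for (int i = 0; i < n->n_children; i++) {
            int v = alphabeta(n->children[i], alpha, beta, 0);
            if (v > best) best = v;
            if (best > alpha) alpha = best;
            if (alpha >= beta) break;          /* beta cut: remaining subtrees are skipped */
        }
        return best;
    } else {
        int best = INT_MAX;
        for (int i = 0; i < n->n_children; i++) {
            int v = alphabeta(n->children[i], alpha, beta, 1);
            if (v < best) best = v;
            if (best < beta) beta = best;
            if (alpha >= beta) break;          /* alpha cut */
        }
        return best;
    }
}

int main(void) {
    Node l1 = {0, NULL, 3}, l2 = {0, NULL, 5}, l3 = {0, NULL, 2}, l4 = {0, NULL, 9};
    Node *c1[] = { &l1, &l2 }, *c2[] = { &l3, &l4 };
    Node m1 = {2, c1, 0}, m2 = {2, c2, 0};
    Node *top[] = { &m1, &m2 };
    Node root = {2, top, 0};
    /* The second subtree is cut after its first leaf; the result is 3. */
    printf("root value = %d\n", alphabeta(&root, INT_MIN, INT_MAX, 1));
    return 0;
}
```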
EM-4 is a highly parallel computer which was developed at the Electrotechnical Laboratory in Japan. Its target applications include not only static or regular problems, but also dynamic or irregular problems. EM-4 provides special hardware for parallel computing: high data transfer rate, high data matching performance, dynamic load balancing, and high instruction execution performance. In this paper, we evaluate the performance of EM-4 on a dynamic and irregular problem. The performance of EM-4 on some small programs such as recursive fibonacci is presented in [Kodama et al. 1991]. While the fibonacci program creates many function instances dynamically, it is not irregular because the tree of calling functions is a binary tree, the depth of each branch is similar to those of its neighbors, and the size of each node function is the same and small. We chose a game tree searching problem as a practical problem. This class of programs dynamically expands the game tree, and is irregular because the number of subtrees from each node of the game tree, the depth of su btrees, an d the execution time of each node depends heavily upon the status of the game. Furthermore, the o:-{3 searching algorithm is often used for game tree searching, because it cuts the evaluation of the current tree by using the evaluation of the previous tree. Tree cutting makes the program more dynamic and irregular. This paper presents the evaluation of the EM-4 prototype using a checkers game program as an example of the game tree searching problem. We examine the effect of parallel computing on the EM-4 prototype. Section 2 presents an overview of the EM-4 and its prototype. Section 3 describes a game tree searching problem and a checkers game. Section 4 presents evaluation issues for load balancing, data transfer, control of parallelism, and searching algorithms for the checkers game. Section 5 gives an evaluation and examination of the strategies described in section 4. Section 6 concludes our results and discusses our future plans. 2 The EM-4 Highly Parallel Computer EM-4 is a highly parallel computer whose eventual target implementation has more than 1,000 PEs[Yamaguchi et al. 1989, Sakai et al. 1989]. The EM-4 prototype consists of 80 PEs and has been fully operational since April 1990[Kodama et al. 1990]. 732 SU: IaU: Switching trxUt Input Buffer trxUt retch, Matchinq trxUt EXt1: Jb:eaution trxUt MCl1: Memory Control trxUt MAINT: Maintenace trxUt i'NIJ: 31 32 32 Figure 1: The organization of EM-4 Prototype 2.1 The architecture of EM-4 The organization of the EM-4 prototype is shown in Figure 1. The prototype consists of 80 PEs, and each 5 PEs are grouped and are implemented on a single PE board. The PE of the prototype is an single chip processor which is called EMC-R and is implemented in a C-MOS gate array. The PE has local memory and is connected to the other PEs through a circular omega network. EMC-R is a RISC processor for fine grain packetbased parallel processing. EMC-R generates packets in an execution pipeline, and computation is fired by the arrival of packets. This is a dataflow mechanism, but we improved it so that it can operate on a block which consists of several instructions, executed exclusively from other instructions. This model is called the "strongly connected arc model", and the block is a strongly connected block(SCB). When a packet arrives at a PE, the execution pipeline is fired and EMC-R executes the SCB indicated by the packet. 
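The packet-driven firing just described, together with the two-operand matching detailed in the next paragraph, can be modelled roughly as below. The matching-store layout and the names used are assumptions for illustration only, not the EMC-R hardware organization.

```c
/* Toy model of packet-arrival firing with two-operand matching: the first
 * operand to arrive is parked in a matching store keyed by the target block,
 * and the arrival of its partner fires the block. */
#include <stdio.h>

#define MATCH_SLOTS 64

typedef struct {
    int valid;
    int scb_id;       /* which strongly connected block the operand targets */
    int operand;      /* the parked first operand */
} MatchEntry;

static MatchEntry match_store[MATCH_SLOTS];

static void fire_scb(int scb_id, int left, int right) {
    /* Stand-in for executing the block to its end without interleaving. */
    printf("SCB %d fired with operands (%d, %d)\n", scb_id, left, right);
}

/* Called when a two-word packet (address word = scb_id, data word = operand)
 * arrives at the PE. */
void packet_arrival(int scb_id, int operand) {
    int slot = scb_id % MATCH_SLOTS;
    if (match_store[slot].valid && match_store[slot].scb_id == scb_id) {
        match_store[slot].valid = 0;                   /* partner found: matching succeeds */
        fire_scb(scb_id, match_store[slot].operand, operand);
    } else {
        match_store[slot].valid = 1;                   /* park and wait for the partner */
        match_store[slot].scb_id = scb_id;
        match_store[slot].operand = operand;
    }
}

int main(void) {
    packet_arrival(7, 10);   /* first operand: parked in the matching store */
    packet_arrival(7, 32);   /* partner arrives: SCB 7 fires with (10, 32) */
    return 0;
}
```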
First, EMC-R checks whether the partner of the packet has arrived. If the parter exists, it continues to execute the SCB until the end of the block. If the parter does not exist, EMC-R stores the packet data in a matching memory and waits for the next packet. The packet size of EMC-R is two fixed words and there is only one format consisting of one address word and one data word. It can be generated in a RISC pipeline of EMC-R. During the data word is calculated in a RISC pipeline, the address word is formed in a packet generation unit when the packet output is instructed. Since the network port is only one word wide, first the address word is sent to network, and then the data word is sent. In the second clock cycle, the next instruction can be executed in parallel with data word transfer. The circular omega network has the same structure as an omega network, except that every node of the network is connected to a PE. The network has the following features: (1) The required amount of hardware is D(N), where N is. the number of PEs; (2) The distance between any two PEs is D(logN). The 3 by 3 packet switching unit is in a EMC-R, and a packet can be transferred to a neighboring PE independent of the instruction execution on the PE. Packets are transferred by wormhole routing, and take only M + 1 cycles between PEs which are distance M apart if there is no network conflict. The clock of the EMC-R runs at 12.5 MHz. The RISC pipeline can execute most instruction in one clock cycle; the peak execution performance is 12.5 MIPS. It takes two clock cycles when two operand matching fails, and takes three clock cycles when the matching succeeds. The peak synchronization performance is 2.5 Msync/s. It takes two clock cycles to transfer a packet, and the peak network packet transfer performance is 18.75 Mpacket/s. EM-4 prototype consists of 80 PEs, the peak execution performance is 1 GIPS, its ,peak synchronization performance is 200 Msync/s, and its peak network packet transfer performance is 1.5 Gpacket/s. EMC-R achieves a high performance in both instruction execution and packet data transfer/matching. 733 CALLEE PE[O,2] a b c d e g h [LD,[GA,CA]] is the MLPE packet which shows that PE[GA,CA] has the minimum load LD Figure 2: How to Detect the Minimum Load PE 2.2 Dynamic load balancing method To get high performance in parallel computers, high utilization of PEs, as well as high performance of PEs are necessary. If the program has simple loop structure or static data transfer structure such as in diffusion equation applicaitons, the load of the program can be estimated and the load can be statically balanced at programming or compiling time. But, if the program is dynamic or irregular structure, static load balancing is difficult and dynamic load balancing· is necessary. In the EM-4, we implemented automatic load balancing mechanisms attached to the circular omega topology. In the circular omega network, each node has two circular paths. We use a path to group the PEs, and use another path to achieve dynamic load balancing. Suppose that a PE wants to invoke a new function. This PE will send out a special MLPE{Minimum Load PE) packet. The MLPE packet always holds the minimum load value and the PE address among the PEs which it goes through. The load of each PE is evaluated by hardware in the PE mainly based on the number of packets in the input buffer. 
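A software model of the circular-path minimum-load scan, whose exact rule is spelled out in the next paragraph, might look like the following. The ring size, load values and function names are invented for illustration; on the real machine the comparison is performed by each switching unit during packet transfer, not by software.

```c
/* Sketch of circulating an MLPE packet around one circular path to find the
 * least loaded PE.  Ring size and load values are invented. */
#include <stdio.h>

#define RING_SIZE 8

typedef struct { int load; int pe; } MlpePacket;

/* Load of the PE attached to each switching unit on the path (on EM-4 this
 * is derived mainly from input-buffer occupancy). */
static int pe_load[RING_SIZE] = { 5, 3, 9, 1, 7, 4, 8, 6 };

/* Circulate an MLPE packet once around the ring, starting from `sender`. */
MlpePacket find_min_load_pe(int sender) {
    MlpePacket p = { pe_load[sender], sender };   /* starts with the sender's own load */
    for (int hop = 1; hop < RING_SIZE; hop++) {
        int pe = (sender + hop) % RING_SIZE;
        if (pe_load[pe] < p.load) {               /* the SU rewrites the packet if its PE is lighter */
            p.load = pe_load[pe];
            p.pe = pe;
        }
    }
    return p;                                     /* back at the sender after one circuit */
}

int main(void) {
    MlpePacket p = find_min_load_pe(1);
    printf("least loaded PE = %d (load %d)\n", p.pe, p.load);   /* PE 3, load 1 */
    return 0;
}
```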
At the starting point, the MLPE packet holds its sender's load value and its PE address; when it goes through a certain SU in the circular path, the SU compares the load value of the PE connected to it, and if the value is less than of the packet, the data in the MLPE packet will be automatically rewritten to the current PE's value; otherwise the MLPE packet keeps its value and goes to the next SU. This operation is done in one clock cycles of packet transfer. When the MLPE packet returns to the starting point, it holds the least loaded PE number and its load value. Figure 2 show this. In this figure, PE[l,O] generates an MLPE packet and, after the circulation, it obtains the least loaded PE number [0,2] and its load l. By this method, called the circular path load balancing, each MLPE packet scans s different groups, where s is the number of network stages. When the total number of the PEs increase, coverage of PEs by this load balancing method becomes relatively small. The efficacy of this method is reported in [Kodama et al. 1991]. Since it takes several cycles for the MLPE packet to return, the EM-4 resolves this latency by pre-fetching: it sends a MLPE packet in advance, allocates the new function instance on the PE specified by the returned packet of MLPE, and stores the function ID in a special register of the required PE. When a function call is necessary, the stored function ID is used and another MLPE packet is sent for the next function call. In the pre-fetch strategy, the new function ID may have not yet been stored when a function call is necessary. In this case, the pre-fetch method uses one of the other distribution methods to choose the PE. 3 Game Tree Searching Problem We choose the checkers program as an example of a game tree searching problem in order to evaluate the EM-4 on a dynamic and irregular problem. Since the rules of checkers are very simple, the program makes it easy to characterize the parallel behavior of the program. The rule of checkers game is as follows. Each player moves one of his pieces in turn until the player who has no pieces or moves loses. Pieces can be moved to a forward diagonal area. If there is an opponent's piece in a forward diagonal area, and the next diagonal area is empty, you must jump to the empty area and remove the enemy piece. If you can jump successively, you must jump successively. If your piece arrives at the end of the enemy area, that piece can then move in all four diagonal directions. The Min-Max searching algorithm is the simplest algorithm for the game tree searching problem. This algorithm expands the game tree by the possible moves of each player in turn. When the game tree is expanded to a certain level, each leaf is evaluated. If the stage corresponds to your turn, the maximum node is selected; if the stage is your opponent's turn, the minimum node is selected. Although the MinMax algorithm is simple, it is not efficient because it needs to search every branch. The a-{3 searching algorithm[Slagle 1971] is more efficient than the Min- 734 Max algorithm, because this algorithm tries to cut off the evaluation of unnecessary branches. If the game tree is expanded in a depth-first manner, the resources required to remember the game tree are small. This.expansion makes it easy to cut off the unnecessary branches, but reduces the parallelism. If the game tree is expanded in a breadth-first manner, it results in large parallelism, so this expansion is wellsuited for parallel computers. 
However, since the number of nodes increases exponentially as a function of the depth of the tree, the resources will be exhausted quickly if the parallelism is not controlled. 4 Execution Issues of a Checkers Game The overheads to parallelize the checkers program are the following: 1. overhead for allocating new function instances on other PEs. 2. overhead for transferring the current status of the table to other PEs. 3. idle PEs caused by an unbalanced load. 4. decline of efficiency caused by cutting branches in the a- f3 search. These overheads depend upon implementation strategy decisions. The function distribution strategy effects the function allocation overhead. Packed data transfer reduces the amount of transfer data. The idle PE ratio depends upon the load balancing strategy. The searching algorithm changes the branch cutting overhead. These overheads also depend upon the control of the parallelism and the searching strategy. Each of these decisions is described in greater detail in the following subsections. 4.1 Function distribution and load balancIng Load balancing is the most important issue in achieving high performance on parallel computers. Since the checkers program requires many function instances to expand the game tree, it distributes them among the PEs in order to balance the load across the machine. Our checkers program can distribute function calls by one of the following two strategies: round-robin distribution Each PE independently chooses the PE which will execute the called function in a round-robin manner. manager distribution A centralized manager PE chooses the PE which will execute the called function. We can also combine the two methods: that is, the manager distribution can be used until a certain level in the game tree expansion, and the round-robin distribution can be used after that level. In the roundrobin distribution, the load might be unbalanced at the beginning of the program. In the manager distribution, the overhead is larger than round-robin distribution because of packet communication overhead and concentration of requests. EM-4 dynamically distributes functions according to the load of PEs by the circular path load balancing described in section 2.2. The dynamic round-robin distribution described below is the third function distribution method that we evaluated in our checkers program. dynamic round-robin distribution A PE is dynamically chosen by the circular path load balancing method, and in the case that the MLPE packet has not returned, a PE is chosen by the round-robin distribution method. 4.2 Data transfer Since EM-4 is a distributed-memory parallel computer, the checkers program sends the status of the table and selected moves by packets to functions on other PEs. The status of the table is represented by a 64 word array, but each word is only 4 bits. The following two transfer methods are considered in the checkers program. unpacked transfer use packets which have data representing a position. packed transfer use packets which have packed data representing 8 positions. While the unpacked transfer sends eight times more packets than the packed transfer, the packed transfer needs to pack and unpack data. 4.3 Control of parallelism Parallelism has to be controlled to both avoid exhaustion of resources, and to provide sufficient parallelism to keep all the PEs busy. To control parallelism, throttling can limit the number of the active functions. 
If the number of active functions exceeds a certain amount, further requests for calling functions are buffered until other functions are finished. Throttling has the possibility of deadlock. Another way to control parallelism is to switch from breadth-first search to depth-first search at some level of the game tree, where the level can be determined either statically or dynamically. Static switching sets the level by the depth of the game tree. Dynamic switching determines the level using the load 735 of PEs. Breadth-first searching increases parallelism, and depth-first searching restrains parallelism. Our checkers program uses the static switching strategy to control parallelism, because this strategy is very simple. We plan to implement the dynamic switching strategy for the checkers program in the near future. 4.4 Kin-Max breadth-firat depth-firat Game tree searching algorithlTIs The two primary algorithms for the game tree searching problems are the Min-Max algorithm and the a-/3 algorithm. The Min-Max algorithm provides much parallelism in the breadth-first strategy. The a-{3 algorithm has high efficiency in the depth-first strategy. If the a-/3 algorithm is used only with the breadth-first strategy, it ignores the possibility of cutting branches, and it must search more trees than the a-{3 algorithm on a single processor. Since the ratio of branches cut off relative to the whole tree in the a-{3 algorithm increases according to the depth of the searching tree, a parallel a-/3 searching algorithm must be considered to increase the efficiency of branch cutting in the parallel environment. Parallel a-/3 searching is complicated because of the dilemma between parallelism and efficiency of branch cutting. Another important problem is the overhead of terminating functions. Since these function instances are distributed and activated in parallel, the overhead of terminating functions is more than overhead of creating functions. This difficult trade-off is simply resolved in our checkers program by changing algorithm in breadth-first strategy and depthfirst strategy. In the breadth-first strategy, we select the min-max algorithm to expand the parallelism, and in the depth-first strategy, we select the a-{3 algorithm to achieve the efficiency of cutting branch. We call this search "serial a-{3 search" in this paper. This search can be easily implemented, but the efficiency of branch cutting is less than the parallel a-{3 search[Oki et al. 1989]. To get more efficiency from branch cutting, the search that uses a-/3 search from the leaf of breadthfirst strategy is the "partial parallel a-/3 search". This search algorithm is illustrated in Figure 3. In this search, depth-first search is called in parallel from the leaf of breadth-first search, but the top node(which is indicated by B in the figure) of serial depth-first search gets the a-/3 value from the parent node (A) every time when the child node (C) return the evaluation result, and check whether the remain branch (C') can be cut off or not. The merit of this search is that we can expect enough efficiency from branch cutting and the overhead of terminating search is nothing Figure 3: partial parallel search since the child node in depth-first strategy is sequentialized. The checkers program can use the following three searching algorithms. Min-Max search using the Min-Max algorithm both breadth-first and depth-first. serial a-{3 search using the Min-Max algorithm breadth-first, and using the a-/3 algorithm depthfirst. 
partial parallel a-{3 search using the Min-Max algorithm breadth-first until the last level, and using the a-/3 algorithm in the last level of breadthfirst and then depth-first. 5 Experimental Results on the EM-4 We implemented the checkers program on the EM4 prototype in an assembly language to evaluate the performance of the EM-4 for dynamic and irregular problems. We examine the execution issues discussed in the previous section. 5.1 Effects of function distribution and load balancing An unbalanced workload causes idle PEs. Since the load balancing of the checkers program is performed at the function level, the function distribution strategy must be evaluated. The alternatives for the function distribution of the checkers program are the manager distribution, the round-robin distribution, the dynamic round-robin distribution, and combinations of these. We executed the checkers program using the partial parallel a-/3 search using each function distribution methods. Figure 4 shows the results. We represent the speedup ratio of each distribution relative to the round-robin distribution. We executed each combination of manager distribution and round-robin distribu- 736 eXl!cutio time(ms) speedup (\(\ 1.6-+---1+-/- - - - - 1F--\-f':--\\--+----I- :::===/1:'==:m=a~\=\.:=\\=\\\.:==: 1 1.3-+--/-f--r---t---~\I-\-'..-t\-.....-.... - - t - 30 1 o. 1 round·rob·n ". O.9-1---....>...V'---t----+-----''rl-\---t- 1 2 3 4 5 6 Depth of searching tree o.m 0.0 1 •• ---- ,~, .... , 1.13 . unpacked-t1'!'e;' / ... ....... ~;/ /' ./·:~35 1.61 ~- ..---::. ,z /pa~r·lnst/ 3.~//· packed.tlme 3 1.0-+.:----r--......--+-________+---"\r-I---+- """ / 0 o.3 ,/ ---------, \ ,;;/ 100 :::~::/:~~~~:~~~~~~:y-n-a-nu-.c-r+o-T~-nd-;,:-;~-,~-:~t-·-·-"· "': :->-.:-: .-tII instructions 3.", • .1 _____ ~~.~...b~~.~!~ ~n ....·unpacked-Inst ;/ /~ / .~.// y 2 3 4 5 6 Depth of searching tree Figure 4: Effects of function distribution Figure 5: Comparison of data transfer tion, and the fastest combination is shown in the figure. The combination uses the manager distribution until the third level, and thereafter uses the dynamic round-robin. When the level of tree searching is shallow, manager distribution is better, because the manager distribution allocates functions more evenly. Since the size of each function is large relative to the whole program, the heavily loaded PE will become a bottle-neck and the program cannot achieve sufficient speed-up, even if the load is only slightly unbalanced. When the level ofthe search tree becomes deeper, the dynamic roundrobin distribution is better, because the size of each function becomes small relative to the whole program, and a small load imbalance does not effect the execution time much. On the other hand, in the manager distribution, the requests of PE addresses for the function call concentrate on the manager PE. Because of the queue of requests, the long turnaround time of the function call makes the execution time slow. Furthermore, at the sixth level of the search tree in the manager distribution, the program cannot be executed because of overflow of the packet queue buffer. Since the execution of the dynamic round-robin distribution is 15% faster than the round-robin distribution when the searching tree is deep, this indicates that the dynamic round-robin strategy is effective in the case that there is sufficient parallelism. ory locations in a single PE. We compared the two data transfer method, unpacked and packed. 
The unpacked transfer uses a packet which has data representing a position, while the packed transfer uses a packet which has packed data representing 8 positions. 5.2 Effects of data transfer To parallelize the program, data must be transferred between PEs, while data is only passed between mem- Figure 5 is the results by the checker program of the partial parallel a-{3 search using the combination of manager and dynamic round-robin method as the function distribution. This figure shows the execution time and the total number of executed instructions of both data transfer method. Note that the execution time and the total number of the executed instructions are figured on a logarithmic scale. In this figure, the number of executed instructions of the packed packet transfer is 50% more than the unpacked transfer for each level. The increase of the executed instructions is caused by the pack and unpack operations. When the level is shallow, the execution of the unpacked transfer is 1.5 times faster than the packed transfer. This speed-up ratio is the same as the instruction amount ratio. But when the level is deep, packed transfer is a little faster than the unpacked transfer while the instruction count of the packed transfer is larger than the unpacked transfer. Figure 6 shows the number of active PEs and overhead PEs in both data transfer strategies. An overhead PE is a PE which is waiting for the ready of the network to send a packet or stores the packet in the memory packet buffer when the on-chip packet buffer overflows. An active PE is a PE which is neither an overhead PE nor an idle PE. At the shallow levels, the active PE ratio of both transfer strategy is low. VVhen 737 Speedup ralio PE ratio (%) 2~ 70 60 50 pack~d-act ,I" 40 / ,// V-- I / /", M n~.~~.~..................................., ' " ' ~/'/ ,/ ,, 10 ...... . .' //,: " unpackec -act ./ 30 20 V ~ : : .5:~-----+--/7;~~--~-----r-----r----T- // .,: /., :- .2~~-----¥----~----1-----+---~r---~- .... unp~cked-ovh ..... I ,/,1 .1~~~-+----~----~----+-----r---~/" ./: .05:~~--J.---+---+---+---t----t- o 5 o 6 Depth of searching tree Figure 6: Examination of the active PE ratio comparing the data transfer the level becomes deep, the active PE ratio of the unpacked transfer is 30% lower than the packed transfer, and the overhead PE ratio of the unpacked transfer is 30% higher than the packed transfer. This high overhead PE ratio of the unpacked transfer is the reason why it is slower than the packed transfer. Since the unpacked transfer needs to send more packets than the packed transfer, the network has many conflicts, resulting in large overhead. Although the packed transfer shows the high ratio of the active PEs on the surface, a third of the instructions are used for packing and unpacking the packets, and the packed transfer is not so effective. Since the pipeline of the EM-4 is designed to send packets quickly, unpacked transfer is suitable for the EM-4. If there are many conflicts in the network, however, the overhead decreases the performance of sending packets. One way to reduce this overhead is to avoid the network conflicts by allocating the function locally. Since the manager and round-robin distributions does not take into account the locality between the PE which calls the function and the PE which executes the function, it increases the possibility of network conflicts. If the execution PE is selected from the neighbors of the calling PE, network conflicts do not occur as frequently. 
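As a point of reference for the two encodings compared in this subsection, a minimal sketch of packing and unpacking a 64-square board with 4 bits of state per square is given below. The data layout and helper names are assumptions for illustration, not the actual representation used by the checkers program.

```c
/* Sketch of the packed board transfer: eight 4-bit squares per 32-bit word,
 * so 8 board words are sent instead of 64, at the cost of the pack/unpack
 * work measured in the evaluation. */
#include <stdio.h>
#include <stdint.h>

#define SQUARES 64
#define SQUARES_PER_WORD 8        /* 8 x 4 bits = 32 bits */
#define PACKED_WORDS (SQUARES / SQUARES_PER_WORD)

void pack_board(const uint8_t board[SQUARES], uint32_t packed[PACKED_WORDS]) {
    for (int w = 0; w < PACKED_WORDS; w++) {
        packed[w] = 0;
        for (int i = 0; i < SQUARES_PER_WORD; i++)
            packed[w] |= (uint32_t)(board[w * SQUARES_PER_WORD + i] & 0xF) << (4 * i);
    }
}

void unpack_board(const uint32_t packed[PACKED_WORDS], uint8_t board[SQUARES]) {
    for (int w = 0; w < PACKED_WORDS; w++)
        for (int i = 0; i < SQUARES_PER_WORD; i++)
            board[w * SQUARES_PER_WORD + i] = (packed[w] >> (4 * i)) & 0xF;
}

int main(void) {
    uint8_t board[SQUARES], copy[SQUARES];
    uint32_t packed[PACKED_WORDS];
    for (int i = 0; i < SQUARES; i++) board[i] = (uint8_t)(i % 5);   /* arbitrary test pattern */

    pack_board(board, packed);     /* the unpacked transfer would send all 64 words directly */
    unpack_board(packed, copy);

    int ok = 1;
    for (int i = 0; i < SQUARES; i++) if (board[i] != copy[i]) ok = 0;
    printf("round trip %s; packets per board: unpacked = 64, packed = 8\n", ok ? "ok" : "failed");
    return 0;
}
```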
Another way to control the parallelism is by limiting the number of active functions. This is examined in detail in the next subsection. 1 2 3 4 5 6 Deplh of parallel searching Figure 7: Effects of parallelism control 5.3 Effects of parallelism control While parallelism must be exploited to make the program execution faster, as mentioned before, too much parallelism causes some overhead. It is necessary to control the parallelism in order to avoid the exhaustion of resources, and to reduce the overhead of parallelization. The checkers program controls parallelism by switching the searching strategy from a breadthfirst manner to a depth-first manner. Figure 7 shows the speedup ratio to the sequential execution of the a-{3 search when the switchover level of the parallelism control strategy is changed. The execution uses the combination of manager and dynamic round-robin method as the function distribution strategy and the unpacked method as the data transfer strategy. Note that the X-axis represents the depth of the breadth-first searching, while these all execution search the game tree until the depth is the sixth level. In the Min-Max search, the deeper level of parallel searching results in more parallelism, and the maximum speedup becomes 49 times. Exploiting maximum parallelism, however, does not necessarily achieve speedup. One reason is that at the sixth level, too many packets are sent and the overhead of network conflicts becomes much larger than at the shallow levels. Another reason is that excessive parallelism is just overhead such as data transfer or remote function invocation, since sufficient parallelism is exploited until the fifth level. It is sufficient to have as much parallelism as needed to activate every PE and hide the latency of remote access - excessive parallelism is not 738 helpful. The serial a-{3 search executes fastest at the second level, and when the level is deeper the performance decreases. This is because parallel searching uses breadth-first search, and much information that could be used to cut subtrees is discarded to parallelize the program. As parallel searching gets deeper, more information is discarded. As a result, it reduces the efficiency of cutting excessive branches, and increases the number of trees to be evaluated. The partial parallel a-{3 is same as the serial a-{3 search. 5.4 Effects of searching algorithms Figure 7 also shows the effects of searching algorithms. The execution of the Min-Max search on 80 PEs is 49 times faster than the Min-Max search on a single PE, but only 1.8 times faster than the a-{3 search on one PE. This shows that the Min-Max search is suitable for parallel execution, but that it is difficult to compensate for the difference of efficiency between the Min-Max search and the a-{3 search by parallel execution. The a-{3 search is a very serial algorithm, but can achieve 16 times speedup via partial parallel a/3 search, while the serial a-/3 search can achieve 6 times speedup. This is because the partial parallel a-{3 search uses the information of cutting trees at the last level of parallel searching, and the efficiency of cutting trees in the partial parallel a-{3 search is higher than the serial a-{3 search. 6 Conclusion and Future plans To evaluate the highly parallel computer EM-4 on dynamic and irregular programs, we execute the game tree searching problem of checkers on the EM-4 prototype, which consists of 80 PEs. 
The effects of the strategies for load balancing, data transfer, parallelism control and searching algorithms are examined. Our checkers program achieves a 49 times speedup in the Min-Max search and a 16 times speedup in the α-β search on the 80-PE system. In this execution, the combination of the manager distribution up to the third level and the dynamic round-robin distribution thereafter is used as the function distribution method for load balancing, the unpacked transfer is used as the data transfer strategy, and static switching from breadth-first to depth-first search, at the fifth level in the Min-Max search and at the second level in the α-β search, is used to control parallelism. In this evaluation, we demonstrated that the EM-4 is effective for dynamic load balancing, fine-grain packet communication and high-performance instruction execution.

In the near future, we plan to implement a dynamic switching strategy which controls parallelism according to the load of neighboring PEs. We will also implement the full parallel α-β search, compare it with the partial parallel α-β search, and clarify the advantages and disadvantages of each method on the EM-4 for parallel game tree searching. Furthermore, we are designing a higher performance parallel computer, the EM-5. This computer will reduce the overheads found in these evaluations, such as network conflicts.

Acknowledgments

We wish to thank Dr. Toshitsugu Yuba, Director of the Computer Science Division, and Mr. Toshio Shimada, Chief of the Computer Architecture Section, for supporting this research, and the staff of the Computer Architecture Section for their fruitful discussions. Special thanks are due to Dr. Mitsuhisa Sato of the Computer Architecture Section and Mr. Andrew Shaw of MIT for their suggestions and careful reading.

OR-Parallel Speedups in a Knowledge Based System: on Muse and Aurora

Khayri A. M. Ali and Roland Karlsson
Swedish Institute of Computer Science, SICS
Box 1263, S-164 28 Kista, Sweden
khayri@sics.se and roland@sics.se

Abstract

The paper presents experimental results of running a knowledge based system that applies a set of rules to a circuit board (or a gate array) design and reports any design errors, on two OR-parallel Prolog systems, Muse and Aurora, implemented on a number of shared memory multiprocessor machines.
The knowledge based system is written in SICStus Prolog, by the Knowledge Based Systems group at SICS in collaboration with groups from some Swedish companies, without considering parallelism. When the system was tested on Muse and Aurora, without any modifications, the ORparallel speedups were very encouraging as a large practical application. The number of processors used in our experiment is 25 on Sequent Symmetry (S81), 37 on BBN Butterfly II (TC2000), and 70 on BBN Butterfly I (GP1000). The results obtained show that the Aurora system is much more sensitive to the machine architecture than the Muse system, and the latter is faster than the former on all the three machines used. The real speedup factors of Muse, relative to SICStus, are 24.3 on S81, 31.8 on TC2000, and 46.35 on GP1000. 1 Introduction Two main types of parallelism can be extracted from a Prolog program. The first, AND-parallelism, utilizes possibilities for simultaneous execution of several subproblems offered by Prolog semantics. The second, ORparallelism, utilizes possibilities for simultaneous search for multiple solutions to a single problem. This paper is concerned with two systems exploiting only the latter type of parallelism: Muse [Ali and Karlsson 1990a] and Aurora [Lusk et ai. 1990]. Both systems support the full Prolog language with its standard semantics, and they have been implemented on a number of shared multiprocessor machines, ranging from a few processors up to around 100 processors. Both systems show good speedups, in comparison with good sequential Prolog systems, for programs with a high degree of ORparallelism. The two systems are based on two dif- ferent memory models. Aurora is based on the SRI [Warren 1987] and Muse on incremental copying of the WAM stacks [Ali and Karlsson 1990a]. The two systems are implemented by adapting the same sequential Prolog system, SICStus version 0.6. The extra overhead associated with this adaptation is low and depends on the Prolog program and the machine architecture. For a large set of benchmarks, the average extra overhead for the Muse system on one processor is around 5% on Sequent Symmetry, 8% on BBN Butterfly GP1000, and 22% on BBN Butterfly TC2000. For the Aurora system with the same set of the benchmarks, it is around 25% on Sequent Symmetry, 30% on BBN Butterfly GP1000, and 77% on BBN Butterfly TC2000. Earlier results [Ali and Karlsson 1990b, Ali and Karlsson 1990c, Ali et ai. 1991a, Ali et ai. 1991b] show that the Muse system is faster than the Aurora system for a large set of benchmarks and on the above mentioned machines. In this paper we investigate the performance results of Muse and Aurora systems on those multiprocessor machines for a large practical knowledge based system [Holmgren and Orsvarn 1989, Hagert et ai. 1988]. The knowledge based system is used to check a circuit board (or a gate array) design with respect to a set of rules. These rules may for example be imposed by the development tool, by company standards or testability requirements. The knowledge based system has been written in SICStus Prolog [Carlsson and Widen 1988], by the Knowledge Based Systems group at SICS in collaboration with groups from some Swedish companies, without considering parallelism. The gate array used in our experiment consists of 755 components. The system was tested on Muse and Aurora without any modifications. 
One important goal that has been achieved by Muse and Aurora systems is running Prolog programs that have OR-parallelism with almost no user annotations for getting parallel speedups. The speedup results obtained are very good on all the machines used for the Muse system, but not for Aurora on the Butterfly machines. We found that this application has high OR-parallelism. In this paper we are going to present and discuss the results obtained from the Aurora and Muse systems on the three machines 740 3 used. The paper is organized as follows. Section 2 briefly describes the three machines used in our experiment. Section 3 briefly describes the two OR-parallel Prolog systems, Muse and Aurora. Section 4 presents the knowledge based system. Sections 5 and 6 present and discuss the experimental results. Section 7 concludes the paper. 2 Multiprocessor Machines The three machines used in our study are Sequent Symmetry S81, BBN Butterfly TC2000, and BBN Butterfly GP1000. Sequent Symmetry is a shared memory machine with a common bus capable of supporting up to 30 (i386) processors. Each processor has a 64-KByte cache memory. The bus supports cache coherence of shared data and its capacity is 80 MByte/sec. It presents the user with a uniform memory architecture and an equal access time to all memory. The Butterfly GP1000 is a multiprocessor machine capable of supporting up to 128 processors. The GPlOOO is made up of two subsystems, the processor nodes and the butterfly switch, which connects all nodes. A processor node consists of an MC68020 microprocessor, 4 MByte of memory and a Processor Node Controller (PNC) that manages all references. A non-local memory access across the switch takes about 5 times longer than local memory access (when there is no contention). The Butterfly switch is a multi-stage omega interconnection network. The switch on the GP1000 has a hardware supported block copy operation, which is used to implement the Muse incremental copying strategy. The peak bandwidth of the switch is 4 MBytes per second per switch path. The Butterfly TC2000 is a similar to the GP1000 but is a newer machine capable of supporting up to 512 processors. The main differences are that the processors used in the TC2000 are the Motorola 88100s. They are an order of magnitude faster than the MC68020 and have two 16-KByte data and instruction caches. Thus in the TC2000 there is actually a three level memory hierarchy: cache memory, local memory and remote memory. Unfortunately no support is provided for cache coherence of shared data. Hence by default shared data are not cached on the TC2000. The peak bandwidth of the Butterfly switch on the TC2000 is 9.5 times faster than the Butterfly GP1000 (at 38 MBytes per second per path). The TC2000 switch does not have hardware support for block copy. OR-Parallel Systems In Muse and Aurora, OR-parallelism in a Prolog search tree is explored by a number of workers (processes or processors). A major problem introduced by ORparallelism is that some variables may be simultaneously bound by workers exploring different branches of a Prolog search tree. Two different approaches have been used in Muse and Aurora systems for solving this problem. Muse uses incremental copying of the WAM stacks [Ali and Karlsson 1990a] while Aurora uses the SRI memory model [Warren 1987]. The idea of the SRI model is to extend the conventional WAM with a large binding array per worker and modify the trail to contain address-value pairs instead of just addresses. 
Each array is used by just one worker to store and access conditional bindings, i.e. bindings to variables which are potentially shareable. The WAM stacks are shared by all workers. The nodes of the search tree contain extra fields to enable workers to move around the tree. When a worker finishes a task, it moves over the tree to take another task. The worker starting a new task must partially reconstruct its array using the trail of the worker from which the task is taken. The incremental copying of the WAM stacks used in Muse is based on having a number of sequential Prolog engines, each with its own local address space, and some global address space shared by all engines. Each sequential Prolog engine is a worker with its own WAM stacks. The stacks are not shared between workers. Thus, each worker has bindings associated with its current branch in its own copy of the stacks. This simple solution allows the existing sequential Prolog technology to be used without loss of efficiency. But it requires copying data (stacks) from one worker to another when a worker runs out of work. In Muse, workers incrementally copy parts of the (WAM) stacks and also share nodes with each other when a worker runs out of work. The two workers involved in copying will only copy the differing parts between the two workers states. The shared memory space stores information associated with the shared nodes on the search tree. Workers get work from shared nodes through using the normal backtracking mechanism of Prolog. Each worker having its own copy of the WAM stacks simplifies garbage collection, and caching the WAM stacks on machines, like the BBN Butterfly TC2000, that do not support cache coherence of shared data. A node on a Prolog search tree corresponds to a Prolog choicepoint. Nodes are either shared or nonshared (private). These nodes divide the search tree into two regions: shared and private. Each worker can be in either engine mode or in scheduler mode. In the engine mode, the worker works as a sequential Prolog system on private nodes, but is also able to respond to interrupt signals from other workers. Anytime a worker has to ac- 741 cess the shared region of the search tree, it switches to the scheduler mode and establishes the necessary coordination with other workers. The two main functions of a worker in the scheduler mode are to maintain the sequential semantics of Prolog and to match idle workers with the available work with minimal overhead. The two systems, Muse and Aurora, have different working schedulers on the three machines used in our experiment. Aurora has two schedulers: the Argonne scheduler [Butler et al. 1988] and the Manchester scheduler [Calderwood and Szeredi 1989]. According to the reported results, the Manchester scheduler always gives better performance than the Argonne scheduler [Mudambi 1991, Szeredi 1989]. So, the Manchester scheduler will be used for Aurora in our experiment. Muse has only one scheduler [Ali and Karlsson 1990c, Ali and Karlsson 1991], so far. The main difference between the Manchester scheduler for Aurora and the Muse scheduler is in the strategy used for dispatching work. The strategy used by the Manchester scheduler is that work is taken from the topmost node on a branch, and only one node at a time is shared. In Muse, several nodes at a time are shared and work is taken from the bottommost node on a branch. The bottommost strategy approximates the execution of sequential implementations of Prolog within a branch. 
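The binding-array bookkeeping described earlier in this section can be illustrated with a toy sketch. The snippet below is a much-simplified Python illustration (not WAM code, and not the actual Aurora data structures; all names are invented) of how a worker taking over a task could install conditional bindings from another worker's trail of address-value pairs.

    # Toy illustration of the SRI-model idea: conditional bindings live in a
    # per-worker binding array, and the trail records (address, value) pairs so
    # another worker can reconstruct those bindings when it takes over a task.
    class Worker:
        def __init__(self, size):
            self.binding_array = [None] * size   # private to this worker

        def bind(self, addr, value, trail):
            self.binding_array[addr] = value
            trail.append((addr, value))          # address-value pair on the trail

        def install_from(self, trail, upto):
            """Install bindings for a task taken from another worker's branch."""
            for addr, value in trail[:upto]:
                self.binding_array[addr] = value

    shared_trail = []
    w1, w2 = Worker(8), Worker(8)
    w1.bind(3, "a", shared_trail)
    w1.bind(5, "f(X)", shared_trail)
    w2.install_from(shared_trail, upto=len(shared_trail))
    assert w2.binding_array[3] == "a" and w2.binding_array[5] == "f(X)"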
Another difference between the two schedulers is in the algorithms used in the implementation of cut and side effects to maintain the standard Prolog semantics.

Many optimizations have been made in the implementation of the Aurora and Muse systems on all three machines. The only optimization that has been implemented for Muse and not for Aurora is caching the WAM stacks on the BBN Butterfly TC2000. In Aurora the WAM stacks are shared by all the workers, while in Muse each worker has its own copy of the WAM stacks. Therefore, it is straightforward for Muse to make the WAM stack areas cachable, whereas in Aurora a complex cache coherence protocol would be required to achieve this effect.

4 Knowledge Based System

One important process in the design of circuit boards and gate arrays is the checking of the design with respect to a set of rules. These rules may for example be imposed by the development tool, by company standards or by testability requirements. Until now, many of these rules have only been documented on paper. The check is performed manually by people who know the rules well. Increasing the number of gates in circuit boards (or in gate arrays) makes the manual check a very difficult process. Computerizing this process is very useful and may be the most reliable solution.

The knowledge based systems group at SICS, in collaboration with groups from some Swedish companies, has been developing a knowledge based system that applies a set of rules to a circuit board (or a gate array) design and reports any design errors [Hagert et al. 1988, Holmgren and Orsvarn 1989]. The groups have developed two versions of the knowledge based system. The first version was developed using a general purpose expert system shell, while the second was developed using SICStus Prolog. The latter, which is used in our experiment, is more flexible and more efficient than the former. It is around 10 times faster than the first version on single processor machines. When it was tested, without any modifications, on the Muse and Aurora systems on Sequent Symmetry, the speedups obtained were linear up to 25 processors.

One reason for the high degree of OR-parallelism in this kind of application is that all of the rules applied to the circuit board (or gate array) design are independent or could be made independent of each other. The second source of OR-parallelism is the application of each rule to all instances of a given circuit sub-assembly on the board. A circuit sub-assembly can be either a component (like buffer, inverter, nand, and, nor, or, xor, etc.) or a group of interconnected components.

The knowledge based system mainly consists of an inference engine, design rules, and a database describing the circuit board (or the gate array). The inference engine is implemented as a metainterpreter with only 8 Prolog clauses. The gate array used in our experiment consists of 755 components (Texas gate array family TGC-100), which is described by around 10000 Prolog clauses. The design rules part, with its interface to the gate array description, is around 200 Prolog clauses. Eleven independent rules are used in this experiment. The metainterpreter applies the set of rules to the gate array description. For a larger gate array more OR-parallelism is expected. It should be mentioned that the people who developed the knowledge based system did not consider parallelism at all, but they tried to make their system easy to maintain by writing clean code.
They avoided using side effects, but they did use cuts (embedded in If-Then-Else) and findall constructs. The user interface part of this application is not included in our experiment.

Since the Muse and Aurora systems also run on larger machines, the BBN Butterfly machines, it was natural to test the knowledge based system on those machines as well. The speedup results obtained differ for the Muse and the Aurora system. On 37 TC2000 processors, Muse is 31.8 times faster than SICStus, while Aurora is only 7.3 times faster than SICStus. Similarly, on 70 GP1000 processors Muse is 46.35 times faster than SICStus, while Aurora is only 6.68 times faster than SICStus. The low speedup for the Aurora system is surprising since this application is rich in OR-parallelism. Is this a scheduler problem for Aurora or an engine problem? The following two sections present and analyze the results of Muse and Aurora, in order to try to answer this question.

5 Timings and Speedups

In this section we present timing and speedup results obtained from running the knowledge based system on the Muse and Aurora systems. The runtimes given in this paper are the mean values obtained from eight runs. On Sequent Symmetry, there is no significant difference between mean and best values, whereas on the Butterfly machines, mean values are more reliable than best values due to variations of timing results from one run to another (these variations are due mainly to switch contention). Variations around the mean value are shown in the graphs by a vertical line with two short horizontal lines at each end. The speedups given in this section are relative to the running time of Muse on one processor on the corresponding machine. The SICStus one-processor runtime on each machine is also presented to determine the extra overhead associated with adapting the SICStus Prolog system to the Aurora and Muse systems. Sections 5.1, 5.2, and 5.3 present those results on the Sequent Symmetry, GP1000, and TC2000 machines, respectively.

5.1 Sequent Symmetry

Table 1 shows the runtimes of Aurora and Muse on Sequent Symmetry, and the ratio between them. Times are shown for 1, 5, 10, 15, 20, and 25 workers, with speedups (relative to one Muse worker) given in parentheses. The SICStus runtime on one Sequent Symmetry processor is 422.39 seconds. This means that for this application and on the Sequent Symmetry machine the extra overhead associated with adapting the SICStus Prolog system to Aurora is 26.3%, and for Muse it is only 1.0% (calculated from Table 1). The performance results in Table 1 are good for both systems, and Aurora timings exceed Muse timings by 25% to 26% between 1 and 25 workers. Figure 1 shows speedup curves for Muse and Aurora on Sequent Symmetry. Both systems show linear speedups with no significant variations around the mean values.

Table 1: Runtimes (in seconds) of Aurora and Muse on Symmetry, and the ratio between them.

  Workers   Aurora          Muse            Aurora/Muse
  1         533.69 (0.80)   426.74 (1.00)   1.25
  5         106.87 (3.99)    85.67 (4.98)   1.25
  10         53.58 (7.96)    42.94 (9.94)   1.25
  15         36.06 (11.8)    28.73 (14.9)   1.26
  20         27.22 (15.7)    21.65 (19.7)   1.26
  25         21.83 (19.5)    17.39 (24.5)   1.26

Figure 1: Speedups of Muse and Aurora on Symmetry, relative to 1 Muse worker.

5.2 BBN Butterfly GP1000

Table 2 shows the runtimes of Aurora and Muse on GP1000 for 1, 10, 20, 30, 40, 50, 60, and 70 workers. The SICStus runtime on one GP1000 node is 534.4 seconds.

Table 2: Runtimes (in seconds) of Aurora and Muse on GP1000, and the ratio between them.

  Workers   Aurora          Muse            Aurora/Muse
  1         886.4 (0.65)    572.3 (1.00)    1.55
  10        105.3 (5.44)     58.3 (9.82)    1.81
  20         74.1 (7.72)     29.8 (19.2)    2.49
  30         72.7 (7.88)     20.7 (27.7)    3.52
  40         64.3 (8.91)     16.1 (35.5)    3.99
  50         72.4 (7.90)     13.8 (41.6)    5.26
  60         65.7 (8.71)     12.4 (46.1)    5.29
  70         80.0 (7.15)     11.5 (49.6)    6.94
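The parenthesized speedups and the Aurora/Muse ratios in Table 1 (and, in the same way, Tables 2 and 3) follow directly from the runtimes. The short script below is only an arithmetic check using the numbers quoted above; it is not part of the original measurement harness.

    # Recompute the parenthesized speedups and the Aurora/Muse ratios of Table 1.
    # Runtimes (seconds) are taken from the paper; one Muse worker is the baseline.
    aurora = {1: 533.69, 5: 106.87, 10: 53.58, 15: 36.06, 20: 27.22, 25: 21.83}
    muse   = {1: 426.74, 5:  85.67, 10: 42.94, 15: 28.73, 20: 21.65, 25: 17.39}
    base = muse[1]                      # speedups are relative to 1 Muse worker

    for w in sorted(muse):
        print(f"{w:2d} workers: Aurora x{base / aurora[w]:5.2f}, "
              f"Muse x{base / muse[w]:5.2f}, ratio {aurora[w] / muse[w]:4.2f}")

    # The "real" speedup quoted in the paper compares Muse with SICStus itself:
    sicstus = 422.39                    # SICStus runtime on one Symmetry processor
    print("real speedup on 25 workers:", round(sicstus / muse[25], 1))   # ~24.3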
So, for this application and on the GP1000 machine the extra overhead associated with adapting the SICStus Prolog system to Aurora is 66%, and for Muse it is only 7%. Here the performance results are good for the Muse system but not for the Aurora system. Aurora timings are longer than Muse timings by 55% to 594% between 1 and 70 workers. Figure 2 shows speedup curves corresponding to Table 2, with variations around the mean values. The speedup curve for Aurora levels off beyond around 20 workers. On the other hand, the Muse speedup curve continues to rise as more workers are added.

Figure 2: Speedups of Muse and Aurora on GP1000, relative to 1 Muse worker.

5.3 BBN Butterfly TC2000

Table 3 shows the performance results of Aurora and Muse on TC2000 for 1, 10, 20, 30, and 37 workers. The SICStus runtime on one TC2000 node is 100.48 seconds. Thus, for this application and on the TC2000 machine the extra overhead associated with adapting the SICStus Prolog system to Aurora is 80%, and for Muse it is only 5%. Here also the performance results are good for the Muse system but not for the Aurora system. Aurora timings are longer than Muse timings by 70% to 319% between 1 and 37 workers. Figure 3 shows speedup curves corresponding to Table 3. The speedup curves are similar to the corresponding ones shown in Figure 2.

Table 3: Runtimes (in seconds) of Aurora and Muse on TC2000, and the ratio between them.

  Workers   Aurora          Muse            Aurora/Muse
  1         180.55 (0.59)   105.97 (1.00)   1.70
  10         22.12 (4.79)    10.81 (9.80)   2.05
  20         16.02 (6.61)     5.56 (19.1)   2.88
  30         13.66 (7.76)     3.93 (27.0)   3.48
  37         13.79 (7.68)     3.29 (32.2)   4.19

Figure 3: Speedups of Muse and Aurora on TC2000, relative to 1 Muse worker.

6 Analysis of Results

From the results presented in Section 5 we found that the Muse system shows good performance results on all three machines, whereas the Aurora system shows good results only on the Sequent Symmetry. In this section, we try to explain the reason for these results by studying the Muse and Aurora implementations on one of the Butterfly machines (TC2000). The TC2000 has better support for reading the real-time clock than the GP1000. A worker's time can be divided into the following three main activities:

1. Prolog: time spent executing Prolog (i.e., engine time).
2. Idle: time spent waiting for work to be generated when there is temporarily no work available in the system.
3. Others: time spent in all the other activities (i.e., all scheduling activities) like spin locks, signalling other workers, performing cut, grabbing work, sharing work, looking for work, binding installation (and copying in Muse), synchronization between workers, etc.

Table 4 and Table 5 show the time spent in each activity and the corresponding percentage of the total time. The results shown in Table 4 and Table 5 have been obtained from instrumented versions of Muse and Aurora on the TC2000.
The times obtained from the instrumented versions are longer than those obtained from the uninstrumented systems by around 19-27%. So, they might not be entirely accurate, but they help in indicating where most of the overhead is accrued.

Table 4: Time (in seconds) spent in the main activities of Muse workers on TC2000.

  Workers   Prolog          Idle          Others
  1         128.36 (100)    0             0
  5         128.80 (99.7)   0.09 (0.1)    0.26 (0.2)
  10        129.28 (99.1)   0.40 (0.3)    0.71 (0.5)
  20        129.90 (96.5)   3.56 (2.6)    1.17 (0.9)
  30        130.32 (95.4)   4.17 (3.0)    2.11 (1.5)

Table 5: Time (in seconds) spent in the main activities of Aurora workers on TC2000.

  Workers   Prolog          Idle           Others
  1         210.42 (98.2)   0              2.36 (1.1)
  5         221.24 (98.3)   0.19 (0.1)     2.03 (0.9)
  10        235.34 (98.1)   0.43 (0.2)     2.43 (1.0)
  20        329.60 (98.1)   1.11 (0.3)     3.61 (1.1)
  30        412.97 (94.7)   13.70 (3.2)    7.64 (1.8)

Before analyzing the data in Table 4 and Table 5 we would like to make two remarks on these data. The first remark is that in the Aurora system the overhead of checking for the arrival of requests is separated from the Prolog engine time, while in the Muse system there is no such separation. This explains why there is scheduling overhead (Others) in the 1-worker case in Table 5 but not in Table 4. The other remark is that the figures obtained from the Aurora system do not total 100% of the time, since a small fraction of the time is not allocated to any of the three activities. However, these two factors have no significant impact on the following discussion.

By careful investigation of Table 4 and Table 5 we find that the total Prolog time of the Muse workers is almost constant with respect to the number of workers, whereas the corresponding time for Aurora grows rapidly as new workers are added. We also find that the scheduling time (Others) in Table 5 is not very high in comparison with the corresponding time in Table 4. Similarly, the difference in Idle time between Muse and Aurora is not very high. So, the main reason for the performance degradation in Aurora is the Prolog engine speed. We think that the only factor that slows down the Aurora engine as more workers are added is the high access cost of non-local memory. Non-local memory access takes longer than local memory access, and causes switch contention. Non-local memory accesses can be due to either the global Prolog tables or the WAM stacks. In the Muse and Aurora systems, the global tables are partitioned into parts and each part resides in the local memory of one processor. In Aurora the WAM stacks are shared by all workers, while in Muse each worker has its own copy of the WAM stacks. The global Prolog tables have been implemented similarly in both the Muse and Aurora systems. Since the Muse engine does not have any problem with the Prolog tables, the problem should lie in the sharing of the WAM stacks in Aurora, coupled with the fact that this application generates around 9.8 million conditional bindings and executes around 1.1 million Prolog procedure calls. On average, each procedure call generates around 9 conditional bindings. This may mean that the reason why Aurora slows down lies in the cactus stack approach, which causes a great many non-local accesses to the Prolog stacks. This results in a high amount of switch contention once there are more than five workers. This is avoided in the Muse model, since each worker has its own copy of the WAM stacks in the processor's local memory and the copy is even cachable. Unfortunately, we could not verify this hypothesis because the current Aurora implementation on the TC2000 does not provide any support for measuring the access time of stack variables.

7 Conclusions

Experimental results of running a large practical knowledge based system on two OR-parallel Prolog systems, Muse and Aurora, have been presented and discussed. The number of processors used in our experiment is 25 on Sequent Symmetry (S81), 37 on BBN Butterfly II (TC2000), and 70 on BBN Butterfly I (GP1000). The knowledge based system used in our study checks a circuit board (or a gate array) design with respect to a set of rules and reports any design errors. It is written in SICStus Prolog, by the Knowledge Based Systems group at SICS in collaboration with groups from some Swedish companies, without considering parallelism. It is used in our experiment without any modifications.

The results of our experiment show that this class of applications is rich in OR-parallelism. Very good real speedups, in comparison with the SICStus Prolog system, have been obtained for the Muse system on all three machines. The real speedup factors for Muse are 24.3 on 25 S81 processors, 31.8 on 37 TC2000 processors, and 46.35 on 70 GP1000 processors. The real speedup factors obtained for Aurora are lower (than for Muse) on Sequent Symmetry, and much lower on the Butterfly machines. The Aurora timings are longer than the Muse timings by 25% to 26% between 1 and 25 S81 processors, 70% to 319% between 1 and 37 TC2000 processors, and 55% to 594% between 1 and 70 GP1000 processors. The analysis of the obtained results indicates that the main reason for this great difference between the Muse and Aurora timings (on the Butterfly machines) lies in the Prolog engine and not in the scheduler. The Aurora engine is based on the SRI memory model, in which the WAM stacks are shared by all the workers. We think that the only reason why the Aurora engine slows down as more workers are added is to be found in the large number of non-local accesses of stack variables. This results in a high amount of switch contention as more workers are added. This is avoided in the Muse model, since each worker has its own copy of the WAM stacks in the processor's local memory, which is even cachable on the TC2000. Unfortunately, we could not verify this hypothesis because the current Aurora implementation on the Butterfly machines does not provide any support for measuring the access time of stack variables.

8 Acknowledgments

We would like to thank the Argonne National Laboratory group for allowing us to use their Sequent Symmetry and Butterfly machines. We thank Shyam Mudambi for his work on porting Muse and Aurora to the Butterfly machines. We also would like to thank Fredrik Holmgren, Klas Orsvarn and Ingvar Olsson for discussions and for allowing us to use their knowledge based system.

References

[Ali and Karlsson 1990a] Khayri A. M. Ali and Roland Karlsson. The Muse Approach to OR-Parallel Prolog. International Journal of Parallel Programming, Vol. 19, No. 2, pages 129-162, April 1990.
[Ali and Karlsson 1990b] Khayri A. M. Ali and Roland Karlsson. The Muse OR-Parallel Prolog Model and its Performance. In Proceedings of the 1990 North American Conference on Logic Programming, pages 757-776, MIT Press, October 1990.
[Ali and Karlsson 1990c] Khayri A. M. Ali and Roland Karlsson. Full Prolog and Scheduling OR-Parallelism in Muse. International Journal of Parallel Programming, Vol. 19, No. 6, pages 445-475, December 1990.
[Ali and Karlsson 1991] Khayri A. M. Ali and Roland Karlsson. Scheduling OR-Parallelism in Muse.
In Proceedings of the 1991 International Conference on Logic Programming, pages 807-821, Paris, June 1991.
[Ali et al. 1991a] Khayri A. M. Ali, Roland Karlsson and Shyam Mudambi. Performance of Muse on the BBN Butterfly TC2000. In Proceedings of the ICLP'91 Pre-Conference Workshop on Parallel Execution of Logic Programs, June 1991. To appear also in Lecture Notes in Computer Science, Springer Verlag.
[Ali et al. 1991b] Khayri A. M. Ali, Roland Karlsson and Shyam Mudambi. Performance of Muse on Switch-Based Multiprocessor Machines. Submitted to the New Generation Computing Journal, 1991.
[Butler et al. 1988] Ralph Butler, Terry Disz, Ewing Lusk, Robert Olson, Ross Overbeek and Rick Stevens. Scheduling OR-parallelism: an Argonne perspective. In Proceedings of the Fifth International Conference and Symposium on Logic Programming, pages 1590-1605, MIT Press, August 1988.
[Calderwood and Szeredi 1989] Alan Calderwood and Peter Szeredi. Scheduling OR-parallelism in Aurora: the Manchester scheduler. In Proceedings of the Sixth International Conference on Logic Programming, pages 419-435, MIT Press, June 1989.
[Carlsson and Widen 1988] Mats Carlsson and Johan Widen. SICStus Prolog User's Manual. SICS Research Report R88007B, October 1988.
[Hagert et al. 1988] G. Hagert, F. Holmgren, M. Lidell and K. Orsvarn. On Methods for Developing Knowledge Systems: an Example in Electronics. Mekanresultat 88003 (in Swedish), Sveriges Mekanforbund, Box 5506, 114 85 Stockholm, 1988.
[Holmgren and Orsvarn 1989] Fredrik Holmgren and Klas Orsvarn. Towards a Domain Specific Shell for Design Rule Checking. In Proceedings of the IFIP TC 10/WG 10.2 Working Conference on CAD Systems Using AI Techniques, pages 221-228, Tokyo, June 6-7, 1989.
[Lusk et al. 1990] Ewing Lusk, David H. D. Warren, Seif Haridi, et al. The Aurora OR-parallel Prolog System. New Generation Computing, 7(2,3): 243-271, 1990.
[Mudambi 1991] Shyam Mudambi. Performance of Aurora on NUMA machines. In Proceedings of the 1991 International Conference on Logic Programming, pages 793-806, Paris, June 1991.
[Szeredi 1989] Peter Szeredi. Performance analysis of the Aurora OR-parallel Prolog System. In Proceedings of the 1989 North American Conference on Logic Programming, pages 713-732, MIT Press, March 1989.
[Warren 1987] David H. D. Warren. The SRI Model for OR-parallel Execution of Prolog: Abstract Design and Implementation Issues. In Proceedings of the 1987 Symposium on Logic Programming, pages 92-102, 1987.

A Universal Parallel Computer Architecture

William J. Dally
Artificial Intelligence Laboratory and Laboratory for Computer Science
Massachusetts Institute of Technology
Cambridge, MA 02139
billd@ai.mit.edu

Abstract

Advances in interconnection network performance and interprocessor interaction mechanisms enable the construction of fine-grain parallel computers in which the nodes are physically small and have a small amount of memory. This class of machines has a much higher ratio of processor to memory area and hence provides greater processor throughput and memory bandwidth per unit cost relative to conventional memory-dominated machines. This paper describes the technology and architecture trends motivating fine-grain architecture and the enabling technologies of high-performance interconnection networks and low-overhead interaction mechanisms.
We conclude with a discussion of our experiences with the J-Machine, a prototype fine-grain concurrent computer.

1 Introduction

Computer architecture involves balancing the capabilities of components (processors, memories, and communication facilities), organizing the connections between the components, and choosing the mechanisms that control how components interact. The top-level organization of most computer systems is similar. As shown in Figure 1, all parallel computers consist of a set of processing nodes, each of which contains a processor, some memory, and a communication interface. The nodes are interconnected by a communication facility (typically a network). A sequential processor is the special case where there is only a single node and the network is used only to connect to I/O devices.

Figure 1: The structure of a parallel computer or multicomputer. All multicomputers consist of a collection of nodes connected by a network. Each node contains a processor (P), a memory (M), and a communication interface (C). Machines differ in the balance of component performance and in the mechanisms used for communication and synchronization between the nodes.

(The research described in this paper was supported in part by the Defense Advanced Research Projects Agency under contracts N00014-88K-0738 and N00014-87K-0825, in part by a National Science Foundation Presidential Young Investigator Award, grant MIP-8657531, with matching funds from General Electric Corporation and IBM Corporation, and in part by assistance from Intel Corporation.)

At present, the organization of processors and memories is well understood and network technology is rapidly maturing. While these components continue to evolve with improving technology and incremental architecture improvements, they do not provide significant differentiation between machines. With a convergence in machine organization, balance and mechanisms become central architectural issues and serve as the major points of differentiation.

This paper explores two ideas related to balance and mechanisms. First, we propose balancing machines by cost, rather than by capacity-to-speed ratios. Such cost-balanced machines have a much higher ratio of processor to memory area and hence much greater processor throughput and memory bandwidth per unit cost compared to conventional machines. Cost-balanced machines have a fine-grained physical structure. Each node is physically small and has a small amount of memory. Efficient operation with this fine-grained structure depends on high-performance communication between nodes and low-overhead interaction mechanisms.

The mechanisms that control the interaction between the nodes of a parallel computer determine both the grain size and the programming models that can be efficiently supported. By choosing a simple, yet complete, set of primitive mechanisms, a parallel computer can support a broad range of programming models and operate at a fine grain size.

A fine-grain parallel computer with fast networks and efficient mechanisms has the potential to become a universal computer architecture in two respects. First, this class of machine has the potential to universally displace conventional (sequential and parallel) coarse-grained computers. Secondly, a simple yet efficient set of interaction mechanisms serves as the basis for a parallel computer that is universal in the sense that it runs any parallel programming system.
The remainder of this paper explores the issues of balance and mechanisms in more detail. The next section identifies trends in conventional sequential processor architecture that have led to a cost imbalance between processors and memory. Section 3 discusses how an opportunity exists to greatly improve the performance/cost of computer systems by correcting this imbalance. The next two sections deal with the two enabling technologies: networks (Section 4) and mechanisms (Section 5). Together these enable fine-grain machines to give sequential performance competitive with conventional machines while greatly outperforming them on parallel applications. Our experience in building and operating a prototype fine-grain computer is described in Section 6.

2 Trends in Sequential Architecture

Two trends are present in the architecture of conventional computers:

1. The size of a processor relative to the size of its memory system is decreasing exponentially.
2. The time required for a processor to interact with an external device connected to its memory bus is increasing.

The first trend is due to an attempt to balance computer systems by the ratio of processor performance (i/s) to memory capacity (bits). In 1967, Amdahl [22] suggested that a system should have 8 Mbits of memory for each Mi/s of processor performance. The processor performance/size ratio (i/s per cm²) benefits from technology improvements in both density and speed, while the memory capacity/size ratio (bits per cm²) benefits only from density improvements. Thus the processor-to-memory cost ratio for an Amdahl-balanced system scales inversely with speed improvements. Let K(67) denote the ratio of processor cost to memory cost for such an Amdahl-balanced system in 1967. Every Y years, the line width of the underlying semiconductor technology has halved. As a result, the area of both the processor and the memory was reduced by a factor of four [23]. At the same time, the processor speed increased by a factor of α. To keep such a system Amdahl-balanced, the capacity (and hence the size) of the memory must also be increased by α. Thus, the processor-to-memory ratio during year x > 67 is given by K(x) = K(67) α^((67-x)/Y). For typical values of α = 3 and Y = 5 [23], K(92) = 0.004 K(67).

The cost of a conventional machine has become largely insensitive to processor size as a result of this exponential trend in the ratio of processor to memory size. Thus, processor designers have become lavish in their use of area (as a result of this lavish use of area, processor sizes have actually scaled slightly slower than predicted by the formula above). Costly features such as large caches, complex data paths, and complex instruction-issue logic are added even though their marginal effect on processor performance (compared to a small cache and a simple organization) is minor. As long as the size of the machine is dominated by memory, adding area to the processor has a small effect on overall size and cost.

The second trend, the increase in external interaction latency, is due to the first trend, to the increasing difference between on-chip and off-chip signal energies, and to deepening memory hierarchies. As processors get faster and memory size increases, the number of processor cycles required to access memory increases. Modern microprocessor-based computers have a latency of 5-20 cycles for a main memory access, and this number is increasing. At the same time, decreasing on-chip signal energies require greater amplification to drive off-chip signals.
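Returning briefly to the first trend, the scaling relation for K can be checked with a few lines of arithmetic. The snippet below is only a numerical illustration of the formula above, using the same assumed parameters (α = 3, Y = 5).

    # Processor-to-memory cost ratio of an Amdahl-balanced system, relative to 1967:
    # K(x) = K(67) * alpha**((67 - x) / Y), with alpha = 3 (speed gain per shrink)
    # and Y = 5 (years per line-width halving), as assumed in the text.
    alpha, Y = 3.0, 5.0

    def relative_ratio(year):
        return alpha ** ((67 - year) / Y)

    print(round(relative_ratio(92), 4))   # ~0.0041, i.e. K(92) ~= 0.004 * K(67)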
Also, as more levels of caching are introduced, the number of cycles expended before initiating an external memory reference increases, and the memory interface becomes specialized for the transfer of cache lines.

If a conventional processor is used in a parallel computer, its high external interaction latency limits its communication performance, as the network must typically be accessed via the external memory interface. Whether this interface uses DMA to transfer data stored in memory (and possibly cached) or uses writes to a memory-mapped network port, each word of the message must traverse the external memory bus, and the cost of initiating an external memory operation is incurred at least once. The slow external memory interface also contributes to the lack of agility in modern processors (that is, their slowness in responding to external events and switching tasks) because a great deal of processor state must be transferred to and from memory during these operations.

These trends in conventional processor architecture make conventional processors ill-suited for use in a parallel computer. Current cost-insensitive processors are not cost effective in a machine with a higher processor to memory ratio, where the cost of the processor is an important factor. Their high external interaction latency severely limits their communication performance and their poor agility limits their ability to handle synchronization. This does not mean, however, that conventional instruction set architectures (ISAs) are unsuitable for parallel computing. Rather, it is the cost-insensitive design style, deep memory hierarchies, and poor agility that are the problem. As we will see in Section 5, a conventional ISA can be extended with a few instructions to provide an efficient set of parallel mechanisms.

Most importantly, the trend toward ever higher memory to processor size ratios has created an enormous opportunity for parallel computing to improve the performance/cost of computers. By adding more processors while keeping the amount of memory constant, the performance of the machine is dramatically increased with little impact on cost. The current trend, however, of building parallel computers by simply replicating workstation-sized units (increasing processors and memory proportionally) does not exploit this advantage. The processor to memory ratio must be decreased to improve efficiency. This theme is explored in more detail in the next section.

3 Balance

Balance, in the context of computer architecture, refers to the ratios of throughput, latency, and capacity of different elements of a computer. In this section we explore the balance between processor throughput, memory capacity, and network throughput in a parallel computer, and make a case for balancing machines based on cost. (Much of the material in this section is based on joint work in progress with Prof. Anant Agarwal of MIT.)

Traditionally, machines have been balanced by rules of thumb such as the one due to Amdahl discussed above. However, a more economical design results if a machine is balanced based on cost. A machine is cost-balanced when the incremental performance increase due to an incremental increase in the cost of each component is equal. Let each component k_i in a machine with performance P have cost c_i; then the machine is cost-balanced if ∂P/∂c_i = ∂P/∂c_j for all i, j [7].
It is difficult to solve these balance equations because (1) no analytic function exists that relates system performance to component cost and (2) this relationship varies greatly depending on the application being run. Also, analyzing existing applications can be misleading, as they have been tuned to run on particular machines and hence reflect the balance of those machines. A workable approach is to start from the present memory-dominated system and increase the processor and network costs until they reach some fraction of total cost, for example 10%. At this point the system costs a small fraction more than a conventional system. If designed with an appropriate communication network (Section 4) and mechanisms (Section 5), it should provide sequential performance comparable to that of a conventional machine. Applications that are parallelized to take advantage of the machine can potentially speed up by the entire increase in processing cost.

To make reasonable balancing decisions, it is important to use manufacturing cost, not component price, as our measure of cost. This avoids distorting our analysis due to the widely varying pricing policies of semiconductor vendors. To simplify our analysis of cost, we will use silicon area normalized to half a minimum line width, λ, as our measure of cost [27].

First consider the issue of processor to memory balance. There are two issues: (1) how large a processor to use on each node and (2) how much memory per processor. A 64-bit processor with floating point but no cache and simple issue logic currently costs about 100 Mλ², about the same as 500 Kbits of DRAM, and has a performance of 50 Mi/s. Making a processor larger than this gives diminishing returns in performance, as heroic efforts are made to exploit instruction-level parallelism [20]. A smaller processor may improve efficiency slightly. If we are allocating 10% of our cost to processors, we will build one processor for every 5 Mbits of memory; rounding up, this gives one processor per MByte. In today's technology a processor of this type with 1 MByte of memory can easily be integrated on a single chip. In comparison, an Amdahl-balanced machine would provide 64 MBytes of memory for each processor and be packaged in 30-50 chips.

Providing a small cache memory for the processor is cost effective; however, a large cache and/or a secondary cache are not. Adding a small 4 KByte I-cache and D-cache requires about 16 Mλ² of area and greatly boosts processor performance, achieving hit rates greater than 90% on many codes [35]. Making the cache much larger or deepening the memory hierarchy would greatly increase processor area with a very small return in performance. Also, using a small co-located memory reduces processor access time to DRAM memory.

The network to memory balance is achieved in a similar manner, by adding network capability until cost is increased by a small fraction. A great deal of network performance comes at very little cost. The PC (printed-circuit) boards on which the processor-memory chips are mounted have a certain wiring capacity and the periphery of the chips can support a certain number of I/O
For example, a router on an integrated processor-memory chip could easily support 6 16-bit wide channels from which a 3-D network can be constructed (Section 4). Conventional PC boards and connectors can easily handle these signals. Attempting to increase network bandwidth beyond this level becomes very expensive. To add more channel pins, the router must be moved to a separate chip or even split across several chips incurring additional overhead for communication between the chips. These chips are pad-limited and most of their area is squandered. If the amount of memory per node is increased proportionally to the cost of the network router to hold the memory to network cost ratio constant, the network bandwidth per bit of memory decreases (and the processor to memory ratio is distorted). A computer design can be approximately cost-balanced by using technology constraints to determine the processor/memory /network ratios. A simple three step method gives a well cost-balanced system: 1. Size the processor to the knee of its performance/ cost curve to get a cost effective processor. 2. Set the processor to memory ratio to allocate a fixed fraction I (in the example above 0.1) of cost to the processor to get a machine that is within 1/1 - I of the optimium cost. 3. Holding processor and memory sizes constant, size the network to the knee of its performance/cost curve to get a cost effective network. Machines that are cost-balanced using this method offer aggregate processor performance and local memory bandwidth that is 50 times that of an Amdahl-balanced machine per unit cost. This performance advantage will expand by a factor of a every Y years. Why are coarse-grained Amdahl-balanced machines widespread both in uniprocessors and parallel computers? In uniprocessors, the number of processors is not a free variable. Thus the designer is driven to increase the size and cost of a single processor far past the knee of its performance/cost curve. Existing parallel computers are driven to a coarse grain-size because (1) they are built using processors that lack appropriate mechanisms for communication and synchronization, (2) their networks are too slow to provide fast access to all memory in the machine[2], 3Typical PC boards support 20wires/cm on each of 4-8 wiring layers. Typical ICs support 100pads/cm along their periphery with 20-50% of these pads reserved for power. and (3) converting software to run in parallel on these machines requires considerable effort [21]. Much of the difficultly associated with (3) is due to the partitioning requied to get good performance because of 1 and 2. For cost-balanced machines to be competitive, increasing the number of processors must (1) not substantially reduce single-processor performance and (2) must provide the potential for near-linear speedup on certain problems. To retain single-processor performance on a machine with a small amount of memory per node, the network and processor communication mechanisms must provide a single processor access to any memory location in the machine in time competitive with a main memory access in a conventional machine. Singleprocessor performance depends on network latency. To provide speedup on parallel applications, the processor's communication and synchronization mechanisms must provide for low-overhead interaction and the network throughput must be sufficient to support the parallel communication demands. Parallel speedup depends on throughput and agility. 
The two key technologies for building cost-balanced machines are efficient networks and processor mechanisms for communication and synchronization. The next two sections explore these technologies in more detail.

4 Network Architecture and Design

The interconnection network is the key component of a parallel computer. The network accepts messages from each processing node of a parallel computer and delivers each message to any other processing node. Latency, T, and throughput, λs, characterize the performance of a network. Latency is the time (s) from when the first bit of the message leaves the sending node to when the last bit of the message arrives at the receiving node. Aggregate throughput λsN is the rate of message delivery (bits/s) when the network is fully loaded. T must be kept low to achieve good performance for sequential codes and for the portions of parallel codes where the parallelism is insufficient to keep the machine busy. During these periods performance is latency-limited and execution time is proportional to T. During periods where there is abundant parallelism, performance is throughput-limited.

Recent developments in network technology give throughputs and latencies that approach physical and information-theoretic bounds given pin and wire constraints. A detailed discussion of this technology is beyond the scope of this paper. This section briefly summarizes the major results.

An interconnection network is characterized by its topology, routing, and flow control [11].

Figure 2: Insertion of express channels into a k-ary 3-cube gives performance within a small factor of physical limits: (A) one dimension of a regular k-ary 3-cube network; (B) inserting one level of express channels optimizes the ratio of wire to node delay for messages travelling long distances; (C) hierarchical express channels also reduce the number of switching decisions to the minimum, log_q N; (D) adding multiple channels at each level adjusts network bisection bandwidth to maximize throughput.

The topology of a network is the arrangement of nodes and channels into a graph. Routing specifies how a packet chooses a path in this graph. Flow control deals with the allocation of channel and buffer resources to a packet as it traverses this path.

The topology strongly affects T since it determines (1) how many hops H a message must make, (2) the total wire distance D (cm) that must be traversed, and (3) the channel width W (bits), which is limited by the bisection width of the wiring media divided by the channel bisection of the network (in some small networks, W is constrained by component or module pinout and not by bisection width). The latency seen by a single message in a network with no other traffic (zero-load latency, T0) is directly determined by these three factors:

    T0 = H Tn + D/v + L/(W f)    (1)

where Tn is the propagation delay of a node (s), v is the signal propagation velocity (cm/s; typically a fraction of the speed of light, 0.3c ≤ v ≤ c), f is the wire bandwidth (1/s), and L is the message length (bits).

The three-dimensional express cube topology [12], a k-ary 3-cube with express channels added to skip intermediate hops (Figure 2B) when travelling large distances, can simultaneously optimize H, D, and λsN to
achieve performance that is within a small fraction of physical and information-theoretic limits. The number of hops H is bounded by log_q N if a q-way decision is made at each step. The express cube network achieves this bound by inserting a hierarchy of interchanges into a k-ary n-cube network (Figure 2C). The wire distance D is kept to within 2^(1/3) of the physical minimum by always following a manhattan shortest path. Finally, the number of network channels can be adjusted to use all available wiring capacity (Figure 2D).

Figure 3 compares the performance of flat and hierarchical express cubes with a regular k-ary n-cube and a wire with no switching. The ratio of the delay of a node, Tn, to the delay of a wire between two adjacent nodes, D(1)/v, is denoted α = Tn v / D(1). The figure assumes α = 64. The figure shows that a flat express cube decreases delay to a multiple of wire delay determined by the ratio of α to the interchange spacing, i. The interchange spacing is set to the square root of the distance to balance the delay due to local channels with the delay due to express channels. The hierarchical cube with three levels (l = 3) permits small interchange spacing and allows local and global delays to be optimized simultaneously.

Figure 3: Latency as a function of distance for a hierarchical express channel cube with i = 4, l = 3, α = 64, and a flat express channel cube with i = 16, α = 64. In a hierarchical express channel cube latency is logarithmic for short distances and linear for long distances. The crossover occurs between D = α and D = iα log_i α. The flat cube has linear delay, dominated by Tn for short distances and by Tw for long distances.

The advantages of minimum H and maximum λsN achieved by the express cube topology are important for very large networks. For smaller networks (less than 4K nodes), however, a simpler three-dimensional torus or mesh network, a k-ary 3-cube, is usually more cost effective. The 3-D mesh also provides manhattan shortest paths in physical space to keep D near the minimum, has a very regular structure, and uses uniformly short wires, simplifying the electrical design of the network.

Three-dimensional networks are required to obtain adequate throughput for machines larger than 256 nodes. As machines grow, the throughput per node varies inversely with the number of nodes in a row, as N^(1/2) for a 2-D network and as N^(2/3) for a 3-D network. 3-D networks provide adequate throughput up to 4K nodes (16 nodes per row). Beyond this point express cubes and/or careful management of locality is required. For machines of 256K nodes or larger, express cubes become bisection-limited and locality must be exploited. No cost-effective network can scale throughput linearly with the size of the machine. Above a certain size, all networks become bisection-width limited and hence have a throughput that grows as N^(2/3).

Figure 4: Latency as a function of offered traffic for a 2-ary 8-fly network with 1, 2, 4, 8, and 16 virtual channels per physical channel.

Routing, the assignment of a path to a message, determines the static load balance of a network. Most routers built to date have used deterministic routing, where the path depends only on the source and destination nodes. Deterministic routers can be made simple and fast, and deadlock avoidance becomes much easier. In particular, deterministic routing in dimension order permits the switch to be cleanly partitioned [17]. For some traffic patterns, deterministic routing results in a degradation in performance due to channel load imbalance. However, for most cases deterministic routing has proved adequate.
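As an illustration of deterministic dimension-order routing in a k-ary 3-cube (torus), the toy routine below computes the hop sequence for a message. It is a sketch of the idea only, not the router logic of the J-Machine or any other particular machine; real routers perform the equivalent per-dimension offset comparisons in hardware.

    # Dimension-order (X, then Y, then Z) routing in a k-ary 3-cube torus.
    # Each hop moves one node in the current dimension, taking the shorter
    # way around the ring.  Purely illustrative.
    def dimension_order_route(src, dst, k):
        """Return the list of nodes visited from src to dst (inclusive of dst)."""
        path, cur = [], list(src)
        for dim in range(3):                        # route one dimension at a time
            delta = (dst[dim] - cur[dim]) % k
            step = 1 if delta <= k // 2 else -1     # shorter direction on the ring
            while cur[dim] != dst[dim]:
                cur[dim] = (cur[dim] + step) % k
                path.append(tuple(cur))
        return path

    if __name__ == "__main__":
        hops = dimension_order_route((0, 0, 0), (3, 1, 2), k=4)
        print(len(hops), "hops:", hops)             # 1 + 1 + 2 = 4 hops here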
Several adaptive routing algorithms have been proposed [14, 4, 25] that are capable of dynamically detecting and correcting channel load imbalance. Adaptive routers are also able to route around a number of faulty nodes and channels. Most adaptive routers require much more complex logic than deterministic routers. The planar adaptive routing algorithm [4] is particularly attractive in that it retains much of the simplicity of dimension-order routing.

Flow control involves dynamically allocating buffer and channel resources to messages in the network. Most parallel computer networks use wormhole routing [8], in which buffers are allocated to messages while channels are allocated to flow-control digits or flits. To keep routers small and fast, channel buffers are often shorter than messages. Thus it is possible for a message to be blocked on the receiving side of a channel while part of the message remains on the transmitting side. With only a single buffer per channel, blocking a message on the transmitting side would idle the channel, wasting network resources. Virtual-channel flow control permits messages to pass blocked messages and make use of what would otherwise be idle channels [13]. By associating several buffers (virtual channels) with each physical channel and multiplexing them on demand, a network loaded with uniform traffic can operate at 90% of its peak channel capacity. In comparison, the throughput of a network with only a single buffer per node saturates at 20% to 50% of capacity, depending on the topology and routing. Virtual-channel flow control uses several small, independent buffers in place of a single large queue to more efficiently use valuable router storage. Figures 4 and 5 show the effect of adding virtual channels on the latency and throughput of 2-ary n-fly networks.

Figure 5: Throughput of 2-ary n-fly networks with virtual channels as a function of the number of virtual channels.

The network technology described above is able to meet the goal of providing global memory access with a latency comparable to that of a uniprocessor. Compare, for example, a 64-node 3-D torus with 1 MByte per node with a comparably sized single-processor machine with 64 MBytes. Both of these machines will fit comfortably on a desktop. Since network channels are uniformly short, it is customary to operate them at twice the processor rate [10] (or more [5]). For our comparison we will use a processor rate of 50 MHz and a network clock of 100 MHz. The 64-node torus requires an average of 6 hops to reach any node in the machine (H·Tn = 60 ns). A message of six 16-bit flits (L/(W·f) = 60 ns) is sent in each direction for a read operation. The composition time of the message and the initiation of the memory access can be overlapped with this L/(W·f) time. Thus the one-way communication time is 120 ns. The memory access itself takes 100 ns. Adding the reply communication time (again, terminal operations are overlapped with the L/(W·f) time) gives a total access time of 340 ns. The uniprocessor requires 1 cycle to get off chip, 2 cycles to get across a bus, and 1 cycle to initiate the memory operation (80 ns total). Again, the memory read itself is 100 ns and the reply across the bus requires another 80 ns, for a total of 260 ns. Thus the uniprocessor is only 80 ns, or 24%, faster. Much of the additional delay can be attributed to the fact that the parallel computer network has more decisions to make during routing and is able to handle many messages simultaneously. While these capabilities have a slight negative effect on latency, they give a significant throughput advantage.
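The 340 ns and 260 ns figures follow from straightforward arithmetic. The sketch below merely re-derives them from the cycle counts quoted above, so its only inputs are the stated assumptions (a 100 MHz network clock, a 50 MHz processor, 6 hops, six 16-bit flits, and a 100 ns memory).

    # Back-of-the-envelope remote-read latency comparison from the text:
    # a 64-node 3-D torus versus a uniprocessor reaching memory over a bus.

    NET_CLK = 10e-9    # 100 MHz network clock
    CPU_CLK = 20e-9    # 50 MHz processor clock
    MEM = 100e-9       # memory access time

    def torus_remote_read(hops=6, flits=6):
        one_way = hops * NET_CLK + flits * NET_CLK   # H*Tn + L/(W*f), setup overlapped
        return one_way + MEM + one_way               # request + access + reply

    def uniprocessor_read(cycles_out=4):
        bus = cycles_out * CPU_CLK                   # off chip + bus + initiate
        return bus + MEM + bus                       # request + access + reply

    if __name__ == "__main__":
        t_par, t_uni = torus_remote_read(), uniprocessor_read()
        print(f"torus:        {t_par * 1e9:.0f} ns")   # 340 ns
        print(f"uniprocessor: {t_uni * 1e9:.0f} ns")   # 260 ns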
To see the throughput advantage, consider the problem of rotating a matrix about its center row. To perform one 64-bit move, the conventional machine requires two memory cycles, or 520 ns, for a rate of 123 Mbits/s. With an interleaved memory and a lockup-free memory interface (which few processors have) it could overlap operations to complete one every 160 ns, for a rate of 400 Mbits/s. The parallel computer, on the other hand, can apply its entire bidirectional bisection bandwidth of 256 16-bit channels to the problem, for a total bandwidth of 409.6 Gbits/s.

In summary, modern interconnection network technology gives latency comparable to conventional memory access times with throughput orders of magnitude higher. Raw network performance solves only half of the communication problem, however. To use such a network effectively requires efficient communication mechanisms.

5 Mechanisms

Mechanisms are the primitive operations provided by a computer's hardware and systems software. The abstractions that make up a programming system are built from these mechanisms [18, 9]. For example, most sequential machines provide some mechanism for a pushdown stack to support the last-in-first-out (LIFO) storage allocation required by many sequential models of computation. Most machines also provide some form of memory relocation and protection to allow several processes to coexist in memory at a single time without interference. The proper set of mechanisms can provide a significant improvement in performance over a brute-force interpretation of a computational model.

Over the past 40 years, sequential von Neumann processors have evolved a set of mechanisms appropriate for supporting most sequential models of computation. It is clear, however, from efforts to build concurrent machines by wiring together many sequential processors, that these highly-evolved sequential mechanisms are not adequate to support most parallel models of computation. These mechanisms do not support synchronization of events, communication of data, or global naming of objects. As a result, these functions, inherent to any parallel model of computation, must be implemented largely in software with prohibitive overhead. For example, most sequential machines require hundreds of instructions to create a new process or to send a message. This cost prohibits the use of fine-grain programming models where processes typically last only a few tens of instructions and messages contain only a few words. It is not hard to construct mechanisms that permit tasks to be created and messages sent in a few instruction times; however, these mechanisms are not to be found on conventional processors.
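To see why such overheads matter, a simple (and admittedly crude) efficiency model is useful; the grain sizes and overhead figures below are round illustrative assumptions, not measurements of any machine.

    # Toy model of how interaction overhead limits grain size: a task of g
    # useful instructions that pays v instruction times of communication and
    # task-switching overhead runs at efficiency g / (g + v).

    def efficiency(grain, overhead):
        return grain / (grain + overhead)

    if __name__ == "__main__":
        for overhead in (1000, 50, 25):          # software vs. hardware mechanisms
            # smallest grain that still reaches 90% efficiency: g >= 9 * v
            min_grain = 9 * overhead
            print(f"overhead {overhead:5d} instructions -> "
                  f"90% efficiency needs tasks of ~{min_grain} instructions "
                  f"(a 20-instruction task runs at {efficiency(20, overhead):.0%})")

Under these assumptions, hundreds of instructions of overhead force tasks of thousands of instructions, while an overhead of a few tens of instructions makes tasks of a few tens of instructions worthwhile, which is the fine-grain regime this paper targets.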
Some parallel computers have been built with mechanisms specialized for a particular model of programming, for example dataflow or parallel logic programming. However, our studies have shown that most programming models require the same basic mechanisms for communication, synchronization, and naming. More complex model-specific mechanisms can be built from the basic mechanisms with little loss in efficiency. Specializing a machine for a particular programming model limits its flexibility and range of application without any significant gain in performance. In the remainder of this section, we will examine mechanisms for communication, synchronization, and naming in turn.

Communication between two processing nodes involves the following steps:

1. Formatting: gathers the message contents together.
2. Addressing: selects the physical destination for the message.
3. Delivery: transports the message to the destination.
4. Allocation: assigns space to hold the arriving message.
5. Buffering: stores the message into the allocated space.
6. Action: carries out a sequence of operations to handle the message.

All programming models use a subset of these basic steps. A shared-memory read operation, for example, uses all six steps. A read message is formatted, the address is translated, the message is delivered by the network, the message is buffered until the receiving node can process it, and finally a read is performed and a reply message is sent as the action. Some models, such as synchronous message passing, always send messages to preallocated storage and thus omit allocation (step 4). In some cases, no action is required to respond to a message and step 6 can be omitted.

The SEND instruction, first used in the message-driven processor [15, 16], with translation of destination addresses [19], efficiently handles the first two steps: formatting and addressing. A message is sent with a sequence of SEND instructions followed by a SENDE instruction. A SEND instruction takes a number of arguments equal to the number of read register ports (typically two) and appends its arguments to a message. A SENDE instruction is identical to the SEND except that it also signals the end of the message. The first SEND after a SENDE starts a new message. By making full use of the register bandwidth the SEND instruction reduces formatting overhead to a minimum. The alternative approaches of formatting a message (1) in memory or (2) by writing to a memory-mapped network port have much lower bandwidth and higher latency.

Translation is achieved by interpreting the first word of the message stream (the first argument of the first SEND) as a virtual destination address and translating it to a physical address when a message is sent. A simple translation-lookaside buffer (TLB) efficiently performs this translation. This approach of translating virtual network addresses to physical addresses during the SEND operation permits message sends from user code to be fully protected without incurring the overhead of a system call (as is done on many machines today). User code is only permitted to send messages to addresses that are entered in the TLB. Sending a message to any other address raises an exception.

Communication operations that do not require allocation and/or remote action can use a subset of the basic mechanism. A remote write operation, for example, requires neither of these functions. Avoiding allocation and action in this case eliminates the overhead of copying the message from newly allocated storage to its final destination. The first SEND instruction of a message can specify whether allocation (.A suffix) and/or spawning a task (.S suffix) are required [19]. A SEND with no suffix would simply perform a remote write, SEND.A would allocate but not initiate a remote action, and SEND.SA would do both. The sending node treats these three SEND operations identically and simply sends along the two option bits with the message. The receiving node examines the option bits to determine whether allocation and/or action is required. If an action is required, the routine to be invoked is specified by the second word of the message.
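The sketch below is a loose software model of this idea: a send routine carries two option bits, and the receiver performs only the allocation and dispatch steps those bits request. The class, function, and handler names are hypothetical; this is not the MDP instruction set, only a way of making the "subset of the six steps" notion concrete.

    # Illustrative model of sending a message with optional allocation (A) and
    # task-spawning (S) bits, in the spirit of the SEND/.A/.SA variants above.
    # Names and structures are hypothetical, not the MDP ISA.
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        memory: dict = field(default_factory=dict)
        buffers: list = field(default_factory=list)
        handlers: dict = field(default_factory=dict)

        def receive(self, msg, allocate, spawn):
            if allocate:                      # steps 4/5: allocate and buffer
                self.buffers.append(list(msg))
            else:                             # remote write: store payload directly
                addr, value = msg[0], msg[1]
                self.memory[addr] = value
            if spawn:                         # step 6: dispatch handler named by word 1
                self.handlers[msg[1]](self, msg[2:])

    def send(dst_node, words, allocate=False, spawn=False):
        # steps 1-3: format, address (translation omitted here), deliver
        dst_node.receive(tuple(words), allocate, spawn)

    if __name__ == "__main__":
        n = Node(handlers={"read_reply": lambda node, args: print("reply:", args)})
        send(n, [0x100, 42])                                   # plain remote write
        send(n, [0, "read_reply", 42], allocate=True, spawn=True)
        print(n.memory, len(n.buffers), "buffered message(s)")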
Storage allocation and message buffering must be performed in hardware to achieve adequate performance. While approaches using stack (LIFO) or queue (FIFO) based storage are simple to implement [10], they may require copying if messages are not deallocated in order. An alternative is to allocate message buffers off a free list of fixed-sized segments [40]. Management of such a free list is simple (only a single pointer is required) and it does not restrict message lifetimes. Messages too long for the fixed-sized segments can be handled in an overflow area.

With any allocation scheme, a method for handling message buffer overflow is required. Because handling an overflow may require access to other nodes, the network must be usable even when a full buffer is causing messages to back up into the network. This is accomplished on the J-Machine by using two virtual networks [10]. The actual overflow handling may be performed in software as it is a rare event. While many strategies may be used to handle overflow, a simple one is to return overflowing messages to their senders. With this scheme each node must guarantee that it has storage to hold each message it originates until it is acknowledged.
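The following sketch models the bookkeeping of such a free list of fixed-sized segments together with a return-to-sender overflow policy. It is an illustrative software model under assumed names and sizes, not the J-Machine hardware mechanism.

    # Illustration of message-buffer allocation from a free list of fixed-size
    # segments, with a simple return-to-sender policy on overflow.

    class SegmentPool:
        def __init__(self, num_segments, segment_words):
            self.segment_words = segment_words
            self.free = list(range(num_segments))   # free list: trivially cheap state
            self.store = {}

        def buffer_message(self, msg):
            """Return a segment id, or None if the pool has overflowed."""
            if len(msg) > self.segment_words or not self.free:
                return None                          # overflow: caller bounces msg to sender
            seg = self.free.pop()
            self.store[seg] = list(msg)
            return seg

        def release(self, seg):                      # any order of release is fine
            del self.store[seg]
            self.free.append(seg)

    if __name__ == "__main__":
        pool = SegmentPool(num_segments=2, segment_words=8)
        a = pool.buffer_message([1, 2, 3])
        b = pool.buffer_message([4, 5])
        print("third message bounced:", pool.buffer_message([6]) is None)
        pool.release(a)                              # out-of-order release, no copying needed
        print("after release, accepted:", pool.buffer_message([6]) is not None)

Unlike a LIFO or FIFO buffer, nothing here depends on messages being freed in the order they arrived, which is the property the free-list approach buys.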
The final step of a communication operation is to initiate a remote action by creating and dispatching a task. A task or process consists of a thread of control and an addressing environment. A thread can be created in a few clock cycles by loading a processor's IP to set the thread of control and initializing its memory management registers to alter the addressing environment. On the J-Machine, each message in the message queue is treated as a thread that is ready to run, and threads are dispatched when they reach the head of the queue. This dispatching on message arrival also serves as the basis of a synchronization mechanism.

Synchronization enforces an ordering of events in a program. It is used, for example, to ensure that one process writes a memory location before another reads it, to provide mutual exclusion during critical sections of code, and to require all processes to arrive at a barrier before any processes leave. Any synchronization mechanism requires a namespace that processes use to refer to events, a method for signalling that an event is enabled, and a method for forcing a processor to wait on an event. Using tags for synchronization, as with the presence bits on the HEP [36], uses the memory address space as the synchronization namespace. This provides a large synchronization namespace with very little cost, as the memory management hardware is reused for this function. It also has the benefit that when signaling the availability of data, the data can be written and the event signaled in a single memory operation.

Since it naturally signals the presence of data, we refer to this synchronization using tags on memory words as data synchronization [40]. With synchronization tags, an event is signaled by setting the tag to a particular state. A process can wait on an event by performing a synchronizing access of the location, which raises an exception if the tag is not in the expected state. A synchronizing access may optionally leave the tag in a different state. Simple producer/consumer synchronization can be performed using a single state bit. In this case, the producer executes a synchronizing write which expects the tag to be empty and leaves it full. A synchronizing read, which expects the location to be full and leaves it empty, is performed by the consumer. If the operations proceed in order, no exceptions are raised. An attempt to read before a write, or to write twice before a single read, raises a synchronization exception. More involved synchronization protocols require additional states (for example, to signal that a process is waiting on a location) [19].

The communication mechanism described above complements data synchronization by providing a means for a process on one node to signal an event on a remote node. In the simplest case, a message handler can perform a synchronizing read or write operation. However, it is often more efficient to move some computation to the node on which the data is resident. Consider, for example, the problem of adding a value to a remote location⁶. One could perform a remote synchronizing read that marks the location empty to gain exclusive access, perform the add, and then perform a remote synchronizing write. Sending a single message to invoke a handler that performs the read, add, and write on the remote node, however, reduces the time to perform the operation, the number of messages required, and the amount of time the location is locked.

⁶This occurs, for example, when performing LU decomposition of a matrix.

Many machines have implemented some form of global barrier synchronization. For example, the Caltech Cosmic Cube [32] had four program-accessible wire-or lines for this purpose. While global barrier synchronization is useful for some models, it can be emulated rapidly using communication and data synchronization. If there is sufficient slack time from when a process signals that it has reached the barrier to when it waits on the barrier, this emulation will not affect program performance. The required amount of slack time varies logarithmically with the number of processors performing the barrier. Also, the major use of barrier synchronization (inserting a barrier between code that produces a structure, e.g., an array, and code that consumes the structure) is eliminated by data synchronization. By synchronizing in the data space on each individual element of the data structure, control-space synchronization on the program counter between the producer and consumer is neither required nor desired. It is more efficient to allow the producer and consumer to overlap their execution subject to data dependency constraints. Barrier synchronization mechanisms also have the disadvantage that they require a separate namespace, which tends to be small because of the prohibitive cost of providing many simultaneous barriers, and they consume pin and wire resources that could otherwise be used to speed up the general communication network.

The mechanism that enforces event ordering solves only half of the synchronization problem. Efficient synchronization also requires an agile processor that can rapidly switch processes and handle events and messages, to reduce the exception handling and context switching overhead when switching processes while waiting on an event. Rapid task switching can be provided by providing multiple register sets or a named-state register set [29]. Exception handling is accelerated by specifically vectoring exceptions, providing separate registers for exception handling, and explicitly passing arguments to exception handlers [19].
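The producer/consumer discipline described above can be made concrete with a small model of tagged memory, where a Python exception stands in for the hardware synchronization trap; the class and method names are assumptions made for the example.

    # Minimal model of data synchronization with a full/empty tag bit per word:
    # a synchronizing write expects EMPTY and leaves FULL, a synchronizing read
    # expects FULL and leaves EMPTY.  An out-of-order access raises an exception,
    # standing in for the hardware synchronization trap.

    class SyncException(Exception):
        pass

    class TaggedMemory:
        def __init__(self, size):
            self.data = [None] * size
            self.full = [False] * size      # the tag bit

        def sync_write(self, addr, value):
            if self.full[addr]:
                raise SyncException(f"write to full location {addr}")
            self.data[addr], self.full[addr] = value, True

        def sync_read(self, addr):
            if not self.full[addr]:
                raise SyncException(f"read from empty location {addr}")
            self.full[addr] = False
            return self.data[addr]

    if __name__ == "__main__":
        m = TaggedMemory(4)
        m.sync_write(0, "item")             # producer
        print(m.sync_read(0))               # consumer
        try:
            m.sync_read(0)                  # premature read -> synchronization exception
        except SyncException as e:
            print("trapped:", e)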
6 Experience

In the Concurrent VLSI Architecture Group at MIT, we have built the J-Machine [10], a prototype fine-grain parallel computer with a high-speed network and efficient yet general communication and synchronization mechanisms. The J-Machine was built to test and evaluate our ideas on mechanisms and networks, as a proof of concept for this class of machine, and as a testbed for parallel software research. Small prototypes have been operational since June of 1991. We expect to have a 1024-processor J-Machine on-line during the summer of 1992.

The J-Machine communication mechanism permits a node to send a message to any other node in the machine in < 1.5 µs. On message arrival, a task is created and dispatched in 200 ns. A translation mechanism supports a global virtual address space. These mechanisms efficiently support most proposed models of concurrent computation and allow parallelism to be exploited at a grain size of 10 operations.

The hardware is an ensemble of up to 65,536 nodes, each containing a 36-bit processor, 4K 36-bit words of on-chip memory, 256K words of DRAM, and a router. The nodes are connected by a high-speed 3-D mesh network with deterministic dimension-order routing. The J-Machine has about the grain size of the cost-balanced machine described in Section 3, one processor per megabyte of memory.

Figure 6: Floorplan and photograph of a Message-Driven Processor chip.

A photograph of the message-driven processor chip used in the J-Machine is shown in Figure 6. One of these chips combined with three external DRAM parts forms a J-Machine node. An array of 64 nodes is packaged on a single board (Figure 7). These boards are stacked and connected side-to-side to form larger J-Machines.

Three software systems are currently operational on the J-Machine. It runs Concurrent Smalltalk (CST) [24], a version of Id based on the Berkeley TAM system [37, 6], and a dialect of "C". Execution of these diverse programming systems has demonstrated the efficiency and flexibility of the J-Machine mechanisms.

Table 1 shows the advantage of efficient mechanisms. The left column of the table lists the operations involved in performing a remote memory reference on a 1024-node parallel computer.
The next two columns list the approximate number of instruction times required to perform each operation on the Intel Paragon [5] and the J-Machine. Many of these times were derived from the study reported in [38]. The final column of the table shows the times that could be achieved with techniques that are currently understood.

    Operation                  Paragon   J-Machine   Ideal
    Send 4-Word Message            600           3       2
    Network Delay                   32          10      10
    Buffer Allocation               20           0       0
    Switch To Handle Msg          1000          10       1
    Presence Test                    5           0       0
    Send 3-Word Return Msg         600           3       2
    Network Delay                   32          10      10
    Buffer Allocation               20           0       0
    Switch To Handle Msg          1000           3       1
    Switch To Restart Task        1000          10       1
    TOTAL                         4309          49      27

Table 1: The time to perform a remote memory reference on the Intel Paragon, a conventional message-passing multicomputer; on the J-Machine, a fine-grain parallel computer; and the time that could be achieved with current technology (Ideal). "Switch" refers to a task switch.

Figure 7: Photograph of a 64-node J-Machine board.

The table shows that while both machines have fast networks, the time to carry out a simple remote action is many times greater on the conventional machine. The single largest contributor is the task switching time⁷. The overhead of task switching in a conventional operating system is unacceptable in this environment. Even if the task switch time were reduced to zero, the overhead of sending a message⁸ in a system where this function is handled in software is still prohibitive. End-to-end hardware support for communication is required to achieve acceptable latency.

The rightmost column represents times that could be achieved by making some minor modifications to the J-Machine. In particular, task switch time could be reduced from 10 cycles (when registers need to be saved) or 3 cycles (without a register save) to a single cycle by providing more support for multithreading [29, 39]. The J-Machine would also benefit from more user registers, automatic destination translation on message send, being able to subset the communication operation, and a non-LIFO message buffer.

⁷The estimate of 1000 instruction times, or 25 µs, for the i860 is extrapolated from other microprocessors and hence very generous; because of the complexity of event handling on this chip the actual number is higher.
⁸Some receive time is also included in this number.

7 Related Work

Like the message-driven processor from which the MIT J-Machine is built, the Caltech MOSAIC [33], Intel iWARP [3], and INMOS Transputer [26] are integrated processing nodes that incorporate a processor with memory and communication on a single chip. These integrated nodes, however, lack the efficient mechanisms of the MDP and thus cannot efficiently support many different models of computation. Also, the software-routed, bit-serial Transputer network does not have adequate performance for many applications.

Many machines built for a specific model of computation have been generalizing their mechanisms. For example, the MIT Alewife machine [1], while specialized for the shared-memory model, provides an interprocessor interrupt facility that can be used for general message-passing. Being memory mapped, this operation is somewhat slower than the register-based send operation described above. Dataflow machines, which once hard-wired a particular dataflow model into the architecture [30, 34], have also been moving in the direction of general mechanisms with the EM4 [31] and *T [28].
8 Conclusion

Two enabling technologies, fast networks (Section 4) and efficient interaction mechanisms (Section 5), make it possible to build and program fine-grain parallel computers. Fine-grain machines have much less memory per processor than conventional machines because they are balanced by cost, rather than by capacity-to-speed ratios. Increasing the processor to memory ratio improves the processor throughput and local memory bandwidth by a factor of 50 with only a small increase in system cost. We expect this dramatic performance/cost advantage will lead to mechanism-based fine-grain parallel computers becoming universal, replacing sequential computers in all sizes of systems from personal desktop computers to institutional supercomputers. This universal parallel computer will not happen with existing semiconductor price structures, where processor silicon is an order of magnitude more expensive per unit area than memory silicon. Cost-effective fine-grain computing requires a true jellybean (inexpensive and plentiful) processing-node chip.

Low-latency networks enable each node in a fine-grain machine to access any memory location in the machine in time competitive with a global memory access in a conventional machine. Thus, the small memory per node does not limit either the problem size that can be handled or sequential execution speed. A fine-grain machine can execute sequential programs with performance competitive with conventional machines.

High-bandwidth networks and efficient interaction mechanisms enable fine-grain computers to apply their high aggregate processor throughput and memory bandwidth with minimum overhead. Reducing interaction overhead to a few instruction times (Table 1) increases the amount of parallelism that can be economically exploited. It also simplifies programming, as tasks and data structures no longer have to be grouped into large chunks to amortize large communication, synchronization, and task-switching overheads.

At MIT we have built and programmed the J-Machine to test, evaluate, and demonstrate our network and mechanisms. By running three programming systems on the machine, we have demonstrated the flexibility of its mechanisms and generated some ideas on how to improve them. The next step is to work to commercialize this technology by developing a more integrated and higher-performance processing node in today's technology and by providing bridges of compatibility to existing sequential software.

References

[1] Anant Agarwal et al. The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor. In Scalable Shared Memory Multiprocessors. Kluwer Academic Publishers, 1991.
[2] David Bailey. The NAS Parallel Benchmarks. Presentation given in 1991.
[3] Shekhar Borkar et al. iWARP: An Integrated Solution to High-Speed Parallel Computing. In Proceedings of the Supercomputing Conference, pages 330-338. IEEE, November 1988.
[4] Andrew A. Chien and Jae H. Kim. Planar-Adaptive Routing: Low-cost Adaptive Networks for Multiprocessors. In Proceedings of the International Symposium on Computer Architecture, Queensland, Australia, May 1992. IEEE.
[5] Intel Corporation. Paragon XP/S. Product Overview, 1991.
[6] David E. Culler et al. Fine-Grain Parallelism with Minimal Hardware Support: A Compiler-Controlled Threaded Abstract Machine. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 164-175. ACM, April 1991.
[7] William J. Dally.
Directions in Concurrent Computing. In Proceedings of the International Conference on Computer Design, pages 102-106. IEEE, October 1986. Conference at Port Chester, New York.
[8] William J. Dally. Wire-Efficient VLSI Multiprocessor Communication Networks. In Paul Losleben, editor, Proceedings of the Stanford Conference on Advanced Research in VLSI, pages 391-415. MIT Press, 1987.
[9] William J. Dally. Mechanisms for Concurrent Computing. In Proceedings of the International Conference on Fifth Generation Computer Systems, pages 154-156, December 1988.
[10] William J. Dally. The J-Machine System. In Patrick Winston with Sarah A. Shellard, editor, Artificial Intelligence at MIT: Expanding Frontiers, chapter 21, pages 536-569. MIT Press, 1990.
[11] William J. Dally. Network and Processor Architecture for Message-Driven Computers. In Suaya and Birtwhistle, editors, VLSI and Parallel Computation. Morgan Kaufmann, 1990.
[12] William J. Dally. Express Cubes: Improving the Performance of k-ary n-cube Interconnection Networks. IEEE Transactions on Computers, pages 1016-1023, September 1991.
[13] William J. Dally. Virtual-Channel Flow Control. IEEE Transactions on Parallel and Distributed Systems, 3(2), March 1991.
[14] William J. Dally and Hiromichi Aoki. Adaptive Routing using Virtual Channels. IEEE Transactions on Parallel and Distributed Computing, 1992.
[15] William J. Dally et al. Architecture of a Message-Driven Processor. In Proceedings of the 14th International Symposium on Computer Architecture, pages 189-205. IEEE, June 1987.
[16] William J. Dally et al. Design and Implementation of the Message-Driven Processor. In Proceedings of the 1992 Brown/MIT Conference on Advanced Research in VLSI and Parallel Systems. MIT Press, March 1992.
[17] William J. Dally and Paul Song. Design of a Self-Timed VLSI Multicomputer Communication Controller. In Proceedings of the International Conference on Computer Design, pages 230-234. IEEE, October 1987.
[18] William J. Dally and D. Scott Wills. Universal Mechanisms for Concurrency. In G. Goos and J. Hartmanis, editors, Proceedings of PARLE-89, pages 19-33. Springer-Verlag, June 1989.
[19] William J. Dally, D. Scott Wills, and Richard Lethin. Mechanisms for Parallel Computing. In Proceedings of the NATO Advanced Study Institute on Parallel Computing on Distributed Memory Multiprocessors. Springer, 1991.
[20] J. A. Fisher and B. R. Rau. Instruction-Level Parallel Processing. Science, pages 1233-1241, September 1991.
[21] Geoffrey Fox et al. Solving Problems on Concurrent Computers. Prentice Hall, 1988.
[22] John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, 1990.
[23] John L. Hennessy and Norman P. Jouppi. Computer Technology and Architecture: An Evolving Interaction. Computer, pages 18-29, September 1991.
[24] Waldemar Horwat, Andrew Chien, and William J. Dally. Experience with CST: Programming and Implementation. In Proceedings of the ACM SIGPLAN '89 Conference on Programming Language Design and Implementation, 1989.
[25] S. Konstantinidou and L. Snyder. Chaos router: architecture and performance. In 18th Annual Symposium on Computer Architecture, pages 212-221, 1991.
[26] InMOS Limited. IMS T424 Reference Manual. Order Number 72 TRN 00600, November 1984.
[27] Carver A. Mead and Lynn A. Conway. Introduction to VLSI Systems. Addison-Wesley, Reading, Mass., 1980.
[28] Rishiyur S. Nikhil, Gregory M. Papadopoulos, and Arvind. *T: A Multithreaded Massively Parallel Architecture.
Computation Structures Group Memo 325-1, Massachusetts Institute of Technology Laboratory for Computer Science, November 15, 1991.
[29] Peter R. Nuth and William J. Dally. A Mechanism for Efficient Context Switching. In Proceedings of the International Conference on Computer Design. IEEE, October 1991.
[30] Gregory M. Papadopoulos and David E. Culler. Monsoon: an Explicit Token-Store Architecture. In The 17th Annual International Symposium on Computer Architecture, pages 82-91. IEEE, 1990.
[31] S. Sakai et al. An Architecture of a Dataflow Single Chip Processor. In Proceedings of the 16th Annual Symposium on Computer Architecture, pages 46-53, 1989.
[32] Charles L. Seitz. The Cosmic Cube. Communications of the ACM, 28(1):22-33, January 1985.
[33] Charles L. Seitz et al. Submicron Systems Architecture. Semiannual Technical Report Caltech-CS-TR-9005, Department of Computer Science, California Institute of Technology, March 15, 1990.
[34] Toshio Shimada, Kei Hiraki, Kenji Nishida, and Satosi Sekiguchi. Evaluation of a Prototype Data Flow Processor of the Sigma-1 for Scientific Computations. In 13th Annual International Symposium on Computer Architecture, pages 226-234. IEEE, June 1986.
[35] Alan Jay Smith. Cache Memories. Computing Surveys, 14(3):473-530, September 1982.
[36] Burton J. Smith. Architecture and applications of the HEP multiprocessor computer system. In SPIE Vol. 298, Real-Time Signal Processing IV, pages 241-248. Denelcor, Inc., Aurora, Col., 1981.
[37] Ellen Spertus and William J. Dally. Experiments with Dataflow on a General-Purpose Parallel Computer. In Proceedings of the International Conference on Parallel Processing, pages II-231-II-235, August 1991.
[38] Brian Totty. Experimental Analysis of Data Management for Distributed Data Structures. Master's thesis, University of Illinois, 1991.
[39] Wolf-Dietrich Weber and Anoop Gupta. Exploring the Benefits of Multiple Hardware Contexts in a Multiprocessor Architecture: Preliminary Results. In The 16th Annual International Symposium on Computer Architecture, pages 273-280. IEEE Computer Society Press, 1989.
[40] D. Scott Wills. Pi: A Parallel Architecture Interface for Multi-Model Execution. PhD thesis, Massachusetts Institute of Technology, May 1990.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

An Automatic Translation Scheme from Prolog to the Andorra Kernel Language*

Francisco Bueno (bueno@fi.upm.es) and Manuel Hermenegildo† (herme@fi.upm.es or herme@cs.utexas.edu)
Facultad de Informatica, Universidad Politecnica de Madrid (UPM), 28660-Boadilla del Monte, Madrid, Spain

Abstract

The Andorra family of languages (which includes the Andorra Kernel Language, AKL) is aimed, in principle, at simultaneously supporting the programming styles of Prolog and committed-choice languages. On the other hand, AKL requires a somewhat detailed specification of control by the user. This could be avoided by programming in Prolog to run on AKL. However, Prolog programs cannot be executed directly on AKL. This is due to a number of factors, from more or less trivial syntactic differences to more involved issues such as the treatment of cut and making the exploitation of certain types of parallelism possible. This paper provides basic guidelines for constructing an automatic compiler of Prolog programs into AKL, which can bridge those differences.
In addition to supporting Prolog, our style of translation achieves independent and-parallel execution where possible, which is relevant since this type of parallel execution preserves, through the translation, the user-perceived "complexity" of the original Prolog program.

1 Introduction

A desirable goal in logic programming language design is to support both the don't-know nondeterministic, search-oriented programming style of Prolog and the don't-care indeterministic, concurrent communicating agents programming style of committed-choice languages. Furthermore, from an implementation point of view it is interesting to be able to support the or- and independent and-parallelism often exploited in the former (e.g., [Lus88, AK90, Kal87, HG90]) as well as the dependent and-parallelism exploited in the latter (e.g., [Cra90, IMT87, HS86]). The Andorra family of languages is aimed at simultaneously supporting these two programming paradigms and their associated modes of parallel execution.

*This work was funded in part by both ESPRIT project 2471 "PEPMA" and CICYT project 305.90.
†Please direct correspondence to Manuel Hermenegildo at the above address.

The Andorra proposal in [War] (called the "basic" Andorra model, on which the Andorra-I system [SCWY90] is based) defined a framework which allowed or-parallelism and also the and-parallel execution of deterministic goals (deterministic "stream and-parallelism"), this now being called the "Andorra Principle." An important idea behind the choice of control in the basic Andorra model is to perform the least possible amount of computation while allowing the maximum amount of parallelism to be exploited. Another and complementary way of achieving this goal, which has also been identified [HR89, HR90], is to also run nondeterministic goals in parallel, provided (or while) they are independent ("independent and-parallelism", IAP). In order to also include this type of parallelism the Extended Andorra Model (EAM) [War90, HJ90] defines an execution framework which allows IAP in addition to the forms of parallelism supported in the basic Andorra model.

The EAM defines rules which specify a series of admissible steps of computation from each possible given state. Several rules can be admissible from a given state and this gives rise to both nondeterminism and indeterminism, and also to opportunities for parallel execution. One important issue within this framework is thus that of control: i.e., which of the admissible rules should be applied in order to achieve the most efficient execution while attaining the maximum parallelism.

Two obvious approaches to treating the above mentioned issue are to put control decisions in the hands of the programmer or to try to do this automatically by compile-time and/or run-time analysis. The Andorra Kernel Language (AKL) [HJ90, JH91] uses explicit control. In particular, AKL allows (dependent) parallel execution of determinate subgoals, as stated by the Andorra Principle, but it also allows the more general forms of parallel execution of the EAM, albeit controlled by the programmer. The specification of control is done, among other mechanisms, by positioning the goals and constraints before or after a guard operator, in a way that can be reminiscent of the labeling of unification as input or output (i.e., ask or tell constraints [Sar89]) in the GHC language [Ued87a]. These operators divide body clauses into two parts, the guard and the actual body.
Guards are executed in independent environments and proceed unless they attempt to perform output unification, while bodies wait until guards are completely solved and the goals in the body promoted. Such goals are then executed concurrently provided they are deterministic, in the spirit of the Andorra Principle. These properties give a means of control to the programmer which can be used to achieve parallel execution of general goals.

The AKL is therefore quite a powerful language. However, it does put quite a burden on the programmer in requiring certain specification of control. In particular, Prolog programs cannot always be executed directly on the AKL. This is due to a number of factors, from more or less trivial syntactic differences to more involved issues such as the treatment of cut, labeling of unification, and making the exploitation of certain types of parallelism, most notably IAP, possible without user involvement and preserving the programmer-perceived complexity of the original program.

The objective of this paper is to investigate how the above mentioned differences can be bridged, through program analysis and transformation. It points out the non-trivial problems involved in performing such a translation, and then provides solutions for these problems. Although desirable, our aim at this point is not to provide the best possible translation, which would take advantage of AKL properties to achieve a large reduction of search space, but rather to bridge the gap between Prolog and AKL in a manner that does not increase the search space, while also allowing IAP to be exploited (with the important result of achieving "stability" in the frame of AKL for these cases). Building on the partial translation approaches presented in [JH90, Her90], the paper presents a basic algorithm for constructing a translator from Prolog to AKL¹. An important feature of the translation approach proposed herein is that it automatically detects and allows the parallel execution of independent goals (as well, of course, as or-parallelism, and the parallel execution of deterministic goals even if they are dependent, as per the Andorra Principle). The execution of independent goals in parallel has the very desirable property of preserving the program complexity perceived by the programmer [HR89]. Important requirements for such a translation are the compile-time detection of goal independence and input/output modes. This requires in general a global analysis of the program, perhaps using abstract interpretation. In the approach proposed herein heavy use will be made of our compile-time tools, developed in the context of &-Prolog [HG90]. In particular, Prolog programs are first analyzed and annotated as &-Prolog programs (thus making goal independence explicit), and then they are translated into AKL.

¹Ueda [Ued87b] proposed automatic translation from Prolog to a committed-choice language (GHC, in his case). However, our aim and target language are quite different.

In the following section, the AKL control model and its rules are briefly reviewed, together with some syntactic conventions. Then transformations of Prolog constructions for a basic translation are presented in section 3, and some rules for combining the AKL model with our purpose of achieving independent parallelism are shown in section 4. Section 5 will present the analysis tools and why they are needed in the translation process.
In section 6 some results are shown for the execution of a number of benchmarks automatically translated, and section 7 presents some conclusions.

2 The Andorra Kernel Language Revisited

In this section we present a brief overview of the AKL model of execution, in order to make the paper self-contained. The purpose is, based on an understanding of this, to extract the correct rules for a translation of Prolog which achieves the desired results. AKL and its model of execution have been fully described in [JH91, HJ90].

AKL is a language with deep guards. Thus, clauses are divided into two parts: the guard and the body, separated by a guard operator. Guard operators are: wait (:), cut (!), and commit (|). The following syntactical restrictions apply:

• Each clause is expected to have one and only one guard operator;
• All clauses in the definition of a predicate have to be guarded by the same guard operator. So, if any of the clauses is not guarded, the guard operator of its companions is assumed and positioned just after the clause neck.
• A wait operator is assumed, and in the above mentioned position, where no other operator can be assumed using the above mentioned rules.

Guards are regarded as part of clause selection. This means that a clause body is not entered unless head unification succeeds and its guard is completely solved. Then, execution proceeds by "expansion" of the present configuration by application of a rule of the computation model. The rules in the AKL model allow rewriting of configurations (states), leading to valid configurations from valid ones. They are fully described in [JH91], so we will simply enumerate them, providing very informally the concept behind each rule, rather than a precise definition:

1. Local forking: unfolds an atomic goal into a choice of all the alternatives in its definition (but without yet creating "copies"² of continuation goals).
2. Nondeterminate promotion: promotes one guarded goal with solved guard in a choice of several of them (i.e., copies the goal to the parent continuation, applying its constraint/substitution to it, and creates a "copy" of the continuation environment).
3. Determinate promotion: special case of the above when there is a single guarded goal in a choice, if its guard is solved (no copying of the continuation environment is necessary).
4. Failure and synchronization rules: remove or fail configurations in the usual way.
5. Pruning rules: handle the effects of pruning guard operators.
6. Distribution and bagof rules: do the distribution of guards and the bagof operation.

²Although we refer to "copying" throughout the paper, part of the continuation goals could in principle be shared [War90].

These rules basically represent the allowable transitions of the EAM. The last three rules are less relevant for our purposes. In addition to these rules there are three basic control restrictions in the general computation model (meta-rules) which control the application of the above rules and which are highly relevant to our independent-style translation:

• Pruning in AKL has to be quiet, that is, a solution for the guard of a cut- or commit-guarded clause may not further restrict (or constrain) variables outside its own configuration.
• Goals in the guard of a clause are completely and locally executed. This means that execution of guards is simultaneous but independent of the parent environment.
• Nondeterminate promotion is only admissible within a stable subgoal of a configuration. A goal is stable if no rule is applicable to any subgoal, and no possible changes in its environment will lead to a situation in which a rule is applicable in the goal.

As we shall soon see, these three restrictions force the conditions under which translation has to be done if we want to achieve parallelism and correct pruning in the translated clauses. But first, we will illustrate the AKL execution model with a simple example:

    partition([],_,Left,Right):-
        |,
        Left = [],
        Right = [].
    partition([E|R],C,Left,Right):-
        E < C,
        |,
        Left = [E|Left1],
        partition(R,C,Left1,Right).
    partition([E|R],C,Left,Right):-
        E >= C,
        |,
        Right = [E|Right1],
        partition(R,C,Left,Right1).

For a query such as partition([2,1],3,L,R) the initial configuration would be a choice-point with the three clauses for the predicate. Head unification would fail the first alternative ([] = [2,1]), but the second one would succeed ([E|R] = [2,1], C=3). E >= C (i.e., 2 >= 3) would be executed (and failed) only after promotion. After failure of this branch, determinate promotion of the remaining one would be applicable, and execution would proceed as before.

3 Translating Prolog Constructions

Having the aforementioned rules in mind, we now discuss transformation rules for translating basic Prolog constructions, disregarding any possible exploitation of IAP. Even this straightforward step is nontrivial, as we shall soon see. This is due mainly to the semantics of cut in both Prolog and AKL, cut being a guard operator in the latter. With the restrictions required for guard operators to achieve both syntactic and semantic correctness in AKL, we find problems in the following constructions:

• syntactical restrictions:
  - definitions of predicates in which a pruning clause appears,
  - clauses in which more than one cut appears;
• semantic restrictions:
  - if-then-elses, where the cut has a "local" pruning effect,
  - pruning clauses where the cut is regarded as
A goal is stable if no rule is applicable to any subgoal, and no possible changes in its environment will lead to a situation in which a rule is applicable in the goaL As we shall soon see these three restrictions force the conditions under which translation has to be done if we want to achieve parallelism and correct pruning in the translated clauses. But first, we will illustrate the AKL execution model with a simple example: partition([],_,Left,Right):- I, Left = [], Right = []. 2 Although we refer to "copying" throughout the paper, part of the continuation goals could in principle be shared [War90j. E )= C, I, Right = [EIRight1], partition(R,C,Left,Right1). For a query such as partition([2.1] .3.1.D) the initial configuration would be a choice-point with the three clauses for the predicate. Head unification would fail the first alternative ([] = [2.1]), but the second one would succeed ([EIR1= [2.1] • C=3. E=C (i.e. 2>=3) would be executed (and failed) only after promotion. After failure of this branch, determinate promotion of the remaining one would be applicable, and execution would proceed as before. 3 Translating Prolog Constructions Having the aforementioned rules in mind, we now discuss transformation rules for translating basic Prolog constructions, disregarding any possible exploitation of lAP. Even this straightforward step is nontrivial, as we shall soon see. This is due mainly to the semantics of cut in both Prolog and AKL, cut being a guard operator in the latter. With the restrictions required for guard operators to achieve both syntactic and semantic correctness in AKL, we find problems in the following constructions: • syntactical restrictions: - definitions of predicates in which a pruning clause appears, - clauses in which more than one cut appears; • semantic restrictions: - if-then-elses, where the cut has a "local" pruning effect, - pruning clauses where the cut is regarded as 762 noisy (i.e. attempts to further restrict variables outside its scope), - side-effects and meta-logical predicates, whic.h should be sequentialized. The transformations required to deal with these constructions are proposed in the following subsections. This is done mainly through examples. The aim is thus not to provide precise and formal definitions of program transformations but rather to provide the intuition behind the process of translation. In subsequent sections we will discuss other issues involved in the process of translation, such as achievement of lAP, problems in this, and its relation with the AKL stability conditions. 3.1 Direct translation First, as all AKL clauses in a definition are forced to have the same guard operator, we have to ensure this is achieved. For example: Example 1 Same guard operator in a definition p(X,Y):- q(X), reV). p(X,Y):- test(X) , !, output(Y) . p(X,Y):- a(X,Y). p(X,Y):- q(X), reV). p(X,Y):- pc(X,Y). pc(X,Y):- test(X) , !, output(Y) . pc(X,Y):- a(X,Y). Note that clauses before the pruning one will have an (assumed) wait operator and clauses after that one (and that one itself) will have an (assumed) cut operator. All of them but the pruning one have an empty guard. Note that, had the program not been rewritten, the rules for assuming guard operators would have put a cut operator in the first clause, which is obviously not the correct translation. Note also,· that only one guard operator is to be allowed in a clause. 
Therefore, repeated cuts in the same body (which are otherwise strongly discouraged as a matter of style and declarativeness) have to be "folded" out using the technique sketched below:

Example 2 (Single guard operator in a clause)

    p(X,Y):- test(X), !, test(Y), !, accept(X,Y).

is translated into:

    p(X,Y):- test(X), !, foo(X,Y).
    foo(X,Y):- test(Y), !, accept(X,Y).

Second, the AKL cut operator is regarded as a guard operator, and, furthermore, it has to be quiet (which is not the case in some Prolog constructions, which cannot be easily translated to AKL). One of them is local pruning, i.e., if-then-else. Indeed, an if-then-else can be viewed as a disjunction containing a cut whose scope is limited to the disjunction itself, rather than the clause in which it appears. Thus the following preprocessing can be done:

Example 3 (Local pruning of if-then-else)

    p(X):- ( cond(X) -> q(X,Y) ; r(X,Z) ), s(Y,Z).

is translated into:

    p(X):- foo(X,Y,Z), s(Y,Z).
    foo(X,Y,_):- cond(X), !, q(X,Y).
    foo(X,_,Z):- r(X,Z).

Last but not least, we have to ensure the quietness of all AKL cuts. A cut is quiet if it does not attempt to bind variables which are seen from outside its own scope, that is, the clause where they appear. If this is not the case, we have to make that binding explicit in the form of an equality constraint (a unification) and place it after the cut itself, i.e., outside the guarded part of the clause:

Example 4 (Making a cut quiet)

    p(X,Y):- test(X), output(Y), !.
    p(X,Y):- s(X,Y).

is translated into:

    p(X,Y):- test(X), output(Y1), !, Y1=Y.
    p(X,Y):- s(X,Y).

Note that knowledge of the input/output modes of variables is required for performing this transformation, and that the transformation may not always be safe³. This will be discussed in the following subsection.

3.2 Noisiness of cut

The main difference between cut in Prolog and cut in AKL is that cut is quiet in AKL⁴. "Quiet" in the context of a cut means that the solution of the cut's guard is quiet, that is, it does not add constraints to variables outside the guarded goals themselves, other than those which already appear in its environment. Indeed, a transformation such as the one proposed in example (3.1).4 can make a noisy cut quiet. What it does is to delay output unification until the guard is promoted, by making it explicit in the body part of the clause.

We regard a variable to be output in a query if execution for this query will further constrain it; a variable will be regarded as input if execution will depend on its state of instantiation (or constraint). In other words, a variable is an output variable in a literal if it is further instantiated by the query this literal represents; it is an input variable if it makes a difference for the execution of the literal whether the variable is instantiated or not⁵. Note that a given variable can be both input and output, or neither of them.

³Note also that this transformation, when safe, may be of advantage as well in standard Prolog compilers in order to avoid trailing overhead.
⁴Nevertheless, a noisy cut has also been implemented in AKL, which we will discuss later.
⁵These definitions are similar to those independently proposed in [SCWY91] (and also in the spirit of those of Gregory [Gre85]), which describes translation techniques from Prolog to Andorra-I, an implementation of the Basic Andorra Model.
Although the techniques used in such a translation have some relationship with those involved in Prolog-AKL translation, the latter requires in practice quite different techniques, due to AKL being based on the Extended Andorra Model (thus having to deal with the possibility of parallelism among non-deterministic goals and the stability rules) and the rather different way in which the control of the execution model (explicit in AKL and implicit in Andorra-I) is done in each language.

The objective of a transformation such as the one proposed is to rename apart all output variables in the head of a pruning clause, and then bind the new variables to the original ones in the body of the clause, leaving input variables untouched. In general, it is unwise to rename apart input variables since, from their own definition, this renaming would make the variable appear uninstantiated and potentially result in growth in the search space of the goals involved. This would not meet our objective of preserving the complexity of the program (and perhaps not even that of preserving its semantics). However, since a variable can be both input and output, a conflict between renaming and not-renaming requirements appears in such cases. For these cases, in which a variable cannot be "moved" after the cut guard operator, a real noisy cut is needed. This operator exists in AKL (!!), together with a sequentialization operator, the sequential conjunction (&). It is necessary that every noisy cut be sequentialized, to ensure that pruning will occur in the same context that it would in Prolog. Thus, every literal call to the pruning predicate has to be sequentialized to its right, and every other call to a predicate so sequentialized has in turn to be also sequentialized. For this reason noisy cut is not very efficient, and thus the translation tries to minimize its use.

At this point we can summarize the action that should be taken in every case to transform the pruning clauses of a Prolog program, based on the knowledge of input/output variables, that is, whether they are "tested" or not and further instantiated or not. Here we use "noisy" to mean the transformation that defaults to the AKL noisy cut, and "move" to refer to the renaming of variables as in example (3.1).4.

    Further Instantiated?   Tested?    Action
    yes                     yes        noisy
    yes                     no         move
    yes                     unknown    user
    no                      *          none
    unknown                 yes        user
    unknown                 no         move
    unknown                 unknown    user

Note that the knowledge of input/output modes in the Prolog program that is assumed in this transformation requires in general a global analysis of the program and can only be approximated, the translator having to make conservative approximations or warn the user ("user" cases above) when insufficient information is available. Note also that the "user" cases can be replaced by "noisy" cases if a non-interactive transformation is preferred. This subject will be discussed further in section 5, as well as the type of analysis required.
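The table can be read directly as a decision procedure. The sketch below encodes it as a small function so the cases are explicit; it is only an illustration of the selection logic, with assumed argument names, and it is not part of the actual translator.

    # The cut-transformation decision table of Section 3.2 as a function:
    # given whether a head variable is known to be further instantiated (output)
    # and/or tested (input), choose the action for the enclosing pruning clause.

    def cut_action(further_instantiated, tested):
        """Arguments take the values 'yes', 'no' or 'unknown'."""
        if further_instantiated == "no":
            return "none"     # not output: the cut is already quiet for this variable
        if further_instantiated == "yes" and tested == "yes":
            return "noisy"    # both input and output: fall back to the noisy cut (!!)
        if tested == "no":
            return "move"     # safe to rename apart and bind after the cut
        return "user"         # insufficient information: ask the user (or default to noisy)

    if __name__ == "__main__":
        for fi in ("yes", "no", "unknown"):
            for t in ("yes", "no", "unknown"):
                print(f"further instantiated={fi:8s} tested={t:8s} -> {cut_action(fi, t)}")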
3.3 Synchronization of side-effects

In general, the purpose of side-effect synchronization is to prevent a side-effect from being executed before other preceding (in the sense of the sequential operational semantics) side-effects or goals, in the cases when such adherence to the sequential order is desired. In our context, if side-effects are allowed within parallel AKL code and a behaviour of the program identical to that observable on a sequential Prolog implementation is to be preserved, then some type of synchronization code should be added to the program.

In general, in order to preserve the sequential observable behaviour, side-effects can only be executed when every subgoal to their left has been executed, i.e., when they are "leftmost" in the execution tree. However, a distinction can be made between soft and hard side-effects (a side-effect is regarded to be hard if it could affect subsequent execution); see [DeG87] and [MH89]. This distinction allows more parallelism. It is also convenient in this context to distinguish between side-effect built-ins and side-effect procedures, i.e., those procedures that have side-effects in their clauses or call other side-effect procedures. To achieve side-effect synchronization, various compile-time methods are possible:

• To use a chain of variables to pass a "leftmost token", taking advantage of the suspension properties of guards to suspend execution until arrival of the token [SCWY91].
• To use chains of variables as semaphores, with some compact primitives that test their value. In [MH89] a solution was proposed along such lines, and its implementation discussed.
• To use a sequentialization built-in to make the side-effect and the code surrounding it wait; this primitive would be in our case the sequentialization operator "&".

In the first solution, a pair of arguments is added to the heads of the relevant predicates for synchronization. Side-effects are encapsulated in clauses with a wait (:) guard containing an "ask" unification of the first argument with some known value (token), to be passed by the preceding side-effect upon its completion. Upon successful execution of the current side-effect the second argument is bound ("tell") to the known value and the token thus passed along. This quite elegant solution can be optimized in several cases. The second solution can be viewed as an efficient implementation of the first one, which allows further optimization [MH89]. The logical variables which are passed to procedures in the extra arguments behave as semaphores, and synchronization primitives operate on the semaphore values.

In the third solution, every soft side-effect is synchronized to its left with the sequentialization operator "&", and every hard one both to its left and right. This sequentialization is propagated upwards to the level needed to preserve correctness. This introduces some unnecessary restrictions on the parallelism available. However, if side-effects appear close to the top of the execution tree, this may be quite a good solution.

4 Stability and Achievement of Independent And-Parallelism

In order to achieve more parallelism than that available by the translations described so far, one might think of translating Prolog into AKL so that every subgoal could run in parallel, unrestricted. However, this can be very inefficient and would violate the premise of preserving the results and complexity of the computation expected by the user. On the other hand, and as mentioned before, parallel execution of independent goals, even if they are nondeterministic, is an efficient and desirable form of parallelism, and its addition motivated the development of the EAM, on which the AKL is based. Nevertheless, in AKL goals known to be independent have to be explicitly rewritten in order to make sure that they will be run in parallel. This is because of the rules that govern (nondeterminate) promotion, that is, the stability condition on nondeterminate promotion, which will prevent these goals from being promoted if they try to bind external variables for output.
Therefore, one important issue is the transformation that is needed to avoid suspension of independent goals. This is presented in section 4.1. Also, independence detection can and will be used to reduce stability checking, a potentially expensive operation. Clearly, an important issue in this context is how stability/goal independence is detected. In the framework of the &-Prolog system we have already developed technology and the associated tools for determining independence conditions for goals and partially evaluating many of those conditions at compile-time through program analysis. Conceptual models for independent and-parallel execution have been presented and their correctness and efficiency proved [HR89]; among these we focus on the and-parallelism models proposed in [HR90, HR89]. For different but related models the reader is referred to the references in those papers.

As mentioned before, in the translation process we propose to use algorithms and tools already developed in the context of &-Prolog. In this context, a series of algorithms used in the &-Prolog compiler for annotating Prolog programs have been implemented and described in [MH90]. These algorithms select goals for parallel execution and, using the sufficient rules proposed in [HR89], generate the conditions under which independence is achieved and therefore independent parallel execution ensured. The result is a transformation of a given Prolog clause into an &-Prolog clause containing parallel expressions which achieve such independent and-parallelism. The output of this analysis is made available for the translation process in the form of an annotated &-Prolog program [HG90], i.e. the program itself expresses which goals are independent and under which conditions. These conditions are expressed in the form of if-then-elses which have the intuitive meaning of "if the conditions hold then run in parallel, otherwise sequentially." The parallelism itself is made explicit by using the "&" operator to denote parallel conjunction instead of the standard sequential conjunction denoted by ",".6 Some new issues are involved in the interaction between the conditions of these parallel expressions and other goals run in parallel concurrently, as would be the case in AKL. These will be presented in section 4.2.

6 Note that in AKL these operators have just the opposite meaning!

4.1 The transformation proposed

At this point the &-Prolog conditionals are regarded as input to the translator. As such, if-then-elses are preprocessed in the form mentioned in the previous sections and the remaining issue is the treatment of the parallelization operator "&". In implementing this operator we will use the AKL property that allows local and unrestricted execution of guards, i.e., goals that are encapsulated in a guard can run in parallel with goals in other guards even if they are nondeterministic. The transformation that takes advantage of this will:

• put goals known to be independent in (different) guards, and
• extract output arguments from the guards, binding them in the body part of the clauses,

the last step being required so that the execution of these goals is not suspended because of their attempting to perform output unification. With the guard encapsulation we ensure that those predicates will be executed simultaneously and independently. The following example illustrates the transformation involved:

Example 5 (Encapsulation of independent subgoals)

    p(X):- ( ground(X), indep(Y,Z) -> q(X,Y) & r(X,Z)
           ; q(X,Y), r(X,Z) ),
           s(Y,Z).

is translated into

    p(X):-  pp(X,Y,Z), s(Y,Z).

    pp(X,Y,Z):- ground(X), indep(Y,Z), !, qp(X,Y), rp(X,Z).
    pp(X,Y,Z):- q(X,Y), r(X,Z).

    qp(X,Y):- q(X,Y1) ? Y=Y1.
    rp(X,Z):- r(X,Z1) ? Z=Z1.

When the condition is met, both subgoals will be tried by the local fork rule, then both guards will be completely and locally solved, and then, as the goals are independent on X (X is ground) and no output is produced in the guard, the nondeterminate promotion rule is always applicable and all solutions will be tried in the standard cartesian product way. Thus, parallel execution is ensured for those goals that are identified as independent. On the other hand, when the condition fails (the goals being dependent) they appear together in a body with an empty guard. This means that the guard will be immediately solved, the clause body promoted, and the subgoals tried simultaneously. Then the standard stability and promotion rules will apply. It should be noted that, as in the case of cut, and in addition to detecting goal independence, to be able to perform this transformation it is necessary to have inferred mode information regarding the predicate clauses. In section 5 the techniques used in order to infer this information will be reviewed.

4.2 Cohabitation of dependent and independent and-parallelism and stability checks

When evaluating the conditions of parallel expressions at run-time within a parallel framework such as that of the AKL, they may not evaluate to the same value as during a Prolog execution. This is what we have termed in another context the CGE-condition problem [GSCYH91],7 and may result in a loss (or increase) of parallelism. To deal with these issues, different levels of restrictions can be placed on the translation:

• Disallow any parallel execution except for those goals found to be independent.
• Allow parallel execution only for goals not binding variables that appear in the conditions or CGE.
• Allow parallel execution outside a CGE but sequentialize before and after the conditional parallel expressions.
• Allow unrestricted parallel execution, i.e. no sequentialization is to be done.

7 Note that some other problems mentioned in [GSCYH91] regarding the interaction between independent and dependent and-parallelism (in particular, the deterministic goal problem) are less of an issue in the proposed translation to AKL because independent goals execute in their own environments, thanks to the dynamic scoping of AKL guards. In any case, the AKL implementation is assumed to cope with all types of goal activations possible within the EAM.

The first solution can be implemented by translating every conjunction as a sequential AKL conjunction, except those joining independent goals. This will lead to a type of execution where only goals known to be independent are run in parallel, and which directly resembles that of &-Prolog [HG90]. The same search space as &-Prolog will be explored. Nondeterminate (and determinate) promotion will then be restricted to only independent and sequential goals. Thus, one very important advantage of this translation is that no checks on stability ever need to be done, as stability is ensured for sequential and independent execution. This is an important issue since stability checking is a potentially expensive operation (and very closely related to independence checking). Thus, in an ideal AKL implementation code translated as above, i.e. free of stability checks, should run with comparable efficiency to that of &-Prolog.
On the other hand, the transformation loses determinate dependent and-parallelism and its desirable effect of co-routining, which could be useful in reducing search space [SCWY90j. The second solution attempts to preserve the environment in which the CGE evaluates while allowing coroutining of goals that don't affect CG E conditions and goals. Although interesting, this appears quite difficult to implement in practice as it requires very sophisticated compile-time analysis and will probably incur in run-time overheads for checking of the conditions placed in the program. The third solution can be viewed as a relaxation of the first one to achieve some coroutining, or as an efficient (and feasible) way of partially implementing the second one. Goals before and after are allowed to execute in parallel using the Andorra Principle, but they are sequentialized just before and after a CG E. In this way CGEs evaluate in the same context as in Prolog and the same level of independent and-parallelism is achieved. This translation has the good characteristics regarding search space of the previous one. In addition, some reduction of search space due to coroutining will be achieved. However, stability checking, although reduced, cannot in general be eliminated altogether. The fourth solution will allow every goal to run in parallel. The full EAM and AKL operational semantics (including stability) has to be preserved. The execution of goals which are unconditionally independent or depend only on groundness checks (conditionals in the parallel expressions are composed of ground/1 and indep/2 checks, as in the example of section 4.1) will be the same as in &-Prolog as eager execution of other goals cannot affect ground or empty checks [GSCYH91j. However, independence' checks may fail where they wouldn't in Prolog (therefore losing this parallelism), but also succeed where they would fail in Prolog (therefore gaining this parallelism). Also, the number of parallel steps will always be the same or less as in Prolog (although different than in &-Prolog). This solution (as well as the first and second ones) appear as quite reasonable compromises and offer different trade- 766 offs. The current translation approach uses this fourth option, but the others should also be explored. 5 II Inferring modes - Abstract Interpretation We have mentioned in previous sections the need for inferring modes of clause variables (i.e. whether they are input or output variables) in Prolog programs. The main reason for this need is that we have to know which are the output variables in a clause in order to rename them apart and place corresponding bindings for them in the body part of the clause in both • the pruning clauses (as shown in section 3.2), and • the remade clauses for parallel execution (as shown in section 4.1 in example 5). Much work has been done in global analysis of logic programs to infer run-time properties, and, in particular, modes, mostly using the technique of abstract interpretation [CC77]. A more sophisticated sort of variable binding analysis (comprising groundness, aliasing, and freeness information) is instrumental in the process of inferring the independence conditions for literals in a body. While not strictly needed, such an analysis is extremely useful as it allows the reduction of the number of conditions and therefore the improvement of performance by reducing run-time checking [WHD88, MH91b] (these papers provide references to the important body of other work in this area). 
The standard global analyzer in the &-Prolog compiler, described in [MH91b], infers groundness and variable sharing/aliasing. Since variable freeness is also needed for the AKL translator, this analyzer has been extended to use the algorithm described in [MH91a] and infer variable freeness information. It turns out that freeness information is very useful for many reasons [MH91a]. In the translation process it is essential for determining input/output arguments. This we can show by simply expressing the information required for the table in section 3.2 in terms of information directly available from abstract interpretation. In order to do this, recall, as defined in section 3.2, that a program variable (or an argument) is output in a literal if the call to the corresponding predicate further instantiates this variable, and it is input in a literal if its state of instantiation is going to be checked in the execution of the call for that literal. With these definitions in mind the following table shows how the input or output character of variables can be decided in a good number of cases based on the information directly available from global analysis:

    Before    After      Output?   Input?
    ground    (ground)   no        *
    free      free       no        no
    free      semi       yes       no
    free      ground     yes       no
    semi1     semi1      no        *
    semi1     semi2      yes       ?
    semi1     ground     yes       ?

From the table we identify cases in which it is clear that the variable is known not to be an input variable, without any further analysis (i.e. when the variable is free). Furthermore, we realize that if a variable is known not to be an output variable then it does not need to be renamed apart, and it is then not necessary to determine whether it is an input variable or not ("*" cases). Reducing the cases in which it must be known whether a variable is an input variable is quite useful, since inferring whether a variable binding is needed or not requires additional analysis ("?" cases). This analysis seeks to decide if a variable is crucial in clause selection or checking. Note that the analysis has to be extended for every child procedure of the one being analyzed. Finally, we would like to also mention that combining mode/type analysis (such as the one used in [SCWY91] or [Jan90]) with the accurate tracking of sharing and freeness information of [MH91a] could be very helpful in this process (improving the ability to more accurately resolve different degrees of partial instantiation, such as the semi1/semi2 cases in the table above) and is part of our plans for future work.
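As a small worked example of how this information is used, assume that analysis of the clause below determines that the second argument is free before the call and bound after it (thus an output variable that is not tested), while the first argument is only tested by the comparison; according to the table of section 3.2 this is then a "move" case. The predicate and the Prolog-level rendering of the transformation are ours, for illustration only; the actual translator emits AKL code in which the goals before the cut form a quiet guard.

    % Original pruning clause: the second argument is an output variable
    % that is bound already in the head.
    abs_val(X, X) :- X >= 0, !.
    abs_val(X, Y) :- Y is -X.

    % "Move" transformation: the output variable is renamed apart in the
    % head and bound after the pruning operator, leaving the input X
    % (which is only tested by >=/2) untouched.
    abs_val_t(X, Y) :- X >= 0, !, Y = X.
    abs_val_t(X, Y) :- Y is -X.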
Timings are an average of ten consecutive executions done after a first one (not timed) and are given in milliseconds, rounded up to tens. (SICStus 1.8 and a sequential AKL 0.0 prototype system, made available by SICS, have been used.)

We briefly introduce the programming paradigms represented by each of the benchmarks used. qsort has been translated in two ways, one that "folds" pruning definitions, and another one that is able to "extend" the cut to all clauses; the latter showing an advantage w.r.t. the former. sort illustrates the advantage of being able to detect that some cuts are not noisy (as opposed to defaulting to noisy cut in every case). In fact, in this case the translated version is slightly faster than the hand-coded one. For money we have used three different versions. In the first version of the program the problem is solved through extensive backtracking. In the second one the ordering of goals is improved at the Prolog level. In the third version the Prolog builtins are translated into AKL specific ones. As in zebra, the difference with the "hand-written" version is in the use of the arithmetic predicates: addition is programmed in the hand-coded AKL version as illustrated by the sum/3 predicate,

    sum(X,Y,Z):- plus(X,Y,Z0)  | Z = Z0.
    sum(X,Y,Z):- minus(Z,Y,X0) | X = X0.
    sum(X,Y,Z):- minus(Z,X,Y0) | Y = Y0.

in which the coroutining effect provides a "constraint solving" behaviour. scanner is a program where AKL can take a large advantage from concurrent execution and the "determinate-first" principle, even without explicit control, and this is shown in the good performance of the translated program. On the other hand, in triangle and knights heavy use of special AKL features has been made, through hand-optimization.

                  Prolog       Prolog        AKL          AKL
                  compiled     interpret.    translated   "hand"
    qsort1             30          290           750          290
    qsort              30          290           290          290
    sort               20           50           870          910
    money1         66,590      520,190       294,370          530
    money          47,790      391,190       294,070          530
    moneyb         47,790      391,190       187,920          530
    zebra           8,550       43,740        10,380        1,980
    scanner     1,407,450    8,838,000           540          120
    triangle        3,140        7,260       152,230       11,020
    knights        79,960      855,049     1,165,020          480

              Prolog      Prolog       AKL translat.   AKL translat.
              compiled    interpret.   (encap.)        (direct)
    qsort         30          290           290             290
    matrix        50          400           610             690
    hanoi         10           50            70             310
    query         70          340           370           1,600
    maps          90          540           140           2,240

In matrix, hanoi, query, and maps (and also qsort), encapsulation of different programming paradigms has been tried. The results show that encapsulating independent goals which are deterministic provides no improvement, but performance improves when they are nondeterministic. Performance also improves in the case of goals which act in producer/consumer fashion (maps). These results suggest that AKL control similar to that of hand-coded versions can be imposed automatically for paradigms other than independence of goals. The automatic transformation achieves reasonably good results when compared to code specifically written for AKL, provided one takes into account that the starting point is a Prolog program with little specification of control, and it is being compared to an AKL program where control has been greatly optimized by the programmer. The examples where the largest differences show are those in which the control imposed by hand in the AKL program changes the complexity of the algorithm, generally through smart use of suspension (as in the sum/3 predicate), something that the transformation cannot yet do automatically.
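The effect of this kind of hand-imposed suspension can be approximated even in a sequential Prolog system that provides coroutining primitives; the sketch below uses the when/2 primitive (available, for example, in SICStus and SWI-Prolog) and predicate names of our own choosing to obtain a sum/3 that waits until any two of its arguments are known, instead of generating and testing.

    % sum(X, Y, Z): Z = X + Y, evaluated as soon as any two arguments are known.
    sum(X, Y, Z) :-
        when(( ground(X), ground(Y)
             ; ground(X), ground(Z)
             ; ground(Y), ground(Z) ),
             sum_now(X, Y, Z)).

    sum_now(X, Y, Z) :- number(X), number(Y), !, Z is X + Y.
    sum_now(X, Y, Z) :- number(X), number(Z), !, Y is Z - X.
    sum_now(X, Y, Z) :- number(Y), number(Z), X is Z - Y.

    % ?- sum(X, Y, 5), X = 2.   % the call suspends, then yields Y = 3.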
However, the results also show that it would obviously be desirable to extend the translation algorithms towards implementing some of the smart forms of control that can be provided by an AKL programmer. When comparing with Prolog, both the interpreted and compiled Prolog figures should be considered, as the AKL system prototype used is somewhere in between a compiler and an interpreter. The results show that a variable performance improvement can be obtained whenever determinism is significant in the problem (this is quite spectacular in scanner). Also, the encapsulation transformation can help efficiency in some cases. In any case the figures are of course preliminary and a more exhaustive study should clearly be done after improvements in the translation prototype and the AKL system, and also when an actual parallel AKL system is available.

7 Conclusions

We have presented an algorithm for translating Prolog into AKL which in addition achieves independent and-parallel execution of appropriate goals. We have pointed out a series of non-trivial problems associated with such a translation and proposed solutions for them based on existing global analysis technology. We have shown how to take advantage both of the AKL execution model (the Extended Andorra Model) and the independence analysis performed in the context of &-Prolog to produce a translation that allows the exploitation of all the forms of parallelism present in AKL (dependent-and, independent-and, and or-parallelism) while offering the user the familiar Prolog (or, in general, logic with minimal control) view (and debugging ease!). Most importantly, this is done while preserving or improving the user-perceived complexity of the program. The transformation is relevant even in the case of a sequential AKL implementation, since the reduction of stability checking which follows from knowledge of goal independence can already be of significant advantage, given the expected cost of stability tests. In the case of a parallel AKL implementation the transformation amounts to a form of automatic parallelization and search space reducing implementation for Prolog programs which exploits the EAM, and imposes a particular form of control on it. A sequential AKL implementation is already being developed at SICS with a first prototype already running. The translator itself is also being implemented and a preliminary version is already integrated with the &-Prolog system compilation tools. The combination has been tested and some sample programs executed successfully on AKL, and compared with their specific AKL counterparts. Further work is expected in the translator as better translation algorithms are developed to take more specific advantage of the AKL control facilities, in particular coroutining, in more accurately detecting input and output variables, in adapting the algorithms to possible evolutions of the AKL, in evaluating the performance of the translated programs with respect to Prolog, and in the formal proof of the correctness of the transformation and its preservation of user expected computation size, the latter point being supported already in part by the basic results on independent and-parallelism.

Acknowledgements

The authors would like to thank Seif Haridi, Sverker Jansson, Johan Montelius, and Mats Carlsson of SICS, and David H.D. Warren, Vitor Santos Costa, and Gopal Gupta of U. of Bristol for many useful discussions. Also thanks to SICS for making the prototype AKL implementation available for experimentation. This work has been performed in the context of the ESPRIT "PEPMA" project and has greatly benefited from discussions with other members of the partner institutions, most significantly from SICS, U. of Bristol, and U.P. Madrid.

References

[AK90] K.A.M. Ali and R. Karlsson. The Muse Or-Parallel Prolog Model and its Performance. In 1990 North American Conference on Logic Programming. MIT Press, October 1990.
[CC77] P. Cousot and R. Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In Conf. Rec. 4th ACM Symp. on Prin. of Programming Languages, pages 238-252, 1977.
[Cra90] J. Crammond. Scheduling and Variable Assignment in the Parallel Parlog Implementation. In 1990 North American Conference on Logic Programming. MIT Press, 1990.
[DeG87] D. DeGroot. Restricted AND-Parallelism and Side-Effects. In International Symposium on Logic Programming, pages 80-89. San Francisco, IEEE Computer Society, August 1987.
[Gre85] S. Gregory. Design, Application and Implementation of a Parallel Logic Programming Language. PhD thesis, Imperial College of Science and Technology, London, England, 1985.
[GSCYH91] G. Gupta, V. Santos-Costa, R. Yang, and M. Hermenegildo. IDIOM: A Model Integrating Dependent-, Independent-, and Or-parallelism. Technical report, University of Bristol, March 1991.
[Her90] M. Hermenegildo. Compile-time Analysis Requirements for the Extended Andorra Model. In Sverker Jansson, editor, Parallel Logic Programming Workshop, Box 1263, S-163 13 Spanga, SWEDEN, June 1990. SICS.
[HG90] M. Hermenegildo and K. Greene. &-Prolog and its Performance: Exploiting Independent And-Parallelism. In 1990 International Conference on Logic Programming, pages 253-268. MIT Press, June 1990.
[HJ90] S. Haridi and S. Janson. Kernel Andorra Prolog and its Computation Model. In Proceedings of the Seventh International Conference on Logic Programming. MIT Press, June 1990.
[HR89] M. Hermenegildo and F. Rossi. On the Correctness and Efficiency of Independent And-Parallelism in Logic Programs. In 1989 North American Conference on Logic Programming, pages 369-390. MIT Press, October 1989.
[HR90] M. Hermenegildo and F. Rossi. Non-Strict Independent And-Parallelism. In 1990 International Conference on Logic Programming, pages 237-252. MIT Press, June 1990.
[HS86] A. Houri and E. Shapiro. A sequential abstract machine for flat concurrent prolog. Technical Report CS86-20, Dept. of Computer Science, The Weizmann Institute of Science, Rehovot 76100, Israel, July 1986.
[IMT87] N. Ichiyoshi, T. Miyazaki, and K. Taki. A Distributed Implementation of Flat GHC on the Multi-PSI. In Fourth International Conference on Logic Programming, pages 257-275. University of Melbourne, MIT Press, May 1987.
[Jan90] G. Janssens. Deriving Run-time Properties of Logic Programs by means of Abstract Interpretation. PhD thesis, Dept. of Computer Science, Katholieke Universiteit Leuven, Belgium, March 1990.
[JH90] S. Janson and S. Haridi. Programming Paradigms of the Andorra Kernel Language. Technical Report, PEPMA Project, SICS, Box 1263, S-164 28 KISTA, Sweden, November 1990. Forthcoming.
[JH91] Sverker Janson and Seif Haridi. Programming Paradigms of the Andorra Kernel Language. In 1991 International Logic Programming Symposium, pages 167-183. MIT Press, 1991.
[Kal87] L. Kale. Parallel Execution of Logic Programs: the REDUCE-OR Process Model. In Fourth International Conference on Logic Programming, pages 616-632. Melbourne, Australia, May 1987.
[Lus88] E. Lusk et al. The Aurora Or-Parallel Prolog System. In International Conference on Fifth Generation Computer Systems. Tokyo, November 1988.
[MH89] K. Muthukumar and M. Hermenegildo. Efficient Methods for Supporting Side Effects in Independent And-parallelism and Their Backtracking Semantics. In 1989 International Conference on Logic Programming. MIT Press, June 1989.
[MH90] K. Muthukumar and M. Hermenegildo. The CDG, UDG, and MEL Methods for Automatic Compile-time Parallelization of Logic Programs for Independent And-parallelism. In 1990 International Conference on Logic Programming, pages 221-237. MIT Press, June 1990.
[MH91a] K. Muthukumar and M. Hermenegildo. Combined Determination of Sharing and Freeness of Program Variables Through Abstract Interpretation. In 1991 International Conference on Logic Programming. MIT Press, June 1991.
[MH91b] K. Muthukumar and M. Hermenegildo. Compile-time Derivation of Variable Dependency Using Abstract Interpretation. Journal of Logic Programming, 1991. To appear (also published as Technical Report FIM 59.1/IA/90, Computer Science Dept., Universidad Politecnica de Madrid, Spain, Aug 1990).
[Sar89] Vijay A. Saraswat. Concurrent Constraint Programming Languages. PhD thesis, School of Computer Science, Carnegie Mellon, Pittsburgh, 1989.
[SCWY90] V. Santos-Costa, D.H.D. Warren, and R. Yang. Andorra-I: A Parallel Prolog System that Transparently Exploits both And- and Or-parallelism. In Proceedings of the 3rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, April 1990.
[SCWY91] V. Santos-Costa, D.H.D. Warren, and R. Yang. The Andorra-I Preprocessor: Supporting Full Prolog on the Basic Andorra Model. In 1991 International Conference on Logic Programming, pages 443-456. MIT Press, June 1991.
[Ued87a] K. Ueda. Guarded Horn Clauses. In E.Y. Shapiro, editor, Concurrent Prolog: Collected Papers, pages 140-156. MIT Press, Cambridge MA, 1987.
[Ued87b] K. Ueda. Making Exhaustive Search Programs Deterministic. New Generation Computing, 5(1):29-44, 1987.
[War] D. H. D. Warren. The Andorra Principle. Presented at Gigalips workshop, 1987. Unpublished.
[War90] D. H. D. Warren. The Extended Andorra Model with Implicit Control. In Sverker Jansson, editor, Parallel Logic Programming Workshop, Box 1263, S-163 13 Spanga, SWEDEN, June 1990. SICS.
[WHD88] R. Warren, M. Hermenegildo, and S. Debray. On the Practicality of Global Flow Analysis of Logic Programs. In Fifth International Conference and Symposium on Logic Programming. MIT Press, August 1988.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

Recomputation based Implementations of And-Or Parallel Prolog

Gopal Gupta
Department of Computer Science
Box 30001, Dept. 3CU, New Mexico State University
Las Cruces, NM 88003-0001

... a & b), and assuming that a and b have 3 solutions each (to be executed in or-parallel form) and the query is ?- q, then the corresponding and-or tree would appear as shown in figure 1. One problem with such a traditional and-or tree is that bindings made by different alternatives of a are not visible to different alternatives of b, and vice-versa, and hence the correct environment has to be created before the continuation goal of the parallel conjunction can be executed. Creation of the proper environments requires a global operation, for example, Binding Array loading in AO-WAM [GJ89, G91a], the complex dereferencing scheme of PEPSys [BK88], or the "global forking" operation of the Extended Andorra Model [W90]. To eliminate this possible source of overhead in our model, we extend the traditional and-or tree so that the various or-parallel environments that simultaneously exist are always separate. The extension essentially uses the idea of recomputing independent goals of a parallel conjunction of &-Prolog [HG90] (and Prolog!). Thus, for every alternative of a, the goal b is computed in its entirety. Each separate combination of a and b is represented by what we term a composition node (c-node for brevity). Thus, each composition node in the tree corresponds to a different solution for the parallel conjunction, i.e., a different "continuation". Thus the extended tree, called the Composition-tree (C-tree for brevity), for the above query might appear as shown in figure 2: for each alternative of the and-parallel goal a, goal b is entirely recomputed (in fact, the tree could contain up to 9 c-nodes, one for each combination of solutions of a and b). To represent the fact that a parallel conjunction can have multiple solutions we add a branch point (choice point) before the different composition nodes. Note that c-nodes and branch points serve purposes very similar to the Parcall frames and markers of the RAP-WAM [H86, HG90].
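For concreteness, the following small Prolog fragment (the goal definitions are ours, chosen only to mirror the three-solution example above) shows the set of answers that such a C-tree has to account for: since b is recomputed for every alternative of a, the parallel conjunction yields exactly the cartesian product of solutions produced by the sequential conjunction below.

    % Three alternatives each, as in the and-or tree of figure 1.
    a(a1). a(a2). a(a3).
    b(b1). b(b2). b(b3).

    % Sequential Prolog semantics that the recomputation-based C-tree
    % preserves: b is re-run for every solution of a, giving 9 answers.
    conj(X, Y) :- a(X), b(Y).

    % ?- findall(X-Y, conj(X, Y), Pairs), length(Pairs, N).
    % N = 9.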
The C-tree can represent or- and independent and-parallelism quite naturally: execution of goals in a c-node gives rise to independent and-parallelism, while parallel execution of untried alternatives gives rise to or-parallelism.

Figure 1: And-Or Tree (key: choice points; branches a1, a2, a3 and b1, b2, b3)

Figure 2: Composition Tree (key: choice point, share node, composition node; query q with parallel conjunction (a & b))

Notice the topological similarity of the C-tree with the purely or-parallel tree shown in figure 3 for the program above. Essentially, branches that are "shared" in the purely or-parallel tree (i.e. that are "common", even though different binding environments may still have to be maintained; we will refer to such branches and regions for simplicity simply as "shared") are also shared in the C-tree. This sharing is represented by means of a share-node, which has a pointer to the shared branch and a pointer to the composition node where that branch is needed (figure 2). Due to sharing, the subtrees of some independent and-parallel goals may be spread out across different composition nodes. Thus, the subtree of goal a is spread out over c-nodes C1, C2 and C3 in the C-tree of figure 2, the total amount of program-related work being essentially maintained.†

† In fact, a graphical tool capable of representing this tree has shown itself to be quite useful for implementors and users of independent and- and or-parallel systems [CG91].

Figure 3: Or-Parallel Tree (key: "-" indicates the end of a's branch)

4.1 And-Or Parallelism & Teams of Processors

We will present some of the implementation issues from the point of view of extending an or-parallel system to support independent and-parallelism. When a purely or-parallel model is extended to exploit independent and-parallelism the following problem arises: at the end of independent and-parallel computation, all participating processors should see all the bindings created by each other. However, this is completely opposite to what is needed for or-parallelism, where processors working in or-parallel should not see the (conditional) bindings created by each other. Thus, the requirements of or-parallelism and independent and-parallelism seem antithetical to each other. The solutions that have been proposed range from updating the environment at the time independent and-parallel computations are combined [RK89, GJ89] to having a complex dereferencing scheme [BK88]. All of these operations have their cost. We contend that this cost can be eliminated by organising the processors into teams. Independent and-parallelism is exploited among processors within a team while or-parallelism is exploited among teams. Thus a processor within a team would behave like a processor in a purely and-parallel system, while all the processors in a given team would collectively behave like a processor in a purely or-parallel system. This entails that all processors within each team share the data structures that are used to maintain the separate or-parallel environments. For example, if binding arrays are being used to represent multiple or-parallel environments, then only one binding array should exist per team, so that the whole environment is visible to each member processor of the team. If copying is used, then processors in the team share the copy. Note that in the limit case there will be only one processor per team. Also note that despite the team arrangement a processor is free to migrate to another team as long as it is not the only one left in the team. Although a fixed assignment of processors to teams is possible, a flexible scheme appears preferable. This will be discussed in more detail in section 4.3. The concept of teams of processors has been successfully used in the Andorra-I system [SW91], which extends an or-parallel system to accommodate dependent and-parallelism.

4.2. C-tree & And-Or Parallelism

The concept of organising processors into teams also meshes very well with C-trees. A team can work on a c-node in the C-tree: each of its member processors works on one of the independent and-parallel goals in that c-node. We illustrate this by means of an example. Consider the query corresponding to the and-or tree of figure 1. Suppose we have 6 processors P1, P2, ..., P6, grouped into 3 teams of 2 processors each. Let us suppose P1 and P2 are in team 1, P3 and P4 in team 2, and P5 and P6 in team 3. We illustrate how the C-tree shown in figure 2 would be created. Execution commences by processor P1 of team 1 picking up the query q and executing it. Execution continues like normal sequential execution until the parallel conjunction is encountered, at which point a choice point node is created to keep track of the information about the different solutions that the parallel conjunction might generate. A c-node is then created (node C1 in figure 2). The parallel conjunction consists of two and-parallel goals a and b, of which a is picked up by processor P1, while b is made available for and-parallel execution. The goal b is subsequently picked up by processor P2, teammate of processor P1. Processors P1 and P2 execute the parallel conjunction in and-parallel, producing solutions a1 and b1 respectively. In the process they leave choice points behind. Since we allow or-parallelism below and-parallel goals, these untried alternatives can be processed in or-parallel by other teams. Thus the second team, consisting of P3 and P4, picks up the untried alternative corresponding to a2, and the third team, consisting of P5 and P6, picks up the untried alternative corresponding to a3.
Both these teams create a new c-node, and restart the execution of and-parallel goal b (the goal to the right of goal a): the first processor in each team (P3 and P5, respectively) executes the alternative for a, while the second processor in each team (P4 and P6, respectively) executes the restarted goal b. Thus, there are 3 copies of b executing, one for each alternative of a. Note that the nodes in the subtree of a, between c-node C1 and the choice points from where untried alternatives were picked, are "shared" among different teams (in the same sense as the nodes above the parallel conjunction are: different binding environments still have to be maintained). Since there are only three teams, the untried alternatives of b have to be executed by backtracking. In the C-tree, backtracking always takes place from the right to mimic Prolog's behaviour: goals to the right are completely explored before a processor can backtrack inside a goal to the left. Thus, if we had only one team with 2 processors, then only one composition node would actually need to be created, and all solutions would be found via backtracking, exactly as in &-Prolog, where only one copy of the Parcall frame exists [H86, HG90]. On the other hand, if we had 5 teams of 2 processors each, then the C-tree could appear as shown in fig. 4. In figure 4, the 2 extra teams steal the untried alternatives of goal b in c-node C3. This results in 2 new c-nodes being created, C4 and C5, and the subtree of goal b in c-node C3 being spread across c-nodes C3, C4 and C5. The topologically equivalent purely or-parallel tree of this C-tree is still the one shown in figure 3. The most important point to note is that new c-nodes get created only if there are resources to execute that c-node in parallel. Thus, the number of c-nodes in a C-tree can vary depending on the availability of processors.

Figure 4: C-tree for 5 Teams (the composition nodes C1, C2 and C3 are created one each by the three teams; the untried alternatives of and-parallel goal b in composition node C3 are picked up by the other teams; the equivalent purely or-parallel tree is the one shown in figure 3)

It might appear that intelligent backtracking, that accompanies independent and-parallelism in &-Prolog, is absent in our abstract and-or parallel C-tree model. This is because if b were to completely fail, then this failure will be replicated in each of the three copies of b. We can incorporate intelligent backtracking by stipulating that an untried alternative be stolen from a choice point, which falls in the scope of a parallel conjunction, only after at least one solution has been found for each goal in that parallel conjunction. Thus, c-nodes C2, C3, C4 and C5 (fig. 4) will be created only after the first team (consisting of P1 and P2) succeeds in finding solutions a1 and b1 respectively. In this situation if b were to fail, then the c-node C1 will fail, resulting in the failure of the whole parallel conjunction.

4.3. Processor Scheduling

Since our abstract model of C-trees is dependent upon the number of processors available, some of the processor scheduling issues can be determined at an abstract level, without going into the details of a concrete realization of the C-trees. As mentioned earlier, teams of processors are used to carry out or-parallel work while individual processors within a team perform and-parallel work. Since and-parallel work is shared within a team, a processor can in principle steal and-parallel work only from members of its own team. Or-parallel work is shared at the level of teams, thus only an idle team can steal an untried alternative from a choice point. An idle processor will first look for and-parallel work in its own team. If no and-parallel work is found, it can decide to migrate to another team where there is work, provided it is not the last remaining processor in that team. If no such team exists it can start a new team of its own, perhaps with idle processors of other teams, and the new team can steal or-parallel work. One has to carefully balance the number of teams and the number of processors in each team, to fully exploit all the and- and or-parallelism available in a given Prolog program.†

† Some of the 'flexible scheduling' techniques that have been developed for the Andorra-I system [D91] can be directly adapted for optimal distribution of or- and and-parallel work.

5. Environment Representation

So far we have described and-or parallel execution with recomputation at an abstract level. We have not addressed the crucial problem of environment representation in the C-tree. In this section we discuss how to extend the Binding Arrays (BA) method [W84, W87] and the Stack-copying [AK90] methods to solve this problem. These extensions enable a team of processors to share a single BA without wasting too much space.

5.1 Sharing vs Non-Sharing

In an earlier paper [GJ90] we argued that environment representation schemes that have constant-time task creation and constant-time access to variables, but non-constant time task-switching, are superior to those
In case (i) an or-parallel system that does not share the execution tree, such as Muse, will have to save its current execution stack in a scratch memory-area since switching to a new node means that the current stack would be overwritten due to copying of the branches corresponding to the new node. Even if modern sophisticated multiprocessor Operating Systems may allow some memory-saving optimizations, a substantial memory overhead may still be presentt. The same holds for case (ii), where a modern OS may manage to avoid busy-waiting, but at the cost of extra memory. The essential conclusion is that for some applications (those that require processors to synchronize often riue to presence of a large number of side-effects and cuts) environment representation schemes which share the or-tree are better, and for some other applications (those that require processors to synchronize less often) schemes which maintain an independent ortree per processor are better. With this observation in mind we have extended both types of environment t Experimental results show that processors may voluntarily suspend as much as 10 to 100s of times for large sized programs [S191j. representation schemes to accommodate independent and-parallelism with recomputation of goals. We first describe an extension of the Binding Arrays scheme, and then an extension of the stack-copying technique. Due to space limitations the essence of both approaches will be presented rather than specifying them in detail as full models, which is left as future work. 5.2. Environment Representation using BAs Recall that in the binding-array method [W84, W87] an offset-counter is maintained for each branch of the or-parallel tree for assigning offsets to conditional variables (CVs)t that arise in that branch. The 2 main properties of the BA method for or-parallelism are the following: (i) The offset of a conditional variable is fixed for its entire life. (ii) The offsets of two consecutive conditional variables in an or-branch are also consecutive. The implication of these two properties is that conditional variables get allocated space consecutively in the binding array of a given processor, resulting in optimum space usage in the BA. This is important because a large number of conditional variables might need to be created at runtimei. a BA \- ~~\f 1!0> c1 Fig (i): Part of a C-tree Figure (ii): Optimal Space Allocation in the BA Figure 5: BAs and Independent And-Parallelism In the presence of independent and-parallel goals, each of which has multiple solutions, maintaining contiguity in the BA can be a problem, especially if processors are allowed (via backtracking or or-parallelism) to search for these multiple solutions. Consider a goal with a parallel conjunction: a, (true => b 8& c), d. A part of its C-tree is shown in figure 5(i) (the figure t Conditional variables are variables that receive different bindings in different environments [GJ90j. i For instance, in Aurora [LW90j about 1Mb of space is allo- cated for each BA. 777 also shows the number of conditional variables that are created in different parts of the tree). If band e are executed in independent and-parallel by two different processors PI and P2, then assuming that both have private binding arrays of their own, all the conditional variables created in branch b-b1 would be allocated space in BA of PI and those created in branch of ee1 would be allocated space in BA of P2. 
Likewise conditional bindings created in b would be recorded in BA of PI and those in e would be recorded in BA of P2. Before PI or P2 can continue with d after finding solutions b1 and e1, their binding arrays will have to be merged somehow. In the AO-WAM [GJ89, G91a] the approach taken was that one of PI or P2 would execute d after updating its Binding Array with conditional bindings made in the other branch (known as the the BA loading operation). The problem with the BA loading operation is that it acts as a sequential bottleneck which can delay the execution of d, and reduce speedups. To get rid of the BA loading overhead we can have a common binding array for PI and P2, so that once PI and P2 finish execution of band e, one of them immediately begins execution of d since all conditional bindings needed would already be there in the common BA. This is consistent with our discussion in section 4.1 about having teams of processors where all processors in a team would share a common binding array. However, if processors in a team share a binding array, then backtracking can cause inefficient usage of space, because it can create large unused holes in the BA. This is because processors in a team, that are working on different independent and-parallel branches, will allocate offsets in the binding array concurrently. The exact number of offsets needed by each branch cannot be allocated in advance in the binding array because the number of conditional variables that will arise in a branch cannot be determined a priori. Thus, the offsets of independent and-branches will overlap: for example, the offsets of kl CVs in branch bl will be intermingled with those of k2 CVs in branch cl. Due to overlapping offsets, recovery of these. offsets, when a processor backtracks, requires tremendous book-keeping. Alternatively, if no book-keeping is done, it leads to large amount of wasted space that becomes unusable for subsequent offsets (see [GS92, G91, G91a] for more details). 5.2.1. Paged Binding Array To solve the above problem we divide the binding array into fixed sized segments. Each conditional variable is bound to a pair consisting of a segment number and an offset within the segment. An auxiliary array keeps track of the mapping between the segment number and its starting location in the binding array. Dereferencing CVs now involves double indirection: given a conditional variable bound to (i, 0), the starting address of its segment in the BA is first found from location i of the auxiliary array, and then the value at offset 0 from that address is accessed. A set of CV s that have been allocated space in the same logical segment (i.e. CV s which have common i) can reside in any physical page in the BA, as long as the starting address of that physical page is recorded in the ith slot in the auxiliary array. Note the similarity of this scheme to memory management using paging in Operating Systems, hence the name Paged Binding Array (PBA)t. Thus a segment is identical to a page and the auxiliary array is essentially the same as a page table. The auxiliary and the binding array are common to all the processors in a team. From now on we will refer to the BA as the Paged Binding Array (PBA), the auxiliary array as the Page Table (PT), and our model of and-or parallel execution as the PBA modelt. 
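A minimal executable model of this double indirection is sketched below in Prolog; the real mechanism of course operates on machine-level arrays shared by the team, and the term representation, predicate names and example page table used here are ours, for illustration only.

    :- use_module(library(lists)).

    % A conditional variable is bound to cv(I, O): page number I, offset O.
    % The page table maps logical page numbers to physical pages, and each
    % physical page is modelled here as a list of value cells.

    % deref(+CV, +PageTable, +PhysicalPages, -Value)
    deref(cv(I, O), PageTable, Pages, Value) :-
        memberchk(I-PageId, PageTable),      % first indirection: slot I of the page table
        memberchk(PageId-Cells, Pages),      % locate the physical page
        nth0(O, Cells, Value).               % second indirection: offset O within the page

    % Example:
    % ?- deref(cv(3, 1), [3-p7], [p7-[foo, bar, baz]], V).
    % V = bar.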
Every time execution of an and-parallel goal in a parallel conj unction is started by a processor, or the current page in the PBA being used by that processor for allocating CVs becomes full, a page-marker node containing a unique integer id i is pushed onto the trail-stack. The unique integer id is obtained from a shared counter (called a pt_eounter). There is one such counter per team. A new page is requested from the PBA, and the starting address of the new page is recorded in the ith location of the Page Table. i is referred to as the page number of the new page. Each processor in a team maintains an offset-counter, which is used to assign offsets to CV s within a page. When a new page is obtained by a processor, the offset-counter is reset. Conditional variables are bound to the pair , where i is the page number, and 0 is the value of the offset-counter, which indicates the offset at which the value of the CV would be recorded in the page. Every time a conditional variable is bound to such a pair, the offset counter 0 is incremented. If the value of 0 becomes greater than K, the fixed page size, a new page is requested and new page-marker node is pushed. t Thanks to David H. D. Warren for pointing out this similarity. t A paged binding array has also been used in the ElipSys system of ECRC [VX91], but for entirely different reasons. In ElipSys, when a choice point is reached the BA is replicated for each new branch. To reduce the overhead of replication, the BA is paged. Pages of the BA are copied in the children branches on demand, by using a "copy-on-write" strategy. In ElipSys, unlike our model, paging is not necessitated by independent andparallelism. 778 A list of free pages in the PBA is maintained separately (as a linked list). When a new page is requested, the page at the head of the list is returned. When a page is freed by a processor, it is inserted in the freelist. The free-list is kept ordered so that pages higher up in the PBA occur before those that are lower down. This way it is always guaranteed that space at the top of the PBA would be used first, resulting in optimum space usage of space in the PBA. While selecting or-parallel work, if the untried alternative that is selected is not in the scope of any parallel conjunction, then task-switching is more or less like in purely or-parallel system (such as Aurora), modulo allocation/ deallocation of pages in the PBA. If, however, the untried alternative that is selected is in the and-parallel goal g of a parallel conjunction, then the team updates its PBA with all the conditional bindings created in the branches corresponding to goals which are to the left of g. Conditional bindings created in g above the choice point are also installed. Goals to the right of g are restarted and made available to other member processors in the team for and-parallel execution. Notice that if a C-tree is folded into an or-parallel tree according to the relationship shown in figures 2 and 3, then the behaviour of (and the number of conditional bindings installed/ deinstalled during) task switching would closely follow that of a purely or-parallel system such as Aurora, if the same scheduling order is followed. Note that the paged binding array technique is a generalization of the environment representation technique of AO-WAM [GJS9, G91a], hence some of the optimizations [GJ90a] developed for the AO-WAM, to reduce the number of conditional bindings to installed/deinstalled during task-switching, will also apply to the PBA model. 
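The bookkeeping just described can be summarised by the following sketch, again a high-level model with names of our own choosing, in which the team's shared pt_counter is folded into the allocator state rather than being a global counter; the seniority rule on the page/offset pairs, discussed further on, is included as well.

    % Allocator state: alloc(CurrentPage, NextOffset, PageSize, NextFreePageNo).
    % alloc_cv(+State0, -CVRef, -State1)
    alloc_cv(alloc(Page, Off, K, Next), cv(Page, Off), alloc(Page, Off1, K, Next)) :-
        Off < K, !,
        Off1 is Off + 1.
    alloc_cv(alloc(_, _, K, Next), cv(Next, 0), alloc(Next, 1, K, Next1)) :-
        % current page is full: a page-marker node would be pushed on the trail
        % and a fresh page requested; the real id comes from the shared pt_counter.
        Next1 is Next + 1.

    % Seniority: a conditional variable is older if its page number is smaller,
    % or, within the same page, if its offset is smaller.
    older(cv(I1, O1), cv(I2, O2)) :-
        ( I1 < I2 ; I1 =:= I2, O1 < O2 ).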
Lastly, seniority of conditional variables, which needs to be known so that "older" variables never point to "younger ones" , can be easily determined with the help of the pair. Older variables will have a smaller value of i; and if i is the same, then a smaller value of o. More details on Paged Binding Arrays can be found in [GS92, G91]. 5.3. The Stack Copying Approach An alternative approach to represent multiple environments in the C-tree is to use explicit stack-copying. Rather than sharing parts of the tree, the shared branches can be explicitly copied, using techniques similar to those employed by the MUSE system [AK90]. To briefly summarize the MUSE approach, whenever a processor PI wants to share work with another processor P2- it selects an untried alternative from one of the choice points in P2's stack. It then copies the entire stack of P2, backtracks up to that choice point to undo all the conditional bindings made below that choice point, and then continues with the execution of the untried alternative. In this approach, provided there is a mechanism for copying stacks, the only cells that need to be shared during execution are those corresponding to the choice points. Execution is otherwise completely independent (modulo side-effect synchronization) in each branch and identical to sequential execution. If we consider the presence of and-parallelism in addition to or-parallelism, then, depending on the actual types of parallelism appearing in the program and the nesting relation between them, a number of relevant cases can be distinguished. The simplest two cases are" of course those where the execution is purely or-parallel or purely and-parallel. Trivially, in these situations standard MUSE and &-Prolog execution respectively applies, modulo the memory management issues, which will be dealt with in section 5.3.2. Of the cases when both and- and or-parallelism are present in the execution, the simpler one represents executions where and-parallelism appears "under" orparallelism but not conversely (i.e. no or-parallelism appears below c-nodes). In this case, and again modulo memory management issues, or-parallel execution can still continue as in Muse while and-parallel execution can continue like &-Prolog (or in any other local way. The only or-parallel branches which can be picked up appear then above any and-parallel node in the tree. The process of picking up such branches would be identical to that described above for MUSE. In the presence of or-parallelism under andparallelism the situation becomes slightly more complicated. In that case, an important issue is carefully deciding which portions of the stacks to copy. When an untried alternative is picked from a choice-point, the portions that are copied are precisely those that have been labelled as "shared" in the C-tree. Note that these will be precisely those branches that will also be copied in an equivalent (purely or-parallel) MUSE execution. In addition, precisely those branches will be recomputed that are also recomputed in an equivalent (purely and-parallel) &-Prolog execution. Consider the case when a processor selects an untried alternative from a choice point created during execution of a goal gj in the body of a goal which occurs after a parallel conjunction where there has been andparallelism above the the selected alternative, but all the forks are finished. 
Then not only will it have to copy 779 all the stack segments in the branch from the root to the parallel conjunction, but also the portions of stacks corresponding to all the forks inside the parallel conjunction and those of the goals between the end of the parallel conjunction and 9j. All these segments have in principle to be copied because the untried alternative may have access to variables in all of them and may modify such variables. On the other hand, if a processor selects an untried alternative from a choice point created during execution of a goal 9i inside a parallel conjunction, then it will have to copy all the stack segments in the branch from the root to the parallel conjunction, and it will also have to copy the stack segments corresponding to the goals 91 ... 9i-1 (i.e. goals to the left). The stack segments up to the parallel conjunction need to be copied because each different alternative within the 9iS might produce a different binding for a variable, X, defined in an ancestor goal of the parallel conjunction. The stack segments corresponding to goals 91 through 9i-1 have to be copied because the different alternatives for the goals following the parallel conj unction might bind a variable defined in one of the goals 91 .. . 9i-1 differently. 5.3.1. Execution with Stack Copying We now illustrate by means of a simple example how or-parallelism can be exploited in non deterministic and-parallel goals through stack copying. Consider the tree shown in figure I that is generated as a result.of executing a query q containing the parallel conjunction (true => a(X) &' b(Y». For the purpose of illustration we assume that there is an unbounded number of processors, PI ... Pn. Execution begins with processor PI executing the top level query q. When it encounters the parallel conjunction, it picks the subgoal a for execution, leaving b for some other processor. Let's assume that Processor P2 picks up goal b for execution (figure 6.(i)). As execution continues PI finds solution ai for a, generating 2 choice points along the way. Likewise, P2 finds solution bi for b. Since we also allow for full or-parallelism within and-parallel goals, a processor can steal the untried alternative in the choice point created during execution of a by PI. Let us assume that processor P3 steals this alternative, and sets itself up for executing it. To do so it copies the stack of processor PI up to the choice point (the copied part of the stack is shown by the dotted line; see index at the bottom of figure 6), simulates failure to remove conditional bindings made below the choice point, and restarts the goals to its right (i.e. the goal b). Processor P4 picks up the restarted goal band finds a solution bi for it. In the meantime, P3 finds the solution a2 for a (see figure 6.(ii)). Note that before P3 can commence with the execution of the untried alternative and P4 can execute the restarted goal b, they have to make sure that any conditional bindings made by P2 while executing b have also been removed. This is done by P3 (or P4) getting a copy of the trail stack of P2 and resetting all the variables that appear in it. Like processor P3, processor P5 steals the untried alternative from the second choice point for a, copies the stack from PI and restarts b, which is picked up by processor P6. As in MUSE, the actual choice point frame is shared to prevent the untried alternative in the second choice point from being executed twice (once through PI and once through P3). 
Eventually, P5 finds the solution a3 for a and P6 finds the solution b1 for b.

[Figure 6: Parallel Execution with Stack Copying. Panels (i) through (vi) show processors P1 through P9 working on copies of the parallel conjunction (a & b); the legend distinguishes branches executed locally, copied branches, and embryonic branches (untried alternatives).]

Note that now three copies of b are being executed, one for each solution of a. The process of finding the solution b1 for b leaves a choice point behind. The untried alternative in this choice point can be picked up for execution by another processor. This is indeed what is done by processors P7, P8 and P9 for each copy of b that is executing. These processors copy the stack of P2, P4 and P6, respectively, up to the choice point. The stack segments corresponding to goal a are also copied (figures 6.(iv), 6.(v), 6.(vi)) from processors P1, P3 and P5, respectively. The processors P7, P8 and P9 then proceed to find the solution b2 for b. Execution of the alternative corresponding to the solution b2 in the three copies of b produces another choice point. The untried alternatives from these choice points can be picked up by other idle teams in a manner similar to that for the previous alternative of b (not shown in figure 6). Note that if there were no processors available to steal the alternative (corresponding to solution b3) from b, then this solution would have been found by processors P7, P8 and P9 (in the respective copies of b that they are executing) through backtracking, as in &-Prolog. The same would apply if no processors were available to steal the alternative from b corresponding to solution b2.

5.3.2. Managing the Address Space

While copying stack segments we have to make sure that pointers in copied portions do not need relocation. In Muse this is ensured by having physically separate but logically identical memory spaces for each of the processors [AK90]. In the presence of and-parallelism and teams of processors a more sophisticated approach has to be taken. All processors in a team share the same logical address space. If there are n processors in the team, the address space is divided up into m memory segments (m ≥ n). The memory segments are numbered from 1 to m. Each processor allocates its heap, local stacks, trail, etc. in one of the segments (this also implies that the maximum number of processors that a team can have is m). Each team has its own independent logical address space, identical to the address space of all other teams. Also, each team has an identical number of segments. Processors are allowed to switch teams so long as there is a memory segment available for them to allocate their stacks in the address space of the other team.

Consider the scenario where a choice point, which is not in the scope of any parallel conjunction, is picked up by a team Tq from the execution tree of another team Tp. Let x be the memory segment number in which this choice point lies. The root of the Prolog execution tree must also lie in memory segment x, since the stacks of a processor cannot extend into another memory segment in the address space.
Tq will copy the stack from the xth memory segment of Tp into its own xth memory segment. Since the logical address space of each team is identical and is divided into identical segments, no pointer relocation would be needed. Failure is then simulated and the execution of the un- tried alternative of the stolen choice point begun. In fact, the copying of stacks can be done incrementally as in MUSE [AK90] (other optimizations in MUSE to save copying should apply equally well to our model, and are left as future work). Now consider the more interesting scenario where a choice point, which lies within the scope of a parallel conjunction, is picked up by a processor in a team Tq from another team Tp. Let this parallel conjunction be the CGE (true =} gl& ... &gn) and let gi be the goal in the parallel conjunction whose sub-tree contains the stolen choice point. Tq needs to copy the stack segments corresponding to the computation from the root up to the parallel conjunction and the stack segments corresponding to the goals gl through gi. Let us assume these stack segments lie in memory segments of team Tp and are numbered Xl, ... , xk. They will be' copied into the memory segments numbered Xl, ... , Xk of team Tq. Again, this copying can be incremental. Failure would then be simulated on gi. We also need to remove the conditional bindings made during the execution of the goal gi+1 ... gn by team Tp. Let Xk+1 ... Xl be the memory segments where gi+1 .. , gn are executing in team Tp. We copy the trail stacks of these segments and reinitialize (i.e. mark unbound) all variables that appear in them. The copied trail stacks can then be discarded. Once removal of conditional bindings is done the execution of the untried alternative of the stolen choice point is begun. The execution of the goals gi+1 ... gn is restarted and these can be executed by other processors which are members of the team. Note that the copied stack segments occupy the same memory segments as the original stack segments. The restarted goals can however be executed in any of the memory segments. An elaborate description of the stack-copying approach, with techniques for supporting side-effects, various optimizations that can be performed to improve efficiency, and implementation details are left as future work. Preliminary details can be found in [GH91]. 6. Conclusions & Comparison with Other Work In this paper, we presented a high-level approach capable of exploiting both independent and-parallelism and or-parallelism in an efficient way. In order to find all solutions to a conjunction of non-deterministic andparallel goals in our approach some goals are explicitly recomputed as in Prolog. This is unlike in other and-or parallel systems where such goals are shared. This allows our scheme to incorporate side-effects and to support Prolog as the user language more easily and simplifies other implementation issues. 781 In the context of this approach we also presented two techniques for environment representation in the presence of independent and-parallelism which are extensions of highly successful environment representation techniques for supporting or-parallelism. The first technique, based on Binding Arrays [W84, W87], and termed Paged Binding Array technique, yields a system which can be viewed as a direct combination of the Aurora [LW90] and &-Prolog [HG90] systems. The second technique based on stack copying [AK90] yields a system which can be viewed as a direct combination of the MUSE [AK90] and &-Prolog systems. 
If an input program has only or-parallelism, then the system based on Paged Binding Arrays (resp. Stack copying) will behave exactly like Aurora (resp. Muse). If a program has only independent and-parallelism the two models will behave exactly like &-Prolog (except that conditional bindings would be allocated in the binding array in the system based on Paged Binding Arrays). Our approach can also support the extralogical features of Prolog (such as cuts and side-effects) transparently [GS91], something which doesn't appear to be possible in other independent-and/or parallel models [BK88, GJ89, RK89]. Control in the models is quite simple, due to recomputation of independent goals. Memory management is also relatively simpler. We firmly believe that the approach, in its two versions of Paged Binding Array and stack copying can be implemented very efficiently, and indeed their implementation is scheduled to begin shortly. The implementation techniques described in this paper can be used for even those models that have dependent and-parallelism, such as Prometheus [SK92], and IDIOM (with recomputation) [GY91]. They can also be extended to implement the Extended Andorra Model [W90]. Acknowledgements Thanks to Vitor Santos Costa for his numerous comments on this paper, and to Raed Sindaha and Tony Beaumont for answering many questions about Aurora and its schedulers. The research presented in this paper has benefited from discussions with Kish Shen, Khayri Ali, Roland Carlsson, and David H.D. Warren, all of whom we would like to thank. This research was supported by U.K. Science and Engineering Research Council grant GR/F 27420, ESPRIT project PEPMA and CICYT project TIC90-1105-CE. [AK90] References K. Ali, R. Karlsson, "The Muse Or-parallel Prolog Model and its Performance," In Proceedings of the North American Conference on Logic Programming '90, MIT Press. [AK91] K. Ali, R. Karlsson, "Full Prolog and Scheduling Or-parallelism in Muse," To appear in International Journal of Parallel Programming, 1991. [BK88] Uri Baron, et. al., "The Parallel ECRC Prolog System PEPSys: An Overview and Evaluation Results," In Proceedings of FGCS '88, Tokyo, pp. 841-850. [CA88] W. F. Clocksin and H. Alshawi, "A Method for Efficiently Executing Horn Clause Programs Using Multiple Processors," In New Generation Computing, 5(1988), 361-376. [CC89] S-E. Chang and Y.P. Chiang, "Restricted And-Parallelism Model with Side Effects," Proceedings of North American Conference on Logic Programming, 1989, MIT Press, pp. 350-368. [CG91] 'M. Carro, L. Gomez, and M. Hermenegildo, "VISANDOR: A Tool for Visualizing And/Or-parallelism in Logic Programs," Technical Report, U. of Madrid (UPM), MadridSpain, 1991. Dutra, "Flexible Scheduling in the Andorra-I System," In Proc. ICLP'91 Workshop on Parallel Logic Prog., Springer Verlag, LNCS 569, Dec. 1991. [D91] I. [G91] G. Gupta, "Paged Binding Array: Environment Representation in And-Or Parallel Prolog," Technical Report TR-91-24, Department of Computer Science, University of Bristol, Oct. 1991. [G91a] G. Gupta, "And-Or Parallel Execution of Logic Programs on Shared Memory Multiprocessors," Ph.D. Thesis, University of North Carolina at Chapel Hill, Nov. 1991. [GS92] G. Gupta, V. Santos Costa, "And-Or Parallel Execution of full Prolog based on Paged Binding Arrays," To appear in Proceedings of Parallel Languages and Architectures Europe (PARLE '92), June 1992. [GH91] G. Gupta and M. 
Hermenegildo, "ACE: And/Or-parallel Copying-based Execution of Logic Programs," Technical Report TR-9125, Department of Computer Science, University of Bristol, Oct. 1991. Also in Springer Verlag LNCS 569, Dec. '91. [GJ89] G. Gupta and B. J ayaraman, "Compiled And-Or Parallel Execution of Logic Programs," In Proceedings of the North American Conference on Logic Programming '89, MIT Press, pp. 332-349. [GJ90] G. Gupta and B. J ayaraman, "On Criteria for Or-Parallel Execution Models of Logic Programs," In Proceedings of the North A mer- 782 ican Conference on Logic Programming '90, MIT Press, pp. 604-623. [GJ90a] G. Gupta and B. Jayaraman, "Optimizing And-Or Parallel Implementations," In Proceedings of the North American Conference on Logic Programming '90, MIT Press, pp. 737-756. [GS91] G. Gupta, V. Santos-Costa, "Cut and Side Effects in And-Or Parallel Prolog," Technical Report TR-91-26, Department of Computer Science, University of Bristol, Oct. 1991. [GY91] G. Gupta, V. Santos Costa, R. Yang, M. Hermenegildo, "IDIOM: A Model for Integrating Dependent-and, Independent-and and Or-parallelism," In Proc. Int'l. Logic Programming Symposium '91, MIT Press, Oct. 1991. [H86] M. V. Hermenegildo, "An Abstract Machine for Restricted And Parallel Execution of Logic Programs". 3rd International Conference on Logic Programming, London, 1986. [HG90] M. V. Hermenegildo, K.J. Greene, "&-Prolog and its performance: Exploiting Independent And-Parallelism," In Proceedings of the 7th International Conference on Logic Programming, 1990, pp. 253-268. [HN86] M. V. Hermenegildo and R. I. Nasr, "Efficient Implementation of backtracking III AND-parallelism" , 3rd International Conference on Logic Programming, London, 1986. [HC87] B. Hausman, et. al., "Or-Parallel Prolog Made Efficient on Shared Memory Multiprocessors," in 1987 IEEE Int. Symp. in Logic Prog., San Francisco, CA. [HC88] B. Hausman, A. Ciepielewski, and A. Calderwood, "Cut and Side-Effects in Or-Parallel Prolog," In International Conference on Fifth Generation Computer Systems, Tokyo, Nov. 88, pp. 831-840. [LK88] Y-J. Lin and V. Kumar, "AND-parallel execution of Logic Programs on a Shared Memory Multiprocessor: A Summary of Results" , in Fifth International Logic Programming Conference, Seattle, WA. [LW90] E. Lusk, D.H.D. Warren, S. Haridi et. al. "The Aurora Or-Prolog System", In New Generation Computing, Vol. 7, No. 2,3,1990 pp. 243-273. [MH89] K. Muthukumar and M. Hermenegildo, "Complete and Efficient Methods for Supporting Side-effects III Independent/Restricted And-Parallelism," In Proc. of ICLP, 1989. [MH89a] K. Muthukumar, M. V. Hermenegildo, "Determination of Variable Dependence Information through Abstract Interpretation," In Proc. of NACLP '89, MIT Press. [RS87] M. Ratcliffe, J-C Syre, "A Parallel Logic Programming Language for PEPSys" In Proceedings of IICAI '87, Milan, pp. 48-55. [RK89] [S91] [S89] [SH91] B. Ramkumar and L. V. Kale, "Compiled Execution of the REDUCE-OR Process Model," In Proc. of NACLP '89, MIT Press, pp. 31333l. R. Sindaha, "The Dharma Scheduler Definitive Scheduling in Aurora on Multiprocessor Architecture," Technical Report, Department of Computer Science, University of Bristol, forthcoming. P. Szeredi, "Performance Analysis of the Au- . rora Or-parallel Prolog System," In Proc. of NACLP, MIT Press, 1989, pp. 713-732. K. Shen and M. Hermenegildo, "A Simulation Study of Or- and Independent AndParallelism," In Proc. Int'l. Logic Programming Symposium '91, MIT Press, Oct. 1991. [SI91] R. 
Sindaha, Personal Communication, Sep. 1991. [SK92] K. Shen, "Studies of And-Or Parallelism in Prolog," Ph.D. thesis, Cambridge University, 1992, forthcoming. [SW91] . V. Santos Costa, D. H. D. Warren, R. Yang, "Andorra-I: A Parallel Prolog system that transparently exploits both And- and OrParallelism," In Proceedings of Principles & Practice of Parallel Programming, Apr. '91, pp. 83-93. [VX91] A. Veron, J. Xu, et. al., "Virtual Memory Support for Parallel Logic Programming Systems," In PARLE'91, Springer Verlag, LNCS 506, 1991. [W84] D. S. Warren, "Efficient Prolog Memory Management for Flexible Control Strategies," In The 1984 Int. Symp. on Logic Prog., Atlantic City, pp. 198-202. D. H. D. Warren, "The SRI-model for OrParallel execution of Prolog - Abstract Design and Implementation Issues," 1987 IEEE Int. Symp. in Logic Prog., San Francisco. [W87] [W90] D.H.D. Warren, "Extended Andorra Model with Implicit Control" Talk given at Workshop on Parallel Logic Programming, 7th ICLP, Eilat. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 783 Estimating the Inherent Parallelism in Prolog Programs David C. Sehr • Laxmikant V. Kale t University of Illinois at Urbana-Champaign Abstract In this paper we describe a system for compile time instrumentation of Prolog programs to estimate the amount of inherent parallelism. Using this information we can determine the maximum speedup obtainable through OR- and AND/OR-parallel execution. We present the results of instrumenting a number of common benchmark programs, and draw some conclusions from their execution. 1 Introduction In this paper we describe a method for timing Prolog programs by instrumenting the source code. The resulting program is run sequentially to estimate the sequential and best possible OR parallel execution times. This method is then extended to give the b~st possible AND/OR parallel execution time. Our instrumentation does not drastically r~ duce efficiency, and we present the results of a number of programs. Our AND parallelism estimation method is based upon the work of by Kumar [1988] in estimating the inherent parallelism in Fortran programs. His method augments the source program with a timestamp for each data item d, which is updated each time d is written. In order to honor dependences, each computation that reads d can begin no earlier than the time recorded in d's timestamp. The largest timestamp computed by such an augmented program is the optimal parallel time for the original program. This time can be used to evaluate how well a given implementation exploits parallelism. This paper comprises six sections. The remain• Center for Supercomputing Research and Development, 305 Talbot Laboratory, 104 S. Wright St., Urbana, IL 61801, USA. (sehr@csrd.uiuc.edu) This work was supported by Air Force Office of Scientific Research grant AFOSR 90-0044 and a grant from the IBM Corporation to CSRD. t Department of Computer Science, Digital Computer Laboratory, 1304 W. Springfield Ave., Urbana, IL 61801, USA. (tale!Dlcs. uiuc. edu) This work was supported in part by NSF grant NSF-CCR-89-02496. der of the first presents some terminology. The second describes measuring the amount of OR parallelism in a Prolog program. The third section extends this method to include AND parallelism. The fourth presents the timing methods used for several builtin predicates. The fifth section gives the results of our technique on the UCB Benchmarks. 
The last section presents some conclusions and suggests some future work. 1.1 Terminology A prolog program consists of a top-level query and a set of clauses. The top-level query is a sequence of literals; we shall also use the term query to refer to any arbitrary sequence of literals. A literal is an atom or a compound term consisting of a predicate name and a list of subterms or arguments. Each subterm is an atom or a compound term. The number of sub terms of a compound term is its arity. A clause has a head literal and zero or more body literals. A clause with no body literals is a fact; others are rules. Clauses are grouped into procedures by the predicate name and arity of their head literals. The rest of this paper assumes some working knowledge of Prolog's execution strategy. For our timings we model a program's execution as traversal of its OR tree (SLD tree). Each node in an OR tree is labeled by a query. The first literal of the query at node N is the literal at N. The label of the root is the top-level queryl. Each child N of a node M is produced by unifying a clause C's head with the literal L at M. N's query is formed by replacing L in M's query by the body of C. The left-to-right order of such children is the order of the clauses in the source program. A leaf node with an empty query is a success. Sequential Prolog systems traverse this tree depth-first and left to right. 1 Which may have appeared in the source program, or may have been typed by the user at the read-evaluate-print prompt. 784 2 Sequential and OR time The most efficient OR parallel implementations of Prolog to date [Warren 1987, Ali 1990] have been based upon the Warren Abstract Machine (WAM) [Warren 1983]. Because of this, we compute critical path timings in number of WAM instructions executed. The number of instructions is an approximation to execution time, since each type of WAM instruction takes a slightly different time. Variations in execution time come mainly from two sources: argument unification and backward execution. The former comes from the get_value and unify _value instructions, whose costs depend on the size of the terms they unify, which can be substantial. We address this by making the cost of these instructions the number of unification steps they perform. Backward execution comes from instruction failure and may perform significant bookkeeping changes, especially for deep backtracking. Different WAM implementations, particularly parallel ones, have differing costs for backward execution. In the measurements presented here we have assumed zero backward execution cost, but other cost assumptions can be used. The execution time of a program has two components. The literal L at a node N in the OR tree is a call to a procedure p. Calling p consists of setting up L's calling arguments by a sequence of put instructions and performing the call by a call or execute instruction. The execution time of this sequence is a statically computable time tp(L) for L, which we approximate by the number of put instructions plus one. Executing a called procedure consists of trying clauses in succession. If C is being tried for the call, the call arguments are unified with the arguments of C's head literal H. This is done by get and unity instructions and takes a time tu(H). In general the execution time of these instructions cannot be estimated at compile time, so this head unification is performed by calls to run-time routines for the corresponding WAM instructions. 
tu(H) is the sum of the times computed by these routines.

To represent execution times the OR tree is given two new labels. First, each node N is labeled with the time tp(L) for the literal L at N. Second, each edge (N, M) is labeled with tu(H), where H is the head literal of the clause C applied to produce node M. The program's all-solutions sequential execution time is the sum of all the tp's and tu's in the tree's processed region².

² Predicates such as cut may prevent traversal of parts of the tree.

fib(0,1).
fib(1,1).
fib(I,F) :-
    I > 1,
    I1 is I - 1, fib(I1,F1),
    I2 is I - 2, fib(I2,F2),
    F is F1 + F2.

Figure 1: A program to be timed

[Figure 2: Execution of a timed literal L. The put instructions (e.g. put_variable A1,A1 and put_constant 3,A0) and the call to fib/2 take time tp(L), starting at Ts(L); the get_variable instructions of head unification take tu(H); the timed body literals of the applied clause then run, each from Ts(Li) to Te(Li), until the clause finishes at Te(C).]

2.1 Pure Prolog

Finding the minimum OR parallel time requires finding the critical path in the OR tree. For a pure Prolog program this is done by summing the tp's and tu's from the root to each leaf. The critical path has the largest such sum. Programs containing builtins such as read, setof, recorda, and assert require timing in sequential order. We first describe the method for pure programs and extend it to handle these predicates below.

Figure 1 shows a program to be instrumented, and Figure 2 shows its execution. The time at which literal L is to be processed is denoted by Ts(L). If L is at the root of the OR tree, then Ts(L) = 0. Otherwise Ts(L) is the time the preceding computation finished. Execution of L begins with the puts and call, which take time tp(L), as we noted above. Thus the earliest time any clause can be tried for L is Ts(L)+tp(L). This is the start time Ts(C) for every clause C applied for L, since all are tried in OR parallel. Head unification for C begins at Ts(C) and is done by get (and unify) instructions. If successful, this completes at time Ts(C) + tu(H). If C is a fact, then the end time Te(C) is this time plus 1, where the 1 is for the proceed instruction.

If C is a rule, each literal Li is processed as L was, begins at time Ts(Li), and ends successful execution at time Te(Li). The first body literal begins at time Ts(L1) = Ts(C)+tu(H). If the call from Li is successful and returns at time Te(Li), then the next literal Li+1 starts at time Ts(Li+1) = Te(Li). This continues until the last literal Ln completes at time Te(Ln), which is also the finish time Te(C) for C.

fib(A,B,Ts,Te) :-
    (A == 0 ; var(A)),
    get_constant(A,0, Ts, Tu1),
    get_constant(B,1, Tu1, Tu2),
    update_time(Tu2, 1, Te).

fib(A,B,Ts,Te) :-
    (A == 1 ; var(A)),
    get_constant(A,1, Ts, Tu1),
    get_constant(B,1, Tu1, Tu2),
    update_time(Tu2, 1, Te).

fib(A,B,Ts,Te) :-
    get_variable(A,N, Ts, Tu1),
    get_variable(B,F, Tu1, Tneck),
    update_time(Tneck, 4, Te1),
    N > 1,
    update_time(Te1, 6, Te2),
    N1 is N - 1,
    update_time(Te2, 3, Ts3),
    fib(N1, F1, Ts3, Te3),
    update_time(Te3, 6, Te4),
    N2 is N - 2,
    update_time(Te4, 3, Ts5),
    fib(N2, F2, Ts5, Te5),
    update_time(Te5, 6, Te),
    F is F1 + F2.

Figure 3: Program after instrumentation

The time for a success is Te(L) for the last literal L in the top-level query. The time for a failed instruction in C is Ts(C) plus the portion of tu(H) computed before the failure. Most builtins are given a cost of one, and builtin failure takes the same time as a successful call does.
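To make the transformation concrete, the run-time routines it relies on can be read as ordinary Prolog predicates. The following is only an illustrative sketch of ours, not the authors' library code: it assumes atomic arguments (so each head unification costs a single step) and omits the per-step counting for compound terms and the global bookkeeping described below.

% update_time(+Ts, +Cost, -Te): advance the clock by a fixed instruction cost.
update_time(Ts, Cost, Te) :- Te is Ts + Cost.

% get_constant(?Arg, +Const, +Ts, -Te): perform the WAM get_constant
% operation (unify the call argument with a constant) and charge one
% unification step.
get_constant(Arg, Const, Ts, Te) :- Arg = Const, Te is Ts + 1.

% get_variable(?Arg, ?Var, +Ts, -Te): bind a clause variable to the call
% argument; again a single step in this simplified setting.
get_variable(Arg, Arg, Ts, Te) :- Te is Ts + 1.

In the full system these routines also record failure times and maintain the global critical path maximum, as described next.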
The system maintains a global critical path time Tmax. Whenever a library routine performing head unification fails at time Tf, it examines Tmax and stores the larger of the two times as the new Tmax. The library routine that computes Ts(C) also updates Tmax, and the top-level query is modified to update it as well.

Figure 3 shows the timed version of Figure 1. Each clause has two new arguments, Ts and Te, and head unification is performed by routines such as get_constant and get_variable. These routines perform the corresponding WAM operation and update the critical path time. The first two clauses are facts, so the end time is computed by an update_time literal for the proceed instruction. The third clause is a rule, so each body literal L has a preceding update_time literal. If L refers to a user-defined predicate, this literal computes Ts(L) + tp(L) for use as the start time for the call. If L refers to a builtin predicate (except those in Section 4), the update_time literal adds tp(L), plus one for the builtin's execution time, and uses this as the end time for L. Each clause also has an initial index literal that enables last call optimization. Moving head unifications to the body made indexing impossible, so this literal is added to perform first argument indexing. If this is not done, last call optimization rarely works. This literal appears sufficient for last call optimization with the Sicstus compiler.

3 Adding AND parallelism

The critical path time determines the best possible OR parallel execution time. Often segments of a branch can execute simultaneously, and doing so would reduce that critical path time. This is AND parallel execution, and unlike OR parallelism, it requires testing for dependences even in pure Prolog programs. In this section we describe the application of Kumar's [1988] techniques for Fortran to estimate the best AND/OR parallel execution time. The method we describe extends his work to deal with the dynamic data structures and aliasing present in Prolog. We believe this framework has the advantage over other methods [Shen 1986, Tick 1987] of allowing us to extend it to measure critical path times in programs with user parallelism.

A program's dependences can only be exactly determined at execution time, since one execution may have a dependence while another does not. A compiler, to ensure legal execution, must assume a dependence exists unless it can be proven not to. Because of this, compilers often infer many more dependences than are actually present in the program. Another use of the method we propose is to compute exact dependences to test the effectiveness of dependence tests.

[Figure 4: A dependence graph for the recursive clause of fib. There is a node for the call fib(N,F) and for each body literal (N1 is N-1, N2 is N-2, the two recursive fib calls, and F is F1+F2); the arcs are labeled with the variables ({N}, {N1}, {N2}, {F1, F2}) that cause the dependences.]

There are a number of AND parallel execution models that differ in their treatment of the dynamic nature of dependences. The approaches range from dependence graphs that are static [Kale 1987, Chang et al 1985, Wise 1986] to partly dynamic (conditional) [DeGroot 1984, Hermenegildo 1988] to completely dynamic [Conery and Kibler 1985]. Kale [1984] notes that in some rare situations it may be beneficial to evaluate dependent literals in parallel. His Reduce-Or Process Model allows for dependent AND parallelism, but his implementation [Ramkumar and Kale 1989] supports only independent AND parallelism. Epilog [Wise 1986] also permits dependent AND parallelism, but provides a primitive (CABO) to curtail it.
The model we have developed includes dynamic, independent AND parallelism, with a strict sequential ordering on dependent literals. We are only able to present the results here for independent AND parallel execution, though, because of a problem in the Prolog system used to execute the instrumented programs. In the future we hope to report the timings for the more general approach.

3.1 Dependences

The third clause in Figure 1 contains six body literals that might potentially execute in parallel. The arguments of the > builtin must both be numeric expressions, so to execute correctly the argument I to fib must be an integer. Because neither writes I, the two is goals can execute independently. Each reads I and produces a binding for I1 or I2, the values of the first argument for the recursive instances. Since all fib clauses read their first argument, the recursive calls can only begin after their corresponding is. The final is literal requires the value of both F1 and F2, so the two fib calls must precede the final is. There need be no other ordering between literals.

Figure 4 shows the dependence graph for the clause. There is a node for the initial call to fib and a node for each body literal. Recursive computations are represented by shaded areas. An arc between two nodes represents a dependence, meaning that the node at the tail must precede the node at the head of the arc. Dependence arcs are labeled with the variables causing them. Such a variable v causes a dependence d in one of two ways. First, if the node at the tail of d binds v and v is read at the head, then there is a data dependence. Second, if the node at the head of d binds v and the node at the tail reads v using a metalogical predicate (var, write, etc.), then there is an anti-dependence. Anti-dependences arise when a literal succeeds with a variable v unbound and would fail or produce incorrect output because v is subsequently bound.

3.2 Shadow terms

Dependences are detected at run time by shadow terms. Each term t has a shadow term ψ(t) associated with it, which mirrors t's structure. The shadow of an atomic term is the atom a. The shadow term of a compound term t = f(t1, ..., tn) is s(ψ(t1), ..., ψ(tn)), where ψ(ti) is the shadow for ti. A variable must be bound for a dependence to exist, so the shadow term for a variable keeps the binding times for that variable (there can be multiple bindings, since some may be variable-to-variable). The shadow of an unbound variable is unbound. If v is bound to any term t at time T by a get_variable or unify_variable instruction, the shadow variable ψ(v) is dereferenced and the final variable is bound to the structure w(ψ(t), T). The same operation is performed if v is bound to a non-variable term t by any other instruction. If v is bound to another variable v' by any other instruction at time T, an alias has been created. The two shadows reflect this by dereferencing both ψ(v) and ψ(v') and binding the final variables of both to the term w(ψ'(v), T), where ψ'(v) is a new unbound variable. If v is examined by a meta-logical builtin at time T, ψ(v) is dereferenced, and the final variable is bound to m(ψ'(v), T), where ψ'(v) is a new unbound variable.

fib(A, B, Sa, Sb, Ts, Te) :-
    (A == 0 ; var(A)),
    get_constant(A,0, Sa, Ts, Tu1),
    get_constant(B,1, Sb, Tu1, Tu2),
    update_time(Tu2, 1, Te).

fib(A, B, Sa, Sb, Ts, Te) :-
    (A == 1 ; var(A)),
    get_constant(A,1, Sa, Ts, Tu1),
    get_constant(B,1, Sb, Tu1, Tu2),
    update_time(Tu2, 1, Te).

fib(A, B, Sa, Sb, Ts, Te) :-
    get_variable(A,N, Sa, Sn, Ts, Tu1),
    get_variable(B,F, Sb, Sf, Tu1, Tu),
    max_shadow_time(Tu, [Sn], Tt1),
    update_time(Tt1, 4, Te1),
    N > 1,
    max_shadow_time(Tu, [Sn1,Sn], Tt2),
    update_time(Tt2, 6, Te2),
    N1 is N - 1,
    set_shadows([Sn1], [N1], Te2),
    update_time(Tu, 3, Ts3),
    fib(N1, F1, Sn1, Sf1, Ts3, Te3),
    max_shadow_time(Tu, [Sn2,Sn], Tt4),
    update_time(Tt4, 6, Te4),
    N2 is N - 2,
    set_shadows([Sn2], [N2], Te4),
    update_time(Tu, 3, Ts5),
    fib(N2, F2, Sn2, Sf2, Ts5, Te5),
    max_shadow_time(Tu, [Sf,Sf1,Sf2], Tt6),
    update_time(Tt6, 6, Te6),
    F is F1 + F2,
    set_shadows([Sf], [F], Te6),
    max([Te1,Te2,Te3,Te4,Te5,Te6], Te).

Figure 5: AND/OR instrumented program

3.3 Dependences with shadow terms

Figure 5 shows fib after instrumentation for AND/OR parallelism. Each variable V in a clause has a shadow variable Sv, and each head argument has a shadow argument. The end time for a clause is the largest end time for any literal in that clause, as if each literal starts immediately after head unification and suspends until its dependences are satisfied. In Figure 5 the end time is shown as computed by a max literal at the end of the clause. This is for clarity of presentation only, because it would inhibit last call optimization. In the real version a current maximum is passed to each body literal in succession.

The head unification routines now include shadow variables as arguments, since it is in these instructions that dependences in user-defined predicates are enforced. These routines previously computed their finish time only from the start time and the cost of the instruction. Now there is the possibility that the instruction must wait until the shadow time for a variable causing a dependence before performing the unification. Hence the completion time is computed by performing the unification and keeping a current time. Whenever a term is referenced the current time becomes the maximum of the current time and the timestamp. The unification is then performed and the current time incremented.

Two other predicates enforce dependences involving builtins. The first, max_shadow_time, computes the earliest time the builtin's arguments are available³ from the latest time in the arguments' shadows. This enforces data dependences that have the builtin as their head. The builtin's end time is computed by update_time, as before. The second predicate, set_shadows, builds shadows for changes to the arguments of a builtin. Shadows are built for those arguments that are bound or are examined by meta-logicals, and they are constructed from the variable bindings after execution. This handles builtins at the tail of a dependence. For some builtins such as =.. this can be fairly complex.

³ This predicate is also used to enforce independent-AND parallel execution, by making every user predicate strict.

4 Builtin predicates

Prolog has several types of builtin predicates, each with a different set of effects on critical path timing. We have already noted that meta-logical builtins (var, write, etc.) can cause anti-dependences. In this section we describe four other kinds of predicates and methods for timing each of them.

4.1 Predicates involving call

There are four predicates that implicitly use the meta-logical builtin call. They are bagof, setof, not, and \+.
Timing these predicates requires two kinds of special handling. First, since call's arguments may be constructed at run time, instrumentation is done at run time. This is done by including the instrumentation program in the timed program. Second, setof, bagof, not and \+ traverse an entire OR tree, so their finish times are related to the longest path in that tree. A stack of maximum times is used with nested calls to these predicates to collect a subtree's maximum time. For setof and bagof we also add one for each solution for the cost of building the returned list.

Figure 6 shows the processing of a call to setof that computes all the solutions for the p(X) in region R and collects them in a list L. Since it traverses the whole OR tree R required to compute p(X), setof's finish time is the longest completion time in R. The maximum time is maintained by update_time in the global variable Tmax⁴. Since there may be a previous maximum time greater than the largest completion time in R, Tmax is pushed on a stack and the start time for the setof is used as Tmax. R is traversed and the maximum time is stored in Tmax, as always. The return time for setof, Te, is Tmax. At the end of setof, Tmax is set to the maximum of the stack value and Te, so again Tmax contains the global largest time.

⁴ In the implementation of our system the maximum time, along with a parallelism histogram, is maintained by several C routines accessed through a foreign function interface, but this is done only for the sake of efficiency.

[Figure 6: Processing setof. The previous Tmax is pushed on the stack, the subtree R is traversed, Te = Tmax, and finally Tmax = max{popped value, Te}.]

4.2 Read and write

Neither setof nor pure Prolog cause dependences between branches in the OR tree. The input/output predicates (read, write, etc.) cause cross-branch dependences, since the observable order of input/output needs to conform to Prolog's left-to-right order. Figure 7 depicts the execution of a program with two writes, W1 and W2. Data dependence would permit each write to start when its arguments were ready (times Td(1) and Td(2) respectively) were it not for the order of output. W1 must write its output before W2, so to determine when input or output can be done we maintain a global variable last_io. In this example, W2 cannot write its output until max{Td(2), last_io}. Writes cost one time unit, so W2 can start no earlier than max{Td(2), Td(1) + 1}. In the instrumented version each input/output predicate is preceded by a literal that updates last_io.

[Figure 7: Processing the input/output predicates. For the second write, Te2 = max{Td(2), Te1} + 1 and last_io = Te2.]

4.3 Recorda and recorded

Prolog also has the builtins recorda, recorded, and erase to manipulate an internal database. Parallel accesses to relations in the database must appear to preserve the sequential execution order. Accesses to different database relations do not affect one another, so this order is only within a relation. It is not necessary to serialize accesses to each relation to preserve the appearance of sequential access order. All we need is to guarantee that read accesses to an element by recorded occur after the write access that placed that element there, and that write accesses (recordas and erases) are ordered. The former is enforced by pairing each item placed in the database with its insertion time. Accesses by recorded wait until the maximum of the data dependence time Td and the element's insertion time.
The write order is enforced by labeling each relation with a last_modify that is updated just like last_io.

4.4 Assert and retract

Prolog also allows assert and retract to modify the program at run time. These predicates are timed by the method for call and that for the internal database. The former is because the asserted clause can be constructed at run time, and hence the instrumentation must be done then. The latter is because predicates modified at run time must obey the access rules for database updates. The write-write (assert and retract) order is enforced by updating the last_modify for the predicate. The read-write ordering is maintained by adding a first literal to each asserted clause that records when it was added. This is used to determine the earliest time a read (a clause builtin or call to the modified predicate) can execute.

5 Analysis of programs

Table 1 presents the results obtained by instrumenting 23 of the University of California at Berkeley's UCB benchmarks. These programs range over a variety of sizes and purposes.

Program Name   Serial WAM Instr.   OR Parallel Speedup   AND/OR Parallel Speedup
chat_parser    1014791             257                   1596
crypt          31787               58                    114
divide10       207                 1                     2
fast_mu        8899                9.1                   10.7
flatten        5218                1.25                  2.37
log10          119                 1                     1.2
meta_qsort     38675               2.1                   3.7
mu             5925                16.7                  17.7
nand           180145              5.4                   14.3
nreverse       4460                1                     1
ops8           163                 1.04                  2.8
poly10         307177              1.1                   76.3
prover         7159                4.5                   14.2
qsort          5770                1.3                   1.5
queens8        33821               26.4                  69.3
query          17271               243                   480
reducer        279220              2                     3.3
serialise      3199                1.4                   1.9
tak            1431202             1.1                   686
times10        207                 1                     1.9
unify          29490               1.6                   3.5
zebra          261858              453                   482

Table 1: Instrumented benchmark times

There are several interesting facts to observe from these programs. First, David H. D. Warren's assertion [Warren 1987] that OR parallelism was likely to produce significant speedups on a range of programs appears to be borne out. Several programs achieved small speedups from OR parallelism, mostly due to shallow backtracking (e.g. flatten, ops8, poly10, qsort, tak, unify). Improved indexing would probably eliminate most of this OR parallelism. A number of programs exhibited essentially no OR parallelism (e.g. divide10, log10, nreverse, times10).

In general, independent AND parallel execution improved the performance of programs already speeded up by OR parallel execution by a small factor (1-6). These programs have all shown reasonable speedups in real OR parallel systems [Szeredi 1989]. Our results show that there is plenty of parallelism in several of these programs to extend to much larger machines (e.g. consider chat_parser, query and zebra). Those with smaller speedups may profit from the introduction of independent AND parallelism.

Of the programs that were mostly OR-sequential, the majority get very small speedup by applying independent AND parallel execution. For divide10, log10, and times10, this is because the AND parallel sub-problems are very unbalanced; that is, one sub-problem is much larger than the other. For nreverse, the reason is that independent AND parallel execution is not able to execute the two body goals of nreverse in parallel. It is a recurrence, and is hence completely sequential. This can be addressed by replacing the algorithm or applying a parallel recurrence solver.

The best results for independent AND parallelism come from poly10 and tak. In both cases these give rise to fairly large numbers of independent subcomputations. In the case of tak, the branching factor is approximately three and the calling depth is large, so a large speedup is obtained.
Qsort on a well-chosen input list with a better partition routine should be able to obtain similar results. These results are just the beginning of understanding the parallelizability of programs, as we would like information on the more general AND and other sorts of parallelism. However, they can tell us something about how much speedup we can reasonably expect from parallel models. Moreover, examining these programs to see where dependences occur should help in designing restruc- 790 turing transformations. 6 Conclusions The amount of OR and AND/OR parallelism in a Prolog program can be effectively measured by sequentially executing an instrumented version of that program. The timings obtained this way give a best-possible speedup under two different parallelism models, and can be used for a number of purposes. First, they can be used to evaluate the ability of a parallel execution model to exploit parallelism. These results can suggest areas of improvement for such models. We intend to instrument a number of programs for this purpose. With some relatively simple extensions this technique can measure the amount of a number of lower-level program characteristics. Among these are unification parallelism, backtracking properties, aliasing, data dependences, and dereference costs. Prolog can also be extended with predicates for source-level parallelism. With proper timing methods, this instrumentation method can be used to evaluate restructuring transformations for Prolog. The instrumentation system we described has been extended with such predicates and we have begun to evaluate transformations. In the future we will describe these extensions to the instrumentation method as well as the results of our restructuring transformations. Acknowledgments The authors would like to thank David Padua for his many useful suggestions about this work. References [Ali 1990] Khayri Ali. The muse or-parallel prolog model and its performance. In Proceedings of the 1990 North American Logic Programming Conference, pages 757-776, 1990. [Chang et a/1985] J. Chang, A. M. Despain, and D. DeGroot. And-parallelism of logic programs based on a static data dependency analysis. In Proceedings of Compcon 85, 1985. [Conery and Kibler 1985] J.S. Conery and D.F. Kibler. And parallelism and non determinism in logic programs. New Generation Computing, 3:43-70, 1985. [DeGroot 1984] D. DeGroot. Restricted andparallelism. In Proceedings of the International Conference on Fifth Generation Computer Systems, pages 471-478. North Holland, 1984. [Hermenegildo 1988] M. V. Hermenegildo. Independent AND-Parallel Prolog and its Architecture. Kluwer Academic Publishers, 1988. [Kale 1984] Laxmikant V. Kale. Parallel Architectures for Problem Solving. PhD thesis, State University of New York at Stony Brook, 1985. [Kale 1987] Laxmikant V. Kale. Parallel execution of logic programs: the reduce-or process model. In Proceedings of the International Conference on Logic Programming, pages 616632, May 1987. [Kumar 1988] Manoj Kumar. Measuring parallelism in computation intensive scientific/engineering applications. IEEE Transactions on Computers, 37(9), September 1988. [Ramkumar and Kale 1989] B. Ramkumar and L.V. Kale. Compiled execution of the reduceor process model on multiprocessors. In Proceedings of the 1989 North American Conference on Logic Programming, pages 313-331, October 1989. [Shen 1986] Kish Shen. An investigation of the argonne model of or-parallel prolog. Master's thesis, University of Manchester, 1986. 
[Szeredi 1989] Peter Szeredi. Performance analysis of the Aurora or-parallel Prolog system. In Proceedings of the 1989 North American Conference on Logic Programming, pages 713-732, 1989.

[Tick 1987] Evan Tick. Studies in Prolog Architectures. PhD thesis, Stanford University, June 1987.

[Warren 1983] David H. D. Warren. An abstract Prolog instruction set. Technical report, SRI International, October 1983. Technical Note 309.

[Warren 1987] David H. D. Warren. The SRI model for or-parallel execution of Prolog - abstract design and implementation. In Proceedings of the 1987 Symposium on Logic Programming, pages 92-103, September 1987.

[Wise 1986] Michael Wise. Prolog Multiprocessors. Prentice-Hall International Publishers, 1986.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

Implementing Streams on Parallel Machines with Distributed Memory

†Koichi Konishi, †Tsutomu Maruyama, †Akihiko Konagaya, ‡Kaoru Yoshida, ‡Takashi Chikayama

† NEC Corporation, 4-1-1, Miyazaki, Miyamae-ku, Kawasaki, Kanagawa 216, Japan. {konishi, maruyama, konagaya}@csl.cl.nec.co.jp
‡ Institute for New Generation Computer Technology, 1-4-28, Mita, Minato-ku, Tokyo 108, Japan. {yoshida, chikayama}@icot.or.jp

Abstract

Stream-based concurrent object-oriented programming languages (SCOOL) to date have been typically implemented in concurrent logic programming languages (CLL). However, CLLs have two drawbacks when used to implement message streams on parallel machines with distributed memory. One is the lack of restriction on the number of readers of a shared variable. The other is a cascaded buffer representation of streams. These require many interprocessor communications, which can be avoided by language systems designed specially for SCOOLs. The authors have been developing such a language system, named A'UM-90, for A'UM, a SCOOL with highly abstract stream communication. This paper presents the optimized method used in A'UM-90 to implement streams on distributed memory. A stream is represented by a message queue, which migrates to its reader's processor after the processor becomes known. The improvement from using this method is estimated in terms of the number of required interprocessor communications, and is demonstrated by the result of a preliminary evaluation.

1 Introduction

One natural use of concurrent logic programming languages (CLLs) is to implement the Actor or object-oriented programming models. In a CLL, it is easy to specify objects running concurrently, communicating with one another by messages sent in streams [Shapiro and Takeuchi 1983]. Message streams in CLLs are especially useful, as they provide flexibility and modularity, and facilitate the exploitation of parallelism; they allow dynamic re-configuration of communication channels, while each object knows little about the partners with whom it is communicating. To support this style of programming, a number of languages have been proposed ([Furukawa et al. 1984] [Yoshida and Chikayama 1988] [Kahn et al. 1986] [Saraswat et al. 1990]). We call these languages stream-based concurrent object-oriented languages (SCOOLs).

Most research on SCOOLs to date has been focused on providing excellent expressibility. While SCOOLs have been implemented in CLLs, to our knowledge, no language system dedicated to SCOOLs has been implemented. A dedicated system for a SCOOL can be much more efficient than those implemented in CLLs when the abstraction and other information in programs are fully exploited.
The authors have been developing such a dedicated system for a kind of SCOOL, A'UM. The system is named A'UM-90, and is targeted for multiprocessor systems with distributed memory.

In this paper, some drawbacks of CLLs as implementation languages for stream communication are discussed, then it is shown how A'UM's well-regulated abstract streams can be efficiently implemented. A brief description of such an implementation is given, its improvement over a CLL implementation is estimated, and the results of a preliminary evaluation are given.

The next section describes the implementation of objects and stream communication in CLLs. Section 3 introduces SCOOLs as natural descendants of CLLs. Section 4 explains why CLLs are inadequate for implementing streams. Section 5 describes A'UM and A'UM-90 briefly. Section 6 describes the implementation of stream communication in A'UM-90 and its costs. Section 7 shows some results of evaluation. The last section gives the conclusion.

2 Objects in CLL

Stream-based concurrent object-oriented programming languages have evolved from efforts to embody the Actor or object-oriented programming models in CLLs [Shapiro and Takeuchi 1983]. This style of programming has the virtues of object-oriented programming such as modularity and natural parallelism in an extended way [Kahn et al. 1989]. For example, an object implemented in a CLL may have multiple input ports, and communication ports can be transferred between processes. Moreover, it can send messages before the destination is determined. In this chapter, an implementation of object-oriented programming in a CLL is briefly described.

Many CLLs (FCP, FGHC, Fleng, Oc, Strand, etc.) have been proposed to date. We use FGHC [Ueda 1985] in the following explanation.

object([message(Arguments) | In], State) :- true |
    method(Arguments, State, NewState),
    object(In, NewState).

Figure 1: A clause representing an object

Figure 1 shows a typical example of representing an object in FGHC. The behavior of an object is defined by a number of clauses similar to the one above. Given these clauses, a goal named object represents the state of an object at a certain moment. The first argument is a shared variable used as a communication port, from which the object receives messages. The second argument is the internal state of the object. When another goal sharing the variable with the first goal assigns a term [message(Actuals) | Rest] to the variable, the above clause can be selected, and Rest becomes shared by the two goals. Actuals are bound to Arguments, and the body of the clause is executed. A goal named method performs most of the actual work, creating a new state and assigning it to NewState. A new object goal is created with Rest as the first argument and NewState as the second. Thus, an object, or a process, is represented by the recurring creation of goals with altered states.

Communication ports are represented by variables shared by two goals. One goal emits a message by assigning a structure containing a message and a new variable. When the other goal receives the message by successfully matching itself with a head of a clause, the new variable becomes shared, to be used as a new port. By repeating this procedure, these goals can communicate as many messages as required, one after another. The connection is closed when a structure containing no variable is assigned. Communication in this style is called stream communication. Basically, stream communication is one-to-one as described above.
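As a concrete counterpart to Figure 1, the sending side can be written in the same FGHC style. The following fragment is our own illustrative sketch, not code from the paper; the predicate name producer and the message contents foo and bar are invented. It emits two messages on the shared variable S and then closes the stream.

% Emit two messages on stream S, then close it. An object goal sharing S,
% such as the one in Figure 1, receives them in order.
producer(S) :- true |
    S = [message(foo) | S1],    % first message; S1 becomes the new port
    S1 = [message(bar) | S2],   % second message
    S2 = [].                    % closing the connection

Running the conjunction producer(S), object(S, InitialState) then realizes the one-to-one stream communication just described, with producer as the writer and the object as the reader.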
However, several streams of messages can easily be merged into one by a simple process. A merger should have several ports representing the input streams to be merged and one more for the output. It receives a message from one of its input ports and forwards it to the output port. Many types of mergers with varying policies can be devised. A merger of one type might receive from an arbitrary port, utilizing the non-determinism in clause selection of the CLL. A merger of another type might concentrate on one port until the connection through it is closed, then it might move on to another port. We call the former type a merger, and the latter an appender, because it effectively appends streams one after another. 3 SCOOL Programming objects in a CLL has several obvious drawbacks. First, the implementation of stream communication is explicitly described in the program. Streams are explicitly formed using messages and a variable, and many to one communications are implemented with merger processes. Programmers must make sure that the same conventions are used throughout their programs. Secondly, contentions are apt to happen, due to the lack of restriction on multiple writers to a variable. Lastly, the verbosity, in particular manipulation of internal states, is excessive. It is cumbersome to provide all the details of communication. Many SCOOLs have been proposed to remove these drawbacks ([Furukawa et al. 1984] [Kahn et al. 1986] [Yoshida and Chikayama 1988] [Saraswat et al. 1990]). These languages have a form for class definition, introduced to make a concise description of object behavior possible. Stream communication is denoted by dedicated expressions, with its implementation removed from programs. To our knowledge, all SCOOLs have been implemented in CLLs. It is natural and efficient to use CLLs for this purpose, but is problematic with respect to the resulting system's performance. CLL systems can not provide a thoroughly object-oriented view efficiently, such as integers operated on by messages: Another problem is implementing stream communication on a multiprocessor system with distributed memory. We focus on the latter problem, and explain the inadequacies of CLLs in the next section. 4 Problems in implementing streams in CLLs Stream communication, and more generally asynchronous communication, uses message buffers to store pending messages. In distributed memory multiprocessor systems, accessing a message buffer requires interprocessor communications(IPC), unless both the accessing process and the buffer are on the same processor. While a single IPC suffices to write a message into a 793 buffer on a remote processor, reading a message requires two: a request and a reply. Placing the buffer on the reader's processor can save one IPC for each message communicated through the buffer. However, it's difficult for CLL systems to place the buffer on the reader's processor. CLL systems use a shared variable as a message buffer, and they can't tell the readers of a variable from the writers. In addition, there may be multiple readers for a variable. In that case, there is a relatively small advantage in saving IPCs for only one reader among many. Moreover, the number of IPCs required would not be reduced even if the buffer is placed on the reader's processor. In a CLL, streams are represented as a sequence of message buffers, and the writer only knows the last one. 
When it becomes full, a new buffer is appended to the sequence, and if it is created on the reader's processor, the address must be propagated to the writer. This costs an additional IPC for every message sent. Since CLL systems may not place shared variables on the reader's processor, implementing these streams in CLLs results in costly remote reads, repeated for every buffer. The argument so far prompts the development of a dedicated system for SCOOLs. A'UM-90 is such a system for A'UM, a SCOOL that thoroughly integrates streams into its specification. The next section describes A'UM and gives an overview of A'UM-90. where selector is the method's name, and actions specify the operations it performs. The only operations methods are allowed to perform are connecting a stream to another, creating an object, and sending a message to a stream. Streams in A'UM 5.2 Stream communication in A'UM is highly abstract, providing safe communications and the notion of channels. Directed variables prevent contentions for a stream. The semantics of variables are enhanced so that they denote a set of confluent streams called a channel, a more general concept than a stream. All variables in A'UM have a stream as their value. The role of streams in A'UM is similar to pointers m Lisp; streams are the sole way of referencing objects. 5.2.1 Operations on Streams A stream is a sequence of messages, directed to a certain receiver. A message sent to a stream is placed at the end of the stream. Sending is expressed simply by juxtaposing a stream and a message, as follows. stream message Connection of two streams are denoted by the following syntax. receiver = stream 5 5.1 A'UM and A'UM-90 Behavior of Objects All A'UM objects run concurrently. They keep internal states called slots, and execute methods according to the messages they receive. The class an object belongs to defines its behavior. A class definition has the following form, which includes the declaration of the class name, the classes it inherits from, slot names (local state) and definitions of its methods. class class_name. super_class_decl sloLdecl method_defs end. An object receives messages from only one stream, called its interface. An object is referenced by connecting a stream to its interface. Streams connected to the object later on will be merged into the interface. A method is defined by the following form. selector -) actions. This means that all messages sent to stream flow into receiver. Closing a stream indicates that no more messages will be sent through it. Closing is always performed automatically, when a stream is discarded. In addition, messages arriving at an object's interface stream are consumed exclusively by that object. This operation is also performed automatically. 5.2.2 Directed Streams Stream connection is asymmetric; a stream may only be connected to another stream once, but many other streams may be connected to it. In order to assure at compile-time that streams are connected only once, references to a stream are classified into two types, called directions. An inlet is a reference to a stream from which messages flow; an outlet is another kind of reference in which messages are sentI. The single connection of a stream is assured by the restrictions requiring that a stream has only one inlet and that the right hand value of a connect expression be an inlet. Inlets and outlets are distinguished syntactically. Variables referencing inlets are denoted with a variable name with A prepended to it, e.g. AX. 
¹They are named from an object's point of view.

Slots holding inlets and outlets are written as slot names preceded by @ and by !, respectively. Expressions have a value whose direction is determined according to their kind. Messages are distinguished by the directions of their arguments as well as their number, and the message's name.

   class account.
      out balance.
      :init -> 0 = !balance.
      :deposit(AAmount) -> !balance + Amount = !balance.
      :withdraw(AAmount, AAck) ->
         (Amount < !balance) ? (
            :(true  -> !balance - Amount = !balance.
            :(false -> Ack :overdrawn(!balance).
         ).
      :balance(!balance) -> .
   end.

   Figure 2: Bank account

5.2.3 Channel Abstraction

Two types of stream confluence, namely mergers and appenders, have special support in the language. As mentioned earlier, a merger performs non-deterministic merging, and an appender connects streams one after another in a specified order. A channel is a tree formed of these confluences of streams. Variables represent a channel of a particular form, consisting of an appender and an arbitrary number of mergers. All outputs of the mergers are connected to inputs of the appender. For a variable named Foo, AFoo is an inlet of the root stream of the channel. Foo$1, Foo$2, Foo$3, and so on, are leaf streams. Foo is equivalent to Foo$1. They are appended into the root in the order of their number. When there are many expressions having the same number, the streams they denote are merged before being appended. Using channels reduces the description of mergers and appenders in programs, which would be indecipherable otherwise.

5.3 An Example Program

Figure 2 is an example A'UM program defining a class for a bank account. Arguments in a message are connected with the values of the expressions in the selector corresponding to the message. For example, :deposit receives an outlet and connects AAmount to it. :balance receives an inlet and connects it to the value of !balance. A binary expression is a macro form. It expands into a send expression, which sends to the left-hand value a message with two arguments, the right-hand value and an inlet of a new stream. The name of the message is determined according to the operator. A macro form evaluates into an outlet of the new stream. Thus, !balance + Amount is expanded into !balance :add(Amount, AResult), with Result as its value. exp ? ( ... ) is an anonymous class definition, which is used to represent a conditional behavior. Either of the methods :(true or :(false is executed by the instance of the anonymous class, according to the result of Amount < !balance.

5.4 An outline of A'UM-90

A'UM-90 is an A'UM language system, independent of any CLL. It provides efficient stream communication on a distributed memory multiprocessor system. Moving stream data structures to their reader's processor saves many IPCs, which are otherwise required in stream communication. A'UM-90 manages coarse-grained processes. Specifically, a process executes an instance of a user-defined class. An A'UM-90 system consists of a compiler and an emulator. The compiler generates code for an abstract machine designed for the system, and the emulator executes the code. Two different types of platform have been used. One is a Sequent Symmetry with 16 processors, and the other is a number of Sun Sparc Stations communicating by Ethernet. Although a Symmetry has shared memory, we used it as a distributed memory machine. We used a small part of the memory to implement message communication, and divided the rest among processors.
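Before turning to the implementation, it may help to see the channel discipline of Section 5.2.3 in executable form. The following Python fragment is only an illustrative model, not A'UM code or any part of A'UM-90; the class names Merger and Appender and the method drain are invented here. It shows how leaf streams with the same index are merged in arrival order, while groups with different indices are appended in index order.

   # Plain-Python illustration only; A'UM channels are a language construct,
   # and Merger, Appender and drain are names invented for this sketch.
   from queue import Queue

   class Merger:
       """Non-deterministic confluence: output order is simply arrival order."""
       def __init__(self):
           self.out = Queue()
       def send(self, msg):
           self.out.put(msg)

   class Appender:
       """Ordered confluence: drains input 1 completely, then input 2, ..."""
       def __init__(self, inputs):
           self.inputs = inputs
       def drain(self):
           for merger in self.inputs:          # Foo$1 before Foo$2 before ...
               while not merger.out.empty():
                   yield merger.out.get()

   # A channel Foo with two groups of leaf streams, Foo$1 and Foo$2.
   foo1, foo2 = Merger(), Merger()
   root = Appender([foo1, foo2])
   foo2.send(":shutdown")                      # appended after all of Foo$1
   foo1.send(":deposit(100)")
   foo1.send(":balance(B)")
   print(list(root.drain()))
   # [':deposit(100)', ':balance(B)', ':shutdown']

Running the fragment prints the Foo$1 messages in the order they were sent, followed by the Foo$2 message, mirroring the append-of-merges shape that a variable's channel denotes.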
6 Implementation of Streams in A'UM-90

The implementation described here fully utilizes information on stream abstraction and message flow direction available in A'UM programs. Although the delivery of messages is somewhat delayed, the number of IPCs required is significantly reduced when many messages are sent through a long cascade of streams. Moreover, the delay is eliminated in many cases by various subtle optimization methods.

6.1 Streams

A stream is represented by a structure consisting of a message queue, a pointer to its receiver, and a reference count. The reference count is necessary for detecting closed streams and for implementing the appenders correctly. The structure is named M node, where M stands for merging. A merger is simply represented as an M node having more than one pointer referring to it. An appender is represented by a structure consisting of an M node and a pointer to the following stream. The structure is named A node. With these structures, implementing operations on streams within a processor is straightforward. Sending a message is simply queuing it. Connecting a stream to a receiver is making the pointer in the stream point to the receiver and increasing the reference count of the receiver. When a stream is closed, its reference count is decreased. Receiving a message is just dequeuing it.

Figure 3: Stream location

6.2 Location of Streams

As argued in a previous section, a stream should be placed on its receiver's processor in order to decrease the number of IPCs. However, when a stream is created, its receiver is still unknown. So we place it on the processor local to its creator at its creation, and let it migrate later to the receiver's processor (see Figure 3). Since it is always an object that ultimately receives messages sent to a stream, the stream migrates to the object's processor. When the stream is directly connected to the object, it migrates immediately. If it is connected to an intermediate stream, it waits until the intermediate stream migrates. Suppose that the address of a stream in a processor is announced to an object in another processor and that the stream has not yet migrated. If the object sends messages to the stream, two series of IPCs occur, one for sending them to the stream, and another for the migration process of the stream. We eliminate the former series by putting the messages into a new stream created on the same processor as the sending object and connecting the new stream to the original. With this strategy, and assuming that objects do not migrate, all messages except those used for implementing the strategy are transferred between processors at most once. In the next section, a more detailed description of the stream migration is given.

6.3 Migration Procedure

In the following description, all streams are supposed to reside in different processors until they move. Operations within a processor are trivial, and are assumed to cost much less than ones involving IPCs. It is also supposed that streams are connected in a processor other than that of the receiving object. Otherwise, the migration procedure is so simple that it becomes identical to an ordinary send without migration.

1. A stream is placed on the same processor as its creator object.

2. When the stream is connected, a control message named where is sent to the specified receiver. The control message has a pointer to the stream and a tag showing the type of the stream, i.e., either an M node or an A node.

3.
The where causes the following actions according to the type of the receiver. A stream before its migration handles the control message as if it were an ordinary message; that is, it is put into the receiver's queue. It will be transferred again when the receiver eventually migrates, and will be forwarded to another receiver, which should cause the following case. An object, or a stream after its migration, creates a new node of the type indicated by the tag in the control message, and reports the address of the new node by a control message named here to the stream waiting for the reply. When the type of the immigrant and the receiver is the same, the receiver creates no new node, and reports its own address.

4. When the stream receives the here, it migrates to the specified new residence, in one of the following manners according to its type. M node: It sends all messages in its queue to the new residence. If it has not been closed yet, it leaves in the former residence a pointer forwarding to the new location. The original residence will be reclaimed when it is closed. A node: In addition to the procedure for the M node, the stream to be appended to the migrating one is connected to the same receiver at the moment when this A node is closed. That is, a new where with a pointer to the stream is sent to the receiver.

6.4 Migration Cost

Each stream creates a where. It is transferred between processors twice, once when the stream is connected, and once when its receiver migrates. The second transfer does not happen if the receiver is an already moved stream or an object. Suppose a channel connected to an object consists of n streams, of which nd are connected directly to the object; then the number of IPCs for where's is n + (n - nd). A here is created in correspondence with a where, and is transferred between processors once. For all here's, n IPCs occur. Migration itself brings about no transfer of control messages, so the number of IPCs required for migration is n + (n - nd) + n = 3n - nd. Closing a stream requires another kind of control message. We call it close. Each stream sends its reader one close when closed. This adds up to n close's, requiring n IPCs. Ordinary messages are always transferred between processors once. If there are m ordinary messages to be sent, then, in total, (3n - nd) + n + m transfers between processors occur.

How many IPCs occur for stream communication if streams don't move? Neither where's nor here's are created. A close is still created for each stream. The number of times ordinary messages and close's are transferred depends on the structure of the channel. A channel is a tree having streams as its nodes. Suppose the i-th node receives mi messages, and its depth is di, where the depth of a node is the number of streams in the path from the leaf to the root. For example, the depth of a leaf directly connected to an object is 2. Then messages sent to the i-th leaf are transferred di - 1 times, and the total number of transfers will be:

   Σ_{i=1}^{n} (di - 1)(mi + 1)

The condition under which it requires fewer IPCs to implement stream communication with migrating streams than without them is:

   Σ_{i=1}^{n} (di - 1)(mi + 1) > (3n - nd) + m + n

This can be rewritten as:

   Σ_{i=1}^{n} (di - 2)(mi + 1) > 3n - nd

Since di cannot be smaller than 2, di - 2 never becomes negative. The next term, mi + 1, is the number of messages sent from a node, including a close. The last term, 3n - nd, is the number of control messages used to move all streams. The above condition says that if the channel has some intermediate nodes between the root and the leaves, and more than a certain number of messages are sent through them, then stream migration is beneficial. Conversely, if all streams in a channel are directly connected to an object, or too few messages are sent, streams should not be moved. The next section discusses some optimization based on detecting those cases.
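To make the arithmetic above concrete, the following Python sketch evaluates both sides of the break-even comparison under the cost model just described. The function names and the example channel (a chain of four streams with ten messages entering at the far end) are ours and purely hypothetical.

   # Hypothetical numbers plugged into the cost model of Section 6.4.
   # depth[i] >= 2 is the number of streams on the path from stream i to the
   # root; msgs[i] is the number of ordinary messages entering at stream i;
   # nd is the number of streams connected directly to the receiving object.

   def ipcs_without_migration(depth, msgs):
       # every message, plus one close per stream, crosses depth-1 hops
       return sum((d - 1) * (m + 1) for d, m in zip(depth, msgs))

   def ipcs_with_migration(n, nd, m):
       # where: n + (n - nd);  here: n;  close: n;  ordinary messages: m
       return (3 * n - nd) + n + m

   depth = [5, 4, 3, 2]        # a chain of four streams, deepest one a leaf
   msgs  = [10, 0, 0, 0]       # ten messages enter at the far end
   n, nd, m = len(depth), 1, sum(msgs)

   print(ipcs_without_migration(depth, msgs))   # (4)(11)+(3)(1)+(2)(1)+(1)(1) = 50
   print(ipcs_with_migration(n, nd, m))         # (3*4 - 1) + 4 + 10 = 25

In this example the left-hand side of the rewritten condition is 36 and the right-hand side is 11, so migration pays off, which the direct IPC counts (50 without migration versus 25 with it) confirm.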
6.5 Further Optimization

The left-hand side of the above condition becomes zero when all streams are directly connected to an object. When connecting a stream, it is detected at run-time that the receiver is an object; pointers are tagged to indicate the type of the pointed structure. By not moving those streams, the right-hand side is also decreased to zero when the left-hand side becomes zero. When fewer than two messages are sent through a stream, the stream does not migrate, i.e., it does not send out a where. More detailed analysis shows that two is the least number that makes stream migration beneficial. In addition, various minor optimization methods are applied to reduce the delay of the first message's delivery. For example, the first message is sent with a where, packed together in one IPC, if it is available when the where is sent out. When a where is received by a stream that only bridges two other streams, receiving no ordinary messages, it immediately forwards this where instead of sending out a new one. Such a stream can be distinguished by checking its reference count when it receives a where.

7 Evaluation

In order to evaluate the performance of the implementation described in the sections so far, we measured the following three values:

• Delay time
• IPC load
• Total elapsed time for entire execution of a program

As a control, we measured against an A'UM-90 system which does not migrate streams. We call this system NO_WHERE, and the system that performs the migration WHERE, in the following sections. Programs used in the measurement of delay time and IPC load form a linear channel, a long chain of streams without any branches, and send messages along the channel. Figure 5 shows the objects' configuration. Each PE creates one stream on itself. When the PE receives a message connect, it connects its stream to the next lower stream on another PE. Also, the first PE releases several messages named hello at its stream. The connect circulates around the PEs, one at a time, through a channel different from that through which hellos flow. Two programs which differ in the direction of the circulation were used. We call one of them DOWNSTREAM, in which a connect flows in the same direction as hellos, and the other UPSTREAM, in which a connect flows against hellos. The connect in Figure 5 is flowing UPSTREAM. The time was measured from after the release of the hellos and a connect until the arrival of the last hello.

Figure 4: Objects' configuration

Figure 5: Delay time
7.1 Delay time

Figure 6 shows the result of the delay time measurement, sending up to ten messages down a channel of length ten. The values are elapsed time measured on an unloaded Sequent Symmetry, using 10 PEs. They include CPU time and idle time during which PEs were waiting for messages. In the DOWNSTREAM case, delay time in the WHERE is longer than in the NO_WHERE by at most 1000 msec, as expected. In the UPSTREAM case, however, messages arrive earlier in the WHERE than in the NO_WHERE by 200 msec. The reason for this reversal is that the migration of streams took place concurrently with the circulation of the connect in the WHERE. After the connect reached the uppermost PE, hellos were sent directly to their final receiver in the WHERE, while, in the NO_WHERE, they flowed through every PE having a part of the channel. From these results, we can expect that the difference in the delay time of the WHERE and the NO_WHERE would be smaller than 1000 msec when the connections of a channel's constituent streams occur in a varying order. Also, note that the delay time for the first message in the WHERE is much smaller than those for the later messages. This results from the optimization, mentioned in Section 6.5, of sending a where and the first message together whenever possible.

Figure 6: IPC load

7.2 IPC load

Figure 7 shows the result of the IPC load measurement, sending up to 200 messages down a channel of length 500. The values are CPU time measured on an unloaded Sequent Symmetry, using 10 PEs. The results confirm that the IPC load in the NO_WHERE eventually becomes much larger than that in the WHERE as the number of released messages grows.

7.3 Total elapsed time

Figure 8 and Figure 9 show the results of measurements using a program PRIME, which enumerates prime numbers by the generate-and-test method. The graphs in Figure 8 are obtained from 10 PEs in a Symmetry, and those in Figure 9 are from an isolated Ethernet network consisting of two Sun Sparc Stations. The top two graphs in each figure are elapsed time, the next two are average total CPU time for a PE, and the other one is CPU time for a PE, spent only for processing other than IPC. The last one is estimated from the CPU time for execution using 1 PE, divided by the number of PEs, i.e., 10. The graphs for elapsed time show that the WHERE is faster than the NO_WHERE. On a Symmetry, the entire speedup can be explained by the decrease of CPU time. There is up to 40% improvement in CPU time spent for IPC, which can be read from the difference between the total and non-IPC portions of CPU time. On Ethernet, the speedup is much larger than the decrease of CPU time, due to much slower communication.

Figure 7: PRIME on shared memory

Figure 8: PRIME on Ethernet
8 Conclusion

Streams in CLLs are difficult to implement efficiently for two reasons:

1. Message buffers are not always placed on their readers' processor, because an arbitrary number of readers are allowed for a buffer. Therefore, interprocessor reading from the buffer takes place with two IPCs, instead of the one required for writing into it.

2. A stream is represented by cascaded message buffers, which CLLs don't treat as a single body. Consequently, even if these buffers are placed on their reader's processor, their address has to be repeatedly sent to their writer.

This is not the case for A'UM. A'UM has abstract stream communication, whose implementation is left as the language system's responsibility. In addition, every stream is restricted to have only one reader. So streams in A'UM can be implemented more efficiently than ones in CLLs. A'UM-90 moves a stream to its reader's processor, and saves about half of the IPCs required in CLLs. In spite of the migration, it delivers the first message through the stream with small delay. A prime number generator program runs up to 40% faster in A'UM-90 than in the system that does not migrate streams.

While the optimization method given in this paper tries to reduce the number of IPCs for a given distribution of objects, it is also important to find the best distribution of objects. Of course, such methods have to balance the amount of IPC against the exploitation of parallelism.

Acknowledgments

We thank Shinji Yanagida and Toshio Tange of NEC Scientific Information System Development for developing the A'UM-90 abstract-machine emulator.

References

[Furukawa et al. 1984] K. Furukawa, A. Takeuchi, S. Kunifuji, H. Yasukawa, M. Ohki, K. Ueda, Mandala: A Logic Based Knowledge Programming System, Proc. FGCS'84, November 1984.
[Kahn et al. 1986] K. Kahn, E. D. Tribble, M. S. Miller, D. G. Bobrow, Objects in Concurrent Logic Programming Languages, Proc. OOPSLA'86, September 1986.
[Kahn et al. 1989] K. Kahn, Objects - a fresh look, Proc. Third European Conf. on Object-Oriented Programming, Cambridge University Press, July 1989.
[Saraswat et al. 1990] V. A. Saraswat, K. Kahn, J. Levy, Janus: A step towards distributed constraint programming, North American Logic Programming Conference, October 1990.
[Shapiro and Takeuchi 1983] E. Shapiro, A. Takeuchi, Object-oriented Programming in Concurrent Prolog, New Generation Computing, 1, 1983.
[Ueda 1985] K. Ueda, Guarded Horn Clauses, Technical Report TR-103, ICOT, June 1985.
[Yoshida and Chikayama 1988] K. Yoshida, T. Chikayama, A'UM: A Stream-Based Object-Oriented Language, Proc. FGCS'88, November 1988.

Message-Oriented Parallel Implementation of Moded Flat GHC

Kazunori Ueda, Institute for New Generation Computer Technology, 4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan, ueda@icot.or.jp
Masao Morita, Mitsubishi Research Institute, 3-6, Otemachi 2-chome, Chiyoda-ku, Tokyo 100, Japan, morita@asdal.mri.co.jp

Abstract

We proposed in [Ueda and Morita 1990] a new, message-oriented implementation technique for Moded Flat GHC that compiled unification for data transfer into message passing. The technique was based on constraint-based program analysis, and significantly improved the performance of programs that used goals and streams to implement reconfigurable data structures. In this paper we discuss how the technique can be parallelized. We focus on a method for shared-memory multiprocessors, called the shared-goal method, though a different method could be used for distributed-memory multiprocessors.
Unlike other parallel implementations of concurrent logic languages, which we call process-oriented, the unit of parallel execution is not an individual goal but a chain of message sends caused successively by an initial message send. Parallelism comes from the existence of different chains of message sends that can be executed independently or in a pipelined manner. Mutual exclusion based on busy waiting and on message buffering controls access to individual, shared goals. Typical goals allow last-send optimization, the message-oriented counterpart of last-call optimization. We are building an experimental implementation on Sequent Symmetry. In spite of the simple scheduling currently adopted, preliminary evaluation shows good parallel speedup and good absolute performance for concurrent operations on binary process trees.

1. Introduction

Concurrent processes can be used both for programming computation and for programming storage. The latter aspect can be exploited in concurrent logic programming to program reconfigurable data structures using the following analogy:

   records  <-->  (body) goals
   pointers <-->  streams (implemented by lists)

where a (concurrent) process is said to be implemented by a multiset of goals.

   nt([], _, _, L, R) :- true | L=[], R=[].
   nt([search(K,V)|Cs], K,  V1, L, R) :- true |
       V=V1, nt(Cs, K, V1, L, R).
   nt([search(K,V)|Cs], K1, V1, L, R) :- K > K1 |
       R=[search(K,V)|R1], nt(Cs, K1, V1, L, R1).
   nt([update(K,V)|Cs], K,  _,  L, R) :- true |
       nt(Cs, K, V, L, R).
   nt([update(K,V)|Cs], K1, V1, L, R) :- K > K1 |
       R=[update(K,V)|R1], nt(Cs, K1, V1, L, R1).
   t([]) :- true | true.
   t([search(_,V)|Cs]) :- true | V=undefined, t(Cs).
   t([update(K,V)|Cs]) :- true | nt(Cs,K,V,L,R), t(L), t(R).

   Program 1. A GHC program defining binary search trees as processes

An advantage of using processes for this purpose is that it allows implementations to exploit parallelism between operations on the storage. For instance, a search operation on a binary search tree (Program 1), given as a message in the interface stream, can enter the tree soon after the previous operation has passed the root of the tree. Programmers do not have to worry about mutual exclusion, which is taken care of by the implementation. This suggests that the programming of reconfigurable data structures can be an important application of concurrent logic languages. (The verbosity of Program 1 is a separate issue which is out of the scope of this paper.)

Processes as storage are almost always suspending, but should respond quickly when messages are sent. However, most implementations of concurrent logic languages have not been tuned for processes with this characteristic. In our earlier paper [Ueda and Morita 1990], we proposed message-oriented scheduling of goals for sequential implementation, which optimizes goals that suspend and resume frequently. Although our primary goal was to optimize storage-intensive (or, more generally, demand-driven) programs, the proposed technique worked quite well also for computation-intensive programs that did not use one-to-many communication. However, how to utilize the technique in parallel implementation was yet to be studied.

Parallelization of message-oriented scheduling can be quite different from parallelization of ordinary, process-oriented scheduling. An obvious way of parallelizing process-oriented scheduling is to execute different goals on different processors.
In message-oriented scheduling, the basic idea should be to execute different message sends on different processors, but many problems must be solved as. to the mapping of computation to processors, mutual exclusion, and so on. This paper reports the initial study on the subject. The rest of the paper is organized as follows: Section 2 reviews Moded Flat GHC, the subset of GHC we are going to implement. Section 3 reviews message-oriented scheduling for sequential implementation. Section 4 discusses how to parallelize messageoriented scheduling. Of the two possible methods suggested, Section 5 focuses on the shared-goal method suitable for shared-memory multiprocessors and discusses design issues in more detail. Section 6 shows the result of preliminary performance evaluation. The readers are assumed to be familiar with concurrent logic languages [Shapiro 1989]: 2. Moded Flat GRC and Constraint-Based Program Analysis Moded Flat GHC [Ueda and Morita 1990] is a subset of GHC that introduces a mode system for the compiletime global analysis of dataflow caused by unification. Unification executed in clause bodies can cause bidirectional dataflow in general, but mode analysis tries to guarantee that it is assignment to an uninstantiated variable effectively and does not fail (except due to occur check). Our experience with GHC and KLI [Ueda and Chikayama 1990] has shown that the full functionality of bidirectional unification is seldom used and that programs using it can be rewritten rather easily (if not automatically) to programs using unification as assignment. These languages are indeed used as generalpurpose concurrent languages, which means that it is very important to optimize basic operations such as unification and to obtain machine codes close to those obtained from procedural languages. For global compile-time analysis to be practical, it is highly desirable that individual program modules can be analyzed separately in such way that the results can be merged later. The mod~ system of Moded Flat GHCis thus constraint-based; the mode a of a whole program can be determined by accumulating the mode constraints obtained separately from the syntactic analysis of each program clause. Another advantage of the constraint-based system is that it allows programmers to declare some of the mode constraints, in which case the analysis works as mode checking as well as mode inference. The modularity of the analysis was brought by the rather strong assumption of the mode system: whether the function symbol at some position (possibly deep in a data structure) of a goal 9 is determined by 9 or by other goals running concurrently is determined solely by that position specified by a path, which is defined as follows. Let Pred be the set of predicate symbols and Fun the set of function symbols. For each p E Pred with the arity n p, let Np be the set {1,2, ... ,np}. N f is defined similarly for each f E Fun. Now the sets of paths Pt (for terms) and P a (for atoms) are defined using disjoint union as: Pt = ( L fEFun N f)* , Pa = ( L N p ) x Pt. pEPred An element of Pa can be written as a string (p, i)(h, j1) ... (fn,jn), that is, it records the predicate and the function symbols on the way as well as the argument positions selected. A mode is a function from Pa to the set {in, out}, which means that it assigns either of in or out to every possible position of every possible instance of every possible goal. 
Whether some position is in or out can depend on the predicate and function symbols on the path down to that position. The function can be partial, because the mode values of many uninteresting positions that will not come to exist can be left undefined. Mode analysis checks if every variable generated in the course of execution will have exactly one out occurrence (occurrence at an out position) that can determine its top-level value, by accumulating constraints between 'the mode values of different paths. Constraint-based analysis can be applied to analyzing other properties of programs as well. For instance, if we can assume that streams and non-stream data structures do not occur at the same position of different goals, we can try to classify all the positions into (1) those whose top-level values are limited to the list constructors (cons and nil) and (2) those whose top-level values are limited to symbols other than the list constructors, which is the simplest kind of type inference. Other applications include the static identification of 'singlereference' positions, namely positions whose values are not read by more than one goal and hence can be discarded or destructively updated after use. This could replace the MRB (multiple-reference bit) scheme [Chikayama and Kimura 1987], a runtime scheme 801 (p) (q) sender's goal record receiver's goal record adopted in current KL1 implementations for the same purpose. 3. Message-Oriented (Sequential) Implementation In a process-oriented sequential implementation of concurrent logic languages, goals ready for execution are put in a queue (or a stack or a deque, depending on the scheduling). Once a goal is taken from the queue, it is reduced as many times as possible, using last-call optimization, until it suspends or it is swapped out. A suspended goal is hooked on the uninstantiated variable( s) that caused suspension, and when one of the variables is instantiated, it is put back into the queue. Message-oriented implementation has much in common with process-oriented implementation, but differs in the treatment of stream communication: It compiles the generation of stream elements into procedure calls to the consumer of the stream. A stream is an unbounded buffer of messages in principle, but message-oriented implementation tries to reduce the overhead of buffering and unbuffering by transferring control and messages simultaneously to the consumer whenever possible. To this end, it tries to schedule goals so that whenever the producer of a stream sends a message, the consumer is suspending on the stream and is ready to handle the message. Of course, this is not always possible because we can write a program in which a stream must act as a buffer; messages are buffered when the consumer is not ready to handle incoming messages. Process-oriented implementation tries to achieve good performance by reducing the frequency of costly goal switching and taking advantage of last-call optimization. Message-oriented implementation tries to reduce the cost of each goal switching operation and the cost of data transfer between goals. Suppose two goals, p and q, are connected by a stream sand p is going to send a message to q that is suspending on s. Message-oriented implementation represents s as a two-field communication cell that points to (1) the instruction in q's code from which the processing of q is to be resumed and (2) q's goal record containing its arguments (Fig. 1). 
(Throughout the paper, we assume that a suspended goal will resume its execution from the instruction following the one that caused suspension, not from the first instruction of the predicate.) To send a message m, p first loads m on a hardware register called the communication register, changes the current goal to the one pointed to by the communication cell of s, and calls the code pointed to by the communication cell of s. The goal q gets m from the communication register and may send other messages in its turn. Control returns to p when all the message sends caused directly or indirectly by m put~get ~----~mes.~mes. ~----~ comm. reg. (hardware) Fig. 1. Immediate message send code for buffering (p) sender's goal record (q) CJ comm. reg. (hardware) Fig. 2. Buffered message send have been processed. However, if m is the last message which p can send out immediately (i.e., without waiting for further incoming messages), control need not return to p but can go directly to the goal that has outstanding message sends. This is called last-send optimization, which we shall see in Section 5.4 in more detail. We have observed in GHCjKL1 programming that the dominant form of interprocess communication is one-to-one stream communication. It therefore deserves special treatment, even though other forms of communication such as broadcasting and multicasting become a little more expensive. One-to-many communication is done either by the repeated sending of messages or by using non-stream data structures. Techniques mentioned in Section 2 are used to analyze which positions of a predicate and which variables in a program are used for streams and to distinguish between the sender and the receiver( s) of messages. When a stream must buffer messages, the communication cell representing the stream points to the code for buffering and the descriptor of a buffer. The old entries of the communication cell are saved in the descriptor (Fig. 2). In general, a stream must buffer incoming messages when the receiver goal is not ready to handle them. The following are the possible reasons [Ueda and Morita 1990]: 802 Fig. 3. Binary search tree as a process (1) (selective message receiving) The receiver is waiting for a message from other input streams. (2) The receiver is suspending on non-stream data (possibly the contents of messages). (3) The sender of a message may run ahead of the receIver. (4) When the receiver r belongs to a circular process structure, a message m sent by r may possibly arrive at r itself or may cause another message to be sent back to 1'. However, unless m has been sent by last-send optimization, r is not ready to receive it. The receiver examines the buffer when the reason for the buffering disappears, and handles messages (if any) in it. Process-oriented implementation often caches (part of) a goal record on hardware registers, but this should not be done in message-oriented implementation III which process switching takes place frequently. 4. Parallelization How can we exploit parallelism from message-oriented implementation? Two quite different methods can be considered: Distributed- goal method. Different processors take charge of different goals, and each processor handles messages sent to the goals it is taking charge of. Consider a binary search tree represented using goals and streams (Fig. 3) and suppose three processors take charge of the three different portions of the tree. 
Each processor performs message-oriented processing within its own portion, while message transfer between portions is compiled into inter-processor communication with buffering. Shared- goal method. All processors share all the goals. There is a global, output-restricted de que [Knuth 1973] of outstanding work to be done in parallel, from which an idle processor gets a new job. The job is usually to execute a non-unification body goal or to send a message, the latter being the result of compiling a unification body goal involving streams. The message send will usually cause the reduction of a suspended goal. If the reduction generates another unification goal that has been compiled into a message send, it can be performed by the same processor. Thus a chain of message sends is formed, and different chains of message sends can be performed in parallel as long as they do not interfere with each other. In the binary tree example, different processors will take care of different operations sent to the root. A tree operation may cause subsequent message sends inside the tree, but they should be performed by the same processor because there is no parallelism within each tree operation. Unlike the shared-goal method, the distributedgoal method can be applied to distributed-memory multiprocessors as well as shared-memory ones to improve the throughput of message handling. On shared-memory multiprocessors, however, the sharedgoal method is more advantageous in terms of latency (i.e., responses to messages), because (1) it performs no inter-processor communication within a chain of message sends and (2) good load balancing can be attained easily. The shared-goal method requires a locking protocol for goals as will be discussed in Section 5.1, but it enables more tightly-coupled parallel processing that covers a wider range of applications. Because of its greater technical interest, the rest of the paper is focused on the shared-goal method. 5. Shared-Goal Implementation In this section, we discuss important technicalities in implementing the shared-goal method. We explain the method and the intermediate code mainly by examples. Space limitations do not allow the full description of the implementation, though we had to solve a number of subtle problems related to concurrency control. 5.1 Locking of Goals Consider a goal p(Xs, Ys) defined by the following single clause: p([AIXs1],Ys) :- true I Ys=[AIYs1], p(Xs1,Ys1). In the shared-goal method, different messages in the input stream XS may be handled by different processors that share the goal p (Xs, Ys). Any processor sending a message must therefore try to lock the goal record (placed in the shared memory) of the receiver first and obtain the grant of exclusive access to it. The receiver must remain locked until it sends a message through Ys and restores the dormant state. The locking operation is important in the following respect as well: In message-oriented implementation, the order of the elements in a stream is not represented 803 spatially as a list structure but as the chronological order of message sends. The locking protocol must therefore make sure that when two messages, 0:' and /3, are sent in this order to p (Xs, Ys), they are sent to the receiver of Ys in the same order. This is guaranteed by locking the receiver of Ys before p(Xs, Ys) is unlocked. 5.2 Busy Wait vs. Suspension How should a processor trying to send a message wait until the receiver goal is unlocked? 
The two extreme possibilities are (1) to spin (busy-wait) until unlocked and (2) to give up (suspend) the sending immediately and do some other work, leaving a notice to the receiver that it has a message to receive. We must take the following observations into account here: (a) The time each reduction takes, namely the time required for a resumed goal to restore the dormant state, is usually short (several tens of CISC instructions, say), though it can be considerably long sometimes. (b) As explained in Section 5.1, a processor may lock more than one goal temporarily upon reduction. This means that busy wait may cause. deadlock when goals and streams form a circular structure. Because busy wait incurs much smaller overhead than suspension, Observation (a) suggests that the processor should spin for a period of time within which most goals can perform one reduction. However, it should suspend finally because of (b). Upon suspension, a buffer is prepared as in Fig. 2, and the unsent message is put in it. Subsequent messages go to the buffer until the receiver has processed all the messages in the buffer and has removed the buffer. As is evident from Fig. 2, no overhead is incurred to check if the message is going to the buffer or to the receiver. The receiver could notice the existence of outstanding messages by checking its input streams upon each reduction, but it incurs overhead to (normal) programs which do not require buffering. So we have chosen to avoid this overhead by letting the sender spawn and schedule a special routine, called the retransmitter of the messages, when it creates a buffer. The retransmitter is executed asynchronously with the receiver. When executed, it tests if the receiver has been unlocked, in which case it sends the first message in the buffer and re-schedules itself. For the shared resources other than goals (such as logic variables and the global deque), mutual exclusion should be attained by busy wait, because access to them takes a short period of time. On the other hand, synchronization on the values of non-stream variables (due to the semantics of GHC) should be implemented using suspension as usual. 5.3 Scheduling Shared-goal implementation exploits parallelism between different chains of message sends that do not interfere with each other. For instance, a binary search tree (Fig. 3) can process different operations on it in a pipelined manner, as long as there is no dependence between the operations (e.g., the key of a search operation depending on the result of the previous search operation). When there is dependency, however, parallel execution can even lower the performance because of synchronization overhead. Another example for which parallelism does not help is a demand-driven generator of prime numbers which is made up of cascaded goals for filtering out the multiples of prime numbers. The topmost goal receiving a new demand from outside filters out the multiples of the prime computed in response to the last demand. However, until the last demand has almost been processed, the topmost goal doesn't know what prime's multiples should be filtered out, and hence will be blocked. These considerations suggest that in order to avoid ineffective parallelism, it is most realistic to let programmers specify which chains of message sends should be done in parallel with others. 
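As an illustration of the sender-side behaviour described in Sections 5.2 and 5.3, the following Python sketch shows one way to bound busy-waiting and then fall back to buffering plus a retransmitter placed on the global deque. It is a thread-level model under our own assumptions (the constant SPIN_LIMIT, the Goal and handler shapes are invented), not the actual native-code implementation, and it glosses over some races that the real system resolves with its locking protocol.

   # Thread-level model only; the real system uses native code and the
   # xchg instruction, and unlocks goals explicitly at the end of a reduction.
   import threading
   from collections import deque

   SPIN_LIMIT = 1000          # assumed bound on busy-waiting (tuning parameter)
   global_deque = deque()     # shared pool of outstanding work

   class Goal:
       """A shared goal record: a try-lock plus an optional message buffer."""
       def __init__(self, handler):
           self.lock = threading.Lock()
           self.handler = handler      # the goal's resumption code
           self.buffer = None          # created only when sends must back off

   def send(goal, msg):
       """Deliver msg after a bounded spin; otherwise buffer it and schedule
       a retransmitter so the sender can go on with other work."""
       if goal.buffer is not None:     # keep stream order: once buffered,
           goal.buffer.append(msg)     # later messages queue behind it
           return
       for _ in range(SPIN_LIMIT):
           if goal.lock.acquire(blocking=False):
               try:
                   goal.handler(msg)   # runs the receiver's resumption code
               finally:
                   goal.lock.release()
               return
       goal.buffer = deque([msg])
       global_deque.append(lambda: retransmit(goal))

   def retransmit(goal):
       """Retry the oldest buffered message; go back to the tail of the
       deque if the receiver is still locked or messages remain."""
       if goal.lock.acquire(blocking=False):
           try:
               if goal.buffer:
                   goal.handler(goal.buffer.popleft())
               if not goal.buffer:
                   goal.buffer = None  # buffer disappears once drained
                   return
           finally:
               goal.lock.release()
       global_deque.append(lambda: retransmit(goal))

A retransmitter that finds the receiver still locked simply goes back to the tail of the deque, so it is not retried immediately; the next paragraph describes how such work is actually distributed between the global deque and per-processor stacks.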
The simple method we are using currently is to have (1) a global deque for the work to be executed in parallel by idle processors and (2) one local stack for each processor for the work to be executed sequentially by the current processor. Each processor obtains a job from the global deque when its local stack is empty. We use a global deque rather than a global stack because, if the retransmitter of a buffer fails to send a message, it must go to the tail of the deque so it may not be retried soon. Each job in a stack/ deque is uniformly represented as a pair (code, env), where code is the job's entry /resumption point and env is its environment. The job is usually to start the execution of a goal or to resume the execution of a clause body. In these cases, env points to the goal record on which code should work. When the job is to retransmit buffered messages, env points to the communication cell pointing to the buffer. When a clause body has several message sends to be executed in parallel, they will not put in the deque separately. Instead, the current processor executing the clause body performs the first send (and any sends caused by that send), putting the rest of the work to the deque after the first send succeeds in locking the receiver. Then an idle processor will get the rest of the work and perform the second message send (and any sends caused by that send), putting the rest of the rest back to the deque. This procedure is to guarantee the order of messages sent through a single stream by different processors. Suppose two messages, 0:' and /3, are sent by a goal like Xs= [0:' ,/31 Xs 1]. Then we have to make sure that the processor trying to send /3 will 804 not lock the receiver of Xs before the processor trying to send a has done so. 5.4 Reduction This section outlines what a typical goal should do during one reduction, where by 'typical' we mean goals that can be reduced by receiving one message. As an example, consider the distributor of messages defined as follows, p([A!Xs] ,Ys,Zs) :- true! Ys=[A!Ysi] , Zs=[A!Zsi], p(Xs,Ysi,Zsi). where we assume A is known, by program analysis or declaration, to be a non-stream datum. (Otherwise a somewhat more complex procedure is necessary, because the three occurrences of A will be used for one-totwo communication.) The intermediate code for above program is: entry(p/3) rcv _val ue (Ai) get_cr(A4) send_call(A2) put_cr(A4) send_call(A3) } or send_jmp(A3) . execute The Ai's are entries of the goal record of the goal being executed, which contain the arguments of the goal and temporary variables. Other programs may use Xi's, which are (possibly virtual) general registers local to each processor, and GAi's, which are the arguments of a new goal being created. The label entry(p/3) indicates the initial entry point of the predicate p with three arguments. The instruction rcv _val ue (Ai) waits for a message from the input stream on the first argument. If messages are already buffered, it takes the first one and puts it on the communication register. A retransmitter of the buffer is put on the deque if more messages exist; otherwise the buffer is made to disappear (Section 5.7). If no messages are buffered, which is expected to be most probable, rcv_value unlocks the goal record, and suspends until a message arrives. In either case, the itlstruction records the address of the next instruction in the communication cell (or, if the communication cell points to a buffer, in the buffer descriptor). 
The goal is usually suspending at this instruction. The instruction get_cr(A4) saves into the goal record the message in the communication register, which the previous rcv_value(Ai) has received. Then send_call (A2) sends the message in the communication register through the second stream. The instruction send_call(A2) tries to lock the receiver of the second stream and if successful, transfers control to the receiver. If the receiver is busy for a certain period of time or it isn't busy but is not ready to handle the message, the message is buffered. The instruction' send_call does not unlock the current goal record. When control eventually returns, put_cr(A4) restores the communication register and send_call(A3) sends the next message. When control returns again, execut e performs the recursive call by going back to the entry point of the predicate p. Then the rcv _val ue (Ai) instruction will either find no buffered messages or find some. In the former case, rcv_value(Ai) obviously suspends. In the latter case, a retransmitter of the buffer must have been scheduled, and so rcv_value(Ai) can suspend until the retransmitter sends a message. Moreover, the resumption address of the rcv _ val ue (Ai) instruction has been recorded by its previous execution. Thus in either case, execute effectively does nothing but unlocking the current goal. This is why last-send optimization can replace the last two instructions into a single instruction, send_jmp(A3). The instruction send_jmp(A3) locks the receiver of the third stream, unlocks the current goal, and transfers control to the receiver without stacking the return address. Last-send optimization enables the current goal to receive the next message earlier and allows the pipelined processing of message sends. Note that with last-send optimization, the rcv _ val ue (Ai) instruction will be executed only once when the goal starts execution. The instructions executed for each incoming message are those from get_cr(A4) through send_ jmp(A3) . The above instruction sequence performs the two message sends sequentially. However, a variant of send_call called send_fork stacks the return address on the global deque instead of the local stack, allowing the continuation to be processed in parallel. Note that send_fork leaves the continuation to another processor rather than the message send itself for the reason explained in Section 5.3. We have established a code generation scheme for general cases including the spawning and the termination of goals (Section 5.5), explicit control of message buffering (Section 5.6), and suspension on nonstream variables. Several optimization techniques have been developed as well, for instance for goals whose input streams are known to carry messages of limited forms (e.g., non-root nodes of a binary search tree (Fig. 3)). Finally, we note that although processoriented scheduling and message-oriented scheduling differ in the flow of control, they are quite compatible in the sense that an implementation can use both in running a single program. Our experimental implementation has actually been made by modifying a process-oriented implementation. 5.5 An Example Here we give the intermediate code of a naIve reverse 805 The program: (1) nreverse ([H IT] ,0) (2) nreverse ([] , 0) (3) append([IIJ] ,K,L) (4) append( [] , K, L) true true true true append(01,[H] ,0), nreverse(T,01). 0= [] . L=[IIM], append(J,K,M). K=L. 
entry(nreverse/2) rcv_value(A1) receive a message from the 1st arg (the program is usually waiting for incoming messages here) check_not_eos(101) if the message is eos then collect the current comm. cell and goto 101 get_cr(X3) save the message H in the comm. reg. to the register of the current P E commit Clause 1 is selected (no operation) put_cc(X4) create a comm. cell with a buffer push_value(X3) put the message H into the buffer push_eos put eos into the buffer g_setup(append/3,3) create a goal record for S args and record the name put_value(A2,GA3) set the Srd arg of append to a put_value(X4,GA2) set the 2nd arg of append to [H] put_com_variable(A2,GA1) create a locked variable 01 and set the 2nd arg of nreverse and the 1st arg of append to the pointer to 01, assuming that append will turn 01 into a comm. cell soon g_call execute append until it suspends return unlock the current goal and do the job on the local stack top label(101) commit Clause 2 is selected (no operation) send_call(A2) send eos in the comm. reg. to the receiver of a proceed deallocate the goal record and return entry(append/3) deref(A3) rcv_value(A1) check_not_eos(102) commit sendn_jmp(A3) label(102) commit send_unify_jmp(A2,A3) dereference the Srd arg L receive a message from the 1st argo if the message is eos then collect the current comm. cell and goto 102 Clause S is selected (no operation) send the received message to the receiver of L, where 'n' means that the instruction assumes that L has been dereferenced Clause 4 is selected (no operation) make sure that messages sent through K are forwarded to the receiver of L, and return Fig. 4. Intermediate code for naIve reverse program (Fig. 4). In order for the code to be almost self-explanatory, some comments are appropriate here. Suppose the messages ml, ... , mn are sent to the goal nreverse (In, Out) through In, followed by the eos (end-of-stream) message indicating that the stream is closed. The nreverse goal generates one suspended append goal for each mi, creating the structure in Fig. 5. The ith append has as its second argument a buffer with two messages, mi and eos. The final eos message to nreverse causes the second clause to forward the eos to the most recent append goal holding m n . The append holding m n , in response, lets different (if available) processors send the two buffered messages mn and eos to the append holding m n -1. The message mn is transferred all the way to the append holding m1 and appears in Out. The following eos causes the next append goal to send m n -1 and another eos. The performance of nrevers e hinges on how fast each append goal can transfer messages. For each incoming message, an append goal checks if the message is not eos and then transfers both the message and control to the receiver of the output stream. The message remains on the communication register and need not be loaded or stored. The send_unify_jmp(1'1,r2) instruction is used for the unification of two streams. Arrangements are made so that next time a message is sent through 1'1, the sender is made to point directly to the communication cell of 1'2' If the stream 1'1 has a buffer (which is the case with nreverse), the above redirection is made to happen after all the contents of the buffer are sent to the receiver of 1'2. It is worth noting that the multiway merging of streams can transfer messages as efficiently as append. 806 Fig. 5. 
Process structure being created by nreverse([m1, ..., mn], Out)

5.6 Buffering

As discussed in Section 5.2, the producer of a stream s creates a buffer when the receiver is locked for a long time. However, this is a rather unusual situation; a buffer is usually created by s's receiver when it remains unready to handle incoming messages after it has unlocked itself. Here we re-examine the four reasons for buffering given in Section 3:

(1) Selective message receiving. This happens, for instance, in a program that merges two sorted streams of integers into a single sorted stream:

   omerge([A|X1], [B|Y1], Z) :- A <  B | Z=[A|Z1], omerge(X1, [B|Y1], Z1).
   omerge([A|X1], [B|Y1], Z) :- A >= B | Z=[B|Z1], omerge([A|X1], Y1, Z1).

Two numbers, one from each input stream, are necessary for a reduction. Suppose the first number A arrives through the first stream. Then the goal omerge checks if the second stream has a buffered value. Since it doesn't, the goal cannot be reduced. So it records A in the goal record and changes the first stream to a buffer, because it has to wait for another number B to come through the second stream. Suppose B (> A) arrives and the first clause is selected. Then the second stream should become a buffer and B will be put back. The first stream, now being a buffer, is checked and a retransmitter is stacked if it contains an element; otherwise the buffer is made to disappear. Finally A is sent to the receiver of the third stream. The above procedure is admittedly complex, but this program is indeed one of the hardest ones to execute in a message-oriented manner. A simpler example of selective message receiving appears in the append program in Section 5.5; its second input stream buffers messages until the non-recursive clause is selected.

(2) Suspension on non-stream data. The most likely case is suspension on the content of a message (e.g., the first argument of an update message to a binary search tree). When a goal receives from a stream s a message that is not sufficiently instantiated for reduction, it changes s to a buffer and puts the message back into it. A retransmitter is hooked on the uninstantiated variable(s) that caused suspension, and will be invoked when any of them are instantiated.

(3) The sender of a stream running ahead of the receiver. It is not always possible to guarantee that the sender of a stream does not send a message before the receiver commences execution, though the scheduling policy tries to avoid such a situation. The simplest solution to this problem is to initialize each stream to an empty buffer. However, creating and collecting a buffer incurs a certain overhead, while a buffer created for the above reason will receive no messages in most cases. So the current scheme defers the creation of a real buffer until a message is sent. Moreover, when the message is guaranteed to be received soon, the put_com_variable instruction (Fig. 4) is generated and lets the sender busy-wait until the receiver executes rcv_value.

(4) Circular process structure. When the receiver sends more than one message in response to an incoming message, sequential implementation must buffer subsequent incoming messages until the last message is sent out. In parallel implementation, the same effect is automatically achieved by the lock of the goal record, and hence the explicit control of buffering is not necessary.

The retransmission of a buffer created due to the reason (1) or (3) is explicitly controlled by the receiver.
When a buffer is created due to the reason (2) or by the sender of a stream, a retransmitter of the buffer is scheduled asynchronously with the receiver. 5.7 Mutual Exclusion of Communication Cells The two fields of a communication cell representing a stream may be updated both by the sender and the receiver of the stream. For instance, the sender may create a buffer and connect it to the cell when the receiver is locked for a certain period of time. The receiver may set or update the cell by the rcv_value instruction, may create or remove a buffer for the cell when buffering becomes necessary or unnecessary, may execute send_unify_jmp and connect the stream to another, and may move or delete the goal record of its own. This of course calls for some method of mutual exelusion for communication cells. The simplest solution would be to lock a communication cell whenever updating or reading it, but locking both a goal record and a communication cell for each message send would be too costly. It is highly desirable that an ordinary message send, which reads but does not update a communication cell, need not lock the communication cell. However, without locking upon reading, the following sequence can happen and inconsistency arises: (1) the sender follows the pointer in the second field (the environment) of the communication cell, (2) the receiver starts and completes the updating of the communication cell (under an appropriate locking protocol), and then 807 Table 1. Performance Evaluation (in seconds) Language Processing GHC 1 1 2 3 4 5 6 7 8 C (recursion) C (iteration) cc-O cc-o PE (no locking) PE PEs PEs PEs PEs PEs PEs PEs binary process tree (5000 operations) (search) (update) 1.25 1.38 0.78 0.55 0.44 0.36 0.33 0.33 0.33 1.83 2.10 1.15· 0.81 0.63 0.53 0.46 0.39 0.36 0.71 0.32 0.72 0.35 naIve reverse (1000 elements) 2.23 3.27 2.43 1. 71 1.33 1.10 0.96 0.85 0.77 (225 (154 (207 (294 (377 (456 (523 (591 (652 kRPS)* kRPS) kRPS) kRPS) kRPS) kRPS) kRPS) kRPS) kRPS) (* kilo Reductions Per Second) (3) the sender locks the (wrong) record r (the goal record for the receiver or a buffer for the communication cell) obtained in Step (1) and calls the code pointed to by the first field (the code) of the updated communication cell. This can be avoided by not letting the receiver update the second field of the communication cell. The receiver instead stores into the record r the pointer p to the right record. The receiver accordingly sets the first field of the communication cell to the pointer to a code sequence (to be called by the sender in Step (3)) that notifies the sender of the existence of the pointer p. The sender can now access the right record pointed to by p via the wrong record r, but it is still desirable that p is finally written into the second field of the communication cell so that the right record can be accessed directly next time. This update of the communication cell must be done before the sender is unlocked and the control is completely transferred to the receiver. For this purpose, we take advantage of the fact that the 1-byte lock of a record can take states other than 'locked' and 'unlocked'. When the lock of a record has one of these other states, a special routine corresponding to that state runs before the goal record of the sender is unlocked. This feature is being used for updating the second field of a communication cell safely. 6. 
6. An Experimental System and Its Performance

We have almost finished the initial version of the abstract machine instruction set for the shared-goal method. An experimental runtime system for performance evaluation has been developed on Sequent Symmetry, a shared-memory parallel computer with 20MHz 80386's. The system is written in an assembly language and C, and the abstract machine instructions are expanded into native code automatically by a loader. A compiler from Moded Flat GHC to the intermediate code is yet to be developed.

The current system employs a simple scheme of parallel execution as described in Section 5.3. When the system runs with more than one processor, one of them acts as a master processor and the others as slaves. They act in the same manner while the global deque is non-empty. When the master fails to obtain a new job from the deque, it tries to detect termination and exceptions such as stack overflow. The current system does not care about perpetually suspended goals; they are treated just like garbage cells in Lisp. A slight overhead for counting the number of goals in the system will be necessary to detect perpetually suspended goals [Inamura and Onishi 1990] and/or to support the shoen construct of KL1 [Ueda and Chikayama 1990], but it should scarcely affect the results of the performance evaluation described below. Locking of shared resources, namely logic variables, goal records, communication cells, the global deque, etc., is done using the xchg (exchange) instruction as usual.

Using Program 1, we measured (1) the processing time of 5000 update operations with random keys given to an empty binary tree and (2) the processing time of 5000 search operations (with the same sequence of keys) on the resulting tree with 4777 nodes. The number of processors was changed from 1 to 8. For the one-processor case, a version without locking/unlocking operations was tested as well. The numbers include the execution time of the driver that sends messages to the tree. The result was compared with two versions of (sequential) C programs using records and pointers, one using recursion and the other using iteration. The performance of nreverse (Fig. 4) was measured as well. The results are shown in Table 1.

Table 1. Performance Evaluation (in seconds)

                                binary process tree        naive reverse
                                (5000 operations)          (1000 elements)
  Language processing system    (search)   (update)
  GHC   1 PE (no locking)         1.25       1.83          2.23 (225 kRPS*)
        1 PE                      1.38       2.10          3.27 (154 kRPS)
        2 PEs                     0.78       1.15          2.43 (207 kRPS)
        3 PEs                     0.55       0.81          1.71 (294 kRPS)
        4 PEs                     0.44       0.63          1.33 (377 kRPS)
        5 PEs                     0.36       0.53          1.10 (456 kRPS)
        6 PEs                     0.33       0.46          0.96 (523 kRPS)
        7 PEs                     0.33       0.39          0.85 (591 kRPS)
        8 PEs                     0.33       0.36          0.77 (652 kRPS)
  C (recursion), cc -O            0.71       0.72             -
  C (iteration), cc -O            0.32       0.35             -
  (* kRPS: kilo Reductions Per Second)

The results show good (if not ideal) parallel speedup, though for search operations on a binary tree the performance is ultimately bounded by the sequential nature of the driver and the root node. Access contention on the global deque can be another cause of overhead. Note, however, that the two examples are indeed harder to execute in parallel than running independent processes in parallel, because different chains of message sends share goals. Note also that the binary tree with 4777 nodes is not very deep. The binary tree program run with 4 processors outperformed the optimized recursive C program. The iterative C program was more than twice as fast as the recursive one and was comparable to the GHC program run with 8 processors. The comparison, however, would have been more favorable to parallel GHC if a larger tree had been used. The overhead of locking/unlocking was about 30% in nreverse and about 10% in the binary tree program. Since nreverse is one of the fastest programs in terms of the kRPS value, we can conclude that the overhead of locking/unlocking is reasonably small on average even if we lock such small entities as individual goals.
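For reference, the two sequential C baselines are of the following familiar shape. This is only a sketch of the kind of program described above (a binary search tree built from records and pointers, with a recursive and an iterative search); it is not the authors' actual benchmark code, and the update routines would be analogous.

    #include <stddef.h>

    struct node {
        int key;
        struct node *left, *right;
    };

    /* Recursive search, corresponding to the "C (recursion)" row of Table 1. */
    struct node *search_rec(struct node *t, int key)
    {
        if (t == NULL || t->key == key)
            return t;
        return key < t->key ? search_rec(t->left, key)
                            : search_rec(t->right, key);
    }

    /* Iterative search, corresponding to the "C (iteration)" row of Table 1. */
    struct node *search_iter(struct node *t, int key)
    {
        while (t != NULL && t->key != key)
            t = (key < t->key) ? t->left : t->right;
        return t;
    }

The iterative version avoids one call frame per tree level, which is consistent with the roughly twofold gap between the two C rows in Table 1.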
As for space efficiency, the essential difference between our implementation and C implementations is that GHC goal records have pointers to input streams while C records do not consume memory by being pointed to. The difference comes from the expressive power of streams; unlike pointers, streams can be unified together and can buffer messages implicitly. One may suspect that message-oriented implementation suffers from poor locality in general. This is true for data locality, because a single message chain can visit many goals. However, streams in process-oriented implementation cannot enjoy very good locality either, because a tail-recursive goal can generate a long list of messages. Both process-oriented and message-oriented implementations enjoy good instruction locality for the binary tree program and nreverse. Comparison of performance between a messageoriented implementation and a process-oriented implementation was reported in [Ueda and Morita 1990] for the one-processor case. 7. Conclusions and Future Works The main contribution of this paper is that messageoriented implementation of Moded Flat GHC was shown to benefit from small-grain, tightly-coupled parallelism on shared-memory multiprocessors. Furthermore, the result of preliminary evaluation shows that the absolute performance is good enough to be compared with procedural programs. These results suggest that the programming of reconfigurable storage structures that allow concurrent access can be a realistic application of Moded Flat GHC. Programmers need not worry about mutual exclusion necessitated by parallelization, because it is achieved automatically at the implementation level. In procedural languages, parallelization may well require major rewriting of programs. To our knowledge, how to deal with reconfigurable storage structures efficiently in non-procedural languages without side effects has not been studied in depth. We have not yet fully studied language constructs and their implementation for more minute control over parallel execution. The current scheme for the control of parallelism is a simple extension to the sequential system; it worked well for the benchmark programs used, but will not be powerful enough to be able to tune the performance of large programs. We need a notion of priority that should be somewhat different from the priority construct in KL1 designed for process-oriented parallel execution. The notion of fairness may have to be reconsidered also. KL1 provides the shoen (manor) construct as well, which is the unit of execution control, exception handling and resource consumption control. How to adapt the shoen construct to message-oriented implementation is another research topic. Acknowledgments The authors are indebted to the anonymous referees for helpful comments. References [Chikayama and Kimura 1987] T. Chikayama and Y. Kimura, Multiple Reference Management in Flat GHC. In Proc. 4th Int. Conf. on Logic Programming, MIT Press, 1987, pp. 276-293. [Inamura and Onishi 1990] Y. Inamura and S. Onishi, A Detection Algorithm of Perpetual Suspension in KLl. In Proc. Seventh Int. Conf. on Logic Programming, MIT Press, 1990, pp. 18-30. [Knuth 1973] D. E. Knuth, The Art of Computer Programming, Vol. 1 (2nd ed.). Addison-Wesley, Reading, MA, 1973. [Shapiro 1989] Shapiro, E., The Family of Concurrent Logic Programming Languages. Computing Surveys, Vol. 21, No.3 (1989), pp. 413-510. [Ueda and Morita 1990] K. Veda and M. Morita, A New Implementation Technique for Flat GHC. In Proc. Seventh Int. Conf. 
on Logic Programming, MIT Press, 1990, pp. 3-17. A revised, extended version to appear in New Generation Computing.

[Ueda and Chikayama 1990] K. Ueda and T. Chikayama, Design of the Kernel Language for the Parallel Inference Machine. The Computer Journal, Vol. 33, No. 6 (Dec. 1990), pp. 494-500.

Towards an Efficient Compile-Time Granularity Analysis Algorithm

X. Zhong, E. Tick, S. Duvvuru, L. Hansen, A. V. S. Sastry and R. Sundararajan
Dept. of Computer Science, University of Oregon, Eugene, OR 97403

Abstract

We present a new granularity analysis scheme for concurrent logic programs. The main idea is that, instead of trying to estimate the costs of goals precisely, we provide a compile-time analysis method which can efficiently and precisely estimate the relative costs of active goals, given the cost of a goal at runtime. This is achieved by estimating the cost relationship between an active goal and its subgoals at compile time, based on the call graph of the program. Iteration parameters are introduced to handle recursive procedures. We show that the method accurately estimates cost for some simple benchmark programs. Compared with methods in the literature, our scheme has several advantages: it is applicable to any program, it gives a more precise cost estimation than static methods, and it has lighter runtime overheads than absolute estimation methods.

1 Introduction

The importance of the grain sizes of tasks in a parallel computation has been well recognized [6, 5, 7]. In practice, the overhead of executing small-grain tasks in parallel may well offset the speedup gained. Therefore, it is important to estimate the costs of executing tasks so that, at runtime, tasks can be scheduled to execute sequentially or in parallel to achieve the maximal speedup. Granularity analysis can be done at compile time or at runtime, or even both [7]. The compile-time approach estimates costs by statically analyzing program structure. The program is partitioned statically and the partitioning scheme is independent of runtime parameters. The costs of most tasks, however, are not known until parameters are instantiated at runtime, and therefore the compile-time approach may result in inaccurate estimates. The runtime approach, on the other hand, delays the cost estimation until execution and can therefore make more accurate estimates. However, the overhead of estimating costs is usually too large to achieve efficient speedup, and therefore the approach is usually infeasible. The most promising approach is to try to get as much cost estimation information as possible at compile time and make the overhead of runtime scheduling very slight. Such an approach has been taken by Tick [10], Debray et al. [2], and King and Soper [4]. In this paper, we adopt this strategy. A method for the granularity analysis of concurrent logic programs is proposed. Although the method can be applied equally well to other languages, such as functional languages, in this paper we discuss the method only in the context of concurrent logic programs. The key observation behind this method is that task spawning in many concurrent logic program language implementations, such as Flat Guarded Horn Clauses (FGHC) [13], depends only on the relative costs of tasks.
If the compile-time analysis can provide simple and precise cost relationships between an active goal and its subgoals, then th~ runtime scheduler can efficiently estimate the costs of the subgoals based on the cost of the active goal. The method achieves this by estimating, at compile time, the cost relationship based on the call graph and the introduction of iteration parameters. We show that for common benchmark programs, the method gives correct estimates. 2 Motivations Compile-time granularity analysis is difficult because most of the information needed, such as size of a data structure and number of loop iterations, are not know~ until runtime. Sarkar [7] used a profiling method to get the frequency of recursive and nonrecursive function calls for a functional language. His method is simple and does not have runtime overheads, but can give only a rough estimate of the actual granularity. In the logic programming community, Tick [10j first proposed a method to estimate weights of procedures by analyzing the call graph of a program. The method, as refined by Debray [1], derives the call graph of the program, and then combines procedures which are mutually recursive with each other into a single cluster (i.e., a strongly connected component in the call graph). Thus the call graph is converted into an acyclic graph. Procedures in a cluster are assigned the same weight 810 which is the sum of the weights of the cluster's children (the weights of leaf nodes are one, by definition). This method has very low runtime overhead; however, goal weights are estimated statically and thus cannot capture the dynamic change of weights at runtime. This problem is especially severe for recursive (or mutually recursi ve ) procedures. As an example of the method, consider the naivereverse procedure in Figure 1. (The clauses in the nrev/2 program do not have guards, i.e., only head unification is responsible for commit.) Examining the call graph, we find that the algorithm assigns a weight of one to append/3 (it is a leaf), and a weight of two to nrev/2 (one plus the weight of its child). Such weights are associated with every procedure invocation and thus cannot accurately reflect execute time. Debray et al. [2] presented a compile-time method to derive costs of predicates. The cost of a predicate is assumed to depend solely on its input argument sizes. Relationships between input and output argument sizes in predicates are first derived based on so-called data dependency graphs and then recurrence equations of cost functions of predicates are set up. These equations are then solved at compile time to derive closed forms (functions) for the cost of predicates and their input argument sizes, together with the closed forms (functions) between the output and input argument sizes. Such cost and argument size functions can be evaluated at runtime to estimate costs of goals. A similar approach was also proposed by King and Soper [4]. Such approaches represent a trend toward precise estimation. For nrev/2, Debray's method gives Costnrev(n) = O.5n 2 + 1.5n + 1, where n is the size of the input argument. This function can then be inserted into the runtime scheduler. Whenever nrev/2 is invoked, the cost function is evaluated, which obviously requires the value n, the size of its first argument. If the cost is bigger than some preselected overhead threshold, the goal is executed in parallel; otherwise, it is executed sequentially. The method described suffers from several drawbacks (see [11] for further discussion). 
First, there may be considerable runtime overhead to keep track of argument sizes, which are essential for the cost estimation at runtime. Furthermore, the sizes of the initial input arguments have to be given by users or estimated by the program when the program begins to execute. Second, within the umbrella of argument sizes, different metrics may be used, e:g., list length, term depth, and the value of an integer argument. It is unclear (from [2, 4]) how to correctly choose metrics which are relevant for a given predicate. Third, the resultant recurrence equations for size relationships and cost relationships can be fairly complicated. It is therefore worth remedying the drawbacks of the above two approaches. It is also clear that there is a tradeoff between precise estimation and runtime overhead. In fact, Tick's approach and Debray's approach represent two extremes in the granularity estimation spectrum. Our intention here is to design a middleof-the-spectrum method: fairly accurate estimation, applicable to any procedures, without incurring too much runtime overhead. 3 Overview of the Approach We argue here, as in our earlier work, that it is sufficient to estimate only relative costs of goals. This is especially true for an on-demand runtime scheduler [8]. Therefore, it is important to capture the cost changes of a subgoal and a goal, but not necessarily the "absolute" granularity. Obviously the costs of subgoals of a parent goal are always less than the cost of the parent goal, and the sum of costs of the subgoals (plus some constant overhead) is equal to the cost of the parent goal. The challenging problem here is how to distribute the cost of the parent goal to its subgoals properly, especially for a recursive call. For instance, consider the naive reverse procedure nrev/2 again. Suppose goal nrev([1,2,3,4J ,R) is invoked (i.e., clause two is invoked) and the cost of this query is given, what are the costs of nrev( [2,3,4J ,R1) and append(R1,[lJ,R)? It is clear that the correct cost distribution depends on the runtime state of the program. For example, the percentage of cost distributed to nrev ( [1,2,3,4] , R) (i.e., as one of the subgoals of nrev([1,2,3,4,5] ,T) will be different from that of cost distributed to nrev ( [1,2J ,R). To capture the runtime state, we introduce an iteration parameter to model the runtime state, and we associate an iteration parameter with every active' goal. Since the cost of a goal depends solely on its entry runtime state, its cost is a function of its iteration parameter. Several intuitive heuristics are used to capture the relations between the iteration parameter of a parent goal and those of its children goals. To have a simple and efficient algorithm, only the AND/OR call graph of the program, which is slightly different from the standard call graph, is considered to obtain these iteration relationships. Such relations are then used in the derivation of recurrence equations of cost functions of an active goal and its subgoaIs: The recurrence equations are derived simply based on the above observation, i.e., the cost of an active goal is equal to the summation of the costs of its subgoals. We then proceed to solve these recurrence equations for cost functions bottom up, first for the leaf nodes of the modified AND/OR call graph, which can be obtained in a similar way in Tick's modified algorithm by clustering those mutually recursive nodes together in the AND/OR call graph of the program (see Section 2). 
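For contrast with the static end of the spectrum, the weighting scheme recalled in Section 2 can be sketched in a few lines of C. The encoding below follows the nrev/append example given there (a leaf cluster gets weight one, an inner cluster one plus the sum of its children's weights); it is an illustration of that earlier scheme, not of the algorithm proposed in this paper, and the data layout is an assumption made for the example.

    #include <stdio.h>

    #define MAXP 16

    /* children[i] lists the callee clusters of cluster i (after mutually
       recursive procedures have been merged); clusters are assumed to be
       topologically ordered, callers before callees, so a reverse loop is
       a bottom-up pass. */
    static int nchildren[MAXP];
    static int children[MAXP][MAXP];
    static int weight[MAXP];

    static void assign_static_weights(int nclusters)
    {
        for (int i = nclusters - 1; i >= 0; i--) {   /* bottom up           */
            int w = 1;                               /* leaf weight is one  */
            for (int j = 0; j < nchildren[i]; j++)
                w += weight[children[i][j]];         /* plus the children   */
            weight[i] = w;
        }
    }

    int main(void)
    {
        /* cluster 0 = nrev/2, cluster 1 = append/3 (a leaf) */
        nchildren[0] = 1; children[0][0] = 1;
        nchildren[1] = 0;
        assign_static_weights(2);
        printf("append/3: %d  nrev/2: %d\n", weight[1], weight[0]);  /* 1, 2 */
        return 0;
    }

These weights are fixed per predicate, which is exactly the limitation addressed here: the scheme of this paper replaces them with cost functions of an iteration parameter, solved bottom up as just described.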
After we obtain all the cost functions, cost distribution functions are derived as follows. Suppose the cost of an 811 active goal is given, we first solve for its iteration parameter based on the cost function derived. Once the iteration parameter is solved, costs of its subgoals, which are functions of their iteration parameters, can be derived based on the assumption that these iteration parameters have relationships with the iteration parameter of their parent, which are given by the heuristics. This gives the cost distribution functions desired for the subgoals. To recap, our compile-time granularity analysis procedure consists of the following steps: 1. Form the call graph of the program and cluster mutually recursive nodes of the modified AND lOR call graph. 2. Associate each procedure (node) in the call graph with an iteration parameter and use heuristics to derive the iteration parameter relations. 3. Form recurrence equations for the cost functions of goals and subgoals. 4. Proceed bottom up in the modified ANDOR call graph to derive cost functions. 5. Solve for iteration parameters and then derive cost distribution functions for each predicate. 4 4.1 Deriving Cost Relationships Cost Functions and Recurrence Equations To derive the cost relationships for a program, we u'se a graph G (called an AND lOR call graph) to capture the program structure. Formally, G is a triple (N, E, A), where N is a set of procedures denoted as {PI, P2, ... ,Pn} and E is a set of pair nodes such that (PI, P2) E E if and only if P2 appears as one of the subgoals in one of the clauses of Pl. Notice that there might be multiple edges (PI,P2) because PI might call P2 in multiple clauses. A is a partition of the multiple-edge set E such that (PbP2) and (PI,P3) are in one element of A if and only if P2 and P3 are in the body of the same clause whose head is Pl' Intuitively, A denotes what subgoals are AND processes. After applying A to edges leaving out a node, edges are partitioned into clusters which correspond to clauses and these clauses are themselves OR processes. Figure 2 shows an example, where the OR branches are labeled with a bar, and AND branches are unmarked. Leaf facts (terminal clauses) are denoted as empty nodes. As in [1], we modify G so that we can cluster all those recursive and mutually recursive procedures together and form a directed acyclic graph (DAG). This is achieved by traversing G and finding all stronglyconnected components. In this traversing, the difference between AND and OR nodes is immaterial, and we simply discard the partition A. A procedure is re- cursive if and only if the procedure is in a stronglyconnected component. After nodes are clustered in a strongly-connected component in G, we form a DAG G' , whose nodes are those strongly-connected components of G and edges are simply the collections of the edges in G. This step can be accomplished by an efficient algorithm proposed by Tarjan [9]. The cost of an active goal P is determined by two factors: its entry runtime state s during the program execution and the structure of the program. We use an integer n, called the iteration parameter, to approximately represent state s. Intuitively, n can be viewed as an encoding of a program runtime state. Formally, let S be the set of program runtime states, M be a mapping from S to the set of natural numbers N such that M(s) = n for s E S. It is easy to see that the cost of P is a function of its iteration parameter n. 
It is also clear that the iteration parameter of a subgoal of P is a function of n. Hereafter, suppose Pij is the ph subgoal in the ith clause of p. We use Iij (n) to represent the iteration parameter of Pij. The problem of how to determine function Iij will be discussed in Section 4.2. To model the structure of the program, we use the AND lOR call graph G as an approximation. In other words, we ignore the attributes of the data, such as size and dependencies. We first derive recurrence equations of cost functions between a procedure P and its subgoals by looking at G. Let Costp (n) denote the cost of p. Three cases arise·in this derivation: Case 1: P is a leaf node of G' which is nonrecursive. This includes cases where that P is a built-in predicate. In this case, we simply assign a constant c as Costp (n). c is the cost to execute p. For instance such cost can be chosen as the number of machine instructions in p. For the next two cases, we consider non-leaf nodes p, with the following clauses (OR processes), Let the cost of each clause be Costej (n) for 1 ::; j ::; k. We now distinguish whether or not P is recursive. Case 2: P is not recursive and not mutually recursive with any other procedures. We can easily see that 812 k Costp(n) ~ L CostcJ(n). 4.2 (1) j=1 There are several intuitions behind the introduction of the iteration parameter. As we mentioned above, ite~­ ation parameter n represents an encoding of a program runtime state as a positive integer. In fact, this type of encoding has been used extensively in program verification, e.g., [3], especially in the proof of loop termination. A loop C terminates if and only it is possible to choose a function M which always maps the runtime states of C to nonnegative integers such that M monotonically decreases for each iteration of C. Such encoding also makes it possible to solve the problem that once the cost of an active goal is given, its iteration parameter can be obtained. This parameter can be used to derive costs of its subgoals (provided the iteration-parameter functions 1m are given), which in turn give the cost distribution functions. Conservatively, we approximate Cost p(n) as the right-hand side of the above inequality. Notice that in a committed-choice language, the summation in the above inequality can be changed to the maximum (i.e., max) function. However this increases the difficulty of the algebraic manipulation of the resultant recurrence equations (see [11] for example) and we prefer to use the summation as an approximation. Case 3: P is recursive or mutually recursive. In this case, we must be careful in the approximation, since minor changes in the recurrence equations can give rise to very different estimation. This can be seen for spli t in qsort example in Section 2. Admittedly, the encoding of program states may be fairly complicated. Hence, to precisely determine the iteration-parameter functions for subgoals will be complicated too. In fact, this problem is statically undecidable since this is as complicated as to precisely determine the program runtime behavior at compile time. Fortunately, in practice, most programs exhibit regular control structures that can be captured by some intuitive heuristics. To be more precise, we first observe that some clauses are the "boundary clauses," that is, they serve as the termination of the recursion. The other clauses, whose bodies have some goals which are mutually recursive with p, are the only clauses which will be effective for the recursion. 
Without loss of generality, we assume for j > u, Cj are all those "mutually recursive" clauses. For a nonzero iteration parameter n (i.e., n > 0), we take the average costs of these clauses as an approximation: To determine the iteration-parameter functions, we first observe that there is a simple conservative rule: for a recursive body goal p, when it recursively calls itself back again, the iteration parameter must have been decreased by one (if the recursion terminates). This is similar to the loop termination argument. Therefore, as an approximation, we can use Im(n) = n - 1 as a conservative estimation for a subgoal pim which happens to be p (self-recursive). Other heuristics are listed as follows: and for n = 0, we take the sum of the costs of those "boundary clauses" as the boundary condi tion of Costp (n ): §1. For a body goal pim whose predicate only occurs in the body once and it is not mutually recursive with p (i.e., not in a stronglyconnected component of p), lim(n) = n. u Costp(O) = L CostCj (0). j=1 The above estimation only gives the relations between cost of p and those of its clauses. The cost of clause Cj can be estimated as nj Costcj(n) = CHeadj +L Costpjm(Ijm(n)) Iteration Parameters (3) m=1 where CHeadj is a constant denoting the cost for head unification of clause Cj and Ijm (n) is the iteration parameter for the mth body goal. Substituting Equation 3 back into Equation 1 or 2 gives us the recurrence equations for cost functions of predicates. §2. If Pim is mutually recursive with p and its predicate only occurs once in the body, lim (n) =n-1. §3. If Pim is mutually recursive with P and its predicate occurs I times in the body, where 1 > 1, lim ( n) = n / 1 (this is integer division, i.e., the floor function). The intuitions behind these heuristics are simple. Heuristic §1 represents the case where a goal does not invoke its parent. In almost all programs, this goal will process information supplied by the parent, thus the it- 813 eration parameter remains unmodified. Heuristic §2 is based on the previous conservative principle. Heuristic §3 is based on the intuition that the iteration is divided evenly for multiple callees. Notice for the situation in heuristic §3, we can also use our conservative principle. However, we avoid use of the conservative principle, if possible, because the resultant estimation of Cost p( n) may be an exponential function of n, which, for most practical programs, is not correct. These heuristics have been derived from experimentation with a number of programs, placing a premium on the simplicity of I (n). A partial summary of these results is given in Section 6. A remaining goal of future research is to further justify these heuristics with larger programs, and derive alternatives. 4.3 An Example: Quicksort After we have determined the iteration-parameter functions, we have a system of recurrence equations for cost functions. These system of recurrence equations can be solved in a bottom-up manner in the modified graph G'. The problem of systematically solving these recurrence equations in general is discussed in [11]. Here, we consider a complete example for the qsort/2 program given in Figure 2. The boundary condition for Costqsort (n) is that Costqsort(O) is equal to the constant execution cost d1 of qsort/2 clause one. 
The following recurrence equations are derived:

Costqsort(0) = d1
Costqsort(n) = CostC2(n)

With Heuristic §3, we have

CostC2(n) = d2 + Costsplit(n) + 2 Costqsort(n/2)

where d2 is the constant cost for the head unification of the second clause of qsort/2. Similarly, the recurrence equations for Costsplit(n) are

Costsplit(0) = d3
Costsplit(n) = (CostC2(n) + CostC3(n))/2

Furthermore,

CostC2(n) = CostC3(n) = d4 + Costsplit(n - 1)

where d4 is the constant cost for the head unification of the second (and the third) clause of split. We first solve the recurrence equations for split, which is at the lower level in G', and then solve the recurrence equations for qsort. This gives us

Costsplit(n) = d3 + d4 n

which can be approximated as d4 n, and

Costqsort(n) = d1 + d2 log n + d4 n log n,

which is the well-known average complexity of qsort. Finally, it should be noted that it is necessary to distinguish between the recursive and nonrecursive clauses here and take the average of the recursive clause costs as an approximation. If we simply took the summation of all clause costs together as the approximation of the cost function, both cost functions for split and qsort would be exponential, which is not correct. More precisely, if the summation of all costs of the clauses of split were taken as Costsplit(n), we would have

Costsplit(n) = d3 + 2(d4 + Costsplit(n - 1))

whose solution is an exponential function, which is not correct.

5 Distributing Costs

So far, we have derived cost functions of the iteration parameter for each procedure. However, to know the cost of a procedure, we need to first know the value of its iteration parameter. This, as pointed out in our introduction, may require too much overhead. We notice that in most scheduling policies (such as on-demand scheduling) only relative costs are needed. This can be achieved relatively easily in our theory, since cost functions have only a single parameter (the iteration parameter). To derive cost distributing formulae for a given procedure and its body goals, the first step is to solve for the iteration parameter n in Equation 3, assuming that Costp(n) is given at runtime as Cp. Assuming that clause i is invoked at runtime, we approximate CostCi(n) as Cp and solve Equation 3 for n. Let n = F(Cp) be the symbolic solution, which depends on the runtime value of Costp(n) (i.e., Cp). We can then derive the costs of the subgoals of clause i simply by substituting F(Cp) for n in CostPim(Iim(n)), which gives the cost distributing functions we need to derive at compile time.

Let us reconsider the nrev/2 procedure. The cost equations are derived as follows:

Costnrev(n) = Costnrev(n - 1) + Costappend(n)
Costnrev(0) = C1
Costappend(n) = Costappend(n - 1) + Ca
Costappend(0) = C2

We can easily derive the closed forms of these two cost functions as Costappend(n) = Ca n + C2, which can be approximated as Ca n, and Costnrev(n) ≈ Ca n^2 / 2. Now, given Costnrev(n) as Cn, we solve for n and obtain n = sqrt(2 Cn / Ca). Hence we have Costnrev(n - 1) = Ca (sqrt(2 Cn / Ca) - 1)^2 / 2 and Costappend(n) = Ca sqrt(2 Cn / Ca). These are the desired cost distributing functions.
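At runtime these two formulae amount to a square root and a few multiplications. The following C fragment is a sketch of how a scheduler might evaluate them; the constant Ca and the function name are illustrative assumptions, not part of any actual scheduler.

    #include <math.h>

    #define CA 1.0   /* assumed per-iteration constant from the closed forms */

    /* Given the parent's cost Cp for a call to nrev/2, return the costs to
       be assigned to the body goals nrev(T,R1) and append(R1,[H],R). */
    static void distribute_nrev(double cp, double *c_nrev, double *c_append)
    {
        double n = sqrt(2.0 * cp / CA);                 /* solve Cp = Ca*n^2/2 */
        *c_nrev   = CA * (n - 1.0) * (n - 1.0) / 2.0;   /* Cost_nrev(n-1)      */
        *c_append = CA * n;                             /* Cost_append(n)      */
    }

For instance, with Ca = 1 and a parent cost of 1250, n is 50, so the recursive nrev goal is assigned about 1200 and the append goal about 50.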
For example, consider the Fibonacci function, where the cost equations are Cost jib( n) Cj Costjib(O) CI + 2 x Costjib(n/2) Without actually deriving the cost functions of Cost jib( n), we can simply derive the cost distributing relationship from the first equation as Costjib(n/2) = (Costjib(n) - Cj )/2. Also note that at compile time, the cost distributing functions should be simplified as much as possible to reduce the runtime overhead. It is even worthwhile sacrificing precision to get a simpler function. Therefore, a conservative approach should be used to derive the upper bound of the cost functions. In fact, we can further simplify the cost function derived in the following way. If the cost function is of a polynomial form such as conk + cln k- l + ... Ck, we simplify it as kconk and if the cost function is of several exponential components such as Clan + C2bn where b > a, we simplify it as (Cl + c2)bn . This will simplify the solution of the iteration parameter and the cost distributing function and hence simplify the evaluation of them at runtime. 5.1 Runtime Goal Management The above cost relationship estimation is well suited for a r~ntime scheduler which adopts an on-demand scheduling policy (e.g., [8]), where PEs maintain a local queue for active goals and once a PE becomes idle, it requests a goal from other PEs. A simple way to distribute a goal to a requesting PE is to migrate an active goal in the queue. The scheduler should adopt a policy to decide which goal is going to be sent. It is obvious that the candidate goal should have the maximal grain size among those goals in the queue. Hence, we can use a priority queue where weights of goals are their grain sizes (or costs). The priority is that the bigger the costs are, the higher priority they get. Because the scheduler only needs to know the relative costs, we can always assume the weight of the initial goal is some fixed, big-enough number. Based on this initial cost and the cost distributing formulae derived at compile time, every time a new clause is invoked, the scheduler derives the relative costs of body goals. The body goals are then enqueued into the priority queue based on their costs. Some bookkeeping problems arise from this approach. First, even though we can simplify the cost distributing functions at compile time to some extent, the runtime overhead may still be large, since for each procedure invocation, the scheduler has to calculate the weights of the body goals. One solution to this problem is to let the scheduler keep track of a modulo counter and when the content of the counter is not zero, the scheduler simply lets the costs of the body goals be the same as that of their parent. Once the content of the counter becomes zero, the cost-distributing functions are used. If we can choose an appropriate counting period, this method is reasonable (one counter increment has less overhead than the evaluation of the cost estimate). Another problem in this approach is that for longrunning programs, costs may become negative, i.e., the initial weight is not large enough. Since we require only relative costs, a solution is to reset all costs (including those in the queue, and in suspended goals), when some cost becomes too small. Cost resetting requires the incremental overhead of testing to determine when to reset. As stated above, we need to choose the initial cost as big as possible. However, this can introduce an anomaly for our relative cost scheme. To see this, consider the nrev example again. 
Suppose that the initial query is nrev([l, ... ,50J). The correct query cost is approximately 50 x 50 = 2500. The correct cost of its immediate append goal is approximately 49, and the correct cost of one of its leaf descendant goals nrev ( [J) is one (the head unification cost). If we choose the initial cost as a big number, say 106 , then the corresponding iteration parameter is 103 • This will give the cost of nrev ( [J ) as (10 3 - 50)2 which is bigger than the estimated cost of the initial append goal (only around 10 3 ). In other words, this gives an incorrect relationship between goals near the very top and near the very bottom of the proof tree. For this particular example, the problem could be finessed by precomputing the "correct" initial value of the iteration parameter: exactly equal to the weight of the query. However, in general, a correct initial estimation is not always possible, and when it is possible, its computation incurs too much overhead. All compile-time granularity estimation schemes must make this tradeoff. Fortunately, in our scheme" the problem is not as serious as it first appears. For initial goals with sufficiently large cost, our scheme is still able to give correct relative cost estimation for sufficiently large goals which are not close to leaves of the execution call graph. This can be seen in the nrev example, where the relative costs among nrev ( [2, ...• 50J through nrev ( [42, ... ,50J ), and the initial append are still correct in our scheme. Correct estimation for the large goals (those near the root of the proof tree) is more important than that for small goals (those near the leaves ) because the load balance of the system is largely dependent on those big goals, and so is performance. 815 Heuristic §1 §2 §3 all Applicable 24 29 4 32 Correct 21 26 2 27 Percentage 87.5% 89.6% 50.0% 84.7% Table 1: Statistics for Benchmark Programs Heuristic §1 §2 §3 all Applicable 64 49 6 111 Correct 57 55 4 101 Percentage 89.1% 87.3% 66.7% 91.0% level while recursively traversing down for each element (which may be tree structures). Again, this presents inherent difficulty for our scheme because we take the call graph as the sole input information for the program to be analyzed. To summarize, our statistics show that our scheme achieves a fairly high percentage. of correct estimation. However, we need to apply multiply-recursive heuristics §2 and §3 with more finesse. Further quantitative performance studies of the algorithm's utility are presented in Tick and Zhong [11]. Those multiprocessor simulation results quantify the advantage of dynamically scheduling tasks with the granularity information. Table 2: Statistics for a Compiler Front End 7 6 Conclusions and Future Work Empirical Results: Justifying the Heuristics We applied our three heuristics and the cost estimation formulae to two classes of programs. The first class includes nine widely used benchmark programs [12], containing 32 procedures. The second class consists of 111 procedures comprising the front-end of the Monaco FGHC compiler. The results are summarized in Table 1 and Table 2. For each heuristic, the tables show the number of procedures for which the heuristic is applicable (by the syntactic rules given in Section 4.2), and the number for which the heuristic is correctly estimates complexity. The row labeled "all" gives the total number of procedures analyzed. 
Since more than one heuristic may be applicable in a single procedure, the total number of procedures may be less than the sum of the previous rows. From the tables, we see that §1 and §2 apply most frequently. This indicates that most procedures are linear recursive (i.e., have a single recursive body goal) which can be estimated correctly by our scheme. The relatively low percentage of §3 correctness is because the benchmarks are biased towards procedures with exponential time compl~xity, whereas §3 usually gives polynomial time complexity. Analysis of the benchmarks indicated two major anomalies in the heuristics. Although §1 may apply, a procedure may distribute a little work (say, the head of a list) to one body goal and the rest of the work (say, the tail of the list) to another goal. This cannot be captured by §1, which essentially treats the head and tail of the list as equal, i.e., a binary tree. A correct cost analysis needs to explore the data structures of the program. For recursive procedures, §3 can capture only the fixed-degree divide & conquer programming paradigm. However, the compiler benchmark contained procedures which recursively traverse a list (or vector) and the degree of the divide & conquer dynamically depends on the number of top-level elements in the list (or vector). In this situation, the procedure may have to loop on the top We have proposed a new method to estimate the relative costs of procedure execution for a concurrent language. The method is similar to Tick's static scheme [10], but gives a more accurate estimation and reflects runtime wei,l!;ht chan,l!;es. This is achieved bv the introduction of an iteration parameter which is used to model recurSIons. Our method is based on the idea that it is not the absolute cost, but rather the relative cost that matters for an on-demand goal scheduling policy. Our method is also amenable to implementation. First, our method can be applied to any program. Second, the resultant recurrence equations can be solved systematically. In comparison, it is ·unclear how to fully mechanically implement the schemes proposed in [2, 4]. Nonetheless, our method may result in an inaccurate estimation for some cases. This is because we use only the call graph to model the program structure, not the data. We admit that further static analysis of program structure such as argument-size relationships can give more precise estimations. Future work in granularity analysis includes the development of a more systematic and precise method to solve the derived recurrence equations. It is also necessary to examine this method for more practical programs, performing benchmark testing on a multiprocessor to show the utility of the method. Acknowledgements E. Tick was supported by an NSF Presidential Young Investigator award, with funding from Sequent Computer Systems Inc. The authors wish to thank S. Debray and the anonymous referees for their helpful criticism. REFERENCES [1] S. K. Debray. A Remark on Tick's Algorithm for Compile-Time Granularity Analysis. Logic Programming Newsletter, 3(1):9-10, 1989. 816 [2] S. K. Debray, N.-W. Lin, and M. Hermenegildo. Task Granularity Analysis in Logic Programs. In SIGPLAN Conference on Programming Language Design and Implementation, pages 174-188. ACM Press, June 1990. [3] D. Gries. Science of Programming. Springer-Verlag 1989. ' [4] A. King and P. Soper. Granularity Control for Concurrent Logic Programs. In International Computer Conference, Turkey, 1990. [5] B. Kruatrachue and T. Lewis. 
Grain Size Determination for Parallel Processing. IEEE Software, pages 23-32, January 1988.

[6] C. McGreary and H. Gill. Automatic Determination of Grain Size for Efficient Parallel Processing. Communications of the ACM, 32:1073-1978, 1989.

[7] V. Sarkar. Partitioning and Scheduling Parallel Programs for Execution on Multiprocessors. MIT Press, Cambridge MA, 1989.

[8] M. Sato and A. Goto. Evaluation of the KL1 Parallel System on a Shared Memory Multiprocessor. In IFIP Working Conference on Parallel Processing, pages 305-318. Pisa, North Holland, May 1988.

[9] R. E. Tarjan. Data Structures and Network Algorithms, volume 44 of Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia PA, 1983.

[10] E. Tick. Compile-Time Granularity Analysis of Parallel Logic Programming Languages. New Generation Computing, 7(2):325-337, January 1990.

[11] E. Tick and X. Zhong. A Compile-Time Granularity Analysis Algorithm and its Performance Evaluation. Journal of Parallel and Distributed Computing, submitted to special issue.

[12] E. Tick. Parallel Logic Programming. MIT Press, Cambridge MA, 1991.

[13] K. Ueda. Guarded Horn Clauses. In E. Y. Shapiro, editor, Concurrent Prolog: Collected Papers, volume 1, pages 140-156. MIT Press, Cambridge MA, 1987.

nrev([],R) :- R=[].
nrev([H|T],R) :- nrev(T,R1), append(R1,[H],R).
append([],L,A) :- A=L.
append([H|T],L,A) :- A=[H|A1], append(T,L,A1).

Figure 1: Naive Reverse and its Call Graph (nrev/2 calls itself and append/3; append/3 calls itself)

qsort([],S) :- S=[].
qsort([M|T],S) :- split(T,M,S1,L1), qsort(S1,SS), qsort(L1,LS), append(SS,[M|LS],S).
split([],M,S,L) :- S=[], L=[].
split([H|T],M,S,L) :- H < M | S=[H|TS], split(T,M,TS,L).
split([H|T],M,S,L) :- H >= M | L=[H|TL], split(T,M,S,TL).

Figure 2: Quick Sort: FGHC Source Code and the AND/OR Call Graph (qsort/2 calls split/4, itself twice, and append/3; split/4 calls itself)

Providing Iteration and Concurrency in Logic Programs through Bounded Quantifications

Jonas Barklund and Hakan Millroth, UP MAIL, Computing Science Dept., Uppsala University, Box 520, S-751 20 Uppsala, Sweden. E-mail: jonas@csd.uu.se or hakanm@csd.uu.se

Abstract

Programs operating on inductively defined data structures, such as lists, are naturally defined by recursive programs. Millroth has recently shown how many such programs can be transformed, or compiled, to iterative programs operating on arrays. The transformed programs can be run more efficiently than the original programs, particularly on parallel computers. The paper proposes the introduction of 'bounded quantifications' in logic programming languages. These formulas offer a natural way to express programs operating on arrays and other 'indexable' data structures. 'Bounded quantifications' are similar to 'array comprehensions' in functional languages such as Haskell. They are inherently concurrent and can be run efficiently on sequential computers as well as on various classes of parallel computers.

1 PROCESSING DATA STRUCTURES

There are two principal ways of building a data structure in a logic program.

A1. Use a recursive relation which defines explicitly the contents of a finite part of the data structure and then uses itself recursively to define the rest of the data structure. (There is, of course, an obvious duality between these operations.)

Method A is often natural when one uses inductively defined data structures, including lists, trees, etc.
Method B is often natural when one uses data structures whose elements can be indexed. Some data structures, most importantly lists, fall in both categories and which method is most natural depends on the context. 2 RECURSION We can broadly classify recursive programs in 'conjunctive' and 'disjunctive' programs (some are a mixture). The former category use recursion to compute a conjunction, like the following lessall program. l lessall(A, [BIXD lessall(A, [D. f- A < B 1\ lessall(A, X). A formula lessall(A, [Bt, B 2, . .. ,BnD reduces to the finite conjunction A < BI 1\ A < B2 1\ ... 1\ A < Bn which could be expressed more briefly as Vi{l :::; i :::; n -+ A < Bd. BI. Express directly the contents of each element of the data structure, preferrably through an 'indexing' of the elements of the data structure. This reduction can be performed at compile time, except that the value of n is the length of the list actually supplied to the program. Such a program can be run efficiently as an iteration on a sequential computer. The latter category uses recursion to compute a disjunction, for example the member program. Correspondingly there are two principal ways of traversing a data structure in a logic program. member(A, [BIXD member(A, [BIX]) A2. Use a recursive relation which examines explicitly the contents of a finite part of the data structure and then uses itself recursively to traverse the rest of the data structure. B2. Access directly the contents of each element of the data structure, preferrably through an 'indexing' of the elements of the data structure. ff- A = B. member(A, X). A formula member(A, [BI, B 2, ... , BnD reduces to the finite disjunction A = BI V A = B2 V ... V A = Bn lOur language consists (initially) of clauses whose bodies may contain comj unctions , disjunctions and negations. We assume "Herbrand" equality except for arithmetic expressions and array elements. All examples can be easily translated into Prolog or Godel (Hill & Lloyd, 1991). 818 which could, in turn, be expressed more briefly as 3i{1 ~ i ~ n /\ A = Bd which can, similarly, be run efficiently. Millroth's compilation method (1990, 1991), based on Tarnlund's Reform inference system (1992) transforms 'conjunctive' and 'disjunctive' recursive programs to the iterative programs above. 2.1 Concurrency The conjunction, or disjunction, in a logic program can be interpreted as a concurrent operator, as in ANDparallel and OR-parallel logic programming systems. This does not yield sufficient concurrency for running recursive programs efficiently on parallel computers. Even using a concurrent connective, work is only initiated on one 'recursion level' in each step. This implies a linear run time which can be approximated by an expression An + B (where A is the overhead for each recursion level, n is the recursion depth and B is the time spent in each recursion level). The number of literals in a recursive clause is typically much smaller than the depth of the recursion. For recursive programs with simple bodies, such as lessall or member, the An term will always dominate Only for small recursion depths and complex bodies will the B term be significant. Recursive programs transformed by Millroth's method have a much larger potential to run efficiently on parallel computers. The iterative programs can be run in parallel on n processors unless prohibited by data dependencies etc. Techniques for parallelizing this kind of iterations have been developed for, and applied to, FORTRAN programs for some time. 
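On a conventional machine the two reductions above compile to ordinary loops. The C sketch below, written over an integer array for concreteness, is only meant to show the shape of the resulting iteration; it is not Millroth's actual compilation output.

    #include <stdbool.h>

    /* lessall: A < B[0] && A < B[1] && ... && A < B[n-1] */
    static bool lessall(int a, const int *b, int n)
    {
        for (int i = 0; i < n; i++)
            if (!(a < b[i]))
                return false;    /* one failing conjunct refutes the whole */
        return true;
    }

    /* member: A == B[0] || A == B[1] || ... || A == B[n-1] */
    static bool member(int a, const int *b, int n)
    {
        for (int i = 0; i < n; i++)
            if (a == b[i])
                return true;     /* one succeeding disjunct suffices       */
        return false;
    }

Both loops run in constant space and, when the conjuncts or disjuncts are independent, their iterations can be distributed over processors in the way discussed below.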
3 step( Gl , G z ) f size(O, Gt, So) /\ size(O, Q, So) /\ size(O, G 2 , So) /\ size(l, Gl , SI) /\ size(l, Q, Sl) /\ size(l, G z, SI) /\ 'v'I'v'J{Q[I,JJ = GIll -1 mod So,J -1 mod SIJ + GIll -1 mod So,JJ + GIll -1 mod So,J + 1 mod SIJ + Gl [I, J -1 mod SIJ + GIlI,J + 1 mod Stl + GdI + 1 mod So,J -1 mod SIJ + GIll + 1 mod So,J] + Gl [I + 1 mod So, J + 1 mod SIJ -+ (Q[I, JJ < 2/\ Gz[I, J] = 0 V Q [I, JJ = 2 /\ Gz [I, J] = 1 V Q [I, J] = 3 /\ Gz [I, J] = 1 V Q [I, J] > 3 /\ Gz [I, J] = O)}. We can also present a simple example of the use of explicit existential quantifiers. The problem is to find the position I in a array X of some element which is smaller than a given value A. small(I, X, A) f- f- 'v' B'v' I {X[IJ 4 -+ reverse(X1 ,XZ ) f size( 0, Xl, L) /\ size( 0, X z , L) /\ 'v'A'v'I{XdIJ = A -+ Xz[L - 1- 1J = A}. (Our notation assumes that the expression L - I - 1 is evaluated and replaced by its value. We also assume that array indices are zero based. Finally, we let size(D, X, S) express that the size of the array X in dimension D is S.) We may express one generation of Conway's game of Life: = I}. BOUNDED QUANTIFICATION --+ [x]} where 8 is a formula which is "obviously" true for only a finite number of values of x, denoted by, say, { Co, Cl, ... , Ck-l}. In this case the quantification is clearly equivalent to the finite conjunction (8[eo] (8[elJ --+ (8[Ck-l] --+ A < B}, provided that the the value of the expression X[IJ is the I th element of the array X. We may express reversal of the elements in an array: B < A /\ J Consider those universally quantified formulas which are instances of the schema 'v'x{8[xJ =B -+ In all these examples we have quantified over the elements of an indexable data structure. There are other useful relations which can be expressed naturally in this way, and run efficiently. Specifically we want to include all quantifications over the elements of a finite set, whose members are 'obvious'. Below we will be somewhat more precise what this means. EXPLICIT QUANTIFICATION It is possible to build arrays and other indexable data structures, or express relations over them using recursive programs. It is often more natural to use a universal or existential quantification over the members of the data structure. We may express the lessall relation over arrays as lessall( A, X) 3J {X[J] = B --+ [co]) /\ [CI]) /\ ... /\ [Ck-l]) which is, by the definition of 8, equivalent to Similarly, a formula which is an instance of the schema 3x{8 [xJ/\ [x]} is under the same assumptions equivalent to We propose to 1. identify a set of formulas which always are true for only a finite number of objects, we call them range formulas, 819 2. make a system which recognizes those instances of the schema above where 8 is a range-formula, we call them bounded quantifications, and 3. interpret bounded quantifications concurrently. The conjuncts obtained from a bounded quantification may be run in any order, even simultaneously, provided that any data dapendencies (arising, e.g., from numerical expressions) are satisfied. Since a range formula is required to hold for a finite number of objects, it is possible to enumerate them (as we have indeed done above with {co, Cl, ... , Ck-l} ). It will become apparent from examples below that it is very useful to have range formulas relate each object with a unique integer in {a, 1, ... , k - I}. 
In the following sections we will first identify a few useful range formulas and then show how to run bounded quantifications efficiently on sequential and parallel computers. 5 RANGE FORMULAS The following is an incomplete set of interesting range formulas. 5.1 Array and "structure" elements As we have seen, it is useful to quantify over all elements of a data structure. In an array, each element is associated with a unique integer in the range, say, {a, 1, ... ,n}. We could, for example let XlI] = E (where X is an array, I is a variable and E is a term) be a range formula and the lessall and reverse programs above are examples of its use. It may be difficult to write a compiler which recognizes precisely this use of an equality as a range formula. One solution would be to predefine, say, the predicate symbol eIt by elt(I, X, E) f- XlI] = E. and only recognize predications on the form elt(·,·,·) as range formulas. 5.2 Integer ranges An obviously useful range formula would be one which is true for the first k integers ([0, k - 1]). Again, the formula X /\ X < K expresses exactly that relation, but for practical reasons it may be wise to define the binary predication cardinal(X, K) to stand for the binary relation which is true whenever X < K. Note that the enumeration in this case coincides with the objects themselves. Note, moreover, that it is trivial to obtain a range formula which is true for all integers in an arbitrary range [I, J] using the binary cardinal predicate. °: :; °: :; 5.3 Enumerable types A logic programming language with types is likely to contain "enumerable" types, for example, finite sets of distinct constants. One may wish to consider any predication, whose predicate symbol coincides with the name of such a type, a range relation. For example, suppose that colour is a type with the elements spades, hearts, clubs, and diamonds (in that order). Then colour(I, X) is a range formula which is true if and only if I is and X is spades, I is 1 and X is hearts, I is 2 and X is clubs, or I is 3 and X is diamonds. Note that in this view an enumerable type of K elements is isomorphic with the integer range [0, K - 1], so it does not really add anything to the language as such. 2 ° 5.4 List elements and list suffixes Lists are usually operated upon by recursively defined programs. Still, there are occasionally reasons for expressing programs through bounded quantifications. We propose two range formulas involving lists. The first associates every element of some list with its (zero-based) position in the list. The second enumerates every (not necessarily proper) suffix of some list (with the list itself being suffix 0). We propose to recognize the predication member(I, L, X) as a range formula which is true if and only if X is the Ith element of the list L. The predication suffix(I, L, X) is a range formula which is true if and only if X is the I th suffix of the list L. Note that if the length of L is K and [] denotes an empty list, then suffix(O, L, L) and suffix(I(, L, []) are true formulas. (Since Prolog has no occur check, a programmer in that language could apply these predicates to cyclic "terms". We leave the behaviour in such a case undefined. ) 5.5 Finite sets Given that finite sets are provided as a data structure it would make sense to have range formulas for sets (e.g., membership), as has been suggested by Omodeo (personal communication). 
This is an interesting proposal, but is is difficult to represent arbitrary sets efficiently in a way that allows the elements to be enumerated. Multisets (bags) are easier to implement, but these are, on the other hand, quite similar to lists, except that the order in which elements occur is irrelevant. 6 SEQUENTIAL ITERATION Consider a bounded quantification \f x {8 [x] ---+ <.P [x]}, such that 8[x] is true when (and only when) the value of x is one of {CO,Cl, ... ,Ck-d. We may run the conjuncts <.P[ca] /\ <.P[Cl] /\ ... /\ [ck-d in any order, provided that any data dependencies are satisfied. 2They do, however, seem to make programs easier to understand and debug. 820 We consider now a bounded quantification without dependencies. Running it on a sequential computer is straightforward: translate the quantified formula into an iteration which evaluates, in sequence, the formulas cI>[co], cI>[Cl], ... , cI>[Ck-l]. Since the compiler knows in advance about the possible range formulas, it may generate specialized code for each kind of range formula. For example, if the range formula 8[x] is member(I, X, L) then we can illustrate the resulting code as allocate_environment; = deref(l); while Cy != NIL) y { x = deref(y->head); code for cI>[x]; y = deref(y->tail); } deallocate_environment; using a C-style notation. (Note that we ignore the enumeration of the list elements in this example.) Assuming that the implementation is based on WAM (Warren, 1983) the "code for cI>[x]" may introduce choice points (and thus be unable to deallocate environments) if there are alternative solutions for cI>[x1. In the important case that the proof for cI>[x] is deterministic, every pass through the loop will begin in the same environment. This is more efficient than the corresponding recursive computation in Prolog (under WAM) which will allocate and deallocate an environment for each recursive call. Most implementations will also refer to the symbol table when making the recursive call. That is somewhat less efficient than the (conditional) jump performed at the end of a loop. We predict that together these improvements will result in substantial savings, particularly when proofs are deterministic, the bodies of recursive clauses are small and recursion is deep. Meier also notes these advantages when compiling some recursive programs as iterations (1991). 7 PARALLEL ITERATION On sequential computers bounded quantification, when at all appropriate, is likely to offer significant improvements over the corresponding recursive programs, run in the usual way. The potential speed-ups on parallel computers are still more dramatic. Consider the conjunction obtained from a bounded quantification Vx{8[x] -+ cI>[x]}. Since we may run the conjuncts in any order, we may also run them all in parallel (similarly for disjunctions), provided that we add synchronization to satisfy dependencies. 7.1 Running deterministic programs There are several methods for running deterministic iterations in parallel; these ideas have successfully applied to FORTRAN programs for a long time. The following is one of the simplest. If there are k processors, numbered from 0 to k - 1, simply let processor i evaluate cI>[Ci], for each i, 0 ::; i < k. If there are fewer than k processors, say k' processors, simulate k processors by letting processor i evaluate cI>[Cj], for each j, 1 ::; j < k, such that j modulo k' is i. If the computation of each cI>[Ci] is deterministic, then this is quite straightforward. 
7.2 Running nondeterministic programs

Suppose that the formula Φ is such that there is a choice of two or more potential proofs for some conjunct Φ[cᵢ]. If no two conjuncts Φ[cᵢ] and Φ[cⱼ], i ≠ j, share any variables, then we have independent parallelism in which backtracking is 'local' and easily implemented, cf., e.g., DeGroot (1984). This is a special case of the more general situation in which one can compute the variable assignments satisfying each conjunct independently of each other. For example, the conjuncts may share a variable, whose value at run time is an array, and only access distinct elements of it. In general it is not possible to verify this condition statically, so some run time tests will be necessary.

Consider the other case: that the free variables of conjuncts interact in such a way that it is not possible to compute variable assignments independently for each conjunct. In that case the corresponding recursive program, if run in the usual way using depth-first search of the proof tree, has to perform deep backtracking to earlier recursion levels. When investigating this class of programs we have noted that they occur surprisingly infrequently. Running such programs often leads to a combinatorial explosion of potential proofs which is only feasible when backtracking over a few recursion levels. The programs also do not behave nicely when running on, e.g., WAM. They tend to consume stack space rapidly if choice information prevents environments from being deallocated. The problem of simultaneously finding variable assignments for a set of non-independent and non-deterministic conjuncts is also very difficult. Earlier research on backtracking in AND-parallel logic programming systems by, e.g., Conery (1987) confirms this claim.

Our current position is therefore to refuse to run in parallel any bounded quantification for which we cannot show statically, or at least with simple run time tests, that the conjuncts are independent. In the context of AND-parallel logic programming systems, DeGroot, among others, has investigated appropriate run time tests for independence. Note that the overhead for such tests is lower in our context. One test (say, for determining whether a free variable in a bounded quantification is instantiated at run time) is sufficient for starting arbitrarily many independent computations. By applying these requirements also when running bounded quantifications on sequential processors it is guaranteed that the stack size when starting the proof of each conjunct will be constant.

8 SIMD AND MIMD PARALLEL COMPUTERS

We believe that bounded quantifications will run efficiently on both SIMD and MIMD parallel computers. When the bodies of bounded quantifications are simple and no backtracking is needed inside them, the capabilities of SIMD parallel computers are sufficient. It seems that most programs belong, or can be made to belong, to this class. For those programs which do more complicated processing in the bodies of bounded quantifications, e.g., backtracking, not all processors of a SIMD parallel computer will be active simultaneously. This will reduce the efficiency of such a computer, while it may still be possible to fully utilize a MIMD parallel computer.

9 OTHER OPERATIONS

We think it is also beneficial to predefine certain useful operations, such as reductions and 'scans' over lists and arrays; a sketch of such operations follows.
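Purely as a point of reference (the paper itself does not fix an implementation), sequential versions of reduce with + and of an exclusive plus-scan over a C array might look as follows; on a parallel machine both would instead be computed with the usual logarithmic-depth algorithms. Names and types are illustrative.

/* Reference semantics for the predefined operations.  reduce_plus sums an
 * array; scan_plus is "exclusive": element i of the result is the sum of
 * all preceding elements of the input, matching the description of 'scan'
 * given below. */
static double reduce_plus(const double *t, long n) {
    double acc = 0.0;
    for (long i = 0; i < n; i++)
        acc += t[i];
    return acc;
}

static void scan_plus(const double *t, double *out, long n) {
    double acc = 0.0;
    for (long i = 0; i < n; i++) {
        out[i] = acc;          /* sum of t[0..i-1] */
        acc += t[i];
    }
}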
Such operations will make it easy to eliminate many parallelization problems with variables shared between conjuncts in bounded universal quantifications. For example, this is a program which computes the inner product S of two arrays X and Y.

    i_p(X, Y, S) ←
        size(0, X, Z) ∧ size(0, Y, Z) ∧ size(0, T, Z) ∧
        ∀I∀Q{Y[I] = Q → T[I] = X[I] × Q} ∧
        reduce(+, T, S).

The arrays X, Y and T are shared between all conjuncts but they all access distinct elements of the arrays. (The variable Q was only introduced to maintain the standard form of bounded quantifications. It seems convenient and possible to relax the syntax to recognize expressions such as ∀I{T[I] = X[I] × Y[I]} as bounded quantifications, which is certainly even more elegant.) Sometimes the partial sums are also needed in the computation. In this case it is useful to compute a 'scan' with plus over an array. The result is an array of the same length but where each element contains the sum of all preceding elements in the first array.

10 FURTHER EXAMPLES

We now turn to a few more examples written using bounded quantifications. In the authors' opinion these formulas express at a high level the essentials of the algorithms they implement. In some cases they contain formulas reminiscent of what would be (informally expressed) loop invariants when programming in another language.

10.1 Factorial

The following program computes the factorial of N. The program shows the use of the cardinal range formula.

    factorial(N, F) ←
        size(0, T, N) ∧
        ∀I{cardinal(I, N) → T[I] = I + 1} ∧
        reduce(×, T, F).

10.2 Fibonacci

The following program computes the Nth Fibonacci number. The program is remarkable in being both simple and efficient, since it does not recompute any Fibonacci numbers. Similar effects have been accomplished using 'memo' relations and 'bottom-up' resolution, etc., but this solution appears both simple, elegant and semantically impeccable.

    fibonacci(N, F) ←
        size(0, T, N + 1) ∧
        ∀I{cardinal(I, N) →
            I = 0 ∧ T[I] = 1 ∨
            I = 1 ∧ T[I] = 1 ∨
            I > 1 ∧ T[I] = T[I − 1] + T[I − 2]} ∧
        F = T[N − 1].

10.3 Finding roots in oriented forests

Suppose that the array P represents an oriented tree.³ Each element of P contains the index of the parent of some node; roots contain their own index. The following program returns a new array in which each element points immediately to the root of its forest. This is an example of a parallel-prefix algorithm and it also illustrates how bounded quantifications and recursion can be used together.

    find(P, P) ←
        ∀I{P[I] = P[I] → P[I] = P[P[I]]}.
    find(P₀, P) ←
        ∀I∀J{P₀[I] = J →
            (J = P₀[J] ∧ P₁[I] = J ∨
             J ≠ P₀[J] ∧ P₁[I] = P₀[J])} ∧
        find(P₁, P).

10.4 Matrix transposition

The following little program transposes a matrix.

    trans(M₁, M₂) ←
        size(0, M₁, A) ∧ size(1, M₁, B) ∧
        size(0, M₂, B) ∧ size(1, M₂, A) ∧
        ∀I∀J∀Q{M₁[I, J] = Q → M₂[J, I] = Q}.

³ Recall that an oriented tree is a "directed graph with a specified node R such that: each node N ≠ R is the initial node of exactly one arc; R is the initial node of no arc; R is a root in the sense that for each node N ≠ R there is an oriented path from N to R" (Knuth, 1968).

10.5 Numerical integration

The following program computes an approximation to the integral ∫ₐᵇ f(x) dx using Simpson's method (a quadrature method). In the program below we let A and B be the limits, N the number of intervals and I the resulting approximation of the integral. We assume that the relation r(X, Y) holds if and only if f(X) = Y, where f is the function being integrated.

    intsimp(A, B, N, I) ←
        W = (B − A)/N ∧
        size(0, G, 2 × N + 1) ∧ size(0, Z, N) ∧
        ∀I∀Y{G[I] = Y → r(A + I × W/2, Y)} ∧
        ∀I∀S{Z[I] = S →
            S = W × (G[2 × I] + 4 × G[2 × I + 1] + G[2 × I + 2])/6} ∧
        reduce(+, Z, I).

The array G is set up to contain the 2 × N + 1 values f(a), f(a + w/2), f(a + w), ..., f(b − w), f(b − w/2), f(b). These values are used to compute the area for each of the intervals, stored in Z. Finally the sum of the areas is computed.

10.6 Linear regression

This is an example of a more involved numeric computation, adapted from Press et al. (1989). The problem is to fit a set of n data points (xᵢ, yᵢ), 0 ≤ i < n, to a straight line defined by the equation y = A + Bx. We assume that the uncertainty σᵢ associated with each item yᵢ is known, and that all xᵢ (values of the independent variable) are known exactly. Let us first define the following sums (all summations run over 0 ≤ i < n):

    S = Σ 1/σᵢ²,  Sx = Σ xᵢ/σᵢ²,  Sy = Σ yᵢ/σᵢ²,  Sxx = Σ xᵢ²/σᵢ²,  Sxy = Σ xᵢyᵢ/σᵢ².

The coefficients A and B of the equation above can now be computed as

    Δ = S × Sxx − Sx × Sx,  A = (Sxx × Sy − Sx × Sxy)/Δ,  B = (S × Sxy − Sx × Sy)/Δ.

The following program computes A and B from three arrays X, Y and U.

    linear_regression(X, Y, U, A, B) ←
        size(0, X, N) ∧ size(0, Y, N) ∧ size(0, U, N) ∧
        size(0, Z, N) ∧ size(0, Zx, N) ∧ size(0, Zy, N) ∧
        size(0, Zxx, N) ∧ size(0, Zxy, N) ∧
        ∀I{cardinal(I, N) →
            Z[I] = 1/(U[I] × U[I]) ∧
            Zx[I] = X[I]/(U[I] × U[I]) ∧
            Zy[I] = Y[I]/(U[I] × U[I]) ∧
            Zxx[I] = (X[I] × X[I])/(U[I] × U[I]) ∧
            Zxy[I] = (X[I] × Y[I])/(U[I] × U[I])} ∧
        reduce(+, Z, S) ∧ reduce(+, Zx, Sx) ∧ reduce(+, Zy, Sy) ∧
        reduce(+, Zxx, Sxx) ∧ reduce(+, Zxy, Sxy) ∧
        Delta = S × Sxx − Sx × Sx ∧
        A = (Sxx × Sy − Sx × Sxy)/Delta ∧
        B = (S × Sxy − Sx × Sy)/Delta.

It is obvious that this program can be run in O(log n) time using n processors, dominated by the reductions. The bounded quantification which computes the intermediate arrays Z, Zx, Zy, Zxx and Zxy runs in constant time using n processors.

11 LIST EXAMPLES

The following two examples are presented simply to show that it is possible to express list algorithms as well using bounded quantifications, although the recursive programs are usually more elegant.

11.1 Lessall

The lessall program for lists is of course very similar to the array program (this makes it easy to change the data structure).

    lessall(A, L) ← ∀B∀I{member(I, L, B) → A < B}.

11.2 Partition

The partition program, finally, is an example of a program which is much clearer when expressed recursively. We intend that partition(X, A, L, H) be true if and only if L contains exactly those elements of X which are less than or equal to A, and H contains exactly those which are greater than A. The partition predicate is usually part of an implementation of Hoare's Quicksort algorithm. Here is the recursive program:

    partition([], A, [], []).
    partition([B|X], A, L, [B|H]) ← A ≤ B ∧ partition(X, A, L, H).
    partition([B|X], A, [B|L], H) ← A > B ∧ partition(X, A, L, H).

In the following program, which uses bounded quantifications, we have tried to keep some of the structure of the recursive program.

    partition(X, A, L, H) ←
        ∀Fx∀Z∀I{suffix(I, X, Fx) →
            member(I, SL, L) ∧ member(I, SH, H) ∧
            part(Fx, L, H, A, SL, SH)} ∧
        member(1, SL, L) ∧ member(1, SH, H).

    part([], [], [], A, SL, SH).
    part([B|X], L, H, A, SL, SH) ←
        J = I + 1 ∧
        member(J, SL, L₁) ∧ member(J, SH, H₁) ∧
        (A ≤ B ∧ L = L₁ ∧ H = [B|H₁] ∨
         A > B ∧ L = [B|L₁] ∧ H = H₁).
The program computes two lists of lists SL and SH which are scans of partitions on X, picking out those elements which are less than or greater than A, respectively. 14.1 12 NESTED BOUNDED QUANTIFICATIONS Consider a bounded quantification whose body is another bounded quantification: Provided that 8dx] is true for any x in {co, Cll ... ,ck-d, and that similarly 8 2 [y] is true for any y in {do, d l , ... , dl-d, the nested bounded quantification is equivalent to the k x R element conjunction [Co, do] /\ [CI' do] /\ [co, dl ] /\ [CI, dl ] /\ . . . /\ ... /\ [co, dl /\ [CI' dl - l ] l ] As before, provided that all data dependencies are satisfied, all these conjuncts can be run simultaneously. 13 TOLERATING DEPENDENCIES In all examples shown above the computations of the conjuncts obtained from a bounded quantification have been independent. Therefore the conjuncts could be computed in any order, for example in parallel. There are interesting computations where the resulting conjuncts are dependent. Consider, for example, the following program (adapted from a program by Anderson & Hudak [1990]) which defines an n x n matrix A through a recurrence. rec(A) f size(O, A, N) /\ size(l, A, N) /\ VIVJ{A[I,J] = X -7 I=l/\X=lV I>l/\J=l/\X=lV I>l/\J>l/\ X = A[I - 1, J] + A[I -1, J - 1] + A[I, J -I]}. This program requires a co-routining implementation of bounded quantification to run on a sequential computer or synchronization to run on a parallel computer. We are currently investigating whether automatic generation of synchronization/ co-routining code is sufficient or if the programmer should be allowed to annotate the program, for example, through read-only variables (Shapiro, 1983). RELATED WORK We noted above that M. Meier has suggested (1991) how to compile some tail recursive (conjunctive as well as disjunctive) programs to iterative programs on top of WAM. Several authors, e.g., Lloyd & Topor (1984) and Sato & Tamaki (1989), have discussed methods for running logic programs with arbitrary formulas in bodies. Our method only covers a limited extension of Horn clauses. Array Comprehensions It is obvious that there are similarities between arrays and bounded quantifications on one side, and the array comprehensions proposed for the Haskell language (Hudak & Wadler, 1990) on the other. Both concepts aim to express the contents of an array, or the relationship between several arrays, declaratively. It appears to us, as with most functional programming language concepts, that when they are at all appropriate they offer a more compact and occasionally more elegant notation. For example, the factorial program above could have been expressed more easily if an expression describing the temporary array T could have been written immediately. However, when the relationship between the elements of more than one array are to be described, the bounded quantifications appear to be more comprehensive. Array comprehensions are, in general, evaluated by lazy computation. This can be thought of as a degenerated form of concurrency which suspends part of a computation until it is known that it must be performed. We do not think lazy computation is necessary, provided unification with the "logical variable" and a more general form of concurrency. Futures (Halstead, 1985) are yet another way of giving names for values which are yet' to be fully computed. 
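Returning for a moment to the recurrence program rec/1 of Section 13: its data dependencies can be made explicit by giving one sequential evaluation order that satisfies them. The C sketch below (0-based indices, all names illustrative) fills the matrix row by row, left to right; a co-routining or parallel implementation would have to enforce exactly these constraints, for example by sweeping anti-diagonals, on which all elements are mutually independent.

/* One evaluation order for the recurrence of Section 13: A[i][j] depends on
 * A[i-1][j], A[i-1][j-1] and A[i][j-1], so rows top-to-bottom and, within a
 * row, columns left-to-right is sufficient. */
static void rec(long n, double a[n][n]) {
    for (long i = 0; i < n; i++)
        for (long j = 0; j < n; j++) {
            if (i == 0 || j == 0)
                a[i][j] = 1.0;      /* first row and first column */
            else
                a[i][j] = a[i-1][j] + a[i-1][j-1] + a[i][j-1];
        }
}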
14.2 Nova Prolog The ideas presented above originated as a generalization of the language Nova Prolog (Barklund & Millroth, 1988).4 Here, however, it is appropriate to present Nova Prolog as a language embodying a subset of bounded quantifications. The subset is chosen to obtain a language tailored specifically for massively parallel SIMD computers, such as the Connection Machine. More specifically, we assume that we can store some data structures in such a way that processor i has particularly efficient access to the ith element of each data structure. We say that those data structures are distributed. 4Nova Prolog relates to Prolog in much the same way as *LISP (by Thinking Machines Corp.) relates to Common LISP and C* (also by Thinking Machines Corp.) to C. That is, it is a sequential programming langauge extended with a distributed data structure and a control structure for expressing computations over each element on the data structure. 824 We currently limit the distributed data structures to be compound terms; in fact only those compund terms whose function symbol is pterm and whose arity is some fixed value. We shall call them 'pterms.' (This is to help a compiler distinguish distributed data structures from other compound terms.) Since pterms are the only distributed data structures and they are compound terms, the only range formula we need is arg( i, t, x). 5 We have chosen a syntax for bounded quantifications which makes it possible to combine the range formula with the quantification of variables. In Nova Prolog a formula where T is a pterm, is called a 'parall' and has the same meaning as the bounded quantification YIYAN A z · .. YAn( arg(I , T I , AI) ~ arg(I, Tz, Az) 1\ ... 1\ arg(I, Tn, An) 1\ <1>[1]), namely that <1> is true for every corresponding element Ai of T i , 1 :::; i :::; n. We can see that in Nova Prolog the 'index' I is implicit and is denoted by the constant symbol self in the body <1>. All examples above for array computations can' be translated into Nova Prolog. vVe have recently implemented parts of Nova Prolog in *LISP (Blanck, 1991). 15 CONCLUSION AND FUTURE WORK We have defined bounded quantifications, a new construct for logic programming languages. We have discussed how they can be efficiently implemented on sequential and parallel computers. They offer clarity as well as efficiency and we propose that language designers and implementors consider including them in implementations of, e.g., Prolog, Godel and KLl. A natural continuation of this work is to verify experimentally that bounded quantifications can be implemented efficiently in sequential and concurrent languages, and on sequential and parallel computers. It is also important to investigate how data dependencies and other synchronization considerations can be handled, when bounded quantifications are interpreted concurrently. REFERENCES Anderson, S. & Hudak, P., 1990, Compilation of Haskell Array Comprehensions for Scientific Computing. In Proc. SIGPLAN '90 Coni. on Programming Language Design and Implementation. ACM Press, New York, N.Y. 5The difference from the elt predicate we proposed earlier is that arg operates on compund terms, rather than arrays, and that indexing is one-based. This is of course related to the use of the arg predicate in Prolog. Barklund, J. & Millroth, H., 1988, Nova Prolog. UPMAIL Tech. Rep. 52. Computing Science Dept., Uppsala University. Blanck, J., 1991, Abstrakt maskin for Nova Prolog. Internal report. Computing Science Dept., Uppsala University. 
DeGroot, D., 1984, Restricted And-Parallelism. In Proc. Inti. Coni. on Fifth Generation Compo Systems 1984, pp. 471-8. North-Holland, Amsterdam. Halstead, R., 1985, Multilisp-a Language for Concurrent Symbolic Computation. ACM TOPLAS, 2, 501-38. Hill, P. M. & Lloyd, J. W., 1991, The Godel Report. Tech. Rep. 91-02. Computer Science Dept., University of Bristol. Hudak, P. & Wadler, P., 1990, Report on the Programming Language Haskell. Tech. Rep. YALEU/DCS/ RR-777. Dept. of Computer Science, Yale Univ. Knuth, D. E., 1968, The Art of Computer Programming. Volume 1 / Fundamental Algorithms. Reading, Mass. Lloyd, J. W. & Topor, R. W., 1984, Making Prolog more Expressive. J. Logic Programming, 1, 225-40. Meier, M., 1991, Recursion vs. Iteration in Prolog. In Proc. 8th Inti. Coni. on Logic Programming (ed. K. Furukawa), pp. 157-69. MIT Press, Cambridge, Mass. Millroth, H., 1990, Reforming Compilation of Logic Programs. Ph.D. thesis. Uppsala Theses in Computing Science 10. Computing Science Dept., Uppsala University. (A summary will appear in the next item.) Millroth, H., 1991, Reforming Compilation of Logic Programs. In Proc. 1991 Inti. Logic Programming Symp. (ed. V. Saraswat, K. Ueda). MIT Press, Cambridge, Mass. Press, W. H. et al., 1989, Numerical Recipes. The Art of Scientific Computing. Cambridge Univ. Press, Cambridge, U.K. Sato, T. & Tamaki, H., 1989, First Order Compiler: a Deterministic Logic Program Synthesis Algorithm. J. Symbolic Computation, 8, 605-27. Shapiro, E., 1983, A Subset of Concurrent Prolog and Its Interpreter. Technical Report TR-003. ICOT, Tokyo. Tarnlund, s.-A., 1992, Reform. In Massively Parallel Reasoning Systems (ed. J. A. Robinson). To be published by MIT Press, Cambridge, Mass. Warren, D. H. D., 1983, An Abstract Prolog Instruction Set. SRI Tech. Note 309. SRI International, Menlo Park, Calif. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 825 An Implementation for a Higher Level Logic Programming Language Ross A. Paterson t Anthony S.K. Cheng* Software Verification Research Centre Department of Computer Science The University of Queensland 4072, Australia Abstract For representing high level knowledge, such as the math~ ematical knowledge used in interactive theorem provers and verification systems, it is desirable to extend Prolog's concept of data object. A basic reason is that Prolog data objects-Herbrand objects-are terms of a minimal object language, which does not include its own object variables, or quantification over those variables. Qu-Prolog (Quantifier Prolog) is an extended logic programming concept which takes as its data objects, object terms which may include free or bound occurrences of object variables and arbitrary quantifiers to bind those variables. Qu-Prolog is unique in allowing its data objects to include free occurrences of object variables. In this paper the design of the abstract machine for Qu-Prolog is given. The underlying design of the machine reflects the extended data objects and Qu-Prolog's unification algorithm. 1 Introduction The extended logic programming language Qu-Prolog (Quantifier Prolog) [Cheng et ai. 1991, Paterson and Hazel 1990, Paterson and Staples 1988, Staples et al. 1988a, Staples et ai. 1988b] has been designed to provide improved support for language processing applications such as interactive proof systems. Its main feature is that it supports higher level symbolic data types than does Prolog. 
In particular, the data objects which QuProlog reasons about are terms of a full first order logic syntax, which includes both object level variables and arbitrary bindings of object level variables. The language >.Prolog [Miller and Nadathur 1986], which extends Prolog with typed lambda-terms, may also be used for these purposes. Qu-Prolog is weaker, in that its terms correspond to second-order lambdaterms; substitution is provided, but not application of terms to terms. However, in Qu-Prolog, as in traditional notation, term variables may refer to open terms, raising further questions of whether an object level variable oc• e-mail: chenaGcs.uq.oz.au t present address: Department of Computing, Imperial College, London SW7. curs free in a term, or whether two object level variables are distinct. The Qu-Prolog Abstract Machine (QuAM) [Cheng and Paterson 1990] is designed as the target for compilation of the logic programming language Qu-Prolog. QuAM is developed from the Warren Abstract Machine (WAM). New mechanisms are introduced to handle quantified terms and substitutions and flexible programming in Qu-Prolog. This paper presents the basic structure of the language and describes its implementation. The main features of Qu-Prolog are described in section 2. In section 3, unification is extended to Qu-Prolog terms. The design of QuAM is given in section 4. Some examples are given in section 5. It is assumed that the reader has some knowledge of the design of WAM [AltKaci 1990, Warren 1983] and the compilation of logic programming languages. 2 Qu-Prolog - the Language Qu-Prolog has Prolog as a subset, and uses Edinburgh Prolog syntax for constants and structures, and for ordinary variables which are intended to range over arbitrary object level terms. These variables will be referred to as meta variables, in recognition of the meta level status of the Qu-Prolog language relative to the object language. In addition, Qu-Prolog introduces syntax to represent object level variables and quantifiers, as follows. Qu-Prolog has other features not described here. These include persistent variables, which are used to manage incomplete information in the database. For a description of persistent variables and their implementation, see [Cheng and Robinson 1991]. 2.1 Object Variables Since object level variables are simply part of the object level syntax, it might seem natural to name them at the Qu-Prolog (meta) level by constants. Instead, Qu-Prolog refers to object level variables only by a type of QuProlog (~eta) level variable, called object-var variables. The semantics of object-var variables is that they range over object level variables. The success of this approach reflects the common intuition that object level variables are interchangeable. 826 The phrase 'object variable' is commonly used to abbreviate 'object-var variable' since it has no other use in describing Qu-Prolog syntax. For an occasional reference to a variable of the object language, the phrase 'object level variable' will be used. Qu-Prolog ,object variables have the same lexical conventions as constants. In order to distinguish them, object variable notations must be declared by ob j ect_var /1. The declaration convention is that an explicit declaration of an object variable name also implicitly declares all variant names derived by appending an underscore followed by a positive integer. The standard library declares the atoms x, y and z as object variables. 
As each object variable is intended to range over all object level variables, it is important to know whether two object variables denote different object level variable. This information can be supplied implicitly or by explicit use of the predicate distinctJrom/2. For example, x distinctJrom y asserts that x and y do not denote the same object level variable. By default, all object variables occurring in the same clause/query are distinct from each other. Remark: In fact Qu-Prolog makes internal use of some meta level constants representing'object level variables. These terms, called local object variables, are mentioned below but they are not discussed here in detail. Their key role is as 'new' variables, for use when changing bound variables. This newness is implemented by a convention that they are excluded from instantiations of user accessible meta variables and object-var variables. 2.2 Quantifiers Qu-Prolog can reason about object level terms which include arbitrary quantifiers, in much the same way that Prolog can reason about terms which include arbitrary function symbols. The user declares quantifier notations as needed. Thus it is possible to have representations of f for integral calculus as well as 't/, 3 for first order logic. Distinct quantifier notations in Qu-Prolog represent distinct object level quantifiers. Qu-Prolog uses the traditional prefix notation for quantified terms. Quantifiers are declared explicitly by executing ope Precedence, quant, Q) where Q is the representation for the quantifier; Q must have the same lexical structure as a Prolog constant. 2.3 Substitutions Throughout logical reasoning, the need for substitutions arises naturally. Qu-Prolog directly supports parallel substitution for free occurrences of object level variables. The syntax for substitutions in Qu-Prolog is [tI/XI, ... ,tn/Xn ] * term where Xl, ••• ,Xn are object variables and tl, . .. ,tn are arbitrary Qu-Prolog terms. Qu-Prolog substitutions are evaluated at unification time, in accordance with the standard concept of correct substitution into quantified terms, which substitutes only for free occurrences of variables and which changes bound variables to avoid capture of free variables from the substituted terms. For a term 81 * ... * 8 n * Y where 81, ... ,8 n is a sequence of substitutions, the substitutions are applied from right to left. That is, 8 n is applied to y first. The effect of applying a substitution to a term can be observed with this example: After applying the rightmost substitution, the result will be: • • * ti if for some i = 1, ... , n, Xi = y, or 8 * Y if for all i = 1, ... , n, Xi distinct-from y. 8 It is also possible that there is insufficient information at a particular stage to determine which of these cases applies. In that case evaluation of the substitution will be delayed. That may lead to delaying of unification subproblems, perhaps extending beyond the current unification call. As well as substitutions appearing in user inputs, the system can generate substitutions via unification. For example, the problem lambda X A = lambda y B has the solution A = [x/V] * B. 2.4 Example As a small example, we give a A-calculus evaluator in Qu-Prolog. The terms of the A-calculus are transcribed directly, except that we use the infix constructor (0 for application. First, we declare the quantifier lambda and the application operator: ?- op(700, quant, lambda). ?- op(600, yfx, (0). Now the following predicate defines the structure of Aterms: lambda_ term(x) . 
lambda_term(A(OB) lambda_term(A), lambda_term(B). lambda_term(lambda x A) lambda_ term (A) . For example, the following are A-terms: x lambda x x (lambda x x) (Oy lambda x (x(Oy) Note that A-terms may contain free object variables. Now we can define a single-step reduction predicate on A-terms: 827 ?- op(800, xfx, =». (lambda x A)~B => [B/x]*A. A~B => C~B A => C. A~B => A~C B => C. lambda x A => lambda x B :- A => B. The first clause is the well-known ,B-rule. The others allow rewrites anywhere in the expression. If desired, we could also add the 17-rule: lambda x A~x => A :- x not_free_in A. The full reduction relation in the usual reflexive, transitive closure of the single-step reduction predicate: ?- op(800, xfx, =>*). A =>* C :- A => B, !, B =>* C. A =>* A. 3 3.2 Quantifiers To motivate the treatment of unification for quantified terms, consider [vlx] * t Qu-Prolog extends Prolog unification to cover the new data objects in the language. Two terms are unified if they are equivalent up to changes of bound variables (a equivalent). Since unification for Prolog terms is not changed (except that Qu-Prolog includes occurs checking), our discussion will concentrate on the new features. Because Qu-Prolog unification is more difficult than ordinary unification-it is not decidable, but semidecidable [Paterson 1989]-we often encounter subproblems which cannot be solved at that point in the computation, but we may be able to make further progress on them later. Such sub-problems are delayed, waiting for a relevant variable (or variables) to be instantiated, at which point they are re-attempted. If the sub-problems remain unsolved at the end of query solution, they are displayed as part of the answer. This approach has proved practical in our implementation. We have also found it useful to delay sub-problems to avoid branching. As a simple example, consider the unification problem [X/y]*Z = c, where cis a constant (a similar situation arises with structures). The unification can succeed in one of two ways: • Projection: Z = c. Here the substitution has a null = y and X = c. Hence it is impossible to determine a unique most general unifier. Rather than branch the unification problem, QuProlog delays it until the binding of Z is known. 3.1 Object Variables Since an object variable is intended to range over object level variables, and since object variables are the only Qu-Prolog terms of this type, an object variable can be instantiated only to another object variable. Further, unification fails if the object variables denote distinct object level variables. Also, whenever a meta variable is unified with an object variable, the meta variable is bound to the object variable. lambda y y Intuitively, the two terms are unifiable without instantiation of x or y, because the terms are the same up to change of bound variable. To unify x and y would be incorrect: the two terms are a equivalent even if x and y denote distinct object level variables. Hence during quantifier unification, Qu-Prolog uses substitution to rename the bound variables to a common bound variable. The bound variable must not appear in the unified terms. This is where the local object variables mentioned previously are used. In general, a problem of the form q x t = q y t' is reduced to Unification • Imitation: Z effect on Z. = lambda x x = [vly] * t' for some new local object variable v, and unification continues. Here is how the approach applies to the example, (v is a local object variable). 
lambda y y lambda x x *x [vlx]*x v lambda v [vly] lambda v [vlx] [vly] *y *y v (success) A substitution containing local object variables, when applied to a meta variable, may be removed by a rule called inversion: problem of the form [vlx] * X = t is reduced to the two problems a: X [xlv] * t, x not-free_in t = For example, we have the following reduction: lambda x A lambda y y A [v/y] * y [xlv] * [vly] * y, x not-free_in [v/y] * y x, x not-free_in v A x [v/x] * A A Unification produces the answer A = x. As a further example, consider lambda x A = lambda y x Since x does occur free on the right and cannot occur free on the left, this unification problem should fail. In QuProlog unification, that failure is detected when, at the time of calculation of A = [x I v] * [v I y] * x, the constraint x not-free_in [v I y] * x is generated and tested; and after substitution evaluation, the test fails. Such not-free_in constraints may be delayed if they cannot be immediately decided. For example, the unification problem lambda x A lambda y [xlz] *Z 828 gives the solution A = [x/v] * [v/y] provided: * [x/z] * Z x nOLfreLin [v/y] * [x/z] * Z In the absence of further information about Z, the noLfree_in test must be delayed. 3.3 data cell with a VARIABLE tag is used to indicate an unbound variable in Qu-Prolog. The value field of the data cell contains a pointer to a list of delayed problems associated with the variable (Figure 1). Although the representation of variables is different to WAM, the classification into temporary and permanent variables, the age determining method and the rules of binding a variable are retained. + IVARIABLEI Occurs Checking Unlike Prolog, occurs checking is included as standard in Qu-Prolog unification. However, it is not always possible to determine whether a variable occurs in the final form of a term. For example, it is impossible to determine whether X occurs in the term [X/y] * Z without knowing more information about Z. If Z is bound to y, X occurs in the term. On the other hand, if Z is bound to a constant c, X does not occur. Thus, if we are considering a sub-problem of the form X = t, we cannot always reduce the problem. We use two conservative syntactic conditions: delayed problems Figure 1: An Unbound Qu-Prolog Variable The REFERENCE tag is retained to indicate that one variable is bound to another one. When two heap variables are bound together, the one created more recently points to the one created earlier on the heap. The delayed problems from the younger one are appended to those of the older one. Unbound Object Variables • If X occurs in t outside of any substitution, and t is not of the form s * X, the unification fails, for the X must appear in t no matter how other variables are instantiated . delayed problems • If X does not appear in t, including substitutions, X is instantiated to t. distinct object variables Figure 2: An Unbound Object Variable If neither of these conditions is met, the unification subproblem must be delayed, pending further instantiation ofX. 4 One of the design criteria for QuAM is that the efficiency of ordinary Prolog queries within Qu- Prolog must be maintained wherever possible. Thus, most of the features of WAM are retained and the description below will concentrate on the differences between QuAM and WAM. 
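One possible C rendering of the cells just described (Figures 1-3) is sketched below. The QuAM layout is given in the paper only at the level of the figures, so the tag values, field names and the use of C structs here are illustrative rather than definitive.

/* Illustrative encoding of QuAM data cells.  An unbound VARIABLE carries a
 * pointer to its list of delayed problems; an unbound OBJECT_VARIABLE takes
 * two cells, the second holding the list of object variables it is known to
 * be distinct from (or an ALL_DISTINCT marker for local object variables). */
enum tag {
    TAG_VARIABLE, TAG_REFERENCE,
    TAG_OBJECT_VARIABLE, TAG_OBJECT_REFERENCE,
    TAG_ATOM, TAG_STRUCTURE, TAG_QUANTIFIER,
    TAG_SUBSTITUTION_OPERATOR, TAG_ALL_DISTINCT
};

struct delayed_problem;                    /* unification or not_free_in; omitted */

typedef struct cell {
    enum tag tag;
    union {
        struct delayed_problem *delayed;   /* VARIABLE / OBJECT_VARIABLE       */
        struct cell            *ref;       /* (OBJECT_)REFERENCE: target cell  */
        const char             *name;      /* ATOM: print name                 */
        struct cell            *args;      /* STRUCTURE, QUANTIFIER, ...       */
    } value;
} cell;

typedef struct object_variable {
    cell  var;                 /* tag == TAG_OBJECT_VARIABLE                   */
    cell *distinct_from;       /* list of object variables, or ALL_DISTINCT    */
} object_variable;

A quantified term or a substitution operator would then be a small block of such cells reached through the args field, in the spirit of Figures 4-6 below.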
The current implementation of QuAM differs from the present description in that it uses an experimental representation for structures, intended for future enhancements to the Qu-Prolog language with higher-order predicates and multiple-place quantifiers. The present paper focuses on other aspects of the machine, so we omit these details here, assuming a WAM-like representation of structures. Because of the difference of the representation of the structures, no performance evaluation will be given. A description of the current implementation can be found in [Cheng and Paterson 1990]. 4.1 x The Qu-Prolog Abstract Machine Data Objects Unbound Variables Because of the association with delayed problems described below, the representation of a self reference cell for unbound variables as in WAM is inapplicable. A OBJECLVARIABLE y z OBJECLVARIABLE Figure 3: x distincLfrom y and x distincLfrom z A separate tag OBJECT_VARIABLE is given tothe object variables to distinguish its function from the variables. The value field has the same purpose as the value field in variables. The second cell in an object variable points to a list of object variables from which it is distinguished (Figures 2, 3). Rather than record all object variables in the distinctness list, an ALL-DISTINCT tag is placed in this cell for local object variables. 829 The classification method, the binding rules and the age determining method used for variables is also applied to object variables. The OBJECT _REFERENCE tag indicates that an object variable is bound to another object variable. When two object variables are bound together, the distinctness information from both object variables are merged together and placed in the older object variable and the delayed problems will be woken up. substitution, while the renaming 2n cells refer to the object variables and terms. Again the substitution pairs are stored in the reverse order for easy evaluation (Figure 6). 2 OBJECT _REFERENCE Quantified Terms Qu-Prolog currently allows 1 place quantifiers (i.e. quantifiers with one bound variable) only. To represent quantified terms in Qu-Prolog, a tag QUANTIFIER is introduced, analogous to the STRUCTURE tag of the WAM. Such a value points to a three contiguous cells, containing the quantifier atom, a reference to the bound object variable, and the body ofthe quantified term (Figure 4). QUANTIFIER q ATOM OBJECT_VARIABLE term Figure 4: Quantified Term q x term Substitution Operators In QuAM, an application of one or more substitutions to a term is represented as a data cell, marked with a SUBSTITUTION_OPERATOR tag and pointing to a pair of cells. The first cell contains a pointer to the list of substitutions, while the second is a data cell denoting the term (Figure 5). The list of substitutions is stored in reverse order, with the innermost substitution at the front, to simplify evaluation. SUBSTITUTION_OPERATOR Figure 5: sub * term An ordinary parallel substitution is represented as a data cell with the property tag, containing a pointer to a pair of cells. The first of the cells is a pointer to the parallel substitution, while the second represents the rest of the substitution list. A parallel substitution involving n pairs of object variables and terms is represented as a block of 2n + 1 cells; the first contains the size of the INTEGER OBJECT_REFERENCE INTEGER - r---+ y 456 - r---+ x 123 Figure 6: A Parallel Substitution s * [123/x, 456/y] Each substitution list contains a marker describing the property of the substitution list. 
It is used during unification to assist the determination of whether or not the unification can be solved by projection. In general, a problem of the form s * A = t, where t is a constant, structure, quantified term or object variable, can always be reduced by imitation. If s is known not to contain any terms of the same top-level structure as t, then the problem cannot be solved by projection. Thus branching is eliminated and we can proceed by imitation. Otherwise, the unification problem will be delayed to avoid branching. In most cases, the whole substitution list must be examined in order to eliminate projection. In special cases, the marker will contain enough information to make a complete search unnecessary. It is also convenient to know if a substitution list consists solely of renamings generated by quantifier unification, as such a list can be safely inverted. Thus, each substitution list is marked as one of: • invertible: the substitution list consists solely of renamings. • object variables only: the substitution list is not invertible, but its range contains only object variables. • others: the range of the substitution list contains constants, structures, quantifiers or meta variables. 4.2 Data Areas QuAM supports the same data areas as in WAM. The heap provides space to store data objects as well as the distinctness information and linking cells required for delayed problems. The local stack holds choice points and environments. The choice points are enlarged to reflect the extra data areas and registers. Because the delayed problems list and distinctness information must be reset to their previous value upon backtracking, the method of trailing (i.e. resetting the address to null) used in WAM is inapplicable. Each entry in the trail is extended to be a pair of addresses and 830 previous values to provide extra information for backtracking. In addition to the standard WAM data areas, a delayed problems stack that holds any delayed problem generated during unification is provided. Apart from containing pointers to the arguments for the delayed problem, it has a type tag and a solved tag. The type tag indicates whether the delayed problem is a unification or a noLlree_in problem (Figure 7). The solved tag is set whenever the problem is solved. [f(l)jxJ*A ~IFY f(l) Figure 7: Delayed Problem: [!(1)/x] *A = 1(1) When a query is solved, any delayed problem that remains is printed as a constraint to the solution. Storing the delayed problems in a separate area allows fast access to the problems when the solution is printed. 4.3 Registers There are a few extra registers used in QuAM: • the top of the delayed problems stack, • a list of formerly delayed problems that have been woken up, • The substitution pointer register points to the entry in the parallel substitution where the next component is to be added. As well as the X registers, there is an associated set of registers, known as the XS (X substitution) registers, which hold the substitution of a term when the substitution and the term of a SUBSTITUTION_OPERATOR data cell are broken up during dereference. This procedure enables the substitution to be passed from the outer structure to the inner terms effectively. Because each Y register is one data cell in size, and an OBJECT_VARIABLE is two cells in size, a Yregister cannot hold an OBJECT_VARIABLE directly, and instead contains a reference to an OBJECT _VARIABLE in the heap. 
4.4 Instruction Set For each new data object provided in QuAM, there are put and get instructions to build and unify the data object. The new instructions are: put-object-variable Xi Create a new object variable on the heap, and place a reference to it in Xi. get-object-variable Xi Xj Copy the object variable reference in Xj into Xi. put-object-value Xi Xj Copy the object variable reference in Xi into Xj. get-object-value Xi Xj Unify X Sj, Xj with the object variable referenced by Xi. put-quantifier q Xi Xj Xk Construct a quantified term, with quantifier q, bound object variable Xi and body Xh and place a reference to it in Xk. geLquantifier q Xi Xj Xk Match the term in X Sk, Xk with a quantified term, with quantifier q and bound object variable Xi. The body is placed in X Sj, Xj. In each of the last two instructions, the register Xi must have been previously set to an object variable. Note that some of these instructions use the XS registers, while others ignore them, expecting any substitution to be incorporated into the term in the X register. Thus during head matching substitutions are conveniently accessible in the substitution registers, allowing efficient dereferencing, and sharing of substitutions. However, if such a value is to be a sub-term, its substitution (if any) must be re-incorporated into the term. There is a set of put instructions to build substitutions, but no corresponding set of get instructions. This is because a substitution occurring in the head must be built in the same way as if it had occurred in the body, and then the substituted term must be unified with the corresponding head argument (or component). The instructions available are: puLsubs_operator Xi Xj Combine X Sj and Xj into a SUBSTITUTION_OPERATOR, and place a reference to it in Xi. puLempty_subs Xi Set X Si to an empty substitution. pULparalleLsubs n Xi Prepend a parallel substitution, consisting of n pairs (each supplied with the next instruction), to X Si. pULparalleLsubs_pair Xi Xj Add a pair, substituting Xj for the object variable referred to by Xi, to the parallel substitution currently under construction. puLsubs Xi Xj Transfer a substitution from X Si to X Sj. seLobject-property Xi Set the property tag on XSi to "object variables only". determine_property Xi determine the property tag of X Si. 831 put_variable XO XO 1. A put_empty_subs XO put_object_variable Xl Yo y put_atom 'b' X2 put_object_variable X3 1. x put_atom 'a' X4 put_parallel_subs 2 XO Yo * A put_parallel_subs_pair Xl X2 Yo [b/y] * A put_parallel_subs_pair X3 X4 Yo [a/x,b/y]*A determine_property XO The only new procedural instructions are: do_delayed_problems Solve any woken delayed problems. This instruction is executed after the head has been matched. noLfree_in Perform a noLfree_in test during quantifier unification. 4.5 Dereference Because of the presence of substitution, additional operations are included into the dereference algorithm. The substitutions are evaluated during dereference whenever possible. Given an object variable, the substitution will map the object variable to its value. Depending on the type of the data object encountered in the term, dereference also simplifies the substitution before ree turning. 5 Examples A number of small examples are given here to highlight the design differences between QuAM and WAM. 5.1 put_subs_operator XO XO put_structure f/l Xl unify _value XO 1. 
group together Whenever a substitution is associated with a term in the head, that term together with the substitution will be built by put instructions and general unification will be called. For example, consider the following clause from the A-calculus evaluator: Quantified Terms Quantified terms are constructed in a similar fashion to the unary structures, except for the object variable. The following sequence of instructions shows how a quantified term lambda x x is built in register Xl: put_object_variable XO put_quantifier lambda XO XO Xl 1. ~ Matching a quantified term is slightly more complicated than structure matching. Apart from matching the term from outside in (i.e. match the quantifier befor~ matching the body), it must establish that the bound variable of the quantified term in the head does not occur freely in the body of the quantified term from the query. Thus, a not..free_in instruction must be executed before the quantifier matching is performed. The following instructions match the argument Xo with the head argument (lambda x A)COB: get_structure CO/2 XO unify_variable X2 Yo lambda x A unify_variable XO 1. B put_object_variable X3 Yo x put_empty_subs X3 not_free_in X3 X2 get_quantifier lambda X3 X2 X2 Yo A 5.2 If the substitution is nested inside another term, an extra step is needed. A SUBSTITUTION _OPERATOR data object is created to group the substitution and its associated term together. To construct the term f ( [a/x, b/y] * A), the following additional instructions are required: Substitutions QuAM is designed to create substitutions independent of the term. The term is created before the substitution. The example [a I x, b I y] * A illustrates this property. (lambda x A)COB => [A/x]*B. In section 5.1 above, we gave the translation of the matching of the first argument, leaving x in X 3 , A in X 2 and B in Xo . .The following instructions match the second argument (in Xl): put_subs_operator XO XO Yo group together B put_subs_operator X2 X2 1. group together A put_empty_subs XO 1. *B put_parallel_subs XO 1 1. *B put_parallel_subs_pair X3 X2 1. [A/x] determine_property XO get_value XO Xl 1. unify with the argument Note that A and B must both be combined with their substitutions, if any. In the case of A, this allows the value to fit into a cell in the substitution pair. In the case of B, the substitution must be i~corporated into the value, and the substitution register set to empty, so that the new substitution will be outside any existing substitutions. If the substitution is nested within another term, the outer term is matched by the get instructions, while the substitution is built and unified with the appropriate component. 6 Conclusions QuAM has been implemented in C under the SUN 4 environment. The compiler was initially implemented in NU-Prolog [Naish 1986], and subsequently transferred to Qu-Prolog, which includes Prolog as a subset. 832 Qu-Prolog, including the extensions and features mentioned here, has been motivated particularly by the need to rapidly prototype interactive proof systems, and currently it is the implementation language for a substantial experimental proof system [Robinson and Tang 1991]. [Staples et ai. 1988a] J. Staples, P.J. Robinson, R.A. Paterson, R.A. Hagen, A.J. Craddock and P.C. Wallis, Qu-Prolog: an Extended Prolog for Meta Level Programming, Proc. of the Workshop on Meta Programming in Logic Programming, University of Bristol, June 1988. Acknowledgements [Staples et al. 1988b] J. Staples, R.A. Paterson, P.J. Robinson and G.R. 
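Section 4.5 above describes the extended dereference only informally. A much-simplified sketch of its control structure is given below: reference chains are followed, and an object variable reached under a pending substitution is looked up in that substitution. Substitution simplification, the XS registers and most tag cases are deliberately ignored, and all names are illustrative.

/* Simplified dereference in the presence of substitutions: follow binding
 * chains; when an object variable is reached, try to map it through the
 * pending substitution list (stored innermost first).  The real machine
 * also simplifies the substitution and handles many more cases. */
enum dtag { D_REFERENCE, D_OBJECT_REFERENCE, D_OBJECT_VARIABLE, D_OTHER };

typedef struct dcell {
    enum dtag     tag;
    struct dcell *ref;                 /* target, for the two reference tags */
} dcell;

typedef struct subst_pair {            /* one pair t/x of a substitution list */
    struct dcell      *var, *term;
    struct subst_pair *next;           /* next (outer) substitution pair      */
} subst_pair;

static dcell *dereference(dcell *c, subst_pair *s) {
    for (;;) {
        switch (c->tag) {
        case D_REFERENCE:
        case D_OBJECT_REFERENCE:
            c = c->ref;                            /* follow the binding      */
            break;
        case D_OBJECT_VARIABLE:
            for (subst_pair *p = s; p != NULL; p = p->next)
                if (p->var == c)                   /* substitution maps it    */
                    return dereference(p->term, p->next);  /* outer pairs remain */
            return c;                              /* not affected by s       */
        default:
            return c;                              /* atoms, structures, ...  */
        }
    }
}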
Ellis, Qu-Prolog: Higher Level Symbolic Computation, Key Centre for Software Technology, Department of Computer Science, University of Queensland, 1988. John Staples, Peter Robinson, Gerard Ellis and Dan Hazel have made substantial contributions to the design and implementation of QuAM. This research was supported by the Australian Research Council. References [Ai"t-Kaci 1990] H. Ai"t-Kaci, The WAM: a (Real) Tutorial, Report No.5, Paris Research Laboratory (PRL), France, 1990. [Cheng and Robinson 1991] A.S.K. Cheng and P.J. Robinson, An Implementation for Persistent Variables in Qu-Prolog 3.0, Software Verification Research Centre, Department of Computer Science, University of Queensland, 1991. [Cheng and Paterson 1990] A.S.K. Cheng and R.A. Paterson, The Qu-Prolog Abstract Machine, Technical Report No. 149, Key Centre for Software Technology, Department of Computer Science, University of Queensland, February 1990. [Cheng et al. 1991] A.S.K. Cheng, P.J. Robinson and J. Staples, Higher Level Meta Programming in QuProlog 3.0, Proc. of 8th International Conference on Logic Programming, Paris, June ~991. [Miller and Nadathur 1986] D.A. Miller and G. Nadathur, Higher-order Logic Programming, Proc. of 3rd International Conference on Logic Programming, London, July 1986. [Naish 1986] L. Naish, Negation and Quantifiers in NUProlog, Proc. of 3rd International Conference on Logic Programming, London, July 1986. [Paterson 1989] R.A. Paterson, Unification of Schemes of Quantified Terms, Technical Report No. 154, Key Centre for Software Technology, Department of Computer Science, University of Queensland, Dec. 1989. [Paterson and Hazel 1990] R.A. Paterson and D. Hazel, Qu-Prolog 3.0 - Reference Manual, Technical Report No. 195, Key Centre for Software Technology, Department of Computer Science, University of Queensland, 1990. [Paterson and Staples 1988] R.A. Paterson and J. Staples, A General Theory of Unification and Solution of Constraints, Technical Report No. 90, Key Centre for Software Technology, Department of Computer Science, University of Queensland, 1988. [Robinson and Tang 1991] P.J. Tang, The Demonstration Prover: Demo2.1, Technical Verification Research Centre, land, September 1991. Robinson and T.G. Interactive Theorem Report 91-4, Software University of Queens- [Warren 1983] D.H.D. Warren, An Abstract Prolog Instruction Set, Technical Note 309, Artificial Intelligence Center, Computer Science and Technology Division, SRI International, 1983. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERA nON COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 833 Implementing Prolog Extensions· a Parallel Inference Machine Jean-Marc Alliot* Andreas Herzig t Mamede Lima-Marques+ (alliot@irit.fr) (herzig@irit.fr) (mamede@irit.fr) Institut de Recherche en Informatique de Toulouse 118 Route de Narbonne 31062 Toulouse cedex, France Abstract We present in this paper a general inference machine for building a large class of meta-interpreters. In particular, this machine is suitable for implementing extensions of Prolog with non-classical logics. We give the description of the abstract machine model and an implementation of this machine in a fast language (ADA), along with a discussion on why and how parallelism can easily increase speed, with numerical results of sequential and parallel implementation. 
1 Introduction In order to get closer to human reasoning, computer systems, and especially logic programming systems, have to deal with various concepts such as time, belief, knowledge, contexts, etc ... Prolog is just what is needed to handle the Horn clause fragment of first order logic, but what about non-classical logics? Just suppose we want to represent in Prolog time, knowledge, hypotheses, or two of them at the same time; or to organize our program in modules, to have equational theories, to treat fuzzy predicates or clauses. All these cases need different ways of computing a new goal from an existing one. Theoretical solutions have been found for each of the enumerated cases, and particular extensions of Prolog have been proposed in this sense in the literature-. Examples are [BK82]' [GL82], Tokio [FKTM086], NPROLOG [GR84]' Context Extension [MP88], Templog [Bau89], Temporal Prolog [Sak89], and [Sak87]. For all these solutions it is possible to write specific meta-interpreters in Prolog that implement these non-classical systems ([SS86]). But there are disadvantages of a meta-interpreter: lower speed and compila*Supported by the Centre d'Etudes de la Navigation Aerienne, France tSupported by the Medlar Esprit Project tSupported by CAPES - Brasil tion notoriously inefficient. If we want to go a step further, and to write proper extensions of Prolog, then the problem is that costs for that are relatively high (because for each case we will lead to write a new extension), and we are bound to specific domains: we can only do temporal reasoning, but not reasoning about knowledge (and what if we want to add modules?). Our aim is to define a framework wherein a superuser can create easily "his" extension of Prolog. This framework should be as general as possible. Hence, we must provide a general methodology to implement non-classical logics. There are four basic assumptions on which our frame is built: 1. to keep as a base the fundamental logic programming mechanisms that are backward chaining, depth first strategy, backtracking, and unification, 2. to parametrize the inference step: it is the superuser who specifies how to compute the new goal from a given one, and he specifies it in a logic form. 3. to be able to rewrite goals. 4. to select clauses "by hand". Points (2) and (3) postulate a more flexible way of computing goals than that of Prolog, where first a clause is selected from the program, then the Robinson unification algorithm is applied to the clause and the head of the goal, and finally a new goal is produced. Point (4) introduces a further flexibility: the superuser may select clauses that do not unify exactly with the current goal, but just "resemble" it in some sense. Even more, if the current goal contains enough information to produce the next goal, or if we just want to simplify a goal or to reorder literals we don't need to select a fact clause at all. The assumptions (1) and (2) were at base of the development. of a meta-level inference system called MOLOG [FdC86], [ABFdC+86], [BFdCH88], 834 (Esp87b], (Esp87a]. The inference machine that is presented in this paper is a complete rewriting of MOLOG realizing assumption (4). It has been developped at IRIT ((Bri87] and (AG88]). A formal specification of the inference mechanism called TIM : Toulouse Inference Machine, together with various examples, has been published in (BHLM91]. 
Here, in this paper, we present the 'J\RS.KI : Toulouse Abstract Reasoning System for Knowledge Inference, which is an abstract machine in which the inference mechanism can be implemented. In the preliminary version of this work nothing has been said about abstract machine and implementation, and the specifications are being defined more clearly now. 'J\R.sKr was designed to implement parallelism (see sections 6 and 7). For example, for a given definite fact and goal clauses, more than one rule is possible. In this case it is possible to use a different processor for each rule. The parallel machine wasdevelopped and differents solutions was be done. 2 Horn clauses The base of the language is that of Prolog. That language can (but need not) be enriched with context operators if one wants to mechanize non-classical logics. Characteristically, non-classical logics possess symbols with a particular behaviour. These symbols are • either classical connectors with modified semantics (e.g. intuitionist, minimal, relevant, paraconsis tent logics) • or new connectors called context operators (necessary and possible in modal, knows in epistemic, always in temporal, if in conditional logics ). Example In epistemic logics, the context operators are knows and comp, and knows(a):P means that agent a knows that P comp(a):P means that it is compatible with a's knowledge that P Hence inference engines for non-classical logics must reckon for the particular behaviour of some given symbols. These properties will be handled by built-in features of the inference engine. The conditio sine qua non for logic programming languages is that they possess an implicational symbol to which a procedural sense can be given. To define a programming language it's less important if this is material implication or not, but it's rather the dynamic aspect of implication that makes the execution of a logic program possible. That is why the TIM language is built around some arrow-like symbol. We suppose the usual definition of terms and atomic formulas of logic programming. Intuitively, TIM Horn Clauses are formulas built with the above connectors, such that dropping the context we may get a classical Horn clauses. Now for each logic programming language we suppose a particular set of context operators. This set depends on the logic programming language we want to implement, e.g. in epistemic logic it is {knows, comp} and in temporal logic it is {always, sometimes}. Formally we define by mutual recursion: Definition 2. 1 - contexts m( t 1, ... , t n ) is a context if mis a context operator n ~ 0, and for 1 ~ i ~ n every t i is either a term or a definite clause. Definition 2. 2 - goal clauses ?P is a goal clause if P is an atomic formula ?(G /\ F) is a goal clause if ?G, ?F are goal clauses ?MOD : F is a goal clauses if ?F is a goal clause and MOD is a context Definition 2. 3 - definite clauses P is a definite clause if P is an atomic formula MOD: F is a definite clause if F is a definite clause and MOD is a context F G is a definite clause if F is a definite clause and G is a goal clause +- Definition 2. 4 - TIM Horn clause A TIM Horn clause (or Horn clause for short) is either a goal clause or a definite clause. Note that Horn clauses may contain several implication symbols. We shall also use the term Modal Horn clauses if we are speaking of a modal logic. A set of definite clauses is called a database. In the following sections we shall use the definition of the head of a Horn clause. Definition 2. 
Definition 2.5 - Head of a Horn clause
• H is a head of H.
• H is a head of F ∧ G if H is a head of F.
• H is a head of F ← G if H is a head of F.
• H is a head of MOD : F if H is a head of F.

3 Writing meta-interpreters

3.1 General Mechanism

Just as in Prolog, to decide whether a given goal follows from the database essentially means to compute step by step new subgoals from given ones. In our case, the computation of the new subgoal is specified by the superuser. The general inference mechanism is described in figure 1. There are five steps:

Clause selection: We select a clause to solve the first sub-goal of the question.

Rule selection: We select a rule to be applied to the current clause and the current question.

Rule execution: The execution of the rule "modifies" the current clause and the current question and builds a resolvent.

Rewriting of the resolvent: When we reach a termination rule, we rewrite the resolvent into a new question.

End of resolution: A resolution is completed when we reach a final form: the goal clause true.

This system is doubly non-deterministic, because we have both a clause selection (as in standard Prolog) and a rule selection. We are going in the next sections to explain how this mechanism can be implemented. In subsection 3.2 we will discuss rule selection and execution, in subsection 3.4 rewriting, and in subsection 3.3 clause selection. In section 6 we will come back to rule selection to show how an efficient mechanism can be used to improve resolution speed.

Figure 1: General mechanism of the TIM machine (flowchart: clause selection, rule selection, rule execution and rewriting, with backtracking on failure of a clause or rule choice, and SUCCESS when the final form is reached)

3.2 Selecting and Executing Inference Rules

An inference rule is of the form:

A, ?B ⊢ ?C

where A is a definite clause and B, C are goal clauses. It can be read: if the current goal clause unifies with B and the selected database clause unifies with A, then a new goal can be inferred that is unified with C. In the style of Gentzen's sequent calculus, inference rules can be defined recursively as follows:

A, ?B ⊢ ?C
A', ?B' ⊢ ?C'

where A, A' are definite clauses and B, C, B', C' are goal clauses. As usual in metaprogramming, objects of the object language are represented by variables of the metalanguage¹. Essentially, what can be tested here is any condition on the form of A, A', B, C, B', C', or on the existence of a database clause of a certain form. E.g. we can let an inference rule depend on the (non-)existence of some clause in some particular module of the database. In the recursive definition the following conditions must be met²:

• var(A') ⊆ var(A)
• A' is a head of A or A is a head of A'
• C' is a variable
• C' is a head of C

A special category of inference rules are reflexive rules:

true, ?B ⊢ ?C
A', ?B' ⊢ ?C'

These rules use the special fact true. The conditions that these rules must meet are:

• A' is either a variable³, or any definite clause constructed from the variables in B and C and constants
• C' is a variable
• C' is a head of C

Partial termination rules are written:

A, ?B ⊢ ?C if Condition

They end the recursivity in resolution. These are some examples: the Prolog rule for goal conjunctions:

A, ?B ∧ C ⊢ ?D ∧ C
A, ?B ⊢ ?D

¹To be correct, the real form of an inference rule is a little different: a procedural condition expressed with elementary functions of the abstract machine (see section 5) can be added. This enables a more precise control over execution.
²It is these conditions on the form of the inference rules that warrant the efficiency of the implementation.
³This variable will be unified with a new fact taken in the clause base.

the Prolog rule for implications in database clauses:

A ← B, ?C ⊢ ?B ∧ D
A, ?C ⊢ ?D

and the Prolog partial termination rule is:

p, ?p ⊢ ?true

Note that here we make use of unification. These three rules are exactly what is needed to implement Prolog.

To summarize, the execution of an inference rule modifies the current fact and the current question and constructs a resolvent. The resolvent has the same structure as the question or any other fact. Partial resolution is achieved when we reach a partial termination rule. How rules are selected is defined by the user. We will see in section 6 how this is exactly done. For the moment, we say that rules are taken in the order they appear in the rule base.

3.3 Rewriting the Resolvent into a New Question

As soon as we have reached a partial termination rule, we rewrite the resolvent to create the new question to solve. Rewriting is useful not only in order to simplify goals, but also in order to eliminate the true predicate from the new goal clause. Rewrite rules are of the form:

G1 ↝ G2

and allow us to replace a term that is matched by G1 in the resolvent with some substitution σ by the term (G2)σ in the new question. For example, the Prolog rewrite rule is:

true ∧ A ↝ A

In epistemic logic, the rule:

knows(a) : knows(a) : A ↝ knows(a) : A

is a useful simplification.

3.4 Selecting Database Clauses

The user can define the way clauses are selected in the base. But this selection "by hand" must be chosen among a given set (that currently implements only two methods: classical Prolog selection and least used clause selection). Using the abstract machine, it is possible to build another selection mechanism (for example indexing selection on the first operator) but it has not been implemented yet and it is not described in this paper.

4 Examples: Modules

In this section we are going to show how to specify modules with dynamic import. Here, any module name, such as m, m1, m(2), etc., is considered to be a context.

Module logic

M:D, ?M:G ⊢ ?M:NG          D, ?M:G ⊢ ?M:NG
D, ?G ⊢ ?NG                D, ?G ⊢ ?NG

true ∧ G ↝ G               M:true ↝ true

Table 1: Rules for Module logics

The goal m1 : m2 : G succeeds if G can be proved using clauses from the modules m1 and m2. The inference rules are those for Prolog, plus two supplementary rules to handle module operators (table 1). The first rule represents the case where a module M is used to compute a new goal, and the second the case where another module name eventually occurring in G is used. Other types of modules, such as modules with static import or with context extension [MP88], can be specified by just adding new inference rules. In [BHLM91], we have shown how temporal logics, hypothetical reasoning and logics of knowledge and belief can be specified elegantly in our framework.

5 The abstract machine

The goal of the TARSKI abstract machine is to bridge the gap between the description of inference rules in logical form as shown above, and the real implementation of the rules in an efficient programming language. Compared to the WAM, the TARSKI abstract machine deals with different objects, and has a quite different goal, but on the whole the principles are identical; we will also define our machine in terms of data, stacks, registers and an instruction set. We do not have enough room here to describe completely the machine.
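Before the machine-level description, the cycle of figure 1 can be pictured with a hedged Prolog sketch. It is an illustration only, not the TARSKI implementation: contexts, superuser-defined rule bases, clause selection strategies and the conditions on partial termination rules are all omitted, and db_clause/1 is a hypothetical database of definite clauses.

    % Object-level syntax: definite clauses Head <- Body, goal conjunction &.
    :- op(990, xfx, <-).
    :- op(950, xfy, &).

    db_clause(p).
    db_clause(q <- p).

    % Clause selection, rule execution, rewriting, end of resolution.
    solve(true) :- !.
    solve(Question) :-
        db_clause(Clause),
        infer(Clause, Question, Resolvent),
        rewrite(Resolvent, NewQuestion),
        solve(NewQuestion).

    % The three Prolog inference rules, reading A, ?B |- ?C as infer(A, B, C).
    infer(Fact, B & C, D & C) :- infer(Fact, B, D).   % goal conjunctions
    infer(A <- B, C, B & D)   :- infer(A, C, D).      % implications in clauses
    infer(P, P, true).                                % partial termination

    % The Prolog rewrite rule true ∧ A ↝ A, eliminating true from the goal.
    rewrite(true & A, A) :- !.
    rewrite(A, A).

For instance, the query solve(q) first selects q <- p, builds the resolvent p & true by the implication and termination rules, and then solves p in the same way.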
So, we shall not speak of the "classical" parts of resolution that are identical: i.e unification or backtracking. Let's say that the machine relies on classical structure sharing for unification, and on depth first search and backtracking. Before going further, we must tell about the Great Lie. ']!\.R;:Ki does not use classical logic operators A or t-. For consistency and simplicity sake, all operators either modal, temporal, classical, are represented in our formalism in the same way and are treated by the machine in the same way also. Let's see that on an example: The logical clause written in Prolog: At-BAG will be written in ']!\.R;:Ki: A( G) : A(B) : A Here B is the argument of A and A is qualified by A(B). All operators have arguments, and qualify an 837 object. For example, the 84 modallogic 4 clause: D(X) : (D(a) : p +- O(a) : p) will be written: D(X) : /\(O(a) : p) : D(a): p and O( a) : p is the argument of /\ that qualifies D( a) : The clauses stack: Each element of this stack is composed of: • a pointer in the object stack to the beginning of the clause • a pointer to the head predicatel l p. This could look like the polish reverse notation, but it is not exactly the same. In the polish reverse notation K pq (that is p /\ q) gives the same role to p and q. In /\ (p) : q, p and q have really different parts to play: p is an operand of /\ and q is the object qualified by /\(p). This destroys the symmetry of /\, but should be considered as an advantage here. In all classical Prolog, solving p /\ q is different from solving q /\ p: the operator is not symmetric at all. This formalism was not adopted lightly. The first versions did not use it, and gave a special place to the classical operators: we had a lot of problems to describe correctly the inference mechanism. Adopting this structure greatly enhanced the simplicity and the efficiency of the system. 5.1 Data structures First of all, boolean objects (true, false) with classical operations associated (not, or, and) are implemented along with integer and floats, with their standard operations. All data are organized in stacks. There are currently nine basic data types, and nine corresponding stacks. The objects stack: holds all the objects on which the machine operates. An object can be either: an operatorS, a predicate6 , a variable, an integer, a float, a cons 7 , alfree8 • Elements of this stack will be called ObjectElement? The operands stack: Objects do not hold their operands. Each object that has arguments holds the number of its operands and a pointer to an element of this stack that holds pointers to all the operands lO • Elements of this stack are called Opera ndElement. 4From now on, we will only use the S4 modal logic. A classical introduction is [HC72]. We will use the following notations: 0 is knows, 0 is compatible. Modal operators have arguments that must be constants. The new operator O[ must be added to the original language as shown in ([CH88]). 5 An operator is an object that has objects as arguments and qualify an other object. 6 A predicate is an object that has arguments but do not qualify any other object. 7The classical LISP cons 8 alfree is a special object quite similar in its behaviour to a variable that would always be free (alfree is the abbreviation of always free). 9Strings are currently not implemented. laThe operand stack is probably a technical mistake and will probably be suppressed in future versions of the machine • the number of free variables in the clause. 
Elements of this stack are called ClauseElement. The environments stack: Each element is a pair composed of a pointer to an object and a pointer in the environment stack in that the object has to be evaluated (classical structure sharing implementation). Elements of this stack are called EnvironmentElement. The Trail stack: Pointers to the environment list for resetting to free some variables when backtracking (classical structure sharing implementation). Elements of this stack are called TrailElement. The backtrack stack: Each element holds all information necessary to backtracking (values of top of stacks). Elements of this stack are called BactrackElement. The question stack: Each element is a pair composed of a pointer of an object and a pointer to the environment where this object must be evaluated. The question stack holds goals to be solved. Elements of this stack are called QuestionElement. The resolvent stack: stack for the resolvent elements. The resolvent is built with the current question and the current selected fact. When reaching a partial termination rule, the resolvent is re-written using rewriting rules on the top of the question and becomes the new question. Elements of this stacks are called resolventElement. The predicates stack: Holds predicate structures. There are also nine other types: pointers12 to object in each stack, respectively ObjectPointer) OperandPointer)ClausePointer) EnvironmentPointer) TrailPointer) BacktrackPointer) resolventPointer) QuestionPointer. At last, there is the rules array. This array describe how resolution rules behave in the system. We will come back to this later. 5.2 Registers The registers described here are what we call global registers or main registers (see figure 2). There exists 11 Useful when using classical Prolog clauses selection to increase speed. l2We usually use the term pointer that is not exactly appropriate. Our pointers should be thought as abstract data types, that can be implemented as real pointers, or as indexes of an array, or anything similar. 838 Register Qcurr FCurr FEnv CClause CRuie TrTop ObTop BTTop Qtop RTop EnvTop Description Pointer to the current object in the question Pointer to the current object in the clause Pointer to the environment of the current clause Pointer to the current clause Index of the current rule used Pointer to the top of Trail Stack Pointer to the top of Object Stack Pointer to the top of Backtrack stack Pointer to the top of question stack Pointer to the top of resolvent stack Pointer to the top of environment stack Figure 2: Abstract machine registers Push(x : object) return pointer Read(i : pointer) return object Pull return object Modify(x: object; i : pointer) SetTop(i : pointer) Position return pointer instruction unify, that unifies (Structl, Envl) with (Struct2, Env2)13. Let's see on an example how the abstract machine code is used to implement rules 14 : D(X) : A, ?D(X) : B I-?D(X) : 0 D(X) : A, ?B I-?O is translated into: . RO:=Read(Qcurr) if not unify(Fcurr,Fenv,GetNumStruct(RO),GetNumEnv(RO) then return false else Pushreso!vent(RO) endif Qcurr := Qcurr+l return true 6 Rule selection with parallelism Figure 3: Operations available on each stack also general purpose registers that can be temporarily used for calculations. We will note them RO, RI, ... in the following pages. At time t, the machine is completely defined by the values of its stacks and its registers. 5.3 Instructions set We describe here the instruction set of the abstract machine. 
We can not, because of lack of space, describe it extensively, but the next few lines give an intensive definitions of all instructions. For each type of object, there are twice as many functions as there are components in the object, one for getting the value of the component and one for setting this value. Moreover, for each of the nine stacks there are 6 basic operations implemented (see figure 3). +(p:pointerj i:integer):pointer Increments pointer p by i - (p:pointerj i:integer) :pointer Decrements pointer p by i -(pl,p2 : pointer):integer Returns the number of elements between pI and p2. There are also some classical functions: Assignment, Equality test, Conditional constructions. This ends the description of atomic functions. We will need in the following lines the classical macro- In section 3.4, we said that resolution rules were chosen in the rules base in order of appearance. We are going to show here that this mechanism can be greatly enhanced by indexing the rules base and using parallel execution of rules. 6.1 Indexation of rules The rules necessary to implement S4 are shown on top of table 2. Remember that due to the uniform notation of the abstract machine the clause !\(A) : B of the second rule is in fact the implication B t - A. We can see that, for a given fact and a given question, we have to try a lot of different rules. This creates a second non-determinism that greatly slows down the implementation of the language. But trying all rules is usually not useful, because for a given fact and a given question, only a few rules will match the shape of the fact and the shape of the question. For example, if the fact is D(X) : A and the question <>I(X, 1): B only rules 9 and 11 can be used. So, for a given logic, we can develop extensively all possible cases. For 84, this gives table 2. This way, given a fact and a question, the array gives directly the rules that can be applied and there is often only one rule that can be applied. This transforms the double non-determinism in an almost simple non-determinism much closer to Prolog complexity. 80, in a large number of cases, it is not necessary to backtrack on rule selection. 13 unify is of course written with atomic instructions. l40ther examples can be found in [AIled]: full implementation of S4 logic, among others (Fuzzy logic, module logic). 
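As an aside, the indexing idea can be pictured with a hedged Prolog sketch (hypothetical term representation and predicate names, not TARSKI code): the pair formed by the outermost operator of the fact and of the question is used as a key into a table that plays the role of table 2, so that only the few matching rules are ever tried.

    % Outermost operator of a clause or question, in a hypothetical encoding
    % where box/1, dia/1 and dia_i/2 stand for the S4 operators.
    outermost(box(_):_,     box)   :- !.
    outermost(dia(_):_,     dia)   :- !.
    outermost(dia_i(_,_):_, dia_i) :- !.
    outermost(_,            pred).

    % rule_index(FactOp, QuestionOp, RuleNumbers) plays the role of table 2;
    % e.g. a fact box(X):A against a question dia_i(X,1):B selects rules 9 and 11.
    rule_index(box, dia_i, [9, 11]).

    candidate_rules(Fact, Question, Rules) :-
        outermost(Fact, FactOp),
        outermost(Question, QuestionOp),
        rule_index(FactOp, QuestionOp, Rules).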
839 I Fact I Question I Rules Rl Type Rule Rule Number 1 2 Rule Rule Rule 3 4 Rule Rule Rule 6 7 8 9 Form p, ?p I-?true 11 A(A}:B,?CI-?A(A}:D B ?CI-W B,?A(A):CI-?A(A):D B ?CI-W A,?O(X):BI-?C A,?BI-?C o r(X ,I):A,?O(X):BI-?Or(X ,1):C A, ?O(X):BI-?C Od X ,l}:A,?Or(X,l}:BI-?Or(X,I):C A,?BI-?C D(X):A,?D(X):BI-?D(X):C D(X):A,?BI-?C D(X):A,?o(X):BI-?O(X):C D(X):A,?BI-?C D(X}:A, ?Or(X,I):BI-?Or(X,I):C D(X):A,?BI-?C D(X):A,?O(X):BI-?O(X):C A,?O(X):BI-?C D(X):A,?BI-?C A ?BI-?C Fact Pred Pred Question Pred Usable rules p, ?p I-?true Pred o Rule Rule Rule 5 10 /\ /\ Pred /\ /\ /\ o o /\ o o o Pred /\ o o 01 01 01 o A,.t\lX):BI-.A(X):(,' A ?BI-?C A,?O(X):BI-?C A ?BI-?C A(X):A,?BI-.A(X):C A ?BI-?C t\(X):A,.A Y):l1/-·.A(Y):(,' A(X):A,?BI-?C A(X):A,!O:Y :l1l-.A(X ):(,' A?O Y :BI-?C A(X):A,.D[Y :l1l-.A(X):C A,?D Y :BI-?C A(X):A,!Or( ,l):l1/-·.A(X):C A ?Or(YI):BI-?C D(X):A,:BI-?C A ?BI-?C D(Y):A,.A(X):BI-.A(X):C D(YI:A ?BI-?C D(X):A,?O(Y):BI- !C A ?<>(Y :BI-?C D(Y):A, !O(X ):l1l-'!C D(YI:A,?BI-?C D(X ):A,'!O(X ):l1l-'!O(X ):(,' A,?O X):BI-?C -D(X):A, !~(A ,:l1t !VIA J:C D(X :A ?BI-?C D(X):A,?Orl ,l):BI-!C A,?odYl :BI-?C D(X ):A,!Or( ,1 ):11/-·[<;>1(-" ,1):(,' D(X :A ?BI-?C <;>r(Y):A,.A( ):l1t-·.A(X):C OI(Y):A,?BI-?C Or(Y):A,?o(X):BI-?C OI{Y):A,?BI-?C o r(X,I):A,!O(X ):l1l-'!Or(X,1 J:C A ?O(X):BI-?C Or(X,I):A,!Or(X,l):l1l- tOr(X,l):C A ?BI-?C Table 2: S4 logic rules and their exhaustive development R2 o R3 o R4 o Table 3: Rules PI 0 against 0 L-._ _ _ _ _ _ _ _....J ~ Failure clause PI I I I T T I T P4 , I Each processor will continue resolution with a fourth of the resolution tree Figure 4: Parallel execution of S4 rules 6.2 Parallel rule execution The abstract machine was designed to enable an easy implementation of parallelism. Sometimes, for a given definite fact and a given goal clause, more than one rule is possible: we can use a different processor for each rule. For example, in the S4 logic, if the fact is D(X) : A and the question is O(X) : B, four rules can be used (table 3). With four processors, each one can continue the resolution with a different rule. Figure 4 shows how the inference system running originally on processor Pl. With four processors Pi, P2, P3, P4 available, it is possible to solve, in parallel, S4 rules described in table 3. The information transferred from one processor (Pi) to its children (P2, P3, P4) are the abstract machine data stacks and the abstract machine registers. Some stacks are never transferred (the backtrack stack, the trail stack) because the child does not need to backtrack over the current resolution point. This parallelism induces no side effects : as soon as one processor has received data, it will not have to communicate 840 L to all : free P to L : request L to P : Ok P to L : Data Figure 5: Fully interconnected network with its parent any more until it has finished its own resolution. Moreover, there is no overhead in processing time because parallelism is explicit in the language itself: overhead comes only from communication between processes. Four models (Master/slaves network, fully interconnected networks, ring networks, top-down networks) are under development; we just mention them and we will not discuss them in detail 15 • Fully interconnected network: Every processor can distribute work to any other processor that is free. A very simple protocol is used to prevent two processors to send at the same time data to the same processor (figure 5). This protocol will solve problems as represented in figure 4. 
Master/slaves network: The master process distributes work to all other processes, which, in turn, can not distribute any work. This protocol will also solve problems as represented in figure 4. Ring network: Here each processor can send work to the next one, and the last processor can send work to the first. Top-Down network: In the Top-Down Network, each processor can only send information to the following one but the last processor can't send information to the first one. In ring networks and top-down networks, resolution is not exactly as represented in figure 4. 7 7.1 Implementing Parallelism The "classical" machine The new abstract machine specifications was the result that began with the first implementation of MOLOG, in C, in 1988. Coding the new machine took less than two months. Of course, two years spent in coding other abstract machines (that proved to be unsatisfactory) helped a lot. From the beginning, the stress was on getting a 150n all practical implementations issues, details can be found in [AIled]. program as close as possible to the specifications of the abstract machine. That was the reason why the ADA language has been chosen: the specifications of the abstract machine are exactly the specifications of the main package of the implementation. Moreover, compared to other implementations previously written in C, coding and debugging was a lot easier and faster. We wanted also to be able to easily implement parallelism. So, for example, stacks are implemented with arrays and there is not a single real pointer in the system, only indexes. It has an interesting well known side effect: we never run out of stack space, because if a stack becomes full, we just have to copy it to a new larger stack. All indexes are still valid. The mechanism is invisible to the programmer and the user and very useful with some very recursive non-classical problems. This was done at the loss of performance. Accessing any object in a stack requires two function calls and three tests plus the classical indirection. The 'l\R.sK:r machine runs about fifteen times slower than C-Prolog16 on PROLOG problems. This could easily be enhanced by recoding the machine with efficiency in mind. Coding a logic is very easy as soon as it follows the general framework given in section 3.2. The S4 logic was implemented in one day. and tested with the classical "wise men" puzzle. The puzzle is solved in three minutes on a HP-720 workstation with the full amount of knowledge (more than twenty clauses). With only the five clauses necessary to solve the problem, the solution is found in less than a second, hundred times faster than the MOLOG interpreter. 7.2 The parallel machine The parallel machine was developped with an ETHERNET network as medium for data transfer. The parallel system is made of many 'l\RS.Kr machines running on different workstations, linked by INTERNET sockets 1 7. The only configuration tested was a topdown network. Results are shown in table 4. It would be too long to discuss them here in detail. Full explanations can be found in [AIled]. We can briefly say that, over three processors, the network is clearly too slow and becomes the bottleneck of the system. A large part of time is lost in communicating with other processors. 
There are different solutions that could be used to enhance performances: • We can use parallelism only for branches that are 161t is however faster than some classical PROLOG written in compiled Common Lisp 171t was quite easy to do, because all necessary packages for communication and parallelism had been developped previously for other projects. Reusability of software is a major advantage of ADA. 841 # 1 2 3 4 of Procs P1 319+1 166+10 129+24 129+26 P2 145+6 142+50 140+46 P3 P4 Acknowledgements We wish to thank Luis Farinas Del Cerro for valuable discussions. 77+17 46+31 22+9 Table 4: CPU +system time used close to the root of the tree. This will decrease the number of sent packets. 8 9 References [ABFdC+86] R. Arthaud, P. Bieber, 1. Farinas del Cerro, J. Henry, and A. Herzig. Automated modal reasoning. In Proc. of the Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Paris, july 1986. • We can try a master/slave network. The master processor will be almost devoted to sending packets but slaves would not spare time on this. [AG88] J. M. AIliot and J. Garmendia. Une Implementation en "C" de MOLOG. Rapport D.E.A, Universite Paul Sabatier, Toulouse, France, 1988. • We can improve the amount of sent data; some stacks can only grow, and are never modified under a certain depth. We could only send new data, and not the whole stack. [AL91] Jean-Marc Alliot and Marcel Leroux. En route air traffic organizer, un systeme expert pour Ie contr6le du trafic aerien. In Proceedings of the • We could try to use a different medium. An ethernet network is a very slow device for parallelism, and, moreover, our network is usually crowded with packets coming from other stations or other X-terminals. It would be very interesting to implement the machine on a multi-processor computer with shared memory segments, or on a transputers network. We were not able to do it yet because we lack access to such a machine. We are very eager to try such an approach. If we are able to find a machine with many processors, the inference machine could be almost as fast as a standard PROLOG even when solving nonclassical logic problems, because the double nondeterminism would be almost reduced to classical PROLOG non-determinism. [AIled] Jean-Marc Alliot. Tarski: une machine paralIe Ie pour l'implementation d'extensions de prolog. Master's thesis, Universite Paul Sabatier, To be published. [Bau89] Marianne Baudinet. Logic Programming Semantics: Techniques and Applications. PhD thesis, Stanford University, feb 1989. [BFdCH88] P. Bieber, L. Farinas del Cerro, and A. Herzig. MOLOG - a modal PROLOG. In E. Lusk and R. Overbeek, editors, Proc. of the 9th Int. Conf. on Automated Deduction, LNCS 310, pages 487499, Argonne - USA, may 1988. Springer Verlag. [BHLM91] P. Balbiani, A. Herzig, and M. Lima-Marques. TIM: The Toulouse Inference Machine for nonclassical logic programming. In M.M. Richter and H. Boley, editors, Processing Declarative K nowledge, number 567 in Lecture Notes in Artificial Intelligence, pages 365-382. Springe-Verlag, 1991. [BK82] K. A. Bowen and R. A. Kowalski. Amalgamating language and metalanguage in logic programming. In K. Clark and S. Tarnlund, editors, Logic Programming, pages 153-172. Academic Press, 1982. [Bri87] M. Bricard. Unemachine abstraite pour compiler MOLOG. Rapport D.E.A., Universite Paul Sabatier - LSI, 1987. [CH88] Luis Farinas Del Cerro and Andreas Herzig. Linear modal deductions. In E. Lusk and R. Overbeek, editors, Proc. of the 9th Int. Conf. 
on Automated Deduction Computer Systems. SpringerVerlag, 1988. [Esp87a] Esprit Project p973 "ALPES". MOLOG Technical Report, may 1987. Esprit Technical Report. [Esp87b] Esprit Project p973 "ALPES". MOLOG User Manual, may 1987. Esprit Technical Report. [FdC86] 1. Farinas del Cerro. MOLOG: A system that extends PROLOG with modal logic. New Generation Computing, 4:35-50, 1986. Conclusion We think the implementation of any logic given by inference rules of the form defined in the earlier sections can be done in a very short amount of time (one or two days at most). The development of an automatic translator from the logical shape of the rules to the abstract machine specifications suggests itself and is a subject of current work. Now, it is hoped that fast, general and efficient implementations of such logics could bring a new area of development for expert systems. In particular, in the C.E.N .A.IS a large expert system (3,000 rules) using fuzzy and temporal logics has been developped in Prolog ([AL91]). This expert systems could be an excellent test for 1\lliKI. 18The CENA is an institution responsible for studies of new systems for Air Traffic Control in France International Conference on Expert systems and their applications, Avignon, May 1991. 842 [FKTM086] M. Fujita, S. Kono, H. Tanaka, and T. MotoOka. Tokio: Logic programming language based on temporal logic and its compilation to prolog. In Third Int. Conf. on Logic Programming, pages 695-709, jul 1986. [GL82] M. Gallaire and C. Lasserre. Meta-level control for logic programs. In K. Clark and S. Tarnlund, editors, Logic Programming, pages 173-188. Academic Press, 1982. [GR84] D. Gabbay and U. Reyle. N-prolog: An extension of prolog with hypothetical implications. lounal of Logic Programming, 1:319-355, 1984. [HC72] G. E. Hughes and M. J. Cresswell. An Introduction to Modal Logics. Methuen & Co. Ltd, USA, 2 edition, 1972. [MP88] Luis Monteiro and Antonio Porto. Modules for logic programming based on context extension. In [Sak87] Y Sakakibara. Programming in modal logic: An extension of PROLOG based on modal logic. In [Sak89] Takashi Sakuragawa. Temporal PROLOG. In Int. Con! on Logic Programming, 1988. Int. Conf. on Logic Programming, 1987. RIMS Conf. on software science and engineering, 1989. [SS86] L Sterling and E. Shapiro. The Art of Prolog. The MIT Press, USA, 1986. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 843 Parallel Constraint Solving in Andorra-I Steve Gregory and Rong Yang Department of Computer Science University of Bristol Bristol BS8 1TR, England steve/rong@cs.bris.ac.uk Abstract The subject of this paper is the integration of two active areas of research: a parallel implementation of a constraint logic programming language. Specifically, we report on some experiments with the and/orparallel logic programming system Andorra-I extended with support for finite domain constraint solving. We describe how the language supported by Andorra-I can be extended wi th finite domain constraints, and show that the computational model underlying Andorra-I is well suited to execute such programs. For example, most constraints are automatically executed eagerly so as to reduce the search space; moreover, they are executed concurrently, using dependent and-parallelism. We have compared the performance of some constrained search programs on Andorra-I with that of conventional generate-and-test programs. 
The results show that the use of constraints not only reduces the sequential execution time, but also significantly increases the and-parallel speedup. 1 Introduction Much of the success of Prolog has been due to its suitability for applications involving search: the. language provides a relational notation which is very convenient for expressing non-deterministic problems and it can be implemented with impressive efficiency. However, the search strategy built into Prolog is a rather naive one, which tends to perform an unnecessary amount of search for problems that are stated in a simple manner. To solve realistic search problems in Prolog, it is often necessary to perform additional forward computation in order to reduce the search space to a manageable size. However, since this extra computation must be programmed in Prolog itself, it may be an expensive overhead which partly offsets the speed benefits of the reduced search. Moreover, the resulting program is more opaque and difficult to write than a natural solution in Prolog. To improve on the search strategy of Prolog while retaining its advantages is the motivation for the development of constraint logic programming (CLP) systems. Most of the CLP languages that have been proposed are based on Prolog, extended with the ability to solve constraints in one or more domains. CLP languages use knowledge specific to their domain to execute certain goals ("constraints") earlier than would be possible in Prolog, thus potentially reducing the search space. Provided that the constraint solving mechanism is implemented efficiently and that the language is simple to use, the search time can be reduced at little cost in either forward computation time or increased program complexity. One type of CLP language, which has proved particularly useful for combinatorial search problems, is that based on finite domains; this is described in a little more detail in Section 2. There have been many projects in recent years to develop parallel implementations of Prolog. Most of these systems incorporate either or-parallelism, independent and-parallelism, or both. In contrast, the Andorra-I system is an implementation of Prolog that exploits or-parallelism together with dependent andparallelism, which is the sole form of parallelism exploited in most implementations of concurrent logic programming languages such as Parlog and GHC. Andorra-I has proved effective in obtaining speedups in programs that have potential or-parallelism and those with potential and-parallelism, while in some programs both forms of parallelism can be exploited. Andorra-I, and the Basic Andorra model on which it is based, are described briefly in Section 3. The subject of this paper is the integration of the above strands of research: a parallel implementation of a constraint logic programming language. Specifically, we report on our experiences with extending the Prolog-like language supported by Andorra-I to support finite domain constraint solving. There are two main reasons why this is of interest: 1. Language. To investigate how easily the required language extensions can be supported by the Basic Andorra model. 844 2. Performance. To ensure that the finite domain extensions can be implemented efficiently in Andorra-I and that the efficiency is retained in parallel execution. 
Although a prototype or-parallel implementation of the Chip language has been developed [Van Hentenryck 1989b], we are not aware of any previous investigation of and-parallelism with finite domain constraints. By adding these extensions to Andorra-I we can experiment with both forms of parallelism and compare them. It is particularly interesting to compare the performance of constrained search programs on the Basic Andorra model with that of conventional generate-and-test programs (apart from the expected reduction in overall execution time). The constraint solving represents additional forward computation, so - provided that the constraints can be effectively solved in parallel- we would expect and-parallelism to be increased. At the same time, since the search space is reduced, there may be less scope for orparallelism. The performance results obtained with Andorra-I confirm these expectations. The next two sections describe the background to the paper. Section 4 discusses the implementation of finite domain constraints on the Basic Andorra model. It describes in detail the language extensions that we have implemented and the structure of programs that use them. Section 5 presents some results of running constrained search problems on Andorra-I. Section 6 concludes the paper. 2 Finite domain constraints The idea of adding finite domain constraints to logic programming originated with the work of Van Hentenryck and his colleagues, and was first implemented in the language Chip [Van Hentenryck and Dincbas 1986; Dincbas et al. 1988; Van Hentenryck 1989a]. Chip extends Prolog in several ways to handle constraints; the principal extensions relevant to finite domains are outlined below. 2.1 Domain variables Some variables in a program may be designated domain variables, ranging over any specified finite domain. Domain variables appear to the programmer like normal logical variables but are treated differently by unification and by constraints. 2.2 Constraints on finite domains Goals for certain constraint relations behave ina special way when they have domain variables as arguments. For example, if X is a domain variable, the goal X ~ 5 can be executed by removing from the domain of X all items greater than 5. This in tum may reduce the search space that the program explores. A user-defined predicate may be made a constraint by using a 'forward' or 'lookahead' declaration, while some primitives (e.g., inequality) have such declarations implicitly. (Unification can have a similar effect: unifying two domain variables reduces the domain of both to the intersection of their original domains, while unifying a domain variable and a constant may fail.) 2.3 Coroutining Constraints should be executed as early as possible in order to reduce the search space. For example, X ~ Y could be executed as soon as either X or y has a value and the other is a domain variable. In general, a coroutining mechanism ensures that control switches to a constraint goal as soon as it can be executed. The simplest such control rule is forward checking, used for forward-declared constraints, whereby a constraint is executed as soon as its arguments contain at most one domain variable and are otherwise ground. The constraint goal is then effectively executed for each member of its argument's domain and values that cause failure are removed from the domain. 
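The forward-checking behaviour just described can be pictured with a small sketch in plain Prolog (an illustration only, not the Chip or Andorra-I machinery), in which a domain is represented simply as a list of candidate values:

    % forward_check(+Goal, ?Var, +Domain, -Kept): run the constraint Goal
    % once for each candidate value of Var and keep only the values that
    % do not cause failure.
    forward_check(_Goal, _Var, [], []).
    forward_check(Goal, Var, [V|Vs], Kept) :-
        (   \+ \+ (Var = V, call(Goal))   % consistent: binding undone by \+ \+
        ->  Kept = [V|Kept1]
        ;   Kept = Kept1
        ),
        forward_check(Goal, Var, Vs, Kept1).

For example, forward_check(X =< 5, X, [1,3,6,8], D) leaves D = [1,3]; an empty result would correspond to failure of the constraint.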
The lookahead rule, often used for inequality relations such as '~', can even execute constraints whose arguments contain more than one domain variable; we shall not consider this further in this paper. 3 The Basic Andorra model The Basic Andorra model is a computational model for logic programs which exploits both or-parallelism and dependent (stream) and-parallelism. The model works by alternating between two phases: 1. Determinate phase. Determinate g()als are executed in preference to non-determinate goals. While determinate goals exist they are executed in parallel, giving dependent and-parallelism. (A goal is considered determinate if the system can detect that it can match at most one clause.) This phase ends when no determinate goals are available or when some goal fails. 2. Non-determinate phase. When no determinate goals remain, one goal - namely, the leftmost one that is not det only (see below) - is selected and a choicepoint created for it. Or-parallelism can be obtained by exploring choicepoints in parallel. The model and its prototype implementation, Andorra-I, are described in [Santos Costa et al. 1991]. Andorra-I supports· the Prolog language augmented with a few features specific to the model. For example, det_only declarations allow the programmer to specify that goals for some predicate 845 can only be executed in the determinate phase; if such a goal remains in the non-determinate phase it cannot be used to create a choicepoint, even if it is the leftmost goal. Conversely, non _ de t _ 0 n 1 y declarations can be used to prevent goals from executing in the determinate phase even if they are determinate. Performance results for Andorra-I show that the system obtains good speedups from both orparallelism and and-parallelism. The best andparallel speedups are obtained for programs that are completely determinate (and therefore have no orparallelism to exploit). The best or-parallel speedups come from search programs, especially when searching for all solutions. Unfortunately, very little and-parallel speedup has typically been observed in running standard Prolog search programs on Andorra-I. One reason for this is the sequential bottleneck inherent in the Basic Andorra model: the periods (both during the nondeterminate phases and while backtracking) when no and-parallel execution is performed. This suggests that the key to obtaining greater and-parallel speedup is to increase the "granularity" of the and-parallelism. That is, it is important to minimize the number of choicepoints created and the number of goal failures, relative to the total number of inferences. One way to achieve this in search programs is by the use of constraint satisfaction techniques. 4 Implementing finite domains in Andorra-I In order to experiment with finite domain constraint solving on Andorra-I, we have defined and implemented finite domains and a few simple primitives to operate on them. Our system defines a new data type, a domain, which exists alongside numbers, structures, etc. Domains can only be used as arguments to the domain primitives and have no meaning elsewhere in a program; for example, they cannot be printed. A domain is created with a set of possible values that it may take; eventually it may become instantiated to one of those values, at which time we call it an instantiated domain. In contrast with the Chip concept of domain variables, a domain instantiated to t is not identical to t. We write a domain as a set {tt,. . .,tn}, where tt,. .. 
,tn are its current possible values; {t} represents an instantiated domain. Our domains,are easier to implement than domain variables because there is no need to change many basic operations of the system such as unification, suspension on variables, etc. At the same time, the efficiency of implementation should be comparable with that of domain variables, while our primitives are still quite convenient to use. We describe our primitives first and then outline their use and implementation. 4.1 Finite domain primitives Domains can be created by the primitives make domain and make domains. The latter is potentially more efficient when creating many domains ranging over the same values since the table of values can be shared. All of the other primitives operate on existing domains; they can only be executed when their first argument is instantiated and will fail if this is not a domain. domain_var performs the mapping between a domain and its ultimate value, while domain remove allows the removal of values from a domain.-Either of these may cause the domain to be instantiated: the first in a positive way, the second by removing all but one of the values. doma in_gue s s is the only non-determinate primitive. The last two, domain_size and domain_values, may yield different results depending on when they are called and should therefore be used with care. make_domain(D,Set) Can be executed when Set is instantiated to a non-empty list of distinct atomic terms, [tt, .. .,tn ]. D, which should be an unbound variable, is bound to a new domain, {tt,. .. ,tn }. make_domains (Ds,Set) Can be executed when Set is instantiated to a non-empty list of distinct atomic terms, [tt,. . . ,tn ], and D s is a list of variables. Each variable in D s is bound to a new domain, {tt, ... ,tn }. domain_var(D,Var) Unifies Va r with the value variable (a normal logical variable) of domain D. Subsequently, if D becomes an instantiated domain {t}, t is unified with Va r. Alternatively, if Va r becomes instantiated to t, if t is currently in the domain D, D becomes an instantiated domain {t}, otherwise failure occurs. domain_remove (D,Value) Can be executed when Value is ground. If Value is not currently in the domain D, there is no effect. If D is the instantiated domain {Val u e} the primitive fails. Otherwise Value is removed from the domain; if only one value, t, remains in the domain D becomes instantiated ~o {t}. domain_9uess (D) Instantiates D non-determinately to one of its possible values. If D is the domain {tt, . .. ,t n }, D is instantiated successively to (ttl, ... , {tn}. Note that do m a in 9 u e s s (D) is nondeterminate (unless D is already instantiated) and can therefore be executed only if there are no determinate goals to execute. 846 domain guess (Ql), Ql :i"" Q2, Ql * Q2 Ql * Q3, Ql * Q3 Ql * Q4, Ql * Q4 domain guess (Q2), Q2 :i"" Q3, Q2 * Q3 Q2 * Q4, Q2 * Q4 domain guess (Q3), Q3 :i"" Q4, Q3 * Q4 domain_guess (Q4) • domain_size(D,Size) S i z e is unified with a positive integer which indicates the number of values currently in domainD. domain_values (D,Values) Val u e s is· unified with a list of the values currently in domain D. 4.2 Finite domain programming Like Chip, our aim is to provide the programmer with a language as close as possible to Prolog but with the extensions necessary for constraint programming. However, the "Prolog" language supported by the Basic Andorra model differs in behaviour from that of regular Prolog, and this affects how the language is used. 
In this section we outline how our primitives can be employed in the context of Prolog on Andorra-I to solve constraint problems. Program 1 is our solution to the familiar N-queens problem. This program is almost identical to the Chip one on p123 of [Van Hentenryck 1989a], except that the result of the goal four_queens (Qs) is a list of domains (which can be converted to a numeric value by domain_ var). However, it executes differently. The execution order in Chip is the same as in Prolog, repeatedly executing a domain_guess goal for one domain followed by a no a t t a c k goal to remove inconsistent values from the other domains. On Andorra-I the program executes all of the queens and noattack goals first, since they are determinate, and sets up all '*' constraints before domain _gues s is called to non-determinately generate domain values. four queens (Qs) :Qs = [Ql, Q2, Q3, Q4] , make domains (Qs, [1,2,3,4]), queens (Qs) • queens ( [ ] ) . queens ([QIQs]) domain guess(Q), noattack(Q, Qs, 1), queens (Qs) . noattack ( , [], ). noattack (Ql, [Q2TQs], N) Ql Q2, Ql Q2 - N, Ql Q2 + N, Nl is N + 1, noattack(Ql, Qs, Nl). * * * Program 1: N-queens At the end of the first determinate phase, the resolvent contains only the following goals, for domain_guess and the inequality predicate '*', where each of Ql, Q2, Q3, and Q4 is an uninstantiated domain: - 1, Ql - 2, Ql - 3, Ql - 1, Q2 - 2, Q2 - 1, Q3 * * * * * * Q2 + 1, Q3 + 2, Q4 + 3, Q3 + 1, Q4 + 2, Q4 + 1, The only goals that can be executed in the nondeterminate phase are for domain guess, since the '*' goals are treated as det only (see Section 3). Selecting the leftmost goal, domain guess (Ql), Ql is instantiated non-determinately to the domain {1} and a new determinate phase begins, in which all nine '*' goals containing Ql can be executed in parallel. This example illustrates a difference between our language and Chip, which follows from the Basic Andorra model: that the order of goals in a clause is ' irrelevant. Constraints and generators can appear in any order, but the constraints will always be set up before any non-determinate bindings are made. This is important, since it results in a smaller search space. In order to get the same effect (called "generalized forward checking") in Chip, the structure of the program has to be changed. However, we do have to make sure that constraints can be executed determinately, so that they execute first, whereas constraints need not be determinate in Chip. The inequality predicate '*' used above is an example of a constraint that is to be executed by forward checking. Such predicates can be programmed using the primitives of Section 4.1. As an example, Program 2 defines a constraint plusorminus (X, Y, C), which means X=Y-C or X=Y+C. This can be executed in a forward checking way when either of domains X and Y is instantiated and the third argument is ground; it then leaves only (at most) the two values Y-C and Y+C (resp. X-C and x+c) in the domain of X (resp. Y). In Program 2 we use Pandora syntax [Bahgat and Gregory 1989]. The plusorminus procedure is a "don't-care procedure" in the style of Parlog: the first clause removes the appropriate values from the domain of Y if domain X is instantiated, while the second does the converse. This procedure uses the da t a primitive to wait for the domain to be instantiated and the operator ': to commit to the appropriate clause. 
A sequential conjunction operator '& is used in the pm procedure, so that the values currently in domain Yare found (by a call to domain_values) only after the other arguments are instantiated. It then f i 1 t e rS these values to find which ones must be removed from the domain, and removes them by calling domain_remove. In addition to primitive constraints such as inequality, Chip allows user-defined constraints. These are conventional Prolog procedures augmented with a 'f 0 r war d' declaration indicating which I I 847 arguments should be ground and which should be domain variables. For example, plusorminus is defined [Van Hentenryck 1989a: pl34] as follows: forward plusorminus(d,d,g). plusorminus(X,Y,C) :- X is Y - C. plusorminus(X,Y,C) :- X is Y + C. implements the first-fail heuristic (the noattack procedure is the same as in Program 1), and illustrates the general structure of such programs. Note that the "guessing" and "checking" components (the guess_queens and check_queens procedures) must be separated, though their order is unimportant. four queens (Qs) :Qs = [Ql, Q2, Q3, Q4] , make domains (Qs, [1,2,3,4]), guess queens (Qs), check=queens(Qs) . The problem with allowing user-defined constraints in Andorra-I is that the procedures may in general be non-determinate and, in any case, a search is required through the elements of a domain. One way to handle such constraints is by transforming the procedure to a determinate, forward checking, equi valen t, as we did with p 1 u s 0 r min u s in Program 2. Another way would be to use a "determinate bagof" primitive which is currently being implemented in Andorra-I. This is similar to the bagof of Prolog but it executes as part of the determinate phase as a new subcomputation, even if it has to create internal choicepoints. mode plusorminus(?, ?, ?). plusorminus(X, Y, C) [N 1from(N+ 1)]. F 2: sieve([P 1L]) ==> [P 1sieve(filterp(P,L»]. sieve(from(2)) in eland truncate(X, [H 1TJ) in C 3 is tried as follows: ~ ~ ~ ~ call E-Unify(sieve(from(2», [H 1T]) call E-Unify(from(2), [P 1L]) /* by F 2 */ exit E-Unify([21 from(2+ 1)], [21 from«2+ 1)]) /* by F 1 */ call E-Unify(sieve([21 from(2+ I)], [H 1T]) exit E-Unify([21 sieve(filterp(2, from(2+1»)], [2 1sieve(filterp(2,from(2+ 1»)]) 1* by F 2 */ In this E-Unification+ process, the reduction of a functional term is initiated when a head pattern of a clause or rewrite rule is a non-variable term and the corresponding argument of the caller is a functional term. Note that the functional term is not completely reduced to its normal form, but to WHNF, which makes it possible to handle the infinite data structures. The complete deSCription of the E-Unification algorithm, called E-Unification with Lazy Evaluation, is presented in [Nang et al. 1991]. FWAM-II, an abstract machine for Lazy Aflog, is an extension of W AM augmented with the manipulation of functional closure. It is characterized by that: • it adds the reduction mechanism to the W AM architecture, and • it employs an environment-based reduction rather than graph reduction. Since W AM uses an environment for the variables in the body of a clause, the conventional environment-based reduction scheme is more suitable to W AM than the graph reduction is in the combination. Therefore, FW AM-II behaves similarly to the WAM in the execution of a clause, whereas it works' similarly to an environment-based reduction machine in the reduction of functional term. 
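The demand-driven reduction to WHNF described above can also be pictured in plain Prolog (a sketch under our own encoding of the rewrite rules of Example 1, not the FWAM-II mechanism): whnf/2 rewrites a functional term only until its outermost list constructor appears, and take/3 forces exactly as many elements as are demanded.

    % The from/sieve/filterp rules of Example 1, encoded as reductions to WHNF.
    whnf(from(N), [N|from(N1)]) :- N1 is N + 1.
    whnf(sieve(T), [P|sieve(filterp(P, L))]) :- whnf(T, [P|L]).
    whnf(filterp(P, T), R) :-
        whnf(T, [X|L]),
        (   0 =:= X mod P
        ->  whnf(filterp(P, L), R)       % multiple of P: drop X, keep reducing
        ;   R = [X|filterp(P, L)]        % otherwise X is the head of the WHNF
        ).
    whnf([H|T], [H|T]).

    % take(N, Term, Xs): demand the first N elements of the lazy list Term.
    take(0, _, []) :- !.
    take(N, Term, [X|Xs]) :-
        whnf(Term, [X|Rest]),
        N1 is N - 1,
        take(N1, Rest, Xs).

A query such as take(5, sieve(from(2)), Ps) then yields Ps = [2,3,5,7,11] without the infinite streams ever being built in full.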
This W AM-based approach has been also adopted in other abstract machines for the functional logic language, such as K-WAM [Bosco et al. 1989] for KLEAF and a W AM model [Nadathur and Jayaraman 1989] for A.-Prolog. The E-unification of Lazy Aflog is realized in FWAM-II via the reducibility checking in the unification instructions, which immediately calls the reduction process if the passed argument is a functional term and corresponding pattern is not a non-variable term. To implement the suspension and reactivation of functional closure, a run-time structure (called, Reduction Stack) is added to W AM structure. Figure 1 shows a compiled FWAM-II code for the filterp function in Exam~ pIe I, where mode and eq are predefined strict functions. Upon the benchmark testing [Nang et al. 1991], the mechanism of FWAM-II is relatively less effICIent than W AM executing pure Prolog programs because of its overhead to construct and reference the fun~ti?nal closure, but it can support lazy evaluation in lOgiC m the abstract machine level. Consequently, it is argued that FWAM-II can support not only all the features of logic language but also the essential features of functional language with the performance comparable toWAM. re~~ction F 3 : filterp(P, [X 1L]) ==> «X%P) == a 1 filterp(P,L» F 4: [X 1filterp(P,L)]. 3 A Parallel Computational Model In Example I, a query ":- test(100)." generates 100 consecutive prime numbers as its result. In the course of the refutation of the query, the unification of truncate(100, ~bstract for Lazy Aflog Although FWAM-II would be an efficient sequential machine. for Lazy Aflog, it has the speed limitation because of ItS sequential nature. A natural way to overcome this obstacle is to extend it in parallel. This 853 F 1: filterp(P, [X I L]) F 2: => «X%P) == 0 I filterp(P,L» [X I filterp(P,L)]. 3 allocate % Pattern Matching 'P',X1 fget_value X2 fget_list 'X' match_value 'L' match_value % Guard Checking try_me_else_L F2 X,Xl put_value P,X2 put_value call_P_Arity_N model2,2 O,X2 put_integer call_P_Arity_N eq12,2 % Committing commit % Construct WHNF 'filterp/2', Xl write_function 'P' write_value 'L' write_value Xl rewrite_value % Returning return trust_me_elseJail 'filterp/2', Xl write_function 'P' write_value 'L' write_value X2 write_list 'X' write_value Xl write_value X2 rewrite_value return Figme 1 A Compilation Example section addresses our point of view that adopts the RAP as our starting point, and presents a parallel computational model for Lazy Aflog. 3.1 Parallelisms in Lazy Aflog Programs Lazy Aflog has various kinds of parallelisms inherited from both function and logic, such as ANDParallelism, OR-Parallelism, and Argument-Parallelism. Among these parallelisms, we adopt the Independent AND-Parallelism as the primary parallelism owing that: • Ideally, all parallelisms in the Lazy Aflog program can be exploited in the parallel extension. However, it may require a complex control mechanism that may degrade the performance gains obtained through the parallel execution. • Since the Argument-Parallelism in the functional language part can be viewed as a kind of Independent-AND Parallelism in the logic language part, we can exploit parallelisms in both of the functional and the logic parts in a simple and coherent manner if there is a parallelizing method for it. • There have emerged an efficient and powerful computational model and an abstract machine for Independent-AND Parallelism of logic programs. 
DeGroot's RAP Model and RAP-WAM [Hermenegildo 1986] are such a computational model and an abstract machine, respectively. 3.2 A Parallel Computational Model: PR 3 A parallel computational model for Lazy Aflog, called PR3 [Nang 1992], is a parallel model which can support both of the parallel resolution and parallel lazy reduction simultaneously. The basic principle to spawn a parallel task is as follows; Rule 1) the subgoals in a clause are executed in parallel when their arguments are independent or ground Rule 2) the arguments of a functional term are reduced in parallel when their WHNFs are demanded and the function is a strict one Rule 3) the alternative clauses and rewrite rules are tried sequentially using the top-down strategy The algorithm of independent and ground are same as the ones defined in [DeGroot 1984]. This principle can be expressed with an intermediate code, called CGE+ (Conditional Graph Expression +), which is an extension of DeGroot's CGE [DeGroot 1984]. It is used to express the necessary conditions to spawn the subgoals or function reductions in paralleL The body of a clause and righthand side of a rewrite rule are expressed by the CGE+, which is informally defined as follows; 1) G : a simple goal (or subgoal) whose argument can be a functional term. 2) (SEQ E 1 . . . En) : execute expressions E 1 through En sequentially 3) (PAR E 1 . . . En) : execute expressions E 1 through En in parallel 4) (GPAR (V 1 . . . Vk) E 1 . . . En) : if all the variables V 1 through Vk are ground, then execute expressions E 1 through En in parallel ; otherwise, execute them sequentially 5) (IP AR (V 1 . . . Vk) E 1 . . . En) : if all the variables V 1 through Vk are mutually independent, then execute expression E 1 through En in parallel; otherwise, execute them sequentially 6) (IF B E 1 E 2) : if the expression B is evaluated to true, execute expression E 1; otherwise, execute expression E 2 7) F (SEQ Fl'" Fn) : if F is a construct symbol or non-strict function symbol, then construct WHNF F ( Fl' .. Fn) sequentially ; otherwise (i.e. F is a strict function symbol) evaluate expressions F 1 throuph Fn s,equentially and eventually evaluate F(F 1 . . . Fn) 8) F (PAR F 1 .. , Fn) : if F is a construct symbol or non-strict function symbol, then COl~stru~t W~NF F ( F 1 .. , Fn) sequentially ; otherwISe (z.e: F 15 a strict function symbol) evaluate expresslOns F 1 throuph Fn i,n parallel and eventually evaluate F( F1 .,. Fn) The expressions 1) through 6) are the same as the DeGroot's CGE for the clause (actually they are improved CGE defined in [Hermenegildo 1986]), while expressions 7) and 8) are new expressions for rewrite 854 rules. Note that there are no conditions to check the groundness of function arguments in the expressions 7) and 8), since they are automatically checked by pattern matching semantics of the rewrite rules. That is, the arguments of a rewrite rule are always ground, hence, they can be always evaluated in parallel. In expression 8), the arguments are reduced in parallel only if the function is strict, which results in its WHNF. Otherwise, it is rewritten to the term in the right-hand side, which is returned as the result. In this case, as it is not WHNF, it induces another reduction process. The reason to adopt this reduction strategy rather than directly call the nonstrict function, is in order to keep the storage optimization based on tail-recursion. (b) When the body ofCl is executed. 
Example 2 shows a CGE+ for a Lazy Aflog program. It can be generated automatically from the Lazy Aflog program by the parallelizing compiler, or programmed directly by the programmer.

Example 2 A CGE+ for a Lazy Aflog program

C1 : test(X,Y) :- (IPAR (X,Y) p(X,Z) q(Y,W)), r(f(Z), g(W)).
C2 : test(X,Y).
C3 : p(a,1).      C4 : p(b,2).
C5 : q(c,3).      C6 : q(d,4).
C7 : r(2,5).      C8 : r(4,72).
F1 : f(X) ==> (X == 0) | 0, +(PAR fib(X) fib(2*X)).
F2 : g(Y) ==> *(PAR factorial(Y), fib(Y)).

In Example 2, as the subgoals p(X,Z) and q(Y,W) generate the values of Z and W that are taken into the terms f(Z) and g(W), the goal r(f(Z),g(W)) should be executed after they have been evaluated. Figure 2 shows snapshots of the parallel execution of the CGE+ in Example 2 when the query "Q1 :- test(b,d)" is given. In Figure 2, the rectangle, circle, and rounded rectangle represent an OR node, an AND node, and a reduction node, respectively. The number attached to each node represents the order of execution, while the filled nodes represent the nodes activated at that time. Note that, since Unification Parallelism is not exploited in PR3, the functional terms f(1) and g(3) in step (c) are reduced sequentially, although they could be evaluated in parallel if an innermost-like reduction strategy were used. The backward execution of PR3 is the same as that presented in [Hermenegildo 1986], because there is no backtracking in the reduction phases once a functional term has been reduced to WHNF. For example, in step (d), the subgoal q(Y,W), which generated the argument W, searches for alternative solutions for Y and W when a failure occurs, rather than generating another WHNF for g(3) or f(1).

Figure 2 The Parallel Execution Snapshots of the Lazy Aflog Program in Example 2: (b) when the body of C1 is executed, the goals p(X,Z) and q(Y,W) can be executed in parallel; (c) the reduction of f(1) causes the reductions of fib(1) and fib(1*2) in parallel; (d) since there is no clause that unifies with r(2,12), a 'fail' message is sent to q(Y,W), and now C6 is tried.

4 A Parallel Extension of FWAM-II for PR3

The desirable characteristic of a parallel abstract machine is to support parallel execution while retaining the performance optimizations offered by current sequential systems. To achieve this goal, a parallel abstract machine for PR3, called PFWAM-II (Parallel FWAM-II), is designed as an extension of the sequential abstract machine FWAM-II. It is equipped with run-time structures and an instruction set for forking and joining parallel executions. We adopted the run-time structures and instructions of the RAP-WAM for the extension of FWAM-II, because RAP-WAM is likewise an extension of WAM for AND-parallel execution of Prolog and has general primitives for forking and joining parallel tasks. Figure 3 shows the relationships between WAM, FWAM-II, RAP-WAM, and PFWAM-II.

Figure 3 The Relationships Between WAM, FWAM-II, RAP-WAM and PFWAM-II

4.1 Run-Time Structures for Parallel Execution

The run-time structure of PFWAM-II is an extension of FWAM-II for parallel execution, as shown in Figure 4.
It consists of three parts; First, the Heap, Trail, Environment, and Choice Point are structures for the execution of the logic part, and inherited from W AM ; Secondly, RS (Reduction Stack) is the structure only for the function reduction, and inherited from FWAM-II ; Finally, GS (Goal Stack), ParCali Frame, Local Goal Marker, Input Goal Marker, and Wait Marker are run-time structures for the parallel executions of subgoal or function reduction, that are inherited from RAP-WAM with slight modifications. HEA REDUCTION SIM:K H CFA3 CP~ P CODE , - 1 - TRm MESSAGE BUFFER MB -,- Figure 4 Data Areas and Registers for One PFWAM-II In fact the run-time structures of the parallel execution is almost ~he same as that of RAP-WAM except that a parallel task ill PFWAM-IT can be a reduction of a func~onal term as well as the evaluation of subgoal, whereas m RAP-WAM, only the evaluation of a subgoal can be a parallel task The run-time structure for parallel execution are the Goal Frame, ParCall Frame, Input Goal Marker, Local .Goal Marker, and Wait Marker. Let us explain them focusmg on the extensions which allow them to be also used for function reduction. • The Goal Frame : The subgoals or the functional terms which are ready to be executed in parallel are pushed onto the Goal Stack Each entry in the GS is also called a Goal Frame as in RAP-WAM. A Goal Frame contains all the necessary information for the remote execution of tasks. There are two kinds of Goal Frame in PFWAM-II; one is for a subgoal, and the other is .for a function reduction. They are distinguished by the special tag in the Goal Frame. When a Goal Frame is the one for t1;le subgoal, the structure of Goal Frame is the same as in RAPW AM ; otherwise (i.e. it is one for the function reduction), it contains the extra pointer to the functional term to be reduced. In both cases, they are stolen from Goal Stack by a remote processor, and executed remotely in the same way. • The ParCall Frame: It is used to keep track of the parallel tasks during forward and backward executions of PR 3. The entries and meanings of the ParCali Frame that is created for each parallel task are the same as in RAP-WAM. If a ParCall Frame is the one for the parallel function reductions, it immediately disappears from the Local Stack when the parallel reductions are completed because there is no backtracking in the reduction process. It is different from the case of parallel subgoal calls, in which it remains in the Local Stack in order to select the appropriate actions during backtracking. • The entries and meanings of the Input Goal Marker, Local Goal Marker, and Wait Marker are the same as in RAP-W AM. However, they also immediately disappears when the task is a function reduction and it is reduced to WHNF. The general execution scenario of PFWAM-II is as follows. As soon as a processor steals a task from another processor's Goal Stack, it creates an Input Goal Marker on its top of Local Stack, and checks whether it is a subgoal or a function reduction. If it is a subgoal, the processor starts working on the stolen sub goal by loading its argument registers from the parameter register fields in the Goal Frame and fetching instructions starting at the location (procedure address) received. If the stolen task is a function reduction, the processor loads the arguments and finds the starting address of the corresponding rewrite rule by referencing the functional term stored in the Heap of the parent processor. It was recorded on the Goal Frame by the parent processor. 
In any case, the local stacks of the processor will then grow (and shrink) as indicated by the semantics of FWAM-II. When a parallel call is reached, a ParCall Frame is created on the top of the Local Stack and tasks are pushed onto the Goal Stack. If there are no idle processors in the system at that time, the processor itself takes the goal back from its own Goal Stack, makes a Local Goal Marker, and executes the task locally. If the parallel call is one for subgoals, a Wait Marker is created on the top of the Local Stack as soon as all subgoals succeed; it is used for the backward execution of PFWAM-II. However, if the parallel call is for function reduction, the ParCall Frame, Local Goal Marker, or Input Goal Marker created on the Local Stack can be removed, since there is no backtracking in the reduction process. After the parallel call has finished, execution continues normally beyond the parallel call.

4.2 Instruction Set

The instruction set of PFWAM-II consists of the FWAM-II instructions and new instructions implementing RAP, as shown in Table 1. Since the FWAM-II instructions were explained in [Nang et al. 1991], and the instructions that fork and join a parallel call whose tasks are subgoals are almost the same as in RAP-WAM, we explain only the instructions that control parallel reduction. Forking and joining parallel executions works as in RAP-WAM when the parallel call is a determinate one; however, some attention is required, because the tasks to be forked can be functional terms.

Table 1 The PFWAM-II Instruction Set
WAM instructions —
  Procedure control: try L; retry L; trust L; try_me_else L; retry_me_else L; trust_me_else_fail
  Get: get_variable Vi,Ai; get_value Vi,Ai; get_constant C,Ai; get_list Ai; get_structure S,Ai; get_nil Ai
  Indexing: switch_on_term Ai,v,c,l,s; switch_on_constant n,ff; switch_on_structure n,ff
  Put: put_variable Vi,Ai; put_value Vi,Ai; put_unsafe_value Yi,Ai; put_constant C,Ai; put_list Ai; put_structure S,Ai; put_nil Ai
  Clause control: call P/arity; execute; proceed; allocate; deallocate
  Unify: unify_variable Vi; unify_value Vi; unify_unsafe_value Yi; unify_constant C; unify_list; unify_structure S; unify_nil; unify_void
Reduction instructions —
  Fget: fget_value Vi,Ai; fget_constant C,Ai; fget_list Ai; fget_structure S,Ai; fget_nil Ai
  Matching: match_value; match_constant; match_structure; match_list
  Reduction control: commit; return
  Writing: write_value; write_constant; write_structure; write_list; write_function F,Ai; write_structure_value
  Reducing: reduce_value Vi
  Rewriting: rewrite_value Vi
Parallel abstract machine specific instructions —
  Parallel call and reduction: push_call Pid/Arity, Slot_#; push_reduce Vn, Slot_#; allocate_pcall #_of_slots, M; deallocate_pcall; pop_pending_goal; check_ready; check_independent; check_ground Vn; check_me_else Label; waiting_on_siblings

• push_reduce Vn, Slot_Num : makes a new goal frame on the Goal Stack, with the slot number Slot_Num, for the functional term pointed to by Vn.
• deallocate_pcall : used to join the parallel reductions. It waits until the number of goals to wait on in the current ParCall Frame is 0, and then removes the current ParCall Frame from the Local Stack.

Figure 5 shows the simplified PFWAM-II code for F2 of the CGE+ in Example 2, in which, since '+' and '*' are strict functions, their arguments are reduced directly rather than constructing functional closures.
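The fork/join behaviour of allocate_pcall, push_call/push_reduce, pop_pending_goal and deallocate_pcall described above can be pictured with a small scheduler sketch. This is an illustrative model only, assuming simplified frame fields and Python threads standing in for processors stealing goal frames; it is not the PFWAM-II implementation.

    # Illustrative fork/join of parallel tasks in the spirit of the PFWAM-II
    # parallel-call instructions; frame fields are simplifications assumed here.
    import threading
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class GoalFrame:
        kind: str                  # 'subgoal' or 'reduction' (distinguished by a tag, as in the text)
        work: Callable[[], object]

    class ParCallFrame:
        """Tracks how many forked tasks are still outstanding (the join point)."""
        def __init__(self, pending: int):
            self.pending = pending
            self.lock = threading.Lock()
            self.done = threading.Condition(self.lock)
        def task_finished(self):
            with self.lock:
                self.pending -= 1
                if self.pending == 0:
                    self.done.notify_all()
        def join(self):
            with self.lock:
                while self.pending > 0:
                    self.done.wait()

    def parallel_call(frames: List[GoalFrame]):
        """allocate_pcall, then push each frame, then join (pop_pending_goal /
        deallocate_pcall).  Threads stand in for processors stealing frames."""
        pcall = ParCallFrame(len(frames))
        results = [None] * len(frames)

        def steal_and_run(i: int, frame: GoalFrame):
            results[i] = frame.work()      # a stolen subgoal or function reduction
            pcall.task_finished()

        workers = [threading.Thread(target=steal_and_run, args=(i, f))
                   for i, f in enumerate(frames)]
        for w in workers:
            w.start()
        pcall.join()                       # a reduction ParCall frame can be discarded
        return results                     # here, since reductions never backtrack

    # e.g. the two reductions spawned for F2: g(Y) ==> *(PAR factorial(Y), fib(Y)):
    # left, right = parallel_call([GoalFrame('reduction', lambda: factorial(3)),
    #                              GoalFrame('reduction', lambda: fib(3))])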
5 Analysis 5.1 Performance Evaluation In order to estimate the performance of our parallel extension, a simulator for PFWAM-II is developed. In this simulation, we assumed that there is a common shared memory for the run-time structures of each processor which are interconnected by a network. Each processor can access the run-time structures of other processors without additional overheads. The performance of PFWAM-ll is estimated by counting the number of memory and register references, where the time for referencing data stored in the shared memory (whether it is local or not) is assumed 3 times longer than the time for register referenCing, and the times for other operations such as arithmetic are ignored for the sake of simplicity. We use three benchmark programs : the first one is FibonaccilO that is to compute the 10th fibonacci number, the second is CheckSO [Hermenegildo 1986] in which there are 10 parallel tasks each of which calls itself 50 Ai C S Ai F,Ai 5, Ai Ai ~peclhcs Vn, ~lot_# F 2: g(Y) ==> *(PAR factorial(Y), fib(Y». F 2: allocate % Pattern Matching fget_ value Yl, Xl % Spawn Parallel Reduction for factorial(Y) allocate_pcall 2, 2 put_value Yl, Xl write function factorial/l, Y2 write=value Xl pushJeduce Y2,2 % Spawn Parallel Reduction for fib(y) put_value Yl, Xl write_function fib/l, Y3 write_value Xl pushJeduce Y3, 1 % Gather the Results pop_pending_goal deallocate_pc all % Construct WHNF put_value Y2, Xl put_value Y3, X2 call_P_Arity_N */2, 2, 1 rewrite_value Xl % Returning return Figure 5 An Compilation Example for CGE+ in Example 2 times, and the third is Symbolic Derivation [Hermenegildo 1986] which is to find the derivative with respect to a variable. There are 176 parallel tasks in the FibonaccilO, 857 10 parallel tasks in the CheckSO, and 152 parallel tasks in the Symbolic Derivation. These benchmarks are programmedin both of logic and functional programming. In the simulation of function reduction, the effect of different reduction strategies is also measured. The simulated reduction strategies are Innermost Reduction in which the innermost functional terms are reduced first before the outer is tried, Semi-Lazy in which only the strict functions are reduced in the innermost fashion, and Lazy Reduction in which all functions are reduced in the outermost fashion. Upon the simulation results, the parallelizing overhead, which is defined as the extra execution time for parallel code running on the single processor, is measured as about 30-60 % when the grain size is relatively small (for example, FibonaccilO and Symbolic Derivation), whereas about less than 1 % when the grain size of parallel task is large enough to ignore the overhead (for example, CheckSO). Figure 6 graphically shows the speedup of the execution time of all benchmark programs as a function of the number of processors. In this figure, since CheckSO has only 10 parallel tasks, the speedup doest not increase when the number of processors is larger than 10. The speedup of other benchmark programs are not linear because they have too fine-grained parallelism. The most important fact which can be identified from Figure 5 is that, whether they are programmed in the logic or functional style, and whether the reduction strategy is innermost or outermost, the speedup behaviour is almost same. The speedup ratio is not dependent on the execution mechanisms, but the availability and grain size of parallelism in the benchmark programs. 
In other words, PFWAM-II can support both parallel resolution and parallel reduction with almost the same efficiency.

Figure 7 shows the working, waiting, and idle times for Symbolic Derivation as a function of the number of processors. It can be seen from Figure 7 that the processor utilization ratio decreases in proportion to the number of processors, and that the parallel reduction mechanism permits a higher utilization ratio than parallel resolution, because there is no restriction on stealing a task from other processors when the task is a function reduction (i.e., there is no "garbage slot problem" [Hermenegildo 1986] when executing function reductions).

Figure 7 Working, Waiting and Idle Times for Symbolic Derivation: (a) the case of logic programming (parallel resolution); (b) the case of functional programming (parallel reduction).

5.2 Comparison with Related Work

One of the most closely related works is CSELT's work centering around K-LEAF. K-LEAF [Levi and Bosco 1987] is a functional logic language based on transformation: a rewrite rule in a K-LEAF program is transformed into a Prolog clause with an extra argument for the return value, and nested functions are flattened with produced variables for the outermost search strategy. K-WAM is an abstract machine supporting outermost-SLD resolution, which is the inference rule of K-LEAF. Accordingly, there is no real reduction mechanism in K-LEAF and K-WAM. A parallel extension of K-WAM on a distributed-memory multiprocessor has also been developed [Bosco et al. 1990]. In this work, K-WAM is extended to control the OR-parallel execution of K-LEAF programs, and parallelism is restricted to one-solution search. The major difference between the parallel extension of K-WAM and PFWAM-II is that the former is designed to exploit only OR-parallelism in the flattened K-LEAF programs, while the latter is designed to exploit only AND-parallelism of Lazy Aflog programs.

Figure 6 Speedup vs. Number of Processors for the Benchmark Programs: (a) speedup vs. number of processors for Fibonacci10; (b) speedup vs. number of processors for Check50; (c) speedup vs. number of processors for Symbolic Derivation (curves for the Innermost, Semi-Lazy, and Lazy reduction strategies and for Resolution).

6 Summary

This paper presents a parallel computational model and its abstract machine for a functional logic language, called Lazy Aflog, which was proposed as a cost-effective mechanism for incorporating functional language features into a logic language. The proposed computational model underlies De…

…100% with 128 processors), since the lazy-splitting rule is better at coping with an irregularly shaped tree. However, the overhead of redundant computation makes the lazy-splitting rule unsuitable for a shallow search tree such as that of the 8-queens, the zebra, or the turtles program. The height of the search trees for these programs is not sufficiently larger than log(128), the level at which each of the 128 processors commits to its own tasks.
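The eager- and lazy-splitting rules referred to above are defined earlier in that paper and are not reproduced here, so the following is only a loose, hypothetical sketch of the general idea of a self-organizing, communication-free split: every processor runs the same code on the same tree, re-derives the shallow part of the tree for itself, and commits to a statistically even (round-robin) share of the subtrees rooted at depth ceil(log2 N). The tree interface (children, is_solution) and the choice of splitting depth are assumptions for illustration, not the paper's exact rules.

    # Hypothetical illustration of a communication-free, statistically even
    # split of an OR-tree among n_procs processors (not the paper's exact rule).
    import math

    def frontier(root, children, depth):
        """All nodes at the given depth, left to right; each processor recomputes
        this small top region for itself (redundantly, with no communication)."""
        level = [root]
        for _ in range(depth):
            level = [c for node in level for c in children(node)]
        return level

    def dfs(node, children, is_solution):
        found = [node] if is_solution(node) else []
        for c in children(node):
            found += dfs(c, children, is_solution)
        return found

    def self_organized_search(root, children, is_solution, proc_id, n_procs):
        """Processor proc_id's share: a round-robin selection of the subtrees
        rooted at depth ceil(log2 n_procs), plus (for processor 0) the shallow nodes."""
        depth = math.ceil(math.log2(n_procs)) if n_procs > 1 else 0
        my_roots = [node for i, node in enumerate(frontier(root, children, depth))
                    if i % n_procs == proc_id]
        solutions = []
        if proc_id == 0:                  # one processor also covers depths 0 .. depth-1
            level = [root]
            for _ in range(depth):
                solutions += [n for n in level if is_solution(n)]
                level = [c for node in level for c in children(node)]
        for subtree in my_roots:
            solutions += dfs(subtree, children, is_solution)
        return solutions

The union over all processor ids covers the whole tree, and no messages are exchanged; the price is the redundant exploration of the top levels, which is what makes shallow trees such as 8-queens a poor fit, as noted above.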
4.2 7 logN The Tree Programs o 7 1:1~;1 0 B 23456 B B B o 567 log N Speed-up Factors Speed-up factor is defined as sequential runtime divided by the parallel run time. It is a generally accepted indication of how well a parallel system is able to improve the runtime of a program. Next, we present data showing speed-up factors of the proposed approach on the selected benchmark programs. Table 3 lists speed-up factors from a simulation study running on a uniprocessors. In this simulation, the run time is measured by the number of resolutions performed in the execution (number of nodes traversed in the proof tree). Proc. I Prog 128 II------'---'-=Ea-g-e 4 I 8 I 16 I 32 I 64 .... r_-sp"""li:-tt"'""in-g---L.----l 8-queens 9-queens zebra turtles pattern n-square tree 3.9 2.9 3.2 3.0 2.8 2.6 2.4 7.5 4.5 4.0 5.2 5.5 2.8 2.4 9.4 8.6 8.3 8.6 6.2 3.2 3.7 31.9 22.7 15.3 15.3 21.7 3.7 6.8 40.1 42.2 20.6 27.6 21.7 6.7 6.8 n-square tree 2.2 1.6 2.8 2.3 Lazy-splitting 4.3 7.2 13.0 2.7 4.9 9.0 17.7 14.2 17.0 16.7 9.1 8.6 12.5 3.7 6.5 Table 3: Speed-up from Simulation Study. Speed-up is defined as sequential runtime divided by parallel runtime. Table 4 lists speed-up factors from a parallel emulation study running on a BBN Butterfly TC2000 with 32 processors. The run time is measured by the physical clock. We assume that each resolution step takes constant time. Cost of a real resolution step varies in general. However, here we are merely interested in the total time of a task which consists of a large number of resolution steps. The total time (the sum of the time by all resolution steps) can be considered as the average cost of each resolution step times the number of resolutions. In other words, the difference of time spent on each resolution step is immaterial. For a given program, the constant can be regarded as the average cost of each resolution step. 865 In order to observe the real overhead of task allocation, which is the time to compute the partition of tasks, the resolution speed must be realistic. In the emulation, resolution engine speed is set equal to that of Aurora Parallel Prolog2 , a well known parallel Prolog implementation, running on one Butterfly processor. Both the eager and the lazy scheduling strategies are implemented in the emulator. The eager-splitting rule was used for the programs n-queens, zebra patten and turtles. The lazysplitting rule was used for the programs n-square and tree. From the emulation study, we are able to verified that the sequential simulation, which measures run time by the number of resolution steps performed, accurately reflects the speed-up result by the parallel emulation, which measures run time by the real clock, for up to 32 processor. The overhead of calculating the task distribution, the only overhead not considered in the simulation, is nearly invisible in the emulation, given that the speedup factors are almost identical to that from the sequential simulation. Notice that there is no communication involved here. Program 1 4 proc 8-queens 9-queens zebra turtles patten 1 1 1 1 1 4.0 3.0 3.2 3.1 2.8 n-square tree 1 1 2.2 1.6 8 proc 16 proc Eager-splitting 7.6 9.3 4.6 8.4 4.0 8.0 5.2 8.2 5.5 6.1 Lazy-splitting 2.8 4.2 2.3 2.7 32 proc 16.5 16.4 8.9 8.2 12.1 6.9 4.6 Table 4: Speed-up from Emulation Study. 4.3 Performance Comparison with Aurora Parallel Prolog The same set of benchmarks were run with Aurora Parallel Prolog on the Butterfly machine. Runtime and speedup factors (the best out of 10 runs) are listed in table 5. 
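Speed-up in both the simulation and the emulation study is the plain ratio defined above, sequential runtime divided by parallel runtime. As a worked check, the short script below recomputes the 8-queens row of Table 5 from its runtimes; the helper itself is just illustrative bookkeeping.

    # Recomputing speed-up factors as defined in the text: S(p) = T(1) / T(p).
    # Runtimes below are the 8-queens row of Table 5 (milliseconds, Aurora).
    def speedups(seq_runtime, parallel_runtimes):
        """parallel_runtimes maps a processor count to a runtime."""
        return {p: seq_runtime / t for p, t in sorted(parallel_runtimes.items())}

    if __name__ == "__main__":
        t_seq = 1620.0
        t_par = {16: 141.0, 24: 122.0, 32: 123.0}
        for p, s in speedups(t_seq, t_par).items():
            # prints 11.5, 13.3, 13.2 -- matching the table -- plus the efficiency s/p
            print(f"{p:2d} processors: speed-up {s:4.1f}, efficiency {s / p:.2f}")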
The Peak Speed-up Factors: The speed-up curves for all benchmark programs either have reached the peak (bold face numbers) or at least level off with Aurora Parallel Prolog on 32 processors, as shown i-n Table 5. Using the self-organizing scheduling approach, simulation results (Table 3) on up to 128 processors showed that: • the peak speed-up factors for the 8-queens, zebra and turtles programs (with fine grain parallelism) exceed, by a margin of at least 200%, experimental results on Aurora; • the peak speed-up factors for the 9-queens program is twice as that on Aurora; 2 Aurora O.6/Foxtrot, patch #8, with the Manchester Scheduler. • the peak speed-up factors for the n-square program (with a very bushy search tree) is about 30% faster that that on Aurora. I Program II 1 proc 8-queens 9-queens zebra turtles pattern n-square 1,620 7,500 2,600 4,300 1,084 2,230 I 16 proc I 24 proc 141/11.5 533/14.1 490/5.3 550/7.8 130/8.3 190/11.7 122/13.3 367/20.4 500/5.2 580/7.4 160/6.8 170/13.1 I 32 proc 123/13.2 350/21.4 525/4.9 569/7.5 240/4.5 178/12.6 Table 5: Runtime (ms.) / Speed-up factors with Aurora Parallel Prolog Speed-up Comparison: Given the number of processors, the speed-ups achieved by self-organizing scheduling appears to be comparable to that of Aurora, but somewhat lower when the number of processors is small (e.g < 16). Note that these results are obtained without communication. The same speed-up result is expected to hold regardless of the speed at which resolution engine is running. Therefore, absolute speed comparison will favor the self-organizing scheduling scheme. 5 Discussion In the above experiment we studied the behavior of the proposed technique without communication among processors. We demonstrated that the scheme is able to effectively deal with problems which render mostly fine-grained parallel tasks under a traditional scheduler. The loss of processor utilization due to the unevenness in load distribution can be more than covered by the benefit of reduced scheduling overhead. The advantage of the proposed technique is its non-communicating nature, as frees it from possible constraints such as communication bandwidth among processors that could otherwise limit the ability of a scheduler to function effectively. The limitation, however, is its unable to re-use processors that complete tasks they allocated before the termination of the (parallel) execution. We have shown, in the above simulation study, that this would not necessarily compromise performance of programs specially those that generate mostly fine-grained tasks at run time under a traditional scheduler. But the worse case scenario could happen despite the effort to obtain a better balanced load distribution by removing structural imbalance of the search tree and using a statistically even distribution rule. Below, we discuss options to deal with the problem. One possible solution to the problem is to resort to dynamic task redistribution as existing schedulers do. As we know, the overhead of dynamic task redistribution is relatively small for medium to large-grained tasks, and it provides us with the adaptiveness necessary to deal 866 with some extraordinary shape search space. On the other hand, the self-organizing scheduling approach introduces low overhead and thus ensures that when it does not help improve performance it is not expect degrade it either. When the two methods are careful integrated, it can be a combination that takes advantage of what the two methods are best at. 
The issue is when and how dynamic task redistribution should be invoked to achieve the best result. Preliminary research has been conducted in this direction and we will present results in a separate paper. Another option that alleviates the problem is to have idle processors collected by a higher level scheduler (e.g. the operating system) and assigned to other queries. The idea is to use dynamic scheduling only at the level of user queries which usually offer larger granule. In a multi-user environment, this approach can yield a high system throughput given sufficient queries. Global load balancing is involved here. It appears an interesting subject for future investigation. Static program analysis that provides probability of cut-offs according to given query patterns will be very helpful to guide task distribution. More research is yet to be done before this becomes a feasible alternative to the currently used statistical distribution rule. Finally, we note that an interesting feature of the selforganizing scheduling approach is that it establishes linkage between processor mapping and the syntax of a program. This feature provides user a mean to influence the mapping of processors to tasks, as would be particularly helpful for applications in which tasks are clearly defined and dynamic task redistribution is known to be not beneficial (there are many such applications). Again, dynamic task redistribution can be used to guard against abuse of this feature. We presented data showing the effectiveness of the proposed methods on programs that belong to the generateand-test category. By removing structural imbalances in a program, it was found that a reasonably balanced load distribution can be obtained by following a statistically even distribution rule. We discussed two distinct task distribution rules, the eager-splitting rule and lazy-splitting rule and examed their effectiveness. We showed that the peak speed-up factors with selforganizing scheduling for a set of benchmark programs exceeds, by a substantial margin, results achieved on the same programs by Aurora Parallel Prolog, a well-known parallel Prolog implementation. Given a fixed number of processors, the speed-up factors by the self-organizing scheduling scheme are competitive. By experimenting with the two near-extreme case task distribution rules we also demonstrated that adaptability can be gained on the cost of redundant computation within this framework. We believe that the condition for task distribution derived in the paper can be useful for other scheduling schemes. Also, the idea of removing structural imbalances in a program will help with a tree-based scheduler that employs the top-most dispatching strategy [But88, Cald88]. We are currently investigating incorporating traditional task redistribution techniques in order to handle large but highly uneven shaped search trees. Preliminary results indicate that allowing limited communication among processors one can substantially improve the efficiency of the execution. Global load balancing, aimed at maximizing throughput of a system that supports multiple user and multiple queries, is an interesting topic for future research. 6 References Conclusion and Future Work A task scheduling technique, self-organizing scheduling, is proposed in this paper. The method directs processors to share the search space, a search tree defined implicitly by the program, according to universal rules followed by every processor in the system. 
Load balance is achieved by altering the shape of the search tree to remove the so-called structural imbalance (see section 3), and imposing a statistically even task distribution rule to deal with the randomness in cut-offs in the tree. Resolution engines only share the program and the original query. A condition for task distribution that minimizes the average parallel runtime is given and proved. An advantage of the method is that it allows all processors to operate independently on private resources both for resolution and task allocation, while being able to maintain a fairly balanced load distribution among processors. The effectiveness of the self-organizing scheduling scheme is independent of the speed of the resolution engine, and architectural characteristics of the multiprocessor. [Ali90] Ali, K. and Karlsson, R., "The Muse Or-Parallel Prolog Model and its Performance", Proceeding of the North American Conference of Logic Programming, 1990, MIT press, 1990. [Ali91] Ali, K. and Karlsson, R., "Scheduling OrParallelism in Muse", Proceeding of the 8th International Conference on Logic Programming, MIT Press, 1991. [But88] Butler, R., Disz, T., Lusk, E., Overbeek, R., and Stevens, R., "Scheduling OR-Parallelism: an Argonne perspective", Logic Programming, Proceedings of the Fifth International Conference and Symposium on Logic Programming, MIT press, 1988. [Cald88] Calderwood, A., Szeredi P., "Scheduling Orparallelism in Aurora - the Manchester Scheduler", Proceedings of the Sixth International Conference on 867 Logic Programming, pages 419-435, MIT Press, Jun. 1989. [Clock88] Clocksin, W. F. and Alshawi, _H., "A Method for Efficiently Executing Hown Clause Programs Using Multiple Processors", New Generation Computing, 5, 1988 P 361-376 OHMSHA, Ltd. and Springer Veriag. [GiuI90] Giuliano, M., Kohli, M., Minker, J., Durand, I., "Prism: A Testbed for Parallel Control", Parallel Algorithms for Machine Intelligence, edited by Kanal, L., and Kumar, V., to appear. [Kale85] Kale, L. V., "Parallel Architectures for Problem Solving", Technical report No. UIUCDCS-R-851237, Department of Computer Science, University of Illinois at Urbana-Champaign. [Kumar87] Kumar V. and Nageshwara Rao V. "Parallel Depth First Search. Part II. Analysis" International Journal of Parallel Programming, Vol. 16, No.6, 1987. [Lloyd84] Lloyd, J. W. "Foundations of Logic Programming", Springer-Verlag, 1984. [Lusk] Lusk, E., Warren H. D., Haridi, S., et al. "The Aurora Or-Parallel Prolog System", Argonne internal technical report. . [Mud91] Mudambi, S., "Performance of Aurora on NUMA Machines", Proceeding of the 8th International Conference on Logic Programming, MIT Press, 1991. [VR90] Van Roy, P. 1., "Can Logic Programming Execute as Fast as Imperative Programming", Univ. of California, Berkeley Technical Report UCB/CSD 90/600, Dec., 1990. Appendix We prove the following theorem: Theorem: Let N be the number of processors, let m C~ is an integer) be the number of tasks whose sizes are statistically identical and exhibits the following property: 1. the probability density function is non-increasing, or 2. the probability density function is symmetric with respect to a positive central point. then the average parallel runtime is minimized iff identical number of processors are assigned to each of the tasks. Before the proof, we describe some basic terminology and notations to be used. Capital letters X, Y, Z are used for random variables. 
The probability density function for X is fx(x), the cumulative probability distribution function for X is Fx(x), we have Fx(x) = f~oo fx(t)dt by definition. Or in other words, fx(x) = Fx(x). In addition, fx(x) ~ 0 and 0 ::; Fx(x) :::; 1. Fx(x) is non-decreasing since fx(x) ~ o. Runtime of a parallel execution is the longest runtime of all processors. Runtime is measured by the size of a task, in our case, the number of nodes to be traversed in a search tree. N is the number of processor available. Tl, T2 , ••• , Tm are random variables denoting the size of m tasks which are statistically identical, that is, with an identical probability distribution function f(x) and F(x). Let kl' k2' ... , km be the number of processors assigned to TIl ... ' Tm , respectively. kl + k2 + ... + km = N. We illustrate the proof with a special case when m = 2. Proof: Let Z be a random variable denoting the runtime by assigning kl to task Tl and k2 to task k2. We assume that Tl is processed in time f; and T2 is processed in time Tl T2 Z = max(-,-) kl k2 The cumulative distribution function for Z is FAx), f!. probability that Z ::; x probability that (~~ ::; x) AND (~: ::; x) probability that (Tl ::; klX) AND (T2 ::; k2X) F(klX)F(k2X) Average runtime is the mean of Z, We- need to show that Z is minimized when kl = k2' given that kl + k2 = N, a constant. For fixed kI,k2' define function G(x) = F(klX)~F(k2X). We have since F(klX )F(k2X) ::; G2(X), given that F(x) is nonnegative. Equality holds when kl = k2 • Case I: the probability density function f(x) is nonincreasing. It can be shown that the curve of F (x) is either of an arch shape, or a straight line, as illustrated in figure 5. The curve of G( x) lies below (or on) that of F( x) because the curve of G(x) is composed from center points in lines whose two ends are on curve F(x). G(x) - F(x) ::; 0, hence G2(X) - F2(X) = (G(x) - F(x))(G(x) + F(x)):::; 0 Therefore, 868 y klx x (k1+k2)x/2 k2x Figure 6: An S Shape Distribution Figure 5: An Arch Shape Distribution Equation holds when kI Thus, we have I: (1 - Pz(x»dx ~ that 11, after the rotation, completely lies above or on 12 • Thus, = k2 • 1:(1- 2 G (x»dx ~ 1:(1 - and equality holds when kI = k2 • Thus the mean of Z is minimized when kI = k2 • Case II: the probability density function f(x) is symmetric with respect to a positive center point, denoted by C. The curve of F( x) is of the shape an S tilted to the right, as illustrated in figure 6. The curve of G(x) is another S shape curve "contained" in that of F(x). We want to show that ' or, This is equivalent to showing smce i: i: (F(x) - G(x))dx (F(x) F(C + x) - G(C + x) ~ G(C - x) - F(C - x) P2(x»dx ~0 + G(x))dx > 0 Notice that we can no longer have (F(x) - G(x)) ~ 0 for all x. However, the integral of (F(x) - G(x)) can still be non-negative if we can proof the shaded areas' A2 is larger or equal to Al in figure 6. It suffices to show that for any (C - x) and (C + x) on the X axe, F(C+x)-G(C+x) ~ G(C-x)-F(C-x), and equality holds when k1 = k2. Observe that (C-x,G(C-x)) is the center point of a line, 11, whose end points are on the curve of F(x). (C+x,G(C+x)) is the center point of another line, 12 , whose end points are on the curve of F(x). Now, rotate the lower part of the S shaped curve of F( x) 180 0 • The two part of the S matches each other and it can be shown Equality holds when k1 = k2. Proof done for m = 2. 0 The same idea can be used to prove the general case. 
A formal proof of the general case will not be presented here, but we note that a property of polygon that is crucial to the proof is that the center of a convex polygon resides inside the polygon. Acknowledgement: I wish to thank Professor Jack Minker for his guidance on this work. Thanks to Dr. Mark Guiliano for his comments on an early draft of this paper. Also, I would like to express my appreciation to Argonne National Laboratory for providing parallel computing facilities. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 869 Asymptotic Load Balance of Distributed Hash Tables No buyuki Ichiyoshi and Kouichi Kimura Institute for New Generation Computer Technology 1 - 4 - 28 Mita, Minato-ku, Tokyo 108, Japan {ichiyoshi,kokimura}@icot.or.jp Abstract The distributed hash table is a parallelization of the hash table obtained by dividing the table into subtables of equal size and allocating them to the processors. It can handle a number of search/insert operations simultaneously, increasing the throughput by up to p times that of the sequential version, where p is the number of processors. However, in the average case, the peak throughput is not attained due to load imbalance. It is clear that the table size m must grow at least linearly in p to balance the load. In this paper, we study the rate of growth of m relative to p necessary to maintain the load balance on the average (or to make it approach the perfect load balance). It turns out that linear growth is not enough, but that moderate growth-namely w(p log2 p )-is sufficient. The probabilistic model we used is fairly general and can be applied to other load balancing problems. We a.lso discuss communication overheads, and find that, in the case of mesh multicomputers, unless the network channel bandwidth grows sufficiently as p grows, the network will eventually become a performance bottleneck for distributed hash tables. 1 Introduction Parallel computation achieves speedup over sequential computation by sharing the computational load among processors. The load balance between processors is central in determining the parallel runtime (though other factors also affect performance). Unlike uniform computational tasks in which almost perfect load balance is achieved by allocating data uniformly to the processors, non-uniform computational tasks such as search problems pose non-trivial load balancing problems. In most non-uniform tasks, worst-case computational complexity is far larger than average-case complexity; and the W0rst case is usually a very rare case. Thus, the study of average case performance is important, and it has been conducted for sorting and searching [Knuth 1973], optimization problems [Coffman and Lueker 1991], and many others [Vitter and Flajolet 1990]. However, there seems to have been little work on average-case performance analysis in regard to parallel algorithms, especially on highly-parallel computers, a notable exception being [Kruskal and Weiss 1985]. In this paper, we study the average-case load balance of distributed hash tables on highly parallel computers. A distributed hash table is a parallelization of a hash table, in which the table is divided into subtables of equal size to be allocated to the processors. It can handle a number of search/insert operations simultaneously, increasing the throughput up to p times that of the sequential version, where p is the number of processors. 
However, in average cases, the peak throughput is not attained due to load imbalance. Intuitively, the more buckets allocated to each processor, the better the average load balance becomes. It is clear that under a constant load factor a = n/m (n is the number of elements in the table, m is the table size), m must grow at least linearly in p to balance the load. We shall investigate the necessary / sufficient rate of growth of m relative to p so that the load balance factor-the average processor load divided by the maximum processor load-approaches 1 as P' -+ 00. It turns out that linear growth is not enough, but that moderate growth-namely, w(p log2 p )-is sufficient. This means that the distributed hash table is a data structure that can exploit the massive computational power of highly parallel computers, with problems of a reasonable size. We also briefly discuss communication overheads on multicomputers, and find that, in the case of mesh multicomputers, unless the network channel bandwidth grows sufficiently as p grows, the network will eventually become a performance bottleneck for distributed hash tables. The rest of the paper is organized as follows. Section 2 describes the distributed hash table and defines the problem we shall analyze. The terminology of average-case scalability analysis is introduced in Section 3. The analysis of load balance is presented in Section 4. The full proofs of the propositions appear in [Kimura and Ichiyoshi 1991]. The communication overheads are considered in Section 5. The last section summarizes the paper. 870 2 2.1 Distributed Hash Tables Distributed Hash Tables The distributed hash table is a parallelization of the hash table. A hash table of size m = pq is divided into subtables of equal size q and the subtables are allocated to p processors. The two most simple bucket allocations are: The block allocation The k- th bucket (k ~ 1) belongs to the ( l( k-1) j qJ + l)-th subtable/ and The modular allocation The k-th bucket (k ~ 1) belongs to the (((k-1) mod p) + 1)-th processor. At the beginning of a hash operation (search or insert) for an element x, the hash function is computed for x to generate a number h (1 ::; h ::; m), and the element (or the key) is dispatched to the processor which contains the h-th bucket. The rest of the operation is processed at the target processor. For better performance, it is desirable to maximize the locality. Thus, when the indirect chaining scheme is employed for hash collision, the entire hash chain for a given bucket should be contained in the same processor which contains the bucket. With open addressing, linear probing has the best locality (under the allocation scheme (1» but its performance degrades quickly as the load factor increases. Other open addressing schemes have better sequential performance characteristics [Knuth 1973], but have less locality. For this reason and also for simplicity of analysis, we choose the indirect chaining scheme. The bucket allocation scheme does not influence the load balance analysis in this case. The absence of a single entry point that can become a bottleneck makes the distributed hash table a suitable data structure for highly parallel processing. The peak throughput increases linearly with the number of processors. The problem is: \iVhen does the "real" performance approach the "peak" performance? When elements are evenly distributed over the processors, linear growth in the number of data elements is sufficient for linear growth in performance. 
On the other hand, in the worst case, all elements in the hash table might belong to a single subtable so that performance does not increase at all. We are not interested in these two extremes, but in average performance, just as we are more interested in the average complexity of hash operations in sequential hash tables rather than worst-case complexity. lWhen p does not divide m, taking q = fm/pl works but it may lead to a sub-optima.! load balance (e.g., consider the case m = p + 1). A better load balance can be realized by a mapping function which is a little more complicated than simple division. 2.2 Problem Definition There can b·e a number of uses of hash tables depending on the application. Here we examine the following particular use of the hash table. Concurrent Data Generation, Search and Insertion Initially, there is an "old" distributed hash table containing "old elements" and an empty "new" distributed hash table. The old and new tables are of the same size m = pq (p is the number of processors and q is the number of buckets assigned to each processor) and use the same hash function. Also, some "seeds" of new elements are distributed randomly across the processors. (1) Concurrent Data Generation Each processor generates "new elements" from the allocated seeds. It is assumed that the time it takes each processor to generate new elements is proportional to the number of generated elements. (2) Concurrent Data Dispatch Each processor computes the hash values of the new elements and dispatches the elements to the target processors accordingly. (3) Concurrent Search Each processor does a search in the old table for each of the new elements it has received. (4) Concurrent Insert Each processor inserts those new elements that are not found in the old table into the new table. No interprocessor communication arises, because the old and new hash tables use the same hash function. The above usage may seem a little artificial, but the probabilistic model and the analysis for it should be easily applicable to other usages. In the analysis of load balance, the data dispatch step is ignored (equivalently, instantaneous communication is assumed). This is discussed in Section 5. 3 Scalability Analysis Average Speedup and Efficiency We. denote the sequential runtime by T(l) and the parallel runtime using p processors by T(p). The speedup is defined by S(p) = T(l)jT(p), and the efficiency by E(p) = S(p)/p. Efficiency is the ratio between the "real" performance (obtained for a particular problem instance) and the "peak" performance of the parallel computer. In the absence of speculative computation, the efficiency is less than or equal to 1. 871 f is an asymptotic super-isoefficiency function if it is an asymptotic super-isoefficiency function for some E > 0, i.e., the efficiency is bounded away from 0 as Since we intend to engage ourselves in an average-case analysis, we need to define the "average speedup" and the "average efficiency". p --+ Definition 1 We define the average collective speedup O"(p) by E(T(l))/E(T(p)) (E(X) denotes the expectation of X) and the average collective efficiency 'T} (p) by O"(p)/p. The reason why we analyze the above defined average collective speedup, and not the expected speedup in the literal sense-E(T(l)/T(p))-is that: (1) it is much simpler to analyze E(T(l))/E(T(p)) than analyze E(T(1)jT(p)), and (2) in cases where any average speedup figure is meaningful our definition is a better indicator of overall speedup. 
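The four-step usage defined in Section 2.2 can be mocked up in a few lines. The sketch below is a sequential stand-in for the concurrent phases, written only to illustrate the data flow; the generator function, the modular bucket allocation from Section 2.1, and the list-of-chains layout are assumptions of this example, not part of the paper.

    # Sequential mock-up of steps (1)-(4): generation, dispatch, search, insert.
    def run_phase_model(seeds_per_proc, generate, hash_fn, old_table, p, q):
        """old_table / new_table are p subtables of q chains under modular allocation."""
        m = p * q
        new_table = [[[] for _ in range(q)] for _ in range(p)]

        # (1) concurrent data generation: each processor expands its own seeds
        generated = [[x for seed in seeds for x in generate(seed)]
                     for seeds in seeds_per_proc]

        # (2) concurrent data dispatch: send each new element to the owner of its bucket
        inbox = [[] for _ in range(p)]
        for elems in generated:
            for x in elems:
                h = hash_fn(x) % m                    # bucket index 0 .. m-1
                inbox[h % p].append((h, x))           # modular allocation: owner = h mod p

        # (3) concurrent search in the old table and (4) concurrent insert of misses
        # into the new table; both tables use the same hash function, so this part
        # involves no interprocessor communication.
        for items in inbox:
            for h, x in items:
                if x not in old_table[h % p][h // p]:
                    new_table[h % p][h // p].append(x)
        return new_table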
Suppose we run a number of instances 11 ,12 , .. . from some problem class, then the collective speedup defined by L-i T(l,Ii)j L-i T(p,Ii) (T(l,I;) and T(p,I;) are sequential and parallel runtimes for problem instance Ii) and represent overall speedup. This is more meaningful than anyone of arithmetical mean, geometric mean, or harmonic mean that may be calculated from the individual speedups T(l, Ti)/T(p, Ti). Scalability Analysis and Isoefficiency We would like to study the behavior of 'T}(p) as p becomes very large. In general, for a fixed amount of total computation W, 'T}(p) decreases as p increases, because there is only finite parallelism in a fixed problem. On the other hand, in many parallel programs, for a fixed p, 'T}(p) increases as HI grows. KU111_ar and Rao [1987] introduced the notion of isoefficiency: if HI needs to grow according to f(p) to maintain an efficiency E, then f (p) is defined to be the isoefficiency function for efficiency E. A rapid rate of growth in the isoefficiency function indicates that nearpeak performance of a large-scale parallel computer can be attained only when very-sometimes unrealisticallylarge problems are run. Such a parallel algorithm and/or data structure is not suitable for utilizing a large-scale parallel computer. (We will refer to the isoefficiency by this original definition by exact isoefficiency.) Since it is sometimes impossible to maintain an exact E because of the discrete nature of the problem, the following weaker definitions of isoefficiency may be more suitable or easier to handle. Asymptotic Isoefficiency f is an asymptotic isoefficiency function for E if lim 'T}(p) p->co =E under TiV = f(p). Asymptotic Super-Iso efficiency f is an asymptotic 8upel'-isoefficiency function for E if lim inf 'T}(p) 2: E p--+co under VV = f(p). 00. An exact isoefficiency function for E is an asymptotic isoefficiency Junction for E; and an asymptotic isoefficiency function for E is an asymptotic super-isoefficiency function for E. In the analysis of load balance, we study the balance of essential computation. Essential computation is the total computation performed by processors excluding the parallelization overheads. The amount of essential computation is equal to pT(p) minus the total overhead time spent on things such as message handling and idle time. In the absence of speculative computation, we can identify the amount of essential computation with the sequential runtime. 2 The terminology for load balance analysis is defined like that for speedup/efficiency analysis, except that "essential computation" replaces "runtime": the total essential computation corresponds to sequential runtime; maximum processor load corresponds to parallel runtime; and load balance factor3 corresponds to efficiency. We use the same terminology for isoefficiency functions. In the following analysis, we study asymptotic isoefficiency for 1 and asymptotic super-isoefficiency. (Since we are not dealing with exact isoefficiency, we drop the adjective "asymptotic" for brevity.) 4 4.1 Analysis of Load Balance Assumptions For the sake of probabilistic analysis, we consider a model in which the following values are treated as random variables (RVs): the number of old and new elements belonging to the j - th bucket on the i-th processor (1 ::; i ::; p, 1 ::; j ::; q) denoted by Aij and Bij respectively, and the number of new elements generated at the i-th processor denoted by Gi . First, we make some assumptions on the distributions of these random variables. 
The two alternative models of h~sh tables are the Bernoulli model in which the number of elements n inserted in m buckets is fixed (a = n/m) and the probability that an element has a given hash value is uniformly l/m, and the Poisson model in which the occupancy of each bucket is an independent Poisson random variable with parameter a [Vitter and Flajolet 1990]. We choose the Poisson model, because it is simpler to analyze directly, and because, with regard to the distributions of maximum 2If we ignore various sequential overheads such as cache miss, process switching, and paging. 3Not to be confused with the load factor of hash tables. 872 bucket occupancy in which we are interested, those under the Bernoulli model approach those under the Poisson l11.odel as m ~ 00 [Kolchin et al. 1978J. For a similar reason, we assume that G i (1 ~ i ~ p) are independent identically distributed (i.i.d.) random variables having a Poisson distribution with some parameter ,. It follows that the total number of new elements has and by the asa Poisson distribution with parameter sumption on the hash function, Bi/s are i.i.d. random variables having a Poisson distribution with parameter fJ = p, / m = ,/ q. We assume that load factors a and fJ of the old and new hash tables are constant (do not change with p, q). To summarize, Aij and Bij are i.i.d. random variables having a Poisson distribution with parameters a and fJ, and G i are i.i.d. random variables with a Poisson distribution with parameter qfJ. Note that Gi's and Bk'S are not independent because 2:i G i = 2:ij B ij . p" Thus, the total essential computation is: L l~i~p (ignoring the constant factor). As for the search step, some searches are successful (the new element is found in the old table) and others are unsuccessful. For simplicity of analysis, we choose a pessimistic estimate of the essential computation and assume that all searches are unsuccessful. We also assume that an unsuccessful search involves comparison of the new elements against all the old elements in the bucket. Thus, the number of comparisons made by an unsuccessful search in the bucket with Aij elements is Aj + 1 (the number of elements plus one for the hash table slot containing the pointer to the collision chain). Therefore, the essential computation of the search step is: Hisearch = L L (Aij + 1)Bij . l::;i::;p l::;j::;q (again ignoring the constant factor). \~Te make a similar assumption for the insert step: every insert is done after an unsuccessful search in the new table. Thus, the essential computation of the search step for bucket j on processor i is: L (l + 1) = Bij(Bij + 1)/2, 0:SI:SBij-l and the total essential computation for the search step IS Vliinsert = L L l~i:Sp l~j~q Bij(Bij + 1)/2. + WI' + W:"), where WI = Gi , WI' = 2:1:S;j:S;q(Aij + l)Bij , WI" = 2:1:S;j~q BiABij and + 1)/2. The maximum processor load is W(p) = l~i:S;p' max(W~ + W~' + W~") t , The average load balance factor 7](p) is E (W(1)) /pE (W(p)). We would like to know what rate of growth of q is necessary/sufficient so that 7](p) ~ 1 as p ~ 00. Since E ( Essential Computation and Load Balance Factor Since each data generation is assumed to take the same time, the essential computation of the data generation step is: lVgen = Gi. (W: l~i~p L (WI l~i~p 4.2 L W(l) = E ( + WI' + WI")) L WI) l~i~p +E (L l~i~p WI') +E L ( WI") l$i:S;p + E (W;') + E (W;")) , p(E (W;) and E (m~x(W: + WI' + WI")) l~t:S;p ~ E (m~x WI) + E (m~x WI') + E (m~x Wf"), l~t~p l~t~p l:S;t~p we have 7](p) + E (W") + E (Will) 1 1 . 
+ E (m~x WI') + E (m~x WI") E (W') ~ E (m~x WI) l~t~p 1 l~t~p l~t~p Thus, if E (max l$i~p WI) E(WD, E (max WI') E (Wn, and E(max W~") E (W;") l$i~p l$i~p t (as p ~ 00), then 7](p) ~ 1. The above are also necessary conditions, because all three summands are significant as p ~ 00. The random variable Gi , having a Poisson distribution with parameter qfJ, has the same distribution as the sum of q i.i.d. random variables Hij (1 ~ j ~ q) with a Poisson distribution with parameter fJ. Thus, we are led to the study of the average maximum of p sums of q i.i.d. random variables Wij (1 ~ i ~ p, 1 ~ j ~ q) with a distribution that does not change with p and q. In our distributed hash table example, we are interested in the cases in which each W ij is either a Poisson variable, the product of two Poisson variables, or a polynomial of a Poisson variable. 873 4.3 Average Maximum of Sum of i.i.d. Random Variables We give sketches of the proofs or cite the results. The details are presented in [Kimura and Ichiyoshi 1991]. 4.3.1 Poisson Variable The asymptotic distribution of the maximum bucket occupancy has been analyzed by Kolchin et al. [1978]. The following is the result as cited m [Vitter and Flajolet 1990]. Theorem. 1 (Kolchin et al.) If Xi (1 ~ i ~ p) are i. i. d. random variables having a Poisson distribution with parameter /-l) the expected maximum bucket occupancy is Ai J.1 where = E (max Xi) l::;i::;p b is rv 1 (b + I)! - p e-J.1/-lb b! Lemma 2 Let Xi (1 ~ i ~ p) be i.i.d. random variables distributed as X. For all ai 2 0 (1 ~ i ~ p) such that al + ... + a p = 1) alX I + ... + apXp -< X. SKETCH OF PROOF: Let all az 2 0 and al + az = l. For arbitrary c 20, max{alXI +azXz,c}+max{a1XZ + azXt,c} ~ max{X1,c}+max{Xz,c}. The expectation of the left hand side is equal to 2E (max{ alXI + azXz , c}), and that of the right hand side is E (max{XI' c} + max{Xz,c}) = E (max{XI' c}) + E (max{Xz,c}) = 2E(max{X,c}). Lemma 3 Let Xi (1 ~ i ~ r s) and i.i.d. random variables. We have = 8(1)) b rv logp/loglogp. The proof is based on the observation that, as p becomes large, P {MJ.1 > b} as a function of b approaches the step function having value 1 for b smaller than band o for b larger than b, and the expectation of MJ.1 is equal to its summation from b = 0 to b = 00. Vie extend Kolchin's theorem to the product of Poisson variables and polynomials of a Poisson variable. 4.3.2 For the convex sum of i.i.d. variables, we have the following lemma. Thus, alXI + azXz -< X. The case for p > 2 can be reduced to p - 1 using the above. 0 Finally, the following lemma gives an upper bound on the sum of the product of two sets of i.i.d. random variables. - - - - < - < ---. When /-l max{XI,XZ, .. . ,Xp} -< max{Yi,Xz, ... ,Xp} -< ... -< max{Yi, ... , Yp} 0 {b/-l i!if /-l/-l == w(1ogp)) o(logp)j an integer gl'eater than /-l such that e-J.1/-lb+1 SKETCH OF PROOF: Product of Two Poisson Variables We introduce a partial order on the class M of nonnegative random variables with a finite mean. XlYi -< Y iff Lemma 1 Let Xi (1 ~ i ~ p) and ii (1 ~ i ~ p) be i.i.d. random variables distributed as X and Y. If X -< Y) then r s) be + ... + XrsYrs -< (Xl + ... + Xr)(Yi + ... + Ys). + Theorem 2 Let Xi (1 ~ i ~ q) and ii (1 ~ i ~ q) be i. i. d. having a Poisson distribution with parameter a and ;3. If q = w(1ogZ p)) then (as p (~i't" lEo Xi; Yij) -+ 00). SKETCH OF PROOF: There are a number of natural properties concerning this partial order. 
For example, if X -< Y and Z is independent of X, Y, then X + Z -< Y + Z, xz -< YZ, max{X, Z} -< max{Y, Z}, etc. Note X -< Yi and X -< Yz do not imply 2X -< Yi +Y;. The utility of -< in analyzing the expected maximum is illustrated by the following lemma. ~ i ~ SKETCH OF PROOF: We can prove XlYi + ... XtYt+Z -< Xl (Yi + ... +Yt) Z (Z independent of XiS, iis) by conditioning Z and using Lemma 2. By repeatedly "collecting" the Xij iij's and replacing them with the bracketed 0 terms, we have the desired result. E Definition 2 For X, Y E M, we define X E (max{X, c}) ~ E (max{Y, c}) for all c 2 o. ii (1 E (max l::;i::;p (X·I}':;1 t t Let q = r 2 , + ... + X·tq }':;tq )) ~ E (max l::;i::;p (X·t I + ... ~ E + X·tr )(}':;l + ... + }':;tr )) t (m~x(Xil + ... )) E (m~x(YiI + ... )) l::;t::;p l::;t::;p by the Lemma 2 and 3. The sum of r i.i.d. Poisson variables with parameter a is distributed as a Poisson variable with parameter ra. Thus, if r = w(logp), then E (~t;~(Xil + ... + X ir )) rv ra = E (Xn + ... + X 1r ) by Kolchin's theorem. This is similar for the sum of iij. This is what we needed. 0 874 4.3.3 Polynomial of Poisson Variable The treatment of upper bounds on the expected maximum of the sums of a polynomial of i.i.d. random variables is more involved. We only list the result. Theorem 3 Let Xi (1 :::; i :::; q) be i. i. d. having a Poisson dist1'iblltion with parameter 0, and c(X) be a polynomial of degree d > 0 with non-negative coefficients. If q = w(logd p), then (as p --T (0), where c(X) = adX(d) + ... + aiX(l) + a~, and c*(X) = adXd + ... + aiX 1 + a~ (X(k) = X(X 1)··· (X - k + 1) is the falling power of X). As for corresponding lower bounds on the necessary growth rate of q, we only know at present that if q = o( (log p / log log p)2), the ratio between the expected maximum and the mean tends to 00 as p --T 00. 4.4 The Isoefficiency for Load Balance Now, let us suppose q = w(log2 p). Then, is immediate from Kolchin's theorem. Also, 1024, and 4096 and q = 1, 19p, Ig2 p, and Ig3 p (lg denotes the logarithm with base 2). The experimental load balance factors (on the vertical axis) are plotted against the number of processors (on the horizontal axis). The experimental load balance factor "'p,q for p, q are calculated by E (2:\$j$q WIi ) "'p,q = ~============~~ max1$i$p 2:1$j$q Wij , where Wij is one of Xij, Xii Yij and Xi~) «a), (b) and (c), respectively in the figure), and the average max1$i$p 2:1$j$q Wij is calculated from the result of 50 simulation runs. Xii and Yii are generated according to the Bernoulli model (i.e., a table X[l..pq] is prepared, and n = pqa. random numbers x's with x ~ 0 were generated, each x mod p) + I)-th table entry, etc.). The. going to the coefficient of variation (the ratio of standard deviation to average) of max1$i$p 2:1$j$q W ij is larger for X(2) and XY than for X, and it decreases as p becomes larger or q becomes larger. Table 1 gives the coefficients of variation for p = 64 and 4096. By and large, the results seem to confirm the asymptotic analysis. For the product and the second falling power, 0 (log2 p) appears to be a sufficient rate of growth of q for 'T/ to converge to 1. Even logarithmic growth (q = 19 p) does not lead to very poor load balance factors at least up to p = 4096 (approx. 0.5 for XY and approx. 0.4 for X(2). «x 5 by Kolchin's theorem and the proposition for the product of two Poisson variables. Finally, since X(X + 1)/2 is a polynomial of degree 2, if q = w(1og2 p). 
We have shown that if q = ω(log² p), the average collective load balance factor η(p) → 1 as p → ∞. Therefore, W = Θ(pq) = ω(p log² p) is a sufficient condition for isoefficiency for load balance.

4.5 Simulation

A simple simulation program was run to test the applicability of the asymptotic analysis for p up to 4096. Fig. 1 shows the results for α = β = 4 and p = 4, 16, 64, 256,

Figure 1: Experimental Load Balance Factors (α = β = 4) (three panels, (a) X, (b) XY, (c) X(X−1), plotting the experimental load balance factor against the number of processors for q = (lg p)², q = lg p, and q = 1)

Table 1: Coefficients of Variation of Maximum Load (p = 64)
           q = 1    q = 6    q = 36
  X        11.0%    6.3%     2.6%
  XY       17.8%    12.2%    5.0%
  X^(2)    24.8%    13.1%    6.0%

5 Communication Overheads

We briefly discuss the communication overheads when distributed hash tables are implemented on multicomputers. A multicomputer (also referred to as a distributed-memory computer and a message-passing parallel computer) consists of p identical processors connected by some interconnection network. On such computers, the time it takes to transfer a message of length L (in words) from a processor to another which is D hops away in the absence of network contention (communication latency in the absence of network contention is called zero-load latency) is ts + th·D + tw·L, where ts is the constant start-up time, th is the per-hop time, and tw is the per-word communication time. We choose the mesh architecture for consideration (two-dimensional square meshes in particular) since many of the recent "second generation" multicomputers have such topologies. Examples include the J-Machine, the Intel Paragon, and the parallel inference machines Multi-PSI and PIM/m. We note that the average traveling distance of a random message (a message from a randomly chosen processor i to another randomly chosen processor i', allowing i = i') is (2/3)(√p − 1/√p) on the meshes. It is roughly (2/3)√p, or 1/3 of the diameter of the network, which is 2(√p − 1). We can easily see that W = Ω(p^(3/2)) is a necessary and sufficient condition for super-isoefficiency due to zero-load latency, which is a situation worse than that due to load imbalance. In real networks, the impact of message collisions must be taken into account. Instead of estimating the time required for data dispatch using a precise model of contention, we compare the amount of traffic generated by random communication with the capacity of the network. The traffic of a message is defined by the product of its traveling distance and its length. It indicates how much network resource (measured by channel × network cycle time) the message consumes. The capacity of a network is defined by the sum of the bandwidths of all network channels (channels that connect routers). It indicates the peak throughput of the network.
The basic fact is that the time required for completely delivering a set of messages is at least M/C, where M is the total message traffic and C is the capacity of the network. The average traffic generated by a random message of length L is therefore about (2/3)√p · L.
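To make the cost model concrete, the sketch below evaluates the zero-load latency ts + th·D + tw·L and the M/C lower bound for uniformly random traffic on a √p × √p mesh. It is an illustration under assumed parameters: the channel-count formula and the numeric values of ts, th, tw and the channel bandwidth are assumptions, not measurements of Multi-PSI or PIM/m.

    import math

    def zero_load_latency(ts, th, tw, hops, length):
        # Latency of one message of `length` words sent `hops` hops away, no contention.
        return ts + th * hops + tw * length

    def mc_lower_bound(p, messages, length, channel_bw=1.0):
        # M/C: total traffic (word-hops) divided by network capacity (words per cycle),
        # assuming uniformly random sources and destinations on a 2-D square mesh.
        side = int(math.isqrt(p))
        avg_dist = (2.0 / 3.0) * (side - 1.0 / side)   # average random-message distance
        traffic = messages * length * avg_dist         # M
        channels = 4 * side * (side - 1)               # unidirectional channels, two per link
        capacity = channels * channel_bw               # C
        return traffic / capacity

    p = 4096
    avg_hops = (2 * (math.isqrt(p) - 1)) // 3          # roughly 1/3 of the mesh diameter
    print(zero_load_latency(ts=10.0, th=1.0, tw=0.5, hops=avg_hops, length=16))
    print(mc_lower_bound(p, messages=p * 100, length=16))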

, 1, ... ,n with respect to I and X, there exists a 'basic plus Rule E6' simplification chain I /\ , 11 /\ ~, ... "n+k /\~+k' where k ~ 0 is the number of "extra" variable elimination steps. Since, according to Proposition 3.3, basic simplification chains are finite, so are entailment simplification chains. D So far we know that we can compute for any basic constraint a normal form 'lj; with respect to I and X by applying the simplification rules as long as they are applicable. Although the normal form 'lj; may not be unique, we know that 1/\ and ,/\ 'lj; are equivalent in every model of FTo. Proposition 5.6 For every basic constraint one can compute a normal form 'lj; with respect to I and X. Every such normal form 'lj; satisfies: I I=r ?JX iff I I=r ?JX'lj;, and I I=FT ?JX iff I I=FT ?JX'lj;. Proof. Follows from Propositions 5.4, 5.5, 4.2 and 4.l. D In the following we will show that from the entailment normal form 'lj; of with respect to I it is easy to tell whether we have entailment, disentailment or neither. Moreover, the basic normal form of 1/\ is exactly 1/\ 'lj; in the first case (and in the second, where I /\ 1- = 1-), and "almost" in the third case (cf. Lemma 5.3). Proposition 5.7 A basic constraint i- 1- is normal with respect to I and X if and only if the following conditions are satisfied: 1. is solved, X -oriented, and contains no variable that is bound by ,; 2. if :v = y and xfu E I, then yfv 3. if:v = y andxfu E I 4. if :v = y and Ax 5. if :v = y rf. for every v; andyfv E I, thenu E I, then By rf. for = B. Lemma 5.8 If i- 1- is normal with respect to I and X, then 1/\ is satisfiable in every model of FT. #- 1- be normal with respect to I and X. Furthermore, let I = IN /\ IG and = N /\ G be the unique decompositions in normalizer and graph. Since the variables bound by IN occur neither in IG nor in , it suffices to show that IG /\ N /\ G is satisfiable in every model of FT. Let NbG) be the basic constraint that is obtained from IG by applying all bindings of N' Then IG /\ N /\ G is equivalent to N /\ N( IG) /\ G and no variable bound by N occurs in NbG) /\ G. Hence it suffices to show that N( IG) /\ G is satisfiable in every model of FT. With the conditions 2-5 of the preceding proposition Proof. Let Theorem 5.9 (Disentailment) Let 'lj; be a normal form of cP with respect to I and X. Then I I=r ...,?JX cP iff 'lj; = 1-. Proof. Suppose 'lj; = 1-. Then I I=r ...,::IX'lj; and hence I=r ...,?JX cP by Proposition 5.6. To show the other direction, suppose I I=r ...,?JX cP. Then I I=r ...,?JX'lj; by Proposition 5.6 and hence 1/\ 'lj; unsatisfiable in T by Proposition 4.2. Since T is a model of FT (Theorem 2.1), we know by the preceding lemma I that 'lj; = 1- (since 'lj; is assumed to be normal). D We say that a variable x is dependent in a solved constraint cP if cP contains a constraint of the form Ax, xfy or x ~ y. (Recall that equations are ordered; thus y is not dependent in the constraint x == y.) We use 1JV(cP) to denote the set of all variables that are dependent in a solved constraint cP. In the following we will assume that the underlying signature S l±J F has at least one sort and at least one feature that does not occur in the constraints under consideration. This assumption is certainly satisfied if the signature has infinitely many sorts and infinitely many features. Lemma 5.10 (Spiting) Let cPl, ... , cPn be basic constraints different from 1-, and Xl, . .. , Xn be finite sets of variables disjoint from Vb). Moreover, for every i = 1, ... 
,n, let cPi be normal with respect to I and Xi, and let cPi have a dependent variable that is not in Xi. Then 1/\ ...,?JX1cP1 /\ ... /\ ...,?JXncPn is satisfiable in every model of FT. = v; every B; and Ax E I and By E I, then A it is easy to see that cPN(!G) /\ cPG is a solved clause. Hence we know by axiom scheme Ax3 that cPN( ,G) /\ cPG is satisfiable in every model of FT. D Proof. Let I = IN /\ IG be the unique decomposition of I into normalizer and graph. Since the variables bound by IN occur neither in IG nor in any i, it suffices to show that IG /\ ...,?JX11 /\ ... /\ ...,?JXncPn is satisfiable in every model of FT. Thus it suffices to exhibit a solved clause 6 such that IG ~ 6 and, for every i = 1, ... ,n, V(6) is disjoint with Xi and 6/\ cPi is unsatisfiable in every model of FT. Without loss of generality we can assume that every Xi is disjoint with V ( I) and V ( cPj) - X j for all j. Hence we can pick in every cPi a dependent variable Xi such that :Vi ~ Xj for any j. Let Zl, .•. ,Z/e be all variables that occur on either side of equation Xi == Y E cPi, i = 1, ... , n (recall that Xi is fixed for i). None of these variables occurs in any Xj since every cPi is Xi-oriented. Next we fix a feature 9 and a sort B such that neither occurs in I or any cPi. Now 6 is obtained from I by adding constraints as follows: if A:Vi E cPi, then add BXi; if xdY E cPi, then 1020 add ;£d i; to enforce that the variables pairwise distinct, add Zl, ... , Zk are It is straightforward to verify that these additions to '1 yield a solved clause {j as required. 0 Proposition 5.11 If cP is solved and 'DV(cP) FT 1= V3XcP. ~ X) then Proof. Let cP = cPN 1\ cPG be the decomposition of cP in normalizer and graph. Since every variable bound by cP is in X, it suffices to show that V3X cPG is a consequence of FT. This follows immediately from axiom scheme Ax3 since ¢G is a solved clause. 0 Theorem 5.12 (Entailment) Let'lj; be a normal form of cP with respect to '1 and X. Then '1 1=7 3X cP iff 'lj; i- -.L and 'DV ('lj;) ~ X. Proof. Suppose '1 1=7 3X cPo Then we know '1 1=7 3X'lj; by Proposition 5.6, and thus '11\ ,3X'lj; is unsatisfiable in T. Since '1 is solved, we know that '1 is satisfiable in T and hence that '11\ 3X'lj; is satisfiable in T. Thus 'lj; i- -.L. Since '1 1\ ,3X'lj; is unsatisfiable in T and T is a model of FT, we know by Lemma 5.10 that 'DV('lj;) ~ X. To show the other direction, suppose 'lj; i- -.L and 'DV('lj;) ~ X. Then FT 1= V3X'lj; by Proposition 5.11, and hence T 1= V3X'lj;. Thus '1 1=7 3X'lj;, and hence '1 1=7 3X ¢ by Proposition 5.6. 0 Theorem 5.13 Let ¢ be a basic constraint. Then '1 3X¢ iff'1 I=FT 3X¢. 1=7 Proof. One direction holds since T is a model of FT. To show the other direction, suppose '1 1=7 3X cPo Without loss of generality we can assum~ that cP is normal with respect to '1 and X. Hence we know by Theorem 5.12 that cP i- -.L and 'DV( 'lj;) ~ X. Thus FT 1= V3X cP by Proposition 5.11 and hence '1 I=FT 3X¢. 0 Theorem 5.14 (Independence) Let cPl, ... , cPn be basic constraints) and Xl, ... ,Xn be finite sets of variables. Then Proof. To show the nontrivial direction, suppose '1 1=7 3X1cP1 V ... V 3XncPn. Without loss of generality we can assume that, for all i = 1, ... , n, Xi is disjoint from Vb), cPi is normal with respect to '1 and Xl, and cPi i- -.L. Since '11\,3X1 cP11\ . . . 1\,3Xn cPn is un satisfiable in T and T is a model of FT, we know by Lemma 5.10 that 'DV( cPk) ~ X k for some k. Hence '1 1=7 3XkcPk by Theorem 5.12. 
0 6 Conclusion We have presented a constraint system FT for logic programming providing a universal data structure based on rational feature trees. FT accommodates recordlike descriptions, which we think are superior to the constructor-based descriptions of Herbrand. The declarative semantics of FT is specified both algebraicly (the feature tree structure T) and logically (the first-order theory FT given by three axiom schemes). The operational semantics for FT is given by an incremental constraint simplification system, which can check satisfiability of and entailment between constraints. Since FT satisfies the independence property, the simplification system can also check satisfiability of conjunctions of positive and negative constraints. We see four directions for further research. First, FT should be strengthened such that it subsumes the expressivity of rational constructor trees [7, 8]. As is, FT cannot express that ;£ is a tree having direct subtrees at exactly the features 11, ... ,In. It turns out that the system CFT [24] obtained from FT by adding the primitive constraint (;£ has direct subtrees at exactly the features f1, ... ,fn) has the same nice properties as FT. In contrast to FT, CFT can express constructor constraints; for instance, the constructor constraint ;£ == A( y, z) can be expressed equivalently as A;£ 1\ ;£{1, 2} 1\ ;£ly 1\ ;£2z, if we assume that A is a sort and the numbers 1,2 are features. Second, it seems attractive to extend FT such that it can accommodate a sort lattice as used in [1, 3, 4, 5, 23]. One possibility to do this is to assume a partial order :S on sorts and replace sort constraints A;£ with quasi-sort constraints [A]x whose declarative semantics is given as [A]x == V B;£. B::;A Given the assumption that the sort ordering :S has greatest lower bounds if lower bounds exist, it seems that the results and the simplification system given for FT carry over with minor changes. Third, the worst-case complexity of entailment checking in FT should be established. We conjecture it to be quasi-linear in the size of '1 and cP, provided the available features are fixed a priory. Fourth, implementation techniques for FT at the level of the Warren abstract machine [2] need to be developed. References [1] H. Ai't-Kaci. An algebraic semantics approach to the effective resolution of type equations. Theoretical Computer Science, 45:293-351, 1986. 1021 [2] H. Alt-Kaci. Warren's Abstract Machine: A Tutorial Reconstruction. The MIT Press, Cambridge, MA,1991. [3] H. Alt-Kaci and R. Nasr. LOGIN: A logic programming language with built-in inheritance. The lournal of Logic Programming, 3:185-215, 1986. [4] H. Alt-Kaci and R. Nasr. Integrating logic and functional programming. Lisp and Symbolic Computation, 2:51-89, 1989. [5] H. Alt-Kaci and A. Podelski. Towards a Meaning of LIFE. Proceedings of the 3rd International Symposium on Programming Language Implementation and Logic Programming (Passau, Germany), J. Maluszynski and M. Wirsing, editors. LNCS 528, pages 255-274, Springer-Verlag, 1991. [6] R. Backofen and G. Smolka. A complete and decidable feature theory. Draft, German Research Center for Artificial Intelligence (DFKI), Stuhlsatzenhausweg 3, 6600 Saarbrucken 11, Germany, 1991. To appear. [7] A. Colmerauer. Equations and inequations on finite and infinite trees. In Proceedings of the 2nd International Conference on Fifth Generation Computer Systems, pages 85-99, 1984. [8] A. Colmerauer, H. Kanoui, and M. V. Caneghem. 
Prolog, theoretical principles and current trends. Technology and Science of Informatics, 2(4):255292, 1983. [9] S. Haridi and S. Janson. Kernel Andorra Prolog and its computation model. In D. Warten and P. Szeredi, editors, Logic Programming, Proceedings of the 7th International Conference, pages 31-48, Cambridge, MA, June 1990. The MIT Press. [10] J. J affar and J.-L. Lassez. Constraint logic programming. In Proceedings of the 14th ACM Symposium on Principles of Programming Languages, pages 111-119, Munich, Germany, Jan. 1987. [11] M. Johnson. Attribute- Value Logic and the Theory of Grammar. CSLI Lecture Notes 16. Center for the Study of Language and Information, Stanford University, CA, 1988. [12] R. M. Kaplan and J. Bresnan. Lexical-Functional Grammar: A formal system for grammatical representation. In J. Bresnan, editor, The Mental Representation of Grammatical Relations, pages 173-381. The MIT Press, Cambridge, MA, 1982. [13] M. Kay. Functional grammar. In Proceedings of the Fifth Annual Meeting of the Berkeley Linguistics Society, Berkeley, CA, 1979. Berkeley Linguistics Society. [14] J.-L. Lassez, M. Maher, and K. Marriot. Unification revisited. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming. Morgan Kaufmann, Los Altos, CA, 1988. [15] J. L. Lassez and K. McAloon. A constraint sequent calculus. In Fifth Annual IEEE Symposium on Logic in Computer Science, pages 52-61, June 1990. [16] M. J. Maher. Logic semantics for a class of committed-choice programs. In J.-L. Lassez, editor, Logic Programming, Proceedings of the Fourth International Conference, pages 858-876, Cambridge, MA, 1987. The MIT Press. [17] K. Mukai. Partially specified terms in logic programming for linguistic analysis. In Proceedings of the 6th International Conference on Fifth Generation Computer Systems, 1988. [18] K. Mukai. Constraint Logic Programming and the Unincation of Information. PhD thesis, Tokyo Institute of Technology, Tokyo, Japan, 1991. [19] M. Nivat. Elements of a theory of tree codes. In M. Nivat, A. Podelski, editors, Tree Automata (Advances and Open Problems), Amsterdam, NE, 1992. Elsevier Science Publishers. [20] W. C. Rounds and R. T. Kasper. A complete logical calculus for record structures representing linguistic information. In Proceedings of the 1st IEEE Symposium on Logic in Computer Science, pages 38-43, Boston, MA, 1986. [21] V. Saraswat and M. Rinard. Concurrent constraint programming. In Proceedings of the 7th Annual A CM Symposium on Principles of Programming Languages, pages 232-245, San Francisco, CA, January 1990. [22] G. Smolka. Feature constraint logics for unification grammars. The lournal of Logic Programming, 12:51-87, 1992. [23] G. Smolka and H. Alt-Kaci. Inheritance hierarchies: Semantics and unification. lournal of Symbolic Computation, 7:343-370, 1989. [24] G. Smolka and R. Treinen. Relative simplification for and independence of CFT. Draft, German Research Center for Artificial Intelligence (DFKI), Stuhlsatzenhausweg 3, 6600 Saarbrucken 11, Germany, 1992. To appear. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 1022 Range Determination of Design Parameters by Qualitative Reasoning and its Application to Electronic Circuits Masaru Ohki, Eiji Oohira, Hiroshi Shinjo, and Masahiro Abe Central Research Laboratory, Hitachi, Ltd. 
Higashi-Koigakubo, Kokubunji, Tokyo 185, Japan ohki@crl.hitachi.co.jp Abstract There are numerous applications of qualitative reasoning to diverse fields of engineering. The main application has been to diagnosis, but there are a few applications to design. We show a new application to design, suggesting valid ranges for design parameters; this application follows the step of structure determination. The application does not provide more innovative design, but it is one of the important steps of design. To implement it, we use an envisioning mechanism, which determines all possible behaviors of a system through qualitative reasoning. Our method: (1) performs envisioning with design parameters whose values are initially undefined, (2) selects preferable behaviors from all possible behaviors found by the envisioning process, and (3) calculates the ranges of those design parameters that give the preferable behaviors. We built a design-support system Desq (D..e.sign ~upport system based on u,ualitative reasoning) by improving an earlier qualitative reasoning system Qupras (Q.u.alitative Rhysical reasoning ~stem). We added three new features: envisioning, calculating the undefined parameters, and propagating new constraints on constant parameters. The Desq system can deal with quantities qualitatively and quantitatively, like Qupras. Accordingly, we may someday be able to determine the quantitative ranges, if the parameters can be expressed quantitatively. Quantitative ranges are preferable to qualitative values, to support the determination of design parameters. 1 Introduction Recently, many expert systems have been used in the diverse fields of engineering. However, several problems still exist. One is the difficulty of building knowledge bases from the experience of human experts. The other is that these expert systems cannot deal with unimaginable situations [Mizoguchi 87]. Reasoning methods using deep knowledge, which is the fundamental knowledge of a domain, are expected to solve these problems. One reasoning method is qualitative reasoning [Bobrow 84]. Qualitative reasoning determines dynamic behaviors, which are the states of a dynamic system and its state changes, using deep knowledge of the dynamic system. Another feature of qualitative reasoning is that it can deal with quantities qualitatively. So far, there have been many applications of qualitative reasoning to engineering [Nishida 88a, Nishida 88b, Nishida 91]. The main application has been to diagnosis [Yamaguchi 87, Ohwada 88], but recently there have also been applications to design [Murthy 87, Williams 90]. In this paper, we show a new application to design that supports decisions by suggesting valid ranges for design parameters; it follows the step of structure determination. This application is not considered to be more innovative than the previous applications to design [Murthy 87, Williams 90], but it is one of the important steps of design [Chandrasekaran 90]. The key to design support is applying an envisioning mechanism, which predicts the behaviors of the dynamic system, to those design parameters whose values are undefined. If the envisioning is performed on condition that the design parameters whose values a designer wants to determine are undefined, all possible behaviors under the undefined design parameters can be predicted by the envisioning process. Some hypotheses are made to obtain each behavior. 
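Why such hypotheses become necessary can be seen with a small three-valued test over interval-valued quantities. The sketch below is only an illustration (the Interval class and the function less_than are hypothetical names, not Desq's internal representation): when a design parameter is known only as a range, a threshold condition may evaluate to true, false, or unknown, and only the unknown case forces the reasoner to hypothesize.

    from dataclasses import dataclass

    @dataclass
    class Interval:
        lo: float
        hi: float

    def less_than(x: Interval, c: float) -> str:
        # Three-valued evaluation of the condition "x < c" over an interval value.
        if x.hi < c:
            return "true"
        if x.lo >= c:
            return "false"
        return "unknown"            # the envisioning step must hypothesize both cases

    v_d1 = Interval(0.0, 10.0)      # e.g., a diode voltage when a design parameter is undefined
    print(less_than(v_d1, 0.7))     # -> "unknown", so two hypotheses are explored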
The main reason why hypotheses are made is that conditions written in the definitions of objects and· physical rules cannot be evaluated because the design parameters are undefined. Among the obtained possible behaviors, more than one behavior desired by the designer is expected to exist. The designer can select the behaviors which he/she exactly prefers. Although the designer may not know the values of the design parameters, helshe knows the desired. behavior. The values of the undefined parameters can be calculated from the hypotheses made to obtain the desired behavior. 1023 To sum up, the method of determining valid ranges for design parameters offers the following: (1) Performs envisioning for design parameters whose values are initially undefined, (2) Selects preferable behaviors from possible behaviors found by the envisioning process, and (3) Calculates the ranges of those design parameters that give the preferable behaviors. We used a qualitative reasoning system Qupras (Qualitative llhysical reasoning .system) [Ohki 86, Ohki 88, Ohki 92] to construct a decision support system Desq (~sign §.uppoit system based on ,Qualitative reasoning) that suggests valid ranges for design parameters. Qupras, using knowledge about physical rules and objects after being given an initial state, determines the followings: (1) Relations between objects that are components of physical systems. (2) The subsequent states of the system following a transition. We extended Qupras to construct Desq as follows: (1) Envisioning In Qupras, if a condition of a physical rule or an object cannot be evaluated, Qupras asks the user to specify the condition. We extended Qupras to allow it to continue assuming an unevaluated condition. (2) Calculating the undefined parameters After envisioning all possible behaviors, Desq calculates the ranges of the undefined design parameters that give the behavior specified by the designer. (3) Propagation of new constraints on constants In the envisioning process, constraints related to some constant parameters become stronger because conditions in the definitions of physical rules and objects are hypothesized. The constraints propagate to the subsequent states. (4) Parallel constraint solving Qupras uses a combined constraint solver consisting of three basic constraint solvers: a Supinf method constraint solver, an Interval method constraint solver, and a Groebner base method constraint solver, all written in ESP. The processing load of the combined constraint solver was heavy, so we converted it to KL1 to speed up processing. Desq can deal with quantities qualitatively and quantitatively like Qupras. Accordingly, we may someday be able to get quantitative ranges, if the parameters can be given as quantitative values. Quantitative ranges may be preferable for decision support. The usual qualitative reasoning like [Kuipers 84] gives qualitative ranges. Section 2 shows how Desq suggests ranges for design parameters, Section 3 describes the system organization of Desq, Section 4 shows an example of Desq suggesting the value of a resistor in a DTL circuit, Section 5 describes related works and Section 6 summarizes the paper. 2 Method of determining design parameters In design, there are many cases in which a designer does not directly design a new device, but changes or improves an old device. Sometimes designers only change parameters of components in a device to satisfy the requirements. 
The designer, in such cases, knows the structure of the device, and needs only to determine the new values of the components. This is common for electronic circuits. We apply qualitative reasoning to the design decisions. The key process used to determine design parameters is envisioning. Our method is as described in Section 1: (1) All possible behaviors of a device are found by envisioning, with design parameters whose values are initially undefined. (2) Designers select preferable behaviors from these possible behaviors. (3) The ranges of the design parameters that give the preferable behaviors are calculated using a parallel constraint solver.

If a condition in the definitions of a physical rule or an object cannot be evaluated, Desq hypothesizes one case where the condition is valid and another where it is not valid, and separately searches each case to find all possible behaviors. This method is called envisioning, and is the same as [Kuipers 84]. If a contradiction is detected, the reasoning is abandoned. If no contradiction is detected, the reasoning is valid. Finally, Desq finds several possible behaviors of a device.

The characteristics of this approach are as follows: (1) Only deep knowledge is used to determine design parameters. (2) All possible behaviors with regard to undefined design parameters are found. Such information may be used in safety design or danger estimation. (3) Ranges of design parameters giving preferable behaviors are found. If a designer uses numerical CAD systems, for example, SPICE, he/she need not simulate values outside the ranges.

Figure 1 An example of deciding an undefined parameter (the DTL circuit driven by 5 V; the unevaluated condition "voltage of D1 < 0.7 V" gives Hypothesis 1, voltage of D1 >= 0.7 V, which ends in a conflict, and Hypothesis 2, voltage of D1 < 0.7 V; the latter leads to the unevaluated condition "base voltage of Tr >= 0.7 V", whose two hypotheses, Tr on and Tr off, yield the ranges "undefined parameter Rb >= 473 ohms" and "473 ohms > Rb >= 0 ohms", respectively)

Figure 1 shows an example of suggesting ranges for a design parameter. This example illustrates the determination of a resistance value in a DTL circuit. The designer inputs the DTL structure and the parameters of the components except for the resistance Rb. Desq checks the conditions in the definitions of physical rules and objects. If they are satisfied, the equations in their consequences are sent to the parallel constraint solver. But, it is not known what state the diode D1 is in, because the resistance Rb is undefined. The first condition is whether the voltage of D1 is lower than 0.7 volts. Desq hypothesizes two cases; in the first the condition is not satisfied, and in the second it is. The first hypothesis is abandoned because the parallel constraint solver detects a conflict with the other equations. In the second hypothesis, no conflict is detected. After some more hypotheses are made, another state is detected where it is not known whether or not the condition giving the state of the transistor Tr is satisfied. Desq similarly hypothesizes this condition. Finally, Desq finds two possible behaviors for the initial data. Then, Desq calculates the resistance Rb. The resistance must be larger than 473 ohms to give the desired behavior, where the circuit acts as a NOT circuit because the transistor is "on". If the resistance is smaller than 473 ohms, the circuit shows another behavior which is not preferable. Thus, the resistance Rb must be larger than 473 ohms. This proves that Desq can deal with quantities qualitatively and quantitatively.

3 System organization

This section describes the system organization of Desq. Figure 2 shows that Desq mainly consists of three subsystems:

(1) Behavior reasoner  This subsystem is based on Qupras. It determines all possible behaviors.

(2) Design parameter calculator  This subsystem calculates ranges of design parameters.

(3) Parallel constraint solver  This subsystem is written in KL1, and is executed on PIM, Multi-PSI, or Pseudo Multi-PSI.

Figure 2 System organization (the initial data is given to the behavior reasoner and design parameter calculator on PSI, which exchange queries and simultaneous inequalities with the parallel constraint solver on Pseudo Multi-PSI; the output is the ranges of design parameters for preferable and unpreferable behaviors)

When the designer specifies initial data, the behavior reasoner builds its model corresponding to the initial state, by evaluating conditions of physical rules and objects. The physical rules and objects are stored in the knowledge base. The model in Desq uses simultaneous inequalities in the same way as that in Qupras. Simultaneous inequalities are passed to the parallel constraint solver to check the consistency and store them. If an inconsistency is detected, the reasoning process is abandoned. Conditions in the definitions of physical rules and objects are checked by the parallel constraint solver. If the conditions are satisfied, the inequalities in the consequences of the physical rules and objects are added to the model in the parallel constraint solver. If a condition cannot be evaluated by the parallel constraint solver, envisioning is performed. Finally, when all possible behaviors are found, the design parameter calculator deduces the ranges of design parameters that give preferable behaviors.

3.1 Behavior reasoner

3.1.1 Qupras Outline

Qupras is a qualitative reasoning system that uses knowledge from physics and engineering textbooks. Qupras has the following characteristics: (1) Qupras has three primitive representations: physical rules (laws of physics), objects and events. (2) Qupras determines the dynamic behaviors of a system by building all equations for the system using knowledge of physical rules, objects and events. The user need not enter all the equations of the system. (3) Qupras deals with equations that describe basic laws of physics qualitatively and quantitatively. (4) Qupras does not require quantity spaces to be given in advance. It finds the quantity spaces for itself during reasoning. (5) Objects in Qupras can inherit definitions from their super objects. Thus, physical rules can be defined generally by specifying the definitions of object classes with super objects.

Qupras is similar to QPT [Forbus 84], but does not use influence. The representations describing relations of values in Qupras are only equations. Qupras aims to represent laws of physics given in physics textbooks and engineering textbooks. Laws of physics are generally described in the textbooks not by using influences, but by using equations. Therefore, Qupras uses only equations. The representation of objects mainly consists of existential conditions and relations. Existential conditions correspond to conditions needed for the objects to exist. Objects satisfying these conditions are called active objects.
The relations are expressed as relative equations which include physical variables (hereafter physical quantities are referred to as physical variables). If existential conditions are satisfied, their relations become known as relative equations that hold for physical variables of the objects specified in the physical rule definition. The representation of physical rules mainly consists of objects, applied conditions and relations. The objects are those necessary to apply a physical rule. The representations of applied conditions and relations are similar to those of objects. Applied conditions are those required to activate a physical rule, and relations correspond to the laws of physics. Physical rules whose necessary objects are activated and whose conditions are satisfied are called active physical rules. If a given physical rule is active, its relations become known as in the case of objects. Qualitative reasoning in Qupras involves two forms of reasoning: propagation reasoning and prediction reasoning. Propagation reasoning determines the state of the physical system at a given moment (or during a given time interval). Prediction reasoning determines the physical variables that change with time, and predicts their values at the next given point in time. The propagation reasoning also determines the subsequent states of the physical system using the results from the prediction reasoning. 3.1.2 Behavior Reasoner The behavior reasoner is not much different from that of Qupras. The two features below are additions to that of Qupras. (1) Envisioning In Qupras, if conditions of physical rules and objects cannot be evaluated, Qupras asks the user to specify the conditions. It is possible for Desq to continue to reason in such situations by assuming unevaluated condi tions. (2) Propagation of new constraints on constants There are two types of parameters (quantities): constant and variable. In envisioning, the constraints related to some constant parameters become stronger by hypothesizing some conditions in the definitions of physical rules and objects. The constraints propagate to the subsequent states. Before the reasoning, all initial relations of the objects defined in the initial state are set as known relations, which are used to evaluate the conditions of objects and physical rules. Initial relations are mainly used to set the initial values of the physical variables. If there is no explicit change to an initial relation, the initial relation is held. An example of an explicit change is the prediction of the next value in the prediction reasoning. 1026 ...... Add constraints . . . _____ . Communication among constraint solvers Linear part Figure 3 Combined constraint solver Propagation reasoning finds active objects and physical rules whose conditions are satisfied by the known relations. If a contradiction is detected, the propagation reasoning is stopped. If no condition of physical rules and objects can be evaluated, the reasoning process is split by the envisioning mechanism into two process: one process hypothesizing that the condition is satisfied and other hypothesizing that it is not. Prediction reasoning first finds the physical variables changing with time from the known relations that result from the propagation reasoning. ,Then, it searches for the new values or the new intervals of the changing variables at the next specified time or during the next time interval. Desq updates the variables according to the sought values or intervals in the same way as Qupras. 
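The split-and-search control described above can be sketched as a small recursive procedure. This is not Desq's implementation (Desq delegates consistency checking to the KL1 parallel constraint solver); the helpers evaluate, consistent, add_true and add_false are hypothetical stand-ins for the solver interface, and each unknown condition splits the search into two branches whose surviving leaves correspond to the possible behaviors.

    def envision(store, conditions, evaluate, consistent, add_true, add_false):
        # store: current constraint set; conditions: unevaluated conditions of rules/objects.
        # Returns every consistent leaf store, i.e. one store per possible behavior.
        if not consistent(store):
            return []                              # contradiction: abandon this branch
        if not conditions:
            return [store]                         # a completed behavior
        cond, rest = conditions[0], conditions[1:]
        verdict = evaluate(store, cond)
        if verdict == "true":
            return envision(add_true(store, cond), rest, evaluate, consistent, add_true, add_false)
        if verdict == "false":
            return envision(add_false(store, cond), rest, evaluate, consistent, add_true, add_false)
        # unknown: hypothesize both cases and search them separately
        yes = envision(add_true(store, cond), rest, evaluate, consistent, add_true, add_false)
        no = envision(add_false(store, cond), rest, evaluate, consistent, add_true, add_false)
        return yes + no

    # Toy driver with trivially permissive helpers, just to show the branching.
    behaviors = envision(
        store=[("Rb", 0.0, float("inf"))],
        conditions=["v(D1) < 0.7", "v(base(Tr)) >= 0.7"],
        evaluate=lambda s, c: "unknown",
        consistent=lambda s: True,
        add_true=lambda s, c: s + [("assume", c, True)],
        add_false=lambda s, c: s + [("assume", c, False)])
    print(len(behaviors))                          # 4 hypothetical branches in this toy setting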
The updated values are used as the initial relations at the beginning of the next propagation reasoning. 3.2 Design parameter calculator The method of calculating the design parameters is simple. After finding all possible behaviors, the designer specifies which design parameters to calculate. Then, the upper and lower values of the specified parameters are calculated by the parallel constraint solver. 3.3 Parallel constraint solver The parallel constraint solver tests whether the conditions written in the definitions of physical rules and objects are proven by the known relations obtained from active objects and active physical rules, and from initial relations. We want to solve nonlinear simultaneous inequalities to test the conditions in the definitions of objects, physical rules and events. More than one algorithm is used to build the combined constraint solver, because we ·do not know of any single efficient algorithm for nonlinear simultaneous inequalities. We connected the three solvers as shown in Figure 3. The combined constraint solver consists of the following three parts: (1) Nonlinear inequality solver based on the interval method [Simmons 86], (2) Linear inequality solver based on the Simplex method [Konno 87], and (3) Nonlinear simultaneous equation solver based on the Groebner base method [Aiba 88]. If anyone of the three constraint solvers finds new resul ts, the resul ts are passed on to the other constraint solvers by the control parts. This combined constraint solver can solve broader equations than each individual solver can. However, its results are not always valid, because it cannot solve all nonlinear sim ul taneous inequalities. The reason why we can get quantitative ranges is that the combined constraint solver can process quantities quantitatively as well as qualitatively. 4 Example 4.1 Description of Model We show another example of the operator. We use a DTL circuit identical to that the same as in Figure 1. In this example, however, the input voltage and the resistance Rb are undefined. initial_state dtI objects Rl-resistor; Rg-resistor ; Rb-resistor ; Tr-transistor; Dl-diode; D2-diode2 ; initiaCrelations connect(ti !RI,tl !Rg) ; connect(t2!Rg,tI !DI,tI !D2) ; connect(t2!D3,tl!Rb,tb!Tr) ; connect(t2!RI,tc!Tr) ; resistance@RI=6000.0 ; resistance@Rg=2000.0 ; resistance@Rb>= 0.0; v@tI !R} = 5.0 ; v@t2!Dl >= 0.0; v@t2!Dl =< 10.0 ; v@te!Tr = 0.0 ; v@t2!Rb = 0.0 ; end. Figure 4 Initial state for DTL circuit 1027 The initial data is shown in Figure 4. The "objects" field specifies components and their classes in the DTL circuit. The "initial_relations" field specifies the relations holding in the initial state. For example, "connect(t2!Rg, t1!Dl, t1!D2)" specifies that the terminal t2 of the resistor Rg, the terminal t1 of the diode Dl, and the terminal t1 of the diode D2 are connected. The "!" is a symbol specifying a part. The "t2!Rg" expresses the terminal t2 which is one part of Rg. Rb is specified as a resistor in the "objects" definition. The "@" indicates a parameter. The "resistance@Rl" represents the resistance value of Rl. The "resistance@RI = 6000.0" specifies that RI is 6000.0 ohms. The resistance Rb is constrained to be positive, and the input voltage, which is the voltage of the terminal t2 in the diode Dl, is constrained to be between 0.0 and 10.0 volts. Both values are undefined, and Rb is a design parameter. Figure 5 shows the definition of a diode. 
Its super object is a two_terminal_device, so the diode inherits the properties of the two_terminal_device, i.e., it has two parts, both of which are terminals. Each terminal has two attributes "v" for voltage and "i" for current. The diode has an initial relation, which specifies the voltage difference between its terminals. The diode also has two states: one is the "on" state where the voltage difference is greater than 0.7, and the other is the "off" state where the voltage difference is less than 0.7. If the diode is in the "on" state, it behaves like a conductor. In the "off" state, it behaves like a resistor. A transistor is defined like a diode, but it has three states, "off", "on" and "saturated" (In the example of Figure 1, we used a transistor model with two states, "off" and "on").

object terminal:Terminal
  attributes v ; i ;
end.
object two_terminal_device:TTD
  parts_of t1-terminal ; t2-terminal ;
end.
object diode:Di
  supers two_terminal_device ;
  attributes v ; i ; resistance-constant ;
  initial_relations v@Di = v@t1!Di - v@t2!Di ;
  state on
    conditions v@Di >= 0.7 ;
    relations v@Di = 0.7 ; i@Di >= 0.0 ;
  state off
    conditions v@Di < 0.7 ;
    relations resistance@Di = 100000.0 ; v@Di = resistance@Di * i@Di ;
end.

Figure 5 Definition of diode

Figure 6 shows the definition of a physical rule. The rule shows Kirchhoff's law when the terminals t1 of three two_terminal_devices are connected. It is assumed that the current into t1 of a two_terminal_device flows to the terminal t2. In fact, three two_terminal_devices can be connected in eight ways depending on how the terminals are connected.

physics three_connect_1
  objects TTD1 - two_terminal_device ; TTD2 - two_terminal_device ; TTD3 - two_terminal_device ;
    T1-terminal partname t1 part_of TTD1 ;
    T2-terminal partname t1 part_of TTD2 ;
    T3-terminal partname t1 part_of TTD3 ;
  conditions connect(T1,T2,T3) ;
  relations v@T1 = v@T2 ; v@T2 = v@T3 ; i@T1 + i@T2 + i@T3 = 0 ;
end.

Figure 6 Definition of physics

Table 1 All behaviors of DTL circuit
  State            Range of input   Range of resistance value   Range of output
  1 ON-ON-SAT      1.40081-1.5381   486.16-infinity             0.2
  2 ON-ON-ON       1.4-1.40081      482.75-infinity             0.2-5.0
  3 ON-ON-OFF      0.7-1.4          0-233,567                   4.94
  4 ON-OFF-ON      0-1.4007         100,000-infinity            0.842-5.0
  5 ON-OFF-OFF     0-1.4            0-233,567                   4.94
  6 OFF-ON-SAT     1.40081-10.0     460.9-infinity              0.2
  7 OFF-ON-ON      1.4-10.0         457.8-488.53                0.2-5.0
  8 OFF-ON-OFF     0.7-10.0         0-484.1                     4.94
  9 OFF-OFF-*      conflict

4.2 Results

Table 1 shows all behaviors of the DTL circuit obtained by envisioning. The state column indicates the states of the diode, the diode2 and the transistor. The following columns show the range of the input voltage (volts), the range of the resistance Rb (ohms), and the range of the output voltage (volts). As is shown, the envisioning found nine states. Because the input voltage and the resistance Rb were undefined, the conditions of the two diodes and the transistor could not be
It is desired for Rb to be greater than about 0.5 k ohms, and less than about 100 k ohms, so that the DTL 'circuit can output a low voltage (nearly 0 volts) when the input is greater than 1.5 volts, or can output a high voltage (nearly 5 volts) when the input is less than about 1.5 volts. The range is shown by the area enclosed by the dotted lines in Figure 7. 200k 5. Related Works Desq does not suggest structures of devices like the methods of [Murthy 87] and [Williams 90]. Rather, it suggests the ranges of design parameters for preferable behaviors. The suggestion is also useful, because determining values Figure 7 of design parameters is one of the important steps of design [Chandrasekaran 90]. This approach may be regarded as one application of constraint satisfaction problem solving. There are several papers that deal with electronic circuits as examples, using constraint satisfaction problem solving [Sussman 80, Heintze 86, Mozetic 91]. Sussman and Steele's system cannot suggest ranges for design parameters, because their system uses only equations. Heintze, Michaylov and Stuckey's work using CLP(R) to design electronic circuits is the most similar to Desq, but Desq is different from Heintze's work for the following points: (1) Knowledge on objects and laws of physics is more declarative for Desq. (2) Desq can design ranges of design parameters (of devices) that change with time. (3) Desq can deal with nonlinear inequalities, and Desq can solve nonlinear inequalities in some cases. In Mozetic and Holzbaur's work, numerical and qualitative models are used. In their view, our approach uses numerical models rather than qualitative models. But, if a constraint solver is used to solve inequalities, it is possible to use both numerical and qualitative calculations. 1.0 2.0 10.0 Input voltage (volts) Relationship between Resistance and Input voltage 6. Conclusion We have described a method of suggesting ranges for design parameters using qualitative reasoning, and implemented the method in Desq. The ranges obtained are quantitative, because our system deals with quantities quantitatively as well as qualitatively. In an example utilizing the DTL circuit, Desq suggested that the range of a resistance (Rb in Figure 1) should be greater than about 0.5 k ohms and less than about 100 k ohms to work the DTL circuit as a NOT circuit. If the designer wishes for a more detailed design, for example, to minimize the response time by performing numerical calculation, helshe need not calculate outside the range, and thus can save on the calculation cost, which is greater for direct numerical calculation (outside range). However, there are some possibilities that Desq cannot suggest valid ranges or the best ranges for design parameters. This is because of the following: (1) The ability to solve nonlinear inequalities in Consort is short Desq may suggest invalid or weak ranges because Consort cannot perfectly solve nonlinear inequalities. But, almost all results can be obtained by performing more 1029 detailed analysis using numerical analysis systems, for example, SPICE. (2) Inexact definitions are used It may be difficult to describe the definitions of physical rules and objects. This is because from inexact definitions, inexact results may be obtained. (3) The ability to analyze circuits is short The current Desq cannot analyze positive feedback. If there are any posi tive feedbacks in a circuit, Desq may return wrong results. The example in this paper does not change with time. 
We are currently working on how to determine ranges of design parameters (of circuits) that change with time, for example, a Schmidt trigger circuit. In such a case, we need to propagate new constraints on constant parameters. Moreover, we are investigating the load balancing of the parallel constraint solver to speed it up. 7. Acknowledgements This research was supported by the ICOT (institute of New Generation .Q..Q.mputer Technology). We wish to express our thanks to Dr. Fuchi, Director of the ICOT Research Center, who provided us with the opportunity of performing this research in the Fifth Generation Computer Systems Project. We also wish to thank Dr. Nitta, Mr. Ichiyoshi and Mr. Sakane (Current address: Nippon Steel Corporation) in the seventh research laboratory for their many comments and discussions regarding this study, and Dr. Aiba, Mr. Kawagishi, Mr. Menjyu and Mr. Terasaki in the fourth research laboratory of the ICOT for allowing us to use their GDCC and Simplex programs, and helping us to implement our parallel constraint solver. And we wish to thank Prof. Nishida of Kyoto Univers'ity for his discussions, Mr. Kawaguti, Miss Toki and Miss Isokawa of Hitachi Information Systems for their help in the implementation of our system, and Mr. Masuda and Mr. Yokomizo of Hitachi Central Research Laboratory for their suggestions on designing electric circuits using qualitative reasoning. References [Aiba 88] Aiba, A., Sakai, K., Sato Y., and Hawley, D. J.: Constraint Logic Programming Language CAL, pp. 263276, Proc. of FGCS, ICOT, Tokyo, 1988. [Bobrow 84] Bobrow, D. G.: Special Volume on Qualitative Reasoning about Physical Systems, Artificial Intelligence, 24, 1984. [Chandrasekaran 90] Chandrasekaran, B.: Design Problem Solving: A Task Analysis, AI Magazine, pp. 59-71, 1990. [Forbus 84] Forbus, K. D.: Qualitative Process Theory, Artificial Intelligence, 24, pp. 85-168, 1984. [Hawley 91] Hawley, D. J.: The Concurrent Constraint Language GDCC and Its Parallel Constraint Solver, Proc. of KL1 Programming Workshop '91, pp. 155-165, ICOT, Tokyo, 1991. [Heintze 86] Heintze, N., Michaylov, S., and Stuckey P.: CLP(R) and some Electrical Engineering Problems, Proc. of the Fourth International Conference of Logical Programming, pp.675-703, 1986. [Konno 87] Konno, H.: Linear Programming, NikkaGirren, 1987 (in Japanese). [Kuipers 84] Kuipers, B.: Commonsense Reasoning about Causality: Deriving Behavior from Structure, Artificial Intelligence, 24, pp. 169-203, 1984. [Mozetic 91] Mozetic, I. and Holzbaur, C. : Integrating Numerical and Qualitative Models within Constraint Logic Programming, Proc. of the 1991 International Symposium on Logic Programming, pp. 678-693, 1991. [Mizoguchi 87] Mizoguchi, R.: Foundation of expert systems, Expert system - theory and application, Nikkei-McGrawHill, pp. 15, 1987 (in Japanese). [Murthy 87] Murthy, S. and Addanki, S.: PROMPT : An Innovative Design Tool, Proc. of AAAI-87, pp.637-642, 1987. [Nishida 88a] Nishida, T.: Recent Trend of Studies with Respect to Qualitative Reasoning (I) Progress of Fundamental Technology, pp. 1009-1022, 1988 (in Japanese). [Nishida 88b] Nishida, T;: Recent Trend of Studies with Respect to Qualitative Reasoning (II) New Research Area and Application, pp. 1322-1333, 1988 (in Japanese). [Nishida 91] Nishida, T.: Qualitative Reasoning and its Application to Intelligent Problem Solving, pp. 105-117, 1991 (in Japanese). [Ohki 86] Ohki, M. and Furukawa, K: Toward Qualitative Reasoning, Proc. 
of Symposium of Japan Recognition Soc., 1986, or ICOT-TR 221, 1986. [Ohki 88] Ohki, M., Fujii, Y., and Furukawa, K: Qualitative Reasoning based on Physical Laws, Trans. Inf. Proc. Soc. Japan, 29, pp. 694-702,1988 (in Japanese). [Ohki 92] Ohki, M., Sakane, J., Sawamoto, K, and Fujii, Y.: Enhanced Qualitative Physical Reasoning System: Qupras, New Generation Computing, 10, 1992 (to appear). [Ohwada 88] Ohwada, H., Mizoguchi, F., and Kitazawa, Y.: A Method for Developing Diagnostic Systems based on Qualitative Simulation, J. of Japanese Soc. for Artif. Intel., 3, pp. 617 -626, 1988 (in Japanese). [Simmons 86] Simmons, S.: Commonsense Arithmetic Reasoning, Proc. of AAAI-86, pp. 118-128, 1986. [Sussman 80] Sussman, G. and Steele, G. : CONSTRAINTS A Language for Expressing Almost-Hierachical Descriptions, Artificial Intelligence, 14, pp. 1-39, 1980. [Yamaguchi 87] Yamaguchi, T., Mizoguchi, R., Taoka, N., Kodaka, H., Nomura, Y., and Kakusho, 0: Basic Design of Knowledge Compiler Based on Deep Knowledge, J. of Japanese Soc. for Artif. Intel., 2, pp. 333-340, 1987 (in Japanese). [Williams 90] Williams B. C.: Interaction-based Invention: Designing Novel Devices from First Principles, Proc. of AAAI-90, pp.349-356, 1990. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 1030 Logical Implementation of Dynamical Models Yoshiteru Ishida Division of Applied Systems Science Kyoto University, Kyoto 606 Japan ishida@kuamp.kyoto-u.ac.jp Abstract In this paper, we explore the logical system which reflect the dynamical model. First, we define the" causality" which requires "time reference". Then, we map the causation to the specific type of logical implications which requires the time fragment dt > 0 at each step when causal changes are made. We also propose a set of axioms, which reflect the feature of state-space and the relation between time and state-space. With these axioms and logical implications mapped from the dynamical systems, the dynamical state transition can be deduced logically. We also discussed an alternative way of deducing the dynamical state change using time operators and state-space operators. 1 namical model as well as some meta-rules which reflects that the state-space of dynamical systems is continuous to logical rules, the qualitative reasoning on dynamica systems can be done by logical deductions. Section 2 discusses the causality on the dynamica model. The causality is defined in terms of physica time. Then the causation is mapped to the logical implication which requires time fragment (dt > 0) at each step. Cause-effect sequence is obtained by the deduction where the new fact dt > 0 is required at each step. Section 3 discusses the relation between some concepts on dynamical models and those on logical systems. Section 4 presents a set of axioms from which state transitions are deduced logically. Section 5 discusses an alternative formalization of logical systems for deducing the dynamical changes. Introduction Although the dynamical systems and logical systems are considered to be completely different systems, there are several elements in common. We mapped from dynamical systems to logical systems to investigate the following questions: (1) How the fundamental concepts in dynamical systems such as observability, stability can be related to those in logical systems such as completeness, soundness ? (2) In order to attain the dynamical simulation on the mapped logical systems, what are necessary? 
(3) Can the qualitative simulation be carried out by deducing the future state from the current state and some axioms characterizing time, state-space and their relations? We consider it is crucial to discriminate (physical) causality explicitly from logical deducibility. We studied a causality characterized by "the time reference" other than event dependency for the discussion of physical causality. The physical causality (or equivalently "change" through physical time) is intrinsically embedded in a dynamical model which states the causal relation between what is changed and what makes the cha.nge. In this paper, we treat the physical causality as specific type of deduction which always requires the fact of the time fra.gment dt > 0 at each step. By mapping the dy- 2 2.1 Mapping Causality in Dynamical Models to Logical Implications Causality referring to time The causality has the following two requirements, which seem intuitively sound for a causality for the discussion of dynamical change. When we say "the event A caused the event B", we must admit (1) Time Reference: The event A occurred "before" the event B, (2) Event Dependency: The occurrence 0 the event B must be "dependent on" the occurrence 0 the event A. The "time reference" plays a crucial role to make clear distinction between "the causality" and logical deduction. In the original dynamical model of the form: dY/dt = X contains the "built-in causal" direction from the right hand side to the left hand side. We restrict ourselves to interpret the form dY/ dt = X as follows: X > 0 caused dt > 0) or is capable of causing the event of Y increase dY > 0 ). The requirement of the new fact dt > 0 should be claimed to verify the "built-in causality". Thus the 1031 these causal path, the following model is obtained. form will be mapped to the logical form: 2.2 Language for dynamics In order to logically describe the constraint of dynamical model, we use the following First Order Predicate Calculus. We use the 4-place predicates p(x,i), n(x,i), z(x,i) which should be interpreted as positive, negative, zero of the variable x at certain moment i. p(x,i), for example is interpreted as follows: ( Z') = {true, f al se, P x, d6Xs/dt = -a' 6Po - d· 6Xs d6Q/dt = b·(6Pi-6Po-c·(2X s·Q6Q-Q 2 6X s)/ X s2 d6Po/dt = e . (2Q6Q - f . 6Po) where a, b, c, d, e, and f are appropriately chosen positive constants. 6 x denotes the variance from the equilibrium point of x. if x (at time i) > 0; otherwise. Since the state must be unique at any moment, these predicates must satisfy the following uniqueness axioms U. U-(l) 'r/x'r/i(p(x, i) ~ (('" n(x, i)) U-(2) 'r/x'r/i(n(x, i) ~ (('" p(x,i)) U-(3) 'r/x'r/i(z(x, i) ~ ((rv n(x, i)) 1\ ('" 1\ ('" 1\ ('" z(x, i)))) z(x,i)))) p(x, i)))) We also use the 2-place predicate of inequality > Xs Po / 9 _~ xxxxxxxxxxxxxx~xxxx~~xxxxxx pi Fig.1 (x, y). Other than these three predicates, we also use functions such as d/dt(time derivative), +(addition), -(substraction), . (multiplication) , /(division) defined on the time varying function x( t) in our language. 
With these predicates, the causality defined from X to Y can be written by: 2.3 A Schematic Diagram of Pressure Regulator The first equation of the model, for example, mapped to the logical formulae: n(6po(t), i) 1\ p(dt, i) ~ p(d6xs/dt, i) p(X(t),i) 1\ p(dt, i) ~ p(dY/dt, i) p(6po(t), i) p(dt, i) ~ n(d6xs/dt, i) n(X(t), i) 1\ p(dt, i) ~ n(dY/dt, i) z(6po(t), i) V z(dt, i) ~ z(d6xs/dt, i) z(X(t), i) V z(dt, i) ~ z(dY/dt, i) Causality in dynamical models We formalize the" causality" by the propagation of sign in the dynamical model. In the propagation, time reference is included, since p(dt, i) is always needed to conclude the causation. Example 2.1. 1\ With the set of logical formulae, which are mapped from the dynamical equations and the following axioms, we can obtain a cause-effect sequence by the causal deduction on this model. 1-(1 ) 'r/x'r/i'r/j(z(x, i) 1\ p(dx/dt, i) 1\ (j > i) 3k((j > k) 1\ ~ (k > i) I\p(x,j,k))) I-(2) 'r/x'r/i'r/j(z(x, i) In order to compare the simulation results with those done by other qualitative simulation [de Kleer and Brown 1984], we use the same example of pressure regulator as shown in Fig. 1. We can identify the causality in the feedback path. The flow also is caused by a driving force and by the available area for the flow. Further, the pressure at a point is caused by the flow through the point. Reflecting IS 1\ n(dx/dt, i) 1\ (j > i) ~ 3k((j > k)l\(k > i)l\n(x,j,k))) These are the instant change rules [de Kleer and Bobrow 1984]' which state that z(x, i) is a point with measure zero. Suppose Pi is disturbed p( 6Pi, 0) when the system is in a stationary state (all the derivatives are zeros 1032 then the initial sign vector is (8Pi, 8Po, 8Q, 8X s) = (+,0,0,0). We will use this state-state vector notation when needed instead of an awkward notation of p(8Pi, 0), z(8Po, 0), z(8Q, 0), z(8Xs,0). By the causal deduction, p(8Q, N1) is first obtained( first step). Including this new state as the fact, we can then obtain p(8Po, N2) by the causal deduction again(2nd step). Including this state as the new results and using third time fragment dt > 0, we obtain n(5Xs, N3) by the causal deduction (3rd step). 3 Logical System and Dynamical System In the previous section, we regarded the causality builtin the dynamical model as logical implication. Then, the dynamical state change can be carried out in a similar manner to deduce the new fact from the logical formulae corresponding to the dynamical model and the time fragment p( dt, i). In order to use the causal relation in the dynamical model, the dynamical model must be original one. That is, the original dynamical model must reflect causal path between two physical entities. In this section, we consider some correspondence between the important concepts in dynamical systems and those in logical systems. Theorem 3.1(observability and deducibility) The dynamical system is qualitative observable from a observer y iff the non-zero of the observer y can be deduced in the mapped logical system when the fact that some variables (corresponding to the dynamical system)are non-zero is given. This result can be used to save some deduction processes when some variables are known to be observable or not. Further, this result can also used to investigate the qualitative stability which can be known by the observability of the system [Ishida 1989]. 
3 Logical System and Dynamical System

In the previous section, we regarded the causality built into the dynamical model as a logical implication. The dynamical state change can then be carried out, in a similar manner, by deducing new facts from the logical formulae corresponding to the dynamical model together with the time fragment p(dt, i). In order to use the causal relations in the dynamical model, the dynamical model must be the original one; that is, the original dynamical model must reflect the causal paths between physical entities. In this section, we consider some correspondences between important concepts in dynamical systems and those in logical systems.

Theorem 3.1 (observability and deducibility) The dynamical system is qualitatively observable from an observer y iff the non-zero of the observer y can be deduced in the mapped logical system when the fact that some variables (of the corresponding dynamical system) are non-zero is given.

This result can be used to save some deduction processes when some variables are known to be observable or not. Further, this result can also be used to investigate qualitative stability, which can be determined from the observability of the system [Ishida 1989].

Definition 3.2 (completeness and soundness) The mapped logical system is called complete (sound) if every state which can (cannot) be attained by the corresponding dynamical system in finite time can (cannot) be deduced in a finite number of steps.

Conjecture 3.3 The mapped logical system is always complete but not always sound.

This is often stated in qualitative reasoning, but it has not been formally proved yet. The most formal discussion may be found in [Kuipers 1985, Kuipers 1986], which states that each actual behavior of the system is necessarily among those produced by the simulation, but that there are behaviors predicted by qualitative simulation which do not correspond to the behavior of any system satisfying the qualitative structure description. We will see an example showing the lack of soundness of the mapped logical system in the next section. The lack of soundness is due to the following fact.

Proposition 3.4 Two equivalent dynamical systems may be mapped to different logical systems.

That is, two dynamical systems which can be transformed into each other may be mapped to different logical systems. In fact, a dynamical system is usually mapped to only a part of the exact logical system. Therefore, in order to make the mapped logical system close to the dynamical model, we must map from multiple dynamical models which are equivalent as dynamical models, and combine the mapped logical systems. We do not yet know what kinds of equivalent dynamical models suffice to make the mapped logical system exact.

4 Reasoning about State by Deduction

The causal deduction stated in the previous section cannot say anything about changes after some time interval has passed; that is, when many variables are approaching zero, it cannot tell which one reaches zero first. In order to determine this, meta-rules which are implicit in dynamical models must be introduced explicitly. The following axioms reflect the fact that the state-space of a dynamical model is continuous. The lack of a continuous and dense space in the logical system is one of the fundamental points which discriminate logical systems from dynamical systems.

T-(1) ∀x∀i(p(x, i) ∧ n(dx/dt, i) → ∃j((j > i) ∧ z(x, j)))
T-(2) ∀x∀i(n(x, i) ∧ p(dx/dt, i) → ∃j((j > i) ∧ z(x, j)))

These axioms T-(1),(2) come from the value continuity rule stated in [de Kleer and Bobrow 1984]. Axiom T does not correctly reflect the world of the dynamical model: even if x > 0 and dx/dt < 0, x does not necessarily become zero in finite or even infinite time.
M-(1) ∀x∀j1∀j2((p(x, j1) ∧ n(x, j2) ∧ (j2 > j1)) → ∃j3(z(x, j3) ∧ (j3 > j1) ∧ (j2 > j3)))
M-(2) ∀x∀j1∀j2((n(x, j1) ∧ p(x, j2) ∧ (j2 > j1)) → ∃j3(z(x, j3) ∧ (j3 > j1) ∧ (j2 > j3)))

These axioms M-(1),(2) correspond to the well-known intermediate value theorem, which reflects the continuity of the function x. Axioms T and M state the continuity of the state-space and that of the function from time to state-space.

Other than the axioms U, I, M and T, we need the following assumption: the state remains the same as the nearest past state unless otherwise deduced. We call this the no change assumption. We have not been able to formalize this assumption by a logical formula of our language so far. This seems to be a problem common to any formalization for reasoning about such dynamic concepts as state change, actions, and events. The situation calculus [McCarthy and Hayes 1969], for example, uses frame axioms¹ to avoid this problem.

¹Frame axioms are collections of statements of what does not change when an action is performed.

Example 4.1. Let us consider the mass-spring system with friction [de Kleer and Bobrow 1984] (Fig. 2), whose model is of the form:

(4-1) dx/dt = v
(4-2) dv/dt = -kx - fv

where k and f are positive constants. (4-2) is the original form containing the built-in causality, whereas (4-1) is the definition of v.

[Fig. 2: A Schematic Diagram of Mass-Spring System with Friction, with displacement x.]

As for the initial sign patterns of (x, v, dv/dt), we consider only three cases: (+, -, -), (+, +, -) and (+, -, +). Let Gdm denote the set of logical formulae corresponding to the dynamical model, and Gch those corresponding to the axioms U, I, T, M. The sign pattern (+, +, +) and its opposite pattern (-, -, -) are inconsistent, since (p(x, 0) ∧ p(v, 0)) ∪ Gdm → n(dv/dt, 0); this result n(dv/dt, 0) is inconsistent with the initial pattern p(dv/dt, 0) under the uniqueness axiom U. We do not consider initial sign patterns which contain zero for any variable, since such a pattern immediately changes to a sign pattern with only non-zero signs by axiom I. Thus these three patterns cover all the possible sign combinations.

We show the deduction only for the simulation of case 1, in which p(x, 0), n(v, 0), n(dv/dt, 0) are given as the initial pattern. The other cases can be deduced in a similar manner from the initial sign pattern and the sets of logical formulae Gdm and Gch.

By the axiom T, p(x, 0) ∧ n(v, 0) → ∃N1((N1 > 0) ∧ z(x, N1)). By the no change assumption, the other variables are assumed to keep their nearest past signs, that is, n(v, N1) and n(dv/dt, N1). However, (n(v, N1) ∧ z(x, N1)) ∪ Gdm → p(dv/dt, N1). Thus we have z(x, N1), n(v, N1), p(dv/dt, N1).

Then by the axiom M, (N1 > 0) ∧ n(dv/dt, 0) ∧ p(dv/dt, N1) → ∃N2((N2 > 0) ∧ (N1 > N2) ∧ z(dv/dt, N2)). By the no change assumption, the other variables at time N2 keep their nearest past signs, that is, p(x, N2) and n(v, N2). Since (n(v, N2) ∧ z(dv/dt, N2)) ∪ Gdm → p(d²v/dt², N2), and by the axiom I, p(d²v/dt², N2) ∧ z(dv/dt, N2) ∧ (N1 > N2) → ∃N3((N1 > N3) ∧ (N3 > N2) ∧ p(dv/dt, N3)). Again by the no change assumption, p(x, N3) and n(v, N3).

By applying the axiom I to the state at N1, n(v, N1) ∧ z(x, N1) → ∃N4((N4 > N1) ∧ n(x, N4)). n(v, N4) and p(dv/dt, N4) are obtained by the no change assumption.

By applying the axiom T to the state N4, n(v, N4) ∧ p(dv/dt, N4) → ∃N5((N5 > N4) ∧ z(v, N5)). Again, the signs of the other variables at N5 remain the same as those at N4.

By applying the axiom I to the state N5, we have z(v, N5) ∧ p(dv/dt, N5) → ∃N6((N6 > N5) ∧ p(v, N6)).

In summary, we have deduced the following set of states at different times:

z(x, N1), n(v, N1), p(dv/dt, N1),
p(x, N2), n(v, N2), z(dv/dt, N2),
p(x, N3), n(v, N3), p(dv/dt, N3),
n(x, N4), n(v, N4), p(dv/dt, N4),
n(x, N5), z(v, N5), p(dv/dt, N5),
n(x, N6), p(v, N6), p(dv/dt, N6),

and the order of time (0 < N2 < N3 < N1 < N4 < N5 < N6).
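The no change assumption invoked at every step above is left informal in the paper. Purely as an illustration of one way it can be approximated operationally, and not as the paper's formalization, default persistence of signs can be written with negation as failure; all predicate and constant names below are invented for the sketch.

```prolog
% Illustrative sketch: the "no change assumption" approximated by
% negation as failure.  sign_at/3 holds explicitly deduced signs,
% precedes/2 orders the generated time points; names are ad hoc.
:- dynamic sign_at/3, precedes/2.

% holds(Var, Time, Sign): a sign holds either because it was deduced at
% Time, or because it held at the nearest past point and nothing new
% about Var has been deduced at Time (defeasible persistence).
holds(Var, Time, Sign) :-
    sign_at(Var, Time, Sign).
holds(Var, Time, Sign) :-
    \+ sign_at(Var, Time, _),
    precedes(Prev, Time),
    holds(Var, Prev, Sign).

% Case 1 of Example 4.1, up to the first new time point N1:
%   ?- assertz(precedes(t0, n1)),
%      assertz(sign_at(x, t0, pos)),  assertz(sign_at(v, t0, neg)),
%      assertz(sign_at(dv, t0, neg)),
%      assertz(sign_at(x, n1, zero)).
%   ?- holds(v, n1, S).
%   S = neg.          % carried over from t0 by the no change assumption
% If p(dv/dt, N1) is later deduced and asserted as sign_at(dv, n1, pos),
% the persisted value for dv is overridden, as in the text.
```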
Tables 1 show the state transitions starting from the initial patterns of case 1, case 2 and case 3.

Tables 1  State Transition by Logical Deduction

Case 1:
  t        : 0  1  2  3  4  5  6
  x        : +  +  +  0  -  -  -
  dx/dt    : -  -  -  -  -  0  +
  d²x/dt²  : -  0  +  +  +  +  +
At step 6, the opposite pattern of the initial pattern appears.

Case 2: at step 2, the same pattern as the initial pattern of case 1 appears.

Case 3: at step 2, the opposite pattern of the initial pattern of case 2 appears.

In the logical system mapped from the dynamical model (4-1) and (4-2), it is impossible to deduce the state which corresponds to convergence to the point (x, dx/dt, d²x/dt²) = (0, 0, 0), which is attained only after infinite time has passed in the dynamical model. In fact, we only have periodic states, as shown in Tables 1. However, infinite sequences of deduction similar to this convergence can be found. When the initial sign pattern is (x, dx/dt, d²x/dt², ...) = (+, -, +, ...), applying the axiom T to n(dx/dt, 0) gives ∃N1((N1 > 0) ∧ z(dx/dt, N1)). Then, applying the axiom M to this result, we have ∃N2((N1 > N2) ∧ z(d²x/dt², N2)). This application of the axiom M proceeds progressively to every higher-order time derivative of x; that is, we have ∃Ni+1((Ni > Ni+1) ∧ z(d^(i+1)x/dt^(i+1), Ni+1)).

This is an interesting correspondence between the dynamical model and the mapped logical system. It may suggest introducing into the logical system some operation (other than deduction) which corresponds to the operation lim(t→∞) x(t). We will show that this convergence can be deduced even in a finite number of steps, using the logical implications mapped from a different (but equivalent) dynamical model. The dynamical model (4-1), (4-2) is equivalent to the dynamical model:

(4-3) E = x² + (1/k)·(dx/dt)²
(4-4) dE/dt = -f·(dx/dt)²

This states that E, and hence x, will eventually become zero as long as f > 0. Table 2 shows the state transition of the mapped logical system.

[Table 2: State Transition of Mass-Spring System (Energy Model); columns t, x, dx/dt, E, dE/dt, where * denotes any of the signs +, -, 0.]

This convergence of the dynamical system is attained only in infinite time, and hence need not be deducible in the logical system mapped from (4-1), (4-2); since that logical system does not have the concepts of convergence and infinite steps, these concepts are outside its scope. The results show that the logical system mapped from the dynamical model (4-3), (4-4) is quite different from that mapped from the dynamical model (4-1), (4-2), although these dynamical models are equivalent. Therefore, this example shows the correctness of Proposition 3.4. This point is also a fundamental difference between dynamical systems and logical systems.

5 Discussions

We first discuss temporal logic with the temporal operators F and P [Rescher and Urquhart 1971], where FA (PA) means that A will be (was) true at some future (past) time. With the axiom schemata, the features of these temporal operators, and even the features of time (e.g., whether it is transitive, dense or continuous), can be characterized. However, since the logic does not say anything about the features of the state-space or the relation between the state-space and time, it is not possible to infer changes in the state-space. In fact, the axioms I, T, M given earlier characterize the features of the state-space.

An alternative to our approach is to define space operators similar to the time operators. One way of defining space operators is as follows: Fx, Px, where FxA (PxA) means that A is true at some point where x is larger (smaller) than the current value. With this definition, the previous time operators can be written as Ft, Pt. With these space operators, the axioms I, T, M may be written as:

I-(1) z(x) → Gx(p(x))
I-(2) z(x) → Hx(n(x))
T-(1) p(x) → Px(z(x))
T-(2) n(x) → Fx(z(x))
M-(1) p(x) ∧ Fx(n(x)) → Fx(Fx(z(x)))
M-(2) n(x) ∧ Fx(p(x)) → Fx(Fx(z(x)))

Since these axiom schemata I, T, M characterize the features of the state-space only, we need the following axioms TS, which characterize the monotonic relation between time and state-space:

TS-(1) p(dx/dt) → ((FA → FxA) ∧ (PA → PxA))
TS-(2) n(dx/dt) → ((PA → FxA) ∧ (FA → PxA))

Here, the time operators are used instead of the time index for the sign predicates p, n, z. The good point of this space-operator approach is that it can be discussed as a natural extension of temporal logic with temporal operators. Its critical weakness, however, is that although these space/time operators can express the temporal precedence of events, they cannot describe that two different events A and B occurred at the same time. In the approach taken in Section 4, this is described by giving the events the same time tags.

When compared with the qualitative reasoning of [de Kleer and Bobrow 1984], our way of qualitative reasoning differs from theirs in the following two points:

(1) In reasoning, we defined another causality which refers strictly to time. Causal reasoning is carried out by mapping the causality in dynamical models to deduction under the condition dt > 0. Time-independent relations are mapped to deductions only. Causal reasoning is then done by requiring the fact dt > 0 at every step.
This logical reasoning can be implemented on a logical reasoning system such as Prolog by providing the axioms proposed so far and the mapped dynamical models.

(2) In modeling, since we use the causality built into the dynamical model, we skip the qualitative modeling process; that is, we use the dynamical model itself as the qualitative model. However, the dynamical models must be carefully selected to ensure that the causal paths in the dynamical models are reflected in the mapped logical systems.

6 Conclusion

We discussed a mapping from dynamical systems to logical systems, in order to see the correspondence of the fundamental concepts in these two domains and to implement a causal reasoning system on a logical deduction system. To separate physical causality clearly from the usual deduction, we defined causality in physical systems by making time explicit. Many fundamental problems remain, such as whether a complete and sound logical system for a given dynamical system exists and, if so, how such a complete and sound logical system can be obtained.

References

[de Kleer and Brown 1984] de Kleer, J. and Brown, J. S. A qualitative physics based on confluences. Artificial Intelligence, 24, 7-83, 1984.
[Forbus 1984] Forbus, K. D. Qualitative process theory. Artificial Intelligence, 24, 85-168, 1984.
[de Kleer and Bobrow 1984] de Kleer, J. and Bobrow, D. G. Qualitative Reasoning with Higher-Order Derivatives. Proc. of AAAI-84, 86-91, 1984.
[Kuipers 1986] Kuipers, B. Qualitative simulation. Artificial Intelligence, 29, 289-337, 1986.
[Kuipers 1985] Kuipers, B. The Limits of Qualitative Simulation. Proc. of IJCAI-85, pp. 128-136, 1985.
[Ishida 1989] Ishida, Y. Using Global Properties for Qualitative Reasoning: A Qualitative System Theory. Proc. of IJCAI-89, Detroit, pp. 1174-1179, 1989.
[Struss 1988] Struss, P. Global filters for qualitative behaviors. Proc. of AAAI-88, 301-306, 1988.
[Rescher and Urquhart 1971] Rescher, N. and Urquhart, A. Temporal Logic, Springer-Verlag, 1971.
[McCarthy and Hayes 1969] McCarthy, J. and Hayes, P. J. Some Philosophical Problems from the Standpoint of Artificial Intelligence. Machine Intelligence 4, Edinburgh University Press, 1969.

The CLASSIC Knowledge Representation System or, KL-ONE: The Next Generation

Ronald J. Brachman   Alexander Borgida*   Deborah L. McGuinness   Peter F. Patel-Schneider
Lori Alperin Resnick†

AT&T Bell Laboratories, 600 Mountain Ave., Murray Hill, NJ 07974-0636, U. S. A.

*Also with the Department of Computer Science, Rutgers University, New Brunswick, NJ.
†Electronic mail addresses: rjb@research.att.com, borgida@cs.rutgers.edu, dlm@research.att.com, pfps@research.att.com, resnick@research.att.com.

Abstract

CLASSIC is a recently developed knowledge representation (KR) system, based on a view of frames as structured descriptions, with several important inferable relationships, including description classification. While much about CLASSIC is novel and important in its own right, it is especially interesting to consider the system in light of its unusual (for Artificial Intelligence) intellectual history: it is the result of over a decade of research and evolution in representation systems that trace their origins back to work on KL-ONE, arguably one of the most long-lived and influential approaches to KR in the history of AI. We outline some of the novel contributions of CLASSIC, but pay special attention to its roots, illustrating the maturation of some of the original features of KL-ONE and the decline and fall of others. A number of key ideas are analyzed, including the interpretation of frames as descriptions, the classification inference, and the role of a knowledge representation system in a knowledge-based application. The rare traceable relationship between CLASSIC and its ancestor gives us an opportunity to assess progress in a generation of knowledge representation research.

1 Introduction

An unfortunately large fraction of work in Artificial Intelligence is ephemeral, accompanied by much sound and fury, but, in the end, signifying virtually nothing. Work on systems with significant longevity to the basic ideas, such as STRIPS, appears to be the exception rather than the rule in AI. In the area of knowledge representation (KR), there are ideas that have lived on for years, but very few systems or approaches have seen more than a minimal number of users for a minimal number of years.¹ The KL-ONE system [7, 11] is different: it was "born" over a dozen years ago, and has had continuous evolution and influence ever since. Its offspring now number at least twenty significant projects worldwide, all based directly on its key ideas of classification and structured inheritance. With well more than a decade behind us, this rich history bears closer examination, especially with the advent of the CLASSIC Knowledge Representation System, a recent development that clarifies and amplifies many of the central ideas that were more crudely approximated in the KL-ONE of 1978.

¹SNePS and Conceptual Graphs are among the few exceptions.

CLASSIC goes substantially beyond KL-ONE in its treatment of individuals and rules, its clarification of subsumption and classification, its integration with its host language, and its concrete stand on the role of a KR system as a limited deductive database management system. While a description of the CLASSIC system would be interesting in its own right, its motivation and contribution are more easily understood by placing it in the proper context. Thus, rather than describe the system in isolation, we here briefly explore some of its key properties in light of their intellectual debt to KL-ONE and its children. Besides making the case for CLASSIC, this will also provide us an opportunity to assess in retrospect the impact of some of the original ideas introduced by KL-ONE.
This is a chance to see how far we have come in a "generation" of knowledge representation research.

2 KL-ONE: The First Generation

KL-ONE was the first implementation (ca. 1978) of a representation system developed in Brachman's thesis [7]. It was influenced in part by the contemporary Zeitgeist of "frames" (e.g., see [20]), with emphasis on structured objects and complex inheritance relationships. But KL-ONE's roots were really in semantic networks, and it had a network notation of labeled nodes and links. Despite its appearance, in some key respects KL-ONE was quite different from both the semantic network systems that preceded it, and the frame systems that grew up as its contemporaries. Following papers by Woods [33] and Brachman [6], KL-ONE rejected the prevailing idea of an open-ended variety of (domain-specific) link- and node-names, and instead embraced a small, fixed set of (non-domain-specific) "epistemological primitives" [8] for constructing complex structured objects. These constructs, which represented basic general relationships like "defines-an-attribute-of" and "is-a-specialization-of" rather than domain-specific relationships like "owns" or "has-employee", were considered to be at a higher level of representation than the data-structuring primitives used to implement them. They could be used as a foundation for building application-dependent conceptual models in a semantically meaningful way (rather than in the ad hoc fashion typical of semantic nets).

[Figure 1: A KL-ONE Concept.]

In addition to its clear stand on the semantics of semantic networks, the original KL-ONE introduced a number of important ideas, including these:

• rather than manipulating "slots", which are in reality low-level data structures, KL-ONE looked at relationships as roles to be played; roles get their meanings from their interrelations, just like the roles in a drama, and they are not just meaningless labeled fields of records or indistinguishable empty bins into which values are dropped;

• a role taxonomy, which allowed roles to be subdivided into more specific roles; e.g., if child is a more specific role than relative, then being a child entails something more constrained than being a relative, but includes everything that being a relative in general does;

• structural descriptions, which served to define the relationships between role players; e.g., the difference between a buyer and a seller in a PURCHASE event would be specified by reference to other concepts that specified in which direction money and goods would flow. These concepts would give substance to the roles, rather than leaving their meanings open and subject only to human interpretation of strings like "buyer";

• structured inheritance, which reflected the fact that concepts (KL-ONE's name for frames/classes) were complex structured constructs and their parts were not independent items to be manipulated arbitrarily.

The KL-ONE language showed its semantic-network heritage rather directly, in that KL-ONE structures were drawn in diagrams, with different link-types being indicated with different pictorial realizations. For example, Figure 1 illustrates a typical KL-ONE concept: the "STARFLEET-MESSAGE" concept uses its parent, "MESSAGE," to create the description corresponding to "a MESSAGE whose Sender is a STARFLEET-COMMANDER." In general, a user built a KL-ONE net like this by calling rather low-level LISP functions, whose actions might be to "create a role node" or "add a superconcept link."
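To make the contrast concrete, the description of Figure 1 can be written as a structured term whose relationship to its parent is computed from its structure rather than asserted by hand. The following Prolog sketch is purely illustrative: the term syntax and predicate names are invented here and are not KL-ONE's (or CLASSIC's) actual interface, and the subsumption test is deliberately naive.

```prolog
% Illustrative only: structured descriptions as terms, with a naive
% structural subsumption test.  Term syntax and names are invented for
% this sketch; they are not KL-ONE's or CLASSIC's actual interface.

% A description is and(Parts), where a part is primitive(Name) or
% all(Role, Description).  Echoing Figure 1:
desc(message,           and([primitive(message)])).
desc(starfleet_message, and([primitive(message),
                             all(sender, primitive(starfleet_commander))])).

% subsumes(General, Specific): every part required by General is also
% required by Specific (ignoring most of the real subtleties).
subsumes(primitive(N), primitive(N)).
subsumes(and(Parts1), and(Parts2)) :-
    forall(member(P, Parts1), covered(P, Parts2)).

covered(primitive(N), Parts) :- member(primitive(N), Parts).
covered(all(R, C), Parts)    :- member(all(R, C2), Parts), subsumes(C, C2).

% ?- desc(message, M), desc(starfleet_message, S), subsumes(M, S).
% true.   % MESSAGE is found to subsume STARFLEET-MESSAGE from structure alone
```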
After a number of years of use and reimplementation, it gradually became clear that KL-ONE's approach to structured objects was substantially different than that of virtually all of its contemporary systems. The primary realization was that those objects had previously been used for (at least) two purposes [6,9]: (1) to represent statements, usually of some typical properties (e.g., "elephants are gray"), and (2) to act as structured descriptions, somewhat like complex mathematical types (e.g., "a black telephone," rather than "all telephones are black"). In the KL-ONE community, the structureddescription aspect came to be emphasized over the assertional one. Viewing frames as descriptional, rather than assertional, emphasized the intensional aspects of knowledge representation. This had one primary benefit: it yielded the idea that the central inference to be drawn was subsumption-whether or not one description is necessarily more general than another. Subsumption in turn led to the idea of description classification-taking a description and finding its proper place in a partial order of other descriptions, by finding all subsuming (more general) descriptions and all subsumed (more specific) descriptions. KL-ONE-based classification systems were subsequently used in a number of interesting applications, including natural language understanding [11], information retrieval [27], expert systems [22], and more. Because of this view of frames, the research foci in the KL-ONE family gradually diverged somewhat from those of other frame projects, which continued to emphasize typicality and defaults. Another key issue in the KL-ONE community has been the tension between the need for expressiveness in the language and the desire to keep implementations computationally reasonable. Two somewhat different approaches can be seen: NIKL [17], and subsequently LOOM [19], added expressive power to the original KL-ONE language, and admitted the possibility of incomplete classification. KRYPTON [12], and subsequently KANDOR [26], on the other hand, emphasized computational tractability and completeness. While neither of these approaches is right for every situation, they provide an interesting contrast and highlight a significant current issue in knowledge representation. This topic is still under active exploration (see Sections 4.5-4.6). Over the last decade, systems based on the ideas in KLONE have proliferated in the United States and Europe (with significant ESPRIT funding), with at least twenty related efforts currently underway (see [34]). The work has also inspired seven workshops, two recently being held in Germany (in 1991) and one coming soon in the US (1992). These workshops have attracted both theoretical and practical scientists from several countries, and made it clear that the class of "KL-ONE-like" representation systems has both important theoretical substance and practical impact. 3 The CLASSIC System The CLASSIC Knowledge Representation System 2 represents a new generation of KL-ONE-like systems, emphasizing simplicity of the description language, a formal approach, and tractability of its inference algorithms. In this regard, it is most like KANDOR (and also BACK [32]), which, while setting important directions for limited subsumption-based reasoning, had a number of inadequacies. However, the CLASSIC system goes significantly 2CLASSIC stands for "CLASSification of Individuals and Concepts." 
It has a complete, fully documented implementation in Common Lisp, and runs on SUN workstations, Apple Macintoshes, Symbolics Machines, etc. It has been distributed to numerous (> 40) universities for research use. 1038 beyond previous description-based KR systems in many important respects, including its language, integration with the host system, treatment of individuals, and clarity on the role of a KR system. In CLASSIC's language, there are three types of objects: • concepts, which are descriptions with potentially complex structure', formed by composing a limited set of description-forming constructors; concepts correspond to one-place predicates; • roles, which are simple formal terms for properties; roles correspond to two-place predicates; within this class, CLASSIC distinguishes attributes, which are functional, from multi-roles, which can have multiple fillers; • individuals, which are simple formal constructs intended to directly represent objects in the domain of interest; individuals are given properties by asserting that they are described by concepts (e.g., "Chardonnay is a GRAPE") and that their roles are filled by other individuals (e.g., "Bell-Labs' parent-company is AT&T"). The CLASSIC description language is uniform and compositional-the meaning of a complex description is a simple combination of the meanings of its parts. 3 The complete description language grammar in Figure 2 illustrates its simplicity. Besides the description language, the interface to CLASSIC has a small number of operators on knowledge bases for the creation of new concepts (and the assignment of names to them), which include defined concepts, with full necessary and sufficient conditions; primitive concepts, which have only necessary conditions (see [9]); and disjoint (primitive) concepts, which cannot share instances (e.g., MALE and FEMALE). There is also an operator to explicitly "close" a role; this makes the assertion that there can be no more fillers for the role (see below). It is important to emphasize that the description constructors and knowledge base operators were chosen only after careful study and extensive experience with numerous KR systems. For example, virtually every objectcentered representation system has a way to restrict the type of an attribute; this yields our ALL constructor. All KR languages need to assert that a role is filled by an object; this corresponds to FILLS. CLASSIC's set captures the central core of virtually all KL-ONE-like systems in an elegant way: the constructors are minimal, in that one can not be reduced to a combination of others; and they have a uniform, prefix notation syntax, which allows them to be composed in a simple and powerful way. Rules (see Sec. 4.4), procedural tests, numeric ranges (MAX, MIN) and host language values expand the scope of KL-ONE-like concepts; these were included after clear user need was demonstrated. Certain more complex operators were excluded because they would have clearly made inference intractable or undecidable. Thus, CLASSIC's language is arguably the cleanest structured description language that tempers expressiveness of descriptions with tractability of inference (but see Section 4.5), elegantly balancing representational needs and inferential constraints in a uniform, simple, compositional framework. 3CLASSIC has a formal semantics, but we will not be able to elaborate on it here. See [4]. 
CLASSIC has many novel features, and improves on its predecessors in a number of ways, one of the most telling of which is its treatment of individuals. Anything that can be said about a concept can be said about an individual; thus, partial knowledge about individuals is maintained and used for inference. For example, we can assert that a person has at least three children ((AT-LEAST 3 child)) without identifying them, or that all of the children-whoever they are-are female ((ALL child FEMALE)). Individuals from the host language (e.g., LISP), such as strings and numbers, can be freely used where CLASSIC-supported individuals can, with consistent treatment. When any individual is added or augmented, or when a new concept is defined, complete propagation of properties is carried out, so that all individuals are continuously classified properly, and monotonic updates are treated completely. The rolefillers of an individual are not considered under the usual closed-world assumption; this better supports the accumulation of partial knowledge about individuals. Roles· can be "closed" explicitly when all of their fillers are known. Most crucially, an individual cannot be proven to satisfy an ALL restriction or an AT-MOST restriction by looking at its fillers for the role unless all of those fillers are known. Previous systems either treated this aspect of assertions incompletely or incorrectly. Rather than delve further into CLASSIC's individual features, we will attempt to better articulate its more general contributions by examining its relation to the issues that started this whole line of thinking over a decade ago. In that respect we can not only appreciate gains made in CLASSIC, but understand the strengths and weaknesses of the original KL-ONE proposals. 4 Key Intellectual Developments CLASSIC is innovative in a number of ways, and bears little surface resemblance to KL-ONE. But it is also very much a descendant of that system, which introduced a number of key ideas to the knowledge representation scene. While we will not have an opportunity here to delve into all of these ideas, we will examine a few of the more important issues raised by the original system and its successors. 4.1 Subsumption as a Central Inference In KL-ONE, as in all semantic networks that preceded it (and most systems to follow), the backbone of a domain representation was an "IS-A" hierarchy. The IS-A ("superc" in KL-ONE) link served to establish that one concept was a subconcept of another, and thus deserved to inherit all of the features of the superconcept. Virtually all of these systems forced the user to state directly that such a link should be placed between two explicitly named concepts. This type of user responsibility is still common in virtually all frame-based systems and expert system shells. In the early 1980's w~ discovered that in a classification-based system this was the wrong way around. 
In the KL-ONE-descendant languages of KRYPTON and KANDOR, where the meaning of a concept could be determined simply and directly from its structure (because the logic had a compositional semantics and necessary and sufficient definitions), it became clear that IS-A relations were purely derivative from the structure of the concepts. In other words, the subsumption relation⁴ between two descriptions was determined without any need for a complete explicit hierarchy of IS-A connections. Of course, it might make a difference to the efficiency of the system if all subsumption relationships that had been calculated were cached in some kind of structure that obviated the need to compute them a second time, and this is now common practice. But in a system like CLASSIC, it is clear that this is strictly an efficiency issue.

<concept-expression> ::=
    THING | CLASSIC-THING | HOST-THING                      built-in names
  | (AND <concept-expression>+)                             conjunction
  | (ALL <role-expression> <concept-expression>)            universal value restriction
  | (AT-LEAST <positive-integer> <role-expression>)         minimum cardinality
  | (AT-MOST <non-negative-integer> <role-expression>)      maximum cardinality
  | (FILLS <role-expression> <individual-name>+)            role-filling
  | (SAME-AS <attribute-path> <attribute-path>*)            role-filler equality
  | (TEST-C <test-function> ...) | (TEST-H <test-function> ...)   test (CLASSIC / HOST concept)
  | (ONE-OF <individual-name>+)                             set of individuals
  | (MIN <number>) | (MAX <number>)                         numeric range limits

<attribute-path> ::= (<attribute-name>+)
<test-function>  ::= a function in the host language (COMMON LISP) with three-valued logical return type

Figure 2: The CLASSIC Description Language (comments in italics).

In essence, systems that force a user to think only in terms of direct IS-A links place the entire burden of knowledge structuring on that user. Since every IS-A assertion is taken at its word, the system can provide no feedback that the correct relationship has been represented; all responsibility is the user's. On the other hand, the CLASSIC system (and others like it) can reliably decide under which concepts a new concept or individual must fit, since it has a compositional interpretation of the parts of any concept. This provides valuable help to the user in structuring large knowledge bases, because it is all too easy for us to assume that just because we know something that a term (e.g., a complex concept, like RED-WINE) implies, the system will know it as well. This advantage has been documented in the LASSIE system [14], which uses classification to support a software information system. Systems that do not do classification do not have defined concepts, and therefore treat everything as primitive [9]. Thus we can be falsely lulled into assuming that when we assert that a particular WINE has color = Red, the system will know that it is a RED-WINE; but a non-classification system will not make that inference.⁵

⁴Subsumption is defined formally in [18] and [4]. Concept a subsumes concept b iff instances of b are instances of a in all possible interpretations.
⁵Note that CLASSIC and its cousins all do normal inheritance of properties. Most of these systems are strictly monotonic for simplicity, but LOOM [19] has a default component.

4.2 From LISP Functions to Languages

The realization that the structure of a concept is the only source of its meaning, and that any IS-A hierarchy is induced by such structures, leads to another significant point of departure for the CLASSIC system. CLASSIC has a true knowledge representation language, a grammar of expressions. KL-ONE and even many of its successors treated a knowledge base as a set of data structures to be more or less directly manipulated by a user, and thus the user interface was strictly in terms of node- and link-managing functions.
Instead (following KRYPTON), CLASSIC is really based on a formal logic, with a formal syntax, rules of inference, and a formal interpretation of the syntax (see [4]). Of all of the KL-ONE-like systems, the CLASSIC system has the cleanest language. As shown in Figure 2, the language is simple, uniform, and compositional. Figure 3 illustrates the difference in style between KL-ONE structures and the lexical language of the CLASSIC system.⁶ The advantages of a true logic over a set of data-structure-manipulating programs should be obvious: one can write parsers and syntax checkers for the language, formal semantics can be specified, inference mechanisms can be verified to adhere to the semantics, etc.

⁶The symbols ~ and == indicate a primitive concept specification and a defined concept specification, respectively. The KL-ONE community has developed an algebraic notation that includes operators like these for all constructs in CLASSIC and related languages.

MESSAGE ~ (AND (AT-LEAST 1 sender) (ALL sender PERSON)
               (AT-LEAST 1 recipient) (ALL recipient PERSON)
               (AT-LEAST 1 body) (AT-MOST 1 body) (ALL body TEXT))

PRIVATE-MESSAGE == (AND MESSAGE (AT-MOST 1 recipient))

Figure 3: CLASSIC Expressions and KL-ONE Diagrams (adapted from [11]).

4.3 Attached Procedures

One of the more popular features of the early frame systems was the ability to "attach" programs to pieces of the data structures. The ultimate incarnation of this idea was probably KRL [3], which had an elaborate process framework, including "servants," "demons," "traps," and "triggers." The program fragments could be invoked at various times, and cause arbitrary computations to occur. KL-ONE had its own elaborate procedure attachment and invocation framework. However, arbitrary access to LISP meant that KR systems with this feature ceded control completely to the user: an attached procedure could alter any data structure in any way at any time. The semantics of KL-ONE networks and other frame systems thus became very hazy once attached procedures were utilized.

In CLASSIC, we have invented an important way to control the use of such "escape hatches." Through the notion of the TEST-C and TEST-H constructors, we have isolated the use of procedures in the host language to testing predicates. As one can see from the grammar, such concepts are treated syntactically uniformly with other concepts. The procedure simply provides a primitive sufficiency condition for the concept; it will be invoked only when trying to recognize an instance. These test functions are particularly useful when trying to relate individuals from the host language, such as when two roles are filled with numbers, and one should be a multiple of another. In their use, the user agrees to avoid side-effects and to use only monotonic procedures (i.e., those whose value never changes from true to false or vice versa in the presence of purely monotonic updates). While under arbitrary circumstances, resorting to program code for tests renders the semantics of the language useless, in CLASSIC, if the user abides by this "contract," the semantics of concepts with tests is manageable, and the inferences that the system draws are still guaranteed to be sound. Indeed, tests work just like other restrictions on concepts as far as classification of individuals goes, but since the procedures are inscrutable they have the flavor of primitive concepts. While primitive concepts allow primitive necessary conditions, tests give us primitive sufficient conditions.
Another innovation in CLASSIC is the requirement that the test functions must be 3-valued. If a system like CLASSIC says that an individual does not satisfy a concept, then that means only that it cannot be currently proven to do so. A complementary question can still be asked-whether it can be proven that the individual could never satisfy the description (i.e., that it is disjoint from the concept). For example, if Fred has exactly one child (i.e., (AND (AT-LEAST 1 child) (AT-MOST 1 child))), but nothing is known about it yet, then he cannot be proven to satisfy the description (ALL child FEMALE) . But it is possible that at a later time he could be, if he were stated to have a known female child. On the other hand, if it were asserted that his child was Barney, who was known to be a MALE, and MALE and FEMALE were disjoint concepts, then it would be provable that Fred could never satisfy the description. Thus, in order to fit into the classification framework, procedural tests must provide the same facility-to differentiate between a guarantee never to satisfy a description and lack of ability to prove it given the current knowledge base. 4.4 Definitions, Assertions, Individuals As mentioned, KL-ONE ultimately distinguished itself from other frame languages by its emphasis on structured descriptions and their relationships, rather than on contingent and typical facts. At one point in its development, the system was in a strange state: there were facilities for building complex concepts, but none for actually using them to describe individual objects in the domain. "Individual concepts" were KL-ONE's initial attempt to. distinguish between generic class descriptions and descriptions that could apply only to single individuals. As it turned out, these were typically misused: an individual concept with two parent concepts could only really mean a conjunctive description. One example that was used often was the conjunction of DRIVING- IN-MASSACHUSETTS and HAZARDOUS-ACTIVITY, intended to express the fact that driving in Massachusetts is hazardous. However, in truth the concept including them both was just a compound concept with no assertional force at all. While KL-ONE initially correctly distinguished between the import of different links between concepts, it failed to distinguish between those and a link that would make a contingent assertion about some individual. Eventually an alternative mechanism was proposed-the "nexus," to stand for an individual-but this was never really used. In the end, it took the work on KRYPTON to get this right. In KRYPTON, it was proposed that terminological knowledge (knowledge about the structure of descriptions) and assertional knowledge (facts) 'are two complementary aspects of knowledge representation competence, and that they should be maintained by distinct components, with an appropriate logical connection between them. From this distinction arose the terms "TBox" and "ABox," which are used extensively in the KL-ONE community to refer to the two components. But KRYPTON went too far in another direction, integrating an entire first-order logic theorem-prover as its assertional component. The CLASSIC system makes what we think is a better compromise: it has a limited objectcentered logic that properly relates descriptions and individuals. 
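The three-valued distinction of Section 4.3 (provably satisfied, provably never satisfiable, and simply not yet provable) and the open-world treatment of role fillers can be made concrete with a small sketch. The following Prolog fragment is an illustration only, not CLASSIC's implementation (CLASSIC's host language is Common Lisp); the predicates, the tiny knowledge base about Fred and Barney, and the restriction to a single ALL constructor are all assumptions made for the sketch.

```prolog
% Illustrative three-valued membership check for (ALL Role Concept),
% loosely modelled on the Fred example of Section 4.3.  Names and
% clauses are assumptions for the sketch, not CLASSIC data structures.

filler(fred, child, barney).          % Fred's known child is Barney
instance_of(barney, male).
disjoint(male, female).               % MALE and FEMALE cannot share instances

role_closed(fred, child).             % all fillers of child for Fred are known

% satisfies(Ind, all(Role, C), yes):     provable from known, closed fillers
% satisfies(Ind, all(Role, C), never):   some known filler can never be a C
% satisfies(Ind, all(Role, C), unknown): neither is provable (open world)
satisfies(Ind, all(Role, C), yes) :-
    role_closed(Ind, Role),
    forall(filler(Ind, Role, F), instance_of(F, C)).
satisfies(Ind, all(Role, C), never) :-
    filler(Ind, Role, F),
    instance_of(F, D),
    disjoint(D, C).
satisfies(Ind, all(Role, C), unknown) :-
    \+ satisfies(Ind, all(Role, C), yes),
    \+ satisfies(Ind, all(Role, C), never).

% ?- satisfies(fred, all(child, female), V).
% V = never.     % Barney is a MALE, and MALE is disjoint from FEMALE
% With no known fillers and the role left open, the same query would
% instead yield V = unknown, mirroring the text's first scenario.
```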
As is apparent from the grammar, CLASSIC treats assertions about individuals in a parallel and uniform manner with its treatment of the formation of subconcepts; but it also carefully distinguishes the logical meaning of the different relationships. Thus, for example, while individuals can be used in concept .value restrictions (i.e., in a ONE-OF expression, e.g., (ALL wine-color (ONE-OF Red White Blush))), no contingent property of an individual can be used in determining subsumption between two concepts (e.g., if Whi te just happens to be my favorite color for a wine, that fact cannot be used in any subsumption inference). 1041 As mentioned, CLASSIC also supports the propagation of information between individuals. If we assert that some individual is described by a complex description (e.g., that Rebecca is a PERSON whose mother is a DOCTOR), then that may imply some new properties about other related individuals (e.g., we should assert that Rebecca's mother, if known, is a DOCTOR). Such propagated properties can in turn cause other properties to propagate (e.g., that Rebecca's mother's office is a DOCTOR' S-OFFICE).7 This type of inference was never handled in KL-ONE, and only partially handled in some of its successors. Note that as soon as a property propagates from one individual to another, the latter individual might now fall under some new descriptions. CLASSIC takes care of this re-classification inference as well (as well as any further propagations that result, etc.). The CLASSIC system has two other features along these lines that distinguish it from its predecessors. First, the previously mentioned apparatus does not allow the expression of general contingent rules about individuals. Thus, given only what is in the CLASSIC concept grammar, while we could form the concept of, for example, a LATE-HARVEST-WINE, we could not assert that all LATE-HARVEST-WINEs are SWEET-WINEs. The sweetness is a derivative property-it is not part of the meaning of LATE-HARVEST-WINE, but rather a simple contingent property of such wines. In CLASSIC, one can also express general rules of a simple form. A rule has a named concept as the left-hand side, with an applicability condition (filter) that limits the rule's firing to the desired sub cases (i.e., if x is a with property , then x is a < conceptb > ). These rules are used only in reasoning about individuals, and do not affect subsumption relationships. 8 Most KL-ONE-like systems were unclear about the status of individuals that could easily be expressed in the host implementation language (i.e., numbers and strings in LISP). CLASSIC integrates such individuals in a simple and uniform way, and makes it virtually transparent whether an individual is implemented directly in the host language, or in the normal complex structure for CLASSIC individuals. This aspect of CLASSIC has proven critical in applications that deal with real data (for example, from a database), as in [29]. 4.5 KR and Computational Complexity Once it was apparent that the clearly defined logical relationship of subsumption was central to the KL-ONE family, a new factor could be introduced to the analysis of frame- based knowledge representation systems. In 1984, Brachman and Levesque gave a formal analysis of the complexity of computing subsumption in some frame languages [10]. That analysis showed that the apparent 7In order to keep the complexity down, CLASSIC only Thus, if propagates properties to known individuals. 
Rebecca's mother were unknown, the system would not attempt to create an individual about which to assert the DOCTOR description. If it did, it would then have to do very complex reasoning about existentials. 8S ome of the newer KL-oNE-derivatives, such as LOOM, have developed similar rule mechanisms. simplicity of some frame languages could be deceptive, and that the crucial subsumption inference was co-NPhard. The original paper initiated a sequence of results on the complexity of computing in the KL-ONE family, culminating most recently in two that show that the original language is in fact undecidable [24, 28]. This line of analysis has caused some major rethinking of the knowledge representation enterprise. No longer can we view language features as simply providing more expressiveness (which was the common view in the early years of knowledge representation). Rather, as in other areas of computer science, we must consider how expensive it will be to add a feature to a language. The addition of new features may demand the excision of some others in order to maintain computational manageability, or the system must be clear on where it is incomplete. In CLASSIC, subsumption is complete and tractable, but with respect to a slightly non-standard semantics; that is, it is clear what CLASSIC computes, and how fast it can compute it, but it does not compute all the standard logical consequences of a knowledge base. In this regard, we have opted for a less conservative approach than in KANDOR, but a more limited and disciplined approach than in LOOM. The consequences of this are explored briefly in the next section. We should point out that the viability of our approach has been proven in practice: CLASSIC is the first KL-ONE-derived system to be deployed in a fielded (AT&T proprietary) product, used every day in critical business operations. It was expressive enough to do the job. 4.6 The Role of a KR System The above developments in the KL-ONE saga give rise to an important general question that usually goes unasked in AI: what role is a knowledge representation system expected to play? There are clearly different approaches here. On one extreme we have the large commercial systems, or expert system shells, which include substantial knowledge representation apparatus. The philosophy of those systems seems to be that a KR system should provide whatever apparatus is necessary to support virtually any AI application. In that regard, such systems are like very powerful programming languages, with complex data-structuring facilities. But this is definitely not the only approach, and in many respects its requirements are overly demanding. Given the kind of complexity results mentioned above, users of such powerful systems must be very careful in "programming" their KR tools: predicting when a computation will return is difficult or impossible in a very expressive logic. In many contexts (but not all, of course), it may be appropriate for a knowledge representation system to act in a more constrained fashion, rather like the database component of an application system. This is the point of view explicitly espoused in CLASSIC. Users cannot expect to program arbitrary computations in CLASSIC, but in return they get predictable response time and clear semantics. The burden of programming an application, such as a medical diagnostician, must be placed on some other component of the overall system. 
Since most KR systems attempt to be application-independent, it is ap- 1042 propriate for them not to be asked to provide general diagnostic, planning, or natural language-specific support. What is gained in return for certain limitations (and this in part accounts for the appeal of databases) is a system that is both complete with respect to an intuitive and simple semantic model and efficient to use. Failure to acknowledge this general issue has been a source of difficulty with knowledge representation systems in AI. KL-ONE, uniformly with its contemporary KR systems (and subsequently NIKL), never really took a stand as to the role it should play. This has resulted, for example, in a pair of recent critiques of NIKL [15, 30], for failing to live up to a promise it perhaps was never intended to make. With CLASSIC, on the other hand, we expect to provide a powerful database service, but with limited deductive and programming support. This is a unique kind of database service, as it is both deductive and object-oriented (see [5]). But nevertheless it is firmly limited. To use the CLASSIC system in the context of an expert system, for example, it would be appropriate to use it as a substitute for working memory in a rule-based programming system like OPS5, not for all computation to be done by the overall system. Several recent applications ([14], [29], [23], and others) have shown convincingly that this approach, while not satisfying all needs for all applications, is quite successful in important cases. 5 Perspective While CLASSIC is a "KL-ONE-like" system, it differs in so many ways from the original that it must be treated in its own right. While KL-ONE began the thinking on numerous key issues, it has taken us until CLASSIC to begin to truly understand many of them. Among its virtues, the CLASSIC Knowledge Representation System • isolates an important set of language constructs, distilled from many years of use of frame representations, and knits them together in an elegant, straightforward language with a compositional interpretation; novel language features include enumerated sets of individuals treated in a uniform manner with other concepts (ONE-OF), and limited generic equalities between role fillers (SAME-AS); • treats individuals in a more complete way than its predecessors, supporting propagation of facts and reclassification of individuals; • allows contingent universal rules that are automatically applied, with the affected individuals being reclassified and any derived facts being propagated; • offers tight, uniform integration of individuals from the host language, including numeric range concepts (MAX, MIN); • offers a facility for writing procedural 3-valued tests as primitive sufficiency conditions, and integrates such tests into the language and semantics in a clean way. 9 9CLASSIC also allows retraction of any asserted fact, with full dependency maintenance, but we have not had room to discuss this here. CLASSIC offers these facilities in the context of complete computation of subsumption, while remaining computationally tractable. The CLASSIC system can be thought of as a limited, deductive, object-oriented database management system as well as a knowledge representation system, and has been used to support several real-world applications. 1o . In this discussion, we have limited ourselves to considering the KL-ONE family and its contributions. 
Related work involving manipulation of types and their relations can be found in programming language research, in some semantic data modeling work, and in feature logics in support of (among other things) natural language processing. We do not have room to draw comparisons with this other work, but in general it is clear that the bulk of that work does not include classification and descriptionprocessing of the sort found so prevalently in KL-ONE-like systems. Recent work in some of these areas does bear a strong relationship to ours, but not by accident: work on KL-ONE and its descendants has had direct influence, for example, on LOGIN [lJ (a programming language), CANDIDE [2J (a DBMS), and feature logics [21J. There are still, of course, many open questions yet to challenge CLASSIC and its relatives. Technically, the notion of a "structural description," introduced by KLONE, has still not been treated adequately (although the SAME-AS construct provides a limited form of relationship between roles). And there are important computational questions to be answered so that CLASSIC can handle significant-sized databases, involving persistence of KB's, automatic loading of data from conventional DBMS's, and complex query processing. But perhaps chief among the remaining research questions is how exactly to cope with the tradeoff we are forced to make between expressive power and computational tractability. Is it even possible to provide the kind of knowledge representation and inference services demanded by AI applications in a computationally manageable way? The CLASSIC Knowledge Representation System has provided convincing evidence that this is possible at least for a limited set of applications, but it is but one point in a large space of possibilities that we are still mapping out, after more than a dozen years of research inspired by KL-ONE. References [1] Alt-Kaci, H., and Nasr, R. "LOGIN: A Logic Programming Language with Built-in Inheritance," Journal of Logic Programming, 3:187-215, 1986. [2] Beck, H. W., Gala, S. K., and Navathe, S. B. "Classification as a Query Processing Technique in the CANDIDE Data Model," Pmc. Fifth Intl. Conf. on Data Engineering, Los Angeles, 1989, pp. 572-58l. [3] Bobrow, D. G., and Winograd, T. A., "KRL, A Knowledge Representation Language," Cognitive Science 1(1), 1977, pp. 3-46. lOOne testimony to the success of CLASSIC'S clean and simple approach is the fact that a group from the University of Calgary has simply picked up a written description of the system and quickly implemented their own version as a c++ library to support their work in knowledge acquisition [16]. 1043 [4] Borgida, A., and Patel-Schneider, P. F., "A Semantics and Complete Algorithm for Subsurnption in the CLASSIC Description Logic," unpublished manuscript, AT&T Bell Laboratories, Murray Hill, NJ, 1992. Submitted for publication. [5] Borgida, A., Brachman, R. J., McGuinness, D. L., and Resnick, L. A., "CLASSIC: A Structural Data Model for Objects," Proc. 1989 ACM SIGMOD Intl. Conf. on Management of Data, Portland, Oregon, June, 1989, pp. 59-67. [6] Brachman, R. J., "What's in a Concept: Structural Foundations for Semantic Networks," Intl. Journal of Man-Machine Studies, 9(2), 1977, pp. 127-152. [7] Brachman, R. J., "A Structural Paradigm for Representing Knowledge," Ph.D. Thesis, Harvard University, Division of Engineering and Applied Physics, 1977. Revised as BBN Report No. 3605, Bolt Beranek and Newman, Inc., Cambridge, MA, May, 1978. [8] Brachman, R. 
J., "On the Epistemological Status of Semantic Networks." In Associative Networks: Representation and Use of Knowledge by Computers. N. V. Findler (ed.). New York: Academic Press, 1979, pp. 350. [9] Brachman, R. J., " 'I Lied about the Trees,' or, Defaults and Definitions in Knowledge Representation," AI Magazine, Vol. 6, No.3, Fall, 1985. [10] Brachman, R. J., and Levesque, H. J., "The Tractability of Subsurnption in Frame-Based Description Languages," Proc. AAAI-84, Austin, TX, August, 1984, pp.34-37. [11] Brachman, R. J., and Schmolze, J. G., "An Overview of the KL-ONE Knowledge Representation System," Cognitive Science, 9(2), April-June, 1985, pp. 171-216. [12] Brachman, R. J., Fikes, R. E., and Levesque, H. J., "Krypton: A Functional Approach to Knowledge Representation," IEEE Computer, Vol. 16, No. 10, October, 1983, pp. 67-73. [13] Brachman, R. J., McGuinness, D. L., Patel-Schneider, P. F., Resnick, L. Alperin, and Borgida, A. "Living with CLASSIC: How and When to Use a KL-ONElike Language." In Principles of Semantic Networks. J. Sowa (ed.). San Mateo, CA: Morgan Kaufmann, 1991, pp. 401-456. [14] Devanbu, P., Brachman, R. J., Selfridge, P. G. and Ballard, B. W., "LaSSIE: A Knowledge-Based Software Information System," CACM, Vol. 34, No.5, May, 1991, pp.34-49. [15] Doyle, J., and Patil, R. S., "Two Theses of Knowledge Representation: Language Restrictions, Taxonomic Classification, and the Utility of Representation Services," Artificial Intelligence, Vol. 48, No.3, April, 1991, pp. 261-297. [16] Gaines, B. R., "Empirical Investigation of Knowledge Representation Servers: Design Issues and Applications Experience with KRS," SIGART Bulletin, Vol. 2, No.3, pp. 45-56. [17] Kaczmarek, T. S., Bates, R., and Robins, G., "Recent Developments in NIKL," Proc. AAAI-86, Philadelphia, PA, 1986, pp. 978.,..985. [18] Levesque, H. J., and Brachman, R. J., "Expressiveness and Tractability in Knowledge Representation and Reasoning," Computational Intelligence, Vol. 3, No.2, Spring, 1987, pp. 78-93. [19] MacGregor, R. M., "A Deductive Pattern Matcher," Proc. AAAI-87, St. Paul, MN, pp. 403-408. [20] Minsky, M., "A Framework for Representing Knowledge." In The Psychology of Computer Vision. P. H. Winston (ed.). New York: McGraw-Hill Book Company, 1975, pp. 211-277. [21] Nebel, B., and Smolka, G., "Attributive Description Formalisms ... and the Rest ofthe World." In Text Understanding in LILOG. O. Herzog and C.-R. Rollinger (eds.). Berlin: Springer-Verlag, 1991, pp. 439-452. [22] Neches, R., Swartout, W. R., and Moore, J., "Enhanced Maintenance and Explanation of Expert Systems Through Explicit Models of Their Development," Proc. IEEE Workshop on Principles of KnowledgeBased Systems, Denver, CO, 1984, pp. 173-183. [23] Nonnenmann, U., and Eddy, J. K., "Knowledge-Based Functional Testing for Large Software Systems," Proc. FGCS-92, Intl. Con! on Fifth Generation Computer Systems, Tokyo, June, 1992. [24] Patel-Schneider, P. F., "Undecidability of Subsumption in NIKL," Artificial Intelligence, Vol. 39, No.2, June, 1989, pp. 263-272. [25] Patel-Schneider, P. F., "A Four-Valued Semantics for Terminological Logics," Artificial Intelligence, Vol. 38, No.3, April, 1989, pp. 319-35l. [26] Patel-Schneider, P. F., "Small can be Beautiful in Knowledge Representation," Proc. IEEE Workshop on Principles of Knowledge-Based Systems, Denver, CO, December, 1984, pp. 11-16. [27] Patel-Schneider, P. F., Brachman, R. J., and Levesque, H. J., "ARGON: Knowledge Representation meets Information Retrieval," Proc. First Con! 
on Artificial Intelligence Applications, Denver, CO, December, 1984, pp. 280-286. [28] Schmidt-Schauss, M., "Subsumption in KL-ONE is Undecidable," Proc. KR '89: The First Intl. Conf. on Principles of Knowledge Representation and Reasoning, Toronto, May, 1989, pp. 421-431. [29] Selfridge, P. G., "Knowledge Representation Support for a Software Information System," Proc. Seventh IEEE Con! on Artificial Intelligence Applications, Miami Beach, FL, February, 1991, pp. 134-140. [30] Smoliar, S. W., and Swartout, W., "A Report from the Frontiers of Knowledge Representation," Technical Report, USC Information Sciences Institute, Marina del Rey, CA, 1988. [31] Vilain, M., "The Restricted Language Architecture of a Hybrid Representation System," Proc. IJCAI-85, Los Angeles, 1985, pp. 547-55l. [32] von Luck, K., Nebel, B., Peltason, C., and Schmiedel, A., "The Anatomy of the BACK System," KIT Report 41, Technical University of Berlin, January, 1987. [33] Woods, W. A., "What's in a Link: Foundations for Semantic Networks." In Representation and Understanding: Studies in Cognitive Science. D. G. Bobrow and A. M. Collins (eds.). New York: Academic Press, 1975, pp. 35-82. [34] Woods, W. A., and Schmolze, J. G., "The KL-ONE Family," to appear in Computers and Mathematics with Applications, Special Issue on Semantic Networks in Artificial Intelligence. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by rCOT. © ICOT, 1992 1044 Morphe: A Constraint-Based Object-Oriented Language Supporting Situated Knowledge Shigeru Watari, Yasuaki Honda, and Mario Tokoro* e-mail: {watari.honda.mario}@csl.sony.co.jp Sony Computer Science Laboratory Inc. 3-14-13 Higashi-Gotanda, Shinagawa-ku, Tokyo 141, Japan Abstract This article introduces Morphe, a programming language aimed to support construction of open systems. In open systems, the programmer cannot completely anticipate the future use of his programs as components of new environments. When independently developed systems are integrated into an open system, we eventually have inconsistent representations of the same object. This is because knowledge about the world is partial and relative to a perspective. We show how Morphe treats relative (and eventually inconsistent) knowledge by incorporating the notions of situations and perspectives. 1 Introduction In modeling complex systems, one is often required to work with mUltiple representations of some aspects of reality. The notion of situation has been studied in computer science [Barwise 83][Barwise 89][Cooper 90] as an important concept in capturing the relative representation of knowledge about the world. The importance of such a notion stems from the epistemological assumption that any representation of the world is partial and relative to some perspective-that of the observer. In the cognitive process, the observer abstracts from reality only those aspects that he finds relevant; irrelevant portions are discarded. Sometimes this limited, abstracted representation is sufficient to allow one to perform certain tasks. In such cases we do not Heed to think about relative perspectives, and we can work as though our knowledge were an absolute and unique mapping of the real world. However, there are plenty of examples that show this is not true. In order to understand what is happening in the target world, we are forced to assume * Also with Keio Uuiversity. 3-14-1 Hiyoshi, Kohoku-ku. 
Yokohama, 223 JAPAN e-mail: lllariocQlkeio.ac.jp that the representation we are working with is relative, and furthermore, that we must eventually change perspectives in order to capture the real properties of the system we are representing. This is often the case when we have ambiguous representations and we are not able to resolve this ambiguity until we have some further information at hand. Typically ambiguity arises when we try to combine information from different sources. For example, in dialogue understanding the knowledge of the one must be combined with the knowledge of the other to capture the exact meaning of an utterance [Numaoka 90]. Whenever there is some inconsistent information, the speakers must exchange further information in order to resolve the inconsistency. Other examples can be seen in multi-agent systems [Bond 88][Osawa 9I]-where we have different agents with different knowledge bases that must be partially shared-and versioning systems as used in software development tools and engineering databases [Katz 90]-where we have different versions of the same object. A ground for extensive use of the notion of situation is in open systems [Hewitt 84], because in open systems the designer of a program cannot know a priori the nature of the environments in which their pieces of knowledge (called objects henceforth) will be used in the future. Along with its continuous evolution, an open system must be capable of integrating pieces of knowledge from different sources, and eventually these new pieces will conflict with existing ones. In this paper we formalize the notion of situation as embedded in Morphe, a knowledge base and programming system which supports construction of open systems. Situation in Morphe is associated with a general notion of environment of interpretation. It represents a consistent set of properties (described by formulas) in a multi-version knowledge base. Rather than being a mere name for a part of the absolute real world, a situation has its own representation in Morphe, namely a routed, 1045 directed, acyclic, and colored graph. The notion of situation provides for two novel concepts: compositional adaptation and situated polymorphic objects. With compositional adaptation component objects are grouped within composite objects so that a component object is made to adapt to the requirements of the environment represented by the composite object. Situated polymorphic objects are objects that have multiple representations which depend on the situation they are used in. Situation is used to disambiguate the ambivalent interpretation of situated polymorphic objects. The remainder of this paper is organized as follows: Section 2 gives an overview of Morphe's features through some examples. Morphe's formal syntax and semantics are sketched in Section 3 and Section 4, respectively. In this work we concentrate on the data modeling aspect of Morphe. Some important features (such as setvalued attributes, distinction between local and sharable attributes, user-defined constraints, and dynamic generation of new situations at update transactions) were not treated in the presentation for the sake of brevity and clarity. In Section 4 we give emphasis in showing how the domain of colored dags fits well to representing different perspectives to a shared object. In the last section we conclude this work. in that in Morphe even in a single knowledge base version we can have different object versions. 
The programmer can chose a particular version of the knowledge base through situation descriptors-formulas that index terms-which can be used within programs or in queries. In the development phase of a system, Morphe keeps track of transaction updates and creates consistent versions of the knowledge base. 1 2.1 We will represent Sony CSL, a computer science laboratory, where Mario works as a director. We know that a representation of Mario already exists in the system and we want to share that representation. The existing representation is of Mario as a professor at an university. 1. person: [ 2. 3. 2 Overview of Morphe Morphe is a programming language which integrates object-oriented programming, constraint-based logic programming, and situated programming. It features: • Querying capability for knowledge bases, • Incremental construction of systems with inheritance and adaptive reuse of existent software, • Multiple representations, • Treatment of inconsistent knowledge through the notion of situation. The basic aim of Morphe is to provide a system that supports easy construction of open information systems. There are two areas of support that are essential: 1. Easy integration of new pieces of knowledge, and 2. Treatment of shared inconsistent knowledge. The Morphe system is a multi-version knowledge base with multi-versioned objects. We use the term multiversion knowledge base following the notion of multiversion databases as introduced by Cellary and Jomier in [Cellary 90]. Our approach differs from Cellary-Jomier's Example: Mario Joins Sony CSL 4. 5. name : string; age : integer; sex : {male, female}; age ~ 0]; laboratory : [ name : string; director : person; researcher :: person]; mario: person * [ name : "Mario"; age: 44]; scsI : laboratory * [ name : "Sony CSL"; director : person * [ machine: "NEWS"]]; scsl.director = mario. The first two expressions define the types for person and laboratory, and expressions 3 and 4 define mario and scsI as "instances" of person and laboratory, respectively. Expression 5 makes mario join scsI as its director. Objects in Morphe are typed. For example, the expressions name: string and age: integer specify that the name of a person has type string and the age of a person has type integer. String and integer are primitive types provided in Morphe. The colon in those expressions represents a built-in predicate that specifies the type of the term on its left-hand side. Another builtin predicate is the one represented by the equal sign, as in director = mario, which specifies that director IThe operational aspects of manipulating situations are not emphasized in this work. Instead we will emphasize the declarative (or modeling) aspects of objects and situations. 1046 and mario should have the same type. Expressions comprising these built-in predicates are called formulas or constraints. 2 We can also construct complex types from primitive ones through object descriptors. An object descriptor is a set of formulas enclosed in brackets ("[]"). In the example, the expression person: [.. :] introduces a new type named person defined by the object descriptor on the right hand side of the colon. As in unification grammar formalisms [Shieber 86] and some logic based programming languages [Kifer 89] [Yokota 92], Morphe does riot make a distinction between classes and instances. 
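To make the reading of descriptors and the colon predicate concrete, the following is a small sketch in plain Prolog rather than in Morphe itself; the encoding of descriptors as lists of Label-Value pairs and the names desc/2, primitive/2 and satisfies/2 are assumptions of the sketch (and the person type is cut down to two attributes), not part of the language.

% A toy Prolog rendering of object descriptors as lists of Label-Value
% pairs; satisfies/2 is a crude stand-in for Morphe's ":" predicate.
:- use_module(library(lists)).

desc(person, [name-string, age-integer]).   % cut-down person type
desc(mario,  [name-'Mario', age-44]).

% primitive(Type, Value): Value belongs to a primitive type.
primitive(string,  V) :- atom(V).
primitive(integer, V) :- integer(V).

% satisfies(Obj, Type): every constraint in the type descriptor is met
% by the object descriptor.
satisfies(Obj, Type) :-
    desc(Obj, ObjD),
    desc(Type, TypeD),
    forall(member(L-TV, TypeD),
           ( member(L-OV, ObjD),
             ( primitive(TV, OV) ; OV == TV ; satisfies(OV, TV) )
           )).

Under this encoding the query ?- satisfies(mario, person). succeeds, which is the sense in which the term mario, being described by a more restrictive set of formulas, has type person.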
Strictly speaking, every expression in Morphe is a type expression, and the execution of a Morphe program consists of finding the appropriate types for the variables, or in other words, solving the set of type constraints. Morphe provides domain specific constraint solvers and allows users to define predicates for new domains, as the predicate ;::: in the expression age ;::: o. In this article we concentrate on showing how Morphe treats the notions of situations and polymorphic objects, leaving the discussion of other forms of constraints for another paper. Expressions using the colon predicate resemble attribute-value pairs of feature structure grammars and hence we sometimes refer, though improperly, to terms on the left-hand side of the colon operator as attributes and those on the right-hand side as values. Besides object descriptors, there is another type of constructor: braces ("{}"). While object descriptors construct types intensionally, from formulas, braces construct types extensionally, from terms. For exam pIe, the expression sex: {male, female} specifies that the attribute sex of a person has type male or female. Stated in another way, the same expression defines a new type person. sex as a set of two constant types {male, female}. A type can be made more and more specific as we add more restrictive constraints (formulas) into the associated object descriptor, and it becomes an "instance" when all the attributes are assigned constant types. In the code above, scsI is an instance of laboratory because the formulas in the object descriptor of the former are more restrictive than those in the object descriptor of the latter. Because all terms are types, even scsI, which is an "instance", can be made more specific by adding more formulas into its object descriptor. The way to do so is by composing object descrip2The term "constraint" used here follows the terminology of constraint logic programming framework as formalized by Jaffar and Lassez in [Jaffar 87J. tors through C C *' ,, the composition operator. The code which defines mario composes the type person with the object descriptor [name: mario j age: 44]. The resulting object descriptor contains all the formulas of both operand types. The constraint solver then evaluates the most specific set of formulas in the resulting object descript or , yielding [name: "Mario" j age: 44] as the type of mario. Determining the most specific sets of formulas is the same as determining the greatest lower bound of a set of terms. The associated procedure for determining the greatest lower bound is called unification, following the terminology of feature-structure grammar formalisms [Shieber 86]. 2.2 Compositional Adaptation With composition we can refine a type by giving more specific "values" for the attributes-as in mario aboveor we can add new properties to an existing type. The type laboratory. director in the example is defined as a person plus an additional attribute: machine. Morphe allows for creating new types in a very particular way. The type director is defined in a specific context: scsI. This is an essential aspect of what we call compositional adaptation[Honda 92]. With compositional adaptation we make an object "adapt" to a new environment by transforming the object so that it obeys the type constraints specified in the environment. This process takes place when the predicate C C =" is evaluated. When the expression director = mario is evaluated, it either succeeds or fails. If it succeeds, the object denoted by scsI. 
director is unified with the object denoted by mario, and the result of the unification can be accessed from both scsl.director and mario. 3 The object enters a new environment "acquiring" new properties and constraints. In the example, mario acquires the additional attribute machine as specified in the environment scsI, and scsI. director acquires all the original properties of mario. 2.3 Situated Polymorphic Objects In programming languages, the term polymorphism has been traditionally associated with the capability of giving different things the same name. Morphe's notion of polymorphism follows in the same vein. In Morphe the 3The full version of Morphe allows programmers to specify which components of the type are private (i.e., local) and which are public (i.e., sharable). The public part of two objects must be compatible for the unification to succeed, while the private part is not affected in the unification. 1047 same object can have different versions, eventually incompatible with each other. Incompatible versions of an object are called morphes, and objects that have multiple morphes are called polymorphic objects. By incompatibility of morphes we mean incompatibility of their types. 4 Different morphes of the same (polymorphic) object may fundamentally mean two things: 1) different states due to updates, or 2) different representations due to different perspectives. Each morphe of a polymorphic object is situated. The evaluation of a polymorphic object is the evaluation of a morphe, the selection of which is subordinated to the selection of a situation where the object participates. Each morphe is a consistent set of constraints that describe the behavior of the object in a given situation. For instance, a person may exhibit different and eventually contradictory behavior depending on the situation in which he acts. Inconsistent sets of constraints yield different values to be assigned to the same attribute. For example, suppose that the definition of mario, instead of that given in expression 3, had been: mario: person * [name: ' 'Mario' '; birthyear : 1947; sex : male; machine: ' 'Mac' ']; After mario joins scsi, the attribute machine of mario is assigned the value' 'News' ) when he plays his role as scsi. director and a different value-' 'Mac) )- in other situations. 2.4 Specifying a Situation Morphe's notion of situation is tied to the notion of environment of interpretation. In the domain of interpretation, a situation is a graph representing the program being interpreted. Situations are used to disambiguate inconsistencies in the knowledge base. When an object participates in different environments (eventually created by independent programs) and is subject to independent transformations, it is often the case that the object must behave differently in each of them. Once the programmer wants a different view (or representation) for the object, the system creates a new version of the object in such a way that the situation is kept consistent. When evaluating an expression within a situation, the system keeps track of the path through which the object containing that expression is being accessed. Access to an object from different perspectives is realized as different paths to the object. A path is a sequence of labels that allows one to navigate through the entire system, 4Informally, incompatible types means that the values of a type cannot be the values of the other. We give a formal definition of type incompatibility in the next section. along the arcs in the graph. 
For example, if we want to refer to Mario when he plays his role of a director at SCSL we use the path scsi. director. Paths can be combined with formulas which filters the morphes of an object referred from the same path. For example, if we had several versions of Mario distinguished according to his age, we could access the representation of Mario at Sony CSL when he was at the age of 40 by using the expression: scsl.directorCO[age = 40] . We can also change the perspective by switching the path in the navigation. For example, we can switch the view from mario to scsl.director with the path mario i scsl.director, which gives us the representation of mario from scsi. director's perspective. 3 Syntax The alphabet of Morphe consists of: 1) A: a set of atoms, 2) L: a set of labels, 3) X: an infinite set of variables, 4) the distinguished predicate symbols: ":" (colon) and "=" (equal), 5) the composition operator "*", 6) the logical connective "j", 7) the path constructors: ".", "j", and "@"j 8) the auxiliary symbols "( )", "[ 1", "{ }", ",", and "" Atoms denote primitive indivisible objects. Example atoms are: integer, string, 3, and "Mary'). Labels are the names of the objects. The distinguished label Home denotes the topmost object in a particular situation. 5 In the semantic domain, the label names an arc which allows access to the objects down the (directed) graph. 3.1 Terms (7) Objects are denoted by terms. Terms are defined by: 7 ::= x I a I p I [Ill 7 *7 where x are variables, a are atoms, p are paths, I are formulas, and 7 * 7 are compositions. The terms of the form [fl are called object descriptors. Object descriptors construct complex objects through formulas, which are defined by: I ::= p : 7 I7 = 71 Ij I A colon predicate is a typing constraint. An expression e : t, where e is a path and t is a term, specifies that the 5 Typically, the object denoted by Home represents the user's "home object", which' is the user's entry-point into the Morphe system. 1048 type of the object denoted by e has at least the properties defined by t. For example, the formula mario: person specifies that the mario has at least the properties specified by person. The equal predicate specifies object sharing. Given el : tl and e2 : t 2, where el and e2 are paths, the expression el = e2 states that el and e2 denote the same object, and hence they have equal types. The shared object is "viewed" from different perspectives: any change to the object performed from a perspective must be reflected into other perspectives. Because the atomic predicates colon (":") and equal ("=") impose a structure on the objects in the domain of interpretation (Le., graphs), they are called structural predicates, in contrast to other domain predicates and user defined predicates. In this article we discuss only the structural predicates and hence we call them simply predicates. A path names an object through a sequence of labels. Paths are defined by: p ::= lll.p I pip I p@[J] where l are labels. When an object is polymorphic due to different access paths, we select a morphe by the associated path. For example, in the subsystem: a: [b: [c: X]id: [c: Y]ia.b = a.d] the polymorphic value of c can be disambiguated through the appropriate path: a.b.c : x, and a.d.c : y. A path of the form PI i P2 is a path switch. It allows one to view the same object from a different perspective. For example, the value of a.b i d.c is y, instead of x. A path of the form p@[J] is called a conditional path. 
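As an illustration only, plain (unconditional) path navigation can be pictured as walking labelled arcs from a root node. In the Prolog sketch below the predicates arc/3 and follow/3 and the node names are invented for the illustration; the graph mirrors the a.b.c / a.d.c example above, and sharing (a.b = a.d), morphe selection and conditional paths are not modelled.

% Path navigation as a walk over labelled arcs (illustrative only).
arc(home, a, na).
arc(na, b, nb).
arc(na, d, nd).
arc(nb, c, x).
arc(nd, c, y).

% follow(From, Labels, To): follow the arcs named by Labels in order.
follow(Node, [], Node).
follow(Node, [L|Ls], Target) :-
    arc(Node, L, Next),
    follow(Next, Ls, Target).

The queries ?- follow(home, [a,b,c], V). and ?- follow(home, [a,d,c], V). return x and y respectively, mirroring the way a path selects among the values of c.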
The formula enclosed in brackets on the right hand side of the @ sign is called a situation descriptor, because it specifies a family of situations which entail !. A conditional path has a meaning only in the situations where the formula enclosed in the brackets is entailed. For notational convenience we write 1: {tl@[fl], t2@[f2]} instead of 1@[f 1 ] : tli 1@[f 2] : t2' Conditional paths are used to select version morphes of polymorphic objects. For example, given Composition is a binary operation TxT --+ T which composes two terms to produce a new term. Given two terms tl and t2, their composition h * t2 is the union of the formulas contained in both terms. For example, [name: "John"i age: "integer"] * [age: 23] == [name: "John"; age: integer; age: 23]. 3.2 Ordering on Terms We have seen that terms denote objects in the intended domain, and formulas associate terms in order to represent complex structures in that domain. The colon operator specifies the structure of the object denoted by a given path. We can now amplify its use as a binary predicate over two terms to construct a partial ordering in the set of terms. We start with atoms. We assume that the atoms in A are partially ordered according to a binary relation represented by "~A'" For example: "Mary" ~A string, and 3 ~A integer. If x ~A y and y ~A X we say that x and yare congruent, and write x ~ A y. The greatest lower bound of a set of elements B C A, denoted by 1 B is defined as usual: 1 B = in! E A such that Vx E B. in! ~A x. For notational convenience, we will denote the greatest lower bound of two atoms x and y by x 1 y. The greatest lower bound does not always exist. The elements c of A such that x : c implies x ~ A C are called the constants of A. We extend the partial ordering to the set of terms with the binary relation ":", defined by the rules below. In these rules, r is a set of formulas which defines a situation. r f- x : y (if x, yEA and x S:A y) rf-t:O r, (e : t) f- e :t r f- e@[4>] : t r,4>f-e:t a: [b: {X,Y}iC: {w@[b: x],v@[b: y]}] where a, b, and c are labels and x, y, w, and v are atoms, there are two possible values of a. c, which depend on the possible values of b. The formulas b : Xi c : w) and b : Yi c : v determine two distinct situations of a. The value of a.c can be disambiguated by providing an appropriate conditional path: a.b.c@[b: x] : w, and a.b.c@[b : y] : v. r f- tl : t~ ... r f- tn : t~ r f- [EI: tlj ... jln: tnj ... jlm: tmJ: [EI : t~j ... jln: t~J rf-t:t r f- tl : t2 r f- t2 : t3 r f- tl : t3 1049 The congruence relation on the set of terms is defined by: x ~ y iff x : y and y : x. The operation! that gives the greatest lower bound of a set of atoms is also extended to terms. The rules below describe U, the greatest lower bound of two terms, defined so that t1 U t2 : h and t1 U t2 : t2. []ut~t xUy~xly [l : t] U [l : t'] [1 : (t u t')] ~ [h : t1] U [b : t2] ~ [h : t1j 12 : t2] [h : t1j ... ;In : tn;li : ti; ... ;I~ : t~] U [11 : t~j ... ;In : t~jl{ : t{; ... j Ii : ti] ~ [h : t1 U t~ j ••• j In : tn U t~j Ii : ti; ... j l~ : t~jh :t~j ... jIn :t~jl{ :tL···jl{ :4] t1 U t2 ~ t2 U t1 t1 U (t2 u t3) tut ~ ~ 4.2 Definition: Graph Morphism A graph morphism I : 9 IN : Ng ---t Ngl and IA : Ag g' is a pair of functions Agl such that: and I A preserve the incidence relations: srcUA(a)) IN(src(a)) and tgtUA(a)) IN(tgt(a)), U t3 t U t2 does not Semantics The formal semantics of Morphe is based on the algebraic approach to graph grammars as described in [Ehrig 86] and [Ehrig 90]. 
The domain of interpretation of Morphe is a set of colored, rooted, directed, and acyclic graphs. Following [ParisiPresicce 86]6, we impose a structur~ in the coloring alphabet in order to represent unification in that domain. 2. I A preserve the arc colors: Va E Ag. color:'UA(a)) = color:(a), and 3. Vx ENg. cOlor:'UN(X)) ~ color:(x). A graph morphism indicates the occurrence of a graph within another graph. A graph morphism I = UN, I A) is called injective if both IN and I A are injective mappings, and it is called surjective if both IN and I A are surjective. If I : 9 ---t g' is injective and surjective it is called an isomorphism, and there is also an inverse isomorphism 1-1 : g' ---t g. In this case we say that 9 and g' are congruent and write 9 ~g g'. 4.3 4.1 ---t ---t 1. IN (t1 u t2) Two terms t1 and t2 are incompatible iff h exist. 4 color: : Ag ---t CA associates a color to each arc; srcg : Ag ---t N g associates with each arc a unique source node; tgt g : Ag ---t Ng associates with each arc a unique target node; rootg is a distinguished node called the root of the graph. It satisfies: tgt- 1(root g) = 0. In what follows we refer to C-dags as graphs. A graph 9 is a subgraph of g' (written g ~g g') iff N g ~ Ngl, Ag ~ Agl, and the functions color:, color:, srcg, and tgtg are the restrictions of the corresponding mappings of g'. Definition: Colored Graphs Let X be an infinite set of variables, A the set of atoms, L the set oflabels (as introduced in Section 3), and 0 a set of identifiers. Let C = (CN, CA) be a pair of alphabets where CN = OuAuX and C A = L. The partial-order in A, SA, is extended on C N (and denoted SN) such that x SN y iff X SA y or y EX. A C-colored graph (or C-dag, for short) is a graph 9 over C defined as a tuple where: Ng is the set of nodes; Ag is the set of arcs; color: : Ng ---t CN associates a color to each node; 6F. Parisi-Presicce, H. Ehrig, and U. Montanari allowed variables in graphs (and productions) so that they could represent composition of graphs using relative unification. A. Corradini, U. Montanari, F. Rossi, H. Ehrig, and M. Lowe [Corradini 90] further extended that work to represent general logic programs with hypergraphs and graph productions. Subsumption Subsumption is an ordering on graphs which corresponds to the relative specifity of their structures. A graph 9 subsumes h (h ~9 g) iff there exists a graph-morphism I : 9 ---t h such that I(root g ) = rooth. The semantic counterpart of the greatest upper bound of a set of terms (ref. Section 3.2) is the join of two graphs, which is their "most general unifier". The join of graphs gl and g2 (notated gl Uc g2) is a graph h such that h ~9 gl and h ~9 g2· 4.4 Semantic Structure The semantic structure of Morphe is a tuple A =< 9*, ~g, Ug, T > where: 1. 9*, the domain of interpretation, is the set of all variable-free (Le., ground) C-dags. 1050 2. The relation above. ~g and the operation Ug are as defined 3. Top (T) is the distinguished element of g* defined by: V9 E g*. 9 ~g T. 4.5 Interpretation A consistent set of formulas is represented with a Cdag with variables. The C-dag representation of a set of formulas is called a situation. A Morphe program is mapped by the interpreter into a set of situations which are ordered according to the subsumption relation. The evaluation of a query is a mapping from the C-dag representing the query to the set of situations in the hierarchy. If no situation is specified, the interpreter evaluates in a default situation. 
While parsing its input, the interpreter keeps track of this situation in order to resolve eventual ambiguities. Let Io: : A ~ eN be a function that maps each atom in A to a node color in CN, and I>.. : L ~ CA another function that maps each label to an arc color in CA' Variable Assignment A variable assignment in a situation s is a mapping fJ, : X ~ g* which maps variables to ground C-dags. We extend the variable assignment to other terms with the following clauses: • If a is an atom, fJ,(s,a) = 9 s.t. N g = {x},Ag = 0, and colorN(x) = Io:(a). • If 1 is a label, fJ,( s, l) 3a EA.,. colorA (a) roots and tgt(a) = rootg. 9 ~g s s.t. I>..(l) and src(a) • If 1is a label, and e is a path, fJ,(s, l.e) = fJ,(fJ,(s, l), e). • If e is a path and ¢ is a formula, fJ,( s, e@[¢]) = fJ,( s, e) if s 1= ¢. • fJ,(s, [4>]) = 9 ~g s s.t. 9 1= 4>. Formulas The "truthness" of a formula is relative to a specific situation. We say that a situation s models a formula ¢ under a variable assignment fJ, (written s 1=1' ¢) iff there is a subgrapll of s with the properties specified by the formula. 1=1' e : t iff fJ,(s, e) ~g fJ,(s, t). s 1=1' el = e2 iff fJ,(s, el) ~g. fJ,(s, e2). s 1=1' ¢j'IjJ iff S 1=1' ¢ and s 1=1' 'IjJ 5 Conclusion This paper has shown how the notions of situation and polymorphic objects in Morphe can handle situated knowledge in open systems. We claim that the Morphe features shown here are suited to support incremental development of a complex system. When a set of constraints is added to a situation, the new formulas may conflict with the old ones. Morphe helps the developer to find the locus of inconsistency, and in the cases where the programmer wants a new version of the system, Morphe splits the inconsistent situation into new subsituations whenever it is possible. Some meta-rules based on domain-dependent heuristics may help the system to decide on which actions to take in the presence of conflict. Syntactically, a situation was defined as a set of fOfllmlas which define a hierarchy of versions of the knowledge' base. Situation descriptors can be used in programs in order to specify a priori the family of situations in which the program is expected to work. Once the system is provided with a way to determine the right situation, the associated morphe can be selected and then passed to the constraint solver in order to proceed with the evaluation of the program or the query. Most existing typed programming languages impose a distinction between types and values syntactically, and types are usually associated with the variables in order to check whether the value assigned to a variable is compatible with the associated type. Morphe does not impose such a distinction at the syntactic level, though it bears both the notions of "types" and "values". An equal treatment of types and values was achieved in Morphe by imposing a partial order on the set of terms. This partial ordering was identified as the subsumption relation over directed acyclic graphs in the domain of interpretation. In this work we have shown only those features that we find most interesting to capture the intuitive notion of relative knowledge, perspective, and situations. Problems concerning changes of situations in the presence of transaction updates, locality of information and sharing (Le., unification), database querying facilities, and the operational semantics were not treated here. 
We hope however that the contents of this article have given the readers an insight on the problems and solutions concerning relative representations of objects in open systems. Acknowledgments S Sony Computer Science Laboratory has been a privileged environment for discussing the problems and requirements of open distributed systems. Discussions with the 1051 other members of this laboratory have provided the underlying motivations for developing Morphe.. In particular, we wish to thank Ei-Ichi Osawa, for his collaboration at the initial phase of Morphe, and Akikazu Takeuchi and Chisato Numaoka for their helpful comments on the formalisms presented in this work. Watari thanks the members of Next-Generation Database Working Group promoted by ICOT. Discussions in the group promoted a better understanding of the requirements for advanced data base programming languages. References [Barwise 83] Jon Barwise and John Perry. and Attitudes. The MIT Press, 1983. Situations [Barwise 89] Jon Barwise. The Situation in Logic. Center for the Study of Language and Information, 1989. [Hewitt 84] Carl Hewitt and Peter de Jong. Open Systems. In J. Mylopoulos and J. W. Schmidt M. L. Brodie, editors, On Conceptual Modeling, Springer-Verlag, 1984. [Honda 92] Yasuaki Honda, Shigeru Wat ari , and Mario Tokoro. Compositional Adaptation: A New Method for Constructing Software for Open-ended Systems. JSSST Computer Software, Vo1.9, No.2, March 1992. [Jaffar 87] Joxan Jaffar and Jean-Louis Lassez. Constraint Logic Programming. In Proceedings of the Fourteenth ACM Symposium of the Principles of Programming Languages (POPL'87), January 1987. [Katz 90] Randy H. Katz. Toward a Unified Framework for Version Modeling in Engineering Databases. ACM Computing Surveys, Vo1.22, No.4, December 1990. [Bond 88] Alan H. Bond and Les Gasser, editors. Readings in Distributed Artificial Intelligence. Morgan Kaufmann, 1988. [Kifer 89] Michael Kifer and Georg Lausen. F-Logic: A Higher-Order Language for Reasoning about Objects, Inheritance, and Scheme. In Proceedings of the ACM SIGMOD Conference on Management of Data, ACM, 1989. [Cellary 90] Wojciech Cellary and Genevieve Jomier. Consistency of Versions in Object-Oriented Databases. In Dennis McLeod, Ron Sacks-Davis, and Hans Schek, editors, Proceedings of 16th International Conference on Very Large Databases, August 1990. [Numaoka 90] Chisato Numaoka and Mario Tokoro. Conversation Among Situated Agents. In Proceedings of the Tenth International Workshop on Distributed Artificial Intelligence, October 1990. [Cooper 90] Robin Cooper, Kuniaki Mukai, and John Perry, editors. Situation Theory and its Applications - Volume 1. Center for the Study of Language and Information, 1990. [Corradini 90] Andrea Corradini, Ugo Montanari, Francesca Rossi, Hartmut Ehrig, and Michael Lowe. Graph Grammars and Logic Programming. In Proc. of the 4th International Workshop on Graph-Grammars and Their Application to Computer Science, SpringerVerlag, March 1990. [Ehrig 86] Hartmut Ehrig. Tutorial Introduction to the Algebraic Approach of Graph Grammars. In Proc. of the 3rd International Workshop on Graph- Grammars and Their Application to Computer Science, SpringerVerlag, December 1986. [Ehrig 90] Hartmut Ehrig and Michael Lowe Martin Korff. 'llitorial Introduction to the Algebraic Approach of Graph Grammars Based on Double and Single Pushouts. In Proc. of the 4th International Workshop on Graph-Grammars and Their Application to Computer Science, Springer-Verlag, March 1990. 
[Osawa 91] Ei-Ichi Osawa and Mario Tokoro. Collaborative Plan Construction for Multiagent Mutual Planning. Technical Report SCSL-TR-91-008, Sony Computer Science Laboratory, August 1991. [ParisiPresicce 86] Francesco Parisi-Presicce, Hartmut Ehrig,and Ugo Montanari. Graph Rewriting with Unification and Composition. In Proc. of the 3rd International Workshop on Graph- Grammars and Their Application to Computer Science, Springer-Verlag, December 1986. [Shieber 86] Stuart M. Shieber. An Introduction to Unification-Based Approaches to Grammar. Center for the Study of Language and Information, 1986. [Yokota 92] Kazumasa Yokota and Hideki Yasukawa. Towards an Integrated Knowledge Base Management System. In Proceedings of the FGCS'92, ICOT, June 1992. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 1052 On the Evolution of 'Objects in a Logic Programming Framework F. Nihan Kesim Marek Sergot Department of Computing, Imperial College 180 Queens Gate, London SW7 2BZ, UK fnk@doc.ic.ac.uk, mjs@doc.ic.ac.uk Abstract The event calculus is a general approach to the representation of time and change in a logic programming framework. We present here a variant which maintains a historical database of changing objects. We begin by considering changes to the internal state of an object, and the creation and deletion of objects. We then present separately the modifications that are necessary to support the mutation of objects, that is to say, allowing objects to change class and internal structure without loss of identity. The aims are twofold: to present the modified event calculus and comment on its relative merits compared with the standard versions; and to raise some general issues about object-orientation in databases which do not come to light if dynamic aspects are ignored. 1 Introduction There has been considerable research on combining logic-based and object-oriented systems, and reasoning with complex objects. Many proposals have been put forward for incorporating features of objectoriented systems into logic programming and deductive databases [Abiteboul and Grumbach 1988, Zaniolo 1985, Chen and Warren 1989, Kifer and Lausen 1989, Dalal and Gangopadhyay 1989, Maier 1986, Bancilhon and Khoshafian 1986]. But opinions vary widely as to what are the characteristic and beneficial features of objects and comparatively little attention has been given to the dynamic aspects of objects. Yet change in internal state of an object as it evolves over time is often seen as a characteristic feature of object-oriented programming; and the ability of object-oriented representations to cope gracefully with change has often been cited as a major advantage of this style of representation. It is these dynamic aspects that we wish to address in this paper. We are not concerned with object-oriented programming, but with object-oriented representation of data in (deductive) databases. We address such problems as how objects change state, how deletion and creation of objects can be described and how an evolving object can change its class over time. In order to avoid the discussion of destructive assignment, we formulate change in the context of a historical database which stores all past states of objects in the database. Historical databases are logically simpler than snapshot databases because change is then simply addition of new input. 
A snapshot of the historical database at any given time is an objectoriented database in the sense that it supports an object-based data model. In this paper we present an object-based variant of the event calculus [Kowalski and Sergot 1986] which is a general approach to the treatment of time and change within a logic programming framework. We use this modified event calculus to describe changes to objects. The objectives of this paper are twofold: to present the object-based variant of the event calculus; and to raise some general issues about objectorientation in databases that we believe do not come to light if dynamic aspects are ignored. These more general points are touched upon in the course of the presentation, and identified explicitly in the concluding section. In the following section we give a brief summary of the original event calculus. Section 3 presents the basic data model that is supported by the objectbased variant. In section 4 we present this objectbased variant and discuss how it can be applied to describe changes in objects. In section 5 we address the mutation of objects, where objects are allowed to change their classes during their evolution. We conclude the paper by summarising, and making some remarks about object-based representations in general. 1053 2 The Event Calculus holds_at(R, T) ;happens(Ev, Ts), Ts ::; T, initiates(Ev, R), . not broken(R, Ts, T). We have omitted the clauses for holds_for which are similar. The interpretation of not as negation by failure in the last condition for holds_at gives a form of default persistence: property R is assumed to hold at all times after its initiation by event Ev unless there is information to the contrary. The event calculus has been developed and ex.tended in various different ways (see for instance [Sripada 1991, Eshghi 1988]). But what is important for present purposes is to stress that the underlying data model in all of these applications is relational. The properties that events initiate and terminate are facts like rank(jim,professor). In database terms they are tuples of relations; in logic programming terms they are ground unit clauses or ground atoms or standard first order terms, depending on what is taken as the semantics of holds_at. A snapshot of the historical database at any given time is a relational database. In this paper we modify the event calculus in order to describe changes to a database which supports an object-oriented data model. Before moving on to present this modification, we wish to make one further remark about the representation of events. One of the most common motivations for introducing object-oriented extensions to logic programming languages [Chen and Warren 1989, Ait-Kaci and Nasr 1986, M. KiferandWu 1990] is to overcome the restrictions h:nposed by the fixed arity of predicates and functors. These restrictions are particularly evident in the representation of events: Jim was promoted to professor in 1989, Jim was promoted from lecturer, Jim was promoted by his department in 1989 could all be descriptions of the same promotion recording different amounts of information about the event. In general, it is difficult or impossible to devise a fixed arity representation for events, because these representations cannot cope gracefully with the range of descriptions that can be expected even for events of the same type. (The philosopher Kenny refers to this phenomenon as the 'variable polyadicity' of events.) 
The standard way of dealing with 'variable polyadicity' is to employ binary predicates. Thus [Kowalski and Sergot 1986] represents events in the following style: event(e1). act(e1, promote). object(e1, jim). newrank(e1, prof). happens(e1,1989) . broken(R, Ts, T) ;happens(Ev* , T*), Ts < T* ::; T, terminates(Ev*, R). Chen and Warren [Chen and Warren 1989] have developed this usage of binary predicates and have given it a formal basis. Their language C-Iogic allows the use of structured terms which can be decomposed into subparts. These terms are record-like tuples with The primitives of the event calculus are events together with some kind of temporal ordering on them, periods of time, and properties which are the facts and relationships that change over time. Events initiate and terminate periods of time for which properties hold. The effects of each type of event are described by specifying which properties they initiate and terminate. Given a set of events and the times at which they occurred, the event calculus derives (computes) which facts hold at which times. As an example, consider a fragment of a departmental database. An event of type promote(X, New) initiates a period of time for which employee X holds rank New and terminates whatever rank X held at the time of the promotion: initiates(promote(X,New), rank(X,New)). terminates(promote(X, New), rank(X,_)). Given a fragment of data: happens(promote(jim, assistant), 1986). happens(promote(jim, lecturer), 1988). happens(promote(jim,professor), 1991). the event calculus computes answers to queries such as : ?- holds_at(rank(jim,R), 1990). R=lecturer ?- holds_for(rank(jim,lecturer), P). P=1988-1991 The original presentation of the event calculus showed how a computationally useful formulation can be ·derived from general axioms about the properties of periods. It gave particular attention to the case where events (changes in the world) are not necessarily reported in the order in which they actually occur. For the purpose of this paper, it is sufficient to consider only the simplest case, where the assimilation of events into a database is assumed to keep step with the occurrence of changes in the world, and where the times of all event occurrences are known. Under these simplifying assumptions, the event calculus can be formulated thus: 1054 named labels. In the syntax of C-Iogic (also resembling the syntax of LOGIN [Ait-KaciandNasr 1986] and O-logic [Maier 1986]) the event e1 can be described thus: happens(event :, e1[act => promote, object => jim, newrank => prof], 1989). e1 is an identity which uniquely determines the event, and the labels are used to complete the specification of the event. Chen and Warren give a semantics to C-Iogic directly, and also by transformation to an equivalent first-order (Prolog) formulation that uses unary predicates for types and binary predicates for attributes. In this paper we use C-Iogic syntax as a convenient shorthand for describing events, and we exploit C-Iogic's transformation to Prolog by mixing C-Iogic and standard Prolog syntax freely. Thus we shall also write, for example, event:e1[act=>promote, object=> jim, newrank=?prof]. happens(e1,1989). Chen and Warren's transformation to Prolog make all of these formulations equivalent. 3 The Data Model Our objective in this paper is to focus' attention on the dynamic aspects of objects. For this purpose, we take a very simple data model which exhibits only the most basic features associated with object orientation. 
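Restated as a directly runnable sketch (assuming SWI-Prolog conventions, with \+ standing for the text's not and =< for the ordering on times), the holds_at and broken clauses quoted above, together with the departmental example, read as follows.

% Simplified event calculus, runnable form of the clauses in the text.
holds_at(R, T) :-
    happens(Ev, Ts), Ts =< T,
    initiates(Ev, R),
    \+ broken(R, Ts, T).

broken(R, Ts, T) :-
    happens(EvStar, TStar),
    Ts < TStar, TStar =< T,
    terminates(EvStar, R).

% Effects of promotions (departmental example).
initiates(promote(X, New), rank(X, New)).
terminates(promote(X, _New), rank(X, _)).

% Event occurrences.
happens(promote(jim, assistant), 1986).
happens(promote(jim, lecturer),  1988).
happens(promote(jim, professor), 1991).

With this program loaded, ?- holds_at(rank(jim, R), 1990). answers R = lecturer, as in the example.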
As will be illustrated, this simple data model already raises a number of important problems; further elaborations of this data model are mentioned in the concluding section of the paper. The basic building block of the model is the concept of an object. An object corresponds to a real world entity. Each object has a unique identity to distinguish it from other objects. The objects have attributes whose values can be other objects (Le. their identities). We assume that all attributes are singlevalued. Objects are organized into class hierarchies, defined explicitly by is_a relationships among classes. A class denotes a set of object identities; the class-subclass relation (is_a) is the subset relation. A class describes the internal structure (state) of its instances by attribute names. The state of an instance is determined by the values assigned to these attributes. A subclass inherits the structure (attribute names) of its superclass(es). As an example consider the following class hierarchy: ' person (attributes:name, address) /""- student employee (attributes:section, supervisor) (attributes: dept, rank) The instances of the class student have the internal structure described by the attributes name, address, section and supervisor. Similarly the state of an employee instance is described by name, address, dept and rank. The class hierarchy is represented by is_a relations as: is_a(student, person). is_a(employee, person). The relation between a class and its instances is represented by the instance_ofrelation. The instances of a class C are also instances of the superclasses of C. The instance_of relation can be represented thus: instancLof(tom, student). instance_of(mary, employee). etc. together with instance_of(X, Class) +is_a(Sub, Class), instance_of(X, Sub). These definitions will be adjusted in later sections when we consider time dependent behaviour. Multiple inheritance without overriding can be expressed by the is_a and instance_of relationships. This type of multiple inheritance causes no additional difficulty and is not mentioned again. 4 Object-Based Event Calculus Database applications require an ability to model a changing world. The world changes as a result of the occurrence of events and hence it is very natural to describe such a changing world using a description of events. Given a description of events, it is possible to construct the state of the world using the the event calculus. 4.1 State, Changes One way of dealing with the evolution of an object over time (as suggested to us by several groups, independently) is to view the changing object as a collection of different though related objects. Thus, if we have an employee object jim in the database, which changes over time, jim at time ti, jim at time t2, jim at time ts are all different objects. Their common time-independent attributes are inherited from jim by some kind of 'part_of' mechanism. This approach has a certain appeal, but a moment's thought reveals it must be rejected for practical reasons. Each time an object is modified a new object is created. This new object becomes the most recent state of the object with a different identity. In this case, all other objects referring to the modified object should also be modified to refer to the new version. However updating them means creating other new objects in turn, 1055 which results in an explosion in the number of objects in the database. In [M. Kifer and Wu 1990] a system of this type is described. 
They have to' use equality in order to make certain denotations (i.e. object ids) in fact refer to the same object and provide some navigation methods through versions in order to get appropriate versions of an object. The alternative is to have one 'Object jim and to parametrize its attributes with times at which these attributes have various values. A state change in an object now corresponds to changing the value of any of its attributes. For instance if a person moves to a new place, the value of the address attribute changes; if an employee is promoted the rank attribute changes accordingly. Formulation of this idea in the spirit of the event calculus is straightforward. Instead of happens(promote(jim,professor}, 1991}. it is convenient to separate out the object that has been affected by the event : happens(jim, promote(professor}, 1991}. Now events are indexed by object; every object has associated with it the events that affected it. Events initiate and terminate periods of time for which a given attribute of a given object takes a particular value: initiates(Obj, promote(NewRank}, rank, NewRank}. Given a set of event descriptions which are indexed by object identities, the modified event calculus constructs the state of an object. We can ask queries to find out the value of an attribute of an object at a specific time or we can access the state of an object at any time by querying all of its attributes : ?- holds_at(jim, rank, R, 1989}. ?- holds_at(jim, Attr, Val, 1989}. The following is the basic formulation of the objectbased event calculus used to reason about the changing state of objects : holds_at(Obj, Attr, Val, T} ~ happens(Obj, Ev, Ts}, Ts ~ T, initiates(Obj, Ev, Attr, Val}, not broken(Obi, Attr, Val, Ts, T}. broken(Obj, Attr, Val, Ts, T} ~ happens(Obj, Ev·, T·}, Ts < T· ~ T, terminates(Obj, Ev·, Attr, Val}. terminates(Obj, Ev·, Attr, _} ~ initiates(Obj, Ev·, Attr, _}. Informally, to find the value of an attribute of an object at time T, we find an event which happened before time T, and initiated the value of that attribute; and then we check that no other event which terminates that value has happened to the object in the meantime. The last clause for terminates is to satisfy the functionality constraint of the attributes. Since we are considering only single-valued attributes we can simply state that the value of an attribute is terminated if an event initiates it to another value. (The usage of the anonymous variable '_' in this clause is not a mistake). The original event calculus can compute the periods of time for which a property holds. We can have the same facility for· the attributes of objects. The following compute the periods of time for which an attribute takes a particular value : holds_for(Obj, Attr, Val, (8 - E)} ~ happens(Obj, Ev, S}, initiates(Obj, Ev, Attr, Val}, terminated(Obj, Attr, Val, 8, E}. terminated(Obj, Attr, Val, 8, E} ~ happens(Obj, Ev, E}, terminates(Obj, Ev, Attr, Val}, 8 not broken(Obj, Attr, Val, 8, E). < E, holds_for is used to find the period of time for which an attribute has a particular value. The time period is represented by its start (8) and end (E) points. We also require another clause for holds_for to deal with periods that have no end-point (Le. an attribute is initiated but there is no event which terminated its value). This can be written in a similar style, which we omit. 
Since objects are organized into classes, it is natural and convenient to structure the specification of the effects of a given event according to the class of object it affects. If an event is defined to affect the instances of a class, then the same event specification applies to the instances of subclasses. For example, consider a departmental database in which objects are organized according to the class hierarchy given in section 3. We can specify the effects of these events in the following way : initiates(Obj, moverAddress}, address,Address} ~ instance_of(Obi,person}. initiates(Obj,promote(NewRank}, rank, N ewRank} ~ instance_of(Obj, employee}. The effects of changing the address are valid for all persons (Le. all students and employees as well). However promotion is a type of event which can happen to employee objects only. In the formulation as presented here, it is possible to assert that an object of class person was promoted - but this event has no effect (does not initiate or terminate anything) unless the object is also an instance of class employee. An 1056 alternative is to arrange for event descriptions to be checked and rejected at input if the class conditions are not satisfied. This alternative requires more explanation than we have space for; it is peripheral to our main points, and we omit further discussion. We have discussed how event calculus can be used to describe changes to the values of attributes of objects. Apart from the events that cause state changes of existing objects, there are other kinds of events which cause the creation of new objects or deletion of objects. 4.2 Creation of Objects The creation of a new object of a given class means adding new information about an entity to the database. In the real world being modeled, there are events which create new entities. Birth of a person, manufacturing of a vehicle or hiring a new employee are examples of such events. We can think of describing object creation by events whose specification will provide the necessary information about the initial state of the object. For creation, we need to say what the class of an object is and specify somehow its initial state. In a practical implementation, generation of a unique identity for a newly created object can be left to the system; conceptually, all object identities exist, and the 'creation' of an object is simply assigning it to a chosen class. Assigning the new identity to the class initiates a period of time for which the new object is a member of that class. This makes it necessary to treat class membership as a time-dependent relationship. We introduce a new predicate assigns to describe instance addition to classes. For the time being we assume that once an object is assigned to a class it remains an instance of this class throughout its lifetime. Class changes are discussed separately in section 5. We can handle creation of objects by specifying which events assign objects to which classes. We use the same event description to initialize the state of the object. As an example consider registration of a student ali, which causes the creation of a new student object in the database. The specification of the event and the necessary rules to describe creation are as follows: event: e23 [act => register, object => ali, section => Ip, supervisor => bob]. assigns(event:E[act=>register, object=> ObJJ, ObJ, student). initiates(Obj, E , section, B) +event : E[act=> register, object=> Obj, section=} Bj. 
initiates(Obj, E, supervisor, S) ← event:E[act => register, object => Obj, supervisor => S].

The assigns statement is used to assign the identity of the object Obj to the class student; the initiates statements are used to initialize the object's state. Now the occurrence of the event is recorded by:

happens(e23, 1991).

To specify that the event has happened to the object ali we use the rule:

happens(Obj, Ev, T) ← happens(event:Ev[act => register, object => Obj], T).

Note that we have two happens predicates: one binary (for asserting that events happened at a given time), and one ternary (to index events by the objects affected). Notice also that creating a new object of class C creates a new instance of the superclasses of C as well. There are several ways to formulate this. The simplest is to write:

assigns(Ev, Obj, Class) ← is_a(Sub, Class), assigns(Ev, Obj, Sub).

4.3 Deletion of Objects

There are two kinds of deletions that we are going to discuss in this paper. One is absolute deletion of an object, where the object is removed from the database. The other deletes an object from its class but keeps it as an instance of another class, possibly one of the superclasses. The second case is related to mutation of objects as they change class, which will be discussed in section 5. For the purposes of this section, we assume that when an object is deleted it is removed from the set of instances of its class and the superclasses, and that all its attribute values are terminated. For example, if a person dies, all the information about that person is deleted from the database. We use a new predicate destroys to specify events that delete objects and write the following:

terminates(Obj, Ev, Attr, _) ← destroys(Ev, Obj).

This rule has the effect that all attributes Attr defined in the class of the object, and also those inherited from superclasses, are terminated. If an event destroys an object O which is an instance of class C, then that event removes O from class C and all superclasses of C. There is one point to consider when deleting objects in object-oriented databases. If we delete an object x, there might be other objects that have stored the identity of x as a reference. The deletion therefore can lead to dangling references. A basic choice for object-oriented databases is whether to support deletion of objects at all [Zdonik and Maier 1990]. We choose to allow deletion of objects, and we eliminate dangling references by adding another rule for the broken predicate:

broken(Obj, Attr, Val, Ts, T) ← happens(Val, Ev*, T*), Ts < T* ≤ T, destroys(Ev*, Val).

We obtain the effect that the value Val of the attribute Attr is terminated by any event which destroys the object Val.

4.4 Class Membership

As we create and delete objects the instances of a class change. Class membership, which is described by the instance_of relation, is a dynamic relation that changes over time. We can handle the temporal behaviour by adding a time parameter. We now have events that initiate and terminate periods of time for which an object O is an instance of a class C. The instance_of relation is affected when a new object is assigned to a class or when an object is destroyed. By analogy with holds_at, the following finds the instances of a class at a specific time:

instance_of(Obj, Class, T) ← happens(Ev, Ts), Ts ≤ T, assigns(Ev, Obj, Class), not removed(Obj, Class, Ts, T).

removed(Obj, Class, Ts, T) ← happens(Obj, Ev*, T*), Ts < T* ≤ T, destroys(Ev*, Obj).
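Gathered together, the clauses of this section can be run essentially as they stand. The sketch below assumes SWI-Prolog conventions (\+ for not, =< for the time ordering) and flattens the C-logic event term e23 into plain binary facts; the predicate names act/2, object/2, section/2 and supervisor/2 are chosen for the sketch and are not part of the paper's notation.

% A runnable arrangement of the clauses of Sections 4.1-4.4.
:- dynamic destroys/2.            % no destroying events in this fragment

% State changes (4.1), with the extra broken clause of 4.3.
holds_at(Obj, Attr, Val, T) :-
    happens(Obj, Ev, Ts), Ts =< T,
    initiates(Obj, Ev, Attr, Val),
    \+ broken(Obj, Attr, Val, Ts, T).

broken(Obj, Attr, Val, Ts, T) :-
    happens(Obj, EvStar, TStar), Ts < TStar, TStar =< T,
    terminates(Obj, EvStar, Attr, Val).
broken(_Obj, _Attr, Val, Ts, T) :-            % dangling references
    happens(Val, EvStar, TStar), Ts < TStar, TStar =< T,
    destroys(EvStar, Val).

terminates(Obj, Ev, Attr, _) :- initiates(Obj, Ev, Attr, _).
terminates(Obj, Ev, _Attr, _) :- destroys(Ev, Obj).

% Time-dependent class membership (4.4).
instance_of(Obj, Class, T) :-
    happens(Ev, Ts), Ts =< T,
    assigns(Ev, Obj, Class),
    \+ removed(Obj, Class, Ts, T).
removed(Obj, _Class, Ts, T) :-
    happens(Obj, EvStar, TStar), Ts < TStar, TStar =< T,
    destroys(EvStar, Obj).

% Class hierarchy.
is_a(student, person).
is_a(employee, person).

% The register event e23, flattened into binary facts.
act(e23, register).   object(e23, ali).
section(e23, lp).     supervisor(e23, bob).
happens(e23, 1991).
happens(Obj, Ev, T) :- object(Ev, Obj), happens(Ev, T).

% Effects of registration (4.2): creation, initial state, and
% propagation of class assignment up the hierarchy.
assigns(Ev, Obj, student) :- act(Ev, register), object(Ev, Obj).
assigns(Ev, Obj, Class)   :- is_a(Sub, Class), assigns(Ev, Obj, Sub).
initiates(Obj, Ev, section, S)    :- act(Ev, register), object(Ev, Obj), section(Ev, S).
initiates(Obj, Ev, supervisor, S) :- act(Ev, register), object(Ev, Obj), supervisor(Ev, S).

Loaded into the interpreter, ?- holds_at(ali, supervisor, S, 1992). answers S = bob, and ?- instance_of(ali, Class, 1992). returns student and, through the hierarchy, person.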
With this time-variant class membership we can ask queries to find the instances of a class at a specific time. For example:

?- instance_of(Obj, employee, 1980).

We can also write the analogue of holds_for to compute periods, which we omit here. In the example we have been using, we have represented the rank of an employee object by including an attribute rank whose value might change over time. But suppose that instead of using an attribute rank, we had chosen to divide the class of employees into various distinct subclasses:

is_a(lecturer, employee).
is_a(professor, employee).

It is at least conceivable that this alternative representation might have been chosen, assuming that all employee objects have roughly the same kind of internal structure. Is the choice between these two representations simply a matter of personal preference? Not if we consider the evolution of objects over time. The first representation allows for change in an employee's rank straightforwardly, since this just changes the value of an attribute. The second does not, since no object can change class in the formulation of this section. The only way of expressing, say, a promotion from lecturer to professor, is by destroying (deleting) the lecturer object and creating a new professor object. But then how do we relate the new professor object to the old lecturer object, and how do we preserve the values of unchanged attributes across the change in class? In the next section we will examine the problem of allowing the class of an individual object to change.

5 Mutation of Objects: Changing the Class

The ability to change the class of an object provides support for object evolution [Zdonik 1990]. It lets an object change its structure and behaviour, and still retain its identity. For instance, consider an object that is currently a person. As time passes it might naturally become an instance of the class student and then later an instance of employee. This kind of modification is usually not directly supported by most systems. It may be possible to create another object of the new class and copy information from the old object to it, but one loses the identity of the old object. We want to describe this kind of evolution by event specifications. For example, graduation causes a student to change class. If we delete student ali from class student, then he will lose all the attributes he has by virtue of being a student, but retain the attributes he has by virtue of being a person. The effects of this event can be described by removing ali only from class student and terminating his attributes selectively. The attributes that are going to be terminated can be derived from the schema information. For dealing with this type of class change we use a new predicate removes in place of the predicate destroys of section 4.3:

removes(event:Ev[act=>graduate, object=>Obj], Obj, student).
terminates(Obj, Ev, Attr, _) ← event:Ev[act=>graduate, object=>Obj], attribute(student, Attr).

The clauses for the time-dependent instance_of relation must be modified too, to take removes into account:

removed(Obj, Class, Ts, T) ← happens(Obj, Ev*, T*), Ts < T* ≤ T, removes(Ev*, Obj, Class).

The graduation of the student ali corresponds to moving him up the class hierarchy. Now consider hiring ali as an employee. This will correspond to moving down the hierarchy. The specification of an event causing such a change will likely include values to initialize the additional attributes associated with the subclass.
So the effects of hiring ali will be to assign him to the employee class and initiate his employee attributes. The event might be:

event: e21[act => hire, object => ali, dept => cs, rank => lecturer].

And we can declare the following:

assigns(event:E[act=>hire, object=>Obj], Obj, employee).
initiates(Obj, E, dept, D) ← event:E[act=>hire, object=>Obj, dept=>D].
initiates(Obj, E, rank, R) ← event:E[act=>hire, object=>Obj, rank=>R].

Note that in changing class first from student to person, then from person to employee, ali retains all the attributes he has as a person. We have described this class change by two separate events: graduation and hiring. We can also imagine a single event which would cause an object to change its class from student to employee directly, say a hire_student event. We could then describe the changes using the description of this event:

removes(event:E[act=>hire_student, object=>Obj], Obj, student).
assigns(event:E[act=>hire_student, object=>Obj], Obj, employee).

The initial values of the additional attributes will again be given in the event specification. As in the case of having two separate events, we have not lost the values of the attributes ali has as a person, and we have not removed the object from class person. We have illustrated three kinds of class change: changing from a class C to a direct superclass of C, changing from C to a direct subclass of C, and changing from C to a sibling class of C in the hierarchy. In general, changing an object from class C1 to class C2 involves removing it from C1, assigning it to C2, and specifying in the event description how the initial values of the C2 attributes are related to the values of the old C1 attributes.

6 Concluding Remarks

We have presented a variant of the event calculus which maintains an object-based data model where the standard versions maintain a relational one. Section 4 considered state changes of objects in this framework, and the creation and deletion of objects. Section 5 discussed the modifications that are necessary to also support the mutation of objects - change of an object's class and its internal structure without loss of its identity. There are other object-oriented features that can usefully be incorporated into the object-based data model. Removing the restriction that attributes are all single-valued causes no great complication. We are currently developing other extensions, such as the inclusion of methods in classes for defining the value of one attribute in terms of the values of other attributes, and we are investigating what additional complications arise when the schema itself is subject to change. In object-oriented terminology, event types - like promote, change-address, and so on - correspond to methods: their effects depend on the class of object that is affected; the predicates initiates and terminates for attribute values, and assigns, destroys and removes for objects and classes, are used to implement the methods (they would be replaced by destructive assignment if we maintained only a changing snapshot database). Of course, execution of this event calculus in Prolog does not yield an object-oriented style of computation. At the implementational level, objects are not clustered (except by Prolog's first argument indexing), and the computation has no element of message-passing. The implementation and the computational behaviour can be given a more object-oriented flavour by using, for example, the techniques described by [Chen 1990] for C-logic, or the class templates of [McCabe 1988].
We are currently investigating what added value is obtained by adjusting these implementational and computational details. The object-oriented version ofthe event calculus offers some (computational) advantages over the standard relational versions, that we do not go into here for lack of space. Whatever the merits of the objectbased variant of the event calculus, we believe that its formulation forces attention to be given to important aspects of object-orientation that are otherwise ignored. We limit ourselves to two general remarks: 1) In the literature, the terms type and class are often used interchangeably. Sometimes type is used in its technical sense, but then it is common to see illustrative examples resembling 'Mary is of type stu- 1059 dent'. If we consider the dynamics of object-oriented representations, then these examples are either badly chosen or the proposals are fundamentally flawed. 'Mary' might be a student now but this will not hold forever. We could surely not contemplate an approach where an update to a database requires a change to the type system, and hence to the syntax of the representational language. These remarks do not apply to object-oriented programming where there is no need to make provision for updates that change the type of an object. The static notion of a type corresponds to the treatment of a class we presented in section 4: an object mayor may not exist at a given time, but when it exists it is always an instance of the same class . If we wish to go beyond this, to allow objects to mutate, then a dynamic notion of class is essential. This is not to say that types have no place in object-oriented databases. A student can become an employee over time, but a student cannot become a filing cabinet, and a filing cabinet cannot become an orange. Both static types and dynamic notions of class are useful. The consideration of the dynamics of objects - how they are allowed to evolve over time - suggests one immediate and simple criterion for choosing which notion to use: the type of an object cannot change. 2) In section 4.3, we assumed that all attributes of an object are terminated when the object is destroyed; in section 5, removal of an object from the class C terminates all attributes the object has by virtue of being an instance of the class C. The reasoning behind this assumption is this: attributes are used to represent the, possibly complex, internal state of an object. When an object ceases to exist, it is not meaningful to speak any more of its internal state. Of course, some information about an object persists even after it ceases to exist. It is still meaningful to speak of the father of a person who has died, but it is not meaningful to ask whether this person likes oranges or is happy or has an address. The development of these ideas suggests that we should distinguish between what we call 'internal attributes' and 'external relationships'. Internal attributes describe the state of a complex object, and they cease to hold when the object ceases to exist or ceases to be an instance of the class with which these attributes are associated. External relationships continue to hold even after the object ceases to exist. We are being led to a kind of hybrid data model together with some tentative criteria for choosing between representation as attribute and representation as relationship with other objects. The analysis given here is rather superficial, but it indicates the general directions in which we are planning to pursue this work. 
Acknowledgements. F.N. Kesim would like to acknowledge the financial support by TUBITAK, the Scientific and Research Council of Turkey. References [Abiteboul and Grumbach 1988] S. Abiteboul and S. Grumbach. COL : A logic-based language for complex objects. In International Conference on Extending Database Technology- EDBT'BB, pages 271-293, Venice, Italy, March 1988. [Ait-KaciandNasr 1986] H. Ait-Kaci and R. Nasr. Login: A logic programming language with builtin inheritance. The Journal of Logic Programming, 1986. [Bancilhon and Khoshafian 1986] F. Bancilhon and S. Khoshafian. A calculus for complex objects. In Proceedings of the 5th ACM-SIGACT-SIGMOD Symposium on Principles of Database Systems, pages 53-59, Cambridge, Massachusetts, March 1986. [Chen 1990] Wei dong Chen. A General Logic-Based Approach to Structured Data. PhD thesis, State University of New York at Stony Brook, 1990. [Chen and Warren 1989] W. Chen and D. Warren. C-Iogic of complex objects. In Proceedings of the Bth ACM SIGACT-SIGMOD-SIGART Symposium on the Principles of Database Systems, 1989. [Dalal and Gangopaduyay 1989] M. Dalal and D. Gangopadhyay. OOLP: A translation approach to object-oriented logic programming. In The First International Conference on Deductive and Object-Oriented Databases, pages 555-568, Kyoto,Japan, December 4-6 1989. [Eshghi 1988] K. Eshghi. Abductive Planning with the Event Calculus. In Proc. 5th International Conference on Logic Programming,1988. [Kifer and Lausen 1989] M. Kifer and G. Lausen. Flogic: A higher-order language for reasoning about objects, inheritance, and scheme. In Proceedings of the ACM-SIGMOD Symposium on the Management of Data, pages 134-146, 1989. [Kowalski and Sergot 1986] R.A. Kowalski and M. Sergot. A logic-based calculus of events. New Generation Computing, 4:67-95, 1986. 1060 [M. Kifer and Wu 1990] G. Lausen M. Kifer and J. Wu. Logical foundations of object-oriented and frame-based languages. Technical report, Department of Computer Science, SUNY at Stony Brook, June 1990. [Maier 1986] D. Maier. A logic for objects. In Proceedings of the Workshop on Foundations of Deductive Databases and Logic Programming, pages 6-26, Washington D.C., August 1986. [McCabe 1988] F.G. McCabe. Logic and Objects: Language Application and Implementation. PhD thesis, Department of Computing, Imperial College, 1988. [Sripada 1991] S. M. Sripada. Temporal Reasoning in Deductive Databases. PhD thesis, Department of Computing, Imperial College, 1991. [Zaniolo 1985] C. Zaniolo. The representation and deductive retrieval of complex objects. In Proceedings of Very Large Databases, page 458, Stockholm, 1985. [Zdonik 1990] S. B. Zdonik. Object-oriented type evolution. In F. Bancilhon and P. Buneman, editors, Advances in Database Programming Languages, pages 277-288. ACM Press, 1990. [Zdonik and Maier 1990] S. B. Zdonik and D. Maier, editors. Readings in Object-Oriented Database Systems, chapter 4, page 239. Morgan Kaufmann, 1990. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 1061 The panel on a future direction of new generation applications Fumio Mizoguchi Science University of Tokyo Intelligent System Laboratory Noda, Chiba 278, Japan 1 Introduction This paper introduces a panel to be held at the application t.rack of FGCS'92 conference. This panel will be devoted to a future direction of new generation applications. 
The goal is to discuss about the applications with various paradigms which have been explored in the areas of knowledge representation, logic programming, machine learing and parallel processing. It is my hope that by expressing different perspectives of the panelists, we will understand the importance of the underlying paradigms, the real problem areas, and a direction of next generation applications. The word paradigm itself is originally come from T. Kuhn's book called "The Structure of Scientific Revolution (1962)". Recently, this word is referred by the AI researchers because of its sophisticated meanings which indicates a current research trend or a future direction. Here, I will use this word in this context that implies new bases and views for exploration of applications without too much philosophical discussion. In this short paper, I will attempt to outline the perspectives represented by the panelists. Althouth the ideas and the positions papers will be represented in t.he following pages in the proceedings, I will try to guide the rough views which will be necessary for this panel discussion. The context is my subjective impressions on the current trends and research directions. 2 KR paradigm Ronald J. Brachman will talk about his knowledge representation language called Classic and his experiences through the use of Classic for the developments in applications. He might refer the knowledge representation as K R which follows his research communi ty. KR might be the starting point for any AI based application system. KR is one of the main paradigms of AI researches including natual language understanding and coginitive science. There are a lots of attempts in the design of KR language and systems such as KRL, FRL and KLONE in the late 1970's. The 1980's was the following productive period for KR system developments and theories. The first dedicated international KR conference was held recently, and many important ideas and foundations were presented in the conference. This state of art has been reviewed by R.Brachman at the AAAI meeting in 1990. He has presented KR and issues which are related to the field, history, development of the 1980's, the future of KR and open research problems. I am especially interested in his highlights for the future of KR which predicts the current trends of common knowledge base and ontology. Now, KR should be standardized for the further developments for any knowledge systems. The related paper for Classic will be presented at the technical session and he will talk about his position based upon his paper presentation. The panel will start with KR and related topics. 3 CLP paradigm Catherine Lassez will represent the constraint logic programming(CLP) which is a new face for handling constraints in Operations Research, Computational Geometry, Robotics and Qualitative Physics. Reasoning with constraint is very important for these application areas. These problems aTe sometimes required heavy computational resource and are related to combinatorial characteristics. The novel aspects of CLP is the unified framework of knowledge representation for numeric a,nd non-numeric constraints, solution algorithm and data query system. Also, CLP has been implemented as the programming languages such as CLP(R), CHIP, CAL, Prolog-III and Triton. These languages are used for the various application domains which are linkage between AI and OR. As for the financial applications, CLP is very good affinity for describing the financial equations and relations. 
Constraint is also useful to the handling qualitative knowledge in Computational Geometry and Naive Physics. In order to show the expressive power of CLP, it is necessary to demonstrate the speed and performance for the same problems which are OR people's proposed. This is challenging for any AI researchers and Logic programmers to persuade other field researchers through the recent progress on programming which can avoid the brute forces of numerical calculation. She will 1062 present her experiences on the developments on the theories and applicat.ions. The details will be shown in her very intensive long position paper in this panel. 4 ILP paradigm Stephen M uggleton will represent his'recent notion of inductive logic programming(ILP) which uses the inverse resolution and relat.ive least general generalisation. ILP is newly formed research area in the integration of machine learning and logic programming. Machine learning is very attractive paradigm for knowledge acquisition and learning which any AI system is addressed. With the advent of machine learning research, there are a lots of developments in tools for classifying large data using concepts learning and neural network methods. Muggleton's recent development for his ILP is called GOLEM which is a first order induction algorit.hm for generating rules from given examples. Each example is a first order ground atom and each rule is a first order Horn clause. Rules can be used to classify new examples. GOLEM is implemented in SUN's using C and very efficient for inducing rules from examples. Another example of ILP will be presented in the invited speaker, Ivan Bratko and he will talk about learning qualitative model of dynamic system using GOLEM learing program. ILP is different from CLP, but in its spirit, idea is come from the logic programming paradigm. As is well known, Shapiro's work on Model Inference System(MIS) is implemented using Prolog and it is very clear logical model for learning. Using logic programming paradigm, ILP is unified approach to induction and deduction which provides knowledge system with more powerfull inference facilities. Namely, as for inductive component, IPL is very useful for inducing rules from data and then, using the rules, system infers deductively data into known diagnostic states. Therefore, ILP is new approach to application with very large data which are further classified into categorization. These kinds of applications are found in the area of protein engineering and fault diagnosis for satellites. He was the organizer of the first ILP workshop and the second workshop which will be held after the FGCS conference. ILP is very young paradigm for machine learning and there will be another exploration in theory and application. He will talk about the recent research with the relationship between Valiant's PAC-Learning framework. Machine learning is most active research area and it will be the next stage that it will deal with realistic problems. 5 PP paradigm Kazuo Taki will represent the Parallel Processing(PP) paradigm which the Fifth Generation Computer System Project aims to explore and to develop both sides of hardware and software derived from the concurrent logic programming which shows affinity for both expressing concurrency and executing in parallel. With the continious efforts in langualge and implementation research in the FGCS project, KL-l has expressive for describing many complex applications programs with efficient performance. 
Most important aspects in the use of the concurrent system are to built large scale parallel software which is further accumulated as the experiences in parallel programming. A new style of programming requires a new thinking way of programming and the model of computation. This is also true for KL-l language and for applying it to complex applications such as VLSI-design, DNA analysis and legal reasoning system. Basing upon these experiences, he will focuss on the parallel language culture which is necessary for the next generation computer like multi-PSI and PIM. The hardware progress has made rapidly compared with software technology and the accumulation of parallel programming experiences are very important for the re-use and the economy of coding. The current issue of parallel programming is how to transfer knowledge in software technology developed by the FGCS project in order to explore the culture of the concurrent system. Therefore, as for the future directions, PP paradigm is how to use in the widely adopted computational environment. He will talk about the issue of the parallel programming culture and the experiences in the use of KL-l for applications. 6 Future directions I will introduce the various paradigms for knowledge information processing starting from KR to PP. Each paradigm has distinctive and novel features for explorat.ion of applications. As for my position, I am interested in the research on the fusion of paradigms which is the integration of CLP and ILP for example. I will call this paradigm as Inductive Constraint Logic Programming(ICLP not conference name!) which is the natural extension of constraint logic programming into inductive inference for constraints in Spacial Geometry and Robotics. This framework is also useful for the Naive Physics and qualitative reasoning system without large amount of background knowledges for rules generations. We will examine our approach to Naive Kinematics and simple image processing for spacial reasoning. At this stage, the application domain is very simple, but for the research on Robotics t.hat learns, the inductive component is very important in the knowledge acquisition on the constraints and then deductively use the constraints for the further moves. The fusion of paradigms will be necessary foundation for the next generation applications. We should re-examine the current paradigms for the different problems areas such as 0 R, Robotics and Computational Geometry. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 1063 Knowledge Representation Theory Meets Reality: Some Brief Lessons from the CLASSIC Experience Ronald J. Brachman AT&T Bell Laboratories, 600 Mountain Ave., Murray Hill, NJ 07974-0636, U.S.A. rjb@research.att.com Abstract Knowledge representation is one of the keys to Artificial Intelligence, and as a result will play a critical role in many next generation computer applications. Recent results in the field look promising, but success on paper may be misleading: there is a significant gap between a theoretical result or proposal and its ultimate impact in practice. Our recent experience in converting a fairly typical knowledge representation design into a usable system illustrates how many aspects of "reduction" to practice can significantly influence and force important changes to the original theoretical foundation. 
I briefly motivate our work on the CLASSIC representation system and outline a handful of ways in which practice had significant feedback to theory. The general lesson for next generation applications is the need for us in our research on core technology to take more seriously the influence of implementation, applications, and users. 1 Knowledge Representation Representation of knowledge has always been the foundation on which research and development in Artificial Intelligence has rested. While no single representation framework has come to dominate the field, and while there are important challenges to the utility of conventional· representation techniques from "connectionists" arid others, it is very likely that the next generation of AI and AI-related applications will still subscribe to the hypothesis that intelligent behavior can arise from formal reasoning over explicit symbolic representations of world knowledge. The centrality of the need to represent world knowledge in AI systems, expert systems, robots, and Fifth Generation applications has helped increase interest in formal systems for representation and reasoning-so much so that over the last decade, the explicit subfield of "Knowledge Representation" (KR) has taken on its own identity, with its own international conferences, IFIP working group, etc. This subfield has been prolific. It has attracted the attention of the greater AI community with highly visible problems like the "Yale Shooting Problem" and systems like CYC. It has collected its own set of dedicated researchers, and has increasing numbers of graduate students working on formal logics, nonmonotonic reasoning, temporal reasoning, model-based diagnosis, and other important issues of representation and reasoning. It is probably fair to say that in recent years, formal and theoretical work has become preeminent in the KR community.l Concomitantly, it appears to be generally lThis has happened for numerous reasons, and while it believed that when the theory is satisfactory, its reduction to practice will be relatively straightforward. This transition from theory to practice is usually considered uninteresting enough that it is virtually impossible to have a technical paper accepted at a conference that addresses it; it seems to be assumed that all of the "hard" work has been done in developing the theory. This attitude is somewhat defensible: it is common in virtually all other areas of AI; and there often really isn't anything interesting to say further about a KR formalism as it is implemented in a system. However, my own group has had substantial recent experience with the transition of a knowledge representation system from theory to practice that contradicts the common wisdom, and yields an important message for KR research and its role in next generation applications. In particular, our view of what we thought was a clean and clear-and "finished" -formal representation system was substantially influenced by the complexity and constraint of the process of turning the logic into a usable tool. 2 The CLASSIC Effort As of several years ago, we had developed a relatively small, elegant representation logic that was based on many years of experience with description hierarchies and a key inference called classification. As described in a companion paper at this conference [Brachman et al., 1992], the CLASSIC system was a product of many years of effort on numerous systems, all descended from the KL-ONE knowledge representation system. 
Work on KL-ONE and its successors grew to be quite popular in the US and Europe in the 1980's, largely because of the semantic cleanliness of these languages, the appeal of object-centered (frame) representations, and their provision for some key forms of inference not available in other formalisms (e.g., description classification). The reader familiar with KR research will note that numerous publications in recent years have addressed formal and theoretical issues in "KL-ONE-like" languages, including formal semantics and computational complexity of variant languages. However, the key prior efforts all had some fundamental flaws, and work on CLASSIC was in large part launched to design a formalism that was free of these defects. Another central goal of CLASSIC was to produce a compact logic and ultimately, a small, manageable implemay have some negative consequences (as addressed here), it is positive in many respects. The early history of the field was plagued by vague and inadequate descriptions of ad hoc solutions and computer programs; recent emphasis on formality has encouraged more thorough and rigorous work. 1064 mented representation and reasoning system. A small system has important advantages in a practical setting, such as portability, maintainability, and comprehensibility. Our intention was to eventually put KR technology in the hands of non-expert technical employees, to allow them to build their own domain models and maintain them. CLASSIC was also designed to fill a small number of application needs. We had had experience with a form of deductive information retrieval (most recently in the context of information about a large software system [Devanbu et al., 1991]), and needed a better tool to support this work. We also had envisioned CLASSIC as a deductive, object-oriented database system (see [Borgida et al., 1989]; success on this front was eventually reported in [Selfridge, 1991]). After analyzing the applications, assessing recent progress in KL-ONE-like languages, and solving a number of the technical problems facing earlier systems, we produced a design for CLASSIC that felt complete; the logic was presented in a typical academic-style conference paper in 1989 [Borgida et al., 1989]. In this design, some small concessions were made to potential users, including a procedural test facility that would allow some escape to the host implementation language for cases that CLASSIC could not handle. Given the clarity and simplicity of this original design of CLASSIC, we ourselves held the traditional opinion that there was essentially no research left in implementing the system and having users use it in applications. At that point, we began a typical AI programming effort, to build a version of CLASSIC in COMMON LISP. 3 Influences in the "Reduction" to Practice As the research LISP version neared completion, we began to confer with colleagues in a development organization about the potential distribution of CLASSIC within the company. Despite the availability of a number of AI tools in the marketplace, an internal implementation of CLASSIC held many advantages: we could maintain it and extend it ourselves, in particular, tuning it to real users; we could assure that it integrated with existing, non-AI environments; and we could guarantee that the system had a well-understood, formal foundation (in contrast to virtually all commercially available AI tools). Thus we undertook a collaborative effort to create a truly practical version of CLASSIC, written in c. 
Our intention was to develop the system, maintain it, create a training course, and eventually find ways to make it useful in the hands of AI novices. To make a long story short, it took at least as much work to get CLASSIC to the point of usability as it did to create the original logic that we originally thought was the culmination of our research. Our view of the language and knowledge base operations supporting it changed substantially as a result of this undertaking, in ways that simply could not be anticipated when consider a paper design of the logic. The factors that influenced the ultimate shape of CLASSIC were quite varied, and in most cases, were not influences that we-or most other typical researchers, I suspect-would have expected to have forced more research before the logic was truly finished. These ranged from the need to be reasonable in the release and main- tenance of the software itself to some specific needs for key applications that could not really have been anticipated until the system was actually put into practical use. Here is a brief synopsis of the five main types of issues that influenced the ultimate shape of the CLASSIC system: • the constraints of creating and supporting a system for real users caused numerous compromises. For one thing, upward compatibility of future releases is a critical issue with real software, and it meant that any construct in the language in which we were not completely confident might better be left out of the released system. Issues of run-time performance (which also dictated the exclusion of some features) also had surprising effects on what we could realistically include in the released version. • certain detailed implementation considerations played a role in determining what was included in the system. These included certain tradeoffs that affected the design, such as the tremendous space consequences an inverse relationship ("inverse roles") feature would have had, or the consequences of certain fine-grained forms of truth maintenance (to allow for later retraction of asserted facts). Some features (our SAME-AS construct, for example) were just so complex to implement that they were better left out of the initial release. • concern for real users alerted us to issues easily ignored with a pure logic. These involved the sheer learn ability and usability of the language and the system. Errorhandling, for example, was of paramount concern to our real consumers, and yet the very idea never arose when considering the initial CLASSIC language. Similarly, the uniformity of abstractions and the simplicity of the interface were critical to acceptability of our system. The potential consequences of user "escapes" with side-effects was another related concern. Finally, explanation of the system's behavior-again, not an issue when we designed the logic-might make the difference between success and failure in using the system. • as soon as a system is put to any real use, mismatches in its capabilities and specific application needs become very evident. In this respect, there seems to be all the difference in the world between the few small examples given in typical research papers and the details of real, sizable knowledge bases. In the case of CLASSIC, our lack of attention to the details of numbers and strings in the logic meant substantial more work before implementation. Another issue that plagued us was the lack of attention to a query language for our KR system (a common lack in most AI KR proposals). 
• finally, what looked good (and complete) on paper did not necessarily hold up under the fire of real use. Even with a formal semantics, certain operators prove tricky to understand in practice, and subtle interactions between operators that arise in practice are rarely evident from the formal work. Simply being forced by an implementation effort to get every last detail right certainly caused us to re-examine several things we thought we had gotten correct in the original logic, and I suspect this would be the case with virtually every sufficiently complex KR logic that ends up being implemented. 1065 4 Some Lessons The main lesson to be learned here is that despite the ability to publish pure accounts of logics and their theoretical properties, the true theoretical work on knowledge representation systems is not really done until issues of implementation and especially of use are addressed head-on. The "theory" can hold up reasonably well in the transition from paper to system, but the typical KR research paper misses many make-or-break issues that determine a proposal's true value in the end. Arguments about needed expressive power, the impact of complexity results, the naturalness and utility of language constructs, etc., are all relatively hollow until made concrete with specific applications and implementation considerations. For example, in our context, the right decision was clearly to start with a small version of the system for release, and extend it only as needed. Given the complexity of software maintenance, it may never make sense to try to anticipate in advance all possible ways that all possible users might want to express concepts. 2 A small core with an extension mechanism might in reality be better than a large, extraordinarily expressive-and complex-system. In the case of CLASSIC, we have been able to place in the hands of relatively naive users a fairly sophisticated, state-of-the-art inference system with a formal semantics and well-founded inference mechanism, and have them use it successfully, needing only to make a small number of key extensions to meet their real needs. There are several consequences here for next generation applications of knowledge representation research. First, it is important that the research community recognize as legitimate and important the class of issues that arise from implementation efforts-issues relating to size, for example, that have always been the legitimate concern of the database community; issues relating to implementation tradeoffs and complexities; and issues relating to software release and maintenance. Second, unless our KR proposals are put to the test in real use on real problems, it is almost impossible to assess their real value. So much seems to be different when a proposal is reduced to practice that it is unclear what the original contribution really is. Third, it is quite critical that at least some fraction of the community address directly the needs of users and the constraints and issues in their applications. Too much research with only mathematics as its driving force will continue to lead KR (and other areas of AI research) farther afield. Not only that, it is clear that truly interesting research questions arise when driven from real rather than toy or imagined needs. References [Borgida et al., 1989] A. Borgida, R. J. Brachman, D. L. McGuinness, and L. A. Resnick. CLASSIC: A Structural Data Model for Objects. In Proceedings of the 1989. 
ACM SIGMOD International Conference on Management of Data, pages 59-67, June 1989. [Brachman et al., 1992] R. J. Brachman, A. Borgida, D. L. McGuinness, P. F. Patel-Schneider, and L. A. 2Ironically, the ongoing and sometimes virulently argued debate over how much expressive power to allow in KR systems may in the end be settled by simple software engineering considerations. Resnick. The CLASSIC Knowledge Representation System, or, KL-ONE: The Next Generation. In Proceedings of the International Conference on Fifth Generation Systems, Tokyo, June 1992. [Devanbu et al., 1991] P. Devanbu, R. J. Brachman, P. G. Selfridge, and B. W. Ballard. LaSSIE: A Knowledge-Based Software Information System. CACM, 34(5):34-49, May 1991. [Selfridge, 1991] P. G. Selfridge. Knowledge Representation Support for a Software Information System. In Proceedings of the Seventh IEEE Conference on AI Applications, pages 134-140, Miami Beach, Florida, February 1991. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 1066 Reasoning With Constraints Catherine Lassez IBM T.J. Watson Research Center P.O.Box 704, Yorktown Heights, NY 10598, USA lassez@watson.ibm.com Constraints are key elements in areas such as Operations Research, Constructive Solid Geometry, Robotics, CAD/CAM, Spreadsheets, Model-based Reasoning and AI. Languages have been designed specifically to solve constraints problems. More recently, the reverse problem of designing languages that use constraints as primitive elements has been addressed. Constraints handling techniques have been incorporated in programming languages and systems like CLP(~), CHIP, CAL, CIL, Prolog III, 2LP, BNR-Prolog, Mathematica and Trilogy. In the rule-based context of Logic Programming, the CLP scheme [5] provides a formal framework to reason with and about constraints. The key idea is that the important semantic properties of Horn clauses do not depend on the Herbrand Universe or Unification. These semantic properties and their associated programming methodology hold for arithmetic constraints and solvability (and in many other domains including strings, graphs, booleans, ... ). The CLP scheme is a main example of the use of constraints as the primitive building blocks of a class of programming languages, since logic formulae can be themselves considered as constraints. In the same spirit constraints have been introduced in committed choice languages in Maher [14], and in the work of Saraswat [15], and in Database querying languages by Kanellakis, Kuper and Revesz [6]. The link between classical AI work on constraints, and Logic Programming has been described by van Hentenryck [17]. Not surprisingly there are many different paradigms reflecting the integration of constraints and languages. The main differences come from the aims of the language: general purpose programming language, database or knowledge based query language, or a tool for problem solving. In mathematical programming the focus is on optimization, in artificial intelligence the focus is on constraint satisfaction and constraint propagation, in program verification the focus is on solvability. This should be reflected in the design of appropriate languages, but constraint programming should also have its own focus and theory. We have developped a general framework for a systematic treatment of specific domains of constraints. 
We recall that a logic formula is viewed as an implicit and concise representation of its set of logical consequences and that the answer to a query Q is a set of substitutions which establish a relationship between the variables of Q, satisfied if and only if Q is a logical consequence of the formula. The key point is that a single algorithm, Resolution, is sufficient to answer all queries. These properties of logic formulae have counterparts in other domains. In particular, Tarski's theorem for quantifier elimination in real closed fields [16] establishes that an arithmetic formula can be viewed as representing the set of all its logical consequences, that is, the set of all arithmetic formulae it entails. Furthermore, a single algorithm, Quantifier Elimination, is required, in analogy with logic formulae and resolution. At the design and implementation level, however, the problems are far more difficult than for logic formulae. To try and circumvent these problems one must make heavy use of results and algorithms from symbolic computation, operations research, computational geometry, etc. Also, as in the case of logic formulae, we have to sacrifice generality to achieve acceptable efficiency by carefully selecting sets of constraints for which suitable algorithms can be found.

Parametric queries

Applying the paradigmatic aspects of reasoning with logic formulae to linear arithmetic, we have that:

• a set of constraints is viewed as an implicit representation of the set of all constraints it entails;
• there is a query system such that an answer to a query Q is a relationship that is satisfied if and only if the query is entailed by the system;
• there exists a single algorithm to answer all queries.

Given a set S of arithmetic constraints as a conjunction of linear equalities, inequalities and negative constraints (disjunctions of inequations), we define a parametric query [7] as:

\[
\exists \alpha_1, \alpha_2, \ldots, \beta\ \ \forall x_1, x_2, \ldots :\ S \Rightarrow \alpha_1 x_1 + \alpha_2 x_2 + \cdots \leq \beta\ \ \wedge\ \ R(\alpha_1, \alpha_2, \ldots, \beta)\,?
\]
A priori, there does not seem to be any real connections between these various queries. However, they can all be expressed as parametric queries which ask under what conditions on the parameters aI, a2, ... , j3, the constraint in the query is implied by the constraints in store. By varying the parameters, specific queries can be formulated. For instance, • is x bound to a specific value a? 3a}, a2, ... ,j3, s.t. S::::} a1x1 +a2x2+ ... al = 1, a2 = 0, ... , j3 = a. = j3 with • is x ground? same as above but with j3 unconstrained. • does S implies 2Xl + 3X2 ~ O? as above with a1 = 2, a2 = 3, ... , j3 = O. • what are the constraints implied by the projection of S the {XI, Y2}-plane? All parameters except aI, a2, j3 set to 0 The test for solvability and the classic optimization problem can also be expressed in this way: • is S solvable? . as above with all parameters aI, a2, ... set to 0 except j3 2:: O. (by Fourier's theorem, which states that a set of constraints is solvable if and only if the elimination of all the variables results in a tautology) • what are the upper and lower bounds of f = Xl + X2 + X3? as above with al = 1, a2 = 1, a3 = 1, all other parameters are set to 0 except (3 2:: O. The answer gives the upper and lower bounds for (3 that correspond to the minimum and maximun of f. Parametric queries generalize logic programming queries which ask if there exists an assignment of values to the variables in the query so that the query becomes a logical consequence of the program clauses. They also generalize esP's queries which are restricted to constraints of the type x = a. We now must address the problem of finding a finite representation for the answers to the queries. Parametric queries are more complex than simple conjunctions of constraints as they involve universal quantifiers, non linearity and implication. However by using a result linked to duality in linear programming [8]' we can reduce the problem to a case of conjunction of linear constraints. The Subsumption Theorem states that a constraint is implied by a set of constraints S iff it is a quasi-linear combination of constraints in S. A quasi-linear combination of constraints is a positive linear combination with the addition of a positive constant on the righ-hand side. For instance, let S be the set {2x + 3y - z ~ 1, x - y + 2z ~ 2, x - Y + z ~ O} and Q be the query 3a,j3, \:Ix, y, S::::} ax + (3y ~ 1? The following relations express that the constraint in Q is a quasi-linear combination of the constraints in S. + + 2).1 ).2 ).3 = 3).1 - ).2 - ).3 a = (3 ).1 + 2).2 + ).3 = + 2).2 + q = 1 ).1 2:: 0, ).2 2:: 0, ).3 2:: 0, q 2:: 0 -).1 0 where the ).i'S are the multipliers of the constraints in S. It is from this simpler formulation that variables are eliminated. Variable elimination is the key operation to obtain answers to queries. It plays the role ofresolution in Logic Programming. With inequalities, the complexity problems are far more severe than in Logic Programming, even in the restricted domain of conjunctions of positive linear constraints. 1068 Fourier's method The basic algorithm is Fourier's[2]. The severity of the problem is illustrated by the table below: Number of variables eliminated Number of constraints generated ° 32 226 12,744 39,730,028 390,417,582,083,242 1 2 3 4 Actual number of constraints needed 18 40 50 19 2 The middle column gives the size of the output of Fourier's method to eliminate between 1 to 4 variables from an initial set of 32 constraints. 
The right most column gives the minimum size of equivalent outputs. Fourier's elimination is in fact doubly exponential as it generates an enormous amount of redundant information. Even if we remove redundancy on the fly, we are still left with exponential size for intermediate computation and potential exponential size for output. To solve this problem, one must look for output bound algorithms (an important area of study in computational geometry), that will guarantee an output when its size is small, bypassing the problem of intermediate swell. Also in the case where the size of the output is unmanageable, there is no point in computing it. However, we may sacrifice completeness and search for an approximation of reasonable size. That brings us back to avoiding intermediate swell. The extreme points method This method, derived from the formalism of parametric queries, is interesting as it shows that variable elimination can be viewed as a straightforward generalization of a linear program in its specification and as a generalization of the simplex in its execution. Let S = Ax :S; b and let V be the set of variables to be eliminated, the associated generalised linear program G LP is defined as: h! ~=l L: ).iail = G'l L: ).iaik = G'k L: \b i = f3 L: \aik+l =0 L:).i =0 =1 ).i ~ 0 L: ).iai m where extr denotes the set of extreme points. ~ represents the conditions to be satisfied by a combination of constraints of S that eliminates the required variables. The normalization of the ). 's ensures that ~ is a polytope. extr(cp(~)), solutions of GLP, determine a finite set of constraints which defines the projection of S. The coordinates of the extreme points of cp(~) are the coefficients of a set of constraints that define the projection. The objective function in the usual linear program can be viewed as a mapping from Rn to R, the image of the polyhedron defined by the constraints being an interval in R. The optimization consists in finding a maximum or a minimum, that is one of the extreme points of the interval. In a GLP, the objective function represents a mapping from Rn to Rm and instead of looking for one extreme point, we look for the set of all extreme points. At the operational level, we can execute this GLP by generalizing the simplex method. The extreme points of cp(~) are images of extreme points of~. So we compute the set of extreme points of ~, map them by cp and eliminate the images which are not extreme points. It is important to note that although the extreme points method is better that Fourier in general because it elim- . inates the costly intermediate steps, there are still two main problems: the computation of the extreme points of ~ can be extremely costly even when the size of the projection is small and also the method produces a highly redundant output [1]. The convex hull method Variable elimination has long been treated as algebraic manipulations based on the syntax of the constraints rather than their semantics. Fourier's Procedure and EPM are no exceptions. Consequently, the complexity of these methods is tied to the initial polyhedral set instead of to the projection itself. Quantifier elimination can also be viewed as an operation of projection. Exploiting this remark in a systematic way leads to more output bound algorithms which guarantee an output when its size is reasonable and an approximation otherwise [9]. 
In the bounded case, the idea is trivial: by running linear programs we compute constraints whose supporting hyperplanes bound both the polytope to be projected and its projection. The traces of these hyperplanes on the projection space provide an approximation containing the projection. At the same time the extreme points provided by the linear programs project on points of the projection. The convex hull of these points is a polytope that is included in the projection. Iterating this process leads to the projection. Whether we have an output bound algorithm or not will however depend on the choice of points. The difficulties that remain are that we do not want to make any assumption on the input polyhedral set which can be bounded or not, full dimensional or not, redundant or not, empty or not. Standard linear programming techniques can be used to determine solvability and to transform the input if required into a set of equations defining its affine hull and a set of inequalities defining a full-dimensional polyhedral set in a smaller space. A straightforward variable elimination in the set of equations gives the affine hull of the projection which will be part of the final output. 1069 This simplification based on geometrical considerations allows us to eliminate as many variables as possible by using only linear programming and gaussian elimination before getting into the costly part of elimination. In the bounded case, the algorithm works directly on the input constraints. The projection is computed by successive refinements of an initial approximation obtained by computing with linear programs enough extreme points of the projection so that their convex hull is full-dimensional. Successive refinements consist in adding new extreme points and updating the convex hull. The costly convex hull construction is done in the projection space thus the main complexity of the algorithm is linked to the size of the output. The process stops when either the projection has been found or the size of the approximation has reached a user-supplied bound. In the unbounded case, the problem is reformulated using the generalised linear program representation which is bounded by definition. cp(~) is computed by projection. The output will consist of the convex hull of cp(~) but also the set of its extreme points, from which the constraints defining the projection are derived. The advantage over the extreme points method is that we compute directly the extreme points of the projection. We do not need to compute the extreme points of ~, this computation being the source of enormous intermediate computation and high redundancy in the output. Implicit equalities and causes of unsolvability Fourier's algorithm can be used to trace all subsets of constraints in S that cause unsolvability or that are implicit equalities [11]. By using the quasi-dual formulation, we can acheive the same effects by running linear programs. The quasidual formulation which corresponds to Fourier's algorithm is CP: f3 = bT }. AT}. = 0, ~: { ~}.i=1,. }.i ~ 0 \;fz. Here CP maps ~m to ~, where m is the number of constraints in S. Since we want to compute the minimum of CP subject to ~ we need to solve the following linear program D: minimize bT }. subject to AT}. = 0 ~\ = 1 }.i ~ 0 Vi. It is obvious that, in general, solving S in this manner is far more efficient than using Fourier's algorithm. 
Since D is a variant of the dual simplex in Linear Programming, it inherits nice properties from the standard dual simplex such as good incremental behavior, no need to introduce slack variables and no restriction to positive I Quasi- Dual D I Unsolvable Properties of S • • • • • Strongly solvable Full dimensional No implicit equalities Unbounded and no p~ojection has arallel facets • • • • • Full dimensional No implicit equalities Bounded or exists projection with parallel facets • Solvable · . . • Weakly Solvable • Not full dimensional • Exists implicit equalities • An evident minimal subset of im licit e ualities • • Unsolvable • An evident minimal infeasible subset variables. More importantly as a side effect of the solvability test we obtain information about the algebraic properties of the constraints and about the geometric structure of the associated polyhedron. The properties of D are summarized in the table. Conclusion Much of the existing work on constraints has been done in diverse domains with their own distinctive requirements. Even in the restricted domain of linear arithmetic constraints, there is a wealth of knowledge and algorithms. To build systems to reason with constraints requires borrowing and synthesizing various notions and this led to the emerging concept of a unified framework of a single representation, the parametric query, and solution technique, variable elimination, for handling all the different operations on constraints. This approach shares key aspects with Logic Programming, with variable elimination playing the rule of resolution. The viability of this approach, both from a knowledge representation and knowledge processing aspects, is beeing tested with applications in the domain of spatial reasoning [3] and graphic user-interface [4]. Empirical results with an initial implementation have shown that a variety of small (about a hundred inequalities in two dimensions) and fairly large problems (up to about 2,000 inequalities over 70 variables) can be processed in times ranging from less than a second to a few minutes. Ongoing work includes the design and implementation of an integrated system based on the proposed framework and incorporating several solvers. The potential applicability of more recent interior points method is also investigated. Many properties of linear arithmetic constraints hold for constraints in other domains. These properties have been abstracted and generalized in [13]. 1070 References [I] T. Huynh, C. Lassez and J-L. Lassez, Practical Issues on the Projection of Polyhedral Sets, to appear Annals of Maths and AI. [2] T. Huynh, C. lassez and J-L Lassez, Fourier Algorithm Revisited, 2nd International Conference on Algehraic and Logic Programming Springer-Verlag Lecture Notes in Computer Sciences, 1990. [3] T. Huynh, L. Joskowicz, C. Lassez and J-L. Lassez, Reasoning About Linear Constraints Using Parametric Queries, in Foundations of Software Technology and Theoretical Computer Science, Lecture Notes in Computer Sciences, Springer-Verlag vol. 472 December 1990. [4] R. Helm, T. Huynh, C. Lassez and K. Marriott, A Linear Constraint Technology for User Interfaces, to appear Proceedings of Graphics Interface '92 [5] J. Jaffar and J.L. Lassez, Constraint Logic Programming, Proceedings of POPL 1987, Munich. [6] P. Kanellakis, G. Kuper and P. Revesz, Constraint Query Languages, Proceedings of the A CM Conference on Principles of Datahase Systems, Nashville 90. [7] J-L. 
Lassez, Querying Constraints, Proceedings of the ACM Conference on Principles of Database Systems, Nashville, 1990.
[8] J-L. Lassez, Parametric Queries, Linear Constraints and Variable Elimination, Proceedings of DISCO 90, Springer-Verlag Lecture Notes in Computer Science.
[9] C. Lassez and J-L. Lassez, Quantifier Elimination for Conjunctions of Linear Constraints via a Convex Hull Algorithm, IBM Research Report RC 16779, T.J. Watson Research Center (1991), to appear, Academic Press.
[10] J-L. Lassez, T. Huynh and K. McAloon, Simplification and Elimination of Redundant Arithmetic Constraints, Proceedings of NACLP 89, MIT Press.
[11] J-L. Lassez and M.J. Maher, On Fourier's Algorithm for Linear Arithmetic Constraints, IBM Research Report RC 14114, T.J. Watson Research Center (1988), to appear in Journal of Automated Reasoning.
[12] J-L. Lassez and K. McAloon, A Canonical Form for Generalized Linear Constraints, IBM Research Report RC 15004, T.J. Watson Research Center (1989), to appear in Journal of Symbolic Computation.
[13] J-L. Lassez and K. McAloon, A Constraint Sequent Calculus, LICS 90, Philadelphia.
[14] M. Maher, A Logic Semantics for a Class of Committed Choice Languages, Proceedings of ICLP4, MIT Press, 1987.
[15] V. Saraswat, Concurrent Constraint Logic Programming, to appear, MIT Press.
[16] L. van den Dries, Alfred Tarski's Elimination Theory for Real Closed Fields, The Journal of Symbolic Logic, vol. 53, no. 1, March 1988.
[17] P. van Hentenryck, Constraint Satisfaction in Logic Programming, The MIT Press, 1989.

Developments in Inductive Logic Programming

Stephen Muggleton
The Turing Institute, 36 North Hanover Street, Glasgow G1 2AD, UK

Abstract

Inductive Logic Programming (ILP) is a research area formed at the intersection of Machine Learning and Logic Programming. ILP systems develop predicate descriptions from examples and background knowledge. The examples, background knowledge and final descriptions are all described as logic programs. A unifying theory of Inductive Logic Programming is being built up around lattice-based concepts such as refinement, least general generalisation, inverse resolution and most specific corrections. In addition to a well established tradition of learning-in-the-limit results, recently some results within Valiant's PAC-learning framework have been demonstrated for ILP systems. Presently successful application areas for ILP systems include the learning of structure-activity rules for drug design, finite-element mesh analysis design rules, primary-secondary prediction of protein structure and fault diagnosis rules for satellites.

1 Introduction

Deduction and induction have had a long strategic alliance within science and philosophy. Whereas the former enables scientists to predict events from theories, the latter builds up the theories from observations. The field of Inductive Logic Programming [6, 8] unifies induction and deduction within a logical setting, and has already provided notable examples of the discovery of new scientific knowledge in the area of molecular biology [5, 7].

2 Theory

In the general setting an ILP system S will be given a logic program B representing background knowledge and a set of positive and negative examples (E+, E-), typically represented as ground literals. In the case in which B ⊭ E+, S must construct a clausal hypothesis H such that B ∧ H ⊨ E+, where B ∧ H ∧ E- is satisfiable.
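The search spaces considered next are ordered by θ-subsumption: a clause C subsumes a clause D when Cθ ⊆ D for some substitution θ. A naive test of this relation can be sketched as follows; the clause encoding and helper names are our own illustration and are not taken from any ILP system (literal polarity is ignored here, and a negated literal can be encoded with a distinct predicate symbol).

    # Literals and terms are tuples such as ("member", "X", ("cons", "X", "T"));
    # strings starting with an upper-case letter are variables.

    def is_var(t):
        return isinstance(t, str) and t[:1].isupper()

    def extend(s, t, theta):
        """Return theta extended so that s instantiated by theta equals t, or None."""
        if is_var(s):
            if s in theta:
                return theta if theta[s] == t else None
            new = dict(theta)
            new[s] = t
            return new
        if isinstance(s, tuple) and isinstance(t, tuple) and len(s) == len(t) and s[0] == t[0]:
            for a, b in zip(s[1:], t[1:]):
                theta = extend(a, b, theta)
                if theta is None:
                    return None
            return theta
        return theta if s == t else None

    def subsumes(c, d, theta=None):
        """True iff clause c theta-subsumes clause d (clauses are tuples of literals)."""
        theta = {} if theta is None else theta
        if not c:
            return True
        head, rest = c[0], c[1:]
        for lit in d:                 # map the first literal of c onto some literal of d
            t2 = extend(head, lit, theta)
            if t2 is not None and subsumes(rest, d, t2):
                return True
        return False

    # Example: the recursive member/2 clause subsumes one of its instances.
    c = (("member", "X", ("cons", "Y", "T")), ("member", "X", "T"))
    d = (("member", "a", ("cons", "b", ("cons", "a", "nil"))),
         ("member", "a", ("cons", "a", "nil")))
    assert subsumes(c, d)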
In some approaches [16, 13] H is found via a general-to-specific search through the lattice of clauses. This lattice is rooted at the top by the empty clause and is partially ordered by θ-subsumption (H θ-subsumes H' with substitution θ whenever Hθ ⊆ H'). Two clauses are treated as equivalent when they both θ-subsume each other. Following on from work by Plotkin [12], Buntine [1] demonstrated that the equivalence relation over clauses induced by θ-subsumption is generally very fine relative to the equivalence relation induced by entailment between two alternative theories with common background knowledge. Thus, when searching for the recursive clause for member/2, infinitely many clauses containing the appropriate predicate and function symbols are θ-subsumed by the empty clause. Very few of these entail the appropriate examples relative to the base case for member/2. Specific-to-general approaches based on Inverse Resolution [9, 14, 15] and relative least general generalisation [1, 10] maintain admissibility of the search while traversing the coarser partition induced by entailment. For instance, Inverse Resolution is based on inverting the equations of resolution to find candidate clauses which resolve with the background knowledge to give the examples. Inverse resolution can also be used to add new theoretical terms (predicates) to the learner's vocabulary. This process is known as predicate invention. Several early ILP authors, including Plotkin [12] and Shapiro [16], proved learning-in-the-limit results. Recently, ILP learnability results have been proved within Valiant's PAC framework for learning a single definite clause [11] and, in [3], for learning a multiple clause predicate definition assuming the examples are picked from a simple distribution.

3 Applications

ILP is rapidly developing towards being a widely applied technology. In the scientific area, the ILP system Golem [10] was used to find rules relating the structure of drug compounds to their medicinal activity [5]. The clausal solution was demonstrated to give meaningful descriptions of the structural factors involved in drug activity, with higher accuracy on an independent test set than standard statistical regression techniques. In the related area of predicting the secondary structure of proteins from the primary amino acid sequence [7], Golem rules had an accuracy of 80% on an independent test set. This was considerably higher than the results of other comparable approaches. Golem has also been used for building rules for finite-element mesh analysis [2] and for building temporal fault diagnosis rules for satellites [4].

4 Conclusion

Inductive Logic Programming is developing into a new logic-based technology. The field unifies induction and deduction within a well-founded theoretical framework. ILP is likely to continue extending the boundaries of applicability of machine learning techniques in areas which require machine construction of structurally complex rules.

References

[1] W. Buntine. Generalised subsumption and its applications to induction and redundancy. Artificial Intelligence, 36(2):149-176, 1988.
[2] B. Dolsak and S. Muggleton. The application of Inductive Logic Programming to finite element mesh design. In S.H. Muggleton, editor, Inductive Logic Programming. Academic Press, London, 1992.
[3] S. Dzeroski, S. Muggleton, and S. Russell. PAC-learnability of determinate logic programs. TIRM, The Turing Institute, Glasgow, 1992.
[4] C. Feng. Inducing temporal fault diagnostic rules from a qualitative model. In S.H.
Muggleton, editor, Inductive Logic Programming. Academic Press, London, 1992.
[5] R. King, S. Muggleton, R. Lewis, and M. Sternberg. Drug design by machine learning: The use of inductive logic programming to model the structure-activity relationships of trimethoprim analogues binding to dihydrofolate reductase. Proceedings of the National Academy of Sciences (to appear), 1992.
[6] S. Muggleton. Inductive logic programming. New Generation Computing, 8(4):295-318, 1991.
[7] S. Muggleton, R. King, and M. Sternberg. Predicting protein secondary structure using inductive logic programming, 1992. Submitted to Protein Engineering.
[8] S.H. Muggleton. Inductive Logic Programming. Academic Press, 1992.
[9] S.H. Muggleton and W. Buntine. Machine invention of first-order predicates by inverting resolution. In S.H. Muggleton, editor, Inductive Logic Programming. Academic Press, London, 1992.
[10] S.H. Muggleton and C. Feng. Efficient induction of logic programs. In S.H. Muggleton, editor, Inductive Logic Programming. Academic Press, London, 1992.
[11] D. Page and A. Frisch. Generalization and learnability: A study of constrained atoms. In S.H. Muggleton, editor, Inductive Logic Programming. Academic Press, London, 1992.
[12] G.D. Plotkin. Automatic Methods of Inductive Inference. PhD thesis, Edinburgh University, August 1971.
[13] R. Quinlan. Learning logical definitions from relations. Machine Learning, 5:239-266, 1990.
[14] C. Rouveirol. Extensions of inversion of resolution applied to theory completion. In S.H. Muggleton, editor, Inductive Logic Programming. Academic Press, London, 1992.
[15] C. Sammut and R.B. Banerji. Learning concepts by asking questions. In R. Michalski, J. Carbonell, and T. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, Vol. 2, pages 167-192. Kaufmann, Los Altos, CA, 1986.
[16] E.Y. Shapiro. Algorithmic Program Debugging. MIT Press, 1983.

Towards the General-Purpose Parallel Processing System

Kazuo Taki
Institute for New Generation Computer Technology
4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan
taki@icot.or.jp

1 Introduction

The processing power of recent microprocessors is growing very rapidly; it is beginning to overtake the power of mainframe computers. Trends in the continuous improvement of semiconductor technology suggest that the processing power of one-chip processor devices will reach 2000 MIPS by the end of the 1990s, and that a parallel computer system with 1000 such processors, installed in a single cabinet, will realize a peak speed of 2 TIPS (tera instructions per second). Such gigantic hardware power is no longer hard to imagine, because the recent large-scale parallel computers for scientific processing that have just appeared on the market already suggest a trend towards large parallel computers. However, the software technology on these scientific parallel computers focuses on very limited application domains, and the hardware design is also shifted somewhat towards those applications. The parallel processing paradigm on those systems is data parallelism. Problem modeling, language specification, compiling techniques, part of the OS design, and so on are all based on data parallelism. The characteristic of data-parallel computation is regular computation on uniform data, or, in other words, synchronous computation.
The coverage of this paradigm is limited to a rather narrow range of application domains, such as dense matrix computation, image processing, and other problems with regular algorithms on uniform data. To make full use of the gigantic-power parallel machines of the future, other parallel processing paradigms, covering a much wider range of application domains, need to be developed.

2 New Domain of Parallel Application

Knowledge processing is the target application domain of the FGCS project. The characteristics of knowledge processing problems differ greatly from those of scientific computations based on the data-parallel paradigm. Dynamic and non-uniform computation often appears in knowledge processing. For example, when a heuristic search problem is mapped onto a parallel computer, the workload of each computation node changes drastically depending on expansion and pruning of the search tree. Also, when a knowledge processing program is constructed from many heterogeneous objects, each object gives rise to non-uniform computation. The computation loads of these problems can hardly be estimated before execution. These large computation problems with dynamism and non-uniformity are called the dynamic and non-uniform problems in this paper. When a system supports a new computation paradigm suitable for the dynamic and non-uniform problems, its coverage of the application domain must expand not only to knowledge processing but also to some classes of large numerical and symbolic computation that have little data parallelism.

3 Research Themes

The dynamic and non-uniform problems raise new requirements mainly on the software technology. They need more complex program structures and more sophisticated load balancing schemes than those of the data-parallel paradigm. These items, listed below, have not been studied enough for the dynamic and non-uniform problems with large computation.

1. Modeling schemes to realize large concurrency
2. Concurrent algorithms
3. Programming techniques
4. Load balancing schemes
5. Language design
6. Language implementation
7. OS implementation
8. Debugging and performance monitoring support

The latter five items should be included in the topics of design and implementation of the system layer. The former three items should be included in the application layer or a more general framework of software development.

4 Approach

The approach taken in the FGCS project is that the system layer (covering topics 5 to 8 in Section 3) was carefully tailored to suit the dynamic and non-uniform problems, and the topics of the upper layer (1 to 4) were studied on that system.

Key Features in the System Layer: The system layer satisfies the following items to realize efficient programming and execution of the target problems.

1. Strong descriptive power for complex concurrent program schemes; the language features help the research of various concurrent algorithms and programming techniques
2. Ease of removing bugs
3. Ease of dynamic load balancing
4. Flexibility for changing the load allocation and scheduling schemes, to cope with the difficulty of estimating actual computation loads before execution

Mainly, the language features realize these characteristics and the language implementation supports efficiency. The key language features are listed below.

• Small-grain concurrent processes: a lot of communicating processes with complex structure can be easily described, realizing large concurrency.
• Implicit synchronization/communication: these are performed between concurrent processes even on remote processors, which helps to write less buggy programs.
• Separation of concurrency description and mapping: programmers first describe the concurrency of the program without concerning themselves with mapping (load allocation). Mapping can be specified with a clearly separated syntax after the concurrency description is finished. Runtime support for the implicit remote synchronization enables it.
• Handling of scheduling without destroying the clear semantics of the single-assignment language
• Handling of a group of small-grain processes as a task

The language implementation realizes an efficient execution of these features, including an efficient kernel implementation of memory management, process scheduling, communication, a virtual global name space, etc. [Taki 1992]. The other functions, which are written in the language, realize a research and development environment for parallel software, including a programming system, task management functions, and so on.

Research for the Upper Layer: Research topics 1 to 4 in Section 3 have been studied. After toy problems have been tested enough, R&D on practical large applications becomes important. Strong cooperation between experts on the application domains and experts on parallel processing is indispensable for such R&D. Several R&D teams have been formed, one for each application development. Firstly, the research topics have been studied focusing on each application; then commonly applicable paradigms and schemes are extracted and supported by the system as libraries, functions or programming samples.

5 Current Status

System Implementation: A concurrent logic programming language, KL1, which has the features listed in Section 4, has been efficiently implemented on the parallel inference machine PIM. A parallel operating system, PIMOS, which is written in KL1, supports an R&D environment for parallel software. The very low-cost implementation of those features [Taki 1992] encourages the research of load balancing.

Application Development: Practical large applications have been implemented [Nitta 1992], such as:

• LSI-CAD system: logic simulation / placement
• Genome analysis system: protein sequence analysis / folding simulation / structure analysis
• Legal reasoning system
• Go game playing system
• Eight other application programs with different knowledge processing paradigms

Most of them give rise to dynamic or non-uniform computation. Some measurements show very good speedup and absolute speed by parallel processing.

Common Paradigms and Schemes: Efforts on extracting common paradigms and schemes from each application development have been continuing. Categorization of dynamic process structures and load distribution schemes has been carried on. Performance analysis methodologies have also been studied [Nitta 1992]. A multi-level dynamic load distribution scheme for search problems is already supported as a library program. A modeling, programming and mapping scheme based on a lot of small concurrent objects has been commonly used among several application programs.
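To illustrate the kind of dynamic load distribution that such search problems call for, the following is a minimal sketch in Python of a shared task-pool scheme: workers repeatedly take a search node, expand it, and return the children to the pool, so that work migrates to idle workers however irregular the tree is. It only illustrates the scheme; the function names and the use of Python threads are our own choices, not KL1 or PIMOS, and Python threads do not give real parallel speedup for CPU-bound work.

    import queue
    import threading

    def pool_search(root, expand, is_goal, n_workers=4):
        """Expand a search tree with a shared task pool; expand(node) returns the
        child nodes and is_goal(node) tests for solutions.  Illustrative sketch."""
        tasks = queue.Queue()
        tasks.put(root)
        solutions = []
        pending = [1]                     # nodes handed to the pool but not yet expanded
        lock = threading.Lock()
        finished = threading.Event()

        def worker():
            while not finished.is_set():
                try:
                    node = tasks.get(timeout=0.05)
                except queue.Empty:
                    continue              # pool momentarily empty; another worker may add work
                goal = is_goal(node)
                children = [] if goal else expand(node)
                with lock:
                    if goal:
                        solutions.append(node)
                    pending[0] += len(children) - 1
                    if pending[0] == 0:   # nothing left anywhere: tell every worker to stop
                        finished.set()
                for child in children:
                    tasks.put(child)

        workers = [threading.Thread(target=worker) for _ in range(n_workers)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return solutions

A multi-level version of the same idea would keep one such pool per group of processors and let idle groups take work from busy ones; the principle is unchanged.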
6 Conclusion

New paradigms of parallel processing that can cover the dynamic and non-uniform problems are expected to expand the application domains of parallel processing far more than ever before. The dynamic and non-uniform problems must be a large application domain of parallel processing, coming next to the applications based on data parallelism. Parallel processing systems that support efficient programming and execution of the dynamic and non-uniform problems will get close to the general-purpose parallel processing system. The KL1 language system, developed in the FGCS project, realizes many useful features for efficient programming and execution in that problem domain. Many application developments have been proving the effectiveness of the language features and their implementation. R&D on problem modeling schemes, concurrent algorithms, programming techniques and load balancing schemes for that problem domain has started in the project, and still has to be continued. The accumulation of this software technology must make the true general-purpose parallel processing system.

References

[Nitta 1992] K. Nitta, K. Taki and N. Ichiyoshi. Experimental Parallel Inference Software. In Proc. of the Int. Conf. on FGCS, 1992.
[Taki 1992] K. Taki. Parallel Inference Machine PIM. In Proc. of the Int. Conf. on FGCS, 1992.

A Hybrid Reasoning System For Explaining Mistakes In Chinese Writing

Jacqueline Castaing
Univ. Paris-Nord, LIPN / CSP, Avenue J-B Clement, 93430 Villetaneuse, France
jc@lipn.univ-paris13.fr

Abstract

We present in this paper a hybrid reasoning system for Explaining Mistakes In Chinese Writing, called EMICW. The aim of EMICW is to provide students of the Chinese language with a means to memorize characters. The students write down characters from EMICW's dictation. In case of graphic errors, EMICW will explain the reasons for the error by using either the etymology of the characters or some efficient mnemonic techniques. EMICW has multiple representations associated with multiple reasoning methods. The coherence of the reasoning is ensured by means of a common logic formalism, the FLL-theories, derived from Girard's linear logic.

1 Introduction

The main aim of the system EMICW is to provide students of the Chinese language with a means to memorize Chinese characters without losing heart. The first obstacle for people accustomed to an alphabet is indeed the great number of characters to sink in. We propose that they write down characters from EMICW's dictation. In the case where students are mistaken about a character, the system will explain the reasons for this graphic error either by using the origin of the character [Henshall 1988], [Ryjick 1981], [Wieger 1978], or by invoking an efficient mnemonic technique. EMICW is a hybrid knowledge representation and reasoning system [Brachman et al. 1985], [Kazmareck et al. 1986], [Nebel 1988]. It has multiple representations - a semantic network associated with inference rules expressed in the formalism of Gentzen's calculus [Gentzen 1969] - associated with multiple reasoning methods. The set of inference rules defines the main cases of mistakes that the author of this article and school fellows could make during their own initiation into Chinese writing. The learning methods used are given in [Bellassen 1989], [De Francis 1966], [Lyssenko and Weulersse 1987], [Shanghai Press 1982]. To ensure coherent reasoning, EMICW has a common logic formalism, the FLL-theories [Castaing 1991], borrowed from Girard's linear logic [Girard 1987, 1989]. The system essentially performs monotonic abduction [Bylander 1991]. So, let a be the correct Chinese character the student should write down from EMICW's dictation. Let b be the actual answer given by the student. If the student is mistaken, it means that the character a is different from b; the binary predicate Error(a, b) is then set to the value true.
An explanation of a graphic error consists in finding a set of first-order formulas Sigma such that a proof of the linear sequent Sigma f-- Error (a,b) can be carried out in a FLL-theory. The set of the formulas of Sigma shows the different causes of the confusion of the characters a with the character b. For example, the two characters a and b may have the same sound (they are homophonic), or they may share the same graphic components, and so on. In this paper, we first briefly outline the history of chinese characters [Alleton 1970], [Henshall 1988], [Li 1991] [Ryjick 1981], [Wieger 1978], so the reader can appreciate how a character is made up, how it acquired its structure and will make himself an opinion on the difficulties of the chinese writing. We also give the terminology we use. In the third section, we discuss the problem of characters representation and recognition which explains the limitation of our system. Then, after describing the system EMICW (section 4), we will give in section 5, an example of explanation in the FLL-theory T. The essential point of the section 6 is the proof of the tractability of our system. 2 Chinese Writing The chinese characters originated between 3000- 2000 B.C in the Yellow River of China. They have been the subject of numerous studies. In this paper, we limit ourselves to mentioning what is essential for a good understanding of our work. The chinese characters, also called sinograms (letters from China) are written in square form with the help of strokes, for example, horizontal stroke, vertical stroke. A set of 24 strokes standardized by the Foreign Languages Institute of Beijing are now of general use (see section 3.1). Strokes must be written down according to established principles of stroke order (generally from top to bottom, and from left to right) called calligraphic order. A knowledge of these principles is important in order to achieve the proper shape and to write in the cursive style or semi-cursive style (the writing style of the chineses). Sinograms are monosyllabic, and each syllable has a definite tone. There are four basic tones in the official national language (called mandarin chinese too). The transliteration used in this article is based on the official Chinese phonetic system, called pinyin, which is a representation of the sounds of the language in the Latin alphabet. We mark tones with numbers from 1 to 4. Sinograms have traditionally been classified into six categories. However, in many cases the categorization is 1077 open to difference of opinion, and one sino gram can legitimately belong to more than one category. We list below the main categories that shed considerable light on the nature of sinograms. The students should consider these categories as guides to remembering sinograms. 1. The simple pictogram: essentially a picture of si~le physical object. For example, woman ~ nu3, child -:rzi3. 2. The complex pictogram: a picture of several physical objects normally indissociable. For example, good }tf ha03. 3. The ideogram: a meaningful combination of two or more pictograms chosen for their meanings. For example, from pictograms sun ri4, and moon yue4, the ideogram intelligent is derived: 8~ 4. The ideo- phonogram: the largest category, containing about 90% of the sinograms. Essentially a combination of a semantic element with a phonetic element. 
For example, the ideo-phonogram seed ~t zi3 obtained by combining the semantic element cereal* mi3, with the phonetic element child=f zi3, which gives to the character its reading. In fact, only about 30% of sinograms have a real phonetic component as in the example. Chinese (as any other language which is still spoken) has changed since the origin, so the phonetic element has lost its property. The classification of sinograms in dictionaries can be done with the help of several methods. The number of strokes method and the alphabetical order (based on the pinyin romanization) method are easy to apply. The four corner method considers particular strokes located at the four corner of the sinogram. These strokes are codified with the help of four (or five) digits, and the sinogram is located at the position given by its numerical representation. The radical method uses a particular element in a sinogram, the key element, which indicates the general nature of the character. For instance, the ideo-phonogram~.J zi3 is located under the radical The character dictionary Xin Bua Zi Dian (eds .. 1979) lists the sinograms with respect to 189 radicals. About five to seven thousand sinograms of up to ten or so strokes are needed in order to master the Chinese writing. The usual technique for learning consists in writing down a sinogram until it sinks in. We believe that the key to successful study of sinograms does not lie in rote learning. We propose a way to make the task a lot easier. For each case of mistake, our EMICW system gives an explanation based on the etymology of the characters. For instance, the character tian 1 (sky) can be confused with the following one ~ fu4 (adult), because they have similar graphics. In fac(,1he character ~ comes from .* da4 (tall), and from the graphic - yi 1 (one), which represents a hat, while the character.t;:. comes from 1:. ' and from the graphic- which means a hairpin. The position of the strokes can be meaningful. If such an explanation is given to the students in case of error, they progressively will be able to correct their own mistakes by reasoning, without relying heavily on memory. Moreover, they can consider these explanations as an introduction to the history of Eastern Asia. We list below the main cases of mistakes we have met in our study of the chinese language: 1. Confusion of homophonic sinograms: about 50000 sinograms share four hundred syllables. According to official statistics each syllable with its tone corresponds to an average of five distinct sinograms. So, the first a A *. *= difficulty for students is to distinguish the homophonic sinograms. For exftmple, ten shi2, moment shi2, and to know1R shi2 w ich are homophonic sinograms can be confused in a dictation. 2. Confusion of sinograms with similar graphics: For example,~ji3, yi3, si4 have similar graphics, tian 1, fu4 adult have similar graphics too. It happens that t e mistaken graphic is not a sinogram. For example, instead of half.:f ban 1, the student (the author of this article) wrote':': . 3. Confusion of si'hograms which share the same components: For example,;t..~ di4, and:1! chi2, " which share the component ~ . 4. Confusion of sinograms which form a word: The sinograms are monosyllabic, but the chinese words are generally dissyllabic. For example, the words .iJ ~l].enlti3 (body),.Jt. fi] gong4tong2 (togeilier), and 1& shuolfiua4 (to talk). The students usually learn dissyllabic words. So, they happen to confuse a sinogram with another. 
We can also mention the case of confusion of simplified forms with non simplified forms of sinograms, of missing strokes: very complex sinograms may have about thirty strokes, so missing strokes is a very frequent mistake. t EJ·t e * e 1* 1! 3 The Graphics Capture Students write down sinograms from EMICW's dictation. A "good" method for representing graphics should allow the system to rapidly recognize the graphics drawn which are not automatically sinograms, because students can be mistaken. The different classification and search techniques in dictionaries that we have mentioned in the previous paragraph, permit to locate a character, but not to correct it. For instance, the four corners method does not take into account all the strokes drawn by the student, so, cannot be used to correct mistakes. The recognition problem of sinograms has been the subject of numerous studies. The last results can be found in rWang 1988] [Yamamoto 1991 J. 3.1 Data Capture In our particular application, we have to "understand" graphics drawn by students in order to help them in case of error. Each graphic drawn is characterized by the type of strokes used, the calligraphic order of strokes, and their positions in a square. In order to capture all these data, the system displays the set of 24 standardized strokes. In fact, only six strokes are primary ones: the point l (pt), the horizontal stroke (hr), the vertical stroke I (vt), the top to left bottom stroke ) (dg), the top to right bottom stroke ' (dd), and the back up stroke".(rt). All other strokes derived from these primary ones. These strokes are implemented by means of graphical primitives such as line drawing, rectangle and arc drawing. The students arrange strokes to draw graphics inside a square, the pictures may be expanded or shrunk to fit their destination square. For instance, the sinogram ~ tian1 (sky) can be written down in the following square by means of strokes of types hr, dg, and dd, according to the calligraphic order of writing (hr hr dg dd): 1078 4.1 3.2 Graphic Feature of a Sinogram As the position of strokes can be meaningful, we propose to locate each stroke in terms of coordinates on a plane (the coordinate plane is a two-dimension grid, which corresponds to the square drawn above, the coordinate origin (0, 0) being at the left top corner of the square). We sort out strokes with respect to their coordinates: from top to bottom (top-down order) , from bottom to top (bottom-up order), from left to right (left-right order), from right to left (right-left order). So, every graphic is characterized by the set of following codifications: the calligraphic order of strokes, the top-down, the bottom-up, the left-right and the right-left orders of strokes. For instance, the graphic feature of the sinogram :t: tian 1 is given by the calligraphic order of strokes (hr lfrllg dd), the top-bottom order (hr dg hr dd), the bottom-top order (dg dd hr hr), the left-right order (hr hr dg dd), and the rightleft order (hr hr dd dg). We show now how all that knowledge can be used to explain graphic errors. 4 Terminological component Concepts are labelled collections of (attribute, value) pairs. The main concepts are the following ones: Stroke, Graphic-Feature (abbreviated as G-F), Graphic-Meaning (abbreviated as G-M), Graphic-Sound (abbreviated into G-S), Syllable, Meaning. Individual concepts denoted by small letters are instances of concepts denoted by capital letters. Attributes are classified into the structural link IS-A and properties. 
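As a concrete reading of the codifications of Section 3.2, the five stroke orders can be computed from the captured strokes. In the sketch below a graphic is a list of (stroke type, bounding box) pairs given in calligraphic order; the representation, the choice of bounding-box corners as sorting keys, and the function name are our own assumptions for illustration, since the paper does not fix these details.

    def graphic_feature(strokes):
        """strokes: list of (stroke_type, (x0, y0, x1, y1)) in calligraphic order,
        with the origin (0, 0) at the top-left corner of the square.
        Returns the five codifications of a graphic as tuples of stroke types."""
        c_o = tuple(t for t, _ in strokes)
        # top-down: sort by the topmost point of each stroke (smallest y0)
        o_t_d = tuple(t for t, box in sorted(strokes, key=lambda s: s[1][1]))
        # bottom-up: sort by the bottommost point (largest y1), lowest stroke first
        o_b_u = tuple(t for t, box in sorted(strokes, key=lambda s: -s[1][3]))
        # left-right: sort by the leftmost point (smallest x0)
        o_l_r = tuple(t for t, box in sorted(strokes, key=lambda s: s[1][0]))
        # right-left: sort by the rightmost point (largest x1), rightmost stroke first
        o_r_l = tuple(t for t, box in sorted(strokes, key=lambda s: -s[1][2]))
        return {"c-o": c_o, "o-t-d": o_t_d, "o-b-u": o_b_u, "o-l-r": o_l_r, "o-r-l": o_r_l}

Two graphics are then candidates for identification when all five tuples coincide, which is essentially the comparison the classifier of Section 4 performs.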
The IS-A link is used for inheritance. So, if two concepts B and A are linked by means of the IS-A link, we say that A subsumes B, and that the concept B is of type A. Properties are related to the intrinsic features of concepts. The attribute values are concepts too. The main properties in the system are the following ones: c-o (abbreviation of stroke calligraphic order ), o-t-d (abbreviation of top-down order), o-b-u (abbreviation of bottom-up order), o-l-r (abbreviation of stroke order from left to right), o-r-l (abbreviation of stroke order from right to left), sound (pronunciation), etymo (abbreviation of etymology). We give below a general view of the classification of the main concepts in EMICW taxonomy. To make clear the presentation, we use an ordering graph (semantic network), where the bold arrow -> represents the IS-A relation, and the arrow -> represents the roles. Knowledge Representation The representation language of EMICW is a restricted version of the frame-based language KL-ONE [Brachman and Smolze 1985] - for instance it does not support structural dependency relations. EMICW has a terminological component, the data base associated to an assertional component. The assertional component is a set of rules expressed in terms of predicates which are defined in the terminological component. Let us first justify our choice, then we will describe the language. In order to deal with all the cases of mistakes listed in section 2, we need for a representation system which allows us to define all the links of "proximity" between the objects manipulated, i.e. graphics which are (or which are not) sinograms. For instance, homophonic links between two different sinograms, or graphic similarity between a graphic and its components. The inheritance link IS-A ( B IS-A A means intuitively that all instances of B are also instances of A), and the properties which correspond to roles fit very well our problem. For efficiency reasons, we have to find a trade-off between the expressive power of the representation language and the computational tractability of the relation IS-A (called subsumption relation). In [Castaing 1991], we analysed the relation B IS-A A, and we proved that providing some restrictions, a subsumption criterion can be defined. A matching algorithm based on this criterion computes subsumption in polynomial time. In the system EMICW, we increase the expressive power of our language by adding to the system an assertional component, which only deals with existential rules. In section (6), we will discuss the computational complexity of our system. In the taxonomy given above, there are only individual concepts of type Meaning. For instance, the words tall and hat are instances of Meaning. The concepts of type Syllable correspond to the syllables of the chinese language without tone. For instance, the concept Tian is of type Syllable. An instance of the concept Tian may be tian1 (first tone). The concepts of type Stroke correspond to ordered sequences of strokes. Let Sa and Sb be two concepts of type Stroke. Sb IS-A Sa if and only if the strokes in Sa also appear in Sb in the same order. For instance, the concept Sa which corresponds to the sequence of strokes (hr dg dd) subsumes the concept Sb given by the sequence (hr hr dg dd). Intuitively, this relation means that the graphics drawn by means of the ordered sequence of strokes (hr hr dg dd ) have been partially drawn by means of the ordered sequence (hr dg dd) too. 
The concepts of type G-F give the graphic features of sinograms. The meaning of a sinogram 1079 is given by the property etymo, and its reading is given by the property sound. It may happen that two different sinograms have the same graphic feature. For instance, to 10vet'J' ha04, and good 1<:f ha03. So, we define concepts of type G-M (Graphic Meaning), and G-S (Graphic Sound), such that each sinogram in the data base can be considered as an instance of the concepts G-M and G-S. We now give an example of a sinogram representation. Example-I: Let alOO be the sinogram ~ tianl. Its graphic feature can be defined by means of the concept G-FIOO which is characterized by the following (attribute, value) pairs: G-FIOO = { (c-o, (hr, hr, dg, dd)), (o-t-d, (hr, dg, hr, dd)), (o-b-u, (dg, dd, hr, hr)), (o-l-r, (hr, hr, dg, dd)), (or-I, (hr, hr, dd, dg»}. The sinogram alOO inherits its meaning (sound) from the concept G-MIOO (G-S 100) partially defined by the following sets of (attribute, value) pairs: G-MIOO = {(IS-A G-FlOO), (etymo, sky) } G-S 100 = {(IS-A G-FlOO), (sound, tian)} So, the sinogram alOO is an individual concept of type GFIOO defined by the (attribute, value) pairs: alOO ={(IS-A, G-FlOO), (etymo, sky), (sound, tianl) }. End of the Example-I. The graphics drawn by the student during a dictation are not automatically sinograms. So, we first consider them as concepts of type G-F (Graphic Feature). We solve the recognition problem of graphics by means of a classifier [Brachman and Levesque 1984]. 4.1.1 Classifier Usually the role of the classifier in a KL-ONE taxonomy consists in placing automatically a concept at its proper location. For classifying concepts in EMICW taxonomy, we proceed in two steps: 1. From the graphic drawn by the student, we define the concept CG (Complete- Graphic) related to the properties c-o, o-t-d, o-b-u, o-l-r, o-r-l of the components 2. We look for the concepts A and B, such that A subsumes CG, CG subsumes B, and there does not exist a concept A' which can be located between A and CG, and a concept B' which can be located between CG and B. We place CG, and we say that CG is at its optimal location in EMICW taxonomy. It means that CG inherits from all its ancestors. A is said to be a father of CG. B is said to be a son of CG. In case the concepts A and B are identical, we say that CG has been identified with A (or with B). 4.1.2 Recognition Problem The recognition problem consists in discovering an individual concept b of type Sino, which has the same graphic feature than CG. We proceed as follows: 1. By means of the classifier, we place the concept CG at its optimal location. 2. If CO can be identified with a concept G-Fn of type G-F, it means that there exists at least a sinogram which is an instance of G-Mn and G-Sn. Let cf be this particular instance of G-Mn andG-Sn. We identify CG with cf, and CG "wins" all the properties of cf, for example, the properties sound, and etymo. We give an example. Example-2 Let us suppose that the graphic drawn by the student is ~ tianl (sky). The concept CG has the following properties (after sorting out the strokes with respect to their coordinates) CG = {(c-o, (hr hr dg dd)), (o-t-d, ( hr dg hr dd)), (o-bu, (dg dd hr hr)), (o-l-r, (hr hr dg dd)), (o-r-l, (hr hr dd dg))}. 
The concept CG placed at its optimal location can be identified with the concept G-FIOO (see the Example-I): G-FlOO = { (c-o, (hr hr dg dd)), (o-t-d, (hr dg hr dd)), (o-b-u, (dg dd hr hr)), (o-l-r, (hr hr dg dd)), (o-r-l, (hr hr dd dg))}, and so, can be identified with the instance alOO of G-MlOO and G-SIOO. The concept CG gains the properties sound and etymo of a I 00. End of the example-2. Our recognition procedure is a little drastic. It may happen in sinograms with multiple components that some strokes in a component have no link with those in another component. By sorting out all strokes, we consider that they are necessarily linked, so, we detect a graphic error and reject the graphic proposed by the student. Our recognition procedure suits sinograms (simple or complex) whose components are specified by the students. 4.2 Rules The rules of the assertional component deal with the different cases of error in chinese writing. All the predicates manipulated are defined in the terminological component either as unary predicates (concepts) or as binary predicates (roles), except for the predicates Error, (different), and = (equivalence). We explain now how the confusion of sino grams can be interpreted by means of the predicate Error. Let a be the sinogram of the dictation, and CG be the complete concept obtained from the graphic drawn by the student. The student's answer is considered correct (there is no error) if and only if: 1. The concept CG is recognized as a sinogram denoted by b. 2. The individual concepts a and b share exactly the same properties. Two cases of error are possible: 1. The concept CG cannot be identified with a concept of type Graphic- Feature of a sinogram. It means that the graphic drawn is not a sinogram. 2 The concept CG is recognized as a sinogram denoted by b, but the sinograms a and b do not share the same properties. In the first case, the concept CG is located at its optimal position, and has a father that we denote by B. We consider an individual concept b of type B, and we propose to explain the confusion of a with b.The choice of an individual b may depend on a strategy. For the time being in our application, we identify CO with an individual which has the same graphic feature as B. In the second case, we propose to directly explain the confusion of a with b. The individual concepts pointed out by our system during an explanation are the witnesses of the error. The rules of the assertional component have a limited syntax. Their general form is: "If there-exists x such that P (x) then Error (a, b)", where x is a vector of variables, and P is a finite conjunction of predicates. For instance, the * 1080 rule: "If there-exists z such that Syllable(z) & Sound(a, z) & Sound(b,z) then Error (a, b)", can be used in order to explain a mistake between two sinograms a and b which are homophonic. We give some examples ofrules expressed in sequent calculus formalism. rule-I: ::3 z Syllable(z) & Sound(a,z) & Sound(b, z) I- Error (a, b) rule-2: ::3 u z ml, rn2, G-M(u) & G-M(z) & Meaning (mI) & Meaning (m2) & ml:;t:m2 & u:;t: a & z:;t:b & Etymo (u, mI) &Etymo(z, m2) & Etymo (a, ml) & Etymo (b, m2) & Error Cu, z) I- Error (a, b) rule-3: ::3 s Stroke (s) & c-o(a, s) & c-o(b, s) I- Error (a, b) Logical rules: r 1- r ~ r,A°1- r, r', ~ A I- 5 Explanation in term of Proofs In this section, we first present a formal description of EMICW by means of the FLL-theory T, then we will give an example of explanation. 
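Before the formal development, it may help to see the intent of rules such as rule-1 and rule-3 in a deliberately simplified, non-logical form: given attribute-value descriptions of the two characters, an explanation is any rule whose premises they instantiate. The dictionary entries and the function below are our own illustration and only stand in for the FLL proof search described next; they are not part of it.

    # Simplified premise checking for two of the error rules of Section 4.2.
    # Each sinogram is described by its sound (pinyin with tone) and its
    # calligraphic stroke order; the values follow the tian1 / fu4 example.
    KB = {
        "a100": {"sound": "tian1", "c_o": ("hr", "hr", "dg", "dd")},   # tian1 (sky)
        "b100": {"sound": "fu4",   "c_o": ("hr", "hr", "dg", "dd")},   # fu4 (adult)
    }

    def explanations(a, b, kb=KB):
        """Return the names of the rules whose premises the pair (a, b) satisfies."""
        found = []
        if kb[a]["sound"] == kb[b]["sound"]:            # rule-1: homophonic sinograms
            found.append("rule-1: same sound " + kb[a]["sound"])
        if kb[a]["c_o"] == kb[b]["c_o"]:                # rule-3: same calligraphic order
            found.append("rule-3: same calligraphic order %s" % (kb[a]["c_o"],))
        return found

    # For tian1 / fu4 only rule-3 applies, matching the discussion in Section 5.3.
    print(explanations("a100", "b100"))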
The FLL-theories use a fragment of linear logic (see also [Cerrito 1990], and [Masseron et a1. 19901 for some particular applications of this logic). We suppose the reader familiar with sequent calculus. In the next chapter, we will discuss the tractability ofEMICW. 5.1 Formal Description of EMICW The FLL-theories are built from the linear fragment which consists of the connectives & (conjunction), the connective y (disjunction), and the linear negation denoted by ( )0. The essential feature of the fragment used is the absence of the contraction and weakening rules listed below: r, A, A 1r, A 1r 1- ~ r, A 1- ~ r (C-l) 1- ~, A, A r ~ 1- ~, B,~ rl-A, r 1- (A Y B), r, A ~ (W-r) Axioms : A 1- A r 1- ~, A A, r, r' 1- r' 1- ~ , (C) ~,~' Exchange rules: r, A, B 1- ~ r,B,AI- ~ (Ex-I) r I- rl- ~, A, B (Ex-r) ~,B,A ~ I- A, (&-11) _-=-r-2.,-=B,-,---I-~~"-----j(&_12) ~ r r, r, r,(A&B)I- r,A 1- r,3x AI- ~ (3-1) ~ r 1r ~ ~ ~ ~ (&-r) 1- B, A(t/x) I- ~ (V-I) V x A 1- (y-l) ~, ~' (y-r) r I- (A&B), A, ~ 1- V x A, (V-r) ~ r I- A(t/x) ,~ (3-r) rl- 3 x A, ~ In rules (\I -r) and (::3-1), x must not be free in rand L1 A FLL-theory can be obtained from the above fragment by adding a finite set of proper axioms S, which. are sequents closed under substitution. In the cut rule gl.v en above, the formula A is the cut-formula. A proof In a FLL-theory is said to be cut-free, if all cut-formulas involved occur in some sequent of S. In our particular application, the set of proper axioms S which completely defines the FLL-theory T is made up of two subsets S 1 and S2. The subset of proper axioms S 1 corresponds to the terminological component. They have the general form A I- B, where A and B are literals which interprete either concepts or roles. So, the terminological component of EMICW can be formally described by the FLL-theory Tl limited to the set of proper axioms Sl. The subset of proper axioms S2 is given by the rules of the assertional component. 5.2 The axiom and the rules of the fragment are the following ones: Cut r ~' e-r) ~ ~ ~ 1- A I-~ 1- AO (C-r) A r 1- ~ r 1- ~, A ---=----'---=-- (W -I) B 1- r, r', (A y B) 1- r,(A&B)1- The rule-I deals with errors due to homophonic sinograms. The rule-2 explains that the confusion of a with b may come from a misunderstanding of the etymologies of some components of the sinograms a and b.The rule-3 stresses the importance of the calligraphic order: two sinograms with the same calligraphic order can be confused. r, A,~ (0_1) To Explain is To Prove EMICW combines the two following different reasoning methods: 1. The classifier which performs inferences by means of the subsumption operation. 2. A theorem prover which applies the cut-rule by only using the cut-formulas which appear in the rules of the set S2. An explanation of a graphic error consists in finding a finite conjunction of ground formulas Sigma == PI & ... & Pn such that a proof of the linear sequent Sigma IError (a, b) can be carried out in the FLL-theory T. Let us show how we proceed generally. 1. First 'case: the cut-formula doesn't contain the predicate Error. 1081 '* axiom of S2 Sigma f-:3 x P(x) :3x P(x) f- Error (a, b) (Cut) Sigma f- Error (a, b) The proof of the sequent Sigma f-:3x P(x) consists in instantiating the existential quantifier. We define a component called instantiation component which performs the following operations: 1. it defines a concept CP by using the properties given in the predicates P. 2. 
it locates the concept CP at its optimal position with the help of the classifier, such that there exists a witness c which satisfies P in the taxonomy of EMICW. We obtain the new sequent to be proved, Sigma f- P(c). We "force" the proof of this sequent by setting Sigma = P(c) & P2 ... &Pn. The proof of the sequent P(c) & P2 ... &Pn f- P(c) is now straightforward by means of the (&-11) rule. 2. Second case: the cut-formula contains the predicate Error. We are left with the following tree: Sigma f-:3x y P(x,y) & Error (x, y) Sigma f- Error (a, b) In the same way as indicated above, we use the instantiation component to point out two witnesses c and d which satisfy P. We obtain the following sequent to be proved: Sigma f- P(c,d) & Error (c,d). We apply the (&-r) rule and we obtain the new tree: Sigma f- P(c,d) Sigma f- Error (c,d) ----------------------------------------- --------- (& -r) Sigma f- P(c,d) & Error (c,d). We set Sigma = P(c,d) & P2 ... & Pn, so, we are now left with the proof of the sequent: P(c,d) & P2 ... & Pn fError(c,d). We progressively makes appear all the formulas of Sigma by iterating the same process. The sequent Sigma f- Error (a,b) may have several proofs. In this case, the system can give multiple explanations to the students. The best explanation must allow the students to better memorize the sinogram a. We think that a good criterion for the choice of the best explanation can be : 1. The presence of the predicate Etymo in the explanation with the meanings of the components 2. The shorter proof (a proof which applies the smaller number of rules ). 5.3 An Exalnple of Explanation Let us explain the confusion of the sinogram ~ tian1 (sky) with the sinogram:t. fu4 (adult) by means of proofs. Etymologists givt the following explainations: the sinogram sky comes from a person standing with arms spread out to look as tall as possible with a big head (or a hat) symbolised by the stroke ..... The sinogram adult *- comes from tall with an ornamental hairpin through his hair (a sign of adulthood in ancient China) symbolised by the stroke _ . So, we propose the following taxonomy: 1. The concepts G-F90 and G-M90 give the graphic feature of the sinogram ;(: da4 (tall), and its etymology G-F90 = { (c-o, ( hr dg dd», (o-t-d, ( dg hr dd», (o-b-u, (dg dd hr », (o-I-r, (hr dg dd», (o-r-l, (hr dd dg»}. G-M90 = {(IS-A, G-F90), (etymo, tall)}. 2. The concept G-FOI corresponds to the graphic feature of the sinogram yi 1, G-FOI = {(c-o, ( hr », (o-t-d, ( hr », (o-b-u, ( hr », (o-l-r, (hr », (o-r-l, (hr »}. As the sinogram yil has (at least) two different origins, hat and hairpin, we define two concepts of type G~M: G-MOlO = {(IS-A, G-FOl), (etymo, hat)} G-MOll = {( IS-A, G-FOl), (etymo, hairpin)} 3. The concept G-M 100 defined as {(IS-A, G-FlOO), (etymo, sky)} (see the example-l of section 4.1) can be located now as: G-MlOO = {(IS-A, G-M90), (IS-A, G-MOlO)} . The sinogram ~ tian 1 represented by the individual concept alOO= {(IS-A, G-M 100), (IS-A, G-S 100), (sound, tianl), (etymo, adult)} inherits the properties (etymo, tall) and (etymo, hat) from the concepts G-M90 and G-MOIO. In the same way, the sinogram:t . adult is represented by the individual concept b100~ l: (IS-A, G-MllO), (IS-A, G-S 110), (sound, fu4), (etymo, adult)}, and inherits the properties (etymo, tall), (etymo, hairpin) from the concepts G-M90 and G-MOIl. 
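The inheritance just described can be replayed on a small explicit encoding of the taxonomy: each concept lists its IS-A parents and its own (attribute, value) pairs, and a property lookup walks the IS-A links upwards. The encoding below uses the concept names of this example (the IS-A parents of G-M110 are implied rather than stated in the text), and the traversal function is our own illustrative sketch.

    # IS-A links and etymologies for the concepts of the example above.
    ISA = {
        "a100":   ["G-M100", "G-S100"],
        "b100":   ["G-M110", "G-S110"],
        "G-M100": ["G-M90", "G-M010"],
        "G-M110": ["G-M90", "G-M011"],
    }
    ETYMO = {"G-M90": "tall", "G-M010": "hat", "G-M011": "hairpin"}

    def inherited_etymologies(concept):
        """Collect every etymo value reachable through IS-A links."""
        seen, stack, values = set(), [concept], []
        while stack:
            c = stack.pop()
            if c in seen:
                continue
            seen.add(c)
            if c in ETYMO:
                values.append(ETYMO[c])
            stack.extend(ISA.get(c, []))
        return values

    # a100 (tian1) inherits 'tall' and 'hat'; b100 (fu4) inherits 'tall' and 'hairpin',
    # so the witnesses of the confusion are the two readings of the component yi1.
    print(inherited_etymologies("a100"), inherited_etymologies("b100"))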
In order to prove the sequent Sigma f- Error (aIOO, bIOO), we propose to apply the cut-rule (C) with the cutformula appearing in the rule-2: :3 u z ml m2 G-M(u)& G-M(v) & Meaning (ml) & Meaning (m2) & ml ;j:. m2 & u ;j:. a & z;j:. b & Etymo(u, ml) &Etymo(z, m2) & Etymo (a, ml) & Etymo (b, m2) & Error( u, z) f- Error (a, b). We are left with the following sequent to be proved : Sigma f-:3 u z ml m2 G-M(u) & G-M(v) & Meaning (m1) & Meaning (m2) & ml ;j:. m2 & u ;j:. a100 & z;j:. blOO & Etymo(u, ml) & Etymo(z, m2) & Etymo (aIOO, ml) & Etymo (b100, m2) & Error( u, z). The instantiation component instantiates the variable ml to hat, and the variable m2 to hairpin (it has only this possibility), and defines the two individual concepts uC and vC whose etymologies correspond to these meanings: uC ={ (IS-A, G-MOIO), (etymo, hat)}, and vC= {(IS-A, G-M011) , (etymo, hairpin)}. Then, Sigma contains the following main ground formulas: Etymo(uC, hat) & Etymo(vC, hairpin) which shows that the reason of the confusion of a 100 with b 100 comes from a misunderstanding of the origins of the component - yi I which appears in these two sinograms. We invite the reader to try to apply the rule-3 in place of the rule-2. He will find that the confusion of a 100 with blOO may come from the fact that these two sinograms have the same calligraphic order. 6 Computational COlnplexity In this chapter, we prove that EMICW is tractable. The main problem comes from subsumption. The subsumption opeartion has been particularly analysed in [Levesque and 1082 B (b I z, a), 'yIz B(z,a) 1- Brachman 1987] and in [Schmidt-Schaub 1989]. Their approach are mainly based on semantics. In [Castaing 1991], we have characterized a subsumption criterion by means of proofs in FLL-theories as Tl (see section 5.1). We briefly explain how we have proceeded. 6.1 Tractability of Subsumption Let A and B be two concepts. We interpret A and B by means of first-order formulas, as in Brachman-Levesque's interpretation, then, we replace all classical connectives with linear ones. Let Ac = 3xAl(x) & ... An(x), and Bc = 'yI zB 1(z)& ... Bm(z), (where z and x can be vectors of variables, and Ai(x) = Ai 1(x) y.. :y Aip(x), Bj(z) = Bj 1(z) y...y Bjq(z» be the conjunctive normal forms obtained. A subsumes B iff there exists a cut-free proof in Tl of the sequent Bc I- Ac. In the absence of contraction and weakening, we proved the following result : Theorem (subsumption criterion): A subsumes B iff Ac and Bc satisfy the following condition (C): there exists a, a substitution for x such that for each Ai, 1~ i ~n, there exists some Bj, l~ j ~m, and b, a substitution for z, such that there exists a cut-free proof of the sequent Bj.b I- Ai.a in the FLL-theory Tl. A matching algorithm can be easily derived from the condition C. It computes subsumption in polynomial time proportional to the length of the concepts, and to the cardinality of the set of proper axioms S 1. Without contraction and weakening, FLL-theories are decidable. There exists other decidable first-order theories which are based on classical logic [Ketonen and Weyhrauch 1984], or [Patel-Schneider 1985, 1988]. The originality of our approach comes from the way we deal with the universal quantifiers (or with the existential ones). Let us show how we can explain the rise in complexity of subsumption by means of contraction. We consider the following cases: 1. 
Bc and Ac satisfy condition (C) (the contraction rule is absent): the sequent Bc I- Ac is provable in polynomial times, then the complexity of subsumption is polynomial. 2. Bc and Ac do not satisfy condition (C): let us suppose that the sequent Bc I- Ac is provable (for example, by means of an approach based on semantics), and the proof of the sequent Bc 1- A c necessitates the use of the contraction rule, (and possibly of the weakening rule): the search procedure for a proof can make sequents of the form 'yIz B(z,a) I-~, (or of the form r I- :3 x A(x, a» appear at the nodes of the search-tree. Let us consider the case, where the sequent 'yIzB(z,a) I- ~ appears at a node of the search-tree: the search procedure can go back-up the tree by applying the universal and contraction rules. We can be left with the following tree: ~ ---'----'--------,(~ -1) ~ z B(z,a), 'yI z B(z,a) I- ~ ~zB(z,a) I- (C-l) ~ The use of contraction may open a branch which terminates with a failure. Some back-tracking is then necessary. The complexity of the subsumption in this case is NP-hard. 3. Bc I- Ac is not provable, then the use of the contraction rule may lead to duplicate infinitely the same formulas in the case where the set of instantiation terms (such as b) is infinite (for example in presence of functions) B (b/z,a), ~ zB (z,a), 'yI zB(z,a)l-~ (C-I) B(blz, a),'yIzB(z,a)l-~ ---.:..~-,-,-----,-----,----,------,(~ 'yIz B(z,a),'yIz B(z,a) I- -1) ~ --"-':'-~----':"""":--'------(C-l) 'yIzB(z,a) I- ~ The subsumption turns to be undecidable. 6.2 Tractabilty of EMICW The terminological component of EMICW has a restricted syntax. The condition (C) defined above gives an adequate subsumption criterion. In order to locate a concept at its optimal location, the classifier performs the subsumption operations in number limited by the diameter of the semantic network. Its computational complexity is then limited. The theorem prover applies the cut-rule, with cutformulas in some sequents of S2 (see section 5.2). Without contraction, the existential formulas which appear are never duplicated, and so, are only instantiated by means of the classifier. The cardinality of S2 is finite. Then, the proof of the sequent Sigma I- Error (a,b) can also be carried out in limited time depending on the cardinaly of the set of proper axioms S= S 1 + S2. The tractability of our system is then ensured. Conclusion A prototype of our EMICW system is implemented in LISP. For the time being, if the student writes down a graphic which is not recognised as a sinogram, the system has no particular strategy for discovering a "good" witness of the error. We are now investigating a strategy of choice of witnesses, which can take the context of the dictation, (the sinograms that the student have already drawn during the dictation) into account. Providing adequate rules, EMICW can also help students to learn japanese characters (kanjis) with the chinese or the japanese reading, or to learn classical vietnamese characters (nom). 1083 Acknowledgements Representation and Reasoning, Compo Intell. 3 (2) (1987) pp 78-93. I would like to thank the four FGCS 's referees for their comments which contributed to clarify the presentation of my work. Discussions with my colleagues of PRC-IA were very helpful. The contribution of J-L. Lambert and C. Tollu to this work was invaluable. Thanks to both. [Masseron M. and Tollu C. and Vauzeilles J. 1990] "Generating plans in linear logic" Proc. FST & TCS 10, Bengalore (India), Dec. 1990. 
References [Brachman RJ and Levesque HJ .1984]: "The Tractability of Subsumption in Frame-Based Descrition language" Proceedings AAAI-84, August 84, pp34-37. [Brachman R.J and Smolze J.G. 1985] : "An overwiew of the KL-ONE Knowledge Representation System". Cogitive Sci. 9(2) (1985) 171- 216. [Brachman R.J and Gilbert V.P and Levesque HJ. 1985]: "An Essential Hybrid Reasoning System:Knowledge and Symbol Level Accounts of KRYPTON" Proc. 9th IJCAI (1985) Los Angeles. pp 532-539. [Bylander T. 1991]: "The Monotonic Abduction Problem: A Functional Characterization on the Edge Of Tractability" . Principles Of Knowledge Representation and reasoning Proceedings of the Second Internatipnal Conference. Cambridge, Massachusetts. April 1991. [Cerrito S. 1990]:" A linear semantics for Allowed Logic Programs" Proc. 5th Annual IEEE Symposium on Logic in Computer Science, IEEE Computer Society Press, 1990,219-227. [Castaing J. 1991]: "A New Formalisation Of Subsumption In Frame-Based Representation Systems". Principles Of Knowledge Representation and Reasoning Proceedings of the Second International Conference. Cambridge, Massachusetts. April 1991. [Girard J. Y. 1987] : "Linear Logic" Theorical Computer Science 50 (1987) .pp 1-102. [Girard J. Y. 1989] :" Towards a Geometry ofInteraction" Proc. AMS Conference on Categories, Logic and Computer, Contemporary Mathematics 92, AMS 1989). [Gentzen G.1969] :" The Collected Papers of Gerhard Gentzen" Ed. E; Szabo, North-Holland, amsterdam (1969). [Kazmareck T.S and Bates R and Robbins G.1986]: "Recent Developments in NIKL" Proc. AAAI-86. Philadelphia, pp 978-985. [Ketonen J. and Weyhrauch R. 1984] fragment of predicate calculus" Theorical Computer Science 32:3, 1984. [Nebel B. 1988]: "Computational Complexity of Terminological Reasoning in BACK". Artificial Intelligence 34 (1988) pp371-383. [Patel-Schneider P.F. (1985)] : "A Decidable First-Order Logic for Knowledge Representation" Proceedings 9th. IJCAI (1985). Los Angeles. pp 455-458. [Patel-Schneider P.F 1988]: "A Four-Valued Semantics for Terminological Logics" Artificial Intelligence. 36 (1988) pp 319- 353. [Schmidt-Schaub M. 1989]: "Subsumption in KL-ONE is Undecidable" First International Conference on Principles of Knowledge Representation.1989. pp 421-431. [Wang P.S.P 1988]: "On Line Chinese Character recognition" 6th IGC Int. Conference on Electronic Image pp209-214 1988. [Yamamoto Y. 1991]: "Two-Dimensional Uniquely Parsable Isometrc Array Grammars". Proceedings of the International Colloquium On Parallel Image Processing Paris] une 1991. [Alleton V.1970]: "L'Ecriture Chinoise". Que sais-je N° 1374 [Bellassen 1989]: "Methode d'Initiation l'Ecriture chinoises". Eds. La Compagnie/ Bellassen 1989. a la Langue et a [De Francis]: "Character Text for Beginning Chinese" Yale Language Series. New haven and London, Yale University Press. [Henshall Kenneth G 1988]. : "A Guide to Remembering Japanese Characters" Charles E. Tuttle Company, Inc. of Ruland, Vermont & Tokyo, Japan 1988. [LI XiuQin 1991]: "Evolution de l'Ecriture Chinoise". Librairie You Feng Paris 1991. [Lyssenko N. and Weulersse D.]: "Methode Programmee du Chinois Moderne" Eds. Lyssenko Paris 1987. [Ryjick K. 1981]: "L'Idiot Chinois". Payot1981 "A decidable [Levesque H.J and Brachman R.J. 1987]: "Expressiveness and Tractability in Knowledge [Shangai Foreign Language Institute 1982]: "A Concise Chinese Course For Foreign Learners" (Books 1 and 2). Shangai Foreign Language Institute Press 1982. 
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

Automatic Generation of a Domain Specific Inference Program for Building a Knowledge Processing System

Takayasu Kasahara*, Naoyuki Yamada*, Yasuhiro Kobayashi*, Katsuyuki Yoshino**, Kikuo Yoshimura**
*Energy Research Laboratory, Hitachi, Ltd., 1168 Moriyama-cho, Hitachi-shi, Ibaraki-ken, Japan 316, Tel. (0294) 53-3111
**Software Development Center, Hitachi, Ltd., 549-6, Shinano-cho, Totsuka-ku, Yokohama-shi, Japan 244, Tel. (045) 821-4111

Abstract

We have proposed and developed an expert system tool, ASPROGEN (Automatic Search Program Generator), with a built-in automatic generation function for domain specific inference programs. This function is based on a search-based program specification and an abstract data type of search. ASPROGEN has interfaces for domain knowledge, using an object-oriented approach, and for constraints, which represent control knowledge. The control knowledge is described in terms of the domain knowledge, and it can cover a detailed problem solving strategy. We applied ASPROGEN to produce three kinds of scheduling systems. These generated systems have performance equivalent to knowledge processing systems implemented with a conventional tool. Further, a two-thirds reduction in the number of program steps required as programmers' input was realized.

1. Introduction

Current expert system tools based on production rule and/or frame representation provide an environment to generate expert systems by formalizing and describing problems as production rules. They are powerful tools, and many practical expert systems have been produced by using them. Industrial applications of expert system tools have, however, sometimes met problems, the most important one being that tools based on the production system only provide a rule-based language, not a problem solving strategy. So, mapping the problem solving strategy onto production rules is difficult for users who are not knowledge engineers. Domain shells [1], tools based on the generic task method [2], the half weak method [3], and SOAR [4] have been developed to overcome this difficulty. Domain shells are expert system tools which are restricted to specified problem regions such as diagnosis, scheduling, and design. They have spreadsheet-type user interfaces and problem-specific inference programs. But actual industrial problems include particular conditions, constraints, or problem solving knowledge, and domain shells do not have enough flexibility to cover all of them. This leads to a conflict between tool flexibility and ease of use. In general, as a tool becomes more specific to some region, it becomes easier to use, but it loses flexibility. The generic task method and the half weak method also have this conflict. The generic task method classifies problem solving methods into several types, which are called generic tasks, and prepares generic task tools to provide them. Tool users select an appropriate generic task and supply domain knowledge to develop the knowledge processing system. The half weak method regards problem solving as a search and provides pre-defined search modules. Tool users select an appropriate search module and add domain knowledge to the module. However, these methods, based on classification, do not necessarily give directions for the systematic preparation of the building blocks of knowledge processing systems.
So, tool users must reformulate the problem definition according to the prepared building blocks. SOAR has more flexibility for defining the problem solving strategy. It can generate a search program by defining several search control rules. But the lack of functions to relate the search program and the domain knowledge restricts the applicability of SOAR to toy problems. We therefore developed ASPROGEN (Automatic Search Program Generator). ASPROGEN is an expert system tool with a built-in automatic generation function for domain specific inference programs. To specify the problem to be solved, it has interfaces for describing the problem solving strategy as a search strategy, the domain knowledge in an object-oriented way, and the detailed problem solving strategy as constraints among the attribute values of the domain objects.

2. Overview of ASPROGEN

2.1 Building expert systems based on search

ASPROGEN has no embedded inference mechanism. Instead, as shown in Fig. 1, its parts include a general search program and a search program generating mechanism which produces inference programs according to user specifications of the search program, the domain knowledge, and the detailed constraints. The reason why we use search as the inference program specification is that it covers almost every inference mechanism required for expert systems, and it is simple. But a search is not easy to describe, nor is it easy to prepare controls tightly directed to a particular problem by the search strategy alone. To describe detailed control strategies, ASPROGEN includes an interface for domain knowledge. Using the domain knowledge, the detailed controls or problem solving strategy can be described as constraints between attribute values of the domain knowledge. The detailed control programs are complicated in the case of scheduling or CAD systems, and it is important to support their generation. In general, domain specific inference programs which have functional operators have complicated constraints. ASPROGEN combines these constraints with global search strategies, and generates domain specific inference programs.

[Fig. 1 Overview of ASPROGEN (elements shown: general search program; domain knowledge; constraints; expert; global search information; unknown constraints; knowledge-base retrieval; heuristic search; operator type; link function; knowledge processing module)]

ASPROGEN users develop expert systems by following this procedure:
(1) Users specify a problem solving strategy from the viewpoint of a search strategy.
(2) Users input the search strategy by selecting the classification items of the search classification tree which the tool prepares. This step is executed with the help of the tool interface.
(3) Users input domain knowledge and constraints with the help of the tool interface.
(4) ASPROGEN generates a domain specific inference program and data structures for the domain knowledge.
Although step (1) is an interesting problem, we limit the present discussion to steps (2)-(4).

2.2 Specification of problem solving strategy

To specify the problem solving strategy as a search, we define a classification tree for the search strategy and a template of the search program. Figure 2 shows the classification tree. It comes from analyzing the search trees used in various kinds of problem solving. A search tree consists of nodes and operators, and we derive the classification items from the characteristics of the nodes and operators. The first classification item comes from the characteristics of the operators. There are two operator types. One is a functional operator, which creates new nodes from parent nodes and adds them to the search tree; a functional operator is used in the scheduling search program. The other type is a link operator; the link operator is used in the diagnosis search program, which selects suitable diagnosis nodes for the observed state. The second classification item comes from the characteristics of the nodes. These are evaluation functions to select nodes in the search procedure, pruning functions, establishment conditions, and so on. The evaluation functions define a global search strategy; for example, one which prefers the deepest nodes of the search tree corresponds to depth-first search. The characteristics of the search nodes are described by specification values of the nodes in the search tree, which are depth, breadth, parent relations, sibling relations, and node attribute values. These values are retrieved from the structure of the search tree, and we can prepare the specification values or functions to calculate them. On the other hand, node attribute values cannot be retrieved from the structure of the search tree, and it is difficult to cover all attribute values of the nodes when specifying the problem solving strategy.

[Fig. 2 Search classification tree (labels shown: goal type - given as instance node / given as conditions; solution type - optimal / satisfactory; initial node type - given as instance node; node evaluation function type - fixed / not fixed; node evaluation function - known / search tree parameter / domain knowledge; pruning - prune / not prune; pruning function parameter - search tree parameter / domain knowledge)]

To mitigate this difficulty, we rank the attribute values from the viewpoint of their relation to the search tree operators. The attributes which the search operators handle directly are called first-order attributes. For example, in the scheduling system the starting time and ending time of each job are first order, and the resource constraints are not, if the search operators are functions which adjust the job schedule. To describe programs in detail, not only first-order attributes but also multi-order attributes or variables are required. The first-order attributes and the multi-order attributes are domain knowledge. We do not embed detailed domain knowledge in ASPROGEN; instead, an interface is prepared to describe the domain knowledge, the constraints on the attributes of the domain knowledge, and the global search strategy. By combining the global search strategy, described as a search strategy, with the domain knowledge, ASPROGEN covers not only toy problems but also applications for industrial use.

2.3 Representation of domain knowledge and constraints

ASPROGEN has an interface for describing the domain knowledge. Domain knowledge is described by objects and attributes, attribute value ranges, and attribute constraints. There are two types of objects. One is a class object, which defines attributes and relations with other objects. The other type is an instance object, which has instantiated attribute values. Figure 3 shows the representation scheme of domain knowledge for ASPROGEN. Nodes of the search tree are also objects. Node objects are related to other objects. The relations among objects are of three types.
(1) Class-instance relations: Instance objects have the same attributes as class objects, and the values of the attributes are inherited from the class objects.
(2) Attribute-value relations: The value region of an attribute can be described by a class object. Thus, the attribute value region is the set of instance objects of that class object.
(3) Attribute-object relations: The attributes of an object can be described by class objects. Thus, the attributes of the nodes are instance objects of the class objects, and the attribute values are those of the instance objects.

[Fig. 3 Scheme of knowledge representation in ASPROGEN (legend: instance node of search tree; attribute-object relation; attribute-value relation; class-instance relation)]

On the basis of these definitions of domain knowledge, ASPROGEN users describe constraints. ASPROGEN prepares a simplified language which can describe constraints by using object names and attributes.

2.4 Generation of the problem-directed inference program

The inference program generated by ASPROGEN consists of two parts: the search program, which corresponds to the global problem solving strategy, and the constraint satisfaction programs, which correspond to the domain knowledge. Figure 4 shows an outline of the inference program. The control program is embedded in ASPROGEN, and the global search program and the constraint satisfaction programs are generated according to the user input. Once the inference program is complete, it behaves as follows. Using the global search strategy, the inference program activates an operator and generates or selects a new node. Then the constraint satisfaction programs are activated and adjust the attribute values of the objects for every constraint. According to the result of the constraint satisfaction, the operator is activated again. This process continues until the termination conditions are satisfied. The generating process of the inference program consists of three steps.

(1) Generate the search program which represents the global search strategy. ASPROGEN has a general search program which is independent of the domain and includes six search sub-functions, as shown in Fig. 5. When completed, it becomes the global search program of Fig. 4. The constraint satisfaction programs are activated in the 'Apply operator' sub-function. The difference between search strategies is reflected in the difference of the six element functions. ASPROGEN prepares two reference tables and abstract data types for search [5],[6]. The parent function parent(c,k), which returns the parent of node c in search tree k, and Left_most_child(c,k), which returns the child node that was generated or selected first, are examples of the abstract data types for search. Here, the abstract data types of search make up the functions for the search program. The first reference table is intended for the generation of the search element functions.

[Fig. 4 Outline of the inference program (CSP: constraint satisfaction program)] [Fig. 5 General search program]

The search element functions are program parts of the sub-functions, and Fig. 6 shows some of them. They consist of the abstract data types of search and domain-dependent search functions, an example of which is the node evaluation function. Figure 7 shows the generating process of the problem-directed inference program. Referring to the problem solving strategy through the reference table, the system decides the search element functions; this is done with domain-dependent search control functions such as the evaluation function for a node.
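As an illustration of such abstract data types of search, the parent and Left_most_child operations can be realized with the classic leftmost-child/right-sibling representation of a tree. The following C sketch is only illustrative; the structure layout and the field names are assumptions made for this illustration, not ASPROGEN's actual data structures (which are not shown in the paper).

    #include <stddef.h>

    /* Search-tree node in leftmost-child / right-sibling form (assumed layout). */
    typedef struct node {
        int          label;            /* stands in for the node's attribute values */
        struct node *parent;
        struct node *leftmost_child;   /* child generated or selected first          */
        struct node *right_sibling;    /* next younger brother                       */
    } Node;

    /* parent(c,k): the parent of node c; the tree k is implicit in the pointers.    */
    Node *parent_of(Node *c)       { return c->parent; }

    /* Left_most_child(c,k): the child node that was generated or selected first.    */
    Node *left_most_child(Node *c) { return c->leftmost_child; }

    /* Right_sibling(c,k): the next younger brother of c, or NULL if there is none.  */
    Node *right_sibling(Node *c)   { return c->right_sibling; }

    /* Deep(c,k): the depth of c, obtained by walking up to the root.                */
    int deep(Node *c) {
        int d = 0;
        while (c->parent != NULL) { c = c->parent; d++; }
        return d;
    }

    /* Leaf(c,k): non-zero iff c has no children.                                    */
    int is_leaf(Node *c)           { return c->leftmost_child == NULL; }

With this representation, enumerating the children of a node is a walk along the right_sibling pointers starting from left_most_child, which is all that a general search program needs in order to traverse the tree.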
Then, in the same way as in the definition process, the sub-functions are defined by the abstract data types of the search tree.

Functions concerning the search tree configuration:
(1) parent(c,k): returns the parent of c in search tree k.
(2) Left_most_child(c,k): returns the eldest son of c in search tree k.
(3) Right_sibling(c,k): returns the next younger brother of c in search tree k.
(4) Label(c,k): returns the label of c in search tree k.
(5) Root(k): returns the root node of k.
(6) Clear(k): makes search tree k the null set.
(7) Deep(c,k): returns the depth of c in search tree k.
(8) Height(c,k): returns the height of c in search tree k.
(9) Leaf(c,k): if c has no children, returns yes; otherwise, returns no.

Functions concerning search tree operations (search element functions):
(10) Evaluate(c,k): evaluates c and returns its evaluation value.
(11) Change_e(n_t,S,k): changes the evaluation function of node type n_t to S.
(12) Search_state(c,k): if c is an open node, returns Current; if c is a removed node, returns Finished; otherwise returns Yet.
(13) Jumping(c,k): returns the node to move to when c is established.
(14) Back_tracking(c,k): returns the node to move to when c is not established.
(15) Initial(c,k): if node c is in the initialized state, returns yes; otherwise returns no.
(16) Kill(c,k): removes node c from the active nodes.
(17) Active_c(k): returns the active nodes of search tree k.
(18) Goal(c,k): if c is a goal node, returns yes.
(19) Cond_node(n_t,c,k): if node c satisfies the establish condition of node type n_t, returns yes; otherwise returns no.
(20) Cond_node_type(n_t,c_t,c,k): if node c satisfies the establish conditions of node type n_t, returns yes; otherwise returns no.
(21) Establish(c,k): if c is established, returns yes; otherwise returns no.

Fig. 6 Search element functions

Figure 8 shows an example of the element function generating process, which exemplifies the node evaluation function. According to the user-input problem solving strategy that the depth of the tree has a high evaluation value, the tool selects the depth function from the abstract data types and completes the node evaluation function:

    Evaluate(c, k) { return( user_define_func + depth(c, k) ); }

[Fig. 8 Generation example for search element functions (reference table excerpt with classification item, selected value and statement)]

Figure 9 shows an example of a sub-function generating process, which exemplifies function (a) in Fig. 5, named SUCCES_END(c,k) here. Since an optimal solution is requested in the problem-solving strategy, the tool generates a checking-successful-termination function which terminates the inference program only if an optimal solution is found:

    SUCCES_END(c,k) {
        if (Active_c(c,k) == {} && GOAL_c(c,k) >= GOAL_NUM)
            return(TRUE);
        else
            return(FALSE);
    }

[Fig. 9 Generation example for search sub-functions (elements shown: reference table excerpt; global search information such as "required goal number <= found goal number"; problem solving strategy "optimal solution"; program parts such as "number of active nodes = 0"; search element function; set search tree; the generated checking functions for successful and unsuccessful termination)]

[Fig. 7 Generating process of the problem-directed inference program (elements shown: template of the general search program; the generated inference program)]
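To make the role of the template and its sub-functions more concrete, the skeleton below sketches a domain-independent control loop in the spirit of Figs. 4 and 5. It is a hypothetical illustration: the names of the six sub-functions and their exact division are one reading of the figures, not code produced by ASPROGEN.

    #include <stdio.h>

    typedef struct node Node;   /* search-tree node, as sketched earlier */

    /* The sub-functions below are the slots that ASPROGEN fills in from the   */
    /* reference tables and the user-specified strategy; here they are stubs.  */
    static void  set_search_tree(void)           { /* create the initial node(s)          */ }
    static int   terminated_successfully(void)   { return 1;  /* e.g. enough goals found  */ }
    static int   terminated_unsuccessfully(void) { return 0;  /* e.g. no active node left */ }
    static Node *select_node(void)               { return NULL; /* best node by Evaluate  */ }
    static void  apply_operator(Node *c)         { (void)c; /* expand c and run the constraint satisfaction programs */ }
    static void  update_tree(Node *c)            { (void)c; /* Kill, Jumping or Back_tracking as appropriate          */ }

    int main(void) {
        set_search_tree();
        for (;;) {
            if (terminated_successfully())   { printf("search succeeded\n"); return 0; }
            if (terminated_unsuccessfully()) { printf("search failed\n");    return 1; }
            Node *c = select_node();
            apply_operator(c);
            update_tree(c);
        }
    }

In this reading, SUCCES_END of Fig. 9 is simply one possible body generated for the successful-termination check.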
(2) Generate the constraint satisfaction programs according to the user specifications in the simplified language. The object names and attribute names of the objects which the tool users input are registered in ASPROGEN as key words for the simplified constraint description language. We call this language SCRL (Simple Constraint Representation Language); the registered names are its terminal symbols. The SCRL compiler accepts only sentences of the following form:

    [value clause] [comparing key word] [value clause]

A value clause consists of an object name, an attribute name, and object relation key words. Table 1 lists the key words and their meanings in SCRL. There are set operation key words, comparison key words, and object relation key words such as 'of'. Figure 10 shows an example of a constraint described in SCRL, in which the number of persons required for each time span must be less than the available personnel number.

Table 1 Example of key words of SCRL
  Set operation key words:
    - A include B: A ⊇ B
    - A have e: e ∈ A
    - SUM(A,B): sum up the attribute value B of all instance objects of A
  Comparing key words:
    - x > y
    - x = y
  Object relation key words:
    - A of y: the value of attribute y of object A

    Constraint name: personnel
    sum((time_span of job), person) < time_span of available_person

Fig. 10 Example of a constraint

Figure 11 shows an example of the constraint satisfaction program generating process. The constraint is the personnel constraint of Fig. 10. First, the sentence is parsed into the C language by the SCRL compiler (code C in Fig. 11):

    {
      for (t = 0; ...; ...) {      /* loop over time spans; the bounds are illegible in the original */
        sum = ...;                 /* total of job[i].person over the jobs active in time span t     */
        if (sum >= time_span[t].available_person) return(0);
      }
      return(1);
    }

Then the attribute value range and code C are attached to the allotting function which ASPROGEN prepares, and the constraint satisfaction program is completed.

[Fig. 11 Generating process of a constraint satisfaction program (Step 1: the SCRL compiler turns the user-input constraint into code C; Step 3: code C, the user-input domain knowledge and the attribute value set are combined with the built-in allotting function)]

(3) Synthesize all constraint satisfaction programs and the search program. Finally, ASPROGEN synthesizes all the constraint satisfaction programs and the search program, and generates the domain specific inference program. The key point of the synthesis is to ensure the consistency of the attribute values of the objects which the tool users define. To make the argument clear, we define the identity of the search node and the scope of the attribute values.

Identity of the search node: The identity of a node is defined by the equality of the value set of the first-order attributes (cf. Section 2.2). Search tree operators operate on them directly. So it is possible that the inference programs generate different results, even though the problem solving strategies are the same.

Scope of the attribute values: We define the scope of the attribute values in the search tree node. The attribute values of the objects should be consistent within a tree node, and a change of the attribute values in the process of constraint satisfaction must propagate to the other constraints.

ASPROGEN generates the constraint satisfaction programs from the kernel of the constraint satisfaction programs.
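The way a compiled constraint check (code C) and the built-in allotting function fit together can be pictured with the hypothetical C sketch below. The data layout (job, time_span), the sizes and the function names are assumptions made for the illustration only; just the shape of the check follows the personnel constraint of Figs. 10 and 11.

    #include <stdio.h>

    #define N_JOBS  3
    #define N_SPANS 4

    /* Assumed data layout for the illustration only. */
    struct job       { int start, end, person; };     /* start, end: first-order attributes */
    struct time_span { int available_person; };

    static struct job       job[N_JOBS]        = { {0, 2, 2}, {1, 3, 1}, {2, 4, 2} };
    static struct time_span time_span[N_SPANS] = { {4}, {4}, {4}, {4} };

    /* Constraint check in the spirit of code C: returns 1 when the number of      */
    /* persons required in every time span stays below the available personnel.    */
    static int personnel_ok(void) {
        for (int t = 0; t < N_SPANS; t++) {
            int sum = 0;
            for (int i = 0; i < N_JOBS; i++)
                if (job[i].start <= t && t < job[i].end) sum += job[i].person;
            if (sum >= time_span[t].available_person) return 0;
        }
        return 1;
    }

    /* Simplified allotting mechanism: try values from the value range for one     */
    /* attribute (here, the persons assigned to job 0) until the check succeeds.   */
    static int allot_person_for_job0(const int *range, int n) {
        for (int v = 0; v < n; v++) {
            job[0].person = range[v];
            if (personnel_ok()) return range[v];
        }
        return -1;   /* no value in the range satisfies the constraint */
    }

    int main(void) {
        int range[] = { 3, 2, 1 };
        printf("allotted persons for job 0: %d\n", allot_person_for_job0(range, 3));
        return 0;
    }

In ASPROGEN itself, as described above, the allotting function is built in and only the check is generated from the SCRL sentence.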
The kernel of a constraint satisfaction program comes from the relation between the attributes and their value ranges. The allotting mechanism for the attribute values is built into ASPROGEN. The mechanism selects values from the value range; if the constraints are not satisfied, other values are selected. ASPROGEN generates each constraint satisfaction program by setting the object attribute values and their ranges.

[Fig. 12 Simplified procedure for constraint satisfaction (steps shown: set the first-order attribute values; pick the constraints which restrict the first-order attribute values according to the constraint satisfaction programs)]

Figure 12 shows the simplified mechanism used to assure the consistency of the attribute values. At first, using the search tree operator, the first-order attribute values are instantiated. In the next step, the attributes which are constrained by the first-order attributes are instantiated by the allotting mechanism. This process continues until all constraints have been surveyed. If a set of attribute values is found, then the first-order attribute set is suitable; if not, the node is unsuitable. But this simple algorithm has a fatal defect, namely the ineffectiveness of the allotting process: if global consistency among the constraints does not exist, the algorithm searches every combination of the attributes until no solution is found. To avoid this ineffectiveness, ASPROGEN deals with the attributes as sets. In the first stage, using the search tree operator, the first-order attribute values are instantiated. Then the available value sets of the multi-order attributes are filtered by the constraints. Figure 13 shows a simple example of the filtering process (a schematic sketch of this filtering loop is given below). At first, the whole value region of each attribute is a candidate for the solution. Filtering by the constraints, inconsistent values are removed from the candidate sets. The process continues until no value can be removed any more, or until no suitable value exists for some attribute.

[Fig. 13 Example of the filtering process (first-order attributes f1 = 10 and f2 = 7 are fixed by the search operator; constraints C1: x+y+f1 > 30, C2: y+z+f2 < 15, C3: x+z < 10; value regions x ∈ {5,10,15,20}, y ∈ {4,8}, z ∈ {3,6}; R(s) denotes the suitable value set for s; starting from the initial sets, R(x), R(y) and R(z) are successively reduced as C1, C2 and C3 are applied, until R(x) becomes empty)]

4. Example and Result

Using ASPROGEN, we built three kinds of scheduling systems: a maintenance scheduling system (Problem A), a construction scheduling system (Problem B), and a jobshop scheduling system (Problem C). Problem A is a scheduling system for the maintenance of a nuclear power plant [7]. The generated program produces a schedule under constraints of maintenance personnel limitations and interferences between tasks. Problem B is a plant construction scheduling program. The generated program produces a schedule under precedence relations between tasks and personnel limitations. Problem C is a jobshop scheduling system. The generated program produces a schedule under constraints of resource limitations and appointed dates of delivery. The problems are shown in Table 2.

[Table 2 Test problems (for each problem it lists the characteristics of the solution, the evaluation function, the constraints and the variety of resources: maintenance scheduling - optimal solution; construction scheduling - satisfactory solution; jobshop scheduling - satisfactory solution; the constraints include task interference, working time and task execution order)]

Table 3 summarizes the problem solving strategy for each scheduling problem. These problems differ regarding solution type and resource numbers. Figure 14 shows the domain model of each problem; these models are the basis of the ASPROGEN input.
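Returning to the set-based filtering of Fig. 13, the pruning of candidate sets can be sketched as follows: a value is removed from a set when no combination of values from the other sets satisfies some constraint, and the passes are repeated until nothing more can be removed. The C fragment below illustrates the idea for the single constraint C3 (x + z < 10) over the value regions of Fig. 13; it is a schematic illustration, not ASPROGEN's filtering code.

    #include <stdio.h>

    /* Candidate sets for two of the attributes of Fig. 13. */
    static int xs[] = { 5, 10, 15, 20 };  static int nx = 4;
    static int zs[] = { 3, 6 };           static int nz = 2;

    /* Constraint C3 of Fig. 13: x + z < 10. */
    static int c3(int x, int z) { return x + z < 10; }

    /* Remove from a[] every value with no supporting value in b[]; returns the     */
    /* number of removed values.  'swap' says whether a[] holds the second argument. */
    static int filter(int *a, int *na, const int *b, int nb, int (*c)(int, int), int swap) {
        int kept = 0, removed = 0;
        for (int i = 0; i < *na; i++) {
            int supported = 0;
            for (int j = 0; j < nb; j++)
                if (swap ? c(b[j], a[i]) : c(a[i], b[j])) { supported = 1; break; }
            if (supported) a[kept++] = a[i]; else removed++;
        }
        *na = kept;
        return removed;
    }

    int main(void) {
        /* Repeat the passes until a fixpoint is reached (no value removed). */
        while (filter(xs, &nx, zs, nz, c3, 0) + filter(zs, &nz, xs, nx, c3, 1) > 0)
            ;
        printf("R(x):"); for (int i = 0; i < nx; i++) printf(" %d", xs[i]); printf("\n");
        printf("R(z):"); for (int i = 0; i < nz; i++) printf(" %d", zs[i]); printf("\n");
        return 0;
    }

With the full set of constraints C1-C3 and the first-order values f1 and f2 fixed by the search operator, the same kind of passes shrink R(x), R(y) and R(z) as sketched in Fig. 13.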
The framework of these problems is the same. This means that the global search strategies are the same. The first-order attribute values are the starting and ending times of each job. The preference of a node is the total scheduling time. There are interference constraints stating that some jobs cannot be executed simultaneously. The domain knowledge differs: for example, problems A and B have personnel limitations, and problem C has machine constraints.

Table 3 Specification of the test problems (task specific knowledge and definition of the problem solving method)
  Items: maintenance scheduling / construction scheduling / jobshop scheduling
  Number of goals: 1 / all / 1
  Initial number: 1 / 1 / 1
  Global search information: none / none / none
  Type of operator: function of adjusting the schedule (all three)
  Type of initial state: state representing a work schedule (all three)
  Type of goal state: conditions that satisfy all constraints (all three)
  Solution type: optimal / satisfactory / satisfactory
  Establish conditions about tree configuration: none / none / none
  Evaluation function: fixed / fixed / fixed

[Fig. 14 Domain model of the problems: (1) Problem A, (2) Problem B, (3) Problem C]

Figure 15 shows the program step numbers which the programmers input. Compared with the inference programs implemented by using a conventional tool [8], equivalent performance is realized with a two-thirds reduction in the number of program steps required as programmer input. Of course, the reduction rate depends on the application; for example, a diagnosis system has more domain knowledge, and the reduction rate may be smaller than for a scheduling system. But, overall, some reduction of the programmers' load will result from the tool.

[Fig. 15 Program step numbers which programmers input (for Problems A, B and C, ASPROGEN is compared with a conventional tool based on a production system; the bars, up to about 1500 steps, are divided into the inference program, the problem solving strategy, and the task implementation knowledge; Problem A: maintenance scheduling, Problem B: construction scheduling, Problem C: jobshop scheduling)]

5. Conclusions

We have proposed and developed an expert system tool, ASPROGEN (Automatic Search Program Generator), in which an automatic generation function for domain specific inference programs is built in. This function is based on a search-based program specification and an abstract data type of search. ASPROGEN has interfaces for domain knowledge, using an object-oriented approach, and for constraints, which represent control knowledge. The control knowledge is described in terms of the domain knowledge, and it can cover a detailed problem solving strategy. We applied ASPROGEN to produce three kinds of scheduling systems. These generated systems have performance equivalent to knowledge processing systems implemented with the conventional tool, and a two-thirds reduction of the program step numbers required as programmer input was realized by ASPROGEN. We have so far applied ASPROGEN only to scheduling systems; we are now going to check its applicability to CAD systems and diagnosis systems.

References

[1] K. Okuda et al.: Model Based Process Monitoring and Diagnosis, Proc. of IEEE Pacific Rim International Conference on Artificial Intelligence '90, pp.134-139, Nagoya, Japan (1990).
[2] B. Chandrasekaran: Towards Functional Architecture for Intelligence Based on Generic Information Processing Tasks, Invited Talk of IJCAI-87 (1987).
[3] J. McDermott: Using Problem-Solving Methods to Impose Structure on Knowledge, Proc.
of IEEE International Workshop on Artificial Intelligence for Industrial Applications, pp.7-11,Hitachi, Japan(1988). [4] J.Laird, et al.: Universal Subgoaling and Chunking, K1ur Academic Publishers(1987). [5] E. W. Dijakstra et al. : Structured Programming, Academic Press, London(1979). [6] A.V. Aho, et al.: Data Structure and Algorithms, AddisonWesley Publishing Company, Inc., Reading Mass.(1983). [7] T.Kasahara, et al.: Maintenance Work Scheduling Aid for Nuclear Power Plants, Proc. of IEEE International Workshop on Artificial Intelligence for Industrial Applications, pp.161-166 Hitachi, Japan(1988). [8] S. Tano, et al.: Eureka-ll A Programming Tool for Knowledge-Based Real Time Control Systems, International Workshop on Artificial Intelligence for Industrial Applications, pp.370-378, Hitachi, Japan(1988). PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 1091 Knowledge-Based Functional Testing For Large Software Systems Uwe Nonnenmann and John K. Eddy AT &T Bell Laboratories 600 Mountain Avenue, Murray Hill, NJ 07974, U.S.A. Abstract through time as people interact and come to undocumented agreements about the real meaning of features. Automated testing oflarge embedded systems is perhaps one of the most expensive and time-consuming parts of the software life cycle. It requires very complex and heterogeneous knowledge and reasoning capabilities. The Knowledge-based Interactive Test Script System (KITSS) automates functional testing in the domain of telephone switching software. KITSS uses some novel approaches to achieving several desirable goals. Telephone feature tests are specified in English. To support this KITSS has a statistical parser that is trained in the domain's technical dialect. KITSS converts these tests into a formal representation that is audited for coverage and sanity. To accomplish this, KITSS uses a customized theorem prover-based inference mechanism and a hybrid knowledge base as the domain model that uses both a static terminological logic and a dynamic temporal logic. Finally, the corrected test is translated into an in-house automated test language that exercises the switch and its embedded software. This paper describes and motivates the approach taken and also provides an overview of the KITSS system. The consequence of these problems is that programs that do not function as expected are produced and therefore extensive and costly testing is required. Once software is developed, even more testing is needed to maintain it as a product. The major cost of maintenance is in re- testing and re-deployment and not the coding effort. Estimates, as in [Myers, 1976] and [McCartney, 1991], are that at least 50%, and up to as much as 80%, of the cost in the life cycle of a system is spent on maintenance. 1 Functional Testing Problem There is an increasing amount of difficulty, effort, and cost that is needed to test large software development projects. It is generally accepted that the development of large-scale software with zero defects is not possible. A corollary to this is that accurate testing that uncovers all defects is also not possible [Myers, 1979]. This is because of the many inherent problems in the development of large projects [Brooks, 1987]. As just a few examples, a large project provides support for many interacting features, which makes requirements and specifications complex. 
Also, many people are involved in the project, which makes it difficult to ensure that each person has a common understanding of the meaning and functioning of features. Finally, the project takes a long time to complete, which makes it even harder to maintain a common understanding because the features change We believe that the only practical way to drastically reduce the maintenance cost is to find and eliminate software problems early and within the development process. Therefore, we designed an automated testing system that is well integrated into the current development process [Nonnenmann & Eddy, 1991]. The focus of our system is on "functional testing" [Howden, 1985]. It corresponds directly to uncovering discrepancies in the program's behavior as viewed from the outside world. In functional testing the internal design and structure of the program are ignored. This type of testing has been called black box testing because, like a black box in hardware, one is only interested in the input and how it relates to the output. The resulting tests are then executed in a simulated customer environment. This corresponds to verifying that the system fulfills its intended purpose. KITSS achieves a good integration into the current development process by using the same expressive and unobtrusive input medium (English functional tests) as is used currently as well as generating tests in the existing automated test language as output. Additionally, KITSS checks the tests for consistency with its built-in extensive knowledge base of "telephony". Therefore, KITSS helps the test process by generating more tests of better quality and by allowing more frequent regression testing through automation. Furthermore, tests are generated earlier, i.e., during the development phase not after, which should lead to detecting problems earlier. The result is higher quality software at a lower cost. In this section, we motivated the need for the approach 1092 chosen in KITSS. In the next section, we will describe KITSS in more detail. 2 KITSS Overview The Knowledge-based Interactive Test Script System (KITSS) was developed at AT&T Bell Laboratories to reduce the increasing difficulty and cost involved in testing the software of DEFINITY@PBX switches 1 . Although our system is highly domain dependent in its knowledge base and inference mechanisms, the approach taken is a general one and should be applicable to any functional software testing task. DEFINITY supports hundreds of complex features such as call forwarding, messaging services, and call routing. Additionally, it supports telephone lines, telephone trunks, a variety of telephone sets, and even data lines. At AT&T Bell Laboratories, PBX projects have many frequent and overlapping releases over their multi-year life cycle. It is not uncQmmon for these projects to have millions of lines of code. 2.1 Testing Process Before KITSS, the design methodology involved writing test cases in English. They describe the details of the external design and are written before coding begins. The cases, which are written by developers based on the requirements, constitute the only formal description of the external functioning of a switch feature. The idea is to describe how a feature works without having coding in mind. Figure 1 shows a typical test case. Test cases are structured in part by a goal/action/verify format. The goal statement is a very high-level description of the purpose of the test. It is followed by alternating action/verify statements. 
An action describes stimuli that the tester has to execute. Each stimulus triggers a switch response that the tester has to verify (e.g., a specific phone rings, a lamp is lit, a display shows a message etc). Overall, there are tens of thousands of test cases for DEFINITY. All these test cases are written manually, just using an editor, and are executed manually in a test lab. This is an error prone and slow process that limits test coverage and makes regression test intervals too long. Some 5% of the above test cases have been converted into test scripts written in an in-house test automation language. Tests written in this language are run directly against the switch software. As this software is embedded in the switching system, testing requires large 1 A PBX, or private branch exchange, switch is a real-time system with embedded software that allows many telephone sets to share a few telephone lines in a private company. GOAL: Activate CF 2 using CF Access Code. ACTION: Set station B without redirect notification 3 . Station B goes oflhook and dials CF Access Code. VERIFY: Station B receives the second dial tone. ACTION: Station B dials station C. VERIFY: Station B receives confirmation tone. The status lamp associated with the CF button at B is lit. ACTION: Station B goes onhook. Place a call from station A to B. VERIFY: No ring-ping (redirect notification) is applied to station B. The call is forwarded to station C. ACTION: Station C answers the call. VERIFY: Stations A and C are connected. Figure 1: Example of a Test Case investments in test equipment (computer simulations are not acceptable as they do not address the real-time aspects ofthe system). Running and re-running test scripts becomes very time consuming and actually controls the rate at which projects are completed. Although an improvement over the manual testing process, test automation has several problems. The current tools do not support any automatic semantic checking. The conversion from test case to test script takes a long time and requires the best domain experts. There are only limited error diagnosis facilities available as well as no automatic update for regression testing. Also, test scripts are cluttered with test language initialization statements and are specific to switch configurations and software releases. Test scripts lack the generality of test cases, which are a template for many test scripts. Therefore, test cases are easier to read and maintain. 2.2 KITSS Architecture KITSS takes English test cases as its input. It translates all test cases into formal, complete functional test scripts which are run against the DEFINITY switch software. To make KITSS a practical system required novel approaches in two very difficult and different areas. First, a very informal and expressive language needed 2CF is an acronym for the call-forwarding feature, which allows the user to send his/her incoming calls to another designated station. The user can activate or deactivate this feature by pressing a button or by dialing an access code. 3Redirect notification is a feature to notify the user about an incoming call when he/she has CF activated. Instead of the phone ringing it issues a short "ring-ping" tone. 1093 to be transformed into formal logic. Test cases are written in English. While English is undeniably quite expressive and unobtrusive as a representation medium, it is difficult to process into formal descriptions. 
It also requires theoretically unbounded amounts of knowledge to satisfactorily resolve incompleteness, vagueness, ambiguity, etc. In practice, however, test cases are written in a style that is considerably more restrictive than most English text. The test case descriptions are circumscribed in terms of the vocabulary and concepts to which they refer. Syntactic and semantic variations do occur, but the language is a technical dialect of English, a naturally occurring telephonese language that is less variable and less complex. These limits to a specific domain and style make it possible to transform the informal telephonese representation into a formal one. Second, incomplete test cases needed to be extended. Even though humans find it easier to write test cases in natural language as opposed to formal language, they still have difficulties specifying tests that are both complete and consistent. They also have difficulties identifying all of the interactions that can occur in a complex system. This is analogous to the difference between trying to define a word and giving examples of its use. Creating a good definition, like creating a complete test case with all the details, is usually the more challenging task; giving word-usage examples, like describing a test case in general terms, is easier. Therefore, the input test cases need to be translated into a formal representation and then analyzed to be corrected and/or extended. Both tasks have been attempted for more than a decade [Balzer et al., 1977] with only limited success. Most difficulties arise because of the many possible types of imprecision in unrestricted natural language specifications, as well as by the lack of a suitable corpus of formalized background knowledge to guide automated reasoning tools for most application domains. To address these two difficulties (see also [Yonezaki, 1989]), KITSS provides a natural language processor that is trained on examples of the telephonese sub-language using a statistical approach. It also provides a completeness and interaction analyzer that audits test coverage. However, these two modules have been feasible only due to the domain-specific knowledge-based approach taken in KITSS [Barstow, 1985]. Therefore, both modules are supported by a hybrid knowledge-base (the "K" in KITSS) that contains a model ofthe DEFINITY PBX domain. Concepts that are used in telephony and testing are available to both processes to reduce the complexity of their interpretive tasks. If, for example, a process gets stuck and cannot disambiguate the possible interpretations of a phrase, it interacts (the "I" in KITSS) with the test author. It presents the context in which the ambiguity occurs and presents its best guesses and asks the author to pick the correct choice. Finally, Figure 2: KITSS Architecture KITSS also provides a translator that generates the actual test scripts (the "TS" in KITSS) from the formal representation derived by the analyzer. The two needs described above led to the architecture shown in Figure 2. It shows that KITSS consists of four main modules: the domain model, the natural language processor, the completeness and interaction analyzer, and the translator. The domain model (see Section 3) is in the center of the system and supports all three reasoning modules (see Section 4). 3 Domain Knowledge A domain model serves as the knowledge base for an application system. Testing is a very knowledge intensive task. 
It involves experience with the switch hardware and testing equipment as well as an understanding of the switch software with its several hundred features and many more interactions. There are binders full of papers that describe the features of DEFINITY PBX software, but no concise formalizations of the domain were available before KITSS. One of the core pieces of KITSS is its extensive domain model. The focus of KITSS and the domain model is on an end-user's point of view, i.e., on (physical and software) objects that the user can manipulate. The KITSS domain model consists of three major functional pieces (see Figure 3): Core PBX model: It is split into two major parts. The static model is used by all reasoning modules. The dynamic model is used mainly by the analyzer. Test execution model: It includes details about the current switch configuration and all the necessary 1094 DO M A IN MOD STATIC MODEL CORE PBX MODEL • Major hardware components • Static data • Phenomena • Processes • Logical resources TEST • Configuration model EXECUTION • Automated test, MODEL language model E L DYNAMIC MODEL • Predicates • Primitive stimuli , • Abstract stimuli • Observables • Integrity constraints - Invariants - Rules • Phenomena, such as tones and flashing patterns which are occurrences at points in time. • Processes, such as static definition of types of calls (e.g. voice calls, data calls, priority calls) and types of sessions (e.g. calling sessions, feature sessions). • Logical resources, such as lines and trunks required by processes. LINGUISTIC • Telephonese statistics MODEL • Telephonese concepts TERMINOLOGICAL LOGIC • Static data, e.g., telephone numbers, routing codes and administrative data such as available features, and current feature settings. The test execution model is divided as follows: TEMPORAL LOGIC Figure 3: KITSS Domain Model • The configuration model describes the current test setup, i.e., how many simulated phones and trunk lines are available or which extension numbers belong to which phones/lines, etc. It also contains the dial plan and the default feature assignments. • The automated test language model defines the vocabulary of the test script language. specifics of the automated test language. This model is used mainly by the translator. Linguistic model: It is specific to the input language (telephonese) and is used mainly by the natural language processor. From a knowledge representational point of view, we distinguish between static properties of the domain model and dynamic ones [Brodie et al., 1984]. Static properties include the objects of a domain, attributes of objects, and relationships between objects. All static parts of the domain model are implemented in a terminological logic (see Section 3.1). Dynamic properties include operations on objects, their properties, and the relationships between operations. The applicability of operations is constrained by the attributes of objects. Integrity constraints are also included to express the regularities of a domain. The dynamic part of the core PBX model is represented in temporal logic (see Section 3.2). 3.1 Static Model This part of the domain model represents the static aspects of KITSS. By static we mean all objects, data, and conditions that do not have a temporal extent but may have states or histories. 
The static PBX model includes the following pieces: • Major hardware components, such as telephones and switch administration consoles as well as smaller subparts of theses components, e.g., buttons, lamps, and handsets. The linguistic model supports two pieces: • Telephonese statistics, which are frequency distributions of syntactic structures, help the natural language processor by disallowing interpretations of phrases and concepts that are possible in English but not likely in telephonese. • Telephonese concepts make it easier to paraphrase KITSS' representations for user interactions. We used CLASSIC [Brachman et al., 1989] to represent the knowledge in our domain. CLASSIC belongs to the class of terminological logics (e.g. KL-ONE). It is a frame-based description system that is used to define structured concepts and make assertions about individu-. also CLASSIC organizes the concepts and the individuals into a hierarchy by classification and subsumption. Additionally, it permits inheritance and forward-chaining rules. CLASSIC is probably the most expressive terminological logic that is still computationally tractable [Brachman et al., 1990]. Queries to CLASSIC are made by semantics not by syntax. The static model incorporates multiple views of an object from the various models into one (e.g., a station might have one name in the English test case, another in the automated test language and a third in the actual configuration). Thus, although each reasoning module might have a different view on the same object, CLASSIC will always retrieve the same concept correctly. 1095 3.2 Dynamic Model This unique part of the domain model represents all dynamic aspects of the switch's behavior. It basically defines constraints that have to be fulfilled during testing as well as the predicates they are defined upon. The dynamic PBX model includes the following pIeces: • Predicates, such as offhook, station-busy, connected, or on-hold, define a state which currently holds The different phases of a call for the switch. are described with predicates such as requestingconnection, denied-connection, or call-waiting-fortimeout. Each of the predicates has defined sorts that relate to objects in the static model. Synonyms (e.g., on-hold is a synonym for call-suspended) are allowed as well. • Stimuli can be either primitive or abstract. Stimuli appear in the action statements of test cases. A primitive stimulus defines an action being performed by the user (e.g., dials-extension, goesoffhook) or by the switch (e.g., timeout-call). The necessary pre- and post conditions (before and after the stimulus) are also specified. For instance, for a station to be able to go offhook the precondition is that the station is not already offhook and the postcondition is that the station is offhook after the stimulus 4 . An abstract stimulus is not an atomic action but may have pre- and post conditions like a primitive stimulus. However, there are several primitive stimuli necessary to achieve the goal of a single abstract stimulus (e.g., place-call, busy-out-station, or activatefeature). The steps necessary for an abstract stimulus are defined in one or many abstract stimulus plans. The abstract stimulus defines the conditions that need to be true for the goal to succeed whereas the abstract stimulus plans describe possible ways of achieving such a goal. • Observables are states that can be verified such as receives-tone, ringing, or status-lamp-state. ObservabIes appear in the verify statements of test cases. 
Additionally, the dynamic model includes two different types of integrity constraints: • Invariants are assertions that are true in all states. These are among the most important pieces of domain knowledge as they describe basic telephony behavior as well as the look €3 feel of the switch. The paraphrases of a few of the invariants are as follows: 4N ote the difference between the state of being offhook and the action goes-offhook. "Only offhook phones receive tones" or "You only get ringing of any kind when you are alerting" or "A forwarded call always alerts at the forwardee, never at the forwarder" or "You can't be be talking to an on-hold call". • Rules also describe low-level behavior in telephony. These are mainly state transitions in signaling behavior like "A tone must stop whenever another begins" or "Stop dial-tone after dialing an extension" or "An idle phone starts to ring when the first incoming call arrives". Representing the dynamic model we required expressive power beyond CLASSIC or terminological logics. For example, CLASSIC is not well-suited for representing plan-like knowledge, such as sequences of actions to achieve a goal, or to perform extensive temporal reasoning [Brachman et al., 1990]. But this is required for the dynamic part of KITSS (see above examples). We therefore used the WATSON Theorem Prover (see Section 4.2), a linear-time first-order resolution theorem prover with a weak temporal logic. This non-standard logic has five modal operators holds} occurs} issues} begins} and ends which are sufficient to represent all temporal aspects of our domain. For example, the abstract stimulus plan for activating a feature is represented in temporal logic as follows. (abstract-stimulus-plan activate-feature-1 «:plan-goal activate-feature) (:sorts «station s1) (feature f) (station s2))) (:preconditions «holds (onhook s1)))) ( :plan-steps «(occurs (initiate-feature-session s1 f)) (begins (receives-tone s1 second-dial-tone))) «occurs (dials-destination s1 s2)) (issues (receives-tone s1 confirmation-tone))) «occurs (terminate-feature-session s1 f)) ))))) The theorem proving is tractable due to the tight integration between knowledge representation and reasoning. Therefore, we specifically designed the analyzer using the WATSON Theorem Prover and targeting them for this domain. The challenging task in building the dynamic model was to understand and extract what the invariants, constraints, and rules were [Zave & Jackson, 1991]. Representing them then in the temporal logic was much eaSIer. 1096 3.3 Domain Model Benefits In choosing a hybrid representation, we were able to increase the expressive power of our domain model and to increase the reasoning capabilities as well. The integration of the hybrid pieces did produce some problems, for example, deciding which components belonged in which piece. However, this decision was facilitated because of our design choice to represent all dynamic aspects of the system in our temporal logic and to keep everything else in CLASSIC. There were other benefits to building a domain model. It ensures that a standard terminology is used by all of the test case authors. The domain model also simplifies the maintenance of test scripts. In automated testing environments without a domain model, the knowledge is scattered throughout thousands of scripts. With the domain model a change in the functioning of the software is made in only one place which makes it possible to centralize knowledge and therefore centralize the maintenance effort. 
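To convey the flavour of a primitive stimulus with pre- and postconditions, and of an invariant check, the small C sketch below models goes-offhook over an invented, minimal station state. It is purely illustrative: KITSS represents this knowledge in CLASSIC and in the WATSON temporal logic, not in C, and the state fields here are a simplification introduced only for this example.

    #include <stdio.h>

    /* Invented, minimal station state for the illustration. */
    typedef struct { int offhook; int tone; } Station;   /* tone: 0 = none, 1 = dial tone */

    /* Primitive stimulus goes-offhook.                                      */
    /* Precondition: the station is not already offhook.                     */
    /* Postcondition: the station is offhook and begins to receive dial tone.*/
    static int goes_offhook(Station *s) {
        if (s->offhook) return 0;     /* precondition violated: stimulus rejected */
        s->offhook = 1;
        s->tone    = 1;
        return 1;
    }

    /* Invariant quoted above: "only offhook phones receive tones". */
    static int invariant_ok(const Station *s) {
        return s->offhook || s->tone == 0;
    }

    int main(void) {
        Station b = { 0, 0 };
        printf("goes-offhook accepted: %d\n", goes_offhook(&b));
        printf("invariant holds:       %d\n", invariant_ok(&b));
        printf("repeated goes-offhook: %d\n", goes_offhook(&b));  /* rejected by the precondition */
        return 0;
    }

In KITSS the analogous information lives in the dynamic model, where the theorem prover, rather than hand-written checks, decides whether a test step is consistent with the preconditions, postconditions, and invariants.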
Additionally, the domain model provides the knowledge that r~duces and simplifies the tasks of the natural language processor, the analyzer, and the translator modules. 4 Reasoning Modules 4.1 Natural Language Processor The existing testing methodology used English as the language for test cases (see Figure 1) which is also KITSS' input. Recent research in statistical parsing approaches [Jones & Eisner, 1991] provided some answers to the difficulty of natural language parsing in restricted domains such as testing languages. In the KITSS project, the parser uses probabilities (based on training given by telephonese examples) to prune the number of choices in syntactic and semantic structures. Unlikely structures can be ignored or eliminated, which helps to speed up the processing. For instance, consider the syntax of the following two sentences 5 : Place a call to station troops in Saudi Arabia. Place a call to station "4623" in two minutes. Both examples are correct English sentences. Although the second sentence on the surface matches in many parts the first one, their structure is very different. In the first sentence "station" is a verb, in the second a noun; "to" is an infinitive and a preposition respectively. "In Saudi Arabia" refers to a location whereas "in two minutes" refers to time. It is hard to come up with correct parses for both but by restricting ourselves to the 5This example was given by Mark Jones. telephonese sublanguage this is somewhat easier. In telephonese, the structure of the first sentence is statistically unlikely and can be ignored while the second sentence is a common phrase. The use of statistical likelihoods to limit search during natural language processing was used not only during parsing but also when assigning meaning to sentences, determining the scope of quantifiers, and resolving references. When choices could not be made statistically, the natural language processor could query the domain model, the analyzer, or the human user for disambiguation. The final output of the natural language processor are logical representations of the English sentences, which are passed to the analyzer. 4.2 Completeness & Interaction Analyzer The completeness and interaction analyzer represents one of the most ambitious aspects of KITSS. It is based on experience with the WATSON research prototype [Kelly & Nonnenmann, 1991]. Originally, WATSON was designed as an automatic programming system to generate executable specifications from episodic descriptions in the telephone switching software domain. This was an extremely ambitious goal and could only be realized in a very limited prototype. To be able to scale up to realworld use, the focus has been shifted to merely checking and augmenting given tests and maybe generating related new ones rather than generating the full specification. Based on the natural language processor output, the analyzer groups the input logical forms into several episodes. Each episode defines a stimulus-response-cycle of the switch, which roughly corresponds to the action/verify statements in the original test case. These episodes are the input for the following analysis phases. Each episode is represented as a logical rule, which is checked against the dynamic model. The analyzer uses first-order resolution theorem proving in a temporal logic as its inference mechanism, the same as WATSON. The analysis consists of several phases that are specifically targeted for this domain and have to be re-targeted for any different application. 
All phases use the dynamic model extensively. The purpose of each phase is to yield a more detailed understanding of the original test case. The following are the current analysis phases: • The structure of a test case is analyzed to recognize or attribute purpose to pieces of the test case. There are four major pieces that might be found: administration of the switch, feature activation or deactivation, feature behavior, and regression testing. 1097 • The test case is searched for connections among concepts, e.g., there might be relations between system administration concepts and system signaling that need to be understood. • Routine omissions are inserted into the test case. Testers often reduce (purposefully or not) test sequences to their essential aspects. However, these omissions might lead to errors during testing and therefore need to be added. • Based on the abstract plans in the dynamic model, we can enumerate possible specializations, which yield new test cases from the input example. • Plausible generalizations are found for objects and actions as a way to abstract tests into classes of tests. During the analysis phases, the user might interact with the system. We try to exploit the user's ease at verifying or falsifying examples given by the analyzer. At the same time, the initiative of generating the details of a test lies with the system. For example, some test case might violate the look B feel of the system, i. e., there is a conflict with an invariant. However, the user might want this behavior intentionally which will lead to a change in the look B feel itself. The final output of the analyzer is a corrected and augmented test case in temporal logic. As an example of the analyzer's representation after analysis, the following shows the logical forms for the first few episodes .in Figure 1. Notice that the test case is expanded since the analyzer applied abstract stimulus plans. ((OCCURS (GOES-OFFHOOK B)) (BEGINS (RECEIVES-TONE B NORMAL-DIAL-TONE))) ((OCCURS (DIALS-CODE B (ACTIVATE-ACCESS-CODE CF))) (BEGINS (RECEIVES-TONE B SECOND-DIAL-TONE))) ((OCCURS (DIALS-EXTENSION B C)) (ISSUES (RECEIVES-TONE B CONFIRMATION-TONE)) (BEGINS (STATUS-LAMP-STATE B (BUTTON CF) STEADY))) This representation is passed to the translator. 4.3 Translator To make use of the analyzer's formal representation, the translator needs to convert the test case into an executable test language. This language exercises the switch's capabilities by driving test equipment with the goal of finding software failures. One goal of the KITSS project was to extend the life of test cases so that they could be used as many times as possible. To accomplish this, it was decided to make the translator support two types of test case independence. First, a test case must be test machine independent. Each PBX that we run our tests on has a different configuration. KITSS permits a test author to write a test case without knowing which particular machine it will be run on and assuming unlimited resources. The translator loads the configuration setup of a particular switch into the test execution model. It uses this to make the test case concrete with respect to equipment used, system administration performed, and permissions granted. Thus, if the functional description of a test case is identical in two distinct environments, then the logical representation produced by the earlier modules of KITSS should also be identical. Second, a test case must be independent of the automated test language. 
4.3 Translator

To make use of the analyzer's formal representation, the translator needs to convert the test case into an executable test language. This language exercises the switch's capabilities by driving test equipment with the goal of finding software failures. One goal of the KITSS project was to extend the life of test cases so that they could be used as many times as possible. To accomplish this, it was decided to make the translator support two types of test case independence.

First, a test case must be test-machine independent. Each PBX that we run our tests on has a different configuration. KITSS permits a test author to write a test case without knowing which particular machine it will be run on and assuming unlimited resources. The translator loads the configuration setup of a particular switch into the test execution model. It uses this to make the test case concrete with respect to equipment used, system administration performed, and permissions granted. Thus, if the functional description of a test case is identical in two distinct environments, then the logical representation produced by the earlier modules of KITSS should also be identical.

Second, a test case must be independent of the automated test language. KITSS generates test cases in an in-house test language. The translator's code is small because much of the translation information is static and can be represented in CLASSIC. If a new test language replaces the current one, then the translator can be readily replaced without loss of test cases, with minimal changes to the KITSS code, and without a rewrite of most of the domain model.

5 Status

The KITSS project is still a prototype system that has not been deployed for general use on the DEFINITY project. It was built by a team of researchers and developers. Currently, it fully translates 38 test cases (417 sentences) into automated test scripts. While this is a small number, these test cases cover a representative range of the core features. Additionally, each test case yields multiple test scripts after conversion through KITSS. The domain model consists of over 500 concepts, over 1,500 individuals, and more than 80 temporal constraints. The domain model will grow somewhat with the number of test cases covered; however, so far the growth has been less than linear for each feature added.

All of the modules that were described in this paper have been implemented, but all need further enhancements. System execution speed does not seem to be a bottleneck at this point in time. CLASSIC's fast classification algorithm has complexity less than linear in the size of the domain model. Even the analyzer's theorem prover, which is computationally the most complex part of KITSS, is currently not a bottleneck due to continued specialization of its inference capability. However, it is not clear how long such optimizations can avoid potential intractability.

The current schedule is to expand KITSS to cover a few hundred test cases. To achieve this, we will shift our strategy towards more user interaction. The version of KITSS currently under development will intensely question the user to explain unclear passages of test cases. We will then re-target the reasoning capabilities of KITSS to cover those areas. This rapid-prototyping approach is only feasible since we have already developed a robust core system. Although scaling up from our prototype to a real-world system remains a hard task, KITSS demonstrates that our knowledge-based approach to functional software testing is feasible.

6 Conclusion

As we have shown, testing is perhaps one of the most expensive and time-consuming steps in product design, development, and maintenance. KITSS uses some novel approaches to achieving several desirable goals. Features will continue to be specified in English. To support this we have incorporated a statistical parser that is linked to the domain model as well as to the analyzer. Additionally, KITSS will interactively give the user feedback on the test cases written and will convert them to a formal representation. To achieve this, we needed to augment the domain model represented in a terminological logic with a dynamic model written in a temporal logic. The temporal logic inference mechanism is customized for the domain. Tests will continue to be specified independently of the test equipment and test environment, and the user will not have to provide unnecessary details. Such a testing system, as demonstrated in KITSS, will ensure project-wide consistent use of terminology and will allow simple, informal tests to be expanded to formal and complete test scripts. The result is a better testing process with more test automation and reduced maintenance cost.
Acknowledgments

Many thanks go to Van Kelly, Mark Jones, and Bob Hall, who also contributed major parts of the KITSS system. Additionally, we would like to thank Ron Brachman for his support throughout the project.

References

[Balzer et al., 1977] Balzer, R., Goldman, N., and Wile, D.: Informality in program specifications. In Proceedings of the 5th IJCAI, Cambridge, MA, 1977.
[Barstow, 1985] Barstow, D.R.: Domain-specific automatic programming. IEEE Transactions on Software Engineering, November 1985.
[Brachman et al., 1989] Brachman, R.J., Borgida, A., McGuinness, D.L., and Alperin Resnick, L.: The CLASSIC knowledge representation system, or, KL-ONE: The next generation. In preprints of Workshop on Formal Aspects of Semantic Networks, Santa Catalina Island, CA, 1989.
[Brachman et al., 1990] Brachman, R.J., McGuinness, D.L., Patel-Schneider, P.F., Alperin Resnick, L., and Borgida, A.: Living with CLASSIC: When and how to use a KL-ONE-like language. In Formal Aspects of Semantic Networks, J. Sowa, Ed., Morgan Kaufmann, 1990.
[Brodie et al., 1984] Brodie, M.L., Mylopoulos, J., and Schmidt, J.W.: On Conceptual Modeling: Perspectives from Artificial Intelligence. Springer-Verlag, New York, NY, 1984.
[Brooks, 1987] Brooks, F.P.: No silver bullet: Essence and accidents of software engineering. Computer, Vol. 20, No. 4, April 1987.
[Howden, 1985] Howden, W.E.: The theory and practice of functional testing. IEEE Software, September 1985.
[Jones & Eisner, 1991] Jones, M.A., and Eisner, J.: A probabilistic chart-parsing algorithm for context-free grammars. AT&T Bell Laboratories Technical Report, 1991.
[Kelly & Nonnenmann, 1991] Kelly, V.E., and Nonnenmann, U.: Reducing the complexity of formal specification acquisition. In Automating Software Design, M. Lowry and R. McCartney, eds., MIT Press, 1991.
[McCartney, 1991] McCartney, R.: Knowledge-based software engineering: Where we are and where we are going. In Automating Software Design, M. Lowry and R. McCartney, eds., MIT Press, 1991.
[Myers, 1976] Myers, G.J.: Software Reliability. John Wiley & Sons, New York, NY, 1976.
[Myers, 1979] Myers, G.J.: The Art of Software Testing. John Wiley & Sons, Inc., New York, NY, 1979.
[Nonnenmann & Eddy, 1991] Nonnenmann, U., and Eddy, J.K.: KITSS - Toward software design and testing integration. In Automating Software Design: Interactive Design - Workshop Notes from the 9th AAAI, L. Johnson, ed., USC/ISI Technical Report RS-91-287, 1991.
[Yonezaki, 1989] Yonezaki, N.: Natural language interface for requirements specification. In Japanese Perspectives in Software Engineering, Y. Matsumoto and Y. Ohno, eds., Addison-Wesley, 1989.
[Zave & Jackson, 1991] Zave, P., and Jackson, M.: Techniques for partial specification and specification of switching systems. In Proceedings of the VDM'91 Symposium, October 1991.

A Diagnostic and Control Expert System Based on a Plant Model

Junzo SUZUKI*, Chiho KONUMA, Mikito IWAMASA, Naomichi SUEDA
Systems & Software Engineering Laboratory, Toshiba Corporation
70, Yanagi-cho, Saiwai-ku, Kawasaki 210, Japan

Shigeru MOCHIJI, Akimoto KAMIYA
Fuchu Works, Toshiba Corporation
1, Toshiba-cho, Fuchu 183, Japan

*Email: suzuki%ssel.toshiba.co.jp@uunet.uu.net

Abstract

A conventional expert system for plant control is based on heuristics, which are a priori knowledge stored in a knowledge base. Such a system has a substantial limitation in that it cannot deal with "unforeseen abnormal situations" in a plant due to the lack of heuristics.
To realize a flexible plant control system which can overcome this limitation, we focus on model-based reasoning. Our system has three major functions: 1) model-based diagnosis for unforeseen abnormal situations, 2) model-based knowledge generation for plant control, and 3) knowledge-based plant control with both generated and a priori stored knowledge. In this paper, we focus on the function of model-based knowledge generation. First, we show an overview of our system, which has an integrated architecture of deep reasoning with shallow reasoning. Next, we explain the theoretical aspects of model-based knowledge generation. Finally, we show the experimental results of our system, and discuss the system's capabilities and some open problems.

1 Introduction

Currently, in the field of diagnosis and control of thermal power plants, the more intelligent and flexible systems become, the more knowledge they need. Conventional diagnostic and control expert systems are based on heuristics stored a priori in knowledge bases, so they cannot deal with unforeseen abnormal situations in the plant. Such situations could occur if knowledge engineers forgot to implement some necessary knowledge. A skilled human operator is able to operate the plant and somehow deal with such unforeseen abnormal situations because he has fundamental knowledge about the structure and functions of the component devices of a plant, the principles of plant operations, and the laws of physics. His thought process is as follows.

• Diagnosis of an unforeseen abnormal situation
• Generation of plant control knowledge
• Verification of generated knowledge

A skilled human operator can deal with unforeseen abnormal situations by repeatedly executing these steps using the fundamental knowledge mentioned before. Therefore, the concepts of our diagnostic and control expert system are based on the same steps. In this paper, we focus on the generation and verification of plant control knowledge. First, we show an overview of our system. Next, we explain the model representations and the model-based reasoning mechanisms. After that, we describe the experimental results and discuss the system's capabilities. Finally, we discuss some open problems and related work.

2 A System Overview

The model-based diagnostic and control expert system (Figure 1) consists of two subsystems: the Shallow Inference Subsystem (SIS) and the Deep Inference Subsystem (DIS). The SIS is a conventional plant control system based on heuristics, namely the shallow knowledge for plant control. It selects and executes plant operations according to the heuristics stored in the knowledge base. The Plant Monitor detects occurrences of unforeseen abnormal situations, and then activates the DIS.

The DIS consists of the following modules: the Diagnosor, the Operation-Generator, the Precondition-Generator, and the Simulation-Verifier. The Diagnosor utilizes the Qualitative Causal Model for plant process parameters to diagnose unforeseen abnormal situations. The Operation-Generator figures out which plant operations are necessary to deal with these unforeseen abnormal situations. It utilizes the Device Model and the Operation Principle Model. The Precondition-Generator attaches the preconditions to each plant operation above, and as a result, generates rule-based knowledge for plant control. The Simulation-Verifier predicts the plant behavior which is to be observed when the plant is operated according to the generated knowledge.
It utilizes the Dynamics Model, verifies the generated knowledge using the predicted plant behavior, and gives feedback to the Operation-Generator to refine the knowledge if necessary. The knowledge compiled from models by the DIS is transmitted to the SIS. The SIS executes the plant operations accordingly, and as a result, the unforeseen abnormal situations should be handled properly.

Figure 1: An overview of the system

3 Model-Based Generation and Verification of Knowledge

The main purpose of this section is to present a generation and verification procedure for plant control knowledge to deal with unforeseen abnormal situations. This knowledge is in IF-THEN format.

3.1 Model Representation

The Device Model and the Operation Principle Model are used to generate the knowledge. The Dynamics Model is used to verify the knowledge. We explain these models briefly.

1. Device Model
The Device Model represents the fundamental knowledge about the functions, structure and characteristics of a plant. Because a plant consists of component devices, a Device Model can be defined for each component device. Figure 2 shows the Device Model representation for a boiler-feeding-water-pump, which supplies water to a boiler. A small Prolog-style sketch of such a frame follows the model descriptions below.

Figure 2: An example of the Device Model (the a_bfp frame with name, demand, goal, states, operation, quality, flow_in, flow_out and system slots; its goal is that the demanded flow not exceed the capacity, 615 ton/hr in the on state and 0 ton/hr in the off state, and its off-to-on and on-to-off operations each carry a 0.1 hr time lag)

The demands for each component device are described in the demand slot, and their constraints to be satisfied are described in the goal slot. The functions of each component device are described as possible states of each device in the states slot. The operations of a device are defined by the change of its state. Direct and indirect influences on plant processes by operations are described in the operation and quality slots, respectively. The structure of a plant is described in the flow_in and flow_out slots. In addition, hierarchical modeling can be done as shown in Figure 3.

Figure 3: Hierarchical modeling of plant devices

2. Operation Principle Model
The Operation Principle Model is concerned with the principles for safe and economical plant control. It consists of the following two rules.
• Strict Accordance Rule: The purpose of this rule is to ensure plant safety throughout a series of plant operations. It consists of the following two components: a rule to use a device within its own allowable range, and a rule to keep a faulty device out of service.
• Preference Rule: The purpose of this rule is to ensure an economical plant operation. It consists of the following two components: a rule to keep the number of in-service devices to a minimum, and a rule to equalize the service time of each device.

3. Dynamics Model
The Dynamics Model represents the dynamic characteristics of the plant. In the area of plant control, the Dynamics Model is concerned either with the functions of traditional plant controllers based on PID control or with the characteristics related to physical laws. Figure 4 shows the model of a water-flow controller. Kp and T are constants, and 1/s denotes the integral operator.

Figure 4: An example of the Dynamics Model (a controller regulating feeding-water flow)
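As a rough illustration of the Device Model of item 1 above, the demand, goal and states slots of the a_bfp example might be rendered as Prolog facts. The 615 ton/hr capacity and the 0.1 hr time lag follow Figure 2; the 400 ton/hr demand and all predicate names are invented for this sketch, and the system itself encodes such frames in KL1 rather than in this form.

% A boiler-feeding-pump Device Model reduced to Prolog facts.
capacity(a_bfp, on,  615).      % deliverable flow when running [ton/hr]
capacity(a_bfp, off, 0).
time_lag(a_bfp, 0.1).           % state-change time lag [hr]

demand(a_bfp, 400).             % current demand on the device [ton/hr]

% goal slot: the demanded flow must not exceed the capacity available
% in the device's current state.
goal_satisfied(Device, State) :-
    demand(Device, Demand),
    capacity(Device, State, Capacity),
    Demand =< Capacity.

% ?- goal_satisfied(a_bfp, off).   fails, so a new goal state is sought.
% ?- goal_satisfied(a_bfp, on).    succeeds.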
3.2 Model-Based Reasoning Mechanism

We briefly explain the model-based reasoning mechanism of these modules: the Operation Generator, the Precondition Generator and the Simulation-Verifier.

1. Operation Generator
This module determines the goal-state where all of the constraints defined by the Device Model and the Operation Principle Model are satisfied. Generally, an unforeseen abnormal situation causes a state change of a plant, and this change can make the above constraints unsatisfied. To identify these unsatisfied constraints, the following functions are needed.

(a) Verification of Constraints
All the constraints defined by the Device Model should be verified to see if they are still satisfied after the unforeseen abnormal situation. This function (Figure 5) consists of the following two sub-functions: propagating the change at each device to the others according to the connections of devices, and locally verifying the constraints at each device.

Figure 5: Constraints verification function (triggered by a state change, i.e., an operation, or by a parameter value change, i.e., an external event)

(b) Update of the Goal-State
If some of the constraints at a certain device are proved not to be satisfied, a new state for this device should be sought in order to satisfy them. This function (Figure 6) consists of the following sub-functions: searching for a state of each device where all of its demands can be satisfied, distributing the demands for a device of higher hierarchy to devices of lower hierarchy according to the constraints defined by the Operation Principle Model, and generating new demands for connected devices according to the Device Model and propagating them. The plant operations are deduced by taking the difference between the initial goal-state and the updated one.

Figure 6: Goal-State update function (new demands are propagated to connected devices)

2. Precondition Generator
In the domain of thermal power plant control, the preconditions of each plant operation can be classified into the following five generic classes [Konuma 1990].
• Preconditions for the state before an operation
• Preconditions for the order of operations
• Preconditions for safety during an operation
• Preconditions for the timing of an operation
• Preconditions for completion of an operation
This module generates the above preconditions for each operation by analyzing the goal-state according to the constraints defined by the Device Model. An image of their generation process is shown in Figure 7.

Figure 7: Generation process of preconditions (the preconditions for state, order, safety, timing and completion form the IF-part; the operation forms the THEN-part)

3. Simulation Verifier
This module predicts plant behavior using the Dynamics Model to verify the knowledge compiled from models by the Operation Generator and Precondition Generator. The prediction of plant behavior can be realized through simulation methods [Suzuki 1990]. After the prediction, the module examines whether or not undesirable events have occurred. Undesirable events can be defined by several criteria, but one of the most important is the transient violation of the allowable range for each process parameter's value. The execution of plant operations usually causes transient changes of processes due to the dynamic characteristics of a plant. If this change is beyond the allowable range of the current plant state, it is detected as a violation. The Simulation-Verifier supports the Generate&Test algorithm of knowledge [Suzuki 1990], as illustrated in Figure 8. This process can be formalized as updating the goal-state according to the degree of the violation.

procedure Generate&Test(M or D0, S0)
begin
  [Se, Op] <= Operation_Generate(M or D0, S0);
  K1 <= Precondition_Generate(S0, Se, Op);
  PS <= Simulate(S0, K1);
  [NG, D1, S1] <= Verify(PS);
  if NG ≠ constraint-violation then
    return(K1, Se);
  else
    [K2, S3] <= Generate&Test(D1, S1);
    [K3, Se] <= Generate&Test(M, S3);
    K4 <= FIX(K1) + K2 + K3;
    return(K4, Se);
  endif
end.

NOTATION: Si, Se: plant state; Di: demand for a device; PS: plant behavior; Ki: plan of plant operations; [ ]: list expression; M: output of the Diagnosor; Op: plant operations; NG: flag for allowable-range violations; <=: substitution expression.

Figure 8: Generate&Test algorithm of the knowledge
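A toy, runnable Prolog rendering of this Generate&Test loop is sketched below. The one-line "simulation" (a fixed 20% transient overshoot) and the demand relaxation stand in for the Dynamics Model prediction and for the feedback to the Operation Generator; all numbers and predicate names are illustrative only, not the system's KL1 implementation.

allowable_range(feed_flow, 0, 615).              % [ton/hr]

simulate(Target, Gain, Peak) :-                  % predicted transient peak
    Peak is Target * Gain.

verify(Quantity, Value) :-                       % allowable-range check
    allowable_range(Quantity, Low, High),
    Value >= Low,
    Value =< High.

generate_and_test(Target, Target) :-             % plan accepted as is
    simulate(Target, 1.2, Peak),
    verify(feed_flow, Peak),
    !.
generate_and_test(Target, Accepted) :-           % violation: relax and retry
    Reduced is Target * 0.9,
    generate_and_test(Reduced, Accepted).

% ?- generate_and_test(600, A).   accepts a reduced target of about 486 ton/hr.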
4 Experiments

We have implemented the expert system on Multi-PSI [Taki 1988]. To realize a rich experimental environment, we have also implemented a plant simulator, instead of an actual plant, on a G8050 mini-computer. Both computers are linked by a data transmission line. This section describes the results of some experiments.

4.1 Configuration of a Thermal Power Plant

Figure 9 shows the configuration of the thermal power plant. It consists of controllers (hatched rectangles) and devices. The condenser is a device for cooling the turbine's exhaust steam; the steam is reduced to water using cooling water taken from the sea. The reduced water is moved through the de-aerator to the boiler by the condensation-pump system and the boiler-feeding-pump system. The cooling water is provided by the circulation-pump system. The fuel system supplies pulverized coal to the boiler.

Figure 9: Configuration of a thermal power plant

4.2 Experimental Results

The total of the Device Models in the system amounts to 78 (Table 1). In this table, the difference between the numbers in the left and right columns is due to hierarchical modeling.

Table 1: The amount of devices and controllers

              Amount in the plant    Amount in the Device Models
  Devices             43                         63
  Controllers          7                         15
  Total               50                         78

The experiments were performed as follows.

1. First, we selected appropriate faults of the following devices: a coal-pulverizer, a boiler-feeding-pump, a condensation-pump, a circulation-water-valve, and a water-heater. We made these faults the malfunctions of the plant simulator. We also set them up for multiple faults.

2. Next, we extracted some specific knowledge for plant control from the knowledge base in the SIS. This specific knowledge was necessary to deal with the selected faults. As a result, the selected faults were equivalent to unforeseen abnormal situations.

3. Finally, after activating the malfunctions of the plant simulator, we confirmed that the DIS compiled the knowledge from the models and that the SIS executed the operations accordingly.
We explain the quality of the generated knowledge for a single fault, because the results for multiple faults are the same as for a single fault. In the experiments, the contents of the generated knowledge are concerned with switching from a faulty device to a backup one. Table 2 summarizes all the generated plant operations. In the case of a water-heater fault, the system failed to generate plant operations. In the other cases, the system succeeded in generating plant operations.

We estimate the quality of the generated knowledge in terms of its preconditions. The table lists, for each operation, the number of preconditions encoded by a human expert (N1), the number of essential ones in N1 (N2), the number generated by the system (N3), the covered ratio of N2 by the system (CR1), and the uncovered ratio of N2 (ER), namely the ratio of N2 missed or incorrectly generated by the system. (CR2 will be explained in Section 5.) The difference between N1 and N2 is due mainly to the following reason. Although a human expert specifies the preconditions of the knowledge as generally as possible, the system generates specialized preconditions for each occurring unforeseen abnormal situation. With this point in mind, we determine N2 by eliminating unnecessary preconditions from N1. CRi (i = 1, 2) and ER are calculated by the following formulas:

  CRi = Success(N2) / N2
  ER  = (Miss(N2) + Fail(N2)) / N2

Success(N2) denotes the number of N2 generated by the system; Miss(N2) the number of N2 which were not generated; and Fail(N2) the number of N2 incorrectly generated. We also consider the following in evaluating Success(N2).

• Although the generated preconditions enumerate the individual state of each device, a human expert often represents them succinctly. For example, the conjunctive precondition "a_bfp = on" ∧ "b_bfp = on" ∧ "c_bfp = off" is represented as "the number of activated bfp = 2".
• The system often generates superfluous preconditions that a human expert does not mention.
• Although a human expert encodes preconditions for the selection of an in-service device, the system never generates them because they are already accounted for in applying the Operation Principle Model.

None of the above devalues the quality of the generated knowledge, because the system is required only to generate specific preconditions for an occurring unforeseen abnormal situation. For this reason, we regard generated preconditions applicable to any of the above as Success(N2).

We carried out the experiments under the following conditions.
• Once the DIS was activated, no further unforeseen abnormal situation occurred.
• The Diagnosor deduced the exact diagnostic results.
Because of the above conditions, the SIS interpreted all the generated knowledge and handled the unforeseen abnormal situations.

Table 2: Quality of generated knowledge
(columns: N1, N2, N3, CR1 [%], ER [%], CR2 [%] — covered ratio after refinement)

  Pulverizer fault
    1. activate a Pulverizer     12   6  11  100   0  100
    2. halt a Pulverizer          8   8  12  100   0  100
  BFP fault
    3. activate a BFP            42  18   8   22  78  100
    4. open a FWCV               26  10   8   40  60   90
    5. set FWCV auto             32   8   8   38  62   75
    6. set FWCV hand             12   4   7   75  25  100
    7. close a FWCV              14   6   9   83  17  100
    8. halt a BFP                23   8  11   87  13  100
  CP fault
    9. activate a CP             17   7   6   57  43  100
   10. halt a CP                 13   7   8   86  14  100
  CWV fault
   11. activate a CWP             8   7   7   57  43  100
   12. open a CWV                 4   3   7   67  33  100
   13. close a CWV                4   3   8   67  33  100
   14. halt a CWP                 —   —   —   71  29    —
  HTR fault
   15. open a HTR bypass VLV     (failed to generate the operation)
   16. close a HTR VLV           (failed to generate the operation)

Figure 10 shows the generated knowledge and its corresponding knowledge encoded by a human expert for operation no. 5 in Table 2. We also show some additional information in Figure 10, which is referred to in the next section.
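The two ratios can be written out directly as code; the counts below are hypothetical and are not taken from any row of Table 2.

% Coverage metrics of Table 2 expressed in Prolog (percentages).
cr(Success, N2, CR) :- CR is 100 * Success / N2.
er(Miss, Fail, N2, ER) :- ER is 100 * (Miss + Fail) / N2.

% ?- cr(6, 8, CR), er(1, 1, 8, ER).   gives CR = 75 and ER = 25.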
5 Discussion

In this section, we evaluate the system's capability to generate the necessary plant operations and to generate the correct preconditions for each operation. The former is concerned with the performance of the Operation Generator, and the latter with that of the Precondition Generator. In addition, we discuss the pros and cons of using Multi-PSI and some open problems.

1. Capability to generate plant operations
In the experiment, the system could generate all the necessary plant operations for each malfunction except the water-heater fault. We briefly explain the reason for this failure below. At a boiler, an approximation relates the outlet steam pressure (P) to the inlet fuel flow (F), the inlet water temperature (T) and the inlet water flow (G); c1 and c2 are positive constants, and a1, a2 are correction terms related to other process parameters. The Operation Generator calculates F, G and T from P using this formula defined in the Device Model. P is the demand for the boiler. After that, the Operation Generator propagates F to the fuel system, and G and T to the water-heater, as new demands respectively. At this point, the Operation Generator must evaluate the above formula from the left side to the right side, but the possible value combinations of F, G and T cannot be decided from the single input value P. To deal with this undecidability, the Operation Generator utilizes the Operation Principle Model and approximation functions supplemented with the Device Model. The failure in the water-heater fault is caused by this reasoning mechanism. We believe that additional principles are needed to evaluate such a process balance.

Figure 10: Knowledge for the c_bfp controller (the generated and the human-encoded preconditions for operation no. 5, e.g., a_fwcv_ss = auto, b_fwcv_ss = auto, c_fwcv = close, together with bounds on the c_bfp feed-water deviation and on the c_fwcv opening; the individual preconditions are labeled for reference in the discussion)

2. Capability to generate preconditions
From CR1 and ER in Table 2, we can see that most of the generated preconditions are imperfect, namely ER > 0. The reasons are as follows.
• The Precondition Generator failed to generate preconditions related to devices not modeled in the Device Model. An example is the set of preconditions to establish the electric power supply for the pump. We can resolve this problem easily by augmenting the Device Model.
• Although all the necessary preconditions could be checked in the goal-state search, the Precondition Generator missed analyzing them. No.i to no.j in Figure 10 illustrate this point. The system focuses only on the neighbor devices of the operated device. Because the system is required only to generate specific preconditions for an occurring unforeseen abnormal situation, we can resolve this problem easily by extending the focusing area.
• The Precondition Generator generated incorrect preconditions for the timing of operations, as shown by no.8 to no.9 in Figure 10. Although the system is based on the concept that the timing of operations can be determined from the maximum outlet process flow of each device, this concept does not hold true for devices such as PID controllers or devices placed under the control of PID controllers.
Although we can resolve the former two problems easily, the last problem is serious because it is closely related to the basic concept for the generation of preconditions. It is still an open problem.
In Table 2, the column CR2 represents the expected results after the refinements addressing the former two problems. The remaining uncovered parts for operations 4 and 5 (ER of 10% and 25%, respectively) are related to the last-mentioned problem above.

3. Real-time reasoning using Multi-PSI
Although our system does not require the severe real-time reasoning capability needed to cover either PID control or adaptive control, it requires at least the ability to compile the knowledge within a few minutes. To guarantee this performance, we have been investigating a parallel reasoning mechanism on Multi-PSI [Suzuki 1991]. We can use the KL1 language on Multi-PSI, which is well suited to implementing a multi-process system concisely. In particular, its process synchronization mechanism based on "suspend" is an advantage for our system implementation. In spite of this, it is very difficult to achieve a drastic speedup using KL1 and Multi-PSI. We have already demonstrated a threefold to fivefold improvement in reasoning time by using Multi-PSI with 16 processor elements. To achieve further improvement, we think we must make a more elaborate implementation.

4. Utility of the compiled knowledge
In contrast to the classical approach based on shallow knowledge, our proposed model-based reasoning mechanism succeeded in dealing with unforeseen abnormal situations in a plant. This point is the utility of the compiled knowledge. Although our proposed mechanism is powerful in dealing with unforeseen abnormal situations, it is weak with respect to the acquisition of knowledge which is reusable in the SIS. Because the system generates specific knowledge only for occurring unforeseen abnormal situations, the generated knowledge is, from the viewpoint of reusability, either too general owing to the lack of some conjunctive preconditions or too specific owing to their enumerative representations.

5. Facility of model acquisition
The system utilizes the Qualitative Causal Model, the Device Model, the Operation Principle Model and the Dynamics Model. These models could be
However, he does not refer to the generation of the knowledge for unforeseen events. 6 Conclusion We proposed a diagnostic and control expert system based on a plant model. The main target of our approach is a system which could deal with unforeseen abnormal situations. Our approach adopts a model-based architecture to realize the thought process of a skilled human plant operator. In this paper, we focused on model-based generation of plant control knowledge, and explained the details of the model-based reasoning. Our system utilizes the following models: the Device Model, the Operation Principle Model and the Dynamics Model. We also discussed its ability as demonstrated through some experimental results. The results encourage us to make sure the modelbased reasoning capabilities in plant control. Acknowledgements This research was carried out under the auspices of the Institute for New Generation Computer Technology (ICOT). References [Crawford 1990] Crawford, J., Farquhar, A. and Kuipers, B. "QPC: A Compiler from Physical Models into Qualitative Differential Equations", Proc. of AAAI-90, pp.365-372 (1990). [Dvorak 1989] Dvorak, D. and Kuipers, B. ((ModelBased Monitoring of Dynamics Systems", Proc. of IJCAI-89, pp.1238-1243 (1989). [Konuma 1990] Konuma, C., et. al. "Deep Knowledge based Expert System for Plant Control - Development of Conditions Generation Mechanism of Plant Operations - ", Proc. 9f 12th Intelligent System Symposium, Society of Instrument and Control Engineers, pp.1318 (1990) (in Japanese). [Kuipers 1986] Kuipers, B. "Qualitative Simulation", Artificial Intelligence, 29, pp.289-338 (1986). [Suzuki 1990] Suzuki, J., et al. "Plant Control Expert System Coping with Unforeseen Events - Model-based Reasoning Using Fuzzy Qualitative Reasoning - ", Proc of Third International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (lEAl AIE-90), ACM, pp.431-439 (1990). [Suzuki 1991] Suzuki, J., et al. "Plant Control Expert System on Multi-PSI Machine", Proc. of KL1 Programming Workshop, pp.l0l-108 (1991) (in Japanese). [Taki 1988] Taki, K. "The parallel software research and development tool: Multi-PSI system ", Programming of Future Generation Computers, North-Holland, pp.411-426 (1988). [Yamaguchi 1987] Yamaguchi, T., et al. "Basic Design of Knowledge Compiler Based on Deep Knowledge", Journal of Japanese Society for Artificial Intelligence, vol.2, no.3, pp.333-340 (1987) (in Japanese). PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 1107 A Semiformal Metatheory for Fragmentary and Multilayered Knowledge as an Interactive Metalogic Program Andreas Hamfelt and Ake Hansson Uppsala Programming Methodology and Artificial Intelligence Laboratory Computing Science Dept., Uppsala University Box 520, S-751 20 Uppsala, Sweden +46-18-18 25 00 hamfel tes = [actor(X),actor(Y»), LegSetl = [[provisionno(sga(5»U,LegSetO], Text = the same text as in rule schema 1 in fig. 2.l. propertyping(t(1 ),RulePropl ,ModAtl ,Types,Text). In IT this provision is assumed open with respect to the concepts 'vendee' and 'vendor'. So, the assumed fixed structure of this provision is represented in the metalanguage as the term specified for the metavariable RulePropl with open places expressed by the metavariables X and Y. These variables have to be specialised interactively with the user. The predicate propertyping/5 is defined for this interaction. 
The metavariable ModAtl expresses the relation between the concepts of IT, i.e., the text of Text and its open parts, i.e., 'vendee' and 'vendor', and its formal counterpart in 07 partly specified in RulePropl. Thus, a proper typing carried out by the user gives a meaningf~l rule of the object language of levell, represented In the metalanguage by the specialised term of RulePropl. The metavariable LegSeU identifies what part of level 1 in IT is relevant for a particular case. The rules of acceptance may also only be partially characterized in the metalanguage. However, a user can interactively add interpretation data, thereby extending the partial characterization of OT in the theory MT of the metalanguage. What is hard characterizing is the determination of whether or not a meaningful rule belongs to a theory of IT, i.e., is legally acceptable, and thus should have a formal counterpart in OT. Presently, this is solved by assuming in MT that a rule is acceptable when a user tries to apply it, and the conditions for its application are accepted, i.e., either follow by logic from other accepted rules or are included in the theory by rules at the higher adjacent level in co-operation with the user. So, we presuppose that it is only the user who can determine the relevance of a specific principle. Consequently, at the end of a session these assumptions should be possible for a user to examine. These aspects are encoded in the prover clause [UP] (short for upward reflection). Observe that the prover clauses belong to MT which takes as object theory the whole multilayered OT. Their first demo argument defines the formalisation in OT of logic provability between a theory Ti of IT and a sentence of IT but though e.g., the fourth proof term argument has a counterpart in OT - a formal proof extending over the whole hierarchy of OT-it includes expressions solely of MT as well. [UP] prover(demo(n(t{l»,n(SentPropl»,ModI,LegSetI,ProofI):proposesent(t{l),SentPropl,Modl,LegSetI), J is I + I, ground([SentPropl,ModI,LegSetI]), permissible(t{l),SentPropl), prover(demo(n(t(J»,n(demo(n(t{l»,n(SentPropl»», [ModAtJ ,ModI] ,[LegSetAtJ ,LegSetI] ,ProoD), Proofl = (sentenceof(theory(I),SentPropl):proofof(theory(J),proved(theory(I),SentPropl),ProoD». permissible(t{l),SentPropl):-I = 1. permissible(t(I),SentPropl):-I ~ 2,\+ SentPropl = (Head:-Body). Clause [UP] encodes in MT upward reflection between two theories Ti and Tj of arbitrary adjacent levels in IT, with formal counterparts t(l) and t(J) in OT. A sentence is assumed to belong to a theory Ti if this accords with the rules of theory Tj of the higher adjacent level. In MT, LegSetI and ModI identify and modify formula schemata corresponding to known fragments of sentences of the theory Ti • The predicate proposesent/4 is defined to specialise interactively with a user such meaningfulsent schemata. Proofl is a metaproof in MT of the existence of a sequence of formulas in OT's formalisation of IT constituting a formal proof of the proposed sentence. Upward reflection must be constrained. If each sentence were upward reflected directly when proposed, the reasoning process would ascend directly to the topmost level since the metarule proposed for assessing the sentence would itself directly be upward reflected, etc. Therefore, at levels i, i ~ 2, only sentence proposals which are ground facts may be upward reflected, postponing the assessment of rules, which may only be proposed as non-ground conditional sentences, till facts are activated by their premises. 
Under this reasoning scheme the content of all sentences involved in the reasoning process will eventually be assessed. The restriction is maintained by the permissible subgoal. Clause [ANOI] handles A-introduction. In MT a theory Ti of IT, with t(I) as formal counterpart in OT, is assumed to include a sentence which is a conjunction if both its conjuncts may be assumed included in Ti • [ANOI] prover(demo(n(t{l»,n(and(GI,G2»), [[ModGI ,ModG2] ,ModsBelow],LegSetI,Proofl):I ~ 2, prover(demo(n(t(I»,n(GI»,[ModGI,ModsBelow],LegSetI,ProofGl), prover(demo(n(t(l»,n(G2»,[ModG2,ModsBelow],LegSetI,ProofG2), Proofl =(sentenceof(theory(I),and(G I ,G2»:and(proofof(theory(I),G I ,ProofG I), proofof(theory(l),G2,ProofG2»). Clause [MP] encodes our version of modus ponens. In MT a theory Ti of IT, with t{l) as formal counterpart in OT, is assumed to include a sentence which is the consequence of a proposed implication of Ti whose antecedent can be assumed included in Ti • 1112 [MP] prover(demo(n( t(I) ),n(HeadI)),ModI,LegSetI,ProofI):I ~ 2, proposesent(t(I),(HeadI:-BodyI),ModI,LegSetI), prover(demo(n(t(I»,n(BodyI»,ModI,LegSetI,ProofBodyI), ProofI = (sentenceof(theory(I),HeadI):and(ruleof(theory(I) ,(HeadI: -BodyI», proofof(theory(I),BodyI,ProofBadyI»). The knowledge of rules in IT for assessing sentence proposals for the adjacent lower level theory Ti will at some level j be too rudimentary for composing a theory Tj • At this level, Tj is considered to be the user's opinion of the sentences proposed for T j • This is encoded in MY in the clause [TOP]. [TOP] prover(demo(n(t(J»,n(demo(n(t(l»,n(RulePropI»», ModJ ,LegSetJ ,ProoD):J ~ 2, \f. proposesent(t(J),(demo(n(t(I»,n(RulePropl):-BodyJ), ModJ ,LegSetJ), externalconfirmation(t(I) ),RulePropI,ModJ ,LegSetJ), ProoD = externallyconfirmed(sentenceof(theory(I),RulePropl)). ModI = [[hirer/vendee,letter/vendor],unspec], call it (modI) RulePropi = (legalcons(pay,hirer,letter,goods,price):and(actorI(hirer,goods),and(actor2(letter,goods), and(unseUledprice(goods ),and(demands(letter,price), reasonable(price,goods»»», call it (ruleprop 1) Now it must be established whether it accords with the higher adjacent level, i.e., the theory T2 , to assume a primary rule with this proposed content is included in the theory T 1 • This is accomplished through "upward reflection". Before a formula with content information is upward reflected it must be checked for groundness (a hack) and permissibleness. These are the tasks of the third subgoal of [UP] (where (name) is shorthand for an occurrence of the term named by name). ground([(rulepropl) ,(modI) ,(legsetl) ]) and of the fourth subgoal of [UP] permissible(t(l), (ruleprop I», which permits a conditional rule on levell to be upward reflected. The fifth, "upward reflection" , subgoal of [UP] Let us now partially trace the computation of a sample query prover(demo(n(t(2»,n(demo(n(t(I»,n«(rulepropI»»), [ModAt2,(mod 1) ],[LegSetAt2,(legseU) ],Proof2), >prover(demo(n(t(I»,n(RulePropI»,ModI,LegSetl,Proofl). resolves with the prover clause [MP] leading to four subgoals (the first and last of which controls the index of the current level and builds the proof term, respectively). Now a secondary rule must be proposed for assessing the lower level expression. The second subgoal of [MP] is This query could be read as "is there a metaproof Proofl stating that the theory Tl of level 1 includes a primary rule which is represented in OT by RulePropi and modified by ModI in the legal set ting LegSeU?" 
Since it is completely unspecified at this point what particular problem to solve the query can be stated in these general terms and be generated by the system. The goal resolves with the prover clause [UP] leading to six subgoals, the last of which builds the proof term to bind Proofl. Below, we refrain from discussing how the proof term is built during the computation. The first subgoal of [UP] is proposesent(t(l),SentPropl,Modl,LegSetl) which through user interaction selects a legal rule and modifies it for the current case. The unifying clause proposesent(Theory,RuleProp,Mod,LegSet):(Theory = t(l);RuleProp=(demo( _,J:-Body»,l findlegalseuing(Theory,Leg Set), meaningfulsent(Theory,RuleProp,Mod,LegSet,Text). identifies the relevant part of the legal domain from which it retrieves a proposal for a rule provided it is meaningful. The latter is sorted out by meaningfulsent clau~es, say, the one presented above. In this clause the propertyping condition is intended to promote that user proposed modifications preserve the rule's meaningfulness. Suppose now that the user interaction makes the first subgoal of [UP] return with the following ground argument bindings, i.e., the schemata from sect. 5 Sale of Goods Act is adapted into a primary rule proposal regulating a case of 'hire of goods', LegSetl = [[provisionno(sga(5», provisioncategory(,Determination of Purchase Money'), legalfield('Commercial Law'»),unspec], call it (legsetl) 1 The legal setting may be assumed unknown if any of these two conditions hold. proposesent(t(2),(demo(n(t(I»,n«(rulepropl»):-Body2), (mod2) ,(legset2», where (mod2) is [ModAt2,(modl)], (modI) is [(modatl),unspec), (modatl) is [hirer/vendee,letter/vendor] and (legset2) is [LegSetAt2, (legsetl)] . Suppose the user chooses the analogia legis principle. The relation between primary rules of theory Tl and secondary rules for analogia legis of theory T2 is encoded in this clause: meaningfulsent(t(2),RuleProp2,Mod2,LegSet2,Text):RuIeProp2 = (demo(n(t(I»,n(RulePropl»:analogialegis(n(RulePropl),n(ModAtI),LegSetl», M0d2 = L,[ModAtl,J], LegSet2 = [[interpretationtheory(,analogia legis')IJ,LegSetl), Text = '''A primary rule proposal is legally valid (i.e., belongs to the theory tl of valid primary rules) if its inclusion accords with the secondary rule for analogia legis." .. .', propertyping(t(2),RuleProp2,[),[),Text). The second subgoal of [MP) returns with its second argument bound to (demo(n(t(I»,n«(ruleprop I) »:analogialegis(n( (ruleprop I) ),n( (modatl) ),(legsetl») and LegSetAt2 bound to [interpretationtheory('analogia legis')IJ. The third subgoal of [MP] is prover(demo(n(t(2)),n(analogialegis(n( (ruleprop I», n«(modatl», (legsetl»», (mod2) ,(legset2) ,ProofBody2), 1113 which recursively calls [MP]. 
Now a meaningful proposal for an actual analogia legis secondary rule will, by the second proposesent subgoal of [MP], be retrieved from this clause meaningfulsent(t(2).RuleProp2, _.LegSet2,Text):RuleProp2 = (analogialegis(n«Cons:-Ante»,n(ModAtl),LegSetl):and(not(casuisticalinterpretation(LegalField, n«not(Cons):-Ante»», and(intendedfor(ProvisionNo,n(TypeCase», and(substantialsimilarity(n(TypeCase) ,n(Ante) ,n(ModAtl», and(intendedtomeet(ProvisionNo,Interests,LegaIField), and(supports(ProvisionNo,n(ModAtl ),ProInt,Interests), and(recommendrejection(ProvisionNo,n(ModAtl), ContraInt,Interests), outweigh(Prolnt,ContraInt»»»», LegSet2 = [[interpretationtheory('analogia legis')!J,LegSetl], LegSetl = [[provisionno(ProvisionNo), _,legalfield(LegalField)], J, Text = the same text as in rule schema 3 in fig. 2.1. propertyping(t(2).RuleProp2,[] ,[] ,Text). with these bindings (where (rulepropl) is «(consrulel):(anterule 1)) ) analogialegis(n« (consrulel) :-(anterulel) »,n( (modatl) ),(legsetl) ):and(not(casuisticalinterpretationCCommercial Law', n«not(consrulel) :-(anterulel) »», and(iotendedfor(sga(5),n(TypeCase», and(substantialsimilarity(n(TypeCase),n«(anterulel),n«(modatl)), and(intendedtomeet(sga(5),Interests,'Commercial Law'), and(supports(sga(5),n( (modatl) ),ProInt,Interests), and(recommendrejection(sga( 5),n( (modatl )),Contralnt,Interests), outweigh(Prolnt,ContraInt»»») . Now it must be proved that with the proposed content the antecedent of the analogia legis rule (call it (albody)) is included in T2 • The third subgoal of [MP] is prover(demo(n(t(2»,n«(albody)), _, [[interpretationtheory('analogia legis')J.(legsetl)J. J, and each of the conjuncts in (albody) will be demonstrated in turn by the prover clauses [ANDI] , [MP], and [UP]. To illustrate how user proposed content for a sentence is accepted (or rejected) at higher levels let us focus on the fourth conjunct which gives rise to the goal prover(demo(n(t(2», n(intendedtomeet(sga(5),Interests,'Commercial Law'»), _,[[interpretationtheory('analogia legis')!J,(legsetl)]). An "intended to meet" sentence must be proposed by the user. The resul t may be a meaningful fact (unconditional sentence) whose inclusion in the theory T2 must be accepted by the rules of theory T3 or it may be a rule (conditional sentence) which is assumed included in T2 directly after the user's acceptance. The resolving clauses in the respective cases are [UP] and [MP]. Thus, in the first case upward reflection occurs immediately. In the second case upward reflection is postponed until backward inferencing by modus ponens at the current level leads to the proposal of a fact. Note that this guarantees that the application of the originally proposed rule is not accepted unless all the components of its antecedent eventually are assessed and accepted. Suppose a fact is proposed. The goal will resolve with the prover clause [UP], whose recursive fifth subgoal resolves with the prover clause [MP] leading to the application of tertiary rules for assessing the proposed (secondary) fact. Reasons of space force us to remove a part of the trace here. The inferencing at the tertiary level is similar to that just described for the secondary level. We conclude this section with a fragment of the trace in which a tertiary fact is proposed but no quaternary rules exist for assessing it. 
The upward reflected goal looks like prover(demo(n(t(4», n(demo(n(t(3», n(adequatetoequalize( ,actors with similar economical positions', 'consumer protection'/,hirer protection', 'Commercial Law'»»), Mod4,LegSet4, J. For the theory T4 proposesent fails however to return any quaternary rules which may assess the adequatetoequalize fact. The goal resolves with the prover clause [1DP] and the user mayor may not accept the content of the "adequate to equalize" rule. Provided the rule is accepted this completes the computation of the fourth conjunct in the antecedent of the analogia legis rule. The following three conjuncts in the antecedent of the analogi a legis rule are computed likewise which completes the computation of the initial query. A conclusion is not considered as final before the line of arguments leading up to it has been considered and accepted by the user. To this end the user needs a comprehensible presentation of the proof term. We illustrate elsewhere [Hamfelt and Hansson 1991b] how derivations of goals can be entrusted to the user's acceptance or rejection by an interactive piecemeal unfolding of a term representing the proof of the goal. 5. Coping with Change A program should be able to cope with changes in the frequently revised legal knowledge it formalises. Also it should be structure preserving ("isomorphic") modulo this knowledge, cf. [Sergot et al. 1986]. This is a conflict, Bratley et al. [1991] claim, since coping with changes requires modifying "implicit or explicit rules which do not correspond directly to paragraphs in the text of law" . Our metalogic program MT, however, is a structure preserving formalisation of legal knowledge coping with changes. The schemata give a modular, direct and easily changed description of statutory rules and (meta ... )metarules of legal interpretation. MT is modular both horizontally and vertically entailing that adjustments can be made locally to the schemata for the (higher level) rules of legal interpretation as well as to the schemata for the ordinary (low level) statutory rules. The level of the knowledge is identified and the appropriate adjustment made to its rule schemata, which then control the computation of accepted rules assumed included in theories of the lower adjacent level. Also, since MT takes as its object language the whole n-Ievel language of OT, we can encode in the formal part of MY, rules coping with global changes which are not possible to localize to rule schemata of a certain level. Furthermore, if the legal system has undergone an even more drastic revision, a large part of our system will nevertheless remain intact since the structure of principles such as analogia legis will hardly be affected. The structure 1114 preserving model of the British Nationality Act [Sergot et al. 1986] is according to Kowalski and Sergot [1990] "of limited practical value" since it expresses a "layman's reading of the provision" but in our MT expert knowledge may be incorporated e.g., for verifying the correctness of 01', modifying and augmenting it, and for suggesting promising ways for applying its rules. 6. Flelated VVork Allen and Saxon [1991] discuss, in contrast to our multiple semantic interpretations, assistance for multiple structural interpretation of components of provisions, such as "if", "not", "provided that", e.g., by changing which component is taken as the main connective of a sentence. The logical relationship between theories comprising interpretative knowledge and interpreted theories is not analysed. 
Assessing and compiling persuasive lines of arguments pro and contra different, often contradictory, legal decisions is important in legal reasoning. Proof terms should thus be objects of discourse and be reasoned about, which they are in MT. This is advocated also by Bench-Capon and Sergot [1988], who do not, however, propose a formalisation or a detailed informal theory, such as our IT, concerning how these aspects are sorted out in informal legal reasoning. 7. Conclusions and Further VVork Above we have proposed a novel approach for representing fragmentary, multilayered, not fully formalisable knowledge, in which the informal metatheory of the usual formalisation approach is replaced by a semiformal metalogic program which interactively composes formal object theories to be accepted or rejected as formalisations of the knowledge by the user. Our representation easily copes with changes in the represented knowledge. Imprecise knowledge requires advanced user interaction that promotes meaningful user answers and queries, constructs and intelligibly displays proof terms explaining derived conclusions, and makes the system pose its questions in a natural order. These aspects have been considered and to some extent solved in our program [Hamfelt and Hansson 1991b]. Multiple semantic interpretations of provisions is realised by allowing the user to fill schemata with meaningful content referring to his fact situation whereupon the system accepts or rejects the thus proposed rule. Including multiple structural interpretations, e.g., adding premises, should raise no real obstacles provided rules of acceptance for such alteration can be established. In case law rules of legal interpretation are as important as in statute law and apart from the difficult problem of inducing schemata from precedent cases, we hypothesize that our framework needs only minor adaptations to catch the problem of case-based reasoning. Proof terms should, since the notion of being a persuasive line of arguments is vague, not only be displayed for user communication but also reasoned about. Acknowledgments We like to thank Keith Clark and Leon Sterling for valuable comments. Fleferences [Allen and Saxon 1991] L. E. Allen and S. S. Saxon. More IA Needed in AI: Interpretation Assistance for Coping with the Problem of Multiple Structural Interpretations. In Proc. Third Int. Conf. on Artificial Intelligence and Law, ACM, New York, 1991. pp. 5361. [Bench-Capon and Sergot 1988] T. Bench-Capon and M. Sergot. Toward a Rule-Based Representation of Open Texture in Law. Computer Power and Legal Language, ed. C. Walter, Quorum Books, New York, 1988. pp. 39-60. . [Bowen and Kowalski 1982] K. A. Bowen and R. A. Kowalski. Amalgamating Language and Metalanguage in Logic Programming. Logic Programming, eds. K. Clark and S.-A. Tarnlund, Academic Press, London, 1982. pp. 153-72. [Brat ley et al. 1991] P. Bratley, J. Fremont, E. Mackaay and D. Poulin. Coping with Change. In Proc. Third Int. Conf. on Artificial Intelligence and Law, ACM, New York, 1991. pp. 69-75. [Costantini 1990] S. Costantini. Semantics of a Metalogic Programming Language. In Proc. Second Workshop on Metaprogramming in Logic, ed. M. Bruynooghe, Katholieke Universiteit Leuven, 1990. pp. 3-18. [Hamfelt 1990] A. Hamfelt. The Multilevel Structure of Legal Knowledge and its Representation, Uppsala Theses in Computing Science 8/90, Uppsala University, Uppsala, 1990. [Hamfelt and Barklund 1990] A. Hamfelt, J. Barklund. Metaprogramming for Representation of Legal Principles. 
In Proc. Second Workshop on Metaprogramming in Logic, ed. M. Bruynooghe, Katholieke Universiteit Leuven, 1990. pp. 105-22. [Hamfelt and Hansson 1991a] A. Hamfelt, A. Hansson. Metalogic Representation of Stratified Knowledge. UPMAIL TR 66, Compo Sci. Dept., Uppsala University, Uppsala, 1991. [Hamfelt and Hansson 1991 b] A. Hamfelt, A. Hansson. Representation of Fragmentary and Multilayered Know ledge-A Semiformal Metatheory as an Interactive Metq.logic Program. UPMAIL TR 68, Compo Sci. Dept., Uppsala University, Uppsala, 1991. [Horovitz 1972] J. Horovitz. Law and Logic. SpringerVerlag, Vienna, 1972. [Kleene 1980] S. C. Kleene, Introduction to Metamathematics. North Holland, New York, 1980. [Kowalski 1990] R. A. Kowalski. Problems and Promises of Computational Logic. Computational Logic, ed. J. W. Lloyd, Springer-Verlag, Berlin, 1990. pp. 1-36. [Kowalski and Sergot 1990] R. A. Kowalski, M. J. Sergot. The Use of Logical Models in Legal Problem Solving. Ratio Juris, Vol. 3, No.2 (1990), pp. 201-18. [Sergot et al. 1986] M. J. Sergot, F. Sadri, R. A. Kowalski, F. Kriwaczek, P. Hammond, H. T. Cory. The British Nationality Act as a Logic Program. Comm. ACM 29, (May 1986), pp. 370-86. [Sergot 1983] M. J. Sergot. A Query-the-User Facility for Logic Programming. In.tegrated Interactive Computer Systems, eds. P. Degano and E. Sandewall, North-Holland, Amsterdam, 1983. pp. 27-41. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 1115 HELIC-II: A Legal Reasoning System on the Parallel Inference Machine Katsumi Nitta (1) Masayuki Ono (1) (1) (2) (3) Yoshihisa Ohtake (1) Hiroshi Ohsaki (2) Shigeru Maeda (1) Kiyokazu Sakane (3) Institute for New Generation Computer Technology 4-28, Mita l-chome, Minato-ku, Tokyo 108, Japan Japan Information Processing Development Center Nippon Steel Corporation nitta@icot.or.jp Abstract This paper presents HELIC-II, a legal reasoning system on the parallel inference machine. HELIC-II draws legal conclusions for a given case by referring to a statutory law (legal rules) and judicial precedents (old cases). This system consists of two inference engines. The rule-based engine draws legal consequences logically by using legal rules. The case-based engine generates legal concepts by referencing similar old cases. These engines complementally draw all possible conclusions, and output them in the form of inference trees. Users can use these trees as material to construct arguments in a legal suit. HELIC-II is implemented on the parallel inference machine, and it can draw conclusions quickly by parallel inference. As an example, a legal inference system for the Penal Code is introduced, and the effectiveness of the legal reasoning and parallel inference model is shown. 1 Introduction The primary knowledge source of a legal inference system is a statutory law. A statutory law is a set of legal rules. As legal rules are given as logical sentences, they are easily represented as logical formulae. Therefore, if a new case is described using the same predicates as those appearing in legal rules, we can draw legal conclusions by deductive reasoning. However, legal rules often contain legal predicates (legal concepts) such as "public welfare" and "in good faith" . Some legal concepts are ambiguous and their strict meanings are not fixed until the rules are applied to actual facts. Predicates which are used to represent actual facts do not contain such legal concepts. 
As there are no rules to define sufficient conditions for legal predicates, in order to apply legal rules to actual facts, interpreting rules and matching between legal concepts and facts are needed. To realize this, precedents (old cases) are often referenced, because they contain the arguments of both sides (plaintiff vs. defendant or prosecutor vs. defendant) and the judges' opinions concerning interpretation and matching. Consequently, legal reasoning can be modeled as a combination of logical inference using legal rules and case-based reasoning using old cases. Based on this model, several hybrid legal inference systems consisting of two inference engines have been developed [Rissland et al. 1989] [Sanders 1991(a)]. However, as practical legal systems contain many legal rules and old cases, it takes a long time to draw conclusions. Moreover, controlling two engines often requires a complex mechanism.

ICOT (Institute for New Generation Computer Technology) has developed parallel inference machines (Multi-PSI and PIMs) [Uchida et al. 1988], [Goto et al. 1988]. These are MIMD-type computers, and users' programs written in the parallel logic programming language KL1 [Chikayama et al. 1988] are executed in parallel on them. HELIC-II (Hypothetical Explanation constructor by Legal Inference with Cases by 2 inference engines) is a legal inference system based on the hybrid model. It has been developed on the parallel inference machine, and draws legal conclusions for a given case by quickly referencing statutory law and old cases.

In Section Two, we introduce the function and architecture of HELIC-II. In Section Three, we explain legal knowledge representation. In Section Four, we explain the reasoning mechanism of HELIC-II. In Section Five, a legal inference system for the Penal Code is explained.

2 Overview of HELIC-II

The function of HELIC-II is to generate all possible legal conclusions for a given case by referring to legal rules and old cases. These conclusions are represented in the form of inference trees which include final conclusions and explanations of them. HELIC-II consists of two inference engines (the rule-based engine and the case-based engine) and three knowledge sources (a rule base, a case base and a dictionary of concepts); see Fig. 1. The rule-based engine refers to legal rules and draws legal consequences logically. The case-based engine generates abstract predicates (legal concepts) from concrete predicates (given facts) by referring to similar old cases. HELIC-II draws legal consequences using these two engines. Since the reasoning of both engines is data-driven, no special control mechanism is needed to manage them.

A typical pattern of reasoning by HELIC-II is as follows. When a new case (original facts) is given to HELIC-II, the case-based engine initially searches for similar old cases and generates legal concepts which may hold in the new case. These concepts are passed to the rule-based engine by way of working memory (WM). Then, the rule-based engine draws legal consequences using the original facts and the legal concepts. These results are gathered by an explanation constructor, which then produces inference trees.

3 Knowledge Representation

In this section, we will explain the representation of legal knowledge in HELIC-II. We will show how to represent legal rules, old cases and legal concepts.

3.1 Representation of Legal Rules

A statutory law consists of legal rules. Each legal rule is represented as follows.
RuleName(Comment, RuleInfo, [A1, A2, ..., Ai] -> [[B1, ..., Bk], [C1, ..., Cl], ...]).

In this clause, RuleName is the rule identification, Comment is a comment for users, and RuleInfo is additional information such as the article number. The LHS ([A1, A2, ..., Ai]) is the condition part, and the RHS ([[B1, ..., Bk], [C1, ..., Cl], ...]) is the consequence part. [B1, ..., Bk] and [C1, ..., Cl] are combined disjunctively. Each literal of the LHS and RHS is an extended predicate or its negation (denoted by "~" or "not"). An extended predicate consists of a predicate (concept), an object identifier and a list of attribute = value pairs. The following is an example of an extended predicate. An object "drive1" is an instance of a concept "drive". Two attribute = value pairs (agent = tom and car = toyota1) are defined.

drive(drive1, [agent = tom, car = toyota1]).

Internally, this extended predicate is treated as a set of {object, attribute, value} triplets as follows.

{drive1, agent, tom}
{drive1, car, toyota1}

In a clause, we can use "not" (negation as failure) in addition to "~" (logical not). By introducing "not", nonmonotonic reasoning is realized, and exceptional rules and presumed facts are easily represented [Sartor 1991].

[Figure 1: The architecture of HELIC-II]

The following are examples of legal rules.

homicide01("example", [article = 199],
  [person(A), person(B),
   action(Action, [agent = A]),
   intention(Intention, [agent = A, action = Action, goal = Result]),
   death(Result, [agent = B]),
   caused(Caused, [event = Action, effect = Result2]),
   death(Result2, [agent = B]),
   not(~illegality(Illegal, [agent = A, action = Action, result = Result2]))]
  -> [[crimeOfHomicide(Crime, [agent = A, action = Action, result = Result2])]]).

legality01("example", [article = 38],
  [action(Action, [agent = A]),
   intention(Intention, [agent = A, action = Action, goal = Result]),
   selfDefence(Result, [object = Action]),
   caused(Caused, [event = Action, effect = Result2])]
  -> [[~illegality(Illegal, [agent = A, action = Action, result = Result])]]).

The first rule is a definition of the crime of homicide, which is given by the Penal Code. The meaning of "not(~illegality(Illegal, [...]))" is that illegality is presumed; in other words, if there is no proof that "~illegality(Illegal, [...])" holds, then "not(~illegality(Illegal, [...]))" is true. The second rule is an exception to the first rule. If a person did some action in self-defense, "illegality(Illegal, [...])" is refuted.

3.2 Representation of Cases

A judicial precedent consists of the arguments of both sides, the opinion of the judges and a final conclusion. We represent a precedent (an old case) as a situation and some case rules, and represent a new case as a situation.

(1) Situation

A situation consists of a set of events/objects and their temporal relations. An event and an object are represented as an extended predicate as introduced in the previous section.
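Since situations reuse the extended-predicate representation of Section 3.1, the internal flattening into {object, attribute, value} triplets can be pictured with a few lines of Python. This is only an illustration; the function name and the list-of-pairs encoding are assumptions made for this sketch, and HELIC-II itself is written in KL1.

    # Flatten an extended predicate such as drive(drive1, [agent = tom,
    # car = toyota1]) into {object, attribute, value} triplets; the concept
    # name ("drive") only links the object to the concept dictionary.
    def to_triplets(object_id, attributes):
        return [(object_id, attr, value) for attr, value in attributes]

    print(to_triplets("drive1", [("agent", "tom"), ("car", "toyota1")]))
    # [('drive1', 'agent', 'tom'), ('drive1', 'car', 'toyota1')]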
The temporal relations are represented as follows.

problem(CaseID, Comment, TemporalRelations).

CaseID is the case identification, Comment is a comment for users and TemporalRelations is a list of relations between events. To represent temporal relations between events/objects, we use Allen's interval notation such as "before", "meets", "starts", and so on [Allen 1984]. The following is an example of a situation.

problem(trafficAccident112, "example", [before(dinner1, drive1), during(accident1, drive1)]).
dinner(dinner1, [agent = john, place = maxim's]).
drive(drive1, [agent = john, car = toyota1]).
accident(accident1, [agent = john]).
person(john, [sex = male]).
person(mary, [sex = female]).
restaurant(maxim's, [rank = 5stars]).
car(toyota1, [type = sportsCar]).

The meaning of this example is that the case "trafficAccident112" consists of three events, "dinner1", "drive1" and "accident1". "Dinner1" occurred before "drive1", and "accident1" happened during "drive1". The event "dinner1" is a lower concept of "dinner", it is acted by "john" in "maxim's", and so on.

(2) Case Rules

Arguments by both sides are represented as a set of case rules. The following is the syntax of a case rule.

RuleName(Comment, RuleInfo, [A1, A2, ..., Ai] -> [B1, B2, ..., Bk]).

RuleName is the rule identification, Comment is a comment for users and RuleInfo is additional information such as a related article, an index to the opposing side's case rules, the relation to the judge's decision and so on. The LHS ([A1, A2, ..., Ai]) is the context of the opinion, and the RHS ([B1, B2, ..., Bk]) is the conclusion insisted on by one side. The following is an example of a case rule.

rule001("example", [article = 218, insisted = prosecutor, result = lost],
  [drive(drive1, [agent = john/important, object = toyota1/trivial]),
   person(john, [sex = male/trivial]),
   person(mary, [sex = female/trivial]),
   accident(accident1, [agent = john/important]),
   caused(caused1, [event = accident1/important, effect = injury1/important]),
   injury(injury1, [agent = mary/trivial])]
  -> [responsibility(resp1, [agent = john, object = ken, reason = accident1])]).

The meaning of this case rule is: "In the case that a traffic accident caused by John injured Mary, John had a responsibility of care to Mary." This rule concerns article 218 of the Penal Code and was insisted on by the prosecutor, but the judge did not employ it. On the LHS, "effect = injury1" is an important fact from the legal point of view. Therefore, this fact is marked as "important". We can use "exact", "important" and "trivial" to represent levels of importance. This information is used to calculate the similarity between two situations.

Arguments in a case are sequences of case rules. As both sides try to draw contradictory conclusions, an old case contains case rules whose conclusions are inconsistent.

3.3 Representation of Concepts

All concepts in legal rules and cases must be contained in the dictionary. In other words, each event and object in a situation is an instance of these concepts. In the dictionary, a super concept, a concept and a list of attributes are defined as follows.

object(creature, []).
creature(person, [age, sex]).
person(person, []).
person(infant, []).
creature(lion, []).
action(drive, [agent, car, destination]).

The similarity between concepts is defined by the distance in the hierarchy (see Fig. 2).
For example, "baby" is closer to "infant" than to "lion" because it requires two steps for "baby" to reach "infant" but three steps to reach "lion" in this hierarchy.

[Figure 2: Hierarchy of concepts]

4 Reasoning by HELIC-II

In this section, we will explain the reasoning mechanisms of the rule-based engine and the case-based engine. These engines are implemented in the parallel logic programming language KL1 and run on the parallel inference machine.

4.1 A Rule-based Engine

The function of the rule-based engine is to draw all legal consequences by the forward reasoning of legal rules, using original data (a new case) and results from the case-based engine. The rule-based engine is based on the parallel theorem prover MGTP (Model Generation Theorem Prover) [Fujita et al. 1991] developed by ICOT. MGTP solves range-restricted non-Horn problems by generating models. For example, let's take the following clauses.

C1: true -> p(a); q(b).
C2: p(X) -> q(X); r(X).
C3: r(X) -> s(X).
C4: q(X) -> false.

MGTP calculates models which satisfy these clauses as follows (see Fig. 3). The proof starts with the null model M0 = {}. By applying C1, M0 is extended into M1 = {p(a)} and M2 = {q(b)}. Then, by applying C2, M1 is extended into M3 = {p(a), q(a)} and M4 = {p(a), r(a)}. Using C4, M3 and M2 are discarded. By C3, M4 is extended to M5 = {p(a), r(a), s(a)}. M5 is a model which satisfies all clauses.

[Figure 3: MGTP proof tree]

In MGTP, each clause is compiled into a KL1 clause, and each KL1 clause is applied in parallel on the parallel inference machine. In problems in which the proof tree has many branches, parallel inference performance becomes high.
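The model-generation loop just described can be pictured with a small, self-contained Python sketch. It only illustrates the idea (ground atoms as tuples, clause variables as capitalised strings, an empty disjunct list standing for "false"); it is not ICOT's KL1 implementation of MGTP, and all helper names are invented for this sketch.

    # Model generation in the spirit of the MGTP example above (clauses C1-C4).
    def is_var(term):
        return isinstance(term, str) and term[:1].isupper()

    def match(pattern, atom, subst):
        """Extend subst so that pattern instantiates to atom, or return None."""
        if len(pattern) != len(atom) or pattern[0] != atom[0]:
            return None
        s = dict(subst)
        for p, a in zip(pattern[1:], atom[1:]):
            if is_var(p):
                if s.setdefault(p, a) != a:
                    return None
            elif p != a:
                return None
        return s

    def instantiate(atom, s):
        return (atom[0],) + tuple(s.get(t, t) for t in atom[1:])

    def extend(model, clauses):
        """Return all models (as frozensets) extending `model` that satisfy `clauses`."""
        for body, disjuncts in clauses:
            substs = [{}]
            for pattern in body:                        # match the body against the model
                substs = [s2 for s in substs for atom in model
                          if (s2 := match(pattern, atom, s)) is not None]
            for s in substs:
                alts = [tuple(instantiate(h, s) for h in d) for d in disjuncts]
                if any(all(a in model for a in alt) for alt in alts):
                    continue                            # this clause instance is satisfied
                if not alts:                            # head "false": reject the model
                    return set()
                models = set()                          # case-splitting, as in Fig. 3
                for alt in alts:
                    models |= extend(model | set(alt), clauses)
                return models
        return {frozenset(model)}                       # every clause instance is satisfied

    clauses = [
        ([], [(("p", "a"),), (("q", "b"),)]),           # C1: true -> p(a); q(b).
        ([("p", "X")], [(("q", "X"),), (("r", "X"),)]), # C2: p(X) -> q(X); r(X).
        ([("r", "X")], [(("s", "X"),)]),                # C3: r(X) -> s(X).
        ([("q", "X")], []),                             # C4: q(X) -> false.
    ]
    print(extend(set(), clauses))                       # only M5 = {p(a), r(a), s(a)} survives

With C1-C4, the sketch reproduces the proof tree of Fig. 3: the branches through q(b) and q(a) are pruned by C4, and only M5 remains.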
To use MGTP as the rule-based engine of HELIC-II, we extended the original MGTP as follows.

1. Realization of "not" (negation as failure): We made MGTP able to treat negation as failure based on [Inoue et al. 1991]. For example, the following clause C is treated as C', and the model is extended in two ways (see Fig. 4). Here, "k" is a modal operator, and "k(r(X))" means that the model is believed to contain a datum which will satisfy r(X) in the future.

C: not(r(X)) -> s(X).
C': dom(X) -> k(r(X)); ~k(r(X)), s(X).

After MGTP generates models which satisfy all clauses, the rule-based engine examines each of them. For example, if a model contains both ~k(r(a)) and r(a), or if a model contains k(r(a)) and does not contain r(a), the model is discarded.

[Figure 4: Negation as failure of MGTP]

2. Realization of multiple contexts: The rule-based engine uses both the original facts (a new case) and the results from the case-based engine as the initial model. The case-based engine may generate data which conflict with each other, such as "q(b)" and "~q(b)". Therefore, before reasoning, the rule-based engine has to split the initial model into several models so that no model contains any conflicts (see Fig. 5). However, the case-based engine has not generated all of its results when the rule-based engine begins to reason, because the reasoning of both engines is data-driven. To obtain a pipeline effect, we developed a function to register predicates which may cause conflicts, and to split the model when such predicates reach the rule-based engine. For example, in Fig. 5, if ~q(b) reaches the rule-based engine, the model is split before q(b) is reached. We implemented this mechanism by using a modal operator similar to the "k" operator.

[Figure 5: Splitting a model]

3. Keeping justifications: To construct inference trees, the rule-based engine must keep the justifications for each consequence. A justification consists of a rule name and the data which matched the LHS of the rule.

4. Temporal reasoning: We prepared a small rule set for temporal reasoning [Allen 1984] to help in describing temporal relations. The following are example rules.

before(A, B), before(B, C) -> before(A, C).
meets(A, B), overlaps(C, B) -> overlaps(A, C); during(A, C); starts(A, C).

With these extensions, the rule-based engine has many proof tree branches even if clauses do not contain disjunctions such as those of C1 and C2 in Fig. 3. Therefore, the rule-based engine has a lot of parallelism in its reasoning.

4.2 A Case-based Engine

The function of the case-based engine is to generate legal concepts by using similar old cases. The reasoning of the case-based engine consists of two stages (see Fig. 6).

[Figure 6: Reasoning by the case-based engine]

1. Searching similar cases: The role of the first stage is to search for similar cases in the case base. At first, the case-based engine constructs a sequence of events for each case. As the situations of the new case and the old cases are described as a set of events/objects and their temporal relations, it is easy to construct a sequence of events for each situation. Then, the case-based engine tries to extract common subsequences from the event sequences of the new case and each old case. For example, let's take the following two sequences.

S1: [..., meets(strike1, injury1), during(runAway1, injury1), ...]
S2: [..., before(kick2, sneak2), ...]

In this example, the temporal relation between "strike1" and "runAway1" is the same as that of "kick2" and "sneak2". Furthermore, "strike1" and "kick2" have a common upper concept "violence", and "runAway1" and "sneak2" have a common upper concept "escape" in the dictionary. Therefore, we regard [strike1, runAway1] and [kick2, sneak2] as mapped subsequences of S1 and S2 (see Fig. 7).

[Figure 7: Subsequence of events]

The similarity between two cases is evaluated by the length of the longest mapped subsequence. Several cases whose similarities are beyond a threshold are selected in the first stage.

2. Applying case rules: The role of the second stage is to apply the case rules of the selected cases as follows [Branting 1989]. At first, the similarity between the LHS of a case rule and the new case is evaluated. For example, let's take "rule001" in Section 3.2 and the following new case.

person(bill, []).
baby(jane, []).
cycle(cycle2, [agent = bill, object = honda2]).
collision(collision2, [agent = bill]).
sprain(sprain2, [agent = jane]).
intention(intention2, [goal = injury2]).
injury(injury2, [agent = jane]).

The engine tries to map the LHS of "rule001" to the new case. As the following pairs of events/objects have common upper concepts in the dictionary, we map these pairs (see Fig. 8): john and bill, mary and jane, drive1 and cycle2, toyota1 and honda2, accident1 and collision2, injury1 and sprain2, caused1 and caused2.

[Figure 8: Mapping networks]

The similarity is evaluated by counting the number of mapped links in Fig. 8. As we explained in Section 3.2, an annotation (exact, important, trivial) is attached to each link in the network. These annotations and the distances between concepts are used as weights to evaluate similarities. Even if some conditions of a case rule are not satisfied, but the important conditions are satisfied, the LHS may be judged as similar to the new case. For example, in Fig. 8, though there is no node which can be mapped to "negligence1", "rule001" may be selected as similar.
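As a concrete illustration of this weighted matching, the following Python sketch scores a case-rule LHS against a new case. The numeric weights, the distance penalty and the example link names are assumptions made for the sketch only; they are not the values or data structures used in HELIC-II's KL1 implementation.

    # Weighted link matching (illustrative only).
    ANNOTATION_WEIGHT = {"exact": 3.0, "important": 2.0, "trivial": 1.0}

    def lhs_similarity(lhs_links, mapping):
        """lhs_links: {link: annotation} for the links of a case-rule LHS.
        mapping: {link: concept_distance} for the links that could be mapped
        onto the new case (distance = steps in the concept hierarchy).
        Returns a score between 0 and 1."""
        total = sum(ANNOTATION_WEIGHT[a] for a in lhs_links.values())
        matched = sum(ANNOTATION_WEIGHT[a] / (1.0 + mapping[link])
                      for link, a in lhs_links.items() if link in mapping)
        return matched / total if total else 0.0

    # Loosely modelled on the rule001 discussion: the "negligence" condition
    # cannot be mapped, but the important links can, so the rule may still be
    # judged similar to the new case.
    lhs = {"drive.agent": "important", "accident.agent": "important",
           "caused.event": "important", "caused.effect": "important",
           "person.sex": "trivial", "negligence.agent": "important"}
    mapped = {"drive.agent": 0, "accident.agent": 1, "caused.event": 0,
              "caused.effect": 1, "person.sex": 0}
    print(round(lhs_similarity(lhs, mapped), 2))   # 0.64 with these assumed weights

A score like this only mirrors the weighting idea; the actual selection criteria are those described in the surrounding text.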
Next, the case-based engine selects case rules whose LHSes are similar to the new case, and executes their RHSes. The matching and executing case rules are repeated until there are no case rules left to be fired. On the parallel inference machine, each stage is executed in parallel. In the first stage, before searching, cases are distributed to processors (PEs) of the parallel inference machine, and then a new case is sent to each PE. Each PE evaluates similarities between the new case and old cases, and selects similar ones. Figure 9: Rete-like networks of KLl processes In the second stage, case rules are distributed to PEs, and the LHSes of each case rule are compiled into a Retelike network of KLl processes (see Fig.9). Then, triplets ({object,attribute,value}) which are facts of the new case are distributed to each PE as tokens. To realize matching based on similarity, each one-input node refers to the dictionary of concepts, and each two-input node not only examines the consistency of pairs of tokens but evaluates their similarities with the LHS. 5 A legal reasoning system for the Penal Code We developed an experimental legal reasoning system for the Penal Code. In the Penal Code, general provisions and definitions of crimes are given as legal rules. Though they seem to be strictly defined, the existence of criminal intention and causality between one's action and its result often becomes the most difficult issue in the court. The concept of causality in the legal domain is similar to the concept of responsibility and is different from physical causality. Therefore, to judge the existence of causality, we have to take into account various things such as social, political and medical aspect. We show the function of the reasoning system of the Penal Code using Mary's case. We selected this case 1122 from the qualification examination for lawyers in Japan. Mary's Case: On a cold winter's day, Mary abandoned her son Tom on the street because she was very poor. Tom was just 4 months old. Jim found Tom crying on the street and started to drive Tom by car to the police station. However, Jim caused an accident on the way to the police. Tom was injured. Jim thought that Tom had died of the accident arid left Tom on the street. Tom froze to death. The problem is to decide the crimes of Mary and Jim. The hard issues of this case are the following. 1. Causality between Mary's action and Tom's death: If Mary hadn't abandoned Tom, Tom wouldn't have died. Moreover, the reason for his death wasn't injury but freezing. Therefore, some lawyers will judge the existence of causality and insist she should be punished for the crime of "abandonment by person responsible resulting in death". On the other hand, other lawyers will deny any causality because causality was interrupted by Jim's action. 2. Causality between Jim's action and Tom's death: Jim did several actions such as "pick up", "drive", "cause accident" and "leave Tom". Among them, "cause accident" will be punished by the crime of "injury by negligence in the performance of work" , and "leave Tom" will be punished by the crime of "d eath by negligence". Moreover, if there is causality between "cause accident" and Tom's death, Jim will be punished by the crime of "death by negligence in the performance of work" which is very grave. As the main reason of Tom's death is freezing, it is difficult to judge the causality. Though the Penal Code has no definite rule for the causality, lawyers can get hints from old cases. 
For example, let's take Jane's case, which was handled by the Supreme Court in Japan.

Jane's Case: Jane strangled Dick to kill him. Though Dick only lost consciousness, Jane thought he was dead. Then she took him to the seashore and left him there. He inhaled sand and suffocated to death.

In the court, there were arguments between the prosecutor and Jane. The prosecutor insisted that Jane should be punished for the crime of homicide for the following reasons.

P1: "Strangling" and "taking to the seashore" should be considered one action of performing the homicide. Therefore, it is evident that there was an intention to kill Dick and causality between her action and Dick's death.

P2: There is causality between "strangling" and "Dick's death" even though "strangling" wasn't the main reason for his death.

On the contrary, Jane insisted that her actions didn't satisfy the conditions of the crime of homicide for the following reason.

J1: "Strangling" should be punished as the crime of "attempted homicide", and "taking to the seashore" should be punished as the crime of "manslaughter caused by negligence", because there isn't causality between strangling and Dick's death, and there wasn't an intention to kill him when taking him to the seashore.

We represent Mary's situation and Jane's case rule as follows.

Mary's situation:

problem("mary's case", "example", .....).
abandon(aba1, [agent = mary, object = tom]).
pickup(pic2, [agent = jim, object = tom]).
trafficAccident(acc1, [agent = jim]).

Jane's opinion:

rule002("Jane's case", [article = 218, insisted = defendant, result = lost],
  [suffocate(suf1, [agent = jane/trivial, object = dick/trivial]),
   intention(int1, [agent = jane/trivial, object = act1/important, goal = death1/important]),
   death(death1, [agent = dick/trivial]),
   caused(caused1, [event = act1/important, effect = lost1/important])]
  -> [~caused(caused1, [event = act1, effect = death3])]).

The case-based engine of HELIC-II generated "~caused(ID, [event = acc1, effect = death9])" by applying rule002. In Mary's case, HELIC-II generated 12 inference trees. Some of them are based on the prosecutor's opinion and others are based on the defendant's opinion. The root of each tree is a possible crime, such as abandonment by a person responsible resulting in death, manslaughter caused by negligence, etc. The leaves are the initial data of the new case, and the intermediate nodes are consequences of case rules or legal rules (see Fig. 10).

[Figure 10: An Inference Tree]

We measured the calculation time to draw a conclusion for Mary's case on the experimental parallel inference machine Multi-PSI. The number of rules used was about 20 and the number of cases used was about 30. Figs. 11 and 12 show the performance of the case-based engine, and Fig. 13 shows the performance of the rule-based engine. These graphs show the effectiveness of the parallel inference.

[Figure 11: Performance of stage 1 of the case-based engine]

[Figure 12: Performance of stage 2 of the case-based engine]
[Figure 13: Performance of the rule-based engine]

6 Conclusion

We introduced the parallel legal reasoning system HELIC-II. The advantages of HELIC-II are as follows.

1. The hybrid architecture of HELIC-II is appropriate for realizing legal reasoning. As the reasoning of both engines is data-driven, controlling these engines is easy.

2. The knowledge representation and inference mechanisms of HELIC-II are simple but convenient for representing legal rules and old cases.

3. By parallel inference, HELIC-II draws conclusions quickly. As the rule base and the case base of the legal domain are very large, quick searching and quick reasoning are important for developing practical systems.

4. Though it is troublesome to represent cases in detail, the rules of temporal reasoning help to describe cases.

There are many tasks for extending HELIC-II. The following are examples.

• Though the case-based engine focuses on the similarity between two cases, we have to develop a mechanism to contrast two cases [Rissland et al. 1987], [Rissland et al. 1989]. By comparing two inference trees, it is possible to construct a debate system.

• To describe legal rules in detail, we have to integrate an extended logic system, such as the logic of belief and knowledge with temporal logic, on MGTP.

• To improve the power of the similarity-based matching of the case-based engine, we have to introduce a derivational analogy mechanism.

• As inference trees are not suitable for allowing lawyers to understand the inference steps, they should be represented in natural language.

References

[Uchida et al. 1988] Shunichi Uchida et al. Research and Development of the Parallel Inference System in the Intermediate Stage of the FGCS Project. In Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, 1988. pp. 16-36.

[Goto et al. 1988] Atsuhiro Goto et al. Overview of the Parallel Inference Machine Architecture. In Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, 1988. pp. 208-229.

[Chikayama et al. 1988] Takashi Chikayama et al. Overview of the Parallel Inference Machine Operating System (PIMOS). In Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, 1988. pp. 230-251.

[Nitta et al. 1991] K. Nitta et al. Experimental Legal Reasoning System on Parallel Inference Machine. In Proc. PPAI Workshop of 12th IJCAI, Sydney, Australia, 1991. pp. 139-145.

[Rissland et al. 1987] E. L. Rissland et al. A Case-Based System for Trade Secrets Law. In Proc. Int. Conf. on Artificial Intelligence and Law, Boston, USA, 1987. pp. 60-66.

[Rissland et al. 1989] E. L. Rissland et al. Interpreting Statutory Predicates. In Proc. Int. Conf. on Artificial Intelligence and Law, Vancouver, Canada, 1989. pp. 46-53.

[Sartor 1991] G. Sartor. The Structure of Norm Conditions and Nonmonotonic Reasoning in Law. In Proc. Int. Conf. on Artificial Intelligence and Law, Oxford, UK, 1991. pp. 155-164.

[Branting 1989] L. K. Branting. Representing and Reusing Explanations of Legal Precedents. In Proc. Int. Conf. on Artificial Intelligence and Law, Vancouver, Canada, 1989. pp. 103-110.

[Sanders 1991(a)] K. Sanders. Representing and reasoning about open-textured predicates. In Proc. Int. Conf. on Artificial Intelligence and Law, Oxford, UK, 1991. pp. 137-144.

[Sanders 1991(b)] K. Sanders. Planning in an Open-Textured Domain. A Thesis Proposal.
Technical Report CS-91-08, Brown University, 1991. [Fujita et al. 1991] H.Fujita et al. . A Model Generation Theorem Prover in KL1 Using a Ramified-Stack Algorithm. ICOT TR-606. 1991. [Inoue et al. 1991] K.Inoue et al. . Embedding negation as failure into a model generation theorem prover. ICOT TR-722. 1991. [Allen 1984] J.F.Allen. Towards a general theory of action and time. Artificial Intelligence, Vol. 23, No.2 (1984 ),pp.123-154. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 1125 Chart Parsers as Proof Procedures for Fixed-Mode Logic Programs David A. Rosenblueth IIMAS, UNAM Apdo. 20-726, 01000 Mexico D.F. drosenbl~unamvm1.bitnet Abstract Logic programs resemble context-free grammars. Moreover, Prolog's proof procedure can be viewed as a generalization of a simple top-down parser with backtracking. Just as there are parsers with advantages over that simple one, it may be desirable to develop other proof procedures for logic programs than the one used by Prolog. The similarity between definite clauses and productions suggests looking at parsing to develop such procedures. We show that for an important class of logic programs (fixed-mode logic programs with ground data structures) the conversion of parsers into proof procedures can be straightforward. This allows for proof procedures that construct refutations that Prolog does not find and opens up opportunities for parallelism. 1 Introduction A logic program consists of clauses that look like the productions of a context-free grammar. This suggests connections between proof procedures and parsers. In fact, Prolog's proof procedure can be regarded as a generalization of a simple parser with backtracking. Although this language has found numerous applications, its execution mechanism has several disadvantages. For instance, if such a mechanism finds an infinite branch of the derivation tree, it enters a non terminating loop. Thus, it may be desirable to develop new proof procedures for logic programs. Simple parsers with backtracking also enter nonterminating loops easily. This has motivated the design of other more sophisticated parsing methods. In contrast with proof procedures for logic programs, there already exists a great variety of parsers. The resemblance between definite clauses and productions suggests looking at parsers to develop new proof procedures. Pereira and Warren [1983] have adapted Earley's [1970] parsing algorithm, but the result is inefficient compared with Prolog. It uses subsumption, which is NPcomplete [Garey and Johnson 1979]. We show that by considering a restricted class of logic programs, parsers can be readily adapted to proof procedures. This class is important: it consists of fixed-mode logic programs with ground data structures. Moreover, our proof procedures do not use subsumption and may be more efficient than Pereira and Warren's. Compositional programs. By using difference lists to represent strings, a logic program can be restricted to coincide with the productions of a context-free grammar. Hence, for this class of logic programs, parsers are proof procedures. Such a class, however, only has the expressive power of context-free grammars. Assuming that we are interested in having a programming language, this suggests generalizing such programs without losing the close similarity with grammars. We do so by allowing the body of clauses to denote the composition of arbitrary binary relations; we call such programs "compositional." 
Prolog programs are not normally written in compositional form. Thus, we consider programs in a larger class (fixed-mode programs with ground data structures) and transform [Rosenblueth 1991] them into compositional form. Fixed-mode programs. A "mode" for a subgoal is the subset of arguments that are variables at the time the subgoal is selected. Thus, the mode depends on the derivation tree for a program and a query. When we refer to a "fixed-mode logic program," we actually mean a program and a query such that with Prolog's computation rule all subgoals with the same predicate symbol have the same mode. By further restricting these programs to have "ground data structures," we require all arguments in a subgoal that are not variables to be ground terms when the subgoal is selected. This class of program is important because it includes many programs occurring in practice. At first glance, it seems that the presence of difference lists causes a program to have data structures with variables. However, by separating both components of a difference list it is possible to write some programs using 1126 P-+-R .. ·S difference lists as programs with ground data structures. (The usual quicksort program is such an example; the sorted list is then built backwards.) Overview of the paper. The rest of this paper is organized as follows. Section 2 reviews chart parsers. Section 3 shows that such parsers are a,lso correct for compositional programs. Section 4 deals with a method for converting fixed-mode to compositional programs, thus making chart parsers proof procedures for the former class of programs. Section 5 compares these procedures with Pereira and Warren's. Section 6 concludes this paper with some remarks. 2 Chart parsers Charts. Chart parsers [Gazdar and Mellish 1989] are methods for parsing strings of context-free languages that can be regarded as a generalization of Earley's algorithm. A chart is a set of "partially" applied productions, usually called edges. Each edge contains, in addition to the part of a production to be applied and the left-hand side of that production, two pointers to symbols of the string being parsed. The substring between these pointers corresponds to the part of that production that has already been applied. It is useful to classify edges into those that have not been applied at all: empty active edges, those that have already been applied completely: passive edges, and all the others: nonempty active edges. The fundamental rule. New edges are created according to the following rule, often called the fundamental rule. If a chart contains: 1. an active edge (either empty or nonempty) from point a to point b in which the next symbol to be applied is Q, and 2. a passive edge with left-hand side Q, from point b to point c, then create a new edge from a to c in which the production is the same as the one in the active edge, but with Q applied. Figure 1 illustrates this rule. In figures representing edges, we use the following notation. Each edge is labeled with an arrow, a symbol to the left of the arrow, and a possibly empty string to the right. The symbol is the left-hand side of the partially applied production. The string is the part of that production that remains to be applied. Top-down and bottom-up parsing. The fundamental rule takes only existing edges to create new ones, and does not use information from the set of productions. b a c Figure 1: The fundamental rule. Therefore, a mechanism is needed for building edges from productions. 
Two main mechanisms for this purpose are used, commonly called "top-down" and "bottom-up" rules. The former builds parse trees from the root towards the leaves, and the latter does so from the leaves towards the root. The top-down rule creates edges as follows. If an active edge from a to b is added to the chart, in which the next symbol to be applied is Q, then create one empty active edge from b. to b for every production having Q as lejthand side and labeled with that production. Figure 2 exemplifies this rule. Q-+-S .. ·T one new edge for every production Q-+-S .. ·T a b Figure 2: The top-down rule. Given a parse tree having a leaf Q and a node P as parent of Q, this rule allows for Q to be expanded by creating an empty active edge with Q as left-hand side. Hence, parse trees are built by expanding the leaves with nonterminals, which is a construction of parse trees from the root towards the leaves. The bottom-up rule creates edges as follows. If a passive edge from a to b is added to the chart, in which the left-hand side symbol is Q, then create one empty active edge from a to a for every production having Q as first symbol on the right-hand side and labeled with that production. This rule is depicted in Figure 3. The bottom-up rule takes a passive edge, representing a parse subtree with Q as root. By creating an empty active edge with Q as first symbol to be applied, and P as left-hand side, Q becomes the child of a node P, which is the root of a new subtree. Thus, this rule builds parse trees from the leaves towards the root. 1127 P-Q···R one new edge for every production P-Q···R a b Figure 3: The bottom-up rule. Base of the chart. The fundamental rule takes two edges. One of them is active and the other one passive. The next symbol to be applied in the former must be the left-hand side of the latter. This means that the case where the next symbol to be applied is a terminal is not covered (all left-hand sides of productions are non terminals). We can remedy this situation by assuming that the productions have been written in such a way that each terminal occurs only in productions with exactly one symbol (that terminal) on the right-hand side. Now we can create certain edges as follows. For each production with a terminal occurring in the string being parsed, we create a passive edge from that terminal to the next one, labeled with that production. We can do so, because an edge represents a partially applied production (where "partially" may mean "completely") and all those productions can be immediately applied. Now we can rely only on the fundamental rule to operate existing edges. We shall call the set of all edges created from terminals the base of the chart. Initialization. To initialize a parser using the bottomup rule, it suffices to create the base. The reason is that the creation of edges in the bottom-up rule depends only on the existence of a passive edge. In a parser using the top-down rule, however, we must also create empty active edges from the first symbol of the string being parsed to itself labeled with productions having the start symbol of the grammar as left-hand side. This is because such a rule uses an active edge to create another one. Agenda. The rules for producing edges that we have described only create edges, but do not add them to the chart. Normally, chart parsers store edges in two different data structures: the chart and an agenda of the set of edges to be added to it. 
The choice of the procedure for selecting edges from the agenda to be added to the chart is a degree of freedom relegated to the chart-parser designers. When an edge is removed from the agenda, it is added to the chart only if it has not been added before.

[Figure 4: A chart constructed with the top-down rule]

Example. Figure 4 shows a chart created by a parser using the top-down rule for the grammar with productions:

a -> k0 a k1
a -> <([a], [], [b])>
k0 -> <([], [a], [b])>
k1 -> <([a], [b])>

and the input string

<([], [a], [b])> <([a], [], [b])> <([a], [b])> <([], [a, b])>.

Terminals have been enclosed in angled brackets. The last symbol ([], [a, b]) is not part of the string itself, but rather an end marker. This example will be used again to illustrate the chart created by a proof procedure when concatenating [a] to [b].

Phillips' variant of the bottom-up parser. Phillips observed [Simpkins and Hancox 1990] that the bottom-up chart parser can be modified so that some edges can be disposed of as the chart is built. The agenda, then, only keeps passive edges, ordered with respect to the position of the symbol of the string they start from. The chart only keeps active edges. When the first passive edge E is removed from the agenda and momentarily added to the chart, then

1. the fundamental rule is applied as many times as possible, and
2. the bottom-up rule is also applied if possible, followed by applications of the fundamental rule.

In both cases, if the resulting edges are active, they are added to the chart; otherwise they are added to the agenda. After this, E can be disposed of. The reason is that E cannot contribute to the creation of any more new edges.
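To make the data flow concrete, here is a small bottom-up chart parser in Python using the edge, base, fundamental-rule, bottom-up-rule and agenda terminology introduced above. The tuple representation of edges and the toy grammar are assumptions made for this sketch; it is not the Prolog code whose timings are reported later in the paper.

    # Edges are tuples (start, end, lhs, remaining): an active edge still has
    # symbols in `remaining` to find from position `end`; a passive edge has
    # remaining == ().
    def parse(grammar, string, start_symbol):
        """grammar: list of (lhs, rhs) productions, rhs a tuple of symbols.
        string: list of terminal symbols.  Returns True iff derivable."""
        chart = set()
        # Base: one passive edge per terminal production over the string.
        agenda = [(i, i + 1, lhs, ()) for i, w in enumerate(string)
                  for lhs, rhs in grammar if rhs == (w,)]
        while agenda:
            edge = agenda.pop()
            if edge in chart:
                continue
            chart.add(edge)
            i, j, lhs, rest = edge
            if not rest:                               # passive edge
                # Bottom-up rule: an empty active edge for every production
                # whose right-hand side starts with this edge's lhs.
                for p_lhs, p_rhs in grammar:
                    if p_rhs and p_rhs[0] == lhs:
                        agenda.append((i, i, p_lhs, p_rhs))
                # Fundamental rule, with this edge as the passive one.
                for a, b, a_lhs, a_rest in list(chart):
                    if a_rest and b == i and a_rest[0] == lhs:
                        agenda.append((a, j, a_lhs, a_rest[1:]))
            else:                                      # active edge
                # Fundamental rule, with this edge as the active one.
                for a, b, p_lhs, p_rest in list(chart):
                    if not p_rest and a == j and rest[0] == p_lhs:
                        agenda.append((i, b, lhs, rest[1:]))
        return (0, len(string), start_symbol, ()) in chart

    grammar = [("S", ("NP", "VP")), ("VP", ("V", "NP")),
               ("NP", ("john",)), ("NP", ("mary",)), ("V", ("saw",))]
    print(parse(grammar, ["john", "saw", "mary"], "S"))   # True

A top-down variant would differ only in how empty active edges are proposed (from active edges and the start symbol rather than from passive edges), which is exactly the distinction drawn above.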
A refutation for such a program and a query with a ground term in its first argument may be said to define a sequence of ground terms, resembling the sequence of states in a computation of a programming language using destructive assignment. Thus we shall say that such a logic program defines state-oriented computations. Strings vs. state-oriented computations. There are two main differences between state-oriented computations and strings. One is that at a given point of a state-oriented computation, there may be more than one way to extend it. State-oriented computations are then said to be nondeterministic. This phenomenon does not occur in strings, which have a linear structure. The other difference is that whereas we do know all the symbols of the string before it is parsed, we do not know initially all the states in a computation. A proof procedure could in principle compute some sequence of states before trying to build a chart. However, it may not be convenient to do so, because not all sequences of states form the base of a chart. A better idea is to extend the computations one step at a time, guided by the part of the chart built so far. Pi(b, c) +- (which plays the role of the passive edge) as input clauses. The resolving clause of this resolution step is which corresponds to an edge from a to c labeled with Po ---7 Pi +1 ••• Pn . By correctness of resolution, the resolving clause is a logical consequence of the two input clauses. Thus, we have generalized the fundamental rule to a correct operation. The top-down and the bottom-up rules. Given the above identification of clauses with edges, the topdown rule for parsing corresponds to the following. Let P be a program in compositional form. If a clause of the form is added to the chart, then create a clause of the form for every clause in P of the form The created clause is an instance of a clause in P, which is a logical consequence of P. The bottom-up rule can be generalized in a similar way. 1129 The base. The base can be extended one step at a time as follows. For each clause that is created, create a clause r(b, t'f)) ~ for each clause in P of the form r(t, t') ~ such that band t unify with unifier f) and there is a path from Pi to r. There is a path from p to r if 1. pis r or 2. there is a clause in P of the form p(t, t') ~ Conversion of fixed-mode to compositional programs We have seen that chart parsers can be regarded as proof procedures for compositional programs. However, logic programs are hot normally written in compositional form. In this section we observe that it is possible to convert a fixed-mode logic program with ground data structures into compositional form. The resulting program' is logically implied by an extension of the original one. First we define the class of programs transformable by our method and the class produced by it. Then we prove, for a particular example, the correctness of the resulting program. We omit the proof for the general case, which can be found in [Rosenblueth 1991]. 4.1 Directed and compositional programs Direc.ted form. The class of transformable programs has fixed modes. Thus we assume, without loss of generality, that in each predicate, all input arguments have been grouped into one argument, and all output arguments into another one. We write the input argument first, and the output argument second. A definite clause of the form where 1. var(ti) n var(tJ = 0, for i,j is a directed clause. A directed program is a logic program having only directed clauses. 
Condition 1 causes the term constructed when a subgoal succeeds to have an effect only on the input of other subgoals. Condition 2 causes the input argument of all selected subgoals to be ground if the input of the initial query is also ground and subgoals are selected in a left-to-right order. We include Condition 3 only for technical reasons. This is a minor restriction that considerably simplifies both stating our transformation and proving it correct. We call these "directed programs" because we can visualize the binding of a variable as flowing from one occurrence to subsequent occurrences. Compositional form. ini te clause of the form and there is a path from q to r. 4 3. each variable occurring in t~ occurs only once in t~, for i = 0, ... , n = 0, ... , nand i i- j; A compositional clause is a def- or where t and t' are terms such that var(t') ~ var(t), and the Xi are distinct variables. A logic program with only compositional clauses is a compositional program. We shall need various axioms. As with program clauses, we assume that each axiom is implicitly universally quantified with respect to its variables. Normally, an SLD-derivation is either successful, failed, or infinite. Sometimes, however, we shall use derivations that end in a clause that could possibly be resolved with a program clause. We shall refer to these derivations as partial derivations. A partial derivation with a single-subgoal initial query yields a conditional answer [Vasey 1986]. Such an answer is a clause in which the head is the subgoal in the initial query of that derivation with the composition of the substitutions applied to it, and the body is the set of subgoals in the last query of that derivation. 4.2 Example We illustrate our method with the following program for concatenating two lists. It defines the usual append relation, but its arguments have been grouped in such a way that its two inputs constitute the first argument, and its output, the second. a( (X, Y), (Z}) holds if Z is the concatenation of the list Y at the end of the list X. The angled brackets ( ) are an alternative notation for ordinary brackets [], that we use to group input and output arguments. We do this for clarity. a( ([], Y), (Y}) ~ a( _______ ([WIX], Y), ([WIZ]}) ~ 2. var(tD ~ var(t o) U '" U var(ti), for i = 0, ... , n; to ' t~ ~ a( (X, Y), '-v-" (Z)) ~ t~ tl (7) (8) 1130 We shall convert (8), which is directed, to compositional form. This process can be motivated as follows. Assume that we wish to construct an SLD-derivation for (7) and (8) with a query having a ground input that unifies with the head of (8). It is necessary, then, to remember the t~rm with which W unifies, to be able to add it to the front of the result of appending the lists that unify with X and Y. This lack of information in the arguments of the subgoal of (8) prevents us from representing a computation by the composition of the relation denoted by a(X, Y) with itself. To be able to use relational composition for representing computations, we must provide the missing information to the arguments. A common technique in the implementation of state-oriented languages for recording values needed in subsequent steps of a computation is the use of a stack. This suggests storing the term unifying with W in a list that is treated as a stack. 
We thus define the predicate: a( (StoIX), (StlIY) H St o = St l &, a(X, Y) (9) Although both St o and Stl represent the same stack, it will be convenient to keep two names for this term, so that the input of this new predicate shares no variables with the output. Later we will see why we wish clauses in which the input and the output of their atoms share no variables. We will also use of the standaTd equality theoTY. This theory consists of the following axioms: the "if" part of the definition of a (9) we obtain: a( (Sto, [WoIXo], Yo), (St l , [W1IZ1))) +St o = St l , Wo = WI, a( (Xo, Yo), (ZI)) Next we fold the "iff" version [UIV] = [U'IV'] H U = = V' of the function substitutivity axiom for the list-constructor function symbol: U' & V a( (Sto, [WoIXoJ, Yo), (Stb [W1IZl]) +[lVoISto] = [WlISt l ], a( (Xo, Yo), (ZI)) and fold the definition of a: a( (Sto, [WoIXo], Yo), (Stb [WI IZl)) ) f a( ([WoISto],Xo, Yo), ([WlISt l ], ZI)) (11) Now the head (Wo) of the first list in the original clause can be thought of as being removed from that list, and pushed onto the stack, then being removed from the stack with another name (WI) and finally added to the front of the result of appending the tail of the first list to the second. The fact that in (11) the inputs share no variables with the outputs allows us to fold the definitions of ko and kl : ko(U, V) H kl(U, V) H in the following clause: a(Uo, U3) X=XfX=Yf-Y=X X=Zf-X=Y,Y=Z J(XI, ... ,Xn ) = J(y}, ... , 1';1) p(U, V) f- f- Xl = Yi, ... , Xn = Yn U = X, V = Y,p(X, Y) which are called, respectively, reflexivity, symmetry, transitivity, function substitutivity, and predicate substitutivity. Note that the last two axioms are actually axiom schemas; an axiom is included for every function and predicate sy,mbol respectively. Next, we can derive another clause in which the input and the output of the atoms have no variables in common: a( ([H1 IX], Y), ([vl"IZ])) f- TV 3St 0 3W0 3X0 3Yo.[(St o, [WoIXo], Yo) = u & ([WoISto],Xo, Yo) = Vj 3St13W13Z1,[([WlIStl], Zl) = U & (St l , [T¥lIZl]) = V] = W', a( (X, Y), (Z)) (10) This clause can be obtained as a conditional answer, starting from the query f - a( U, V) and using function substitutivity to disassemble the term ([lVIZ)), and reflexivity to assemble it with lV' instead of TV. Next we can proceed as follows. Unfoldingl (10) on 1 In program-transformation terminology, the "unfold" operation is a resolution step. The "fold" operation replaces the subgoals that unify with a conjunction of atoms by a single atom using a definition. . f- (Sto, [WoIXo], Yo) = Uo, ([WoISt o], X o, Yo) = Ul , ([WlIStd, Zl) = U2, (St l , [WlIZl )) = U3, a(Ub U2) which is a logical consequence of (11) and the standard equality theory. The resulting clause is: a(UO,U3) +- kO(UO,Ul),a(Ul,U2),kl(U2,U3) Using a result found, for instance, in [Shoenfield 1967 p. 57, 58] we can prove that the fold steps preserve all models of the program. It may not be practical to transform a program with fold and unfold operations. The compositional form of a directed program may be obtained in a more straightforward manner based on the theorem in the Appendix. 4.3 Example (continued) The compositional form of the append program used to concatenate lists is, then: a(Uo, U3 ) f - ko(Uo, U1 ), a(U1, U2), kl (U2, U3) a((St, [], Y), (St, Y)) f ko( (St, [WIXj, Y), ([WISt], X, Y) +kl(([WISt], Z), (St, [WIZ]) +- 1131 The chart created by a proof procedure using the top-down rule for this program and the query .a( ([], [a], [b]), Z) was shown in Figure 4. 
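As a reading aid, the compositional append program above can be mirrored in Python by treating each predicate as a function from an input state to an output state and composing them as in the clause a(U0, U3) <- k0(U0, U1), a(U1, U2), k1(U2, U3). This is only a sketch of the state-oriented reading (deterministic, ground data); it is not the chart-based proof procedure itself.

    # States are (stack, x, y) before appending and (stack, z) afterwards.
    # k0 pushes the head of x onto the stack; k1 pops it back onto the result.
    def k0(state):                         # k0((St, [W|X], Y), ([W|St], X, Y))
        st, x, y = state
        return ([x[0]] + st, x[1:], y)

    def k1(state):                         # k1(([W|St], Z), (St, [W|Z]))
        st, z = state
        return (st[1:], [st[0]] + z)

    def a(state):                          # a((St, [], Y), (St, Y)) plus the recursive clause
        st, x, y = state
        if not x:
            return (st, y)
        return k1(a(k0(state)))            # a(U0,U3) <- k0(U0,U1), a(U1,U2), k1(U2,U3)

    print(a(([], ["a"], ["b"])))           # ([], ['a', 'b'])

The ground states visited, ([], [a], [b]), ([a], [], [b]), ([a], [b]) and ([], [a, b]), are exactly the symbols of the string parsed in Figure 4.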
5 A comparison with Pereira and Warren's Earley deduction

Pereira and Warren [1983] have extended Earley's [1970] algorithm to a proof procedure for logic programs that they call "Earley deduction," and we shall now compare their work with ours. Their proof procedure has the advantage that it can be applied to any logic program. Two rules produce new clauses; when none can be applied, the process terminates. Since chart parsers are a generalization of Earley's algorithm, we can give such rules using chart-parsing terminology.

1. If the chart contains a clause C having a selected literal that unifies with a unit clause either in the chart or in the program, then create the resolvent of C with that unit clause. (This rule is the counterpart of the fundamental rule as well as of the extension of the base.)

2. If the chart contains a clause having a selected literal that unifies with the head of a nonunit clause C in the program with most general unifier θ, then create the clause Cθ. (This rule parallels the top-down rule of chart parsing.)

A new clause is added to the chart only if there is no clause already in the chart that subsumes the new one. Subsumption, however, is NP-complete [Garey and Johnson 1979]. Earley deduction terminates for some programs if subsumption is replaced by a test for syntactic equality. This change results in a proof procedure that can be faster than the original Earley deduction and our methods. Our proof procedures, however, are preferable to this variant of Earley deduction for programs on which our methods terminate but such a variant does not. We now exhibit one such example. Given the directed program

p(0, X) <- p(0, f(X))

and a chart initialized with the clause ans(Y) <- p(0, Y), Earley deduction with a syntactic equality test instead of subsumption produces the infinite sequence

p(0, Y) <- p(0, f(Y))
p(0, f(Y)) <- p(0, f(f(Y)))
...

With subsumption, Earley deduction does terminate for this example. Our method, in contrast, does not require subsumption and yet also terminates.

We have implemented Earley deduction based on the top-down chart parser of [Gazdar and Mellish 1989, p. 211, 212] and using Robinson's [1965] subsumption algorithm as modified in [Gottlob and Leitsch 1985]. We have also adapted both top-down and bottom-up parsers [Gazdar and Mellish 1989, p. 208-212] to proof procedures for compositional programs. In addition, we have modified Phillips' variant of the bottom-up chart parser as presented in [Simpkins and Hancox 1990]. The following table summarizes execution times for several programs and queries. The tests were performed on a SUN SPARCstation 1 using SICStus Prolog.

            PW1      top-down        Phillips        PW2
            time     time    su      time    su      time    su
  perm      48       46      1.0     11      4.4     7       6.9
  hanoi     36       21      1.7     9       4.0     2       18.0
  append    49       22      2.2     5       9.8     6       8.2
  qsort     249      30      8.3     7       35.6    17      14.6

"perm" computes all permutations (four elements), "hanoi" solves the Towers of Hanoi problem using difference lists to store the sequence of steps of the solution (five disks), "append" is the ordinary append used to concatenate lists (80 elements), and "qsort" is quicksort using difference lists (20 elements). "PW1" is Pereira and Warren's proof procedure, "top-down" and "Phillips" result from our method, and "PW2" is a variant of Pereira and Warren's proof procedure in which subsumption has been replaced by a syntactic equality test. "su" stands for "speedup". Times are in seconds.

6 Concluding remarks

Chart parsers work for a generalization of the difference-list representation of context-free grammars.
This generalization replaces the clauses representing productions with exactly one terminal by clauses having terms subject to only one syntactic restriction: all variables in the second argument must appear in the first (compositional programs). It is possible to transform [Rosenblueth 1991] fixedmode logic programs into this generalization by adding arguments that play the role of a stack. Consequently, chart parsers can be used as proof procedures for fixed-mode logic programs transformed by this method. Strings correspond to sequences of ground terms. Experiments have shown that programs so transformed can be executed several times faster than with the previous adaptation of Earley's parser to a proof procedure done by Pereira and Warren [1983]. Phillips has modified [Simpkins and Hancox 1990] the bottom-up chart parser so that portions of the chart being built can be disposed of. It is essential in the doctored parser to keep edges ordered with respect to the string 1132 being parsed. In compositional programs, computations form sequences and Phillips' idea can also be applied. It is not clear how to apply it to Pereira and Warren's method. Proof procedures obtained from chart parsers terminate for some pr.ograms for which Prolog does not. In addition, it is possible to build charts in parallel [Trehan and Wilk 1988]. Acknowledgments We are grateful to Felipe Bracho, Carlos Brody, Warren Greiff, Rafael Ramirez, Paul Strooper, and Carlos Velarde. The anonymous referees also made valuable suggestions. We acknowledge the facilities provided by IIMAS, UN AM. Bibliography [Clark and van Emden 1981] Keith L. Clark and M.H. van Emden. Consequence verification of flowcharts. IEEE Transactions on Software Engineering, SE7(1):52-60, January 1981. [Earley 1970] Jay Earley. An efficient context-free parsing algorithm. Communications of the A CM, 14:453460, 1970. [Garey and Johnson 1979] Michael R. Garey and David S. Johnson. Computers and Intractability. A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, 1979. [Gazdar and Mellish 1989] Gerald Gazdar and Chris Mellish. Natural Language Processing in Prolog. An Introduction to Computational Linguistics. AddisonWesley, 1989. [Gottlob and Leitsch 1985J G. Gottlob and A. Leitsch. On the efficiency of subsumption algorithms. Journal of the ACM, 32(2):280-295, 1985. [Simpkins and Hancox 1990] Neil K. Simpkins and Peter Hancox. Chart parsing in Prolog. New Generation Computing, 8:113-138, 1990. [Trehan and Wilk 1988] R. Trehan and P.F. Wilko A parallel chart parser for the committed choice nondeterministic logic languages. In K.A. Bowen and R.A. Kowalski, editors, Logic Programming: Proceedings of the Fifth International Conference and Symposium, pages 212-232. MIT Press, 1988. [Vasey 1986] P. Vasey. Qualified answers and their application to transformation. In Proceedings of the Third International Logic Programming Conference, pages 425-432. Springer-Verlag Lecture Notes in Computer Science 225, 1986. Appendix Our method for converting fixed-mode programs to compositional form is based on the following theorem, which is proved in [Rosenblueth 1991]. Theorem 1 Let C be a directed clause Po( to, t~) t- PI (t~, td, P2( t~, t 2), ... ,Pn (t~_l' tn) and let I1i = (var(to) U··· U var(ti-l)) n (var(tD U··· U var(t~)) for i = 1, ... ,n. 
Then the clause Po(XO ,X2n+t} t - ko(XO ,Xt},PI(Xb X 2), k l (X 2, X 3 ),P2(X3 ,X4 ), ••• , Pn(X2n - l , X 2n ), kn(X2n , X 2n+1) is logically implied by C, the standard equality theory, the "iff" version of the function substitutivity axiom for the list-constructor function symbol, and the following axzoms: Pi((St/X), (St'/Y}) H St = St' & Pi(X, Y) i = 0, ... ,n ko(U, V) H 3}},0" . 3Ymo ,0.[(St/to} = U [Pereira and Warren 1983] Fernando C.N. Pereira and David H.D. Warren. Parsing as deduction. Technical Report 295, SRI, June 1983. kl(U, V) H 3}},1'" 3Ym1 ,1.[(E 1/t l [Robinson 1965] J.A. Robinson. A machine-oriented logic based on the resolution principle. J. A CM, 12:23-41, 1965. kn-I(U, V) [Rosenblueth 1991] David A. Rosenblueth. Fixed-mode logic programs as state-oriented programs. Technical Report Preimpreso No.2, IIMAS, UNAM, 1991. [Shoenfield 1967] Joseph R. Shoenfield. Logic. Addison-Wesley, 1967. !l1athematical & (EI/t~) } =U = V] & (E2It~) = V] H 3}},n-1 ... 3Ymn_l,n-I.[{En-l/tn-l) = U & (En/t~_l) = V] kn(U, V) H 3}},n'" 3Ymn ,n.[(En /t n) = U & (Stlt~) = V] where }},i, ... ,Ym"i are the variables on the right-hand side of the definition of ki' except for U and V, for i = 0, ... ,n; Ei is any list of the form [X1,i, ... ,Xd"i/St], and {X1,i, .. . ,Xd"d = ni , for i = 1, ... , n. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE O~ FIFTH GENERATION COMPUTER SYSTEMS 1992 edited by ICOT. © ICOT, 1992 ' 1133 A Discourse Structure Analyzer for Japanese Text* K. Sumita, K. Ono, T. Chino, T. Ukita, and S. Amano Toshiba Corp. R&D Center Komukai-Toshiba-cho 1, Saiwai-ku, Kawasaki 210, Japan sumita@isl.rdc.toshiba.co.jp Abstract This paper presents a practical procedure for analyzing discourse structures for Japanese text, where the structures are represented by binary trees. In order to construct discourse structures for Japanese argumentative articles, the procedure uses local thinking-flow restrictions, segmentation rules, and topic flow preference. The thinking-flow restrictions restrict the consecutive combination of relationships detected by connective expressions. Whereas the thinking-flow restrictions restrict the discourse structures locally, the segmentation rules constrain them globally, based on rhetorical dependencies between distant sentences. In addition, the topic flow preference, which is the information concerning the linkage of topic expressions and normal noun phrases, chooses preferable structures. Using these restrictions, the procedure can recognize the scope of relationships between blocks of sentences, which no· other discourse structure analysis methods can handle. The procedure has been applied to 18 Japanese articles, different from the data used for algorithm development. Results show that this approach is promising for extracting discourse information. 1 Introduction A computational theory for analyzing linguistic discourse structure and its practical procedure are necessary to develop machine systems dealing with plural sentences; e.g., systems for text summarization and for knowledge extraction from a text corpus. Hobbs developed a theory in which he arranged three kinds of relationships between sentences from the text coherency viewpoint [Hobbs 1979]. Grosz and Sidner proposed a theory which accounted for interactions between three notions on discourse: linguistic structure, intention, and attention [Grosz and Sidner 1986]. 
Litman and Allen described a model in which a discourse structure of conversation was built by recognizing a participant's plans [Litman and Allen 1987]. These theories all de*This work was supported by ICOT (Institute for New Generation Computer Technology), and was carried out as a part of the Fifth Generation Computer Systems research. pend on extra-linguistic knowledge, the accumulation of which presents a problem in the realization of a practical analyzer. The authors aim to build a practical analyzer which dispenses with such extra-linguistic knowledge dependent on topic areas of articles to be analyzed. Mann and Thompson proposed a linguistic structure of text describing relationships between sentences and their relative importance [Mann and Thompson 1987]. However, no method for extracting the relationships from superficial linguistic expressions was described in their paper. Cohen proposed a framework for analyzing the structure of argumentative discourse [Cohen 1987], yet did not provide a concrete identification procedure for 'evidence' relationships between sentences, where no linguistic clues indicate the relationships. Also, since only relationships between successive sentences were considered, the scope which the relationships cover cannot be analyzed, even if explicit connectives are detected. This paper discusses a practical procedure for analyzing the discourse structure of Japanese text. The authors present a machine analyzer for extracting such structure, the main component of which is a structure analysis using thinking-flow restrictions for processing of argumentative documents. These restrictions, which examine possible sequences of relationships extracted from connective expressions in sentences, indicate which sentences should be grouped together to define the discourse structure. 2 2.1 Discourse structure of Japanese text Discourse structure This paper focuses on analyzing discourse structure, representing relationships between sentences. In text, various rhetorical patterns are used to clarify the principle of argument. Among them, connective expressions, which state inter-sentence relationships, are the most significant. They can be divided into the ca.tegories described in Table 1. Here, connective expressions include not only normal connectives such as "therefore", but also idiomatic 1134 expressions stating relations' to the other part of text such as "in addition" and "here is described." The authors extracted 800 connective expressions from a preliminary analysis of more than 1,000 sentences in several argumentative a.rticles [Ono et al. 1989]. Then, connective relationships were classified into 18 categories 'as shown in Table 1. Using these relationships, linguistic structures of articles are captured. Table 1 is the current version of the relationship categories. The number of relationship categories necessary and sufficient to represent discourse structures must be determined through further experimentation. New categories will be formed as need becomes apparent; likewise, categories found to overlap in function will be merged. Final categorization can only be fixed after extensive analysis. Sentences of similar content may be grouped together into a block. Just as each sentence in a block serves specific roles, e.g., "serial", "parallel", and "contrast" , each block in text serves a similar function. Thus, the discourse structure must be able to represent hierarchical structures as well as individual relationships between sentences. 
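Before turning to the tree representation, note that the mapping from surface connective expressions to the relationship categories of Table 1 is essentially a table lookup over detected words. The sketch below illustrates the idea with a handful of the pairings read off Table 1 (romanized); the table contents shown here, the function name, and the single-token matching are all illustrative simplifications, not the authors' implementation, which handles some 800 expressions including multi-word idioms.

```python
# Illustrative sketch only: a toy lookup from romanized connective
# expressions to relationship categories in the spirit of Table 1.
CONNECTIVE_TABLE = {
    "dakara":   "serial",              # "thus, therefore"
    "yotte":    "serial",              # "then"
    "daga":     "negative connection", # "but"
    "shikashi": "negative connection", # "though"
    "nazenara": "reason",              # "because"
    "tatoeba":  "exemplification",     # "for example"
    "tsumari":  "rephrase",            # "that is"
    "ippou":    "contrast",            # "however"
}

def detect_relationship(sentence_words):
    """Return the relationship signalled by the first connective found
    in the sentence; sentences without an explicit connective receive
    the extension relationship, as in the text."""
    for word in sentence_words:
        if word in CONNECTIVE_TABLE:
            return CONNECTIVE_TABLE[word]
    return "extension"

# A three-sentence toy paragraph and its connection sequence.
paragraph = [["kono", "hanashi"], ["tatoeba", "..."], ["dakara", "..."]]
print([detect_relationship(s) for s in paragraph])
# -> ['extension', 'exemplification', 'serial']
```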
In this paper, a discourse structure is represented as a binary tree whose terminal nodes are sentences; sub-trees correspond to local blocks of sentences in text. Figure 1 shows a paragra.ph from an article titled "a zero-crossing rate which estima.tes the frequency of a speech signal," where underlined words indicate connective expressions. Figure 2 shows its discourse structure. Extension relationships are set to sentences without any explicit connective expressions. Although the fourth and fifth sentences are clearly the exemplification of the first three sentences, the sixth is not. Thus, the first five can be grouped into a block. Discourse structure can be represented by a formula. The discourse structure in Figure 2 corresponds to the following formula. [[[1 [2 3JJ [4 5JJ 6J. 2.2 Local constraint for consecutive relationships For analyzing discourse structure, a local constraint on consecutive relationships between blocks of sentences is introduced. The example shown in Figures 1 and 2 suggests that the sequence of connective relationships can limit the accepted discourse structures to those most accurately representative of original argumentative text. Consider the sequence [P Q RJ, where P, Q, R are arbitrary (blocks of) sentences. The premise of R is obviously not only Q but both P and Q. Since the argument in P and Q is considered to close locally, the two should be grouped into a block. This is a local constraint on natural argumentation. Table 1: Connective relationships. EXAMPLES and EXPLANATION RELATION serial connection tt. tJ~ ~ (thus, therefore), J: dakara negative connection tt. tJ~ (but), daga reason parallel "? "C (then) yotte L tJ~ L (though) shikashi ~.tf 1J: ~ (because)~ nazenara ~ (J) KR ti (the reason is ...) sono wake wa IiiJ Ifi te (at the same time), doujini c! ~. te (in addition) saram contrast -11 (however), & 00 (on the contrary) exemplification -wtJ X. f'f (for example), ippou hanmen tatoeba ... ~-c: ~ Q (and so on) .,. nado dearu repetition <:RF> ~ ? (J) toiunowa l: ti (in other words), ~ It, ti (it is ...) sore wa supplementation rephrase summarization extension t !:> .'?Iv (of course) mochiron "? 'i D, T 1J: :b !:> (that is ...) tsumari sunawachi tafiU(after all), 'i l: ~ Q l: (in sum) kekkyolcu matomeruto L It, ti (this is) /core wa definition L C -C" •.• l: rhetorical question direction 1J:.tf ... 1J: (J) tt. ~ ? tJ~ (Why is it ...) T Q (... is defined as ...) /co/co de ... to suru naze .,. nanodarouka C C-c:ti ... ~~~Q Iwlcode wa .,. wo noberu (here ... is described) reference ~xte ... ~~~ Q (Fig.X shows ...) topic shift background c! "'C, l: C -? -C:(well, now) enumeration M- te (in the rust place), ZIl X ni .. , wo noberu sate ~ * juurai tolcorode (hitherto) dai 1 ni M= te(in the second place) dai 2 ni 1135 In the context of discrete-time signals, zerocrossing is said to occur if successive samples have different algebraic signs. 2 The rate at which zero crossings occur is a simple measure of the frequency content of a signal. 3 : ~ particularly true of narrow band signals. 4 For example, a sinusoidal signal of frequency R>, sampled at a rate Fs, has Fs/R> samples per cycle of the sine wave. 5 : Each cycle has two zero crossings so that the long-term average rate of zero-crossings is Z = 2R>/Fs 6 : Thus, the average zero-crossing rate gives a reasonable way to estimate the frequency of a sine wave. Figure I: Text example 1. : extension : exemplification : serial 1 4 5 6 Figure 2: Discourse structure for the text example 1.' 
(This structure can be represented as the form [[[1 [2 3]] [4 5]] 6].) Thinking-flow is defined by a sequence of connective relationships and the way in which the sequence fits into the allowable structure. The authors have investigated all 324 (18 x 18) pairs of connective relationships and derived possible local structures for thinking.:flow restrictions. The pairs of connective relationships can be represented by (rl, r2), where the relations rl and r2 are arbitrary connective relatioriships. They can be classified into the following four major groups. (1) POP-type: permitting [[P rl Q] r2 R] (eliminating [p rl [Q r2 R]]) ex. [[P Q] R], : exemplification, : serial. (2) PUSH-type: permitting [p rl [Q r2 R]] ex. [P [Q R]], : reason. (3) NEUTRAL-type: permitting both (1) and (2) ex. [[p Q] R], [p [Q R]], : parallel. (4) NON-type: permitting non-structure [p rl Q r2 R] ex. [P Q R]. The relationship sequence of POP-type means that the local structure for the first two blocks should be popped up, because the local argument is closed. On the other hand, the relationship sequence of PUSH-type means that the local structure should be pushed down. The relationship sequence of NON-type permits nonstructure, which is of the form [P rl Q r2 RJ. Therefore, to be exact, the discourse structure which contains the sequence of this type is not a binary tree. The thinking-flow restrictions can be used to eliminate structures expressing unnatural argumentative extensions, by examining their local structures. Although the thinking-flow restrictions define local constraints on relationships to neighbors, the scope of rela.tionships is analyzed by recursively checking all local structures of a discourse structure. 2.3 Distant dependencies The greater part of text ca.n be appropriately analyzed, using the above local constraints on connective relationships to neighbors, if the relationships are extracted correctly. However, in real text, there are rhetorical dependencies concerning distant sentences, which cannot be detected by examining only the normal relationships to neighbors. Two kinds of linguistic clues to distant dependencies must be considered in the realization of a precise discourse analyzer: rhetorical expressions which cover distant sentences, and referential relations of words, in particular, topics. 2.3.1 Rhetorical expressions stating global structure First, rhetorical expressions which relate to an entire article play an important role. Examples are: "... ? ... ? The reason is, ... ", " ... as follows .... (TENSE=present). . .. (TENSE=present).", ". " is not an exceptional case. '" ... ". Consider the text example in Figure 3, in which unnecessary words are omitted for expositional clarity. In this text the rhetorical expressions which relate to the entire paragraph affect its discourse structure. The expressions "first" and "second" in the last two sentences correspond to the expression "two pieces" in the first sentence; the second and the third sentences, therefore, cab be said to be connected by parallel relationship, as they have similar relations with the first sentence. Thus, the discourse structure in Figure 4 is a natural representation. 1136 While, in real text, there is a wide variety of rhetorical expressions of this type, those that are often used in argumentative articles can be determined through ana.lysis. A robust discourse analysis system must detect these rhetorical expressions to restrict discourse structures. 
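The thinking-flow restrictions of Section 2.2 can be read as a filter over candidate binary trees. The following sketch, under assumptions of our own, shows the check for the three-block sequence [P Q R]: trees are nested lists of sentence numbers, and a pair table classifies consecutive relationship pairs as POP, PUSH, or NEUTRAL. Only the exemplification/serial entry is taken from the examples in the text; the real system tabulates all 18 x 18 pairs (including the NON type), and the helper names here are hypothetical.

```python
# Illustrative filter for thinking-flow restrictions over binary-tree
# candidates such as [[[1, [2, 3]], [4, 5]], 6].
PAIR_TABLE = {
    ("exemplification", "serial"): "POP",   # permits [[P Q] R] (Sec. 2.2)
    # ... remaining pairs omitted; unlisted pairs default to NEUTRAL here.
}

def depth(tree, sid, d=0):
    """Depth of the leaf sid in the nested-list tree, or None."""
    if tree == sid:
        return d
    if isinstance(tree, list):
        for sub in tree:
            found = depth(sub, sid, d + 1)
            if found is not None:
                return found
    return None

def lca_depth(tree, a, b, d=0):
    """Depth of the smallest subtree containing both leaves a and b."""
    if isinstance(tree, list):
        for sub in tree:
            found = lca_depth(sub, a, b, d + 1)
            if found is not None:
                return found
    contains = lambda t, x: depth(t, x) is not None
    return d if contains(tree, a) and contains(tree, b) else None

def respects_thinking_flow(tree, relations):
    """relations[i] is the relationship linking sentences i+1 and i+2."""
    for i in range(len(relations) - 1):
        p, q, r = i + 1, i + 2, i + 3
        kind = PAIR_TABLE.get((relations[i], relations[i + 1]), "NEUTRAL")
        pq, qr = lca_depth(tree, p, q), lca_depth(tree, q, r)
        if kind == "POP" and not pq > qr:    # requires [[P Q] R]
            return False
        if kind == "PUSH" and not qr > pq:   # requires [P [Q R]]
            return False
    return True

rels = ["exemplification", "serial"]               # a POP-type pair
print(respects_thinking_flow([[1, 2], 3], rels))   # True : [[P Q] R]
print(respects_thinking_flow([1, [2, 3]], rels))   # False: [P [Q R]]
```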
2.3.2 Topic frow The other significant phenomenon concerning the distant dependencies is reference. While English uses pronouns and definite noun phrases in reference, in Japanese, a phrase that is identical to or a part of the original noun phrase is used when referring to some other part of the text. By analyzing the appearance of the same expressions, a restriction or a preference for building discourse structures can be determined. However, the same expressions tend to scatter in a text: and it is difficult to determine the referent for a reference without tasl P. In the case of the text shown in Figure 5, T2 => 1, T3 => 2, and T4 => 3 hold. If a topic in a sentence refers to a word in the previous sentence, it is regarded as an elaboration of the earlier sentence. Thus, these sentences must be kept close together in their discourse structure; the structure depicted in Figure 6 is appropriate for this text. In addition, relative importance of relationship connecting sentences in text must be considered for the topic flow analysis. Connective relationships can be classified into three categories according to their relative importance: left-hand, right-hand, and neutral type. For example, the exemplification relationship is a left-hand type; i.e., for [p QJ, P strongly relates to the global flow of argumentation beyond the outside of this block, and in this sense P is more important than Q. In contrast, the serial relationship is a right-hand type, and the parallel relationship is a neutral type. Consider the structure [[P rl QJ r2 RJ, where 'rl' is a left-hand type relationship, and 'r2' can be any relationship. If TR => P, the above structure is natural, even if there is the same word as TR in Q. However, if TR => Q, this structure is unnatural, in the sense of coherency. In this case, the structure [P rl [Q r2 RJ J is preferable to [[p r1 QJ r2 RJ. On the contrary, in the case .where 'r1' is a righthand type, [[P r1 QJ r2 RJ is a natural structure, even if TR => Q. In short, the naturalness of a discourse structure closely depends on the appearance position of topics and their referents, and the relative importance of the referred nodes. 1 : Two pieces of X are relevant. 2 : First, ... . 3 : Second, ... . Figure 3: Text example 2 (X is a noun phrase.) ~ / enumeration parallel 3 Figure 4: Discourse structure for the text example 2. 1 : AttB C Ci3:l. b ~ ~ 0 A wa B to C lcara naru A consists of Band C. 2: ctt ... DcEK:5Ht bnQo C wa ... D to E ni wakerareru C is divided into D and E. 3 : Dtt .... F~#f"'Jo D wa ... F wo motsu D has .,. F. 4 : Ftt ... 0 F wa .., F is .... Figure 5: Text example 3 (A - F are noun phrases.) ~ EX> : extension «EX> 123 4 Figure 6: Discourse structure for the text example 3. 1137 3 Discourse structure analyzer 3.1 System configuration Input Figure 7 shows the discourse structure analyzer, which consists of five parts: pre-processing, segmentation, candidate generation, candidate reduction and preference judgement. If input text consists of multiple paragraphs or multiple sections, every section or every paragraph in the text is analyzed individually. Figure 8 outlines the input/output data of each stage for a paragraph. The outline of each stage of the discourse structure analyzer is described in the following sections. 3.1.1 Pre-processing In this stage, input sentences are analyzed, character strings are divided into words, and the dependency structure for each sentence is constructed. 
The stage consists of the following sub-processes: (1) Extracting the text of an article from chapters or sections. (2) Accomplishing morphological and syntactic analysis. (3) Extracting topic expressions and the reappearance of the targeted expression. (4) Detecting connective relationships and constructing their sequence. Segmentation Rule Thinking-flow Restriction Topics and their Referents Output Figure 7: System overview. Input sentences: 1."0 2~1 tc"' o 3 (: Q) .. ·A .. ·o 4Ati· .. o dai 1 n; First, kono This Awa Ais 5~2 tc ... 0 6 L. tc ;t~ "? -C ... 0 shitagatte dai 2 ni Second, Thus t Pre-processing result: [1 2 3 4 5 6] In Step (1), the title of an article is eliminated, and the body is extracted. Next, in Step (2), sentences in the body of the article, extra.cted in Step (1), a.re morphologically and syntactically analyzed. In Step (3), topic expressions are extracted, according to a table of topic denotation expressions. The following are examples of topic expressions. " ... " ... " ... " ... wa" (as for ... ), niwa" (in ... ), dewa" (in ... ), nioitewa" (in ... ). In Step (4), a connecti ve expression is detected based on an expression table consisting of a word and its part of speech for individual connective relationships. In this step, connection sequence, a sequence of sentence identifiers and connective relationships, is acquired. For example, a connection sequence is of the form t Segmentation result: [1 {2 3 4} @ 5 6] t Candidate generation result: [[1 [[[2 3] 4] 5]] 6] [[1 [[2 [3 4]] 5]] 6] [1 [[[[2 3] 4] 5] 6]] [l [[[2 [3 4]] 5] 6]] [1 [[[2 3] 4] [5 6]]] [l [[2 [3 4]] [5 6]]] I Candidate reduction result: [[1 [[[2 3] 4] 5]] 6] [[1 [[2 [3 4]] 5]] 6] [1 2 3 4 5 6], as is shown as the final result in Figure 8. 3.1.2 Segmentation In this stage, rhetorical expressions between distant sentences, which define discourse structure, are detected. They form restrictions on segmentation of text. This stage is implemented as a rule-based proce- Since «EN>, , [[2 [3 4]] 5]] 6] Figure 8: Output example of each process. 1138 dure [0 no et al. 1991]. If-then rules, called segmentation rules, have been formulated in advance. The ifpart of a segmentation rule corresponds to linguistic surface patterns to detect inter-sentence rhetorical expressions, e.g. "as follows. . .. First... ... Second ... ". The thenpart represents a connection sequence embedded with control operators discussed below. Also, the then-part can indicate an exchange of connectIve relationships. There are three kinds of control operators. They are '{' and '}', '(' and ')', and '<0'. Sentences enclosed by '{' and '}' must be grouped together as a block of sentences. Operators '(' and ')' are similar to '{' and '}'. They can be used singly, while the operators '{' and '}' must be used in pairs. The operator '<0' means that the position must not be a boundary of a sentence block. Figure 9 shows examples of the segmentation rules. The first example means that if the Nth sentence includes expression "tashikani" (of course), and the Mth sentence includes expression "shikashz~' (though), then from N+lst to M-lst sentences must be grouped together. For the input sentences and the connection sequence in Figure 8, the second rule is activated. The connection sequence is then converted into If-part : Sentence No.: N Connective relationship: Input string: .... ·0 liE:6:.tc. .. ·o M L:6:. L .. ·o '''0 tashikani of course .. shikashi though then-part : ["'{N {N+l ... M-l}} @ M ... 
] If-part : Sentence No.: N Connective relationship: Input string: ..... M ~ltc.· .. o ~2tc· .. o .. doi 1 ni first, doi 2 ni second, then-part : [ ... N-l {N ... M-l} @ M ... ] Figure 9: Segmentation rule example. [1 {2 3 4} 5 6]. This structure directs the next stage to generate discourse structure candidates whose second, third and fourth sentences are grouped into a block. At present, approximately 100 rules are available in the system. 3.1.3 Candidate generation All possible discourse structures, described by binarytrees which do not violate segmentation restrictions, are generated as discourse structure candidates. The generation is performed in a bottom-up manner of sentence parsing' by the CYK algorithm. After the generation of sub-trees for blocks directed by segmentation restrictions, the whole trees are generated based on these subtrees. In case of the example in Figure 8, only 6 candidates are generated, while 42 binary trees would be produced without the segmentation rules. 3.1.4 Candidate reduction Local structures of generated structure candidates are checked by inspecting thinking-flow restrictions. The candidates, including a local structure violating the restrictions are discarded. Only legal candidates are passed on to the next stage. In order to show the effectiveness of the thinkingflow restrictions, consider the following connection sequence. [1 2 3 4 5]. Figure 10 shows discourse structure candidates for the above sequence. There are 14 binary tree possibilities. The candidates violating the thinking-flow restrictions are eliminated. For example, the first structure is discarded, because it contains the local structure [2 [3 4]], and the pair «EG>, [[3 "). As a result of elimination through thinking-flow restrictions,l1 candidates can be discarded, and the second the fourth and the tenth structures remain. 'In the above example, in the case outlined in Figure 8 structure candidates unnatural from the viewpoint of thinking-flow are discarded. Since the third through sixth candidates violate thinking-flow restrictions, the candidates are reduced to two structures. The thinking-flow restrictions are represented in the system as a table of the applicable pa.irs of consecutive relationships and their acceptable local structures. 3.1.5 Preference judgment The final result of discourse analysis is the structure with the lowest penalty score, a value associated with topicreferent relationships. A penalty is set against each arc of path on a discourse structure, which leads from a sentence containing a topic to a sentence referred to by the topic. The concrete arc of a discourse structure, on which a. penalty is imposed, is either an arc to or from an unimportant node or an arc to an equally important node. For example, for the structure [[P Q] R] where TR =} Q, 1139 1: [[1 [2 [3 4]]] 5] : NG «. [3 " :NG) 2: [[1 [[2 3] 4]] 5] 3: [[[1 2] [3 4]] 5] : NG «, [3 " :NG) 4: [[[1 [2 3]] 4] 5] 5: [[[[1 2] 3] 4] 5] : NG «, 2] " :NG) 6: [1 [2 [3 [4 5]]]] : NG «, [4 " :NG) 7: [1 [2 [[3 4] 5]]] : NG «, [[3 ":NG) 8: [1 [[2 3] [4 5]]] : NG «, [4 " :NG) 9: [1 [[2 [3 4]] 5]] : NG «, [3 " :NG) 10: [1 [[[2 3] 4] 5]] 11: [[1 2] [3 [4 5]]] : NG «, [4 ":NG) 12: [[1 2] [[3 4] 5]] : NG «, [[3 ":NG) 13: [[1 [2 3]] [4 5]] : NG «, [4 " :NG) 14: [[[1 2] 3] [4 5]] : NG «, [4 " :NG) Figure 10: Discourse structure candidates. a penalty is imposed on the arc from the parent node of P and Q to Q because the left node in an exemplification relationship is unimportant. 
The penalty of a discourse structure is defined as a sum of penalties for all paths concerning all topics in the paragraph. By selecting the structure candidate with the lowest penalty, the most coherent discourse structure is obtained. Of the two surviving structures of the ca.ndidate reduction process in Figure 8, the second structure is preferable. The difference is the structural relationship between the second and fourth sentences: the local structure for the first candidate is [[2 3] 4], and that for the second candidate is [2 [3 4]]. Since T4 => 3, a penalty is imposed on the first structure, but not on the second structure. As a result, the second structure candidate is chosen. as the terminal nodes of the structure. The connective relationship expressed in the first sentence of each paragraph is used for making the connection sequence. After structure candidates are generated based on t.lle connection sequence, candidates unnatural from the viewpoint of thinking-flow are eliminated. Since every paragraph is analyzed into a discourse structure, each node of the discourse structure for a section also forms t.he discourse structure for the corresponding paragraph. 3.2 Experiment To evaluate the discourse structure analyzer, 18 journal articles, different from the data used for algorithm development or rule extraction, have been analyzed. The journal used is "Toshiba Review", which publishes short technical papers of three or four pages. An experiment has been carried out on every paragraph. Correct discourse structure for every paragraph was made manually in advance. The system's performance was evaluated by comparing the correct human-produced structures and the structures analyzed by the system, Table 2 shows analysis results. There are a total of 554 paragraphs. Nearly 50% of them consist of only one sentence and are excluded from consideration. For 114 paragraphs consisting of more than three sentences, a correct analysis was produced for approximately seventyfour percent. . There were 15 errors for all of the processed paragraphs. Most of the errors are due to incorrect detection of relationships (60%), or incorrect candidate reduction (27%). For the former, the procedure failed to detect explicit connective expressions because of insufficient dictionary data, which can be improved by refining the dictionary data. Most of the latter type of errors occur in a paragraph in which the first or last sentence refers to information outside of the paragraph by such phrases as "as shown above" or "as follows." This suggests that the procedure should also take into account relationships to Table 2: Analysis results paragraph correct* size (number of (unique) sentences) 1 - - - -2-- ----------53 8 3 4 5 6 7 8 While every paragraph can be analyzed respectively, a chapter or a section containing multiple paragraphs is analyzed in an analysis manner similar to that of a paragraph. In case of a discourse structure for a chapter, or a section, paragraphs rather than sentences are used - correct* incorrect* (other candidate) 9 Total - - - - -- 12 7 3 5 2 2 5 1 0 0 0 0 6 7 2 0 0 0 0 84 14 15 Total * 293 147 -----67 24 10 3 6 2 2 114+(554) * ~umbers indicate c~unts of paragraphs, except for the paragraph sIZe. + Total nllmber of paragraphs consisting of more than 3 sentences. 1140 neighboring paragraphs. 
In the segmentation stage segmentation rules were activated for 35 paragraphs, with 85% of the rules correctly used; 65% have contributed to structure determination for itemized parts of text, and 20% to relationship determination. In addition, the preference judgment stage has increased the accuracy of the analysis by 3%. Except for the effects of these contributions, correct relationships have been detected in 73 paragraphs, and correct results have been obtained for ,5.5 paragraphs. Thus, if correct connective relationships are detected, 73% of discourse structures can be appropriately analyzed using . thinking-flow restrictions only. 4 Concluding remarks A practical analyzer has been described for building discourse structures for Japanese argumentative or explanatory articles. To analyze structures, three types of knowledge are used: thinking-flow rest.rictions, sf'gmentation rules, and topic-flow preference. They represent relative constraints between connective relationships or structural restrictions spanning a paragraph, as opposed to the relative importance between consecutive sentences on which other discourse structure analysis researchers depend. Using linguistic knowledge, global structures or the scope of relationships can be determined appropriately. In addition, the above knowledge on which the procedure is based is detected from superficial linguistic clues independent of topic areas in analyzed articles. The authors are convinced that the method is effective for any articles whose aim is persuasion or assertion. It should be noted that the relative importance of sentences can be evaluated, using the extracted discourse structure. For example, a left-hand node of a structure linked by exemplification relationship is more important than the right-hand node, as discussed in Section 2.3.2. By a recursive application of relative importance judgment from the top node of discourse structure analyzed from a paragraph, the key-sentence in the paragraph can be extracted. In addition to the key-sentence extraction shown above, the extracted structure can be a promising clue to other various natural language processes, such as topic estimation and knowledge extraction. The authors intend to polish up the presented restrictions and rules, and refine the procedure toward these natural language processes. References [Cohen 1987] Cohen, R.: "Analyzing the Structure of Argumentative Discourse", Computational Linguistics, Vol.13, 1987, pp.11-24. [Grosz and Sidner 1986] Grosz, B.J. and Sidner, C.L.: "Attention, Intentions and the Structure of Discourse", Computational Linguistics, Vo1.12, 1986, pp.175-204. [Hobbs 1979] Hobbs, J.R.: "Coherence and Coreference", Cognitive Science, Vo1.3, 1979, pp.67-90. [Litman and Allen 1987] Litman, D.J. and Allen, J.F.: "A Plan Recognition Model for Subdialogues in Conversations", Cognitive Science, VoU1, 1987, pp.163-200. [Mann and Thompson 1987] Mann, \V.C. and Thompson, S.A.: "Rhetorical Structure Theory: A Framework for the Analysis of Texts", USC/Information Science Institute Research Report RR-87-190, 1987. [Nagano 1986] Nagano, K.: Bunshouron Sousets'u Bunpouron-teki [(ousatsu- (An Introduction to Theory of Texts -Syntactic Consideration-), Asakusa Shoten, 1986, (in Japanese). [Ono et al. 1989] Ono, K., Ukita, T., and Amano, S.: "An Analysis of Rhetorical Structure", IPS Japan Technical Report NL 70-2, 1989, (in Japanese). [Ono et al. 1991] Ono, K., Sumita, K., Ukita, T., and Amano, S.: "Text Segmentation and Discourse Analysis", Proc. 
IPS Japan '91 October, 4E-2, 1991, (in Japanese). [Schank 1977] Schank, R.C.: "Rules and Topics in Conversation", Cognitive Science, VoLl, 1977, pp.421441. [Sidner 1983] Sidner, C.L.: "Focusing in Comprehension of Definite Anaphora", M.Brady and R.C.Berwick (Eds.), Computational Models of Discourse, MIT Press, 1983, pp.267-330. [Sumita et ai. 1991] Sumita, K., Ukita, T., and Amano, S.: "Disambiguation in Natural Language Interpretation Based on Amount of Information", IEICE Trans., Vol.E74, No.6, 1991, pp.1735-1746. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 1141 Dynamics of Symbol Systems An Integrated Architecture of Cognition HASIDA. Koiti Institute for New Generation Conlputer Technology (ICOT)* Abstract To account for the diversity and partiality of information processing in the cognitive process, we need a design method for cognitive system without explicit stipulation of domain/task dependent information flow, together with a control scheme for partial information processing which does not commit us to global and crisp consistency or completeness. A computational architecture is proposed which consists of a first-order logic program with a dynamics. Information flow is controlled not by any domain/task dependent procedures but by a control scheme emergent from the dynamics. The declarative semantics of the logic program is defined by formulating the degree of violation in terms of potential energy, and a control scheme for both analog and symbolic inferences is derived from an energy minimization principle. This inborn integration of the control scheme with the declarative semantics guarantees a natural reflection of semantic relevance in inferences. Ideas underlying inference mechanisms developed so far, such as weighted abduction and marker passing, are captured in terms of such a dynamics. 1 Introduction It is practically impossible to delimit the information of the world potentially relevant to the benefit (typically. survival) of a cognitive agent. whereas the informationprocessing capacity of the cognitive agent is severely restricted. Here arises par'tiality of info'rmation: the information potentially relevant to the determination of a cognitive agent's action (including information processing) is only partially reflected in its actual behavior. Only very relevant information must hence be selectively reflected in the behavior of the cognitive agent. However, the distribution of relevant information. together with the degree of relevance, drastically changes depending on the context. Since only a very small part of the potentially relevant information is exploited at each context, dramatically different parts of the information must be exploited at different contexts. in 01'* From April of 1992, the author is at Natural Language Section. Electrotechnical Laboratory, 1-1-4 Umezono, Tukuba, Ibaraki :305 JAPAN. del' for the whole information the cognitive agent uses in various contexts to encompass as much of the relevant information as possible. This causes very diverse patterns of information flow, underlying the complex behavior of a cognitive agent. So cognition is complex, not entirely because the design of the cognitive agent itself is complex, but rather because it is situated in a complex world, which provides the diverse contexts of the cognitive agent's behavior. 
The cognitive agent is complex indeed, but still is far simpler than the behavior of the agent reflecting also the vastness of the world. To capture this situatedness and relative simplicity of a cognitive agent. the design of the cognitive system should largely abstract away the directions of information flow (the temporal order of actions, among others). The models which stipulate the directions of information flow (that is, pl'oced'Ural programming) quickly become untractably complex, attributing too much of the complexity of cognitive process to the complexity of thp cognitive system itself. and thus failing to capturp the situatedness of cognition. For instance, production systems (Anderson 1983) fail to serve as the functional architecture of cognition. This is where constraint paradigm comes 111. Constraint abstracts the direction of information flow away from the design of a cognitive modeL keeping the model within tractable complexity. attributing most of the complexity to the world, and thus capturing the situatedness of cognition. So the domain-dependent aspects of cogllitioll (lallguage. vision. etc.) should be designed basically in terms of declarative semantics rather than operational Sf'lllantics. Symbolic logic is a typical formalism for cledarativp design. Some sort of logic at least as powerful as firstorder predic ate ('aleul us is considered necessary to design a cognitive system capable of combinatorial behaviors such as language use. However. such a powerful formalism commits us to untractable computation for maintaining global cOllsistency, exhaustive examination of the possible hypotheses. and so on. This applies to whate\'er logics have ever been fabricated, including non-monotonic logic. probabilistic logic, fuzzy logic. paraconsistent logic. and so forth. There has been no formalism of logic which could support useful inferences under arbitrary sort of violation of' the constraint in question. The problem here is 1142 essentially that symbolic logics provide no control over inferences ~ther than closure operation (exhaustive inference). We need a declarative formalism which inherently supports partial and hence tractable computation, while approximately preserving the first-order expressive power and supporting diverse flow of information. To be useful at all, that computation must be about only very relevant information, which will lead to a diverse information flow sensitive to the context. To implement all this, the present paper considers a system of constraint represented as a first-order logic program, and postulates a" dynamics of this constraint. The degree of violation is captured in terms of potential energy, which is a real-valued function of the state of the constraint. The constraint is thus provided with a fuzzy declarative semantics which is finer-grained than the usual crisp semantics. An operational semantics is also derived from the dynamics. That is, control schemes for analog and symbolic inferences are obtained on the basis of energy minimization principle. Such an inborn integration of declarative semantics and inference method not only supports concise design but also guarantees natural reflection of semantic relevance in inferences. The rest of the paper proceeds as follows. In the next section we outline the combinatorial structure of the constraint. Section 3 provides a declarative semantics for this constraint. The components of the declarative semantics are each formulated in terms of potential energy. 
Section 4 discusses the field of force induced from the potential energy, and analog information processing driven by this field of force. It will be shown that associative inferences naturally emerge out of the dynamics. Section 5 defines a method of symbolic inference which is a sort of program transformation, and derives a control scheme for it on the basis of energy minimization principle. The proposed framework is pointed out to capture the ideas underlying some inference mechanisms tailored so far, such as weighted abduction (Hobbs et al. 1990, Stickel 1989) and marker passing (Charniak 1986, Norvig 1989). Section 6 concludes the paper. 2 A clause is written as a sequence of the included literals followed by a period. The order among literals is not significant. So (1) and (2) represent the same clause, which means (3) in a rough, crisp approximation. (1) -p(U,Y) +q(Z) -U=f(X) -X=Z. (2) +q(X) -p(f(X),Y). (3) VU, x. Y {-'p( U, Y) V q(X) V U =I j(X)} A clause containing a literal with empty sign is called a definition clause of the predicate of that literal. The meaning of such a predicate is defined in terms of completion based on its definition clauses. For instance, if the definition clauses of predicate p are those in (4), then p is defined as in (5). (4) p(X) -q(X,a). p(f(X)) -r(X). (5) VA{p(A)"{:? {3Y(q(A, Y) /\ Y = a) V 3X(A = j(X) /\ r(X))}} A definition clause of a zero-ary predicate true is called a top clause. A top clause corresponds to the query in Prolog. That is, top clause (6) represents top-level hypothesis (7).2 (6) true -p(X) +q(X,Y). (7) :lX, Y {p(X) /\ 'q(X, Y)} We postulate clause +true. to give rise to such a toplevel hypothesis. The computation is to tailor the best hypothesis to explain a top-level one. A constraint is regarded as a network. For instance, the following constraint may be graphically shown as in Figure 1. (i) +true -p(A) -q(B). (ii) +p(X) -r(X,Y) -p(Y). (iii) +r(X,Y) -q(X). Constraint Network A "constraint consists of clauses. A clause is a set of literals, and roughly means their disjunction, which is inclusive or exclusive to various degrees depending of their dynamical properties as discussed later. A literal is an atomic constraint preceded by a sign. An atomic constraint is an atomic j01'mula such as p(X,Y,Z) or an equation such as X= Y. Signs are' +' and '-' and st and for affirmation and negation, respectively. '+' is omitted in cases discussed below. Names beginning with capital letters represent variables, and the other names predicates. 1 1 A binding is also regarded as an atomic formula. For example, X=f(Y) is an atomic formula with binary predicate =f. Figure 1: Constraint Network. 2Theoretically, Prolog uses false instead of true here so that the negation of the top clause amounts to the top-level hypothesis. In our formulation. a top clause itself directly means a top-level hypothesis. 1143 In such a graphical representation, a clause is a closed domain containing the atomic constraints constituting that clause. Short thick arrows indicate references to the atomic constraints as positive literals in clauses. Atomic constraints without such indication are negative literals. An argument of an atomic formula is shown either as a '.' or as an identifier. Equations between arguments are links. Equations in clauses are called intraclausal equations, and those outside of clauses are called extra clausal equations. . We will write a 0 (3 to mean that atomic formulas a and (3 are unifiable. 
We regard each part of constraint network as a set of instances, and a 0 (3 as meaning that whether l(a) n 1((3) = 0 or not is unknown. lis an interpretation function which maps those instances to objects (state of affairs, in the case of atomic formulas) in the world. So unifiability is not transitive. We assume two atomic formulas are unifiable if and only if their corresponding arguments are directly connected through an extraclausal equation, and that every extraclausal equation connects two corresponding arguments of two unifiable atomic formulas. 3 For each zero-ary predicate. the constraint network contains only one atomic formula with it. 3 (8) T {.to log .l:o +T c, log ;l'o} Let us call this the normali::ation energy of o. For any v. v stands for 1 - l'. Here and henceforth. mathematical details are not very important; they are quite tentative indeed. The formulas are mainly motivated by convenience. In fact, the fancy outlook of (8) is for the computational ease of analog inference, though we do not go into details here. Disjunction of literals in a clause. The disjunction energy of a clause implements the ordinary disjunctive meaning of the clause. For instance. consider the following clause. (9) -p +q. The ordinary disjunctive meaning of this clause is that --,p or q is true. The disjunction energy of this clause as follows captures this meaning. (10) D is a positive constant associated with clause (9). (10) is small iff either .rp or ;rq is small; keep in mind that the activation values are between 0 and 1 due to the normalization energy. The semantics of (9) may be depicted like Figure :2. D in (10) represents how large the area b Declarative Semantics Now we move on to dynamics to define a declarative semantics for the constraint network described above. Each atomic constraint a has an activation value Xcn which is a real number such that 0 < Xa < 1 and ma.y be regarded as the truth value (or a subjective probability of the truth) of a. The potential energy of a constraint network is a function of the activation values, and represents the degree of violation of the constraint. The potential energy U of the entire constraint is the sum of the potential energy of the parts of the constraint. The declarative semantics of the entire constraint is decomposed into several aspects. U is a sum of terms each representing one such aspect, so that U captures the whole declarative semantics. Each term of [C is the degree of violation of the aspect of declarative semantics in question. These aspects are enumerated below and each formulated by a term of potential energy. Normalization of activation value. In order to normalize the activation value of an atomic constraint a so that 0 < Xa < 1, let us employ a standard sig1 ( ) and postulate Xa = moid function sg( x) = l+exp x sg( -FaiT) holds at equilibria of force, where Fa stands for the total force to a from outside of a, and T is a positive constant called the temperature. This amounts to assuming the following energy inherent in a. 3There can hence be O(N2) extraclausal equations, for N different atomic formulas sharing the same predicate. So an efficient encoding schema would be necessary to avoid that space complexity. We skip further details of this issue. Figure 2: Venn Diagram for (9). is in comparison with a in this figure. Mutual exclusion of literals in a clause. B:' 'lIlUtual exclusion' we mean that at most one literal may be true ill a cla.use. 
In the case of (9), the mutual exclusion will allow us to abductively assume p vvhen given q. and assume --,q when given --'p. The following term. called the f:.ulu.';ioll flurgyof (9), will take care of such inferen ces. (11) EXp:Tq E is a constant associated with clause (9). In Figure 2. E represents how large the area b is in comparison with c. If q means that you are in Japan, for instance. E is larger when p means that you are in Tokyo than when it means that you are in Imabari, a small city in the island of Sikoku. In the general form, the disjunction energy and the exclusion energy of clause consisting of literals ll' 1m are (12) and (13), respectively. 1144 (13) (12) E L TiYi 1'jYj ii=j Yi is the activation value of Ii. For any atomic constraint 0:, the activation value of literal +0: is defined to be X' tY and that of -0: is defined to be Xa' Ti is a constant such that 0 < Ti ~ 1, and is called the relevance coefficient of Zi' In the digital approximation, (12) means that at least one literal should be true, whereas (13) means that at most one literal may be true. Incidentally, it is due to exclusion energy that top clause (6) means (7). In (9), (10) and (11), II =-p, l2 =+q, Y1 = x p , Y2 = x q , and 1'1 = r2 = l. Completion of an atomic formula. We somewhat extend the notion of completion so that to complete atomic formula (not predicate) 0: positively (negatively) means that 0: (-'0:) should be inferred either ded ucti vely or abductively 4 on the basis other than the one on which 0: (-'0:) was first postulated. For example, if ~e have postulated q(X) (say, based on clause +p(X) -q(X)., abductively) and it is positively completed. then it must be inferred from another reason; typically, another atomic formula q(Y) could be closely related with q(X) (in the sense of assimilation to be discussed later) and is inferred on the basis of a clause such as +q(Y) -r(Y). deductively or -q(Y) +s(Y). abductively. As discussed later, completion implements assumability cost (Hobbs et al. 1990). The positive and negative completion energy of an atomic formula 0: are defined by (14) and (15), respectively. same arguments for the corresponding argument places. By relaxing this, we obtain the notion of assimilation: two unifiable atomic formulas should have similar truth values to the extent that they share the same assignment of the arguments. So for instance p(X) and p(Y) tend to have similar activation values if X and Yare linked with a strongly activated equation. To capture this. we postulate assimilation energy. Suppose 0: 0 d for two atomic formulas 0: and ,13, and let b be the extracla.usal equation connecting their i-th arguments. Then the assimilation energy of b is defined as follows. (16) -Arri C--'+ 'axa II-sapxp aop (15) C;;xa II sal'xp aop C+ and C-: are positive constants, and are called the p:sitive co~pletion coefficient and the negative completion coefficient of 0:, respectively. Sap is a constant called the subsumption coefficient of 0: as to ,13. Sap represents how close 0: is related to 13, as seen also in the formulation of assimilation below. 'We say 0: subsumes d to mean 1(0:) 2 1(,6). When 0: 0 ,13, Sap = 1 if 0: subsumes d. and otherwise Sap. = So for a small positive constant So. 
In the digital approximation, the positive (negative) completion energy means that some ,13 (-,;3 for some 13) satisfying 1600: should be true in order for 0: (-'0:) to be true:5 Since a subsumption coefficient usually equals to so, which is close to 0, completion energy and accordingly other types of energy often decrease if subsumption coefficients increase, which is caused by symbolic operation discussed in the next section. The dynamics for definition clauses may be defined on the basis of exclusion energy and completion energy. but we do not go further into details here. Assimilation between atomic formulas. Two unifiable atomic formulas are the same if they have the 41n this respect, only deduction is considered in Prolog. 5The completion in Prolog corresponds to our positive completion. In Prolog (3 must be deduced only. + S,3c» 1 Xs (xc> - '2)(x p - 1 '2) is a positive constant called the assimilation coefficient of the i-th argument place of the predicate 7r shared A1Ci bv 0: and 3. The assimilation energy roughly means that :1';3 should be similar (both close to 0 or 1) if X8 is close to 1. and vice versa. Transitivity of equality. A transitive cycle is a cycle of equations ~ = bo b1 . . . bk - 1 where either 6(i-1)modk or bimodk is an intraclausal equation for everyi. Note that no cycle of extraclausal equations is a transitive cycle. Transitivity of equality as to 6. is regarded as excluding the cases where just one equation in 6. is false. To capture this, we define the transitivity eneT'9Y Ut::. of 6. as below. x~-y and (17) U t::. (14) (StY!3 = { -tIl(ei - 0) a (ei < 0 ~or at most one i) (otherwIse) is the activation value of bi , and 0 is a constant such that 0 < 0 < 1. t is a positive constant called the transitivity coefficient. Note that the transitivity energy is large when just one equation in 6. has a small activation value. Since detection of cycles is a very costly computation, we will have to consider some approximate method for efficient processing of transitivity energy instead of guaranteeing perfect detection of transitive cycles. We do not go further into such implementation details. ei 4 Analog Inference Potential energy gives rise to a field of force to change the state of the system so as to decrease the total potential energy. Suppose there are n distinct atomic constraints in the given constraint. and hence n activation values, Xl through Xn. Then the current analog state of the system is regarded as a point (18) in the n-dimensional Euclidean space, and the global potential energy C defines a field offorce (19). ( 18) (19) F= ( -:~ au -aXn ) 1145 F causes sp1'eading activation: when F #- 0, a change of so as to reduce U influences the neighboring parts of the constraint network, which causes further changes of activation values there, and thus state transition propagates across the network. In the long run, the assignment of the activation values will settle upon a stable equilibrium satisfying F = 0, under an appropriate scheme of spreading activation. The resulting state gives a minimal value of U. 6 That is, it satisfies the constraint best in some neighborhood. Let us look at some typical patterns of analog inferences emerging from the dynamics through spreading activation. First, the dynamics gives rise to associative inference based on syntactic similarity. Suppose for instance that, as in Figure 3, the extraclausal equation Xi the first sentence. There is an attachment ambiguity in the second sentence. 
about whether the prepositional phrase with it modifies saw or a man. Let us assume that the structure of the constraint generated by processing this discourse looks like Figure 4. Each region a telescope "'. ,,' _-----(a) Tom -___ ~.__::_--~~~~:--~~~~~~-- \" take(,) Figure 4: Semantic Association Concerning (20). Figure 3: Association due to Syntactic Silllilarity. connecting argument A of p(A,B) and argument C of p( C, D) is included in a transitive cycle as shown in the figure, and that the activation value of every equation in this cycle is greater than (). Then {j is excited due to the transitivity energy. This raises the tendency (due to the assimilation energy of (j) for peA, B) and p( C, D) to have similar activation values. Thus the assimilation energy of the extraclausal equation (J between Band D makes (J to have a high activation value, provided that the equations in the transitive cycle involving (J as shown in the figure are all highly activated. So each equation in a transitive cycle including (J could be excited even stronger due to the transitivity energy. This might make other pairs (such as the two q( .,.)s in Figure 3) of atomic formula with corresponding arguments on that transitive cycle have similar activation values, and so on. In general, two syntactically similar combinations of atomic constraints thus tend to have similar activation patterns, corresponding parts exciting each other or inhibiting each other. Transitivity energy also enhances semantic aSSOCIation. Consider the following discourse. (j (20) Tom took a telescope. He saw a man with it. We assume that he and it in the second sentence are anaphoric with Tom and the telescope, respectively, in 6When if is not entirely attributed to potential energy, spreading activation is not guaranteed to converge into a stable equilibrium but may exhibit chaotic behaviors. Such a less restricted system may be more powerful and useful, but that is beyond the scope of the present discussion. in a dashed closed curve represents a cluster of clauses. Thest' clauses have been created by symbolic inference as described in the next section. (a) is a set of clauses including the top clause. (b) and (c) represent two alternative readings of the second sentence of (20), each derived bv backward (abductive) inferences. The take(.,.) in (a) is" a part of the hypothesis obtained by interpreting the first sentence. Its first argument stands for Tom and the second argument the telescope, so that the whole thing means that Tom takes the telescope at some time. Thus, reading (b) means that Tom has the telescope when he sees the man, and (c) that the man has it when Tom sees him. Clause (d) is an inference rule to the effect that if A takes B then A will have B.7 Due to this inference rule, the take(.,.) in (a) can imply the have(.,.) in (b) but not that in (c), so (b) is more plausible than (c). Note that there are two transitive cycles both going through the take(.,.)s in (a) and (d). So these two atomic formulas tend to strongly excite each other due to assimilation energy, provided that every relevant equation is excited. These two cycles also both go through the have(.,.)s in (b) and (d), making them tend to strongly excite each other. too. On the other hand. there is only one transitive cycle which goes through both the have(.,.)s in (c) and (d). Hence the associative inference based on the take(.,.) in (a) through (d) supports the have(.,.) in (b) more strongly than it supports the have(.,.) in (c). 
'We ignore the temporal relation between the taking and the having here. 1146 5 Symbolic Inference We consider just one type of symbolic operation called subsumption. It is a sort of program transformation to create a new subsumption relation. A subsumption operation concerns 'a pair of unifiable atomic formulas. As shown in Figure 5, subsumption operation from atomic Figure 5: Subsumption Operation From Atomic Formula to (3. Q formula Q to (3 divides (3 into (3' and ;3". (3' is the maximum subset of (3 subsumed by Q, and (3/1is the rest of ,8: beta/l = (3 - (3'. Neither Q nor (3' is hence unifiable with (3/1, as indicated in the figure. If it is somehow known that Q subsumes (3 from the beginning, then no copy (division) need to happen. When the division of (3 actually takes place, then it causes a duplication of the clause containing (3, atomic formula ~ accordingly dividing into and Unlike in the division of (3, and e' are unifiable both with each other and with all the atomic formulas unifiable with ~, because there is no reason to believe I(e) n I(C) = 0,and so on. We omit further details of combinatorial aspects of symbolic inference, due to the space limitation, and go on to the dynamical aspect. Subsumption generates new atomic constraints and thus redefines U. Sext3' is set to 1, because Q subsumes (3'. S~'~II and s~lIe are both set to So, because we are not sure about the subsumption relation between these atomic formulas. The other coefficients are simply inherited along with the copy of the part of the constraint network. Since subsumption is a local operation, it may take place in parallel at many different places. Now we consider how to guide such computation based on the dynamics, without recourse to any centralized control. As the preference score for a subsumption, we could use the expected contribution of that subsumption to reduction of U at the equilibrium of spreading activation. As mentioned above, a subsumption from atomic formula Q to (3 divides j3 into (3' and (3/1, setting Sext3' to 1. The expected influence of this to reduction of the total energy could be estimated by - ",EJP , where P is defined to be the e e'.8 e VSo./3 8If a and (3 belonged to the same clause, then a is also divided into a' and a", If a' and (3' belong to one clause and hence a" and .t3" belong to another, then a' and (3" subsume each other and 0"1 and .13" are not unifiable, minimal Co (['0 under the condition F = 0) in a neighborhood of the current x. Uo is a representative part of energy whose definition is not changed due to symbolic operations. For instance, it could be the disjunction energy of clause +true.. At any rate, the symbolic computation is controlled so as to minimize some part of energy, whereas the analog computation to minimize the whole energy. By employing generalized backpropagation (Pineda 1988), vSo./3 ",op can be efficiently computed for all 8 0 3. The space complexity of that computation is linear with regard to the size of the constraint network, and its parallel time complexity practically constant. See APPENDIX for mathematical details. Our method implements some important features of other inference mechanisms proposed elsewhere. First, weighted abduction (Hobbs et al. 1990, Stickel 1989) emerges from our method. In weighted abduction, just as in the current framework. one at tempts to tailor a best hypothesis to explain the observed fact. A hypothesis is a conjunction of (negated) atomic formulas. 
Each conjunct in a hypothesis is assigned an assumability cost, which is a cost of assuming the conjunct. A hypothesis is better when the total assumability cost is smaller. Assumability cost may be reduced by unifying the conjuncts. For instance, if the current hypothesis contains p(A) and p(B), one of which has a large cost, then this cost will be reduced by unifying them. Assumability cost is inherited through abduction. For example, a cost of p(A) in the current hypothesis is inherited down to q(A) and r(A) when p(A) is resolved by the clause +p(X) -q(X) -r(X).

Assumability cost is basically captured by our completion energy: the conjunct in question must be inferred otherwise than the way it was first postulated, or it would be inhibited due to its completion energy. So an inherent cost is encoded by a completion coefficient of atomic formula α. This gives rise to a high preference score for subsumption from α, because if α comes to subsume another atomic formula β then perhaps the completion energy of α is reduced due to s_{αβ} = 1, which will be indicated by a large value of −∂P/∂s_{αβ}. An inherited cost is captured along the same line. For example, when p(A) with a large cost subsumes p(X) in the clause +p(X) -q(X) -r(X), the completion energy of p(A) is probably still large, but it will decrease if q(X) and r(X) get more excited. So the preference scores of subsumptions from q(X) and r(X) tend to be large, corresponding to the inherited cost in weighted abduction.

Our framework is more flexible and dynamic than weighted abduction. That is, we allow inferences concerning a hypothesis to influence the state of other hypotheses, whereas in weighted abduction assumability costs change only due to unification involving the atomic formulas carrying those costs. So our method is more appropriate to account for such phenomena as belief revision. In this connection, our dynamical semantics is much more general than the probabilistic semantics of Charniak and Shimony (1990), which is restricted to propositional Horn logic.

Second, marker passing (Charniak 1986, Norvig 1989) may also be understood as an emergent property of the dynamics, along the same line as above. Consider the following discourse for example.

(21) Taro got a book. He paid one thousand yen.

Figure 6 shows the network involved in the abductive inference to assume that Taro bought the book.

Figure 6: Marker-Passing for (21).

On the left is the marker-passing network encoded9 by the constraint network, which is on the right. A node in the marker-passing network corresponds to an argument or a predicate in our constraint. An edge between an argument node and a predicate node represents that the argument satisfies the predicate, and an edge between two predicate nodes represents a clause referring to the two predicates. The directions of the arrows are static, and irrelevant to the direction of marker passing. get(G1) and pay(P1) are created upon reading/hearing (21), where G1 and P1 stand for the event of Taro's getting a book and that of his paying money, respectively. In marker passing, the abductive inference of Taro's buying the book will be suggested by a collision of markers passed down from G1 and P1 along the path between them. In our framework, the same abductive inference consists of three subsumption operations along the (copy of) extraclausal equations in the right of Figure 6.
The preference scores of these subsumptions are probably all high, because of the path of clauses between get(G1) and pay(P1). If the activation value of get(G1) is larger than ½, then it excites get(E) due to assimilation energy, get(E) excites buy(E) due to exclusion energy, buy(E) excites buy(B) due to assimilation energy, and buy(B) excites pay(P) due to disjunction energy. get(E) is similarly excited indirectly by pay(P1). So get(E), buy(E), buy(B) and pay(P) are excited more strongly than they would be if there were no such path. The subsumptions along the extraclausal equations in the right of Figure 6 are therefore very promising for reduction of positive completion energy, so that the abductive inference mentioned above is suggested. A path of clauses between two very informative atomic formulas (ones with activation values close to 0 or 1) thus tends to raise the preference for subsumptions along it. This is what marker passing is designed to capture in general. Of course, how much the preference for subsumption increases depends on the dynamical properties of the path. For instance, the path in Figure 6 would not indicate the above abductive inference if the exclusion coefficients of the two clauses are small.10 Suggestion of inference also depends on the length of the path. Obviously, shorter paths more readily suggest inferences. Subsumptions can also be promoted by associative inferences discussed in the previous section, because a subsumption between two atomic formulas will strongly affect P when some of the extraclausal equations between them are strongly excited owing to transitive cycles involving them. See Hasida (1991) for how generation of natural language sentences is controlled by heuristics regarded as approximating our control scheme taking such associations into account.

9 Charniak (1986) employs a similar encoding scheme.

6 Concluding Remarks

We have discussed a framework of constraint for designing a cognitive system. To capture the partiality and the corresponding situatedness of cognition, the constraint is situated in a field of force derived from potential energy representing the degree of violation. This field of force gives rise to analog inference as spreading activation, and also controls symbolic computation to transform the constraint. Not only nearly logical inferences and abductive inferences but also associative inferences emerge out of such a dynamics.

A distinguishing feature of our framework is that the control scheme for inference is derived from a dynamics which also provides the declarative semantics. In comparison, other frameworks such as marker passing stipulate the inference control apart from the declarative semantics. The inborn integration of declarative semantics and inference control as in our method will not only provide a clear perspective of the design, but also guarantee emergent reflection of semantic relevance in information processing. In this connection, our method is integrated also in another sense, in that it controls analog and symbolic inferences based on the same dynamics. This is a strong advantage over methods such as that of Waltz and Pollack (1985), which separate the two inference schemes.

The current framework should be extended with respect to several points. First, some partial processing method is necessary for dealing with transitive cycles, although at any rate a massively parallel computational system is essential to implement our theory. Second,
deletion should be incorporated in addition to subsumption, in order to prevent the constraint network from unlimited growth. Probably deletion can be regarded as a reverse of subsumption, and hence the control of deletion may be formulated along the same line as that of subsumption. Third, the control method should take into account consistency checking as well. Consistency checking pertaining to binding is discussed in Hasida (1991).11 In order to handle consistency checking in general, we will have to give preferences not only to subsumptions which seem to decrease P but also to those which seem to increase P. Finally, learning is vitally necessary for both the coefficients (Suttner and Ertel 1990) and the symbolic structure of constraint. Further scrutiny is open with regard to the role of the dynamics in learning.

10 What Charniak (1986) calls an isa-plateau can be understood along the same line.

11 Treatment of binding could probably be ascribed to the general case of consistency checking plus transitivity energy.

Acknowledgments

The author would like to thank DEN Yasuharu, Mark Gawron, Jerry Hobbs, NAGAO Katashi, NAKASHIMA Hideyuki, Manny Rayner, Ivan Sag, TUTIYA Syun, and Remi Zajac for valuable discussions and comments. He is also indebted to ASOH Hideki for drawing his attention to generalized backpropagation. Special thanks go to NISHIKAWA Noriko, NAGATA Kazumi, and MIYATA Takasi for implementation works.

APPENDIX

The equilibrium condition of the spreading activation concerning x is regarded in general as x = ŷ, where ŷ is a vector function of x and S.12 Let s be a parameter in S. When ŷ is differentiable, we get (22), where ∂ŷ/∂x is defined by (23).

(22) $\dfrac{\partial x}{\partial s} = \dfrac{\partial \hat y}{\partial x}\,\dfrac{\partial x}{\partial s} + \dfrac{\partial \hat y}{\partial s}$

(23) $\dfrac{\partial \hat y}{\partial x} = \begin{pmatrix} \partial \hat y_1/\partial x_1 & \cdots & \partial \hat y_1/\partial x_n \\ \vdots & & \vdots \\ \partial \hat y_n/\partial x_1 & \cdots & \partial \hat y_n/\partial x_n \end{pmatrix}$

(24) follows from (22), where I is the n-dimensional unit matrix.

(24) $\dfrac{\partial x}{\partial s} = \left(I - \dfrac{\partial \hat y}{\partial x}\right)^{-1} \dfrac{\partial \hat y}{\partial s}$

Let H be a scalar function of x and s, and P be a scalar function of s such that P = H when x = ŷ. Where H is differentiable, we get the following.

(25) $\dfrac{\partial P}{\partial s} = \dfrac{\partial H}{\partial x}\,\dfrac{\partial x}{\partial s} + \dfrac{\partial H}{\partial s} = \dfrac{\partial H}{\partial x}\left(I - \dfrac{\partial \hat y}{\partial x}\right)^{-1}\dfrac{\partial \hat y}{\partial s} + \dfrac{\partial H}{\partial s}$

Here z is defined by (26), from which we obtain (27).

(26) $z = \dfrac{\partial H}{\partial x}\left(I - \dfrac{\partial \hat y}{\partial x}\right)^{-1}$

(27) $z = z\,\dfrac{\partial \hat y}{\partial x} + \dfrac{\partial H}{\partial x}$

Thus, z is computed via spreading activation based on (27). So as a whole we are to do double-layered spreading activation, the first layer for x and the next for z. We omit mathematical discussions on the convergence of spreading activation. Finally, ∂P/∂s can be obtained from (25). We have avoided calculating ∂x/∂s, which would be a very complex computation. Note that ∂H/∂x_i is not zero for most x_i, whereas ∂ŷ/∂s and ∂H/∂s are sparse.

12 In the current formulation, $\hat y_i = 1/(1 + \exp(Y_i))$, where $Y_i$ is a polynomial not involving $x_i$.
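In present-day terms, the double-layered computation admits a compact sketch. The following numpy illustration uses an invented two-unit network, a single coefficient s, and a placeholder energy H rather than the constraint energies defined in the paper; the first relaxation finds the equilibrium x = ŷ(x, s), the second relaxes z as in (27), and ∂P/∂s is then read off from (25).

```python
import numpy as np

def equilibrium(y_hat, x0, s, iters=200):
    """First layer: relax x until x = y_hat(x, s)."""
    x = x0
    for _ in range(iters):
        x = y_hat(x, s)
    return x

def dP_ds(x, s, dy_dx, dy_ds, dH_dx, dH_ds, iters=200):
    """Second layer: relax z = z J + dH/dx (eq. 27), then evaluate eq. (25)."""
    J = dy_dx(x, s)            # n x n Jacobian of y_hat with respect to x
    g = dH_dx(x, s)            # row vector dH/dx at the equilibrium
    z = np.zeros_like(g)
    for _ in range(iters):     # converges when the spectral radius of J is below 1
        z = z @ J + g
    return z @ dy_ds(x, s) + dH_ds(x, s)

# Invented toy network with two units and one coefficient s:
def y_hat(x, s):               # y_i = 1 / (1 + exp(Y_i)), Y_i a polynomial not involving x_i
    Y = np.array([s * x[1] - 1.0, 0.5 * x[0]])
    return 1.0 / (1.0 + np.exp(Y))

def dy_dx(x, s):
    y = y_hat(x, s)
    d = -y * (1.0 - y)         # derivative of the squashing function with respect to Y_i
    return np.array([[0.0, d[0] * s], [d[1] * 0.5, 0.0]])

def dy_ds(x, s):
    y = y_hat(x, s)
    return np.array([-y[0] * (1.0 - y[0]) * x[1], 0.0])

H = lambda x, s: 0.5 * np.sum(x ** 2)   # placeholder energy, not a constraint energy
dH_dx = lambda x, s: x
dH_ds = lambda x, s: 0.0

x_eq = equilibrium(y_hat, np.full(2, 0.5), s=1.0)
print(dP_ds(x_eq, 1.0, dy_dx, dy_ds, dH_dx, dH_ds))
```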
References

Anderson, J. R. (1983). The Architecture of Cognition. Harvard University Press, Cambridge.

Charniak, E. (1986). A neat theory of marker passing. In Proceedings of AAAI '86, pp. 584-588.

Charniak, E. and Shimony, S. E. (1990). Probabilistic semantics for cost based abduction. In Proceedings of AAAI '90, pp. 106-111.

Hasida, K. (1991). Common heuristics for parsing, generation, and whatever... In Proceedings of the Workshop on Reversible Grammar in Natural Language Processing, Berkeley.

Hobbs, J., Stickel, M., Appelt, D., and Martin, P. (1990). Interpretation as abduction. Technical Note 499, SRI International.

Norvig, P. (1989). Marker passing as a weak method for text inferencing. Cognitive Science, 13, 569-620.

Pineda, F. J. (1988). Generalization of backpropagation to recurrent and higher order neural networks, pp. 602-611.

Stickel, M. E. (1989). Rationale and methods for abductive reasoning in natural-language interpretation. In Studer, R. (Ed.), Proceedings, Natural Language and Logic, International Scientific Symposium, No. 459 in Lecture Notes in Artificial Intelligence, pp. 233-252. Springer Verlag.

Suttner, C. B. and Ertel, W. (1990). Automatic acquisition of search guiding heuristics. In Proceedings of the 10th International Conference on Automated Deduction (CADE), pp. 470-484.

Waltz, D. and Pollack, J. (1985). Massively parallel parsing: A strongly interactive model of natural language interpretation. Cognitive Science, 9, 51-74.

Mental Ergonomics As Basis For New-Generation Computer Systems

M.H. van Emden
Logic Programming Laboratory, Department of Computer Science
University of Victoria, Victoria, B.C., Canada V8W 3P6
vanemden@csr.uvic.ca

Abstract

Reliance on Artificial Intelligence suggests that Fifth Generation Computer Systems were intended as a substitute for thought. The more feasible and useful objective of a computer as an aid to thought suggests mental ergonomics rather than Artificial Intelligence as the basis for new-generation computer systems. This objective, together with considerations of software technology, suggests logic programming as a unifying principle for a computer aid to thought.

1 Introduction

When surveying the literature on computing, it is remarkably difficult to find work directly aimed at making computers usable as a tool for thought. Even when we go to publications specialized in Artificial Intelligence, we find mostly work aiming at simulating or automating human intellectual functions, but very little on how to use computers to augment the intellect in the way envisaged by pioneers such as Licklider, Engelbart, Taylor, and Kay. Until recently it was understandable that the goal of augmenting the intellect had to be deferred, and that top priority had to be given to the development of hardware and systems software that provided a functional basis on which to proceed towards the main goal. I believe that this basis now exists and that therefore the top priority in computing should be to use the existing machinery to make computers available as tools for thought. It seems, however, that at present it is still top priority to make computers faster, bigger, and cheaper. This can only be explained as a form of inertia: we feel comfortable in an enterprise blessed with past and continuing success and it is painful to change emphasis, even if it is towards what is now really important.

Let us then take stock and see what progress has been made towards providing the basis for the goal of making a computer into a tool for thought. The hardware dreamed of by the pioneers has arrived: fast processors, large memories, sophisticated and comfortable displays, high-performance networks; all of this available in thousands of enterprises and institutions. Still, we do not use computers as tools for thought in the way the pioneers envisaged. Were they unrealistic in their expectations? Or is it the case that the remaining obstacles can be overcome? I believe that the latter is the case, and that the remaining barrier is the difficulty of using software.
What is from the larger viewpoint but a tool among many, such as, for example, a database program, is so complex that it comes with a fat manual and a programming language of its own, so that to become an effective user is almost a career in itself. This is but one example, of many, of a wider phenomenon I refer to as concept fragmentation: that each job seems to require its own special-purpose solution; even worse, that the same job in a different context requires a different solution. This, not hardware, is now the main barrier standing in the way of using computers to augment the intellect. In this paper I will argue that the best bet for a unifying principle to overcome this barrier is logic programming. As Artificial Intelligence often comes up in discussions about how to make computers easier to use, it is important to distinguish the roles to be played by Artificial Intelligence and Mental Ergonomics.1 For this reason, I sketch the argument from scratch: why there is reason to believe that computers can be tools for thought, before going on to explain in what way logic programming, as ergonomic principle, can help demolish the main barrier now holding up progress.

1 Webster's Third International Dictionary: ergonomics: an applied science concerned with the characteristics of people that need to be considered in designing and arranging things that they use in order that people and things interact most effectively and safely.

2 A synopsis of the argument

This section is what is sometimes called an "executive summary." Each paragraph summarizes one of the following sections of the paper.

Towards Computer-Aided Thought. Writing is paper-aided thought: while we can do simple sums in our head, we need help for more complex ones; help offered traditionally by writing on paper. Similarly, we can do simple thoughts in our head; to work out complex thoughts, such as plans, proposals, essays, reviews, critiques, we need writing. Computers are now widely used as a more convenient writing tool than a pen on paper. The availability of programs for spreadsheets, databases, and communications provides a tantalizing glimpse of a more powerful aid to thought than the pen on paper ever was. Such a potent new mixture deserves a name. I chose one suggested by the familiar concepts of Computer-Aided Instruction (CAI) and Computer-Aided Design (CAD). I call it Computer-Aided Thought, CAT for short. Today's laptop computers already pack the hardware required to support a powerful CAT system. Thus you will be able to take it wherever you go, like a Sony Walkman. Let us call such a package "CATMAN." And hardware trends suggest that CATMAN will be widely affordable, giving unprecedented power to intellectual workers of all ages: school children as well as professionals, business persons, and scientists.

Why Computer-Aided Thought is an underdeveloped area. In spite of spectacular advances in computing, both in large systems and in personal computers, no one, not even the most privileged researcher, has a computer available as a congenial tool for intellectual work. At best she2 can call on a hodge-podge of language processors, databases and application packages requiring a bevy of system gurus at her beck and call if she is to avoid devoting her career to mastering the mechanics of the various systems.

Improvement is a matter of ergonomics, not AI.
A congenial tool for intellectual work needs to handle a variety of tasks, including database work, text processing, communication, constraint exploration, and developing algorithms. This diversity of tasks has caused a proliferation of languages, so that logically identical tasks need to be done in an exasperatingly different way, just because they occur in a different application. The problem is one of ergonomics (in this case Mental Ergonomics): the lack of a unifying concept makes current program interfaces conceptually fragmented. This is where we should look in the first place for help, rather than to Artificial Intelligence.

Logic programming meets the requirements of Mental Ergonomics. I mention some ergonomic principles that help to make computers easier to use and review results showing how such principles can be implemented by means of logic programming.

2 Or "he." Here, as elsewhere in my writings, no gender is to be inferred when none is implied by the context.

3 Towards Computer-Aided Thought

What is "thought"; what is "intellect"? Why do I consider writing "paper-aided thought"? The analogy about complex thought spilling over to paper, just as complex sums do, is due to Susan Horton [1982], who took as starting point the familiar phenomenon that we don't sit down to write an essay with its main line of reasoning ready in our head. Instead, we only discover what we want to say as a result of initially unsuccessful and often frustrating attempts to write down inchoate, preliminary versions. In this way, Horton concludes, writing allows us to have thoughts too hard to do in our head. Another way of expressing Horton's idea is to say that writing "augments the intellect."

In 1963 Douglas C. Engelbart published the paper "A conceptual framework for the augmentation of man's intellect" [Engelbart 1963]. Its first paragraph, which I quote in full, contains a better description of "aids to thought" or "augmentation of intellect" than I can give.

By "augmenting man's intellect" we mean increasing the capability of a man to approach a complex problem situation, gain comprehension to suit his particular needs, and to derive solutions to problems. Increased capability in this respect is taken to mean a mixture of the following: that comprehension can be gained more quickly; that better comprehension can be gained; that a useful degree of comprehension can be gained where previously the situation was too complex; that solutions can be produced more quickly; that better solutions can be produced; that solutions can be found where previously the human could find none. And by "complex situations" we include the professional problems of diplomats, executives, social scientists, life scientists, physical scientists, attorneys, designers - whether the problem situation exists for twenty minutes or twenty years. We do not speak of isolated clever tricks that help in particular situations. We refer to a way of life in an integrated domain where hunches, cut-and-try, intangibles, and the human "feel for a situation" usefully coexist with powerful concepts, streamlined terminology and notation, sophisticated methods, and high-powered electronic aids.

Conventional computer applications are much further developed in the area of what Engelbart calls "powerful concepts, streamlined terminology and notation, sophisticated methods." I regard the area of "hunches, cut-and-try, intangibles and 'human feel for the situation'" as the one where writing helps as an aid to thought in the sense of Horton.
Engelbart's vision will be realized when a computer can be used as a congenial tool for writing, giving fluent access to spreadsheets, databases (local as well as remote), numerical and statistical libraries and so on. Other prophetic early papers that improved upon much later thinking are by Bush [1945] and Licklider [1960]. For an overview of Engelbart's subsequent work in the Augmented Knowledge Workshop, see [Engelbart 1988].

4 Why Computer-Aided Thought is an underdeveloped area

A present-day personal computer can provide a word processor, a spelling checker and a thesaurus. This combination is a powerful advance over pen and paper, and therefore qualifies as a correspondingly powerful aid to thought. Personal computers can also run packages for databases, spreadsheets, and computer mail. But although these are potentially valuable extensions, the resulting combination is not easy enough to use to qualify as a computer tool for thought. The existence of the components conjures up a tantalizing vision of such a tool, but the reality of ergonomics turns the vision into a mirage. To appraise the situation, let us consider what personal computers have given us, and what's lacking.

What we do have. The amazing thing about personal computers is that they have caused such a large step in the direction of a computer tool for thought. The early computers were operated in a closed shop, to which users submitted their jobs, which were collected in batches and run. Timesharing brought a dramatic change: from a turnaround time of hours to instantaneous interaction. Effectively, the timesharing user has the machine to himself. And even in the sixties, these machines were not small compared to contemporary personal computers. So it was not obvious that a dramatic change would result from the next step, the introduction of the personal computer. Yet they made an enormous difference and that was because of their low cost. This has two effects:

1. Small firms and individual software designers can afford machines of their own.

2. The potential financial rewards in the software market became much greater.

The result should convince sceptics of the power of a sufficiently free market: it resulted in an unprecedented improvement in user interfaces. This is the more amazing when we realize that since the early sixties timesharing computers have been used for word processing. These installations commanded the best in programming talent and were largely devoted to research. Yet nothing was produced that can compare with the better word processing packages that appeared on the market soon after personal computers took off. For most of the 1980's, Unix-based workstations, with more powerful hardware than personal computers, had word processors worse than those on personal computers. Spreadsheets are an even more striking example. This type of software, now considered obvious and indispensable, was not even known before the advent of personal computers.

Yet, even this progress still falls far short of what is necessary to make a personal computer a congenial tool for thought. Even a loaded PC does not come close. Progress has been in the application packages separately, not in ways to integrate. Consider, as an example of the need for integration, an engineer in his daily activities. He makes calculations, searches tables, standards, textbooks, draft reports, receives and sends mail, retrieves and studies drawings and textual library material, accesses databases (local and remote) and so on.
In all these activities separately, computer programs exist that can help. The rapidly falling cost of hardware makes these programs potentially accessible to every engineer. But this is a mixed blessing: if he is to utilize the full potential of all available computer tools, he needs several specialists in attendance, to be available at a few seconds' notice. Even then he will not be able to use the computer as a truly congenial tool: that is only possible when no intermediary is needed. At present he needs an intermediary for many applications because of the complex and idiosyncratic interface provided by the required software. And it does not help that every application package comes with its own, unique programming language, so that two logically identical jobs in different applications need to be done in an exasperatingly different way. Of course the situation sketched here is not unique to engineers, but is shared by professionals in public or business administration and in scientific research.

What's lacking? A plan for improvement has to be based on a diagnosis of what is wrong. One common diagnosis can be summarized as: Computers are difficult to use because they are not enough like humans. To make progress we must make them more like humans. Therefore, before we work directly towards a congenial tool for mental work, we need progress in Artificial Intelligence. But another diagnosis is possible: Computers are difficult to use because they are not enough like automobiles, that is, they are not a tool that one can easily learn to use as an extension of oneself. To make progress towards a congenial tool for mental work, we need to work on the ergonomics of interfaces to software.

The Japanese Fifth Generation Computer System project [Moto-Oka 1982] is based on the first diagnosis. I will argue that to make most rapid progress towards CATMAN we must work on ergonomics rather than on Artificial Intelligence. Moreover, that via ergonomics progress is predictable and will be rapid, as it will be a matter of elaborating existing software technology. In comparison, progress in Artificial Intelligence seems unpredictable: the required results may indeed be around the corner, or it may be a long time before they materialize (if at all).

5 Improvement is a matter of ergonomics, not AI

We saw that what stands in the way of CATMAN can be diagnosed as either a problem in Artificial Intelligence or as a problem in mental ergonomics. There are two episodes from the past that should help in deciding which diagnosis is more fruitful.

In the late 1940's influential administrators perceived an acute shortage of experts available to translate Russian scientific publications into English. In that period there existed considerable optimism about the feasibility of fully automatic high-quality translation, resulting in several well-funded research projects. Lack of progress in the fifties, combined with devastating criticism [Bar-Hillel 1964] of the scientific basis of machine translation, caused funding to be withdrawn. Let us consider two possible reactions to this failure to get computers to alleviate the shortage of translators.

Reaction 1: The funds should have been spent on Artificial Intelligence. The consensus that emerged in the fifties and caused the demise of projects aiming at fully automatic high-quality translation was that the text to be translated had to be understood, at least to a certain extent, by the translating agent, human or machine.
Machine translation was therefore seen as a problem in Artificial Intelligence. Getting a computer to help in translation was therefore premature - progress in Artificial Intelligence was needed first.

Reaction 2: The funds should have been spent on ergonomics. In the fifties, when the objective was to use computers to help alleviate the shortage in translators, the technology available to translators consisted of a typewriter (electro-mechanical at best) and some well-thumbed reference books. In the early eighties, after machine translation had long been forgotten, and in response to different pressures, there evolved a set of computer tools that have enormously increased the productivity of translators: word processors, spelling checkers, thesauruses, dictionaries, checkers of style and diction. Such software could have been built soon after 1960 when the first time-sharing systems became available. Thus in 1960, when it was clear that the approach to machine translation taken in the fifties was doomed to failure, it would have been possible to go on to achieve great increases in productivity at low cost. Instead, it was concluded that the least tractable stage of translation was to remain the one to receive top priority and that research in Artificial Intelligence was to be motivated in part by the desirability of using computers to increase the productivity of translators.

A lesson to be learned from this episode. There is a similarity between the situation now, in which we suspect that computers can do more to help mental work than is actually the case, and the situation in the fifties when it was hoped that computers could help in translation. In the case of translation, the least tractable aspect of the work was selected, leading to Artificial Intelligence. In retrospect, more tractable, even mundane, aspects (namely ergonomics) could have been selected with great success, not only to increase the productivity of translators, but of other office workers as well. Similarly, when considering how to make a computer into a congenial tool for mental work, there seems to be a great temptation to fall into the same trap: to view Artificial Intelligence as a panacea.

To end this section on a positive note, I will conclude with an episode from the past where the right alternative was selected. There was a time when automobiles were difficult to use, for several reasons: for example, because of frequent need for tuning, maintenance, and repair. At that time, a Plutocrat requiring transportation solved this problem by retaining a chauffeur and a mechanic (ideally, but not necessarily, the same person). When considering obstacles preventing more widely available transportation by automobile, the following diagnoses are possible:

1. build robot chauffeur-mechanics

2. make automobiles easier to use, so that the chauffeuring can be done by the person to be transported and so that only an occasional visit to a garage is required for tuning, maintenance, and repair.
It is bad ergonomics to define a programming language in such a way that the declarative and the imperative aspects of programming are not easy to separate. Rather than to attempt to define these aspects, I will illustrate them by the example of computing an arithmetic expression using register-to- register machine operations. Two tasks have to be distinguished here: • To make sure that the correct expression is evaluated ( what is computed; this is the declaraiive aspect of programming). What the correct expression is, is only determined by the application, independently of the machine on which the computation is to be performed. Thus, this task can also be thought of as that of solving the application problem. • To determine the sequence of register operations and transfers required to get the correct value in the desired location (how it is computed; this is the imperative aspect of programming). This task contains the above task. What belongs to this task over and above the application problem is to control the machine. To get this additional aspect right is to solve the control problem. The example of arithmetical expressions is useful because every programming language allows these to be evaluated without having to solve a control problem. Thus, every programmer, at the level of assembler language and up, is familiar with declarative programming. The problem with conventional languages is that this type of programming is only possible with arithmetic expressions, on which the programmer spends only a small proportion of his time. Most of the work requires areas of the language where the application and control aspects 31n its original 1982 version [Moto-Oka 1982]. What the project has actually done since then is more sensible. In fact, they have been a prolific contributor to logic programming, my proposed technical basis for CATMAN. But the preoccupation with parallelism and with big machines remains, and this can only be traced back to the initially intended role of Artificial Intelligence. of the task are intimately intertwined. As a result, it is possible for an error in control to cause a wrong answer. As a result, a programmer in such a language is forced to try to do two things at the same time. Logic as a programming language allows a decomposition of an algorithm into what Kowalski [1979] calls its logic component (corresponding to the declarative aspect) and its control component (corresponding to the imperati ve aspect). A consequence of Kowalski's approach is that the declarative and imperative aspects are separated, so that an error in control cannot cause an erroneous answer to appear; at worst it will cause failure to find an answer. The advantage is that there is no need to solve the application and control problems at the same time. Allow the user to do the same operation in the same way (if desired). In the existing personal computer systems, the closest approximations to CATMAN require the use of separate programming languages for databases, spreadsheets, intensive numerical computation, system programming, document preparation, the shell, and perhaps even other ones. This means that the same operation (such as procedure and data declaration, procedure call, case selection, iteration, and so on) has to be done in a different way in each of these different languages, violating a principle of ergonomics. It has been shown that logic programming can be the basis of many different types of programming language: functional [Cheng et al. 
Allow the user to do the same operation in the same way (if desired). In the existing personal computer systems, the closest approximations to CATMAN require the use of separate programming languages for databases, spreadsheets, intensive numerical computation, system programming, document preparation, the shell, and perhaps even other ones. This means that the same operation (such as procedure and data declaration, procedure call, case selection, iteration, and so on) has to be done in a different way in each of these different languages, violating a principle of ergonomics. It has been shown that logic programming can be the basis of many different types of programming language: functional [Cheng et al. 1990], imperative [van Emden 1976, Rosenblueth 1989], object-oriented [Davison 1988], stream-oriented [Taylor 1989], as well as for database querying [Ceri et al. 1990]. Of course, within this framework there are still many opportunities for violating the ergonomic principle by undue proliferation of variety. But by using logic as a common framework for whatever different languages are needed, improvement is made easier.

Exploit useful conservatism. A conceptual interface represents a beneficial kind of conservatism: it is an interface modelled on a familiar concept so that the known operations on the concept can serve as a model for the computer operations that need to be learned. The prototypical example of a conceptual interface is the WYSIWYG editor, where the familiar concept is a sheet of paper. Examples of conceptual interfaces that fit well in logic programming are: the conversational partner, the pocket calculator, spreadsheets and tables, and the filling in of blanks. I elaborate on these below.

Lack of a conversational partner as conceptual interface can lead to the kind of frustration eloquently voiced by John McCarthy, who complained4 that even to get a computer to acquire a simple symbol manipulation skill is like having to perform brain surgery. He explained that the goal of his research is to program computers in such a way that one just needs to tell them. That is, to model the interface on that of a conversational partner. This ideal has been realized to a certain extent by MYCIN [Shortliffe 1976], an early expert system. The user interacts with it in the following way. If the user lacks information, he asks a question to which the machine may respond with an answer. If the user knows something that the computer doesn't, he tells a fact, or a rule. If the user is puzzled by an answer, then he can request an explanation, which comes in the form of facts and rules chaining the facts to the answer. Sergot [1982] and Shapiro [1983] have shown that this conceptual interface finds a natural home in logic programming. They added to the initial version embodied in MYCIN the possibility of making the computer and user play symmetrical roles.

4 In debate with Sir James Lighthill on BBC TV in 1974.

The programming language LISP is an example of a conceptual interface, albeit in an inverted way. The basic interaction mode in LISP does not need to be learned because it is the same as that of a pocket calculator: enter an expression to be evaluated, get in return its value. The curious inversion lies in the fact that the familiar concept, the pocket calculator, is of more recent origin than the beneficiary of the conceptual model, namely LISP itself.

In the early days, computers were used in a rigidly planned way. With the advent of time-sharing, users were given the illusion of having a machine of their own, allowing in principle an intimate, interactive, and spontaneous use. Software to exploit this possibility was slow in coming: only with the advent of programs modelled on the spreadsheet as conceptual interface for personal computers has this mode of use been convincingly demonstrated. A similar, but significantly different, interface is that of a table, where rows and columns play different roles. Both interfaces have been shown to be compatible with logic programming [van Emden et al. 1986, Cheng et al. 1988].

Filling in the blanks of a form is a useful, though not widely loved, conceptual interface. It has been exploited in Query-By-Example [Zloof 1977] to provide one of the more congenial query languages for databases. The queries of the logic programming language Prolog are similar [van Emden 1977, Kowalski 1979].

Avoid harmful conservatism. Exploiting a conceptual interface is a useful form of conservatism. Insisting that only English is fit for humans to communicate with our tools is not. The optimism about the utility of natural language for a user-computer interface is based in no small degree on the work of T. Winograd [1972], who himself, however, subsequently made the following observation [Winograd and Flores 1987]:
It has been exploited in Query-By-Example [Zloof 1977] to provide one of the more congenial query languages for databases. The queries of the logic programming language Prolog are similar [van Emden 1977, Kowalski 1979]. Avoid harmful conservatism. Exploiting a conceptual interface is a useful form of conservatism. Insisting that only English is fit for humans to communicate with our tools is not. The optimism about the utility of natural language for a user-computer interface is based in no small degree on the work of T. Winograd [1972], who himself, however, subsequently made the following observation [Winograd and Flores 1987]: Th~ practicality of limited natural language systems is still an open question. Since the nature of the queries is limited by the formal structure of the data base, it may well be more efficient for a person to learn a specialized formal language designed for that purpose, rather than learning through experience just which English sentences will and will not be handled. When interacting in natural language it is easy to fall into assuming that the range of sentences that can be appropriately processed will approximate what would be understood by a human being with a similar collection of data. Since that is not true, the user ends up adapting to a collection of idioms - fixed patterns that experience has shown will work. Once the advantage of flexibility has been removed, it is not clear that the additional costs of natural language (verbosity, redundancy, ambiguity, etc.) are worth paying in place of a more streamlined formal system. An interface where the user is confronted with seemingly random breakdowns and has to guess at what will work and what won't, is frustrating and inefficient bad ergonomics. A special-purpose notation can not only be a convenience, but even a genuine augmentation of the intellect. Such a notation should be seen as evolution of language, helping further development of the intellect. Such coevolution of language and intellect should be allowed to continue in the computer age and should not be stifled by doctrinaire insistence that only English is fit for humans. "Natural, easy-to-use" interfaces are to be approached warily when they are slower in use than other interfaces. Windows and a mouse can be extremely enticing when a novice finds that already after the first half hour he can get simple jobs done on a computer. But an interface that takes ten times as long to learn and allows the user to work twice as fast is worth the extra trouble after four and a half hours of use. As most users spend over a hundred, or even over a thousand hours with a computer every year, it is clear that preference for the "natural and easy-to-use" can be a form of harmful conservatism. 1 Concluding remarks If a moratorium on hardware improvement were to go into effect today, it would take decades before software caught up far enough to exploit hardware to a reasonable extent. Such a degree of exploitation includes the use of a computer as a congenial tool for thought, and deserves to be primary focus of computer science. To exploit the potential of computers as tools to augment the intellect, the Fifth-Generation Computer Systems Project has relied on expected advances in Artificial 1155 Intelligence. Experience in attempts to use computers to increase productivity in translation between texts in natural language suggests that more mundane approaches, summarized under Mental Ergonomics, are more effective. 
I have argued against the use of Artificial Intelligence and of natural language processing by computer. Lest I be misunderstood, let me emphasize that this concerned the particular applications addressed in this paper. Artificial Intelligence as cognitive science is as interesting and important as particle physics and cosmology. Natural-language processing by computer has by now reached the stage where it can be a valuable aid to human translators, and to authors more generally. This is consistent with the reservations quoted from Winograd.

Acknowledgments

Generous support was provided by the British Columbia Advanced Systems Institute, the Canada Natural Science and Engineering Research Council, the Canadian Institute for Advanced Research, the Institute of Robotics and Intelligent Systems, and the Laboratory for Automation, Communication and Information Systems Research.

References

[Bar-Hillel 1964] Y. Bar-Hillel. Language and Information. Addison-Wesley, 1964.

[Bush 1945] Vannevar Bush. As we may think. In Adele Goldberg, editor, A History of Personal Workstations, pages 237-247. Addison-Wesley, 1988. First published in Atlantic Monthly, July 1945.

[Ceri et al. 1990] S. Ceri, G. Gottlob, and L. Tanca. Logic Programming and Databases. Springer, 1990.

[Cheng et al. 1988] M.H.M. Cheng, M.H. van Emden, and J.H.M. Lee. Tables as a user interface for logic programs. In Proceedings of the International Conference on Fifth Generation Computer Systems 1988, pages 784-791, Tokyo, Japan, November-December 1988. Ohmsha, Ltd.

[Cheng et al. 1990] M.H.M. Cheng, M.H. van Emden, and B.E. Richards. On Warren's method for functional programming in logic. In David H.D. Warren and Peter Szeredi, editors, Logic Programming: Proceedings of the Seventh International Conference, pages 546-560, Jerusalem, 1990. MIT Press.

[Davison 1988] A. Davison. Polka: a Parlog object-oriented language. Technical report, Department of Computing, Imperial College of Science and Technology, University of London, 1988.

[Engelbart 1963] D.C. Engelbart. A conceptual framework for the augmentation of man's intellect. In P.W. Howerton and D.C. Weeks, editors, Vistas in Information Handling, pages 1-29. Spartan Books, 1963.

[Engelbart 1988] Douglas Engelbart. The augmented knowledge workshop. In Adele Goldberg, editor, A History of Personal Workstations, pages 187-232. Addison-Wesley, 1988.

[Horton 1982] S. Horton. Thinking Through Writing. Johns Hopkins University Press, 1982.

[Kowalski 1979a] R.A. Kowalski. Algorithm = Logic + Control. Communications of the ACM, 22:424-436, 1979.

[Kowalski 1979] R.A. Kowalski. Logic for Problem-Solving. Elsevier North-Holland, 1979.

[Licklider 1960] J.C.R. Licklider. Man-computer symbiosis. In Adele Goldberg, editor, A History of Personal Workstations, pages 131-140. Addison-Wesley, 1988. First published in IRE Transactions on Human Factors in Electronics, March 1960, pp. 4-11.

[Moto-Oka 1982] T. Moto-Oka, editor. Fifth-Generation Computer Systems. North-Holland, 1982.

[Rosenblueth 1989] D. Rosenblueth. Exploiting Determinism in Logic Programming. PhD thesis, University of Victoria, 1989.

[Sergot 1982] M. Sergot. A Query-The-User facility for logic programming. In P. Degano and E. Sandewall, editors, Proceedings European Conference on Integrated Interactive Computing Systems. North Holland, 1982.

[Shapiro 1983] Ehud Shapiro. Algorithmic Program Debugging. MIT Press, 1983.

[Shortliffe 1976] E.H. Shortliffe. Computer-Based Medical Consultations: MYCIN. Elsevier, 1976.

[Taylor 1989] Stephen Taylor. Parallel Logic Programming Techniques. Prentice Hall, 1989.
[van Emden 1977] M.H. van Emden. Computation and deductive information retrieval. In E. Neuhold, editor, Formal Description of Programming Concepts, pages 421-440. North Holland, 1977.

[van Emden 1976] M.H. van Emden. A proposal for an imperative complement to Prolog. Technical Report CS-76-39, University of Waterloo, 1976.

[van Emden et al. 1986] M.H. van Emden, M. Ohki, and A. Takeuchi. Spreadsheets with incremental queries as a user interface for logic programs. New Generation Computing, 4:287-304, 1986.

[Winograd 1972] T. Winograd. Understanding Natural Language. Edinburgh University Press, 1972.

[Winograd and Flores 1987] T. Winograd and F. Flores. Understanding Computers and Cognition. Addison-Wesley, 1987.

[Zloof 1977] M. Zloof. Query-By-Example: a database language. IBM Systems Journal, 16:324-343, 1977.

An Integrated Knowledge Support System

B R Gaines, M Linster† and M L G Shaw
Knowledge Science Institute, University of Calgary, Calgary, Alberta, Canada T2N 1N4
†Expert Systems Research Group, GMD, PO Box 1240, D-5205 St. Augustin 1, Germany

Abstract

The overall aim of the studies reported is to explore the possibility of creating highly integrated knowledge support systems through the loose coupling of software tools developed independently at unrelated sites for different purposes. Three tools were integrated to form a combined knowledge acquisition and performance system. A hypermedia tool was used as a general-purpose knowledge acquisition tool for unstructured material in the form of text and diagrams. A knowledge acquisition tool was used to elicit knowledge more formally and structure it as a computable knowledge base. A knowledge representation and deduction tool was used to represent the elicited knowledge and perform inferences with it to generate advice in a performance situation. Strong synergy was created between the tools in that the annotation and explanation captured in the hypermedium system were available as context-sensitive help to the user of the expert system, and the expert system was used to validate the knowledge base generated by the knowledge acquisition tool and feed back anomalous cases as additional data for induction.

1 Introduction

This paper reports the results of a collaborative program of research between the Canadian Knowledge Science Institute (KSI) and the German National Research Centre for Computer Science (GMD) on the integration of knowledge acquisition and performance systems. Researchers at the GMD have developed a knowledge-based systems shell, BABYLON, which combines object-oriented knowledge representation with a number of powerful inference paradigms (Christaller, di Primio & Voss, 1989). Researchers at the KSI have developed a knowledge acquisition system, KSSO, which combines object-oriented knowledge representation with a number of powerful elicitation, visualization and induction paradigms (Shaw & Gaines, 1987). The Canadian researchers have also developed inter-program communication protocols enabling Apple's HyperCard to be used as an extension to KSSO for the textual and graphic annotation of knowledge structures (Gaines, Rappaport & Shaw, 1989).
The outcome of the collaborative program is a system, HyperKSE, combining hypermedia, knowledge acquisition and performance tools, to provide an environment supporting knowledge-based system development from acquisition, through application to maintenance. The approach we have adopted to system implementation is the heterogeneous integration of existing tools, involving substantial redevelopment but not the design and implementation of a monolithic system. This is an important issue in its own right and our logic for heterogeneous integration is manifold (Gaines, 1990a); notably: the rapid pace of change of all the base technologies making systems obsolescent as they are implemented; the need to incorporate subsystems optimized for particular roles, probably by different organizations; and the overarching requirement for continuous upgrading and enhancement without a total system rebuild. All our current system development is based on highly modular systems communicating through servers and implemented as class libraries in object-oriented languages (Gaines, 1991).

2 The Principles Involved

When combining tools in an integrated system it is important to have a clear systems architecture in terms of required functionality and how it is to be allocated among the tools. It is necessary to exclude some facilities provided by some tools because they are duplicated in others or inappropriate in the overall system. If this is not done, and the combined system does not project a clear model of its intended use, then users can become confused by excessive features. A functional model for the integrated system has been built at a high level of abstraction in terms of four dimensions of logical validation of a knowledge base, corresponding to similar dimensions in evaluating the truth of scientific theories (Rescher, 1979):

• Coherence - the coherence of internal relationships between knowledge structures

• Consistency - the lack of logical contradiction between knowledge structures

• Correctness - the correctness of deductions from the knowledge structures checked against external data

• Completeness - the adequate coverage of an intended scope for deductions from the knowledge structures.

Figure 1 shows how the three tools relate to the four dimensions of validation. At the center of Figure 1, each of the tools provides means for visualizing the conceptual structures they support. For example, the knowledge acquisition tool provides various clustering algorithms for presenting case data graphically, the hypermedia tool is essentially visual, and the representation system provides its own knowledge-base grapher. From a formal validation point of view such visualization supports the expert and knowledge engineer in evaluating the coherence truth of the knowledge structures - the internal relations between different representations. Such evaluation provides feedback through each of the tools to correct errors, improve expressiveness, and so on, that are made manifest through incoherence.

Fig. 1 Logical basis for modes of knowledge validation in the integrated system

At the top left of Figure 1, the induction module in the knowledge acquisition tool derives plausible constraints on the database of case data in the form of entailments that eventually become expressed as rules. It supports the evaluation of these entailments for mutual consistency (can two rules arrive at different conclusions on the same case) and for consistency with the case data.
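These two checks are easy to state concretely; the following is a simplified illustration (not the KSSO induction module; the rule and case formats and the mushroom-style attribute names are invented): a rule is a set of attribute-value conditions with a conclusion, two rules are mutually inconsistent if a single case could satisfy both while their conclusions differ, and a rule is inconsistent with the case data if a stored case satisfies its conditions but records a different outcome.

```python
# A simplified illustration of the two consistency checks described above
# (not the KSSO induction module; data formats are invented).

def compatible(conds_a, conds_b):
    """Two condition sets can hold of one case unless they constrain
    the same attribute to different values."""
    return all(conds_b.get(k, v) == v for k, v in conds_a.items())

def mutually_inconsistent(rule_a, rule_b):
    """Could the two rules fire on the same case yet conclude differently?"""
    return (rule_a["then"] != rule_b["then"]
            and compatible(rule_a["if"], rule_b["if"]))

def inconsistent_with_case(rule, case):
    """Does a stored case satisfy the rule but contradict its conclusion?"""
    fires = all(case["attrs"].get(k) == v for k, v in rule["if"].items())
    return fires and case["outcome"] != rule["then"]

# Toy mushroom-style data (attribute names and values are invented):
r1 = {"if": {"underside": "tubes"}, "then": "edible"}
r2 = {"if": {"rootshape": "bulbous"}, "then": "poisonous"}
case = {"attrs": {"underside": "tubes", "rootshape": "cylindrical"},
        "outcome": "edible"}

print(mutually_inconsistent(r1, r2))     # True: both could fire, conclusions differ
print(inconsistent_with_case(r2, case))  # False: r2 does not fire on this case
```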
Such evaluation again provides feedback through all the tools - for example, one may amend cases in the knowledge acquisition tool, amend rules in the representation tool, or make a note of the potential problem in the hypermedia tool. At the bottom of Figure 1, the running of new test cases, perhaps in routine system use, provides evaluation of the correctness of the knowledge base as a decision support tool. Again such evaluation provides feedback through all the tools: amendment of the case base and re-induction, direct editing of the knowledge structures, or commentary in hypertext. Finally, at the top right of Figure 1, it is shown that the hypermedia tool provides a far more significant logical validation role than its annotation duties may suggest. Systems intrinsically cannot validate themselves for completeness, and clearly there can be no guarantees of completeness in an open universe. However, in terms of validating the formal knowledge base completeness, the informal knowledge base held in hypermedia form plays a very significant role. Anything mentioned informally that has no referent formally leads to a suspicion of incompleteness requiring further investigation.

3 The Integrated System Architecture and Operation

The knowledge acquisition tool KSSO, providing repertory grid, conceptual clustering, conceptual comparison, empirical induction, and knowledge base creation tools, is already configured as a set of modules around a specialist database. It focuses on mediating representations supporting the expert's modeling processes in moving from skilled performance to overt knowledge structures supporting its emulation in the computer. In recent years it has been extended to support the informal representations of knowledge that are prior to those within the tool, such as text, pictures, diagrams, semi-structured interviews, protocols, and so on. This has been done by providing an interapplication protocol allowing KSSO to interact with HyperCard to provide the appearance of a seamless single application to users. KSSO-specific functionality in HyperCard is supported by scripts that allow conceptual structures on the KSSO side to be linked to informal sources and annotation on the HyperCard side. KSSO exports knowledge bases to a range of existing knowledge-based system shells, and a number of collaborative studies have been reported in which these shells have been integrated directly (Gaines, Rappaport & Shaw 1989; Gaines & Linster, 1990). Figure 2 shows the distributed knowledge base and inter-application protocols linking the hypermedia, knowledge acquisition and knowledge-based system shell in Hyper-KSE.

Fig. 2 Integration of hypermedia, knowledge acquisition and knowledge-based system shell

Figure 3 shows a typical sequence of activity in using the integrated system to develop a knowledge base. The acquisition commences with parallel creation by the knowledge engineer of a repertory grid in KSSO and an annotation stack in HyperCard.
The elicitation tools are then used by the expert to enter critical cases characterizing the domain together with their relevant distinctive attributes and annotation about both cases and attributes. In particular, this annotation can include descriptions of the procedures by which the expert characterizes a case in terms of the attributes, and tutorial examples of such characterization. At any time during this process the expert or knowledge engineer can analyze the knowledge entered to date, visually clustering it, deriving underlying implications, and extending the knowledge structures and annotation in the light of such feedback. When the expert feels that a reasonable amount of knowledge has been entered it is exported to BABYLON where it can be consulted on test cases, including access to the annotation in HyperCard. If a test case leads to an error then this case can be posted back to KSSO with a corrected recommendation, re-analyzed and a revised knowledge base exported to BABYLON.

The following sequence of screen dumps is based on a simple tutorial example on mushroom toxicity provided for training with Hyper-KSE. Figure 4 shows the entity screen in KSSO where a list of mushrooms has been entered. KSSO is automatically evaluating entity and attribute matches and suggesting further elicitation, and the user has popped up a menu giving access to commands, annotation in HyperCard and consultation in BABYLON. Figure 5 shows the way in which matching attributes are shown graphically in KSSO. The expert can change a rating just by dragging an entity along the scale. He or she may also respond to the prompt at the top to enter another, discriminating case. This graphic presentation has proved very effective in involving domain experts, and allowing them to enter knowledge directly into the computer without communicating it through a knowledge engineer. Figure 6 shows a cluster analysis of the data entered in KSSO. This analysis is available interactively at any time, providing a different perspective on the cases which both motivates the expert and allows the coherence of the data in terms of meaningful relations between entities and between attributes to be evaluated. Figure 7 shows the result of the user selecting HyperCard annotation on the popup menu over "Sweet scented Boletus" in Figure 5. The fields in the upper half are generated automatically from the case data, and those in the lower half are user-entered annotation, including a button to show further information. Figure 8 shows the result of the user selecting the "Show" button in Figure 7. The additional annotation can be of any form, access to a videodisc, sound replay, database, simulation, and so on. This is not computational information, but it is an important part of the interface of the knowledge base to a user, for example, in explaining terminology.
[Figure 3: Sequence of activities in using the integrated system for knowledge base development. Columns for KSSO, BABYLON and HyperCard show the flow: describe the problem, purpose and domain while creating an associated domain-specific stack; define an initial set of entities characterizing the domain, with annotation cards created automatically for each entity; distinguish entities with attributes and values, with annotation cards created automatically for each attribute; analyze the entity/attribute structure, deriving implications, while editing and extending annotation in the stack; export the entity/attribute/implication knowledge base for import into BABYLON; use the knowledge base in a consultation, accessing annotation through BABYLON's explanation system; post erroneous cases back to KSSO, re-analyze, annotate and re-export; continue elicitation, annotation, export and testing as required, developing annotation in HyperCard and the knowledge base in BABYLON.]

[Figure 4: Entity screen in KSSO showing the popup menu used to access annotation and the shell.]

[Figure 5: Attribute match screen in KSSO prompting entry of a new entity.]

[Figure 6: FOCUS cluster analysis screen in KSSO.]

[Figure 7: Entity annotation card for "Sweet scented Boletus".]

[Figure 8: Associated auxiliary annotation.]

Figure 9 shows a consultation running directly in BABYLON using the knowledge base entered in KSSO. The popup dialog box allows the user to select one of the possible values, or to go to the HyperCard annotation if, for example, he or she does not understand the question and needs some explanation of how this attribute should be evaluated. Figure 10 shows the same consultation running in BABYLON with the requests for data being made through HyperCard using the attribute annotation cards for query purposes. This is the preferred mode for end users, since HyperCard may be used to give a non-technical interface oriented to the particular domain with extensive help facilities. The availability of the direct mode as shown in Figure 9 is important to the knowledge engineer, since it gives access to the extensive tracing and knowledge structure visualization facilities in BABYLON. Figure 11 shows how the knowledge engineer can edit the results of a consultation on a test case in BABYLON and then post it back as a new entity to KSSO; similar facilities are available to the expert or client during the equivalent HyperCard-based consultation. These facilities put the application of the knowledge-based system into the knowledge acquisition process. They model what must be achieved throughout knowledge-based systems if we are to develop systems in which knowledge-base maintenance is an integral feature of ongoing knowledge acquisition.

[Figure 9: Direct consultation with BABYLON, showing the knowledge base generated from KSSO data.]

[Figure 10: Consultation with BABYLON using HyperCard as the user interface.]

[Figure 11: Editing a case in BABYLON and posting it back to KSSO.]

4 Conclusions

This simple example serves to illustrate the main features of the integrated system.
Each of the tools supports very large knowledge bases, and a variety of applications has shown that the system scales effectively to significant applications. The graphic case elicitation tools in KSSO operate effectively for some 20 to 50 cases, which is enough to characterize the attributes of a coherent subdomain. The induction tool in KSSO has been shown to be effective with databases exceeding 10,000 cases (Gaines, 1991d), inducing rules and evaluating edited knowledge bases rapidly enough for interactive use. HyperCard, with appropriate indexing tools, is capable of annotating databases of 10,000 cases without loss of interactivity. BABYLON has been used on a number of major knowledge-based system developments, and has recently been reimplemented to support large-scale industrial applications.

The main weakness of Hyper-KSE is in the difficulty of sustaining the functional integration of the knowledge acquisition tools in the development and application of complex applications. In a straightforward diagnostic application, soluble through heuristic classification, the major part of the knowledge base is a single, large but coherent, case base, and induced and manually entered rules. The representation and inference system has a simple task and its knowledge structures do not extend beyond those of the acquisition tools. In more complex system developments involving multiple subdomains, the acquisition tools may be used to characterize each subdomain, but the problem-solving, strategic knowledge that is involved in using the subdomain knowledge effectively has to be entered directly into the representation and inference tool. As the balance of the system changes such that this knowledge becomes increasingly important, the involvement of the acquisition tools in knowledge base maintenance is reduced. This indicates the need for acquisition tools supporting problem-solving techniques, and for knowledge-level integration based, for example, on generic problem-solving methodologies (Chandrasekaran, 1988). Some recent experiments have shown that multiple heuristic classification of subdomains may be used to solve complex procedural problems, such as sequential decision making in room allocation (Gaines, 1991e). As a wider range of structures for generic problem solving methodologies is developed, it is becoming feasible to extend the type of system described here to include more meta-knowledge designed to manage the acquisition, validation, and maintenance phases systematically. Extensions to KSSO to support such methodologies have been reported recently (Gaines, 1991a,b,c), and many researchers are working to develop the problem solving methodologies and test them in applications.

5 References

Chandrasekaran, B. (1988). Generic tasks as building blocks for knowledge-based systems: the diagnosis and routine design examples. The Knowledge Engineering Review, 3(3), 183-211.
Christaller, T., di Primio, F. and Voß, A. (1989). Die KI-Werkbank BABYLON. Bonn: Addison Wesley.
Gaines, B.R. (1990). Organizational integration: modeling technology and management. Zunde, P. & Hocking, D. (Eds.) Empirical Foundations of Information and Software Sciences II. pp. 51-64. New York: Plenum Press.
Gaines, B.R. (1991a). Empirical investigation of knowledge representation servers: design issues and applications experience with KRS. AAAI Spring Symposium: Implemented Knowledge Representation and Reasoning Systems. pp. 87-101. Stanford (March) and SIGART Bulletin 2(3), 45-56 (June).
Gaines, B.R. (1991b).
Integrating rules in term subsumption knowledge representation servers. AAAI'91: Proceedings of the Ninth National Conference on Artificial Intelligence. pp. 458-463. Menlo Park, California: AAAI Press.
Gaines, B.R. (1991c). An interactive visual language for term subsumption visual languages. IJCAI'91: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence. San Mateo, California: Morgan Kaufmann.
Gaines, B.R. (1991d). Induction and visualization of rules with exceptions. Boose, J.H. & Gaines, B.R. (Eds.) Proceedings of the Sixth AAAI Knowledge Acquisition for Knowledge-Based Systems Workshop. pp. 9-1-9-25. Banff (October).
Gaines, B.R. (1991e). Organizational modeling and problem solving using an object-oriented knowledge representation server and visual language. COCS'91: Proceedings of the Conference on Organizational Computing Systems. ACM Press (November).
Gaines, B.R. & Linster, M. (1990). Development of second generation knowledge acquisition systems. Wielinga, B., Boose, J.H., Gaines, B.R., Schreiber, G. & van Someren, M. (Eds.) Current Trends in Knowledge Acquisition. pp. 143-160. Amsterdam: IOS.
Gaines, B.R., Rappaport, A. & Shaw, M.L.G. (1989). A heterogeneous knowledge support system. Boose, J.H. & Gaines, B.R. (Eds.) Proceedings of the Fourth AAAI Knowledge Acquisition for Knowledge-Based Systems Workshop. pp. 13-1-13-20. Banff (October).
Rescher, N. (1979). Cognitive Systematization. Oxford: Basil Blackwell.
Shaw, M.L.G. & Gaines, B.R. (1987). KITTEN: Knowledge Initiation & Transfer Tools for Experts & Novices. International Journal of Man-Machine Studies, 27, 251-280.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

Modeling the Generational Infrastructure of Information Technology

B R Gaines
Knowledge Science Institute, University of Calgary
Calgary, Alberta, Canada T2N 1N4
gaines@cpsc.ucalgary.ca

Abstract

A socio-economic model of the generational infrastructure of information technology is presented as a tiered progression of 'learning curves' in mutually supportive technologies. This model is used to analyze trends in research and product development, and the transition from 'computer science' to 'knowledge science' that characterizes the fifth generation. The achievements of fifth generation research are evaluated, and the expected directions of future generations of research are projected.

1 Introduction

The Japanese Fifth Generation research program has had an important socio-economic impact internationally in raising government awareness of fundamental changes in the nature of information technology and its strategic role. It has been directly responsible for structural change in national computing policy such as the formation of the MCC within the US anti-trust ethos, and the ESPRIT program in Europe cutting across strongly entrenched national boundaries. One side-effect of this has been to bring into prominence what was previously seen only as a marketing/technical description of the evolution of information technology in terms of its generational infrastructure. As the fifth generation program comes to an end, this raises policy questions as to the nature and significance of the next generation, and as to the utility of conceptualizing computing research in terms of generational advances. This paper analyzes the questions in terms of a general model of 'learning curves' whose time scale is largely determined by the medium term business cycle of capital replacement.
It highlights an important difference between computing and other industrial technologies in that the pace of change in the basic VLSI technology is so rapid that conventional 'substitution' effects are swamped by a tiered infrastructure of learning curves involving major qualitative differences in technologies. A detailed account of the underlying model and its fit to the past development of information technology has been given elsewhere (Gaines & Shaw, 1986; Gaines, 1990, 1991), and this paper focuses on fifth generation and later issues.

2 Electronic Device Technology

The initial breakthrough for computing was in electronic device technology, and a definition of computer generations in terms of hardware works well for the early machines. However, as Rosen (1983) notes, it blurs thereafter and "we are faced with an anomalous situation in which the most powerful computers of 1979-1980, the CDC Cyber 176 and the CRAY 1, would have to be assigned to the second and third generations, respectively, while the most trivial of hobbyist computers would be a fifth-generation system." The reason for this anomaly is that the substitution effects of one form of technology for another are gradual and do not correspond to major structural changes. The enabling effects of changing technologies are far more dramatic: the change from mechanical to electronic devices made it possible to store programs as data and enabled the use of computers as a general-purpose tool and the development of language compilers; the transistor made reliable operation possible and enabled routine data processing and then interactive timesharing; integrated circuits reduced costs to the level where computers became commonplace and made possible the personal computer dedicated to a single user.

Modern microelectronics commenced in 1956 when the silicon planar process was invented and enabled integrated circuit technology to be developed. As Figure 1 shows, the number of devices on a chip follows Moore's law in doubling each year through the 1960s, and has continued to double every eighteen months through the 1970s and 1980s (Robinson, 1984). The current projected limit is some 1,000,000,000 devices on a chip in the late 1990s, when quantum mechanical effects will become a barrier to further packing density on silicon planar chips. However, three-dimensional packing, semiconducting peptides, optical devices, or, most likely, new materials not yet considered, are expected to extend the growth. Microelectronics shows over 9 decades of performance increase in 40 years. Such exponential growth is common in many technologies, but never over more than 2 decades, and then in periods of the order of 100 years. Computer technology is unique in being based on underlying devices whose performance has increased at a rate and over a range achieved by no other technology. Logarithmic plots, such as that of Figure 1, do not adequately project the impact of such long-term sustained growth, but this is apparent in the linear plot of the devices on a chip by computer generation as shown in Figure 2. During each generation, changes have taken place that correspond in magnitude to those of some hundred years in other industries. These quantitative changes have led to major qualitative effects that are analyzed in the following sections.
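The growth rates quoted above can be checked with a few lines of arithmetic. The 1959 starting point of one device per chip and the 1970 switch-over year are assumptions made for this sketch; only the two doubling periods come from the text, and the result is intended as an order-of-magnitude check, not a reproduction of Figure 1.

```python
# Rough projection of devices on a chip: doubling yearly through the 1960s,
# then doubling every eighteen months (a factor of 2**(12/18) per year).
def devices_on_chip(year, start_year=1959, switch_year=1970):
    devices = 1.0
    y = start_year
    while y < year:
        devices *= 2.0 if y < switch_year else 2.0 ** (12.0 / 18.0)
        y += 1
    return devices

for year in (1964, 1972, 1980, 1988, 1996):
    print(year, f"{devices_on_chip(year):.3g}")
```

Under these assumptions the count reaches a few hundred million devices by 1996, consistent in order of magnitude with the billion-device projection for the late 1990s.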
[Figure 1: Growth of devices on a chip, 1956 to 1996 (logarithmic scale).]

[Figure 2: Devices on a chip during six generations of computers (linear scale).]

3 Learning Curves in Scientific and Technological Development

There is a simple phenomenological model of developments in science and technology as a logistic "learning curve" of knowledge acquisition (Ayres, 1968; Marchetti, 1981). It has been found to be a useful model of the introduction of new knowledge, technology or product, in which growth takes off slowly, begins to climb rapidly and then slows down as whatever was introduced has been assimilated. Such curves arise in many different disciplines such as education, ecology, economics, marketing and technological forecasting (Van Dujin, 1983; Stoneman, 1983). It has also been noted in many disciplines that the qualitative phenomena during the growth of the logistic curve vary from stage to stage (Crane, 1972; De Mey, 1982; Gaines & Shaw, 1986). The era before the learning curve takes off, when too little is known for planned progress, is that of the inventor, who has very little chance of success. When an inventor makes a breakthrough, very rapidly his or her work is replicated at research institutions world-wide. The experience gained in this way leads to empirical design rules with very little foundation except previous successes and failures. However, as enough empirical experience is gained, it becomes possible to inductively model the basis of success and failure and develop theories. This transition from empiricism to theory corresponds to the maximum slope of the logistic learning curve. The theoretical models make it possible to automate the scientific data gathering and analysis and associated manufacturing processes. Once automation has been put in place, effort can focus on cost reduction and quality improvements in what has become a mature technology.

4 The Infrastructure of the Information Sciences

The fast, sustained learning curve for electronic devices, and the scope for positive feedback in the information sciences, together result in a tiered infrastructure for the information sciences and technologies which is fundamental to their nature. It involves a succession of learning curves as rapid advances in one level of technology trigger off invention in others, as shown in Figure 3. The breakthrough in electronic device technology leading to the zeroth generation is placed at 1940, about the time of the Atanasoff and Berry experiments with tube-based digital calculations. Automation by 1980 had reached the extreme level where silicon compilers allow a designer to implement ideas directly in devices with little further human intervention (Fields, 1983).

The first breakthrough generating a computing infrastructure was the introduction of the stored program virtual machine architecture. Mauchly (1973) recognized the significance of stored programs, noting that subroutines create "a new set of operations which might be said to form a calculus of instructions." This was the key conceptual breakthrough in computer architecture: that the limited functionality provided directly by the hardware could be increased by stored programs called as subroutines or procedures, and that the hardware and these routines together may be regarded as a new virtual machine.
This is the foundation of the development of a variety of forms of virtual machine architectures (Weegenaar, 1978) that separates out computing science as a distinct discipline from other areas of electronic applications.

The next level of breakthrough was in software to bridge the gap between machine and task through the development of problem-oriented languages. Work on the design of FORTRAN in 1954 and its issue in 1957 marks the beginning of the second generation era, with languages targeted to specific problem areas of business data processing, text processing, database access, machine tool control, and so on. A 1968 paper on the coming fourth generation notes that "programming today has no theoretical basis" and calls for a scientific basis in the next generation (Walter, Bohl & Walter, 1968). Sure enough, the theory linking languages to the underlying virtual machines developed during the fourth generation era, for example that of abstract data types and initial algebras (Goguen, Thatcher & Wagner, 1978). In the fifth generation era the application of experience, design rules and theory to the automation of software production became the top priority (Balzer, Cheatham & Green, 1983).

The next level of breakthrough was in interactive activity systems, with continuous interaction becoming a significant possibility as the mean time between failures of computers began to be hours rather than minutes in the early 1960s. The move from batch-processing to direct human-computer interaction was made in 1963/1964 with the implementation of the MIT MAC, RAND JOSS and Dartmouth BASIC systems. The study of such systems led to design rules for HCI in the 1970s (Hansen, 1971) and theoretical foundations started to emerge in the 1980s (Alexander, 1987). The improvement of human-computer interaction was a major stated priority in the Japanese fifth generation development program (Karatsu, 1982). Other forms of interaction also became feasible as a result of improved reliability, such as direct digital control and various forms of digital communications systems.

[Figure 3: The infrastructure of the information sciences, with the upper tiers labelled knowledge science. Each technology passes through the phases: Breakthrough: creative advance made; Replication period: experience gained by mimicking the breakthrough; Empirical period: design rules formulated from experience; Theoretical period: underlying theories formulated and tested; Automation period: theories predict experience and generate rules; Maturity: theories become assimilated and used routinely.]

The next level of breakthrough was one of knowledge-based systems supporting knowledge processing, the human capability to store information through its interrelations and make inferences about its consequences. The breakthrough in knowledge-based systems dates from the development of DENDRAL (Buchanan, Duffield & Robertson, 1971) for inferring chemical structures from mass-spectrometry data and MYCIN (Shortliffe, 1976) for the diagnosis of microbial infections in the early 1970s. It led to a spate of expert system development in the fourth generation era of the 1970s (Gevarter, 1983), and pragmatic design rules for knowledge engineering in the current fifth generation era (Hayes-Roth, 1984). The utilization of their VLSI production capability (Gaines, 1984; Galinski, 1983) for the support of knowledge-based systems through PROLOG machines (Kitsuregawa & Tanaka, 1988) has been the other major priority in the Japanese fifth generation development program (Moto-oka, 1982).
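The learning-curve model of Section 3 and the phase legend of Figure 3 can be put together in a small sketch. The logistic time scale and the phase boundaries used here are illustrative assumptions; the text only fixes the empirical-to-theoretical transition at the curve's point of maximum slope, i.e. the 50% level.

```python
# A sketch of the logistic learning curve with the qualitative phases of
# Figure 3 attached to bands of the curve. The band boundaries are invented
# for illustration.
import math

def logistic(t, midpoint, rate=1.0):
    """Fraction of the eventual knowledge assimilated at time t."""
    return 1.0 / (1.0 + math.exp(-rate * (t - midpoint)))

PHASES = [(0.05, "Breakthrough"), (0.20, "Replication"), (0.50, "Empirical"),
          (0.80, "Theoretical"), (0.95, "Automation"), (1.01, "Maturity")]

def phase(fraction):
    for upper, name in PHASES:
        if fraction < upper:
            return name
    return "Maturity"

if __name__ == "__main__":
    for t in range(0, 13, 2):
        f = logistic(t, midpoint=6.0, rate=0.9)
        print(f"t={t:2d}  assimilated={f:5.2f}  phase={phase(f)}")
```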
Defining the upper levels of the infrastructure becomes more and more speculative as we move into the immediate past of our own era and look for evidence of learning curves that are at their early stages. It is reasonable to suppose that the level above the representation and processing of knowledge in the computer is that of knowledge acquisition systems: breakthroughs in machine learning and expertise modeling. Two breakthroughs in this area have been Lenat's AM learning mathematics by discovery (Davis & Lenat, 1982) and Michalski's inductive inference of expert rules for plant disease diagnosis (Michalski & Chilausky, 1980). In the fifth generation era machine learning became a highly active research area in its replication phase (Michalski & Carbonell, 1983). The general field of knowledge acquisition has also seen a massive growth in research (Boose, 1989).

One may speculate that the growth of robotics will provide the next breakthroughs, in which goal-directed, mobile computational systems will act autonomously to achieve their objectives. The breakthrough into the sixth generation era commencing in 1988 will probably be seen as one of autonomous activity systems. It is possible to see the nascent concepts for this breakthrough in the adoption of the goal-directed programming paradigms of logic programming languages such as PROLOG. When, in a robot, a goal specification is expanded by such a programming system into a sequence of actions upon the world dependent on conditions being satisfied in that world, then the behavior of such a system will deviate sufficiently from its top-level specification, yet be so clearly goal-directed, as to appear autonomous. However, to achieve significant results with such systems we need to add perceptual acts to the planning structures of a language such as SIPE (Wilkins, 1984) and develop logic programming languages that cope with the resulting temporal logic (Allen, 1984); in these developments the sixth generation breakthrough will come to be recognized, possibly in the notion of "situated action" (Suchman, 1987) and its application in subsumption architectures for autonomous robots (Brooks, 1990; Connell, 1990).

One may speculate further that interaction between these systems will become increasingly important in enabling them to cooperate to achieve goals, and that the seventh generation era commencing in 1996 will be one of socially organized systems. The Japanese "Sixth Generation" research program proposals emphasize emulation of creativity and intuition and the development of interdisciplinary knowledge sciences (STA, 1985; Gaines, 1986a). This recognizes the distinction between "computer science" and "knowledge science" as shown in Figure 3, and that cutting edge innovation in the information sciences involves human and social considerations intrinsic to the nature of knowledge. It is also possible that building an adequate forecasting model based on the premises of this paper may undermine the very processes that we model. If we come to understand the dynamics of our progress into the future then we may be able to modify the underlying process, to make the next steps more rapidly when the territory is better mapped.

5 Using the BRETAM Model

The tiered infrastructure model of Figure 3 also shows the superimposed trajectories of invention, research, and so on.
The intersection of these with the horizontal lines of the different information sciences may be used to model and predict the primary focus of different types of activity in each generation of computers:
• Invention is focused at the BR interface, where new breakthrough attempts are being made based on experience with the replicated breakthroughs of the technology below.
• Research is focused at the RE interface, where newly recognized breakthroughs are being investigated using the empirical design rules of the technologies below.
• Product Innovation is focused at the ET interface, where new products are being developed based on the empirical design rules of one technology and the theoretical foundations of those below.
• Product Lines are focused at the TA interface, where product lines can rest on the solid theoretical foundations of one technology and the automation of the technologies below.
• Low-cost Products are focused at the AM interface, where cost reduction can be based on the automated mass production of one technology and the mature technologies below.
• Throw-away Products are at the MM interface, where cost reduction has become such that maintenance and repair costs exceed replacement costs.
For example, by the end of the fourth generation (1972-80):
• BR: recognition of the knowledge acquisition possibilities of knowledge-based systems led to the breakthrough to inductive-inference systems.
• RE: research focused on the natural representation of knowledge through the development of human-computer interaction, e.g. the Xerox Star direct manipulation of objects.
• ET: experience with human-computer interaction using the problem-oriented language BASIC led to the innovative product of the Apple II personal computer.
• TA: the simplicity of the problem-oriented language RPG II led to the design of the IBM System/3 product line of small business computers.
• AM: special-purpose chips allowed the mass-production of low-cost, high-quality calculators.
By the end of the fifth generation (1980-88):
• BR: recognition of the goal-seeking possibilities of inductive inference systems led to breakthroughs in autonomous-activity systems.
• RE: research focused on knowledge acquisition for knowledge-based systems.
• ET: the advantages of the non-procedural representation of knowledge for human-computer interaction led to the innovative designs of Lisp and Prolog machines.
• TA: the ease of human-computer interaction through a direct manipulation problem-oriented language led to the Apple Macintosh product line of personal computers.
• AM: the design of highly-integrated language systems has allowed the mass-production of low-cost, high-quality software such as Turbo Pascal.
• MM: calculators have become so low in cost that replacement is preferable to repair.
By the end of the sixth generation (1988-96):
• BR: recognition of the cooperative possibilities of autonomous intelligent systems will lead to a breakthrough in socially organized systems.
• RE: research will be focused on autonomous intelligent behavior in systems such as neural networks and subsumption robots.
• ET: the advances in inductive systems will lead to new products for extracting knowledge from large datasets.
• TA: non-procedural problem-oriented languages will become routinely available on main-frame computers.
• AM: highly interactive personal workstations will drop in cost to a level where they become standard office equipment.
• MM: workstation replacement will have become more effective than maintenance and repair.

6 Significant Developments and Interactions

The BRETAM model can be used to highlight the significant developments in information technology for purposes of planning research, development and applications. Figure 4 left shows the cross section of Figure 3 that is relevant to the state of the art in information technologies during the previous, fifth generation of computers. The top three levels on the right, of invention, research and innovation, show why the fifth generation is generally recognized for its innovations in artificial intelligence. It was during this period that knowledge-based system products such as expert system shells first became available. However, in terms of reliance upon proven technology, it is the lower levels of product lines and below that are significant. The fifth generation was that in which human-computer interaction was dramatically improved through graphic user interfaces, object oriented languages brought control of complex system development in software engineering, and networking became ubiquitous. All these innovations took for granted advances in the underlying device technology that offered very fast, powerful and reliable processors and large high-speed memories at low cost.

Figure 4 right shows the equivalent picture of what is happening now as we progress through the sixth generation of computers. Hardware and networking have become almost negligible in cost and almost indefinitely powerful. Large-scale distributed systems are becoming readily available in terms of equipment and hardware architectures. Object oriented languages, and their associated application programming support environments (APSEs) and class libraries, are becoming routinely available at very low cost. Graphic user interfaces (GUIs) are becoming standardized and portable across platforms as a routinely available technology. By the end of this generation the lowest level of knowledge-based system technology will have become available as well-supported product lines. These will support large-scale conceptual modeling at the enterprise level, the integration of heterogeneous information and processes at lower levels, and the emulation of many of the functions now provided by CASE diagramming tools, group support and object-oriented languages. While commercial tools are designed to support many paradigms, the full impact is dependent on the development of formal specification languages subject to verification of correct implementation, and on support for further development from innovations in the less mature technologies above.

[Figure 4: Significant technologies in the fifth and sixth generations, showing for each generation the levels at which invention (goal directed systems; collaborative agents, socially organized systems), research (knowledge acquisition systems; machine learning, autonomous activity systems), product innovation (expert system shells; knowledge acquisition tools, knowledge representation servers), product lines (graphic user interfaces; object-oriented languages), low-cost products, and throw-away products (network interfaces, virtual machine architecture, electronic device technology) are focused.]

[Figure 2.1: Circuit and Layout Representations, showing the quadrant/quadtree structure of a slicing layout, its inner connectors, and the line segments on the slice edges.]

2.2 Recursive Problem Solving

The slicing structure representation of a layout permits the following recursive problem solving.
(S1) If the module is an indivisible leaf cell, import its layout from a library. If the module is a divisible block, divide the original layout problem, composed of circuit data and a PL (planned layout), into at most 4 subproblems, each having a structure homologous to the original problem.
(S2) Solve all the subproblems in parallel using the recursion principle. If many processing elements (PEs) are available, fork these subproblems on different PEs.
(S3) Aggregate the finished layouts of the subproblems following the placement plan given in (S1) to generate the layout of the original problem.

(SL1) Pre-compilation of the circuit net into a quadtree
In (S1) above, the original planned area should be divided into at most four slices, the circuit should also be divided into the corresponding number of subcircuits, and a plan for embedding them into the slices should be made. To avoid the intractable computational complexity caused by these two divisions and one embedding, the CirNet is transformed into a quadtree-shaped process network named CMPN (Circuit Module Process Network) before layout generation. Each node in the CMPN is a process having message-passing streams to its upper and lower nodes. Each leaf node of CMPN represents a module in CirNet, while a non-leaf node represents a block module newly defined in this transformation. Modules specified as a pair by Constr are compiled into an identical node near the leaves of CMPN to assure mutual proximity. The main role of CMPN is layout generation, i.e., module placement and inter-module wire generation. If the module placement topology, i.e., in what quadrant each subcircuit should be placed, can be given at the stage of CMPN generation, the later placement task is fairly simplified.

(SL2) Vertical co-ordination of module shapes
As subproblems are solved in parallel in (S2) above, non-abutment of their final shapes might happen. In that case, chip area will be enlarged due to many dead spaces among modules. To avoid this, the planned shape of the subslice in which a subcircuit is being placed is descended down to the subcircuit as its planned shape. See Figure 2.2(1).

(SL3) Horizontal co-operation in wiring
Wire abutment among modules running in parallel had been an unsolved problem in LSI layout design automation. This problem is solved by way of runtime co-operation among nodes in CMPN. The induced connector processes (see 2.1) are used for this purpose. Each of them holds a CERW (Current Existence Range of a Wire on a slice edge). As the first task of wiring, each node process in CMPN tries to narrow the CERWs of its peripheral connectors on the north, west, south, and east edges. If a CERW intersects with two internal subslices, then the CERW is narrowed to one of the two intersections. Among the two candidates, the one which has both enough wiring capacity and minimum wire length to the inner connectors is selected.
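The narrowing step in (SL3) can be sketched as a small selection procedure. The interval representation, the capacity model and the wire-length estimate below are simplifying assumptions made for illustration; the real co-operation is carried out by message-passing connector processes in KL1 rather than by a Python function.

```python
# A minimal sketch of CERW narrowing: intervals are (lo, hi) positions along a
# slice edge; a CERW intersecting two sub-edges is narrowed to the candidate
# with enough remaining wiring capacity and the shortest estimated wire length
# to the net's inner connectors.
def overlap(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None

def narrow_cerw(cerw, subslice_edges, capacities, inner_connectors, demand=1):
    candidates = []
    for edge_id, span in subslice_edges.items():
        inter = overlap(cerw, span)
        if inter and capacities[edge_id] >= demand:
            centre = (inter[0] + inter[1]) / 2.0
            est_len = sum(abs(centre - c) for c in inner_connectors)
            candidates.append((est_len, edge_id, inter))
    if not candidates:
        return cerw                      # no feasible narrowing; keep waiting
    _, edge_id, inter = min(candidates)
    capacities[edge_id] -= demand        # consume wiring resource on that edge
    return inter

if __name__ == "__main__":
    cerw = (0.0, 10.0)                             # whole north edge
    subslices = {"NW": (0.0, 6.0), "NE": (6.0, 10.0)}
    caps = {"NW": 3, "NE": 1}
    print(narrow_cerw(cerw, subslices, caps, inner_connectors=[7.5]))
```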
For feed-through nets, i.e., nets having no inner connectors, the CMPN node eagerly waits for the completion of a narrowing action by some neighbour node, to avoid useless wire bend generation. See Figure 2.2(2).

(SL4) Wiring direction control to reduce co-operation loads
Although the CERW narrowing co-operation is useful in parallel wiring, it is expensive due to the repeated peripheral connector inspections. This is particularly true for cell wiring, where a tremendously large number of cells attend the co-operation. To reduce useless co-operation, the wiring direction is co-ordinated in cell wiring. First, in all the cells and for all nets, partial wires are generated that connect inner and peripheral connectors on south or east edges. Each CERW on these edges is narrowed into the point where the wire arrived (SE-wiring). Then NW-wiring follows. Finally, non-directional feed-through wiring (ND-wiring) is made. Due to the CERW reductions in the two previous steps, much eager-waiting co-operation in ND-wiring can be avoided. See Figure 2.2(3).

[Figure 2.2: Problem Solving Heuristics. (1) Vertical co-ordination; (2) horizontal co-operation, in which one process waits for its neighbour's narrowing; (3) global wiring direction control: initial state, SE-wiring, NW-wiring, ND-wiring.]

3 Overview of Co-HLEX

An overview of Co-HLEX is given in Figure 3.1. The main components of Co-HLEX include: a set of original data, I/O functions, a backup memory, a problem solving kernel based on CMPN, and a template library.

3.1 The Problem Solving Kernel

The problem solving kernel is a quadtree-shaped process network, CMPN, that generates a chip layout. Before layout generation, each node of CMPN has only circuit data, including a module name, the module property, a list of the net names connecting this module to others, and a list of subcircuit names. After layout, a set of layout data is added to each node, including: the name of the layoutframe (parameterized slicing template) used to slice the node, an enveloping rectangle size, a list of slicing points in the rectangle, a list of the submodule names in each slice, a list of the adopted wiring pattern names for each net, and a list of peripheral- and induced-connector names.

3.1.1 Problem Solving Steps

The overall layout problem solving is performed by the following steps.
(ST1) Placement: A placement message containing a PL, a list of the planned shape and planned peripheral connector placements, is sent to the top node of CMPN from the top level co-ordination process. Then a set of placement actions based on HRCTL is performed by the CMPN processes.
(ST2) Wiring preparation: Upon receiving the placement completion message from the top node of CMPN, the co-ordinator sends a wiring preparation message to it. Then a set of wiring preparation actions is made by CMPN.
(ST3) Wiring non-terminal power nets: Power supply nets, Vcc and Vee, have a different width from other signal nets. As they obstruct the latter, they are wired first. The co-ordinator sends a message to the top node of CMPN to invoke recursive power wiring actions.
(ST4) Wiring non-terminal signal nets: The co-ordinator sends a message to the top node of CMPN to generate signal nets. Then a set of wiring actions based on HRCTL is performed by the CMPN processes. Recursion terminates when it reaches a cell node. At that time, the CERWs held by connector processes contract to the magnitude of the cell size.
(ST5) Wiring nets in cells (SE-wiring): The co-ordinator sends a message to the top node of CMPN to do SE-wiring. This message is passed down to cells.
A cell whose inner connectors, such as base, emitter, or collector contacts, should be wired to peripheral connectors on its south or east edges wires all these nets. After this, each CERW on these edges is reduced to the point where a wire arrived. As layout rules such as wiring obstacle avoidance, minimum allowances between layout objects, etc., should be satisfied, the maze-router of Lee [Lee 1961] was used with some modifications.
(ST6) Wiring terminal nets in cells (NW-wiring): Similar to (ST5).
(ST7) Wiring terminal nets in cells (ND-wiring): The co-ordinator sends a message to the top node of CMPN to do ND-wiring. This is for feed-through wires which only pass above a cell without any inner connectors. All the cell nodes make feed-through wires in parallel co-operation.

3.1.2 Placement by HRCTL

(ST1) Termination of placement: When a terminal cell node in CMPN receives a placement message from above, it imports layout data from the relevant layoutframe in the template library.
(ST2) Subproblem generation: When a node, call it CN, in CMPN is a non-terminal module, it generates subproblems as follows. It sends its own PL and a list of its subcircuits with their estimated areas to a subproblem generation planframe (planning frame) in the library. The planframe sends requests to all layoutframes to make and evaluate possible slicings of the PL and embeddings of subcircuits into the derived slices. The evaluation is made in view of the estimated wire length among inner and peripheral connectors, the estimated layout area, and the estimated distortion of the realized layout from the PL. Receiving the best plan and the relevant layoutframe name from the planframe, CN memorizes them. Then inter-slice wiring is made by a wiring planframe relevant to the chosen layoutframe. In the first step of this wiring, the CERWs of peripheral connectors are narrowed. Then inter-slice wires are planned for each net. Only an abstract wiring plan is made, as shown in Figure 2.1. Wiring patterns attached to the chosen layoutframe are used. As the final wireability largely depends on the wire congestion on slice edges, the wiring resource consumption on these edges should be balanced. To do this, the idea of a wiring resource vector is introduced. It is a list of the maximum possible wires through slice edges. In selecting a wiring pattern for each net, the resource vector consumption is analyzed for all the possible patterns and the best one is selected. New induced connectors are given, each having a CERW identical to the edge length on which it was defined. Induced connector processes are newly spawned, each having a message stream to CN. Finally, the subproblem definition planframe is invoked to give the PLs for all subcircuits. The planframe defines the PLs by using the derived subslices and their peripheral connectors and descends them to the lower CMPN nodes. Streams to peripheral connectors are also descended to assure subsequent narrowing actions by subnodes. Notice that by this combination of placement and wiring, PLs of subcircuits homologous to that of the parent CN can be generated.
(ST3) Recursion: All the subnodes of CN are invoked in parallel to solve their problems. When many PEs are available, they are spawned on different PEs.
(ST4) Placement aggregation: A layout aggregation planframe is invoked by CN. It waits for the completion messages from all subnodes and, after receiving them, aggregates all the realized layouts of the subnodes to generate the CN layout. The layoutframe chosen in (ST2) is reused here to give an aggregation scheme. But when it gives a large dead space in the CN layout, layoutframe swapping is tried.

3.1.3 Wiring Preparation

To make dead spaces usable in wiring, they are compiled into CMPN as dummy modules. Module placement points are determined in world co-ordinates with the origin at the northwest corner of the chip. After the dead space compilation, the connector processes generated in placement become unusable, so they are killed. A CWPN (Cell Wiring Process Network), approximating the meshed cell, is newly generated under each cell, sharing a communication stream with the cell. Wiring obstacles in each cell are examined by using the relevant layoutframe and written into the CWPN.

[Figure 3.1: Overview of Co-HLEX, showing the original data (circuit net, constraints, planned layout), the I/O functions, the CMPN-based problem solving kernel, and the template library.]

3.1.4 Non-terminal Power Net Wiring by HRCTL

(ST1) Termination of wiring: When CN is a terminal node having no inner power connectors, it only ascends a completion message to its upper node. Otherwise, a power wire termination planframe is invoked to generate power-wire connectors in the cell.
(ST2) Subproblem generation: When CN is a non-terminal block with inner power connectors, a planframe is invoked to extend power wires along slice edges reaching to subslices.
(ST3) Recursion: All the subnodes of CN run in parallel to make power wires.
(ST4) Wiring aggregation: The aggregation planframe for power wires is invoked by CN. It waits for the power wiring completion messages from the subnodes and determines the width of the CN power lines based on electrical considerations.

3.1.5 Non-terminal Signal Net Wiring by HRCTL

(ST1) Termination of wiring: When CN is a terminal node, it only ascends a completion message.
(ST2) Subproblem generation: When CN is a non-terminal block, it generates wiring subproblems by using the same method as explained in 3.1.2 (ST2). As the module placement is already given, only the wiring action is repeated. CERWs of peripheral connectors are narrowed first. Then wires are made, giving new induced connector processes. Their names are descended down to subnodes for recursion.
(ST3) Recursion: All the subnodes of CN run in parallel to solve their problems. When many PEs are available, they are spawned on different PEs.
(ST4) Wiring aggregation: When CN receives completion messages from all subnodes, it ascends its own completion message.

3.1.6 Wiring Terminal Nets

(ST1) Wirable net detection: Each terminal CN, in this case a terminal cell, finds a net that has at least one fixed peripheral or inner connector. If such a net is found, CN broadcasts the net name, its connectors, and the usable wiring layers at these connectors to its CWPN.
(ST2) Pre-processing: Using the broadcast information, the CWPN changes the passage costs on its nodes. High costs are given to nodes in wire inhibition areas.
(ST3) Wiring a net: Modified maze-routing is performed on the CWPN to find wires among the given connectors.
(ST4) Post-processing: Upon completion, a CWPN node on which a connector or a wire is placed is given a high passage cost. Finally, a completion message is sent to CN. Then another net is tried from (ST1).
(ST5) Wiring other nets: After memorizing the reported wire, the cell repeats from step (ST1) until all nets are wired.

3.2 Template Library

3.2.1 Plan Frames

Planframes are a set of procedures used by CN as explained in 3.1. Many of them are layoutframe specific.
(1) Choose an appropriate layoutframe for subproblem generation.
(2) Evaluate a proposed plan for placement or wiring.
(3) Generate subproblems for placement or wiring.
(4) Descend subproblems down to subnodes.
(5) Aggregate subproblem solutions for placement or wiring.
(6) Other functions.

3.2.2 Layout Frames

Layoutframes are templates, or types in other words, for layout.
(1) Block level layoutframes: Slicing templates of arity 1, 2, 3, and 4 containing several slicing structure variants. Template-specific wiring patterns are included.
(2) Cell level layoutframes: Parameterized configuration templates of transistors, resistors, capacitors, and connectors.

3.2.3 Layout Rules

(1) Cell size definitions (depending on the fabrication process).
(2) Allowances (likewise; admissible gaps between objects).
(3) Wiring rules (likewise; wiring layers and their usability by signal and power nets).

3.3 I/O Functions

3.3.1 CMPN Generator

(ST1) Input data: The original circuit net CirNet and the PL of the topmost chip.
(ST2) Process network generation: Circuit modules and nets are transformed into processes and their connections are replaced by streams, giving a CMPN (Circuit Module Process Network).
(ST3) Module shape alignment: An align-shape message is broadcast to CMPN from the top co-ordinator. Divisible modules in CMPN, such as resistors, divide themselves to give heights aligned to those of standard transistors. As a result, an enlarged CMPN is given.
(ST4) Hierarchy generation: The flat CMPN given by (ST3) is recursively partitioned to give a hierarchical CMPN.

3.3.2 Assignment of Processes on PEs

A LAP (list of available processors) is given to the top node of CMPN. The top node divides the LAP among its subnodes in accordance with their computation loads. As an approximation of the load, the total number of modules in the circuit is used. One PE is picked from each subset and a subnode is spawned on that PE. This process recurs until the LAP becomes indivisible. After that, all subnodes in CMPN are spawned on the same PE.

3.3.3 Other Functions

Layout data in CMPN is written out to a display terminal.

4 Computational Complexity of HRCTL

Definitions. Let PrBPT(R,N) denote a balanced CMPN with R subnodes per node and height N. Let leaf(PrBPT(R,N)) denote the number of leaf nodes of PrBPT(R,N). Let no(PE) denote the number of parallel processing elements on which the problem PrBPT(R,N) is solved by the HRCTL algorithm.

Suppositions. All the nodes of PrBPT(R,N) consume the same computation power. Instantaneous communication among processes is possible without any computation load. The total elapsed time of processing on one PE is proportional to its total computation load.

Theorem. For PrBPT, HRCTL has a time complexity of either O(log(leaf(PrBPT(R,N)))) or O(log(no(PE)) + leaf(PrBPT(R,N))/no(PE)). The latter is the usual case, where a large problem is solved on limited PEs.

Proof. Case 1: no(PE) >= leaf(PrBPT(R,N)). The president PE, which receives the topmost node of PrBPT(R,N), is the bottleneck processor: it processes the maximum number of nodes among the PEs, and that maximum number is log(leaf(PrBPT(R,N))). Case 2: no(PE) =< leaf(PrBPT(R,N)). The president PE is again the bottleneck processor. Until the depth log(no(PE)) is reached on PrBPT(R,N), Case 1 applies. Below that depth, each PE is obliged to solve all its unsolved nodes in pseudo-parallel mode, and the number of unsolved nodes per PE is leaf(PrBPT(R,N))/no(PE). As the president PE faces the two situations sequentially, the loads are added, giving the log(no(PE)) + leaf(PrBPT(R,N))/no(PE) complexity. QED.

5 Experiments

5.1 Experiment Design

(1) Main objectives: The main objectives of the experiment are the verification of:
OE1. parallel placement and wiring capability;
OE2. wire length and chip area reduction by vertical co-ordination and horizontal co-operation;
OE3. enhanced computation speed;
OE4. program size reduction and maintainability.
(2) Circuit and fabrication process used: A real bipolar analog circuit with 1019 modules and 683 nets was used in the experiment. 149 pairs were given as Constr. After module shape alignment, CMPN had 1299 modules and 901 nets. The height of the generated CMPN was 14. From this original circuit, 254-, 489-, and 810-module subcircuits were extracted for computation speed measurement. A bipolar analog fabrication process with 3 wiring layers, AL1, AL2, and AL3, was assumed. The first two are for signal nets and the last one is for power nets. For signal nets, as much AL1 should be used as possible to attain high electrical quality. Traditionally, a time-consuming maze-router is usually applied to this problem.

5.2 Experiment Results

The results are shown in Figure 5.1 (example chip layout), Figure 5.2 (computation speed), Figure 5.3 (PEs versus speedup), and Table 5.1 (scale of Co-HLEX).

[Figure 5.1: Bipolar analog circuit layout by Co-HLEX; 1299 modules, 683 nets, 623 sec on the Multi-PSI with 64 PEs.]

5.3 Considerations

(1) The possibility of parallel layout problem solving. This has been proved through the experiment. As far as we know, Co-HLEX is the first system that can abut layouts, module shapes and wires, by runtime co-operation.
(2) Quality of the generated layout: wire length and chip area. Through observation of Figure 5.1 we notice that both compact module placement and wires without useless bends could be generated. By the runtime wire abutment co-operation, the traditional channel areas used to patch inter-submodule wires could be eliminated. This contributes to chip area reduction.
(3) Computational efficiency. Figure 5.2 shows the performance of both Co-HLEX on the Multi-PSI with 64 PEs and a practical layout system on a mainframe. Co-HLEX has a time complexity of nearly O(N^1.0), where N is the number of modules in the circuit. For the 1299-module circuit, it took only 623 sec. This extraordinarily outperforms the traditional system. Also, nearly linear speedup could be attained, as shown in Figure 5.3.
(4) Program size and maintainability. The 6,000-line Co-HLEX remarkably outperforms the traditional implementations of 10^5 to 10^6 lines. See Table 5.1. This is due to the recursive HRCTL algorithm. A highly modularized program description was possible on the streamed-parallel dataflow computation model offered by KL1.

Table 5.1 Scale of Co-HLEX
  Subsystem      KL1 lines
  Kernel               620
  Planframes          2648
  Layoutframes        1180
  Layoutrules          684
  Utilities            865
  Total               5997

[Figure 5.2: Problem size vs problem solving time. Co-HLEX with the HRCTL algorithm on the Multi-PSI (64 PEs), for 254, 489, 810 and 1299 modules, compared with a practical layout system with a maze router on a mainframe.]

[Figure 5.3: PEs vs speedup.]

6 Conclusions

6.1 Results

(1) A co-operative hierarchical layout problem solver named Co-HLEX was developed in the FGCS project as an application program of the Fifth Generation Parallel Inference Machines.
(2) The kernel algorithm of Co-HLEX is HRCTL, a hierarchical recursive concurrent theorem prover for layout. Traditional wiring channels could be avoided due to its runtime co-operation to abut module shapes and wiring connectors.
(3) Due to the recursive nature of HRCTL, Co-HLEX is nearly 6,000 lines of KL1, which remarkably outperforms traditional LSI layout program implementations.
(4) Nearly O(N^1.0) time performance could be attained due to the streamed-parallel and distributed-memory architecture of Co-HLEX, which greatly outperforms traditional methods.

6.2 New Problems

Programmers are in the well-known plan-do-see cycle. The non-repeatability of parallel computation often destroys this loop and deteriorates debugging efficiency. A programming environment for parallelism would be one of the most important issues to be studied further.

Acknowledgements

We acknowledge Dr. K. Furukawa, Dr. K. Nitta, Dr. K. Taki and other ICOT members for general support, Prof. M. Iri of the University of Tokyo for advice on the problem solver design, Mr. M. Oda, Mr. H. Yamaguchi, Mr. K. Kawamura, Mr. Y. Mori, and Y. Hoh of ISC Ltd. for assisting the program development, Mr. S. Domen and Dr. K. Ishihara, the general manager and the senior chief researcher of the Systems Development Laboratory of Hitachi, for their support of the research, Mr. K. Maio of the Semiconductor Design and Development Center, Hitachi Ltd., for assisting the layout experiments, Dr. Y. Shiraishi of the Central Research Laboratory of Hitachi for state-of-the-art discussions, and Dr. S. Hayashi of the Systems Development Laboratory of Hitachi for discussions on analog circuit layout.

References

[Kleene 1952] S. C. Kleene. Introduction to Metamathematics, Van Nostrand, 1952.
[Mandelbrot 1982] B. Mandelbrot. The Fractal Geometry of Nature, W. H. Freeman, 1982.
[Watanabe and Hayashi 1989] T. Watanabe and S. Hayashi. Modelling Layout Problem Solving in Logic, in CAD Systems using AI Techniques, G. Odawara (ed.), pp. 181-188, North-Holland, 1989.
[Samet 1984] H. Samet. The Quadtree and Related Hierarchical Data Structures, Computing Surveys, Vol. 16, No. 2, June (1984), pp. 187-260.
[Otten 1982] R. Otten. Automatic Floorplan Design, Proc. 19th DAC (1982), pp. 261-267.
[Luk et al. 1986] W. K. Luk, D. T. Tang, and C. K. Wong. Hierarchical Global Wiring for Custom Chip Design, Proc. 23rd DAC (1986), pp. 481-489.
[Lee 1961] C. Y. Lee. An Algorithm for Path Connections and its Applications, IRE TEC, September (1961), pp. 346-365.
[Watanabe and Komatsu 1991] T. Watanabe and K. Komatsu. Co-operative Hierarchical Layout Problem Solver on Parallel Inference Machine, Proc. of the LPC'91, ICOT (1991), pp. 9-24.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

A Cooperative Logic Design Expert System on a Multiprocessor

Yoriko Minoda, Shuho Sawada, Yuka Takizawa, Fumihiro Maruyama, and Nobuaki Kawato
FUJITSU LIMITED
1015 Kamikodanaka, Nakahara-ku, Kawasaki 211, Japan
pro114@flab.fujitsu.co.jp

Abstract

CAD systems that can quickly produce quality designs are needed for the expanding VLSI market. This paper presents a cooperative design mechanism in a cooperative logic design expert system on a multiprocessor, co-LODEX. co-LODEX accepts constraints on area and speed, and outputs a CMOS standard cell netlist that satisfies the constraints. The user can even get an optimal circuit for area or speed by iteratively strengthening the corresponding constraint.
Short turnaround is expected through the combination of parallel processing by several processors and their cooperation. The cooperative design mechanism is based on an evaluation-redesign mechanism using assumption-based reasoning within a single processor. Design alternatives are considered as assumptions and constraint violations as contradictions. Redesign is implemented as a contradiction resolution. The evaluate-redesign cycle repeats itself until the design satisfies the specified constraints. Global evaluation-redesign takes place by processors exchanging design results for subcircuits in terms of gate counts and delays (in case of success) or justifications for constraint violations (in case of failure). Experimental results show that (1) co-LODEX can efficiently carry out global optimization. For example, a circuit with the minimum number of gates has been obtained while satisfying constraint on speed. (2) Linear speedup has also been observed. 1 Introduction CAD systems that can produce quality circuits quickly are needed for the expanding VLSI market. One of the most pressing problems is the lack of a means to iterate the cycle of evaluation and redesign until the design satisfies all constraints. Without it, it would be impossible to design a quality circuit with the desired characteristics (area and speed) by looking at the design from a global point of view. There is also demand for CAD systems that can do global optimization for the whole circuit. With such systems, designers can get a circuit with the gate count minimized and the delays kept shorter than the given constraints or vice versa. Turnaround time seems to be another key issue. Short turnaround allows designers to rapidly implement a variety of architectural choices and to choose the solution best suited for their specific situation by comparing area and speed characteristics. Designers can thus explore their options in a way that has not been practical before. Since design decisions may be retracted after later evaluation, they can be thought of as assumptions. Assumptionbased reasoning uses both facts and assumptions that can be retracted [de Kleer 1986]. Justification, originally introduced for truth maintenance [Doyle 1979], is the key concept to manipulating information containing assumptions. In de Kleer's Assumption-based Truth Maintenance System (ATMS), all assumptions are enumerated in advance and all combinations are examined. In design, however, we are not interested in all combinations. This is because a decision's significance depends on decisions made earlier. We can prune a considerable number of combinations. A global optimization technique using as linear programming (LP) was proposed [Kageyama 1990]; however, we can not get the exact optimal circuit, because the solution does not always give O's or 1's for variables that must take 0 or 1. We proposed an evaluation-redesign mechanism using assumption-based reasoning [Maruyama 1988]. In our evaluation-redesign mechanism, design alternatives are considered as assumptions and constraint violations as contradictions. Redesign is implemented as contradiction resolution. Justifications for violations, called nogood justifications (NJs), playa central role in the mechanism. NJs enable us to drastically prune the search space for constraint satisfaction or optimization problems [Maruyama 1991]. In this paper, we present a cooperative logic design ex- 1182 pert system on a multiprocessor, co-LODEX. 
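To make the role of an NJ concrete before the system description, the check of a single NJ can be sketched as follows. This is a minimal illustration in Python, not the co-LODEX implementation; the component names, gate counts, and the constraint value are assumed purely for the example.

```python
# Minimal sketch of a "nogood justification" (NJ) as an inequality check.
# Component names, gate counts, and the limit below are illustrative only;
# they are not taken from the co-LODEX implementation.

def nj_satisfied(terms, constraint_value, design):
    """An NJ of the form  sum(terms) > constraint  is 'satisfied' (i.e. the
    constraint is violated) under the current design choices."""
    total = 0
    for term in terms:
        # A term is either a fixed number or a reference to a component
        # attribute (e.g. its gate count) under the current design.
        total += term if isinstance(term, (int, float)) else design[term]
    return total > constraint_value

# A default NJ resembling an area constraint: total gate count must not exceed CHIP.
design = {"input_buffer(a)": 120, "registers(a)": 640, "multiplexers(a)": 380}
chip_limit = 1000
print("constraint violated (redesign needed):",
      nj_satisfied(list(design.keys()), chip_limit, design))
```

In this reading, redesign is triggered exactly when some stored NJ evaluates to true under the currently chosen design alternatives.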
co-LODEX divides the whole circuit to be designed into subcircuits in advance and designs each subcircuit on each processor to exploit parallel processing. Global evaluation-redesign takes place by processors exchanging design results (in case of success) or NJs (in case of failure). In our cooperative design mechanism, NJs received from other agents help narrow down the search space for an agent in the sense that NJs made out of the received ones enable the agent to prune the search space. That is the reason why we claim co-LODEX as "cooperative". Short turnaround is expected through the combination of parallel processing by several processors and their cooperation. co-LODEX also has the advantage of exact global optimization. The next section gives an overview of co-LODEX. Section 3 describes its cooperative design mechanism. We give some experimental results in Section 4 and concluding remarks in Section 5. UHDL; interface_view: interface_exampleOl; inputs: .xi(12) •.yi(12) •.dxi(12) •.ui(12) •. ai(12); outputs: .xo(12) •.yo(12); behaviocview: behaviocexampleOl; derme: constS =5. const3 =3; terminal: ul(12). u2(12). u3(l2). u4(12). u5(12). u6(12). yl(12). FF; operator: 2stage...,pipeline(Cmultiplier(x. y. z) = ( len = 2 ). z <- X * y; end_op; function: main: clk; while (FF) do 2a: ·2stage...,pipelined_multiplier·(u. dx. ul); 3a: ·2stage...,pipelined_multiplier·(x. constS. u2); 4a: ·2stage...,pipelined_multiplier·(const3. y. u3); 5a: ·2stage...,pipelined_multiplier·(u2, ul. u4). x<- x +dx; 6a: ·2stage...,pipelined_multiplier·(u. dx. yl). FF<- x< a; 7a: ·2stage...,pipelined_multiplier·(u3. dx. u5). u6 <-u -u4; 8a: y <- yl + y; 9a: u <- u6 - u5. xo := x. yo := y; enddo; la: stop(x CHIP (1) The form, "component ('a')", represents gate count of each component. This says that if the total gate count of the input buffer, the registers, the multiplexers, and so on, exceeds the value of variable CHIP, it means a constraint violation. CHIP is the variable that refers to the currently valid constraint value on gate count, for example 2000. co-LODEX transforms each constraint specified by the designer into default NJs. A timing constraint in terms of the clock cycle is transformed into a set of default NJs, that is, an inequality representing that the sum of the delays of the components along a path from source to destination exceeds the constraint value. For example, one of the default NJs represents that the path 1185 from REG_1 via MUX_4, MUX_5, ADD_SUB, MUX_7, to REG_4 is longer than the clock cycle. It is as follows: REG_l(p2) + MUX_4(Pl) + MUX_5(p2) + ADD_SUB(p2) + MillC7(p2) + REG_4(Pl) > CLOCK (2) The form, "component ('p' number)", represents a path within each component. CLOCK is the variable that refers to the currently valid constraint value of the clock cycle, for example, 120. Starting from default NJ s, new NJ s are added during redesign through NJ expansion and generation as described below. NJs save us doing direct evaluation against constraints. All we have to do is to check to see if any NJ is satisfied. NJ Expansion NJ expansion is used to narrow the scope and go down the hierarchy to resolve contradictions, or constraint violations. NJ expansion is formally defined in the following three steps. The NJ to be expanded is the one that is satisfied at the moment. Step 1: Select a component appearing in the NJ to be expanded. Call it C. Step 2: Replace C in the NJ with its in alternative's subcomponents. 
If the in alternative is at the leaf of the hierarchical structure (at the standard cell level), replace C with its actual gate count or its delay value. Step 3: Go down the hierarchy to the alternative node and store the NJ obtained in Step 2. (End) NJ Generation If every alternative of a component causes a constraint violation, NJ generation enables us to get a new NJ, the logical product of the NJs corresponding to each alternative. The generated NJ does not refer to that component. It is put at the alternative node one level up. This procedure is justified by resolution [Robinson 1965]. In general, the generated NJ is a logical product of NJs about gate count and NJs about delay. Evaluation-Redesign Algorithm within Each Agent The redesign algorithm within each agent uses NJ expansion and generation. Redesign is invoked when an NJ turns out to be true, since satisfying an NJ means a constraint violation. Step 1: Set ALT to the agent node and proceed to Step 2. Step 2: Check to see if there is any satisfied NJ at the ancestor alternative nodes (including itself) of ALT. If so, set ALT to the alternative node where the satisfied NJ is put, and proceed to Step 3. Otherwise, go to Step 7. Step 3: If there is a subcomponent of ALT appearing in the NJ, proceed to Step 4. Otherwise, go to Step 5. Step 4: Expand the NJ. Set ALT to the current alternative node and return to Step 3. Step 5: Make ALT out. Select another alternative node that is not inhibited by an NJ, make it in, set ALT to it, and go to Step 2. If every alternative is inhibited by NJs, proceed to Step 6. S te p 6: Generate an NJ. Set ALT to the current alternative node and go to Step 3. If there is no alternative node one level up, output the generated NJ and exit (Fail!). Step 7: If there is no component node whose alternative nodes are all out, exit. (Succeed!). Otherwise, select an alternative node that is not inhibited by NJs, make it in, set ALT to it, and go to Step 2. (End) In Step 5, selection is done either by recalling an out alternative or by generating a new implementation. The above algorithm starts when an agent receives information from the other agents. Once the algorithm terminates in success or failure, the agent sends information to the other agents. 3.2 Cooperative Design Algorithm We propose a cooperative design algorithm by describing the procedure for each agent. Step 1: Design its subcircuit. Repeat redesign by the evaluation-redesign algorithm. The gate counts and delays of the other subcircuits are assumed to be O. If any agent fails, the algorithm terminates in failure. Otherwise, proceed to Step 2. Step 2: Exchange the design results, that is the gate counts and delays of the subcircuits, with the other agents. Proceed to Step 3. Step 3: Set the gate counts and delays of the other subcircuits to the design results received in Step 2. If no stored NJ is satisfied, go to Step 9. If some of the stored NJs are satisfied and the design results of each agent are the same as in the previous cycle (caught in a loop), go to Step 7. Otherwise, proceed to Step 4. Step 4: Redesign its subcircuit. If at least one agent succeeds in redesign without any stored NJ satisfied, go to Step 2. Otherwise (all agents fail), proceed to Step 5 Step 5: Exchange the generated NJs with the other agents. Proceed to Step 6. Step 6: Combine the NJs received in Step 5. Go to Step 1. 1186 Step 7: Set a temporary constraint and proceed to Step 8. Step 8: Design its subcircuit. 
Repeat redesign by the evaluation-redesign algorithm until all the constraints, including the temporary one, are met. The gate counts and delays of the other subcircuits are assumed to be 0. If all the agents fail, the algorithm terminates in failure. Otherwise, go to Step 2.

Step 9: Put together all the subcircuits. The algorithm terminates in success. (End)

Only default NJs are stored initially. As the algorithm proceeds, newly generated NJs and combined NJs are added. In Step 7, select one of the violated constraints with the fewest agents related, and set the current value corresponding to that constraint as a temporary constraint.

Once the above algorithm terminates in success or failure (in Step 1, Step 8, or Step 9), the design run is finished, and the user can retry by changing the constraints. The user can look for a faster circuit by tightening the delay constraint, or can rerun by relaxing the constraints in case of failure. When the constraints are changed, the system updates them and re-evaluates by checking all the stored NJs. As more NJs are accumulated, the efficiency of the algorithm is further improved.

3.3 Combining NJs

When an agent fails in redesign with the evaluation-redesign algorithm described in the above section, it generates an NJ and sends it out to the agents that share it. Each agent "combines" the NJs received from other agents and makes a new NJ out of them. Considering an NJ from an agent as a condition under which design is impossible for that agent, the combined NJ can be seen as a condition under which design is impossible for the agents other than the recipient agent. Agents are required to design without any combined NJ satisfied.

For example, suppose Agent5 received the following two generated NJs, (3) and (4), originating from default NJs (1) and (2), from Agent1 and Agent4, respectively ("∧" signifies logical product):

192 + Agent2(a) + Agent3(a) + Agent4(a) + Agent5(a) > CHIP ∧ 19.2 + Agent5(p1) > CLOCK (3)

Agent1(a) + Agent2(a) + Agent3(a) + 96 + Agent5(a) > CHIP (4)

Agent5 combines the above NJs and makes the following new NJs:

288 + Agent2(a) + Agent3(a) + Agent5(a) > CHIP ∧ 19.2 + Agent5(p1) > CLOCK (5)

96 + Agent2(a) + Agent3(a) + Agent5(a) > CHIP (6)

Figure 5. Example of combining NJs

NJs (5) and (6) are added to Agent5. Figure 5 illustrates the above. The two axes of each graph correspond to default NJs (1) and (2). NJ (3) means that any design by Agent1 is either 192 gates or more, or 19.2 ns or longer along the path for default NJ (2). The left graph shows that Agent1 cannot design inside the hatched part. Similarly, the middle graph shows that Agent4 cannot design inside the hatched part. Combining (3) and (4) gives a condition for the agents other than Agent1 and Agent4 to be unable to design without violating any constraint. If the agents other than Agent1 and Agent4 design inside the hatched part of the right graph, that will cause a constraint violation. NJs (5) and (6) represent the hatched part.
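The combination step can be sketched for the area parts of NJs (3) and (4) as follows. Python is used only for illustration; the tuple encoding is an assumption, and this is not the co-LODEX code itself.

```python
# A simplified sketch of combining two received area NJs, following one
# reading of the Agent1/Agent4 example above.  Agent names and numbers come
# from that example; the encoding is illustrative, not the actual system.

def combine_area_njs(nj_a, nj_b):
    """Each NJ is (constant, set_of_agent_terms), meaning
    constant + sum(agent terms) > CHIP.  Each sender appears in the other NJ
    only through its constant lower bound, so the combined NJ keeps the
    shared agent terms and adds the two constants."""
    const_a, terms_a = nj_a
    const_b, terms_b = nj_b
    shared_terms = terms_a & terms_b   # agents other than the two senders
    return (const_a + const_b, shared_terms)

# Area parts of NJ (3) from Agent1 and NJ (4) from Agent4:
nj3_area = (192, {"Agent2(a)", "Agent3(a)", "Agent4(a)", "Agent5(a)"})
nj4_area = (96,  {"Agent1(a)", "Agent2(a)", "Agent3(a)", "Agent5(a)"})
print(combine_area_njs(nj3_area, nj4_area))
# constant 288 over {Agent2(a), Agent3(a), Agent5(a)}, i.e. the area part of NJ (5)
```

Under this reading, the two senders appear in the combined NJ only through their constant lower bounds (192 and 96), which is how the 288 in NJ (5) arises.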
4 Experimental Results

We implemented co-LODEX on the Multi-PSI [Taki 1988] in KL1 [Ueda 1986] to evaluate the performance of the cooperative design mechanism, and tested it, as examples, on the design of a specific circuit and of usual circuits.

4.1 Optimization

Optimization using co-LODEX proceeds as follows. First, co-LODEX requests the user for area and speed constraints and produces a solution satisfying the constraints. The user then changes the area or speed constraint value to the value for the solution just obtained minus 1, and iterates as long as the constraints are satisfied. If constraint satisfaction fails, the previous solution is used as the optimal solution.

Figure 6 (Experimental result for the MAG circuit) plots the circuits obtained for the specific circuit on an area-time map (delay in ns versus gate count) and shows some of the results for the MAG example. MAG approximates (a^2 + b^2)^1/2. At first, the area constraint was large enough, and the timing constraint was 130. We obtained the circuit shown at the right. As the area constraint was strengthened, different results were achieved. The smallest circuit we found is shown at the left. Finally, the above optimization failed in constraint satisfaction with the NJ 1224 > CHIP. This means that design is impossible if the specified set of constraints satisfies this NJ. We must thus relax the constraints so that the above NJ is no longer true.

4.2 Speedup

Speedups were examined by increasing the number of agents from 1 to 15. Agents correspond to processors on a one-to-one basis. We had one extra processor for distributing the functional blocks to the other processors and taking statistics, so we used up to 16 processors altogether. We expected that speedups would increase in proportion to the number of agents.

The example presented here is the design of a multi-argument adder (array adder). The function of this circuit is to calculate the sum of nine integers represented in two's complement format. This circuit is adopted in the ALUs and multipliers of the other example circuits described below. This circuit consists of 122 one-bit adders. The function of a one-bit adder is to calculate the sum and the carry-out of one-bit integers. Each one-bit adder has many design methods, so the whole circuit has over 50 million design combinations. Table 1 lists the number of design methods with the number of inputs and outputs. Each one-bit adder can be implemented with CMOS standard cells immediately. Thus, we have tested only the cooperative design mechanism of co-LODEX. We used 30 default NJs.

Table 1. The number of combinations for design method (columns: inputs, sum outputs, carry-outs, number of design methods): 1 1 0 1 2 1 1 1 1 1 3 1 4 2 1 12 5 2 2 30 105 6 2 7 3 2 2 8 3 3 3 3 9 15 420 84

Figure 7 (Array adder) shows a part of this circuit. The boxes represent one-bit adders, and the numbers inside them represent the number of input bits. The arrows represent default NJs. The upper and lower sides, or the upper-left and lower-right sides, of the arrangement of neighboring blocks are related to the same default NJ. Accordingly, co-LODEX divided the whole circuit with the boundary lines between subcircuits vertical or slanting (from upper-left to lower-right). We averaged design costs over agents in this test. We assumed that design costs depend on the total number of design methods for the agent in charge. Taking the number of design methods into account, co-LODEX divided the whole circuit into as many subcircuits as agents. The shaded areas in Figure 7 show two of the subcircuits, where the number of agents is 10. co-LODEX can easily divide this circuit among agents, since it is orderly.
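One way to picture the cost-averaged division described above is the following greedy sketch. It is illustrative only: the cost values are made up, and the actual divider used by co-LODEX is not specified at this level of detail.

```python
# Rough sketch of balancing estimated design cost (e.g. the number of design
# methods per block) across agents.  The greedy rule and the sample numbers
# are illustrative assumptions, not the actual co-LODEX divider.
import heapq

def divide_blocks(block_costs, n_agents):
    """Greedily assign each block to the currently lightest agent."""
    heap = [(0, agent, []) for agent in range(n_agents)]
    heapq.heapify(heap)
    for block, cost in sorted(block_costs.items(), key=lambda kv: -kv[1]):
        load, agent, blocks = heapq.heappop(heap)
        blocks.append(block)
        heapq.heappush(heap, (load + cost, agent, blocks))
    return sorted(heap, key=lambda entry: entry[1])

# illustrative one-bit adders with differing numbers of design methods
costs = {"adder_%d" % i: c for i, c in enumerate([1, 3, 3, 12, 30, 105, 15, 420, 84])}
for load, agent, blocks in divide_blocks(costs, 3):
    print("agent", agent, "total cost", load, blocks)
```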
The relation between the number of agents and the speedups is shown in Figure 8 (Relation between the number of agents and speedup; curves for the initial design, a change in the constraint on area, a change in the constraint on delay time, and design for a set of constraints), which shows the change in design time according to the number of agents. The slanting straight line represents the ideal line. All agents are active in consequence of a change in the area constraint, while some agents are active and others are inactive in consequence of a change in the delay time constraint. A change in the area constraint thus increases speedups, and the result surpasses the ideal line. The reason seems to be that our cooperative mechanism reduces the amount of computation by saving useless combinations of alternatives from each agent. Initial design time, the time taken until evaluation-redesign occurs, is roughly constant, because the increase in the work of distributing the entire specification to agents cancels out the decrease in each agent's design work due to the increase in the number of agents. Figure 8 also shows the speedups for a design, including initial design, when a set of constraints is given.

Usual Circuits

Table 2 lists the results of speedups for the design of six usual circuits, including initial design, when a set of constraints is given, together with the optimal number of agents and the time. A block diagram of the datapath includes various functional blocks. Some functional blocks, such as an ALU, are complex, and others are simpler. We observed that one or two special agents work hard but that the other agents spend time waiting for messages from busy agents. Processing time depends on the busy agents, which manage the complex functional blocks. To take advantage of our cooperative design mechanism on a multiprocessor, the distribution strategy would need, in addition to focusing on critical path candidates, (1) to look ahead in the library when distributing the functional blocks, and (2) to set up sub-agents if necessary.

Table 2. Results of experiments (circuit; number of components; main components; speedup; optimal number of agents; time in seconds)
Greatest common divisor: 11; 1 subtracter, 1 comparator; 1.1; 2; 1.7
Differential equation y'' + 5xy' + 3y = 0: 28; 1 multiplier, 1 ALU (add/subtract), 1 comparator; 1.3; 3; 72
MAG(1): 14; 1 ALU (add/subtract), 1 comparator, 1 two's complementer; 1.7; 4; 3.3
MAG(2): 13; 1 ALU (add/subtract/compare), 1 two's complementer; 1.2; 3; 6.4
MAG(3): 16; 1 adder, 1 subtracter, 1 comparator, 2 two's complementers; 5.0; 15; 3.7
Correlational function y[i] = Σ_{j=1}^{N-1-i} x[j] * x[i+j]: 22; RAMs, 1 ALU (multiply/add), 1 adder, 1 comparator, 1 decrementer, 1 incrementer; 2.1; 4; 113

5 Conclusion

We presented a cooperative logic design expert system on a multiprocessor, co-LODEX. co-LODEX divides the whole circuit to be designed into subcircuits in advance and designs each subcircuit on a separate processor to take advantage of parallel processing. Global evaluation-redesign takes place by processors exchanging design results or NJs. A cooperative design algorithm based on assumption-based reasoning makes this possible. Short turnaround is expected through the combination of parallel processing by several processors and their cooperation. co-LODEX can efficiently carry out global optimization. For example, a circuit with the minimum number of gates has been obtained while satisfying a constraint on speed. By increasing the number of agents up to 15, at best linear speedup has been observed. Our future plans include working on parallel processing of design, evaluation, and redesign within an agent. Distribution strategy is also important for load balancing among processors.
Acknowledgments This work has been done as part of the Fifth Generation Computer Systems (FGCS) Project of Japan. We would like to thank Dr. Nitta, manager of the Seventh Laboratory of ICOT, for his support. References [de Kleer 1986] J. de Kleer: "An Assumption-Based Truth Maintenance System," Artificial Intelligence 28, pp.l27162 (1986). [Doyle 1979] J. Doyle: "A Truth Maintenance System," Artificial Intelligence 24 (1986). [Kageyama 1990] N. Kageyama et al.: "Logic Optimization Algorithm by Linear Programming Approach," Proc. of the 27th Design Automation Conference, pp.345-348 (1990). [Maruyama 1988] F. Maruyama et al.: "co-LODEX: a cooperative expert system for logic design," Proc. of FGCS'88, pp.1299-1306 (1988). [Maruyama 1991] F. Maruyama et al.: "Solving Combinatorial Constraint Satisfaction and Optimization Problems Using Sufficient Conditions for Constraint Violation," Proc. of the Fourth International Symposium on Artificial Intelligence (1991). [Fujisawa 1989] H. Fujisawa et al.: "UHDL (Unified Hardware Description Language) and its support tools," Int. J. Computer Aided VLSI Design (1989). [Duley and Dietmeyer 1969] J. R. Duley and D. L. Dietmeyer: "A digital system design language (DDL)," IEEE Trans. Computers, Vol.C-17, No. 19, pp.850-861 (1968). [Brewer 1987] F. D. Brewer: "Knowledge Based Control in Micro-Architecture Design," Proc. of the 24th Design Automation Conference, pp.203-209 (1987). [Robinson 1965] J. A. Robinson: "A Machine Oriented Logic Based on the Resolution Principle," Journal of the ACM, Vol. 12, No.1, pp.23-41 (1965). [Taki 1988] K. Taki: "The Parallel Software Research and Development Tool: Multi-PSI system," Programming of Future Generation Computers (1988). [Ueda 1986] K. Ueda: "A Parallel Logic Programming Language with the Concept of a Guard," ICOT Technical Report, TR-208 (1986). PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992 ' edited by ICOT. © ICOT, 1992 1190 A Parallel Inductive Learning Algorithm for Adaptive Diagnosis Yoichiro Nakakuki Yoshiyuki Koseki Midori Tanaka C&C Systems Research Laboratories, NEC Corporation 4-1-1 Miyazaki, Miyamae-ku, Kawasaki 216, JAPAN E-mail: nakakuki@btl.cI.nec.co.jp Abstract This paper describes a parallel inductive learning algorithm for adaptive model-based diagnosis. Although the model-based systems are more robust than the rulebased systems, they require more computation time. This is because they lack heuristic knowledge. On the other hand, human experts can learn and utilize such knowledge from experience. Therefore, in order to realize efficient model-based diagnosis, learning capability from experience is indispensable. We had proposed an inductive learning mechanism but unfortunately it took much computation time. In order to reduce the computation time, this paper proposes a parallel learning algorithm. The experiential knowledge is represented as a fault probability model and the proposed algorithm searches the most appropriate one out of all the possible models. In order to search effectively, a partial order is introduced into the search space. By using this ordering, two kinds of search control mechanisms, that are local pruning and global pruning, are developed. The algorithm is implemented in KL1 language on a parallel inference machine, Multi-PSI. The experimental results show the effectiveness of the mechanisms. It is also shown that the 16 PE implementation is about 11 times as fast as the sequential one. 
1 Introduction Since the creation of the MYCIN system[Shortliffe 1976], most of expert systems, have incorporated the idea of representing their knowledge in a form of symptomfailure association rules. Those expert systems that take rule-based approach have two major inherent disadvantages. First, those systems lack robustness because they cannot deal with unexpected cases which are not covered by rules in their knowledge bases. Second, their knowledge bases are expensive to be created and maintained. There has been a series of research to tackle those The most distinct ones are on modelproblems. based methods, i.e. first-principle methods. Modelbased methods use design descriptions, such as structure and behavior descriptions [Davis 1984, de Kleer 1987, Genesereth 1984]. However, model-based diagnostic systems are generally not as efficient as rule-based ones since they require more complex computation. This is because they lack heuristic knowledge which human experts usually utilize. We have been working on a research to explore a general architecture to realize an adaptive diagnostic agent and introduced its basic architecture[Koseki 1989]. Moreover, an experimental system based on the architecture [Koseki et al. 1990a, Koseki et al. 1990b, Ohta et al. 1991a, Ohta et al. 1991b] have been developed. The system realizes adaptability with learning capability from its experience. The experiential knowledge is represented in a form of a fault probability model of target system components. With this experiential knowledge, it is able to diagnose a failing component faster with a fewer tests than pure model-based systems. However, it takes much computation cost to learn experiential knowledge. This is because the hypothesis space to search grows rapidly with the size of the target problem. In order to reduce the computation time, we developed a parallel learning algorithm. The algorithm utilizes two kinds of search control mechanism, that are local pruning and global pruning. The search space is divided and assigned to each processor so that the transmission of local pruning information does not require interprocess communication. The interprocess communication is restricted to the plausible global pruning information. The algorithm is implemented in KL1 language on a parallel inference machine, Multi-PSI. The experimental results show that the implementation using 16 PEs is about 11 times as fast as the sequential one. Section 2 presents the mechanism of the adaptive diagnostic system. In section 3, the probabilistic-model learning problem is described. A parallel learning algorithm is presented in section 4, and experimental results are shown in section 5. 1191 2 Adaptive Diagnosis Mechanism This section presents the architecture of an adaptive model-based diagnosis. We can observe two kinds of intelligent behavior in maintenance expert's diagnostic procedure. First, they can quickly identify a faulty component with a little information utilizing their experience. Second, even if a novel symptom arises, the expert can reach a conclusion, by consulting with other information sources, such as design description manuals. They can reason which component might have gone wrong and caused the symptom to appear, by knowing how the system is supposed to work. To realize those kinds of intelligent behavior, the system consists of several modules as shown in Figure 21. The knowledge base consists of design knowledge and experiential knowledge. 
The design knowledge represents a correct model of the target device. It consists of structural description which expresses component interconnections and behavior description which expresses component behavior. The experiential knowledge is expressed as component failure probability for each component. create Test Pattern Selector I Generator -+- t Symptom call ~ Diagnosis Module learning call t Test Test result Module ~ Suspects and fed into the target device. By feeding the test into the target device, another set of observation is obtained as a test result. and .is used to eliminate the non-failure components. Fig. 2-2 Diagnosis Flow In order to compute test effectiveness, the system uses fault probability distribution for each component. The mechanism employed in the system is basically same as the one found in the reference [de Kleer 1989]. It is so called minimum entropy technique where entropy is calculated from the fault probability for each suspected component. Here, an entropy E(5L) of a suspect-list 5 L is defined in terms of the estimated probabilities of each component in the list. Let 5 L denote the set of suspected components, and let PbP2, ... Pn( LPi = 1, Pi> 0) be failure probabilities of suspects 5 b 52, ... 5 n . Then an entropy E(5L) is defined as n E(5L) = - 2::Pilogpi. Fig. 2-1 Structure of the System The general flow of the diagnostic system is shown in Figure 2-2. The system keeps a set of suspected components as a suspect-list. And it takes eliminatenot-suspected strategy[Tanaka et al. 1989] to reduce the number of the suspects in the suspect-list, repeating the test-and-eliminate cycle. It starts with getting an initial symptom. It calculates an initial suspect list from the given initial symptom by performing a model-based reasoning. After obtaining the initial suspect-list, the system repeats a test-andeliminate cycle, while the number of suspects is greater than one and an effective test exists. A set of tests is generated by the test pattern generator. Among the generated tests, the most cost effective one is selected as the next test to be performed. The selected test is suggested i=l The system evaluates gain(T) for all of the available tests. In addition to this value, the system considers the test execution cost to select a cost effective test. The system selects a test according to the following evaluation function. gain(T) / cost(T) At first, the diagnostic system does not know the probability distribution for a target device. Therefore, it should assume that the all of the components have the same fault probability. However, the system becomes efficient as it acquires information on the fault probability from its experience.This is because it can estimate more precise probability distribution and can generate more effective test sequence. In the next section, a learning mechanism is presented. 1192 3 Learning Probabilistic Models The performance of the diagnostic mechanism relies on the correctness of the presumed probability distribution of components. However, it is not easy to predict appropriate probability for each component from observed data, especially when the number of observed data is small. For example, consider a diagnosis of a network system with 100 modems and 100 communication terminals. Here, we assume that 10 modems have broken down in the past (once for each). A simple estimate concludes that each of the 10 components has higher fault probability than any other component. 
However, human may presume that a modem has higher fault probability than a terminal because modems have broken 10 times in the past and terminals have never broken. Therefore, it is important to select an appropriate estimation method to derive a precise probability distribution (probabilistic model). Here, we consider an example of a target device which consists of 16 components. The observed number of faults for each component is shown in Table 3-1. Several attributes for each component are also shown in the table. Table 3-1 Component 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Example Attributes Type Age a new old a b new b old c new c old d new d old e e new f new f g g h h Next, we consider the relationship between component age and the fault probability. In the example, it seems that the component age does not affect the fault prob~bility. Therefore, in order to estimate the fault probability distribution precisely, it is important to consider component type. In general, some attributes are important to estimate the fault probability and the other attributes are not so important. Moreover, a combination of several attributes may be important. For instance, in the above example, we had better to consider component age, in the case of the component type is g. In order to estimate the probability distribution precisely, we must find relevant attributes (and/or their combination) and consider how to estimate with those attributes. Here we define the presumption problem. Consider a set of events X = {Xl, X2, ... , xm} and attributes aI, a2, •.. , an' Here, we assume that the events are exhaustive and mutually exclusive, and that the domain for each attribute aj (j = 1,2, ... , n) is a finite set Dom( aj). As shown in Table 3-2, for each event, Xi, a value, Vij (E Dom(aj)), for each attribute, aj, is given. Also, ni, the number of observations is given. old old new old new old No. of Obs. (Times) 1 0 13 9 1 1 0 0 0 0 1 0 0 5 1 0 First, we consider the relationship between the component type and the fault frequency. A type b component seems to have a very high fault probability. And it may be natural to conclude that type g component has also slightly higher probability than the other components. On the other hand, it is dangerous to conclude that each of the other components has different fault probability, e.g., the fault probability of type c component is about twice as large as type a component's. Because the difference between the number of observation may be due to an accident. Table 3-2 Event Table of events Attributes al a2 Xl Vn Vl2 X2 V21 V22 X3 V31 V32 ... ... ... ... Xm Vml Vm2 ... an No. of Obs. (times) Vln nl V2n n2 V3n n3 Vmn nm The problem is to presume the probability Pi for each event Xi, from the number of observations ni. If enough amount of data are given, it seems to be easy to estimate the probability appropriately. However, if only a few observation data are given, we must consider the noise affection. Therefore, it is important to extract reliable information by avoiding the noise affection. In order to estimate the fault probability appropriately, we introduced an inductive learning mechanism [Nakakuki et al. 1990, Nakakuki et al. 1991b, Nakakuki et al. 1991c]. In the learning mechanism, a presumption tree is used to express a probabilistic model. Using a presumption tree, all the events are classified into several groups. Here, each event ina group is assumed to have the same fault probability. 
Therefore, the probabilities for individual events can be calculate from a presumption tree. The details are descri bed below. 1193 L1 = L: log (n - dx + 1) 1 + L: -log Ox + xEPUQ xEQ L: {log (kx - 1) 2 + log c1(kx, Ix)} xEP 4 G4 • : Branching node o :Leaf Fig. 3-1 Presumption tree As shown in Fig. 3-1, a presumption tree consists of several branching nodes and leaves. An attribute aj corresponds to each branching node, and subset Ajk of Dom( aj) corresponds to each branch. Here, each Ajk must satisfy the following conditions. A presumption tree classifies all possible events into several groups. For example, the tree in Fig. 3-1 has four leaves, therefore, the events are classified into four groups by using the tree as a decision tree[Quinlan 1986]. For example, G 1 is a group of events which corresponds to leaf 1. Each event in G 1 is considered to have the same fault probability. Here, for each leaf k, let its corresponding group of event be X k , and for all event Xi E X k , let the sum of ni be Ok. By using a presumption tree, the probability Pi for each event Xi E X k can be calculat'ed as follows: As shown above, a presumption tree represents a probabilistic model. The problem is to find the most appropriate presumption tree for given data. As a criterion for the selection, we introduced the minimum description length (MDL) criterion[Rissanen 1978, Rissanen 1983, Rissanen 1986]. Rissanen argued that the least description length model is expected to fit for presuming the future events better than any other models. Here, description length for a model is defined as the sum of the model-complexity and model-fitness for the given data. The description length of a presumption tree is the sum of: (1) Code length of a tree, and Here, P is a set of all the branching nodes and Q is a set of all the leaves. For. each branching node X, Ix is the number of branches, dx is the depth of the node, kx = IDom(ai)1 (ai is a corresponding attribute for node x), n is the number of attributes, and c1(kx, Ix) = C:~l) if Ix < kx, otherwise 1. On the other hand, log-likelihood (model fitness), L2, between a model and observed data is defined as follows. L2 i Here, Pi is the presumed probability that is derived by using the model. The total code length is the sum of L1 and L2. 4 4.1 A Parallel Learning Algorithm Local Pruning Mechanism As described in the previous section, the problem is to search the least description length tree out of all the possible presumption trees. A heuristic algorithm for the problem was implemented[Nakakuki et al. 1991b] for a sequential machine by using branch-and-bound technique. The following summarizes the algorithm and then proposes a parallel version of the algorithm. Here, let the length of a presumption tree T be denoted by L(T). It is the sum of model complexity(L1(T)) and the model fitness (L2(T)). Intuitively, a large tree has large model-complexity, and a small tree has large(bad) model-fitness[Nakakuki et al. 1991d]. In order to discuss such characteristics more precisely, we introduce a partial order ">-" among the possible presumption trees. The order is defined as: {:::} def Presumption tree T2 can be obtained by replacing some leaves in presumption tree Tl with branching nodes. n in Fig. 4-1 can be For example, presumption tree obtained by replacing leaf z in T a , therefore, (2) Log-likelihood(distance) between the tree and observed data. The code length(model complexity), L1, for a presumption tree is defined as follows[Nakakuki et al. 1991b]. 
= - L: ni log Pi Similarly, 1194 Intuitively, T2 ~ Tl means T2 is strictly larger than TI. Second, the following inequality holds obviously: L2(T') - L2(T) ;::: L2MIN - L2(T). al al x y Here, if the sum of the right hand sides of the above two inequalities is positive(Le., the pruning condition holds), then the sum of the left hand sides will be positive. Hence, L1(T') + L2(T') > L1(T) + L2(T). z i.e. L(T') > L(T); Ta Fig. 4-1 If T2 ~ Tc Tb Example T1 , then the following inequalities hold by the definition: L1(Tt} :::; L1(T2) L2(TI) ;::: L2(T2) Therefore, for a certain presumption problem, if a presumption tree T is a maximal one under the ordering, then L2(T) will take the least value, say L2 M1N . L2MIN can be easily calculated in advance. By using these characteristics, we can effectively find a least description length tree. The proposed algorithm searches the space of possible presumption trees. It tests simpler tree before testing more complex ones. That is, if there are two presumption trees T and T' such that T' ~ T, the system calculates the length of T before trying T'. Here, consider that the length of a tree T has been tested. Then, the system considers the necessity of testing T' which is more complex than T (i.e. T' ~ T). If it turned out to be unnecessary(i.e., there is no possibility that T' has shorter length than T), then all the trees which are more complex than T' also turns out to be unnecessary to examine. The details of this technique are as follows. In order to decide the necessity, the algorithm tests the following pruning condition: log (n - dx + 1) + log(kx - 1) + log Therefore, it is not necessary to test T'. 0 Here we consider to implement a parallel version of the algorithm. It is natural to divide the search space and to assign each sub-space to individual processor. However, we must be careful when we divide the search space because the performance of the system is greatly affected by the dividing method. For example, in Fig. 4-2(a),' the search space is divided into four parts and each of them are assigned to processor PI to P4. Here, we assume that P2 found that the hatched area in the figure can be eliminated from the search space. Then P2 must transmit that information to other processors. On the other hand, if we divide the search space as shown in Fig. 4-2(b), then P2 can reduce the search space without communicating with other processors. Therefore, it is better to divide the search space so that the reduction can be done locally in a processor. (a) P3 c1(kx, Ix) +L2MIN - L2(T) > 0 Here, x is one of the leaves in T and its corresponding node in T' is a branching node. If the inequality holds, it is not necessary to calculate the length for T'. (b) proof First, it is clear that the following inequality holds by the definition of L1: Fig. 4-2 Search Space Division L1(T') - L1(T) ;::: log (n - dx + 1) + log(kx - 1) + log c1(kx, Ix). In the presumption problem, the search space has a tree structure. Each node in the search tree corresponds 1195 to a possible presumption tree. Moreover, for a internal node of the search tree, each of its child node corresponds to a presumption tree which has longer description length than the parent node's corresponding one. Therefore, for example, the root node of the search tree corresponds to the simplest presumption tree. If a search process examined node T, and the pruning condition for a child node of T is satisfied, then the subtree below the child node can be pruned (Fig. 4-3(a)). 
This means that the pruned area is included in a subtree which has node T as a root. In other word, parallel search for multiple disjoint subtrees can be performed independently. The algorithm we propose divides the search tree into several disjoint subtrees and searches each of them with individual processor (Fig. 4-3(b)). (a) L1(T') + L2(T') > L(To) i.e. L(T') Local Pruning > L(To). Therefore, it is not necessary to examine T'. Therefore if we find a presumption tree which has shorter length than ever known, then some portion of the search space will be able to be eliminated. However, reducible part of the search tree may be distributed widely throughout the search space. In other words, the pruning information should be announced to all of the other processors. Therefore, it is important to consider the trade-off between the increase of communication cost and the reduction of computation cost. That is, in a searching process, if a presumption tree is found to have shorter length than ever known, then the length of the tree should not always be announced to the other processors. In order to solve the problem, a simple mechanism is incorporated. That is, the newly found length is transmitted only if it is over x bits smaller than the previously known least length. Here, x is a threshold value. 5 (b) Fig. 4-3 then, from the above inequalities, we can conclude: Implementation and Results The learning algorithm was implemented in KL1 language on Multi-PSI, a distributed-memory multi processor machine. First, we implemented the algorithm with the local pruning mechanism. The experiments were performed by using up to 16 PEs in parallel. As a sample data, a fault history which comprised about 100 fault examples was given. The computation time was measured 5 times, and we took the average. The speedup curve of the example is shown in Fig. 5-1. Speedup 4.2 Global Pruning Mechanism There is another kind of search tree pruning mechanism. If a certain process finds that a presumption tree To has less description length than ever known, then each processor need not to test a tree that seems to have longer description length than To. The rest of this section describes details of this technique. Here we consider two presumption trees T and T' such that T' ~ T. Then 10 5 L1(T') + L2(T') + L2MIN > L1(T) + L2 M1 N. ~ L1(T') Here, if newly found tree To, which has shorter description length than ever known, satisfies the pruning condition: L1(T) + L2MIN ~ L(To) ~~~~-r~~~~~~~~~-#PE 7 8 9 1011121314151 Fig. 5-1 Speedup of the Algorithm The implementation using 16 PEs is about 11 times as fast as the sequential implementation (1 PE). There is a 1196 possibility of further speedup by equalizing the load of each PE. An example of the overall load distribution is illustrated in Fig. 5-2. The difference of the load among the PEs may be improved by adding a dynamic load balancing mechanism into the system. Development of this mechanism is under investigation. algorithm. One is an algorithm with local pruning mechanism (Local), and another version incorporates both local and global pruning mechanism (Local+Global). Both of them are executed with 16 PEs. The results show that the global pruning mechanism improved both of the number of reductions and execution time about 20% to 30% in comparison with the local pruning version. Fig. 5-3 shows an example of acquired presumption tree. cur1ng] [low] 9.1625 Fig. 
5-2 Load Distribution Next, we implemented the global pruning technique in addition to the local pruning mechanism. The threshold value for transmission is set to 2. This value was acquired empirically. The performance of the algorithm with both the local and global pruning mechanism is shown in Table 5-1. Table 5-1 Performance of the Algorithms (a) Example 1 (ratio) Example 2 (ratio) Example 3 (ratio) No. of Reductions Local Local+Global 870716 558661 1.00 0.64 3602255 2588851 1.00 0.71 30773602 23342853 1.00 0.76 Example 2 (ratio) Example 3 (ratio) Execution Time (msec) Local Local+Global 4522 3378 1.00 0.75 16050 11282 1.00 0.70 109892 89549 1.00 0.81 (b) Each experiment is performed with three randomly generated examples. The number of reductions and the execution time are measured for the two versions of the [old] 5.1231 Fig. 5-3 6 [old] [n8. [01. 7.8935 1. 37 1. 57 [new] 1. 8837 [now] 3.1527 Example of Acquired Tree Conclusion This paper has described a parallel learning algorithm for adaptive model-based diagnosis. The algorithm is based on branch-and-bound technique, and local and global pruning mechanisms are incorporated into the algorithm. The'16 PE implementation with local pruning mechanism is shown to be about 11 times as fast as the sequential one. Moreover, the global pruning mechanism is shown to have an ability to accelerate the parallel search. Future work is to improve the heuristics used in the pruning process. If we can find more effective global pruning criterion which can be computed with low time complexity, it seems to be possible to perform superlinearly. Acknowledgment This research has been carried out as a part of the Fifth Generation Computer Project: The authors would like to thank Katsumi Nitta of the Institute for New Generation Computer Technology for his support, and to express their appreciation to Masahiro Yamamoto and Takeshi Yoshimura of NEC for their encouragement in this work. 1197 References [Davis 1984] Davis, R., "Diagnostic reasoning based on structure and behavior," A rtificial Intelligence, Vol. 24, pp. 347-410, 1984. [Ohta et al. 1991a] Ohta, Y. et aI., "A parallel processing method for an adaptive model-based diagnostic system," Proc. 5th Annual Conference of Japanese Society for Artificial Intelligence (in Japanese), 1991. [de Kleer 1987] de Kleer, J. and Williams, B. C., "Diagnosing multiple faults," Artificial Intelligence, Vol. 32, pp. 97-130, 1987. [Ohta et al. 1991b] Ohta, Y. et al., "A parallel processing method for an model-based diagnosis," Proc. KLl Programming Workshop (in Japanese)" 1991. [de Kleer 1989] de Kleer, J. and Williams, B. C., "Diagnosis with behavioral modes," Proc. IJCAI-89, Vol. 2, pp. 1324-1330, 1989. [Quinlan 1986] Quinlan, J. R., "Induction of decision trees," Machine Learning, Vol. 1 (1), pp. 81-106, 1986. [Genesereth 1984] Genesereth, M. R., "The use of design descriptions in automated diagnosis," Artificial Intelligence, Vol. 24, pp. 411-436, 1984. [Rissanen 1978] Rissanen, J., "Modeling by shortest data description," Automatica, Vol. 14, pp. 465471, 1978. [Koseki et al. 1990a] Koseki, Y., Nakakuki, Y., and Tanaka, M., "An adaptive model-Based diagnostic system," Proc. PRICAI'90, Vol. 1, pp. 104-109, 1990. [Koseki et al. 1990b] Koseki, Y., Nakakuki, Y., and Tanaka, M., "An adaptive model-based diagnostic system and its learning method," Proc. 4th Annual Conference of Japanese Society for Artificial Intelligence (in Japanese), pp. 503-506 1990. 
[Koseki 1989] Koseki, Y., "Experience learning in model- based diagnostic systems," Pmc. IJCAI-89, Vol. 2, pp. 1356-1361, 1989. [Nakakuki et al. 1990] Nakakuki, Y., Koseki, Y., and Tanaka, M., "Inductive learning in probabilistic domain," Proc. AAAI-90, Vol. 2, pp. 809-814, 1990. [Nakakuki et al. 1991a] Nakakuki, Y., Koseki, Y., and Tanaka, M., "An adaptive model-based diagnostic strategy," Proc. 5th Annual Conference of Japanese Society for Artificial Intelligence (in Japanese), 1991. [Nakakuki et al. 1991b] Nakakuki, Y., Koseki, Y., and Tanaka, M., "An inductive learning method and its application to diagnostic systems," IPSJ S UG Reports 91-AI-74 (in Japanese), Vol. 91 (3), pp. 1928, 1991. [Nakakuki et al. 1991c] Nakakuki, Y., Koseki, Y., and Tanaka, M., "Inductive learning of probabilistic knowledge," Proc. 42nd A nnual Convention IPS Japan (in Japanese), 1991. [Nakakuki et al. 1991d] Nakakuki, Y., Koseki, Y., and Tanaka, M., "A parallel algorithm for learning presumption tree," Proc. 43rd Annual Convention IPS Japan (in Japanese), 1991. [Rissanen 1983] Rissanen, J., "A universal prior for integers and estimation by minimum description length," Ann. of Statist., Vol. 11, pp. 416-431, 1983. [Rissanen 1986] Rissanen, J., "Complexity of stings in the class of Markov sources," IEEE Trans. on Information Theory, Vol. 32 (4), pp. 526-531, 1986. [Shortliffe 1976] Shortliffe, E. J., Computer Based Medical Consultations: balMYCIN, American Elsevier, New York 1976. [Tanaka et al. 1989] Tanaka, M., Koseki, Y., and Nakakuki, Y., "A method of narrowing down suspects using experiential knowledge in model-based diagnostic systems," In Proc. 39th Annual Convention IPS Japan (in Japanese), pp. 243-244 1989. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 1198 Parallel Logic Simulator based on Time Warp and its Evaluation Yukinori Matsumoto and Kazuo Taki Institute for New Generation Computer Technology 1-4-28, Mita, Minato-ku, Tokyo 108, Japan yumatumo@icot.or.jp Abstract This paper focuses on parallel logic simulation. An efficient logic simulator on a large-scale multiprocessor is targeted. The Time Warp mechanism, an optimistic method for time-keeping, was experimented and evaluated. Synchronous mechanisms and conservative mechanisms for time-keeping have been examined already, and their inefficiency on large-scale distributed memory machines has been noted. There have been few reports, however, on evaluation of the Time Warp mechanism although rollback processes have been presumed to be heavy. We aim at evaluating the efficiency of this mechanism. Several devices, such as a local message scheduler, an antimessage reduction mechanism and a load distribution scheme, are added in order to reduce rollback overhead. The simulator is implemented on the Multi-PSI, a distributed memory multiprocessor. The simulator is writ-' ten in concurrent logic language KLl. KL1 is expected to be suitable for parallel programming because it supports data-flow synchronization and global name space across the processor boundary. Experiments were done so that the speedup, performance and influences of various overheads could be measured. Using 64 processors, 48-fold speedup and 99K events/ sec performance were obtained. The overhead measurements revealed that rollback processes slightly affected performance. 
These results showed that the simulator had fairly good performance as a full-software logic simulator and that the Time Warp mechanism worked efficiently. 1 Introduction Logic simulation is used in order to verify not only the functions of designed circuits but also the timing of signal propagation. Since logic simulation is currently one of the most time-consuming stages in LSI design, faster simulators are urgently needed. A parallel logic simulator is one likely way of producing quick simulation. Parallel logic simulation is usually treated as a typical application of parallel discrete event simulation (PDES). In PDES, performance essentially depends on the timekeeping mechanism. The mechanisms broadly fall into three categories: synchronous, conservative and optimistic mechanisms. Since synchronous mechanisms require global synchronization, they, apparently, do not work efficiently on distributed memory multiprocessors[Soule and Blank 1988]. Furthermore, conservative mechanisms tend to deadlock when circuits have feedback loops. A lot of computation power is needed to avoid this[Lubachevsky 1989, Misra 1986, Shimogori and Kage 1989]. On the contrary, optimistic mechanisms cannot deadlock, however, they do expend some computation power on rollback processes [Fujimoto 1990, Jefferson 1985]. Only a few experiments with the optimistic mechanism were reported [Chung 1989, Briner et aI. 1991] but the details have not been evaluated yet. We are targ~ting an efficient logic simulator on largescale MIMD multiprocessors, most of which will be distributed memory machines. We adopted the Time Warp mechanism, an optimistic mechanism, because the overheads of the mechanism were considered to be reduced using some devices such as a local message scheduler, an antimessage reduction mechanism and a load distribution scheme. By adding them to the Time Warp mechanism' we expected that it would become suitable for logic simulation on large-scale MIMD machines. We have implemented a parallel logic simulator on the Multi-PSI[Taki 1988] - an experimental parallel inference machine, a distributed memory multiprocessor. The simulator was written in concurrent logic language KLl. KL1 provides several advantages for quickly programming parallel applications. Data-flow synchronization, global name space and dynamic memory allocation are expected to remove the causes of many bugs. Several benchmark circuits have been simulated on the 1199 simulator in order to evaluate the efficiency of the Time Warp mechanism. Performance, speedup, rollback overhead and inter-PE (processor element) communication overhead have been measured. This paper firstly overviews our system. Remarkable devices to enhance efficiency, such as a load distribution scheme, a local message scheduler and an antimessage reduction mechanism are mentioned. Secondly, KL1 and the Multi-PSI are briefly introduced. Then, fairly good performance and speedup in actual execution are reported. Finally, we refer to the examination that revealed the main causes affecting performance in our simulator. 2 The Time Warp Mechanism Event simulation can be modeled so that several objects change their states by communicating with each other. An object is a state-automaton. A message has information of an event whose occurrence time is stamped on the message (time-stamp). Jefferson proposed the Virtual Time paradigm and its implementation, the Time Warp mechanism[Jefferson 1985]. 
He suggested that the Time Warp mechanism would be useful as the time-keeping mechanism for PDES. In the Time Warp mechanism, each object usually acts according to received messages and also records the history of messages and states, assuming that messages arrive chronologically. But when a message arrives at an object out of time-stamp order, the object rewinds its history (this process is called rollback), and makes adjustments as if the message had arrived in correct timestamp order. After rollback, ordinary computation is resumed. If there are messages which should not have been sent, the object also sends antimessages in order to cancel those messages. In addition to the above, a global control mechanism sometimes works to update GVT (global virtual time) which is used for memory management. GVT must satisfy the following two conditions. 1. GVT is not greater than the minimum simulation time at any object. 2. GVT is not greater than the minimum time-stamp values in the messages that have been sent but not yet re~eived. After the global control mechanism updates GVT, it notifies all objects of the new GVT. As no objects rewind their histories before GVT, the memory area occupied by histories before GVT can be released. 3 3.1 System Overview System Specification The system simulates combinatorial circuits and sequential circuits that have feedback loops. It handles three values: Hi, Lo, and X (unknown). "A different delay time can be assigned to each gate (non-unit delay model). Since this system only treats gates, flip-flops and other functional blocks should be completely decomposed into gates. The supported functions are the minimum set for experiments, but they can be easily expanded (e.g. to handle more signal values). 3.2 Load Distribution Scheme For efficient execution of parallel logic simulation on a distributed memory machine, the scheme of load distribution is important at the following three points: load balancing, keeping inter-PE communication frequency low and deriving a lot of parallelism. In our simulator, target circuits are partitioned statically in the preprocessing phase. We propose a new partitioning strategy called "Cascading-Oriented Partitioning" (COP, for short) for high-quality load distribution. COP makes several clusters by grouping gates that are connected to each other in a cascade form. A grouping operation starts from the primary input of the circuit. By tracing a path of the gate connection straightforward, subsequent gates are included in a cluster. If there are several candidates to be included, only one gate is selected and the others are left to be the starting points for other grouping operations. After partitioning, small clusters that contain very few gates are merged into adjacent large clusters. Conversely, extremely long cascade-formed clusters are cut into several smaller clusters so that they "do not cause load imbalancing. Finally, clusters are assigned to PEs randomly; the only constraint is that each processor should contain a roughly equal number of gates. COP intends to exploit parallelism of the multiple fanouts. COP also guarantees that a gate has at least one adjacent gate in the same cluster. So, COP is effective in keeping the communication locality, that is, reducing inter-PE communication. The random distribution of clusters attains load balancing. In COP, the smaller each cluster, the better load balancing but the higher inter-PE communication frequency. 
There is a performance trade-off between good load balancing and a low frequency of inter-PE communication. It is necessary to decide the appropriate size of a cluster according to the number of gates in the target circuit and the number of PEs (for reference, clusters with 12 to 32 gates are generated for a circuit consisting of 12,000 gates in the simulation using 64 PEs).

COP may look similar to Agrawal's algorithm; however, COP differs from it in the following two points.

• Agrawal's algorithm basically assumes simulation using the synchronous time-keeping mechanism. According to the gate delay values, the algorithm estimates the number of messages generated in each cluster at each tick for the purpose of load balancing. Conversely, such an estimation is of little benefit to our simulator, because messages with different time-stamps can be evaluated simultaneously in the Time Warp mechanism.
• In Agrawal's algorithm, once a cluster is generated, it is never decomposed into smaller ones. Therefore, the load is sometimes imbalanced. In COP, however, clusters that are too large are cut into several adequately sized clusters. This enables the system not only to be flexible with respect to various numbers of PEs but also to exploit more parallelism (i.e. pipeline parallelism).

Figure 1: Cancellation with several antimessages
Figure 2: Cancellation with one antimessage

3.3 Local Message Scheduler

In the simulation, there are usually several messages to be evaluated in a PE. When the Time Warp mechanism is used, the larger the time-stamp of a message, the more likely the message is to be rewound. For this reason, proper message scheduling in each PE is expected to reduce rollback effectively. In our system, a message scheduler resides in each PE. When a message is spawned, it is first registered in the scheduler to which the destination object belongs. The scheduler picks up the messages with the smallest time-stamps and sends them to the destination objects at the appropriate moment. This scheduler ensures that rollback never happens as long as an object is receiving messages only from objects in the same PE. Only messages sent from other PEs may cause rollback.

3.4 Reduction of Antimessages

In Jefferson's original Time Warp mechanism, when rollback occurs, as many antimessages must be generated as the number of messages that need to be canceled (Figure 1). However, the number of antimessages can be reduced when we assume the following condition: for any objects A and B, messages transmitted from A to B are received by B in the same order as they are sent by A (order-preserved condition) [Fukui 1989].

Figure 3: Cancellation with no antimessages

Assume that M1, M2, ..., Mn are messages and AM is an antimessage. Also assume that M1, M2, ..., Mn all satisfy the following three conditions:

• M1, M2, ..., Mn were sent before AM,
• M1, M2, ..., Mn were sent along the same channel that AM is sent along,
• M1, M2, ..., Mn have time-stamps greater than or equal to that of AM.

Then it is clear that M1, M2, ..., Mn must be canceled, and no other messages must be canceled. Only one antimessage, the one that corresponds to the canceled message with the smallest time-stamp, need be sent (Figure 2).

We advanced this idea one step further. Assume that a sender has to cancel messages M1, M2, ..
,Mn which have already been sent in this order, and at the same time the sender knows that a new message Mnew will be sent whose time-stamp is equal to or less than that of MI. In this case, the sender sends Mnew but no antimessage. When a receiver receives Mnew with a smaller timestamp than the Mn that the destination object received just before, the receiver can easily notice that an invalid situation has occurred, and ca,n cancel MI, M 2 , .. , Mn immediately (Figure 3). In our system, the message streams of KL1 are used for communication between objects. Since KL1 keeps the order of messages in the stream, the order-preserved 1201 condition is satisfied. So, we adopted the above optimization for reducing antimessages. 4 4.1 Hardware and Language Hardware This simulator is implemented on the Multi-PSI[Taki 1988], a distributed memory MIMD machine. The MultiPSI consists of 64 processing elements (PEs) connected to each other by a 2-dimensional mesh network. A PE is a 40- bit (8 bits for tag and 32 bits for data) CISC processor controlled by horizontal micro-instruction. The cycle time is 200 nsec. A network controller is paired with each PE, supporting message passing communication between PEs. The bandwidth of the controller is 5M bytes/sec. The network has wormhole routing functionality. Since the Multi-PSI is a distributed memory machine, communication latency between objects in different PEs takes approximately twenty times longer than latency in the same PE. However, the distributed memory architecture can be scaled up easily. 4.2 Language and Implementation This simulator is written in concurrent logic language KL1. KL1 is a language without destructive value assignment to variables, that is a single assignment language. Due to its nature, data-flow synchronization is realized without significant overheads in the language implementation. Therefore, KL1 never compels programmers to describe synchronization explicitly at a primitive level. On the other hand, a single assignment language tends to consume storage rapidly. A dynamic memory allocation mechanism and several garbage collection mechanisms are supported in the KL1 implementation. So, programmers are free from writing memory management. The KL1 language assumes a system-wide (global) name space even on a distributed memory machine. In KL1 programming, first, a programmer writes only the logical concurrency, relations among concurrent objects or data-flow. Then, the programmer adds the "pragma" to specify object allocation to a certain processor, as below. ... , B@processor(PE), ... where B is a "goal" of KL1, which represents an object. Inter- PE reference pointers among objects or variables are maintained automatically by the KL1 language system at run time. Thus a programmer need not worry at all about the programming of inter-PE communication. Since the characteristics described above eliminate the causes of many bugs, KL1 enables us to develop parallel programs much more easily than with the conventional languages (e.g. FORTRAN and C). In fact, it took one person just three months to complete the primary version of the simulator, including the circuit partitioning module! Moreover, because of KL1, several different experiments, which needed program modification, could be performed in a short period. 4.3 Avoiding Asynchronous Copying GC As mentioned above, garbage collection (GC) is indispensable for KL1. 
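The cancellation conditions of Section 3.4 can be stated compactly. Below is a minimal sketch in standard Prolog (not the authors' KL1 implementation); the msg/3 and anti/2 terms and the predicate names are assumptions made for illustration only.

    % msg(Channel, TimeStamp, Payload) -- assumed record for a sent message.
    % to_cancel(+Sent, +Channel, +AMTime, -Cancelled)
    % Sent is the list of messages already sent (hence sent before the
    % antimessage).  A message must be cancelled iff it was sent along the
    % same channel as the antimessage and carries a time-stamp >= AMTime.
    to_cancel(Sent, Channel, AMTime, Cancelled) :-
        findall(msg(Channel, T, P),
                ( member(msg(Channel, T, P), Sent), T >= AMTime ),
                Cancelled).

    % one_antimessage(+Cancelled, -AntiMsg)
    % Under the order-preserved condition a single antimessage suffices: the
    % one corresponding to the cancelled message with the smallest time-stamp.
    one_antimessage(Cancelled, anti(Channel, MinT)) :-
        Cancelled = [msg(Channel, _, _)|_],
        findall(T, member(msg(_, T, _), Cancelled), Ts),
        min_list(Ts, MinT).

    % ?- to_cancel([msg(a_to_b,3,x), msg(a_to_b,7,y), msg(a_to_c,9,z)],
    %              a_to_b, 5, Cs), one_antimessage(Cs, AM).
    % Cs = [msg(a_to_b, 7, y)], AM = anti(a_to_b, 7).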
Two kinds of garbage collection (GC) mechanisms, a copying GC [Baker 1978] and the MRB GC [Chikayama and Kimura 1987], are implemented for intra-PE memory management on the Multi-PSI. For the Time Warp mechanism, the most important point in obtaining good performance is to keep the pace of simulation in each PE equal. However, the copying GC starts at different times in different PEs and disturbs the pace of simulation. Fortunately, since the MRB GC collects single-reference data areas without stopping KL1 execution, it is expected to stabilize the pace of simulation. We took great care to keep data references single so that all data areas can be collected by the MRB GC. Hence we succeeded in preventing the copying GC (consequently, when a certain circuit was simulated using 64 PEs, an improvement in performance of approximately 37% was attained compared to the case where the copying GC occurred; see the appendix).

5 Measurements and Discussions

Four sequential circuits, presented in ISCAS'89, were simulated on the Multi-PSI. The number of gates, average fan-ins and average fan-outs of the circuits are shown in Table 1. We measured system performance, speedup and overheads, such as rollback and inter-PE communication, in the experiments. Table 2 shows the system performance for various numbers of PEs. Figure 4 indicates speedup. Table 3 shows the percentage of each process cost (average values for 64 PEs). Table 4 shows the percentage of actual events (the messages that are not rewound) and rewound messages. Table 5 shows the frequency of rollback fr, the average depth of rollback dr (i.e. the average number of rewound messages per rollback) and the frequency of inter-PE communication fc. fr is defined as R/E, dr as Hr/R, and fc as Mc/Mall, where R is the total number of rollback occurrences, E is the total number of actual events, Hr is the total number of rewound messages, Mc is the total number of messages that are sent across PE boundaries and Mall is the total number of messages.

Table 1: Target circuits
Circuits   No. of gates   Avg. fan-ins   Avg. fan-outs
s1494      683            2.15           2.08
s5378      3,853          1.70           1.61
s9234      6,965          1.57           1.50
s13207     11,965         1.66           1.55

Table 2: Performance (events/sec)
PEs \ Circuits   s1494    s5378    s9234    s13207
1                2,572    2,410    2,326    2,051
4                5,662    8,401    7,709    9,092
16               10,413   26,141   19,003   33,793
64               10,943   64,013   35,118   99,299

Table 3: Percentage of time for each process (64 PEs)
Process \ Circuits                           s1494   s5378   s9234   s13207
Evaluating and scheduling messages, etc.*    72.28   80.28   58.69   85.79
Rollback                                     5.32    2.50    1.61    1.53
Inter-PE communication                       13.13   8.24    4.38    2.12
GVT updating                                 1.21    0.43    7.63    0.48
History releasing                            2.41    6.09    0.62    1.57
Idling                                       33.13   0.64    3.86    6.06
* This process is not only for actual events but also for messages rewound.

Table 4: Percentage of actual events and rewound messages (64 PEs)

Figure 4: Speedup (s1494, s5378, s9234, s13207 and the ideal line; X axis: No. of PEs)

5.1 Performance and Speedup

As shown in Table 2 and Figure 4, the simulator attained approximately 99K events/sec performance and 48-fold speedup in the best case using 64 processors. This performance is fairly good for a full-software logic simulator. This good speedup shows that the Time Warp mechanism works efficiently. In some cases, however, comparatively poor speedup was measured.
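The speedup figures quoted above follow directly from Table 2. A small worked sketch in standard Prolog, with a few of the table entries encoded as performance/3 facts (a representation introduced here only for illustration):

    % performance(Circuit, PEs, EventsPerSec) -- figures taken from Table 2.
    performance(s13207,  1,  2051).
    performance(s13207, 64, 99299).
    performance(s1494,   1,  2572).
    performance(s1494,  64, 10943).

    % speedup(+Circuit, +PEs, -Speedup): throughput on PEs processors
    % relative to the single-processor throughput of the same circuit.
    speedup(Circuit, PEs, Speedup) :-
        performance(Circuit, 1, P1),
        performance(Circuit, PEs, Pn),
        Speedup is Pn / P1.

    % ?- speedup(s13207, 64, S).   % about 48-fold (the best case)
    % ?- speedup(s1494, 64, S).    % about 4-fold (the poorest case)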
In order to ascertain the cause of limited speedup, we will discuss the inter-PE communication overhead, the rollback overhead and parallelism in the following subsections. Inter-PE head Communication Over- The cost per message for inter-PE communication was measured to be 0.503 msec 5 • However, the average cost of the essential work, that is, evaluating and scheduling a message, was 0.362 msec. So, the inter-PE communication cost was not negligible6 • Tables 3 and 5, however, show that both the frequency and the percentage of inter-PE communication processes were low in the cases of s13207 and s9234. This means that our strategy for partitioning circuits worked effectively and that inter-PE communication had only a slight effect on performance in these cases. Conversely, in the case of s1494, both the inter-PE communication frequency and the percentage of its process were high compared to other cases. s1494 has, on average, more fan-ins and fan-outs than the others (Table 1), and it tends to A message is 25 bytes of data the relative cost of inter-PE communication is far lower than systems where inter-PE communication is supported by the operating system. 5 6 However , 1203 cause the high inter-PE communication overhead. 5.3 Rollback Overhead Rollback frequency and its cost greatly attracted our interest. Table 5 shows that the rollback frequency is not as high as we formerly suspected, except for s1494. The average cost per rollback, even for the highest case, s92347 , amounted to 0.578 msec by our measurement. Since the time for essential processes was 0.362 msec, the rollback procedure is not extremely timeconsuming compared to the essential works, and consequently the percentage of total rollback cost is not seriously high in itself, as shown in Table 3. 5.4 Parallelism Parallelism suggests the upper limit of speedup. In practice, however, the actual speedup is usually different from the parallelism because of several overheads. We estimated the parallelism of each problem, as below. We made another simulator to measure parallelism. In that simulator, all PEs work according to the global synchronization. Here, we call an interval between global synchronizations a "time slot". A PE evaluates only one message in a time slot. All output messages, if any, are also sent and registered to their destination schedulers within the time slot. When there is no message to be evaluated in a PE at a certain time slot, the PE simply idles. Assume that the simulation finishes after N synchronization, and that M actual events, which are the messages that are not rewound, are measured in the simulation. Here, we define the parallelism of the problem as MIN. The parallelism means the speedup in such an environment where the cost for non-essential processes, such as rollback and inter-PE communication, can be ignored. On the other hand, to make clear the effect of the interPE communication overhead and the rollback overhead, we measured the cost of releasing an unnecessary history area, which causes super-linear speedup[Matsumoto and Taki 1991]. Then we removed its effect from the measured speedup and recalculated the speedup. We named the recalculated speedup "modified speedup" . Table 6 compares the modified speedup and the parallelism of each problem using 64 PEs. The gap between the modified speedup and the parallelism is caused by the inter-PE communication overhead and the rollback 7The cost is considered roughly proportional to the depth of rollback, and s9234 has the largest depth. overhead. 
For s5378, s9234 and s13207, the parallelism is close to the modified speedup. This means that the limited speedup was caused by a lack of parallelism. We conclude that our system could show good speedup as long as target problems have sufficient parallelism. With respect to the exceptional case, s1494, as Table 4 shows, a considerable percentage of the messages are rewound. It is considered that this high percentage was caused indirectly by the high inter-PE communication overhead or the high rollback overhead, and resulted in further suppression of speedup.

6 Further Experiments

Since neither the inter-PE communication cost nor the rollback cost is negligibly small, both of these costs are considered to affect performance not only directly but also indirectly. However, it is difficult to separate their influences clearly. In this section, we report on the experiments that aim at clarifying which affects performance more, the inter-PE communication cost or the rollback cost. We assumed the model described below and made a system for the experiments.

6.1 Model

We assume that the only processes that incur costs are rollback, inter-PE communication and an essential process. Here, an essential process consists of message evaluation work and scheduling work. Any other processes, such as GVT updating and releasing unnecessary history areas, do not incur any costs at all. It is also assumed that the essential process cost is equal for all gates. In our model, the inter-PE communication cost Cc decomposes into three factors as follows:

    Cc = Cps + Cl + Cpr    (1)

where Cps is the time consumed in the sender PE for composing a message packet, Cl is the time from when the message leaves the sender until it arrives at the receiver (latency), and Cpr is the time taken by the receiver to decompose the message packet. As the rollback cost tr is roughly proportional to the number of rewound messages, tr is represented by the next equation:

    tr = kr + Cr · hr    (2)

where hr is the number of messages rewound in the history and kr and Cr are constants. In practice, Equations 1 and 2 give a fairly precise representation of these costs. With regard to Cps and Cpr, they are approximately equal [Nakajima and Ichiyoshi 1990] on the Multi-PSI, while the latency is negligible even if messages are transmitted between the most distant PEs.

6.2 Experimental System

The experimental system is based on the simulator presented in the previous sections. By adding several dummy loads to the original simulator, the actual costs for rollback and inter-PE communication become negligible. Thus this system maintains its fidelity to the model as much as possible. In the system, the cost for an essential process is fixed to be heavy, whereas the rollback cost and the inter-PE communication cost are changeable.

Figure 5: Factors affecting performance (1)

6.3 Results

We performed two kinds of comparative examination, as below.

1. The inter-PE communication cost is fixed so that its relative value to the essential process cost can be kept the same as in the actual simulation. With respect to rollback, kr and Cr in Equation (2) are varied but Cr/kr is kept equal to that in the actual simulation.

2. The rollback cost is fixed so that its relative value to the essential process cost can be kept the same as in the actual simulation.
The inter~PE communication cost is varied but the equality between Cps and Cpr in Equation (1) is kept because they were approximately equal in the actual simulation. We simulated s9234 and s1494. They involved approximately the same parallelism, whereas both the interPE communication frequency and the rollback frequency were very different. Figures 5 and 6 show the results. In Figure 5, the X axis shows the relative value of Cps +Cpr compared to the essential cost. In Figure 6, the X axis shows the relative value of Cr. The Y axis shows the relative performance (by solid lines) and the relative amount of rewound messages (by broken lines) compared to those when both the inter-PE communication cost and the rollback cost are set to zero. The arrows indicate the the actual proportion points between these costs. For both circuits, the higher the inter-PE communication cost, the worse the performance. This apparently shows that the inter-PE communication cost affected performance adversely. Interestingly, the relative amount of rewound messages increased with the higher inter-PE communication cost for s1494, while the curve of the amount is approximately flat for s9234. The difference in declination of the performance curves was, therefore, caused not only by the difference in the inter-PE communication frequency but also by the difference in the amount of rewound messages. On the contrary, neither performance nor the amount of rewound messages varied remarkably even though the No. of rewound msgs. (relative) Performance ( relative) 0.6 .... __._-------a----_ _-----.-------------- 5.0 .. - [ : -..--- S1494,Prfm.[ --s1494,Rwd. -0-s9234,Prfm. ---00-- s9234,Rwd. 0.4 0.2 ~ ~ ..... -II- --0--<>--------0-----0--------0-------------- 1.0 O.o+-----..---..,.....---..-------l 0.0 0.5 1.0 1.5 Cr Figure 6: Factors affecting performance(2) cost of rollback increased. This means that rollback cost did not have a dominant effect on performance in our system. 7 Summary and Conclusion We constructed a parallel logic simulator on the MultiPSI, a distributed memory multiprocessor. The simulator was programmed in concurrent logic language KL1. Since the causes of many bugs are essentially reduced by KL1, the simulator was able to be programmed in only three months by one person. The Time Warp mechanism was adopted for timekeeping in the simulator. Since rollback overhead in a naive Time Warp mechanism was considered heavy, we added several devices such as a local message scheduler, an antimessage reduction mechanism and a load distribution scheme to reduce the overhead. Several benchmark circuits were simulated on our system. Approximately 99K events/sec performance and 48-fold speedup were attained using 64 PEs. The per- 1205 formance is fairly good for a software logic simulator. The good speedup shows that the Time Warp mechanism worked efficiently in the simulator. We also examined the factors that are considered to affect performance adversely. The experiments revealed that the rollback overhead did not affect performance seriously in our system, while the inter-PE communication over head decreased performance. Acknowledgement Valuable advice and suggestions were given by the members of PIC-WG, an ICOT working group, discussing parallel LSI-CAD. The authors gratefully thank them. Data for the evaluation of our system were recommended and given by Fujitsu Ltd. and Keio Vniv. We also thank them. References [Agrawal 1986] P. Agrawal. Concurrency and Communication on Hardware Simulators. IEEE Trans. 
on Computer-Aided Design, VoLCAD-5, No.4 (1986), pp. 617-623. [Baker 1978] H. G. Baker. List Processing in Real Time on a Serial Computer. Communications of the ACM, Vol. 21, No.4 (1978), pp. 280-294. [Briner et al. 1991] J. V .Briner et al. Parallel Mixedlevel Simulation using Virtual Time. CAD accelerators, North-Holland, 1991. pp. 273-285. [Chung 1989] M. J. Chung and Y. Chung. Data Parallel Simulation using Time-Warp on the Connection Machine. In Proc. 26th A CM/IEEE Design Automation Co nf. , 1989. pp. 98-103. [Chikayama and Kimura 1987] T. Chikayama and Y. Kimura. Multiple Reference Management in Flat GHC. In Proc. Fourth Int. Conj. on Logic Programming, 1987. pp. 276-293. [Fujimoto 1990] R. M. Fujimoto. Parallel Discrete Event Simulation. Communications of the ACM, Vol.33, No.10 (1990), pp. 30-53. [Fukui 1989] A. Fukui. Improvement of the Virtual Time Algorithm. Trans .. of Information Processing Society of Japan, Vol.30, No.12 (1989), pp. 1547-1554. (in Japanese) [Jefferson 1985] D. R. Jefferson. Virtual Time. ACM Trans. on Programming Languages and Systems, Vol.7, No.3 (1985), pp. 404-425. [Kudoh et al. 1991] T. Kudoh et al .. Parallel Logic Simulator for Shared Memory Multiprocessors. IEICE Technical Report, CPSY91-23 (1991), pp. 151-131. (in Japanese) [Lubachevsky 1989] B. D. Lubachevsky. Efficient Distributed Event-Driven Simulations of Multiple- Loop Networks. Communications of the ACM, VoL32, No.1 (1989), pp. 111-131. [Matsumoto and Taki 1991] Y. Matsumoto and K. Taki. Parallel Logic Simulation based on Virtual Time. In Proc. Joint Symposium on Parallel Processing '91, 1991. pp. 365-372. (in Japanese) [Misra 1986] J. Misra. Distributed Discrete-Event Simulation. ACM Computing Surveys, Vol. 18, No.1 (1986), pp. 39-64. [Nakajima et al. 1989] K. Nakajima et al .. Distributed Implementation of KL1 on the Multi-PSljV2. In Proc. 1989 Int. Conj. on Logic Programming 1989. pp. 436-451. [Nakajima and Ichiyoshi 1990] K. Nakajima and N. Ichiyoshi. Evaluation of Inter-processor Communication in the KL1 Implementation on the Multi-PSI. ICOT Technical Report, TR-531 (1990). [Shimogori and Kage 1989] S. Shimogori and T. Kage. Parallel Logic Simulation using A Message-Driven Approach. IEICE Technical Report, CAS88-ll0 (1989), pp. 23-30. (in Japanese) [Soule and Blank 1988] 1. Soule and T. Blank. Parallel Logic Simulation on General Purpose Machines. In Proc. 25th A CM/IEEE Design Automation Conj., 1988, pp. 166-170. [Taki 1988] K. Taki. The parallel software research and development tool: Multi-PSI system. Programming of Future Generation Computers, North-Holland, 1988. pp. 411-426. [Veda and Chikayama 1990] K. Veda and T. Chikayama. Design of the Kernel Language for the Parallel Inference Machine. The Computer Journal, VoL33, No.6 (1990), pp. 494-500. 1206 Appendix For the purpose of ascertaining the influence of the asynchronous copying GC, we made another simulator and compared it to the original. The difference between the comparative simulator and the original is as follows. Original simulator : Only the MRB GC works for collecting garbage. Comparative simulator : The copying GC happens asynchronously in each PE. Table 7 compares the simulators when s13207 was simulated using 64 PEs. The result shows that asynchronous out breaks of the copying GC in each PE increased both rollback frequency and rollback depth. It certainly caused the poor performance of the simulator. 
Table 7: Influence of asynchronous copying GC Performance (K events/sec) Frequency of rollback Depth of rollback Original simulator Comparative simulator 99.299 72.895 0.0243 0.0261 7.96 11.684 PROCEEDINGS OF THE INTERN A TIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 1207 Applications of Machine Learning: Towards Knowledge Synthesis Ivan Bratko Faculty of Electr. Eng. and Computer Sc., Ljubljana University and J. Stefan Institute, Slovenia Abstract This paper shows, by presenting a number of Machine Learning (ML) applications, that the existing ML techniques can be effectively applied in knowledge acquisition for expert systems, thereby alleviating the known knowledge acquisition bottleneck. Analysis in domains of practical interest indicates that the performance accuracy of knowledge induced through learning from examples compares very favourably with the accuracy of best human experts. Also, in addition to accuracy, there are encouraging examples regarding the clarity and meaningfulness of induced knowledge. This points towards automated knowledge synthesis, although much further research is needed in this direction. The state of the art of some approaches to Machine Learning is assessed relative to their practical applicability and the characteristics of a problem domain. 1 Introduction Machine Learning is one of the most active areas of Artificial Intelligence. In the view of the technical results of this area, and the well known knowledge acquisition bottleneck in expert systems, sometimes known as the Feigenbaum bottleneck, it is surprizing that Machine Learning has not had a stronger impact on the practice of knowledge acquisition for expert systems. Even some known authorities on expert systems occasionally express a reserved view regarding automatic knowledge acquisition through machine learning.. For example, Chandrasekaran (1991) in a recent discussion posed the question: "It is often proposed that a way to avoid teasing expertise from experts is to automatically learn from examples. Have you found this a useful strategy?" The answer from a leading practitioner from the commercial side of expert systems technology was: " ... I have yet to see a situation where that is an effective way to go forward, especially when you're starting with somebody who knows something .... " The practice of AI applications in some laboratories and companies shows, however, that this expresses an overly pessimistic view. This paper presents examples of ML applications in which existing techniques were effectively applied. The paper does not aspire to be in any way a complete survey of the state-of-the-art ML techniques and their applications. However, the example applications and programs discussed are generally illustrative of the practically oriented ML research done at many AI laboratories. An early demonstration of the usefulness of Machine Learning from examples in knowledge acquisition was induction-assisted knowledge base construction for diagnosing soybean diseases (Chilausky and Michalski 1976; Michalski and Chilausky 1980). A comparison between a manually constructed knowledge base and one constructed with the assistance of an inductive learning program showed the advantages of the latter approach. Michie (1989) describes another early interesting experience concerning the construction of a small expert system to decide whether a Space Shuttle pilot should lend manually or automatically. 
The decision depends on the current information about the stability, altitude and velocity estimates of the vehicle etc. This project was an early demonstration of the experts' difficulty in explicitly formulating decision rules although all the relevant information was in their heads. Experience shows that experts' difficulty of this kind is a rather typical phenomenon. 1208 Michie (1989) writes: "Early in 1984, to address a NASA requirement, the autolander's chief designer, Mr Roger Burke, with engineering colleagues, attempted to construct a computer program to map the real-time values of monitored variables to the alternative decisions useauto and noauto. Such a program running on an on-board computer was needed to display continually updated advice to the pilot. After some months of (noninductive) programming they concluded that further effort would not be rewarding. The trouble was later shown to have stemmed not from any intrinsic difficulty of the decision task but from the disability from which every expert suffers in articulating what he or she knows, whether about plant pathology, about medical diagnosis, about process control, about how to play lightening chess or about the movements of the stockmarket. Mr Burke and his colleagues then attended a course in inductive programing given by Radian Corporation in Austin, Texas, based on the commercial induction software RuleMaster (Michie et. al. 1984). Relieved from the struggle to read the needed rules directly from inside their own heads, they were able ( ... ) to construct the solution '... " Alth~>ugh a not very sophisticated tool was applied to a not very difficult problem, the NASA experience is very instructive. It illustrates the phenomenon concerning the difficulty of eliciting explicit rules about a domain even if (1) there are experts that can solve concrete problems in the domain quite well, and (2) there is nothing inherently difficult about the domain. Even so, extracting explicit rules from the user turns out to be difficult. When the knowledge elicitation process is aided by a learning tool, the process suddenly appears trivial. Finally, when the actual simple looking solution becomes obvious, there is typically a somewhat embarassing impression that "clearly, the problem should have been possible to solve without the use of machine learning". However, experience confirms that often only when a machine learning tool is eventually applied, the problem solution emerges as obVIOUS. Another early and similar example of this phenomenon is W. Leech's (1986) application of ML to the synthesis of control rules for process control at a Westinghouse nuclear fuel processing plant. Control rules synthesised from examples using another early ML tool ExpertEase improved the yield drammatically. When analysing the project that led to this innovation, the company officially confirmed that the discovery of the new control rules only occurred when ML was used and the discovery would have been highly unlikely without it. A review (UrbanCic, Kononenko and Krizman 1991) of AI applications done by my laboratory in Ljubljana also contains many applications with similar scenario. Among over sixty AI applications included in the review, almost half of them critically rely on the use of ML techniques. One more or less randomly chosen example among these applications, illustrating the same point as the NASA and Westinghouse experience, is from the Jesenice Steel Mill, Slovenia. 
Their problem was the control of the quality of the rolling emulsion for the Sendzimir rolling mill. The quality of rolling critically depends on the properties of emulsion. An expert therefore daily measured various parameters of emulsion in the rolling mill (concentration of iron, ashes, presence of bacteria, etc.) and decided on the appropriate action (e.g. change emulsion, add antibacteria oil, no action, etc.). When the expert was expected to leave the company they attempted to construct an expert system, extracting his decision knowledge from him in the dialogue fashion. Only when after half a year there was no clear progress, they were prepared to apply a ML tool (Assistant Professional in this case; Cestnik et al. 1987) using example decisions from the expert's practice as learning examples. The resulting decision tree, implemented as an expert system, is now used regularly and completely substitutes the decisions that were previously entirely made by the expert. The most practically successful form of learning has been attribute-based learning exemplified by the TDIDT approach (top-down induction of decision trees, e.g. Quinlan 1986). The next section presents results of applications of attribute-based learning in various domains of medical diagnosis and prognosis. These results are interesting also in that they enable a quantitative comparison of the performance of human experts and ML-based diagnostic systems. Although very effective in many domains of practical interest, attribute-based learning has some clear limitations, pointed out in Section 3. These limitations are being overcome by the development of another generation of learning systems, implementing relational learning, such as ILP (Inductive Logic Programming, Muggleton 1991). Section 4 presents an example application where the ability of relational learning is essential. ILP, although less mature than attribute-based learning, shows great potentials in application problems that are hard to tackle with attribute-based learning. Section 5 discusses the fu- 1209 ture of ML with respect to knowledge synthesis. 2 Applications in medical domains Along with the development of various learning methods in the Ljubljana AI Laboratories, these methods were applied to a number of medical diagnosis/prognosis problems. These applications also served as a source of useful new ideas for further improvements of the learning methods. Some of our medical data (in particular the diagnosis in lymphography, location of primary tumor, and prognosis in breast cancer) were made available to other researchers and were used by many for experimentation and direct comparison of various learning algorithms. This section presents some results obtained in Ljubljana with various learning systems in several medical domains. Most of this work in medical applications was done with the Assistant system although other programs were also used, including GINESYS (Gams 1988) and Log Art (Cestnik and Bratko 1988). Assistant belongs to the TDIDT family of learning programs (top down induction of decision trees, Quinlan 1986). Assistant is a successor of Quinlan's ID3 (Quinlan 1979) with a number of addidional mechanisms. Early experiments with a version of ID3 in learning of diagnostic rules for lymphatic cancers (Bratko and Mulec 1979) provided encouragement that led us to further exloration and substantial refinements of this approach that were implemented in Assistant. 
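At the heart of the TDIDT approach used by ID3 and its successors is the repeated selection of the most informative attribute. The following is a minimal sketch of that selection measure (information gain) in standard Prolog; the Class-Values example representation and the predicate names are ours, and none of Assistant's additional mechanisms (pruning, binarisation, handling of missing data) are shown.

    % entropy(+Classes, -H): entropy (in bits) of a list of class labels.
    entropy(Classes, H) :-
        length(Classes, N),
        msort(Classes, Sorted),
        counts(Sorted, Counts),
        foldl(add_term(N), Counts, 0.0, H).

    add_term(N, Count, Acc, Acc1) :-
        P is Count / N,
        Acc1 is Acc - P * log(P) / log(2).

    % counts(+SortedList, -Counts): multiplicity of each run of equal labels.
    counts([], []).
    counts([X|Xs], [C|Cs]) :-
        run(X, Xs, 1, C, Rest),
        counts(Rest, Cs).

    run(X, [X|Xs], C0, C, Rest) :- !, C1 is C0 + 1, run(X, Xs, C1, C, Rest).
    run(_, Rest, C, C, Rest).

    % gain(+Examples, +Attr, -Gain): information gain of splitting Examples
    % (pairs Class-ValueList) on the attribute at position Attr.
    gain(Examples, Attr, Gain) :-
        findall(C, member(C-_, Examples), Classes),
        entropy(Classes, H0),
        findall(V, (member(_-Vals, Examples), nth1(Attr, Vals, V)), Vs0),
        sort(Vs0, Values),
        length(Examples, N),
        foldl(split_term(Examples, Attr, N), Values, 0.0, HSplit),
        Gain is H0 - HSplit.

    split_term(Examples, Attr, N, V, Acc, Acc1) :-
        findall(C, (member(C-Vals, Examples), nth1(Attr, Vals, V)), Cs),
        length(Cs, Nv),
        entropy(Cs, Hv),
        Acc1 is Acc + (Nv / N) * Hv.

    % ?- gain([yes-[sunny,warm], yes-[rainy,warm],
    %          no-[sunny,cold],  no-[rainy,cold]], 2, G).
    % G = 1.0.   % the second attribute separates the classes completely

A TDIDT learner simply chooses the attribute with the highest gain at each node and recurses on the resulting subsets of examples.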
The new mechanisms, motivated and discovered through experiments in medical domains, include: automatic selection of good examples for learning, handling partially specified objects (missing data), forward pruning of decision trees, post pruning (Niblett and Bratko 1986, Cestnik and Bratko 1991), binarisation of attributes (Kononenko et. al. 1985; Bratko and Kononenko 1987). It should be noted that these techniques among some other important improvements to the basic TDIDT learning were contributed or independently discovered by other researchers, for example in the C4 program (Quinlan et al. 1989) and the CART system (Breiman et al. 1984). Mingers (1989a; 1989b) reviews various related techniques and makes an attempt at their comparison. TDIDT programs belong to attribute-based learning. They accept learning examples in the form of attribute-value vectors. Similarly, both GINESYS and LogArt are attribute-based learning programs. GINESYS generates if-then rules. The innovation of GINESYS was confirmation rules that accompany the "main" rule and enable the system to exploit redundancy in the attribute-value data. Redundancy is in principle useful in noisy domains, such as medicine, as a means for filtering out errors. The idea of exploiting redundancy (rule bases with redundancy) was later accepted as generally useful in learning in noisy domains, but GINESYS (Gams 1989) was probably the first to build explicitly on this principle in Machine Learning. Unlike most other systems, LogArt (Cestnik and Bratko 1988) generates elimination rules. The rules are ordered according to their statistical credibility. In diagnosis, rules are applied in this order to eliminate all but one of the diagnostic possibilities. When this is not possible and there are more than one residual diagnostic possibilities, the Bayes classifier is employed as a tie-breaker. The credibility of induced rules is measured simply as the number of confirming observations in the learning data. These rules are extremely simple and thus also useful for straightforward explanation of the diagnostic decision. Despite this almost unbelievable simplicity, LogArt compares extremely well with other learning systems in respect of diagnostic accuracy. The key to Log Art 's performance lies in high number of simple elimination rules for each application which, similar to GINESYS, facilitates the use of redundancy. This makes LogArt very robust with respect to noise in the learning data and also enables it to cope easily with missing data, that is unspecified attribute values. On the theoretical side, it was shown that LogArt's classification procedure can be viewed as a special strategy of evaluating the Bayes classification rule without the attribute independence assumption (Cestnik and Bratko 1988). LogArt's classification procedure tends to use those conditional probabilities for whose estimation the learning data provides most evidence. Table 1 summarises the properties of eight medical domains in which these learning systems have been applied. The domains are characterised by: the number of known examples (patients), the number of classes (that is: possible diagnoses), the number of attributes, the average number of possible attribute values per attribute. More detailed description of these applications can be found for example in (Bratko and Kononenko 1987), (Pirnat et al. 1989) and (Roskar et al. 1986). 
Table 2 shows results of these applications in terms of diagnostic accuracy of learned diagnostic 1210 domain examples lymphography 1 148 lymphography 2 150 primary tumor 339 breast cancer 288 hepatitis 155 thyroid 884 rheumathology 355 urinary tract m 1843 urinary tract f 3580 classes 4 7 22 2 2 4 6 9 9 majority class attributes 18 55 % 46 % 18 17 25 % 10 80 % 19 79 % 15 56 % 32 66 % 44 21 % 45 25 % average entropy no. values (bit) 3.3 1.23 2.11 3.3 2.2 3.64 2.7 0.72 0.74 3.6 15.7 1.59 9.1 1.70 2.91 3.8 6.5 2.59 Table 1: Properties of the medical application domains. rules by the three systems. The performance of medical experts is also included for comparison in the cases when their performance has been experimentally estimated on the same data as used by the systems. In one case (lymphography), the physicians' performance is an expert's own estimate and was not experimentally confirmed. It is probably an optimistic over-estimate. Systems' accuracy on new data was estimated in the usual way: 70% of the available data was randomly chosen for learning, and the remaining 30% was diagnosed by the learned rule. The system's diagnoses on the "new" data were then compared with the known physician's diagnoses. This was repeated several times (usually ten times, to reduce statistical fluctuation) and the figures in Table 2 are the average of these repeated experiments. For comparison, the performance of "naive Bayes" (that is Bayes classification under the assumption of attribute independence) is also included. It should be noted that this straight- . forward application of Bayes has the disadvantage that it does not support the usual style of explanation in expert systems. It is therefore avoided in expert systems, although Michie (1990) describes a way to overcome this difficulty. Some accuracy results in Table 2 are surprizing as in some cases the system's or expert's accuracy are lower than the percentage of the majority class. For example, in the breast-cancer domain the specialists' performance is 64% and Assistant's performance is 77%. These performances are both below the 80% percent likelihood of the majority class, so an almost uninformed clasifier, always just predicting the majority class, would score better than both human experts and the learning programs. This reflects a drawback of simple accuracy measure as the criterion of success of a classifier. The accuracy criterion does not take into account the relative difficulty of predicting particular classes and is there- fore misleading, particularly in domains where the probabilities are extremely unequally distributed between classes, as in the breast cancer domain. This problem with accuracy as a performance measure is discussed in (Kononenko and Bratko 1991), and an information-based criterion is proposed. Therefore classifiers' information scores (in bits) are also given wherever they were available. The information scores are in all cases positive, indicating that the clasifiers are in fact always doing better than an uninformed classifier (which would, always classifying into the majority class, by definition of the information-based performance measure score zero). One conclusion indicated by Table 2 is that the knowledge bases induced from no more than a few hundreds of examples of patients in some narrow diagnostic domain, perform better than medical doctors, including best specialists. Such a conclusion has been empirically confirmed by several other studies. 
This result should, of course, be taken with some qualifications. Namely, the criterion of performance here is only in terms of classification acuracy (or inform·ation score) under the condition that both the human expert and the induced classifier are given the same information. In practice, the human expert might be able to use extra information. Also, the medical doctor would typically have a much better global understanding of the problem and be capable of deeper explanation of the particular cases. 3 Attribute-based learning vs. relational learning Applications of Machine Learning described above all rely on the use of attribute-based learning. Both learning examples and induced concept descriptions employ global attributes of objects and not reI a- 1211 domain lymphography 1 lymphography 2 primary tumor breast cancer hepatitis thyroid rheumathology urinary tract m urinary tract f doctors nonspec. doctors specialists 60% (D) 32% , 0.95 bit 64% ,0.03 bit 85% (D) 42% , 1.22 bit 64% , 0.05 bit Assistant 64% ,0.59 bit 56% , 0.26 bit 76% 65% 44 % 77% 83% 73% 61% 70% 80% GINESYS LogArt 84% (A) , 1.38 bit , 0.07 bit , 0.86 bit , 0.46 bit (A) (A) 70% (C) 52% (C) 74% (C) 44% (B) 78% (B) 85% naive Bayes 79% 67% 49% , 1.59 79% , 0.06 84% 68% , 0.70 57% , 0.28 67% 79% bit bit bit bit Table 2: Performance in terms of classification accuracy and information score (in bits) on new data of the three learning systems, physicians (specialists and non-specialists), and the Bayes classifier evaluated under the assumption of attribute independence. Labels A, B, C, D in the table mean: A - old implementation of Assistant on DEC-I0; B - in the case that more than one class remain un-eliminated by rules, naive Bayes is applied as tie-break; C - original data preprocessed so that unknown attribute values in data are replaced by the most likely value; D - expert physician's estimate (not measured experimentally). tions among their parts. Well known families of such learning programs are TDIDT (e.g. Quinlan 1986), AQ (e.g. Michalsl\i 1983), CN2 (Clark and Niblett 1989). Attribute-based learning is a relatively simple approach to learning and is therefore most widespread and widely used. The following advantages of attributional learning contribute to its success in practical applications: • Computational efficiency • Attributional learning is relatively well stood und~r­ • Attributionallearning process is easy to understand by the users and it is straightforward to apply • The attribute-value language is natural in many domains and many users are used to this representation • It is well understood how to handle noisy and incomplete data in attributionallearning; there are methods that handle noise very well However, attribute-based learning also has strong limitations: Attribute-based descriptions are essentially equivalent to propositional logic. This is not sufficiently expressive for describing concepts in some application areas. An example of such a problem area is the finite-element mesh design which is described in detail in the next section. The realization of the limitations of attributebased learning led to a number of recent developments towards learning at the level of first-order predicate logic, including programs CIGOL (Muggleton and Buntine 1988), FOIL (Quinlan 1990), GOLEM (Muggleton and Feng 1990) and LINUS (Lavrac, Dzeroski and Grobelnik 1991). 
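The representational difference at issue in this section can be made concrete with a small, invented example in Prolog: the first group of facts is an attribute-value description (one fixed-length vector per example), while the second describes an object by its parts and the relations among them, which a fixed attribute vector cannot express. All predicate and object names here are illustrative assumptions, not data from the cited papers.

    % Attribute-based representation: example(Class, Attr1, Attr2, Attr3).
    example(pos, short, loaded,     fixed).
    example(neg, long,  not_loaded, free).

    % Relational representation: parts of an object and relations among them.
    part(obj1, edge_a).
    part(obj1, edge_b).
    neighbour(edge_a, edge_b).
    loaded(edge_a).

    % A relational concept: an object is critical if some edge of it is
    % loaded and that edge has a neighbouring edge in the same object.
    critical(Obj) :-
        part(Obj, E1),
        loaded(E1),
        neighbour(E1, E2),
        part(Obj, E2).

    % ?- critical(obj1).   % succeeds; the definition quantifies over parts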
This led to the establishment of a special area of Machine Learning, named by Muggleton (1990) Inductive Logic Programming (ILP; see also Muggleton 1992). The learning problem in ILP is formalised as: given some background knowledge B expressed as a set of predicates, some examples E and some negative examples N, find a logic formula H, such that: BI\H'r-E and BI\HIfN The following section describes an application that illustrates the suitability of this approach. 4 • Background knowledge can be expressed in rather limited form • Lack of relational descriptions makes the concept description language inappropriate for some domains Application of ILP to finite-element mesh design Doisak and Muggleton (1991) describe an application of ILP to a problem for which the attribute- 1212 proper density of the mesh in various regions of the objects. Unfortunately, the experts have difficulties in forming general rules that would enable the automation of such guesses. In general, the mesh depends on the geometric properties of the object and forces acting on it. As pressure is transmitted between adjacent elements, the mesh density in a region of the object depends also on the adjacent regions. These general considerations were captured in Dolsak's application as background knowledge for the ILP learning in the form of properties and relations, such as: Figure 1: A cylindrical object partitioned by mesh suitable for the finite element computation. (Doisak 1991) based learning is unsuitable, and the relational representation appears natural. Here we illustrate this application with more recent results reported in (Doisak 1991). The problem of finite-element mesh design arises in numerical computation. Given, for example, a machine part and forces acting on it, the problem is to compute the pressure and deformations throughout the object. The finite-element method involves the partitioning of the given object into finite elements. Figure 1 shows an example. The resulting partition is called a finite element mesh. For each element of the mesh, constraints in the form of equations are stated. The constraints approximately state the physical laws modelling the behaviour of the individual elements. These approximations are sufficiently accurate if the elements are sufficiently small. Generally, the finer the mesh, the smaller the error. However, a dense mesh results in a large number of equations, leading to a lengthy computation when solving the corresponding system of equations. The complexity of computation is often measured in days or weeks of CPU time and can easily become prohibitive. The problem, then, is to find a suitable compromise between the density and coarseness of the mesh. Normally some regions of the object require denser mesh whereas in other regi6ns a coarser mesh still suffices for good approximation. There is no known general method that would enable automatic determination of optimal, or reasonably good meshes. However, expert users of finite element methods are capable of making good guesses about short( Edge) usual_length( Edge) loaded( Edge) not_loaded( Edge) tvo_side_fixedC Edge) neighbour_xyC Edge1, Edge2) neighbour_xzC Edge1, Edge2) The meaning of these relations is straightforward. For example, an edge is "two-BideJixed" if it is fixed at both ends. neighbour_xyC Edge1, Edge2) means that the edges are adjacent and they are in the xy-plane. 
In an experiment to learn a characterisation of the density of a mesh in terms of these relations, five meshes known to work well were used as sources of examples for learning (Figure 2). The relation to be learned was: mesh( Edge, N) where Edge is' the name of an edge in the structure, and N is the recommended number of finite elements along this edge. The target definition of this relation is to be learned in terms of the properties and relations in the given structure. All the five rrieshes used comprised altogether 278 edges, that is 278 positive examples for learning. The number of finite elements along the edges varied between 1 and 17. In edges with high partition, say 10, it was assumed that a similar partition would still make a good mesh, so 10 ± 1 was considered acceptable and sometimes used as another positive example. Negative examples were generated according to the closed-world assumption: if the given partitioning of an edge was 3, say, then partitionings such as 4, 5, etc. were taken as negative examples. This finally gives the following number of facts for learning in 1213 this experiment (Doisak 1991): 357 positive examples 2840 negative examples 2132 background facts Several relational learning algorithms were tried on this data: GOLEM (Muggleton and Feng 1990), LINUS (Lavrac, Dzeroski and Grobelnik 1991) and FOIL (Quinlan 1990). The results obtained with GOLEM were judged to be the most satisfactory. GOLEM generated a large number of rules, some of them being practically irrelevant. For example, although logically correct, they were computationally useless when applied to classifying new edges. On the other hand, some rules appeared useful. Fortunately it was possible to formalise the criteria for distinguishing useful rules from the others. These criteria were implemented as a short Prolog program (Doisak 1991) for postprocessing the rules generated by GOLEM. The so resulting set of rules were of interest to expert users of the finite element methods. According to their comments, these rules reveal interesting relational dependences. The following is an example of such a generated rule (the generated syntax is that of Prolog clauses): mesh( Edge, 7) usual_length( Edge), neighbour_xy( Edge, EdgeY), two_side_fixed( EdgeY), neighbour_zx( EdgeZ, Edge), not_loaded( EdgeZ). Figure 2: Two of the five meshes used for learning (Dolsak). This rule says that an appropriate partitioning of Edge is 7 if Edge has a neighbour EdgeY in the xyplane so that EdgeY is fixed at both ends, and Edge has another neighbour EdgeZ in the xz-plane so that EdgeZ is not loaded. The following is a recursive rule also generated by GOLEM: mesh( Edge, N) equal( Edge, Edge2), mesh( Edge2, N). This observes that an edge's partition can be determined by looking for an edge of the same length and shape in the same object. Of course, for this rule to be computationally useful, at least some of such equivalent edges has to have its partition determined by its own properties and those of its neighbours. 1214 The accuracy of the induce knowledge base was estimated by a cross-validation mehod. Thereby a subset of 10 % of the example edges was effectively removed from the training set. The remaining 90 % of the data was used for rule induction, and the so induced rules were applied to the removed 10 % of the data now used as a test set. This was repeated ten times. The results can be summarised as follows. 
On the average, the classification on the test set was correct in 78 % of the tested edges, incorrect in 2 % of the edges, and the edge remained unclassified (partition unknown) in 20 %of the test edges. An edge remains unclassified if there is no induced rule covering the edge. In another, more practically realistic evaluation attempt, the generated knowledge base was applied to determining a mesh for a completely new structure, one not used for learning (shown in Figure 1). In this case, 67 % of the edges were classified correctly, 22 % incorrectly, and 11 % remained unclassified. These results were input into a commercial automatic mesh generator as a partial specification of the mesh. The partial mesh was then completed automatically by the mesh generator, resulting in the mesh shown in Figure 3a. This mesh is close to the known good mesh of Figure 1, but unfortunately not quite acceptable with respect to the resulting numerical errors. Figure 3b shows the mesh generated by the commercial generator without any guidance from the user. This mesh is certainly fine enough with respect to the numerical errros, but completely unacceptable with respect to the computational complexity it requires. Figure 3c is again generated by the commercial generator, only this time guided by the user's advice regarding the "global" size of the elements in the mesh. This is again a deficient mesh which illustrates the generator's inability to adjust the density of the mesh in various regions of the object according to the criticality of the region. Comparing the meshes in Figures 3a-c it becomes clear that the induced knowledge base does "understand" the criticality of varios regions of the object and tries to adjust the density accordingly. The mesh resulting from the induced knowledge base can actually be easily improved. There is a well known rule of thumb in mesh design that in a rectangular mesh the ratio between the length and width of elements should not exceed 2. Applying this rule to mending the mesh of Figure 3a in fact results in the very good mesh of Figure 1. Figure 3: (a) A mesh generated by the induced knowledge base and completed by a commercial generator. (b) The mesh generated by the automatic mesh generator completely autonomously, without any guidance from the user. (c) A mesh, generated by the mesh generator, guided by the user's advice recommending the "global granularity" of 150 mm. (Doisak 1991) 1215 5 Towards knowledge synthe- . SIS As illustrated by the applications described in this paper, and concluding from many other applications, ML techniques have proved to be a useful tool for efficient construction of expert systems for tasks like classification, prediction, decision making etc. In our experience, for example in the medical domains, employing ML it was possible to inductively construct competent diagnostic systems in the matter of months, weeks or even days (including time for defining the problem, choice of attributes, preparation of learning data, etc.) when it would take much longer without learning. Muggleton (1991) and Clark et al. (1991) describe another comparison between dialogue-based and induction-based knowledge acquisition for large expert systems with thousands or tens of thousands of rules. That comparison showed that in projects employing ML the knowledge acquisition effort in man years (relative to the number of rules) was one or two orders of magnitude lower than in dialoguebased acquisition. 
It should be admitted that the basis for comparison was simply the number of rules in the knowledge base per man-year invested. The quality of rules was not considered. Although the inductively constructed knowledge bases perform accurately, the question still remains whether automatically synthesised knowledge represents symbolically meaningful information. That is, does it tell the humans something about the problem domain in a transparent way that also fits nicely into the human's normal understanding of the domain? In other words, whatever has been induced from examples, does it deserve to be called knowledge? In ML there has been strong awareness of the importance of this comprehensibility criterion (for example Michie 1986 and 1988). There exist some standard techniques that help in this respect. For example, tree pruning in induction of decision trees, in addition to suppressing noise, often improves the transparency of induced trees enormously by simply reducing the tree size to, say, 10% of its original size. It should be admitted, however, that compactness is only one measure that is usually correlated with meaningfulness. Human experts often prefer less compact, possibly redundant descriptions, because they better correspond to the way the problem domain is structured in their heads, or to the way that the knowledge is to be used. The use of knowledge may require not only classification, but for example the achievement of certain goals, explanation, planning, or making decisions on the basis of incomplete information. Criteria to decide whether given information deserves to be called knowledge are intricate. Of course, these criteria do not exactly correspond to simple measures of accuracy or compactness of induced rules. Identifying and formalising these criteria is an important research topic. Still, there has already been some success in the direction of automatically inducing meaningful information. Knowledge has been generated through ML that was of interest and revealing to human experts. I will illustrate this by an example from the KARDIO project (Bratko, Mozetic and Lavrac, 1989). In KARDIO, a deep qualitative model of the heart was compiled for efficiency reasons into a large shallow diagnostic knowledge base. This was then compressed, using ML techniques, into a small number of equivalent prediction and diagnostic rules. It was interesting to compare these mechanically synthesised descriptions with human-synthesised descriptions that can be found in the medical literature. Here is an example of a synthesised prediction rule which tells what are the characteristic features in the ECG signal in the case of the disorder called AV block of the third degree (avb3 for short, possibly combined with any number of other defects in the heart):

    [av_conduct = avb3] is characterised by
        [rhythm_QRS = regular] and
        [relation_P_QRS = independent_P_QRS]

This rule is in the VL1 formalism, normally used in the AQ family of programs (Michalski 1983). The propositions have the form [attribute = value]. Figure 4 illustrates what essentially happens in the case of the avb3 defect. For comparison, one of the classical books on ECG (Goldman 1976) describes this arrhythmia as follows: "In this condition the atria and ventricles beat entirely independently of one another. The ventricular rhythm is usually quite regular but at a much slower rate (20-60)." Some words here are in bold face to help the comparison between Goldman's description and the machine synthesised description.
It is easy to notice strong similarities between the two descriptions. It is pleasing that even the same qualitative descriptors, such as independent and regular, appear in both descriptions. Goldman notes that the ventricular rate is usually much lower (20-60), which is not mentioned in the machine-generated description. This is in fact the only essential difference between the two descriptions. The reason that the ventricular rate is not mentioned in the machine-generated description is that it is redundant with respect to distinguishing between those conditions of the heart in which avb3 appears and those in which it does not.

Figure 4: The mechanism of the heart disorder called AV block of the third degree, shown as atrial and ventricular signal traces. In the normal heart, the atrial signal reaches the ventricles through the AV conductance and affects the QRS complex. In the case of the AV block, the atrial signal cannot propagate to the ventricles and has no effect on the QRS complex.

Another authority on ECG, Phibbs (1973), describes avb3 as: "(1) The atrial and ventricular rates are different: the atrial rate is faster; the ventricular rate is slow and regular. (2) There is no consistent relation between P waves and QRS complexes." Again, some descriptors are in bold face to facilitate comparison with the machine-generated description. The comparison is rather straightforward in this case as well.

The example above shows how well some of the synthesised descriptions correspond to those in the standard medical literature. On the other hand, some of the synthesised descriptions are considerably more complex than those in the literature. Machine-generated descriptions in such cases give much more detail, which may not be necessary for an intelligent reader with a physiological background. Such a reader can usually infer the missing detail from the background knowledge. Making induced descriptions appealing to humans requires adding some redundancy, or leaving out some information that can usually be recovered from background knowledge. How to add and leave out just the right amount is an open research problem.

6 Conclusions

A large number of ML applications confirm the practical importance of this technology. Experience shows that inductive knowledge acquisition is typically an iterative process whereby the representation, the background knowledge and the example sets are gradually refined through experiments and feedback obtained from the domain expert; the ML tools are applied repeatedly. Induction from examples can be viewed as a way of compiling a high-level specification, where the specification consists of examples and background knowledge. The practical advantage of this approach lies in the fact that it is often easier to obtain examples (e.g. from the domain expert) than to extract from the expert explicit general laws about the domain.

Until now, attribute-based learning has enjoyed the most success in practice. However, recent important developments in inductive logic programming (ILP) go beyond the limitations of attribute-based learning. Recent applications of ILP include, in addition to the mesh design described in this paper, the prediction of protein secondary structure (Muggleton et al. 1992). Another exciting area facilitated by ILP is the automated construction of qualitative models from observed behaviours. Work that has been done in this direction includes (Mozetic 1987a,b; also described in Bratko et al. 1989), (Coiera 1989), (Bratko et al. 1991) and (Krann et al. 1991).
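To illustrate the contrast drawn above between attribute-based learning and ILP, the sketch below juxtaposes the two representation styles in Prolog. Both fragments are schematic, textbook-style illustrations (with invented predicates), not the output of any particular learner.

    % Attribute-based representation: each example is described only by its
    % own attribute values, here as hypothetical attribute/3 facts.
    attribute(obj1, colour, red).
    attribute(obj1, size, large).

    class(Object, positive) :-
        attribute(Object, colour, red),
        attribute(Object, size, large).

    % Relational (ILP-style) hypothesis: clauses may refer to other objects
    % and may be recursive, e.g. ancestor/2 learned from parent/2 facts.
    parent(ann, bob).
    parent(bob, carol).

    ancestor(X, Y) :- parent(X, Y).
    ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).

    % ?- class(obj1, C).          C = positive
    % ?- ancestor(ann, carol).    succeeds through the recursive clause

The recursive, multi-object definition in the second fragment cannot be expressed as a fixed-length attribute vector, which is precisely the limitation that ILP removes.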
Acknowledgements

A number of medical doctors helped in the applications described in this paper. I would like to thank S. Hojker, G. Klanjscek, J. Lamovec, V. Pirnat, M. Soklic and M. Zwitter of the University Medical Center, Ljubljana, P. Abrams and M. Torrens of the Clinical Investigation Unit, Ham Green Hospital, Bristol, UK, and G. Gong of Carnegie-Mellon University, Pittsburgh, who helped in the medical applications of induction either with the data for learning or with advice. M. Horvat, B. Cercek, A. Grad and P. Rode assisted with the cardiological expertise in the KARDIO project. I would also like to acknowledge the cooperation of my AI colleagues whose names appear in the references. Special thanks to Igor Kononenko for his work in the medical domains, Bojan Dolsak for lending the mesh design figures, and Saso Dzeroski who helped in the preparation of this paper. This paper was written while the author was visiting Strathclyde University and the Turing Institute, Glasgow, Scotland.

References

Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. (1984) Classification and Regression Trees. Belmont, CA: Wadsworth.

Bratko, I., Kononenko, I. (1987) Learning diagnostic rules from incomplete and noisy data. In: Interactions in AI and Statistics (ed. B. Phelps). London: Gower Technical Press.

Bratko, I., Mozetic, I., Lavrac, N. (1989) KARDIO: A Study in Deep and Qualitative Knowledge for Expert Systems. Cambridge, Massachusetts: The MIT Press.

Bratko, I., Muggleton, S., Varsek, A. (1991) Learning qualitative models of dynamic systems. Proc. Inductive Logic Programming ILP-91, Viana do Castelo, Portugal, March 1991. (An abbreviated version also in Machine Learning: Proc. Eighth Int. Workshop, eds. L.A. Birnbaum and G. Collins, San Mateo, CA: Morgan Kaufmann.)

Bratko, I., Mulec, P. (1979) An experiment in automatic learning of diagnostic rules. Informatica, Vol. 4, No. 4, 353-359.

Cestnik, B., Bratko, I. (1988) Learning redundant rules in noisy domains. Proc. ECAI-88, European Conf. on Artificial Intelligence, Muenchen, August 1988.

Cestnik, B., Kononenko, I., Bratko, I. (1987) ASSISTANT 86: a knowledge elicitation tool for sophisticated users. In: Progress in Machine Learning (eds. I. Bratko, N. Lavrac; Proc. Second European Working Session on Learning, Bled, Slovenia, 1987). Wilmslow, England: Sigma Press.

Chandrasekaran, B. (1991) Interview with Frederick Hayes-Roth and Richard Fikes. IEEE Expert, October 1991, pp. 3-14.

Chilausky, R., Jacobsen, B., Michalski, R.S. (1976) An application of variable-valued logic to inductive learning of plant disease diagnostic rules. Proceedings of the Sixth International Symposium on Multiple-Valued Logic, May 25-28, 1976, Logan, Utah.

Clark, P., Cestnik, B., Sammut, C., Stencier, J. (1991) Applications of machine learning: notes from the panel members. Machine Learning - EWSL-91: Proc. European Working Session on Learning (Porto, Portugal, March 1991). Springer-Verlag.

Clark, P., Niblett, T. (1989) The CN2 induction algorithm. Machine Learning, Vol. 3, No. 4, 261-284.

Coiera, E. (1989) Generating qualitative models from example behaviours. DCS Report No. 8901, School of Electr. Eng. and Computer Sc., Univ. of New South Wales, Sydney, Australia.

Dolsak, B., Muggleton, S. (1991) The application of Inductive Logic Programming to Finite Element Mesh Design. Proc. ILP-91, Viana do Castelo, Portugal.

Dolsak, B. (1991) Determining the geometric model of finite element meshes using AI methods. M.Sc. thesis, University of Maribor, CAD Center, Maribor, Slovenia.

Gams, M. (1989) New measurements highlight the importance of redundant knowledge. Proc. 4th European Working Session on Learning (ed. K. Morik; Montpellier, France). Pitman and Morgan Kaufmann.

Kononenko, I., Bratko, I. (1991) Information-based evaluation criterion for classifier's performance. Machine Learning, Vol. 6, 67-80.

Krann, I., Richards, B., Kuipers, B. (1991) Automatic abduction of qualitative models. Proc. QR-91, Austin, Texas, May 1991.

Lavrac, N., Dzeroski, S., Grobelnik, M. (1991) Learning nonrecursive definitions of relations with LINUS. EWSL-91: Machine Learning: Proceedings of the European Working Session on Learning, Porto, Portugal (ed. Y. Kodratoff). Springer-Verlag, Lecture Notes in Artificial Intelligence.

Leech, W.J. (1986) A rule based process control method with feedback. In: Advances in Instrumentation, Vol. 41, Part 1, 69-175.

Michalski, R.S. (1983) A theory and methodology of inductive learning. In: Machine Learning: An Artificial Intelligence Approach (eds. R.S. Michalski, J.G. Carbonell and T.M. Mitchell). Palo Alto, CA: Tioga.

Michalski, R.S., Chilausky, R.L. (1980) Learning by being told and learning from examples: an experimental comparison of the two methods of knowledge acquisition in the context of developing an expert system for soybean disease diagnosis. International Journal of Policy Analysis and Information Systems, Vol. 4, 125-161.

Michie, D. (1986) The superarticulacy phenomenon in the context of software manufacture. Proceedings of the Royal Society A, London, Vol. 405, 185-212.

Michie, D. (1988) Machine learning in the next five years. EWSL-88: Proc. Third European Working Session on Learning, Glasgow, 1988. London: Pitman.

Michie, D. (1989) Problems of computer-aided concept formation. In: Applications of Expert Systems, Vol. 2 (ed. J.R. Quinlan). Turing Institute Press in association with Addison-Wesley.

Michie, D. (1990) Personal models of rationality. Journal of Statistical Planning and Inference, Vol. 25, 381-399.

Michie, D., Muggleton, S., Riese, C., Zubrick, S. (1984) RuleMaster: a second-generation knowledge engineering tool. Proc. 1st Conf. Artificial Intelligence Applications, IEEE Computer Society.

Mingers, J. (1989a) An empirical comparison of selection measures for decision-tree induction. Machine Learning, Vol. 3, No. 4, 319-342.

Mingers, J. (1989b) An empirical comparison of pruning methods for decision-tree induction. Machine Learning, Vol. 4, No. 2.

Mozetic, I. (1987a) Learning of qualitative models. In: Progress in Machine Learning (eds. I. Bratko, N. Lavrac; Proc. 2nd European Working Session on Learning, Bled, Slovenia). Wilmslow: Sigma Press.

Mozetic, I. (1987b) The role of abstractions in learning qualitative models. Proc. Fourth Int. Workshop on Machine Learning, Irvine, CA, June 1987. Morgan Kaufmann.

Muggleton, S. (1991) Inductive logic programming. New Generation Computing, Vol. 8, 295-318.

Muggleton, S. (ed.) (1992) Inductive Logic Programming. Academic Press.

Muggleton, S., Buntine, W. (1988) Machine invention of first-order predicates by inverting resolution. Proc. 5th Int. Conf. Machine Learning, Ann Arbor, MI, 1988.

Muggleton, S., Feng, C. (1990) Efficient induction of logic programs. Proc. First Conf. Inductive Learning Theory, Tokyo: Japanese Society for Artificial Intelligence, 1990.

Muggleton, S., King, R.D., Sternberg, M.J.E. (1992) Protein secondary structure using logic (submitted for publication).

Niblett, T., Bratko, I. (1986) Learning decision trees in noisy domains. In: Expert Systems 86: Proc. Expert Systems 86 Conf. (ed. M. Bramer). Cambridge Univ. Press.
Pirnat, V., Kononenko, I., Janc, T., Bratko, I. (1989) Medical analysis of automatically induced diagnostic rules. Proc. Third Int. Conf. AI in Medicine, London, August 1989.

Quinlan, J.R. (1979) Discovering rules by induction from large collections of examples. In: Expert Systems in the Microelectronic Age (ed. D. Michie). Edinburgh University Press.

Quinlan, J.R. (1986) Induction of decision trees. Machine Learning Journal, Vol. 1, 81-106.

Quinlan, J.R. (1990) Learning logical definitions from relations. Machine Learning, Vol. 5, 239-266.

Quinlan, J.R., Compton, P., Horn, K.A., Lazarus, L. (1989) Inductive knowledge acquisition: a case study. In: Applications of Expert Systems, Vol. 2 (ed. J.R. Quinlan). Turing Institute Press in association with Addison-Wesley.

Roskar, E., Abrams, P., Bratko, I., Kononenko, I., Varsek, A. (1986) MCUDS - an expert system for the diagnostics of the lower urinary tract. Journal of Biomedical Measurement, Informatics and Control, Vol. 1, No. 4, 201-204.

UrbanCic, T., Kononenko, I., Krizman, V. (1991) Review of AI applications at Ljubljana AI Labs. Ljubljana, J. Stefan Institute: Technical Report DP6183.

Author Index

Abe, Masahiro ··················1022 Aiba, Akira ··················113, 330 Ait-kaci, Hassan ··················1012 Aikawa, Seiichi ··················286 Alferes, Jose J. ··················562 Ali, Khayri A.M. ··················739 Alliot, Jean-Marc ··················833 Amano, S. ··················1133 Aparicio, Joaquim N. ··················562 Arai, Susumu ··················414 Arikawa, Setsuo ··················618 Arima, Jun ··················505 Asaie, M. ··················723 Asato, Akira ··················414 Babaguchi, Noboru ··················497 Bahr, E. ··················969 Barachini, F. ··················969 Barklund, Jonas ··················817 Bj¢rner, Dines ··················191 Borgida, Alexander ··················1036 Bossi, A. ··················570 Brachman, Ronald J. ··················1036, 1063 Bratko, Ivan ··················1207 Bruschi, Massimo ··················634 Bruynooghe, Maurice ··················473, 481 Bueno, Francisco ··················759 Carpineto, Claudio ··················626 Castaing, Jacqueline ··················1076 Cheng, Anthony S.K. ··················825 Chikayama, Takashi ··················73, 269, 278, 286, 791 Chino, T. ··················1133 Cho, Jung Wan ··················643, 851 Ciancarini, Paolo ··················926 Ciapessoni, Emanuele ··················702 Corradini, Andrea ··················887 Cox, P.T. ··················539 Dally, William J. ··················746 Darlington, John ··················682 Date, Hiroshi ··················237 De Schreye, Danny ··················473, 481, 650 Debray, Saumya K. ··················581 Denecker, Marc ··················650 Dung, Phan Minh ··················555 Duvvuru, S. ··················809 Eddy, John K. ··················1091 Eshghi, Kave ··················514 Evans, Chris ··················546 Feldmann, Richard J.
···············300 Fuchi, Kazuhiro ···························3 Fujise, Tetsuro ························269 Fujita, Hiroshi ························357 Fujita, Masayuki .............. '132,357 Fukumoto, Fumiyo ··················376 Furukawa, Koichi .............. '20,230 Gabbrielli, M. ···························570 Gaines, B.R.·····················1157, 1165 Gallaire, Herve ························220 Gaudiot, Jean-Luc ·····················977 Gelernter, David········ ........... '" ··926 Giacobazzi, Roberto ··················581 Goldberg, Yaron ·····················951 Gregory, Steve··········· ............... ·843 Guo, Yi-ke ······························682 Gupta, Gopal ···························770 Hagiwara, Kaoru ·····················385 Hagstrom, Ray ························307 Hamfelt, Andreas··········· ......... ·1107 Hansen, L. ······························809 Hansson, Ake ....................... ·1107 Hasegawa, Ryuzo ........ '113, 132, 357 Hasida, Koiti ···························1141 Hatazawa, Hiroyoshi ···················414 Hattori, Akira ···~·······················414 Hawley, David J. ·····················330 Hermenegildo, Manuel V. ······759, 770 Herzig, Andreas ....................... ·833 Hirano, Kiyoshi ··················414,436 Hirata, Keiji .......................... ·436 Hirosawa, Makoto ···············294, 300 Hoare, C.A.R ........................... ·211 Honda, Yasuaki .................... ·1044 Hori, Atsushi .......................... ·269 Horiuchi, Kenji ....................... ·897 Hoshi, Masahiro ....................... ·237 Hoshida, Masaki ··················294, 300 Ichiyoshi, Nobuyuki ············166, 869 Idestam-Almquist, Peter ············610 Ido, N. ····································723 Ikeda, Teruo .. ·························385 Imai, Akira ·····························-436 Inamura, Yu ··························-425 Inoue, Katsumi ························522 Ishida, Y oshiteru ·····················1030 Ishikawa, Masato ···············294, 300 Isozaki, Hideki ························694 Hoh, Fumihide ....................... ·278 Iwamasa, Mikito ·····················1099 Iwayama, Noboru ·····················330 Jaffar, Joxan ···························987 Kahn, Kenneth M ..................... ·943 VoLI 1- 460 V0 1.2 461 - 1218 Kakas, Antonios C. ··················546 Kale, Laxmikant V. ··················783 K~miko, Mayumi .................... ·286 Kamiya, Akimoto· .. ··················1099 Karlsson, Roland ·····················739 Kasahara, Takayasu ··················1084 Kato, Hiroo·· ..................... '" ... ·237 Kato, Tatsuo .......................... '278 Kawagishi, Taro ························330 Kawai, Hideo ···························436 Kawamura, Moto ................... ··248 Kawamura, Tadashi ................. ·463 Kawato, Nobuaki ·····················1181 Kazic, Toni ······························307 Kesim, F.Nihan ·····················1052 Kim, Byeong Man .................... ·643 Kimura, Kouichi ···············237, 869 Knill, E. ·································539 Kobayashi, Yasuhiro .............. ·1084 Kodama, Yuetsu ························731 Koike, Hanpei···························715 Komatsu, Keiko ·····················1173 Konagaya, Akihiko ··················791 Kondo, Seiichi ....................... '425 Konishi, Koichi ························791 Konuma, Chiho ·····················1099 Koseki, Y oshiyuki ................. 
·1190 Koshimura, Miyuki ··················357 Kotani, Akira ···························385 Kowalski, Robert A. ··················219 Kubo, Hideyuki ························288 Kubo, Yukihiro ·····················385 Kuhara, Satoru ························618 Kumon, Kouichi························414 Kurozumi, Takashi ................. ····9 Lassez, Catherine .................... ·1066 Le Provost, Thierry ··················1004 Lee, J.H.M. ······························996 Lee, Sang Ho .......................... ·643 Lefebvre, Alexandre ··················915 Levi, Giorgio ..................... 570, 581 Lima-Marques, Mamede ........... ·833 Lin, Eileen Tien······················· ·907 Lin, Zheng ······························859 Linster, M .............................. ·1157 Maeda, Munenori ·····················961 Maeda, Shigeru ....................... ·1115 Maeng, Seung Ryoul ........... '643, 851 Maher, Michael J. . ................... ·987 Maim, Enrico ........................... 702 Martens, Bern .......................... '473 ii Maruyama, Fumihiro .............. ·1181 Maruyama, Tsutomu ··················791 Masuda, Kanae ....................... ·425 Matono, Fumio ....................... ·877 Matsumoto, Yukinori ·········237, 1198 Matsuo, Masahiro ·····················269 Matsuzawa, Fumiko ··················286 McGuinness, Deborah L. ·········1036 Menju, Satoshi ························330 Meo, M.C. ······························570 Michaels, George ............... 300, 307 Millroth, Hakan ························817 Minoda, Y oriko .................... ·1181 Mistelberger, H ........................ ·969 Miyano, Satoru ························618 Mizoguchi, Fumio ··················1061 Mochiji, Shigeru ·····················1099 Montanari, Angelo ·····················702 Montanari, U go ....................... ·887 Mori, Takeshi .......... ·················278 Mori, Toshiaki .................... ····497 Morita, Masao···························799 Muggleton, Stephen .................. 107l Mukouchi, Yasuhito ··················618 Naganuma, Kazutomo ···············248 Nagasawa,Ikuko ·····················405 Nakagawa, T. ···························723 Nakajima, Katsuto ·····················425 Nakakuki, Yoichiro··················1190 N akase, Akihiko······················· ·436 Nakashima, Hiroshi ··················425 Nang, long H.···························851 Nitta, Katsumi ············166, 294, 1115 Nonnenmann, Uwe ··················1091 Ohkawa, Takenao .................... ·497 Ohki, Masaru··························· 1022 Ohsaki, Hiroshi······················· ·1115 Ohta, Yoshihiko ························522 Ohtake, Y oshihisa ................. ·1115 Omiecinski, Edward ................. ·907 Onishi, Satoshi ....................... ·425 Onizuka, Kentaro ·····················294 Ono, K. ·································1133 Ono, Masayuki ....................... ·1115 Oohira, Eiji ···························1022 Overbeek, Ross .................. 223, 307 Patel-Schneider, Peter F. ············1036 Paterson, Ross A. ·····················825 Pereira, Luis Monis ··················562 Pietrzykowski, T. ·····················539 Pliimer, Lutz ................. ··········489 Podelski, Andreas·····················l012 Poirriez, Vincent ................... ·····674 Poole, David ···························530 Preist, Chris ······························514 Pull, Helen ............................. ·682 Ratto, Elena······ .. 
······················702 Rawn, David ···························300 Reiter, Raymond······················· ·600 Resnick, Lori Alperin ···············1036 Robinson, l.A. ························199 Rokusawa, Kazuaki ··················436 Rosenblueth, David A. ············1125 Rossi, Francesca ................. ·······887 Sakai, Shuichi ···························731 Sakama, Chiaki ························592 Sakane, Kiyokazu···················· '1115 Sano, Hiroshi ···························376 Sastry, A.V.S. ···························809 Sato, Hiroyuki ························248 Sato, Masaki ···························278 Sato, Tadashi ···························278 Satoh, Ken ······························330 Sawada, Hiroyuki ·····················330 Sawada, Shuho ························1181 Sehr, David C. ························783 Sergot, Marek ························1052 Shapiro, Ehud ···························951 Shaw, M.L.G. ························1157 Shimada, Kentaro ·····················715 Shin, D.W. ······························851 Shinjo, Hiroshi························ 1022 Shinogi, Tsuyoshi .................... ·414 Shinohara, Ayumi ·····················618 Shinohara, Takeshi ··················618 Shoham, Y oav·························· ·694 Silverman, William ................. ·951 Smith, Cassandra ·····················307 Smolka, Gert ···························1012 Sohn, Andrew·························· ·977 Stuckey, Peter l. ························987 Sued a, Naomichi ·····················1099 Sugie, M. ·································723 Sugiyama, Kenji ....................... ·405 Sumita, K .............................. ·1133 Sundararajan, R. ························809 Suzuki, lunzo ························1099 Takagi, Tsuneyoshi ................. ·436 Takayama, Yukihide ··················658 Takeda, Yasutaka .................... ·425 Taki, Kazuo ············50, 166,237,436, 1074, 1198 Takizawa, Yuka ·····················1181 Tanaka, Hidehiko ·····················715 Tanaka, Hidetoshi ·····················321 Tanaka, liro ...................... , ····877 Tanaka, Midori" ..................... ·1190 Tanaka, Yuichi ························155 Tarui, T. ·································723 Tatsuta, Makoto ....................... ·666 Taylor, Ron ······························307 Terasaki, Satoshi ·····················330 Tezuka, Y oshikazu···················· ·497 Tick, E.····························· '809, 934 Tojo, Satoshi ···························395 Tokoro, Mario ..... ···················1044 Toya, Tomoyuki ························294 Tsuda, Hiroshi ................. ·257,347 Turuta, Michiko ....................... ·405 Uchida, Shunichi ··················33, 232 Ueda, Kazunori ························799 Ukita, T.·································1133 van Emden, M.H. ···············996, 1149 Verschaetse, Kristof .................. ·481 Wada, Kumiko ························269 Wallace, Mark ························1004 Watanabe, Toshinori .............. '1173 Watari, Shigeru ························1044 Wegner, Peter ···························225 Yalamanchili, Sudhakar ........... ·907 Yamada, N aoyuki ··················1084 Yamaguchi, Yoshinori ···············731 Yamamoto, Reki .................... ·436 Yamasaki, Shigeichiro .............. ·405 Yang, Rong ······························843 Yap, Roland H.C ..................... ·987 Yashiro, Hiroshi························ 269 Yasukawa, Hideki ........ '89, 257, 395 Yokota, Kazumasa .... 
, ····89, 248, 257 Yoshida, Kaoru ··················307, 791 Yoshimura, Kikuo ··················1084 Yoshino, Katsuyuki ··················1084 Zawada, David ························307 Zhong, X.································ ·809


