Fifth Generation Computer Systems 1992, Volume 1
FIFTH GENERATION COMPUTER SYSTEMS 1992
Edited by Institute for New Generation Computer Technology (ICOT)
Volume 1
Ohmsha, Ltd.
IOS Press

FIFTH GENERATION COMPUTER SYSTEMS 1992
Copyright © 1992 by Institute for New Generation Computer Technology
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, electronic, mechanical, recording or otherwise, without the prior permission of the copyright owner.
ISBN 4-274-07724-1 (Ohmsha)
ISBN 90-5199-099-5 (IOS Press)
Library of Congress Catalog Card Number: 92-073166
Published and distributed in Japan by Ohmsha, Ltd., 3-1 Kanda Nishiki-cho, Chiyoda-ku, Tokyo 101, Japan
Distributed in North America by IOS Press, Inc., Postal Drawer 10558, Burke, VA 22009-0558, U.S.A.
Distributed in the United Kingdom by IOS Press, 73 Lime Walk, Headington, Oxford OX3 7AD, England
Distributed in Europe and the rest of the world by IOS Press, Van Diemenstraat 94, 1013 CN Amsterdam, Netherlands
Distributed in the Far East jointly by Ohmsha, Ltd. and IOS Press
Printed in Japan

FOREWORD

On behalf of the Organizing Committee, it is my great pleasure to welcome you to the International Conference on Fifth Generation Computer Systems 1992.

The Fifth Generation Computer Systems (FGCS) project was started in 1982 on the initiative of the late Professor Tohru Moto-oka with the purpose of making a revolutionary new type of computer oriented to knowledge processing in the 1990s. After completing the initial and intermediate stages of research and development, we are now at the final point of our ten-year project and are rapidly approaching the completion of prototype Fifth Generation Computer Systems.

The research goals of the FGCS project were challenging, but we expect to meet most of them. We have developed a new paradigm of knowledge processing, including the parallel logic language KL1 and the parallel inference machine PIM. When we look back upon these ten years, we can find many research areas in knowledge processing related to this project, such as logic programming, parallel processing, natural language processing, and machine learning. Furthermore, there emerged many new applications of knowledge processing, such as legal reasoning and genetic information processing. I believe that this new world of information processing will grow more and more in the future. When very large knowledge bases, including common sense knowledge, come out in full scale and are widely used, the knowledge processing paradigm will show its real power and will give us great rewards. From now on, we can enjoy fifth generation computer technology in many fields.

Following the same objective of creating such a new paradigm, there has been intense international collaboration, such as joint workshops with France, Italy, Sweden, the U.K., and the U.S.A., and joint research with U.S. and Swedish institutes on parallel processing applications.

Against this background, ICOT hosts the International Conference on Fifth Generation Computer Systems 1992 (FGCS'92). This is the last in a series of FGCS conferences; previous conferences were held in 1981, 1984 and 1988. The purpose of the conference is to present the final results of the FGCS project, as well as to promote the exchange of new ideas in the fields of knowledge processing, logic programming, and parallel processing.

FGCS'92 will take place over five days. The first two days will be devoted to the presentation of the latest results of the FGCS project, and will include invited lectures by leading researchers.
The remaining three days will be devoted to technical sessions for invited and submitted papers, the presentation of the results of detailed research done at ICOT, and panel discussions.

Professor D. Bjørner from the United Nations University, Professor J. A. Robinson from Syracuse University, and Professor C. A. R. Hoare from Oxford University kindly accepted our offer to give invited lectures. Professor R. Kowalski from Imperial College is the chairperson of the plenary panel session on "A springboard for information processing in the 21st century." Professor Hajime Karatsu from Tokai University accepted our invitation to give a banquet speech.

During the conference, there will be demonstrations of the research results from the ten-year FGCS project. The Parallel Inference Machines and many kinds of parallel application programs will be highlighted to show the feasibility of the machines.

I hope that this conference will be a fitting place to present all of the research results in this field up to this time, to confirm the milestones, and to propose a future direction for the research, development and applications of fifth generation computers through vigorous discussions among attendees from all over the world. I hope all of the attendees will return to their own countries with great expectations in mind and feel that a new era of computer science has opened in terms of fifth generation computer systems. Moreover, I wish that the friendship and frank cooperation among researchers from around the world, fostered in the process of fifth generation computer systems research, will grow and widen so that this small but strong relationship can help promote international collaboration for the brilliant future of mankind.

Hidehiko Tanaka
Conference Chairperson

FOREWORD

Esteemed guests, let me begin by welcoming you to the International Conference on Fifth Generation Computer Systems, 1992. I am Hideaki Kumano, Director General of the Machinery and Information Industries Bureau of MITI. We have been promoting the Fifth Generation Computer Systems project with the mission of making international contributions to technological development by promoting the research and development of information technology in the basic research phase and distributing the achievements of that research worldwide. This international conference is thus of great importance in making our achievements available to all. It is, therefore, a great honor for me to be given the opportunity to make the keynote speech today.

1 Achievements of the Project

Since I took up my current post, I have had several opportunities to visit the project site. This made a great impression on me, since it proved to me that Japanese technology can produce spectacular results in an area of highly advanced technology covering the fields of parallel inference machine hardware and its basic software, such as operating systems and programming languages; fields in which no one had any previous experience. Furthermore, I caught a glimpse of the future use of fifth generation computer technology when I saw the results of its application to genetics and law. I was especially interested in the demonstration of the parallel legal inference system, since I have been engaged in the enactment and operation of laws at MITI. I now believe that machines using the concepts of fifth generation computers will find practical applications in the enactment and operation of laws in the near future.
The research and development phase of our project will be completed by the end of this fiscal year. We will evaluate all the results. The committee for development of basic computer technology, comprised of distinguished members selected from a broad spectrum of fields, will make a formal evaluation of the project. This evaluation will take into account the opinions of those attending the conference, as well as the results of a questionnaire completed by overseas experts in each field. Even before this evaluation, however, I am convinced that the project has produced results that will have a great impact on future computer technology.

2 Features of the Fifth Generation Computer Systems Project

I will explain how we set our goals and developed a scheme that would achieve these high-level technological advances. The commencement of the project coincided with the time when Japan was coming to be recognized as a major economic and technological power in the world community. Given these circumstances, the objectives of the project included not only the development of original and creative technology, but also the making of valuable international contributions. In this regard, we selected the theme of "knowledge information processing", which would have a major impact on a wide area, from technology through to the economy. The project took as its research goal the development of a parallel inference system, representing the paradigm of computer technology as applied to this theme. The goal was particularly challenging at that time. I recall the words of a participant at the first conference, held in 1981. He commented that it was doubtful whether Japanese researchers could succeed in such a project since we, at that time, had very little experience in these fields. However, despite the difficulties of the task ahead of us, we promoted the project from the viewpoint of contributing to the international community through research. In this regard, our endeavors in this area were targeted at pre-competitive technologies, namely basic research. This meant that we would have to start from scratch, assembling and training a group of researchers.

To achieve our goal of creating a paradigm of new computer technology, taking an integrated approach starting from basic research, we settled on a research scheme after exhaustive preliminary deliberations. As part of its efforts to promote the dissemination of basic research results as international public assets, the government of Japan, reflecting its firm commitment to this area, decided to finance all research costs. The Institute for New Generation Computer Technology (ICOT), the sponsor of this conference, was established to act as a central research laboratory where brainpower could be concentrated. Such an organization was considered essential to the development of an integrated technology that could be applied to both hardware and software. The Institute's research laboratory, which actually conducted the project's research and development, was founded precisely ten years ago today, on June 1, 1982. A number of highly qualified personnel, all of whom were excited by the ideal that the project pursued, were recruited from government and industry. Furthermore, various ad hoc groups were formed to promote discussions among researchers in various fields, making ICOT the key center for research communication in this field. The duration of the project was divided into three phases.
Reviews were conducted at the end of each phase, from the viewpoint of human resources and technological advances, which made it possible to entrust various areas of the research appropriately. I believe that this approach increased efficiency, and also allowed flexibility by eliminating redundant areas of research.

We have also been heavily involved in international exchanges, with the aim of promoting international contributions. Currently, we are involved in five different international research collaboration projects. These include work in the theorem proving field with the Australian National University (ANU), and research into constraint logic programming with the Swedish Institute of Computer Science (SICS). The results of these two collaborations, on display in the demonstration hall, are excellent examples of what research collaboration can achieve. We have also promoted international exchange by holding international conferences and by hosting researchers from abroad at ICOT. And we have gone to great lengths to make public our project's achievements, including intermediate results.

3 Succession of the Project's Ideal

This project is regarded as being the prototype for all subsequent projects to be sponsored by MITI. It is largely the herculean efforts of the researchers, under the leadership of Dr. Fuchi and other excellent research leaders, that have led to the revolutionary advances being demonstrated at this conference. In the light of these achievements, and with an eye to the future, I can now state that there is no question of the need to make international contributions the basis of the policies governing future technological development at MITI. This ideal will be passed on to all subsequent research and development projects. A case in point is the Real World Computing (RWC) project scheduled to start this year. This project rests on a foundation of international cooperation. Indeed, the basic plan, approved by a committee a few days ago, specifically reflects the international exchange of opinions. The RWC project is a particularly challenging project that aims to investigate the fundamental principles of human-like flexible information processing and to implement them as a new information processing technology, taking full advantage of advancing hardware technologies. We will not fail to make every effort to achieve the project's objectives for use as common assets for all mankind.

4 International Response

As I mentioned earlier, I believe that the Fifth Generation Computer Systems Project has made valuable international contributions from its earliest stages. The project has stimulated international interest and responses from its outset. The great number of foreign participants present today illustrates this point. Around the world, a number of projects received their initial impetus from our project: these include the Strategic Computing Initiative in the U.S.A., the EC's Esprit project, and the Alvey Project in the United Kingdom. These projects were initially launched to compete with the Fifth Generation Computer Systems Project. Now, however, I strongly believe that since our ideal of international contributions has come to be understood around the globe, together with the realization that technology cannot and should not be divided by borders, each project is providing the stimulus for the others, and all are making major contributions to the advancement of information processing technologies.
5 Free Access to the Project's Software

One of the great virtues of science, given an open environment, is the collaboration between researchers using a common base of technology. Considering this, it would be impractical for one person or even one nation to attempt to cover the whole range of technological research and development. Therefore, the necessity of international cooperation is self-evident from the standpoint of advancing the human race as a whole. In this vein, MITI has decided to promote technology globalism in the fields of science and technology, based on a concept of "international cooperative effort for creative activity and international exchange to maximize the total benefit of science and technology to mankind." We call this concept "techno-globalism".

It is also important to establish an environment, based on "techno-globalism", that supports international collaboration in basic and original research as a resource to solve problems common to all mankind, as well as the dissemination of the resulting achievements. This could be done through international cooperation. To achieve this "techno-globalism", all countries should, as far as possible, allow free and easy access to their domestic technologies. This kind of openness requires the voluntary establishment of environments where anyone can access technological achievements freely, rather than merely asking other countries for information. It is this kind of international cooperation, with the efforts of both sides complementing each other, that can best accelerate the advancement of technology.

We at MITI have examined our policies from the viewpoint of promoting international technological advancement by using the technologies developed as part of this project, the excellence of which has encouraged us to set a new policy. Our project's resources focused mainly on a variety of software, including parallel operating systems and parallel logic programming languages. To date, the results of such a national project, sponsored by the government, have been available only for a fee and could be used only under various conditions once they became the property of the government. Therefore, generally speaking, although the results have been available to the public in principle, they have not been available to be used freely and widely. As I mentioned earlier, in the push toward reaching the goal of promoting international cooperation for technological advancement, Japan should take the initiative in creating an environment where all technologies developed in this project can be accessed easily.

Now, I can formally announce that, concerning the software copyrights from the research and development phase which are not the property of the government, the Institute for New Generation Computer Technology (ICOT), the owner of these software copyrights, is now preparing to enable their free and open use without charge. The adoption of this policy not only allows anyone free access to the software technologies developed as part of the project, but also makes it possible for interested parties to inherit the results of our research and to further advance the technology. I sincerely hope that our adopting this policy will maximize the utilization of researchers' abilities, and promote the advancement of the technologies of knowledge information processing and parallel processing, toward which all efforts have been concentrated during the project.
This means that our adopting this policy will not merely result in a one-way flow of technologies from Japan, but will enhance the benefit to all mankind of the technological advancements brought on by a two-way flow of technology and the mutual benefits thus obtained.

I should say that, from the outset of the Fifth Generation Computer Systems Project, we decided to make international contributions an important objective of the project. We fashioned the project as the model for managing the MITI-sponsored research and development projects that were to follow. Now, as we near the completion of the project, we have decided to adopt a policy of free access to the software to inspire further international contributions to technological development. I ask all of you to understand the message in this decision. I very much hope that the world's researchers will make effective use of the technologies resulting from the project and will devote themselves to further developing the technologies.

Finally, I'd like to close by expressing my heartfelt desire for this international conference to succeed in providing a productive forum for information exchange between participants and to act as a springboard for further advancements. Thank you very much for bearing with me.

Hideaki Kumano
Director General
Machinery and Information Industries Bureau
Ministry of International Trade and Industry (MITI)

PREFACE

Ten years have passed since the FGCS project was launched with the support of the Japanese government. As soon as the FGCS project was announced, it had a profound effect not only on computer scientists but also on the computer industry. Many countries recognized the importance of the FGCS project and some of them began their own similar national projects. The FGCS project was initially planned as a ten-year project, and this final, fourth FGCS conference therefore has a historical meaning. For this reason the conference includes an ICOT session.

The first volume contains the plenary session and the ICOT session. The plenary session is composed of many reports on the FGCS project, with three invited lectures and a panel discussion. In the ICOT session, the logic-based approach and parallel processing will be emphasized through concrete discussions. In addition to these, many demonstration programs have been prepared by ICOT at the conference site; the participants are invited to visit and discuss these exhibitions. Through the ICOT session and the exhibitions, the participants will understand clearly the aim and results of the FGCS project and receive a solid image of FGCS. The second volume is devoted to the technical sessions, which consist of three invited papers and technical papers submitted to this conference. Due to the time and space limitations of the conference, only 82 papers out of 256 submissions were selected by the program committee, after careful and long discussion of many of the high quality papers submitted. It is our hope that the conference program will prove to be both worthwhile and enjoyable.

As program chairperson, it is my great pleasure to acknowledge the support of a number of people. First of all, I would like to give my sincere thanks to the program committee members, who put a lot of effort into making the program attractive. I owe much to the three program vice-chairpersons, Professor Makoto Amamiya, Dr. Shigeki Goto and Professor Fumio Mizoguchi. Many ICOT members, including Dr.
Kazunori Ueda, Ken Satoh, Keiji Hirata, and Hideki Yasukawa have worked as key persons to organize the program. Dr. Koichi Furukawa, in particular, has played an indispensable role in overcoming many problems. I would also like to thank the many referees from many countries who replied quickly to the referees sheets. Finally, I would like to thank the secretariat at ICOT, they made fantastic efforts to carry out the administrative tasks efficiently. Hozumi Tanaka Program Chairperson xiii CONFERENCE COMMITTEES Steering Committee Chairperson: Members: Kazuhiro Fuchi Hideo Aiso Setsuo Arikawa Ken Hirose Takayasu Ito Hiroshi Kashiwagi Hajime Karatsu Makoto Nagao Hiroki Nobukuni Iwao Toda Eiiti Wada ICOT Keio Univ. Kyushu Univ. Waseda Univ. Tohoku Univ. ETL Tokai Univ. Kyoto Univ. NTT Data NTT Univ. of Tokyo Conference Committee Hidehiko Tanaka Chairperson: Vice-Chairperson: Koichi Furukawa Members: Makoto Amamiya Yuichiro Anzai Shigeki Goto Mitsuru Ishizuka Kiyonori Konishi Takashi Kurozumi Fumio Mizoguchi Kunio Murakami Sukeyoshi Sakai Masakazu Soga Hozumi Tanaka Shunichi Uchida Kinko Yamamoto Toshio Yokoi Akinori Yonezawa Toshitsugu Yuba Univ. of Tokyo _ ICOT Kyushu Univ. Keio Univ. NTT Univ. of Tokyo NTT Data ICOT Science Univ. of Tokyo Kanagawa Univ. ICOT(Chairperson, Management Committee) ICOT(Chairperson, Technology Committee) Tokyo Institute of Technology ICOT JIPDEC EDR Univ. of Tokyo ETL Program Committee Chairperson: Hozumi Tanaka Vice-Chairpersons: Makoto Amamiya Shigeki Goto Fumio Mizoguchi Members: Koichi Furukawa Kazunori Ueda Ken Satoh Keiji Hirata Hideki Yasukawa Hitoshi Aida Yuichiro Anzai Arvind Ronald J. Brachman John Conery Doug DeGroot Koichi Fukunaga Jean-Luc Gaudiot Atsuhiro Goto Satoshi Goto Seif Haridi Ken'ichi Hagihara Tokyo Institute of Technology Kyushu Univ. NTT Science Univ. of Tokyo ICOT ICOT ICOT ICOT ICOT Univ. of Tokyo Keio Univ. MIT AT&T Univ. of Oregon Texas Instruments IBM Japan, Ltd. Univ. of Southern California NTT NEC Corp. SICS Osaka Univ. XlV Makoto Haraguchi Ryuzo Hasegawa Hiromu Hayashi Nobuyuki Ichiyoshi Mitsuru Ishizuka Tadashi Kanamori Yukio Kaneda Hirofumi Katsuno Masaru Kitsuregawa Shigenobu Kobayashi Philip D. Laird Catherine Lassez Giorgio Levi John W. Lloyd Yuji Matsumoto Dale Miller Kuniaki Mukai Hiroshi Motoda Katsuto Nakajima Ryohei Nakano Kenji Nishida Shojiro Nishio . Stanley Peters Ant6nio Porto Teodor C. Przymusinski Vijay Saraswat Taisuke Sato Masahiko Sato Heinz Schweppe Ehud Shapiro Etsuya Shibayama Kiyoshi Shibayama Yoav Shoham Leon Sterling Mark E. Stickel MamoruSugie Akikazu Takeuchi Kazuo Taki Jiro Tanaka Yuzuru Tanaka Philip Treleaven Sxun Tutiya Shalom Tsur D.H.D. Warren Takahira Yamaguchi Kazumasa Yokota Minoru Yokota Tokyo Institute of Technology ICOT Fujitsu Laboratories ICOT Univ. of Tokyo Mitsubishi Electric Corp. Kobe Univ. NTT Univ. of Tokyo Tokyo Institute of Technology NASA IBM T.J. Watson Univ. di Pisa Univ. of Bristol Kyoto Univ. Univ. of Pennsylvania Keio Univ. Hitachi Ltd. Mitsubishi Electric Corp. NTT ETL Osaka Univ. CSLI, Stanford Univ. Univ. Nova de Lisboa Univ. of California at Riverside Xerox PARC ETL Tohoku Univ. Institut fOr Informatik The Weizmann Institute of Science Ryukoku Univ. Kyoto Univ. Stanford Univ. Case Western Reserve Univ. SRI International Hitachi Ltd. Sony CSL ICOT Fujitsu Laboratories Hokkaido Univ. University College, London Chiba Univ. MCC Univ. of Bristol Shizuoka Univ. ICOT NEC Corp. 
Publicity Committee Chairperson: Kinko Yamamoto Vice-Chairperson: Kunio Murakami Members: Akira Aiba Yuichi Tanaka JIPDEC Kanagawa Univ. ICOT ICOT Demonstration Committee Chairperson: Takashi Kurozumi Vice-Chairperson: Shunichi Uchida ICOT ICOT xv LIST OF REFEREES Abadi, Martin A bramson, Harvey Agha, Gul A. Aiba, Akira Aida, Hitoshi Akama, Kiyoshi Ali, Khayri A. M. Alkalaj, Leon Amamiya, Makoto Amano, Hideharu Amano, Shinya America, Pierre Anzai, Yuichiro Aoyagi, Tatsuya Apt, Krzysztof R. Arikawa, Masatoshi Arikawa, Setsuo Arima, Jun Arvind Baba, Takanobu Babaguchi, Noboru Babb, Robert G., II Bancilhon, Fran<;ois Bansal, Arvind K. Barklund, Jonas Beaumont, Tony Beeri, Catriel Beldiceanu, Nicolas Benhamou, Frederic R. Bibel, Wolfgang Bic, Lubomir Biswas, Prasenjit Blair, Howard A. Boku, Taisuke Bonnier, Staffan Boose, John Borning, Alan H. Boutilier, Craig E. Bowen, David Brachman, Ronald J. Bradfield, J. C. Bratko, Ivan Brazdil, Pavel Briot, Jean-Pierre Brogi, Antonio Bruynooghe, Maurice Bry, Fran<;ois Bubst, S. A. Buntine, Wray L. Carlsson, Mats Chikayama, Takashi Chong, Chin Nyak Chu, Lon-Chan Ciepielewski, Andrzej Clancey, William J. Clark, Keith L. Codish, Michael Codognet, Christian Conery, John Consens, Mariano P. Crawford, James M., Jr. Culler, David E. Dahl, Veronica Davison, Andrew de Bakker, Jaco W. de Maindreville, Christophe Debray, Saumya K. Deen, S. M. DeGroot, Doug del Cerro, Luis Farinas Demolombe, Robert Denecker, Marc Deransart, Pierre Dincbas, Mehmet Drabent, Wlodzimierz Duncan, Timothy Jon Dutra, Ines Fahlman, Scott E. Falaschi, Moreno Faudemay, Pascal Feigenbaum, Edward Fitting, Melvin C. Forbus, Kenneth D. Fribourg, Laurent Fujisaki, Tetsu Fujita, Hiroshi Fujita, Masayuki Fukunaga, Koichi Furukawa, Koichi Gabbrielli, Maurizio Gaines, Brian R. Gardenfors, Peter Gaudiot, Jean-Luc Gazdar, Gerald Gelfond, Michael Gero, John S. Giacobazzi, Roberto Goebel, Randy G. Goodwin, Scott D. Goto, Atsuhiro Goto, Satoshi Goto, Shigeki Grama, Ananth Gregory, Steve Gunji, Takao Gupta, Anoop Hagihara, Kenichi Hagiya, Masami Han, Jiawei Hanks, -Steve Hara, Hirotaka Harada, Taku Haraguchi, Makoto Haridi, Seif Harland, James Hasegawa, Ryuzo Hasida, K6iti Hawley, David J. Hayamizu, Satoru Hayashi, Hiromu Henry, Dana S. Henschen, Lawrence J. Herath, J ayantha Hewitt, Carl E. Hidaka, Yasuo Higashida, Masanobu Hiraga, Yuzuru Hirata, Keiji Hobbs, Jerry R. Hogger, Christopher J. Hong, Se June Honiden, Shinichi Hori, Koichi Horita, Eiichi Hori~chi, Kenji Hsiang, Jieh Iannucci, Robert A. Ichikawa, Itaru XVI Ichiyoshi, Nobuyuki Ida, Tetsuo Ikeuchi, Katsushi Inoue, Katsumi Ishida, Toru Ishizuka, Mitsuru Iwasaki, Yumi I wayama, Makoto Jaffar, Joxan J ayaraman, Bharat Kahn, Gilles Kahn, Kenneth M. Kakas, Antonios C. Kameyama, Yukiyoshi Kanade, Takeo Kanamori, Tadashi Kaneda, Yukio Kaneko, Hiroshi Kanellakis, Paris Kaplan, Ronald M. Kasahara, Hironori Katagiri, Yasuhiro Katsuno, Hirofumi Kautz, Henry A. Kawada, TsutonlU Kawamura, Tadashi Kawano, Hiroshi Keller, Robert Kemp, D'avid Kifer, Michael Kim, Chinhyun Kim, Hiecheol Kim, WooYoung Kimura, Yasunori Kinoshita, Yoshiki Kitsuregawa, Masaru Kiyoki, Yasushi Kluge, Werner E. Kobayashi, Shigenobu Kodratoff, Yves Kohda, Youji Koike, Hanpei Komorowski, Jan Konagaya, Akihiko Kono, Shinji Konolige, Kurt Korsloot, Mark Koseki, Yoshiyuki Kraus, Sarit Kumar, Vipin K unen, Kenneth Kunifuji, Susumu Kurita, Shohei K urokawa, Toshiaki Kusalik, Anthony J. Laird, Philip D. Lassez, Catherine Leblanc, Tom Lescanne, Pierre Leung, Ho-Fung Levesque, Hector J. 
Levi, Giorgio Levy, J ean-J acques Lieberman, Henry A. Lindstrom, Gary Lloyd, John W. Lusk, Ewing L. Lytinen, Steven L. Maher, Michael J. Makinouchi, Akifumi Manthey, Rainer Marek, Victor Marriott, Kim Martelli, Maurizio Maruoka, Akira Maruyama, Fumihiro Maruyama, Tsutomu Masunaga, Yoshifumi Matsubara, Hitoshi Matsuda, Hideo Matsumoto, Yuji Matsuoka, Satoshi McCune, William, W. Memmi, Daniel Mendelzon, Alberto O. Menju, Satoshi Meseguer, Jose Michalski, Richard S. Michie, Donald Miller, Dale A. Millroth, Hakan Minami, Toshiro Minker, Jack Miyake, Nobuhisa 1Iliyano, Satoru Miyazaki, Nobuyoshi Miyazaki, Toshihiko Mizoguchi, Fumio Mizoguchi, Riichiro Mori, Tatsunori Morishita, Shinichi Morita, Yukihiro Motoda, Hiroshi Mowtesi, Dawilo Mukai, K uniaki Mukouchi, Yasuhito Murakami, Kazuaki Murakami, Masaki M uraki, Kazunori Muraoka, Yoichi N adathur, Gopalan Naganuma, Jiro N agashima, Shigeo Nakagawa, Hiroshi Nakagawa, Takayuki Nakajima, Katsuto Nakamura, J unichi Nakano, Miyuki Nakano, Ryohei Nakashima, Hideyuki Nakashima, Hiroshi Nakata, Toshiyuki N akayatna, Masaya N aqvi, Shamim A. N atarajan, Venkat Nikhil, Rishiyur, S. Nilsson, J 0rgen Fischer Nilsson, Martin Nishida, Kenji Nishida, Toyoaki Nishikawa, Hiroaki Nishio, Shojiro Nitta, Izumi Nitta, Katsumi N oye, Jacques N umao, Masayuki N umaoka, Chisato 'Rorke, Paul V. Ogura, Takeshi o hki, Masaru Ohmori, I{enji Ohori, Atsushi Ohsuga, Akihiko Ohsuga, Setsuo Ohwada, Hayato Oka, Natsuki Okumura, Manabu Ono, Hiroakira Ono, Satoshi Overbeek, Ross A. o XVII Oyanagi, Shigeru Palamidessi, Catuscia Panangaden, Prakash Pearl, Judea Pereira, Fernando C. Pereira, LUIs MonIz Petrie, Charles J. Plaisted, David A. Plumer, Lutz Poole, David Popowich, Fred P. Porto, Antonio Przymusinski, Teodor C. Raina, Sanjay Ramamohanarao, Kotagiri Rao, Anand S. Reddy, U day S. Ringwood, Graem A. Robinson, John Alan Rojas, Raul Rokusawa, Kazuaki Rossi, Francesca Rossi, Gianfranco Russell, Stuart J. Sadri, Fariba Saint-Dizier, Patrick Sakai, Hiroshi Sakai, Ko Sakai, Shuichi Sakakibara, Yasubumi Sakama, Chiaki Sakurai, Akito Sakurai, Takafumi Sangiorgi, Davide Santos Costa, Vltor Saraswat, Vijay A. Sargeant, John Sato, Masahiko Sato, Taisuke Sato, Yosuke Satoh, Ken Schweppe, Heinz Seki, Hirohisa Seligman, Jerry M. Sergot, Marek J. Sestito, Sabrina Shanahan, Murray Shapiro, Ehud Shibayama, Etsuya Shibayama, Kiyoshi Shibayama, Shigeki Shimada, Kentaro Shin, Dongwook Shinohara, Takeshi Shintani, Toramatsu Shoham, Yoav Simonis, Helmut Sirai, Hidetosi Smith, Jan Magnus Smolka, Gert Sterling, Leon S. Stickel, Mark E. Stolfo, Salvatore. J. Subrahmani an , V. S. Sugano, Hiroyasu Sugie, Mamoru Sugiyama, Masahide Sundararajan, Renga Suwa, Masaki Suzuki, Hiroyuki Suzuki, Norihisa Takagi, Toshihisa Takahashi, Mitsuo Takahashi, N aohisa Takahashi, Yoshizo Takayama, Yukihide Takeda, Masayuki Takeuchi, Akikazu Takeuchi, Ikuo Taki, Kazuo Tarnai, Tetsuo Tamura, Naoyuki Tanaka, Hozumi Tanaka, J iro Tanaka, Katsumi Tanaka, Yuzuru Taniguchi, Rin-ichiro Tatemura, Jun'ichi Tatsuta, Makoto Terano, Takao Tick, Evan M. Toda, Mitsuhiko Togashi, Atsushi Tojo, Satoshi Tokunaga, Takenobu Tomabechi, Hideto Tomita, Shinji Tomiyama, J;'etsuo Touretzky, David S. Toyama, Yoshihi to Tsuda, Hiroshi Tsur, Shalom Tutiya, Syun U chihira, N aoshi U eda, Kazunori Uehara, K uniaki Ueno, Haruki van de Riet, Reinder P. van Emden, Maarten H. Van Hentenryck, Pascal Van Roy, Peter L. Vanneschi, Marco Wada, Koichi Wah, Benjamin W. Walinsky, Clifford Walker, David Waltz, David L. Warren, David H. D. 
Warren, David Scott Watanabe, Takao Watanabe, Takuo Watanabe, Toshinori ",\iVatson, Ian Watson, Paul Weyhrauch, Richard W. Wilk, Pau~ F. Wolper, Pierre Yamaguchi, Takahira Yamamoto, Akihiro Yamanaka, Kenjiroh Yang, Rong Yap, Roland Yardeni, Eyal Yasukawa, Hideki Yokoo, Nlakoto Yokota, Raruo Yokota, Kazumasa Yokota, Minoru Yokoyama, Shoichi Yonezaki, N aoki Yonezawa, Akinori Yoo, Namhoon Yoon, Dae-Kyun Yoshida, Hiroyuki Yoshida, Kaoru Yoshid~, Kenichi Yoshida, N orihiko Yoshikawa, Masatoshi Zerubia, Josiane B. XIX CONTENTS OF VOLUME 1 PLENARY SESSIONS Keynote Speech Launching the New Era Kazumro Fuchi ........................3 General Report on ICOT Research and Developnlent Overview of the Ten Years of the FGCS Project Takashi K urozurru . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . .. 9 Summary of Basic Resear:ch Activities of the FGCS Project .20 Koichi Furukawa . . . . . . . . . . . . . . . . . . . . . . . . Summary of the Parallel Inference Machine and its Basic Software Sh uni chi Uchida . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .33 Report on ICOT Research Results Parallel Inference Machine PIM Kazuo Taki . . . . . . . . . . . . . . . . . . . . . . . Operating System PIMOS and Kernel Language KL1 Takashi Chikayama . . . . . . . . . . . . . . . . . . Towards an Integrated Knowledge-Base Management System: Overview of R&D on Databases and Knowledge-Bases in the FGCS Project Kazumasa Yokota and Hideki Yasukawa . . . . . . . . . . . . . . . . . . . . . . . . Constraint Logic Programming System: CAL, GDCC and Their Constraint Solvers Akira Aibaand Ryuzo Hasegawa . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parallel Theorem Provers and Their Applications Ryuzo Hasegawa and Masayuki Fujita Natural Language Processing Software Yuichi Tanaka . . . . . . . . . . . . . . . . . . . . . . . Experimental Parallel Inference Software Katsumi Nitta, Kazuo Taki and Nobuyuki Ichiyoshi Invited Lect ures Formalism vs. Conceptualism: Interfaces between Classical Software Development Techniques and Knowledge Engineering Dines Bj¢rner . . . . . . . . . . . . . . . . . . . . . . . . . . .. . The Role of Logic in Computer Science and Artificial Intelligence J. A. Robinson . . . Programs are Predicates C. A. R. Hoare . .. Panel Discussion! A Springboard for Information Processing in the 21st Century PANEL: A Springboard for Information Processing in the 21st Century Robert A. Kowalski (Chairman) . . . . . . . . . . . . . . . . . . . . Finding the Best Route for Logic Programming Herve Ga1laire . . . . . . . . . . . . . . . . . . The Role of Logic Programming in the 21st Century Ross Overbeek . . . . . . . . . . . . . . Object-Based Versus Logic Programming Peter Wegner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. Concurrent Logic Programming as a Basis for Large-Scale Knowledge Information Processing Koichi Furukawa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .50 .73 .89 113 132 155 166 191 199 211 219 220 223 225 230 xx Knowledge Information Processing in the 21st Century Shunichi Uchida " . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232 IeOT SESSIONS "Parallel VLSI-CAD and KBM Systems LSI-CAD Programs on Parallel Inference Machine Hiroshi Date, Yukinori Matsumoto, Kouichi Kimura, Kazuo Taki, Hiroo Kato and Masahiro Hoshi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
237 Parallel Database Management System: Kappa-P Moto Kawamura, Hiroyuki Sato, Kazutomo Naganuma and Kazumasa Yokota . . . . . . . . 248 Objects, Properties, and Modules in QUIXOTE: Hideki Yasukawa, Hiroshi Tsuda and Kazumasa Yokota . . . . . . . . . . . . . . . . . . . . . . 257 Parallel Operating System, PIM OS Resource Management Mechanism of PIMOS Hiroshi Yashiro, Tetsuro Fujise, Takashi Chikayama, Masahiro Matsuo, Atsushi Hori and K umiko vVada . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269 The Design of the PIMOS File System Fumihide Itoh, Takashi Chikayama, Takeshi Mori, Masaki Sat 0, Tatsuo Kato and Tadashi Sato . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278 ParaGraph: A Graphical Tuning Tool for Multiprocessor Systems Seiichi Aikawa, Mayumi Kamiko, Hideyuki Kubo, Fumiko Matsuzawa and Takashi Chikayama . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286 Genetic Information Processing Protein Sequence Analysis by Parallel Inference Machine Masato Ishikawa, Masaki Hoshida, Makoto Hirosawa, Tomoyuki Toya, Kentaro Onizuka and Katsumi Nitta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Folding Simulation using Temperature Parallel Simulated Annealing Makoto Hirosawa, Richard J. Feldmann, David Rawn, Masato Ishikawa, Masaki Hoshida and George Michaels . . . . . . . . . . . . ". . . . . . . . . . . . . . . . . . . . . . . . . . . Toward a Human Genome Encyclopedia Kaoru Yoshida, Cassandra Smith, Toni Kazic, George Michaels, Ron Taylor, David Zawada, Ray Hagstrom and Ross Overbeek . . . . . . . . . . . . . . . . . . . . . . . . . . Integrated System for Protein Information Processing Hidetoshi Tanaka . . . . " . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294 300 307 321 Constraint Logic Progralnming and Parallel Theorenl. Proving Parallel Constraint Logic Programming Language GDCC and its Parallel Constraint Solvers Satoshi Terasaki, David J. Hawley, Hiroyuki Sawada, Ken Satoh, Satoshi Menju, Taro Kawagishi, Noboru Iwayama and Akira Aiba . . . . . . . . . . . . . . . . . . . . . . . . 330 cu-Prolog for Constraint-Based Grammar Hiroshi Tsuda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 Model Generation Theorem Provers on a Parallel Inference Machine Masayuki Fujita, Ryuzo Hasegawa, Miyuki Koshimura and Hiroshi Fujita 357 Natural Language Processing On a Grammar Formalism, Knowledge Bases and Tools for Natural Language Processing in Logic Programming Hiroshi Sana and Fumiyo Fukumoto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376 xxi Argument Text Generation System (Dulcinea) Teruo Ikeda, Akira Kotani, Kaoru Hagiwara and Yukihiro Kubo . . . . . . . . . . . . . . . . . 385 Situated Inference of Temporal Information Satoshi Tojo and Hideki Yasukawa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395 A Parallel Cooperation Model for Natural Language Processing Shigeichiro Yamasaki, Michiko Turuta, Ikuko Nagasawa and Kenji Sugiyama . . . . . . . . . 405 Parallel Inference Machine (PIM) Architecture and Implementation of PIM/p Kouichi Kumon, Akira Asato, Susumu Arai, Tsuyoshi Shinogi, Akira Hattori, . . . 414 Hiroyoshi Hatazawa and Kiyoshi Hirano . . . . . . . . . . . . . . . . . . . . 
Architecture and Implementation of PIM/m Hiroshi Nakashima, Katsuto Nakajima, Seiichi Kondo, Yasutaka Takeda, Yu Inamura, Satoshi Onishi and Kanae Masuda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425 Parallel and Distributed Implementation of Concurrent Logic Programming Language KLl Keiji Hirata, Reki Yamamoto, Akira Imai, Hideo Kawai, Kiyoshi Hirano, Tsuneyoshi Takagi, Kazuo Taki, Akihiko Nakase and Kazuaki Rokusawa 436 A uthor Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .i xxiii CONTENTS OF VOLUME 2 FOUNDATIONS Reasoning about ProgralTIS Logic Program Synthesis from First Order Logic Specifications Tadashi Kawamura . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sound and Complete Partial Deduction with Unfolding Based on Well-Founded Measures Bern Martens, Danny De Schreye and Maurice Bruynooghe . . . . . . . . . . . . . . . . . A Framework for Analyzing the Termination of Definite Logic Programs with respect to Call Patterns Danny De Schreye, Kristof Verschaetse and Maurice Bruynooghe . . . . . . . . . . . . . . . Automatic Verification of GHC-Programs: Termination Lutz Pliimer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Analogy Analogical Generalization Takenao Ohkawa, Toshiaki Mori, Noboru Babaguchi and Yoshikazu Tezuka Logical Structure of Analogy: Preliminary Report Jun Arima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Abduction (1) Consistency-Based and Abductive Diagnoses as Generalised Stable Models Chris Preist and Kave Eshghi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A Forward-Chaining Hypothetical Reasoner Based on Upside-Down Meta-Interpretation Yoshihiko Ohta and Katsumi Inoue . . . . . . . . . . . . . . . . . . . . . . . . . Logic Programming, Abduction and Probability David Poole . . . . . . . . . . . . . . . . . . . Abduction (2) Abduction in Logic Programming with Equality P. T. Cox, E. Knill and T. Pietrzykowski Hypothetico-Dedudive Reasoning Chris Evans and Antonios C. Kakas . . . . . . . . . . . . . . . . . . . . . . . . . . . Acyclic Disjunctive Logic Programs with Abductive Procedures as Proof Procedure Phan Minh Dung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Semantics of Logic Programs Adding Closed '\iVorld Assumptions to Well Founded Semantics Luis Moniz Pereira, Jose J. Alferes and Joaquim N. Aparicio . . . . . . . . . . . Contributions to the Semantics of Open Logic Programs A. Bossi, M. Gabbrielli, G. Levi and M. C. Meo . . . . . . . . . . . . . . . . . . . A Generalized Semantics for Constraint Logic Programs Roberto Giacobazzi, Saumya K. Debray and Giorgio Levi Extended Well-Founded Semantics for Paraconsistent Logic Programs Chiaki Sakama . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463 473 481 489 ." 497 .505 514 522 530 539 546 .. 555 562 570 581 . 592 Invited Paper Formalizing Database Evolution in the Situation Calculus Raymond Reiter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600 xxiv Machine Learning Learning Missing Clauses by Inverse Resolution Peter Idestam-Almquist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
610 A Machine Discovery from Amino Acid Sequences by Decision Trees over Regular Patterns Setsuo Arikawa, Satoru Kuhara, Satoru Miyano, Yasuhito Mukouchi, Ayumi Shinohara and Takeshi Shinohara . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . " 618 Efficient Induction of Version Spaces through Constrained Language Shift Claudio Carpineto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626 Theorem Proving Theorem Proving Engine and Strategy Description Language Massimo Bruschi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634 ANew Algorithm for Subsumption Test Byeong Man Kim, Sang Ho Lee, Seung Ryoul Maeng and Jung Wan Cho . . . . . . . . . . 643 On the Duality of Abduction and Model Generation Marc Denecker and Danny De Schreye . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650 Functional Programming and Constructive Logic Defining Concurrent Processes Constructively Yukihide Takayama .. '. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Realizability Interpretation of Coinductive Definitions and Program Synthesis with Streams Makoto Tatsuta . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MLOG: A Strongly Typed Confluent Functional Language with Logical Variables Vincent Poirriez . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ANew Perspective on Integrating Functional and Logic Languages John Darlington, Yi-ke Guo and Helen Pull . . . . . . . . . . . .' . . . . . . . . . . . . . . . . . . Telnporal Reasoning A Mechanism for Reasoning about Time and Belief Hideki Isozaki and Yoav Shoham . . . . . . . . Dealing with Time Granularity in the Event Calculus Angelo Montanari, Enrico Maim, Emanuele Ciapessoni and Elena Ratto 658 666 674 682 694 702 ARCHITECTURES & SOFTWARE Hardware Architecture and Evaluation UNIRED II: The High PerforII?-ance Inference Processor for the Parallel Inference Machine PIE64 Kentaro Shimada, Hanpei Koike and Hidehiko Tanaka . . . . . . . . . . . . . . . . . . . . .. Hardware Implementation of Dynamic Load Balancing in the Parallel Inference Machine PIM/c T. Nakagawa, N. Ido, T. Tarui, M. Asaie and M. Sugie . . . . . . . . . . . . . . . . . . . Evaluation of the EM-4 Highly Parallel Computer using a Game Tree Searching Problem Yuetsu Kodama, Shuichi Sakai and Yoshinori Yamaguchi . . . . . . . . . . . . . . . . . OR-Parallel Speedups in a Knowledge Based System: on Muse and Aurora Khayri A. M. Ali and Roland Karlsson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715 723 731 739 Invited Paper A Universal Parallel Computer Architecture William J. Dally . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 746 xxv AND-Parallelisrn. and OR-Parallelism An Automatic Translation Scheme from Prolog to the Andorra Kernel Language Francisco Bueno and Manuel Hermenegildo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 759 Recomputation based Implementations of And-Or Parallel Prolog Gopal Gupta and Manuel V. Hermenegildo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 770 Estimating the Inherent Parallelism in Prolog Programs David C. Sebr and Laxmikant V. Kale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
783 Implementation Techniques Implementing Streams on Parallel Machines with Distributed Memory Koicbi Konisbi, Tsutomu Maruyama, Akibiko Konagaya, Kaoru Yosbida and Takasbi Cbikayama . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791 Message-Oriented Parallel Implementation of Moded Flat GHC Kazunori Ueda and Masao Morita . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799 Towards an Efficient Compile-Time Granularity Analysis Algorithm X. Zbong, E. Tick, S. Duvvuru, L. Hansen, A. V. S. Sastry and R. Sundararajan 809 Providing Iteration and Concurrency in Logic Programs through Bounded Quantifications Jonas Barklund and Hakan Millrotb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817 Extension of Logic Programming An Implementation for a Higher Level Logic Programming Language Antbony S. K. Cbeng and Ross A. Paterson . . . . . . . . . . . . . . . . . . . . . . . . . . . . 825 Implementing Prolog Extensions: a Parallel Inference Machine Jean-Marc Alliot, Andreas Herzig and Mamede Lima-Marques . . . . . . . . . . . . . . . . . . 833 Parallel Constraint Solving in Andorra-I Steve Gregory and Rong Yang . . . . . . . . . . . . . . . . . . . . 843 A Parallel Execution of Functional Logic Language with Lazy Evaluation Jong H. Nang, D. W. Sbin, S. R. Maeng and Jung W. Cbo . . . . . . . . . . . . . . . . . . . . 851 Task Scheduling and Load Analysis Self-Organizing Task Scheduling for Parallel Execution of Logic Programs Zbeng Lin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Asymptotic Load Balance of Distributed Hash Tables Nobuyuki Icbiyosbi and Kouicbi Kimura 859 869 Concurrency Constructing and Collapsing a Reflective Tower in Reflective Guarded Horn Clauses Jiro Tanaka and Fumio Matono . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 877 CHARM: Concurrency and Hiding in an Abstract Rewriting Machine 887 Andrea Corradini, Ugo Montanari and Francesca Rossi . . .... Less Abstract Semantics for Abstract Interpretation of FGHC Programs Kenji Horiucbi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 897 Databases and Distributed SystelTIS Parallel Optimization and Execution of Large Join Queries .907 Eileen Tien Lin, Edward Omiecinski and Sudbakar Yalamancbili . . . . . . . Towards an Efficient Evaluation of Recursive Aggregates in Deductive Databases 915 Alexandre Lefebvre . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A Distributed Programming Environment based on Logic Tuple Spaces .......... 926 Paolo Ciancarini and David Gelernter . . . . . . . . . . . . . . . . . XXVI Programming EnvirOlU11.ent Visualizing Parallel Logic Programs with VISTA E. Tick . . . . . . . . . . . . . . . . . . . . . . Concurrent Constraint Programs to Parse and Animate Pictures of Concurrent Constraint Programs Kenneth M. Kahn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Logic Programs with Inheritance Yaron Goldberg, William Silverman and Ehud Shapiro . . . . . . . . . . . . . . . . . . . Implementing a Process Oriented Debugger with Reflection and Program Transformation Munenori Maeda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ... 934 943 951 961 Prod uction Systel11.S ANew Parallelization Method for Production Systems E. Bahr, F. Barachini and H. Mistelberger . . . . . . . . . . . . . . . . . . . . . . . . . . 
969 Performance Evaluation of the Multiple Root Node Approach to the Rete Pattern Matcher for Production Systems Andrew Sohn and Jean-Luc Gaudiot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 977 APPLICATIONS & SOCIAL IMPACTS Constraint Logic Programn1.ing Output in CLP(R) Joxan Jaffar, Michael J. Maher, Peter J. Stuckey and Roland H. C. Yap Adapting CLP(R) to Floating-Point Arithmetic J. H. M. Lee and M. H. van Emden .. Domain Independent Propagation Thierry Le Provost and Mark Wallace . . . . . . . . A Feature-Based Constraint System for Logic Programming with Entailment Hassan Ai't-Kaci, Andreas Podelski and Gert Smolka . . . . . . . . . . . . Qualitative Reasoning Range Determination of Design Parameters by Qualitative Reasoning and its Application to Electronic Circuits Masaru Ohki, Eiji Oohira, Hiroshi Shinjo and Masahiro Abe Logical Implementation of Dynamical Models Yoshiteru Ishida . . . . . . . . . . . . . . . . . . . . . . . . . . . . Knowledge Representation The CLASSIC Knowledge Representation System or, KL-ONE: The Next Generation Ronald J. Brachman, Alexander Borgida, Deborah L. McGuinness, Peter F. PatelSchneider and Lori Alperin Resnick . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Morphe: A Constraint-Based Object-Oriented Language Supporting Situated Knowledge Shigeru Watari, Y,asuaki Honda and Mario Tokoro . . . . . . . . . . . . On the Evolution of Objects in a Logic Programming Framework F. Nihan Kesim and Marek Sergot . . . . . . . . . . . . . . . . . . . . . . . . .987 .996 1004 1012 1022 1030 1036 1044 1052 Panel Discussion: Future Direction of Next Generation Applications The Panel on a Future Direction of New Generation Applications Fumio Mizoguchi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1061 Knowledge Representation Theory Meets Reality: Some Brief Lessons from the CLASSIC Experience Ronald J. Brachman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1063 XXVll Reasoning with Constraints Catherine Lassez Developments in Inductive Logic Programming Stephen A1uggleton . . . . . . . . . . . . . . . . . . . . . Towards the General-Purpose Parallel Processing System Kazuo Taki .. . . . . . . . . . . . . . . . . . . . . . . . . Knowledge-Based SystelTIS A Hybrid Reasoning System for Explaining Mistakes in Chinese Writing Jacqueline Castaing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Automatic Generation of a Domain Specific Inference Program for Building a Knowledge Processing System Takayasu I{asahara, Naoyuki Yamada, Yasuhiro Kobayashi, Katsuyuki Yoshino and Kikuo Yoshimura . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Knowledge-Based Functional Testing for Large Software Systems Uwe Nonnenmann and John K. Eddy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A Diagnostic and Control Expert System Based on a Plant Model Junzo Suzuki, Chiho Konuma, Mikito Iwamasa, Naomichi Sueda, Shigeru Mochiji and Akimoto Kamiya . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1066 1071 1074 1076 1084 1091 1099 Legal Reasoning A Semiformal Metatheory for Fragmentary and Multilayered Knowledge as an Interactive Metalogic Program Andreas Hamfelt and Ake Hansson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 
1107
HELIC-II: A Legal Reasoning System on the Parallel Inference Machine
Katsumi Nitta, Yoshihisa Ohtake, Shigeru Maeda, Masayuki Ono, Hiroshi Ohsaki and Kiyokazu Sakane . . . 1115

Natural Language Processing
Chart Parsers as Proof Procedures for Fixed-Mode Logic Programs
David A. Rosenblueth . . . 1125
A Discourse Structure Analyzer for Japanese Text
K. Sumita, K. Ono, T. Chino, T. Ukita and S. Amano . . . 1133
Dynamics of Symbol Systems: An Integrated Architecture of Cognition
Koiti Hasida . . . 1141

Knowledge Support Systems
Mental Ergonomics as Basis for New-Generation Computer Systems
M. H. van Emden . . . 1149
An Integrated Knowledge Support System
B. R. Gaines, M. Linster and M. L. G. Shaw . . . 1157
Modeling the Generational Infrastructure of Information Technology
B. R. Gaines . . . 1165

Parallel Applications
Co-HLEX: Co-operative Recursive LSI Layout Problem Solver on Japan's Fifth Generation Parallel Inference Machine
Toshinori Watanabe and Keiko Komatsu . . . 1173
A Cooperative Logic Design Expert System on a Multiprocessor
Yoriko Minoda, Shuho Sawada, Yuka Takizawa, Fumihiro Maruyama and Nobuaki Kawato . . . 1181
A Parallel Inductive Learning Algorithm for Adaptive Diagnosis
Yoichiro Nakakuki, Yoshiyuki Koseki and Midori Tanaka . . . 1190
Parallel Logic Simulator based on Time Warp and its Evaluation
Yukinori Matsumoto and Kazuo Taki . . . 1198

Invited Paper
Applications of Machine Learning: Towards Knowledge Synthesis
Ivan Bratko . . . 1207

Author Index . . . i

PLENARY SESSIONS

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

Launching the New Era

Kazuhiro Fuchi
Director, Research Center
Institute for New Generation Computer Technology (ICOT)
4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan

Thank you for coming to FGCS'92. As you know, we have been conducting a ten-year research project on fifth generation computer systems. Today is the tenth anniversary of the founding of our research center, making it exactly ten years since our project actually started. The first objective of this international conference is to show what we have accomplished in our research during these ten years. Another objective of this conference is to offer an opportunity for researchers to present the results of advanced research related to Fifth Generation Computer Systems and to exchange ideas. A variety of innovative studies, in addition to our own, are in progress in many parts of the world, addressing the future of computers and information processing technologies.

I constantly use the phrase "Parallel Inference" as the keyword to simply and precisely describe the technological goal of this project. Our hypothesis is that parallel inference technology will provide the core for those new technologies in the future, technologies that will be able to go beyond the framework of conventional computer technologies. During these ten years I have tried to explain this idea whenever I have had the chance.
One obvious reason why I have repeated the same thing so many times is that I wish its importance to be recognized by the public. However, I have another, less obvious, reason. When this project started, an exaggerated image of the project was engendered, which seems to persist even now. For example, some people believed that we were trying, in this project, to solve in a mere ten years some of the most difficult problems in the field of artificial intelligence (AI), or to create a machine translation system equipped with the same capabilities as humans. In those days, we had to face criticism, based upon that false image, that it was a reckless project trying to tackle impossible goals. Now we see criticism, from inside and outside the country, that the project has failed because it has been unable to realize those grand goals.

The reason why such an image was born appears to have something to do with FGCS'81, a conference we held one year before the project began. At that conference we discussed many different dreams and concepts. The substance of those discussions was reported as sensational news all over the world. A vision with such ambitious goals, however, can never be materialized as a real project in its original form. Even if a project is started in accordance with the original form, it cannot be managed and operated within the framework of an effective research scheme. Actually, our plans had become much more modest by the time the project was launched. For example, the development of application systems, such as a machine translation system, was removed from the list of goals. It is impossible to complete a highly intelligent system in ten years. A preliminary stage is required to enhance basic studies and to reform computer technology itself. We decided that we should focus our efforts on these foundational tasks. Another reason is that, at that time in Japan, some private companies had already begun to develop pragmatic, low-level machine-translation systems independently and in competition with each other. Most of the research topics related to pattern recognition were also eliminated, because a national project called "Pattern Information Processing" had already been conducted by the Ministry of International Trade and Industry for ten years. We also found that the stage of that research did not match our own. We thus deliberately eliminated most research topics covered by Pattern Information Processing from the scope of our FGCS project. However, those topics themselves are very important and thus remain major topics for research. They may become a main theme of another national project of Japan in the future.

Does all this mean that FGCS'81 was deceptive? I do not think so. First, in those days, a pessimistic outlook predominated concerning the future development of technological research. For example, there was a general trend that research into artificial intelligence would be of no practical use. In that sort of situation, there was considerable value in maintaining a positive attitude toward the future of technological research, whether this meant ten years or fifty. I believe that this was the very reason why we received remarkable reactions, both positive and negative, from the public. The second reason is that the key concept of Parallel Inference was presented in a clear-cut form at FGCS'81. Let me show you a diagram (Figure 1). This diagram is the one I used for my speech at FGCS'81, and is now a sort of "ancient document."
Its draft was completed in 1980, but I had come up with the basic idea four years earlier. After discussing the concept with my colleagues for four years, I finally completed this diagram. Here, you can clearly see our concept that our goal should be a "Parallel Inference Machine." We wanted to create an inference machine, starting with study on a variety of parallel architectures. For this purpose, research into a new language was necessary. We wanted to develop a 5G-kernel language--what we now call KLl. The diagram includes these hopes of ours. The upper part of the diagram shows the research infrastructure. A personal inference machine or workstation for research purposes should be created, as well as a chip for the machine. We expected that the chip would be useful for our goal. The computer network should be consolidated to support the infrastructure. The software aspects are shown in the bottom part of the diagram. Starting with the study on software engineering and AI, we wanted to build a framework for high-level symbol processing, which should be used to achieve our goal. This is the concept I presented at the FGCS'81 conference. I would appreciate it if you would compare this diagram with our plan and the results of the final stage of this project, when Deputy Director Kurozumi shows you them later. I would like you to compare the original structure conceived 12 years ago and the present results of the project so that you can appreciate what has been accomplished and criticize what is lacking or what was immature in the original idea. Some people tend to make more of the conclusions drawn by a committee than the concepts and beliefs of an individual. It may sound a little bit beside point, but I have heard that there is a proverb in the West that goes, "The horse designed by a committee will turn out to be a camel." The preparatory committee for this project had a series of enthusiastic discussions for three years before the project's launching. I thought that they were doing an exceptional job as a committee. Although the committee's work was great, however, I must say that the plan became a camel. It seems that their enthusiasm created some extra humps as well. Let me say in passing that some people seem to adhere to those humps. I am surprised that there is still such a so-called bureaucratic view even among academic people and journalists. This is not the first time I have expressed this opinion of mine about the goal of the project. I have, at least in Japanese, been declaring it in public for the past ten years. I think I could have been discharged at any time had my opinion been inappropriate. As the person in charge of this project, I have pushed forward with the lines of Parallel Inference based upon my own beliefs. Although I have been criticized as still being too ambitious, I have always been prepared to take responsibility for that. Since the project is a national project, it goes without saying that it should not be controlled by one person. I have had many discussions with a variety of people for more than ten years. Fortunately, the idea of the project has not remained just a personal belief but has become a common belief shared by the many researchers and research leaders involved in the project. Assuming that this project has proved to be successful, as I believe it has, this fact is probably the biggest reason for its success. For a research project to be successful, it needs to be favored by good external conditions. 
But the most important thing is that the research group involved has a common belief and a common will to reach its goals. I have been very fortunate to be able to realize and experience this over the past ten years. So much for introductory remarks.

I wish to outline, in terms of Parallel Inference, the results of our work conducted over these ten years. I believe that the remarkable feature of this project is that it focused upon one language and, based upon that language, experimented with the development of hardware and software on a large scale. From the beginning, we envisaged that we would take logic programming and give it a role as a link that connects highly parallel machine architecture and the problems concerning applications and software. Our mission was to find a programming language for Parallel Inference.

A research group led by Deputy Director Furukawa was responsible for this work. As a result of their efforts, Ueda came up with a language model, GHC, at the beginning of the intermediate stage of the project. Its two main precursors were Parlog and Concurrent Prolog; he enhanced and simplified them to make this model. Based upon GHC, Chikayama designed a programming language called KL1. KL1, a language derived from the logic programming concept, provided a basis for the latter half of our project. Thus, all of our research plans in the final stage were integrated under a single language, KL1.

For example, we developed a hardware system, the Multi-PSI, at the end of the intermediate stage, and demonstrated it at FGCS'88. After the conference we made copies and have used them as the infrastructure for software research. In the final stage, we made a few PIM prototypes, the Parallel Inference Machine that has been one of our final research goals on the hardware side. These prototypes are being demonstrated at this conference.

Figure 1: Conceptual development diagram (from T. Moto-oka (ed.), Fifth Generation Computer Systems (Proc. FGCS'81), JIPDEC / North-Holland, 1982, p. 113; the diagram spans a ten-year plan covering the research infrastructure (personal inference machine / PROLOG machine, chips, networks), a new 5G core language, a parallel inference machine (data flow, associative and database machines), and the software side (software engineering, AI research, knowledge engineering, programming, knowledge bases, theorem proving, QA and language understanding))

Each prototype has a different architecture in its interconnection network and so forth, and the architecture itself is a subject of research. Viewed from the outside, however, all of them are KL1 machines. Division Chief Uchida and Laboratory Chief Taki will show you details on PIM later. What I want to emphasize here is that all of these prototypes are designed, down to the level of internal chips, on the assumption that KL1, a language that could be categorized as a very high-level language, is the "machine language." On the software side as well, our research topics were integrated under the KL1 language.
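To give a concrete feel for the guarded-clause style that GHC introduced and KL1 inherited, here is a minimal sketch. It uses the stream-merge predicate commonly quoted when committed-choice languages are introduced; it is illustrative only, not a program taken from the project.

    % Nondeterminate merge of two streams in GHC.  Each clause reads
    %   Head :- Guard | Body.
    % A clause may commit as soon as its guard holds, so the two input
    % streams are consumed concurrently and interleaved into the output.
    merge([X|Xs], Ys, Zs) :- true | Zs = [X|Zs1], merge(Xs, Ys, Zs1).
    merge(Xs, [Y|Ys], Zs) :- true | Zs = [Y|Zs1], merge(Xs, Ys, Zs1).
    merge([],     Ys, Zs) :- true | Zs = Ys.
    merge(Xs,     [],  Zs) :- true | Zs = Xs.

Essentially the same clauses are also legal in KL1's core layer, which is one reason software written in the GHC style carried over naturally to the KL1 machines.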
All the application software, as well as the basic software such as operating systems, was to be written in KL1. We demonstrated an operating system called PIMOS at FGCS'88, which was the first operating system software written in KL1. It was immature at that time, but has been improved since then. The full-fledged version of PIMOS now securely backs the demonstrations being shown at this conference. Details will later be given by Laboratory Chief Chikayama, but I wish to emphasize that not only have we succeeded in writing software as complicated and huge as an operating system entirely in KL1, but we have also proved through our own experience that KL1 is much more appropriate than conventional languages for writing system software such as operating systems.

One of the major challenges in the final stage was to demonstrate that KL1 is effective not only for basic software, such as operating systems and language implementations, but also for a variety of applications. As Laboratory Chief Nitta will report later, we have been able to demonstrate the effectiveness of KL1 for various applications including LSI-CAD, genetic analysis, and legal reasoning. These application systems address issues in the real world and have a virtually practical scale. But, again, what I wish to emphasize here is that the objective of those developments has been to demonstrate the effectiveness of Parallel Inference.

In fact, it was in the initial stage of our project that we first tried the approach of developing a project around one particular language. The technology was then at the level of sequential processing, and we adopted ESP, an expanded version of Prolog, as a basis. Assuming that ESP could play the role of KL0, our kernel language for sequential processing, a Personal Sequential Inference machine, called PSI, was designed as hardware. We decided to use the PSI machine as a workstation for our research. Some 500 PSIs, including modified versions, have so far been produced and used in the project. SIMPOS, the operating system designed for PSI, is written solely in ESP. In those days, this was one of the largest programs written in a logic programming language. Up to the intermediate stage of the project, we used PSI and SIMPOS as the infrastructure to conduct research on expert systems and natural language processing.

This kind of approach is indeed the dream of researchers, but some of you may be skeptical about it. Our project, though conducted on a large scale, is still considered basic research. Accordingly, it is supposed to be conducted in a free, unrestrained atmosphere so as to bring about innovative results. Some of you may wonder whether the policy of centering around one particular language restrains the freedom and diversity of research. But this policy is also based upon my, or our, philosophy.

I believe that research is a process of "assuming and verifying hypotheses." If this is true, the hypotheses must be as pure and clear as possible. If not, you cannot be sure of what you are trying to verify. A practical system itself could include compromise or, to put it differently, flexibility to accommodate various needs. However, in a research project, the hypotheses must be clear and verifiable. Compromises and the like can be considered after basic research results have been obtained. This has been my policy from the very beginning, and that is the reason why I took a rather controversial or provocative approach.
We had a strong belief that our hypothesis of focusing on Parallel Inference and KL1 had sufficient scope for a world of rich and free research. Even if the hypothesis acted as a constraint, we believed that it would act as a creative constraint. I would be a liar if I were to say that there was no resistance among our researchers when we decided upon the above policy. KL1 and parallel processing were a completely new world to everyone. It required a lot of courage to plunge headlong into this new world. But once the psychological barrier was overcome, the researchers set out to create new parallel programming techniques one after another.

People may not feel like using new programming languages such as KL1. Using established languages and systems only, a kind of conservatism, seems to be the major trend today. In order to make a breakthrough into the future, however, we need a challenging and adventurous spirit. I think we have carried out our experiment with such a spirit throughout the ten-year project.

Among the many other results we obtained in the final stage was a fast theorem-proving system, or prover. Details will be given in Laboratory Chief Hasegawa's report, but I think that this research will lead to the resurrection of theorem-proving research. Conventionally, research into theorem proving by computers has been criticized by many mathematicians, who insisted that only toy examples could be dealt with. However, very recently, we were able to solve a problem labelled by mathematicians as an "open problem" using our prover, as a result of collaborative research with the Australian National University. The applications of our prover are not limited to mathematical theorem proving; it is also being used as the inference engine of our legal reasoning system. Thus, our prover is being used in the mathematics world on one hand, and in the legal world on the other.

The research on programming languages has not ended with KL1. For example, a constraint logic programming language called GDCC has been developed as a higher-level language than KL1. We also have a language called Quixote. From the beginning of this project, I have advocated the idea of integrating three types of languages - logic, functional, and object-oriented - and of integrating the worlds of programming and of databases. This idea has been materialized in the Quixote language; it can be called a deductive object-oriented database language. Another language, CIL, was developed by Mukai in the study of natural language processing. CIL is a semantics representation language designed to be able to deal with situation theory. Quixote incorporates CIL in a natural form and therefore has the characteristics of a semantics representation language. As a whole, it shows one possible future form of knowledge representation languages. More details on Quixote, along with the development of a distributed parallel database management system, Kappa-P, will be given by Laboratory Chief Yokota.

Thus far I have outlined, albeit briefly, the final results of our ten-year project. Recalling what I envisaged ten years ago and what I have dreamed and hoped would materialize for 15 years, I believe that we have achieved as much as or more than what I expected, and I am quite satisfied. Naturally, a national project is not performed for mere self-satisfaction. The original goal of this project was to create the core of next-generation computer technologies. Various elemental technologies are needed for future computers and information processing.
Although it is impossible for this project alone to provide all of those technologies, we are proud to be able to say that we have created the core part, or at least provided an instance of it. The results of this project, however, cannot be commercialized as soon as the project is finished, which is exactly why it was conducted as a national project. I estimate that it will take another five years, which could be called a period for the "maturation of the technologies," for our results to actually take root in society. I had this prospect in mind when this project started ten years ago, and have kept declaring it in public right up until today. Now the project is nearing its end, but my idea is still the same.

There is often a gap of ten or twenty years between the basic research stage of a technology and the day it appears in the business world. Good examples are UNIX, C, and RISC, which have become popular in the current trend toward downsizing. They appear to be up-to-date in the business world, but research on them has been conducted for many years. The frank opinion of the researchers involved will be that industry has finally caught up with their research. There is thus a substantial time lag between basic research and commercialization. Our project, from its very outset, set its eye on technologies for the far distant future.

Today, the movement toward parallel computers is gaining momentum worldwide as a technology leading into the future. However, skepticism was dominant ten years ago. The situation was not very different even five years ago. When we tried to shift our focus to parallel processing after the initial stage of the project, there was a strong opinion that a parallel computer was not possible and that we should give it up and be happy with the successful results obtained in the initial stage. In spite of the skepticism about parallel computers that still remains, the trend seems to be changing drastically. Thanks to constant progress in semiconductor technology, it is now becoming easier to connect five hundred, a thousand, or even more processor chips, as far as hardware technology is concerned.

Currently, the parallel computers that most people are interested in are supercomputers for scientific computation. The ideas there tend to still be vague regarding the software aspects. Nevertheless, a new age is dawning. The software problem might not be too serious as long as scientific computation deals only with simple, scaled-up matrix calculations, but it will certainly become serious in the future. Now suppose this problem has been solved and we can nicely deal with all the aspects of large-scale problems with complicated overall structures. Then we would have something like a general-purpose capability that is not limited to scientific computation. We might then be able to replace the mainframe computers we are using now.

The scenario mentioned above is one possibility leading to a new type of mainframe computer in the future. One could start by connecting a number of processor chips and face enormous difficulties with parallel software. Alternatively, however, one could start by considering what technologies will be required in the future, and I suspect that the answer should be the Parallel Inference technology which we have been pursuing. I am not going to press the above view upon you.
However, I anticipate that if anybody starts research without knowing our ideas, or under a philosophy that he or she believes is quite different from ours, after many twists and turns that person will reach more or less the same concept as ours, possibly with small differences such as different terminology. In other words, my opinion is that there are not so many different essential technologies. It may be valuable for researchers to struggle through a process of research independently from what has already been done, only to find that they have followed the same course as somebody else. But a more efficient approach would be to build upon what has been done in this FGCS project and devote energy to moving forward from that point. I believe the results of this project will provide important insights for researchers who want to pursue general-purpose parallel computers.

This project will be finished at the end of this year. As for the "maturation of the Parallel Inference technology," I think we will need a new form of research activity. There is a concept called "distributed cooperative computing" in the field of computation models. I expect that, in a similar spirit, the seeds generated in this project will spread both inside and outside the country and sprout in many different parts of the world. For this to be realized, the results of this project must be freely accessible and available worldwide. In the software area, for example, this means that it is essential to disclose all our accomplishments, including the source code, and to make them "international common public assets." MITI Minister Watanabe and the Director General of the Bureau announced the policy that the results of our project could be utilized throughout the world. Enormous effort must have been made to formulate such a policy. I find it very impressive.

We have tried to encourage international collaboration for ten years in this project. As a result, we have enjoyed opportunities to exchange ideas with many researchers involved in advanced studies in various parts of the world. They have given us much support and cooperation, without which this project could not have been completed. In that regard, and also considering that this is a Japanese national project that aims at making a contribution, though it may only be small, toward the future of mankind, we believe that we are responsible for leaving our research accomplishments as a legacy to future generations and to the international community in the most suitable form. This is now realized, and I believe it is an important springboard for the future.

Although this project is about to end, the end is just another starting point. The advancement of computers and information processing technologies is closely related to the future of human society. Social thought, ideologies, and social systems that fail to recognize its significance will perish, as we have seen in recent world history. We must advance into a new age now. To launch a new age, I fervently hope that the circle of those who share our passion for a bright future will continue to expand. Thank you.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT.
© ICOT, 1992

Overview of the Ten Years of the FGCS Project

Takashi Kurozumi
Institute for New Generation Computer Technology
4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan
kurozumi@icot.or.jp

Abstract

This paper introduces how the FGCS project started, its overall activities, and the results of the project. The FGCS project was launched in 1982 after a three-year preliminary study stage. The basic framework of the fifth generation computer is parallel processing and inference processing based on logic programming. Fifth generation computers were viewed as suitable for the knowledge information processing needs of the near future. ICOT was established to promote the FGCS project. This paper describes not only ICOT's efforts in promoting the FGCS project, but also the relationship between ICOT and related organizations. I also conjecture on the parallel inference machines of the near future.

1 Preliminary Study Stage for the FGCS Project

The circumstances prevailing during the preliminary stage of the FGCS project, from 1979 to 1981, can be summarized as follows.
- Japanese computer technologies had reached the level of the most up-to-date overseas computer technologies.
- A change in the role of Japanese national projects for computer technologies was being discussed, whereby there would be a move away from improving industrial competitiveness by catching up with the latest European computer technologies and toward world-wide scientific contribution through the risky development of leading computer technologies.

In this situation, the Japanese Ministry of International Trade and Industry (MITI) started study on a new project - the Fifth Generation Computer Project. This term expressed MITI's will to develop leading technologies that would progress beyond the fourth generation computers due to appear in the near future and that would anticipate upcoming trends. The Fifth Generation Computer Research Committee and its subcommittees (Figure 1-1) were established in 1979. It took until the end of 1981 to decide on target technologies and a framework for the project.

Figure 1-1 Organization of the Fifth Generation Computer Committee

Well over one hundred meetings were held, with a similar number of committee members participating. The following important near-future computer technologies were discussed:
- inference computer technologies for knowledge processing
- computer technologies to process large-scale databases and knowledge bases
- high-performance workstation technologies
- distributed functional computer technologies
- supercomputer technologies for scientific calculation

These computer technologies were investigated and discussed from the standpoints of international contribution by developing original Japanese technologies, their importance as future technologies, social needs, and conformance with Japanese governmental policy for national projects. Through these studies and discussions, the committee decided on the objectives of the project by the end of 1980, and continued further studies of technical matters, social impact, and project schemes. The committee's proposals for the FGCS project are summarized as follows.

(1) The concept of the fifth generation computer: to have parallel (non-von Neumann) processing and inference processing using knowledge bases as basic mechanisms. In order to have these mechanisms, the hardware and software interface is to be a logic programming language (Figure 1-2).
(2) The objectives of the FGCS project: to develop these innovative computers, capable of knowledge information processing, and to overcome the technical restrictions of conventional computers.

(3) The goals of the FGCS project: to research and develop a set of hardware and software technologies for FGCS, and to develop an FGCS prototype system consisting of a thousand element processors with inference execution speeds of between 100M LIPS and 1G LIPS (Logical Inferences Per Second).

(4) The R&D period for the project: estimated to be 10 years, divided into three stages - a 3-year initial stage for R&D of basic technologies, a 4-year intermediate stage for R&D of subsystems, and a 3-year final stage for R&D of the total prototype system.

Figure 1-2 Concept of the Fifth Generation Computer (a computer for knowledge information processing systems (KIPS), serving as an intelligent assistant for human activities; its basic hardware and software mechanisms are logical inference processing based on logic programming and highly parallel processing)

MITI decided to launch the Fifth Generation Computer System (FGCS) project as a national project for new information processing, and made efforts to acquire a budget for the project. At the same time, the international conference FGCS'81 was prepared and held in October 1981 to announce these results and to hold discussions on the topic with foreign researchers.

2 Overview of R&D Activities and Results of the FGCS Project

2.1 Stages and Budgeting in the FGCS Project

The FGCS project was designed to investigate a large number of unknown technologies that were yet to be developed. Since this involved a number of risky goals, the project was scheduled over a relatively long period of ten years. This ten-year period was divided into three stages.
- In the initial stage (fiscal 1982-1984), the purpose of R&D was to develop the basic computer technologies needed to achieve the goal.
- In the intermediate stage (fiscal 1985-1988), the purpose of R&D was to develop small to medium subsystems.
- In the final stage (fiscal 1989-1992), the purpose of R&D was to develop a total prototype system.

The final stage was initially planned to be three years. After reexamination halfway through the final stage, this stage was extended to four years to allow evaluation and improvement of the total system in fiscal year 1992. Consequently, the total length of this project has been extended to 11 years.

Figure 2-1 Budgets for the FGCS project (all budgets covered by MITI; yearly budgets: 1982: ¥400M, 1983: ¥2.7B, 1984: ¥5.1B, 1985: ¥4.7B, 1986: ¥5.55B, 1987: ¥5.6B, 1988: ¥5.7B, 1989: ¥6.5B, 1990: ¥7.0B, 1991: ¥7.2B, 1992: ¥3.6B; ten-year total about ¥54.6B)

Each year the budget for the following year's R&D activities was decided. MITI made great efforts in negotiating each year's budget with the Ministry of Finance.
The budgets for each year, which are all covered by MITI, are shown in Figure 2-1. The total budget for the 3-year initial stage was about 8 billion yen. For the 4-year intermediate stage, it was about 22 billion yen. The total budget for 1989 to 1991 was around 21 billion yen, and the budget for 1992 is estimated to be 3.6 billion yen. Consequently, the total budget for the 11-year period of the project will be about 54 billion yen.

2.2 R&D Subjects of Each Stage

At the beginning, it was considered that a detailed R&D plan could not be fixed for a period as long as ten years, so the R&D goals and the means to reach them were not decided in detail. During the project, goals were sought and methods decided by referring back to the initial plan at the beginning of each stage. The R&D subjects for each stage, shown in Figure 2-2, were decided by considering the framework and conditions mentioned below.

We defined 3 groups of 9 R&D subjects at the beginning of the initial stage by analyzing and rearranging the 5 groups of 10 R&D subjects proposed by the Fifth Generation Computer Committee. At the end of the initial stage, the basic research themes of machine translation and of speech, figure and image processing were excluded from this project. These were excluded because computer vendor efforts on these technologies were recognized as having become very active. In the middle of the intermediate stage, the task of developing a large-scale electronic dictionary was transferred to EDR (the Electronic Dictionary Research Center), and development of CESP (Common ESP system on UNIX) was started by AIR (the AI Language Research Center).

The basic R&D framework for promoting this project is the common utilization of developed software, achieved by unifying the software development environment (especially by unifying programming languages). By utilizing software development systems and tools, the results of R&D can be evaluated and improved. Of course, considering the nature of this project, there is another reason making it difficult or impossible to use commercial products as a software development environment. In each stage, the languages and the software development environment were unified as follows (Figure 2-6):
- Initial stage: Prolog on a DEC machine
- Intermediate stage: ESP on PSI and SIMPOS
- Final stage: KL1 on Multi-PSI (or PIM) and PIMOS (PSI machines are also used as pseudo multi-PSI systems)

2.3 Overview of R&D Results of the Hardware System

Hardware system R&D was carried out on the subjects listed below in each stage.
(1) Initial stage
- Functional mechanism modules and simulators for PIM (Parallel Inference Machine) of the hardware system
- Functional mechanism modules and simulators for KBM (Knowledge Base Machine) of the hardware system
- SIM (Sequential Inference Machine) hardware as a pilot model for software development

(2) Intermediate stage
- Inference subsystem of the hardware system
- Knowledge base subsystem of the hardware system
- Pilot model for parallel software development of the development support system

(3) Final stage
- Prototype hardware system

Figure 2-2 Transition of R&D subjects in each stage

Figure 2-3 Transition of R&D results of the hardware system

The major R&D results on SIM were the PSI (Personal Sequential Inference machine) and CHI (a high-performance back-end inference unit). In the initial stage, PSI-I was developed as a KL0 (Kernel Language Version 0) machine. PSI-I had an execution speed of around 35 KLIPS (kilo Logical Inferences Per Second). Around 100 PSI-I machines were used as the main workstations (WSs) for the sequential logic programming language, ESP, in the first half of the intermediate stage. CHI-I showed around 200 KLIPS execution speed by using the WAM instruction set and high-speed devices. In the intermediate stage, PSI was redesigned as the multi-PSI FEP (Front End Processor) and PSI-II, with a performance of around 330-400 KLIPS. CHI was also redesigned as CHI-II, with more than 400 KLIPS performance. PSI-II machines were the main WSs for ESP after the middle of the intermediate stage, and could also be used for KL1 by the last year of the intermediate stage. PSI-III was developed as a commercial product by a computer company using PIM/m CPU technologies, with the permission of MITI, and using UNIX.

R&D on PIM continued throughout the project, as follows. In the initial stage, experimental PIM hardware simulators and software simulators with 8 to 16 processors were trial-fabricated based on data flow and reduction mechanisms. In the intermediate stage, we developed multi-PSI V1, constructed from six PSI-I machines, as the first version of the KL1 machine. The performance of this machine was only several KLIPS because of the KL1 emulator. It did, however, provide evaluation data and experience through the development of a very small parallel OS in KL1.
This meant that we could develop multi-PSI V2, with 64 PSI-II CPUs connected by a mesh network. The performance of each CPU for KL1 was around 150 KLIPS, and the average performance of the full multi-PSI V2 was 5 MLIPS. This speed was a significant improvement, and was enough to encourage efforts to develop various parallel KL1 software programs, including a practical OS. After the development of multi-PSI V2, we promoted the design and trial-fabrication of experimental PIM models. At present, we are completing development of the prototype hardware, consisting of three large-scale PIM modules and two small-scale experimental PIM modules. These PIM modules are designed to be equally well suited as KL1 machines for inference and knowledge base management, and to be able to run all programs written in KL1, in spite of their different architectures. The VPIM system is a KL1-b language processing system, developed on conventional computers, which gives a common base for PIM firmware.

R&D on KBM continued until the end of the intermediate stage. An experimental relational database machine (Delta) with four relational algebraic engines was trial-fabricated in the initial stage. During the intermediate stage, a deductive database simulator was developed using PSIs with an accelerator for comparison and searching. An experimental system with multiple name spaces was also developed using CHI. Lastly, a knowledge base hardware simulator with unification engines and a multi-port page memory was developed in this stage. We developed DB/KB management software, called Kappa, under the concurrent basic software themes. At the beginning of the final stage, we judged that the adaptability of PIM with Kappa to the various description forms used for knowledge bases was more important than the effectiveness of a KBM with special mechanisms for specific KB forms. In other words, we thought that deductive object-oriented DB technologies were not yet mature enough for a KBM to be designed as part of the prototype system.

2.4 Overview of R&D Results of Software Systems

The R&D of software systems was carried out on the subjects listed below in each stage.

(1) Initial stage
- Basic software: 5G kernel languages; problem solving and inference software module; knowledge base management software module; intelligent interface software module; intelligent programming software module
- SIM software of the pilot model for development support

(2) Intermediate stage
- Basic software system (modules as in the initial stage)
- Experimental application system for the basic software modules

(3) Final stage
- Basic software system: inference control module; KB management module
- Knowledge programming software: problem solving and programming module; natural language interface module; knowledge construction and utilization module; advanced problem solving and inference methods
- Experimental parallel application system

To make the R&D results easy to understand, I will separate the results into languages, basic software, knowledge programming and application software.

2.4.1 R&D Results of Fifth Generation Computer Languages

As the first step in 5G language development, we designed the sequential logic programming languages KL0 and ESP (Extended Self-contained Prolog) and developed their language processors. KL0, designed for the PSI hardware system, is based on Prolog.
ESP extends KL0 with modular programming functions and is designed for describing large-scale software such as SIMPOS and application systems. As a result of research on parallel logic programming languages, Guarded Horn Clauses, or GHC, was proposed as the basic specification for KL1 (Kernel Language Version 1). KL1 was then designed by adding various functions, such as macro descriptions, to GHC. KL1 consists of a machine-level language (KL1-b, for "base"), a core language (KL1-c) for writing parallel software, and pragmas (KL1-p) to describe the division of parallel processes. The parallel inference machines, multi-PSI and PIM, are based on KL1-b. Various parallel software, including PIMOS, is written in KL1-c and KL1-p. A'um is an object-oriented language; the results of developing the experimental A'um language processor were reflected in improvements to KL1.

To research higher-level languages, several languages were developed to aid description in specific research fields. CIL (Complex Indeterminate Language) is an extension of Prolog that describes meanings and situations for natural language processing. CRL (Complex Record Language) was developed as a knowledge representation language to be used internally for deductive databases on nested relational DB software. CAL (Contrainte Avec Logique) is a sequential constraint logic language for constraint programming. Mandala was proposed as a knowledge representation language for parallel processing, but was not adopted, because a parallel processing environment was still lacking and we had gained enough experience with it in the initial stage. Quixote is designed as a knowledge representation and knowledge-base language for parallel processing, based on the results of evaluating CIL and CRL. Quixote is also a deductive object-oriented database language and plays the key role in the KBMS; a language processor for Quixote is currently being developed. GDCC (Guarded Definite Clauses with Constraints) is a parallel constraint logic language that carries on the results of CAL.

Figure 2-4 Transition of R&D of 5G languages

2.4.2 R&D Results of Basic Software (OS)

In the initial stage, we developed a preliminary programming and operating system for PSI, called SIMPOS, using ESP. We continued to improve SIMPOS by adding functions in response to evaluation results, and we also took into account the opinions of in-house users who had developed software for the PSI machine using SIMPOS. Since no precedent parallel OS suited to our aims had been developed anywhere in the world, we started to study parallel operating systems using our experience of SIMPOS development in the initial stage. A small experimental PIMOS was developed on the multi-PSI V1 system in the first half of the intermediate stage. Then, the first version of PIMOS was developed on the multi-PSI V2 system and was used by KL1 users. PIMOS continued to be improved by the addition of functions such as remote access, file access and debugging support. The Program Development Support System was also developed by the end of the intermediate stage.

Figure 2-5 Transition of basic software R&D

Paragraph was developed as a parallel programming support system for improving concurrency and load distribution by indicating the results of parallel processing. In regard to DB/KB management software, Kaiser was developed as an experimental relational DB management software in the initial stage.
Then, Kappa-I and Kappa-II were developed to provide the construction functions required to build a large-scale DB/KB that could be used for natural language processing, theorem proving and various expert systems. Kappa-I and Kappa-II, based on the nested relational model, are aimed at being the database engine of a deductive object-oriented DBMS. Recently, a parallel version of Kappa, Kappa-P, has been under development. Kappa-P can manage distributed databases stored on the distributed disks in PIM. Kappa-P and Quixote together constitute the KBMS.

2.4.3 R&D Results of Problem Solving and Programming Technologies

Throughout this project, we have been investigating proving technologies from the viewpoint of the similarity between mathematical theorem proving and program specification. The CAP (Computer Aided Proof) system was experimentally developed in the initial stage. TRS (Term Rewriting System) and Metis were also developed to support specific mathematical reasoning, that is, inference associated with the equals sign. An experimental program for program verification and composition, Argus, was developed by the end of the intermediate stage. These research themes were concentrated on R&D into the MGTP theorem prover in the final stage. Meta-programming technologies, partial evaluation technologies and learning mechanisms were investigated as basic research on advanced problem solving and inference methods.

2.4.4 R&D Results on Natural Language Processing Technologies

Natural language processing tools such as BUP (Bottom-Up Parser) and a miniature electronic dictionary were experimentally developed in the initial stage. These tools were extended, improved and arranged into LTB (Language Tool Box). LTB is a library of Japanese processing software modules such as LAX (Lexical Analyzer), SAX (Syntactic Analyzer), a text generator and language databases. An experimental discourse understanding system, DUALS, was implemented to investigate context processing and semantic analysis using these language processing tools. An experimental argument system, called Dulcinea, is being implemented in the final stage.

2.4.5 R&D Results on Knowledge Utilization Technologies and Experimental Application Systems

In the intermediate stage we implemented experimental knowledge utilization tools such as APRICOT, based on hypothetical reasoning technology, and Qupras, based on qualitative reasoning technology. At present, we are investigating inference mechanisms for expert systems, such as assumption-based reasoning and case-based reasoning, and implementing them as knowledge utilization tools to be applied to the experimental application systems.

As an application system, we developed, in Prolog, an experimental CAD system for logic circuit design support and wiring support in the initial stage. We also developed several experimental expert systems written in ESP, such as a CAD system for layout and logic circuit design, a troubleshooting system, a plant control system and a go-playing system. Small to medium parallel programs written in KL1 were also developed to test and evaluate parallel systems by the end of the intermediate stage; these were improved for application to PIM in the final stage. These programs include PAX (a parallel semantic analyzer), a Pentomino solver, a shortest-path solver and Tsume-go.
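To show how such KL1 programs spread their work over processing elements, the following is a minimal, hypothetical fragment in the spirit of the KL1-p pragmas described in Section 2.4.1. The predicate names are invented for illustration and do not come from the systems listed above.

    % Hypothetical KL1 fragment: farm a list of tasks out over two PEs.
    % The @node(...) pragma (KL1-p) only says where a body goal should
    % run; removing the pragmas leaves the program's logic unchanged.
    solve_all([],     Rs) :- true | Rs = [].
    solve_all([T|Ts], Rs) :- true |
            Rs = [R|Rs1],
            solve(T, R)@node(1),
            solve_all(Ts, Rs1)@node(2).

Tools such as Paragraph (Section 2.4.2) are aimed at tuning exactly this kind of mapping decision, adjusting how goals are distributed without touching the program's logic.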
We developed several experimental parallel systems implemented in KL1 in the final stage, such as an LSI-CAD system (for logic simulation, wire routing, block layout and logic circuit design), a genetic information processing system, a legal inference system based on case-based reasoning, and expert systems for troubleshooting, plant control and go-playing. Some of these experimental systems were developed from earlier sequential systems of the intermediate stage, while others address new application fields that started in the final stage.

2.5 Infrastructure of the FGCS Project

As explained in 2.2, the main language used for software implementation in the initial stage was Prolog. In the intermediate stage, ESP was mainly used, and in the final stage KL1 was the principal language. Therefore, we used a Prolog processing system on a conventional computer and terminals in the initial stage. SIMPOS on PSI (I and II) was used as the workbench for sequential programming in the intermediate stage. We are using PSI (II and III) as workbenches and as remote terminals to the parallel machines (multi-PSIs and PIMs) for parallel programming in the final stage. We have also used conventional machines for the simulations needed to design PIM and for communication (E-mail, etc.). In regard to the computer network system, a LAN has been used as the in-house system, and the LAN has been connected to domestic and international networks via gateway systems.

Figure 2-6 Infrastructure for R&D (workstations for software development, machines for simulation and communication, the research department and research laboratories, and LAN networks)

3 Promoting Organization of the FGCS Project

ICOT was established in 1982 as a non-profit core organization for promoting this project, and it began R&D work on fifth generation computers in June 1982, under the auspices of MITI. The establishment of ICOT was decided by considering the following points on the necessity and effectiveness of a centralized core research center for promoting originative R&D.
- R&D themes should be directed and selected by powerful leadership, in consideration of hardware and software integration, based on a unified framework of fifth generation computers, throughout the ten-year project period.
- It was necessary to develop and nurture researchers working together, because of the lack of researchers in this research field.
- A core center was needed to exchange information and to collaborate with other organizations and outside researchers.

ICOT consists of a general affairs office and a research center (Figure 3-1). The organization of the ICOT research center was changed flexibly depending on the progress being made. In the initial stage, the research center consisted of a research planning department and three research laboratories. The number of laboratories was increased to five at the beginning of the intermediate stage. These laboratories became one research department and seven laboratories in 1990.

Figure 3-1 ICOT Organization
Figure 3-2 Transition of the ICOT research center organization (fiscal 1982-1992)

The number of researchers at the ICOT research center has increased yearly, from 40 in 1982 to 100 at the end of the intermediate stage. All researchers at the ICOT research center have been transferred from national research centers, public organizations, computer vendors, and the like. To encourage young, creative researchers and to promote originative R&D, the age of dispatched researchers is limited to 35. Because researchers are normally dispatched to the ICOT research center for three to four years, ICOT has had continually to receive and nurture newly transferred researchers. We have had to make considerable effort to lead R&D in the fifth generation computer field consistently despite this researcher rotation. The rotation has meant that we were able to maintain a staff of researchers in their thirties, and could also easily change the structure of the organization in the ICOT research center. In total, 184 researchers have been transferred to
oOverseas ANL,NIH,SICS, ANU,LBL Figure 3-4 • Invited Researchers • Dispatched Researchers From NSF,INRIA,DTI Programming & Development work o Computer Companies (8) Structure for promoting FGCS project experimental production of hardware and developmental software. Consequently, ICOT can handle all R&D activities, including the developmental work of computer companies towards the goals of this project. ICOT has set up committee and working groups to discuss and to exchange opinions on overall plans results and specific research themes with researchers and research leaders from universities and other research institutes. Of course, construction and the themes of working groups are changed depending on research progress. The number of people in a working group is around 10 to 20 members, so the total number in the committee and working groups is about 150 to 250 each year. Another program for information exchange and collaborative research activities and diffusion of research results will be described in the. following chapter. 4 Distribution of R&D Results and International Exchange Activities Because this project is a national project in which world-wide scientific contribution is very important, we have made every effort to include our R&D ideas, processes and project results when presenting ICOT activities. We, also, collaborate with outside researchers and other research organizations. We believe these efforts have contributed to progress in parallel and knowledge processing computer technologies. I feel that the R&D efforts in these fields have increased because of the stimulative effect of this project. We hope that R&D efforts will continue to increase through distribution of this projects R&D results. I believe that many outside researchers have also made significant contributions to this project through 17 their discussions and information exchanges with ICOT researchers. ICOT Research ~ collaboration -Domestic ETl,EDRetc. -Overseas :~~:~~~,SICS Accepting Dispatched Researchers(total :8) (From NSF,INRIA,DTIl Hosting Conferences & Workshops -International Conference on FGCS ('81 :84:88:92) Co-sponser with U.S.(NSF), France(lNRIA),Sweden & Italy r::-~=;.:LL--'. ...-"--::,....:....,-:....-.:..:..::..:.:...:.:..::::.. U.K.(IED of DT\) -Domestic Conferences Figure 4-1 We could, for example, produce GHC, a core language of the parallel system, by discussion with researchers working on Parlog and Concurrent Prolog. We could, also, improve the performance of the PSI system by introducing the W AM instruction set proposed by Professor Warren. We have several programs for distributing the R&D results of this project, to exchange information and to collaborate with researchers and organizations. CD One important way to present R&D activities and results is publication and distribution of ICOT journals and technical papers. We have published and distributed quarterly journals, which contain introductions of ICOT activities, and technical papers to more than 600 locations in 35 countries. We have periodically published and sent more than 1800 technical papers to around 30 overseas locations. We have sent TRs (Technical Reports) and TMs (Technical Memos) on request to foreign addresses. These technical papers consist of more than 700 TRs and 1100 TMs published since the beginning of this project up to January 1992. A third of these technical papers are written in English. @ In the second program ICOT researchers discuss research matters and exchange information with outside researchers. 
ICOT researchers have made more than 450 presentations at international conferences and workshops, and at around 1800 domestic conferences and workshops. They have visited many foreign research organizations to discuss specific research themes and to explain ICOT activities. Every year, we have welcomed around 150 to 300 foreign researchers and specialists in other fields to exchange information with them and explain ICOT activities to them. As already described in the previous chapter, we have so far invited 74 active researchers from specific technical fields related to FGCS technologies. We have also received six longterm visiting researchers dispatched from foreign governmental organization based on agreemen t. These visi ting researchers conducted research at ICOT and published the results of that research. @ We sponsored the following symposiums and workshops to disseminate and exchange information on the R&D results and on ICOT activities. We hosted the International Conference on FGCS'84 in November 1984. Around 1,100 persons participated and the R&D results of the initial stage were presented. This followed the International Conference on FGCS'81, in which the FGCS project plan was presented. We also hosted the International Conference on FGCS'88 in November 1988. 1,600 persons participated in this symposium, and we presented the R&D results of the intermediate stage. We have held 7 Japan-Sweden (or Japan-Swederi-Italy) workshops since 1983 (co-sponsored with institute or universities in Sweden and Italy), 4 Japan-France AI symposiums since 1986, (co-sponsored with INRIA of France), 4 Japan-U.S. AI symposiums since 1987 (cosponsored with NSF of U.S.A.), and 2 Japan-U.K. workshops since 1989 (cosponsored with DTI of U.K.). Participating researchers have become to known each other well through presentations and discussions during these symposiums and workshops. We have also hosted domestic symposiums on this project and logic programming conferences every year. @) Because the entire R&D cost of this project has been provided by the government such intellectual property rights (IPR) as p~tents, which are produced in this project, belong to the Japanese government. These IPR are managed by AIST (Agency of Industrial Science and Technology). Any company wishing to produce commercial products that use any of these IPR must get permission to use them from AIST. For example, PSI and SIMPOS have already been commercialized by companies licensed by AIST. The framework for managing IPR must 18 impartially utilize IPR acquired through this project. That is, impartial permission to domestic and foreign companies, and among participating companies or others is possible because of AIST. @ Software tools developed in this project that are not yet managed as IPR by AIST can be used by other organizations for non-commercial aims. These software tools are distributed by ICOT according to the research tools permission procedure. We, now, have more than 20 software tools, such as PIMOS, PDSS, Kappa-II, the A'um system, LTB, the CAP system, the cuprolog system and the TRS generator. In other cases, we make the source codes of some programs public by printing them in technical papers. ® On specific research themes in the logic programming field, we have collaborated with organizations such as Argonne National Laboratory (ANL), National Institute of Health (NIH), Lawrence Berkeley Laboratory (LBL), Swedish Institute of Computer Science (SICS) and Australia National University (ANU). 
5 Forecast of Some Aspects of 5G Machines

LSI technologies have advanced in accordance with past trends. Roughly speaking, the memory capacity and the number of gates of a single chip quadruple every three years. The number of boards for the CPU of an inference machine was more than ten for PSI-I, but only three for PSI-II and a single board for PIM. The number of boards for 80 Mbytes of memory was 16 for PSI-I, but only four for PSI-II and a single board for PIM(m).

Figure 5-1 shows the anticipated trend in board numbers for one PE (processor element: CPU and memory) and cost for one PE, based on the actual values of the inference machines developed by this project. The trend shows that, by the year 2000, around ten PEs will fit on one board, around 100 PEs will fit in one desk-side cabinet, and 500 to 1,000 PEs will fit in a large cabinet. This trend also shows that the cost of one PE will halve every three years.

Figure 5-2 shows the performance trends of 5G machines based on the actual performance of inference machines developed by this project. The sequential inference processing performance for one PE quadrupled every three years. The improvement in parallel inference processing performance for one PE was not as large as it was for sequential processing, because PIM performance is estimated at around two and one half times that of multi-PSI.

[Figure 5-1. Size and cost trends of 5G machines: performance per PE (LIPS/PE, sequential and parallel) and relative cost per PE plotted against fiscal year, together with the 16-Mbit, 64-Mbit and 256-Mbit DRAM memory generations.]

Furthermore, Figure 5-2 shows the performance of one board for both sequential and parallel processing, and the performance of a conventional micro-processor with CISC and RISC technology. In this figure, future improvements in the performance of one PE are estimated to be rather lower than a linear extension of past values would indicate, because of the uncertainty of whether future technology will be able to elicit such performance improvements. Performance for one board is estimated at about 20 MLIPS, which is 100 times faster than PIM. Thus, a parallel machine with a large cabinet size could have 1 GLIPS. These parallel systems will have the processing speeds needed for various knowledge processing applications in the near future.

[Figure 5-2. Performance trends of 5G machines: performance (MIPS and LIPS) plotted against fiscal year, from 1992 to 2000.]

Several parallel applications in this project, such as CAD, theorem provers, genetic information processing, natural language processing, and legal reasoning, are described in Chapter 2. These applications are distributed in various fields and aim at cultivating new parallel processing application fields. We believe that parallel machine applications will be extended to various areas in industry and society, because parallel technology will become common for computers in the near future. Parallel application fields will expand gradually as functions are extended by the use of advanced parallel processing and knowledge processing technologies.

6 Final Remarks

I believe that we have shown the basic framework of the fifth generation computer based on logic programming to be more than mere hypothesis.
By the end of the initial stage, we had shown the fifth generation computer to be viable and efficient through the development of PSI, SIMPOS and various experimental software systems written in ESP and Prolog. I believe that by the end of the intermediate stage, we had shown the possibility of realizing the fifth generation computer through the development of a parallel logic programming software environment which consisted of multi-PSI and PIMOS. And I hope you can see the possibility of an era of parallel processing arriving in the near future by looking at the prototype system and the R&D results of the FGCS Project.

Acknowledgment

This project has been carried out through the efforts of the researchers at ICOT, and with the support of MITI and many others outside of ICOT. We wish to extend our appreciation to them all for the direct and indirect assistance and co-operation they have provided.

Summary of Basic Research Activities of the FGCS Project

Koichi Furukawa
Institute for New Generation Computer Technology
4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan
furukawa@icot.or.jp

Abstract

The Fifth Generation Computer Project was launched in 1982, with the aim of developing parallel computers dedicated to knowledge information processing. It is commonly believed to be very difficult to parallelize knowledge processing based on symbolic computation. We conjectured that logic programming technology would solve this difficulty. We conducted our project while stressing two seemingly different aspects of logic programming: one was the establishment of a new information technology, and the other was the pursuit of basic AI and software engineering research. In the former, we developed a concurrent logic programming language, GHC, and its extension for practical parallel programming, KL1. The invention of GHC/KL1 enabled us to conduct parallel research on the development of software technology and parallel hardware dedicated to the new language. We also developed several constraint logic programming languages which are very promising as high-level languages for AI applications. Though most of them are based on sequential Prolog technology, we are now integrating constraint logic programming and concurrent logic programming and developing an integrated language, GDCC. In the latter, we investigated many fundamental AI and software engineering problems including hypothetical reasoning, analogical inference, knowledge representation, theorem proving, partial evaluation and program transformation. As a result, we succeeded in showing that logic programming provides a very firm foundation for many aspects of information processing: from advanced software technology for AI and software engineering, through system programming and parallel programming, to parallel architecture. The research activities are continuing, and the latest as well as earlier results strongly indicate the truth of our conjecture and also the fact that our approach is appropriate.

1 Introduction

In the Fifth Generation Computer Project, two main research targets were pursued: knowledge information processing and parallel processing. Logic programming was adopted as a key technology for achieving both targets simultaneously. At the beginning of the project, we adopted Prolog as our vehicle to promote the entire research of the project. Since there were no systematic research attempts based on Prolog before our project, there were many things to do, including the development of a suitable workstation for the research, experimental studies for developing a knowledge-based system in Prolog, and investigation into possible parallel architectures for the language. We rapidly succeeded in promoting research in many directions.

From this research, three achievements are worth noting. The first is the development of our own workstation dedicated to ESP, Extended Self-contained Prolog. We developed an operating system for the workstation completely in ESP [Chikayama 88]. The second is the application of partial evaluation to meta programming. This enabled us to develop a compiler for a new programming language by simply describing an interpreter of the language and then partially evaluating it. We applied this technique to derive a bottom-up parser for context-free grammars given a bottom-up interpreter for them. In other words, partial evaluation made meta programming useful in real applications. The third achievement was the development of constraint logic programming languages. We developed two constraint logic programming languages: CIL and CAL. CIL is for natural language processing and is based on the incomplete data structure for representing "Complex Indeterminates" in situation theory. It has the capability to represent structured data like Minsky's frames, and any relationship between slots' values can be expressed using constraints. CIL was used to develop a natural language understanding system called DUALS. Another constraint logic programming language, CAL, is for non-linear equations. Its inference is done using the Buchberger algorithm for computing the Gröbner basis, which is a variant of the Knuth-Bendix completion algorithm for a term rewriting system.

We encountered one serious problem inherent to Prolog: that was the lack of concurrency in the fundamental framework of Prolog. We recognized the importance of concurrency in developing parallel processing technologies, and we began searching for alternative logic programming languages with the notion of concurrency. We noticed the work by Keith Clark and Steve Gregory on Relational Language [Clark Gregory 81] and Ehud Shapiro on Concurrent Prolog [Shapiro 83]. These languages have a common feature of committed-choice nondeterminism to introduce concurrency. We devoted our efforts to investigating these languages carefully, and Ueda finally designed a new committed-choice logic programming language called GHC [Ueda 86a] [UedaChikayama 90], which has simpler syntax than the above two languages and still has similar expressiveness. We recognized the importance of GHC and adopted it as the core of our kernel language, KL1, in this project.
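To give a concrete flavor of the committed-choice style, the following stream merge is a standard small GHC example from the literature (it is added here for illustration and is not taken from this paper). Each clause waits until its head can be matched without binding the caller's variables, commits as soon as that is possible, and makes its output bindings in the body:

    merge([X|Xs], Ys, Zs) :- true | Zs = [X|Zs1], merge(Xs, Ys, Zs1).
    merge(Xs, [Y|Ys], Zs) :- true | Zs = [Y|Zs1], merge(Xs, Ys, Zs1).
    merge([], Ys, Zs) :- true | Zs = Ys.
    merge(Xs, [], Zs) :- true | Zs = Xs.

A goal merge(S1, S2, S) suspends while both input streams are unbound and resumes as soon as either of them is instantiated; this is the dataflow synchronization explained in Section 2.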
The introduction of KL1 made it possible to divide the entire research project into two parts: the development of parallel hardware dedicated to KL1 and the development of software technology for the language. In this respect, the invention of GHC is the most important achievement for the success of the Fifth Generation Computer Systems project. Besides these language oriented researches, we performed many fundamental researches in the field of artificial intelligence and software engineering based on logic and logic programming. They include researches on nonmonotonic reasoning, hypothetical reasoning, abduction, induction, knowledge representation, theorem proving, partial evaluation and program transformation. We expected that these researches would become important application fields for our parallel machines by the affinity of these problems to logic programming and logic based parallel processing. This is now happening. In this article, we first describe our research efforts in concurrent logic programming and in constraint logic programming. Then, we discuss our recent research activities in the field of software engineering and artificial intelligence. Finally, we conclude the paper by stating the dirction of future research. 2 Concurrent Logic Programming In this section, we pick up two important topics in concurrent logic programming research in the project. One is the design principles of our concurrent logic programming language Flat GHC (FGHC) [Ueda 86a] [UedaChikayama 90], on which the aspects of KL1 as a concurrent language is based. The other is search paradigms in FGHC. As discussed later, one drawback of FGHC, viewing as a logic programming language, is the lack of search capability inherent in Prolog. Since the capability is related to the notion of completeness in logic programming, recovery of the ability is essential. 2.1 Design Principles of FGHC The most important feature of FGHC is that there is only one syntactic extension to Prolog, called the commitment operator and represented by a vertical bar "I". A commitment operator divides an entire clause into two parts called the guard part (the left-hand side of the bar) and the body part (the right-hand side). The guard of a clause has two important roles: one is to specify a condition for the clause to be selected for the succeeding computation, and the other is to specify the synchronization condition. The general rule of synchronization in FGHC is expressed as dataflow synchronization. This means that computation is suspended until sufficient data for the computation arrives. In the case of FGHC, guard computation is suspended until the caller is sufficiently instantiated to judge the guard condition. For· example, consider how a ticket vending machine works. After receiving money, it has to wait until the user pushes a button for the destination. This waiting is described as a clause such that "if the user pushed the 160-yen button, then issue a 160-yen ticket". The important thing is that dataflow synchronization can be realized by a simple rule governing head unification which occurs when a goal is executed and a corresponding FGHC clause is called: the information flow of head unification must be one way, from the caller to the callee. For example, consider a predicate representing service at a front desk. Two clauses define the predicate: one is for during the day, when more customers are expected, and another is for after-hours, when no more customers are expected. 
The clauses have such definitions as:

    serve([First|Rest]) :- true | do_service(First), serve(Rest).
    serve([]) :- true | true.

Besides the serve process, there should be another process, queue, which makes a waiting queue for service. The top level goal looks like:

    ?- queue(Xs), serve(Xs).

where "?-" is a prompt to the user at the terminal. Note that the execution of this goal generates two processes, queue and serve, which share a variable Xs. This shared variable acts as a channel for data transfer from one process to the other. In the above example, we assume that the queue process instantiates Xs and the serve process reads the value. In other words, queue acts as a generator of the value of Xs, and serve acts as the consumer. The process queue instantiates Xs either to a list of servees, that is, a term of the form [First|Rest], or to an empty list []. Before the instantiation, the value of Xs remains undefined.

Suppose Xs is undefined. Then, the head unification invoked by the goal serve(Xs) suspends, because the equations Xs = [First|Rest] and Xs = [] cannot be solved without instantiating Xs, but such instantiation violates the rule of one-way unification. Note that the term [First|Rest] in the head of serve means that the clause expects a non-empty list to be given as the value of the argument. Similarly, the term [] expects an empty list to be given. Now, it is clear that the unidirectionality of information flow realizes dataflow synchronization. This principle is very important in two aspects: one is that the language provides a natural tool for expressing concurrency, and the other is that the synchronization mechanism is simple enough to realize very efficient parallel implementation.

2.2 Search Paradigms in FGHC

There is one serious drawback to FGHC because of the very nature of committed choice; that is, it no longer has an automatic search capability, which is one of the most important features of Prolog. Prolog achieves its search capability by means of automatic backtracking. However, since committed choice uniquely determines a clause for the succeeding computation of a goal, there is no way of searching for alternative branches other than the branch selected. The search capability is related to the notion of completeness of the logic programming computation procedure, and the lack of the capability is very serious in that respect.

One could imagine a seemingly trivial way of realizing the search capability by means of OR-parallel search: that is, to copy the current computational environment, which provides the binding information of all variables that have appeared so far, and to continue computations for each alternative case in parallel. But this does not work, because copying non-ground terms is impossible in FGHC. The reason why it is impossible is that FGHC cannot guarantee when actual binding will occur, and there may be a moment when a variable observed at some processor remains unchanged even after some goal has instantiated it at a different processor.

One might ask why we did not adopt a Prolog-like language as our kernel language for parallel computation. There are two main reasons. One is that, as stated above, Prolog does not have enough expressiveness for concurrency, which we see as a key feature not only for expressing concurrent algorithms but also for providing a framework for the control of physical parallelism. The other is that the execution mechanism of Prolog-like languages with a search capability seemed too complicated to develop efficient parallel implementations.
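As a small illustration of what is lost (the predicate below is invented for this sketch and is not taken from the paper): in Prolog, split(L,F,B) :- append(F,B,L) enumerates every way of splitting L into a front part and a back part on backtracking. A direct committed-choice rendering such as

    split(L, F, B) :- true | F = [], B = L.
    split([X|L1], F, B) :- true | F = [X|F1], split(L1, F1, B).

commits to a single clause for each call, so only one of the possible splits is ever produced; the other alternatives are simply lost.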
We tried to recover the search capability by devising programming techniques while keeping the programming language as simple as possible. We succeeded in inventing several programming methods for computing all solutions of a problem which effectively achieve the completeness of logic programming. Three of them are listed as follows:

(1) Continuation-based method [Ueda 86b]
(2) Layered stream method [OkumuraMatsumoto 87]
(3) Query compilation method [Furukawa 92]

In this paper, we pick up (1) and (3), which are complementary to each other.

The continuation-based method is suitable for the efficient processing of rather algorithmic problems. An example is to compute all ways of partitioning a given list into two sublists by using append. This method mimics the computation of OR-parallel Prolog using the AND-parallelism of FGHC. AND-serial computation in Prolog is translated to continuation processing which remembers continuation points in a stack. The intermediate results of computation are passed from the preceding goals to the next goals through the continuation stack, kept as one of the arguments of the FGHC goals. This method requires input/output mode analysis before translating a Prolog program into FGHC. This requirement makes the method impractical for database applications, because there are too many possible input/output modes for each predicate.

The query compilation method solves this problem. This method was first introduced by Fuchi [Fuchi 90] when he developed a bottom-up theorem prover in KL1. In his coding technique, the multiple binding problem is avoided by reversing the roles of the caller and the callee in a straightforward implementation of database query evaluation. Instead of trying to find a record (represented by a clause) which matches a given query pattern represented by a goal, his method represents each query component with a compiled clause, represents a database with a data structure passed around by goals, and tries to find a query component clause which matches a goal representing a record, repeating the process for all potentially applicable records in the database. (An auxiliary query clause is needed which matches every record after the record has failed to match all the real query clauses.) Since every record is a ground term, there is no variable in the caller. Variable instantiation occurs when query component clauses are searched and an appropriate clause representing a query component is found to match a currently processed record. Note that, as a result of reversing the representation of queries and databases from the straightforward representation, the information flow is now from the caller (database) to the callee (a query component). This inversion of information flow avoids deadlock in query processing. Another important trick is that each time a query clause is called, a fresh variable is created for each variable in the query component. This mechanism is used for making a new environment for each OR-parallel computation branch. These tricks make it possible to use KL1 variables to represent object-level variables in database queries and, therefore, we can avoid different compilations of the entire database and queries for each input/output mode of queries.

The new coding method stated above is very general, and there are many applications which can be programmed in this way. The only limitation of this approach is that the database must be more instantiated than the queries.
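As a minimal illustration of this restriction (the relation and constant names below are made up for the sketch, not taken from the paper): database records are ground clauses such as

    parent(taro, jiro).
    parent(jiro, saburo).

while a query such as ?- parent(X,Y), parent(Y,Z) may contain variables. In the reversed coding, each ground record is passed along as a goal and matched against the compiled query-component clauses, so information always flows one way, from the fully instantiated record (the caller) to the query component (the callee).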
In bottom-up theorem proving, this requirement is referred to as the range-restrictedness of each axiom. Range-restrictedness means that, after successfully finding ground model elements satisfying the antecedent of an axiom, the new model element appearing as the consequent of the axiom must be ground. This restriction seems very strong. Indeed, there are problems in the theorem proving area which do not satisfy the condition. We need a top-down theorem prover for such problems. However, many real life problems satisfy the range-restrictedness because they almost always have finite concrete models. Such problems include VLSI-CAD, circuit diagnosis, planning, and scheduling. We are developing a parallel bottom-up theorem prover called MGTP (Model Generation Theorem Prover) [FujitaHasegawa 91] based on SATCHMO developed by Manthey and Bry [MantheyBry 88]. We are investigating new applications to utilize the theorem prover. We will give an example of computing abduction using MGTP in Section 5. 3 Constraint mlng Logic Program- We began our constraint logic programming research almost at the beginning of our project, in relation to the research on natural language processing. Mukai [MukaiYasukawa 85] developed a language called CIL (Complex Indeterminates Language) for the purpose of developing a computational model of situation semantics. A complex indeterminate is a data structure allowing partially specified terms with indefinite arity. During the design phase of the language, he encountered the idea of freeze in Prolog II by Colmerauer [Colmerauer 86]. He adopted freeze as a proper control structure for our CIL language. From the viewpoint of constraint satisfaction, CIL only has a passive way of solving constraint, which means that there is no active computation for solving constraints such as constraint propagation or solving simultaneous equations. Later, we began our research on constraint logic programming involving active constraint solving. The language we developed is called CAL. It deals with non-linear equations as expressions to specify constraints. Three events triggered the research: one was our preceding efforts on developing a term rewrit- ing system called METIS for a theorem prover of linear algebra [OhsugaSakai 91]. Another event was our encounter with Buchberger's algorithm for computing the Grabner Basis for solving non-linear equations. Since the algorithm is a variant of the Knuth-Bendix completion algorithm for a term rewriting system, we were able to develop the system easily from our experience of developing METIS. The third event was the development of the CLP(X) theory by Jaffar and Lassez which provides a framework for constraint logic programming languages [JaffarLassez 86]. There is further remarkable research on constraint logic programming in the field of general symbol processing [Tsuda 92]. Tsuda developed a language called cu-Prolog. In cu-Prolog, constraints are solved by means of program transformation techniques called unfold/fold transformation (these will be discussed in more detail later in this paper, as an optimization technique in relation to software engineering). The unfold/fold program transformation is used here as a basic operation for solving combinatorial constraints among terms. Each time the transformation is performed, the program is modified to a syntactically less constrained program. Note that this basic operation is similar to term rewriting, a basic operation in CAL. 
Both of these operations try to rewrite programs to get certain canonical forms. The idea of cu-Prolog was introduced by Hasida during his work on dependency propagation and dynamical programming [Hasida 92]. They succeeded in showing that context-free parsing, which is as efficient as chart parsing, emerges as a result of dependency propagation during the execution of a program given as a set of grammar rules in cu-Prolog. Actually, there is no need to construct a parser. cu-Prolog itself works as an efficient parser. Hasida [Hasida 92] has been working on a fundamental issue of artifici~l intelligence and cognitive science from the aspect of a computational model. In his computation model of dynamical programming, computation is controlled by various kinds of potential energies associated with each atomic constraint, clause, and unification. Potential energy reflects the degree of constraint violation and, therefore, the reduction of energy contributes constraint resolution. Constraint logic programming greatly enriched the expressiveness of Prolog and is now providing a very promising programming environment for applications by extending the domain of Prolog to cover most AI problems. One big issue in our project is how to integrate constraint logic programming with concurrent logic programming to obtain both expressiveness and efficiency. This integration, however, is not easy to achieve because (1) constraint logic programming focuses on a control scheme for efficient execution specific to each constraint solving scheme, and (2) constraint logic programming essentially includes a search paradigm which re- 24 quires some suitable support mechanism such as automatic backtracking. It turns out that the first problem can be processed efficiently, to some extent, in the concurrent logic programming scheme utilizing the data flow control method. We developed an experimental concurrent constraint logic programming language called GDCC (Guarded Definite Clauses with Constraints), implemented in KL1 [HawleyAiba 91]. GDCC is based on an ask-tell mechanism proposed by Maher [Maher 87], and extended by Saraswat [Saraswat 89]. It extends the guard computation mechanism from a simple one-way unification solving problem to a more general provability check of conditions in the guard part under a given set of constraints using the ask operation. For the body computation, constraint literals appearing in the body part are added to the constraint set using the tell operation. If the guard conditions are not known to be provable because of a lack of information in the constraints set, then computation is suspended. If the conditions are disproved under the constraints set, then the guard computation fails. Note that the provability check controls the order of constraint solving execution. New constraints appearing in the body of a clause are not included in the constraint set until the guard conditions are known to be provable. The second problem of realizing a search paradigm in a concurrent constraint logic programming framework has not been solved so far. One obvious way is to develop an OR-parallel search mechanism which uses a full unification engine implemented using ground term representation of logical variables [Koshimura et al. 91]. However, the performance of the unifier is 10 to 100 times slower than the built in unifier and, as such, it is not very practical. Another possible solution is to adopt the new coding technique introduced in the previous section. 
We expect to be able to efficiently introduce the search paradigm by applying the coding method. The paradigm is crucial if parallel inference machines are to be made useful for the numerous applications which require high levels of both expressive and computational power.

4 Advanced Software Engineering

Software engineering aims at supporting software development in various dimensions: increase of software productivity, development of high-quality software, pursuit of easily maintainable software, and so on. Logic programming has great potential in many dimensions of software engineering. One obvious advantage of logic programming is its affinity for correctness proofs when specifications are given. Automatic debugging is a related issue. Also, there is a high possibility of achieving automatic program synthesis from specifications by applying proof techniques, as well as from examples by applying induction. Program optimization is another promising direction where logic programming works very well. In this section, two topics are picked up: (1) meta programming and its optimization by partial evaluation, and (2) unfold/fold program transformation.

4.1 Meta Programming and Partial Evaluation

We investigated meta programming technology as a vehicle for developing knowledge-based systems in a logic programming framework, inspired by Bowen and Kowalski's work [BowenKowalski 83]. It was a rather direct way to realize a knowledge assimilation system using the meta programming technique by regarding integrity constraints as meta rules which must be satisfied by a knowledge base. One big problem of the approach was its inefficiency due to the meta interpretation overhead of each object-level program. We challenged the problem, and Takeuchi and Furukawa [Takeuchi Furukawa 86] made a breakthrough by applying the optimization technique of partial evaluation to meta programs. We first derived an efficient compiled program for an expert system with uncertainty computation, given a meta interpreter of rules with certainty factors. In this program, we succeeded in getting a three-times speedup over the original program. Then, we tried a more non-trivial problem of developing a meta interpreter of a bottom-up parser and deriving an efficient compiled program given the interpreter and a set of grammar rules. We succeeded in obtaining an object program known as BUP, developed by Matsumoto [Matsumoto et al. 83]. The importance of the BUP meta-interpreter is that it is not a vanilla meta-interpreter, an obvious extension of the Prolog interpreter in Prolog, because the control structure is totally different from Prolog's top-down control structure.

After our first success in applying partial evaluation techniques to meta programming, we began the development of a self-applicable partial evaluator. Fujita and Furukawa [FujitaFurukawa 88] succeeded in developing a simple self-applicable partial evaluator. We showed that the partial evaluator itself was a meta interpreter very similar to the following Prolog interpreter in Prolog:

    solve(true).
    solve((A,B)) :- solve(A), solve(B).
    solve(A) :- clause(A,B), solve(B).

where it is assumed that for each program clause, H :- B, a unit clause, clause(H,B), is asserted. (clause(_,_) is available as a built-in procedure in the DECsystem-10 Prolog system.) A goal, solve(G), simulates an immediate execution of the subject goal, G, and obtains the same result. This simple definition of a Prolog self-interpreter, solve, suggests the following partial solver, psolve:
    psolve(true, true).
    psolve((A,B), (RA,RB)) :- psolve(A, RA), psolve(B, RB).
    psolve(A, R) :- clause(A, B), psolve(B, R).
    psolve(A, A) :- '$suspend'(A).

The partial solver, psolve(G,R), partially solves a given goal, G, returning the result, R. The result, R, is called the residual goal(s) for the given goal, G. The residual goal may be true when the given goal is totally solved; otherwise it will be a conjunction of subgoals, each of which is a goal, A, suspended to be solved at '$suspend'(A) for some reason. An auxiliary predicate, '$suspend'(P), is defined for each goal pattern, P, by the user. Note that psolve is related to solve as:

    solve(G) :- psolve(G,R), solve(R).

That is, a goal, G, succeeds if it is partially solved with the residual goal, R, and R in turn succeeds (is totally solved). The total solution for G is thus split into two tasks: partial solution for G and total solution for the residual goal, R.

We developed a self-applicable partial evaluator by modifying the above psolve program. The main modification is the treatment of built-in predicates in Prolog and of those predicates used to define the partial evaluator itself, to make it self-applicable. We succeeded in applying the partial evaluator to itself and generated a compiler by partially evaluating the psolve program with respect to a given interpreter, using the identical psolve. We further succeeded in obtaining a compiler generator, which generates different compilers given different interpreters, by partially evaluating the psolve program with respect to itself, using itself. Theoretically, it was known that self-application of a partial evaluator generates compilers and a compiler generator [Futamura 71]. There were many attempts over a long period to realize self-applicable partial evaluators in the framework of functional languages, but no successes were reported until very recently [Jones et al. 85], [Jones et al. 88], [GomardJones 89]. On the other hand, we succeeded in developing a self-applicable partial evaluator in a Prolog framework in a very short time and also in a very elegant way. This proves some merits of logic programming languages over functional programming languages, especially in their binding scheme based on unification.

4.2 Unfold/Fold Program Transformation

Program transformation provides a powerful methodology for the development of software, especially the derivation of efficient programs either from their formal specifications or from declarative but possibly inefficient programs. Programs written in declarative form are often inefficient under Prolog's standard left-to-right control rule. Typical examples are found in programs based on a generate-and-test paradigm. Seki and Furukawa [SekiFurukawa 87] developed a program transformation method based on unfolding and folding for such programs. We will explain the idea in some detail.

Let gen_test(L) be a predicate defined as follows:

    gen_test(L) :- gen(L), test(L).

where L is a variable representing a list, gen(L) is a generator of the list L, and test(L) is a tester for L. Assume both gen and test are incremental and are defined as follows:

    gen([]).
    gen([X|L]) :- gen_element(X), gen(L).

    test([]).
    test([X|L]) :- test_element(X), test(L).

Then, it is possible to fuse the two processes gen and test by applying unfold/fold transformation as follows:

    gen_test([X|L]) :- gen([X|L]), test([X|L]).

        (unfold at gen and test)

    gen_test([X|L]) :- gen_element(X), gen(L), test_element(X), test(L).
        (fold by gen_test)

    gen_test([X|L]) :- gen_element(X), test_element(X), gen_test(L).

If the tester is not incremental, the above unfold/fold transformation does not work. One example is to test that all elements in the list are different from each other. In this case, the test predicate is defined as follows:

    test([]).
    test([X|L]) :- non_member(X,L), test(L).

    non_member(_,[]).
    non_member(X,[Y|L]) :- dif(X,Y), non_member(X,L).

where dif(X,Y) is a predicate judging that X is not equal to Y. Note that this test predicate is not incremental, because the test for the first element X of the list requires the information of the entire list. The solution we gave to this problem was to replace the test predicate with an equivalent predicate with incrementality. Such an equivalent program, test', is obtained by adding an accumulator as an extra argument of the test predicate, defined as follows:

    test'([], _).
    test'([X|L], Acc) :- non_member(X,Acc), test'(L, [X|Acc]).

The relationship between test and test' is given by the following theorem:

    Theorem.  test(L) = test'(L, [])

Now, the original gen_test program becomes

    gen_test(L) :- gen(L), test'(L, []).

We need to introduce the following new predicate to perform the unfold/fold transformation:

    gen_test'(L, Acc) :- gen(L), test'(L, Acc).

By applying a similar transformation process as before, we get the following fused recursive program for gen_test':

    gen_test'([], _).
    gen_test'([X|L], Acc) :- gen_element(X), non_member(X,Acc), gen_test'(L, [X|Acc]).

By symbolically computing the two goals

    ?- test([X1, ..., Xn]).
    ?- test'([X1, ..., Xn], []).

and comparing the results, one can find that the reordering of pair-wise comparisons by the introduction of the accumulator is analogous to the exchange of a double summation, ∑_i ∑_j x_ij = ∑_j ∑_i x_ij. Therefore, we refer to this property as structural commutativity.

One of the key problems of unfold/fold transformation is the introduction of a new predicate such as gen_test' in the last example. Kawamura [Kawamura 91] developed a syntactic rule for finding suitable new predicates. There were several attempts to find appropriate new predicates using domain-dependent heuristic knowledge, such as append optimization by the introduction of a difference-list representation. Kawamura's work provides some general criteria for selecting candidates for new predicates. His method first analyzes a given program to be transformed and makes a list of patterns which may possibly appear in the definition of new predicates. This can be done by unfolding the given program and properly generalizing all resulting patterns so as to represent them with a finite number of distinct patterns while avoiding over-generalization. One obvious strategy to avoid over-generalization is to perform least general generalization by Plotkin [Plotkin 70]. Kawamura also introduced another strategy for suppressing unnecessary generalization: the subset of clauses whose heads are unifiable with each pattern is associated with the pattern, and only those patterns having the same associated subset of clauses are generalized. Note that a goal pattern is unfolded only by clauses belonging to the associated subset. Therefore, the suppression of over-generalization also suppresses unnecessary expansion of clauses by unnecessary unfolding.

5 Logic-based AI Research

For a long time, deduction has played a central role in research on logic and logic programming.
Recently, two other inferences, abduction and induction, received much attention and much research has been done in these new directions. These directions are related to fundamental AI problems that are open-ended by their nature. They include the frame problem, machine learning, distributed problem solving, natural language understanding, common sense reasoning, hypothetical reasoning and analogical reasoning. These problems require non-deductive inference capabilities in order to solve them. Historically, most AI research on these problems adopted ad hoc heuristic methods reflecting problem structures. There was a tendency to avoid a logic based formal approach because of a common belief in the limitation of the formalism. However, the limitation of logical formalism comes only from the deductive aspect of logic. Recently it has been widely recognized that abduction and induction based on logic provide a suitable framework for such problems requiring open-endedness in their formalism. There is much evidence to support this observation. • In natural language understanding, unification grammar is playing an important role in integrating syntax, semantics, and discourse understanding. • In non-monotonic reasoning, logical formalism such as circumscription and default reasoning and its compilation to logic based programs are studied ex~ tensively. • In machine learning, there are many results based on logical frameworks such as the Model Inference System, inverse resolution, and least general generalization. • In analogical reasoning, analogy is naturally described in terms of a formal inference rule similar to logical inference. The associated inference is deeply related to abductive inference. In the following, three topics related to these issues are picked up: they are hypothetical reasoning, analogy, and knowledge representation. 27 5.1 Hypothetical Reasoning A logical framework of hypothetical reasoning was studied by Poole et al. [Poole et al. 87]. They discussed the relationship among hypothetical reasoning, default reasoning and circumscription, and argued that hypothetical reasoning is all that is needed because it is simply and efficiently implemented and is powerful enough to implement other forms of reasoning. Recently, the relationship of these formalisms was studied in more detail and many attempts were made to translate non-monotonic reasoning problems into equivalent logic programs with negation as failure. Another direction of research was the formulation of abduction and its relationship' to negation as failure. There was also a study of the model theory of a class of logic programs, called general logic programs, allowing negation by failure in the definition of bodies in the clausal form. By replacing negation-by-failure predicates by corresponding abducible predicates which usually give negative information, we can formalize negation by failure in terms of abduction [EshghiKowalski 89] A proper semantics of general logic programs is given by stable model semantics [GelfondLifschitz 88]. It is a natural extension of least fixpoint semantics. The difference is that there is no Tp operator to compute the stable model directly, because we need a complete model for checking the truth value of the literal of negation by failure in bottom-up fixpoint computation. Therefore, we need to refer to the model in the definition of the model. This introduces great difficulty in computing stable models. 
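To see the difficulty concretely, consider the following two-clause program (a standard textbook example, not one from the project):

    p :- not q.
    q :- not p.

This program has two stable models, {p} and {q}. Assuming that q is false lets the first clause derive p, and nothing then derives q, so the assumption is justified and {p} is stable; {q} is stable by symmetry. Neither model can be produced by a simple Tp-style bottom-up fixpoint computation, because each derivation depends on an assumption about the final model itself.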
The trivial way is to assume all possible models and see whether the initial models are the least ones satisfying the programs or not. This algorithm needs to search all possible subsets of the atoms that can be generated by the programs, and is not realistic at all. Inoue et al. [Inoue et al. 92] developed a much more efficient algorithm for computing all stable models of general logic programs. Their algorithm is based on the bottom-up model generation method. Negation-by-failure literals are used to introduce hypothetical models: ones which assume the truth of the literals, and others which assume that they are false. To express assumed literals, they introduce modal operators. More precisely, they translate each rule of the form

    A0 ← A1 ∧ ... ∧ Am ∧ not Am+1 ∧ ... ∧ not An

to the following disjunctive clause, which does not contain any negation-by-failure literals:

    A1 ∧ ... ∧ Am → (NKAm+1 ∧ ... ∧ NKAn ∧ A0) ∨ KAm+1 ∨ ... ∨ KAn.

The reason why we express the clause with the antecedent on the left-hand side is that we intend to use this clause in a bottom-up way; that is, from left to right. In this expression, NKA means that we assume that A is false, whereas KA means that we assume that A is true. Although K and NK are modal operators, we can treat KA and NKA as new predicates independent from A by adding the following constraints (clauses with an empty, i.e. false, consequent) for every atom A:

    NKA ∧ A →          (1)
    NKA ∧ KA →         (2)

By this translation, we obtain a set of clauses in first-order logic, and therefore it is possible to compute all possible models for the set using a first-order bottom-up theorem prover, MGTP, described in Section 2. After computing all possible models for the set of clauses, we need to select only those models M which satisfy the following condition:

    For every ground atom A, if KA ∈ M, then A ∈ M.          (3)

Note that this translation scheme defines a coding method of original general logic programs, which may contain negation by failure, in terms of pure first-order logic. Note also that the same technique can be applied in computing abduction, which means finding possible sets of hypotheses explaining the observation and not contradicting given integrity constraints.

Satoh and Iwayama [SatohIwayama 92] independently developed a top-down procedure for answering queries to a general logic program with integrity constraints. They modified an algorithm proposed by Eshghi and Kowalski [EshghiKowalski 89] to correctly handle situations where some proposition must hold in a model, like the requirement of (3). Iwayama and Satoh [IwayamaSatoh 91] developed a mixed strategy combining bottom-up and top-down strategies for computing the stable models of general logic programs with constraints. The procedure is basically bottom-up. The top-down computation is related to the requirement of (3): as soon as a hypothesis KA is asserted in some model, it tries to prove A by a top-down expectation procedure.

The formalization of abductive reasoning has a wide range of applications, including computer-aided design and fault diagnosis. Our approach provides a uniform scheme for representing such problems and solving them. It also provides a way of utilizing our parallel inference machine, PIM, for solving these complex AI problems.

5.2 Formal Approach to Analogy

Analogy is an important reasoning method in human problem solving. Analogy is very helpful for solving problems which are very difficult to solve by themselves. Analogy guides the problem-solving activities using the knowledge of how to solve a similar problem. Another aspect of analogy is to extract good guesses even when there is not enough information to explain the answer. There are three major problems to be solved in order to mechanize analogical reasoning [Arima 92]:

• searching for an appropriate base of analogy with respect to a given target,
• selecting important properties shared by a base and a target, and
• selecting properties to be projected through an analogy from a base to a target.

Though there was much work on mechanizing analogy, most of it only partly addressed the issues listed above. Arima [Arima 92] proposed an attempt to answer all the issues at once. Before explaining his idea, we need some preparations for defining terminology. Analogical reasoning is expressed as the following inference rule:

    S(B) ∧ P(B)
    S(T)
    -----------
    P(T)

where T represents the target object, B the base object, S the similarity property between T and B, and P the projected property. This inference rule expresses that if we assume an object T is similar to another object B in the sense that they share a common property S, then, if B has another property P, we can analogically reason that T also has the same property P. Note the syntactic similarity of this rule to modus ponens. If we generalize the object B to a universally quantified variable X and replace the and connective with the implication connective, then the first expression of the rule becomes S(X) ⊃ P(X), whereby the entire rule becomes modus ponens. Arima [Arima 92] tried to link analogical reasoning to deductive reasoning by modifying the expression S(B) ∧ P(B) to

    ∀x.(J(x) ∧ S(x) ⊃ P(x)),          (4)

where J(x) is a hypothesis added to S(x) in order to logically conclude P(x). If there exists such a J(x), then the analogical reasoning becomes pure deductive reasoning.

For example, let us assume that there is a student (StudentB) who belongs to an orchestra club and also neglects study. If one happens to know that another student (StudentT) belongs to the orchestra club, then we tend to conclude that he also neglects study. The reason why we derive such a conclusion is that we guess that the orchestra club is very active and student members of this busy club tend to neglect study. This reason is an example of the hypothesis mentioned above.

Arima analyzed the syntactic structure of the above J(x) by carefully observing the analogical situation. First, we need to find proper parameters for the predicate J. Since it depends not only on an object but also on the similarity property and the projected property, we assume that J has the form J(x,s,p), where s and p represent the similarity property and the projected property. From the nature of analogy, we do not expect that there is any direct relationship between the object x and the projected property p. Therefore, the entire J(x,s,p) can be divided into two parts:

    J(x,s,p) = Jatt(s,p) ∧ Jobj(x,s),          (5)

The first component, Jatt(s,p), corresponds to information extracted from a base. The reason why it does not depend on x comes from the observation that information in the base of the analogy is independent from the choice of an object x. The second component, Jobj(x,s), corresponds to information extracted from the similarity, and therefore it does not contain p as its parameter.

Example: Negligent Student

First, let us formally describe the hypothesis described earlier to explain why an orchestra member is negligent of study.
It is expressed as follows: 'v'x,s,p.( Enthusiastic(x,s) 1\ BusyClub(s) I\Obstructive_to(p, s) 1\ Member _of (x, s) :) NegligenLof(x,p) ) (6) where x, s, and p are variables representing a person, a club and some human activity, respectively. The meaning of each predicate is easy to understand and the explanations are omitted. Since we know that both StudentB and StudentT are members of an orchestra, Members_of(X,s) corresponds to the similarity property Sex) in (4). On the other hand, since we want to reason the negligence of a student, the projected property P(x) is NegligenLof(x,p). Therefore, the rest of the expression ·in (6): Enthusiastic(x, s) 1\ BusyClub(s) 1\ Obstructive_to(p,s) corresponds to J(x,s,p). From the syntactic feature of this expression, we can conclude that JObj(X,S) = Enthusiastic(x,s), Jatt(s,p) = BusyClub(s) 1\ Obstructive_to(p,s). The reason why we need Jobj is that we are not always aware of an important similarity like Enthusiastic. Therefore, we need to infer an important hidden similarity from the given similarity such as Member _0 f. This inference requires an extra effort in order to apply the above framework of analogy. The restriction on the syntactic structure of J(x,s,p) is very important since it can be used to prune a search space to access the right base case given the target. This function is particularly important when we apply our analogical inference framework to case based reasoning systems. 29 5.3 Knowledge Representation Knowledge representation is one of the central issues in artificial intelligence research. Difficulty arises from the fact that there has been no single knowledge representation scheme for representing various kinds of know ledge while still keeping the simplicity as well as the efficiency of their utilization. Logic was one of the most promising candidates but it was weak in representing structured knowledge and the changing world. Our aim in developing a knowledge representation framework based on logic and logic programming is to solve both of these problems. From the structural viewpoint, we developed an extended relational database which can handle nonnormal forms and its corresponding programming language, CRL [Yokota 88aJ. This representation allows users to describe their databases in a structured way in the logical framework [Yokota et al. 88b]. Recently, we proposed a new logic-based knowledge representation language, Quixote [YasukawaYokota 90]. Quixote follows the ideas developed in CRL and CIL: it inherits object-orientedness from the extended version of CRL and partially specified terms from CIL. One of the main characteristics of the object-oriented features is the notion of object identity. In Quixote, not only simple data atoms but also complex structures are candidates for object identifiers [Morita 90J. Even circular structures can be represented in Quixote. The non-well founded set theory by Aczel [Aczel 88] was adopted to characterize them as a mathematical foundation for such objects, and unification on infinite trees [Colmerauer 82J was adopted as an implementation method. 6 Conclusion In this article, we summarized the basic research activities of the FGCS project. We emphasized two different directions of logic programming research. One followed logic programming languages where constraint logic programming and concurrent logic programming were focussed. The other followed basic research in artificial intelligence and software engineering based on logic and logic programming. 
This project has been like solving a jigsaw puzzle. It is like trying to discover the hidden picture in the puzzle using logic and logic programming as clues. The research problems to be solved were derived naturally from this image. There were several difficult problems. For some problems, we did not even have the right evaluation standard for judging the results. The design of GHC is such an example. Our entire picture of the project helped in guiding our research in the right direction. The picture is not completed yet. We need further efforts to fill in the remaining spaces. One of the most important parts to be added to this picture is the integration of constraint logic programming and concurrent logic programming. We mentioned our preliminary language/system, GDCC, but this is not yet matured. We need a really useful language which can be efficientlly executed on parallel hardware. Another research subject to be pursued is the realization of a database in KLl. We are actively constructing a parallel database but it is still in the preliminary stages. We believe that there is much affinity between databases and parallelism and we expect a great deal of parallelism from database applications. The third research subject to be pursued is the parallel implementation of abduction and induction. Recently, there has been much work on abduction and induction based on logic and logic programming frameworks. They are expected to provide a foundation for many research themes related to knowledge acquisition and machine learning. Also, both abduction and induction require extensive symbolic computation and, therefore, fit very well with PIM architecture. Although further research is needed to make our results really useful in a wide range of large-scale applications, we feel that our approach is in the right direction. Acknowledgements This paper reflects all the basic research activities in the Fifth Generation Computer Systems project. The author would like to express his thanks to all the researchers in ICOT, as well as those in associated companies who have been working on this project. He especially would like to thank Akira Aiba, Jun Arima, Hiroshi Fujita, K6iti Hasida, Katsumi Inoue, Noboru Iwayama, Tadashi Kawamura, Ken Satoh, Hiroshi Tsuda, Kazunori Ueda, Hideki Yasukawa and Kazumasa Yokota for their help in greatly improving this work. Finally, he would like to express his deepest thanks to Dr. Fuchi, the director of ICOT, for providing the opportunity to write this paper. References [Arima 92] J. Arima, Logical Structure of Analogy. In Proc. of the International Conf. on Fifth Generation Computer Systems 1992, Tokyo, 1992. [Aczel 88] P. Aczel, Non- Well Founded Set Theory. CLSI Lecture Notes No. 14, 1988. [Aiba et al. 88] A. Aiba, K. Sakai, Y. Sato, D.J. Hawley, and R. Hasegawa, Constraint Logic Programming Language CAL. In Pmc. of the International Conf. on Fifth Generation Computing Systems 1988, Tokyo, 1988. [Bowen Kowalski 83] K. Bowen and R. Kowalski, Amalgamating Language and Metalanguage 30 in Logic Programming. In Logic Programming, K. Clark and S. Tarnlund (eds.), Academic Press, 1983. [FujitaHasegawa 91] H. Fujita and R. Hasegawa, A [Clark Gregory 81] K. 1. Clark and S. Gregory, A Re- tional Conference on Logic Programming, Paris, 1991. Model Generation Theorem Prover in 1(Ll Using a Ramified-Stack Algorithm. In Proc. of the Eighth Interna- lational Language for Parallel Programming. In Proc. ACM Conf. 
on Func- tional Programming Languages and Computer Architecture, ACM, 1981. [Clark Gregory 86] K. L. Clark and S. Gregory, PARLOG: Parallel Programming in Logic. Research Report DOC 84/4, Dept. of Computing, Imperial College of Science and Technology, London. Also in ACM. Trans. Prog. Lang. Syst., Vol. 8, No.1, 1986. [Chikayama 88] T. Chikayama, Programming in ESP Experiences with SIMPOS -, In Programming of Future Generation Computers, Fuchi and Nivat (eds.), NorthHolland, 1988. [Colmerauer 82] A. Colmerauer, Prolog and Infinite Trees. In Logic Programming, K. L. 'Clark and S. A. Tarnlund (eds.), Academic Press, 1982. [Colmerauer 86] A. Colmerauer, Theoretical Model of Prolog II. In Logic Programming and Its Applications, M. Van Caneghem and D. H. D. Warren (eds.), Albex Publishing Corp, 1986. [FuchiFurukawa 87] K. Fuchi and K. Furukawa, The Role of Logic Programming in the Fifth Generation Computer Project. New Generation Computing, Vol. 5, No.1, Ohmsha-springer, 1987. [EshghiKowalski 89] K. Eshghi and R.A. Kowalski, Abduction compared with negation by failure, in: Proceedings of the Sixth International Conference on Logic Programming, Lisbon, Portugal, 1989. [Fuchi 90] K. Fuchi, An Impression of 1(Ll Programming - from my experience with writing parallel provers . -. In Proe. of KLI Programming Workshop '90, ICOT, 1990 (in Japanese). [FujitaFurukawa 88] H. Fujita and K. Furukawa, A SelfApplicable Partial Evaluator and Its Use in Incremental Compilation. New Generation Computing, Vol. 6, Nos.2,3, Ohmsha/Springer- Verlag, 1988. [Furukawa 92] K. Furukawa, Logic Programming as the Integrator of the Fifth Generation Computer Systems Project, Communication of the ACM, Vol. 35, No.3, 1992. [Futamura 71] Y. Futamura, Partial Evaluation of Computation Process: An Approach to a Compiler-Compiler. Systems, Computers, Controls 2, 1971. [GelfondLifschitz 88] M. Gelfond and V. Lifschitz, The stable model semantics for logic programming, In Proceedings of the Fifth International Conference and Symposium on Logic Programming, Seattle, WA,1988. [GomardJones 89] C. K. Gomard and N. D. Jones, Compiler Generation by Partial Evaluation: A Case Study. In Proc. of Information Processing 89, G. X. Ritter (ed.), NorthHolland, 1989. [Hasida 92] K. Hasida, Dynamics of Symbol Systems - An Integrated Architecture of Cognition. In Proc. of the International Conf. on Fifth Generation Computer Systems 1992, Tokyo, 1992. [HawleyAiba 9l] D. Hawley and A. Aiba, Guarded Definite Clauses with Constraints - Preliminary Report. Technical Report TR-713, ICOT, 1991. [Inoue et al. 92] K. Inoue, M. Koshimura and R. Hasegawa, Embedding Negation as Failure into a Model Generation Theorem Prover. To appear in CADE11: The Eleventh International Confer- ence on A utomated Deduction, Saratoga Springs, NY, June 1992. [IwayamaSatoh 91] N. Iwayama and K. Satoh, A Bottom-·up Procedure with Top-down Expectation for General Logic Programs with Integrity Constraints. ICOT Tech- nical Report TR-625, 1991. [JaffarLassez 86] J. Jaffar and J-L. Lassez, Constraint Logic Programming. Technical Report, Department of Computer Science, Monash University, 1986. 31 [Jones et al. 85] N.D. Jones, P. Sestoft, and H. S¢ndergaard, An Experiment in Partial Evaluation: The Generation of a Compiler Generator. In J-.P. Jouannaud (ed.), Rewriting Techniques and Applications, LNCS-202, Springer-Verlag, pp.124-140, 1985. Layered Streams, In Proc. 1987 International Symposium on Logic Programming, pp. 224-232, San Francisco, September 1987. [Plotkin 70] G. D. 
[Jones et al. 88] N. D. Jones, P. Sestoft and H. Søndergaard, MIX: a self-applicable partial evaluator for experiments in compiler generation. Journal of LISP and Symbolic Computation, 1988.

[Kawamura 91] T. Kawamura, Derivation of Efficient Logic Programs by Synthesizing New Predicates. In Proc. of the 1991 International Logic Programming Symposium, pp. 611-625, San Diego, 1991.

[Koshimura et al. 91] M. Koshimura, H. Fujita and R. Hasegawa, Utilities for Meta-Programming in KL1. In Proc. of KL1 Programming Workshop '91, ICOT, 1991 (in Japanese).

[Maher 87] M. J. Maher, Logic semantics for a class of committed-choice programs. In Proc. of the 4th Int. Conf. on Logic Programming, MIT Press, 1987.

[MantheyBry 88] R. Manthey and F. Bry, SATCHMO: A Theorem Prover Implemented in Prolog. In Proc. of CADE-88, Argonne, Illinois, 1988.

[Matsumoto et al. 83] Yuji Matsumoto, H. Tanaka, H. Hirakawa, H. Miyoshi and H. Yasukawa, BUP: A Bottom-up Parser Embedded in Prolog. New Generation Computing, Vol. 1, 1983.

[Morita et al. 90] Y. Morita, H. Haniuda and K. Yokota, Object Identity in Quixote. Technical Report TR-601, ICOT, 1990.

[MukaiYasukawa 85] K. Mukai and H. Yasukawa, Complex Indeterminates in Prolog and its Application to Discourse Models. New Generation Computing, Vol. 3, No. 4, 1985.

[OhsugaSakai 91] A. Ohsuga and K. Sakai, Metis: A Term Rewriting System Generator. In Software Science and Engineering, I. Nakata and M. Hagiya (eds.), World Scientific, 1991.

[OkumuraMatsumoto 87] Akira Okumura and Yuji Matsumoto, Parallel Programming with Layered Streams. In Proc. 1987 International Symposium on Logic Programming, pp. 224-232, San Francisco, September 1987.

[Poole et al. 87] D. Poole, R. Goebel and R. Aleliunas, Theorist: A Logical Reasoning System for Defaults and Diagnosis. In N. Cercone and G. McCalla (eds.), The Knowledge Frontier: Essays in the Representation of Knowledge, Springer-Verlag, pp. 331-352, 1987.

[SakaiAiba 89] K. Sakai and A. Aiba, CAL: A Theoretical Background of Constraint Logic Programming and its Applications. J. Symbolic Computation, Vol. 8, No. 6, pp. 589-603, 1989.

[Saraswat 89] V. Saraswat, Concurrent Constraint Programming Languages. PhD thesis, Carnegie-Mellon University, Computer Science Department, 1989.

[SatohIwayama 92] K. Satoh and N. Iwayama, A Correct Top-down Proof Procedure for a General Logic Program with Integrity Constraints. In Proc. of the 3rd International Workshop on Extensions of Logic Programming, E. Lamma and P. Mello (eds.), Facoltà di Ingegneria, Università di Bologna, Italy, 1992.

[SekiFurukawa 87] H. Seki and K. Furukawa, Notes on Transformation Techniques for Generate and Test Logic Programs. In Proc. 1987 Symposium on Logic Programming, IEEE Computer Society Press, 1987.

[Shapiro 83] E. Y. Shapiro, A Subset of Concurrent Prolog and Its Interpreter. Tech. Report TR-003, Institute for New Generation Computer Technology, Tokyo, 1983.

[Sugimura et al. 88] R. Sugimura, K. Hasida, K. Akasaka, K. Hatano, Y. Kubo, T. Okunishi, and T. Takizuka, A Software Environment for Research into Discourse Understanding Systems. In Proc. of the International Conf. on Fifth Generation Computer Systems 1988, Tokyo, 1988.

[TakeuchiFurukawa 86] A. Takeuchi and K. Furukawa, Partial Evaluation of Prolog Programs and Its Application to Meta Programming. In Proc. IFIP'86, North-Holland, 1986.

[Taki 88] K. Taki, The Parallel Software Research and Development Tool: Multi-PSI System. In Programming of Future Generation Computers, K. Fuchi and M. Nivat (eds.), North-Holland, 1988.
[Taki 89] K. Taki, The FGCS Computing Architecture. In Proc. IFIP'89, North-Holland, 1989.

[Tanaka Yoshioka 88] Y. Tanaka and T. Yoshioka, Overview of the Dictionary and Lexical Knowledge Base Research. In Proc. FGCS'88, Tokyo, 1988.

[Tsuda 92] H. Tsuda, cu-Prolog for Constraint-based Grammar. In Proc. of the International Conf. on Fifth Generation Computer Systems 1992, Tokyo, 1992.

[Ueda 86a] K. Ueda, Guarded Horn Clauses. In Logic Programming '85, E. Wada (ed.), Lecture Notes in Computer Science 221, Springer-Verlag, 1986.

[Ueda 86b] K. Ueda, Making Exhaustive Search Programs Deterministic. In Proc. of the Third Int. Conf. on Logic Programming, Springer-Verlag, 1986.

[UedaChikayama 90] K. Ueda and T. Chikayama, Design of the Kernel Language for the Parallel Inference Machine. The Computer Journal, Vol. 33, No. 6, pp. 494-500, 1990.

[Warren 83] D. H. D. Warren, An Abstract Prolog Instruction Set. Technical Note 309, Artificial Intelligence Center, SRI, 1983.

[YasukawaYokota 90] H. Yasukawa and K. Yokota, Labeled Graphs as Semantics of Objects. Technical Report TR-600, ICOT, 1990.

[Yokota 88a] K. Yokota, Deductive Approach for Nested Relations. In Programming of Future Generation Computers II, K. Fuchi and L. Kott (eds.), North-Holland, 1988.

[Yokota et al. 88b] K. Yokota, M. Kawamura and A. Kanaegami, Overview of the Knowledge Base Management System (KAPPA). In Proc. of the International Conf. on Fifth Generation Computer Systems 1988, Tokyo, 1988.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

Summary of the Parallel Inference Machine and its Basic Software

Shunichi Uchida
Institute for New Generation Computer Technology
4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan
uchida@icot.or.jp

Abstract

This paper aims at a concise introduction to the PIM and its basic software, including the overall framework of the project. Now an FGCS prototype system is under development. Its core is called a parallel inference system, which includes a parallel inference machine, PIM, and its operating system, PIMOS. The PIM includes five hardware modules containing about 1,000 element processors in total. On the parallel inference system, there is a knowledge base management system (KBMS). The PIMOS and KBMS form a software layer called the basic software of the prototype system. These systems are already being run on the PIM. On these systems, a higher-level software layer is being developed. It is called knowledge programming software. This is to be used as a tool for more powerful inference and knowledge processing. It contains language processors for constraint logic programming languages, parallel theorem provers and natural language processing systems. Several experimental application programs are also being developed for both general evaluation of the PIM and the exploration of new application fields for knowledge processing. These achievements with the PIM and its basic software easily surpass the research targets set up at the beginning of the project.

1 Introduction

Since the fifth generation computer systems project (FGCS) was started in June, 1982, 10 years have passed, and the project is approaching its goal. This project assumed that "logic" was the theoretical backbone of future knowledge information processing, and adopted logic programming as the kernel programming language of fifth generation computer systems.
In addition to the adoption of logic programming, highly parallel processing for symbolic computation was considered indispensable for implementing practical knowledge information processing systems. Thus, the project aimed to create a new computer technology combining knowledge processing with parallel processing using logic programming.

Figure 1: Framework of FGCS Project (knowledge and symbol processing applications and parallel evaluation and benchmark programs on top of the knowledge programming system, the parallel OS and KBMS layer with PIMOS and Kappa-P, the logic programming language KL1, and the parallel inference machine PIM with about 1,000 PEs in total, together with the 64-PE Multi-PSI system)

Now an FGCS prototype system is under development. This system integrates the major research achievements of these 10 years so that they can be evaluated and demonstrated. Its core is called a parallel inference system, which includes a parallel inference machine, PIM, and its operating system, PIMOS. The PIM includes five hardware modules containing about 1,000 element processors in total. It also includes a language processor for a parallel logic language, KL1. On the parallel inference system, there is a knowledge base management system (KBMS). The KBMS includes a database management system (DBMS), Kappa-P, as its lower layer. The KBMS provides a knowledge representation language, Quixote, based on the deductive (and) object-oriented database. The PIMOS and KBMS form a software layer called the basic software of the prototype system. These systems are already being run on the PIM.

The PIM and basic software are now being used as a new research platform for building experimental parallel application programs. They are the most complete of their kind in the world. On this platform, a higher-level software layer is being developed. This is to be used as a tool for more powerful inference and knowledge processing. It contains language processors for constraint logic programming languages, parallel theorem provers, natural language processing systems, and so on. These software systems all include the most advanced knowledge processing techniques, and are at the leading edge of advanced software science.

Several experimental application programs are also being developed for both general evaluation of the PIM and the exploration of new application fields for knowledge processing. These programs include a legal reasoning system, genetic information processing systems, and VLSI CAD systems. They are now operating on the parallel inference system, and indicate that parallel processing of knowledge processing applications is very effective in shortening processing time and in widening the scope of applications. However, they also indicate that more research is needed into parallel algorithms and load balancing methods for symbol and knowledge processing. These achievements with the PIM and its basic software easily surpass the research targets set up at the beginning of the project.

This paper aims at a concise introduction to the PIM and its basic software, including the overall framework of the project. This project is the first Japanese national project that aimed at making a contribution to world computer science and the promotion of international collaboration. We have published our research achievements wherever possible, and distributed various programs from time to time.
Through these activities, we have also been given much advice and help which was very valuable in helping us to attain our research targets. Thus, our achievements in the project are also the results of our collaboration with world researchers on logic programming, parallel processing and many related fields.

2 Research Targets and Plan

2.1 Scope of R&D

The general target of the project is the development of a new computer technology for knowledge information processing. Having "mathematical logic" as its theoretical backbone, various research and development themes were established on software and hardware technologies focusing on knowledge and symbol processing. These themes are grouped into the following three categories:

2.1.1 Parallel inference system

The core portion of the project was the research and development of the parallel inference system, which contains the PIM, a KL1 language processor, and the PIMOS. To make the goal of the project clear, an FGCS prototype system was considered a major target. This was to be built by integrating many experimental hardware and software components developed around logic programming. The prototype system was defined as a parallel inference system intended to have about 1,000 element processors and attain more than 100M LIPS (Logical Inferences Per Second) as its execution speed. It was also intended to have a parallel operating system, PIMOS, as part of the basic software, which provides us with an efficient parallel programming environment in which we can easily develop various parallel application programs for symbol and knowledge processing, and run them efficiently. Thus, this is regarded as the development of a supercomputer for symbol and knowledge processing. It was intended that overall research and development activities would be concentrated so that the major research results could be integrated into a final prototype system, step by step, over the timespan allotted to the project.

2.1.2 KBMS and knowledge programming software

Themes in this category aimed to develop a basic software technology and theory for knowledge processing.
• Knowledge representation and knowledge base management
• High-level problem solving and inference software
• Natural language processing software
These research themes were intended to create new theories and software technologies based on mathematical logic to describe various knowledge fragments which are parts of "natural" knowledge bases produced in our social systems. We also intended to store them in a computer system as components of "artificial" knowledge bases so that they can be used to build various intelligent systems. To describe the knowledge fragments, a knowledge representation language has to be provided. It can be regarded as a very high-level programming language executed by a sophisticated inference mechanism which is much cleverer than the parallel inference system.

Figure 2: Organization of Prototype System (experimental application systems such as parallel VLSI-CAD systems, a legal reasoning system, genetic information processing systems and other parallel expert systems, on top of the knowledge programming software and natural language processing system, the parallel OS PIMOS and KL1 programming environment, the parallel KBMS/DBMS Kappa-P + Quixote, and the PIM modules PIM/k, PIM/c, PIM/i and PIM/m)
Natural language processing research is intended to cover research on knowledge representation methods and such inference mechanisms, in addition to research on easy-to-use man-machine interface functions. Experimental software building for some of these research themes was done on the sequential inference machines, because the level of research was so basic that computational power was not the major problem.

2.1.3 Benchmarking and evaluation systems

• Benchmarking software for the parallel inference system
• Experimental parallel application software

To carry out research on an element technology in computer science, it is essential that an experimental software system is built. Typical example problems can then be used to evaluate theories or methods invented in the progress of the research. To establish general methods and technologies for knowledge processing, experimental systems should be developed for typical problems which need to process knowledge fragments as sets of rules and facts. These problems can be taken from engineering systems, including machine design and the diagnosis of machine malfunction, or from social systems such as medical care, government services, and company management.

Generally, the exploitation of computer technology for knowledge processing is far behind that for scientific calculation. Recent expert systems and machine translation systems are examples of the most advanced knowledge processing systems. However, the numbers of rules and facts in their knowledge bases are several hundreds on average. This scale of knowledge base may not be large enough to evaluate the maximum power of a parallel inference system having about 1,000 element processors. Thus, research and development on large-scale application systems is necessary not only for knowledge processing research but also for the evaluation of the parallel inference system. Such application systems should be widely looked for in many new fields.

The scope of research and development in this project is very wide; however, the parallel inference system is central to the whole project. It is a very clear research target. Software research and development should also cover diverse areas in recent software technology. However, it has "logic" as the common backbone. It was also intended that major research achievements should be integrated into one prototype system. This has made it possible for us to organize all of our research and development in a coherent way. At the beginning of the project, only the parallel inference machine was defined as a target which was described very clearly. The other research targets described above were not planned at the beginning of the project. They have been added in the middle of the intermediate stage or at the final stage.

2.2 Overall R&D plan

After three years of study and discussions on determining our major research fields and targets, the final research and development plan was determined at the end of fiscal 1981 with the budget for the first fiscal year. At that time, practical logic programming languages had begun to be used in Europe, mainly for natural language processing. The feasibility and potential of logic languages had not been recognized by many computer scientists. Thus, there was some concern that the level of language was too high to describe an operating system, and that the overhead of executing logic programs might be too large to use it for practical applications.
This implies that research on logic programming was in its infancy. Research on parallel architectures linked with high-level languages was also in its infancy. Research on dataflow architectures was the most advanced at that time. Some dataflow architectures were thought to have the potential for knowledge and symbol processing. However, their feasibility for practical applications had not yet been evaluated. Most of the element technologies necessary to build the core of the parallel inference system were still in their infancy.

We then tried to define a detailed research plan step by step for the 10-year project period. We divided the 10-year period into three stages, and defined the research to be done in each stage as follows:

• Initial stage (3 years): - Research on potential element technologies - Development of research tools
• Intermediate stage (4 years): - First selection of major element technologies for final targets - Experimental building of medium-scale systems
• Final stage (3 years): - Second selection of major element technologies for final targets - Experimental building of a final full-scale system

At the beginning of the project, we made a detailed research and development plan only for the initial stage. We decided to make detailed plans for the intermediate and final stages at the end of the stage before, so that the plans would reflect the achievements of the previous stage. The research budget and manpower were to be decided depending on the achievements. It was likely that the project would effectively be terminated at the end of the initial stage or the intermediate stage.

3 Inference System in the Initial Stage

3.1 Personal Sequential Inference Machine (PSI-I)

To actually build the parallel inference system, especially a productive parallel programming environment which is now provided by PIMOS, we needed to develop various element technologies step by step to obtain hardware and software components. On the way toward this development, the most promising methods and technologies had to be selected from among many alternatives, followed by appropriate evaluation processes. To make this selection reliable and successful, we tried to build experimental systems which were as practical as possible.

In the initial stage, to evaluate the descriptive power and execution speed of logic languages, a personal sequential inference machine, PSI, was developed. This was a logic programming workstation. This development was also aimed at obtaining a common research tool for software development. The PSI was intended to attain an execution speed similar to DEC-10 Prolog running on a DEC-20 system, which was the fastest logic programming system in the world. To begin with, a PSI machine language, KL0, was designed based on Prolog. Then a hardware system was designed for KL0. We employed tag architecture for the hardware system. Then we designed a system description language, ESP, which is a logic language having class and inheritance mechanisms for building program modules efficiently. [Chikayama 1984] ESP was used not only to write the operating system for the PSI, which is named SIMPOS, but also to write many experimental software systems for knowledge processing research.

The development of the PSI machine and SIMPOS was successful. We were impressed by the very high software productivity of the logic language. The execution speed of the PSI was about 35K LIPS and exceeded its target. However, we realized that we could improve its architecture by using the optimization capability of a compiler more effectively. We produced about 100 PSI machines to distribute as a common research tool. This version of the PSI is called PSI-I.

In conjunction with the development of PSI-I and SIMPOS, research on parallel logic languages was actively pursued. In those days, pioneering efforts were being made on parallel logic languages such as PARLOG and Concurrent Prolog. [Clark and Gregory 1984], [Shapiro 1983] We learned much from this pioneering research, and aimed to obtain a simpler language more suited as a machine language for a parallel inference machine. Near the end of the initial stage, a new parallel logic language, GHC, was designed. [Ueda 1986]
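To give a flavor of what such a language looks like, the following is the stream-merge predicate often used in introductions to GHC; it is a standard textbook-style illustration written for this summary, not code quoted from the project. Each clause may commit only when its head and guard (the part before the commitment bar "|") match the call without binding the caller's variables; any output bindings are then made in the body.

    % Nondeterministic merge of two streams (lists built incrementally).
    % A clause whose head needs an element of an input stream simply
    % waits until some producer binds that stream.
    merge([X|Xs], Ys, Zs) :- true | Zs = [X|Zs1], merge(Xs, Ys, Zs1).
    merge(Xs, [Y|Ys], Zs) :- true | Zs = [Y|Zs1], merge(Xs, Ys, Zs1).
    merge([], Ys, Zs) :- true | Zs = Ys.
    merge(Xs, [], Zs) :- true | Zs = Xs.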
Table 1: Development of Inference Systems (sequential and parallel inference technologies developed in the initial stage, '82-'84, and the intermediate stage, '85-'88: the sequential inference machine PSI-I with SIMPOS and the languages KL0 and ESP, followed by the new PSI model, PSI-II, the parallel language KL1, the parallel OS PIMOS and small application programs)

3.2 Effect of PSI development on the research plan

The experience gained in the development of PSI-I and SIMPOS heavily affected the planning of the intermediate stage.

3.2.1 Efficiency in program production

One of the important questions related to logic languages was the feasibility of writing an operating system, which needs to describe fine, detailed control mechanisms. Another was their applicability to writing large-scale programs. SIMPOS development gave us answers to these questions. SIMPOS has a multi-window-based user interface, and consists of more than 100,000 ESP program lines. It was completed by a team of about 20 software researchers and engineers over about two years. Most of the software engineers were not familiar with logic languages at that time. We found that logic languages have much higher productivity and maintainability than conventional von Neumann languages. This was obvious enough to convince us to describe a parallel operating system also in a logic language.

3.2.2 Execution performance

The PSI-I hardware and firmware attained about 35K LIPS. This execution speed was sufficient for most knowledge processing applications. The PSI had an 80 MB main memory. It was a very big memory compared to mainframe computers at that time. We found that this large memory and fast execution speed made a logic language a practical and highly productive tool for software prototyping. The implementation of the PSI-I hardware required 11 printed circuit boards. As the amount of hardware became clear, we established that we could obtain an element processor for a parallel machine if we used VLSI chips for implementation. For the KL0 language processor which was implemented in the firmware, we estimated that better optimization of object code made by the compiler would greatly improve execution speed. (Later, this optimization was made by introducing the "WAM" code. [Warren 1983]) The PSI-I and SIMPOS proved that logic languages are a very practical and productive vehicle for complex knowledge processing applications.
4 Inference Systems in the Intermediate Stage

4.1 A parallel inference system

4.1.1 Conceptual design of KL1 and PIMOS

The most important target in the intermediate stage was a parallel implementation of a KL1 language processor, and the development of a parallel operating system, PIMOS. The full version of GHC was still too complex for machine implementation. A simpler version, FGHC, was designed. [Chikayama and Kimura 1985] Finally, a practical parallel logic language, KL1, was designed based on FGHC. KL1 is classified as an AND-parallel logic programming language. Its language processor includes an automatic memory management mechanism and a dataflow process synchronization mechanism. These mechanisms were considered essential for writing and compiling large parallel programs. The first problem was whether they could be implemented efficiently. The second problem was what kind of firmware and hardware support would be possible and effective.

In addition to problems in implementing the KL1 language processor, the design of PIMOS created several important problems. The role of PIMOS is different from that of conventional operating systems. PIMOS does not need to do primary process scheduling and memory management because these tasks are performed by the language processor. It still has to perform resource management for main memory and element processors, and control the execution of user programs. However, a much more difficult role was added. It must allow a user to divide a job into parallel processable processes and distribute them to many element processors. Processor loads must be well balanced to attain better execution performance. In knowledge and symbol processing applications, the dynamic structure of a program is not regular. It is difficult to estimate the dynamic program structure. It was desirable that PIMOS could offer some support for efficient job division and load balancing. These problems in the language processor and the operating system were very new, and had not been studied as practical software problems. To solve these problems, we realized that we must have appropriate parallel hardware as a platform to carry out practical software experiments using trial and error.

4.1.2 PSI-II and Multi-PSI system

In conjunction with the development of KL1 and PIMOS, we needed to extend our research and develop new theories and software technologies for knowledge processing using logic programming. This research and development demanded improvement of the PSI-I machines in such aspects as performance, memory size, cabinet size, disk capacity, and network connection. We decided to develop a smaller and higher-performance model of the PSI, to be called PSI-II. This was intended to provide a better workstation for use as a common tool and also to obtain an element processor for the parallel hardware to be used as a platform for parallel software development. This hardware was called the multi-PSI system. It was regarded as a small-scale experimental version of the PIM. As many PSI-II machines were produced, we anticipated having very stable element processors for the multi-PSI system. The PSI-II used VLSI gate array chips for its CPU. The size of the cabinet was about one sixth that of PSI-I. Its execution speed was 330K LIPS, about 10 times faster than that of PSI-I. This improvement was attained mainly through employment of a better compiler optimization technique and improvement of its machine architecture.
The main memory size was also expanded to 320 MB so that prototyping of large applications could be done quickly. In the intermediate stage, many experimental systems were built on PSI-I and PSI-II systems for knowledge processing research. These included small-to-medium scale expert systems, a natural language discourse understanding system, constraint logic programming systems, a database management system, and so on. These systems were all implemented in the ESP language using about 300 PSI-II machines distributed to the researchers as their personal tools.

The development of the multi-PSI system was completed in the spring of 1988. It consists of 64 element processors which are connected by an 8 by 8 mesh network. One element processor is contained in three printed circuit boards. Eight element processors are contained in one cabinet. Each element processor has an 80 MB main memory. Thus, a multi-PSI has about 5 GB of memory in total. This hardware was very stable, as we had expected. We produced 6 multi-PSI systems and distributed them to the main research sites.

Figure 3: Multi-PSI System: Main Features and Appearance
• Machine language: KL1-b
• Max. 64 PEs and two FEPs (PSI-II) connected to LAN
• Architecture of PE: microprogram control (64 bits/word); machine cycle 200 ns; register file 64 W; cache 4 KW, set associative/write-back; data width 40 bits/word; memory capacity 16 MW (80 MB)
• Network: 2-dimensional mesh; 5 MB/s x 2 directions/ch with 2 FIFO buffers/ch; packet routing control function

4.1.3 KL1 language processor and PIMOS

This was the first trial implementation of a distributed language processor for a parallel logic language, and of a parallel operating system on real parallel hardware, used as a practical tool for parallel knowledge processing applications. The KL1 distributed language processor was an integration of various complex functional modules such as a distributed garbage collector for loosely-coupled memories. The automatic process synchronization mechanism based on the dataflow model was also difficult to implement over the distributed element processors. Parts of these mechanisms had to be implemented in combination with some PIMOS functions such as a dynamic on-demand loader for object program codes. Other important functions related to the implementation of the language processor were support functions like system debugging, system diagnostic, and system maintenance functions. In addition to these functions for the KL1 language processor, many PIMOS functions for resource management and execution control had to be designed and implemented step by step, with repeated partial module building and evaluation. This partial module building and evaluation was done for core parts of the KL1 language processor and PIMOS, using not only KL1 but also the ESP and C languages. An appropriate balance between the functions of the language processor and the functions of PIMOS was considered. The language processor was first implemented in PSI-II firmware. It worked as a pseudo parallel simulator of KL1, and was used as a PIMOS development tool. It was eventually extended and transported to the multi-PSI system.

In the development of PIMOS, the first partial module building was done using the C language in a Unix environment. This system is a tiny subset of the KL1 language processor and PIMOS, and is called the PIMOS Development Support System (PDSS). It is now distributed and used for educational purposes.
The first version of PIMOS was released on the PSI-II with the KL1 firmware language processor. This is called a pseudo multi-PSI system. It is currently used as a personal programming environment for KL1 programs. With the KL1 language processor fully implemented in firmware, one element processor or a PSI-II attained about 150 KLIPS for a KL1 program. It is interesting to compare this speed with that for a sequential ESP program. As a PSI-II attains about 300 KLIPS for a sequential ESP program, the overhead for KL1 caused by automatic process synchronization halves the execution speed. This overhead is compensated for by efficient parallel processing. A full-scale multi-PSI system of 64 element processors could attain 5 - 10 MLIPS. This speed was considered sufficient for the building of experimental software for symbol and knowledge processing applications. On this system, simple benchmarking programs and applications such as puzzle programs, a natural language parser and a Go-game program were quickly developed. These programs and the multi-PSI system were demonstrated at FGCS'88. [Uchida et al. 1988] These proved that KL1 and PIMOS could be used as a new platform for parallel software research.

4.2 Overall design of the parallel inference system

4.2.1 Background of the design

The first question related to the design of the parallel inference system was what kind of functions must be provided for modeling and programming complex problems, and for making them run on large-scale parallel hardware. When we started this project, research on parallel processing still tended to focus on hardware problems. The major research and development interest was in SIMD or MIMD type machines applied to picture processing or large-scale scientific calculations. Those applications were programmed in Fortran or C. Control of parallel execution of those programs, such as job division and load balancing, was performed by built-in programs or prepared subroutine libraries, and could not be done by ordinary users. Those machines excluded most of the applications which include irregular computations and require general parallel programming languages and environments. This tendency still continues. Among these parallel machines, some dataflow machines were exceptional and had the potential to support functional languages and a general parallel programming environment.

We were confident that a general parallel programming language and environment is indispensable for writing parallel programs for large-scale symbol and knowledge processing applications, and that they must provide such functions as follows:

1. An automatic memory management mechanism for distributed memories (parallel garbage collector)
2. An automatic process synchronization mechanism based on a dataflow scheme (a small sketch follows this list)
3. Various support mechanisms for attaining the best job division and load balancing

The first two are to be embedded in the language processor. The last is to be provided in a parallel operating system.
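As a minimal sketch of the second function, consider a producer and a consumer written in the GHC/KL1 style; the predicate names and the KL1-style ":=" arithmetic notation are illustrative and not taken from the project's code. The two goals run as concurrent processes sharing the logical variable Stream, and the consumer clauses can only commit once the head of the stream has been bound, so the consumer suspends and resumes purely through the dataflow of bindings.

    % gen/2 produces a stream of numbers; sum/3 consumes it.
    gen(0, Out) :- true | Out = [].
    gen(N, Out) :- N > 0 | Out = [N|Out1], N1 := N - 1, gen(N1, Out1).

    sum([], Acc, S) :- true | S = Acc.
    sum([X|Xs], Acc, S) :- true | Acc1 := Acc + X, sum(Xs, Acc1, S).

    % Top-level conjunction: both goals become concurrent processes.
    go(S) :- true | gen(100, Stream), sum(Stream, 0, S).

No explicit locks or signals appear in the source; this is what the text means by an automatic process synchronization mechanism based on a dataflow scheme.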
All of these answer the question of how to write parallel programs and map them on parallel machines. This mapping could be made fully automatic if we limited our applications to very regular calculations and processing. However, for the applications we intend, the mapping process, which includes job division and loadbalancing, should be done by programmers using the functions of the language processor and operating system. 4.2.2 A general parallel programming environment Above mechanisms for mapping should be implemented in the following three layers: 1. A parallel hardware system consisting of element processors and inter-connection network (PIM hardware) 2. A parallel language processor consisting of run-time routines, built-in functions, compilers and so on (KLI language processor) 3. A parallel operating system including a programming environment (PIMOS) At the beginning of the intermediate stage, we tried to determine the roles of the hardware, the language processor and the operating system. This was really the start of development. One idea was to aim at hardware with many functions and using high density VLSI technology, as described in early papers on dataflow machine research. It was a very challenging approach. However, we thought it too risky because changes to the logic circuits in VLSI chips would have a long turn-around time even if the rapid advance of VLSI technology was taken into account. Furthermore, we thought it would be difficult to run hundreds of sophisticated element processors for a few days to a few weeks without any hardware faults. Implementation of the language processor and the operating system was thought to be very difficult too. As However, we ~ried to add cost-effective hardware support for KLI to the element processor, in order to attain a higher execution speed. We employed tag architecture to support the autom:atic memory management mechanism as well as faster execution of KLI programs. The automatic synchronization mechanism was to be implemented in firmware. The supports for job division and load balancing were implemented partially by the firmware as primitives of the KLI language, but they were chiefly implemented by the operating system. In a programming environment of the operating system, we hoped to provide a semi-automatic load balancing mechanism as an ultimate research goal. PIMOS and KLI hide from users most of the architectural details of the element processors and network system of PIM hardware. A parallel prograII}. is modeled and programmed depending on a parallel model of an application problem and algorithms designed by a programmer. The programmer has great freedom in dividing programs because a KLI program is basically constructed from very fine-grain processes. As a second step, the programmer can decide the grouping of fine-grain processes in order to obtain an appropriate granularity as divided jobs, and then specify how to dispatch them to element processors using a special notation called "pragma". This two step approach in parallel programming makes it easy and productive. We decided to implement the memory management mechanism and the synchronization mechanism mainly in the firmware. The job division and load balancing mechanism was to be implemented in the software. We decided not to implement uncertain mechanisms in the hardware. The role of the hardware system was to provide a stable platform with enough element processors, execution speed, memory capacity, number of disks and so on. 
The demands made on the capacity of a cache and a main memory were much larger than those of a general purpose microprocessor of that time. The employment of tag architecture contributed to the simple implementation of the memory management mechanism and also increased the speed of KL1 program execution.

5 R&D in the final stage

5.1 Planning of the final stage

At the end of the intermediate stage, an experimental medium-scale parallel inference system consisting of the multi-PSI system, the KL1 language processor, and PIMOS was successfully completed. On this system, several small application programs were developed and run efficiently in parallel. This proved that symbol and knowledge processing problems had sufficient parallelism and could be written in KL1 efficiently. This success enabled us to enter the final stage.

Based on research achievements and newly developed tools produced in the intermediate stage, we made a detailed plan for the final stage. One general target was to make a big jump from the hardware and software technologies for the multi-PSI system to the ones for the PIM, with hundreds of element processors. Another general target was to take on the challenge of parallel processing of large and complex knowledge processing applications which had never been tackled anywhere in the world, using KL1 and the PIM. Through the research and development directed to these targets, we expected that a better parallel programming methodology would be established for logic programming. Furthermore, the development of large and complex application programs would not only encourage us to create new methods of building more intelligent systems systematically but could also be used as practical benchmarking programs for the parallel inference system. We intended to develop new techniques and methodologies:

1. Efficient parallel software technology
(a) Parallel modeling and programming techniques: parallel programming paradigms; parallel algorithms
(b) Efficient mapping techniques of parallel processes to parallel processors: dynamic load balancing techniques; performance debugging support
2. New methodologies to build intelligent systems using the power of the parallel inference system
(a) Development of a higher-level reasoning or inference engine and higher-level programming languages
(b) Methodologies for knowledge representation and knowledge base management (methodology for knowledge programming)

The research and development themes in the final stage were set up as follows:

1. PIM hardware development
We intended to build several models with different architectures so that we could compare mapping problems between the architectures and program models. The number of element processors for all the modules was planned to be about 1,000.

2. The KL1 language processor for the PIM modules
We planned to develop new KL1 language processors which took the architectural differences of the PIM modules into account.

3. Improvement and extension of PIMOS
We intended to develop an object-oriented language, AYA, over KL1, a parallel file system, and extended performance debugging tools for its programming environment.

4. Parallel DBMS and KBMS
We planned to develop a parallel and distributed database management system which, by using several disk drives connected to PIM element processors, was intended to attain high throughput and consequently a high information retrieval speed.
As we had already developed a database management system, Kappa-II, which employed a nested relational model on the PSI machine, we decided to implement a parallel version of Kappa-II. However, we redesigned its implementation, employing the distributed database model and using KL1. This parallel version is called Kappa-P. We planned to develop a knowledge base management system on top of Kappa-P. This would be based on the deductive object-oriented DB, having a knowledge representation language, Quixote.

5. Research on knowledge programming software
We intended to continue various basic research activities to develop new theories, methodologies and tools for building knowledge processing application systems. These activities were grouped together as research on knowledge programming software. This included research themes such as a parallel constraint logic programming language, mathematical systems including theorem provers, natural language processing systems such as a grammar design system, and an intelligent sentence generation system for man-machine interfacing.

6. Benchmarking and experimental parallel application systems
To evaluate the parallel inference system and the various tools and methodologies developed in the above themes, we decided to make more effort to explore new applications of parallel knowledge processing. We began research into a legal expert system, genetic information processing systems and so on.

5.2 R&D results in the final stage

The actual research activities into the themes described above differed according to their characteristics. In the development of the parallel inference system, we focused on the integration of PIM hardware and some software components. In our research on knowledge programming software, we continued basic research and experimental software building to create new theories and develop parallel software technologies for the future.

5.2.1 PIM hardware and KL1 language processor

One role of the PIM hardware was to provide software researchers with an advanced platform which would allow large-scale software development for knowledge processing. Another role was to obtain various evaluation data on the architecture and hardware structure of the element processors and network systems. In particular, we wanted to analyze the performance of large-scale parallel programs on various architectures (machine instruction sets) and hardware structures, so that hardware engineers could design more powerful and cost-effective parallel hardware in the future.

In the conceptual design of the PIM hardware, we realized that there were many alternative designs for the architecture of an element processor and the structure of a network system. For the architecture of an element processor, we could choose between a CISC type instruction set implemented in firmware and a RISC type instruction set. On the interconnection network, there were several opinions, including a two dimensional mesh network like the multi-PSI, a cross-bar switch, and a common bus with coherent caches. To design the best hardware, we needed to find out the mapping relationships between program behavior and the hardware architectures and structures. We had to establish criteria for the design of the parallel hardware, reflecting the algorithms and execution structures of application programs. To gather the basic data we needed to obtain these design criteria, we tried to categorize our design choices into five groups and build five PIM modules.
The main features of these five modules are listed in Table 2. The number of element processors required for each module was determined depending on the main purpose of the module. Large modules have 256 to 512 element processors, and were intended to be used for software experiments. Small modules have 16 or 20 element processors and were built for architectural experiments and evaluation. All of these modules were designed to support KL1 and PIMOS, so that software researchers could run one program on the different modules and compare and analyze the behavior of parallel program execution.

The PIM/m module employed an architecture similar to that of the multi-PSI system. Thus, its KL1 language processor could be developed by simply modifying and extending that of the multi-PSI system. For the other modules, namely PIM/p, PIM/c, PIM/k, and PIM/i, the KL1 language processor had to be newly developed because all of these modules have a cluster structure. In a cluster, four to eight element processors are tightly coupled by a shared memory and a common bus with coherent caches. While communication between element processors is done through the common bus and shared memory, communication between clusters is done via a packet switching network. These four PIM modules have different machine instruction sets. We intended to avoid the duplication of development work for the KL1 language processor. We used the KL1-C language to write PIMOS and the usual application programs. A KL1-C program is compiled into the KL1-B language, which is similar to the "WAM", as shown in Figure 5. We defined an additional layer between the KL1-B language and the real machine instructions. This layer is called the virtual hardware layer. It has a virtual machine instruction set called "PSL". The specification of the KL1-B interpreter is described in PSL. This specification is semi-automatically converted to a real interpreter or runtime routines dedicated to each PIM module. The specification in PSL is called the virtual PIM processor (the VPIM processor for short) and is common to the four PIM modules.

PIM/p, PIM/m and PIM/c are intended to be used for large software experiments; the other modules were intended for architectural evaluations. We plan to produce a PIM/p with 512 element processors, and a PIM/m with 384 element processors. Now, at the beginning of March 1992, a PIM/m of 256 processors has just started to run a couple of benchmarking programs. We aimed at a processing speed of more than 100 MLIPS for the PIM modules. The PIM/m with 256 processors will attain more than 100 MLIPS as its peak performance. However, for a practical application program, this speed may be much reduced, depending on the characteristics of the application program and the programming technique. To obtain better performance, we must attempt to augment the effect of compiler optimization and to implement a better load balancing scheme. We plan to run various benchmarking programs and experimental application programs to evaluate the gain and loss of the implemented hardware and software functions.
Figure 4: Research Themes in the Final Stage (experimental parallel application programs: a parallel VLSI-CAD system, a legal inference system, a parallel Go playing system, natural language analysis tools and genetic information analysis tools; software for functional demonstration and parallel application experiments: parallel expert systems for logic design and equipment diagnosis, parallel software development support, constraint programming with the parallel constraint processing system GDCC, the MGTP automatic theorem-proving system, discourse processing and general-purpose Japanese language processing systems, a parallel natural language analysis experimental system, parallel programming support such as the visualization tool ParaGraph, and the deductive/object-oriented DB with the knowledge representation language Quixote and gene DB/KB application experiments)

Table 2: Features of PIM modules (for each of PIM/p, PIM/c, PIM/m, PIM/i and PIM/k: machine instruction style, target cycle time, LSI devices, process technology, machine configuration, and number of PEs connected)

Figure 5: KL1 Language Processor and VPIM (a KL1 program is compiled into the intermediate language KL1-B, similar to the WAM of Prolog; the KL1-B abstract machine specification is then transformed, by methods corresponding to each hardware architecture, into runtime libraries, microprograms, or object codes for the virtual hardware layer and the real hardware: PIM/p, PIM/m, PIM/c, PIM/i, PIM/k and the multi-PSI)

Figure 6: PIM model P: Main Features and Appearance of a Cabinet
• Machine language: KL1-b
• Architecture of PE and cluster: RISC + HLIC (microprogrammed); machine cycle 60 ns; register file 40 bits x 32 W; 4-stage pipeline for RISC instructions; internal instruction memory 50 bits x 8 KW; cache 64 KB, 256 columns, 4 sets, 32 B/block; write-back, invalidation protocol; data width 40 bits/word; shared memory capacity 256 MB
• Max. 512 PEs, 8 PEs/cluster and 4 clusters/cabinet
• Network: double hyper-cube (max. 6 dimensions); max. 20 MB/s in each link

Figure 7: PIM model M: Main Features and Appearance of four Cabinets
• Machine language: KL1-b
• Architecture of PE: microprogram control (64 bits/word x 32 KW); data width 40 bits/word; machine cycle 60 ns; register file 40 bits x 64 W; 5-stage pipeline; cache 1 KW for instructions and 4 KW for data; memory capacity 16 MW x 40 bits (80 MB)
• Max. 256 PEs, 32 PEs/cabinet
• Network: 2-dimensional mesh; 4.2 MB/s x 2 directions/ch

5.2.2 Development of PIMOS

PIMOS was intended to be a standard parallel operating system for large-scale parallel machines used in symbol and knowledge processing. It was designed as an independent, self-contained operating system with a programming environment suitable for KL1. Its functions for resource management and execution control of user programs were designed to be independent of the architectural details of the PIM hardware.
They were implemented based on an almost completely non-centralized management scheme so that the design could be applied to a parallel machine with one million element processors. [Chikayama 1992] PIMOS is completely written in KL1. Its management and control mechanisms are implemented using a "meta-call" primitive of KL1. The KL1 language processor has an automatic memory management mechanism and a dataflow synchronization mechanism embedded in it. The management and control mechanisms are then implemented over these two mechanisms. The resource management function is used to manage the memory resources and processor resources allocated to user processes and input and output devices. The program execution control function is used to start and stop user processes, control the order of execution following priorities given to them, and protect system programs from user program bugs, like the usual sequential operating systems. PIMOS supports multiple users, access via networks and so on. It also has an efficient KL1 programming environment. This environment has some new tools for debugging parallel programs, such as visualization programs which show a programmer the status of load balancing in graphical form, and other monitoring and measurement programs.

5.2.3 Knowledge base management system

The knowledge base management system consists of two layers. The lower layer is a parallel database management system, Kappa-P. Kappa-P is a database management system based on a nested relational model. It is more flexible than the usual relational database management systems in processing data of irregular sizes and structures, such as natural language dictionaries and biological databases. The upper layer is a knowledge base management system based on a deductive object-oriented database. [Yokota and Nishio 1989] This provides us with a knowledge representation language, Quixote. [Yokota and Yasukawa 1992] These upper and lower layers are written in KL1 and are now operational on PIMOS. The development of the database layer, Kappa, was started at the beginning of the intermediate stage.
Its language processor, which is the knowledge base management system, is under development. This language processor is being built over Kappa-P. Using Quixote, construction of a knowledge base can then he made continuously from a simple database. This will start with the accumulation of passive fact data, then gradually add active rule data, and will finally become a complete knowledge base. The Quixote and Kappa-P system is a new knowledge base management system which has a high-level knowledge representation language and the parallel and distributed database management system as the base of the language processor. The first versions of Kappa-P and Quixote are now almost complete. It is interesting to see how this big system operates and how much its overhead will be. 5.2.4 Knowledge programming software This software consists of various experimental programs and tools built in theoretical research and development into some element technologies for knowledge processing. Most of these programs and tools are written in KLl. These could therefore be regarded as application programs for the parallel inference system. 1. Constraint logic programming system In the final stage, a parallel constraint logi~ programming language, GDCC, is being developed. This language is a high-level logic language which has a constraint solver as a part of its language processor. The language processor is implemented in KL1 and is intended to use parallel processing to make its execution time faster. The GDCC is evaluated by experimental application programs such as a prograIIi for designing a simple handling robot.[ Aiba and Hasegawa 1992] 2. Theorem proving and program transformation A model generation theorem prover, MGTP, is being implemented in KLl. For this application, the optimization of load balancing has been made successfully. The power of parallel processing is almost proportional to the number of element processors being used. This prover is being used as a rulebased reasoner for a legal reasoning system. It en-::' abIes this system to use knowledge representation based on first order logic, and to contribute to easy knowledge programming. 3. Naturallanguage processing Software tools and linguistic data bases are being developed for use in implementing natural language interfaces. The tools integrated into a library called a Language Tool Box (LTB). The LTB includes natural language parsers, a sentence generators, and the linguistic databases and dictionaries including syntactic rules and so on. 5.2.5 Benchmarking and experimental parallel application software This software includes benchmarking programs for the parallel inference system, and experimental parallel application programs which were built for developing parallel programming methodology, knowledge representation techniques, higher-level inference mechanisms and so on. In the final stage, we extended the application area to include larger-scale symbol and knowledge processing applications such as genetic information processing and legal expert systems. This was in addition to engineering applications such as VLSI-CAD systems and diagnostic systems for electronic equipment. [Nitta 1992] 1. VLSI CAD programs Several VLSI CAD programs are being developed for use in logic simulation, routing, and placement. This system is aimed at developing various parallel algorithms and load balancing methods. 
As there are sequential programs which have similar functions to these programs, we can compare the performance of the PIM against that of conventional machines.

2. Genetic information processing programs
Sequence alignment programs for proteins and a protein folding simulation program are being developed. Research on an integrated database for biological data is also being carried out using Kappa.

3. A legal reasoning system
This system infers possible judgments on a crime using legal rules and past case histories. It uses the parallel theorem prover, MGTP, as the core of its rule-based reasoner. This system makes full use of important research results of this project, namely the PIM, PIMOS, MGTP, and high-level inference and knowledge representation techniques.

4. A Go game playing system
The search space of a Go game is too large for any exhaustive search method. For a human player, there are many textbooks showing typical sequences of stone placements, called "joseki" patterns. This system has some of the joseki patterns and some heuristic rules as its knowledge base with which to play against a human player. It aims to attain a 5 to 10 "kyu" level.

The applications we have described all employ symbol and knowledge processing. The parallel programs were written in KL1 in a short time. Particularly for the CAD and sequence alignment programs, the processing speed has improved almost proportionally to the number of element processors. However, as we can see in the Go playing system, which is a very sophisticated program, the power of the parallel inference system cannot always increase its intelligence effectively. This implies that we cannot yet effectively transcribe the "natural" knowledge bases written in textbooks on Go into data or rules of the system's "artificial" knowledge base in a way that would make the system "clever". We need to make more effort to find better program structures and better algorithms to make full use of the merits of parallel processing.

6 Evaluation of the parallel inference system
6.1 General purpose parallel programming environment
Practical problems in symbol and knowledge processing applications have been written efficiently in KL1, and solved quickly using a PIM which has several hundred element processors. The productivity of parallel software in KL1 has proved to be much higher than in any conventional language. This high productivity is apparently a result of using the automatic memory management mechanism and the automatic dataflow synchronization mechanism. Our method of specifying job division and load balancing has been evaluated and proved successful. KL1 programming takes a two-step approach. In the first step, a programmer writes a program concentrating only on the algorithm and a model. When the program is completed, the programmer adds the specifications for job division and load balancing using a notation called a "pragma" as the second step. This separation makes the programming work simple and productive. The specification of the KL1 language has been evaluated as practical and adequate for researchers. However, we realize that application programmers need a simpler and higher-level KL1 language specification which is a subset of KL1. In the future, several application-oriented KL1 language specifications should be provided, just as the von Neumann language set has a variety of different languages such as Fortran, Pascal and Cobol.
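To make the two-step style concrete, the sketch below separates the algorithmic step from the mapping step in plain Python. The function names and the round-robin placement policy are invented stand-ins for what a pragma would express in KL1; this is not the actual KL1 notation.

    # Step 1: pure algorithm - divide a job into independent sub-jobs,
    # with no mention of processors or nodes.
    def split_work(items, chunks):
        size = max(1, len(items) // chunks)
        return [items[i:i + size] for i in range(0, len(items), size)]

    # Step 2: mapping only - decide which node runs which sub-job.
    # Changing this policy does not touch the algorithm above.
    def place(subjobs, num_nodes):
        return [(i % num_nodes, job) for i, job in enumerate(subjobs)]

    jobs = split_work(list(range(20)), chunks=8)
    print(place(jobs, num_nodes=4))
    # [(0, [0, 1]), (1, [2, 3]), (2, [4, 5]), (3, [6, 7]), (0, [8, 9]), ...]

Because only the second function mentions nodes at all, the mapping can be revised without touching the algorithm, which is the point of the separation described above.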
6.2 Evaluation of KL1 and PIMOS
The functions of PIMOS, some of which are implemented as KL1 functions, have proved to be effective for running and debugging user programs on parallel hardware. The resource management and execution control mechanisms in particular work as we had expected. For instance, priority control of user processes permits programmers to use about 4,000 priority levels and enables them to write various search algorithms and speculative computations very easily. We are convinced that KL1 and PIMOS will be the best practical example for general-purpose parallel operating systems in the future.

6.3 Evaluation of hardware support for language functions
In designing the PIM hardware and the KL1 language processor, we thought it more important to provide a usable and stable platform with a sufficient number of element processors for parallel software experiments than to build many dedicated functions into the element processor. The only dedicated hardware support built into the element processor was the tag architecture. Instead, we added more support for the interconnection between element processors, such as message routing hardware and a coherent cache chip. We did not embed complex hardware support, such as the matching store of a dataflow machine or a content-addressable memory. We thought this risky because an implementation of such complex hardware would have a long turnaround time even with very advanced VLSI technology. We also considered that we would have to create new optimization techniques for a compiler dedicated to the embedded complex hardware support, and that this would not be easy either. The completion of the PIM hardware is now one year behind the original schedule, mainly because we had many unexpected problems in the design of the random logic circuits and in submicron chip fabrication. If we had employed a more complex design for the element processor, the PIM hardware would have been even further from completion.

6.3.1 Comparison of PIM hardware with commercially available technology
Rapid advances have been made in RISC processors recently. Furthermore, a few MIMD parallel machines which use a RISC processor as their element processor have started to appear on the market. When we began to design the PIM element processor, the performance of both RISC and CISC processors was as low as a few MIPS. At that time, a dedicated processor with tag architecture could attain better performance. However, some RISC processors have now attained more than 50 MIPS. It is interesting to evaluate these RISC processors for KL1 program execution speed. We usually compare the execution speed of a PIM element processor with that of a general-purpose microprocessor by regarding 1 LIPS as approximately equivalent to 100 IPS. This means that a 500 KLIPS PIM element processor should be comparable to a 50 MIPS microprocessor. However, the characteristics of KL1 program execution are very different from those of the usual benchmark programs for general-purpose microprocessors. The locality of memory access patterns for practical KL1 programs is lower than for standard programs. As the object code for a RISC instruction set has to be longer than for a CISC or dedicated instruction set processor, the cache miss ratio will be greater. Therefore, a simple comparison of the PIM element processor with recent RISC chips using announced peak performance is not meaningful. A practical implementation of the KL1 language processor on a typical RISC processor is necessary.
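The tag architecture mentioned above can be illustrated with a minimal sketch of tagged words. The tag assignment and bit width below are invented for the example; they are not the PIM's actual encoding.

    # A minimal sketch of the tagged-word idea: the low two bits of a
    # machine word distinguish the kind of value, so type dispatch can be
    # done cheaply.  The particular tags below are invented for illustration.
    TAG_BITS = 2
    TAG_INT, TAG_ATOM, TAG_REF, TAG_UNBOUND = 0, 1, 2, 3

    def make_word(tag, value):
        return (value << TAG_BITS) | tag

    def tag_of(word):
        return word & ((1 << TAG_BITS) - 1)

    def value_of(word):
        return word >> TAG_BITS

    w = make_word(TAG_INT, 42)
    assert tag_of(w) == TAG_INT and value_of(w) == 42
    # On a tagged architecture the type check happens alongside the operation;
    # on a conventional RISC it costs explicit masking and branching instructions.

On a tagged machine this dispatch is effectively free, while a stock RISC spends extra instructions on it, which is one reason the announced peak MIPS of a RISC chip does not translate directly into KLIPS.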
Most of the MIMD machines currently on the market lack a general parallel programming environment. Porting the KL1 language processor to them may allow them to support new scientific applications as well as symbol and knowledge processing applications. In future processor design, we believe that a general-purpose microprocessor should have tag architecture support as a part of its standard functions.

6.3.2 Evaluation of high-level programming overhead
Parallel programming in KL1 is very productive, especially for large-scale and complex problems. The control of job division and load balancing works well for hundreds of element processors. No conventional language is so productive. However, if we compare the processing speed of a KL1 program with that of a conventional language program with similar functions within a single element processor, we find that the KL1 overhead is not so small. This is a common trade-off between high-level and low-level programming. One straightforward method of compensating is to provide a simple subroutine call mechanism to link C language programs to KL1 programs. Another method is to improve the optimization techniques of compilers. This method is more elegant than the first. Further research on optimization techniques should be undertaken.

7 Conclusion
It is obvious that a general-purpose parallel programming language and environment are indispensable for solving practical problems of knowledge and symbol processing. A straightforward extension of conventional von Neumann languages will not allow the use of hundreds of element processors except for regular scientific calculations. We anticipated difficulties in the efficient implementation of the automatic memory management and synchronization mechanisms. However, this has now been achieved. The productivity and maintainability of KL1 are much higher than we expected. This more than compensates for the overhead of high-level language programming. Several experimental parallel application programs on the parallel inference system have proved that most large-scale knowledge processing applications contain potential parallelism. However, to make full use of this parallelism, we need more parallel algorithms and paradigms with which to actually program the applications. The research and development targets of this FGCS project have been achieved, especially as regards the parallel inference system. We plan to distribute the KL1 language processor and PIMOS as free software or public domain software, expecting that they will be ported to many MIMD machines and will provide a research platform for future knowledge processing technology.

Acknowledgment
The development of the FGCS prototype system was conducted jointly by many people at ICOT, cooperating manufacturers, and many researchers in many countries. The author would like to express his gratitude to all the people who have given us much advice and help for more than 10 years.

References
[Uchida 1987] S. Uchida, "Inference Machines in FGCS Project", TR 278, ICOT, 1987.
[Uchida et al. 1988] S. Uchida, K. Taki, K. Nakajima, A. Goto and T. Chikayama, "Research and Development of The Parallel Inference System in The Intermediate Stage of The Project", Proc. Int. Conf. on Fifth Generation Computer Systems, Tokyo, Nov. 28-Dec. 2, 1988.
[Goto et al. 1988] A. Goto, M. Sato, K. Nakajima, K. Taki, and A. Matsumoto, "Overview of the Parallel Inference Machine Architecture (PIM)", In Proc.
of the International Conference on Fifth Generation Computer Systems 1988, Tokyo, Japan, November 1988.
[Taki 1992] K. Taki, "Parallel Inference Machine, PIM", Proc. Int. Conf. on Fifth Generation Computer Systems, Tokyo, Jul. 1-5, 1992.
[Chikayama 1984] T. Chikayama, "Unique Features of ESP", In Proc. Int. Conf. on Fifth Generation Computer Systems 1984, ICOT, 1984, pp. 292-298.
[Chikayama 1992] T. Chikayama, "Operating System PIMOS and Kernel Language KL1", Proc. Int. Conf. on Fifth Generation Computer Systems, Tokyo, Jul. 1-5, 1992.
[Uchida et al. 1988] S. Uchida, "The Research and Development of Natural Language Processing Systems in the Intermediate Stage of the FGCS Project", Proc. Int. Conf. on Fifth Generation Computer Systems, Tokyo, Nov. 28-Dec. 2, 1988.
[Yokota et al. 1988] K. Yokota, M. Kawamura, and A. Kanaegami, "Overview of the Knowledge Base Management System (KAPPA)", Proc. Int. Conf. on Fifth Generation Computer Systems, Tokyo, Nov. 28-Dec. 2, 1988.
[Yokota and Nishio 1989] K. Yokota and S. Nishio, "Towards Integration of Deductive Databases and Object-Oriented Databases - A Limited Survey", Proc. Advanced Database System Symposium, Kyoto, Dec. 1989.
[Warren 1983] D.H.D. Warren, "An Abstract Prolog Instruction Set", Technical Note 309, Artificial Intelligence Center, SRI, 1983.
[Yokota and Yasukawa 1992] K. Yokota and H. Yasukawa, "Towards an Integrated Knowledge-Base Management System", Proc. Int. Conf. on Fifth Generation Computer Systems, Tokyo, Jul. 1-5, 1992.
[Clark and Gregory 1983] Keith L. Clark and Steve Gregory, "PARLOG: A Parallel Logic Programming Language", Research Report TR-83-5, Imperial College, March 1983.
[Aiba and Hasegawa 1992] A. Aiba and R. Hasegawa, "Constraint Logic Programming System", Proc. Int. Conf. on Fifth Generation Computer Systems, Tokyo, Jul. 1-5, 1992.
[Clark and Gregory 1984] K. L. Clark and S. Gregory, "Notes on Systems Programming in PARLOG", In Proc. Int. Conf. on Fifth Generation Computer Systems 1984, ICOT, 1984, pp. 299-306.
[Nitta 1992] K. Nitta, K. Taki, and N. Ichiyoshi, "Development of Parallel Application Programs of the Parallel Inference Machine", Proc. Int. Conf. on Fifth Generation Computer Systems, Tokyo, Jul. 1-5, 1992.
[Shapiro 1983] E. Y. Shapiro, "A Subset of Concurrent Prolog and Its Interpreter", TR 003, ICOT, 1983.
[Ueda 1986] K. Ueda, "Guarded Horn Clauses", In Logic Programming '85, E. Wada (ed.), Lecture Notes in Computer Science 221, Springer-Verlag, 1986, pp. 168-179.
[Ueda 1986] K. Ueda, "Introduction to Guarded Horn Clauses", TR 209, ICOT, 1986.
[Chikayama and Kimura 1987] T. Chikayama and Y. Kimura, "Multiple Reference Management in Flat GHC", In Proc. Fourth Int. Conf. on Logic Programming, MIT Press, 1987, pp. 276-293.
[Chikayama et al. 1988] T. Chikayama, H. Sato and T. Miyazaki, "Overview of the Parallel Inference Machine Operating System (PIMOS)", In Proc. Int. Conf. on Fifth Generation Computer Systems 1988, ICOT, 1988, pp. 230-251.

Parallel Inference Machine PIM
Kazuo Taki
First Research Laboratory, Institute for New Generation Computer Technology
4-28, Mita 1-chome, Minato-ku, Tokyo 108, JAPAN
taki@icot.or.jp

Abstract
The parallel inference machine, PIM, is the prototype hardware system in the Fifth Generation Computer Systems (FGCS) project.
The PIM system aims at establishing the basic technologies for large-scale parallel machine architecture, efficient kernel language implementation, and the many aspects of parallel software that are required for high-performance knowledge information processing in the 21st century. The PIM system also supports an R&D environment for parallel software, which must extract the full power of the PIM hardware.

The parallel inference machine PIM is a large-scale parallel machine with a distributed memory structure. The PIM is designed to execute a concurrent logic programming language very efficiently. The features of the concurrent logic language, its implementation, and the machine architecture are suitable not only for knowledge processing, but also for more general large problems that involve dynamic and non-uniform computation. Such problems have not been covered by commercial parallel machines and their software systems, which target scientific computation. The PIM system focuses on this new domain of parallel processing.

There are two purposes to this paper. One is to report an overview of the research and development of the PIM hardware and its language system. The other is to clarify and itemize the features and advantages of the language, its implementation and the hardware structure, with the view that these features are strong and indispensable for efficient parallel processing of large problems with dynamic and non-uniform computation.

1 Introduction
The Fifth Generation Computer Systems (FGCS) project aims at establishing basic software and hardware technologies that will be needed for high-performance knowledge information processing in the 21st century. The parallel inference machine PIM is the prototype hardware system and offers gigantic computation power to knowledge information processing. The PIM system includes an efficient language implementation of KL1, which is the kernel language and a unique interface between hardware and software.

[Figure 1: Overview of the PIM System - interfaces between layers: application programs; the KL1 language; PIMOS (PIMOS protocol); the KL1 parallel implementation; and the PIM hardware (KL1 machine language or microprogram).]

Logic programming was chosen as the common basis of research and development for the project. The primary working hypothesis was as follows: "Many problems of future computing, such as execution efficiency (of parallel processing), descriptive power of languages, software productivity, etc., will be solved dramatically with the total reconstruction of those technologies based on logic programming." Following this working hypothesis, R&D on the PIM system started from scratch with the construction of hardware, system software, a language system, application software and programming paradigms, all based on logic programming. Figure 1 gives an overview of the system structure.

The kernel language KL1 was first designed for efficient concurrent programming and parallel execution of knowledge processing problems. Then, R&D on the PIM hardware with distributed-memory MIMD architecture and the KL1 language implementation on it were carried out, both aiming at efficient KL1 execution in parallel. A machine with roughly 1,000 processors was the primary target. Each of these processors was to be a high-speed processor with hardware support for symbolic processing. The PIM system also focused on realizing a useful R&D environment for parallel software which could extract the real computing power of the PIM.
The preparation of a good R&D environment was an important project policy. KL1 is a concurrent logic programming language primarily targeting knowledge processing. Since the language had to be a common basis for various types of knowledge processing, it became a general-purpose concurrent language suitable for symbolic processing, without shifting to a specific reasoning mechanism or a certain knowledge representation paradigm. Our R&D led to the language features of KL1 being very suitable for covering the dynamic and non-uniform large problems that are not covered by commercial parallel computers and their software systems for scientific computation. Most knowledge processing problems are included in the problem domain of dynamic and nonuniform computation. The PIM hardware and the KL1 language implementation support the efficiency of the language features. Thus, the PIM system covers this new domain of parallel processing. This paper focuses on two subjects. One is the R&D report of the PIM hardware and the KL1language implementation on it. The other is to clarify and itemize the features and advantages of the language, its implementation and the hardware structure with the view th?-t .the features are strong and indispensable for efficient parallel processing of large problems with dynamic and nonuniform computation. Any parallel processing system targeting this problem domain must consider those features. Section 2 scans the R&D history of parallel processing systems in the FGCS project, with explanation of some of the keywords. Section 3 characterizes the PIM system. Many advantageous features of the language, its parallel implementation and hardware structure are described with the view that the features are strong and indispensable for efficient programming and execution of the dynamic and non-uniform large problems. Section 4 presents the machine architecture of PIM. Five different models have been developed for both research use and actual software development. Some hardware specifications are also reported. Section 5 briefly describes the language implementation methods and techniques, to give a concrete image of several key features of the KL1 implementation. Section 6 reports some measurements and evaluation mainly focusing on a low-cost implementation of small-grain concurrent processes and remote synchronization, which support the advantageous features of KLl. Overall efficiency, as demonstrated by a few benchmark programs, is shown, including the most recent measurements on PIM/m. Then, section 7 con- cludes this paper. Several important research issues of parallel software are reported in other papers: the parallel operating system PIMOS is reported in [Chikayama 1992] and the load balancing techniques controlled by software are reported in [Nitta et al. 1992]. 2 R&D History This section shows the R&D history of parallel processing systems in the FGCS project. Important research items and products of the R&D are described briefly, with explanations of several keywords. There are related reports for further information [Uchida 1992] [Uchida et al. 1988]. 2.1 Start of the Mainstream of R&D Mainstream of R&D of the parallel processing systems started at the beginning of the intermediate stage of the FGCS project, in 1985. Just before that time, a concurrent logiclanguage GHC [Ueda 1986] had been designed, which was chosen as the kernel language of the R&D. Language features will be described in section 3.4. 
Development of small hardware and software systems was started based on the kernel language GHC as a hardware and software interface. The hardware system was used as a testbed for parallel software research. Experience and evaluation results were fed back to the next round of R&D on larger hardware and software systems; this was the bootstrapping of R&D. It started with the development of the Multi-PSI [Taki 1988]. The purpose of the hardware development was not only architectural research on knowledge processing hardware, but also the preparation of a testbed for efficient implementation of the kernel language. The Multi-PSI was also intended to be a useful tool and environment for parallel software research and development. That is, the hardware was not just an experimental machine, but a reliable system developed in a short period, with measurement and debugging facilities for software development. After construction of the Multi-PSI/V1 and /V2 with their language implementations, various parallel programs and much technology and know-how of parallel software were accumulated [Nitta et al. 1992] [Chikayama 1992]. The systems have been used as the advanced software development environment for the parallel inference machines.

2.2 Multi-PSI/V1
The first hardware was the Multi-PSI/V1 [Taki 1988] [Masuda et al. 1988], which started operation in spring 1986. The personal sequential inference machine PSI [Taki et al. 1984], a result of the initial stage of the project, was used for the processing elements. Six PSI machines were connected by a mesh network which supported so-called wormhole routing. The first distributed implementation of GHC was built on it [Ichiyoshi et al. 1987] (a distributed implementation means a parallel implementation on distributed memory hardware). Execution speed was slow (1K LIPS; LIPS = logical inferences per second) because it was an interpreter system written in ESP (the system description language of the PSI). However, the basic algorithms and techniques of distributed implementation of GHC were investigated on it. Several small parallel programs were written and executed on it for evaluation, and preliminary experiments in load balancing were also carried out.

2.3 From GHC to KL1
Since GHC had only the basic functions that the kernel concurrent logic language had to support, language extensions were needed for the next, more practical system. The kernel language KL1 was designed with consideration of execution efficiency, operating system support, and some built-in functions [Ueda and Chikayama 1990] [Chikayama 1992]. An intermediate language, KL1-B, which is the target language of the KL1 compiler, was also designed [Kimura and Chikayama 1987]. In the Multi-PSI/V2 and one PIM model, the binary code of KL1-B is directly interpreted by microprogram; that is, KL1-B is the machine language itself. In the other PIM models, KL1-B code is converted into lower-level machine instruction sequences and executed by hardware.
Processors were connected by a two-dimensional mesh network with improved speed (10M bytes/s, full duplex in each channel). KL1-B was the machine language of the system, executed by microprogram. Almost all the runtime functions of KL1 were implemented in microprogram. The KL1 implementation was much improved in execution efficiency compared with the Multi-PSI/V1, reducing inter-processor communication messages, providing efficient garbage collection, and so on. It attained 130K LIPS (for KL1 append) in single-processor speed. Tables 1 to 4 include specifications of the Multi-PSI/V2. Since 1988, more than 15 systems, large systems with 64 processors and small ones with 32 or 16 processors, have been in operation for parallel software R&D at ICOT and in cooperating companies.

A powerful simulator of the Multi-PSI/V2 was also developed as a software development environment. It was called the pseudo Multi-PSI, and was available on the Prolog workstation PSI-II. A very special feature was made possible by the similarity of the PSI-II CPU to the processing element of the Multi-PSI/V2. Normally, the PSI-II executed the ESP language with a dedicated microprogram. However, it loaded the KL1 microprogram dynamically when the simulator system was activated. The simulator executed KL1 programs at a speed similar to that of a single Multi-PSI/V2 processor. Since PIMOS could also be executed on the simulator, programmers could use the simulator as an environment similar to the real Multi-PSI/V2, except for the speedup with multiple processors and process scheduling. The pseudo Multi-PSI was a valuable system for the initial debugging of KL1 programs.

2.5 Software Development on the Multi-PSI/V2
The parallel operating system PIMOS (the first version) and four small application programs (benchmark programs) [Ichiyoshi 1989] had been developed by FGCS'88. Much effort was put into PIMOS development to realize a good environment for programming, debugging, execution and measurement of parallel programs. In the development of the small application programs, several important research topics of parallel software were investigated, such as concurrent algorithms with large concurrency but without an increase of complexity, programming paradigms and techniques for efficient KL1 programs, and dynamic and static load balancing schemes for dynamic and non-uniform computation. PIMOS has been improved over several versions, and was ported to the PIM by 1992. The small application programs, pentomino [Furuichi et al. 1990], bestpath [Wada and Ichiyoshi 1990], PAX (a natural language parser) and tsume-go (a board game) were improved, measured and analyzed until 1989. They are still used as test and benchmark programs on the PIM. These developments gave the observation that the KL1 system on the Multi-PSI/V2 with PIMOS had reached a sufficient performance level for practical usage, and had realized sufficient functions for describing complex concurrent programs and for experiments in software-controlled load balancing. Several large-scale parallel application programs have been developed from late 1989 [Nitta et al. 1992], and this work is still continuing. Some of them have been ported to the PIM.
So, PIM took an ambitious R&D plan, focusing both on architectural research and on the realization of a software development environment. The first trial of the novel architecture was the multiple-cluster structure: a small number of tightly coupled processors with shared memory form a cluster, and many clusters are connected by a high-speed network to construct a PIM system with several hundred processors. The benefits of this architecture will be discussed in section 3.7. Many component technologies had to be developed or improved to realize the new system, such as parallel cache memory suitable for frequent inter-processor communication, high-speed processors for symbolic processing, improvements to the network, and so on. For R&D on better component technologies and their combinations, a development plan of five PIM models was made, so that different component architectures and their combinations could be investigated, assigning an independent research topic or role to each model.

Two models, PIM/p [Kumon et al. 1992] and PIM/c [Nakagawa et al. 1992], took the multi-cluster structure. They include several hundred processors, a maximum of 512 in PIM/p and 256 in PIM/c. They were developed both for architectural research and for software R&D; each investigated a different network architecture and processor structure. Two other models, PIM/k [Sakai et al. 1991] and PIM/i [Sato et al. 1992], were developed for experiments on intra-cluster architecture. A two-layered coherent cache memory which enables a larger number of processors in a cluster, a broadcast-type coherent cache memory, and a processor with an LIW-type instruction set were tested. The remaining model, PIM/m [Nakashima et al. 1992], did not take the multi-cluster structure, but focused on strict compatibility with the Multi-PSI/V2, with improved processor speed and a larger number of processors. The maximum number of processors will be 256. The performance of a processor will be four to five times greater at peak speed, and 1.5 to 2.5 times greater on average, than that of the Multi-PSI/V2. The processor is similar to the CPU of the PSI-UX, the most recent version of the PSI machine. A simulator, the pseudo-PIM/m, was also prepared, like the pseudo Multi-PSI. Among the models, PIM/m is mainly targeted as the parallel software development machine. The architecture and specifications of each model will be reported in section 4.

Experimental implementations of some LSIs for these models started in 1989. The final design was almost fixed in 1990, and manufacturing of the whole systems proceeded in 1991. From 1991 to spring 1992, assembly and testing of the five models was carried out.

2.6.2 Software Compatibility
The KL1 language is common to all five PIM models. Except for execution efficiency, any KL1 program, including PIMOS, can run on all the models. The hardware architecture differs between two groups, the Multi-PSI and PIM/m on one side and the other PIM models on the other. However, from the programmer's view, the abstract architectures are designed to be similar, as follows. On the Multi-PSI and the PIM/m, load allocation to processors is fully controlled by programs. It is sometimes written by programmers directly, and sometimes specified through load allocation libraries; the programmers are often researchers of load balancing techniques. On the other hand, on the PIM models with the multi-cluster structure, load balancing within a cluster is completely controlled by the KL1 runtime system (not by KL1 programs).
That is, programmers do not have to think about the multiple processors in a cluster, but only specify load allocation to each cluster in their programs. This means that a processor of the Multi-PSI or PIM/m corresponds to a cluster of the PIM models with the multi-cluster structure, which simplifies porting of KL1 programs.

2.7 KL1 Implementation for PIM
The KL1 system must be the first regular system in the world that can execute large-scale parallel symbolic processing programs very efficiently. Execution mechanisms and algorithms for the KL1 language had already been developed sufficiently for distributed memory architectures on the Multi-PSI/V2. Some mechanisms and algorithms had to be extended for the multi-cluster architecture of the PIM. Ease of porting the KL1 system to the four different PIM models was also considered in the language implementation method. Only the PIM/m inherited the KL1 implementation method directly from the Multi-PSI/V2. To extend the execution mechanisms and algorithms to suit the multi-cluster architecture, several technical topics were focused on, such as avoiding data-update contention among processors in a cluster, automatic load balancing in a cluster, extension of the inter-cluster message protocol to allow message outstripping, parallel garbage collection in a cluster, etc. [Hirata et al. 1992].

For ease of porting the KL1 system to the four different PIM models, a common specification of the KL1 system, "VPIM (virtual PIM)", was written in the C-like description language "PSL", targeting a common virtual hardware. VPIM was the executable specification of the KL1 execution algorithms; it was translated into C and executed to examine the algorithms. VPIM has been translated into lower-level machine languages or microprograms, automatically or by hand, according to each PIM structure. Preparation of the description language started in 1988. Study of efficient execution mechanisms and algorithms continued until 1991, when VPIM was completed. Porting of VPIM to the four PIM models partially started in autumn 1990 and continued until spring 1992. Now, the KL1 system with PIMOS is available on each PIM model. The KL1 system on the PIM/m, on the other hand, which is implemented in microprogram, was made by converting the Multi-PSI/V2 microprogram by hand or partially by automatic translation. Before the other PIM models, PIM/m started operation with the KL1 system and PIMOS in summer 1991.

2.8 Performance and System Evaluation
Measurements, analysis, and evaluation should be done at the various levels of the system shown below.
1. Hardware architecture and implementations
2. Execution mechanisms and algorithms of the KL1 implementation
3. Concurrent algorithms of applications (algorithms for problem solving, independent of mapping) and their implementations
4. Mapping (load allocation) algorithms
5. Total system performance of a certain application program on a certain system

Various works have been done on the Multi-PSI/V2. Levels 1 and 2 were reported in [Masuda et al. 1988] and [Nakajima 1992]; levels 3 to 5 were reported in [Nitta et al. 1992], [Furuichi et al. 1990], [Ichiyoshi 1989] and [Wada and Ichiyoshi 1990]. Preliminary measurements have just started on each PIM model. Some intermediate results are included in [Nakashima et al. 1992] and [Kumon et al. 1992]. A total evaluation of the PIM system will be done in the near future; however, some observations and discussions are included in section 6.
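As a reminder of what the LIPS figures in these measurements count, the toy Python harness below counts goal reductions per second for a list append. The absolute number is meaningless and this is not how the real measurements were made; each recursive call simply stands in for one reduction of an append goal.

    import time

    def append(xs, ys, count):
        count[0] += 1                 # one "reduction" of the append goal
        return ys if not xs else [xs[0]] + append(xs[1:], ys, count)

    count = [0]
    start = time.perf_counter()
    for _ in range(1000):
        append(list(range(100)), [], count)
    elapsed = time.perf_counter() - start
    print(f"{count[0] / elapsed:.0f} reductions/second (toy LIPS)")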
3 Characterizing the PIM and KL1 System
The PIM and KL1 system have many advantageous features for very efficient parallel execution of large-scale knowledge processing, which often shows very dynamic runtime characteristics and non-uniform computation, much different from numerical applications on vector processors and SIMD machines. This section briefly clarifies the characteristics of the targeted problem domain, and describes the various advantageous features of the PIM and KL1 system that are dedicated to efficient programming and processing in that problem domain. They give the total system image and help to clarify the differences and similarities between this system and other large-scale multiprocessors recently available on the market.

3.1 Summary of Features
The total image of the PIM and KL1 system is briefly scanned as follows. Detailed features, their benefits, and the reasons why they were chosen are presented in the following sections.

Distributed memory MIMD machine: The global structure of the PIM is a distributed memory MIMD machine in which hundreds of computation nodes are connected by a high-speed network. Scalability and ease of implementation are the focus. Each computation node includes a single processor or several tightly coupled processors, and a large memory. The processors are dedicated to efficient symbolic processing.

Logic programming language: The kernel language KL1 is a concurrent logic programming language, a single language for both system and application descriptions. The language implementation and hardware design are based on the language specification. KL1 is not a high-level knowledge representation language nor a language for a certain type of reasoning, but a general-purpose language for concurrent and parallel programming, especially suitable for symbolic computation. KL1 has many beneficial features for writing parallel programs in those application domains, described below.

Application domain: The primary applications are large-scale knowledge processing and symbolic computation. However, large numerical computations with dynamic features, or with non-uniform data and non-uniform computation (non-data-parallel computation), are also targeted.

Language implementation: One KL1 system is implemented on the distributed memory hardware; it is not a collection of many KL1 systems implemented on each processing node. A global name space is supported for code, logical variables, etc. Communication messages between computation nodes are handled implicitly by the KL1 system, not by KL1 programs. An efficient implementation of small-grain concurrent processes is adopted. These implementation choices focus on realizing the beneficial features of the KL1 language for the application domains described before.

Policy of load balancing: Load balancing between computation nodes should be controlled by KL1 programs, not by hardware nor by the language system automatically. The language system has to support sufficient functions and efficiency for experiments with various load-balancing schemes in software.

3.2 Basic Choices
(1) Logic programming: The first choice was to adopt logic programming as the basis of the kernel language. The decision was mainly due to the insights of the ICOT founders, who expected that logic programming would be suitable for both knowledge processing and parallel processing. The history, from vague expectations of logic programming to the concrete design of the KL1 language, is explained in [Chikayama 1992].

(2) Middle-out approach: A middle-out approach to R&D was taken, placing the KL1 language as the central layer. Based on the language specification, design of the hardware and the language implementation proceeded downward, and writing of the PIMOS operating system and parallel software proceeded upward.

(3) MIMD machine: The other choices concerned the basic hardware architecture. Dataflow architectures before the mid-1980s were considered not to provide enough performance for their hardware cost, according to observations of research results in the initial stage of the project. SIMD architecture seemed inefficient for applications with dynamic characteristics or low data-parallelism, which are often seen in knowledge processing. MIMD architecture remained without major demerits and was the most attractive from the viewpoint of ease of implementation with standard components.

(4) Distributed memory structure: A distributed memory structure is suitable for constructing very large systems, and is easy to implement. Recent large-scale shared memory machines with directory-based cache coherency mechanisms claim good scalability. However, when the block size (the coherency management unit) is large, inter-processor communication with frequent small data transfers seems inefficient, and KL1 programs require frequent small data transfers. When the block size becomes small, a large directory memory is needed, which increases the hardware cost. In addition, single-assignment languages need special memory management, such as dynamic memory allocation and garbage collection. This management should be done as locally as possible for the sake of efficiency. Local garbage collection requires separation of local and global address spaces with some indirect referencing mechanism or address translation, even in a scalable shared memory architecture. The merits of low-cost communication in a shared memory architecture decrease significantly in such a case. These are the reasons for choosing the distributed memory structure.

3.3 Characterizing the Applications
(1) Characterization: The characteristics of knowledge processing and symbolic computation are often much different from those of numerical computation on vector processors and SIMD machines. Problem formalizations for those machines are usually based on data-parallelism, that is, parallelism from regular computation on uniform data. However, the characteristics of knowledge and symbolic computation on parallel machines tend to be very dynamic and non-uniform. The contents and amount of computation vary dynamically in time and space. For example, when a heuristic search problem is mapped onto a parallel machine, the workload of each computation node changes drastically depending on the expansion and pruning of the search tree. Also, when a knowledge processing system is constructed from many heterogeneous objects, each object gives rise to non-uniform computation. The computational loads of these problems can hardly be estimated before execution. Some classes of large numerical computation without data-parallelism also show these dynamic and non-uniform characteristics. Problems with such dynamism and non-uniformity of computation are called the dynamic and non-uniform problems in this paper, covering not only knowledge processing and symbolic computation but also large numerical computation without data-parallelism. The dynamic and non-uniform problems tend to involve programs with more complex structure than data-parallel problems.
(2) Requirements for the system: Most of the software systems on recent commercial MIMD machines with hundreds of processors target data-parallel computation, and pay little attention to other paradigms. The dynamic and non-uniform problems raise new requirements, mainly on software systems and a few on hardware systems, which are listed below.
1. Descriptive power for complex concurrent programs
2. Ease of removing bugs
3. Ease of dynamic load balancing
4. Flexibility for changing the load allocation and scheduling schemes, to cope with the difficulty of estimating actual computation loads before execution

3.4 Characterizing the Language
This subsection itemizes several advantageous features of KL1 that satisfy the requirements listed in the previous section. Features and characteristics of the concurrent logic programming language KL1 are described in detail in [Chikayama 1992]. The first three features were already present in GHC, the basic specification of KL1. These features give the language enough descriptive power to write complex concurrent programs. They are features of concurrent programming, describing logical concurrency independent of the mapping to actual processors.

(1) Dataflow synchronization: Communication and synchronization between KL1 processes are performed entirely implicitly, within the framework of ordinary unification. It is based on the dataflow model. This implicitness is available even for remote synchronization. The feature drastically reduces bugs of synchronization and communication compared with explicit description using separate primitives. The single-assignment property of logic variables supports this feature.

(2) Small-grain concurrent processes: The unit of concurrent execution in KL1 is each body goal of a clause, which can be regarded as a process invocation. KL1 programs can thus involve a large amount of concurrency implicitly.

(3) Indeterminacy: A goal (or process) can test and wait for the instantiation of multiple variables concurrently. The first instantiation resumes the goal execution, and when a clause is committed (selected from the clauses that succeed in executing their guard goals), the other wait conditions are thrown away. This function is valuable for describing "non-rigid" processing within the framework of a side-effect-free language. Speculative computation can be dealt with, and dynamic load distribution can also be written.
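The implicit, single-assignment communication of feature (1) can be mimicked in a conventional language. The Python sketch below is only an analogy under invented names - a write-once cell on which a consumer suspends - and says nothing about how the KL1 runtime actually implements suspension.

    import threading

    class LogicVar:
        """Write-once cell: bind() may be called once; read() blocks until bound."""
        def __init__(self):
            self._event = threading.Event()
            self._value = None

        def bind(self, value):
            assert not self._event.is_set(), "single-assignment violated"
            self._value = value
            self._event.set()          # wakes every consumer suspended on this variable

        def read(self):
            self._event.wait()         # suspension is the synchronization
            return self._value

    x = LogicVar()
    consumer = threading.Thread(target=lambda: print("consumer got", x.read()))
    consumer.start()                   # suspends until the producer binds x
    x.bind([1, 2, 3])                  # producer side of the shared variable
    consumer.join()

The consumer never issues an explicit receive; suspending on the unbound variable is the synchronization, which is why separate communication primitives are unnecessary in KL1.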
The next features were included in KL1 as extensions to GHC. (4) was introduced to describe mapping (load allocation) and scheduling; these are features for parallel programming, used to control actual parallelism among processing nodes. (5) is provided for operating system support. (6) is for the efficiency of practical programs.

(4) Pragma: A pragma is a notation to specify the allocation of goals to processing nodes or to specify the execution priority of goals. A pragma does not affect the semantics of a program, but controls the parallelism and efficiency of actual parallel execution. Pragmas are usually attached to goals after making sure that the program is correct. They can be changed very easily because they are syntactically separated from the correctness aspect of a program.

Pragma for load allocation: Goal allocation is specified with a pragma, @node(X). X can be calculated in programs. Coupled with (1) and (2), the load allocation pragma can realize very flexible load allocation. Coupled with (3) and the pragma, KL1 can also describe dynamic load balancing programs within the framework of a pure logic programming language without side-effects. Dynamic load balancing programs are hard to write in pure functional languages without indeterminacy.

Pragma for execution priority: Execution priority is specified with a pragma, @priority(Y). Thousands of priority levels are supported to control goal scheduling in detail, without rigid ordering. The combination of (3) and the priority pragma realizes efficient control of speculative computations. The large number of priority levels can be utilized, for example, in parallel heuristic search to expand the good branches of the search tree first.

(5) Shoen function (meta-control for a goal group): The shoen function is designed to handle a set of goals as a task, a unit of execution and resource management. It is mainly used in PIMOS. Starting, stopping and aborting tasks can be controlled. A limit on resource consumption can be specified. When errors or exceptional conditions occur, the status is frozen and reported outside the shoen.

(6) Functions for efficiency: KL1 has several built-in functions and data types whose semantics is understood within the framework of GHC but which have been provided for the sake of efficiency. These functions hide the demerits of side-effect-free languages, and also avoid an increase of computational complexity compared with sequential programs.

3.5 Characterizing the Language Implementation
The language features just described satisfy the requirements placed on a system by the dynamic and non-uniform problems discussed in section 3.3. Most of the special features of the language implementation focus on strengthening those advantageous features of the KL1 language.

(1) Implicit communication: Communication and synchronization among concurrent processes are implicitly done by unifications on shared logical variables. They are supported both within a computation node and between nodes. It is especially beneficial that remote synchronization is done implicitly, just like local synchronization. A process (goal) can migrate between computation nodes merely by being given a pragma, @node(X). When the process has reference pointers, remote references are generated implicitly between the computation nodes. The remote references are used for remote synchronization and communication. These functions hide the distributed memory hardware from the "concurrent programming". That is, programmers can design concurrent processes and their communication independently of whether they are allocated to the same computation node or to different nodes. Only the "parallel programming" with pragmas, the design of load allocation and scheduling, has to be concerned with the hardware structure and network topology. Implementation features of these functions are summarized below, including features for efficiency.
• A global name space on distributed memory hardware, in which implicit pointer management among computation nodes is supported for logical variables, structured data and program code
• Implicit data transfer caused by unification and goal (process) migration
• Implicit message sending and receiving invoked by data transfer and goal sending, including message composition and decomposition
• Message protocols able to reduce the number of messages, and protocols applicable to message outstripping

(2) Small-grain concurrent processes: An efficient implementation of small-grain concurrent processes is realized, coupled with low-cost communication and synchronization among them.
Process scheduling with low-cost suspension and resumption, and priority management, are supported. The efficient implementation allows the actual use of a large number of small-grain processes to realize large concurrency. A large number of processes also gives flexibility for mapping and load balancing. Automatic load balancing within a cluster is also supported. It is a process (goal) scheduling function within a cluster, implemented with priority management. This feature hides the multiprocessors in a cluster from programmers: they do not have to think about load allocation within a cluster, but only have to prepare enough concurrency.

(3) Memory management: The following garbage collection mechanisms are supported.
• A combination of incremental garbage collection, based on a subset of reference counting, with stop-and-collect copying garbage collection
• Incremental release of remote reference pointers between computation nodes with a weighted reference counting scheme

Dynamic memory management, including garbage collection, is essential both for symbolic processing and for parallel processing of the dynamic and non-uniform problems, because the single-assignment property, strongly needed for these problems, requires dynamic memory allocation and reclamation. The efficiency of the garbage collectors is one of the key features of a practical language system for parallel symbolic processing.

(4) Implementation of the shoen function: A shoen represents a group of goals (processes), as presented in the previous subsection. The shoen mechanism is implemented not only within a computation node but also among nodes. Namely, the processes of a task can be distributed among computation nodes and still be controlled all together with the shoen functions.

(5) Built-in functions for efficiency: Several built-in functions and data types are implemented to keep up with the efficiency of sequential languages.

(6) Inclusion of kernel functions: Figure 2 shows the relation of the KL1 implementation to operating system functions. The KL1 implementation includes so-called OS kernel functions such as memory management, process management and scheduling, communication and synchronization, a virtual single name space, and message composition and decomposition, while PIMOS includes the upper OS functions such as the programming environment and user interface. The reason why the OS kernel functions are included in the KL1 implementation is that the implementation needs to use those functions at as light a cost as possible. The cost of those functions directly affects the execution efficiency of the advantageous features of the KL1 language, such as the large number of small-grain concurrent processes, implicit synchronization and communication among them (even between remote processes), indeterminacy, scheduling control with a large number of priority levels, process migration specified with pragmas, and so on.

[Figure 2: KL1 Implementation and OS Functions. PIMOS (above the KL1 language interface) provides load distribution libraries, utility programs (e.g. shell), the programming environment (e.g. compiler, tracer, performance analyzer), program code management, user task management, and resource management (e.g. I/O resources). The KL1 parallel implementation (above the PIM hardware) provides the OS kernel functions: memory management, process management, communication, synchronization and scheduling, a single name space on a distributed memory system, and network message composition and decomposition.]

Those features are indispensable for concurrent and parallel programming and for efficient parallel execution of large-scale symbolic computation with dynamic characteristics, or of large-scale non-data-parallel numerical computation. Considering the construction of a similar-purpose parallel processing system on a standard operating system, the interface level to the OS kernel may be too high (or may incur too much overhead). Some reconstruction of the OS implementation layers might be needed before standard parallel operating systems can support such large-scale computation with dynamic characteristics.

3.6 Policy of Load Balancing

Multiple inheritance is defined upward and downward as the merging of constraints:
o1 ⊑ o2, o1 ⊑ o3, o2/[l → a], o3/[l → b] ⟹ o1/[l → meet(a, b)]
o1 ⊒ o2, o1 ⊒ o3, o2/[l → a], o3/[l → b] ⟹ o1/[l → join(a, b)]
where the resulting set of constraints is reduced by the constraint solver. The right-hand side of "||" is a set of constraints about properties, where ⊑H and ⊒H are the partial orders generated by the Hoare ordering from ⊑ and ⊒, respectively, and =H is the corresponding equivalence relation. If an attribute with label l is not specified for an object o, o is considered to have the property l without any constraint. The semantics of oids is defined on a set of labeled graphs, as a subclass of hypersets [Aczel 1988]: an oid is mapped to a labeled graph, and an attribute is related to a function on the set of labeled graphs. In this sense, attributes can be considered methods of an object, as in F-logic [Kifer and Lausen 1989]. The reason for adopting hyperset theory as the semantic domain is to handle infinite data structures. The details can be found in [Yasukawa et al. 1992].

2.3 Program and Database
A module concept is introduced in order to classify knowledge and to handle (local) inconsistencies. Let m be a module identifier (mid), syntactically the same as an o-term, and a be an a-term; then m : a is a proposition, which means that m supports a. Given a mid m, an a-term a, and propositions p1, ..., pn, a rule is defined as follows:
m :: a ⇐ p1, ..., pn.
which means that the module with mid m has a rule such that if p1, ..., pn hold, then a holds in the module with mid m. If a mid is omitted in pi, m is taken as the default, and if m is omitted, the rule holds in all modules. a is called the head and p1, ..., pn the body. As an a-term can be separated into an o-term and a set of constraints, the rule can be rewritten as follows:
m :: o || CH ⇐ m1 : o1, ..., mn : on || CB
where a = o || CH, pi = mi : oi || Ci, and CB = C1 ∪ ... ∪ Cn. CH is called the head constraint and CB the body constraint; their domain is a set of labeled graphs. Note that constraints introduced by a-terms in a body can be included in CB. Compared with conventional constraint logic programming, the head constraint is new.

A module is defined as a set of rules with the same mid. We define an acyclic relation among modules, the submodule relation, which works for rule inheritance as follows:
m1 ⊒s m2
m3 ⊒s m4 ∪ (m5 \ m6)
where m1 inherits the set of rules in m2 (intrinsic attributes are excluded from inheritance), and m3 inherits the set of rules defined by set operations such as m4 ∪ (m5 \ m6). Set operations such as intersection and difference are evaluated syntactically.
Even if a module is parametric, that is, if the mid is an o-term with variables, the submodule relation can be defined. In order to treat exceptions to rule inheritance, each rule has properties such as local and overriding: a local rule is not inherited by other modules, and an overriding rule obstructs the inheritance of rules with the same head from other modules.

A program or a database is defined as a set of rules together with definitions of the subsumption and submodule relations. Clearly, a program can also be considered as a set of modules, where an object may have different properties if it exists in different modules. Therefore, we can classify a knowledge base into different modules and define a submodule relation among them. If a submodule relation is not defined between two modules, even transitively, an object with the same oid may have different (or even inconsistent) properties in those modules. The semantics of a program is defined on the domain of pairs of labeled graphs corresponding to a mid and an o-term. In this framework, we can classify a large-scale knowledge base, which might have inconsistencies, and store it in a QUIXOTE database.

2.4 Updating and Persistence
QUIXOTE has a concept of nested transactions and allows two kinds of database update: 1) incremental insertion of a database when issuing a query, and 2) dynamic insertion and deletion of o-terms and a-terms during query processing.

We can issue a query together with a new database to be added to the existing database; 1) corresponds to this case. For example, consider the following sequence of queries to a database DB:

query sequence to DB            equivalent query
?- begin_transaction.
?- Q1 with DB1.                 ⟺  ?- Q1 to DB ∪ DB1
?- begin_transaction.
?- Q2 with DB2.                 ⟺  ?- Q2 to DB ∪ DB1 ∪ DB2
?- abort_transaction.
?- Q3 with DB3.                 ⟺  ?- Q3 to DB ∪ DB1 ∪ DB3
?- Q4.                          ⟺  ?- Q4 to DB ∪ DB1 ∪ DB3
?- end_transaction.

After successful execution of the above sequence, DB is changed to DB ∪ DB1 ∪ DB3. Each DBi may have definitions of a subsumption relation or a submodule relation, which are merged into the definitions of the existing database; if necessary, the subsumption or submodule hierarchy is reconstructed. By rolling back a transaction, this mechanism can also be used for hypothesis reasoning.
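The bookkeeping behind the query sequence above can be sketched in a few lines of Python. Databases are modelled as plain sets and the class name is invented; this shows only the nesting and rollback of transactions, not QUIXOTE's evaluator or its merging of subsumption and submodule definitions.

    class Session:
        def __init__(self, db):
            self.stack = [set(db)]                 # stack of visible databases

        def begin(self):
            self.stack.append(set(self.stack[-1])) # nested transaction

        def add(self, delta):                      # the "with DBi" part of a query
            self.stack[-1] |= set(delta)

        def query_base(self):                      # database a query Qi runs against
            return self.stack[-1]

        def abort(self):
            self.stack.pop()                       # roll back the innermost level

        def commit(self):
            top = self.stack.pop()
            self.stack[-1] = top                   # merge into the enclosing level

    DB, DB1, DB2, DB3 = {"r0"}, {"r1"}, {"r2"}, {"r3"}
    s = Session(DB)
    s.begin(); s.add(DB1)          # ?- Q1 with DB1  -> runs against DB u DB1
    s.begin(); s.add(DB2)          # ?- Q2 with DB2  -> runs against DB u DB1 u DB2
    s.abort()                      # ?- abort_transaction
    s.add(DB3)                     # ?- Q3 with DB3  -> runs against DB u DB1 u DB3
    s.commit()                     # ?- end_transaction
    print(s.query_base() == DB | DB1 | DB3)        # True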
In order to guarantee the semantics of update, so-called AND- and OR-parallel executions are inhibited. For example, the following is a simple rule for updating an employees' salary: ¢:: X ~ 3n. lot[num=X)j[prizel ~ c) ¢:: X ~ 5n. where 2n is a type with a multiple of two. Given a query ?-lot[num = 30)j[prizel = X,prize2 = YJ, the answer is X ~ meet(a, c) and Y ~ b, that is, lot[num=30)/[prizel ~ meet(a, c), prize2 ~ b). First, because of the existence of oids, all rules which possibly have the same oid must be evaluated and merged if necessary. Therefore, in QUIXOTe, a query is always processed in order to obtain all solutions. Secondly, as a rule in QUIXOTe has two kinds of constraints, a head constraint and a body constraint, each of which consists of equations and inequations of dotted terms besides the variable environment, the derivation process is different from conventional constraint logic programming: where ·Gi is a set of sub goals and Ci is a set of constraints of the related variables. On the other hand, in QUIXOTe, each node in the derivation sequence is (G, A, C), where G is a set of subgoals, A is a set of assumptions consisting of a body constraint of dotted terms, and C is a set of conclusions as a set of constraints consisting of a head constraint and a variable environment. Precisely speaking, the derivation is not a sequence but a directed acyclic graph in 94 QUIXOTe, because some subsumption relation among assumptions and constraints might force the two sequences to merge: for example, (G,A,C) and (G,A,C') are merged into (G, A, CUC'). Therefore, the derivation is shown in Figure 4, where the environment to make (Go, Ao, 0) ~ ... ......... • Users can use databases through their application programs in ESP [Chikayama 1984] or KLI [Ueda and Chikayama 1990], and through the specific window interface called Qmacs. The environment is shown in Figure 5. The first version of QUIXOTe was released in December, 1991. A second version was released in April, 1992. Both versions are written in KL1 and work on parallel inference machines (PIMs) [Goto et al. 1988] and its operating system (PIMOS) [Chikayama et al. 1988}. 3 Advanced Database Management System (Kappa) In order to process a large database in QUIXOTe efficiently, a database engine called Kappa has been developed 5. In this section, we explain its features. 3.1 Figure 4: Derivation in QUIXOTe it possible to merge two sequences is restricted: only results by the, so-called, OR-parallel that includes rules inherited by subsumption relation among rule heads can be merged innermostly. The current implementation of query processing in QUIXOTe is based on a tabular method such as OLDT in order to obtain all solutions. Sideways information passing is also implemented by considering not only binding information but also property inheritance. We list some features of the QUIXOTe system: • A QUIXOTe program is stored in persistent storage in the form of both the 'source' code and the 'object' code, each of which consists of four parts: control information, subsumption relation, submodule relation, and a set of rules. Persistence is controlled by the persistence manager, which switches where programs should be stored. A set of rules in the 'object' code is optimized to separate extensional and intensional databases as in conventional deductive databases. • When a user builds a huge database in QUIXOTe, it can be written as a set of small databases independently of a module concept. 
These can be gathered into one database, that is, a database can be reused in another database. • vVhen a user utilizes data and knowledge III QUIXOTe, multiple databases can be accessed simultaneously through the QUIXOTe server, although the concurrency control of the current version of QUIXOTe is simply implemented. Nested Relation and QuIXOT£ The problem is which part of QUIXOTe should be supported by a database engine because enriched representation is a trade-off in efficient processing. vVe intend for the database engine to be able to, also, play the role of a practical database management system. Considering the various data and knowledge in our knowledge information processing environment, we adopt an extended nested relational model, which corresponds to the class of an o-term without infinite structure in QUIXOTe. The term "extended" means that it supports a new data type such as Prolog term and provided extensibility as the system architecture for various applications. The reason why we adopt a nested relational model is, not surprisingly, to achieve efficient representation and efficient processing. Intuitively, a nested relation is defined as a subset of a Cartesian product of domains or other nested relations: NR ~ Ei .. - EI x··· x En D 12NR where D is a set of atomic values. That is, the relation may have a hierarchical structure and a set of other relations as a value. This corresponds to the introduction of tuple and set constructors. From the viewpoint of syntactical and semantical restrictions, there are various subclasses. Extended relational algebra are defined to each of these. In Kappa's nested relation, a set constructor is used only as an abbreviation of a set of normal relations as follows: {r[ll =a, /2= {b l ,"', bn }]} {=:::::} {r[II = a, 12 = bl ], ... ,r[ll = a, 12 = bn ]} -----------------5S ee the details in [Kawamura et al. 1992]. 95 KLI Program ESP Program Qmacs currently active Applications on PIM or FEP (PSI) Figure 5: Environment of The operation of "=>" corresponds to an unnest operation, while the opposite operation ("~") corresponds to a nest or group-by operation, although "~" is not necessarily congruent for application of nest or groupby operation sequences, That is, in Kappa, the semantics of a nested relation is the same as the coresponding relation without set constructors. The reason for taking such semantics is to retain first order semantics for efficient processing and to remain compatible to widely used relational databases. Given a nested tuple nt, let the corresponding set of tuples without a set constructor be nt. Let a nested relation be NR= {ntl,···,nt n } where nti= {til'···' tid for i = 1,···, n, then the semantics of N R is n Unti = {t ll , · · · , t 1 k,···, t nl ,···, tnk}. i=l Extended relational algebra to this nested relational database is defined in Kappa and produces results according to the above semantics, which guarantees to produce the same result to the corresponding relational database, except for treatment of the label hierarchy. A query can be formulated as a first order language, we, generally, consider this in the form of a rule constructed by nested tuples. As the relation among facts in a database is conjunctive from a proof-theoretic point of view, the semantics of a rule is clear according t.o the above semantics. For example, the following rule r[ll =X, 12 = {a, b, c}] ~ B, 1"[12= Y, 13 = {d, e}, 13= Zl, B'. 
can be transformed into the following set of rules without set constructors: QUIXOTE r[11=X,12=a] ~ B, r'[1 2= Y, 13 = d, 13 = Zl, r'[12 = Y, 13 = e, 13 = Z], B'. r[11=X,12=b] ~ B, r'[1 2= Y, 13=d, 13= Z], r'[12 = Y, 13 =e, 13= Z], B'. r[11=X,1 2=c] ~ B, r'[l2 = Y, l3= d, /3= Z], r'[12 = Y, 13 = e, l3= Z], B'. That is, each rule can also be unnested. The point of efficiently processing Kappa relations is to reduce the number of unnest and nest operations: that is, to process sets as directly as possible. Under the semantics, query processing to nested relations is different from conventional procedures in logic programming. For example, consider a simple database consisting of only one tuple: r[lt = {a,b},12 = {b,c}]. For a query ?-r[ll = X, /2 = X], we can get X = {b}, that is, an intersection of {a, b} and {b, c}. That is, a concept of unification should be extended. In order to generalize such a procedure, we must introduce two concepts into the procedural semantics[Yokota 1988]: 1) Residue Goals Consider the following program and a query: r[I=5'] ~ B. ?-r[l = 5]. If 5 n 5' is not an empty set during unification between r[l = 5] and r[l = 5'], new subgoals are to be r[l = 5 \ 5'], B. That is, a residue subgoal r[l = 5 \ 5'] is generated if 51 \ 52 is not an empty set, otherwise the unification fails. Note that there might be residue subgoals if there are multiple set values. 2) Binding as Constraint Consider the following database and a query: rl[l1=5d· 96 1'2[12= S2]' ?-rd!l = X], 1'2 [12 = X]. Although we can get X = Sl by unification between rdl1 = Xl and rdh = Sl] and a new subgoal 1'2[12 = SIl, the subsequent unification results in r2[l2=SlnS2] and a residue subgoal r2[l2= Sl \S2]' Such a procedure is wrong, because we should have an answer X = Sl n S2. In order to avoid this situation, the binding information is temporary and plays the role of constraints to be retained: rdl1 = X], 1'2[12 = X] ==? r2[l2 = X]II{X C Sd ==? II{X c Sl n Sd· There remains one problem where the unique representation of a nested relation is not necessarily decided in the Kappa model, as already mentioned. In order to decide a unique representation, each nested relation has a sequence of labels to be nested in Kappa. As the procedural semantics of extended relational algebra in Kappa is defined by the above concepts, a Kappa database does not necessarily have to be normalized also in the sense of nested relational models, in principle. That is, it is unnecessary for users to be conscious of the row nest structure. Furthermore, nested relational model is well known to reduce the number of relations in the case of multivalue dependency. Therefore, the Kappa model guarantees more efficient processing by reducing the number of tuples and relations, and more efficient representation by complex construction than the relational model. 3.2 Features of Kappa System The nested relational model in Kappa has been implemented. This consists of a sequential database management system J(appa-II [Yokota et al. 1988] and a parallel database management system J(appa-P [Kawamura et al. 1992]. Kappa-II, written in ESP, works on sequential inference machines (PSIs) and its operating system (SIMPOS). Kappa-P, written in KL1, works on parallel inference machines (PIMs) and its operating system (PIMOS). 
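Before turning to the common features of the two implementations, the following Python sketch (illustrative only; it is not Kappa code, and the helper name unnest is invented here) makes the set-constructor semantics of the previous subsection concrete: a nested tuple abbreviates the set of flat tuples obtained by unnesting, and the query ?-r[l1 = X, l2 = X] against r[l1 = {a,b}, l2 = {b,c}] yields the intersection {b}, as described above.

    from itertools import product

    def unnest(nested_tuple):
        """{label: {a,b}, ...} -> list of flat tuples, one per combination of set elements."""
        labels = list(nested_tuple)
        choices = [v if isinstance(v, (set, frozenset)) else {v}
                   for v in (nested_tuple[l] for l in labels)]
        return [dict(zip(labels, combo)) for combo in product(*choices)]

    # r[l1 = {a,b}, l2 = {b,c}]
    r = {"l1": {"a", "b"}, "l2": {"b", "c"}}
    flat = unnest(r)
    # query ?- r[l1 = X, l2 = X] : collect the values of X for which l1 = l2
    answers = {t["l1"] for t in flat if t["l1"] == t["l2"]}
    print(answers)    # {'b'}, i.e. the intersection of {a,b} and {b,c}
    print(len(flat))  # 4 flat tuples underlie the single nested tuple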
Although their architectures are not necessarily the same because of environmental differences, we explain their common features in this subsection, • Data Type As Kappa aims at a database management system (DBMS) in a knowledge information processing environment, a new data type, term, is added. This is because various data and knowledge are frequently represented in the form of terms. Unification and matching are added for their operations. Although unification-based relational algebra can emulate the derivation in logic programming, the features are not supported in Kappa because the algebra is not so efficient. Furthermore, Kappa discriminates one-byte character (ASCII) data from two-byte character (JIS) data as data types. It contributes to the compression of huge amounts of data such as genetic sequence data. • Command Interfaces Kappa provides two kinds of command interface: basic commands as the low level interface and extended relational algebra as the high level interface. In many applications, the level of extended relational algebra, which is expensive, is not always necessary. In such applications, users can reduce the processing cost by using basic commands. In order to reduce the communication cost between a DBMS and a user program, Kappa provides user-definable commands, which can be executed in the same process of the Kappa kernel (in Kappa-II) or the same node of each local DBMS (in Kappa-P, to be described in the next subsection). The user-definable command facility helps users design any command interface appropriate for their application and makes their programs run efficiently. Kappa's extended relational algebra is implemented as parts of such commands although it is a built-in interface. • Practical Use As already mentioned, Kappa aims, not only at a database engine of QUIXOTE, but also at a practical DBMS, which works independently of QUIXOTE. To achieve this objective, there are several extensions and facilities. First, new data types, besides the data types mentioned above, are introduced in order to store the environment under which applications work. There are list, bag, and pool. They are not, however, supported fully in extended relational algebra because of semantic difficulties. Kappa supports the same interface to such data types as in SIMPOS or PIMOS. In order to use Kappa databases from windows, Kappa provides a user-friendly interface, like a spreadsheet, which provides an ad hoc query facility including update, a browsing facility with various output formats and a customizing facility. • Main A1emory Database Frequently accessed data can be loaded and re- 97 Kappa is equipped with an efficient address translation table between an ntid and a logical page (ip), and between a logical page and a physical page (pp). This table is used by the underlying file system. For extraction purposes, each node of tained in the main memory as a main memory database. As such a main memory database was designed only for efficient processing of temporary relations without additional burdens in Kappa, the current implementation does not support conventional mechanisms such as deferred update and synchronization. In Kappa-P, data in a main memory database are processed at least three times more efficiently than in a secondary storage database. Index nested relatin From an implementational point of view, there are several points for efficient processing in Kappa. 
We explain two of them: • ID Structure and Set Operation Each nested tuple has a unique tuple identifier (ntid) in a relation, which is treated as an 'object' to be operated explicitly. Abstractly speaking, there are four kinds of 'object's, such as a nested t1lple, an ntid, a set whose element is a ntid, and a relation whose element is a nested tuple. Their commands for transformation are basically supported, as in Figure 6, although the set t nested tuple lp Ip ntid Figure 7: Access Network for Secondary DBMS a nested tuple has a local pointer and counters in the compressed tuple, although there is a trade-off in update operations' efficiency. nested tuple Each entry in an index reflects the nested structure: that is, it contains any necessary sub-ntids. The value in the entry can be the result of string operations such as substring and concatenation of the original values, or a result extracted by a user's program. nested relation Figure 6: 'Object's in Kappa and Basic Operations is treated as a stream in Kappa-P. Most operations are processed in the form of an ntid or a set. In order to process a selection result, each subtupIe in a nested tuple also has a sub-ntid virtually. Set operations (including unnest and nest operation) are processed mainly in the form of a (sub)ntid ·or a set without reading the corresponding tuples. • Storage Structure A nested tuple, which consists of unnested tuples in the semantics, is also considered as a set of unnested tuples to be accessed together. So, a nested tuple is compressed without decomposition and stored on the same page, in principle, in the secondary storage. For a huge tuple, such as a genetic sequence, contiguous pages are used. In order to access a tuple efficiently, there are two considerations: how to locate the necessary tuple efficiently, and how to extract the necessary attributes efficiently from the tuple. As in Figure 7, 3.3 Parallel Database System (Kappa-P) Management Kappa-P ha.s va.rious unique fea.tures as a parallel DBMS. In this subsectio~, we give a brief overview of them. The overall configuration of Kappa- P is shown in Figure 8. There are three components: an interface (J/F) process, a server DBMS, and a local DBMS. An IfF process, dynamically created by a user program, mediates between a user program and (server or local) DBMSs by streams. A server DBMS has a global map of the location of local DBMSs and makes a user's stream connect directly to an appropriate local DBMS (or multiple local DBMSs). In order to avoid a bottleneck in communication, there might be many server DBMSs with replicates global maps. A local DBMS can be considered as a single nested relational DBMS, corresponding to Kappa-II, where users' data is stored. 98 It;F rocessl ( It;F roceSS2 ( ) Server DBMS m Server DBMS 2 ) IfF Processk ..................................•................... '" Local DBMS l ----~~----- Local DBMS 2 ,--~~------ Local DBMS n Figure 8: Configuration of Kappa-P Users' data may be distributed (even horizontally partitioned) or replicated into multiple local DBMSs. If each local DBMS is put in a shared memory parallel processor, called a cluster in PIM, each local DBMS works in parallel. Multiple local DBMSs are located in each node of distributed memory parallel machine, and, together, behave like a distributed DBMS. User's procedures using extended relational algebra are transformed into procedures written in an intermediate language, the syntax of which is similar to KLl, by an interface process. 
During the transformation, the interface process decides which local DBMS should be the coordinator for the processing, if necessary. Each procedure is sent to the corresponding local DBMS, and processed there. Results are gathered in the coordinator and then processed. Kappa-P is different from most parallel DBMS, in that most users' applications also work in the same parallel inference machine. If Kappa-P coordinates a result from results obtained from local DBMSs, as in conventional distributed DBMSs, even when such coordination is unnecessary, the advantages of parallel processing are reduced. In order to avoid such a situation, the related processes in a user's application can be dispatched to the same node as the related local DBMS as in Figure 9. This function contributes not only to efficient processing but also to customization of the cOlmnand interface besides the user-defined command facility. 4 Applications vVe are developing three applications on QUIXOTE and Kappa, and give an overview of each research topic in this section. Figure 9: User's Process in Kappa Node 4.1 Molecular Biological Database Genetic information processing systems are very important not only from scientific and engineering points of view but also from a social point of view, as shown in the Human Genome Project. Also, at ICOT, we are engaged in such systems from thr viewpoint of knowledge information processing. In this subsection, we explain such activities, mainly focusing on molecular biological databases in QUIXOTE and Kappa 6. 4.1.1 Requirements for Databases Molecular Biological Although the main objective of genetic information processing is to design proteins as the target and to produce them, there remain too many technical difficulties presently. Considering the whole of proteins, we are only just able to gather data and knowledge with much noise. In such data and knowledge there are varieties such as sequences, structures, and functions of genes and proteins, which are mutually related. A gene in the 6S ee the details in [Tanaka 1992]. 99 genetic sequence (DNA) in the form of a double helix is copied to a mRN A and translated into an amino acid sequence, which becomes a part (or a whole) of a protein. Such processes are called the Central Dogma in biology. There might be different amino acids even with the same functions of a protein. The size of a unit of genetic sequence data ranges from a few characters to around 200 thousand, and w111 become longer as genome data is gradually analyzed fyrther. The size of a human genome sequence equals about 3 billion characters. As there are too many unknown proteins, the sequence data is fundamental for homology searching by a pattern called a motif and for multiple alignment a.mong sequences for prediction of the functions of unknown proteins from known ones. There are some problems to be considered for molecular biological databases: • how to store large values, such a.s sequences, and process them efficiently, • how to represent structure data and what operations to apply them, • how to represent functions of protein such as chemical reactions, and • how to represent their relations and link them. 
From a database point of view, we should consider some points in regard to the above data and knowledge: • representation of complex data as in Figure 2, • treatment of partial or noisy information in unstable data, • inference rules representing functions, as in the above third item, and inference mechanisms, and • representation of hierarchies such as biological concepts and molecular evolution. After considering the above problems, we choose to build such databases on a DOOD (QUIxoTE, conceptually), while a large amount of simple data is stored in Kappa-P and directly operated through an optimized window interface, for efficient processing. As cooperation with biologists is indispensable in this area, we also implemented an environment to support them. The overall configuration of the current implementation is shown in Figure 10. 4.1.2 Molecular Biological Information 111 QuXXOT£ and Kappa Here, we consider two kinds of data as quence data and protein function data. First, consider a DNA sequence. Such need inference rules, but needs a strong homology searching. In our system, such examples: sedata does not capability for data is stored Interface for Biologists Molecular Biological Applications Kappa-P Figure 10: Integrated System on QUIXOTE and Kappa directly in Kappa, which supports the storage of much data as is and creates indexes from the substrings extracted from the original by a user program. Sequenceoriented commands for information retrieval, which use such indexes, can be embedded into Kappa a.s userdefined commands. Furthermore, since the complex record shown in Figure 3 is treated like a nested relation, the representation is also efficient. Kappa shows its effectiveness as a practical DBMS. Secondly, consider a chemical reaction of enzymes and co-enzymes, whose scheme is as follows: Sources + Co-enzymes Enzymes ===} Products Environments As an example of metabolic reaction, consider the Krebs cycle in Figure 11. Chemical reactions in the Krebs cycle are written as a set of facts in QUIXOTE as in Figure 12. In the figure, 01 ~ 02/[' .. J means oIl[' .. J and 01 ~ 02' In order to obtain a reaction chain (path) from the above facts, we can write the following rules in QUIXOTE: reaction[jrom =X, to = YJ ¢: H! ~ reaction/[sources+ f-- X, products+ f-- ZJ, reaction[jrom=X, to= Y] II{ {X, Y, Z} ~ reaction}. reaction[jrom=X, to=X] ¢:II{X ~ reaction}. Although there are a lot of difficulties in representing such functions, QUIXOTE makes it possible to write them down easily. Another problem is how to integrate a Kappa database with a QUIXOTE database. Although one of the easiest ways is to embed the Kappa interface into QUIXOTE, it costs more and might destroy a uniform representation in QUIXOTE. A better way would be to manage common oids both in Kappa and in QUIXOTE, and guarantee the common object, however we have 100 pyruvate • acetyl-CoA " oxyaloaceta" ~ (8) malate t(7) (1) Krebs Cycle fumarate ~) • ~trate (2) ~ (2) Ie.is-aconitate t (~(nsocitrate succinate ~) (4Va-ketOglutarate krebs_cycle :: {{ krebsl r;;reaction/ [sources'" f - {acetylcoa, oxaloacetate} , products+ f - {citrate, coa}, enzymes f - citrate_synthase, energy = -7.7]. krebs2 r;; reaction! [sources-f f - citrate, products+ f - {isocitrate, h2o}, enzymes f - aconitase]. 
succinyl-CoA ENZYMES (1) citrate synthase (2) aconitate (3) isocitrate dehydrogenase (4) a-ketoglutarate dehydrogenase complex (5) succinyl-CoA synthetase (6) succinate dehydrogenase (7) fumarase (8) malate dehydrogenas Figure 11: Krebs Cycle in Metabolic Reaction not implemented such a facility in Kappa. The current implementation puts the burden of the uniformity on the user, as in Figure 10. 4.2 Legal Reasoning System (TRIAL) Recently, legal reasoning has attracted much attention from researchers in artificial intelligence, with high expectations for its big application. Some prototype systems have been developed. We also developed such a system as one of the applications of our DOOD system 7 4.2.1 Requirements for Legal Reasoning Systems and TRIAL First, we explain the features of legal reasoning. The analytical legal reasoning process is considered as consisting of three steps: fact findings, statutory interpretation, and statutory application. Although fact findings is very important as the starting point, it is too difficult for current technologies. So, we assume that new cases are already represented in the appropriate form for our system. Statutory interpretation is one of the most interesting themes from an artificial intelligence point of view. Our legal reasoning system, TRIAL, focuses on statutory interpretation as well as statutory application. 7See the details in [Yamamoto 1990], although the new version is revised as in this section. krebs8 r;; reaction! [sources-f f - malate, products+ f - oxaloacetate, enzymes f - malate_dehydrogenase, energy = 7.1]. }} Figure 12: Facts of Krebs Cycle in QUIXOTe Although there are many approaches to statutory interpretation, we take the following steps: • analogy detection Given a new case, similar precedents to the case are retrieved from an existing precedent database. • rule transformation Precedents (interpretation rules) extracted by analogy detection are abstracted until the new case can be applied to them. • deductive reasoning Apply the new case in a deductive manner to abstract interpretation rules transformed by rule transformation. This step may include statutory application because it is used in the same manner. Among the steps, the strategy for analogy detection is essential in legal reasoning for more efficient detection of better precedents, which decides the quality of the results of legal reasoning. As the primary objective of TRIAL at the current stage is to investigate the possibilities of QUIXOTe in the area and develop a prototype system, we focus only on a small target. That is, to what extent should interpretation rules be abstl'acted for a new case, in order to get an answer with a plausible explanation, but not for general abstraction mechanism. 4.2.2 TRIAL on Legal Precedent Databases All data and knowledge in TRIAL is described in QUIXOTe. The system, written in KL1, is constructed on QUIXOTE;. The overall architecture is shown in Figure 13. In the figure, QUIXOTE; supports the functions of rule transformation (Rule Transformer) and deductive reasoning (Deductive Reasoner) as the native functions besides the database component, while TRIAL 101 casel :: judge[case=X]/[judge----t job-causality] ¢=rel[state= Y, emp= Z]/[cause=X] II{X ~ parrn.case, Interface Component ........................................................................................... 
Y~parm.status, Z ~parm.emp};; case2 :: judge[case=X]/[judge----t job-execution] ¢= Xj[while = Y, result = Z], Y~job ~easoner Comp?nent : : :........................................................... : ~ : : :: ~ j Rule j ~ Transformer : : ............................ :: : j j QUIXOTe Database Component j.............................. ~ (------ : .......................................................................................... : Figure 13: Architecture of TRIAL supports the function of analogy detection (Analogy Detector) besides the interface component. Consider a simplified example related to "karoshi" (death from overwork) in order to discuss the analogy detector. A new case, new-case, is as follows: j'vfary, a driver, employed by a company, '(5)), died from a heart-attack while taking a catnap between jobs. Can this case be applied to the worker's compensation law? This is represented as a module new-case in as follows: QUIXOTe new-case:: {{new-casej[who=mary, while = catnap, result = heart-attack]; ; rel[state = employee, emp=mary] j[affil = org[name = "5"], job----t driver]}} where ";;" is a delimiter between rules. The module is stored ill the new case database. Assume that there are two abstract precedents 8 of job-causality and jobexecution: SIn this paper, we omit the rule transformation step and assume that abstract interpretation rules are given. II{X ~ parm.case, Y ~parm.while, Z ~parm.result}. Note that variables X, Y, and Z in both rules are restricted by the properties of an object parm. That is, they are already abstracted by parm and their abstract level is controlled by parm's properties. Such precedents are retrieved from the precedent database by analogy detection and abstracted by rule transformation. We must consider the labor'-law (in the statute database) and a theory (in the theory database) as follows: labor-law:: org[name=X] /[1'esp----t c01npensation[obj = Y, money = salary]] ¢=judge[case----t case] /[who= Y, result - t disease, judge ----t insurance], r'el[state= Z, emp= Y] j[affil =org[name=Xjj. theory:: Judge[case= X]/[judge----t insurance] ¢=judge[case = X]j[judge ----t job-causality], judge[case= X]/[judge----t job-execution] II{X ~ case}. Furthermore, we must define the parm object as follows: parm :: parm/[case=case, state = rel, while = job, result = disease, emp = person]. In order to use parm for casel and case2, we define the following submodule relation: parm ~s casel U case2' This information is dynamically defined during l'ule transfol'mation. Furthermore, we must define the subsumption relation: case rel disease job person job-causality job-execution ~ ~ ~ ~ ~ ~ ~ new-case employee heart-attack catnap mary znsurance znsurance \02 Such definitions are stored in the dictionary in advance. Then, we can ask some questions with a hypothesis to the above database: 1) If new-case inherits parm and theory, then what kind of judgment can we get? ?-new-case : judge[case = new-case]j[judge =X] if new-case ;;Js parm U theory. we can get three answers: • X = job-execution • if new-case : judger case = new-case] has a property judge ~ job-causality, then X ~ insurance • if new-case : rel[state = employee, emp mary] has a property cause = new-case, then X ~ insurance Two of these are answers with assumptions. 2) If new-case inherits labor-law and parm, then what kind of responsibility should the organization which Mary is affiliated to have? ?-new-case : org[name= "S"]j[resp=X] if new-case ;;Js parm U labor-law. 
we can get two answers: • if new-case: judge[case = new-case] has a property judge ~ job-causality, then X ~ compensation [obj =mary, money = salary] • if new-case:rel[state=employee, emp=mary] has a property cause = new-case, then X ~ compensation [obj = mary, money = salary] For analogy detection, the parm object plays an essential role in determining how to abstract rules as in casel and case2, what properties to be abstracted in parm, and what values to be set in properties of parm. In TRIAL, we have experimented with such abstraction, that is, analogy detection, in QUIXOTE. For the user interface of TRIAL, QUIXOTE returns explanations (derivation graphs) with corresponding answers, if necessary. The TRIAL interface shows this graphically according to the user's request. By judging an answer from the validity of the assumptions and the corresponding explanation, the user can update the database or change the abstraction strategy. 4.3 Temporal Inference Temporal information plays an important role in natural language processing. A time axis in natural language is, however, not homogeneous as in natural science but is relative to the events in rriind: shrunken in parts and stretched in others. Furthermore, the relativity is different depending on the observer's perspective. This work aims to show the paradigm of an inference system that merges temporal information extracted from each lexical item and resolves any temporal ambiguity that· a word may have 9. 4.3.1 Temporal Information in Natural Language We can, frequently, make different expressions for the same real situation. For example, Don Quixote attacks a windmill. Don Quixote attacked a windmill. Don Quixote is attacking a windmill. Such different expressions are related to tense and aspects. How should we describe the relation between them? According to situation theory, we write a support relation between a situation s and an in/on (7 as follows: s p (7. For example, if one of the above examples is supported in a situation s, it is written as follows: s p~ attack, Don Quixote, windmill ~, where attack is a relation, and "Don Quixote" and windmill are parameters. However, strictly speaking, as such a relation is cut out from a prespective P, we should write it as follows: s p (7 -¢::::? P(s' p (7'). Although we might nest perspectives on such a relation, we assume some reflective property: P(s' p (7') ===? P(s')P(p)P((7'). In order to consider how to represent P(s') and P( (7') from a temporal point of view, we introduce a partial order relation among sets of time points. Assume that a set of time points are partially ordered by ::S, then we can define ::St and ~ among sets TI and T2 as follows: TI :5t T2 d~ Vt l E TI, Vt 2 E T 2. tl ::::; t 2. TI ~ T2 d.;j Vt 1 E T 1 . tl E T 2. We omit the subscript t if there is no confusion. In order to make tense and aspects clearer, we introduce the following concepts: 9S ee the details in [Tojo and Yasukawa 1992]. 103 1) discrimination of an utterance situation u and a described situation s, and inj[v_rel = [rel = R, cls=CLS, per=P], args = Args], 2) duration (a set of linear time points, decided by a start point and an end point) of situations and an infon. The duration of T is written as II Tilt. We can see the relation among three durations of an utterance situation, a described situation, and an infon in Figure 14. 
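As a small worked illustration of these two orderings (Python, exposition only; representing a duration as a finite set of numeric time points is an assumption made here, not part of the cited work):

    def before(t1, t2):
        """T1 <=_t T2 : every point of T1 precedes every point of T2."""
        return all(p1 <= p2 for p1 in t1 for p2 in t2)

    def within(t1, t2):
        """T1 is contained in T2 : every point of T1 is also a point of T2."""
        return set(t1) <= set(t2)

    described = {1, 2, 3}   # duration of a described situation s
    utterance = {5, 6}      # duration of the utterance situation u
    print(before(described, utterance))   # True : s wholly precedes u (the past-tense case below)
    print(within({2, 3}, described))      # True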
If there is no confusion, we use a simple O the.utte~ance SItuatIon 1 a mental time axi~ <:=) • : mental time of a : mental location of s : mental location of u Figure 14: Relation of Three Durations notation: SI :::; S2 instead of II SI lit:::; II s211t and SI ~ S2 instead of II Sll1t~1I s211t. By the above definitions, we can define tense and aspects when s F a as follows (P(F) is written as F): s[s :::; s[s J s[s C s[a:::; u] u] u] u] F F F 1= ~ past,a». ~ present, a ~ ~ where v_reI takes a verb relation and args takes the arguments. R is a verb, CLS is the classification, and P is a temporal situation. For example, "John is running" is written as follows: inj[v_rel = [reI = run, cIs = act 2 , = [jov = ip, pov = pres]], args = [agt = john]]. pers That is, the agent is john, and the verb is run, which is classified in act2 (in-progress state or res7.Litant state), and the perspective is in-progress state as the field of view (an oval in Figure 14) and present as the point of view (. in Figure 14). The discourse situation which supports such a verbalized infon is represented as follows: dsit[jov = ip, pov = pres, src = » . progressive, a perfect, a » . ». where s is a described situation, u is an utterance situation, and a is an infon. C in s[ C] is a constraint, which is intended to be a perspective. The above rules are built-in rules (or axioms) for temporal inference in UJ, where the first two arguments are the same as the above infon's pers and the third argument is the utterance situation. According to the translation, we show a small example, which makes it possible to reduce temporal ambiguity in expression. QUIXOTE:. 4.3.2 Temporal Inference in QuIXOTE: We define a rule for situated inference as follows: where s, SI, •.. ,Sn are situations with perspectives. This rule means that S F a if SI 1= aI, " ' , and Sn 1= an' Such rules can be easily translated into a subclass of QUIXOTE: by relating a situation with perspectives to a module, an infon to an o-term, and partial order among duration to subsumption relation. However, there is one restriction: a constraint in a rule head may not include subsumption relations between o-terms, because such a relation might destroy a subsumption lattice. A verbalized infon is represented as an o-term as follows 10: lOAn o-term T[/I [II 01,"', In 02]' = = = ol,···,ln = 02] can be abbreviated as 1) Given an expression exp = E, each morpheme is processed in order to check the temporal information: mi[7.l=U, exp=[], e=D,infon=Infon]. mi[u= U, exp= [ExpIR], e=D, infon= Infon] ¢.d_cont[exp=Exp, sit =D, infon = Infon, mi[u=U, exp=R,e=D,infon=Infon]. Temporal information for each morpheme is intersected in D: that is, ambiguity is gradually reduced. 
2) Temporal information in a pair of a discourse situation and a verbalized infon is defined by the following rules: 104 d_cont[exp=Exp, sit = dsit[Jov =Fov, pov= Pov, src= U] infon =inf[v_rel = V _rel, args= Args]] ¢=diet : v[cls=CLS,rel=R,jorm=Exp] II{V _rel= [rel=R, cls=CLS,pers= P]} d_cont[ exp = Exp, sit = dsit[Jov= Fov, pov = Pov, src= U] infon=inf[v_rel= V _rel, args= Args]] ¢=diet : auxv[asp=ASP, form= Exp], map[cis=CLS, asp=ASP,fov=Fov] II{V _rel= [rel= _, cis =CLS,pers= P], P= [Jov=Fov,pov = _};; d_cont [exp = Exp, sit = dsit[J ov = Fov, pov = Pov, sre= U] infon =inJ[v_rel = V _rel, args = ArgslJ ¢=dict : affix[pov = Pov, form=ru] II{V _rei = [rel = _, cis = _, pers = P], P = [Jov= _,pov=Pov]} 3) There is a module diet, where lexical information is defined as follows: diet:: {{ v[cis = aetI,rel = puLon, form =ki];; v[cls = aet 2 , rel = run, form =hashi];; v[cls = aet 3 , rei = understand, form =waka]; ; auxv[asp = state, form =tei];; affix[pov = pres,form =ru];; affix[pov = past, form =ru]}} where form has a value of Japanese expression. Further, mapping of field of view is also defined as a set of (global) facts as follows: more general framework of grammar in logic programming, HPSG and JPSG are considered to be better, because morphology, syntax, semantics, and pragmatics are uniformly treated as constraints. From such a point of view, we developed a new constraint logic programming (CLP) language, cu-Prolog, and implemented a JPSG (Japanese Phrase Structure Grammar) parser in it 11. 5.1.1 Constraints in Unification-Based Grammar First, consider various types constraint-based grammar: of constraints III • A disjunctive feature structure is used as a basic information structure, defined like nested tuples or complex objects as follows: 1) A feature structure is a tuple consisting of pairs of a label and a value: [h =Vl,"', In=v n]. 2) A value is an atom, a feature structure, or a set {fl,' .. ,fn} of feature structures. • In JPSG, grammar rules a.re descri.bed in the form of a binary tree as in Figure 15, each node of which is a feature structure: in which a specific map[cls = aet1,asp = state,fov = {ip,tar,res}]. map[cis = aet2,asp = state,fov = {ip,res}]. map[ cls = aet3, asp = state, f ov = {tar, res}]. If some Japanese expression is given in a query, the corresponding temporal information is returned by the above program. dependenLdaughter "0 head_daughter H Figure 15: Phrase Structure in JPSG 5 Towards More Flexible Systems In order to extend a DOOD system, we take other approaches for more flexible execution control, mainly focusing on natural language applications as its examples. 5.1 Constraint Thansformation There are many natural language grammar theories: transformational and constraint-base grammar such as GB, unification-based and rule-based gra~ar such as GPSG and LFG, and unification-based and constraintbased grammar such as HPSG and JPSG. Considering a feature (attribute) decides whether D works as a complement or as a modifier. Note that each grammar, called a structural principle, is expressed as the constraints among three features, lvI, D, and H, in the local phrase structure tree. As shown in the above definition, feature structures are very similar to the data structure in DOOD 12. VVe will see some requirements of natural language processing for our DOOD system and develop applications_ on the DOOD system. llSee the details in [Tsuda 1992]. 12This is one of the reason why we decided to 'design QUIXOTE. See the appendix. 
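To illustrate how such feature structures behave under unification, the following Python sketch (exposition only; it is not cu-Prolog, and the feature names pos, subcat, and form are invented for the example) merges two structures label by label and fails on an atomic clash; disjunctive (set) values are omitted for brevity.

    def unify(fs1, fs2):
        """Return the merged feature structure, or None if the two are incompatible."""
        if isinstance(fs1, dict) and isinstance(fs2, dict):
            out = dict(fs1)
            for label, v2 in fs2.items():
                if label in out:
                    merged = unify(out[label], v2)
                    if merged is None:
                        return None
                    out[label] = merged
                else:
                    out[label] = v2
            return out
        return fs1 if fs1 == fs2 else None   # atomic values must coincide

    head = {"pos": "v", "subcat": {"pos": "p", "form": "ga"}}
    daughter = {"subcat": {"pos": "p"}}
    print(unify(head, daughter))
    # {'pos': 'v', 'subcat': {'pos': 'p', 'form': 'ga'}}
    print(unify({"pos": "v"}, {"pos": "n"}))   # None (clash)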
105 5.1.2 eu-Prolog In order to process feature structures efficiently, we have developed a new CLP called cu-Prolog. A rule is defined as follows 13: where H, B 1 , ••• ,Bn are atomic formulas, whose arguments can be in the form of feature structures and are constraints in the form of an equation 1 , •.• among feature structures, variables, and atoms, or an atomic formula defined by another set of rules. There is a restriction for an atomic formula in constraints in order to guarantee the congruence of constraint solving. This can be statically checked. The semantic domain is a set of relations of partially tagged trees, as in CIL[Mukai 1988] and the constraint domain is also the same. The derivation in cu-Prolog is a sequence of a pair (G, e) of a set of subgoals and a set of constraints, just as in conventional CLP. Their differences are as follows: e ,em • All arguments in predicates can be feature structures, that is, unification between feature structures is necessary. • A computation rule does not select a rule which does not contribute to constraint solving: in the case of ({A} U G, e), A' ~ Bile', and AO = A'O, the rule is not selected if a new constraint CO U e' 0 cannot be reduced. • The constraint solver is based on unfold/fold transformation, which produces new predicates dynamically in a constraint part. 'Disjunction' in feature structures of cu-Prolog is treated basically as 'conjunction', just as in an o-term in QUIXOT£ and a nested term in Kappa (CRL). However, due to the existence of a predicate, disjunction is resolved (or unnested) by introducing new constraints and facts: H ~p([l={a,b}]) {::=} H ~ p([l=XDII{new-p(X)}. new_p(a). new_p(b). That is, in cu-Prolog, disjunctive feature structures are processed in OR-parallel, in order to avoid set unification as in CRL. Only by focusing on the point does the efficiency seem to depend on whether we want to obtain all solutions or not. One of the distinguished features in cu-Prolog is dynamic unfold/fold transformation during query processing, which contributes much to improving the efficiency of query processing. Some examples of a JPSG parser 13 As we are following \'lith the syntax of QUIXOT£, the following notation is different from eu-Prolog. in cu-Prolog appear in [Tsuda 1992]. As predicatebased notation is not essential, language features in cu-Prolog can be encoded into the specification of QUIXOT£ and the constraint solver can also be embedded into the implementation of QUIXOT£ without changing semantics. 5.2 Dynamical Programming This work aims to extend a framework of constraint throughout computer and cognitive sciences 14. In some sense, the idea originates in the treatment of constraints in cu-Prolog. Here, we describe an outline of dynamical programming as a general framework of treating constraints and an example in natural language processing. 5.2.1 Dynamics of Symbol Systems As already mentioned in Section 2, partial information plays an essential role in knowledge information processing systems. So, knowing how to deal with the partiality will be essential for future symbol systems. We employ a constraint system, which is independent of information flow. In order to make the system computationally more tractable than conventional logic, it postulates a dynamics of constraints, where the state of the system is captured in terms of potential energy. Consider the following program in the form of clauses: p(X) ~ r(X, Y),p(Y). r(X, Y) ~ q(X). 
Given a query ?-p(A),q(B), the rule-goal graph as used in deductive databases emulates top-down evaluation as in Figure 16. However, the graph presupposes a cer- ?-p(A), q(B) t p(X) + ~ r(X, Y),p(Y) t r(X, Y) I ~ q(X) Figure 16: Rule-Goal Graph tain information flow such as top-down or bottom-up evaluation. More generally, we consider it in the form in Figure 17. where the lines represent (partial) equations among variables, and differences between variables are not written for simplicity. We call such a graph a constraint network. In this framework, computation proceeds by propagating constraints in a node (a variable or an atomic 14See the details in [Hasida 1992]. 106 Tom has a telescope when he sees the girl, or the girl has the telescope when Tom sees her. Consider a set of facts: Figure 17: Constraint network (1) take(tom, telescope). (2) have( tom, tel escope). (3) have(girl, telescope). and an inference rule: constraint) to others on the constraint network. In order to make such computation possible, we note the dynamics of constraints, as outlined below: 1) An activation value is assigned to each atomic constraint (an atomic formula or an equation). The value is a real number between a and 1 and is considered as the truth value of the constraint. (4) Symbolic computation is also controlled on the basis of the same dynamics. This computational framework is not restricted in the form of Horn clauses. 5.2.2 Integrated Architecture of Natural Language Processing In traditional natural language processing, the system is typically a sequence of syntactic analysis, semantic analysis, pragmatic analysis, extralinguistic inference, generation planning, surface generation, and so on. However, syntactic analysis does not necessarily precede semantic and pragmatic comprehension, and generation planning is entwined with surface generation. Integrated architecture is expected to remedy such a fixed information flow. Our dynamics of constraint is appropriate for such an architecture. Consider the following example: Tom. took a telescope. He saw a girl with it. 'vVe assume that he and it are anaphoric with Tom and the telescope, respectively. However, with it has attachment ambiguity: '¢: take(X, Y). By constructing the constraint networks of (1),(2),(4) and (1),(3),(4) as in Figure 18, we can see that there .---.. / { take( 2) Based on activation values, normalization energy is defined for each atomic constraint, deduction eneT'!}y and abduction energy are defined for each clause, and assimilation energy and completion energy are defined for possible unifications. The potential eneT'!}y U is the sum of the above energies. 3) If the current state of a constraint is represented in terms of a point x of Euclidean space, U defines a field of force F of the point x. F causes spreading activation when F =f. O. A change of x is propagated to neighboring parts of the constraint network, in order to reduce U. In the long run, the assignment of the activation values settles upon a stable equilibrium satisfying F = O. have(X, Y) {takeT )} ),-,take( , )} )} L, {have( , I), -,take( , )} {have( {have( , I)} {have( ., )} ./ Constraint Network of (2) Constraint Network of (3) Figure 18: Constraint Networks of Alternatives are two cycles (involving tom and telescope) in the left network ((1), (2), and (4)), while there is only one cycle (girl) in the right network ((1), (3), and (4)). 
From the viewpoint of potential energy, the former tends to excite more strongly than the latter, in other words, (2) is more plausible than (3). Although, in natural language processing, resolution of ambiguity is a key point, the traditional architecture has not been promising, while our integrated architecture based on a dynamics of constraint network seems to give more possibilities not only for such applications but also for knowledge-base management systems. 6 Related Works Our database and knowledge-base management system in the framework of DOOD has many distinguished features in concept, size, and varieties, in comparison with other systems. The system aims not only to propose a new paradigm but also to provide database and knowledge- base facilities in practice for many knowledge information processing systems. There are many works, related to DOOD concepts, for embedding object-oriented concepts into logic programming. Although F-logic[Kifer and Lausen 1989] has the richest concepts, the id-term for object identity is based on predicate-based notation and properties are insufficient from a constraint point of view. Furthermore, it lacks update functions and a module concept. 107 QUIXOTE has many more functions than F-logic. Although, in some sense, QUIXOTE might be an overspecification language, users can select any subclass of QUIXOTE. For example, if they use only a subclass of object terms, they can only be conscious of the sublanguage as a simple extension of Prolog. As for nes ted relational models, there are many works since the proposal in 1977, and several models have been implemented: Verso [Verso 1986], DASDBS [Schek and Weikum 1986], and AIM-P [Dadam et al. 1986]. However, the semantics of our model is different from theirs. As the (extended) NF2 model of DASDBS and AIM-P has setbased (higher order) semantics~ it is very difficult to extend the query capability efficiently, although the semantics is intuitively familiar to the user. On the other hand, as Verso is based on the universal relation schema assumption, it guarantees efficient procedural semantics. However, the semantics is intuitively unfamiliar to the user: even if t tf. (JIT and t tf. (J2T for a relation T, it might happen that t E (JIT U (J2T. Compared with them, Kappa takes simple semantics, as mentioned in Section 3. This semantics is retained in o-terms in QUIXOTE and disjunctive feature structures in cu-Prolog for efficient computation. As for genetic information processing, researchers in logic programming and deductive databases have begun to focus on this area as a promising application. However, most of these works are devoted to query capabilities such as transitive closure and prototyping capabilities, while there are few works which focus on data and knowledge representation. On the other hand, QUIXOTE aims at both the above targets. As for legal reasoning, there are many works based on logic programming and its extensions. Our work has not taken their functions into consideration, but has reconsidered them from a database point of view, especially by introducing a module concept. 7 Future Plans and Concluding Remarks We have left off some functions due to a shortage in man power and implementation period. We are considering further extensions through the experiences of our activities, as mentioned in this paper. 
First, as for QUIXOTE, we are considering the following improvements and extensions: • Query transformation techniques such as sideways information passing and partial evaluation are not fully applied in the current implementation. Such optimization techniques should be embedded. in QUIXOTE, although constraint logic programming needs different devices from conventional deductive databases. Furthermore, for more efficient query processing, flexible control mechanisms, such as in cu-Prolog and dynamical programming, would be embedded. • For more convenience for description in we consider meta-functions as HiLog [Chen et al. 1989]: QUIXOTE, tc(R)(X, Y) :- R(X, Y) tc(R)(X, Y) :- tc(R)(X, Z), tc(R)(Z, Y) In order to provide such a function, we must introduce new variables ranging over basic objects. This idea is further extended to a platform language of QUIXOTE. For example, although we must decide the order relation (such as Hoare, Smyth, or Egli-Milner) among sets in order to introduce a set concept, the decision seems to depend on the applications. For more applications, such a relation would best be defined by a platform language. The current QUIXOTE would be a member of a family defined in such a platform language. • Communication among QUIXOTE databases plays an important role not only for distributed knowledge-bases but also to support persistent view, persistent hypothesis, and local or private databases. Furthermore, cooperative query processing among agents defined QUIXOTE is also considered, although it closely depends on the ontology of object identity. • In the current implementation, QUIXOTE objects can also be defined in KLl. As it is difficult to describe every phenomena in a single language, as you know, all languages should support interfaces to other languages. Thus, in QUIXOTE too, a multi-language system would be expected. • Although, in the framework of DOOD, we have focused mainly on data modeling extensions, the direction is not necessarily orthogonal from logical extensions and computational modeling extensions: set grouping can emulate negation as failure and the procedural semantics of QUIXOTE can be defined under the framework of object-orientation. However, from the viewpoint of artificial intelligence, non-monotonic reasoning and 'fuzzy' logic should be further embedded, and, from the viewpoint of design engineering, other semantics such as object-orientation, should also be given . As for Kappa, we are considering the following improvements and extensions: • In comparison with other DBMSs by Wisconsin Benchmark, the performance of Kappa can be further improved, especially in extended relational 108 algebra, by reducing inter-kernel communication costs. This should be pursued separately from the objective. • It is planned for Kappa to be accessed not only from sequential and parallel inference machines but also from general purpose machines or workstations. Furthermore, we should consider the portability of the system and the adaptability for an open system environment. One of the candidates is heterogeneous distributed DBMSs based on a client-server model, although Kappa-P is already a kind of distributed DBMS. • In order to provide Kappa with more applications, customizing facilities and service utilities should be strengthened as well as increasing compatibility with other DBMSs. 
In order to make Kappa and QUIXOT& into an integrated knowledge-base management system, further extensions are necessary: • takes nested transaction logic, while Kappa takes flat transaction logic. As a result, QUIXOT& guarantees persistence only at the top level transaction. In order to couple them more tightly, Kappa should support nested transaction logic. QUIXOT& • From the viewpoint of efficient processing, users cannot use Kappa directly through QUIXOT&. This, however, causes difficulty with object identi ty, because Kappa does not have a concept of object identity. A mechanism to allow Kappa and QUIXOT& to share the sa.me object space should be considered. • Although Kappa-P is a naturally parallel DBMS, current QUIXOT& is not necessarily familiar wi th parallel processing, even though it is implemented in 1\L1 and works in parallel. For more efficient processing, we must investigate parallel processing in Kappa and QUIXOT£.. We must develop bigger applications than those we mentioned in this paper. Furthermore, we must increase the compatibility with the conventional systems: for example, from Prolog to QUIXOT& and from the relational model to our nested relational model. We proposed a framework for DOOD, and are engaged in various R&D activities for databases and knowledge-bases in the framework, as mentioned in this paper. Though each theme does not necessarily originate from the framework, our experiences indicate that this direction is promising for many applications. Acknowledgments The authors have had much cooperation from all members of the third research laboratory of ICOT for each topic. We especially wish to thank the following people for their help in the specified topics: Hiroshi Tsuda for QUIXOTe and cu-Prolog, Moto Kawamura and Kazutomo N aganuma for J( appa, Hidetoshi Tanaka and Yuikihiro Abiru for Biological Databases, Nobuichiro Yamamoto for TRIAL, Satoshi Tojo for Temporal Inference, and Koiti Hasida for DP. We are grateful to members of the DOOD (DBPL, ETR, DDB&AI, NDB, IDB), STASS, and JPSG working groups for stimulating discussions and useful comments on our activities, and, not to mention, all members of the related projects (see the appendix) for their implementation efforts. We would also like to acknowledge Kazuhiro Fuchi and Shunichi Uchida without whose encouragement QUIXOT& and Kappa would not have been implemented. References [Aczel 1988] P. Aczel, Non- Well Founded Set Theory, CSLI Lecture notes No. 14, 1988. [Chen et al. 1989] W. Chen, M. Kifer and D.S. Warren, "HiLog as a Platform for Database Language", Proc. the Second Int. Workshop on Database Programming Language, pp.121-135, Gleneden Beach, Oregon, June, 1989. [Chikaya.ma 1984] T. Chikayama, "Unique Features of ESP", Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, Nov.6-9, 1984. [Chikayama et al. 1988] T. Chikayama, H. Sato, and T. Miyazaki, "Overview of the Parallel Inference Machine Operating Sistem (PIMOS)", Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, Nov.28-Dec.2, 1988. [Dadam et al. 1986] P. Dadam, et aI, "A DBMS Prototype to Support Extended NF2 Relations: An Integrated View on Flat Tables and Hierarchies", ACM SIGMOD Int. Conf. on Management of Data, 1986. [Delobel et al. 1991J C. Delobel, M. Kifer, and Y. Masunaga (eds.), Deductive and Object-Oriented Databases, (Proc. 2nd Int. Conf. on Deductive and Object-Oriented Databases (DOOD'91)), LNCS 566, Springer, 1991. [Goto et al. 1988] A. 
[Goto et al. 1988] A. Goto et al., "Overview of the Parallel Inference Machine Architecture (PIM)", Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, Nov. 28-Dec. 2, 1988.
[Haniuda et al. 1991] H. Haniuda, Y. Abiru, and N. Miyazaki, "PHI: A Deductive Database System", Proc. IEEE Pacific Rim Conf. on Communication, Computers, and Signal Processing, May 1991.
[Hasida 1992] K. Hasida, "Dynamics of Symbol Systems - An Integrated Architecture of Cognition", Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, June 1-5, 1992.
[Kawamura et al. 1992] M. Kawamura, H. Naganuma, H. Sato, and K. Yokota, "Parallel Database Management System Kappa-P", Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, June 1-5, 1992.
[Kifer and Lausen 1989] M. Kifer and G. Lausen, "F-Logic: A Higher-Order Language for Reasoning about Objects, Inheritance, and Schema", Proc. ACM SIGMOD Int. Conf. on Management of Data, pp. 134-146, Portland, June 1989.
[Kim et al. 1990] W. Kim, J.-M. Nicolas, and S. Nishio (eds.), Deductive and Object-Oriented Databases (Proc. 1st Int. Conf. on Deductive and Object-Oriented Databases (DOOD89)), North-Holland, 1990.
[Miyazaki et al. 1989] N. Miyazaki, H. Haniuda, K. Yokota, and H. Itoh, "A Framework for Query Transformation", Journal of Information Processing, Vol. 12, No. 4, 1989.
[Mukai 1988] K. Mukai, "Partially Specified Term in Logic Programming for Linguistic Analysis", Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, Nov. 28-Dec. 2, 1988.
[Schek and Weikum 1986] H.-J. Schek and G. Weikum, "DASDBS: Concepts and Architecture of a Database System for Advanced Applications", Tech. Univ. of Darmstadt, Technical Report, DVSI-1986-T1, 1986.
[Tanaka 1992] H. Tanaka, "Integrated System for Protein Information Processing", Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, June 1-5, 1992.
[Tojo and Yasukawa 1992] S. Tojo and H. Yasukawa, "Situated Inference of Temporal Information", Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, June 1-5, 1992.
[Tsuda 1992] H. Tsuda, "cu-Prolog for Constraint-Based Grammar", Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, June 1-5, 1992.
[Ueda and Chikayama 1990] K. Ueda and T. Chikayama, "Design of the Kernel Language for the Parallel Inference Machine", The Computer Journal, Vol. 33, No. 6, 1990.
[Verso 1986] J. Verso, "VERSO: A Data Base Machine Based on Non-1NF Relations", INRIA Technical Report, 523, 1986.
[Yamamoto 1990] N. Yamamoto, "TRIAL: a Legal Reasoning System (Extended Abstract)", Joint French-Japanese Workshop on Logic Programming, Rennes, France, July 1991.
[Yasukawa et al. 1992] H. Yasukawa, H. Tsuda, and K. Yokota, "Objects, Properties, and Modules in QUIXOTE", Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, June 1-5, 1992.
[Yokota 1988] K. Yokota, "Deductive Approach for Nested Relations", Programming of Future Generation Computers II, K. Fuchi and L. Kott (eds.), North-Holland, 1988.
[Yokota et al. 1988] K. Yokota, M. Kawamura, and A. Kanaegami, "Overview of the Knowledge Base Management System (KAPPA)", Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, Nov. 28-Dec. 2, 1988.
[Yokota and Nishio 1989] K. Yokota and S. Nishio, "Towards Integration of Deductive Databases and Object-Oriented Databases - A Limited Survey", Proc. Advanced Database System Symposium, Kyoto, Dec. 1989.
[Yoshida 1991] K.
Yoshida, "The Design Principle of the Human Chromosome 21 Mapping Knowledgebase (Version CSH91)", Inetrnal Technical Report of Lawrence Berkley Laboratory, May, 1991. 110 Appendix Notes on Projects for Database and Knowledge-Base Management Systems In this appendix, we describe an outline of projects on database and knowledge-base management systems in the FGCS project. A brief history is shown in Figure 19 15. Among these projects, Mitsubishi Electric Corp. has cooperated in Kappa-I, Kappa-II, Kappa-P, DO-l, CIL, and QUIXOTE projects, Oki Electric Industry Co., Ltd. has cooperated in PHI (DO-c/» and QUIXOTE projects, and Hitachi, Ltd. has cooperated in ETA (DO-7J) and QUIXOTE projects. a. Kappa Projects In order to provide database facilities for knowledge information processing systems, a Kappa 16 project begun in September, 1985 (near the beginning of the intermediate stage of the FGCS project). The first target was to build a database wi th electronic dictionaries including concept taxonomy for natural language processing systems and a database for mathematical knowledge for a proof checking system called CAP-LA. The former database was particularly important: each dictionary has a few hundred thousands entries, each of which has a complex data structure. We considered that the normal relational model could not cope with such data and decided to adopt a nested relational model. Furthermore, we decided to add a new type term for handling mathematical knowledge. The DBMS had to be written in ESP and work on PSI machines and under the SIMPOS operating system. As we were afraid of whether the system in ESP would work efficiently or not, we decided on the semantics of a nested relation and started to develop a prototype system called Kappa-I. The system, consisting of 60 thousands lines in ESP, was completed in the spring of 1987 and was shown to work efficiently for a large amount of dictionary data. The project was completed in August, 1987 after necessary measurement of the processing performance. After we obtained the prospect of efficient DBMS on PSI machines, we started the next project, KappaII[Yokota et al. 1988J in April, 1987, which aims at a practical DBMS based on the nested relational model. Besides the objective of more efficient performance than Kappa-I, several improvements were planned: a main memory database facility, extended relational 15 At the initial stage of the FGCS project, there were other projects for databases and knowledge-based: Delta and Kaiser, however these were used for targets other than databases and know ledge- bases. 16 A term Kappa stands for know/edge application oriented gdvanced database management system. algebra, user-definable command facility, and userfriendly window interface. The system, consisting of 180 thousand lines in ESP, works 10 times more efficiently in PSI-II machines than Kappa-I does in PSI-I. The project was over in March, 1989 and the system was widely released, not only for domestic organizations but also for foreign ones, and mainly for genetic information processing. To handle larger amounts of data, a parallel DBMS project called Kappa-P[Kawamura et al. 1992J was started in February, 1989. The system is written in KLl and works under an environment of PIM machines and the PIMOS operating system. As each local DBMS of Kappa-P works on a single processor with almost the same efficiency as Kappa-II, the system is expected to work on PIM more efficiently than KappaII, although their environments are different. b. 
b. Deductive Database Projects

There were three projects for deductive databases. First, in parallel with the development of Kappa, we started a deductive database project called CRL (complex record language) [Yokota 1988], a logic programming language newly designed for treating nested relations. CRL is based on a subclass of complex objects constructed by set and tuple constructors and has a module concept. The project started in the summer of 1988, and the system, called DO-1, was completed in November 1989. The system works on Kappa-II. The query processing strategy is based on methods of generalized magic sets and semi-naive evaluation. In it, rule inheritance among modules based on submodule relations is dynamically evaluated.

Secondly, we started a project called PHI [Haniuda et al. 1991] at the beginning of the intermediate stage (April 1985). This aimed at more efficient query processing in traditional deductive databases than other systems. The strategy is based on three kinds of query transformation called Horn clause transformation (HCT) [Miyazaki et al. 1989]: HCT/P executes partial evaluation or unfolding, HCT/S propagates binding information without rule transformation, and HCT/R transforms a set of rules in order to restrict the search space and adds related new rules. HCT/R corresponds to the generalized magic set strategy. By combining these strategies, PHI aims at more efficient query processing. The successor project is called DO-φ, in which we aim at a deductive mechanism for complex objects.

Thirdly, we started a project called ETA in April 1988, which aimed at knowledge-base systems based on knowledge representations such as semantic networks. One year later, the project turned towards extensions of deductive databases and was called DO-η.

Figure 19: Brief History of Projects on Database and Knowledge-Base Management Systems (a timeline, 1985-1992, of the Kappa, PHI, CIL, and QUIXOTE projects and their KIPS applications: natural language, mathematical knowledge, biological information, legal precedents, ...)

"DO" in the names of the above projects stands for deductive and object-oriented databases, indicating that they adopt the concept of DOOD [Yokota and Nishio 1989] as their common framework.

c. CIL Project

A language called CIL (complex indeterminates language) was proposed in April 1985 [Mukai 1988]. The language aimed at semantic representation in natural language processing and was used not only in the discourse understanding system called DUALS, but also for representing various kinds of linguistic information. The implementation of CIL was improved several times, and CIL was released to many researchers in natural language processing. The language is a kind of constraint logic programming and relates closely to situation theory and semantics. The language is based on partially specified terms, each of which is built by a tuple constructor. A set constructor was introduced into partially specified terms in another language, cu-Prolog, as mentioned in Section 5.1.

d. QUIXOTE Project

We tried to extend CRL not only for nested relations but also for DOODs, and to extend CIL for more efficient representation, such as the disjunctive feature structure. After these efforts, we proposed two new languages: Juan, as an extension of CRL, and QUINT, as an extension of CIL.
While designing their specifications, we found many similarities between Juan and QUINT, and between concepts in databases and natural language processing, and decided to integrate these languages. The integrated language is QUIXOTE [Yasukawa et al. 1992] (with Spanish pronunciation; the naming convention follows the DON series, such as Don Juan and Don Quixote, where DON stands for "Deductive Object-Oriented Nucleus"). As a result of the integration, QUIXOTE has various features, as mentioned in this paper. The QUIXOTE project was started in August 1990. The first version of QUIXOTE was released to restricted users in December 1991, and the second version was released for more applications at the end of March 1992. Both versions are written in KL1 and work on parallel inference machines.

e. Working Groups on DOOD and STASS

At the end of 1987, we started to consider the integration of logic and object-orientation concepts in the database area. After discussions with many researchers, we formed a working group for DOOD and started to prepare a new international conference on deductive and object-oriented databases (most of the preparation up until the first international conference, DOOD89, was carried by Professor S. Nishio of Osaka University). The working group had four sub-working-groups in 1990: database programming languages (DBPL), deductive databases and artificial intelligence (DDB&AI), extended term representation (ETR), and biological databases (BioDB). In 1991, the working group was divided into intelligent databases (IDB) and next generation databases (NDB). (Their chairpersons are Yuzuru Tanaka of Hokkaido U. for DOOD, Katsumi Tanaka of Kobe U. for DBPL, Chiaki Sakama of ASTEM for DDB&AI and IDB, Shojiro Nishio of Osaka U. for ETR, Akihiko Konagaya of NEC for BioDB, and Masatoshi Yoshikawa of Kyoto Sangyo U. for NDB.) In their periodic meetings, we discussed not only problems of DOOD but also directions and problems of next generation databases. These discussions contributed greatly to our DOOD system.

From another point of view, we formed a working group (STS) for situation theory and situation semantics in 1990 (the chairperson is Hozumi Tanaka of Tokyo Institute of Technology). This also contributed to strengthening other aspects of QUIXOTE and its applications.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

CONSTRAINT LOGIC PROGRAMMING SYSTEM CAL, GDCC AND THEIR CONSTRAINT SOLVERS

Akira Aiba and Ryuzo Hasegawa
Fourth Research Laboratory
Institute for New Generation Computer Technology
4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan
{aiba, hasegawa}@icot.or.jp

Abstract

This paper describes the constraint logic programming languages CAL (Contrainte Avec Logique) and GDCC (Guarded Definite Clauses with Constraints), developed at ICOT. CAL is a sequential constraint logic programming language with algebraic, Boolean, set, and linear constraint solvers. GDCC is a parallel constraint logic programming language with algebraic, Boolean, linear, and integer parallel constraint solvers. Since the algebraic constraint solver utilizes the Buchberger algorithm, the solver may return answer constraints including univariate nonlinear equations. The algebraic solvers of both CAL and GDCC have functions to approximate the real roots of univariate equations to obtain all possible values of each variable.
That is, this function can give rise to a situation in which a certain variable has more than one value. To deal with this situation, CAL has a multiple environment handler, and GDCC has a block structure. We wrote several application programs in GDCC to show the feasibility of the constraint logic programming language.

1 Introduction

The Fifth Generation Computer System (FGCS) project is a Japanese national project that started in 1982. The aim of the project is to research and develop new computer technologies for knowledge and symbol processing parallel computers. The FGCS prototype system has three layers: the prototype hardware system, the basic software system, and the knowledge programming environment. Parallel application software has been developed for these. The constraint logic programming system is one of the systems that form, together with the knowledge base construction and the programming environment, the knowledge programming environment. In this paper, we describe the overall research results of constraint logic programming systems at ICOT.

The programming paradigm of constraint logic programming (CLP) was proposed by A. Colmerauer [Colmerauer 1987] and by J. Jaffar and J-L. Lassez [Jaffar and Lassez 1987] as an extension of logic programming obtained by extending its computation domain. Jaffar and Lassez showed that CLP possesses logical, functional, and operational semantics which coincide with each other, in a way similar to logic programming [van Emden and Kowalski 1976]. In 1986, we began to research and develop high-level programming languages suitable for problem solving to achieve our final goal, that is, developing efficient and powerful parallel CLP languages on our parallel machine.

The descriptive power of a CLP language depends strongly on its constraint solver, because the constraint solver determines the domain of problems which can be handled by the CLP language. Almost all existing CLP languages, such as Prolog III [Colmerauer 1987] and CLP(R) [Jaffar and Lassez 1987], have constraint solvers for linear equations and linear inequalities. Unlike these CLP languages, we focused on nonlinear algebraic equation constraints to deal with problems which are described in terms of nonlinear equations, such as the handling-robot problem. For this purpose, we selected the Buchberger algorithm for the constraint solver of our languages. Besides nonlinear algebraic equations, we were also interested in writing Boolean constraints, set constraints, linear constraints, and hierarchical constraints in our framework. For Boolean constraints, we modified the Buchberger algorithm to handle Boolean constraints and, later, developed an algorithm for Boolean constraints based on Boolean unification. For set constraints, we extended the Boolean constraint algorithm based on the Buchberger algorithm. We also implemented the simplex method to deal with linear equations and linear inequalities, as in other CLP languages. Furthermore, we tried to handle hierarchical constraints in our framework.

We developed two CLP language processors: first, a language processor for the sequential CLP language CAL (Contrainte Avec Logique) on the sequential inference machine PSI, and later, a language processor for the parallel CLP language GDCC (Guarded Definite Clauses with Constraints), based on our experience of extending the CAL processor with various functions. In Section 2, we briefly review CLP, and in Section 3, we describe CAL.
In Section 4, we describe GDCC, and in Section 5, we describe the various constraint solvers and their parallelization. In Section 6, we introduce application programs written in our languages.

2 CLP and the role of the constraint solver

CAL and GDCC belong to the family of CLP languages. The concept of CLP stems from the common desire for easy programming. In fact, as claimed in the literature [Jaffar and Lassez 1987, Sakai and Aiba 1989], CLP is a scheme of programming languages with the following outstanding features:

• Natural declarative semantics.
• Clear operational semantics that coincide with the declarative semantics.

Therefore, it gives the user a paradigm of declarative (and thus, hopefully easy) programming and gives the machine an effective mechanism for execution that coincides with the user's declaration. For example, in Prolog (the most typical instance of CLP), we can read and write programs in a declarative style like "... if ... and ...". The system executes these by a series of operations with unification as the basic mechanism. Almost every CLP language has a similar programming style and a mechanism which plays a role similar to that of the unification mechanism in Prolog, and the execution of programs depends heavily on this mechanism. We call such a mechanism the constraint solver of the language. Usually, a CLP language aims at a particular field of problems, and its solver has special knowledge to solve those problems. In the case of Prolog, the problems are syntactic equalities between terms, that is, unification. On the other hand, CAL and GDCC are tuned to deal with the following:

• algebraic equations
• Boolean equations
• set inclusion and membership
• linear inequalities

These relations are called constraints. In the CLP paradigm, a problem is expressed as constraints on the objects in the problem. Therefore, an often cited benefit of CLP is that "one does not need to write an implementation but a specification." In other words, all that a programmer should write in CLP is the constraints between the objects, not how to find objects satisfying the relation. To be more precise, such constraints are described in the form of a logical combination of formulas, each of which expresses a basic unit of the relation. Though there are many other benefits, the above one surely expresses an important feature of CLP. Building an equation is usually easier than solving it. Similarly, one may be able to write down the relation between objects without knowing a method to find the appropriate values of the objects which satisfy the relation.

An ideal CLP system should allow a programmer to write any combination of any well-formed formulas. The logic programming paradigm gives us a rich framework for handling logical combinations of constraints. However, we still need a powerful and flexible constraint solver to handle each constraint. To discuss the function of the constraint solver from a theoretical point of view, the declarative semantics of CLP [Sakai and Aiba 1989] gives us several criteria. Assume that constraints are given in the form of their conjunction. Then, the criteria are the following.

(1) Can the solver decide whether a given constraint is satisfiable?
(2) Given satisfiable constraints, is there any way for the solver to express all the solutions in a simplified form?

Prolog's constraint solver, the unification algorithm, answers these criteria affirmatively, and so do the solvers in CAL and GDCC.
In fact, they satisfy the following stronger requirement almost perfectly:

(3) Given a set of constraints, can the solver compute the simplest form (called the canonical form of the constraints) in a certain sense?

However, these criteria may not be sufficient from an application point of view. For example, we may sometimes be asked the following:

(4) Given satisfiable constraints, can the solver find at least one concrete solution?

Finding a concrete solution is a question usually independent of the above and may be provably impossible to answer in general. Therefore, we may need an approximate solution to answer this partly. As discussed later, we incorporated many of these constraint solvers and functions into CAL and GDCC.

Another important feature of constraint solvers is their incrementality. An incremental solver can be given constraints successively. It reduces each constraint to as simple a form as possible with respect to the current set of constraints. Thus, an incremental solver finds the unsatisfiability of a set of constraints as early as possible and makes a Prolog-type backtracking mechanism efficient. Fortunately, the solvers of CAL and GDCC are fully incremental, like unification.

3 CAL - Sequential CLP Language

This section summarizes the syntax of CAL. For a detailed description of CAL syntax, refer to the CAL User's Manual [CAL Manual].

3.1 CAL language

The syntax of CAL is similar to that of Prolog, except for its constraints. A CAL program features two types of variables: logical variables, denoted by a sequence of alphanumeric characters starting with an uppercase letter (as with Prolog variables), and constraint variables, denoted by a sequence of alphanumeric characters starting with a lowercase letter. Constraint variables are global variables, while logical variables are local variables within the clauses in which they occur. This distinction is introduced to simplify incremental querying. The following is an example CAL program that features algebraic constraints. This program derives a new property of a triangle, the relation which holds among the lengths of the three edges and the surface area, from three known properties.

    :- public triangle/4.

    surface_area(H,L,S) :- alg:L*H=2*S.
    right(A,B,C) :- alg:A^2+B^2=C^2.
    triangle(A,B,C,S) :-
        alg:C=CA+CB,
        right(CA,H,A),
        right(CB,H,B),
        surface_area(H,C,S).

The first clause, surface_area, expresses the formula for computing the surface area S from the height H and the baseline length L. The second expresses the Pythagorean theorem for a right-angled triangle. The third asserts that every triangle can be divided into two right-angled triangles (see Figure 1).

Figure 1: The third clause

In the following query, heron is the name of the file in which the CAL source program is defined.

    ?- alg:pre(s,10), heron:triangle(a,b,c,s).

This query asks for the general relationship between the lengths of the three edges and the surface area. The invocation of alg:pre(s,10) defines the precedence of the variable s to be 10. Since the algebraic constraint solver utilizes the Buchberger algorithm, the ordering among monomials is essential for computation. This command changes the precedence of variables. Initially, the precedences of all variables are assigned 0; therefore, in this case, the precedence of variable s is raised. To this query, the system responds with the following equation:

    s^2 = -1/16*b^4 + 1/8*a^2*b^2 - 1/16*a^4 + 1/8*c^2*b^2 + 1/8*c^2*a^2 - 1/16*c^4.

This equation is, actually, a developed form of Heron's formula. When we call the query

    ?- heron:triangle(3,4,5,s).

the CAL system returns the answer s^2 = 36.

If a variable has finitely many values in all its solutions, there is a way of obtaining a univariate equation in that variable in the Gröbner base. Therefore, if we add a function that enables us to compute approximate values of the solutions of univariate equations, we can approximate all possible values of the variable. For this purpose, we implemented a method of approximating the real roots of univariate polynomials. In CAL, all real roots of a univariate polynomial are isolated by obtaining a set of intervals, each of which contains one real root. Then, each isolated real root is approximated to the given precision. For application programs, we wanted to use approximate values to simplify other constraints. The general method to do this is to input equations between variables and their approximate values as constraints. For this purpose, we had to modify the original algorithm for computing Gröbner bases to accept approximate values.
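The derivation of the Heron relation above can be reproduced with any general-purpose Gröbner-base package. The sketch below uses SymPy rather than CAL, so it illustrates only the underlying Buchberger computation, not CAL itself; the variable names and the lexicographic elimination order are chosen for this example.

    # Sketch only: reproducing the Heron example with SymPy's groebner routine.
    # The polynomials mirror the triangle/4 program:
    # c = ca + cb, ca^2 + h^2 = a^2, cb^2 + h^2 = b^2, c*h = 2*s.
    from sympy import symbols, groebner

    ca, cb, h, s, a, b, c = symbols('ca cb h s a b c')
    constraints = [
        ca + cb - c,            # c = ca + cb
        ca**2 + h**2 - a**2,    # Pythagorean theorem, left right-angled triangle
        cb**2 + h**2 - b**2,    # Pythagorean theorem, right right-angled triangle
        c*h - 2*s,              # surface area: c*h = 2*s
    ]
    # A lexicographic order with ca, cb, h first eliminates those variables;
    # a basis element in s, a, b, c alone corresponds to Heron's formula.
    gb = groebner(constraints, ca, cb, h, s, a, b, c, order='lex')
    print(list(gb)[-1])   # expected: a polynomial equivalent to 16*s**2 - (2*a**2*b**2 + ...)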
This equation is, actually, a developed form of Heron's formula. 'When we call the query ?- heron:triangle(3,4,5,s). the CAL system returns the following answer: If a variable has finitely many values in all its solutions, there is a way of obtaining a univariate equation with the variable in the Grabner base. Therefore, if we can add a function that enables us to compute the approximate values of the solutions of univariate equations, we can approximate all possible value of the variable. For this purpose, we implemented a method of approximating the real roots of univariate polynomials. In CAL, all real roots of univariate polynomials are isolated by obtaining a set of intervals, each of which contains one real root. Then, each isolated real root is approximated by the given precision. For application programs, we wanted to use approximate values to simplify other constraints. The general method to do this is to input equations of variables and their approximate values as constraints. For this purpose, we had to modify the original algorithm to compute Grabner bases to accept approximate values. vVhen we call the query IThis equation represents the expression 116 I User Program, Query, Command I I Translator 1· alg:pre(s.1 0). heron:triangle(3. 4.5. s). alg:geuesull(eq. 1. nonlin. R).alg:fmd(R,Sol). alg:con.rlr(Sol). Translated Code R-[s"2-36]. Sol-[s-real(·. [935.2054.5183.8764.3451. [3488.342.7523.6460.57])] - Is - -6.000000099] . • - ·6.000000099 ? Inference Engine Constraints R-[s"2-36]. Sol-[5-re81(+. [935. 2054. 5183. 8764. 345J. [3488. 342. 7523. 6460. 57J)J - Is - 6.000000099] . • - 6.000000099 ~ Canno nical Form Constraint Solvers Figure 2: Overall construction of CAL language processor ?- alg:set_out_mode(float), alg:set_error1(1!1000000), alg:set_error2(1!100000000), heron:triangle(3,4,5,s), alg:get_result(eq,1,nonlin,R) , alg:find(R,S), alg:constr(S). we can obtain the answers s = -6.000000099 and s = 6.000000099, successively by backtrack. The first line of the above, alg: set_out...mode, sets the output mode to float. Without this, approximate values are output as fractions. The second line of the above, alg: seLerror1, specifies the precision used to compare coefficients in the computation of the Grabner base. The third line, set_error2, specifies the precision used to approximate real roots by the bisection method. The essence of the above query is invocations of alg:get_result/4, and alg:find/2. The fifth line, alg : get_result, selects appropriate equations from the Grabner base. In this case, univariate (specified by 1) non-linear (specified by nonlin) equations (specified by eq) are selected and unified to a variable R. R is then passed to alg: find to approximate the real roots of equations in R. Such real roots are obtained in the variable S. Then, S is again input as the constraint to reduce other constraints in the Grabner base. 3.2 Configuration of CAL system In this section, we will introduce the overall structure of the CAL system. The CAL language processor consists of a translator, a inference engine, and constraint solvers. These subsystems are combined as shown in Figure 2. The translator receives input from a user, and translates it into ESP code. Thus, a CAL source program Figure 3: CAL system windows is translated into the corresponding ESP program by the translator, which is executed by the inference engine. An appropriate constraint solver is invoked everytime the inference engine finds a constraint during execution. 
The constraint solver adds the newly obtained constraint to the set of current constraints and computes the canonical form of the new set. At present, CAL offers the five constraint solvers discussed in Section 1.

3.3 Context

To deal with a situation in which a variable has more than one value, as in the above example, we introduced the notions of context and context tree. A context is a set of constraints. A new context is created whenever the set is changed. In CAL, contexts are represented as nodes of a context tree. The root of a context tree is called the root context. The user is supposed to be in a certain context called the current context. A context tree is changed in the following cases:

1. Goal execution: a new context is created as a child node of the current context in the context tree.
2. Creation of a new set of constraints by requesting other answers for a goal: a new context is created as a sibling node of the current context in the context tree.
3. Changing the precedence: a new context is created as a child node of the current context in the context tree.

In all cases, the newly created node represents the new set of constraints and becomes the current context. Several commands are provided to manipulate the context tree. These include a command to display the contents of a context, a command to set a context as the current context, and a command to delete a sub-tree of contexts from the context tree. Figure 3 shows an example of the CAL processor window.

Figure 3: CAL system windows

4 GDCC Parallel CLP Programming Language

There are two major levels of parallelizing CLP systems. One is the execution of the inference engine and the constraint solvers in parallel. The other is the execution of a constraint solver itself in parallel. There are several works on the parallelization of CLP systems: a proposal of ALPS [Maher 1987] introducing constraints into a committed-choice language, a report of some preliminary experiments on integrating constraints into the PEPSys parallel logic system [Van Hentenryck 1989], and a framework of concurrent constraint (cc) languages for integrating constraint programming with concurrent logic programming languages [Saraswat 1989]. The cc programming language paradigm models computation as the interaction among multiple cooperating agents through the exchange of query and assertion messages with a central store, as shown in Figure 4. In Figure 4, query information to the central store is represented as Ask, and assertion information is represented as Tell.

Figure 4: The cc language schema

This paradigm is embedded in a guarded (conditional) reduction system, where the guards contain the queries and assertions. Control is achieved by requiring that the queries in a guard are true (entailed), and that the assertions are consistent (satisfiable), with respect to the current state of the store. Thus, this paradigm has a high affinity with KL1 [Ueda and Chikayama 1990], our basic parallel language.

GDCC (Guarded Definite Clauses with Constraints), which supports both levels of parallelism, is a parallel CLP language introducing the framework of cc. It is implemented in KL1 and is currently running on the Multi-PSI machine. GDCC includes most of KL1, since KL1 built-in predicates and unification can be regarded as a distinguished domain called HERBRAND [Saraswat 1989]. GDCC contains Store, a central database to save the canonical forms of constraints.
Whenever the system meets an Ask or Tell constraint, it sends it to the proper solver. Only passive constraints, which can be solved without changing the content of the Store, are allowed as Ask constraints, while constraints which may change the Store can be written in the Tell part. In a GDCC program, only Ask constraints can be written in guards. This is similar to the KL1 guard, in which active unification is inhibited. GDCC supports multiple plug-in constraint solvers so that the user can easily specify a proper solver for a domain. In this section, we briefly explain the language syntax of GDCC and its computation model. Then, the outline of the system is described. For further information about the implementation and the language specification, refer to [Terasaki et al. 1992].

4.1 GDCC language

A clause in GDCC has the following syntax:

    Head :- Ask | Tell, Goal.

where Head is the head part of the clause, "|" is the commit operator, Goal is a sequence of predicate invocations, Ask denotes Ask constraints and invocations of KL1 built-in guard predicates, and Tell means Tell constraints. A clause is entailed if and only if Ask is reduced to true. Any clause whose guard cannot be reduced to either true or false is suspended. The body part, the right-hand side of the commit operator, is evaluated if and only if Ask is entailed. Clauses whose guards are reduced to true are called candidate clauses. A GDCC program fails when either all candidate clauses are rejected or there is a failure in evaluating Tell or Goals. The next program is pony_and_man written in GDCC:

    pony_and_man(Heads,Legs,Ponies,Men) :- true |
        alg# Heads = Ponies + Men,
        alg# Legs = 4*Ponies + 2*Men.

where true is an Ask constraint which is always reduced to true. In the body, equations which begin with alg# are Tell constraints; alg# indicates that the constraints are solved by the algebraic solver. In a body part, not only Tell constraints but also normal KL1 predicates can be written. Bi-directionality in the evaluation of constraints, an important characteristic of CLP, is not spoiled by this limitation. For example, the query ?- pony_and_man(5,14,Ponies,Men). will return Ponies=2 and Men=3, and the query ?- pony_and_man(Heads,Legs,2,3). will return Heads=5 and Legs=14, the same as in CAL.

4.2 GDCC system

The GDCC system consists of the compiler, the shell, the interface, and the constraint solvers. The compiler translates a GDCC source program into KL1 code. The shell translates queries and provides rudimentary debugging facilities. The debugging facilities comprise the standard KL1 trace and spy functions, together with solver-level event logging. The shell also provides limited support for incremental querying. The interface interacts with a GDCC program (object code), sends body constraints to a solver, and checks guard constraints using results from a solver.

Figure 5: System Configuration of GDCC

The GDCC system is shown in Figure 5. The components are concurrent processes. Specifically, a GDCC program and the constraint solvers may execute in parallel, synchronizing only when, and to the extent, necessary at the program's guard constraints. That is, program execution proceeds by selecting a clause and attempting to solve the guards of all its clauses in parallel. If one guard succeeds, the evaluation of the other guards is abandoned, and execution of the body can begin. In parallel with the execution of the body goals by the inference engine, any constraints occurring in the body are passed to the constraint solver as they are produced by the inference engine. This style of cooperation is very loosely synchronized and more declarative than sequential CLP.

4.3 Block

In order to apply GDCC to problems such as the handling-robot design problem [Sato and Aiba 1991], there were two major issues: handling multiple environments and synchronizing the inference engine with the constraint solvers. For instance, when the solution X^2 = 2 is derived by the algebraic solver, it must be solved in more detail using a function to compute the approximate real roots of univariate equations. There are two constraint sets in this example: one includes X = √2 and the other includes X = -√2. In the CAL system, the system selects one constraint set from these two and solves it; the other is then computed by backtracking (i.e., the system forces a failure). In the committed-choice language GDCC, however, we cannot use backtracking to handle multiple environments. A similar problem occurs when a meta operation on constraint sets is required, such as when computing a maximum value with respect to a given objective function. Before executing a meta operation, all target constraints must be sent to the solver. In a sequential CLP, this can be controlled by where the description is written in a program. In GDCC, however, we need another kind of mechanism to specify a synchronization point, since the order of clauses in a program does not determine the execution order. Introducing local constraint sets, which are independent of the global ones, can eliminate these problems. Multiple environments are realized by considering each local constraint set as one context. The inference engine and constraint solvers can be synchronized after evaluating a local constraint set. Therefore, we introduced a mechanism called block to describe the scope of a constraint set. We can solve a certain goal sequence with respect to a local constraint set in a block. To encapsulate failure in a block, the shoen mechanism of PIMOS [Chikayama et al. 1988] is used.

5 Constraint Solvers and Parallelization

In this section, the constraint solvers for both CAL and GDCC are briefly described. First, we describe the algebraic constraint solver for both CAL and GDCC. Then, we describe two Boolean constraint solvers: one utilizing the modified Buchberger algorithm and the other utilizing the incremental Boolean elimination algorithm. The former is for both CAL and GDCC, while the latter is for CAL alone. Third, an integer constraint solver for GDCC is described, and fourth, a hierarchical constraint solver for CAL and GDCC is described. In the next subsection, a set constraint solver for CAL is described. In the last subsection, a preliminary consideration on improving the efficiency of the algebraic constraint solver by applying dependency analysis of constraints is given. All constraint solvers for CAL are written in ESP, and those for GDCC are written in KL1.

5.1 Algebraic Constraint Solver

The constraint domain of the algebraic solver is multivariate (non-linear) algebraic equations.
Recently, several attempts have been made to parallelize the Buchberger algorithm, with generally disappointing results in absolute performance [Ponder 1990, Senechaud 1990, Siegl 1990], except in shared-memory machines [Vidal 1990, Clarke et al. 1990]. 'liVe parallelize the Buchberger algorithm while laying emphasis on absolute performance and incrementality rather than on. deceptive parallel speedup. We have implemented several versions and continue to improve the algorithm. In this section, we outline both the sequential version and the parallel version of the Buchberger algorithm. 5.1.1 Grabner base and Buchberger algorithm Without loss of generality, we can assume that all polynomial equations are in the form of p = O. Let E = {PI = 0, ... ,pn = O} be a system of polynomial equations. Buchberger introduced the notion of a Grabner base and devised an algorithm to compute the basis of a given set of polynomials. A rough sketch of the algorithm is as follows (see [Buchberger 1985] for a precise definition). Let a certain ordering among monomials and a system of polynomials be gi ven. An equation can be considered a rewrite rule which rewrites the greatest monomial in the equation to the polynomial consisting of the remaining monomials. For example, if the ordering is Z > X > B > A, a polynomial equation, Z - X + B = A, can be considered to be the rewrite rule, Z - t X -B+A. A pair of rewrite rules LI - t RI and L2 - t R 2, of which LI and L2 are not mutually prime, is called a critical pair, since the least common multiple of their left-hand sides can be rewritten in two different ways. The S-polynomial of such a pair is defined as: where lcm(L I , L 2 ) represents the least common multiplier of LI and L 2 • If further rewriting does not succeed in rewriting the S-polynomial of a cri tical pair to zero, the pair is said to be divergent and the S-polynomial is added to the system of equations. By repeating this procedure, we can eventually obtain a confluent rewriting system. The confluent rewriting system thus obtained is called a Grabner base of the original system of equations. If a Grabner base does not have two rules, one of which rewrites the other, the Grabner base is called reduced. The reduced Grabner base can be considered a canonical form of the given constraint set since it is unique with respect to the given ordering of monomials. If all the solutions of a equation f = are included in the solution set of E, then f is rewritten to zero by the Grabner base of E. On the contrary, if a set of polynomials E ° has no solution, then the Grabner base of E includes "1". Therefore, this algorithm has good properties for deciding the satisfiability of a given constraint set. 5.1.2 Parallel Algorithm The coarse-grained parallelism in the Buchberger algorithm, suitable for the distributed memory machine, is the parallel rewriting of a set of polynomials. However, since the convergence rate of the Buchberger algorithm is very sensitive to the order in which polynomials are converted into rules, implementation must carefully select small polynomials at an early stage. We have implemented solvers in three different architectures; namely, a pipeline, a distributed architecture, and a master-slave architecture. We briefly mention here the master-slave architecture since this solver has comparatively good performance. Figure 6 shows the architecture. New rule (global minimum) Load balance info. 
Figure 6: Architecture of master-slave type solver The set of polynomials E is physically partitioned with each slave taking a different part. The initial rule set of G(E) is duplic?-ted so that all slaves use the same rule set. New polynomials are distributed to the slaves by the master. The outline of the reduction cycle is as follows. Each slave rewrites its own polynomials by the G(E), selects the local minimum polynomial from them, and sends its leading power product to the master. The master processor waits for reports from all the slaves, and selects the global minimum power products. The minimum polynomial can be decided only after all slaves finish reporting to the master. A polynomial, however, which is not the minimum can be decided quickly. Thus, the notminimum message is sent to slaves as soon as possible, and the processors that receive the not-minimum message reduce polynomials by the old rule set while waiting for a new rule. While the slave is receiving the minimum message, the slave converts the polynomial into a new rule and sends it to the master. The master sends the new rule to all slaves except the owner. If more than one candidate have equal power products, then all of these 120 candidates are converted to rules by slaves and they go to final selection at the master. Table 1 shows the results of the benchmark problems. The problems are adopted from [Boege et al. 1986, Backelin and Froberg 1991]. Refer to [Terasaki et ai. 1992] for further details. Timing and speedup of the master-slave Table 1: arch.( unit:sec) Problems Katsura-4 Katsura-5 Cyc.5-roots Cyc.6-roots 5.2 1 8.90 1 86.74 1 27.58 1 1430.18 1 2 7.00 1.27 57.81 1.50 21.08 1.31 863.62 1.66 Processors 4 8 6.53 5.83 1.53 1.36 39.88 31.89 2.18 2.72 19.27 19.16 1.44 1.43 433.73 333.25 4.29 3.30 16 9.26 0.96 36.00 2.41 25.20 1.10 323.38 4.42 Boolean Constraint Solver There are several algorithms that solve Boolean constraints, but we do not know so many that we can get the canonical form of constraints, one that can calculate solu tions incrementally and that uses no parameter variables. These criteria are important for using the algorithm as a constraint solver, as we described in Section 2. First, we implemented the Boolean Buchberger algorithm [Sato and Sakai 1988] for the CAL system, then we tried to parallelize it for the GDCC system. This algorithm satisfies all of these criteria. Moreover, we developed another sequential algorithm named Incremental Boolean elimination, that also satisfies all these criteria, and we implemented it for the CAL system. 5.2.1 Constraint Solver by Buchberger Algorithm We first developed a Boolean constraint solver based on the modified Buchberger algorithm called the Boolean Buchberger algorithm [Sato and Sakai 1988, Aiba et al. 1988]. Unlike the Buchberger algorithm, it works on the Boolean ring instead of on the field of complex numbers. It calculates the canonical form of Boolean constraints called the Boolean Grabner base. The constraint solver first transforms formulas including some Boolean operators such as inclusive-or (V) and! or not (.) to expressions on the Boolean ring before applying the algorithm. We parallelized the Boolean Buchberger algorithm in KL1. First we analyzed the execution of the Boolean Buchberger algorithm on CAL for some examples, then we found the large parts that may be worth parallelizing, rewriting formulas by applying rules. We also tried to find parts in the algorithm which can be parallelized by analyzing the algorithm itself. 
Then, we decided to adopt a master-slave parallel execution model. In the master-slave model, one master processor plays the role of the controller and the other slave processors become reducers. The controller manages Boolean equations, updates the temporary Gröbner bases (GB) stored in all slaves, makes S-polynomials and self-critical pair polynomials, and distributes equations to the reducers. Each reducer has a copy of GB, reduces the equations which come from the controller by GB, and returns non-zero reduced equations to the controller. When the controller becomes idle after distributing equations, the controller plays the role of a reducer during the reduction process. For the 6-queens problem, the speedup ratio of 16 processors to a single processor is 2.96. Because the parallel execution part of the problem is 77.7% of the whole execution, the maximum speedup ratio is 4.48 in our model. The difference is due to the task distribution overhead, the update of GB in each reducer, and the imbalance of distributed tasks. We then improved our implementation so as not to make redundant critical pairs. This improvement causes the ratio of parallel executable parts to decrease, so the improved version becomes faster than the original version, but the speedup ratio of 16 processors to a single processor drops to 2.28. For more details on the parallel algorithm and its results, refer to [Terasaki et al. 1992].

5.2.2 Constraint Solver by Incremental Boolean Elimination Algorithm

Boolean unification and SL-resolution are well known as Boolean constraint solving algorithms other than the Boolean Buchberger algorithm. Boolean unification is used in CHIP [Dincbas et al. 1988], and SL-resolution is used in Prolog III [Colmerauer 1987]. Boolean unification itself is an efficient method. It becomes even more efficient using binary decision diagrams (BDDs) as the data structure to represent Boolean formulas. However, because the solutions produced by Boolean unification include extra variables introduced during execution, it cannot calculate a canonical form of the given constraints if we execute it incrementally. For this reason, we developed a new algorithm, Incremental Boolean elimination. As with Boolean unification, this algorithm is based on Boole's elimination, but it introduces no extra variables, and it can calculate a canonical form of the given Boolean constraints.

We denote Boolean variables by x, y, z, ..., and Boolean polynomials by A, B, C, .... We represent all Boolean formulas only by the logical connectives and (×) and exclusive-or (+). For example, we can represent the Boolean formulas F∧G, F∨G and ¬F by F×G, F×G+F+G and F+1, respectively. We use the expression Fx=G to represent the formula obtained by substituting all occurrences of variable x in formula F with formula G. We omit × symbols, as usual, when there is no confusion. We assume that there is a total order over the variables. We define normal Boolean polynomials recursively as follows.

1. The two constants 0 and 1 are normal.
2. If two normal Boolean polynomials A and B consist only of variables smaller than x, then Ax + B is normal, and we denote it by Ax ⊕ B. We call A the coefficient of x.

If variable x is maximal in formula F, then we can transform F into the normal form (Fx=0 + Fx=1)x ⊕ Fx=0. Hence we assume that all polynomials are normal. Boole's elimination says that if a Boolean formula F is 0, then Fx=0 × Fx=1 (= G) is also 0. Because G does not include x, if F includes x, then G includes fewer variables than F.
Similarly, we can gradually obtain polynomials with fewer variables by Boole's eliminations. Boolean unification unifies x with (Fx=0 + Fx=1 + 1)u + Fx=0 after eliminating variable x from formula F, where u is a free extra variable. This unification means substituting x with (Fx=0 + Fx=1 + 1)u + Fx=0; when a new Boolean constraint with variable x is given, the result of the substitution contains u instead of x. Therefore, Boolean unification then unifies u with a formula containing yet another extra variable. Incremental Boolean elimination applies the following reduction to every formula instead of transforming F = 0 into x = (Fx=0 + Fx=1 + 1)u + Fx=0 and unifying x with (Fx=0 + Fx=1 + 1)u + Fx=0. That is why Incremental Boolean elimination needs no extra variables.

Reduction: A formula Cx (C ≢ 1) is reduced by the formula Ax ⊕ B = 0 as shown below. This reduction tries to reduce the coefficient of x to 1 if possible; otherwise it tries to reduce it to the smallest formula possible.

    Cx → x + BC + B           (if AC + A + C = 1)
    Cx → (A + 1)Cx + BC       (otherwise)

When a new Boolean constraint is given, the following operation is executed, since Incremental Boolean elimination does not execute unification.

Merge Operation: Let Cx ⊕ D = 0 be a new constraint, and suppose that we already have a constraint Ax ⊕ B = 0. Then we make the merged constraint (AC + A + C)x ⊕ (BD + B + D) = 0 the new solution. If the normal form of ACD + BC + CD + D is not 0, we successively apply the merge operation to it.

This operation is an expansion of Boole's elimination. That is, if we have no constraint yet, we can consider A and B to be 0. In this case, the merge operation is the same as Boole's elimination.

Example: Consider the following constraints: exactly one of the five variables a, b, c, d, e (a < b < c < d < e) is 1.

    a∧b = 0, a∧c = 0, a∧d = 0, a∧e = 0, b∧c = 0, b∧d = 0, b∧e = 0, c∧d = 0, c∧e = 0, d∧e = 0,
    a∨b∨c∨d∨e = 1

By Incremental Boolean elimination, we can obtain the following canonical solution:

    e = d + c + b + a + 1
    (c + b + a)×d = 0
    (b + a)×c = 0
    a×b = 0

The solution can be interpreted as follows. Because the solution does not have an equation of the form A×a = B, variable a is free. Because a×b = 0, if a = 1 then variable b is 0; otherwise b is free. The discussion continues and, finally, because e = d + c + b + a + 1, if a, b, c, d are all 0, then variable e is 1; otherwise e is 0. By assigning 0 or 1 to all variables in increasing order of < under a solution produced by Incremental Boolean elimination, we can easily obtain any assignment that satisfies the given constraints. Thus, by introducing an adequate order on the variables, we can obtain a preferred enumeration of the assignments satisfying the given constraints.
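The identity underlying Boole's elimination can be illustrated with a tiny brute-force sketch on the "exactly one of five variables" example above. This is only an illustration of the identity F = 0 entails Fx=0 × Fx=1 = 0, not of the Incremental Boolean elimination solver itself; the function names are invented for this example.

    # Sketch only: Boole's elimination over the Boolean ring, with Boolean
    # functions represented as Python callables returning 0 or 1.
    from itertools import product

    VARS = ["a", "b", "c", "d", "e"]

    def exactly_one(env):
        """Returns 0 exactly when the 'exactly one of a..e is 1' constraint holds."""
        return 0 if sum(env[v] for v in VARS) == 1 else 1

    def eliminate(f, x):
        """Boole's elimination of variable x from the equation f = 0."""
        return lambda env: f({**env, x: 0}) & f({**env, x: 1})

    # Eliminate e, then d: the remaining equation g = 0 constrains only a, b, c.
    g = eliminate(eliminate(exactly_one, "e"), "d")

    envs = [{v: bits[i] for i, v in enumerate(VARS)}
            for bits in product((0, 1), repeat=5)]
    solutions = [env for env in envs if exactly_one(env) == 0]
    print(len(solutions))                        # -> 5 (one per variable)
    print(all(g(s) == 0 for s in solutions))     # -> True: g = 0 is a consequence of f = 0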
5.3 Integer Linear Constraint Solver

The constraint solver for the integer linear domain checks the consistency of given equalities and inequalities with rational coefficients and, furthermore, gives the maximum or minimum value of an objective linear function under these constraint conditions. The purpose of this constraint solver is to provide an efficient constraint solver for the integer optimization domain by achieving a computation speedup through parallel execution of the search process. The integer linear solver utilizes the rational linear solver (the parallel linear constraint solver) in its optimization procedure to obtain evaluations of the relaxed linear problems created in the course of the solution. The rational linear solver is realized by the simplex algorithm. We implemented the integer linear constraint solver for GDCC.

5.3.1 Integer Linear Programming and Branch and Bound Method

In the following, we discuss the parallel search method employed in this integer linear constraint solver. The problem we are addressing is a mixed integer programming problem, namely, to find the maximum or minimum value of a given linear function under integer linear constraints. The problem can be defined as follows: minimize the objective function on variables x_i, which range over the real numbers, and variables y_i, which range over the integers,

    z = Σ_{i=1..n} p_i x_i + Σ_{i=1..m} q_i y_i

under the linear constraint conditions

    Σ_{i=1..n} a_ij x_i + Σ_{i=1..m} b_ij y_i ≥ e_j,   for j = 1, ..., l,
    Σ_{i=1..n} c_ij x_i + Σ_{i=1..m} d_ij y_i = f_j,   for j = 1, ..., k,

where x_i ∈ R, x_i ≥ 0, for i = 1, ..., n, and y_i ∈ Z, l_i ≤ y_i ≤ u_i, l_i, u_i ∈ Z, for i = 1, ..., m.

The method we use is the branch-and-bound algorithm. Our algorithm first checks the solution of the original problem without requiring the variables y_i above to take integer values. We call this problem a continuously relaxed problem. If the continuously relaxed problem does not have an integer solution, then we proceed by dividing the original problem into two sub-problems successively, producing a tree-structured search space. Continuously relaxed problems can be solved by the simplex algorithm, and if the original integer variables take exact integer values, this yields the solution to the integer problem. Otherwise, we select an integer variable y_s which takes a non-integer value ŷ_s in the solution of the continuously relaxed problem, impose the two different interval constraints derived from the neighboring integers of the value ŷ_s, namely l_s ≤ y_s ≤ [ŷ_s] and [ŷ_s] + 1 ≤ y_s ≤ u_s, on the already existing constraints, and obtain two child problems (see Figure 7).

Figure 7: Branching of Nodes

Continuing this procedure, which is called branching, we go on dividing the search space to produce more constrained sub-problems. Eventually this process leads to a sub-problem whose continuous solution is also an integer solution of the problem. We can select the best integer solution from among those found in the process. While the above branching process only enumerates integer solutions, if we have a measure to guarantee that a sub-problem cannot have a better solution than an already obtained integer solution, in terms of the optimum value of the objective function, then we can skip that sub-problem and only need to search the rest of the nodes. Continuously relaxed problems give a measure for this, since these relaxed problems always have better optimum values for the objective function than the original integer problems. Sub-problems whose continuously relaxed problems have no better optimum than an integer solution already obtained cannot give a better optimum value, which means it is unnecessary to search them further (bounding procedure). We call the sub-problems obtained through the branching process search nodes. The following two important factors decide the order in which the sequential search process goes through the nodes in the search space:

1. The priorities of sub-problems (nodes) in deciding the next node on which the branching process works.
2. The selection of a variable, out of the integer variables, with which the search space is divided.

It is preferable that the above selections are made in such a way that the actual nodes searched in the process of finding the optimum form as small a part of the total search space as possible. We adopted one of the best heuristics of this type from operations research as the basis of our parallel algorithm [Benichou et al. 1971].
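A compact sequential sketch of this LP-relaxation-based branch and bound is given below. It is only an illustration of the method described above, not ICOT's KL1 implementation: SciPy's linprog stands in for the rational (simplex-style) solver, and the problem data and names are invented.

    # Sketch only: sequential branch and bound with LP relaxation.
    import math
    from scipy.optimize import linprog

    def branch_and_bound(c, A_ub, b_ub, bounds, int_vars, eps=1e-6):
        """Minimize c.x s.t. A_ub x <= b_ub, with variables in int_vars integral."""
        best_val, best_x = math.inf, None
        nodes = [bounds]                        # each node is a list of (lo, hi) bounds
        while nodes:
            node = nodes.pop()
            res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=node, method="highs")
            if not res.success or res.fun >= best_val:   # infeasible or bounded out
                continue
            frac = [i for i in int_vars if abs(res.x[i] - round(res.x[i])) > eps]
            if not frac:                        # relaxation already integral: new incumbent
                best_val, best_x = res.fun, res.x
                continue
            i = frac[0]                         # branch on the first fractional variable
            lo, hi = node[i]
            down, up = list(node), list(node)
            down[i] = (lo, math.floor(res.x[i]))        # y_i <= [relaxed value]
            up[i] = (math.floor(res.x[i]) + 1, hi)      # y_i >= [relaxed value] + 1
            nodes.extend([down, up])
        return best_val, best_x

    # Tiny invented example: maximize 5*y1 + 4*y2 (minimize its negation)
    # subject to 6*y1 + 4*y2 <= 24, y1 + 2*y2 <= 6, 0 <= y1, y2 <= 10, both integral.
    val, x = branch_and_bound(c=[-5, -4],
                              A_ub=[[6, 4], [1, 2]], b_ub=[24, 6],
                              bounds=[(0, 10), (0, 10)], int_vars=[0, 1])
    print(-val, x)   # optimal integer value and solution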
5.3.2 Parallelization Method of Branch-and-Bound

To parallelize the branch-and-bound algorithm, we distribute the search nodes created through the branching process to different processors and let these processors work on their own sub-problems following a sequential search algorithm. Each sequential search process communicates with the other processes to transmit information on the most recently found solutions and on pruning sub-nodes, thus making the search proceed over a network of processors. We adopted one of the best search heuristics used in sequential algorithms. Heuristics are used for controlling the schedule of the order of sub-nodes to be searched, in order to reduce the number of nodes needed to get to the final result. Therefore, it is important, in designing parallel versions of search algorithms, to balance the distributed load among processors and to communicate the information used for pruning as fast as possible between these processors.

We designed a parallel algorithm derived from the above sequential algorithm to be implemented on the distributed-memory parallel machine Multi-PSI. Our parallel algorithm exploits the independence of the many sub-processes created through the branching procedure in the sequential algorithm and distributes these processes to different processors (see Figure 8). Scheduling of sub-problems is done by use of the priority control facility provided by the KL1 language (see [Oki et al. 1989]). The incumbent solutions are transferred between processors as shared global data, so that each processor can update the current incumbent solution as soon as possible.

Figure 8: Generation of Parallel Processes

5.3.3 Experimental Results

We implemented the above parallel algorithm in the KL1 language and experimented with the job-shop scheduling problem as an example of mixed-integer problems. Below are the results of computation speedups for a "4 job 3 machine" problem and the total number of searched nodes needed to get to the solution.

Table 2: Speedup of the Integer Linear Constraint Solver

  processors         1      2      4      8
  speedup            1.0    1.5    1.9    2.3
  number of nodes    242    248    395    490

The above table shows the increase in the number of searched nodes as the number of processors grows. One reason for this is the speculative computation inherent in this type of parallel algorithm. Another reason is that communication latency produces unnecessary computation which could have been avoided if incumbent solutions were communicated instantaneously from the other processors and the unnecessary nodes were pruned. In this way we encounter the problem, in parallel programming, of how to reduce the growth in size of the total search space when multiple processors are used, compared with the space traversed on one processor using a sequential algorithm.

5.4 Hierarchical Constraint Solver

5.4.1 Soft Constraints and Constraint Hierarchies

We proposed a logical foundation of soft constraints in [Satoh 1990] by using a meta-language which expresses an ordering on interpretations. The idea of formalizing soft constraints is as follows. Let hard constraints be represented as first-order formulas.
Then an interpretation which satisfies all of these first-order formulas can be regarded as a possible solution, and soft constraints can be regarded as an order over those interpretations, because soft constraints represent criteria applied to possible solutions in order to choose the most preferred ones. We use a meta-language which represents a preference order directly. This meta-language can be translated into a second-order formula to provide a syntactical definition of the most preferred solutions. Although this framework is rigorous and declarative, it is not computable in general because it is defined by a second-order formula. Therefore, we have to restrict the class of constraints so that they become computable. We introduce the following restrictions to make the framework computable.

1. We fix the considered domain so that the interpretations of domain-dependent relations are fixed.
2. Soft and hard constraints consist of domain-dependent relations only.

If we accept these restrictions, the soft constraints can be expressed as first-order formulas. Moreover, there is a relationship between the above restricted class of soft constraints and hierarchical CLP languages (HCLP languages) [Borning et al. 1989, Satoh and Aiba 1990b], as shown in [Satoh and Aiba 1990a]. An HCLP language is a language augmenting a CLP language with labeled constraints. An HCLP program consists of rules of the form:

h :- b1, ..., bn

where h is a predicate, and b1, ..., bn are predicate invocations, constraints, or labeled constraints. Labeled constraints are of the form:

label C

where C is a constraint in which only domain-dependent function symbols may occur, and label expresses the strength of the constraint C. As shown in [Satoh and Aiba 1990a], we can calculate the most preferable solutions defined by constraint hierarchies in the HCLP language. Based on this correspondence, we have implemented an algorithm for solving constraint hierarchies on the PSI machine with the following features (a small illustrative sketch of this bottom-up construction is given after Table 3 below).

1. There are no redundant calls of the constraint solver for the same combination of constraints, since reduced constraints are calculated in a bottom-up manner.
2. If an inconsistent combination of constraints is found by calling the constraint solver, it is registered as a nogood and used for detecting further contradictions. No extension of that combination is processed, so unnecessary combinations are avoided.
3. An inconsistency is detected without a call to the constraint solver if a processed combination subsumes a registered nogood.

In [Borning et al. 1989], Borning et al. give an algorithm for solving constraint hierarchies. However, it uses backtracking to obtain an alternative solution and so may redundantly call the constraint solver for the same combination of constraints. Our implemented language is called CHAL (Contraintes Hierarchiques avec Logique) [Satoh and Aiba 1990b], and is an extension of CAL.

Table 3: Performance of the Parallel Hierarchical Constraint Solver (unit: sec; speedup over one processor in parentheses)

                               Processors
  problems      1          2            4            8            16
  Tele4        43      32 (1.34)    32 (1.34)    32 (1.34)    29 (1.48)
  5queen       69      39 (1.77)    26 (2.65)    21 (3.29)    19 (3.63)
  6queen      517     264 (1.96)   136 (3.80)    77 (6.71)    50 (10.34)

Table 3 shows the speedup ratio for three examples. Tele4 resolves ambiguity in natural language phrases. 5queen and 6queen solve the 5-queens and 6-queens problems.
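The bottom-up construction with nogood recording (features 1 to 3 above) can be made concrete with a small sketch. The following Python fragment is only an illustration under simplifying assumptions: it uses one particular comparator (satisfy as many constraints as possible, level by level, strongest level first), and consistent is a black-box predicate standing in for a call to the domain-dependent constraint solver; the actual CHAL solver on the PSI machine differs in its details.

```python
from itertools import combinations

def solve_hierarchy(hard, soft_levels, consistent):
    """hard: list of constraints; soft_levels: lists of soft constraints, strongest first."""
    nogoods = []                                   # inconsistent combinations found so far
    best = [list(hard)]                            # current maximal consistent combinations

    def subsumed(combo):                           # does combo extend a known nogood?
        return any(all(c in combo for c in ng) for ng in nogoods)

    for level in soft_levels:                      # strengthen the sets level by level
        for k in range(len(level), 0, -1):         # prefer satisfying more constraints
            candidates = []
            for base in best:
                for subset in combinations(level, k):
                    combo = base + list(subset)
                    if subsumed(combo):            # rejected without calling the solver
                        continue
                    if consistent(combo):
                        candidates.append(combo)
                    else:
                        nogoods.append(combo)      # remember the failed combination
            if candidates:
                best = candidates
                break                              # smaller subsets of this level are worse
    return best
```

Each call to consistent corresponds to one invocation of the constraint solver; the nogood list lets later, larger combinations be discarded without such a call, and no combination is solved twice, which is what features 1 to 3 describe.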
We represent these problems in Boolean constraints and use the Boolean Buchberger algorithm [Sato and Sakai 1988, Sakai and Aiba 1989] to solve the constraints. According to Table 3, we obtain 1.34 speedup for Tele4, 3.63 speedup for 5queen, and 10.34 speedup for 6queen. Although 6queen is a large problem for the Boolean Buchberger algorithm and gives us the largest speedup, the speedup saturates at around 16 processors. This expresses that the load is not well-distributed and we have to look for a better load-balancing method in the future. Parallel Solver for Constraint Hierarchies The algorithms we have implemented on the PSI machine have the following parallelism. 1. Since we construct a consistent constraint set in a bottom-up manner, the check for consistency for each independent constraint set can be done in parallel. 2. We can check if a constraint set is included in nogoods in parallel for each independent constraint set. 3. There is parallelism inside a domain-dependent constraint solver. 4. We can check for answer redundancy in parallel. Among these parallelisms, the first one is the most coarse and the most suitable for implementation on the Multi-PSI machine. So, we exploit the first parallelism. Then, features of the parallel algorithm become the following. 1. Each processor constructs a maximal consistent constraint set from a given constraint set in a bottom-up manner in parallel. However, oncea constraint set is given, there is no distribution of tasks. So, we make idle processors require some task from busy processors and if a busy processor can divide its task, then it sends the task to the idle processor. 2. By pre-evaluation of a parallel algorithm, we found that the nogood subsumption check and the redundancy check have very large overheads. So, we do not check nogood subsumptions and we check redundancy only at the last stage of execution. 5.5 Set Constraint Solver The set constraint solver handles any kind of constraint presented in the following conjunction of predicates. where each predicate Fi(X,X) is a predicate constructed from predicate symbols E, ~, =J. and =, function symbols n, U, and ."', element variables x, and set variables X, and some symbols of constant elements. For the above constraints, the solver gives the answer of the form: fl(X,X) =0 hl(X) = 0 hm(x) = 0 where hl(X) = 0, ... , hm(x) = 0 give the necessary and sufficient conditions for satisfying the constraints. Moreover, for each solution for the element variables, the system of whole equations instantiated by the solution puts the original constraints into a normal form (i.e. a solution). For more detailed information on the constraint solver, refer to [Sato et al. 1991]. 125 Let us first consider the following example. 0 A"'nC"'nE'" CUE ;2 B CUE ;2 D DnB'" ;2 A'" nB ~ A D AuB ;2 D In this example, {p} * {x} = 0 is the satisfiability condition. This holds if and only if x i- p. In this case, there are always A, B, C and D that satisfy the original constraints. The normal form is: D E*C A*B {x}*E*B {p} where the notation A'" denotes the complement of A. Since a class of sets forms a Boolean algebra, this constraint can be considered a Boolean constraint. Hence we can solve this by computing its Boolean Grabner base: D 5.6 We should note that there is neither an element variable nor a constant on elements in the above constraints. Hence they can be expressed as Boolean equations with variables A, B, C, D and E. This, however, does not necessarily hold in every constraint of sets. 
Consider the following constraints with an additional three predicates including elements. A"'nC"'nE'" 0 CuE CUE DnB'" ;2 ;2 A A'" nB ~ D AUB ;2 D ;2 (C' n {x}) U (E n {p} ) x p rt. rt. B D Dn{x,p} A B where x is an element variable and p is a constant symbol of an element. This can no longer be represented with the Boolean equations as above. For example the last formula is expressed as {p} * B = 0, where {p} is considered a coefficient. In order to handle such general Boolean equations, we extended the notion of Boolean Grabner bases [Sato et al. 1991], which enabled us to implement the set constraint solver. For the above constraint, the solver gives the following answer: {x}*E*B {p} *C *A {p} * E {x} * C {x} * A {p} * B {p} * {x} {x}*E+{x}*B+{x} {p} * C + {p} * A + {p} {p} *A {x} *B o o E+C+l o D E+C+l o A+B E*C E*C A*B *C *A {p} * E {x} * C {x} * A {p} * B A+B A+B E+C+ 1 o {x}*E+{x}*B+{x} {p} * C + {p} * A {p} * A {x} * B o o o + {p} Dependency Analysis of Constraint Set From several experiments on writing application programs, we can conclude that the powerful expressiveness of these languages is a great aid to programming, since all users have to do to describe a program is to define the essential properties of the problem itself. That is, there is no need to describe a method to solve the problem. On the other hand, sometimes the generality and power of constraint solvers turn out to be a drawback for these languages. That is, in some cases, especially for very powerful constraint solvers like the algebraic constraint solver in CAL or GDCC, it is difficult to implement them efficiently because of their generalities, in spite of great efforts. As a subsystem of language processors, efficiency in constraint solving is, of course, one of the major issues in the implementation of those language processors [Marriott and Sondergaard 1990, Cohen 1990]. In general, for a certain constraint set, the efficiency of constraint solving is strongly dependent on the order in which constraints are input to a constraint solver. However, in sequential CLF languages like CAL, this order is determined by the position of constraints in a program, because a constraint solver solves constraints accumulated by the inference engine that follows SLD-resolution. In parallel eLF languages like GDCC, the order of constraints input to a constraint solver is more important than in sequential languages. Since an inference engine and constraint solvers can run in parallel, the order of constraints is not determined by their position in a program. Therefore, the execution time may vary according to the order of constraints input to the constraint solver. In CAL and GDCC, the computation of a Grabner base is time-consuming and it is well known that the Buchberger algorithm is doubly exponential in worst-case complexity [Hofmann 1989]. Therefore, it is worthwhile to rearrange the order of constraints to make the constraint solver efficient. 126 We actually started research into the order of constraints based on dependency analysis [Nagai 1991, Nagai and Hasegawa 1991]. This analysis consisted of dataflow analysis, constraint set collection, dependency analysis on constraint sets, and determination of the ordering of goals and the preference of variables. To analyze dataflow, we use top-down analysis based on SLD-refutation. For a given goal and a program, the invocation of predicates starts from the goal without invoking a constraint solver, and variable bindings and constraints are collected. 
In this analysis, constraints are described in terms of a graphical (bipartite graph) representation. The algebraic structure of a set of constraints is extracted using DM decomposition [Dulmage and Mendelsohn 1963], which computes a block upper triangular matrix by canonically reordering the matrix corresponding to the set of constraints. As a result of the analysis, a set of constraints can be partitioned into relatively independent subsets of constraints. These partitions are obtained so that the number of variables shared among different blocks is as small as possible. Besides this partition, the variables shared among partitions and the variables shared among constraints within a block are also obtained. Based on these results, the order of goals and the precedence of variables are determined.

We show the results of this method for two geometric theorem proving problems [Kapur and Mundy 1988, Kutzler 1988]: one is the theorem that the three perpendicular bisectors of the three edges of a triangle intersect at a point, and the other is the so-called nine-point circle theorem. The former theorem can be represented by 5 constraints with 8 variables and gives about a 3.2 times improvement. The latter theorem can be represented by 9 constraints with 12 variables and gives about a 276 times improvement.

6 CAL and GDCC Application Systems

To show the feasibility of CAL and GDCC, we implemented several application systems. In this section, two of these, the handling robot design support system and the Voronoi diagram construction program, are described.

6.1 Handling Robot Design Support System

The design process of a handling robot consists of a fundamental structure design and an internal structure design [Takano 1986]. The fundamental structure design determines the framework of the robot, such as the degrees of freedom, the number of joints, and the arm lengths. The internal structure design determines the internal details of the robot, such as the motor torque of each joint. The handling robot design support system mainly supports the fundamental structure design. Currently, the method used to design a handling robot is as follows:

1. First, the type of the robot, such as a Cartesian manipulator, a cylindrical manipulator, or an articulated manipulator, has to be decided according to the requirements for the robot.
2. Then, a system of equations representing the relation between the end effector and the joints is deduced, and the system of equations is transformed to obtain the desired form of equations.
3. Next, a program to analyze the robot being designed is coded in an imperative programming language, such as Fortran or C.
4. By executing the program, the design is evaluated. If the result is satisfactory, the design process terminates; otherwise, the whole process is repeated until the result satisfies the requirements.

By applying the CLP paradigm to the design process of a handling robot, that is, by coding a CLP program representing the relation obtained in step 2 above, the transformation can be done by executing the program. Thus, steps 2 and 3 can be supported by a computer.

6.1.1 Kinematics and Statics Programs by Constraint Programming

Robot kinematics represents the relation between the position and orientation of the end effector, the length of each arm, and the rotation angle of each joint. We call the position and orientation of the end effector the hand parameters, and we call the rest the joint parameters.
Robot statics represent the relation between joint parameters: force working on the end-effector, and torque working on each joint [Tohyama 1989]. These relations are essential for analyzing and evaluating the structure of a handling robot. To make a program that handles handling robot structures, we have to describe a program independent of its fundamental structure. That is, kinematics and statics programs are constructed to handle any structure of robot by simply changing a query. Actually, these programs receive a matrix which represents the structure of a handling robot being designed in terms of a Est of lists. By manipulating the structure of this argument, any type of handling robot can be handled by the one program. For example, the following query asks the kinematics of a handling robot with three joints and three arms. robot([[cos3, sin3, 0, 0, z3, 0, 0, 1J, [cos2, sin2, x2, 0, 0, 1, 0, OJ, [cos1, sin1, 0, 0, z1, 0, 0, 1JJ, 127 5, 0, 0, 1, 0, 0, 0, 1, 0, px, py, pz, ax, ay, az, cx, cy, cz). where the first argument represents the structure of the handling robot, px, py, and pz represents a position, ax, ay, az, ex, cy, and cz represents an orientation by defining two unit vectors which are perpendicular to each other. sin's and cos's represent the rotation angle of each joint, and z3, x2, and z1 represent the length of each arm. For this query, the program returns the following answer. cos1-2 cos2-2 cos3-2 px py pz ax ay az ex ey ez 1-sin1-2 1-sin2-2 1-sin3-2 -5*cos2*sin3*sin1+z_3*sin2*sinl +5*eos3*eos1+x_2*eos1 5*cos3*sin1+x 2*sinl +5*eos1*cos2*sin3-z_3*eosl*sin2 5*sin3*sin2+z_1+z_3*eos2 -1*eos2*sin3*sin1+eos3*eos1 cos3*sin1+cosl*eos2*sin3 sin3*sin2 -1*eos1*sin3-eos3*eos2*sinl -1*sin3*sin1+eos3*eos1*eos2 eos3*sin2 That is, the parameters of the position and the orientation are expressed in terms of the length of each arm and the rotation angle of each joint. Note that this kinematics program has the full features of the CLP program. The problem of calculating hand parameters from joint parameters is called forward kinematics, and the converse is called inverse kinematics. vVe can deal with both of them with the same program. This program can be seen as a generator of programs dealing with any handling robot which has a user designed fundamental structure. Statics has the same features as the kinematics program described. That is, the program can deal with any type of handling robot by simply changing its query. 6.1.2 Construction of Design Support System The handling robot design support system should have the following functions 1. to generate the constraint representing kinematics and statics for any type of robot, 2. to solve forward and inverse kinematics, 3. to calculate the torque which works on each joint, and 4. to evaluate manipulability. The handling robot design support system consists of the following three GDCC programs in order to realize these functions, Kinematics a kinematics program Statics a statics program Determinant a program to calculate the determinant of a matrix Kinematics and Statics are the programs we described above. A matrix to evaluate the manipulability of a handling robot, called a Jacobian matrix, is obtained from the Statics program. Determinant is used to calculate the determinant of a Jacobian matrix. This determinant is called the manipulability measure and it expresses the manipulability of the robot quantitatively [Yoshikawa 1984]. 
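The role of the Kinematics and Determinant programs can be illustrated with a small symbolic computation. The sketch below uses Python/SymPy on a planar two-joint arm chosen only for this illustration, so the arm model and all symbol names are assumptions rather than the paper's robot representation; it shows the kind of relation involved: hand parameters expressed in joint parameters, the Jacobian, and the manipulability measure as a Jacobian determinant.

```python
import sympy as sp

th1, th2, l1, l2 = sp.symbols("theta1 theta2 l1 l2", real=True)

# hand parameters (end-effector position) written in terms of joint parameters
px = l1 * sp.cos(th1) + l2 * sp.cos(th1 + th2)
py = l1 * sp.sin(th1) + l2 * sp.sin(th1 + th2)

# Jacobian of the hand position with respect to the joint angles
J = sp.Matrix([px, py]).jacobian(sp.Matrix([th1, th2]))

# manipulability measure [Yoshikawa 1984]: |det J| for a square Jacobian
w = sp.simplify(sp.Abs(J.det()))
print(w)   # simplifies to Abs(l1*l2*sin(theta2)); it vanishes when the arm is fully stretched
```

Read in one direction, the same symbolic relation gives forward kinematics (hand parameters from joint values); read in the other, inverse kinematics. In GDCC both directions are handled by the single constraint program shown above, simply by changing which arguments of the query are instantiated.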
To obtain concrete answers, of course, the system should utilize the GDCC ability to approximate the real roots of univariate equations. 6.2 Constructing the Voronoi Diagram We developed an application program which constructs Voronoi Diagram written in GDCC. By using the constraint paradigm, we can make a program without describing a complicated algorithm. A Voronoi diagram can be constructed by using constraints which describe only the properties or the definition of the Voronoi diagram. This program can compute the Voronoi polygon of each point in parallel. 6.2.1 Definition of the Voronoi Diagram For a given finite set of points 5 in a plane, a Voronoi diagram is a partition of the plane so that each region of the partition is a set of points in the plane closer to a point in 5 in the region than to any other points in 5 [Preparata and Shamos 1985]. In the simplest case, the distance between two points is defined as the Euclidian distance. In this case, a Voronoi diagram is defined as follows. Given a set 5 of N points in the plane, for each point Pi in 5, the Voronoi polygon denoted as V(Pi ) is defined by the following formula. where d(P, Pi) is a Euclidian distance between P and Pi' The Voronoi diagram is a partition so that each region is the Voronoi polygon of each point (see Figure 9). The vertices of the diagram are Voronoi vertices and its line segments are Voronoi edges. Voronoi diagrams are widely used in various application areas, such as physics, ecology and urbanology. 6.2.2 Detailed Design The methods of constructing Voronoi diagrams are classified into the following two categories: 1. The incremental method([Green and Sibson 1978]), and 128 Table 4: Runtime and reductions Processors Points 10 20 50 Figure 9: A Voronoi Diagram 100 200 2. The divide-and-conquer method([Shamos and Hoey 1975]). However, the simplest approach to constructing a Voronoi diagram is, of course, constructing its polygons one at a time. Given two points, Pi and Pj, a set of points closer to Pi than to Pj is just a half-plane containing Pi that is divided by the perpendicular bisector of PiPj. We name this line H(Pi , Pj). The Voronoi polygon of Pi can be obtained by the following formula. By using the linear constraint solver for GDCC, the Voronoi polygon can be constructed by the following algorithm which utilizes the above method to obtain the polygon directly. E. +- {x ;::: 0, Y;::: 0, x:::; XMax, Y:::; YMax} {loop a} for i = 1 to n C Fo +- lineaLconstraint..solver(E.) for j = 1 to n if(j i- i) then Ej Y:::; (Pjx - P;x)/(Pjy - P;y) . x +(P/y + plx - P;~ - P/x )/2 . (Pjy - p;y)2 CFj +- linear_constraint..solver(Ej U CFj_l) +- Let {eql,eq2, ... ,eqd(0 ~ k:::; n) be a set of equations obtained by changing inequality symbols in C Fj to equation symbols. {loop b} for I = 1 to k vertices := {} m:= 1 while (m =< k & number of elements of vertices i- 2) pp +- intersection( eql, eqm) if pp satisfies the constraint set C F; then vertices := {pp} U vertices m:= m+ 1 add the line segment between vertices to Voronoi edges. end. In this algorithm, the first half computes the Voronoi polygon for each point's Pi by obtaining all perpendicular bisectors of segments between Pi and other points and eliminating redundant ones. The second half computes the Voronoi edges. 
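To make the half-plane construction concrete, here is a small stand-alone sketch in Python. It computes the Voronoi polygon of one site by clipping the bounding box against each perpendicular-bisector half-plane; in the GDCC program this intersection is stated declaratively and delegated to the linear constraint solver, so the clipping routine and the names used below are purely illustrative.

```python
def voronoi_polygon(site, others, xmax, ymax):
    px, py = site
    poly = [(0.0, 0.0), (xmax, 0.0), (xmax, ymax), (0.0, ymax)]   # 0 <= x <= XMax, 0 <= y <= YMax
    for qx, qy in others:
        # points (x, y) closer to P than to Q satisfy a*x + b*y <= c
        a, b = qx - px, qy - py
        c = (qx * qx + qy * qy - px * px - py * py) / 2.0
        poly = clip(poly, a, b, c)
        if not poly:
            break
    return poly

def clip(poly, a, b, c):
    """Keep the part of a convex polygon satisfying a*x + b*y <= c (Sutherland-Hodgman)."""
    out, n = [], len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        in1, in2 = a * x1 + b * y1 <= c, a * x2 + b * y2 <= c
        if in1:
            out.append((x1, y1))
        if in1 != in2:                     # the edge crosses the bisector: keep the crossing point
            t = (c - a * x1 - b * y1) / (a * (x2 - x1) + b * (y2 - y1))
            out.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return out

# Example: the cell of site (2, 2) among three sites in a 10 x 10 box
cell = voronoi_polygon((2, 2), [(8, 2), (2, 8)], 10, 10)   # the square 0 <= x <= 5, 0 <= y <= 5
```

Each site's polygon is computed by such an independent sequence of intersections, which is exactly the independence the parallel mapping of loop a onto process groups exploits.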
(Footnote 2: This inequality represents the half-plane bounded by the perpendicular bisector of Pi and Pj.)

Table 4: Runtime and reductions

            Runtime (speedup in parentheses), by number of processors               Reductions
  Points        1              2               4              8              15       (x 1000)
    10         130        67 (1.936)      33 (3.944)     17 (7.377)     16 (7.844)        5804
    20         890       447 (1.990)     241 (3.685)    123 (7.218)     88 (10.077)      42460
    50        4391      2187 (2.007)    1102 (3.981)    566 (7.749)    336 (13.065)     210490
   100       17287      8578 (2.015)    4305 (4.014)   2191 (7.887)   1263 (13.679)     830500
   200       52360     26095 (2.006)   13028 (4.018)   6506 (8.047)   3500 (14.959)    2458420
   400      220794    110208 (2.003)   54543 (4.048)  27316 (8.082)  14819 (14.899)   10161530

To realize the above algorithm on parallel processors, the procedure for each i in loop a is assigned to a group of processes. That is, there are n process groups. The procedure for each l in loop b is assigned to a process within the same process group. This means that each process group contains k processes. These n x k processes are mapped onto the multi-processor machine.

6.2.3 Results

Table 4 shows the execution time and speedup for 10 to 400 points with 1 to 15 processors. According to the results, we can conclude that, when the number of points is large enough, we obtain efficiency almost proportional to the number of processors. By using this algorithm, we can handle the problem of constructing a Voronoi diagram in a very straightforward manner. In fact, comparing program sizes, this algorithm can be described in about one third of the size of the program used by the incremental method.

7 Conclusion

In the FGCS project, we developed two CLP languages, CAL and GDCC, to establish the knowledge programming environment and to write application programs. The aim of our research is to construct a powerful high-level programming language which is suitable for knowledge processing. It is well known that constraints play an important role in both knowledge representation and knowledge processing; thus, CLP is a promising candidate for a high-level programming language in this field. Compared with other CLP languages such as CLP(R), Prolog III, and CHIP, the features of CAL and GDCC can be summarized as follows:

• CAL and GDCC can deal with nonlinear algebraic constraints.
• In the algebraic constraint solver, the approximate values of all possible real solutions can be computed if there are only a finite number of solutions.
• CAL and GDCC have a multiple environment handler. Thus, even if there is more than one answer constraint, users can manipulate them flexibly.
• Users can use multiple constraint solvers, and, furthermore, users can define and implement their own constraint solvers.

CAL and GDCC enable us to write possibly nonlinear polynomial equations on complex numbers, relations on truth values, relations on sets and their elements, and linear equations and linear inequalities on real numbers. Since starting to write application programs for the algebraic constraint solver in the field of handling robots, we have wanted to compute the real roots of univariate nonlinear polynomials. We made this possible with CAL by adding a function to approximate the real roots, and we modified the Buchberger algorithm to be able to handle approximate values. Then we faced the problem that a variable may have more than one value. To handle this situation in the framework of logic programming, we introduced a context tree in CAL. In GDCC, we introduced blocks into the language specification. Blocks in GDCC not only handle multiple values, but also localize the failure of constraint solvers. As for CAL, the following issues are still to be considered:

1.
Meta facilities: Users cannot deal with a context tree from a program, that is, meta facilities in CAL are insufficient to allow users to do all possible handling of answer constraints themselves. 2. Partial evaluation of CAL programs: Although we try to analyze constraint sets by adopting dependency analysis, that work will be more effective when combined with partial evaluation technology or abstract interpretation. 3. More application programs: We still have a few application programs in CAL. By writing many application programs in various application field, we will have ideas to realize a more powerful CLP language. For this purpose, we are now implementing CAL in a dialect of ESP, called Common ESP, which can run on the UNIX operating system to be able to use CAL in various machines. tools to handle multiple contexts m GDCC's language specification. 2. More efficient constraint solvers: We need to improve both the absolute performance and the parallel speedup of the constraint solvers. 3. More application programs: Since parallel CLP language is quite new language, writing application programs may help us to make it powerful and efficient. Considering our experiences of using CAL and GDCC and the above issues, we will refine the specification and the implementation of GDCC. These refinements and experiments on various application programs clarified the need for a sufficiently efficient constraint logic programming system with high functionalities in the language facilities. Acknowledgment The research on the constraint logic programming system was carried out by researchers in the fourth research laboratory in ICOT in tight cooperation with cooperating manufactures and members of the CLP working group. Our gratitude is first due to those who have given continuous encouragement and helpful comments. Above all, we particularly thank Dr. K. Fuchi, the director of the ICOT research center, Dr. K. Furukawa, a vice director of the ICOT research center, and Dr. M. Amamiya of Kyushu University, the chairperson of the CLP working group. We would also like to thank a number of researchers we contacted outside of ICOT, in particular, members of the working group for their fruitful and enlightening discussions and comments. Special thanks go to all researchers in the fourth research laboratory: Dr. K. Sakai, Mr. T. Kawagishi, Mr. K. Satoh, Mr. S. Sato, Dr. Y. Sato, Mr. N. Iwayama, Mr. D. J. Hawley, who is now working at Compuflex Japan, Mr. H. Sawada, Mr. S. Terasaki, Mr. S. Menju, and the many researchers in the cooperating manufacturers. References As for GDCC, the following issues are still to be considered: [Aiba et al. 1988] A. Aiba, K. Sakai, Y. Sato, D. J. Hawley, and R. Hasegawa. Constraint Logic Programming Language CAL. In Proceedings of the International Conference on Fifth Generation Computer Systems 1988, November 1988. 1. Handling multiple contexts: Although current GDCC has functionalities to handle multiple contexts, users have to express everything explicitly. Therefore, we can design high-level [Backelin and Froberg 1991] J. Backelin and R. Froberg. How we proved that there are exactly 924 cyclic 7-roots. In S. M. Watt, editor, Proceedings of ISSAC'91. ACM, July 1991. 130 [Benichou et al. 1971] M. Benichou, L. M. Gauthier, P. Girodet, G. Hentges, G. Ribiere, and O. Vincent. Experiments in mixed-integer linear programming. Nlathematical Programming, (1), 1971. [Boege et al. 1986] W. Boege, R. Gebauer, and H. Kredel. 
Some Examples for Solving Systems of Algebraic Equations by Calculating Grabner Bases. Journal of Symbolic Computation, 2(1):83-98,1986. [Borning et al. 1989] A. Borning, M. Maher, A. Martindale, and M. Wilson. Constraint Hierarchies and Logic Programming. In Proceedings of the International Conference on Logic Programming, 1989. [Buchberger 1985] B. Buchberger. Grabner bases: An Algorithmic Method in Polynomial Ideal Theory. In N. Bose, editor, Multidimensional Systems Theory, pages 184-232. D. Reidel Publ. Comp., Dordrecht, 1985. [CAL Manual] Institute for New Generation Computer Technology. Contrante Avec Logique version 2.12 User's manual. in preparation. [Chikayama 1984] T. Chikayama. Unique Features of ESP. In Proceedings of FGCS'84, pages 292-298, 1984. [Chikayama et al. 1988] T. Chikayama, H. Sato, and T. Miyazaki. Overview of Parallel Inference Machine Operationg System (PIMOS). In International Conference on Fifth Generation Computer Systems, pages 230-251, 1988. [Dulmage and Mendelsohn 1963] A. L. Dulmage and N. S. Mendelsohn. Two algorithms for bipartite graphs. Journal of SIAM, 11(1), March 1963. [Green and Sibson 1978] P. J. Green and R. Sibson. Computing Dirichlet Tessellation in the Plane. The Computer Journal, 21, 1978. [Hofmann 1989] C. M. Hoffmann. Grabner Bases .Techniques, chapter 7. Morgan Kaufmann Publishers, Inc., 1989. [Jaffar and Lassez 1987] J. Jaffar and J-L. Lassez. Constraint Logic Programming. In 4th IEEE Symposium on Logic Programming, 1987. [Kapur and Mundy 1988] K. Kapur and J. L. Mundy. Special volume on geometric reasoning. Artificial Intelligence, 37(1-3), December 1988. [Kutzler 1988] B. Kutzler. Algebraic Approaches to Automated Geometry Theorem Proving. PhD thesis, Research Institute for Symbolic Computation, Johannes Kepler University, 1988. [Lloyd 1984] J. W. Lloyd. Foundations of Logic Programming. Springer- Verlag~ 1984. [Maher 1987] M. J. Maher. Logic Semantics for a Class of Committed-choice Programs. In Proceedings of the Fourth International Conference on Logic Programming, pages 858-876, Melbourne, May 1987. [Marriott and Sondergaard 1990] K. Marriott and H. Sondergaard. Analysis of constraint logic programs. In Proc. of NACLP '90, 1990. [Clarke et al. 1990] E. M. Clarke, D. E. Long, S. Michaylov, S. A. Schwab, J. P. Vidal, and S. Kimura. Parallel Symbolic Computation Algorithms. Technical Report CMU-CS-90-182, Computer Science Department, Carnegie Mellon University, October 1990. [Menju et al. 1991] S. Menju, K. Sakai, Y. Satoh, and A. Aiba. A Study on Boolean Constraint Solvers. Technical Report TM 1008, Institute for New Generation Computer Technology, February 1991. [Cohen 1990] J. Cohen. Constraint logic programming languages. Communications of the ACNI, 33(7), July 1990. [Nagai 1991] Y. Nagai. Improvement of geometric theorem proving using dependency analysis of algebraic constraint (in Japanese). In Proceedings of the 42nd Annual Conference of Information Processing Society of Japan, 1991. [Colmerauer 1987] A. Colmerauer. Opening the Prolog III Universe: A new generation of Prolog promises some powerful capabilities. BYTE, pages 177-182, August 1987. [Dincbas et al. 1988] M. Dincbas, P. Van Hentenryck, H. Simonis, A. Aggoun, T. Graf, and F. Berthier. The Constraint Logic Programming Language CHIP. In Proceedings of the International Conference on Fifth Generation Computer Systems 1988, November 1988. [Nagai and Hasegawa 1991] Y. Nagai and R. Hasegawa. Structural analysis of the set of constraints for constraint logic programs. 
Technical report TR-701, ICOT, Tokyo, Japan, 1991. lOki et al. 1989] H. Oki, K. Taki, S. Sei, and S. Furuichi. Implementation and evaluation of parallel Tsumego program on the Multi-PSI (in Japanese). In Proceedings of the Joint Parallel Processing Symposium (JSPP'89),1989. 131 [Ponder 1990] C. G. Ponder. Evaluation of 'Performance Enhancements' in algebraic manipulation systems. In J. Della Dora and J. Fitch, editors, Computer Algebra and Parallelism, pages 51-74. Academic Press, 1990. [Preparata and Shamos 1985J F. P. Preparata and M. 1. Shamos. Computational Geometry. Springer-Verlag, 1985. [Sakai and Aiba 1989] K. Sakai and A. Aiba. CAL: A Theoretical Background of Constraint Logic Programming and its Application. J01lmal of Symbolic Computation, 8:589-603, 1989. [Saraswat 1989J V. Saraswat. Concurrent Constraint PhD thesis, CarnegieMellon University, Computer Science Department, January 1989. Programming Languages. [Sato and Aiba 1991] S. Sato and A. Aiba. An Application of CAL to Robotics. Technical Report TM 1032, Institute for New Generation Computer Technology, February 1991. [Siegl 1990J K. Siegl. Grabner Bases Computation in STRAND: A Case Study for Concurrent Symbolic Computation in Logic Programming Languages. Master's thesis, CAMP-LINZ, November 1990. [Takano 1986] M. Takano. Design of robot structure (in Japanese). Journal of Japan Robot Society, 14(4), 1986. [Terasaki et al. 1992J S. Terasaki, D. J. Hawley, H. Sawada, K. Satoh, S. Menju, T. Kawagishi, N. Iwayama, and A. Aiba. Parallel Constraint Logic Programming Language GDCC and its Parallel Constarint Solvers. In International Conference on Fifth Generation Computer Systems, 1992. [Tohyama 1989] S. Tohyama. Robotics for Machine Engineer (in Japanese). Sougou Denshi Publishing Corporation, 1989. [Ueda and Chikayama 1990] K. Veda and T. Chikayama. Design of the Kernel Language for the Parallel Inference Machine. The Computer Joumai, 33(6), December 1990. [Sato and Sakai 1988] Y. Sato and K. Sakai. Boolean Grabner Base, February 1988. LA-Symposium in winter, RIMS, Kyoto University. [Vidal 1990] J. P. Vidal.. The Computation of Grabner bases on a shared memory multi-processor. Technical Report CMU-CS-90-163, Computer Science Department, Carnegie Mellon University, August 1990. [Sato et al. 1991] Y. Sato, K. Sakai, and S. Menju. Solving constraints over sets by Boolean Grabner bases (in Japanese). In Proceedings of The Logic Programming Conference '91, September 1991. [van Emden and Kowalski 1976] M. H. van Emden and R. A. Kowalski. The Semantics of Predicate Logic as a Programming Language. Journal of the ACJ'v1, 23( 4), October 1976. [Satoh 1990] K. Satoh. Formalizing Soft Constraints by Interpretation Ordering. In Proceedings of 9th European Conference on Artificial Intelligence, pages 585-590, 1990. [Van Hentenryck 1989J P. Van Hentenryck. Parallel constraint satisfaction in logic programming: Prelimiary results of chip with pepsys. In 6th International Conference on Logic Programming, pages 165-180, 1989. [Satoh and Aiba 1990a] K. Satoh and A. Aiba. Computing Soft Constraints by Hierarchical Constraint Logic Programming. Technical Report TR-610, ICOT, Tokyo, Japan, 1990. [Satoh and Aiba 1990b] K. Satoh and A. Aiba. Hierarchical Constraint Logic Language: CHAL. Technical Report TR-592, ICOT, Tokyo, Japan, 1990. [Senechaud 1990J P. Senechaud. Implementation of a parallel algorithm to compute a Grabner basis on Boolean polynomials. In J. Della Dora and J. Fi tch, editors, Computer Algebra and Parallelism, pages 159-166. 
Academic Press, 1990. [Shamos and Hoey 1975] M. 1. Shamos and D. Hoey. Closest-point problems. In Sixteenth Annual IEEE Symposium on Foundations of Computer Sceience, 1975. [Yoshikawa 1984] T. Yoshikawa. Measure of manipulatability of robot arm (in Japanese). Joumal of Japan Robot Society, 12(1), 1984. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 132 Parallel Theorem Provers and Their Applications Ryuzo Hasegawa and Masayuki Fujita Fifth Research Laboratory Institute for New Generation Computer Technology 4-28 Mita 1-chome, Minato-ku, Tokyo 108, Japan {hasegawa, rnfujita}@icot.or.jp Abstract This paper describes the results of the research and development of automated reasoning systems(ARS) being conducted by the Fifth Research Laboratory at ICOT. The major result was the development of a parallel theorem proving system MGTP (Model Generation Theorem Prover) in KL1 on a parallel inference machine, PIM. Currently, we have two versions of MGTP. One is MGTP IG, which is used for dealing with ground models. The other is MGTP IN, used for dealing with non-ground models. With MGTP IN, we have achieved a more than one-hundred-fold speedup for condensed detachment problems on a PIM/m consisting of 128 PEs. Nonmonotonic reasoning and program synthesis are taken as promising and concrete application area for MGTP provers. MGTP IG is actually used to develop legal reasoning systems in ICOT's Seventh Research Laboratory. Advanced inference and learning systems are studied for expanding both reasoning power and application areas. Parallel logic programming techniques and utility programs such as 'meta-programming' are being developed using KL1. The technologies developed are widely used to develop applications on PIM. 1 Introduction The final goal of the Fifth Generation Computer Systems (FGCS) project was to realize a knowledge information processing system with intelligent user interfaces and knowledge base systems on parallel inference machines. A high performance and highly parallel inference mechanism is one of the most important technologies to come out of our pursuit of this goal. The major goal of the Fifth Research Laboratory, which is conducted as a subgoal of the problem-solving programming module of FGCS, is to build very efficient and highly parallel automated re?-soning systems (ARS) as advanced inference systems on ~~Lrallel inference machines (PIM), taking advantage of the KL1language and PIMOS operating system. On ARS we intend to develop application systems such as natural language processing, Application Parallel Theorem Proving Figure 1: Goals of Automated Reasoning System Research at tCOT intelligent knowledgebases, mathematical theorem proving systems, and automated programming systems. Furthermore, we intend to give good feedback to the language and operating systems from KLI implementations and experiments on parallel inference hardware in the process of developing ARS. We divided ARS research and development into the following three goals (Figure 1): (1) Developing Parallel Theorem Proving Technologies on PIM Developing very efficient parallel theorem provers on PIM by squeezing the most out of the KL1language is the major part of this task. We have concentrated on the model generation method, whose inference mechanism is based on hyper-resolution. We decided to develop two types of model generation theorem provers to cover ground non-Horn problems and non-ground Horn problems. 
To achieve maximum performance on PIM, we have focused on the 133 We focused on automated programming as one of the major application areas for theorem provers in the non-Horn logic systems, in spite of difficulty. There has been a long history of program synthesis from specifications in formal logic. We aim to make a first-order theorem prover that will act as a strong support tool in this approach. We have set up three different ways of program construction: realizability interpretation in the constructive mathematics to generate functional programs, temporal propositionallogic for protocol generation, and the KnuthBendix completion technique for interface design of concurrent processes in Petri Net. We stressed the experimental approach in order to make practical evaluation. technological issues below: (a) Elimination of redundant computation Eliminating redundant computation in the process of model generation with the least overhead is an important issue. Potential redundancy lies in conjunctive matching at hyperresolution steps or in the case splitting of ground non-Horn problems. (b) Saving time and space by eliminating the over generation of models For the model generation method, which is based on hyper-resolution as a bottom-up process, over generation of models is an essential problem of time and space consumption. We regard the model generation method as generation and test procedures and have introduced a controlling mechanism called Lazy Model Generation. • Advanced Inference and Learning Theorem proving technologies themselves are rather saturated in their basic mechanisms. In this subgoal, extension of the basic mechanism from deductive approach to analogical, inductive, and transformational approaches is the main research target. Machine learning technologies on logic programs and meta-usage of logic are the major technologies that we decided to apply to this task. (c) Finding PIM-fitting and scalable parallel architecture PIM is a low communication cost MIMD machine. Our target is to find a parallel architecture for model generation provers, which draws the maximum power from PIM. We focused on OR parallel architecture to exploit parallelism in the case splitting of a ground nonHorn prover, MGTP /G, and on AND parallel architecture to exploit parallelism in conjunctive matching and subsumption tests of a nonground Horn prover, MGTP /N. One of the most important aims of developing theL rem provers in KLl is to draw the maximum advantage of parallel logic programming paradigms from KLl. Programming techniques developed in building theorem provers help to, or are commonly used to, develop various applications, such as natural language processing systems and knowledge base systems, on the PIM machines based on logic programming and its extension. We focused on developing meta-programming technology in KLl as a concrete base for this aim. We think it is very useful to develop broader problem solving applications on PIM and to extend KLl to support them. (2) Application A model generation theorem prover has a general reasoning power in various AI areas because it can simulate the widely applied tableaux method effectively. Building an efficient analytic tableaux prover for modal propositional logic on mod~l generation theorem provers was the basic goal of this extension. This approach could naturally be applied to abductive reasoning in AI systems and logic programming with negation as failure linked with broader practical AI applications such as diagnosis. 
By using analogical reasoning, we intended to formally simulate the intelligent guesswork that humans naturally do, so that results could be obtained even when deductive systems had no means to deduce to obtain a solution because of incomplete information or very long deductive steps. Taking the computational complexity of inductive reasoning into account, we elaborated the learning theories of logic programs by means of predicate invention and least-general generalization, both of which are of central interest in machine learning. In transformational approach, we used fold/unfold transformation operations to generate new efficient predicates in logic programming. The following sections describe these three tasks of research on automated reasoning in ICOT's Fifth Research Laboratory for the three years of the final stage of ICOT. 2 Parallel Theorem Proving Technologies on PIM In this section, we describe the MGTP provers which run on Multi-PSI and PIM. We present the technical essence of KLl programming techniques and algorithms that we developed to improve the efficiency of MGTP. 134 2.1 Parallel Model Generation Theorem Prover MGTP The research on parallel theorem proving systems aims at realizing highly parallel advanced inference mechanisms that are indispensable in building intelligent knowledge information systems. We started this research project on parallel theorem provers about two and a half years ago. The immediate goal of the project is to develop a parallel automated reasoning system on the parallel inference machine, PIM, based on KL1 and PIMOS technology. We aim at applying this system to various fields such as intelligent database systems, natural language processing, and automated programming. At the beginning, we set the following as the main subjects. • To develop very fast first-order parallel theorem provers As a first step for developing KL1-technology theorem provers, we adopted the model generation method on which SATCHMO is based as a main proof mechanism. Then we implemented a modelgeneration based theorem prover called MGTP. Our reason was that the model generation method is particularly suited to KL1 programming as explained later. Based on experiences with the development of MGTP, we constructed a "TP development support system" which provided us with useful facilities such as a proof tracer and a visualizer to see the dynamic behavior of the prover. • To develop applications Although a theorem prover for first-order logic has the potential to cover most areas of AI, it has not been so widely used as Prolog. One reason for this is the inefficiency of the proof procedure and the other is lack of useful applications. However, through research on program synthesis from formal specification[Hasegawa et at., 1990], circuit verification, and legal reasoning[Nitta et at., 1992], we became convinced that first-order theorem provers can be effectively used in various areas. We are now developing an automated program synthesis system, a specification description system for exchange systems, and abductive and non-monotonic reasoning systems on MGTP. • To develop KL1 programming techniques Accumulating KL1 programming techniques through the development of theorem provers is an important issue. We first developed KL1 compiling techniques to translate given clauses to corresponding KL1 clauses, thereby achieving good performance for ground clause problems. We also developed methods to parallelize MGTP by making full use of logical variables and the stream data type of KL1. 
• To develop KL1 meta-programming technology This is also an important issue in developing theorem provers. This issue is discussed in Section 2.1.2 Meta-Programming in KL1. We have implemented basic meta-programming tools called Meta-Library in KL1. The meta-library is a collection of KL1 programs which offers routines such as full unification, matching, and variable managements. 2.1.1 Theorem Prover in KL1 Language Recent developments in logic programming have made it possible to implement first-order theorem provers efficiently. Typical examples are PTTP by Stickel [Stickel 1988], and SATCHMO by Manthey and Bry [Manthey and Bry 1988]. PTTP is a backward-reasoning prover based on the model elimination method. It can deal with any firstorder formula in Horn clause form without loss of completeness and soundness. SATCHMO is a forward-reasoning prover based on the model generation method. It is essentially a hyperresolution prover, and imposes a condition called rangerestricted on a clause so that we can derive only ground atoms from ground facts. SATCHMO is basically a forward-reasoning prover but also allows backwardreasoning by employing Prolog over the Horn clauses. The major advantage of these systems is because the input clauses are represented with Prolog clauses and most parts of deductions can be performed through normal Prolog execution. In addition to this method we considered the following two alternative implementations of object-level variables in KL1: (1) representing object-level variables with KL1 ground terms (2) representing object-level variables with KL1 variables The first approach might be the right path in metaprogramming where object- and meta-levels are separated strictly, thereby giving it dear semantics. However, it forces us to write routines for unification, substitution, renaming, and all the other intricate operations on variables and environments. These routines would become considerably larger and more complex than the main program, and introduce overhead to orders of magnitude. In the second approach, however, most of operations on variables and environments can be performed beside the underlying system instead of running routines on top of it. Hence, it enables a meta-programmer to save writing tedious routines as well as gaining high efficiency. Furthermore, one can also use Prolog var predicates to write routines such as occurrence checks in order to make built-in unification sound, if necessary. Strictly speaking, this approach may not be chosen since it makes the 135 distinction between object- and meta-level very ambiguous. However, from a viewpoint of program complexity and efficiericy, the actual profit gained by the approach is considerably large. In KLl, however, the second approach is not always possible, as in the Prolog case. This is because the semantics of KLI never allows us to use a predicate like Prolog var. In addition, KLI built-in unification is not the same as Prolog's counterpart, in that unification in the guard part of a KLI clause is limited to one way and a unification failure in the body part is regarded as a semantic error or exception rather than as a failure which merely causes backtrack in Prolog. Nevertheless, we can take the second approach to implement a theorem prover where ground models are dealt with, by utilizing the features of KLI as much as possible. 
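To make the proof procedure itself concrete, the following is a small sketch, in Python, of the model generation loop for ground, range-restricted clauses (each clause is an antecedent, a list of literal patterns, paired with a disjunction of consequent atoms, where an empty disjunction stands for false). It only illustrates the method; MGTP/G instead compiles each clause into KL1 clauses so that the conjunctive matching is done by KL1 head unification, and the clause and atom representations here are assumptions made for this sketch.

```python
from itertools import product

def is_var(t):                         # convention for this sketch: variables are capitalized strings
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, atom, subst):
    """One-way matching of a literal pattern against a ground atom."""
    if len(pattern) != len(atom) or pattern[0] != atom[0]:
        return None
    s = dict(subst)
    for p, a in zip(pattern[1:], atom[1:]):
        if is_var(p):
            if s.setdefault(p, a) != a:
                return None
        elif p != a:
            return None
    return s

def instantiate(atom, s):
    return tuple(s.get(t, t) if is_var(t) else t for t in atom)

def extend(model, clauses):
    """Find a violated clause instance; return new model candidates, or None if the model is closed."""
    for ante, consequents in clauses:            # clause: ante -> d1 ; ... ; dn   ([] means false)
        for atoms in product(sorted(model), repeat=len(ante)):
            s = {}
            for pat, atom in zip(ante, atoms):   # conjunctive matching of the antecedent
                s = match(pat, atom, s)
                if s is None:
                    break
            if s is None:
                continue
            inst = [instantiate(d, s) for d in consequents]
            if any(d in model for d in inst):
                continue                         # this instance is already satisfied
            return [model | {d} for d in inst]   # model extension with case splitting ([] = rejected)
    return None                                  # no violated clause: a model has been generated

def prove(clauses, facts):
    """Depth-first search over model candidates; returns a model, or None if all candidates are rejected."""
    stack = [set(facts)]
    while stack:
        m = stack.pop()
        children = extend(m, clauses)
        if children is None:
            return m
        stack.extend(children)
    return None
```

Because the clauses are range-restricted, every atom added to a model candidate is ground, so the matching step needs only the one-way matching above; this is why KL1 head unification suffices for MGTP/G, as discussed in the surrounding text.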
Taking the above discussions into consideration, we decided to develop both the MGTP IG and MGTP IN provers so that we can use them effectively according to the problem domain being dealt with. The ground version, MGTP IG, aims to support finite problem domains, which include most problems in a variety of fields, such as database processing and natural language processing. For ground model cases, the model generation method makes it possible to use just matching, rather than full unification, if the problem clauses satisfy the rangerestrictedness condition 1. This suggests that it is sufficient to use KLl's head unification. Thus we can take the KLI variable approach for representing object-level variables, thereby achieving good performance. The key points of KLI programming techniques developed for MGTP IG are as follows: (Details are described in the next section.) • First, we translate a given set of clauses into a corresponding set of KLI clauses. This translation is quite simple. • Second, we perform conjunctive matching of a literal in a clause against a model element by using KLI head unification. • Third, at the head unification, we can automatically obtain fresh variables for a different instance of the literal used. The non-ground version, MGTP IN, supports infinite problem domains. Typical examples are mathematical theorems, such as group theory and implicational calculus. For non-ground model cases, where full unification with occurrence check is required, we are forced to follow the KL 1 ground terms approach. However, we do 1 A clause is said to be range-restricted if every variable in the clause has at least one occurrence in its antecedent. Problems Tools Solutions Reduudancy 1 in Conjunctive ramified StaCk] 2 3 Matching MERC Unification! Subsumption (Term IndeXing) Irrelevant Clauses Meta4 Programming inKLl I Parallelism + J (partial Falsify Relevancy Test Firmware Coding (Meta-Library) 5 Overgeneration [LaZY Model Generation of Models 6 KLI programming techniques J Non-Horn Ground f P J Horn ( AND Parallel ) OR Parallel + PIM machine And Sequential Figure 2: Major Problems and Technical Solutions not necessarily have to maintain variable-binding pairs as processes in KLl. We can maintain them by using the vector facility supported by KLl, as is often used in ordinary language processing systems. Experimental results show that vector implementation is several hundred times faster than process implementation. In this case, however, we cannot use the programming techniques developed for MGTP IG. Instead, we have to use a conventional technique, that is, interpreting a given set of clauses instead of compiling it into KLI clauses. 2.1.2 Key Technologies' to 'Improve Efficiency \Ale developed several programming techniques in the process of seeking ways to improve the efficiency of model generation theorem provers. Figure 2 shows a list of the problems which prevented good performance and the solutions we obtained. In the following sections we ou tline the problems and their solutions. Redundancy in Conjunctive Matching To improve the performance of the model generation provers, it is essential to avoid, as much as possible, redundant computation in conjunctive matching. Let us consider a clause having two antecedent literals, and suppose we have a model candidate M at some stage i in the 136 proof process. To perform conjunctive matching of an antecedent literal in the clause against a model element, we need to pick all possible pairs of atoms from M. 
Imagine that we are to extend M with a model-extending atom .6., which is in the consequent of the clause, but not in M. Then in the next stage, we need to pick pairs of atoms from M U.6.. The number of pairs amounts to: CM U 6.)2 = M x MUM x 6. u.6. x M u.6. x .6.. However, doing this in a naive manner would introduce redundancy. This is because M x M pairs were already considered in the previous stage. Thus we must only choose pairs which contain at least one .6.. (1) RAMS Method The key point of the RAMS (Ramified Stack) method is to retain in a literal instance stack the intermediate results obtained in conjunctive matching. They are instances which are a result of matching a literal against a model element. This algorithm exactly computes a repeated combination of .6.. and an atom picked from M without duplication([Fujita and Hasegawa 1990]). For non-Horn clause cases, the literal instance stack expands a branch every time case splitting occurs, and grows like a tree. This is how the RAMS name was derived. Each branch of the tree represents a different model candidate. The ramified-stack method not only avoids redundancy in conjunctive matching but also enables us to share a common model. However, it has one drawback: it tends to require a lot of memory to retain intermediate literal instances. Al A2 A3 11 M M M 11 M M M 11 11 11 M 11 M 11 M 11 11 11 11 11 * Al,* A2'* A3 ( '* means not-unifiable) Forground,and Figure 3: Multiple-Entry Repeated Combination (MERC) Method There are some trade-off points between the RAMS method and the MERC method. In the RAMS method, every successful result of matching a literal Ai against model elements is memorized so as not to rematch the same literal against the same model element. On the other hand, the MERC method does not need such a memory to store the information of partial matching. However, it still contains a redundant computation. For instance, in the computation for [M,.6.,.6.] and [M, 6., M] patterns, the common subpattern, [M,.6.], will be recomputed. The RAMS method can remove this sort of redundancy. Speeding up U nification/Subsumption (2) MERC Method The MERC (Multi-Entry Repeated Combination) method([Hasegawa 1991]) tries to solve the above problem in the RAMS method. This method does not need a memory to retain intermediate results obtained in the conjunctive matching. Instead, it needs to prepare 2n - 1 clauses for the given clause having n literals as its antecedent. The outline of the MERC method is shown in Figure 3. For a clause having three antecedent literals, AI, A 2 , A3 -+ C, we prepare seven clauses, each of which corresponds to a repeated combination of 6. and M, and perform the conjunctive matching using the combination pattern. For example, a clause corresponding to a combination pattern [M,.6., M] first matches literal A2 against.6.. If the match succeeds, the remaining literals, Al and A 3 , are matched against an element picked out of A1. Note that each combination pattern includes at least one .6., and that the [111,111, M] pattern is excluded. Almost all computation time is used in the unification and subsumption tests in the MGTP. Term indexing is a classical way and the only way to improve this process to one-to-many unification/subsumption.: We used the discrimination tree as the indexing structure. Figure 4 shows the effect of Term Memory on a typical problem on MGTP /G. Optimal use of Disjunctive Clauses Loveland et. al. 
[Wilson and, Loveland 1989] indicated that irrelevant use of disjunctive clauses in the ground model generation prover rises useless case spli tting, thereby leads to serious redundant searches. Arbitrary selected disjunctive clauses in MGTP may lead to a combinatorial explosion of redundant models. An artificial yet suggestive example is shown in Figure 5. In MGTP /G, we developed two methods to deal with this problem. One method is to introduce upside-down 137 • Execution Time and No. of Reductions(Instruction Unit of KL1) (1) With TM. 784197 red / 14944 msec (2) Without TM. 1629421 red /28174 msec • The Most Dominant Operations (1) With TM (The eight best predicates are of TM operations) Predicate assocs /4 term Type/ 'L termNodes 14 termNodes1/4 bind Constant 15 Others Red 466990 4jIY1 39771 20338 20304 193003 (1 - 1) (1 (2 (2 (3 - 2) 1) 2) 1) (3 - 2) (4) (5) (6) (7) Figure 6: Compiled code in UDMI 0 500,000 (2) Without TM(member predicate takes the first rank) Predicate member I 3 c/O satisfyLiteral/9 satlstyLlteral17 do 17 Uthers true -+ gp( c, X, Y). gp( c, X, Y), p( c, X, Y) -+ false. true -+ gq(X, c, Y). gq(X, c, Y), q(X, c, Y) -+ false. true -+ gr(X, Y, c). gr(X, Y, c), r(X, Y, c) -+ false. true -+ s( a) true -+ s(b) true -+ s(c) s(X), s(Y), s(Z), gp(X, Y, Z), gq(X, Y, Z), gr(X, Y, Z) -+ p(X,Y,Z);q(X,Y,Z);r(X,Y,Z) ROd, 1214048 l/ts404 133255 46_655. 270631 29936 1 0 500,000 1,000,000 Figure 4: Speed up by Term Memory false: -p( c, X, Y) false: -q(X, c, Y) false: -r(X, Y, c) s(a) s(b) s(c) s(X), s(Y), s(Z) -+ p(X,Y,Z);q(X,Y,Z);r(X,Y,Z) (1) (2) (3) (4) (5) (6) (7) Figure 5: Example Problem to Relevancy Testing meta-interpretation(UDMI)[Stickel 1991] into MGTP /G. By using upside-down meta-interpretation, the above problem was compiled into the bottom-up rules in Figure 6. Note that this is against the range restricted rule but is safe with Prolog-unification. The other method is to keep the positive disjunctive clauses obtained by the process of reasoning. False checks are made independently on each literal in the disjunctive model elements with unit models and if the check succeeds then that literal is eliminated. The disjunctive models can be sorted by their length. This method showed considerable speed-up for n-queens problems and enumeration of finite algebra. Meta-Programming in KLI Developing fast meta-programs such as unification and matching programs is very significant in making a prover efficient. Most parts of proving processes are ~he executions of such programs. The efficiency of a prover depends on how efficient meta-programs are made. In Prolog-Technology Theorem Provers such as PTTP and SATCHMO, object-level variables 2 are directly represented by Prolog variables. With this representation, most operations on variables and environments can be performed beside the underlying system Prolog. This means that we can gain high efficiency by using the functions supported by Prolog. Also, a programmer can use the Prolog var predicate to write routines such as occurrence checks in order to make built-in unification sound, if such routines are necessary. Unfortunately in KL1, we cannot use this kind of technique. This is because: 1) the semantics of KL 1 never allow us to use a predicate like var, 2) KL1 built-in unification is not the same as its Prolog coun terpart in that unification in the guard part of a KL1 clause can only be one-way, and 3) a unification failure in the body part is regarded as a program error or exception that cannot be backtracked. 
We should, therefore, treat an object-level variable as constant data (a ground term) rather than as a KL1 variable. This forces us to write routines for unification, substitution, renaming, and all the other intricate operations on variables and environments. These routines can become extremely large and complex compared to the main program, and may increase the overhead. To ease the programmer's burden, we developed the Meta-Library, a collection of KL1 programs to support meta-programming in KL1 [Koshimura et al., 1990]. The Meta-Library includes facilities such as full unification with occurrence check and variable management routines. The performance of each program in the Meta-Library is relatively good; for example, the unification program runs at 0.25 to 1.25 times the speed of built-in unification. The major functions in the Meta-Library are as follows.

    unify(X, Y, Env, -NewEnv)
    unify_oc(X, Y, Env, -NewEnv)
    match(Pattern, Target, Env, -NewEnv)
    oneway_unify(Pattern, Target, Env, -NewEnv)
    copy_term(X, -NewX, Env, -NewEnv)
    shallow(X, Env, -NewEnv)
    freeze(X, -FrozenX, Env)
    melt(X, -MeltedX, Env)
    create_env(-Env, Size)
    fresh_var(Env, -VarAndNewEnv)
    equal(X, Y, Env, -YN)
    is_type(X, Env, -Type)
    unbound(X, Env, -YN)
    database(RequestStream)
    get_object(KL1Term, -Object)
    get_kl1_term(Object, -KL1Term)
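To illustrate why such a library is needed, the following Python sketch treats object-level variables as ground data ('$VAR', n) and threads an explicit binding environment through unification with an occurrence check, in the spirit of the Meta-Library's unify_oc(X, Y, Env, -NewEnv). The term encoding and helper names are assumptions of this sketch only, not the KL1 code.

def is_ovar(t):
    return isinstance(t, tuple) and len(t) == 2 and t[0] == '$VAR'

def walk(t, env):
    while is_ovar(t) and t[1] in env:          # dereference bound object variables
        t = env[t[1]]
    return t

def occurs(v, t, env):
    t = walk(t, env)
    if is_ovar(t):
        return t[1] == v
    if isinstance(t, tuple):
        return any(occurs(v, s, env) for s in t[1:])
    return False

def unify_oc(x, y, env):
    """Return the extended environment, or None on failure (sound unification)."""
    x, y = walk(x, env), walk(y, env)
    if is_ovar(x):
        if x == y:
            return env
        return None if occurs(x[1], y, env) else {**env, x[1]: y}
    if is_ovar(y):
        return unify_oc(y, x, env)
    if isinstance(x, tuple) and isinstance(y, tuple):
        if x[0] != y[0] or len(x) != len(y):
            return None
        for a, b in zip(x[1:], y[1:]):
            env = unify_oc(a, b, env)
            if env is None:
                return None
        return env
    return env if x == y else None

# p(X, f(X)) against p(a, Y): X is bound to a and Y to f(X), where X dereferences to a.
print(unify_oc(('p', ('$VAR', 0), ('f', ('$VAR', 0))),
               ('p', 'a', ('$VAR', 1)), {}))   # -> {0: 'a', 1: ('f', ('$VAR', 0))}

Because bindings live in an explicit environment rather than in the host language's variables, every dereference, occurrence check, and failure case must be programmed by hand; this is the overhead that the Meta-Library packages up for the KL1 programmer.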
Over-Generation of Models

A more important issue with regard to the efficiency of model-generation based provers is reducing the total amount of computation and memory space required in proof processes. Model-generation based provers must perform the following three operations:

• create new model elements by applying the model extension rule to the given clauses, using a set of model-extending atoms Δ and a model candidate set M (model extension);

• make a subsumption test for a created atom, to check whether it is subsumed by the set of atoms already created, usually the current model candidate (subsumption test);

• make a false check, to see whether an unsubsumed model element derives false by applying the model rejection rule to the tester clauses (rejection test).

The problem with the model generation method is the huge growth in the number of generated atoms, and hence in the computational cost in time and space incurred by the generation processes. This problem becomes more critical when dealing with harder problems which require deeper inferences (longer proofs), such as Lukasiewicz problems. To solve it, it is important to recognize that proving processes can be viewed as generation-and-test processes, and that generation should be performed only when required by the test.

For this we proposed a lazy model generation algorithm [Hasegawa et al., 1992] that can reduce the amount of computation and space necessary for obtaining proofs. Table 1 compares the complexities of the model generation algorithms, where T (S/G) represents the number of rejection tests (subsumption tests/model extensions), and M represents the number of atoms stored. The basic algorithm, taken by OTTER [McCune 1990], generates a bunch of new atoms before completing rejection tests for previously generated atoms; the full-test algorithm completes the tests before the next generation cycle, but still generates a bunch of atoms each time; Lookahead is an optimization method for testing wider spaces than in Full-test/Lazy.

Table 1: Comparison of complexities (for a unit tester clause). For each of the Basic, Full-test/Lazy, and Lazy & Lookahead algorithms, the table gives T, S, G, and M as polynomials in m, p, μ, and α, where m is the number of elements in a model candidate when false is detected in the basic algorithm, p is the survival rate of a generated atom, μ is the rate of successful conjunctive matchings (p ≤ μ), and α (1 ≤ α ≤ 2) is the efficiency factor of a subsumption test.

From a simple analysis, it is estimated that the time complexity of the model extension and subsumption test decreases from O(m^4) in the algorithms without lazy control to O(m) in the algorithms with lazy control. For details, refer to [Hasegawa et al., 1992].
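The following toy Python sketch conveys the lazy generate-and-test idea: atoms are produced only when the rejection test demands the next one, so generation stops as soon as a refutation appears. It is not MGTP's algorithm; the clause encoding (single-literal antecedents, ground atoms as strings) and the example clause set are invented for illustration.

def generate(facts, rules):
    """Demand-driven model extension: derived atoms are yielded one at a time."""
    seen, frontier = set(), list(facts)
    while frontier:
        atom = frontier.pop(0)
        if atom in seen:
            continue                          # cheap subsumption test for ground atoms
        seen.add(atom)
        yield atom
        for body, head in rules:              # rules with a single antecedent literal
            if body == atom:
                frontier.append(head)

def lazy_prove(facts, rules, testers):
    """Consume atoms only as long as no tester (negative) clause has fired."""
    model = set()
    for atom in generate(facts, rules):
        model.add(atom)
        if any(set(body) <= model for body in testers):
            return 'unsatisfiable', model     # generation stops here; nothing further is produced
    return 'satisfiable', model

facts = ['p(0)']
rules = [('p(0)', 'p(1)'), ('p(1)', 'p(2)'), ('p(2)', 'p(3)')]
testers = [['p(2)']]                          # p(2) -> false
print(lazy_prove(facts, rules, testers))
# -> ('unsatisfiable', {'p(0)', 'p(1)', 'p(2)'})   (set order may vary; p(3) is never generated)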
Parallelizing MGTP

There are three major sources of parallelism in the proving processes of the MGTP prover: multiple model candidates in a proof, multiple clauses to which model generation rules are applied, and multiple literals in conjunctive matching.

Let us assume that the prime objective of using the model generation method is to find a model as a solution. There may be alternative solutions or models for a given problem, and we take seeking these multiple solutions at the same time to be OR-parallelism. Under this assumption, multiple model candidates and multiple clauses are taken as sources for exploiting OR parallelism. Multiple literals, on the other hand, are a source of AND parallelism, since all the literals in a clause relate to a single solution, where shared variables in the clause must take compatible values.

For ground non-Horn cases, it is sufficient to exploit the OR parallelism induced by case splitting. For Horn clause cases, we have to exploit AND parallelism; the main source of AND parallelism is conjunctive matching, and performing subsumption tests in parallel is also very effective for Horn clause cases. In the current MGTP, we have not yet considered the non-ground, non-Horn cases.

(1) Parallelization of MGTP/G

With the current version of MGTP/G, we have only attempted to exploit OR parallelism on the Multi-PSI machine.

(a) Processor allocation

The processor allocation methods that we adopted achieve 'bounded-OR' parallelism in the sense that OR-parallel forking in the proving process is suppressed so as to meet restricted resource circumstances. One way of doing this, called simple allocation, is sketched as follows. We expand model candidates, starting with an empty model, using a single master processor, until the number of candidates exceeds the number of available processors; we then distribute the remaining tasks to slave processors. Each slave processor explores the branches assigned to it without further distributing tasks to any other processors. This simple allocation scheme for task distribution works fairly well, since communication costs are minimized.

(b) Speed-up on Multi-PSI

One of the examples we used is the N-queens problem given below.

    C1:    true → p(1,1); p(1,2); ... ; p(1,n).
    ...
    Cn:    true → p(n,1); p(n,2); ... ; p(n,n).
    Cn+1:  p(X1,Y1), p(X2,Y2), unsafe(X1,Y1,X2,Y2) → false.

The first N clauses express every possible placing of queens on an N by N chess board. The last clause expresses the constraint that a pair of queens must satisfy. So, the problem is solved when either one model (one solution) or all the models (all solutions) are obtained for the clause set.

The performance was measured with an MGTP/G prover running on a Multi-PSI, using the simple allocation method stated above. The speedups obtained using up to 16 processors are shown in Figure 7. For the 10-queens problem, almost linear speedup is obtained as the number of processors increases. The speedup rate is rather small only for the 4-queens problem; this is probably because, in such a small problem, the constant amount of interpretation overhead dominates the proper tasks of the proving process.

Figure 7: Speedup of MGTP/G on Multi-PSI (N-queens; speedup curves for several board sizes plotted against the number of PEs)

(2) Parallelization of MGTP/N

For MGTP/N, we have attempted to exploit AND parallelism for Horn problems. We have several choices when parallelizing model-generation based theorem provers:

1) proofs which change or remain unchanged according to the number of PEs used,
2) model sharing (copying, in a distributed memory architecture) or model distribution, and
3) master-slave or master-less configurations.

A proof changing prover may achieve super-linear speedup, while a proof unchanging prover can achieve, at most, linear speedup. The merit of model sharing is that time-consuming subsumption testing and conjunctive matching can be performed at each PE independently, with minimal inter-PE communication. The benefit of model distribution, on the other hand, is memory scalability; the communication cost, however, increases as the number of PEs increases, since generated atoms need to flow to all PEs for subsumption testing. The master-slave configuration makes it easy to build a parallel system, by simply connecting a sequential version of MGTP/N on a slave PE to the master PE; however, it needs to be designed so as to minimize the load on the master process. A master-less configuration, such as a ring connection, allows us to achieve pipeline effects with better load balancing, whereas it becomes harder to implement suitable control to manage collaborative work among PEs.

Given the above, we implemented a proof unchanging version of MGTP/N in a master-slave configuration based on lazy model generation. In this system, generator and subsumption processes run in a demand-driven mode, while tester processes run in a data-driven mode. The main features of this system are as follows:

1) Proof unchanging allows us to obtain greater speedup as the number of PEs increases;
2) By utilizing the synchronization mechanism supported by KL1, sequentiality in subsumption testing is minimized;
3) Since slave processes spontaneously obtain tasks from the master and the size of each task is well equalized, good load balancing is achieved;
4) By utilizing the stream data type of KL1, demand-driven control is easily and efficiently implemented.

Figure 8 displays the speedup ratios for condensed detachment problems #3, #58, and #77, taken from [McCune and Wos 1991], obtained by running the MGTP/N prover on Multi-PSI using 16 PEs. The execution times taken to solve these problems are 218, 12, and 37 seconds. As shown in the figure, there is no saturation in performance up to 16 PEs, and greater speedup is obtained for the problems which consume more time.

Figure 8: Speedup ratio (problems #3, #58, and #77 against the number of PEs, with the ideal speedup shown for reference)

Table 2 shows the performance obtained by running MGTP/N for Theorems 5 and 7 [Overbeek 1990], which are also condensed detachment problems, on Multi-PSI with 64 PEs. We did not use heuristics such as sorting, but merely limited term size and eliminated tautologies. Full unification is written in KL1, which is thirty to one hundred times slower than that written in C on SUN/3s and SPARCs. Note that the average running rate per PE for 64 PEs is actually a little higher than that for 16 PEs. With this and other results, we were able to obtain almost linear speedup.

Table 2: Performance of MGTP/N (Th 5 and Th 7)

    Problem                 16 PEs        64 PEs
    Th5  Time (sec)       41725.98      11056.12
         Reductions       38070940      40759689
         KRPS/PE              57.03         57.60
         Speedup               1.00          3.77
    Th7  Time (sec)       48629.93      13514.47
         Reductions       31281211      37407531
         KRPS/PE              40.20         43.25
         Speedup               1.00          3.60

Recently we obtained a proof of Theorem 5 on PIM/m with 127 PEs in 2870.62 sec and nearly 44 billion reductions (thus 120 KRPS/PE). Taking into account the fact that the PIM/m CPU is about twice as fast as that of Multi-PSI, we found that almost linear speedup can be achieved at least up to 128 PEs.

Our policy in developing parallel theorem provers is that we should distinguish between the speedup effect caused by parallelization and the search-pruning effect caused by strategies. In proof changing parallelization, changing the number of PEs is merely betting, and may cause the strategy to change badly even though it sometimes results in the finding of a shorter proof. By using demand-driven control, we can not only suppress unnecessary model extensions and subsumption tests but also maintain the high running rate that is the key to achieving linear speedup.
2.2 Reflection and Parallel Meta-Programming System

Reflection is the capability of a computation system to observe its own current state or to modify it dynamically. The form of reflection we are interested in is the computational reflection proposed by [Smith 1984]. We are trying to incorporate meta-level computation and computational reflection into logic programming languages in a number of directions.

As a foundation, a reflective sequential logic language, R-Prolog*, has been proposed [Sugano 1990]. This language allows us to deal with syntactic and semantic objects of the language itself legally, by means of several coding operators. The notion of computational reflection is also incorporated, which allows computational systems to recognize and modify their own computational states. As a result, some of the extra-logical predicates in Prolog can be redefined in a consistent framework.

We have also proposed a reflective parallel logic programming language, RGHC (Reflective Guarded Horn Clauses) [Tanaka and Matono 1992]. In RGHC, a reflective tower can be constructed and collapsed in a dynamic manner, using reflective predicates. A prototype implementation of RGHC has also been completed. RGHC seems to be unique in the simplicity of its implementation of reflection: meta-level computation can be executed at the same speed as object-level computation.

We have also tried to formalize distributed reflection, which allows concurrent execution of both object-level and meta-level computations [Sugano 1991]. The scope of reflection is specified by grouping goals that share local environments. This also models the eventual publication of constraints.

Finally, we have built several application systems based on meta-programming and reflection: the experimental programming system ExReps [Tanaka 1991], the process-oriented GHC debugger [Maeda et al., 1990, Maeda 1992], and the strategy management shell [Kohda and Maeda 1991a, Kohda and Maeda 1991b].
ExReps is an experimental programming environment for parallel logic languages, in which one can input programs and execute goals. It consists of an abstract machine layer and an execution system layer; both layers are constructed using meta-programming techniques, and various reflective operations are implemented in these layers.

The process-oriented GHC debugger provides high-level facilities, such as displaying processes and streams in tree views. It can control the behavior of a process by interactively blocking or editing its input stream data. This makes it possible to trace and check program execution from a programmer's point of view.

The strategy management shell takes charge of a database of load-balancing strategies. When a user job is input, the current leading strategy and several experimental alternative strategies for the job are searched for in the database, and the leading task and several experimental tasks of the job are started. The shell can evaluate the relative merits of the strategies, and decides on the leading strategy for the next stage when the tasks have terminated.

3 Applications of Automated Reasoning

ARS has a wider application area when connected with logic programming and a formal approach to programming. We extended MGTP to cover modal logic. This extension has led to abductive reasoning in AI systems and to logic programming with negation as failure, linked with broader practical applications such as fault diagnosis and legal reasoning. We also focused on programming, particularly parallel programming, as one of the major application areas of formal logic systems, in spite of its difficulties. There has been a long history of program synthesis from specifications in formal logic, and we are aiming to make ARS the foundational strength of this approach.

3.1 Propositional Modal Tableaux in MGTP

MGTP's proof method and the tableaux proof procedure [Smullyan 1968] are computationally very close. Each rule of tableaux is represented by an input clause for MGTP in a direct manner. In other words, we can regard the input clauses for MGTP as a tableaux implementation language, just as Horn clauses are a programming language for Prolog.

MGTP tries to generate a model for a set of clauses in a bottom-up manner. When MGTP successfully generates a model, the clause set is found to be satisfiable; otherwise, it is found to be unsatisfiable:

    apply(MGTP, ASetOfClauses) = satisfiable | unsatisfiable.

Since we regard MGTP as an inference system, a propositional modal tableaux prover [Fitting 1983, Fitting 1988] has been implemented in MGTP:

    apply(MGTP, TableauxProver(Formula)) = satisfiable | unsatisfiable.

In tableaux, a close condition is represented by a negative clause, an input formula by a positive clause, and a decomposition rule by a mixed clause for MGTP, all in a direct manner [Koshimura and Hasegawa 1991].

There are two levels in this prover: the MGTP implementation level and the tableaux implementation level. The MGTP level is the inference system level, at which we mainly examine speedups of inference such as redundancy elimination and parallelization. At the tableaux level, the inference rules, which capture the properties of the proof domain, are described; it follows that we mainly examine the properties of the proof domain at the tableaux level. It is useful and helpful to have these two levels, as we can separate the description of the properties of the domain from the description of the inference system.
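As a minimal, non-modal illustration of tableaux rules read as bottom-up clauses, the following Python sketch expands conjunctions, splits model candidates on disjunctions, and rejects any candidate that contains a literal and its negation (the close condition). It assumes formulas in negation normal form with negation applied only to atoms, and is not the MGTP encoding itself.

def satisfiable(formula):
    """Tableau-style model generation: 'and' extends a candidate, 'or' splits it,
    and a candidate containing both X and ('not', X) is rejected (close condition)."""
    branches = [{formula}]                     # each set is one model candidate
    while branches:
        branch = branches.pop()
        if any(('not', f) in branch for f in branch):
            continue                           # closed branch: rejected
        expandable = next((f for f in branch
                           if isinstance(f, tuple) and f[0] in ('and', 'or')), None)
        if expandable is None:
            return True                        # saturated open branch = a model
        rest = branch - {expandable}
        _, a, b = expandable
        if expandable[0] == 'and':
            branches.append(rest | {a, b})
        else:                                  # 'or': case splitting
            branches.append(rest | {a})
            branches.append(rest | {b})
    return False

# (p or q) and (not p) and (not q) is unsatisfiable; (p or q) and (not p) is satisfiable.
print(satisfiable(('and', ('or', 'p', 'q'), ('and', ('not', 'p'), ('not', 'q')))))  # False
print(satisfiable(('and', ('or', 'p', 'q'), ('not', 'p'))))                          # True

The modal prover follows the same pattern at the MGTP level, but its decomposition clauses additionally manage the modal accessibility conditions described in [Koshimura and Hasegawa 1991].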
3.2 Abductive Reasoning and Nonmonotonic Reasoning

Modeling sophisticated agents capable of reasoning with incomplete information has been a major theme in AI. This kind of reasoning is not only an advanced mechanism for intelligent agents to cope with particular situations, but an intrinsically necessary condition for dealing with commonsense reasoning. It has been agreed that neither human beings nor computers can have all the information relevant to mundane or everyday situations. To function without complete information, intelligent agents should draw some unsound conclusions, or augment their theories, by applying such methods as closed-world assumptions and default reasoning. This kind of reasoning is nonmonotonic: it does not hold that the more information we have, the more consequences we will be aware of. Therefore, this inference has to anticipate the possibility of later revisions of beliefs.

We treat reasoning with incomplete information as reasoning with hypotheses, or hypothetical reasoning [Inoue 1988], in which a set of conclusions may be expanded by incorporating further hypotheses, unless they are contradictory. In hypothetical reasoning, inference to the best explanations, that is, computing hypotheses that can explain an observation, is called abduction. The notion of explanation has been a fundamental concept for various AI problems such as diagnosis, synthesis, design, and natural language understanding.

We have investigated methodologies of hypothetical reasoning from various angles and have developed a number of abductive and nonmonotonic reasoning systems. Here, we present hypothetical reasoning systems built upon the MGTP [Fujita and Hasegawa 1991]. The basic idea of these systems is to translate formulas with special properties, such as nonmonotonic provability (negation as failure) and the consistency of abductive explanations, into formulas with a kind of modality, so that the MGTP can deal with them using classical logic. The extra requirements for these special properties are thus reduced to generate-and-test problems over model candidates, which can then be handled by the MGTP very efficiently through case-splitting of non-unit consequences and rejection of inconsistent model candidates. In the following, we show how the MGTP can be used for logic programs containing negation as failure, and for abduction.

3.2.1 Logic Programs and Disjunctive Databases with Negation as Failure

In recent theories of logic programming and deductive databases, declarative semantics have been given to extensions of logic programs in which the negation-as-failure operator is considered to be a nonmonotonic modal operator. In particular, logic programs or deductive databases containing both negation as failure (not) and classical negation (¬) can be used as a powerful knowledge representation tool, whose applications include reasoning with incomplete knowledge [Gelfond and Lifschitz 1991], default reasoning, and abduction [Inoue 1991a]. However, for these extended classes of logic programs, the top-down approach cannot be used for computation, because there is no local property in evaluating programs. For example, there has been no top-down proof procedure which is sound with respect to the stable model semantics for general logic programs. We thus need bottom-up computation for correct evaluation of negation-as-failure formulas. In [Inoue et al., 1992a], a bottom-up computation of answer sets for any class of function-free logic programs is provided.
These classes include the extended disjunctive databases [Gelfond and Lifschitz 1991], for which no proof procedure had previously been found. In evaluating not P in a bottom-up manner, it is necessary to interpret not P with respect to a fixpoint of the computation, because even if P is not currently proved, P might be proved in subsequent inferences. We thus came up with a completely different way of thinking about not: when we have to evaluate not P in a current model candidate, we split the model candidate in two: (1) the model candidate where P is assumed not to hold, and (2) the model candidate where it is necessary that P holds. Each negation-as-failure formula not P is thus translated into negative and positive literals with a modality expressing belief, i.e., "disbelieve P" (written ¬KP) and "believe P" (written KP).

Based on the above discussion, we translate any logic program (with negation as failure) into a positive disjunctive program (without negation as failure) whose minimal models the MGTP can compute. The following is an example of the translation of general logic programs. Let Π be a general logic program consisting of rules of the form:

    Al ← Al+1, ..., Am, not Am+1, ..., not An,        (1)

where n ≥ m ≥ l ≥ 0, 1 ≥ l ≥ 0, and each Ai is an atom. Rules without heads are called integrity constraints and are expressed by l = 0 in the form (1). Each rule in Π of the form (1) is translated into the following MGTP rule:

    Al+1, ..., Am → ¬KAm+1, ..., ¬KAn, Al | KAm+1 | ... | KAn.        (2)

For any MGTP rule of the form (2), if a model candidate S' satisfies Al+1, ..., Am, then S' is split into n − m + l (n ≥ m ≥ 0, l ≤ 1) model candidates. Pruning rules with respect to "believed" or "disbelieved" literals are expressed as the following integrity constraints, which are dealt with by using object-level schemata on the MGTP:

    ¬KA, A →     for every atom A        (3)
    ¬KA, KA →    for every atom A        (4)

Given a general logic program Π, we denote by tr(Π) the set of rules consisting of the two schemata (3) and (4) together with the MGTP rules obtained by replacing each rule (1) of Π by a rule (2). The MGTP then computes the fixpoint of model candidates, denoted M(tr(Π)), which is closed under the operations of the MGTP. Although each model candidate in M(tr(Π)) contains "believed" atoms, we should confirm that every such atom is actually derived from the program. This check can be done very easily by using the following constraint. Let S' ∈ M(tr(Π)).

    For every ground atom A, if KA ∈ S', then A ∈ S'.        (5)

Computation by the MGTP is sound and complete with respect to the stable model semantics in the sense that S is an answer set (or stable model) of Π if and only if S is the set of atoms obtained by removing every literal with the operator K from a model candidate S' in M(tr(Π)) such that S' satisfies condition (5).

Example: Suppose that the general logic program Π consists of the four rules:

    R ← not R,    R ← Q,    P ← not Q,    Q ← not P.
These rules are translated into the following MGTP rules:

    → ¬KR, R | KR,
    Q → R,
    → ¬KQ, P | KQ,
    → ¬KP, Q | KP.

In this example, the first MGTP rule can be further reduced to → KR if we prune the first disjunct by the schema (3). Therefore, the rule has computationally the same effect as the integrity constraint ← not R, which says that every answer set has to contain R: namely, R should be derived. Now, it is easy to see that M(tr(Π)) = {S1, S2, S3}, where S1 = {KR, ¬KQ, P, KP}, S2 = {KR, KQ, ¬KP, Q, R}, and S3 = {KR, KQ, KP}. The only model candidate that satisfies condition (5) is S2, showing that {Q, R} is the unique stable model of Π. Note that {P} is not a stable model, because S1 contains KR but does not contain R.

In [Inoue et al., 1992a], a similar translation was also given for extended disjunctive databases, which contain classical negation, negation as failure, and disjunctions. Our translation method not only provides a simple fixpoint characterization of answer sets, but is also very helpful for understanding under what conditions each model is stable or unstable. The MGTP can find all answer sets incrementally, without backtracking, and in parallel. The proposed method is surprisingly simple and does not increase the computational complexity of the problem beyond the computation of the minimal models of positive disjunctive programs. The procedure has been implemented on top of the MGTP on a parallel inference machine, and has been applied to a legal reasoning system.
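The stated result for the example can be cross-checked with a brute-force computation. The following Python sketch uses the standard Gelfond-Lifschitz reduct rather than the MGTP translation itself, so it is only an independent check of the claim; the rule encoding is an assumption of the sketch.

from itertools import chain, combinations

# A rule is (head, positive_body, negative_body).
program = [('R', [], ['R']),      # R <- not R
           ('R', ['Q'], []),      # R <- Q
           ('P', [], ['Q']),      # P <- not Q
           ('Q', [], ['P'])]      # Q <- not P
atoms = {'P', 'Q', 'R'}

def least_model(definite_rules):
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in definite_rules:
            if set(body) <= model and head not in model:
                model.add(head)
                changed = True
    return model

def is_stable(candidate):
    # The reduct keeps a rule (dropping its negative body) iff no negated atom holds.
    reduct = [(h, pos) for h, pos, neg in program if not (set(neg) & candidate)]
    return least_model(reduct) == candidate

for s in chain.from_iterable(combinations(sorted(atoms), r) for r in range(4)):
    if is_stable(set(s)):
        print(set(s))             # -> {'Q', 'R'}  (the unique stable model)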
3.2.2 Abduction

There are many proposals for a logical account of abduction, whose purpose is to generate explanations for a query. The definition we consider here is similar to that proposed in [Poole et al., 1987]. Let Σ be a set of formulas, Γ a set of literals, and G a closed formula. A set E of ground instances of Γ is an explanation of G from (Σ, Γ) if:

1. Σ ∪ E ⊨ G, and
2. Σ ∪ E is consistent.

The computation of explanations of G from (Σ, Γ) can be seen as an extension of proof-finding, by introducing a set of hypotheses from Γ that, if they could be proved while preserving the consistency of the augmented theory, would complete the proofs of G. Alternatively, abduction can be characterized as a consequence-finding problem [Inoue 1991b], in which some literals are allowed to be hypothesized (or skipped) instead of proved, so that new theorems consisting of only those skipped literals are derived at the end of deductions, instead of just deriving the empty clause. In this sense, abduction can be implemented by an extension of deduction, in particular of a top-down, backward-chaining theorem-proving procedure. For example, Theorist [Poole et al., 1987] and SOL-resolution [Inoue 1991b] are extensions of the Model Elimination procedure [Loveland 1978]. However, there is nothing to prevent us from using a bottom-up, forward-reasoning procedure to implement abduction. In fact, we developed the abductive reasoning system APRICOT/0 [Ohta and Inoue 1990], which consists of a forward-chaining inference engine and the ATMS [de Kleer 1986]. The ATMS is used to keep track of the results of inference, in order to avoid both repeated proofs of subgoals and duplicate proofs among different hypotheses deriving the same subgoals.

These two reasoning styles for abduction have complementary merits and demerits. Top-down reasoning is directed towards the given goal but may produce redundant proofs; bottom-up reasoning eliminates redundancy but may prove subgoals unrelated to the proof of the given goal. These facts suggest that it is promising to simulate top-down reasoning using a bottom-up reasoner, or to utilize cached results in top-down reasoning. This upside-down meta-interpretation approach [Bry 1990] has been attempted for abduction in [Stickel 1991], and has been extended by incorporating consistency checks in [Ohta and Inoue 1992]. We have already developed several parallel abductive systems [Inoue et al., 1992b] using the bottom-up theorem prover MGTP. We outline four of them below.

1. MGTP+ATMS (Figure 9). This is a parallel implementation of APRICOT/0 [Ohta and Inoue 1990] which utilizes the ATMS for checking consistency. The MGTP is used as a forward-chaining inference engine, and the ATMS keeps a current set of beliefs M, in which each ground atom is associated with some hypotheses. For this architecture, we have developed an upside-down meta-interpretation method to incorporate the top-down information [Ohta and Inoue 1992]. Parallelism is exploited by executing the parallel ATMS. However, because there is only one channel between the MGTP and the ATMS, the MGTP often has to wait for the results of the ATMS; thus the effect of parallel implementation is limited.

Figure 9: MGTP+ATMS

2. MGTP+MGTP (Figure 10). This is a parallel version of the method described in [Stickel 1991]; in addition, consistency is checked by calling another MGTP (MGTP_2). In this system, each hypothesis H in Γ is represented by fact(H, {H}), and each Horn clause in Σ of the form A1 ∧ ... ∧ An ⊃ C is translated into an MGTP rule of the form:

    fact(A1, E1), ..., fact(An, En) → fact(C, cc(E1 ∪ ... ∪ En)),

where Ei is a set of hypotheses from Γ on which Ai depends, and the function cc(E) returns E if Σ ∪ E is consistent and fails otherwise. The current set of beliefs M is kept as facts fact(A, E), each representing the meta-statement Σ ∪ E ⊨ A, and is stored in the inference engine (MGTP_1) itself. Each time MGTP_1 derives a new ground atom, the consistency of the combined hypotheses is checked by MGTP_2. The parallelism comes from calling multiple MGTP_2's at one time. This system achieves more speed-up than the MGTP+ATMS method. However, since MGTP_1 is not parallelized, the effect of parallelization depends heavily on how much consistency checking is performed in parallel at one time.

Figure 10: MGTP+MGTP

3. All Model Generation Method. No matter how good the MGTP+MGTP method might be, the system still consists of two different components, so the possibilities for parallelization remain limited. In contrast, model generation methods do not separate the inference engine and consistency checking, but realize both functions in a single MGTP. In such a method, the MGTP is used not only as an inference engine but also as a generate-and-test mechanism, so that consistency checks are performed automatically. For this purpose, we can utilize the extension and rejection of model candidates supplied by the MGTP. Multiple model candidates can therefore be kept in distributed memories, instead of keeping one global belief set M as in the above two methods, and thus a great amount of parallelism can be obtained. The all model generation method is the most direct way to implement reasoning with hypotheses. For each hypothesis H in Γ, we supply a rule of the form:

    → H | ¬KH,        (6)

where ¬KH means that H is not assumed to be true in the model. Namely, each hypothesis is assumed either to hold or not to hold. Since this system may generate 2^|Γ| model candidates, the method is often too explosive for practical applications.

4. Skip Method. To limit the number of generated model candidates as much as possible, we can use a method that delays the case-splitting of hypotheses. This approach is similar to the processing of negation as failure with the MGTP [Inoue et al., 1992a] introduced in the previous subsection. That is, we do not supply any rule of the form (6) for the hypotheses of Γ; instead, we introduce hypotheses only when they become necessary. When a clause in Σ contains negative occurrences of abducible predicates H1, ..., Hm (Hi ∈ Γ, m ≥ 0) and is of the form:

    A1 ∧ ... ∧ Al ∧ H1 ∧ ... ∧ Hm ⊃ C,

where the Hi are the abducibles, we translate it into an MGTP rule in which each hypothesis in the premise part is skipped instead of being resolved, and is moved to the right-hand side. This operation is a counterpart to the Skip rule in the top-down approach defined in [Inoue 1991b]. Just as in schema (3) for negation as failure, a model candidate containing both H and ¬KH is rejected by the schema:

    ¬KH, H →     for every hypothesis H.

Some results of the evaluation of these abductive systems, as applied to planning and design problems, are described in [Inoue et al., 1992b]. We are now improving their performance for better parallelism. Although we need to investigate further how to avoid possible combinatorial explosion in model candidate construction for the skip method, we conjecture that the skip method (or some variant thereof) will be the most promising from the viewpoint of parallelism. Also, the skip method may easily be combined with negation as failure, so that knowledge bases can contain both abducible predicates and negation-as-failure formulas, as in the approach of [Inoue 1991a].
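A propositional miniature of the all model generation idea is sketched below: each subset of the abducibles corresponds to one model candidate (every hypothesis either holds or is disbelieved), and a candidate survives only if it is consistent with Σ and entails the observation. Σ, the abducibles, and the goal are made up for this illustration; real MGTP-based abduction works on first-order clauses.

from itertools import combinations

rules = [({'rained', 'sprinkler_off'}, 'grass_wet'),   # body -> head
         ({'sprinkler_on'}, 'grass_wet'),
         ({'rained', 'sprinkler_on'}, 'false')]        # integrity constraint
abducibles = ['rained', 'sprinkler_on', 'sprinkler_off']
goal = 'grass_wet'

def closure(facts):
    model, changed = set(facts), True
    while changed:
        changed = False
        for body, head in rules:
            if body <= model and head not in model:
                model.add(head)
                changed = True
    return model

explanations = []
for r in range(len(abducibles) + 1):
    for E in combinations(abducibles, r):               # one candidate per hypothesis subset
        m = closure(E)
        if 'false' not in m and goal in m:               # consistent and entails the goal
            explanations.append(set(E))

# Keep only the minimal explanations.
print([E for E in explanations if not any(F < E for F in explanations)])
# -> [{'sprinkler_on'}, {'rained', 'sprinkler_off'}]   (set order may vary)

The skip method avoids this exhaustive enumeration by introducing a hypothesis only when some rule actually needs it.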
3.3 Program Synthesis by Realizability Interpretation

3.3.1 Program Synthesis by MGTP

We used Realizability Interpretation (an extension of the Curry-Howard Isomorphism) from the area of constructive mathematics [Howard 1980], [Martin 1982] in order to give an executable meaning to proofs obtained by efficient theorem provers. Our approach of combining prover technologies and Realizability Interpretation has the following advantages:

• The approach is prover independent, and any prover can in principle be used.
• Realizability Interpretation has a strong theoretical background.
• Realizability Interpretation is general enough to cover concurrent programs.

Two systems developed at ICOT, MGTP and PAPYRUS, are used for experiments on sorting algorithms, in order to get practical insights into our approach (Figure 11).

Figure 11: Program Synthesis by MGTP

A model generation theorem prover (MGTP) implemented in KL1 runs on a parallel machine, the Multi-PSI. It searches for proofs of specifications expressed as logical formulae. MGTP is a hyper-resolution based, bottom-up (inferring from premises to goal) prover. Thanks to KL1 programming technology, MGTP is simple but works very efficiently if problems satisfy the range-restrictedness condition. The inference mechanism of MGTP is similar, in principle, to that of SATCHMO [Manthey and Bry 1988]. Hyper-resolution has an advantage for program synthesis in that the inference system is constructive; this means that no further restriction is needed to avoid useless searching.

PAPYRUS (PArallel Program sYnthesis by Reasoning Upon formal Systems) is a cooperative workbench for formal logic. The system handles the proof trees of user-defined logics in the Edinburgh Logical Framework (LF) [Harper et al., 1987]. A typed lambda term in LF represents a proof, and a program can be extracted from this term by lambda computation. The system treats programs (functions) as the models of a logical formula under a user-defined Realizability Interpretation. PAPYRUS is an integrated workbench for logic and provides functions similar to those of PX [Hayashi and Nakano 1988], Nuprl [Constable et al., 1986], and Elf [Pfenning 1988].
PAPYRUS is an integrated workbench for logic and provides similar functions to PX[Hayashi and Nakano 1988], Nuprl[Constable et aI., 1986], and Elf[Pfenning 1988]. We faced two major problems during research process: • Program extraction from a proof in clausal form, and 146 PAPYRUS , ~ Propositions I Subprograms ProofTrm J--i Realizer ~rogram Problem I Equations I IPropositions I '" ~I I I Goals . MGTP MGTPprover • Demodulator -.• Proof Tree • • Equality Reasoning } Figure 11: Pro.gram Synthesis by MGTP • Incorporation of induction and equality. The first problem relates to the fact that programs' cannot be extracted from proofs obtained by using the excluded middle, as done in classical logic. The rules for transforming formulae into clausal form contains such a prohibited process. This problem can be solved if the program specification is given in clausal form because a proof can be obtained from the clause set without using the excluded middle. The second problem is that all induction schemes are expressed as second-order propositions. In order to handle this, second-order unification will be needed, which still is impracticaL However, it is possible to transform a second-order proposition to a first-order proposition if the program domain is fixed. Proof steps of equality have nothing to do with computation, provers can use efficient algorithms for equality as an attached procedure. 3.3.2 A Logic System for Extr~cting Interactive Processes There has been some research Martin 1982, Sato 1986] and [Howard 1980, [Hayashi and Nakano 1988] into program synthesis from constructive proofs. In this method, an interpretation of formulas is defined, and the consistent proof of the formula can be translated into a program that satisfies the interpretation. Therefore we can identify the formula as the specification of the program, proof as programming, and proof checking as program checking. Though this method has many useful points, the definition of a program in this method is only ",\ TerI~ (function)". Thus it is difficult to synthesize a program as a parallel process by which computers can communicate with the outside world. We proposed a new logic fl, that is, a constructive logic extended by introducing new operators fl and Q. The operator fl is a fixpoint operator on formulae. We can express the non-bounded repetition of inputs and outputs with operators fl and Q. Further, we show a method to synthesize a program as a parallel process like CCS[Milner 1989] from proofs of logic fl. We also show the proof of consistency of Logic fl and the validity of the method to synthesize a program. 3.4 Application of Theorem Proving to Specification Switching System of a We apply a theorem proving technique to the software design of a switching system, whose specifications are modeled by finite state diagrams. The main points of this project are the following: 1) Specification description language Ack, based on a transi tion system. 2) Graphical representation in Ack. 3) Ack interpreter by MGTP. We introduce the protocol specification description language, Ack. It is not necessary to describe all state transitions concretely using Ack, because several state transitions are deduced from one expression by means of theorem proving. Therefore, we can get a complete specification from an ambiguous one. Ack is based on a transition 'system (8, So, A, T), where 8 is a set of state, So (E S) is an initial state, A is a set of actions, and T(T ~ 8 x A x 8) is a set of transition relations. 147 onhook(a) ........ 
Graphical representation in Ack consists of labeled circles and arrows: a circle means a state and an arrow means an action. Both come in two colors, black and gray, meaning that whenever the gray-colored state transitions exist, the black-colored state transitions also exist. The textual representation of an Ack phrase is a first-order predicate logic formula of the following form:

    ∀X ∃Y (A[X] → B[X, Y]),

where A[X] and B[X, Y] are conjunctions of the following atomic formulas:

    state(S)          — S is a state.
    trans(A, S0, S1)  — an action A leads from state S0 to state S1.

A[X] corresponds to the gray-colored state transitions and B[X, Y] corresponds to the black-colored state transitions.

The Ack interpreter is described in MGTP: this type of formula is translated into an MGTP formula, and the set of models deduced from the Ack specification formulae forms a complete state transition diagram.

Figure 12 shows an example of an Ack specification.

Figure 12: An example of Ack specification

Rule 1 of Figure 12 means the existence of an action sequence from an initial state idle(a) such that offhook(a) → dial(a,b) → offhook(b). This is represented by the following formula:

    → trans(offhook(a), idle(a), dt(a)),
      trans(dial(a,b), dt(a), rbt(a)),
      trans(offhook(b), rbt(a), x(a,b)).

Rule 2 of Figure 12 means that the action onhook(a) changes any state to idle(a). It is represented by the following formula:

    ∀S (state(S) ∧ state(idle(a)) → trans(onhook(a), S, idle(a))).

Figure 13 shows an interpretation result for the specification of Figure 12. In this example, the following four transitions are automatically generated:

• action onhook(a) from idle(a) to idle(a),
• action onhook(a) from dt(a) to idle(a),
• action onhook(a) from rbt(a) to idle(a),
• action onhook(a) from x(a,b) to idle(a).

Figure 13: An interpretation result of Ack specification
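The way a single Ack expression fans out into several concrete transitions can be pictured with a small Python sketch of rule 2 above. The state names come from the example, but the interpreter itself is only an illustration, not the MGTP-based Ack interpreter.

# For every known state S, state(S) and state(idle(a)) yield trans(onhook(a), S, idle(a)).
states = ['idle(a)', 'dt(a)', 'rbt(a)', 'x(a,b)']

def apply_rule_2(states):
    return [('onhook(a)', s, 'idle(a)') for s in states]

for action, src, dst in apply_rule_2(states):
    print(f'action {action} from {src} to {dst}')
# action onhook(a) from idle(a) to idle(a)
# action onhook(a) from dt(a) to idle(a)
# ... and likewise for rbt(a) and x(a,b)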
3.5 MENDELS ZONE: A Parallel Program Development System

MENDELS ZONE is a software development system for parallel programs. The target parallel programming language is MENDEL, which is a textual form of Petri Nets; MENDEL is translated into the concurrent logic programming language KL1 and executed on the Multi-PSI. MENDEL can be regarded as a more user-friendly version of the language, and is convenient for the programmer when designing cooperating discrete event systems. MENDELS ZONE provides the following functions:

1) a data-flow diagram visualizer [Honiden et al., 1991],
2) a term rewriting system, Metis [Ohsuga et al., 90][Ohsuga et al., 91], and
3) a Petri Nets and temporal logic based programming environment [Uchihira et al., 90a][Uchihira et al., 90b].

For 1), we define the decomposition rule for data-flow diagrams and extract the MENDEL components from the decomposed data-flow diagrams. A detailed specification process, starting from an abstract specification, is also defined by a combination of data-flow diagrams and equational formulas.

For 2), Metis is a system supplying an experimental environment for studying practical techniques for equational reasoning. The policy in developing Metis is to enable us to implement, test, and evaluate the latest inference techniques as rapidly and freely as possible. The kernel function of Metis is the Knuth-Bendix (KB) completion procedure. We adopt Metis as a tool for verifying MENDEL components, which can be translated into components of Petri Nets.

For 3), the following sub-functions are provided:

1. Graphic editor. The designer constructs each component of the Petri Nets using the graphic editor, which provides creation, deletion, and replacement. This editor also supports expansion and reduction of Petri Nets.

2. Method editor. The method editor provides several functions specific to Petri Nets. Using the method editor, the designer describes methods (their conditions and actions) in detail using KL1.

3. Component library. Reusable components are stored in the component library. The library tool supports browsing and searching for reusable components.

4. Verification and synthesis tool. Only the skeletons of the Petri Net structures are automatically extracted (slots and the KL1 codes of methods are ignored), since our verification and synthesis are applicable to bounded nets. The verification tool verifies whether the Petri Nets satisfy given temporal logic constraints.

5. Program execution on Multi-PSI. The verified Petri Nets are translated into their textual form (MENDEL programs). The MENDEL programs are compiled into KL1 programs, which can be executed on the Multi-PSI. During execution, firing methods are displayed on the graphic editor, and the values of tokens are displayed on the message window. The designer can thus check visually that the program behaves as expected.
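Since MENDEL programs denote Petri nets whose methods fire when their input places are marked, a minimal token-game sketch may help fix intuitions. The net below (places, transitions, and marking) is invented purely for illustration and has no connection to any actual MENDEL component.

marking = {'idle': 1, 'request': 1, 'busy': 0}
transitions = {'accept':  ({'idle': 1, 'request': 1}, {'busy': 1}),
               'release': ({'busy': 1},                {'idle': 1})}

def enabled(name):
    pre, _ = transitions[name]
    return all(marking.get(p, 0) >= n for p, n in pre.items())

def fire(name):
    pre, post = transitions[name]
    for p, n in pre.items():           # consume input tokens
        marking[p] -= n
    for p, n in post.items():          # produce output tokens
        marking[p] = marking.get(p, 0) + n

for step in ('accept', 'release'):
    if enabled(step):
        fire(step)
        print(step, marking)
# accept  {'idle': 0, 'request': 0, 'busy': 1}
# release {'idle': 1, 'request': 0, 'busy': 0}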
Our objective is to clarify, as formally aspossible,the general relationship between those analogical factors T, B, S, and P under a given theory A. To find the relationship bvetween the analogical factors would answer these problems once and for all. In [Arima 1992, Arima 1991], we clarify such a relation and show a general solution. When analyzing analogical reasoning formally based on classical logic, the following are shown to be reasonable: • Analogical reasoning is possible only if a certain form of rule, called the analogy prime rule (APR), is a ded ucti ve theorem of a given theory. If we let S (x) = ~(x,S) and P(x) = II(x,P), then the rule has the following form: vx, s, p. Jatt ( S , p) 1\ Job j ( X, S) 1\ ~ ( X, S) ::) II ( X, p) , where each of Jatt(s,p), Jobj(X,S), ~(x,s) and II(x,p) are formulae in which no variable other than its argument occurs freely. • An analogical conclusion is derived from the APR, together with two particular conjectures: one conjecture is Jatt(S, P) where, from the information about 149 the base case, E(B, S) (= S(B)) and II(B, P) (= P(B)). The other is Jobj(T, S) where, from the information about the target case, E(T, S)( = S(T)). 4.2.2 Minimally Multiple Generalization Machine Learning is one of the most important themes in the area of artificial intelligence. A learning ability is necessary not only for processing and maintaining a large amount of knowledge information but also for realizing a user-friendly interface. We have studied the invention of new predicates is one of the most serious problems in learning logic programs. We have also investigated the application of minimally multiple generalization to the constructive learning of logic programs. Another important problem in learning logic programs is to develop a constructive algorithm for learning. Most learning by induction algorithms, such as Shapiro's model inference system, are based on a search or enumerative method: While search and enumerative methods are often very powerful, they are very expensive. A constructive method is usually more efficient than a search method. In the constructive learning of logic programs, the notion of least generalization [Plotkin 1970] plays a central role. Recently, Arimura proposed a notion of minimally multiple generalization (mmg) [Arimura 1991], a natural extension of least generalization. For example, the pair of heads in a clause in a normal append program is one head in the mmg for the Herbrand model of the program. Thus, mmg can be applied to infer the heads of the target program. Arimura has also given a polynomial time algorithm to compute mmg. We are now investigating an efficient constructive learning method using mmg. 4.2.1 4.3 Also, a candidate based on abduction + deduction is shown for a non-deductive inference system which can yield both conjectures. 4.2 Machine Learning of Logic Programs Predicate Invention Shapiro's model inference gives a very important strategy for learning programs - an incremental hypothesis search using contradiction backtracing. However, his theory assumes that an initial hypothesis language with enough predicates to describe a target model is given to the learner. Furthermore, it is assumed that the teacher knows the intended model of all the predicates. Since this assumption is rather severe and restrictive, for the practical applications of learning logic programs, it should be removed. To construct a learning system without such assumptions, we have to consider the problem of predicates invention. 
Recently, several approaches to this challenging and difficult problem have been presented [Muggleton and Buntine 1988], [Ling 1989]. However, most of them do not give a sufficient analysis of the computational complexity of the learning process in the setting where the hypothesis language is growing. We discussed the problem as nonterminal invention in grammatical inference. As is well known, any context-free grammar can be expressed as a special form of DCG (definite clause grammar) logic program; thus, nonterminal invention in grammatical inference corresponds to predicate invention. We have proposed a polynomial time learning algorithm for the class of simple deterministic languages, based on nonterminal invention and contradiction backtracking [Ishizaka 1990]. Since the class of simple deterministic languages strictly includes the regular languages, the result is a natural extension of our previous work [Ishizaka 1989].

4.2.2 Minimally Multiple Generalization

Another important problem in learning logic programs is to develop a constructive algorithm for learning. Most learning-by-induction algorithms, such as Shapiro's model inference system, are based on a search or enumerative method. While search and enumerative methods are often very powerful, they are very expensive; a constructive method is usually more efficient than a search method. In the constructive learning of logic programs, the notion of least generalization [Plotkin 1970] plays a central role. Recently, Arimura proposed the notion of minimally multiple generalization (mmg) [Arimura 1991], a natural extension of least generalization. For example, the pair of heads of clauses in a normal append program is one head in the mmg for the Herbrand model of the program; thus, mmg can be applied to infer the heads of the target program. Arimura has also given a polynomial time algorithm to compute mmg. We are now investigating an efficient constructive learning method using mmg.
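Minimally multiple generalization extends Plotkin's least generalization, which itself can be sketched as follows: two terms are anti-unified, with every pair of disagreeing subterms mapped consistently to one variable. The term encoding (nested tuples, constants as strings) is an assumption of this Python sketch.

def lgg(s, t, table):
    """Plotkin's least general generalization of two terms."""
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:                    # the same disagreeing pair always maps to the same variable
        table[(s, t)] = f'X{len(table)}'
    return table[(s, t)]

def least_generalization(s, t):
    return lgg(s, t, {})

# lgg( append([a], [b], [a, b]),  append([], [c], [c]) )
a1 = ('append', ('cons', 'a', 'nil'), ('cons', 'b', 'nil'),
      ('cons', 'a', ('cons', 'b', 'nil')))
a2 = ('append', 'nil', ('cons', 'c', 'nil'), ('cons', 'c', 'nil'))
print(least_generalization(a1, a2))
# -> ('append', 'X0', ('cons', 'X1', 'nil'), ('cons', 'X2', 'X3'))

An mmg generalizes several terms with a small set of heads rather than a single one, which is why it can be used to infer the clause heads of a target program; the sketch above covers only the ordinary single-head case.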
We evaluated their performance on the distributed memory multi-processors Multi-PSI and PIM. Range-restricted problems require only matching rather than full unification, and by making full use of the language features of KL1, excellent efficiency was achieved from MGTP /G. To solve non-range-restricted problems by the model generation method, however, MGTP /N is restricted to Horn clause problems, using a set of KL1 metaprogramming tools called the Meta-Library, to support the full unification and the other functions for variable management. To improve the efficiency of the MGTP provers, we developed RAMS and MERC methods that enable us to avoid redundant computations in conjunctive matching. We were able to obtain good performance results by using these methods on PSI. To ease severe time and space requirements in proving hard mathematical theorems (such as condensed detachment problems) by MGTP /N, we proposed the lazy model generation method, which can decrease the time and space complexity of the basic algorithm by several orders of magnitude. Our results show that significant saving in computation and memory can be realized by using the lazy algOl'ithm. For non-Horn ground problems, case splitting was used as the basic seed of OR parallel MGTP /G. This kind of problem is well-suited to MIMD machine such as Multi-PSI, on which it is necessary to make granularity as large as possible to minimize communication costs. We obt'ained an almost linear speedup for the n-queens, pigeon hole, and other problems on Multi-PSI, using a simple allocation scheme for task distribution. For Horn non-ground problems, on the other hand, we had to exploit the AND parallelism inherent to conjunctive matching and subsumption. We found that good performance and scalability were obtained by using the AND parallelization schemeof MGTP/N. In particular, our latest results, obtained with the MGTP /N prover on PIM/m, showed linear speedup on condensed detachment problems, at least up to 128 PEs. The key technique is the lazy model generation method, that avoids the unnecessary computation and use of time and space while maintaining a high running rate. The full unification algorithm, written in KL1 and used in MGTP /N, is one hundred times slower than that written in C on SPARCs. We are considering the incorporation of built-in firmware functions to bridge this gap. But developing KL1 compilation techniques for non-ground models, we believe, will further contribute to parallel logic programming on PIM. Through the development of MGTP provers, we confirmed that KL1 is a powerful tool for the rapid prototyping of concurrent systems, and that parallel automated reasoning systems can be easily and effectively built on the parallel inference machine, PIM. (2) Applications The modal logic prover on MGTP /G realizes two advantages. The first is that the redundancy elimination and parallelization of MGTP /G directly endow the prover with good performance. The second is that direct representation of tableaux rules of modal logic as hyper-resolution clauses are far more suited to adding heuristics for performance. This prover exhibited excellent benchmark results. The basic idea of non-monotonic and abductive systems on MGTP is to use the MGTP as an metainterpreter for each system's special properties, such as nonmonotonic provability (negation as failure) and the consistency of abductive explanations, into formulae having a kind of modality such that MGTP can deal with them within classical logic. 
The extra requirements for these special properties are thus reduced to "generate-and-test" problems of model candidates that can be efficiently handled by MGTP 151 through the case-splitting of non-unit consequences and rejection of inconsistent model candidates. the Knuth-Bendix (KB) completion procedure. We adopt Metis to verify the components of Petri Nets. We used MGTP for the application of program synthesis in two ways. Only the skeletons of Petri Net structures are automatically retracted (slots and the KLI codes of methods are ignored) since our verification and synthesis are applicable to a bounded net. The verification tool verifies whether Petri Nets satisfy given temporal logic constraints. In one approach, we used Realizability Interpretation( an extension of Curry-Howard Isomorphism), an area of constructive mathematics, to give executable meaning to the proofs obtained by efficient theorem provers. Two systems, MGTP and PAPYRUS, both developed in ICOT, were used for experiments on sorting algorithms to obtain practical insights into our approach. We performed experiments on sorting algorithms and Chinese Reminder problems and succeeded in obtaining ML programs from MGTP proofs. To obtain parallel programs, we proposed a new logic ~l, that is a constructive logic extended by introducing new operators ~ and q. Operator ~ is a fixpoint operator on formulae. We can express the nonbounded repetition of inputs and outputs with operators ~ and q. Furthermore, we 'showed a method of synthesizing "program" as a parallel process, like CCS, from proofs of logic~. We also showed the proof of consistency of Logic ~ and the validity of the method to synthesize "program". Our other approach to synthesize parallel programs by MGTP is the use of temporal logic, in which specifications are modeled by finite state diagrams, as follows. 1) Specification description language Ack, based on a transition system. 2) Graphical representation in Ack. 3) Ack interpreter by MGTP. It is not necessary to describe all state transitions concretely using Ack, because several state transitions are deduced from one expression by theorem proving in temporal logic. Therefore, we can obtain a complete specification from an ambiguous one. Another approach is to use term rewriting systems(Metis). MENDELS ZONE is a software development system for parallel programs. The target parallel programming language is MENDEL, which is a textual form of Petri Nets, that is translated into the concurrent logic programming language KLI and executed on Multi-PSI. We defined the decomposition rules for data-flow diagrams and subsequently extracted programs. Metis provides an experimental environment for studying practical techniques by equational reasoning, of implement, and test. The kernel function of Metis is (3) Advanced Inference and Learning To extend the reasoning power of AR systems, we have taken logical, computational, and empirical approaches. In the logical approach, analogical reasoning, considered to be at the very core of human problemsolving, has been analyzed formally and a mechanism for analogical reasoning has been explored. In this approach, our objective was to clarify a general relationship between those analogical factors T, B, Sand P under a given theory A, as formally as possible. Determining the relationship between the analogical factors would answer these problems once and for all. We clarified the relationship and formulated a general solution for them all. 
(3) Advanced Inference and Learning

To extend the reasoning power of AR systems, we have taken logical, computational, and empirical approaches. In the logical approach, analogical reasoning, considered to be at the very core of human problem solving, has been analyzed formally and a mechanism for analogical reasoning has been explored. In this approach, our objective was to clarify, as formally as possible, a general relationship between the analogical factors T, B, S and P under a given theory A. Determining the relationship between the analogical factors would answer these problems once and for all. We clarified the relationship and formulated a general solution for them all.

In the computational approach, we studied the invention of new predicates, one of the most serious problems in the learning of logic programs. We proposed a polynomial time learning algorithm for the class of simple deterministic languages, based on nonterminal invention and contradiction backtracing. Since the class of simple deterministic languages includes the regular languages, the result is a natural extension of our previous work. We have also investigated the application of minimally multiple generalization to the constructive learning of logic programs. Recently, Arimura proposed the notion of minimally multiple generalization (mmg). We are now investigating an efficient constructive learning method that uses mmg.

In the empirical approach, we have studied automated programming, especially logic program transformation and synthesis based on unfold/fold transformation, a well-known means of deriving correct and efficient programs. We investigated a strategy for logic program transformation based on unfold/fold rules. New predicates are synthesized automatically to perform folding. We also extended this method to incorporate a goal replacement transformation. We also showed a characterization of the classes of first order formulae from which definite clause programs can be derived automatically. These formulae are described by Horn clauses extended by universally quantified implicational formulae. A synthesis procedure based on generalized unfold/fold rules is given, and with some syntactic restrictions, these formulae can be successfully transformed into equivalent definite clause programs.

These results contribute to the development of FGCS, not only in AI applications, but also in the foundation of the parallel logic programming that we regard as being the kernel of FGCS.

Acknowledgment

The research on automated reasoning systems was carried out by the Fifth Research Laboratory at ICOT in close cooperation with five manufacturers. Thanks are firstly due to those who have given support and helpful comments, including Dr. Kazuhiro Fuchi, the director of ICOT, and Dr. Koichi Furukawa, the deputy director of ICOT. Many fruitful discussions took place at the meetings of the Working Groups: PTP, PAR, ANR, and ALT. We would like to thank the chairpersons and all other members of the Working Groups. Special thanks go to the many people at the cooperating manufacturers in charge of the joint research programs.

References

[Nitta et al., 1992] K. Nitta, Y. Ohtake, S. Maeda, M. Ono, H. Ohsaki and K. Sakane, HELIC-II: a legal reasoning system on the parallel inference machine, in Proc. of FGCS'92, Tokyo, 1992. [Stickel 1988] M.E. Stickel, A Prolog Technology Theorem Prover: Implementation by an Extended Prolog Compiler, in Journal of Automated Reasoning, 4, pp. 353-380, 1988. [McCune 1990] W.W. McCune, OTTER 2.0 Users Guide, Argonne National Laboratory, 1990. [McCune and Wos 1991] W.W. McCune and L. Wos, Experiments in Automated Deduction with Condensed Detachment, Argonne National Laboratory, 1991. [Overbeek 1990] R. Overbeek, Challenge Problems, (private communication) 1990. [Wilson and Loveland 1989] D. Wilson and D. Loveland, Incorporating Relevancy Testing in SATCHMO, Technical Report CS-1989-24, Duke University, 1989. [Fitting 1983] M. Fitting, Proof Methods for Modal and Intuitionistic Logic, D. Reidel Publishing Co., Dordrecht, 1983. [Fuchi 1990] K. Fuchi, Impression on KL1 programming - from my experience with writing parallel provers -, in Proc.
of KLl Programming Workshop '90, pp.131-139, 1990 (in Japanese). [Fitting 1988] M. Fitting, "First-Order Modal Tableaux", Journal of Automated Reasoning, Vol.4, No.2, 1988. [Hasegawa et al., 1990J R. Hasegawa, H. Fujita and M. Fujita, A Parallel Theorem Prover in KL1 and Its Application to Program Synthesis, in ItalyJapan-Sweden Workshop '90, ICOT TR-588, 199B. [Koshimura and Hasegawa 1991] M. Koshimura and R. Hasegawa, "Modal Propositional Tableaux in a Model Generation Theorem Prover" , In Proceedings of the Logic Programming Conference '91, Tokyo, 1991 (in Japanese). [Fujita and Hasegawa 1990J H. Fujita and R. Hasegawa, A Model Generation Theorem Prover in KL1 Using Ramified-Stack Algorithm, ICOT TR-606, 1990. [Hasegawa 1991J R. Hasegawa, A Parallel Model Generation Theorem Prover: MGTP and Further Research Plan, in Proc. of the Joint AmericanJapanese Workshop on Theorem Proving, Argonne, Illinois, 1991. [Hasegawa et al., 1992J R. Hasegawa, M. Koshimura and H. Fujita, Lazy Model Generation for Improving the Efficiency of Forward Reasoning Theorem Provers, ICOT TR-751, 1992. [Koshimura et al., 1990J M. Koshimura, H. Fujita and R. Hasegawa, Meta-Programming in KL1, ICOTTR-623, 1990 (in Japanese). [Manthey and Bry 1988J R. Manthey and F. Bry, SATCHMO: a theorem prover implemented in Prolog, in Proc. of CADE 88, Argonne, illinois, 1988. [Smullyan 1968] R.M. Smullyan, First-Order Logic, Vol 43 of Ergebnisse der Mathematik, Springer-Verlag, Berlin, 1968. [Arima 1991] J. Arima, A Logical Analysis of Relevance in Analogy, in Proc. of Workshop on Algorithmic Learning Theory ALT'91, Japanese Society for Artificial Intelligence, 1991. [Arima 1992] J. Arima, Logical Structure of Analogy, in FGCS'92, Tokyo, 1992. [Kohda and Maeda 91a] Y. Kohda and M. Maeda, Strategy Management Shell on a Parallel Machine, IIAS RESEARCH Memorandum IIAS-RM-91-8E, Fujitsu, October 1991. [Kohda and Maeda 1991b] Y. Kohda and M. Maeda, Strategy Management Shell on a Parallel Machine, in poster session of ILPS'91, San Diego, October 1991. 153 [Maeda et al., 1990] M. Maeda, H. Uoi, N. Tokura, Process and Stream Oriented Debugger for GHC programs, Proceedings of Logic Programming Conference 1990, pp.169-178, ICOT, July 1990. [Plotkin 1970] G.D. Plotkin, A note on inductive generalization. In B. Meltzer and D. Michie, editors, lo.lachine Intelligence 5, pp. 153-163. Edinburgh University Press, 1970. [Maeda 1992] M. Maeda, Implementing a Process Oriented Debugger with Reflection and Program Transformation, in Proc. of FGCS'92, Tokyo, 1992. [Burstall and Darlington 1977J R.M. Burstall and J. Darlington, "A Transformation System for Developing Recursive Programs", J.ACM, Vo1.24, No.1, pp.44-67, 1977. [Smith 1984] B.C. Smith, Reflection and Semantics in Lisp, Conference Record of the 11th Annual ACM Symposium on Principles of Programming Languages, pp.23-35, ACM, January 1984. [Sugano 1990] H. Sugano, Meta and Reflective Computation in Logic Programs and its Semantics, Proceedings of the Second Workshop on MetaProgramming in Logic, Leuven, Belgium, pp.19-34, April, 1990. [Sugano 1991] H. Sugano, Modeling Group Reflection in a Simple Concurrent Constraint Language, OOPSLA '91 Workshop on Reflection and Metalevel Architectures in Object-Oriented Programming, 1991. [Tanaka 1991] J. Tanaka, An Experimental Reflective Programming System Written in GHC, Journal of Information Processing, Vol.l4, No.1, pp.74-84, 1991. [Tanaka and Matono 1992] J. Tanaka and F. 
Matono, Constructing and Collapsing a Reflective Tower in Reflective Guarded Horn Clauses, in Proc. of FGCS'92, Tokyo, 1992. [Arimura 1991] H. Arimura, T. Shinohara and S. Otsuki, Polynomial time inference of unions of tree pattern languages. In S. Arikawa, A. Maruoka, and T. Sato, editors, Proc. ALT '91, pp. 105-114. Ohmsha, 1991. [Ishizaka 1989] H. Ishizaka, Inductive inference of regurar languages based on model inference. International journal of Computer .fo/[athematics, 27:67-83, 1989. [Ishizaka 1990] H. Ishizaka, Polynomial time learnability of simple deterministic languages. Machine Learning, 5(2):151-164, 1990. [Kanamori and Horiuchi 1987J T. Kanamori and K. Horiuchi, "Construction of Logic Programs Based on Generalized Unfold/Fold Rules", Proc. of 4th International Conference on Logic Programming, pp.744-768, Melbourne, 1987. [Kawamura 1991] T. Kawamura, "Derivation of Efficient Logic Programs by Synthesizing New Predicates", Proc. of 1991 International Logic Programming Symposium, pp.611 - 625, San Diego, 1991. [kawamura 1992J T. Kawamura, "Logic Program Synthesis from First Order Logic Specifications", to appear in International Conference on Fifth Generation Computer Systems 1992, Tokyo, 1992. [Tamaki and Sato 1984] H. Tamaki and T. Sato, "Unfold/Fold Transformation of Logic Programs", Proc. of 2nd International Logic Programming Conference, pp.127-138, Uppsala, 1984. [Bry 1990] F. Bry, Query evaluation in recursive databases: bottom-up and top-down reconciled. Data & Knowledge Engineering, 5:289-312, 1990. [de Kleer 1986] J. de Kleer, An assumption-based TMS. Artificial Intelligence, 28:127-162, 1986. [Fujita and Hasegawa 1991J H. Fujita and R. Hasegawa, A model generation theorem prover in KLI using a ramified-stack algorithm. In: Proceedings of the Eighth International Conference on Logic Programming (Paris, France), pp. 535-54:8, MIT Press, Cambridge, MA, 1991. [Gelfond and Lifschitz 1991] M. Gelfond and V. Lifschitz, Classical negation in logic programs and disjunctive databases. New Generation Computing, 9:365-385, 1991. [Ling 1989] X. Ling, Inventing theoretical terms in inductive learning of functions - search and constructive methods. In Zbigniew W. Ras, editor, lo.lethodologies for Intelligent Systems, 4, pp. 332341. North-Holland, October 1989. [Inoue 1988] K. Inoue, Problem solving with hypothetical reasoning, in Proc. of FGCS'88, pp. 1275-1281, Tokyo, 1988. Muggleton [Muggleton and Buntine 1988] S. and W. Buntine, Machine invention of first-order predicates by inverbng resolution. In Proc. 5th International Confe7'ence on lo.lachine Learning, pp. 339-352, 1988. [Inoue 1991aJ K. Inoue, Extended logic programs with default assumptions, in Proc. of the Eighth InternaUonal Conference on Logic Programming (Paris, France), pp. 490-504, MIT Press, Cambridge, MA, 1991. 154 [Inoue 1991b] K. Inoue, Linear resolution for consequence-finding, To appear in: Aritijicial Intelligence, An earlier version appeared as: Consequence-finding based on ordered linear resolution, in Proc. of IJCAI-91, pp. 158-164, Sydney, Australia, 1991. [Inoue et al., 1992a] K. Inoue, M. Koshimura and R. Hasegawa, Embedding negation as failure into a model generation theorem prover, To appear in CADE 92, Saratoga Springs, NY, June 1992. [Inoue et al., 1992b] K. Inoue, Y. Ohta, R. Hasegawa and M. Nakashima, Hypothetical reasoning systems on the MGTP, ICOT-TR 1992 (in Japanese). [Loveland 1978] D.W. Loveland, Automated Theorem Proving: A Logical Basis . . North-Holland, Amsterdam, 1978. 
[Ohta and Inoue 1990] Y. Ohta and K. Inoue, A forward-chaining multiple context reasoner and its application to logic design, in: Proceedings of the Second IEEE International Conference on Tools for Artificial Intelligence, pp. 386-392, Herndon, VA, 1990. [Ohta and Inoue 1992] Y. Ohta and K. Inoue, A forward-chaining hypothetical reasoner based on upside-down meta-interpretation, in Proc. of FGCS'92, Tokyo, 1992. [Poole et al., 1987] D. Poole, R. Goebel and R. Aleliunas, Theorist: a logical reasoning system for defaults and diagnosis, in: Nick Cercone and Gordon McCalla, editors, The Knowledge Frontier: Essays in the Representation of Knowledge, pp. 331-352, Springer-Verlag, New York, 1987. [Stickel 1991] M.E. Stickel, Upside-down meta-interpretation of the model elimination theorem-proving procedure for deduction and abduction, ICOT TR-664, 1991. [Constable et al., 1986] R.L. Constable et al., Implementing Mathematics with the Nuprl Proof Development System, Prentice-Hall, NJ, 1986. [Hayashi and Nakano 1988] S. Hayashi and H. Nakano, PX: A Computational Logic, MIT Press, Cambridge, 1988. [Harper et al., 1987] R. Harper, F. Honsell and G. Plotkin, A Framework for Defining Logics, in Symposium on Logic in Computer Science, IEEE, pp. 194-204, 1987. [Pfenning 1988] F. Pfenning, Elf: A Language for Logic Definition and Verified Meta-Programming, in Fourth Annual Symposium on Logic in Computer Science, IEEE, pp. 313-322, 1989. [Takayama 1987] Y. Takayama, Writing Programs as QJ Proofs and Compiling into Prolog Programs, in Proc. of the IEEE Symposium on Logic Programming '87, pp. 278-287, 1987. [Howard 1980] W.A. Howard, "The formulae-as-types notion of construction", in Essays on Combinatory Logic, Lambda Calculus and Formalism, Academic Press, pp. 479-490, 1980. [Martin 1982] P. Martin-Lof, "Constructive mathematics and computer programming", in Logic, Methodology, and Philosophy of Science VI, Cohen, L.J. et al., eds., North-Holland, pp. 153-179, 1982. [Sato 1986] M. Sato, "QJ: A Constructive Logical System with Types", France-Japan Artificial Intelligence and Computer Science Symposium 86, Tokyo, 1986. [Milner 1989] R. Milner, "Communication and Concurrency", Prentice-Hall International, 1989. [Honiden et al., 1990] S. Honiden et al., An Application of Structural Modeling and Automated Reasoning to Real-Time Systems Design, in The Journal of Real-Time Systems, 1990. [Honiden et al., 1991] S. Honiden et al., An Integration Environment to Put Formal Specification into Practical Use in Real-Time Systems, in Proc. 6th IWSSD, 1991. [Ohsuga et al., 91] A. Ohsuga et al., A Term Rewriting System Generator, in Software Science and Engineering, World Scientific, 1991. [Ohsuga et al., 90] A. Ohsuga et al., Complete E-unification based on an extension of the Knuth-Bendix Completion Procedure, in Proc. of Workshop on Word Equations and Related Topics, LNCS 572, 1990. [Uchihira et al., 90a] N. Uchihira et al., Synthesis of Concurrent Programs: Automated Reasoning Complements Software Reuse, in Proc. of 23rd HICSS, 1990. [Uchihira et al., 90b] N. Uchihira et al., Verification and Synthesis of Concurrent Programs Using Petri Nets and Temporal Logic, in Trans. IEICE, Vol. E73, No. 12, 1990.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

Natural Language Processing Software

Yuichi Tanaka
Sixth Research Laboratory, Institute for New Generation Computer Technology
4-28, Mita 1-chome, Minato-ku, Tokyo 108,
Japan. ytanaka@icot.or.jp

Abstract

In the Fifth Generation Computer Systems project, the goal of natural language processing (NLP) is to build an intelligent user interface for the prototype machine of the Fifth Generation. In the initial and intermediate stages of our project, mathematical and linguistic theories of discourse understanding were investigated and we built some experimental systems for the theories. In the final stage, we have built a system of general tools for NLP and, using them, developed experimental systems for discourse processing, based on the results and experience of the software development in the past two stages.

In the final stage, we have four themes of NLP research and development. The first theme, the Language Knowledge-base, is a collection of basic knowledge for NLP including a Japanese grammar and a Japanese dictionary. In the second theme, the Language Tool Box, we have developed several basic tools, especially for Japanese processing. The tools are: morphological and syntax analyzers, a sentence generator, a concordance system, etc. These two themes form the infrastructure of our NLP systems. Experimentation with discourse processing is the third and main theme of our research. We have developed several systems in this field, including text generation, discourse structure construction, and dialog systems. The last theme is parallel processing. We have developed an experimental system for cooperative parallel natural language processing in which morphological analysis, syntax analysis, and semantic analysis are integrated in a uniform process in a type inference framework.

1 Introduction

To establish an intelligent interface between machine and human, it is necessary to research discourse processing. In discourse processing we include not only discourse understanding, where the computer understands the contents of human utterances and infers the human's intention, but also text generation, by which more than one sentence expressing a speaker's consistent assertion is produced. We put this discourse processing research at the center of our research and development activity, and also develop some supporting tools and data as the infrastructure.

(Figure 1: Overview of NLP software — parallel natural language processing (morphological, syntactic, and semantic analysis based on type inference); natural language interface; discourse processing systems; Linguistic Knowledge-base; Language Tool Box.)

The Language Knowledge-base is a collection of basic knowledge for natural language processing including a Japanese grammar and a Japanese dictionary. We have built a Japanese grammar in phrase structure grammar based on the unification grammar formalism. Until now, there was no Japanese grammar of sufficient size for practical use and usable by every researcher and developer. The purposes of developing this grammar are these two points: it is written in DCG (Definite Clause Grammar) and it is based on an exhaustive investigation of Japanese language phenomena. We have also developed a Japanese grammar based on the dependency grammar formalism. To reduce the ambiguity arising during analysis, we introduced structural and linguistic constraints on dependency structure based on a new concept, 'rank', for each word and word pair. In addition to the Japanese grammar, we have developed a large-scale Japanese dictionary for morphological analysis. It has about 150,000 entries, including more than 40,000 proper nouns, so that it can be used for morphological analysis of newspaper articles.
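Both grammars rest on unification over feature structures. As a rough illustration of the operation such a formalism relies on (this is not the LUG implementation or any Tool Box code, and every feature name below is invented for the example), the following Python sketch unifies two toy feature structures of the kind a rule might associate with a topicalized noun phrase and its predicate.

```python
# Illustrative feature-structure unification (a toy sketch, not ICOT's grammar tools).
# Feature structures are nested dicts; unification fails on conflicting atomic values.
def unify(fs1, fs2):
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        merged = dict(fs1)
        for key, val in fs2.items():
            if key in merged:
                sub = unify(merged[key], val)
                if sub is None:
                    return None                      # feature clash: unification fails
                merged[key] = sub
            else:
                merged[key] = val
        return merged
    return fs1 if fs1 == fs2 else None               # atomic values must agree

# Hypothetical structures for a "wa"-marked topic phrase and a predicate.
topic_np  = {"case": "wa", "sem": {"index": "x1"}}
predicate = {"case": "wa", "sem": {"pred": "yobu", "agent": "x1"}}
print(unify(topic_np, predicate))
# {'case': 'wa', 'sem': {'index': 'x1', 'pred': 'yobu', 'agent': 'x1'}}
```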
This grammar and dictionary are described in Section 2.

The Language Tool Box is a collection of basic NLP tools, especially for Japanese processing. The input and output modules of some experimental NLP systems we have made so far, mainly a Japanese morphological analyzer, a syntax analyzer, and a sentence generator, were useful for other NLP applications. We have refined their user interfaces, made the programs robust to unexpected inputs, and increased their efficiency to make them easier to apply to various applications. Currently, not only input and output tools are included in this collection, but also supporting tools for lexicographers and grammar writers, such as a concordance system and a grammar editor. The description of these tools and their publication appears in Section 3.

The development of discourse processing systems is the main theme of our research. We have collected rules for language phenomena concerning discourse, and developed several experimental systems in this field, including text generation, discourse structure construction, and dialog systems. The text generation system produces one or more paragraphs of text concerning a given theme based on its beliefs and judgement. The discourse structure construction system uses discourse rules as a grammar to construct a tree-like discourse structure of a given text. The experimental dialog systems handle the user's intention, situation, and position to remove the user's misunderstanding and to produce user-friendly responses. These systems are described in Section 4.

As a parallel NLP experiment, we have developed a small system for cooperative processing in which morphological analysis, syntax analysis, and semantic analysis are amalgamated into a uniform process in a type inference framework. This system, running on the Multi-PSI machine, achieves a speed-up of about 12 using 32 PEs. A precise description of the system and the experiment appears in Section 5. The overall activity for these four themes is shown in Figure 1.

2 Linguistic Knowledge-base

The Language Knowledge-base is a collection of basic knowledge for natural language processing including a Japanese grammar and a Japanese dictionary. We have built a Japanese grammar in phrase structure grammar based on the unification grammar formalism. There has been no set of standard Japanese grammar rules which people can get and handle easily and quickly. This is an obstacle for researchers in Japanese language processing who try to make experimental systems to prove some ideas, or who try to build application systems in various fields. Our Japanese grammar has been developed to overcome such obstacles and is designed as a standard in the sense that it covers most of the general language phenomena and is written in a form common to various environments, DCG (Definite Clause Grammar). We have also developed a Japanese grammar based on the dependency grammar formalism. Historically, there have been several Japanese dependency grammars, because it is recognized that dependency grammar rules are easier to build for Japanese owing to its loose constraints on word order. We introduced structural and linguistic constraints on dependency structure in order to avoid structural ambiguity. These constraints are based on a new concept, 'rank', for each word and word pair. In addition to the Japanese grammar, we have developed a large-scale Japanese dictionary for morphological analysis.
It has about 150,000 entries, including more than 40,000 proper nouns, so that it can be used for morphological analysis of newspaper articles. The precise description of the Language Knowledge-base will be presented in [Sano and Fukumoto 92], submitted to the ICOT session of this conference.

2.1 Japanese Grammar

2.1.1 Localized Unification Grammar

Conventional Japanese grammars for computers are not satisfactory for practical application because they lack formality, uniformity of precision, and exhaustiveness [Kuno and Shibatani 89] [Masuoka 89] [Nitta and Masuoka 89]. Having made an exhaustive investigation, we collected language phenomena and rules to explain those phenomena, objectively expressed in a DCG-style formal description [Pereira 80]. This description is based on the Unification Grammar formalism [Calder 89] [Carlson 89] [Moens 89]. The rules cover most of the phenomena appearing in contemporary written text [Sano 89] [Sano et al. 90] [Sano and Fukumoto 90]. We classified these phenomena according to the complexity of the corresponding surface expressions [Sano 91]. Grammar rules are also classified according to their corresponding phenomena. The classification of phenomena (rules) is shown in Table 1.

Table 1: Classification of Grammar Rules
  Level 1-2:   single predicate
  Level 3-4:   negation / aspect / honorification
  Level 5:     subject + complement + predicate / topicalization
  Level 6:     passive / causative
  Level 7-8:   modification (to nouns / to verbs)
  Level 9:     particles (1) / coordination (2)
  Level 10-11: compound sentence / condition
  Level 12:    particles (2) / coordination (2) / conjunction

The syntactic-semantic structure of a sentence is shown in Figure 2. In this figure, the State-of-affairs (SOA) is the minimum sub-structure of the whole structure. An SOA has a predicate with some cases and optional complements. The composition of one or more SOAs forms a description. The semantic content of a sentence is a description preceded by a Topic. Furthermore, the semantics of a sentence contains the speaker's intention, expressed by a Modal. According to this structure, the rules of each level (Table 1) are divided into several groups. Rules of the outermost group analyze the speaker's intention through the expression at the end of the sentence. Rules of the second group analyze the topic-comment structure, that is, a dependency relation between a topicalized noun phrase marked by the particle "wa" and the main predicate. Rules for analyzing the description, voice, etc. follow. An example of the grammar rules is shown in Figure 3.

(Figure 3: An example of LUG grammar rules.)

2.1.2 Restricted Dependency Grammar

For the Japanese language, there has been much research on dependency grammar, because there are no strong constraints on word order in Japanese [Kodama 87]. In this research, in order to determine whether a word depends on another, no global information is used, only that of the two words themselves. However, this kind of local information is not sufficient to recognize the structure of a whole sentence, including topic and ellipsis. Consequently, wrong interpretations of a sentence are produced as a result of dependency analysis [Sugimura and Fukumoto 89]. We introduced structural and linguistic constraints on dependency structure in order to avoid this kind of structural ambiguity. These constraints are described in terms of a rank for each word and word pair.
Rank represents the strength of dependency between words, reflecting global information in the whole sentence [Fukumoto and Sano 90]. The definitions of ranks and their constraints are described in detail in [Sano and Fukumoto 92]. Figure 4 shows a structural ambiguity and its resolution. For the sentence "Kare-ga yobu-to dete-kita." ("When he called

framework. Most of the conventional NLP systems have been designed as a collection of independently acting modules. Processing in each module is hidden from the outer world, and we use these modules as black boxes. But since parallel cooperative processing requires internal information to be exchanged between modules, we must adopt another framework for parallel NLP. One answer to this problem is to abstract the processing mechanism so as to merge all such processing as morphology, syntax, and semantics. Constraint transformation, proposed by Hasida [Hashida 91], is one candidate for this framework. We proposed a type inference method [Martin-Lof 84] as another candidate. This type inference mechanism is based on a typed record structure [Sells 85], or a record structure of types similar to the ψ-term [Aït-Kaci and Nasr 86], sorted feature structures [Smolka 88], QUIXOTE [Yasukawa and Yokota 90], and order-sorted logic [Schmidt-Schauss 89]. Morphological analysis and syntax analysis are performed by the layered stream method [Matsumoto 86]. The roles of processes and communication are exchanged in comparison with the method used in PAX [Satoh 90]. This system, running on the Multi-PSI machine, using a Japanese dictionary with 10,000 nouns, 1,000 verbs, and 700 concepts, and the Japanese grammar LUG [Sano 91] [Sano and Fukumoto 92], achieves a speed-up of about 12 using 32 processing elements. Figure 9 shows the relation between the number of processors (1 to 32) and the processing time in milliseconds for a 25-word sentence. Figure 10 shows the relation between reductions and speed-up ratio for various evaluation sentences. Details of this system are presented in [Yamasaki 92], submitted to this conference.

(Figure 9: Performance of the experimental system (1).)

(Figure 10: Performance of the experimental system (2) — speed-up versus reductions for syn, morph+syn, and morph+syn+sem.)

Acknowledgment

We wish to thank Dr. Kazuhiro Fuchi, director of the ICOT Research Center, who gave us the chance to research natural language processing, and also Dr. Shunichi Uchida, Manager of the Research Division, for his helpful advice on the fundamental organization and direction of our research.

References

[Abe et al. 91] H. Abe, T. Okunishi, H. Miyoshi, and Y. Obuchi. A Sentence Division Method using Connectives. In Proc. of the 42nd Conference of Information Processing Society of Japan (in Japanese), 1991. pp. 13-15. [Aït-Kaci and Nasr 86] H. Aït-Kaci and R. Nasr. LOGIN: A Logic Programming Language with Built-in Inheritance, The Journal of Logic Programming, Vol. 3, No. 3, Oct. 1986. [Aizawa and Ehara 73] T. Aizawa and T. Ehara. Kana-Kanji Conversion by Computer (in Japanese), NHK Technical Research, Vol. 25, No. 5, 1973. [Appelt 85a] D. E. Appelt. Planning English Sentences, Cambridge University Press, 1985. [Appelt 85b] D. E. Appelt. Bidirectional Grammar and the Design of Natural Language Generation Systems, In Proc. TINLAP-85, 1985. [Appelt 87] D. E. Appelt. A Computational Model of Referring, In Proc.
IJCAI-B7, 1987. ,[Appelt 88] D. E. Appelt. Planning Natural Language Referring Expressions. In David D. McDonald and Leonard Bole (eds.) , Natural Language Generation Systems. Springer-Verlag, 1988. [Barwise and Perry 83] J. Barwise and J. Perry. Situation and Attitudes, MIT Press, 1983. [Brooks 86] R. A. Brooks. A Robust Layered Control System for a Mobile Robot, IEEE Journal of Robotics and Automation, Vol. Ra-2, No. 1. March, 1986. [Fukumoto and Sano 90] F. Fukumoto, H. Sano. Restricted Dependency Grammar and its Representation. In Proc. The 41st Conference of Information Processing Society of Japan (in Japanese), 1990. [Fukumoto 90] J. Fukumoto. Context Structure Extraction of Japanese Text based on Writer's Assertion. In Research Report of SIG-NL, Information Processing Society of Japan (in Japanese). 78-15, 1990. [Fukumoto and Yasuhara 91] J. Fukumoto and H. Yasuhara. Structural Analysis of Japanese Text. In Research Report of SIG-NL, Information Processing Society of Japan (in Japanese). 85-11, 1991. [Grosz and Sidner 85] B. Grosz and C. L. Sidner. The structures of Discource Structure, Technical Report CSL1, CSLI-85-39, 1985. [Hashida 91] K. Hasida. Aspects of Integration in Natural Language Processing, Computer Software, Japan Society for Software Science and Technology, Vol. 8, No.6. Nov. 1991. [Hovy 85] E. H. Hovy. Integrating Text Planning and Production in Generation. In the Proceedings of the International Joint Conference on Artificial Intelligence. 1985. [Hovy 87] E. H. Hovy. Interpretation in Generation. In the Proceedings of 6th AAAI Conference. 1987. [Hovy 88] E. H. Hovy. Generating Natural Language under Pragmatic Constraints. Lawrence Erlbaum Associates, Publishers, 1988. [Hovy 90a] E. H. Hovy. Unresolved Issues in Paragraph Planning. In Current Research in Natural Language Generation. Academic Press, 1990. [Calder 89] Jonathan Calder, Ewan Klein, Henk Zeevat. Unification Categorial Grammar. In Proc. of the Fourth Conference of. the European Chapter of the ACL, Manchester, 1989. [Hovy 90b] E. H. Hovy. Pragmatics and Natural Language Generation. Artificial Intelligence 43, 1990. pp. 153-197. [Carlson 89] Lauri Carlson. RUG: Regular Unification Grammar. In Proc. of the Fourth Conference of the European Chapter of the ACL, Manchester, 1989. [Ichikawa 78] T. Ichikawa. An Introduction to Japanese Syntax fo·T' Teachers. Kyoiku Shuppan Publishing Co., 1978. [Danlos 84] 1. Danlos. Conceptual and Linguistic Decisions in Generation. In Proc. of the International Conference on Computational Linguistics, 1984. [De Smedt 90] K. J. M. J. De Smedt. Incremental Sentence Generation. NICI Technical Report, 90-01, 1990. [Fujisaki 89] H. Fujisaki. Analysis of Intonation and its Modelling in Japanese Language. Japanese Language and Education of Japanese (in Japanese). Meiji Shoin Publishing Co., 1989, pp. 266-297. [Ikeda et al. 88] T. Ikeda, K. Hatano, H. Fukushima and S. Shigenaga. Generation Method in the Sentence Generator of Language Tool Box (LTB). In Proc. of the 5th Conference of Japan Society for Software Science and Technology (in Japanese), 1988. [Ikeda 91] T. Ikeda. Natural Language Generation System based on the Hierarchy of Semantic Representation (in Japanese). Computer Software, Japan Society for Software Science and Technology, Vol. 8, No.6, Nov. 1991. 163 [Ikeda et al. 92] T. Ikeda, A. Kotani, K. Hagiwara, Y. Kubo. Argument Text Generation System (Dulcinea). In Proc. of FGCS '92, ICOT, Jun. 1992. [Masuoka 89J T. Masuoka, Y. Takubo. Basic Japanese Grammar (in Japanese). 
Kuroshio Publishing Co .. Tokyo. 1989. [Katoh and Fukuchi 89] Y. Katoh and T. Fukuchi. Tense, . Aspect and Mood (in Japanese). Japanese Example Sentences and Problems for Foreigners 15. Aratake Publishing Co., Tokyo. 1989. [Matsumoto et al. 83aJ Y. Matsumoto. M. Seino. H. Tanaka. Bep Translator (in Japanese). Bulletin of the Electrotechnical Laboratory. Vol. 47. No.8. 1983. [Kempen and Hoenkamp 87] G. Kempen and E. Hoenkamp. An Incremental Procedural Grammar for' Sentence Formulation, Cognitive Science, Vol. 11. 1987. [Matsumoto et al. 83bJ Yuji Matsumoto, H. Tanaka. H. Hirakawa. H. Miyoshi and H. Yasukawa. BUP: A Bottom-up Parser Embedded in Prolog, New Generation Computing, Vol. 1, 1983. [Kinoshita 81] S. Kinoshita. Writing Techniques in Scientific Field (in Japanese). Chuo-Kouron Publishing Co., 1981. pp. 82-88. [Kinoshita et al. 89] S. Kinoshita, K. Ono, T. Ukita and M. Amano. Discourse Structure Extraction in Japanese Text Understanding. In Symposium on Discourse Understanding Model and its Application (in Japanese), Information Processing Society of Japan, 1989. pp. 125-136. [Kodama 87] T. Kodama. Research on Dependency Grammar (in Japanese). Kenkyu-sha, 1987. pp. 161194. [Kubo et al. 88] Y. Kubo, M. Yoshizumi. H. Sano. K. Akasaka and R. Sugimura. Development Environment of the Morphological Analyzer LAX. In Proc. of the 37th Conference of Information Processing Society of Japan' (in Japanese). 1988. pp. 1078-1079. [Kubo 89] Y. Kubo. Composition of Word Semantics in Morphological Analyzer LAX. In Proc. of the 39th Conference of Information Processing Society of Japan (in Japanese). 1989. pp. 598-599. [Kuno and Shibatani 89] S. Kuno, K. Shibatani. New Development in Japanese Linguistics (in Japanese). Kuroshio Publishing Co., Tokyo, 1989. [Littman and Allen 87] D. J. Littman and J. F. Allen. A Plan Recognition Model for Subdialogues in Con- versation, Cognitive Science 11, 1987. pp. 163-200. [Mann and Thompson 86] W. C. Mann and S. A. Thompson. Rhetorical Structure Theory: Description and Construction of Text Structure. In P1'OC. of the Third International Workshop on Text Generation, 1986. In Dordrecht (ed.), Natural LanguagE Generation: New Results in Artificial Intelligena, Psychology, and Linguistics. Martinus Nijhoff Pub- lishers, 1987. [Martin-Lof 84] P. Martin-Lof. Intuitionistic Type Theory - Studies in Proof Thoery, Lecture Notes, 1984. [Matsumoto 86J Y. Matsumoto. A Parallel Parsing System for Natural Language Analysis, Proc. of 3rd 111.ternatioanl Conference on Logic Programming, London, 1986. Lecture Notes in Computer Science 225. pp. 396-409, 1986. [Matsumoto and Sugimura 87J Y. Matsumoto and R. Sugimura. A Parsing System based on Logic Programming. In Proceedings of the International Joint Conference. of Artificial Intelligence. 1987. [Matsumoto 90J Y. Matsumoto and A. Okumura. Programming Searching Problems in Parallel Logic Programming Languages - An Extentioll of Layered Streams -. In Proc. of the I{L1 Programmil/.fj vVorkshop '.90 (in JapanesE). 1990. [Maruyama and Suzuki 91] T. Maruyama and H. Suzuki. Cooperative Sentence Generation in Japanese Dialog based on Simple Principles (in JapanEsE). In Proc. of thE 8th Conferena of Nihon Ninchi Kagaku Kai (in Japanese). 1991. [McKeown 85aJ K. R. McKeown. Text Generation: Using Discourse Strategies and Focus Constraints to Generate Natural Language Text. Cambridge Univer- sity Press. 1985. [McKeown 85bJ K. R. McKeown. Discourse Strategies for Generating Nat ural- Language Text, A rtificial Intelligence 27, 1985. pp. 1-41. 
[Meteer 90J M. V\". Meteer. The 'Generation Gap' - the Problem of Expressibility in Text Planning. Techn.ical Report. BBN Systems and Technologies Corporation. 1990. [Minami 74J F. Minami. The Structure of Contemporary Japanese Language (in Japanese). Taishu-kan Publishing Co .. 197-1. [Moens 89J Marc Moens. Jonathan Calder. Ewan Klein. Mike Reape. Henk Zeevat. Expressing Generalizations in Unification-based Grammar Formalisms. In Proc. of the FO'urth Conference of the European Chapter of the ACL, Manchester, 1989. 164 [Morioka 87] K. Morioka. Vocabulary Construction (in Japanese). Meiji Shoin Publishing Co., 1987. [Morita 89] Y. Morita. Dictionary of Basic Japanese (in Japanese). Kadokawa Publishing Co., 1989. [Morita and Matsuki 89] Y. Morita and Y. Matsuki. Sentence Types of Japanese (in Japanese). ALK Publishing Co., Tokyo. 1989. [Nagano 86] K. Nagano. Japanese Syntax - a Grammatical Study (in Japanese). Asakura Publishing Co., 1986. [Nakajima and Sugimura 89] A. Nakajima and R. Sugimura. Japanese Morphological Analysis with TRIE Dictionary and Graph Stack. In Proc. of the 39th Conference of Information Processing Society of Japan (in Japanese). 1989. pp. 589-590. [Nitta and Masuoka 89] Y. Nitta and T. Masuoka (eds.), Modality in Japanese (in Japanese). Kuroshio Publishing Co., Tokyo. 1989. [NLRI81] National Language Research Institute. Demonstratives in Japanese (in Japanese). Ministry of Finance. 1981. [NLRI82] National Language Research Institute. Particles and Auxiliary Verbs of Japanese (in Japanese). Shuei Publishing Co., Tokyo. 1982. [NLRI 85] National Language Research Institute. Aspect and Tense of Contemporary Japanese (in Japanese). Shuei Publishing Co., Tokyo. 1985. [NLRI89] National Language Research Institute. Rtsearch and Education of DiscOUl'se (in Japanese). Ministry of Finance. 1989. [Nobukuni 89] Y. Nobukuni. Division Algorithm of Long Sentence, In Proc. of the 39th Conference of In/ormation Processing Society of Japan (in Japanese). 1989. p. 593. [Okumura and Matsumoto 87a] A. Okumura and Y. Matsumoto. Parallel Programming with Layered Streams. In Proc. of the 1987 International Symposium on Logic Programming. San Francisco, September 1987. pp. 224-232 .. [Okumura and Matsumoto 87b] A. Okumura and Y. Matsumoto. Parallel Programming with Layered Streams. In Proc. of the Logic Programming Conference '87 (in Japanese), 1987. pp. 223-232. [Pereira 80] Fernando C. N. Pereira, David H. D. Warren. Definite (!lause Grammars for Language Analysis -- A Survey of the Formalism and a Comparison with Augmented Transition Networks, Artificial Intelligence. Vol. 13, No.3. 1980. pp. 231-278. [Saitoh et al. 91] Y. Saitoh, M. Shibata and J. Fukumoto. Analysis of Relationship of Adjoining Sentences for Context Structure Extraction. In Proc. of the 43rd Conference of Information. Processing Society of Japan (in Japanese). 1991. [Sakuma 88] M. Sakuma. Context and Paragraph. Japanese Linguistics (in Japanese). Vol. 7, No.2. 1988. pp. 27-40. [Sano et al. 88] H. Sano, K. Akas aka, Y. Kubo and R. Sugimura. Morphological Analysis based on Word Formation. In Proc. of the 36th Conference of Infol'":. mation Processing Society of Japan (in Japanese), 1988. [Sano 89] H. Sano. Hierarchical Analysis of Predicate using Contextual Information. In Symposium on Di8course Understanding Model and ds Application (in Japanese), Information Processing Society of Japan, 1989. [Sano et ai. 90] H. Sano, F. Fukumoto, Y. Tanaka. 
Explanatory Description based Grammar - SFTB (in Japanese), ICOT-Technical Memo, TM-0885, 1990. [Sano and Fukumoto 90] H. Sano, F. Fukumoto. Localized Unification Grammar and its Representation. In Proc. of the 41st Conference of Information Processing Society of Japan (in Japanese), 1990. [Sano 91] H. Sano. User's Guide to SFTB (in Japanese), ICOT, Sep. 1991. [Sano and Fukumoto 92] H. Sano, F. Fukumoto. On a Grammar Formalism, Knowledge Bases and Tools for Natural Language Processing in Logic Programming. In Pmc. of FGCS '92. ICOT, Jun. 1992. [Satoh 90] H. Satoh. Improvement of Parallel Syntax Analyzer P~'\X. In Proc. of KL 1 Programming Jtl:'"orkshop '90 (in Japanese), leOT, Tokyo, 1990. [Schmidt-Schauss 89] M. Schmidt-SchauB. Computational Aspects of an Order-Sorted Logic with Term Declarations, Lecture Notes in Artificial Intelligence, Springer-Verlag, 1989. [Searl 69] J. R. Searl. An Essay in the Philosophy of Language, Cambridge University Press, 1969. [Sells 85] P. Sells. Lectures on Contemporary Syntactic Theories, CSLI Lecture Notes, No.3, 1985. [Shibata et al. 90] M. Shibata, Y. Tanaka and J. Fukumoto. Anaphora Phenomena in Newspaper Editorials. In Proc. of the 40th Conference of Inj'07'mation Processing Society of Japan (in Japanese), 1990. 165 [Shinnou and Suzuki 91] H. Shinnou and H. Suzuki. tTtilization of Sound Information in Incremental Analysis. In Research Report of SIC-NL, Information Processing Society of Japan (in Japanese). 8.5-7. 1991. [Shiraishi et al. 90] T. Shiraishi. Y. Kubo and M. Yoshizumi. Format of Morpheme Dictionary and Dictionary Improvement. In Proc. of the 41st COIIference of Information Processing Society of Japan (in Japanese), 1990. pp. 19:3-194. [Smolka 88] G. Smolka. A Feature Logic with Subsorts. IBM Deutschland, Stuttgart, Germany, LILOC Report, No. 33, May 1988. [Sugimura et al. 88] R. Sugimura, K. Akasaka, Y. Kubo. Y. Matsumoto and H. Sano. LAX - Morphological Analyzer in Logic Programming. In Proc. of tht Logic Programming Conference '88 (in Japanese). 1988. pp. 213-222. [Sugimura and Fukumoto 89] R. Sugimura. F. Fukumoto. Dependency Analysis by Logic Grammar. In Symposium on Discourse Fndtrstanding lVIodel and its Application (in Japanest). Information Processing Society of Japan. 1989. [Suzuki and Tsuchiya 90] H. Suzuki and S. Tsuchiya. Incremental Interpretation of Japanese Ctterance. III Proc. of the 7th Conference of Siholl .'linchi A-agaku Kai (in Japanese). 1990. pp. 46-47. [Tanaka et al. 91] Y. Tanaka. M. Shibata and J. Fukumoto. Repetitive Occurrence Analysis of a v\lord in Context Structure Analysis System. In P1'OC. of the 43rd Conference of Information Processing Society of Japan (in Japanese). 1991. [Teramura et al. 87] H. Teramura, Y. Suzuki. N. Noda and M. Yazawa. Case Study in Japanese Crammar (in Japanese). Outousha Publishing Co.. Tokyo. 1987. [Tokunaga and Inui 91] T. Tokunaga and K. Inui. Survey of" Natural Language Sentence Generation in 1980's. In Journal of Japanese Society for Artificial Intelligence (in Japanese). Vol. 6. Nos. 3-.5. 1991. [Tomita 87] M. Tomita. An Efficient Augmented Context Free Parsing Algorithm. Computational Linguistics 13, 1-2, 31-46. 1987. [Tsujii 89] J. Tsujii. Context Processing. In Symposium on Natural Language Processing (in Japanese). Information Processing Society of Japan. 1988. pp. 7.587. [Ueda and Chikayama 90] K. Ueda and T. Chikayama. Design of the Kernel Language for the Parallel Inference Machine. The Computer Journa1. Vol. 33. No.6. Dec. 1990. pp. 494-.500. [Yamanashi 86] M. 
Yamanashi. Speech Act (in Japanese). Taishukan Publishing Co., 1986. [Yamanashi 89] M. Yamanashi. Discourse, Context and Inference. In Symposium on Discourse Understanding Model and its Application (in Japanese), Information Processing Society of Japan, 1989. pp. 1-12. [Yamasaki 92] S. Yamasaki. A Parallel Cooperative Natural Language Processing System Laputa. In Proc. of FGCS '92, ICOT, Jun. 1992. [Yasukawa and Yokota 90] H. Yasukawa and K. Yokota. The Overview of a Knowledge Representation Language QUIXOTE. ICOT (draft), Oct. 21, 1990. [Yoneda et al. 89] J. Yoneda, Y. Kubo, T. Shiraishi and M. Yoshizumi. Interpreter and Debugging Environment of LAX. In Proc. of the 39th Conference of Information Processing Society of Japan (in Japanese), 1989. pp. 596-597. [Yoshida and Hidaka 87] M. Yoshida and S. Hidaka. Studies on Documentation in Standard Japanese (in Japanese), 1987. [Yoshimura et al. 92] K. Yoshimura, T. Hidaka and M. Yoshida. On Longest Matching Method and Word Minimizing Method in Japanese Morphological Analysis. In Research Report of SIG-NL, Information Processing Society of Japan (in Japanese).

4.5 Design Supporting System based on Deep Reasoning

In design, there are many cases in which a designer does not directly design a new device, but rather changes or improves an old device. Sometimes a designer only changes the parameters of components in a device to satisfy the requirements. The designer, in such cases, knows the structure of the device, and needs to determine the new values of the components. This is common in electronic circuits. Desq (Design supporting system based on qualitative reasoning) determines valid ranges of the design decisions using qualitative reasoning. Desq uses an envisioning mechanism, which, by using qualitative reasoning, determines all possible behaviors of a system. However, the qualitative reasoning of Desq

(Figure 26: Experimental result — ranges of design parameters versus computation time in seconds.)

The system organization of Desq is shown in Figure 27. Desq consists of three subsystems:

Behavior reasoner: This subsystem is based on a qualitative reasoning system. Its model building reasoning part builds simultaneous inequalities from initial data using definitions of physical rules and objects. The simultaneous inequalities are a model of the target system. The envisioning part derives all possible behaviors.

Design parameter calculator: This subsystem calculates the ranges of design parameters undefined in the initial data.

Parallel constraint solver: This subsystem solves the simultaneous inequalities. It is written in KL1 and is executed on a parallel inference machine.

(Figure 27: System organization.)

Desq finds the valid ranges of design parameters as follows: (1) perform envisioning with the design parameters whose values are undefined in the initial data; (2) select preferable behaviors from the possible behaviors found by envisioning; (3) calculate the ranges of the design parameters that give the preferable behaviors. As an experiment, Desq successfully determined the valid range of the resistance Rb in the DTL circuit in Figure 28.

(Figure 28: DTL circuit, with a 5 V supply and Rb as the undefined parameter.)
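Step (3) amounts to finding the parameter values for which the constraint model admits only the preferred behaviors. As a rough, purely illustrative sketch of that idea (not Desq's KL1 constraint solver), the Python fragment below scans candidate values of a single hypothetical parameter Rb and keeps those satisfying a set of inequality constraints; the constraints themselves are invented for the example, not taken from the DTL circuit model.

```python
# Illustrative only: brute-force search for the valid range of one design parameter.
def valid_range(candidates, constraints):
    """Return (low, high) bounds of the candidate values satisfying all constraints."""
    ok = [x for x in candidates if all(c(x) for c in constraints)]
    return (min(ok), max(ok)) if ok else None

constraints = [
    lambda rb: 5.0 * rb / (rb + 10.0) > 2.0,        # e.g. "output stays above a threshold"
    lambda rb: rb < 50.0,                            # e.g. "current stays within limits"
]
candidates = [x / 10.0 for x in range(1, 1001)]      # 0.1 .. 100.0 in steps of 0.1
print(valid_range(candidates, constraints))           # approximately (6.7, 49.9)
```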
4.6 A Diagnostic and Control Expert System Based on a Plant Model

Currently, in the field of diagnosis and control of thermal power plants, the trend is that the more intelligent and flexible systems become, the more knowledge they need. As for knowledge, conventional diagnostic and control expert systems are based on heuristics stored a priori in knowledge bases, so they cannot deal with unforeseen events when they occur in a plant. Unforeseen events are abnormal situations which were not expected when the plant was designed. To overcome this limitation, we have focused on model-based reasoning and developed a diagnostic and control expert system based on a plant model.

The system (Figure 29) consists of two subsystems: the Shallow Inference Subsystem (SIS) and the Deep Inference Subsystem (DIS). The SIS is a conventional plant control system based on heuristics, namely shallow knowledge for plant control. It selects and executes plant operations according to the heuristics stored in the knowledge base. The Plant Monitor detects occurrences of unforeseen events, and then activates the DIS. The DIS utilizes various kinds of models to realize the thought processes of a skilled human operator and to generate the knowledge for plant control needed to deal with unforeseen events. It consists of the following modules: the Diagnosor, the Operation-Generator, the Precondition-Generator, and the Simulation-Verifier. The Diagnosor utilizes the Qualitative Causal Model of plant process parameters to diagnose unforeseen events. The Operation-Generator generates the operations that deal with these unforeseen events; it utilizes the Device Model and the Operation Principle Model. The Precondition-Generator generates the preconditions of each operation generated by the Operation-Generator and, as a result, generates rule-based knowledge for plant control. The Simulation-Verifier predicts the plant behavior that will be observed when the plant is operated according to the generated knowledge; it utilizes the Dynamics Model, verifies the knowledge using the predicted plant behavior, and gives feedback to the Operation-Generator if necessary. A schematic sketch of this generate-and-verify loop is given at the end of this subsection.

(Figure 29: System overview.)

The knowledge generated and verified by the DIS is transmitted to the SIS. The SIS then executes the plant operations accordingly and, as a result, the unforeseen events should be taken care of.

We have implemented the system on Multi-PSI. To realize a rich experimental environment, we have also implemented a plant simulator on a mini-computer. Both computers are linked by a data transmission line. We have incorporated both a device model and a dynamics model for each device of a thermal power plant (78 in total). We summarize the experimental results as follows.
• The DIS could generate plant control knowledge to deal with unforeseen events.
• The SIS executed plant operations according to the generated knowledge and could deal with unforeseen events.
• We have demonstrated a fivefold improvement in reasoning time by using Multi-PSI with 16 processor elements.
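The division of labour between the SIS and the DIS described above can be pictured with a small, schematic sketch. The Python fragment below is only an illustration of the control flow (rule lookup first, with model-based generation and simulation-based verification as the fallback); every function name, rule, and plant event in it is invented, and the real system is written in KL1 on Multi-PSI.

```python
# Schematic sketch of the SIS/DIS interplay (illustrative, not the ICOT implementation).
def sis_dis_control(event, knowledge_base, diagnose, generate_operations, simulate, acceptable):
    for rule in knowledge_base:                      # SIS: shallow, rule-based control
        if rule["event"] == event:
            return rule["operation"]
    cause = diagnose(event)                          # DIS: Diagnosor (qualitative causal model)
    for op in generate_operations(cause):            # DIS: Operation-Generator (device models)
        predicted = simulate(op)                     # DIS: Simulation-Verifier (dynamics model)
        if acceptable(predicted):
            knowledge_base.append({"event": event, "operation": op})   # new shallow knowledge
            return op                                # the SIS can now execute this operation
    return None                                      # no safe operation was found

# Toy usage with stand-in models (all names are hypothetical):
kb = []
op = sis_dis_control(
    "drum_level_low", kb,
    diagnose=lambda e: "feedwater_valve_stuck",
    generate_operations=lambda cause: ["open_bypass_valve", "reduce_load"],
    simulate=lambda op: {"drum_level": "recovers" if op == "open_bypass_valve" else "falls"},
    acceptable=lambda pred: pred["drum_level"] == "recovers",
)
print(op, kb)   # open_bypass_valve, and the newly verified rule is stored for the SIS
```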
4.7 Adaptive Model-Based Diagnostic System

Though traditional rule-based diagnostic approaches that use symptom-failure association rules have been incorporated into many current diagnostic systems, they lack robustness, because they cannot deal with unexpected cases not covered by the rules in their knowledge bases. On the other hand, model-based diagnostic systems that use the behavioral specification of a device are more robust than rule-based expert systems. However, in general, many tests are required to reach a conclusive decision, because they lack the heuristic knowledge which human experts usually utilize. In order to solve this problem, a model-based diagnostic system has been developed which is adaptable because of its ability to learn from experience [Koseki et al. 1990].

This system consists of several modules, as shown in Figure 30. The knowledge base consists of design knowledge and experiential knowledge. The design knowledge represents a correct model of the target device. It consists of a structural description, which expresses component interconnections, and a behavior description, which expresses the behavior of each component. The experiential knowledge is expressed as a failure probability for each component. The diagnosis module utilizes those two kinds of knowledge.

(Figure 30: Structure of the system — symptom, diagnosis module, test pattern selector/generator, learning module, test results, and suspects.)

Figure 31 shows the diagnosis flow of the system. The system keeps a set of suspected components as a suspect-list. It uses an eliminate-not-suspected strategy to reduce the number of suspects in the suspect-list, by repeating the test-and-eliminate cycle. It starts by getting an initial symptom. A symptom is represented as a set of target device input signals and an observed incorrect output signal. It calculates an initial suspect-list from the given initial symptoms. It performs model-based reasoning to obtain a suspect-list, using a correct design model and an expected correct output signal. To obtain an expected correct output signal for the given inputs, the system carries out simulation using the correct design model.

(Figure 31: Diagnosis flow.)

After obtaining the initial suspect-list, the system repeats a test-and-eliminate cycle while the number of suspects is greater than one and an effective test exists. A set of tests is generated by the test pattern generator. Among the generated tests, the most cost-effective is selected as the next test to be performed. The effectiveness is evaluated by using a minimum entropy technique that utilizes the fault probability distribution. The selected test is suggested and fed into the target device. By feeding the test into the target device, another set of observations is obtained as a test result and is used to eliminate the non-failure components.

Learning Mechanism

The performance of the test selection mechanism relies on the preciseness of the presumed probability distribution of the components. In order to estimate an appropriate probability distribution from a small amount of observation, the system acquires a presumption tree using the minimum description length (MDL) criterion. The description length of a presumption tree is defined as the sum of the code length and the log-likelihood of the model. Using the constructed presumption tree, the probability distribution of future events can be presumed appropriately.

The algorithm is implemented in KL1 on the parallel inference machine Multi-PSI. The experimental results show that the 16-PE implementation is about 11 times as fast as the sequential one. The performance of the adaptive diagnostic system (in terms of the required number of tests) was also examined. The target device was a packet exchange system and its model comprised about 70 components. The experimental results show that the number of required tests can be reduced by about 40% on average by using the learned knowledge.
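The minimum entropy test selection mentioned above can be illustrated concretely. The sketch below is a toy Python illustration, not the KL1 implementation: given a presumed fault probability for each current suspect and, for each candidate test, a partition of the suspects by predicted test outcome, it picks the test whose expected posterior entropy is smallest. The suspects, probabilities, and tests are invented for the example.

```python
import math

# Toy minimum-entropy test selection (illustrative only).
def entropy(probs):
    total = sum(probs)
    return -sum(p / total * math.log2(p / total) for p in probs if p > 0)

def expected_entropy(partition, fault_prob):
    """`partition` maps each test outcome to the suspects that would remain."""
    exp = 0.0
    for suspects in partition.values():
        mass = sum(fault_prob[s] for s in suspects)
        if mass > 0:
            exp += mass * entropy([fault_prob[s] for s in suspects])
    return exp

def select_test(tests, fault_prob):
    return min(tests, key=lambda name: expected_entropy(tests[name], fault_prob))

fault_prob = {"c1": 0.5, "c2": 0.3, "c3": 0.2}          # presumed failure probabilities
tests = {
    "t1": {"pass": ["c1"], "fail": ["c2", "c3"]},        # t1 separates c1 from the rest
    "t2": {"pass": ["c1", "c2"], "fail": ["c3"]},        # t2 only isolates c3
}
print(select_test(tests, fault_prob))                    # 't1': lower expected entropy
```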
4.8 Motif Extraction System

One of the important issues in genetic information processing is to find common patterns in sequences of the same category which give functional or structural attributes to proteins. The patterns are called motifs, in biological terms. On Multi-PSI, we have developed the motif extraction system shown in Figure 32. In it, a motif is represented by stochastic decision predicates, and the optimal motif is searched for by a genetic algorithm with the minimum description length (MDL) principle.

(Figure 32: Motif extraction system — a protein DB and motifs; the motif is represented by a binary string and its fitness value is calculated using the MDL principle.)

Stochastic Decision Predicate

It is difficult to express a motif as an exact symbolic pattern, so we employ stochastic decision predicates. The general form

motif(S,cytochrome_c) with p :- contain("CXXCH",S).

means that if a given sequence S contains "CXXCH", it is cytochrome c with probability p. For example,

motif(S,cytochrome_c) with 129/225 :- contain("CXXCH",S).
motif(S,others) with 8081/8084.

means that if S contains a subsequence matching "CXXCH", then S is cytochrome c with probability 129/225; otherwise S is another protein with probability 8081/8084.

Minimum Description Length Principle

We employ the minimum description length (MDL) principle because it is effective in estimating a good probabilistic model for sample data, including uncertainty, while avoiding overfitting. The MDL principle suggests that the best stochastic decision predicate minimizes the following value:

    predicate description length + correctness description length

The value of the predicate description length indicates the predicate complexity (smaller values are better). The value of the correctness description length indicates the likelihood of the predicate (smaller values are better). Therefore, the MDL principle balances the trade-off between the complexity of the motif representation and the fit of the predicate to the sample data.

Genetic Algorithm

The genetic algorithm is a probabilistic search algorithm which simulates the process of evolution. We adopt it to search for the optimal stochastic motif, because there is a combinatorially explosive number of stochastic motifs and it would take an enormous amount of computation time to find the optimal stochastic motif by exhaustive search. The following procedure is performed in order to search for the optimal point of a given function f using the simple genetic algorithm.
1. Give a binary representation that ranges over the domain of the function f.
2. Create an initial population which consists of a set of binary strings.
3. Update the population repeatedly using selection, crossover, and mutation operators.
4. Pick the best binary string in the population after a certain number of generations.

We apply the simple genetic algorithm to search for the optimal motif representation. Each motif is represented by a 120-bit binary string, with each bit corresponding to one pattern (e.g. "CXXCH"). The 120-bit binary string represents the predicate whose condition part is the conjunction of the patterns corresponding to the bits that are set.
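As a toy illustration of this search (this is not ICOT's system, and the scoring function below is a made-up stand-in for the real description length), the following Python sketch evolves a small population of bit strings, where each bit switches one candidate pattern on or off, and returns the lowest-scoring string.

```python
import random

# Toy GA over bit strings (illustrative only).  Each bit enables one candidate pattern;
# `score` is a stand-in for "predicate DL + correctness DL", not the real MDL computation.
PATTERNS = ["CXXCH", "GXKM", "PNLXG", "KYIPG", "TXAXG", "EDXLF"]   # invented candidates
TARGET   = {"CXXCH", "PNLXG"}                          # pretend these are the informative ones

def score(bits):
    chosen = {p for p, b in zip(PATTERNS, bits) if b}
    return len(chosen) + 5 * len(chosen ^ TARGET)      # penalize size plus missing/extra patterns

def evolve(pop_size=20, generations=50, mutation=0.05):
    random.seed(0)
    pop = [[random.randint(0, 1) for _ in PATTERNS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[: pop_size // 2]                 # selection: keep the better half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(PATTERNS))   # one-point crossover
            child = [bit ^ (random.random() < mutation)   # bit-flip mutation
                     for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return min(pop, key=score)

best = evolve()
print([p for p, b in zip(PATTERNS, best) if b])   # expected to settle on CXXCH and PNLXG
```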
Table 2 shows the result of applying the motif extraction system to cytochrome c in the Protein Sequence Database of the National Biomedical Research Foundation; it shows the extracted motifs and their description lengths. CL is the description length of the motif complexity, PL is the description length of the probabilities, and DL is the description length of the motif correctness.

Table 2: Cytochrome c
  Motif    Compared   Matched   Correct
  CXXCH    8309       225       129
  others   8084       8084      8081
  Description length: 286.894 (CL = 16.288, PL = 10.397, DL = 260.209)

5 Performance Analysis of Parallel Programs

5.1 Why Performance Analysis?

Along with the development of various application programs, we have been conducting a study of the performance of parallel programs in a more general framework. The main concern is the performance of parallel programs that solve large-scale knowledge information processing problems on large-scale parallel inference machines. Parallel speedup comes from decomposing the whole problem into a number of subproblems and solving them in parallel. Ideally, a parallelized program would run p times faster on p processors than on one processor. There are, however, various overhead factors, such as load imbalance, communication overhead, and (possible) increases in the amount of computation. Knowledge processing programs are "non-uniform" in that (1) the number and size of subproblems are rarely predictable, (2) there can be random communication patterns between the subproblems, and (3) the amount of total computation can depend on the execution order of subproblems. This makes load balancing, communication control, and scheduling important and nontrivial issues in designing parallel knowledge processing programs. The overhead factors could make the effective performance obtained by actually running those programs far worse than the "peak performance" of the machine. The performance gap may not be just a constant factor loss (e.g., a 30% loss), but could widen as the number of processors increases. In fact, in poorly designed parallel programs, the effective-to-peak performance ratio can approach zero as the number of processors increases without limit. If we could understand the behavior of the various overhead factors, we would be able to evaluate parallel programs, identify the most serious bottlenecks, and possibly remove them. The ultimate goal is to push the horizon of the applicability of large-scale parallel inference machines into a wide variety of areas and problem instances.

5.2 Early Experiences

As the first programs to run on the experimental parallel inference machine Multi-PSI, four programs were developed to solve relatively simple problems. These were demonstrated at the FGCS'88 conference [Ichiyoshi 1989]. They are:

Packing Piece Puzzle (Pentomino): A rectangular box and a collection of pieces with various shapes are given. The goal is to find all possible ways to pack the pieces into the box. The puzzle is often known as the Pentomino puzzle, when the pieces are all made up of 5 squares. The program does a top-down OR-parallel all-solution search.

Shortest Path Problem: Given a graph, where each edge has an associated non-negative cost, and a start node in the graph, the problem is to find the lowest cost path from the start node to every node in the graph (the single-source shortest path problem). The program performs a distributed graph algorithm. We used square grid graphs with randomly generated edge costs.

Natural Language Parser: The problem is to construct all possible parse trees for an English sentence. The program is a PAX parser [Matsumoto 1987], which is essentially a bottom-up chart parsing algorithm.
Natural Language Parser

The problem is to construct all possible parse trees for an English sentence. The program is a PAX parser [Matsumoto 1987], which is essentially a bottom-up chart parsing algorithm. Processes represent chart entries, and are connected by message streams that reflect the data flow in the chart.

Tsumego Solver

A Tsumego problem is to the game of go what the checkmate problem is to the game of chess. The black stones surrounding the white stones try to capture the latter by suffocating them, while white tries to survive. The problem is to find the result, assuming that black and white both play their best. The result is (1) white is captured, (2) white survives, or (3) there is a tie. The program does a parallel alpha-beta search.

In the Pentomino program, the parallelism comes from concurrently searching different parts of the search tree. Since disjoint subtrees can be searched totally independently, there is no communication between search subtasks and no speculative computation. Thus, load balancing is the key factor in parallel performance. In the first version, we implemented a dynamic load balancing mechanism and attained over 40-fold speedup using 64 processors. The program starts in a processor called the master, which expands the tree and generates search subtasks. Each of the worker processors requests a subtask from the master processor in a demand-driven fashion (i.e., it requests a subtask when it becomes idle). Later improvement of data structures and code tuning led to better sequential performance but lower parallel speedup: it was found that the subtask generation throughput of the master processor could not keep up with the subtask solution throughput of the worker processors. A multi-level subtask allocation scheme was introduced, resulting in 50-fold speedup on 64 processors [Furuichi et al. 1990]. The load balancing mechanism was separated from the program and released to other users as a utility. Several programs have used it. One of them is a parallel iterative deepening A* program for solving the Fifteen puzzle. Although the search tree is very unbalanced because of pruning with a heuristic function, it attained over 100-fold speedup on a 128-processor PIM/m [Wada et al. 1992].

The shortest path program has a lot of inter-process communication, but the communication is between neighboring vertices. A mapping that respects the locality of the original grid graph can keep the amount of inter-processor communication low. A simple mapping, in which the square graph was divided into as many subgraphs as there are processors, maximized locality. But the parallel speedup was poor, because the computation spread like a wavefront, making only some of the processors busy at any time during execution. By dividing the graph into smaller pieces and mapping a number of pieces from different parts of the graph onto each processor, processor utilization was increased [Wada and Ichiyoshi 1990].

The natural language parser is a communication-intensive program with a non-local communication pattern. The first static mapping of processes showed very little speedup. The program was rewritten so that processes migrate to where the necessary data reside, to reduce inter-processor communication. This almost halved the execution time [Susaki et al. 1989].

The Tsumego program did parallel alpha-beta searches down to the leaf nodes of the game tree. Sequential alpha-beta pruning can halve the effective branching factor of the game tree in the best cases. Simply searching different alternative moves in parallel loses much of this pruning effect; in other words, the parallel version might do a lot of redundant speculative computation. In the Tsumego program, the search tasks of candidate moves are therefore given execution priorities according to the estimated values of the moves, so as to reduce the amount of speculative computation [Oki 1989].
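Returning to the Pentomino solver's load distribution, the following Python sketch illustrates the demand-driven (on-demand) master/worker scheme in a single-level form, using threads and queues. The subtask content (sleeping for a random time) and the single master are placeholders for illustration; the actual program used a multi-level allocation scheme written in KL1.

import queue
import random
import threading
import time

def master(tasks, request_q, reply_qs):
    """Hand out one subtask per worker request; send None when the pool is exhausted."""
    outstanding = len(reply_qs)
    while outstanding:
        worker_id = request_q.get()              # a worker became idle and asks for work
        if tasks:
            reply_qs[worker_id].put(tasks.pop())
        else:
            reply_qs[worker_id].put(None)        # no work left: tell the worker to stop
            outstanding -= 1

def worker(worker_id, request_q, reply_q, results):
    while True:
        request_q.put(worker_id)                 # demand-driven: ask only when idle
        task = reply_q.get()
        if task is None:
            return
        time.sleep(random.uniform(0.0, 0.01))    # stand-in for solving a search subtask
        results.append((worker_id, task))

if __name__ == "__main__":
    tasks = list(range(40))                      # 40 dummy subtasks
    n_workers = 4
    request_q = queue.Queue()
    reply_qs = [queue.Queue() for _ in range(n_workers)]
    results = []
    threads = [threading.Thread(target=worker, args=(i, request_q, reply_qs[i], results))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    master(tasks, request_q, reply_qs)
    for t in threads:
        t.join()
    print(len(results), "subtasks completed")

The point of the structure is the one made in the text: workers never sit on a private backlog, so the load balances itself, but a single master can become the bottleneck once workers consume subtasks faster than the master can generate them.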
Through the development of these programs, a number of techniques were developed for balancing the load, localizing communication, and reducing the amount of speculative computation.

5.3 Scalability Analysis

A deeper understanding of the various overheads in parallel execution requires the construction of models and the analysis of those models. The results form a robust core of insight into parallel performance.

The focus of the research was the scalability of parallel programs. Good parallel programs for utilizing large-scale parallel inference machines have performance that scales, i.e., the performance increases in accordance with the increase in the number of processors. For example, two-level load balancing is more scalable than single-level load balancing, because it can use more processors. But deciding how scalable a program is requires an analytical method. As a measure of scalability, we chose the isoefficiency function proposed by Kumar and Rao [Kumar et al. 1988]. For a fixed problem instance, the efficiency of a parallel algorithm (the speedup divided by the number of processors) generally decreases as the number of processors increases. The efficiency can often be regained by increasing the problem size. The function f(p) is defined as an isoefficiency function if the problem size (identified with the sequential runtime) has to increase as f(p) to maintain a given constant efficiency E as the number of processors p increases. An isoefficiency function grows at least linearly in p (lest the subtask size allocated to each processor approach zero). Due to various overheads, isoefficiency functions generally grow strictly faster than linearly in p. A slow growth rate, such as p log p, in the isoefficiency function means that a desired efficiency can be obtained by running a problem of relatively small size. On the other hand, a very rapid growth rate such as 2^p indicates that a large-scale parallel computer can only be used very poorly on problems of realistic size.

On-demand load balancing was chosen first for analysis. Based on a probabilistic model and explicitly stated assumptions on the nature of the problem, the isoefficiency functions of single-level load balancing and multi-level load balancing were obtained. In the deterministic case (all subtasks have the same running time), the isoefficiency function for single-level load balancing is p^2, and that for two-level load balancing is p^(3/2). The dependence of the isoefficiency functions on the variation in subtask sizes was also investigated, and it was found that if the subtask size is distributed according to an exponential distribution, a log p (respectively, (log p)^(3/2)) factor is added to the isoefficiency function of single-level (respectively, two-level) load balancing. The details are found in [Kimura et al. 1991].
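As a small numeric illustration of what these isoefficiency functions mean, the Python fragment below compares how much the problem size must grow when the number of processors doubles, under the p^2 (single-level) and p^(3/2) (two-level) results quoted above. The absolute constants are immaterial; only the growth ratios matter.

def growth_when_doubling(iso, p):
    """Factor by which the problem size must grow when p processors become 2p,
    for a given isoefficiency function iso(p)."""
    return iso(2 * p) / iso(p)

single_level = lambda p: p ** 2        # deterministic single-level load balancing
two_level = lambda p: p ** 1.5         # deterministic two-level load balancing

for p in (64, 128, 256):
    print(p, "->", 2 * p,
          "single-level x%.2f" % growth_when_doubling(single_level, p),
          "two-level x%.2f" % growth_when_doubling(two_level, p))

# Single-level requires 4x more work per doubling of processors, two-level about 2.83x,
# which is why the two-level scheme maintains a given efficiency on smaller problems.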
More recently, we studied the load balance of distributed hash tables. A distributed hash table is a parallelization of a sequential hash table: the table is divided into subtables of equal size, one of which is allocated to each processor. A number of search operations on the table can then be processed concurrently, resulting in increased throughput. The overhead comes mainly from load imbalance and communication. By allocating an increasing number of buckets (i.e., a larger subtable) to each processor, the load balance is expected to improve. We set out to determine the necessary rate of increase of the subtable size to maintain a good load balance. A very simple static load distribution model was defined and analyzed, and the isoefficiency function (with regard to load imbalance) was obtained [Ichiyoshi et al. 1992]. It was found that a relatively moderate growth in the subtable size q (q = ω((log p)^2)) is sufficient for the average load to approach perfect balance. This means that the distributed hash table is a data structure that can exploit the computational power of highly parallel computers on problems of reasonable size.

5.4 Remaining Tasks

We have experimented with a few techniques for making better use of the computational power of large-scale parallel computers. We have also conducted a scalability analysis for particular instances of both dynamic and static load balancing. The analysis of various parallelization overheads and the determination of their asymptotic characteristics give insight into the nature of large-scale parallel processing, and guide us in the design of programs which run on large-scale parallel computers. However, what we have done is a modest exploration of the new world of large-scale parallel computation. The analysis technique must be expanded to include communication overheads and speculative computation. Now that PIM machines with hundreds of processors have become operational, the results of asymptotic analysis can be compared with experimental data and their applicability can be evaluated.

6 Summary of Parallel Application Programs

We have presented overviews of the parallel application programs and the results of performance analysis. We now summarize knowledge processing and parallel processing using the PIMs and KL1.

(1) Knowledge Processing by PIM/KL1

We have developed parallel intelligent systems such as CAD systems, diagnosis systems, control systems, a game system, and so on. The knowledge technologies used in them are the newest, and these systems are also valuable from the viewpoint of AI applications. Usually, as these technologies need much computation time, it is impossible to solve large problems using sequential machines. Therefore, these systems are appropriate for evaluating the effectiveness of parallel inference.

We had already gained experience in knowledge processing with sequential logic programming languages, and so had become accustomed to developing programs in KL1 in a short time. Generally, to develop parallel programs, programmers have to consider the synchronization of the individual modules. This is troublesome and often causes bugs. However, as KL1 has automatic mechanisms to synchronize inferences, we were able to develop the parallel programs in a relatively short period of time, as the following table shows.

  Program                               Size     man*month
  Logic Simulator                       8 k          3
  Placement (KL1)                       4 k          4
  Placement (ESP†)                      8 k          4
  Routing                               4.9 k        2
  Alignment by 3-DP                     7.5 k        4
  Alignment by SA                       3.7 k        2
  Folding Simulation                   13.7 k        5
  Legal Reasoning (Rule-based engine)   2.5 k        3
  Legal Reasoning (Case-based engine)   2 k          6
  Go Playing Game                      11 k         10

  †: An extended Prolog for system programming.

In those cases where a program did not show high performance, we had to reconsider the process model with regard to the granularity of parallelism. Therefore, we have to design the problem solution model in more detail than when developing on sequential machines.
(2) Two Types of Process Programming

The programming style of KL1 is different from that of sequential logic programming languages. A typical programming style in KL1 is process programming. A process is an object which has internal state and procedures to manipulate that state. Each process is connected to other processes by streams, and communication takes place through these streams. A process structure can be realized easily in KL1, and many problem-solving techniques can be modeled by process structures. We observed that two types of KL1 process structure are used in the application programs.

1. Static process structure

The first type of process structure is static. A process structure for problem solving is constructed, and then information is exchanged between the processes. The process structure does not change until the given problem is solved. Most distributed algorithms have a static process structure, and the majority of the application programs belong to this type. For example, in the Logic Simulator, an electrical circuit is divided into subcircuits and each subcircuit is represented as a process (Figure 3). In the Protein Sequence Analysis System, two protein sequences are represented as a two-dimensional network of KL1 processes (Figure 9). In the Legal Reasoning System, the left-hand side of a case rule is represented as a Rete-like network of KL1 processes (Figure 17). In co-LODEX, design agents are statically mapped onto processors (Figure 22).

2. Dynamic process structure

The second type of process structure is dynamic: the process structure changes during the computation. Typically, the top-level process forks into subprocesses, each subprocess forks into sub-subprocesses, and so on (Figure 33). Usually, this process structure corresponds to a search tree. Application programs such as Pentomino, the Fifteen Puzzle, and Tsumego belong to this type.

[Figure 33: A search tree realized by a dynamic process structure]

(3) New Paradigms for Parallel Algorithms

We developed new programming paradigms while designing the parallel programs. Some of the parallel algorithms are not just parallelizations of sequential algorithms, but have desirable properties not present in the base algorithm. In the combinatorial optimization programs, a parallel simulated annealing (SA) algorithm (used in the LSI cell placement program and MASCOT), a parallel rule-based annealing (RA) algorithm (used in the High Level Synthesis System), and a parallel genetic algorithm (GA) (used in the Motif Extraction System) were designed.

The parallel SA algorithm is not just a parallel version of a sequential SA algorithm; a sketch is given below. By statically assigning temperatures to processors and allowing solutions to move from processor to processor, the solutions compete for the lower-temperature processors: a better solution has a high probability of moving to a lower temperature. Thus, the programmer is freed from case-by-case tuning of the temperature schedule. The parallel SA algorithm is also time-homogeneous, an important consequence of which is that it does not suffer from the problem in sequential SA that the solution can be irreversibly trapped in a local minimum at a low temperature. In the parallel RA algorithm, the distribution of the solution costs is monitored and used to judge whether or not the equilibrium state has been reached.
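The following Python sketch illustrates the temperature-parallel SA idea in a purely sequential simulation: each "processor" holds a fixed temperature, and after every sweep adjacent processors probabilistically exchange their solutions, so that better solutions tend to drift toward lower temperatures. The objective function, move generator, and temperature ladder are simplified placeholders and are not taken from the actual LSI placement or folding programs.

import math
import random

random.seed(1)

def energy(x):
    """Toy objective with many local minima (stand-in for a placement cost)."""
    return x * x + 10.0 * math.sin(3.0 * x)

def perturb(x):
    return x + random.uniform(-0.5, 0.5)

TEMPS = [4.0, 2.0, 1.0, 0.5, 0.25]           # one fixed temperature per "processor"

def temperature_parallel_sa(sweeps=2000):
    solutions = [random.uniform(-10, 10) for _ in TEMPS]
    for _ in range(sweeps):
        # Each processor performs one Metropolis step at its own fixed temperature.
        for i, t in enumerate(TEMPS):
            cand = perturb(solutions[i])
            d = energy(cand) - energy(solutions[i])
            if d < 0 or random.random() < math.exp(-d / t):
                solutions[i] = cand
        # Neighboring processors exchange solutions so that the better solution
        # tends to move to the lower temperature (probabilistic exchange).
        for i in range(len(TEMPS) - 1):
            d = energy(solutions[i]) - energy(solutions[i + 1])
            beta_diff = 1.0 / TEMPS[i + 1] - 1.0 / TEMPS[i]
            if d < 0 or random.random() < math.exp(-beta_diff * d):
                solutions[i], solutions[i + 1] = solutions[i + 1], solutions[i]
    return min(solutions, key=energy)

if __name__ == "__main__":
    best = temperature_parallel_sa()
    print("best x = %.3f, cost = %.3f" % (best, energy(best)))

Because every temperature runs all the time, there is no global cooling schedule to tune, which is the property the text emphasizes.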
In the go-playing program, the flying corps idea, suited to real-time problem solving, was introduced. The task of the flying corps is to investigate the outcome of moves that could result in a potentially large gain (such as capturing a large opponent group or invading a large opponent territory) or a large loss. The investigation of such a possibility may take much longer than is allowed for real-time move making, and so cannot be done by the main corps.

(4) Performance by Parallel Inference

Some application programs exhibited high performance under parallel execution, such as up to 100-fold speedup using 128 processors. Examples include the logic simulator (LS) (Figure 4), the legal reasoning system (LR) (Figure 18), and MGTP, a theorem prover developed by the fifth research laboratory of ICOT [Fujita et al. 1991] [Hasegawa et al. 1992]. Understandably, these are the cases where there is a lot of parallelism and the parallelization overheads are minimized. The logic simulator, the legal reasoning system, and MGTP have high parallelism coming from the data size (a large number of gates in the logic simulator and a large number of case rules in the legal reasoning system) or from the size of the problem space (MGTP). A good load balance was realized by static, even data allocation (LS, LR) or by dynamic load allocation (MGTP). Either communication locality was preserved by process clustering (LS), or communication between independent subtasks was small (rule set division in LR, OR-parallel search in MGTP).

(5) Load Distribution Paradigm

In all our application programs, programs with a static process structure used static load distribution, while programs with a dynamic process structure used semi-static or dynamic load distribution.

In a program with a static process structure, a good load balance can usually be obtained by assigning roughly the same number of processes to each processor. To reduce the communication overhead, it is desirable to respect the locality in the logical process structure. Thus, we first divide the processes into clusters of processes that are close to each other, and then map the clusters onto the processors. This direct cluster-to-processor mapping may not attain a good load balance, since at a given point in the computation only part of the process structure has a high level of computational activity. In such a case, it is better to divide the process structure into smaller clusters and map a number of clusters that are far apart from each other onto one processor. This multiple mapping scheme is adopted in the shortest path program and the logic simulator. In the three-dimensional DP matching program, a succession of alignment problems (sets of three protein sequences to align) is fed into the machine and the alignments are performed in a pipelined fashion, keeping most processors busy all the time.

In a program with a dynamic process structure, newly spawned processes can be allocated to lightly loaded processors to balance the load. To keep the communication overhead low, only a small number of processes are selected as candidates for load distribution. For example, in a tree search program, not all search subtasks but only those at certain depths are chosen for inter-processor load allocation. The Pentomino puzzle solver, the Fifteen puzzle solver, and the Tsumego solver use this on-demand dynamic load balancing scheme.
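The cluster-to-processor "multiple mapping" idea can be illustrated with a small Python sketch: a one-dimensional chain of processes is cut into more clusters than there are processors, and clusters taken from distant parts of the chain are dealt out round-robin, so that a localized wave of activity still keeps every processor busy. The chain length, cluster count, and round-robin assignment are arbitrary choices for this illustration, not the actual mapping used in the programs above.

def clusters(n_processes, n_clusters):
    """Cut a chain of process indices 0..n_processes-1 into contiguous clusters."""
    size = n_processes // n_clusters
    return [list(range(i * size, (i + 1) * size)) for i in range(n_clusters)]

def direct_mapping(n_processes, n_processors):
    """One big contiguous cluster per processor (maximizes locality)."""
    return clusters(n_processes, n_processors)

def multiple_mapping(n_processes, n_processors, clusters_per_processor=16):
    """Many small clusters, dealt out round-robin, so each processor holds
    pieces from distant parts of the process structure."""
    cs = clusters(n_processes, n_processors * clusters_per_processor)
    assignment = [[] for _ in range(n_processors)]
    for i, c in enumerate(cs):
        assignment[i % n_processors].extend(c)
    return assignment

def busy_processors(assignment, active):
    """How many processors have at least one active process?"""
    return sum(1 for procs in assignment if any(p in active for p in procs))

if __name__ == "__main__":
    n_processes, n_processors = 1024, 8
    active = set(range(100, 200))            # a localized "wavefront" of activity
    print("direct  :", busy_processors(direct_mapping(n_processes, n_processors), active))
    print("multiple:", busy_processors(multiple_mapping(n_processes, n_processors), active))

With the direct mapping only two of the eight processors see any of the active processes, while the multiple mapping keeps all eight busy, at the price of some extra boundary communication.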
(6) Granularity of Parallelism

To obtain high performance by parallel processing, we have to consider the granularity of parallelism. If the size of each subtask is small, it is hard to obtain high performance, because parallelization overheads such as process switching and communication become serious. For example, in the first version of the Logic Simulator, the gates of the electrical circuit were represented as processes communicating with each other via streams. The performance of this version was not high because the task of each process was too small. The second version represented subcircuits as processes (Figure 3), and succeeded in improving the performance.

(7) Programming Environment

The first programs to run on the Multi-PSI were developed before the KL1 implementation on the machine had been built. The user wrote and debugged a program on the sequential PDSS (PIMOS development support system) on standard hardware. The program was then ported to the Multi-PSI, with the addition of load distribution pragmas. The only debugging facilities on the Multi-PSI were those developed for debugging the implementation itself, and it was not easy to debug application programs with them. Gradually, the PIMOS operating system [Chikayama 1992] added debugging facilities such as an interactive tracing/spying facility, a static code checker that warns about single-occurrence variables (which are often simply misspelled), and a deadlock reporting facility. The deadlock reporting facility identifies perpetually suspended goals and, instead of printing out all of them (possibly very many), displays only the goal that is most upstream in the data flow. It has been extremely helpful in locating the cause of a perpetual suspension (usually, the culprit is a producer process failing to instantiate the variable on which the reported goal is suspended).

A performance monitoring and gathering facility was added later (and is still being enhanced) [Aikawa 1992]. A post-mortem display of processor utilization along the time axis often clearly reveals that one processor is a bottleneck at a particular phase of the computation. The breakdown of processor time (into computing, communicating, and idling) can give a hint on how the process structure might be changed to remove the bottleneck. Sometimes knowledge of the KL1 implementation is necessary to interpret the information provided by the facility in order to tune (sequential as well as parallel) performance. A similar situation exists in performance tuning of application programs on any computer, but the problem seems to be more serious in a parallel symbolic language like KL1. How to bridge the gap between the programmer's idea of KL1 and the underlying implementation remains a problem in performance debugging and tuning.

7 Conclusion

We have presented overviews of the parallel application programs and of our research on performance analysis. The application programs presented here contain technologies that are interesting from the viewpoint of knowledge processing as well as parallel processing. By developing various knowledge processing technologies in KL1 and measuring their performance, we showed that KL1 is a suitable language for realizing parallel knowledge processing technologies and that they execute quickly on the PIM. Therefore, PIM and KL1 are appropriate tools for developing large-scale intelligent systems. Moreover, we have developed many parallel programming techniques for obtaining high performance, and we were able to observe their effects directly on the parallel inference machine. These experiences are summarized as guidelines for developing larger application systems.
In addition to developing application programs, the performance analysis group analyzed the behavior of parallel programs in a general framework. The results of the performance analysis gave us useful information for selecting parallel programming techniques and for predicting their performance when the problem sizes are scaled up.

The parallel inference performances presented in this paper were measured on the Multi-PSI or PIM/m. We need to compare and analyze the performances on the different PIMs as future work. We would also like to develop more utility programs which will help us to develop parallel programs, such as dynamic load balancers other than the multi-level load balancer.

Acknowledgement

The research and development of the parallel application programs has been carried out by researchers of the seventh research laboratory and cooperating manufacturers, with suggestions by members of the PIC, GIP, ADS and KAR working groups. We would like to acknowledge them and their efforts. We also thank Kazuhiro Fuchi, the director of ICOT, and Shunichi Uchida, the manager of the research department.

References

[Aikawa 1992] S. Aikawa, K. Mayumi, H. Kubo, F. Matsuzawa. ParaGraph: A Graphical Tuning Tool for Multiprocessor Systems. In Proc. Int. Conf. on Fifth Generation Computer Systems 1992, ICOT, Tokyo, 1992.

[Barton 1990] J. G. Barton. Protein Multiple Alignment and Flexible Pattern Matching. In Methods in Enzymology, Vol. 183 (1990), Academic Press, pp. 626-645.

[Chikayama 1992] Takashi Chikayama. KL1 and PIMOS. In Proc. Int. Conf. on Fifth Generation Computer Systems 1992, ICOT, Tokyo, 1992.

[Date et al. 1992] H. Date, Y. Matsumoto, M. Hoshi, H. Kato, K. Kimura and K. Taki. LSI-CAD Programs on Parallel Inference Machine. In Proc. Int. Conf. on Fifth Generation Computer Systems 1992, ICOT, Tokyo, 1992.

[de Kleer 1986] J. de Kleer. An Assumption-Based Truth Maintenance System. Artificial Intelligence 28 (1986), pp. 127-162.

[Doyle 1979] J. Doyle. A Truth Maintenance System. Artificial Intelligence 12 (1979).

[Falkenhainer 86] B. Falkenhainer, K. D. Forbus, D. Gentner. The Structure-Mapping Engine. In Proc. Fifth National Conference on Artificial Intelligence, 1986.

[Fujita et al. 1991] H. Fujita et al. A Model Generation Theorem Prover in KL1 Using a Ramified-Stack Algorithm. ICOT TR-606, 1991.

[Goto et al. 1988] Atsuhiro Goto et al. Overview of the Parallel Inference Machine Architecture. In Proc. Int. Conf. on Fifth Generation Computer Systems 1988, ICOT, Tokyo, 1988.

[Hasegawa et al. 1992] R. Hasegawa et al. MGTP: A Parallel Theorem Prover Based on Lazy Model Generation. To appear in Proc. CADE (System Abstract), 1992.

[Hirosawa et al. 1991] M. Hirosawa, M. Hoshida, M. Ishikawa and T. Toya. Multiple Alignment System for Protein Sequences employing 3-dimensional Dynamic Programming. In Proc. Genome Informatics Workshop II, 1991 (in Japanese).

[Hirosawa et al. 1992] M. Hirosawa, R. J. Feldmann, D. Rawn, M. Ishikawa, M. Hoshida and G. Micheals. Folding Simulation using Temperature Parallel Simulated Annealing. In Proc. Int. Conf. on Fifth Generation Computer Systems 1992, ICOT, Tokyo, 1992.

[Ichiyoshi 1989] N. Ichiyoshi. Parallel logic programming on the Multi-PSI. ICOT TR-487, 1989. (Presented at the Italian-Swedish-Japanese Workshop '90.)

[Ichiyoshi et al. 1992] N. Ichiyoshi and K. Kimura. Asymptotic load balance of distributed hash tables. In Proc. Int. Conf. on Fifth Generation Computer Systems 1992, ICOT, Tokyo, 1992.
[Ishikawa et al. 1991] M. Ishikawa, M. Hoshida, M. Hirosawa, T. Toya, K. Onizuka and K. Nitta. Protein Sequence Analysis by Parallel Inference Machine. Information Processing Society of Japan, TRFI-2S-2, 1991 (in Japanese).

[Jefferson 1985] D. R. Jefferson. Virtual Time. ACM Transactions on Programming Languages and Systems, Vol. 7, No. 3 (1985), pp. 404-425.

[Kimura et al. 1991] K. Kimura and K. Taki. Time-homogeneous Parallel Annealing Algorithm. In Proc. IMACS'91, 1991, pp. 827-828.

[Fukui 1989] S. Fukui. Improvement of the Virtual Time Algorithm. Transactions of Information Processing Society of Japan, Vol. 30, No. 12 (1989), pp. 1547-1554 (in Japanese).

[Kimura et al. 1991] K. Kimura and N. Ichiyoshi. Probabilistic analysis of the optimal efficiency of the multi-level dynamic load balancing scheme. In Proc. Sixth Distributed Memory Computing Conference, 1991, pp. 145-152.

[Furuichi et al. 1990] M. Furuichi, K. Taki, and N. Ichiyoshi. A multi-level load balancing scheme for or-parallel exhaustive search programs on the Multi-PSI. In Proc. of PPoPP'90, 1990, pp. 50-59.

[Kitazawa 1985] H. Kitazawa. A Line Search Algorithm with High Wireability For Custom VLSI Design. In Proc. ISCAS'85, 1985, pp. 1035-1038.

[Koseki et al. 1990] Y. Koseki, Y. Nakakuki and M. Tanaka. An adaptive model-based diagnostic system. In Proc. PRICAI'90, Vol. 1 (1990), pp. 104-109.

[Kumar et al. 1988] V. Kumar, K. Ramesh, and V. N. Rao. Parallel best-first search of state space graphs: A summary of results. In Proc. AAAI-88, 1988, pp. 122-127.

[Maruyama 1988] F. Maruyama et al. co-LODEX: a cooperative expert system for logic design. In Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, 1988, pp. 1299-1306.

[Maruyama 1990] F. Maruyama et al. Logic Design System with Evaluation-Redesign Mechanism. Electronics and Communications in Japan, Part III: Fundamental Electronic Science, Vol. 73, No. 5, Scripta Technica, Inc. (1990).

[Maruyama 1991] F. Maruyama et al. Solving Combinatorial Constraint Satisfaction and Optimization Problems Using Sufficient Conditions for Constraint Violation. In Proc. the Fourth Int. Symposium on Artificial Intelligence, 1991.

[Matsumoto 1987] Y. Matsumoto. A parallel parsing system for natural language analysis. In Proc. Third International Conference on Logic Programming, Lecture Notes in Computer Science 225, Springer-Verlag, 1987, pp. 396-409.

[Matsumoto et al. 1992] Y. Matsumoto and K. Taki. Parallel Logic Simulator based on Time Warp and its Evaluation. In Proc. Int. Conf. on Fifth Generation Computer Systems 1992, ICOT, Tokyo, 1992.

[Minoda 1992] Y. Minoda et al. A Cooperative Logic Design Expert System on a Multiprocessor. In Proc. Int. Conf. on Fifth Generation Computer Systems 1992, ICOT, Tokyo, 1992.

[Nakakuki et al. 1990] Y. Nakakuki, Y. Koseki and M. Tanaka. Inductive learning in probabilistic domain. In Proc. AAAI-90, Vol. 2 (1990), pp. 809-814.

[Needleman et al. 1970] S. B. Needleman and C. D. Wunsch. A General Method Applicable to the Search for Similarities in the Amino Acid Sequences of Two Proteins. J. of Mol. Biol., 48 (1970), pp. 443-453.

[Nitta et al. 1992] K. Nitta et al. HELIC-II: A Legal Reasoning System on the Parallel Inference Machine. In Proc. Int. Conf. on Fifth Generation Computer Systems 1992, ICOT, Tokyo, 1992.

[Oki 1989] H. Oki, K. Taki, S. Sei, and M. Furuichi. Implementation and evaluation of parallel Tsumego program on the Multi-PSI. In Proc. the Joint Parallel Processing Symposium (JSPP'89), 1989, pp. 351-357 (in Japanese).

[Skolnick and Kolinsky 1991] J. Skolnick
and A. Kolinski. Dynamic Monte Carlo Simulation of a New Lattice Model of Globular Protein Folding, Structure and Dynamics. Journal of Molecular Biology, Vol. 221, No. 2 (1991), pp. 499-531.

[Susaki et al. 1989] K. Susaki, H. Sato, R. Sugimura, K. Akasaka, K. Taki, S. Yamazaki, and N. Hirota. Implementation and evaluation of parallel syntax analyzer PAX on the Multi-PSI. In Proc. Joint Parallel Processing Symposium (JSPP'89), 1989, pp. 342-350 (in Japanese).

[Uchida et al. 1988] Shunichi Uchida et al. Research and Development of the Parallel Inference System in the Intermediate Stage of the FGCS Project. In Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, 1988.

[Ueda et al. 1978] Y. Ueda, H. Taketomi and N. Go. Studies on protein folding, unfolding and fluctuations by computer simulation. A three-dimensional lattice model of lysozyme. Biopolymers, Vol. 17 (1978), pp. 1531-1548.

[Wada and Ichiyoshi 1990] K. Wada and N. Ichiyoshi. A study of mapping of locally message exchanging algorithms on a loosely-coupled multiprocessor. ICOT TR-587, 1990.

[Wada et al. 1992] M. Wada, K. Rokusawa, and N. Ichiyoshi. Parallelization of iterative deepening A* algorithm and its implementation and performance measurement on PIM/m. To appear in Joint Symposium on Parallel Processing JSPP'92 (in Japanese).

Algorithmic & Knowledge Based Methods: Do they "Unify"? With some Programme Remarks for UNU/IIST*

Dines Bjørner and Jørgen Fischer Nilsson†

April 1992

Abstract

We examine two approaches to software application development. One is based on the conventional stepwise algorithmic approach typified by the imperative programming language (eg. PASCAL) tradition, but extends it with mathematical techniques for requirements development. The other is the knowledge based systems approach typified by the logic programming (eg. PROLOG) tradition. We contrast techniques and we attempt to find unifying issues and techniques. We propose a Most "Grand" Unifier, in the form of a Partial Evaluator (ie. Meta-interpreter), which establishes relations between the two approaches. The paper finally informs of the UNU/IIST, the United Nations University's International Institute for Software Technology. UNU/IIST shall serve especially the developing world. We outline consequences of the present analysis for the work of the UNU/IIST.

The Fifth Generation Computer Project

When the first author was invited, in late February, to read this paper at the plenum session of the International Conference on Fifth Generation Computer Systems it was expected that ... UNU/IIST strategies for promoting research and development in the field of computer science, including issues of education, creativity and international collaboration ... and future prospects of computer science ... would be covered by this presentation.

*Invited paper for the Plenum Session of the International Conference on Fifth Generation Computer Systems, FGCS'92, ICOT, Tokyo, Japan, June 1-5, 1992.

†Professor Dines Bjørner is Director of UNU/IIST: United Nations University's International Institute for Software Technology, Apartado (Post office box) 517, Macau, e-mail: unuiist%uealab%umacmr

THE ROLE OF LOGIC IN COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE

J. A.
Robinson
Syracuse University, New York 13244-2010, U.S.A.

ABSTRACT

The modern history of computing begins in the 1930s with the rigorous definition of computation introduced by Godel, Church, Turing, and other logicians. The first universal digital computer was an abstract machine invented in 1936 by Turing as part of his solution of a problem in the foundations of mathematics. In the 1940s Turing's logical abstraction became a reality. Turing himself designed the ACE computer, and another logician-mathematician, von Neumann, headed the design teams which produced the EDVAC and the IAS computers. Computer science started in the 1950s as a discipline in its own right. Logic has always been the foundation of many of its branches: theory of computation, logical design, formal syntax and semantics of programming languages, compiler construction, disciplined programming, program proving, knowledge engineering, inductive learning, database theory, expert systems, theorem proving, logic programming and functional programming. Programming languages such as LISP and PROLOG are formal logics, slightly extended by suitable data structures and a few imperative constructs. Logic will always remain the principal foundation of computer science, but in the quest for artificial intelligence logic will be only one partner in a large consortium of necessary foundational disciplines, along with psychology, neuroscience, neurocomputation, and natural linguistics.

1 LOGIC AND COMPUTING

I expect that digital computing machines will eventually stimulate a considerable interest in symbolic logic. One could communicate with these machines in any language provided it was an exact language. In principle one should be able to communicate in any symbolic logic. (A. M. Turing, 1947)

The computer is the offspring of logic and technology. Its conception in the mid-1930s occurred in the course of the researches of three great logicians: Kurt Godel, Alonzo Church, and Alan Turing, and its subsequent birth in the mid-1940s was largely due to Turing's practical genius and to the vision and intellectual power of another great logician-mathematician, John von Neumann. Turing and von Neumann played leading roles not only in the design and construction of the first computers but also in laying the general logical foundations for understanding the computation process and for developing computing formalisms.

Today, logic continues to be a fertile source of abstract ideas for novel computer architectures: inference machines, dataflow machines, database machines, rewriting machines. It provides a unified view of computer programming (which is essentially a logical task) and a systematic framework for reasoning about programs. Logic has been important in the theory and design of high-level programming languages. Logical formalisms are the immediate models for two major logic programming language families: Church's lambda calculus for functional programming languages such as LISP, ML, LUCID and MIRANDA, and the Horn-clause-resolution predicate calculus for relational programming languages such as PROLOG, PARLOG, and GHC.
Peter Landin noted over twenty years ago that ALGOL-like languages, too, were merely 'syntactically sugared' only-slightly-augmented versions of Church's lambda calculus, and recently, another logical formalism, Martin-Lof's Intuitionistic Type Theory, has served (in, for example, Constable's NUPRL) as a very-high-level programming language, a notable feature of which is that a proof of a program's correctness is an automatic accompaniment of the program-writing process.

To design, understand and explain computers and programming languages; to compose and analyze programs and reason correctly and cogently about their properties; these are to practice an abstract logical art based upon (in H. A. Simon's apt phrase) a 'science of the artificial' which studies rational artifacts in abstraction from the engineering details of their physical realization, yet with an eye on their intrinsic efficiency. The formal logician has had to become also an abstract engineer.

1.1 LOGIC AND ARTIFICIAL INTELLIGENCE

Logic provides the vocabulary and many of the techniques needed both for analyzing the processes of representation and reasoning and for synthesizing machines that represent and reason. (N. J. Nilsson, 1991)

In artificial intelligence (AI) research, logic has been used (for example, by McCarthy and Nilsson) as a rational model for knowledge representation and (for example, by Plotkin and Muggleton) as a guide for the organization of machine inductive inference and learning. It has also been used (for example, by Wos, Bledsoe and Stickel) as the theoretical basis for powerful automated deduction systems which have proved theorems of interest to professional mathematicians. Logic's roles in AI, however, have been more controversial than its roles in the theory and practice of computing. Until the difference (if any) between natural intelligence and artificial intelligence is better understood, and until more experiments have tested the claims both of logic's advocates and of logic's critics concerning its place in AI research, the controversies will continue.

2 LOGIC AND THE ORIGIN OF THE COMPUTER

Logic's dominant role in the invention of the modern computer is not widely appreciated. The computer as we know it today was invented in 1936, an event triggered by an important logical discovery announced by Kurt Godel in 1930. Godel's discovery decisively affected the outcome of the so-called Hilbert Program. Hilbert's goal was to formalize all of mathematics and then give positive answers to three questions about the resulting formal system: is it consistent? is it complete? is it decidable? Godel found that no sufficiently rich formal system of mathematics can be both consistent and complete. In proving this, Godel invented, and used, a high-level symbolic programming language: the formalism of primitive recursive functions. As part of his proof, he composed an elegant modular functional program (a set of connected definitions of primitive recursive functions and predicates) which constituted a detailed computational presentation of the syntax of a formal system of number theory, with special emphasis on its inference rules and its notion of proof. This computational aspect of his work was auxiliary to his main result, but is enough to have established Godel as the first serious programmer in the modern sense.
Godel's computational example inspired Alan Turing a few years later, in 1936, to find an explicit but abstract logical model not only of the computing process, but also of the computer itself. Using these as auxiliary theoretical concepts, Turing disposed of the third of Hilbert's questions by showing that the formal system of mathematics is not decidable. Although his original computer was only an abstract logical concept, during the following decade (1937-1946) Turing became a leader in the design, construction and operation of the first real computers.

The problem of answering Hilbert's third question was known as the Decision Problem. Turing interpreted it as the challenge either to give an algorithm which correctly decides, for all formal mathematical propositions A and B, whether B is formally provable from A, or to show that there is no such algorithm. Having first clearly characterized what an algorithm is, he found the answer: there is no such algorithm. For our present purposes the vital part of Turing's result is his characterization of what counts as an algorithm. He based it on an analysis of what a 'computing agent' does when making a calculation according to a systematic procedure. He showed that, when boiled down to bare essentials, the activity of such an agent is nothing more than that of (as we would now say) a finite-state automaton which interacts, one at a time, with the finite-state cells comprising an infinite memory. Turing's machines are plausible abstractions from real computers, which, for Turing as for everyone else in the mid-1930s, meant a person who computes. The abstract Turing machine is an idealized model of any possible computational scheme such a human worker could carry out. His great achievement was to show that some Turing machines are 'universal' in that they can exactly mimic the behavior of any Turing machine whatever. All that is needed is to place a coded description of the given machine in the universal machine's memory together with a coded description of the given machine's initial memory contents. How Turing made use of this universal machine in answering Hilbert's third question is not relevant to our purpose here. The point is that his universal machines are the abstract prototypes of today's stored-program general-purpose computers. The coded description of each particular machine is the program which causes the universal machine to act like that particular machine.

Abstract and purely logical as it is, Turing's work had an obvious technological interpretation. There is no need to build a separate machine for each computing task. One need build only one machine, a universal machine, and one can make it perform any conceivable computing task simply by writing a suitable program for it. Indeed Turing himself set out to build a universal machine. He began his detailed planning in 1944, when he was still fully engaged in the wartime British code-breaking project at Bletchley Park, and when the war ended in 1945 he moved to the National Physical Laboratory to pursue his goal full time. His real motive was already to investigate the possibility of artificial intelligence, a possibility he had frequently discussed at Bletchley Park with Donald Michie, I. J. Good, and other colleagues. He wanted, as he put it, to build a brain. By 1946 Turing completed his design for the ACE computer, based on his abstract universal machine.
In designing the ACE, he was able to draw on his expert knowledge of the sophisticated new electronic digital technology which had been used at Bletchley Park to build special-purpose code-breaking machines (such as the Colossus). In the event, the ACE would not be the first physical universal machine, for there were others who were after the same objective, and who beat NPL to it. Turing's 1936 idea had started others thinking. By 1945 there were several people planning to build a universal machine. One of these was John von Neumann.

Turing and von Neumann first met in 1935 when Turing was an unknown 23-year-old Cambridge graduate student. Von Neumann was already famous for his work in many scientific fields, including theoretical physics, logic and set theory, and several other important branches of mathematics. Ten years earlier, he had been one of the leading logicians working on Hilbert's Program, but after Godel's discovery he suspended his specifically logical researches and turned his attention to physics and to mathematics proper. In 1930 he emigrated to Princeton, where he remained for the rest of his life. Turing spent two years (from mid-1936 to mid-1938) in Princeton, obtaining a doctorate under Alonzo Church, who in 1936 had independently solved the Decision Problem. Church's method was quite different from Turing's and was not as intuitively convincing. During his stay in Princeton, Turing had many conversations with von Neumann, who was enthusiastic about Turing's work and offered him a job as his research assistant. Turing turned it down in order to resume his research career in Cambridge, but his universal machine had already become an important item in von Neumann's formidable intellectual armory.

Then came the war. Both men were soon completely immersed in their absorbing and demanding wartime scientific work. By 1943, von Neumann was deeply involved in many projects, a recurrent theme of which was his search for improved automatic aids to computation. In late 1944 he became a consultant to a University of Pennsylvania group, led by J. P. Eckert and J. W. Mauchly, which was then completing the construction of the ENIAC computer (which was programmable and electronic, but not universal, and its programs were not stored in the computer's memory). Although he was too late to influence the design of the ENIAC, von Neumann supervised the design of the Eckert-Mauchly group's second computer, the EDVAC. Most of his attention in this period was, however, focussed on designing and constructing his own much more powerful machine in Princeton, the Institute for Advanced Study (IAS) computer. The EDVAC and the IAS machine both exemplified the so-called von Neumann architecture, a key feature of which is the fact that instruction words are stored along with data in the memory of the computer, and are therefore modifiable just like data words, from which they are not intrinsically distinguished. The IAS computer was a success.
Many close copies were eventually built in the 1950s, both in US government laboratories (the AVIDAC at Argonne National Laboratory, the ILLIAC at the University of Illinois, the JOHNIAC at the Rand Corporation, the MANIAC at the Los Alamos National Laboratory, the ORACLE at the Oak Ridge National Laboratory, and the ORDVAC at the Aberdeen Proving Grounds), and in foreign laboratories (the BESK in Stockholm, the BESM in Moscow, the DASK in Denmark, the PERM in Munich, the SILLIAC in Sydney, the SMIL in Lund, and the WEIZAC in Israel); and there were at least two commercial versions of it (the IBM 701 and the International Telemeter Corporation's TC-1). The EDSAC, a British version of the EDVAC, was running in Cambridge by June 1949, the result of brilliantly fast construction work by M. V. Wilkes following his attendance at a 1946 EDVAC course.

Turing's ACE project was, however, greatly slowed down by a combination of British civil-service foot-dragging and his own lack of administrative deviousness, not to mention his growing preoccupation with AI. In May 1948 Turing resigned from NPL in frustration and joined the small computer group at the University of Manchester, whose small but universal machine started useful operation the very next month and thus became the world's first working universal computer. All of Turing's AI experiments, and all of his computational work in developmental biology, took place on this machine and its successors, built by others but according to his own fundamental idea.

Von Neumann's style in expounding the design and operation of the EDVAC and the IAS machine was to suppress engineering details and to work in terms of an abstract logical description. He discussed both its system architecture and the principles of its programming entirely in such abstract terms. We can today see that von Neumann and Turing were right in following the logical principle that precise engineering details are relatively unimportant in the essential problems of computer design and programming methodology. The ascendancy of logical abstraction over concrete realization has ever since been a guiding principle in computer science, which has kept itself organizationally almost entirely separate from electrical engineering. The reason it has been able to do this is that computation is primarily a logical concept, and only secondarily an engineering one. To compute is to engage in formal reasoning, according to certain formal symbolic rules, and it makes no logical difference how the formulas are physically represented, or how the logical transformations of them are physically realized. Of course no one should underestimate the enormous importance of the role of engineering in the history of the computer. Turing and von Neumann did not. They themselves had a deep and quite expert interest in the very engineering details from which they were abstracting, but they knew that the logical role of computer science is best played in a separate theater.

3 LOGIC AND PROGRAMMING

Since coding is not a static process of translation, but rather the technique of providing a dynamic background to control the automatic evolution of a meaning, it has to be viewed as a logical problem and one that represents a new branch of formal logics. (J. von Neumann and H. Goldstine, 1947)

Much emphasis was placed by both Turing and von Neumann, in their discussions of programming, on the two-dimensional notation known as the flow diagram.
This quickly became a standard logical tool of early programming, and it can still be a useful device in formal reasoning about computations. The later ideas of Hoare, Dijkstra, Floyd, and others on the logical principles of reasoning about programs were anticipated by both Turing (in his 1949 lecture Checking a Large Routine) and von Neumann (in the 1947 report Planning and Coding of Problems for an Electronic Computing Instrument). They stressed that programming has both a static and a dynamic aspect. The static text of the program itself is essentially an expression in some formal system of logic: a syntactic structure whose properties can be analyzed by logical methods alone. The dynamic process of running the program is part of the semantic meaning of this static text.

3.1 AUTOMATIC PROGRAMMING

Turing's friend Christopher Strachey was an early advocate, around 1950, of using the computer itself to translate from high-level 'mathematical' descriptions into low-level 'machine-language' prescriptions. His idea was to try to liberate the programmer from concern with 'how' to compute so as to be able to concentrate on 'what' to compute: in short, to think and write programs in a more natural and human idiom. Ironically, Turing himself was not much interested in this idea, which he had already in 1947 pointed out as an 'obvious' one. In fact, he seems to have had a hacker's pride in his fluent machine-language virtuosity. He was able to think directly and easily in terms of bare bit patterns and of the unorthodox number representations such as the Manchester computer's reverse (i.e., low-order digits first) base-32 notation for integers. In this attitude, he was only the first among many who have stayed aloof from higher-level programming languages and higher-level machine architectures, on the grounds that a real professional must be aware of and work closer to the actual realities of the machine. One senses this attitude, for example, throughout Donald Knuth's monumental treatise on the art of computer programming.

It was not until the late 1950s (when FORTRAN and LISP were introduced) that the precise sequential details of how arithmetical and logical expressions are scanned, parsed and evaluated could routinely be ignored by most programmers and left to the computer to work out. This advance brought an immense simplification of the programming task and a large increase in programmer productivity. There soon followed more ambitious language design projects such as the international ALGOL project, and the theory and practice of programming language design, together with the supporting software technology of interpreters and compilers, quickly became a major topic in computer science. The formal grammar used to define the syntax of ALGOL was not initially accompanied by an equally formal specification of its semantics, but this soon followed. Christopher Strachey and Dana Scott developed a formal 'denotational semantics' for programs, based on a rigorous mathematical interpretation of the previously uninterpreted, purely syntactical, lambda calculus of Church. It was, incidentally, a former student of Church, John Kemeny, who devised the enormously popular 'best-selling' programming language, BASIC.

3.2 DESCRIPTIVE AND IMPERATIVE ASPECTS

There are two sharply contrasting approaches to programming and programming languages: the descriptive approach and the imperative approach.
The descriptive approach to programming focusses on the static aspect of a computing plan, namely on the denotative semantics of program expressions. It tries to see the entire program as a timeless mathematical specification which gives the program's output as an explicit function of its input (whence arises the term 'functional' programming). This approach requires the computer to do the work of constructing the described output automatically from the given input according to the given specifications, without any explicit direction from the programmer as to how to do it. The imperative approach focusses on the dynamic aspect of the computing plan, namely on its operational semantics. An imperative program specifies, step by step, what the computer is to do, what its 'flow of control' is to be. In extreme cases, the nature of the outputs of an imperative program might be totally obscure. In such cases one must (virtually or actually) run the program in order to find out what it does, and try to guess the missing functional description of the output in terms of the input. Indeed it is necessary in general to 'flag' a control-flow program with comments and assertions, supplying this missing information, in order to make it possible to make sense of what the program is doing when it is running.

Although a purely static, functional program is relatively easy to understand and to prove correct, in general one may have little or no idea of the cost of running it, since that dynamic process is deliberately kept out of sight. On the other hand, although an operational program is relatively difficult to understand and prove correct, its more direct depiction of the actual process of computation makes an assessment of its efficiency relatively straightforward. In practice, most commonly-used high-level programming languages, even LISP and PROLOG, have both functional and operational features. Good programming technique requires an understanding of both. Programs written in such languages are often neither wholly descriptive nor wholly imperative. Most programming experts, however, recommend caution and parsimony in the use of imperative constructs. Some even recommend complete abstention. Dijkstra's now-classic letter to the editor of the Communications of the ACM, entitled 'GOTO considered harmful', is one of the earliest and best-known such injunctions. These two kinds of programming were each represented in pure form from the beginning: Godel's purely descriptive recursive function formalism and Turing's purely imperative notation for the state-transition programs of his machines.
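As a small illustration of the contrast (not taken from Robinson's paper), the following Python fragment expresses the same computation first descriptively, as a timeless definition of the output in terms of the input, and then imperatively, as a step-by-step flow of control whose effect must be inferred (or asserted) by the reader.

# Descriptive (functional) style: the output is defined as a function of the input.
def sum_of_squares(xs):
    return sum(x * x for x in xs)

# Imperative style: step-by-step state changes; the relation between input and
# output is implicit in the flow of control and must be inferred or annotated.
def sum_of_squares_imperative(xs):
    total = 0
    i = 0
    while i < len(xs):
        total = total + xs[i] * xs[i]   # invariant: total == sum of squares of xs[0..i-1]
        i = i + 1
    return total

assert sum_of_squares([1, 2, 3]) == sum_of_squares_imperative([1, 2, 3]) == 14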
3.3 LOGIC AND PROGRAMMING LANGUAGES

In the late 1950s at MIT, John McCarthy and his group began to program their IBM 704 using symbolic logic directly. Their system, LISP, is the first major example of a logic programming language intended for actual use on a computer. It is essentially Church's lambda calculus, augmented by a simple recursive data structure (ordered pairs), the conditional expression, and an imperative 'sequential construct' for specifying a series of consecutive actions. In the early 1970s Robert Kowalski in Edinburgh and Alain Colmerauer in Marseille showed how to program with another only-slightly-augmented system of symbolic logic, namely the Horn-clause-resolution form of the predicate calculus. PROLOG is essentially this system of logic, augmented by a sequentializing notion for lists of goals and lists of clauses, a flow-of-control notion consisting of a systematic depth-first, backtracking enumeration of all deductions permitted by the logic, and a few imperative commands (such as the 'cut'). PROLOG is implemented with great elegance and efficiency using ingenious techniques originated by David H. D. Warren. The principal virtue of logic programming in either LISP or PROLOG lies in the ease of writing programs, their intelligibility, and their amenability to metalinguistic reasoning.

LISP and PROLOG are usually taken as paradigms of two distinct logic programming styles (functional programming and relational programming) which on closer examination turn out to be only two examples of a single style (deductive programming). The general idea of purely descriptive deductive programming is to construe computation as the systematic reduction of expressions to a normal form. In the case of pure LISP, this means essentially the persistent application of reduction rules for processing function calls (Church's beta-reduction rule), the conditional expression, and the data-structuring operations for ordered pairs. In the case of pure PROLOG, it means essentially the persistent application of the beta-reduction rule, the rule for distributing AND through OR, the rule for eliminating existential quantifiers from conjunctions of equations, and the rules for simplifying expressions denoting sets. By merging these two formalisms one obtains a unified logical system in which both flavors of programming are available both separately and in combination with each other. My colleague Ernest Sibert and I some years ago implemented an experimental language based on this idea (we called it LOGLISP). Currently we are working on another one, called SUPER, which is meant to illustrate how such reduction logics can be implemented naturally on massively parallel computers like the Connection Machine.

LISP, PROLOG and their cousins have thus demonstrated the possibility, indeed the practicality, of using systems of logic directly to program computers. Logic programming is more like the formulation of knowledge in a suitable form to be used as the axioms of automatic deductions by which the computer infers its answers to the user's queries. In this sense this style of programming is a bridge linking computation in general to AI systems in particular. Knowledge is kept deliberately apart (in a 'knowledge base') from the mechanisms which invoke and apply it. Robert Kowalski's well-known equational aphorism 'algorithm = logic + control' neatly sums up the necessity to pay attention to both descriptive and imperative aspects of a program, while keeping them quite separate from each other so that each aspect can be modified as necessary in an intelligible and disciplined way. The classic split between procedural and declarative knowledge again shows up here: some of the variants of PROLOG (the stream-parallel, committed-choice nondeterministic languages such as ICOT's GHC) are openly concerned more with the control of events, sequences and concurrencies than with the management of the deduction of answers to queries. The uneasiness caused by this split will remain until some way is found of smoothly blending procedural with declarative within a unified theory of computation.
Nevertheless, with the advent of logic programming in the wide sense, computer science has outgrown the idea that programs can only be the kind of action-plans required by Turing-von Neumann symbol-manipulating robots and their modern descendants. The emphasis is (for the programmer, but not yet for the machine designer) now no longer entirely on controlling the dynamic sequence of such a machine's actions, but increasingly on the static syntax and semantics of logical expressions, and on the corresponding mathematical structure of the data and other objects which are the denotations of the expressions. It is interesting to speculate how different the history of computing might have been if in 1936 Turing had proposed a purely descriptive abstract universal machine rather than the purely imperative one that he actually did propose; or if, for example, Church had done so. We might well now have been talking of 'Church machines' instead of Turing machines. We would be used to thinking of a Church machine as an automaton whose states are the expressions of some formal logic. Each of these expressions denotes some entity, and there is a semantic notion of equivalence among the expressions: equivalence means denoting the same entity. For example, the expressions

    (23 + 4)/(13 − 4),    1.3 + 1.7,    λz.(2z + 1)^(1/2) (4)

are equivalent, because they all denote the number three. A Church machine computation is a sequence of its states, starting with some given state and then continuing according to the transition rules of the machine. If the sequence of states eventually reaches a terminal state, and (therefore) the computation stops, then that terminal state (expression) is the output of the machine for the initial state (expression) as input. In general the machine computes, for a given expression, another expression which is equivalent to it and which is as simple as possible. For example, the expression '3' is as simple as possible, and is equivalent to each of the above expressions, and so it would be the output of a computation starting with any of the expressions above. These simple-as-possible expressions are said to be in 'normal form'. The 'program' which determines the transitions of a Church machine through its successive states is a set of 'rewriting' rules together with a criterion for applying some one of them to any expression. A rewriting rule is given by two expressions, called the 'redex' and the 'contractum' of the rule, and applying it to an expression changes (rewrites) it to another expression. The new expression is a copy of the old one, except that the new expression contains an occurrence of the contractum in place of one of the occurrences of the redex. If the initial state is (23 + 4)/(13 − 4) then the transitions are:

    (23 + 4)/(13 − 4)   becomes   27/(13 − 4),
    27/(13 − 4)         becomes   27/9,
    27/9                becomes   3.

Or if the initial state is λz.(2z + 1)^(1/2) (4), then the transitions are:

    λz.(2z + 1)^(1/2) (4)    becomes   ((2 × 4) + 1)^(1/2),
    ((2 × 4) + 1)^(1/2)      becomes   (8 + 1)^(1/2),
    (8 + 1)^(1/2)            becomes   9^(1/2),
    9^(1/2)                  becomes   3.

Most of us are trained in early life to act like a simple purely arithmetical Church machine. We all learn some form of numerical rewriting rules in elementary school, and use them throughout our lives (but of course Church's lambda notation is not taught in elementary school, or indeed at any time except when people later specialize in logic or mathematics; but it ought to be).
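A small sketch of such a purely arithmetical Church machine can be written in PROLOG itself (my own illustration; the predicate names redex/2, step/2 and normalise/2 are invented, only the numerical rewriting rules are covered, and integer division stands in for exact division in this toy):

    % redex(E, C): E is a redex whose contractum is C.
    redex(A + B, C) :- integer(A), integer(B), C is A + B.
    redex(A - B, C) :- integer(A), integer(B), C is A - B.
    redex(A * B, C) :- integer(A), integer(B), C is A * B.
    redex(A / B, C) :- integer(A), integer(B), B =\= 0, C is A // B.

    % step(E, E1): rewrite one redex of E, trying the whole term first,
    % then the left subterm, then the right (a simple leftmost criterion).
    step(E, E1) :- redex(E, E1), !.
    step(E, E1) :-
        E =.. [Op, L, R],
        (   step(L, L1) -> E1 =.. [Op, L1, R]
        ;   step(R, R1),   E1 =.. [Op, L, R1]
        ).

    % normalise(E, N): keep rewriting until no redex remains, i.e. until
    % the expression is in normal form.
    normalise(E, E) :- \+ step(E, _), !.
    normalise(E, N) :- step(E, E1), normalise(E1, N).

    % ?- normalise((23 + 4) / (13 - 4), N).
    % N = 3.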
Since we cannot literally store in our heads all of the infinitely many redex-contractum pairs <23 + 4, 27>, <2+2, 4> etc., infinite sets of these pairs are logically coded into simple finite algorithms. Each algorithm (for addition, subtraction, and so on) yields the contractum for any given redex of its particular type. We hinted earlier that an expression is in normal form if it is as simple as possible. To be sure, that is a common way to think of normal forms, and in many cases it fits the facts. Actually to be in normal form is not necessarily to be in as simple a form as possible. What counts as a normal form will depend on what the rewriting rules are. Normal form is a relative notion: given a set of rewriting rules, an expression in normal form is one which contains no redex. In designing a Church machine care must be taken that no expression is the redex of more than one rule. The machine must also be given a criterion for deciding which rule to apply to an expression which contains distinct redexes, and also for deciding which occurrence of that rule's redexes to replace, in case there are two or more of them. A simple criterion is always to replace the leftmost redex occurring in the expression. A Church machine, then, is a machine whose possible states are all the different expressions of some formal logic and which, when started in some state (i.e., when given some expression of that logic) will 'try' to compute its normal form. The computation mayor may not terminate: this will depend on the rules and on the initial expression. Some of the expressions for some Church machines may have no normal form. Since for all interesting formal logics there are infinitely many expressions, a Church machine is not a finite-state automaton; so in practice the same provision must be made as in the case of the 206 Turing machines for adjoining as much external memory as needed during a computation. this expression and display the result' imperative (as in LISP's classic read-eval-print cycle). Church machines can also serve as a simple model for parallel computation and parallel architectures. One has only to provide a criterion for replacing more than one redex at the same time. In Church's lambda calculus one of the rew~iting rules ('beta reduction') is the logical verSIOn of executing a function call in a highlevel programming language. Logic programming languages based on Horn-clauseresolution can also be implemented as Church machines, at least as far as their static aspects are concerned. 4 LOGIC AND ARTIFICIAL INTELLIGENCE In the early 1960s Peter Landin, then Christopher Strachey's research assistant, undertook to convince computer scientists that not merely LISP, but also ALGOL, and indeed all past, present and future programming languages are essentially the abstract lambda calculus in one or another concrete manifestation. One need add only an abstract version of the 'state' of the computation process and the concept of 'jump' or change of state. Landin's abstract logical model combines declarative programming with procedural programming in an insightful and natural way. Landin's thesis also had a computer-design aspect, in the form of his elegant abstract logic machine (the SECD machine) for executing lambda calculus programs. 
The SECD m'achine language is the lambda calculus itself: there is no question of 'compiling' programs into a lower-level language (but more recently Peter Henderson has described just such a lower-level SECD machine which executes compiled LISP expressions). Landin's SECD machine is a sophisticated Church machine which uses stacks to keep track of the syntactic structure of the expressions and of the location of the leftmost redex. We must conclude that the descriptive and imperative views of computation are not incompatible with each other. Certainly both are necessary. There is no need for their mutual antipathy. It arises only because enthusiastic extremists on both sides sometimes claim that computing and programming are 'nothing but' the one or the other. The appropriate view is that in all computations we can expect to find both aspects, although in some cases one or the other aspect will dominate and the other may be present in only a minimal way. Even a pure functional program can be viewed as an implicit 'evaluate In AI a controversy sprang up in the late 1960s over essentially this same issue. There was a spirited and enlightening debate over whether knowledge should be represented in procedural or declarative form. The procedural view was mainly associated with Marvin Minsky and his MIT group, represented by Hewitt's PLANNER system and Winograd's application of it to support a rudimentary natural language capability in his simple simulated robot SHRDLU. The declarative view was associated with Stanford's John McCarthy, and was represented by Green's QA3 system and by Kowalski's advocacy of Horn clauses as a logic-based deductive programming language. Kowalski was able to make the strong case that he did because of Colmerauer's development of PROLOG as a practical logic programming language. Eventually Kowalski found an elegant way to end the debate, by pointing out a procedural interpretation for the ostensibly purely declarative Horn clause sentences in logic programs. There is an big epistemological and psychological difference between simply describing a thing and giving instructions for constructing it, which corresponds to the difference between descriptive and imperative programming. One cannot always see how to construct the denotation of an expression efficiently. For example, the meaning of the descriptive expression the smallest integer which is the sum of two cubes in two different ways. seems quite clear. We certainly understand the expression, but those who don't already (probably from reading of Hardy's famous visit to Ramanujan in hospital) know that it denotes the integer 1729 will have to do some work to figure it out for themselves. It is easy to see that 1729 is the sum of two cubes in two different ways if one is shown the two equations 1729 = 13 + 123 1729 = 103 +93 but it needs at least a little work to find them oneself. Then to see that 1729 is the smallest integer with this property, one has to see somehow that all smaller integers lack it, and this means checking each one, either literally, or by some clever shortcut. To find 1729, in the first 207 place, as the denotation of the expression, one has to carry out the all of this work, in some form or another. There are of course many different ways to organize the task, some of which are much more efficient than others, some of which are less efficient, but more intelligible, than others. 
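Here is one naive way to organise the task, as a PROLOG sketch (my own illustration; the predicate names are invented, and between/3 with the upper bound inf is an SWI-Prolog extension):

    % sum_of_two_cubes(N, A-B): N = A^3 + B^3 with A =< B, so each unordered
    % pair of cubes is counted exactly once.
    sum_of_two_cubes(N, A-B) :-
        Max is round(N ** (1/3)) + 1,
        between(1, Max, A),
        between(A, Max, B),
        N =:= A*A*A + B*B*B.

    % taxicab(N): test the integers in ascending order and stop at the first
    % one that is the sum of two cubes in at least two different ways.
    taxicab(N) :-
        between(1, inf, N),
        findall(P, sum_of_two_cubes(N, P), [_, _|_]),
        !.

    % ?- taxicab(N).
    % N = 1729.

Whether this counts as constructing the denotation 'efficiently' is exactly the point at issue: the search works, but nothing in the descriptive expression itself told us to organise it in this particular way.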
So to write a general computer program which would automatically and efficiently reduce the expression the smallest integer which is the sum of two cubes in two different ways to the expression '1729' and equally well handle other similar expressions, is not at all a trivial task. 4.1 AI AND PROGRAMMING Automatic programming has never really been that. It is no more than the automatic translation of one program into another. So there must be some kind of program (written by a human, presumably) which starts off the chain of translations. An assembler and a compiler both do the same kind of thing: each accepts as input a program written in one programming language and delivers as output a program written in another programming language, with the assurance that the two programs are equivalent in a suitable sense. The advantage of this technique is of course that the source program is usually more intelligible and easier to write than the target program, and the target program is usually more efficient than the source program because it is typically written in a lower-level language, closer to the realities of the machine which will do the ultimate work. The advent of such automatic translations opened up the design of programming languages to express 'big' ideas in a style 'more like mathematics' (as Christopher Strachey put it). These big ideas are then translated into smaller ideas more appropriate for machine languages. Let us hope that one day we can look back at all the paraphernalia of this program-translation technology, which is so large a part of today's computer science, and see that it was only an interim technology. There is no law of nature which says that machines and machine languages are intrinsically low-level. We must strive towards machines whose 'level' matches our own. Turing and von Neumann both made important contributions to the beginnings of AI, although Turing's contribution is the better known. His 1950 essay Computin~ Machinery and Intelli~ence is surely the most quoted single item in the entire literature of AI, if only because it is the original source of the so-called Turing Test. The recent revival of interest in artificial neural models for AI applications recalls von Neumann's deep interest in computational neuroscience, a field he richly developed in his later years and which was absorbing all his prodigious intellectual energy during his final illness. When he died in early 1957 he left behind an uncompleted manuscript which was posthumously published as the book The Computer and the Brain. 4.2 LOGIC AND PSYCHOLOGY IN AI If a machine is to be able to learn something, it must first be able to be told it. John McCarthy, 1957 I do not mean to say that there is anything wrong with logic; I only object to the assumption that ordinary reasoning is largely based on it. M. L. Minsky, 1985 AI has from the beginning been the arena for an uneasy coexistence between logic and psychology as its leading themes, as epitomized in the contrasting approaches to AI of John McCarthy and Marvin Minsky. McCarthy has maintained since 1957 that AI will come only when we learn how to write programs (as he put it) which have common sense and which can take advice. His putative AI system is a (presumably) very large knowledge base made up of declarative sentences written in some suitable logic (until quite recently he has taken this to be the first order predicate calculus), equipped with an inference engine which can automatically deduce logical consequences of this knowledge. 
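A toy caricature of that picture can be given in a few Horn clauses (my own illustration; the facts, the rule, and the predicate names known/1, deduce/1 and take_advice/1 are all invented): knowledge is a set of declarative sentences, the inference engine is ordinary backward-chaining deduction, and 'being told' something is just the addition of another sentence.

    :- dynamic(known/1).

    known(bird(tweety)).                      % the initial knowledge base
    rule(flies(X), [bird(X)]).                % a (naively general) rule

    % deduce(P): P follows from what the system has been told.
    deduce(P) :- known(P).
    deduce(P) :- rule(P, Conditions), maplist(deduce, Conditions).

    % take_advice(P): accept a new declarative sentence.
    take_advice(P) :- assertz(known(P)).

    % ?- deduce(flies(tweety)).                            % succeeds
    % ?- take_advice(bird(polly)), deduce(flies(polly)).   % succeeds after being told

The rule above is of course naively general (it would also apply to a penguin), and coping with exactly that kind of defeasible knowledge is what motivates several of the problems mentioned next.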
Many well-known AI problems and ideas have arisen in pursuing this approach: the Frame Problem, Nonmonotonic Reasoning, the Combinatorial Explosion, and so on. This approach demands a lot of work to be done on the epistemological problem of declaratively representing knowledge and on the logical problem of designing suitable inference engines. Today the latter field is one of the flourishing special subfields of AI. Mechanical theorem-proving and automated deduction have always been a source of interesting and hard problems. After over three decades of trying, we now have well-understood methods of systematic deduction which are of considerable use in practical applications. Minsky maintains that humans rarely use logic in their actual thinking and problem solving, but adds that logic is not a good basis even for artificial problem solving: that computer programs based solely on McCarthy's logical deductive knowledge-base paradigm will fail to display intelligence because of their inevitable computational inefficiencies; that the predicate calculus is not adequate for the representation of most knowledge; and that the exponential complexity of predicate calculus proof procedures will always severely limit what inferences are possible. Because it claims little or nothing, the view can hardly be refuted that humans undoubtedly are in some sense (biological) machines whose design, though largely hidden from us at present and obviously exceedingly complicated, calls for some finite arrangement of material components all built ultimately out of 'mere' atoms and molecules and obeying the laws of physics and chemistry. So there is an abstract design which, when physically implemented, produces (in ourselves, and the animals) intelligence. Intelligent machines can, then, be built. Indeed, they can, and do routinely, build and repair themselves, given a suitable environment in which to do so. Nature has already achieved natural intelligence. Its many manifestations serve the AI research community as existence proofs that intelligence can occur in physical systems. Nature has already solved all the AI problems, by sophisticated schemes only a very few of which have yet been understood.

4.3 THE STRONG AI THESIS

According to Strong AI, the computer is not merely a tool in the study of the mind; rather, the appropriately programmed computer really is a mind, in the sense that computers given the right programs can be literally said to understand and have other cognitive states.
J. R. Searle, 1980

Turing believed, indeed was the first to propound, the Strong AI thesis that artificial intelligence can be achieved simply by appropriate programming of his universal computer. Turing's Test is simply a detection device, waiting for intelligence to occur in machines: if a machine is one day programmed to carry on fluent and intelligent-seeming conversations, will we not, argued Turing, have to agree that this intelligence, or at least this apparent intelligence, is a property of the program? What is the difference between apparent intelligence, and intelligence itself? The Strong AI thesis is also implicit in McCarthy's long-pursued project to reconstruct artificially something like human intelligence by implementing a suitable formal system. Thus the Turing Test might (on McCarthy's view) eventually be passed by a deductive knowledge base, containing a suitable repertory of linguistic and other everyday human knowledge, and an efficient and sophisticated inference engine.
The system would certainly have to have a mastery of (both speaking and understanding) natural language. Also it would have to exhibit to a sufficient degree the phenomenon of 'learning' so as to be capable of augmenting and improving its knowledge base to keep it up-to-date both in the small (for example in dialog management) and in the large (for example in keeping up with the news and staying abreast of advances in scientific knowledge). In a recent vigorous defense of the Strong AI thesis, Lenat and Feigenbaum argued that if enough knowledge of the right kind is encoded in the system it will be able to 'take off' and autonomously acquire more through reading books and newspapers, watching TV, taking courses, and talking to people. It is not the least of the attractions of the Strong AI thesis that it is empirically testable. We shall know if someone succeeds in building a system of this kind: that indeed is what Turing's Test is for.

4.4 EXPERT SYSTEMS

Expert systems are limited-scale attempted practical applications of McCarthy's idea. Some of them (such as the Digital Equipment Corporation's system for configuring VAX computing systems, and the highly specialized medical diagnosis systems, such as MYCIN) have been quite useful in limited contexts, but there have not been as many of them as the more enthusiastic proponents of the idea might have wished. The well-known book by Feigenbaum & McCorduck on the Fifth Generation Project was a spirited attempt to stir up enthusiasm for Expert Systems and Knowledge Engineering in the United States by portraying ICOT's mission as a Japanese bid for leadership in this field. There has indeed been much activity in devising specialized systems of applied logic whose axioms collectively represent a body of expert knowledge for some field (such as certain diseases, their symptoms and treatments) and whose deductions represent the process of solving problems posed about that field (such as the problem of diagnosing the probable cause of given observed symptoms in a patient). This, and other, attempts to apply logical methods to problems which call for inference-making have led to an extensive campaign of reassessment of the basic classical logics as suitable tools for such a purpose. New, nonclassical logics have been proposed (fuzzy logic, probabilistic logic, temporal logic, various modal logics, logics of belief, logics for causal relationships, and so on) along with systematic methodologies for deploying them (truth maintenance, circumscription, non-monotonic reasoning, and so on). In the process, the notion of what is a logic has been stretched and modified in many different ways, and the current picture is one of busy experimentation with new ideas. One cannot help wondering whether Turing may have been disappointed, at the end of his life, with his lack of progress towards realizing AI. If one excludes some necessary philosophical clarifications and preliminary methodological discussions, nothing had been achieved beyond his invention of the computer itself.

4.5 LOGIC AND NEUROCOMPUTATION

Von Neumann's view of AI was a 'logico-neural' version of the Strong AI thesis, and he acted on it with typical vigor and scientific virtuosity. He sought to formalize, in an abstract model, aspects of the actual structure and function of the brain and nervous system.
In this he was consciously extending and improving the pioneer work of McCulloch and Pitts, who had described their model as 'a logical calculus immanent in nervous activity'. Here again, it was logic which served as at least an approximate model for a serious attack on an ostensibly nonlogical problem. The empirical goal of finding out how the human mind actually works, and the theoretical goal of reproducing its essential features in a machine, are not much closer in the early 1990s than they were in the early 1950s. After forty years of hard work we have 'merely' produced some splendid tools and thoroughly explored plenty of blind alleys. We should not be surprised, or even disappointed. The problem is a very hard one. The same thing can be said about the search for controlled thermonuclear fusion, or for a cancer cure. Our present picture of the human mind is summed up in Minsky's recent book The Society of Mind, which offers a plausible general view of the mind's architecture, based on clues from the physiology of the human brain and nervous system, the computational patterns found useful for the organization of complex semantic information-processing systems, and the sort of insightful interpretation of observed human adult- and child-behavior which Freud and Piaget pioneered. Logic is given little or no role to play in Minsky's view of the mind. Von Neumann's logical study of self-reproduction as an abstract computational phenomenon was not so much an AI investigation as an essay in quasi-biological information processing. It was certainly a triumph of abstract logical formalization of an undeniably computational process. The self-reproduction method evolved by Nature, using the double helix structure of paired complementary coding sequences found in the DNA molecule, is a marvellous solution of the formal problem of self-reproduction. Von Neumann was not aware of the details of Nature's solution when he worked out his own logical, abstract version of it as a purely theoretical construction, shortly before Crick and Watson unravelled the structure of the DNA molecule in 1953. Turing, too, was working at the time of his death on another, closely-related problem of theoretical biology, morphogenesis, in which one must try to account theoretically for the unfolding of complex living structural organizations under the control of the programs coded in the genes. This is not exactly an AI problem. Minsky rightly emphasizes (as logicians have long insisted) that the proper role of logic is in the context of justification rather than in the context of discovery. Newell, Simon and Shaw's well-known 1956 propositional calculus theorem-proving program, the Logic Theorist, illustrates this distinction admirably. The Logic Theorist is a discovery simulator. The goal of their experiment was to make their program discover a proof (of a given propositional formula) by 'heuristic' means, reminiscent (they supposed) of the way a human would attack the same problem. As an algorithmic theorem-prover (one whose goal is to show formally, by any means, and presumably as efficiently as possible, that a given propositional formula is a theorem) their program performed nothing like as well as the best nonheuristic algorithms. The logician Hao Wang soon (1959) rather sharply pointed this out, but it seems that the psychological motivation of their investigation had eluded him (as indeed it has many others).
They had themselves very much muddled the issue by contrasting their heuristic theorem-proving method with the ridiculously inefficient, purely fictional, 'logical' one of enumerating all possible proofs in lexicographical order and waiting for the first one to turn up with the desired proposition as its conclusion. This presumably was a rhetorical flourish which got out of control. It strongly suggested that they believed it is more efficient to seek proofs heuristically, as in their program, than algorithmically with a guarantee of success. Indeed in the exuberance of their comparison they provocatively coined the wicked but amusing epithet 'British Museum algorithm' for this lexicographic-enumeration-of-all-proofs method, the intended sting in the epithet being that just as, given enough time, a systematic lexicographical enumeration of all possible texts will eventually succeed in listing any given text in the vast British Museum Library, so a logician, given enough time, will eventually succeed in proving any given provable proposition by proceeding along similar lines. Their implicit thesis was that a proof-finding algorithm which is guaranteed to succeed for any provable input is necessarily unintelligent. This may well be so: but that is not the same as saying that it is necessarily inefficient. Interestingly enough, something like this thesis was anticipated by Turing in his 1947 lecture before the London Mathematical Society: ... if a machine is expected to be infallible, it cannot also be intelligent. There are several mathematical theorems which say almost exactly that.

5 CONCLUSION

Logic's abstract conceptual gift of the universal computer has needed to be changed remarkably little since 1936. Until very recently, all universal computers have been realizations of the same abstraction. Minor modifications and improvements have been made, the most striking one being internal memories organized into addressable cells, designed to be randomly accessible, rather than merely sequentially searchable (although external memories remain essentially sequential, requiring search). Other improvements consist largely of building into the finite hardware some of the functions which would otherwise have to be carried out by software (although in the recent RISC architectures this trend has actually been reversed). For over fifty years, successive models of the basic machine have been 'merely' faster, cheaper, physically smaller copies of the same device. In the past, then, computer science has pursued an essentially logical quest: to explore the Turing-von Neumann machine's unbounded possibilities. The technological challenge, of continuing to improve its physical realizations, has been largely left to the electrical engineers, who have performed miracles. In the future, we must hope that the logician and the engineer will find it possible and natural to work more closely together to devise new kinds of higher-level computing machines which, by making programming easier and more natural, will help to bring artificial intelligence closer. That future has been under way for at least the past decade. Today we are already beginning to explore the possibilities of, for example, the Connection Machine, various kinds of neural network machines, and massively parallel machines for logical knowledge-processing. It is this future that the bold and imaginative Fifth Generation Project has been all about.
Japan's ten-year-long ICOT-based effort has stimulated (and indeed challenged) many other technologically advanced countries to undertake ambitious logic-based research projects in computer science. As a result of ICOT's international leadership and example, the computing world has been reminded not only of how central the role of logic has been in the past, as generation has followed generation in the modern history of computing, but also of how important a part it will surely play in the generations yet to come.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

PROGRAMS ARE PREDICATES

C.A.R. Hoare
Programming Research Group, Oxford University Computing Laboratory, 11 Keble Road, Oxford, OX1 3QD, England.

Abstract

Requirements to be met by a new engineering product can be captured most directly by a logical predicate describing all its desired and permitted behaviours. The behaviour of a complex product can be described as the logical composition of predicates describing the behaviour of its simpler components. If the composition logically implies the original requirement, then the design will meet its specification. This implication can be mathematically proved before starting the implementation of the components. The same method can be repeated on the design of the components, until they are small enough to be directly implementable. A programming language can be defined as a restricted subset of predicate notation, ensuring that the described behaviour may be efficiently realised by a combination of computer software and hardware. The restrictive notations give rise to a specialised mathematical theory, which is expressed as a collection of algebraic laws useful in the transformation and optimisation of designs. Non-determinism contributes both to reusability of design and to efficiency of implementation. This philosophy is illustrated by application to hardware design, to procedural programs and to PROLOG. It is shown that the procedural reading of logic programs as predicates is different from the declarative reading, but just as logical.

1 Inspiration

It is a great honour for me to address this conference which celebrates the completion of the Fifth Generation Computer Systems project in Tokyo. I add my own congratulations to those of your many admirers and followers for the great advances and many achievements made by those who worked on the project. The project started with ambitious and noble goals, aiming not only at radical advances in Computer Technology, but also at the direction of that technology to the wider use and benefit of mankind. Many challenges remain; but the goal is one that inspires the best work of scientists and engineers throughout the ages.

For my part, I have been most inspired by the philosophy with which this project approaches the daunting task of writing programs for the new generation of computers and their users. I have long shared the view that the programming task should always begin with a clear and simple statement of requirements and objectives, which can be formalised as a specification of the purposes which the program is required to meet. Such specifications are predicates, with variables standing for values of direct or indirect observations that can be made of the behaviour of the program, including both questions and answers, input and output, stimulus and response. A predicate describes, in a neutral symmetric fashion, all permitted values which those variables may take when the program is executed. The over-riding requirement on a specification is clarity, achieved by a notation of the highest possible modularity and expressive power. If a specification does not obviously describe what is wanted, there is a grave danger that it describes what is not wanted; it can be difficult, expensive, and anyway mathematically impossible to check against this risk. A minimum requirement on a specification language is that it should include in full generality the elementary connectives of Boolean Algebra: conjunction, disjunction, and negation (simple and, or, and not). Conjunction is needed to connect requirements, both of which must be met, for example,

• it must control pressure and temperature.

Disjunction is needed to allow tolerances in implementation

• it may deviate from optimum by one or two degrees.

And negation is needed for even more important reasons

• it must not explode!

As a consequence, it is possible to write a specification like P ∨ ¬P which is always true, and so describes every possible observation of every possible product. Such a tolerant specification is easy to satisfy, even by a program that gets into an infinite loop. In fact, such infinite failure will be treated as so serious that the tautologously true specification is the only one that it satisfies. Another inspiring insight which I share with the Fifth Generation project is that programs too are predicates. When given an appropriate reading, a program describes all possible observations of its behaviour under execution, all possible answers that it can give to any possible question. This insight is one of the most convincing justifications for the selection of logic programming as the basic paradigm for the Fifth Generation project. But I believe that the insight is much more general, and can be applied to programs expressed in other languages, and indeed to engineering products described in any meaningful design notation whatsoever. It gives rise to a general philosophy of engineering, which I shall illustrate briefly in this talk by application to hardware design, to conventional sequential programs, and even to the procedural interpretation of PROLOG programs. But it would be wholly invalid to claim that all predicates can be read as programs. Consider a simple but dramatic counter-example, the contradictory predicate P & ¬P, which is always false. No computer program (or anything else) can ever produce an answer which has a property P as well as its negation. So this predicate is not a program, and no processor could translate it into one which gives an answer with this self-contradictory property. Any theory which ascribes to an implementable program a behaviour which is known to be unimplementable must itself be incorrect.

A programming language can therefore be identified with only a subset of the predicates of the predicate calculus; each predicate in this subset is a precise description of all possible behaviours of some program expressible in the language. The subset is designed to exclude contradictions and all other unimplementable predicates; and the notations of the language are carefully restricted to maintain this exclusion. For example, predicates in PROLOG are restricted to those which are definable by Horn clauses; and in conventional languages, the restrictions are even more severe. In principle, these gross restrictions in expressive power make a programming language less suitable as a notation for describing requirements in a modular fashion at an appropriately high level of abstraction. The gap between a specification language and a programming language is one that must be bridged by the skill of the programmer. Given specification S, the task is to find a program P which satisfies it, in the sense that every possible observation of every possible behaviour of the program P will be among the behaviours described by (and therefore permitted by) the specification S. In logic, this can be assured with mathematical certainty by a proof of the simple implication

⊢ P ⇒ S.

A simple explanation of what it means for a program to meet its specification is one of the main reasons for interpreting both programs and specifications within the predicate calculus. Now we can explain the necessity of excluding the contradictory predicate false from a programming notation. It is a theorem of elementary logic that ⊢ false ⇒ S, so false enjoys the miraculous property of satisfying every specification whatsoever. Such miracles do not exist; which is fortunate, because if they did we would never need anything else, certainly not programs nor programming languages nor computers nor fifth generation computer projects.

2 Examples

A very simple example of this philosophy is taken from the realm of procedural programming. Here the most important observable values are those which are observed before the program starts and those which are observed after the program is finished. Let us use the variable x to denote the initial value and let x' be the final value of an integer variable, the only one that need concern us now. Let the specification say that the value of the variable must be increased

S = (x' > x).

Let the program add one to x

P = (x := x + 1).

The behavioural reading of this program as a predicate describing its effect is

P = (x' = x + 1),

i.e., the final value of x is one more than its initial value. Every observation of the behaviour of P in any possible initial state x will satisfy this predicate. Consequently the validity of the implication

⊢ x' = x + 1 ⇒ x' > x

will ensure that P correctly meets its specification. So does the program x := x + 7, but not x := 2 × x.

To illustrate the generality of my philosophy, my next examples will be drawn from the design of combinational hardware circuits. These can also be interpreted as predicates. A conventional and-gate with two input wires named a and b and a single output wire named x is described by a simple equation x = a ∧ b. The values of the three free variables are observed as voltages on the named wires at the end of a particular cycle of operation. At that time, the voltage on the output wire x is the lesser of the voltages on the input wires a and b. Similarly, an or-gate can be described by a different predicate with different wires, d = y ∨ c, i.e., the voltage on d is the greater of those on y and c. A simple wire is a device that maintains the same voltage at each of its ends, for example x = y. Now consider an assembly of two components operating in parallel, for example the and-gate together with the or-gate. The two predicates describing the two components have no variables in common; this reflects the fact that there is absolutely no connection between them. Consequently, their simultaneous joint behaviour consists solely of their two independent behaviours, and is correctly described by just the conjunction of the predicates describing their separate behaviours

(x = a ∧ b) & (d = y ∨ c).

This simple example is a convincing illustration of the principle that parallel composition of components is nothing but conjunction of their predicates, at least in the case when there is no possibility of interaction between them. The principle often remains valid when the components are connected by variables which they share. For example, the wire which connects x with y can be added to the circuit, giving a triple conjunction

(x = a ∧ b) & (x = y) & (d = y ∨ c).

This still accurately describes the behaviour of the whole assembly. The predicate is mathematically equivalent to

(d = (a ∧ b) ∨ c) & (x = y = (a ∧ b)).

When components are connected together in this way by the sharing of variable names (x and y), the values of the shared variables are usually of no concern or interest to the user of the product, and even the option of observing them is removed by enclosure, as it were, in a black box. The variables therefore need to be hidden or removed or abstracted from the predicate describing the observable behaviour of the assembly; and the standard way of eliminating free variables in the predicate calculus is by quantification. In the case of engineering designs, existential quantification is the right choice. A formal justification is as follows. Let S be the specification for the program P, and let x be the variable to be hidden in P. Clearly, one could never wish to hide a variable which is mentioned in the specification, so clearly x will not occur free in S. Now the designer's original proof obligation without hiding is

⊢ P ⇒ S,

and the proof obligation after hiding is

⊢ (∃x. P) ⇒ S.

By the predicate calculus, since x does not occur in S, these two proof obligations are the same. But often quantification simplifies, as in our hardware example, where the formula

∃x, y. (x = a ∧ b) & (y = x) & (d = y ∨ c)

reduces to just d = (a ∧ b) ∨ c. This mentions only the visible external wires of the circuit, and probably expresses the intended specification of the little assembly. Unfortunately, not all conjunctions of predicates lead to implementable designs. Consider for example the conjunction of a negation circuit (y = ¬x) with the wire (y = x), connecting its output back to its input. In practice, this assembly leads to something like an electrical short circuit, which is completely useless, or even worse than useless, because it will prevent proper operation of any other circuit in its vicinity. So there is no specification (other than the trivial specification true) which a short-circuited design can reasonably satisfy. But in our oversimplified theory, the predicted effect is exactly the opposite. The predicate describing the behaviour of the circuit is a self-contradiction, equivalent to false, which is necessarily unimplementable. One common solution to the problem is to place careful restrictions on the ways in which components can be combined in parallel by conjunction. For example, in combinational circuit design, it is usual to make a rigid distinction between input wires (like a or c) and output wires (like x or d). When two circuits are combined, the output wires of the first of them are allowed to be connected to the input wires of the second, but never
This restriction is the very one that turns a parallel composition into one of its least interesting special cases, namely sequential composition. This means that the computation of the outputs of the second component has to be delayed until completion of the computation of the outputs of the first component. Another solution is to introduce sufficient new values and variables into the theory to ensure that one can describe all possible ways in which an actual product or assembly can go wrong. In the example of circuits, this requires at least a three-valued logic: in addition to high voltage and low voltage, we introduce an additional value (written 1-, and pronounced "bottom"), which is observed on a wire that is connected simultaneously both to high voltage and to low voltage, i.e., a short circuit. We define the result of any operation on 1to give the answer 1-. Now we can solve the problem of the circuit with feedback, specified by the conjunction x = -'y & y =x In three-valued logic, this is no longer a falsehood: in fact it correctly implies that both the wires x and yare short circuited x = y = 1-. The moral of this example is that predicates describing the behaviour of a design must also be capable of describing all the ways in which the design may go wrong. It is only a theory which correctly models the possibility of error that can offer any assistance in avoiding it. If parallelism is conjunction of predicates, disjunction is equally simply explained as introducing non-determinism into specifications, designs and implementations. If P and Q are predicates, their disjunction (P V Q) describes a product that may behave as P or as Q, but does not determine which it shall be. Consequently, you cannot control or predict the result. If you want (P V Q) to satisfy a specification S, it is necessary (and sufficient) to prove both that P satisfies S and that Q satisfies S. This is exactly the defining principl~ of disjunction in the predicate calculus: it is the least upper bound of the implication ordering. This single principle encapsulates all you will ever need to know about the traditionally vexatious topic of non-determinism. For example, it follows from this principle that non-deterministic specifications are in general easier to implement, because they offer a range of options; but non-deterministic implementations are more difficult to use, because they meet only weaker specifications. Apart from conjunction (which can under certain restrictions be implemented by parallelism), and disjunction (which permits non-deterministic implementation), the remaining important operator of the predicate calculus is negation. What does that correspond to in programming? The answer is: nothing! Arguments about computahility show that it can never be implemented, because the complement of a recursively enumerable set is not in general recursively enumerable. A commonsense argument is equally persuasive. It would certainly be nice and easy to write a program that causes an explosion in the process which it is supposed to control. It would be nice to get a computer to execute the negation of this program, and so ensure that the explosion never occurs. Unfortunately and obviously this is impossible. Negation is obviously the right way to specify the absence of explosion, but it cannot be used in implementation. That is one of the main reasons why implementation is in principle more difficult than specification. 
Of course, negation can be used in certain parts of programs, for example, in Boolean expressions: but it can never be used to negate the program as a whole. We will see later that PROLOG negation is very different from the kind of Boolean negation used in specifications. The most important feature of a programming lan- . guage is recursion. It is only recursion (or iteration, which is a special case) that permits a program to be shorter than its execution trace. The behaviour of a program defined recursively can most simply be described by using recursion in the definition of the corresponding predicate. Let P(X) be some predicate containing occurrences of a predicate variable X. Then X can be defined recursively by an equation stating that X is a fixed point of P X ~ P(X). But this definition is meaningful only if the equation has a solution; this is guaranteed by the famous Tarski theorem, provided that P(X) is a monotonic function of the predicate variable X. Fortunately, this fact is guaranteed in any programming language which avoids nonmonotonic operators like negation. If there is more than one solution to the defining equation, we need to specify which one we want; and the answer is that we want the weakest solution, the one that is easiest to implement. (Technically, I have assumed that the predicate calculus is a complete lattice: to achieve this I need to embed it into set theory in the obvious way .) The most characteristic feature of computer programs in almost any language is sequential composition. If P and Q are programs, the notation (P,Q) stands for a program which starts like P; but when P terminates, it applies Q to the results produced by P. In a conventional programming language, this is easily defined in predicate notation as relational composition, using conjunction followed by hiding in exactly the same way as our earlier combinational circuit. Let x stand for an observation of the initial state of all variables of a program, and let x' stand for the final state. Either or both of these may take the special value 1-, standing for nontermination or infinite failure, which is one of the worst ways in which a program can go wrong. Each program is a predicate P(x,x') or Q(x,x'), describing a relation 215 between the initial state x and the final state x'. For example, there is an identity program II (a null operation), which terminates without making any change to its initial state. But it can do this only if it starts in a proper state, which is not already failed II ~f (X =I ..L => x' = X). Sequential composition of P and Q in a conventional language means that the initial state of Q is the same as the final state produced by P; however the value of this intermediate state passed from P to Q is hidden by existential quantification, so that the only remaining observable variables are the initial state of P and the final state of Q. More formally, the composition (P, Q) is a predicate with two free variables (x and x') which is defined in terms of P and Q, each of which are also predicates with two free variables (P, Q)(x, x') ~f 3y. P(x, y) & Q(y, x'). Care must be taken in the definition of the programming language to ensure that sequential composition never becomes self-contradictory. A sufficient condition to achieve this is that when either x or x' take the failure value ..L, then the behaviour of the program is entirely unpredictable: anything whatsoever may happen. 
The condition may be formalised by the statement that for all predicates P which represent a program \lx'.P(..L, x') Also, composition is associative; to follow the pair of operations (P, Q) by R is the same as following P by the pair of operations (Q, R) ((P,Q),R) 3 = (P, (Q,R)). PROLOG In its procedural reading, a PROLOG program also has an initial state and a result; and its behaviour can be described by a predicate defining the relation between these two. Of course this is quite different from the predicate associated with the logical reading. It will be more complicated and perhaps less attractive; but it will have the advantage of accurately describing the behaviour of a computer executing the program, while retaining the possibility of reasoning logically about its consequences. The initial state of a PROLOG program is a substitution, which allocates to each relevant variable a symbolic expression standing for the most general form of value which that variable is known to take. Such a substitution is generally called O. The result 0' of a PROLOG program differs from that of a conventional language. It is not a single substitution, but rather a sequence of answer substitutions, which may be delivered one after the other on request.. For example, the familiar PROLOG program append (X, Y, Z) and \Ix. P(x,..L) => \lx'.P(X, x'). The imposition of this condition does complicate the theory, and it requires the theorist to prove that all programs expressible in the notations of the programming language will satisfy it. For example, the null operation II satisfies it; and for any two predicates P and Q which satisfy the condition, so does their sequential composition (P, Q), and their disjunction P V Q, and even their conjunction (P " Q), provided that they have no variables in common. As a consequence any program written only in these restricted notations will always satisfy the required conditions. Such programs can therefore never be equivalent to false, which certainly does not satisfy these conditions. The only reason for undertaking all this work is to enable us to reason correctly about the properties of programs and the languages in which they are written. The simplest method of reasoning is by symbolic calculation using algebraic equations which have been proved correct in the theory. For example, to compose the null operation II before or after a program P does not change P. Algebraically this is expressed in a law stating that II is the unit of sequential composition (P, IT) = P = (IT, P). may be started in the state Z = [1,2]. It will then produce on demand a sequence of three answer states X X X [], Y [1], [1,2], Y Y [1,2] [2] [ ]. Infinite failure is modelled as before by the special state ..L; when it occurs, it is always the last answer in the sequence. Finite failure is represented by the empty sequence [ ]; and the program NO is defined as one that always fails in this way NO(O,O' ) d:1 (0 =I ..L => Of = [ ]). The program that gives an affirmative answer is the program YES; but the answer it gives is no more than what is known already, packaged as a sequence with only one element Y ES(O, 0' ) ~f (0 = .L => 0' = [0]). 216 A guard in PROLOG is a Boolean condition b applied to the initial state () to give the answer YES or NO b((), e') ~ V e' = [e] & (M) e' = [ ] & (oM). Examples of such conditions are VAR and NONVAR. 
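The answer-sequence reading just introduced can itself be sketched in PROLOG (my own encoding; it ignores the ⊥ case of infinite failure, and uses the arithmetic test integer/1 merely as a stand-in for a guard such as NONVAR):

    % A "program" maps an initial state Theta to a list of answer states.
    yes_prog(Theta, [Theta]).        % YES: one answer, the state as already known
    no_prog(_Theta, []).             % NO: finite failure, the empty sequence

    % guard(B, Theta, Answers): a Boolean test B on the initial state, giving
    % the answer of YES when it holds and of NO when it does not.
    guard(B, Theta, [Theta]) :- call(B, Theta), !.
    guard(_, _,     []).

    % ?- guard(integer, 42, As).      % As = [42]
    % ?- guard(integer, f(X), As).    % As = []

On this reading, the PROLOG connectives defined next become list operations: ';' appends the two answer sequences, and ',' concatenates the answer sequences produced by restarting its second argument in each answer of its first.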
The effect of the PROLOG or(P; Q) is obtained by just appending the sequence of answers provided by the second operand Q to the sequence provided by the first operand P; and each operand starts in the same initial state (P; Q)((), ()') ~f :lX, Y. P((), X) & Q((), Y) & append(X, Y, e). The definition of append is the same as usual, except for an additional clause which makes the result of infinite failure unpredictable append ([.1], Y, Z) append ([ ], Y, Y) append ([XIX s], Y, [XIZs]) :- append (Xs,Y,Zs). In all good mathematical theories, every definition should be followed by a collection of theorems, describing useful properties of the newly defined concept. Since NO gives no answer, its addition to a list of answers supply by P can make no difference, so NO is the unit of PROLOG semicolon NO;P = P = P;NO. and concat ([ ], [ ]) con cat ([XIXs]' Z) :- append (X, Y, Z) & concat(X s, Y) The idea is much simpler than its formal definition; its simplicity is revealed by the algebraic laws which can be derived from it. Like composition in a conventional language, it is associative and has a unit YES P, (Q,R) = (P, Q),R (YES,P) = P But if the first argument fails finitely, so does its composition with anything else (NO,P) = P; (Q; R). The PROLOG conjunction is very similar to sequential composition, modified systematically to deal with a sequence of results instead of a single one. Each result of the sequence X produced by the first argument P is taken as an initial state for an activation of the second argument Q; and all the sequences produced by Q are concatenated together to. give the overall result of the composition (P, Q)(e, e') ~f :lX, Y. pee, X) & & each (X, Y) con cat (Y, ()') where = NO. However (P, NO) is unequal to NO, because P may fail infinitely; the converse law therefore has to be weakened to an implication NO => (P,NO). Finally, sequential composition distributes leftward through PROLOG disjunction ((P; Q), R) = (P, R); (Q, R). But the complementary law of rightward distribution certainly does not hold. For example, let P always produce answer 1 and let Q always produce answer 2. When R produces many answers, (R, (P; Q)) produces answers Similarly, the associative property of appending lifts to the composition of programs (P; Q); R = (P,YES). 1,2,1,2, ... whereas (R, P); (R, Q) produces 1,1,1, ... ,2,2,2 .... Many of our algebraic laws describe the ways in which PROLOG disjunction and conjunction are similar to their logical reading in a Boolean algebra; and the absence of expected laws also shows clearly where the traditionallogical reading diverges from the procedural one. It is the logical properties of the procedural reading that we are exploring now. The acid test of our procedural semantics for PROLOG is its ability to deal with the non-logical features like the cut (!), which I will treat in a slightly simplified form. A program that has been cut can produce at most one result, namely the first result that it would have produced anyway each ([ ], [ ]) each ([XIXs], [YIY s]) '- Q(X, Y) & each (Xs, Ys) P!((), ()') ~f :lX. pee, X) & trunc (X, ()'). 217 The truncation operation preserves both infinite and finite failure; and otherwise selects the first element of a sequence trunc ([1-], Y) trunc ([ ], [ ]) trunc ([XIX s], [Xl). A program that already produces at most one result is unchanged when cut again P!!=P! If only one result is wanted from a composite program, then in many cases only one result is needed from its components (P; Q)! = (P!; Q!)! (P, Q)! = (P, Q!)! 
Finally, YES and NO are unaffected by cutting YES! = YES, NO! = NO. PROLOG negation is no more problematic than the cut. It turns a negative answer into a positive one, a non-negative answer into a negative one, and preserves infinite failure rv P((}, 0') *! 3Y. P((}, Y) & neg (Y,O') where A striking difference between PROLOG negation and Boolean negation is expressed in the law that the negation of an infinitely failing program also leads to infinite failure rv true = true. This states that true is a fixed point of negation; since it is the weakest of all predicates, there can be no fixed point weaker than it (J.LX. rv = X) true. This correctly predicts that a program which just calls its own negation recursively will fail to terminate. That concludes my simple account of the basic structures of PROLOG. They are all deterministic in the sense that (in the absence of infinite failure) for any given initial substitution (), there is exactly one answer sequence ()' that can be produced by the program. But the great advantage of reading programs as predicates is the simple way in which non-determinism can be introduced. For example, many researchers have proposed to improve the sequential or of PROLOG. One improvement is to make it commute like true disjunction, and another is to allow parallel execution of both operands, with arbitrary interleaving of their two results. These two advantages can be achieved by the definition (PIIQ)(O,(}') *! 3X, Y. P(O,X) & Q(O, Y) & inter (X, Y, 0') where the definition of interleaving is tedious but routine neg ([1-], Z) neg ([ ], [OJ) neg([XIXs],[ l). The laws governing PROLOG negation of truth values are the same as those for Boolean negation rv YES = NO and rv NO = YES. The classical law of double negation has to be weakened to intuitionistic triple negation rvrvrv P = rv P. Since a negated program gives at most one answer, cutting it makes no difference Finally, there is an astonishing analogue of one of the familiar laws of de Morgan rv (P; Q) = (rv P, rv Q). The right hand side is obviously much more efficient to compute, so this law could be very effective in optimisation. The dual law, however, does not hold. inter inter inter inter ([1-], Y, Z) inter (X, [1-], Z) inter (X, [ ], X) ([ ], Y, Y) ([XIXs]' Y, [XIZ] :- inter (Xs, Y, Z) (X, [YIY s], [YIZ]) :- inter (X, Y s, Z). Because appending is just a special case of interleaving, we know append(X, Y, Z) =} inter (X, Y, Z). Consequently, sequential or is just a special case of parallel or, and is always a valid implementation of it (P; Q) =} (PIIQ). The left hand side of the implication is more deterministic than the right; it is easier to predict and to control; it meets every specification which the right hand side also meets, and maybe more. In short, sequential or is in all ways and in all circumstances better than the parallel or - in all ways except one: it may be slower to implement on a parallel machine. In principle nondeterminism is demonic; it never makes programming easier, and its only possible advantage is an increase in performance. However, in many cases (including this one) non-determinism also simplifies specifications and 218 designs, and facilitates reasoning about them at higher levels of abstraction. My final example is yet another kind of disjunction, one that is characteristic of a commit operation in a constraint language. 
The answers given are those of exactly one of the two alternatives, the selection being usually non-deterministic; the only exception is the case when one of the operands fails finitely, in which case the other one is selected. So the only case when the answer is empty is when both operands give an empty answer

(P □ Q)(θ, θ') ≝ ((θ' = [ ]) & P(θ, θ') & Q(θ, θ')) ∨ ((θ' ≠ [ ]) & (P(θ, θ') ∨ Q(θ, θ'))) ∨ P(θ, ⊥) ∨ Q(θ, ⊥).

(The last two clauses are needed to satisfy the special conditions described earlier.) The definition is almost identical to that of the alternative command in Communicating Sequential Processes, from which I have taken the notation. It permits an implementation which starts executing both P and Q in parallel, and selects the one which first comes up with an answer. If the first elements of P and Q are guards, this gives the effect of flat Guarded Horn Clauses.

4 Conclusion

In all branches of applied mathematics and engineering, solutions have to be expressed in notations more restricted than those in which the original problems were formulated, and those in which the solutions are calculated or proved correct. Indeed, that is the very nature of the problem of solving problems. For example, if the problem is

• Find the GCD of 3 and 4

a perfectly correct answer is the trivially easy one

• the GCD of 3 and 4;

but this does not satisfy the implicit requirement that the answer be expressed in a much more restricted notation, namely that of numerals. The proponents of PROLOG have found an extremely ingenious technique to smooth (or maybe obscure) the sharpness of the distinction between notations used for specification and those used for implementation. They actually use the same PROLOG notation for both purposes, by simply giving it two different meanings: a declarative meaning for purposes of specification, and a procedural meaning for purposes of execution. In the case of each particular program the programmer's task is to ensure that these two readings are consistent. Perhaps my investigation of the logical properties of the procedural reading will assist in this task, or at least explain why it is such a difficult one. Clearly, the task would be simpler in a language in which the logical and procedural readings are even closer than they are in PROLOG. This ideal has inspired many excellent proposals in the development of logic and constraint languages. The symmetric parallel version of disjunction is a good example. A successful result of this research is still an engineering compromise between the expressive power needed for simple and perspicuous specification, and operational orientation towards the technology needed for cost-effective implementation. Such a compromise will (I hope) be acceptable and useful, as PROLOG already is, in a wide range of circumstances and applications. In the remaining cases, I would like to maintain as far as possible the inspiration of the Fifth Generation Computing project, and the benefits of a logical approach to programming. To achieve this, I would give greater freedom of expression to those engaged in formalisation of the specification of requirements, and greater freedom of choice to those engaged in the design of efficiently implementable programming languages. This can be achieved only by recognition of the essential dichotomy of the languages used for these two purposes. The dichotomy can be resolved by embedding both languages in the same mathematical theory, and using logical implication to establish correctness.
But what I have described is only the beginning, nothing more than a vague pointer to a whole new direction and method of research into programming languages and programming methodology. If any of my audience is looking for a challenge to inspire the next ten years of research, may I suggest this one? If you respond to the challenge, the programming languages of the future will not only permit efficient parallel and even non-deterministic implementations; they will also help the analyst more simply to capture and formalise the requirements of clients and customers, and then help the programmer, by systematic design methods, to exercise inventive skills in meeting those requirements with high reliability and low cost. I hope I have explained to all of you why I think this is important and exciting. Thank you again for this opportunity to do so.

Acknowledgements

I am grateful to Mike Spivey, He Jifeng, Robin Milner, John Lloyd, and Alan Bundy for assistance in preparation of this address.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

PANEL: A Springboard for Information Processing in the 21st Century

Chairman: Robert A. Kowalski
Imperial College of Science, Technology and Medicine
Department of Computing, 180 Queen's Gate, London SW7 2BZ, England
rak@doc.ic.ac.uk

In general terms, the question to be addressed by the panel is simply whether the Fifth Generation technologies, developed at ICOT and other centres throughout the world, will lead the development of information processing in the next century. Considered in isolation, the most characteristic of these technologies are:

• knowledge information processing applications,
• concurrent and constraint logic programming languages, and
• parallel computer architectures.

But it is the integration of these technologies, using logic programming to implement applications, and using multiple instruction, multiple data (MIMD) parallelism to implement logic programming, which is the most distinguishing characteristic of the Fifth Generation Project. To assess the future prospects of the Fifth Generation technologies, we need to consider the alternatives. Might multi-media, communications, or data processing, for example, be more characteristic than artificial intelligence of the applications of the future? Might object-orientation be more characteristic of the languages; and sequential, SIMD, MISD, or massively parallel connectionist computers be more typical of the computer architectures? Certainly many of these technologies have been flourishing during the last few years. Old applications still seem to dominate computing, at the expense of new Artificial Intelligence applications. Object-orientation has emerged as an alternative language paradigm, apparently better suited than logic programming for upgrading existing imperative software. Both conventional and radically new connectionist architectures have made rapid progress, while effective MIMD architectures are only now beginning to appear. But it may be wrong to think of these alternatives as competitors to the Fifth Generation technologies. Advanced database and data processing systems increasingly use Artificial Intelligence techniques for knowledge representation and reasoning. Increasingly many database and programming language systems have begun to combine features of object-orientation and logic programming.
At the level of computer architectures too, there seems to be a growing consensus that connectionism complements symbolic processing, in the same way that subsymbolic human perception complements higher-level human reasoning. But, because it provides the crucial link between applications and computer architectures, it is with the future of computer languages that we must be most concerned. The history of computer languages can be viewed as a slow but steady evolution away from languages that reflect the structure and behaviour of machines to languages that more directly support human modes of communication. It is relevant to the prospects of logic programming in computing, therefore, that logic programming has begun to have a great influence, in recent years, on models of human languages and human reasoning outside computing. This influence includes contributions to the development of logic itself, to the development of "logic grammars" in computational linguistics, to the modelling of common sense and non-monotonic reasoning in cognitive science, and to the formalisation of legal language and legal reasoning. Thus, if computer languages in the future continue to become more like human languages, as they have in the past, then the future of logic programming in computing, and the future impact of the Fifth Generation technologies in general, must be assured.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

Finding the Best Route for Logic Programming

Hervé Gallaire
GSI
25 Bd de l'Amiral Bruix, 75782 Paris Cedex 16, France
gallaire@gsi.fr

Abstract

The panel chairman has asked us to deal with two questions relating Logic Programming (LP) to computing. They have to do with whether LP is appropriate (the most appropriate?) as a springboard for computing as a whole in the 21st century, or whether it is so only for aspects (characteristics) of computing. I do not think that there is a definite answer to these questions until one discusses the perspective from which they are asked or from which their answer is to be given. In summary, we can be very positive that LP will play an important role, but only if it migrates into other leading environments.

1 Which Perspective To Look From

We are asked to talk about directions for the future, for research as well as for development. Clearly, for me, there will not be a Yes/No answer to the questions debated on this panel. I don't shy away, but at the same time there are too many real questions behind the ones we are asked to address. Thus, I will pick the aspects I am most closely connected to: research on deductive databases and constraint languages, and experience in commercial applications development and in building architectures for such commercial developments. Whether these different perspectives lead to a coherent picture is questionable. If I ask the question relative to computing as a whole, I can ask it from the perspective of a researcher, from that of a manufacturer, or from that of a buyer of such systems, and ask whether LP is the pervasive paradigm that will be the underlying foundation of computing as a whole from each of these perspectives. If I ask the question relative to the characteristics of computing, I can look at computing from the perspective of an end-user, of an application developer (in fact from many such applications, e.g. scientific, business, CAD, decision support, teaching, office support, ...
), of a system developer, of a language developer, of an architecture engineer, of a tool developer (again there are many such tools, e.g. software engineering, application analyst, etc). I can even look at it from the perspective of research in each of the domains related to the perspectives just listed, for example a researcher in user interface systems, a researcher in software engineering, in database languages, in knowledge representation, etc. But the picture is even more complicated than it appears here; indeed I can now add a further dimension to the idea of perspective elaborated upon here. Namely I can ask whether LP is to be seen as the "real thing" or whether it is to be an abstract model essentially. For example, ask whether it is a good encompassing model for all research aspects of computing, for some of them (the perspectives), whether it is a good abstract model for computations, for infonnation systems, for business models, even if they do not appear in this fonn to their users, this being asked for each type of computation carried out in a computing system. Looking at these questions is to study whether LP should be the basis of the view of the world as manipulated at each or some of the following levels: user's level, at system level, at application designer level, at research level, ... or whether it should only be a model of it, i.e. a model in which they basic problems of the world (at that level) are studied, and that the two would match only in some occasions. 2 Global Perspective I think we have to recognise that the world is defini tely never going to be a one level world (ie providing in hardware a direct implementation of the world view); second that the world view will be made of multiple views; third we have to' accept that different views will need different tools to study a version of a problem at that level; and fourth that it may be appropriate to use abstractions to get the appropriate knowledge into play. Consequently, neither LP nor any other paradigm will be the underlying foundation for computing; it is very appropriate however, for each paradigm to ask what its limits are. This is what has been my understanding of most projects around LP in the past ten to fifteen years; trying several angles, pushing to the limits. Developing hardware for example is one such worthwhile effort. 221 3 Model and Research Perspective As a model of computing, from a research perspective, LP will continue to develop as the major candidate for giving a "coherent" view of the world, a seamless integration of the different needs of a computing system for which it has given good models. To come to examples, I believe that LP has made major contributions in the following areas: rule-based programming, with particularly results on deductive databases, on problem solving and AI, specific logics for time and belief, sol utions to problems dealing with negation, to those dealing with constraint programming and to those dealing with concurrent programming. It will continue to do so for quite some time. In some cases it will achieve a dominant position; in others it will not, even if it remains a useful formalism. In the directions of research at ECRC, we have not attempted to get such a unified framework, even though we have tried to use whatever was understood in one area of research into the others (eg, constraints and parallelism, constraints and negation, .. ). LP will not achieve the status of being the unique encompassing model adopted by everyone. 
Indeed, there are theoretical reasons, which have to do with equivalence results and with human intelligence, that make it very unlikely that a given formalism will be accepted as the unique formalism to study. Further, there is the fact that the more we study, the more likely it is that we have to invent formalisms at the right level of abstraction for the problems at hand. Mapping these to existing formalisms is often possible but cumbersome. This has the side advantage that formalisms evolve as they target new abstractions; LP has followed that path.

4 Commercial Perspective

As a tool for computing in general, from a business or manufacturer's point of view, LP has not achieved the status that we believed it would. Logic has found, at best, some niches where it can be seen as a potential commercial player (there are many Prolog programs embedded in several CASE tools, for example; natural language tools are another example). When it comes to the industrial or commercial world, things are not so different from those in the academic or research world: the resistance to new ideas is strong too, although for different reasons. Research results being very often inconclusive when it comes to their actual relevance or benefits in practical terms, only little risk is taken. Fads play an important role in that world, where technical matters are secondary to financial or management matters; the object technology is a fad, but fortunately it is more than that and will bring real benefits to those adopting it. We have not explained LP in terms as easy to understand as is done in the object world (modularity and encapsulation in particular). The need to keep continuity with the so-called legacy applications is perhaps even stronger than fads. To propose a new computing paradigm to affect the daily work of any of the professionals (whether they develop new code or use it) is a very risky task. C++ is a C-based language; we have no equivalent to a Cobol-based logic language. And still, C++ is not a pure object-oriented language. The reason why the entity-relationship modeling technique (and research) is successful in the business place is that it has been seen as an extension of the current practices (Cobol, relational), not as a rupture with them. SQL is still far from incorporating extensions that LP can already provide but has not explained well: where are the industrial examples of recursive rules expressed in LP? What is the benefit (cost, performance, ...) of stating business rules this way as opposed to programming them? And without recursion, or with limited deductive capabilities, relational systems do without logic or just borrow from it; isn't logic too powerful a formalism for many cases? LP, just like AI-based technology, has not been presented as an extension of existing engines; rather it has been seen as an alternative to existing solutions, not well integrated with them; it has suffered, like AI, from that situation. Are there then new areas where LP can take a major share of the solution space? In the area of constraint languages, there is no true market yet for any such language, and the need for integration with existing environments is of a rather different nature; it may be sufficient to provide interfaces rather than integration. A problem not to be overlooked, however, is that when it is embedded in logic, constraint programming needs a complex engine, that of logic; when it is embedded in C, even if it is less powerful or if it takes more to develop it (hiding its logical basis in some sense), it will appear less risky to the industrial partners who will use or build it.

5 More Efforts Needed

Let me mention three areas where success can be reached, given the current results, but where more efforts are needed. Constraint-based packages for different business domains, such as transportation, job shop scheduling, personnel assignment, etc., will be winners in a competitive world; but pay attention to less ambitious solutions in more traditional languages. Case tools and repositories will heavily use logic-based tools, particularly the deductive database technology, when we combine it with the object-based technology for what each is good at. Third, and perhaps more importantly, there is a big challenge to be won. My appreciation of computing evolution is as follows: there will be new paradigms in terms of "how to get work done by a computer"; this will revolve around some simple notions: applications and systems will be packaged as objects and run as distributed object systems, communicating through messages and events; forerunners of these technologies can already be seen on the desktop (not as distributed tools), such as AppleEvents, OLE and VisualBasic, etc., and also in new operating systems or layers just above them (e.g. Chorus, CORBA, NewWave, etc.). Applications will be written in whatever formalism is most appropriate for the task they have to solve, provided they offer the right interface to the communication mechanism; these mechanisms will become standardised. I believe that concurrent and constraint logic languages have a very important role to play in expressing how to combine existing applications (objects, modules). If LP had indeed the role of a conductor for distributed and parallel applications, it would be very exciting; it is possible. Following a very similar analysis, I think that LP rules, particularly when they are used declaratively, are what is needed to express business rules relating and coordinating business objects as perceived by designers and users. To demonstrate these ideas, more people knowledgeable in LP need to work on industrial-strength products, packaging LP and its extensions and not selling raw engines; then there will be more industrial interest in logic, which in the longer term will trigger and guarantee adequate research levels. This is what I have always argued was needed as a follow-up of ECRC research, but I have not argued convincingly. This is what is being done with much enthusiasm by start-ups around Prolog, by others around constraint languages, e.g. CHIP; this is what is being started by BULL on such a deductive and object-oriented system. Much more risk taking is needed.

6 Conclusion

From the above discussion the reader may be left with a somewhat mixed impression as to the future of our field; this is certainly not the intent. The future is bright, provided we understand where it lies. LP will hold its rank in the research as well as in the professional worlds. The major efforts that the 1980's have seen in this domain have played an essential role in preparing this future. The Japanese results, as well as the results obtained in Europe and in the USA, are significant in terms of research and of potential industrial impact. The major efforts, particularly the most systematic one, namely the Fifth Generation Project, may have had goals either too ambitious or not thoroughly understood by many.
If we understand where to act, then there is a commercial future for logic. At any rate, research remains a necessity in this area. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 223 The Role of Logic Programming in the 21st Century Ross Overbeek Mathematics and Computer Science Division Argonne National Laboratory, Argonne, Illinois 60439 overbeek@mcs The Changing Role 1 Logic programming currently plays a relatively minor role in scientific problem-solving. Whether this role will increase in the twenty-first century hinges on one question: When will there be a large and growing number of applications for which the best available software is based on logic programming? It is not enough that there exist a few small, peripheral applications successfully based on logic programming; the virtues of logic programming must be substantial enough to translate into superior applications programs. 1.1 Applications Applications based on logic programming are starting to emerge in three distinct contexts: 1. Applications based on the expressi ve power of logic programming in the dialects that support backtracking and the use of constraints. These applications frequently center on small, but highly structured databases and benefit dramatically from the ability to develop prototypes rapidly. 2. Parallel applications in which the expressive power of the software environment is the key issue. These applications arise from issues of real-time control. As soon as the performance adequately supports the necessary response, the simplicity and elegance of the solution become most important. In this context, dialects of committed-choice logic programming have made valuable contributions. 3. Parallel applications in which performance is the dominant issue. In these applications, successful solutions have been developed in which the upper levels of the algorithm are all implemented in a committed-choice dialect, and the lower layers in C or Fortran. What is striking about these three contexts is that they have not been successfully addressed within a unified framework. Indeed, we are still far from achieving such a framework. 1.2 Unification of Viewpoints Will a successful unification based on logic programming and parallelism emerge as a dominant technology? Two distinct answers to this question arise: No, the use of logic programming will expand within the distinct application areas that are emerging. Logic programming will play an expanding role in the area of information processing (based on complex databases), which will see explosive growth in the Unix/C, workstation, mass software, and networking markets. On the other hand, logic programming will play quite a different role in the context of parallel computation. While an integration of the two roles is theoretically achievable, in practice it will not occur. Yes, a single technology will be adopted that is capable of reducing complexity. Developing such a technology is an extremely difficult task, and it is doubtful that integration could have proceeded substantially faster. Now, however, the fundamental insights required to achieve an integration are beginning to occur. The computational framework of the twenty-first century-a framework dominated by advanced automation, parallel applications, and distributed processing-must be based on a technology that allows simple software solutions. 
I do not consider these viewpoints to be essentially contradictory; there is an element of truth in each. It seems clear to me that the development and adoption of an integrated solution must be guided by attempts to solve demanding applications requirements. In the short run, this will mean attempts to build systems upon existing, proven technology. The successful development of a unified computational framework based on logic programming will almost certainly not occur unless there is a short-term effort that develops successful applications for the current computing market. However, the complex automation applications that will characterize the next century simply cannot be adequately addressed from within a computational framework that fails to solve the needs of both distributed computation and knowledge information processing. The continued development of logic programming will require a serious effort to produce new solutions to significant applications: solutions that are better than any existing solutions. The logic programming community has viewed its central goal as the development of an elegant computational paradigm, along with a demonstration that such a paradigm could be realized. Relatively few individuals have taken seriously the task of demonstrating the superiority of the new technology in the context of applications development. It is now time to change this situation. The logic programming community must form relationships with the very best researchers in important application areas, and learn what is required to produce superior software in these areas.

2 My Own Experiences

Let me speak briefly about my own experiences in working on computational problems associated with the analysis of biological genomes. Certainly, the advances in molecular biology are leading to a wonderful opportunity for mankind. In particular, computer scientists can make a significant contribution to our understanding of the fundamental processes that sustain life. Molecular biology has also provided a framework for investigating the utility of technologies like logic programming and parallel processing.

I believe that the first successful integrated databases to support investigations of genomic data will be based on logic programming. The reason is that logic programming offers the ability to do rapid prototyping, to integrate database access with computation, and to handle complex data. Other approaches simply lack the capabilities required to develop successful genomic databases. Current work in Europe, Japan, and America on databases to maintain sequence data, mapping data, and metabolic data all convinces me that the best systems will emerge from those groups who base their efforts on logic programming.

Now let me move to a second area: parallel processing. Only a very limited set of applications really requires the use of parallel processing; however, some of these applications are of major importance. As an example, let me cite a project in which I was involved. Our group at Argonne participated in a successful collaboration with Gary Olsen, a biologist at the University of Illinois, and with Hideo Matsuda of Kobe University. We were able to create a tool for inferring phylogenetic trees using a maximum likelihood algorithm, and to produce trees that were 30-40 times more complex than any reported in the literature. This work was done using the Intel Touchstone DELTA System, a massively parallel system containing 540 i860 nodes. For a number of reasons, our original code was developed in C. We created a successful tool that exhibited the required performance on both uniprocessors and larger parallel systems. We find ourselves limited, however, because of load-balancing problems, which are difficult to address properly in the context of the tools we chose. We are now rewriting the code using bilingual programming, with the upper levels coded in PCN (a language deriving much of its basic structure from committed-choice logic programming languages) and its lower levels in C. This approach will provide a framework for addressing the parallel processing issues in the most suitable manner, while allowing us to optimize the critical lower-level floating-point computations. This experience seems typical to me, and I believe that future systems will evolve to support this programming paradigm.

3 Summary

Balance between the short-term view and the longer-range issues relating to an adequate integration is necessary to achieve success for logic programming. The need to create an environment to support distributed applications will grow dramatically during the 1990s. Exactly when a solution will emerge is still in doubt; however, it does seem likely that such an environment will become a fundamental technology in the early twenty-first century. Whether logic programming plays a central role will depend critically on efforts in Japan, Europe, and America during this decade. If these efforts are not successful, less elegant solutions will become adopted and entrenched. This issue represents both a grand challenge and a grand opportunity. Which approach will dominate in the next century has not yet been determined; only the significance of developing the appropriate technology is completely clear.

Acknowledgments

This work was supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

Object-Based Versus Logic Programming

Peter Wegner
Brown University, Box 1910, Providence, RI 02912
pw@cs.brown.edu

Abstract: This position paper argues that mainstream application programming in the 21st century will be object-based rather than logic-based for the following reasons. 1) Object-based programs model application domains more directly than logic programs. 2) Object-based programs have a more flexible program structure than logic programs. 3) Logic programs can be intractable, in part because the satisfiability problem is NP-complete. 4) Soundness limits the granularity of thinking while completeness limits its scope. 5) Inductive, abductive, probabilistic, and nonmonotonic reasoning sacrifice the certainty of deduction for greater heuristic effectiveness. 6) Extensions to deductive logic like nonmonotonic or probabilistic reasoning are better realized in a general computing environment than as extensions to logic programming languages. 7) Object-based systems are open in the sense of being both reactive and extensible, while logic programs are not reactive and have limited extensibility. 8) The don't-know nondeterminism of Prolog precludes reactiveness, while the don't-care nondeterminism of concurrent logic programs makes them nonlogical.

1. Modeling Power and Computability

Object-based programs model application domains more directly than logic programs.
Computability is an inadequate measure of modeling capability sfnce all programming languages are equivalent in their computing power. A finer (more discriminating) measure, called "modeling power", is proposed that is closely related to "expressive power", but singles out modeling as the specific form 'of expressiveness being studied. Features of object-based programming that contribute to its modeling power include: • assignment and object identity Objects have an identity that persists when their state changes. Objects with a mutable state capture the dynamically changing properties of real-world objects more directly than mathematical predicates of logic programs. • data abstraction by information hiding Objects specify the abstract properties of data by applicable operations without commitment to a data representation. Data abstraction is a more relevant form of abstraction for modeling than logical abstraction. • messages and communication Messages model communication among objects more effectively than logic variables. The mathematical behavior of individual objects can be captured by algebras or automoata, but communication and synchronization protocols actually used in practical object-based and concurrent systems have no neat mathematical models. These features are singled out because they cannot be easily expressed by logic programs. Shapiro [Shl] defines the comparative expressive power (modeling power) of two languages in terms of the difficulty of mapping programs of one language into the other. Language Ll is said to be more expressive than language L2 if programs of L2 can be easily mapped into those ofLI but the reverse mapping is difficult (according to a complexity metric for language mappings). The specification of comparative expressive power in terms of mappings between languages is not entirely satisfactory. For example, mapping assembly languages into problem-oriented languages is difficult because of lack of design rather than quality of design. However, when applied to two well-structured language classes like object-based and logic languages this approach does appear promising. Since logic programs have a procedural interpretation with goal atoms as procedure calls and logic variables as shared communication channels, logic programming can be viewed as a special (reductive) style of procedure-oriented programming. Though language features like nondeterminism, logic variables, and partially instantiated structures are not directly modeled, the basic structure of logic programs is procedural. In contrast, object-oriented programs in Smalltalk or C++ do not have a direct interpretation as logic programs, since objects and classes cannot be easily modeled. Computational objects that describe behavior by collections of operations sharing a hidden state cannot be easily mapped into logic program counterparts. 2. Limitations of Inference and Nondeterministic Control All deduction follows from the principle that if an element belongs to a set then it belongs to any superset. The Aristotelian syllogism "All humans are mortal, Socrates is human, therefore Socrates is mortal" infers that Socrates belongs to the superset of mortals from the fact that he belongs to the subset of humans. This problem can be specified in Prolog as follows: Prolog clause: mortal(x) f- human(x). Prolog fact: human(Socrates). Prolog goal: mortal(Socrates). 
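The same syllogism can be run directly; the following is an illustrative transcription into concrete Prolog syntax (lowercase atom socrates and the uppercase variable X follow standard Prolog conventions, which the schematic notation above leaves implicit):

% The syllogism as an executable Prolog program.
mortal(X) :- human(X).    % every human is mortal
human(socrates).          % Socrates is human

% Query:  ?- mortal(socrates).
% succeeds, because socrates belongs to the subset (human)
% and hence to the superset (mortal).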
The clause "mortal(x) f- human(x)", which specifies that the set of mortals is a superset of the set of humans, allows the goal "mortal(Socrates)" to be proved from the fact "human(Socrates)" . A Prolog clause of the form "P(x) lfQ(x)" asserts that the . set of facts or objects satisfying Q is a subset of those satisfying P, being equivalent to the assertion "For all x, Q(x) implies P(x)". A Prolog goal G(x) is true if there are facts in the data- 226 base that satisfy G by virtue of the set/subset relations implied by- the clauses of the Prolog program. Prolog resolution and unification allows the subset of all database facts satisfying G to be found by set/subset inference. Inferences of the form ltset(x) if subset(x)" are surprisingly powerful, permitting all of mathematics to be expressed in terms of set theory. But the exclusive use of "set if subset" inference for computation and/or thinking is unduly constraining, since both computation and thinking go beyond mere classification. Thinking includes heuristic mechanisms like generalization and free association that go beyond deduction. Nondeterminism is another powerful computation mechanism that limits the expressive power of logic programs. Prolog nondeterminically searches the complete goal tree for solutions that satisfy the goal. In a Prolog program with a predicate P appearing in the clause head of N clauses "P(Ai) ~ Bi" , a goal peA) triggers nondeterministic execution of those bodies Bi for which A unifies with Ai. This execution rule can be specified by a choice statement of the form: choice (Ai/Bi, A2/B2, ... , AN/BN) endchoice nondeterministically execute the bodies Bi of all clauses for which the clause head P(Ai} unifies with the goal peA). Bodies Bi are guarded by patterns Ai that must unify with A for Bi to qualify for execution. This form of nondeterminism is called don't-know nondeterminism because the programmer need not predict which inference paths lead to successful inference of the goal. Prolog programs explore all alternatives until a successful inference path is found and report failure only if no inference path allows the goal to be inferred. The order in which nondeterministic alternatives are explored is determined by the system rather than by the user, though the user can influence execution order by the order of listing alternatives. Depth-first search may cause unnecessary nonterminating computation, while breadth-first search avoids this problem but is usually less efficient. Prolog provides mechanisms like the cut which allows search mechanism's to be tampered with. This extra flexibility undermines the logical purity of Prolog programs. Sequential implementation of don't-know nondeterminism requires backtracking from failed inference paths so that the effects of failed computations become unobservable. Since pro. grams cannot commit to an observable output until a proof is complete, don't-know nondeterminism cannot be used as a computational model for reactive systems that respond to external stimuli and produce incremental output [Sb2]. 3. Intactability and Satis6ability Certain well-formulated problems like the halting problem for Turing machines are noncomputable. Practical computability ~s further restricted by the requirement of tractability. A problem IS tractable if its computation time grows no worse than polynomially with its size and intractable if its computation time grows at least exponentially. 
The class P of problems computable in polynomial time by a deterministic Turing machine is tractable, while the class NP of problems computable in polynomial time by a nondeterministic Turing machine has solutions checkable in polynomial time though it may take an exponential time to find them [GJ]. The question whether P = NP is open, but the current belief is that NP contains inherently intractable problems, like the satisfiability problem, that are not in P. The satisfiability problem is NP-complete; a polynomial time algorithm for satisfiability would allow all problems in NP to be solved in polynomial time. The fundamental problem of theorem proving, that of finding whether a goal can be satisfied, is therefore intractable unless it turns out that P = NP. The fact that satisfiability is intractable is not unacceptable especially when compared to the fact that computability is undecidable. But in practice exponential blowup arises more frequently in logic programming than undecidability arises in traditional programming. Sometimes the intractability is inherent in the sense that there is no tractable algorithm that solves the problem. But in many cases more careful analysis can yield a tractable algorithm. Consider for example the sorting problem which can be declaratively specified as the problem of finding an ordered permutation. sort(x} :- permutation(x}, ordered(x}. Direct execution of this specification requires n-factorial steps to sort n elements, while more careful anlysis yields algorithms like quicksort that require only n logn steps. High-level specifications of a problem by logic programs can lead to combinatorially intractable algorithms for problems that are combinatorially tractable when more carefully analyzed. The complexity of logic problem solving is often combinatorially unacceptable even when problems do have a solution. The intractability of the satisfiability problem causes some problems in artificial intelligence to become intractable when blindly reduced to logic, and provides a practical reason for being cautiousin the use of logic for problem solving. 4. Soundness, Completeness and Heuristic Reasoning Soundness assures the semantic accuracy of inference, requiring all provable assertions to be true, while completeness guarantees inference power, requiring all true assertions to be provable. However, soundness strongly constrains the granularitv of thinking, while completeness restricts its semantic scope. Sound reasoning cannot yield new knowledge; it can only make implicit knowledge explicit. Uncovering implicit knowledge may require creativity, for example when finding whether P = NP or Fermat's last theorem. But such creativity generally requires insights and constructions that go beyond deductive reasoning. The design and construction of software may likewise be viewed as uncovering implicit knowledge by creative processes that transcend deduction. The demonstration that a given solution is correct may be formally specified by "sound" reasoning, but the process of finding the solution is generally not deductive. Human problem solvers generally make use of heuristics that sacrifice soundness to increase the effectiveness of problem solving. McCarthy suggested supplementing formal systems by a heuristic advice taker as early as 1960 [OR], but this idea has not yet been successfully implemented, presumably because the mechanisms of heuristic problem solving are too difficult to automate. 
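Returning briefly to the sorting example discussed above, the following sketch contrasts the declarative specification with a tractable refinement; permutation_sort/2, ordered/2 and partition_le/4 are illustrative names rather than code from the paper (select/3 is assumed from the usual list library):

% Declarative specification: an ordered permutation.
% Run directly, it may explore up to n! permutations.
permutation_sort(Xs, Ys) :- permutation(Xs, Ys), ordered(Ys).

permutation([], []).
permutation(Xs, [Y|Ys]) :- select(Y, Xs, Zs), permutation(Zs, Ys).

ordered([]).
ordered([_]).
ordered([X,Y|Zs]) :- X =< Y, ordered([Y|Zs]).

% A more careful analysis yields quicksort, roughly n log n on average.
quicksort([], []).
quicksort([P|Xs], Sorted) :-
    partition_le(Xs, P, Small, Large),
    quicksort(Small, S), quicksort(Large, L),
    append(S, [P|L], Sorted).

partition_le([], _, [], []).
partition_le([X|Xs], P, [X|Ss], Ls) :- X =< P, partition_le(Xs, P, Ss, Ls).
partition_le([X|Xs], P, Ss, [X|Ls]) :- X > P, partition_le(Xs, P, Ss, Ls).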
Heuristics that sacrifice soundness to gain inference power include inductive, abductive, and probabilistic forms of reasoning. Induction from a finite set of observations to a general law is central to empirical reasoning but is not deductively sound. Hume's demonstration that induction could not be justified by "pure reason" sent shock waves through nineteenth and twentieth century philosophy. Abductive explanation of effects by their potential causes is another heuristic that sacrifices soundness to permit plausible 227 though uncertain conclusions. Choice of the most probable explanation from a set of potential explanations is yet another form of unsound heuristic inference. Inductive, abductive, and probabilistic reasoning have an empirical justification that sacrifices certainty in the interests of common sense. Completeness limits thinking in a qualitatively different mann~r from soundness. Completeness constrains reasoning by commItment to a predefined (closed) domain of discourse. The requirement that all true assertions be provable requires a closed notion of truth that was shown by Godel to be inadequate for handling naturally occurring open mathematical domains like that of arithmetic. In guaranteeing the semantic adequacy of a set of axioms and rules of inference, completeness limits their semanti~ expressive~ess, making difficult any extension to capture a ncher semantIcs or refinement to capture more detailed semantic properties. Logic programs cannot easily be extended to handle nonformalized, and possibly nonformalizable, knowledge outside specific formalized domains. The notion of completeness for theories differs from that for logic; a theory is complete if it is sufficiently strong to determine the truth or falsity of all its primitive assertions. That is, if every ground atom of the theory is either true or false. Theories about observable domains are generally inductive or abductive generalizations from incomplete data that may be logically completed by uncertain assumptions about the truth or falsity of unobserved and as yet unproved ground atoms (facts) in the domain. For example, the closed-world assumption [GN] assumes that every fact not provable from the axioms is false. Such premature commitment to the falsity of nonprovable ground assertions may have to be revoked when new facts become known, thereby making reasoning based on the closedworld assumption nonmonotonic. Nonmonotonic reasoning is a fundamental extension that transforms logic into a more powerful reasoning mechanism. But there is a sense in which nonmonotonic reasoning violates the foundations of logic and may therefore be viewed as nonlogical. The benefits of extending logic to nonmonotonic reasoning must be weighed against the alternative of completely abandoning formal reasoning and adopting more empirical prinCiples of problem solving, like those of object-oriented programming. Attempts to generalize logic to nonmonotonic or heuristic reasoning, while intellectually interesting, may be pragmatically inappropriate as a means of increasing the power of human or computer problem solving. Such extensions to deductive logic are better realized in a general computing environment than as extensions to logic programming languages. Both complete logics and complete theories require an early commitment to a closed domain of discourse. 
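To make the closed-world point above concrete, here is a small illustrative Prolog fragment (not from the paper) using negation as failure; adding a new fact later withdraws a previously drawn conclusion, which is exactly what makes the reasoning nonmonotonic:

% Closed-world style rule: anything not provably abnormal is assumed normal.
flies(X) :- bird(X), \+ abnormal(X).

bird(tweety).

% ?- flies(tweety).   succeeds under the current database.
% Adding the fact  abnormal(tweety).  afterwards makes the same query
% fail, so the set of conclusions shrinks as knowledge grows.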
While the closed-world assumption yields a different form of closedness than that of logical completeness or closed application programs, there is a sense in which these forms of being closed are related. In the next section the term open system is examined to characterize this notion as precisely as possible. 5. Open Systems A system is said to be an open system if its behavior can easily be modified and enhanced, either by interaction of the system with the environment or by programmer modification. 1. A reactive (interactive) system that can accept input from its environment to modify its behavior is an open system. 2. An extensible system whose functionality and/or number of components can be easily extended is an open system. Our definition includes systems that are reactive or extensi- b~e or both, reflecting the fact that a system can be open in many dIfferent ways. Extensibility can be intrinsic by interactive system evolution or extrinsic by programmer modification. Intrin- sic extensibility accords better with biological evolution and :-v ith human leami.ng and development, but extrinsic extensibility IS the more practIcal approach to software evolution. The following characterization of openness explicitly focuses on this distinction: 1. A system that can extend itself by interaction with its environment is an open system. 2. A system that can be extended by' programmer modification (usually because of its modularity) is an open system. Since extrinsic extensibility is extremely important from the point of view of cost-effective life-cycle management, it is viewed as sufficient to qualify a system as being open. While either one of these properties is sufficient to qualify a system as being open, the most flexible open systems are open in both these senses. Object-oriented systems are open systems in both the first and second senses. Objects are reactive server modules that accept messages from their environment and return a result. Systems of objects can be statically extended by modifying the behavior of already defined objects or by introducing new objects. Classes facilitate the abstract definition of behavior shared among a collection of objects, while inheritance allows new behavior to be defined incrementally in terms of how it modifies already defined behavior. Classes have the open/closed property [Me]; they are open when used by subclasses for behavior extension by inheritance, but are closed when used by objects to execute messages. The idea of open/closed subsystems that are both open for clients wishing to extend them and closed for clients wishing to execute them needs to be further explored. Logic languages exhibiting don't-know nondeterminism are not open in the first sense, while soundness and completeness restrict extensibility in the second sense. To realize reactive openness concurrent logic languages abandon don't-know nondeterminism in favor of don't-care nondeterminism, sacrificing logical completeness. Prolog programs can easily be extended by adding clauses and facts so they may be viewed as open in the second sense. But logical extension is very different from object-based extensibility by modifying and adding objects and classes. Because object-based languages directly model their domain of discourse, object-based extensibility generally reflects incremental extensions that arise in practice more directly than logical extension. 6. 
Don't-Care Nondeterminism Don't-care nondeterminism is explicitly used in concurrent languages to provide selective flexibility at entry points to modules. It is also a key implicit control mechanism for realizing selective flexibility in sequential object-based languages. Access to an object with operations opJ, op2, ... , opN is controlled by an implicit nondetermnistic select statement of the form: select (op1, op2, ... ,opN) endselect Execution in a sequential object-based system is deterministic from the viewpoint of the system as a whole, but is non- 228 deterministic from the viewpoint of each object considered as an isolated system. The object does not know which operation will be executed next, and must be prepared to select the next executable operation on the basis of pattern matching with an incoming message. Since no backtracking can occur, the nondeterminism is don't care (committed choice) nondeterminism. Concurrent porgramming languages like CSP and Ada have explicit don't care nondeterminism realized by guarded commands with guards Gi whose truth causes the associated body Bi to become a candidate for nondeterministic execution: select (GIIIBI, G211B2, ... ,GNlIBN) endselect The keyword select is used in place of the keyword choice to denote selective don't care nondeterminism. while guards are separated from bodies by II in place of I. Guarded commands, originally developed by Dijkstra, govern the selection of alternative operations at entry points of concurrently executable tasks. For example, concurrent access to a buffer with an APPEND operation executable when the buffer is not full and a REMOVE operation executable when the buffer is not empty can be specified as follows: select (notjullIlAPPEND, notemptyliREMOVE) endselect Monitors support unguarded don't-care nondeterminism at the module interface. Selection between APPEND and REMOVE operations of a buffer implemented by a monitor has the following implicit select statement: select (APPEND, REMOVE) endselect The monitor operations wait and signal on internal monitor queues notfull and notempty play the role of guards. Monitors decouple guard conditions from nondeterministic choice, gaining extra flexibility by associating guards with access to resources rather than with module entry. Consider a: concurrent logic program with a predicate P appearing in the head of N clauses of the form "P(Ai) ~ GiIIBi". A goal P(A) triggers nondeterministic execution of those bodies Bi for which A unifies with Ai and the guards Gi are satisfied. This execution rule can be specified by a select statement of the form: select ((AI ;GI )IIBI, (A2;G2)IIB2, ... , (AN;GN)IIBN) endselect Bi is a candidate jor execution if A unifies with Ai and Gi is satisfied Since no backtracking can occur once execution has committed to a particular select alternative, the nodeterminism is don't-care nondeterminism. However, don't care nondeterminism in concurrent logic languages is less flexible than in objectbased languages because data abstraction and object-based message communication is not supported. Don't-care nondeterminism is useful in realizing reactive flexibility, but is neither necessary nor sufficient for concurrent systems. Concurrent nonreactive systems for very fast computations are commonplace, while sequential object-based systems are reactive but not nonconcurrent. Reactiveness and concurrency are orthogonal properties of computing systems. 
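As an illustrative sketch of the bank-account example above (my own transcription into sequential Prolog, with hypothetical predicate names account/2 and merge/3), the nondeterminism lives entirely in how the two client streams are merged:

% An account processes a stream of withdrawal requests.
account(_Balance, []).
account(Balance, [withdraw(Amount, Reply)|Rest]) :-
    (   Balance >= Amount
    ->  Reply = ok, NewBalance is Balance - Amount
    ;   Reply = insufficient, NewBalance = Balance
    ),
    account(NewBalance, Rest).

% merge/3 produces some interleaving of the two client streams;
% which client's withdrawal succeeds depends only on this arrival order.
merge([], Ys, Ys).
merge(Xs, [], Xs).
merge([X|Xs], Ys, [X|Zs]) :- merge(Xs, Ys, Zs).
merge(Xs, [Y|Ys], [Y|Zs]) :- merge(Xs, Ys, Zs).

% ?- merge([withdraw(75,R1)], [withdraw(75,R2)], Ms), account(100, Ms).
% One interleaving gives R1 = ok, R2 = insufficient; the other reverses them.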
Don't-care nondeterminism is primarily concerned with enhancing reactive flexibility and is not strictly necessary for concurrency. Nondeterministic selection is relatively complex because it combines merging of incoming messages from multiple sources with selection among alternative next actions by pattern matching. The essential nondeterminism in concurrent systems arises from uncertainty about the arrival order (or processing order) of incoming messages and is modeled by implicit nondeterministic merging of streams rather than by explicit selection. For example, the nondeterministic behavior of a bank account with $100.00 when two clients each attempt to withdraw $75.00 depends not on selective don't-care nondeterminism but simply on the arrival order of messages from clients. 7. Are Concurrent Logic Programs Nonlogical? Don't-care nondeterminism serves to realize reactive computations and also to keep the number of nondeterministic alternatives explored to a manageable size. But it may cause premature commitment to an inference path not containing a solution at the expense of paths that possibly contain solutions. Don'tcare nondeterminism is nonmonotonic since adding a rule may have the effect of preventing commitment to an already existing rule. Logic programs employing don't-care nondeterminism are incomplete in the sense that they may fail to prove true assertions that would have been derivable by don't-know nondeterminism from the same set of clauses. It becomes the responsibility of the programmer to make sure that programs do not yield different results for different orders of don't-care commitment. Under don't-care nondeterminism the result of a computation from a set of clauses depends on the order of don't-care commitment. This weakens the claim that concurrent logic languages are logical, reducing them to the status of ordinary programming languages. Clauses lose the status of inference rules, becoming mere computation rules. As hinted at in [Co], don't care nondeterminism takes the L out of LP, reducing logic programming to programming. The committed-choice inference paradigm loses the status of a proof technique and becomes a computational heuristic whose rules impose a rigid structure on both conceptualization and computation. Don't-know nondeterminism provides a computational model for logical inference, while don't-care nondeterminism models incremental, reactive computation, but sacrifices logical inference. Reactive systems are open systems in the sense that they may react to stimuli from the environment by returning results and changing their internal state. Objects are a prime example of reactive systems, responding interactively to messages they receive. The inability of don't-know nondeterminism to handle reactiveness is a serious weakness of both logic programs and deductive reasoning. The fundamental reason for this is the inability of inference systems to commit themselves to incremental output. While pure logic programming is incompatible with reactiveness it is definitely compatible with concurrency. The components of logical, expressions may be concurrently evaluated. Universal. and existential quantification, which is simply transfinite conjunction and disjunction, can be approximated by concurrent evaluation of components. Reactiveness is orthogonal to concurrency in the sense that concurrent nonreactive systems for very fast computations are commonplace, while sequential object-based systems are reactive but not nonconcurrent. 
However, reactive responsiveness is as important in large applications as concurrency. The identification of reactiveness and concurrency as independent goals of system design marks a step forward in our understanding of system requirements. The process interpretation of concurrent logic programs views goal atoms as processes and logic variables as streams. 229 The set of goals at any given point in the computation becomes a dynamic network of processes that may be reconfigured during every goal-reduction step. Every concu~ent lo~ic progr~ h~s a process interpretation, but concurrent obJect-onented appl~catIon programs cannot be directly mapped into concurrent 10~IC programs. Thus concurrent logic programs are less expressIVe t?an object-oriented programs in the sense of [Sh1]. LogIcal processes have no local state; they are atom~c predic~tes ~hose granularity cannot be adapted to the granulanty of o?Jects m th.e application domain. Concurrent logic progra"!s ~lve up theIr claim to be logical without gaining the commurucatlon and computation flexibility of traditional concurrent languages. 8. Are Multiparadigm Logic/Object Systems Possible? Can the object-based and logic programming paradigms be combined to capture both the decomposition and abstraction power of objects and the reasoning power of logic? Experience suggests that logic is not by itself a sufficient mech~ism for problem solving and that combining logical and nonl?glcal paradigms of problem solving is far harder than one mIght expect. Logic plays a greater role in verifying the correctness of programs than in their development and evolution. Finding a solution to a problem is less tractable than verifying the correctness or adequacy of an already given solution. For example, solutions of problems in NP can be verified in polynomial time but appear to require exponential time to find. Verification and validation is generally performed separately after a program (or physical engineering structure) has been constructed. The logic and object paradigms have different conceptual and computational models. Logic programs have a clausal inference structure for reasoning about facts in a database, while object-based programs compute by message passing among heterogeneous, loosely-coupled software components. Logical reasoning is top-down (from goals to subgoals), while objectbased design is bottom-up (from objects of the domain). Object-based programs lend themselves to development and evolution by incremental program changes that directly correspond to incremental changes of the modeled world. Inference rules provide less scope for incremental descriptive evolution, since rules for reasoning are not as amenable to change as object descriptions. ICOT's choice of logic programming as the vehicle for future computing contrasts with the US Department of Defense's choice of Ada. Because Ada was designed in the 1970s, when the technology of concurrent and distributed software components was still in a primitive state, it has design flaws in its module architecture. But its goals are squarely in the objectoriented tradition of model building based on abstraction. During the past 15 years we have accumulated much experience in designing object-oriented, distributed, and knowledge-based systems. The international computing community may well be ready for a major attempt to synthesize this experience in developing a standard architecture for distributed, intelligent problem solving in the 21st century. 
Such an architecture would be closer to the object-oriented than to the logic programming tradition. Next-generation computing architectures should try to synthesize the logic and object-oriented traditions, creating a multiparadigm environment to support the cooperative use of both abstraction and inference paradigms. For example, an object's operations could in principle be implemented as logic programs, though the use of Prolog as an implementation language for object interfaces presents some technological problems. Perhaps technological progress in the 21st century will resolve these problems so that multiparadigm environments can be developed, facilitating the cooperative application of both abstraction and inference paradigms.

Problem solving is a social process that involves cooperation among people, especially for large projects with a long life cycle. Decomposition of a problem into object abstractions is important both for cooperative software development and for incremental maintenance and enhancement. While object-oriented problem representation is not uniformly optimal for all problems, it does provide a robust framework for cooperative incremental software evolution for a much larger class of problems than logical representation.

The early optimism that artificial intelligence could be realized by a general problem solver gave way in the 1960s to an appreciation of the importance of domain-dependent knowledge representation. The debate concerning declarative versus procedural knowledge representation was resolved in the 1970s in favor of predicate calculus declarative representation. AI textbooks of the 1980s [CM, GN] advocate the predicate calculus as a universal framework for knowledge representation, with domain-dependent behavior modeled by nonlogical predicate symbols satisfying nonlogical axioms. The logic and network approaches to AI have competed for research funds since the 1950s [Gr], with the logic-based symbol system hypothesis dominating in the 1960s and 1970s and distributed pattern matching and connectionist learning networks staging a comeback in the late 1980s [RM]. The idea that intelligence evolves through learning is an appealing alternative to the view that intelligence is determined by logic, but attempts to realize nontrivial intelligence by learning have proved combinatorially intractable. Distributed artificial intelligence research [BG] and Minsky's The Society of Mind [Min] view problem solving as a cooperative activity among distributed agents, very much in the spirit of object-oriented programming. Ascribing mental qualities like beliefs, intentions, and consciousness to agents is likewise compatible with the object-oriented approach.

9. References

[BG] A. H. Bond and L. Gasser, Readings in Distributed Artificial Intelligence, Morgan Kaufmann, 1988.
[Co] J. Cohen, Introductory remarks for the special CACM issue on logic programming, CACM, March 1992.
[CM] E. Charniak and D. McDermott, Introduction to Artificial Intelligence, Addison-Wesley, 1984.
[GJ] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, 1979.
[GN] M. R. Genesereth and N. J. Nilsson, Logical Foundations of Artificial Intelligence, Morgan Kaufmann, 1987.
[Gr] The Artificial Intelligence Debate, edited by Stephen Graubard, MIT Press, 1988.
[Me] Bertrand Meyer, Object-Oriented Software Construction, Prentice-Hall International, 1988.
[Min] Marvin Minsky, The Society of Mind, Simon and Schuster, 1987.
[RM] D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing, MIT Press, 1986.
[Sh1] E. Shapiro, Separating Concurrent Languages with Categories of Language Embeddings, TR CS91-05, Weizmann Institute, March 1991.
[Sh2] E. Shapiro, The Family of Concurrent Logic Programming Languages, ACM Computing Surveys, September 1989.

Concurrent Logic Programming as a Basis for Large-scale Knowledge Information Processing

Koichi Furukawa
Institute for New Generation Computer Technology
4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan
furukawa@icot.or.jp

1 The Future of Information Processing

As the Fifth Generation Computer Systems project claims, the information processing field is pursuing knowledge information processing. Since the amount of information being produced is increasing rapidly, there is a growing need to extract useful information from it. The most important and promising technologies for information extraction are knowledge acquisition and machine learning. They include such activities as classification of information, rule acquisition from raw data, and summary generation from documents. For such activities, heavy symbolic computation and parallel symbolic processing are essential.

Combinatorial problems are another source of applications requiring heavy symbolic computation. Human genome analysis and inversion problems are examples of these problems. For example, in diagnosis, it is quite easy to forecast the symptoms given the disease. However, to identify the disease from the given symptoms is usually not so easy. We need to guess the disease from the symptoms and to verify the truth of that guess by further observation of the system. If the system is linear, then the inversion problem is simply to compute the matrix inverse. But, in general, there is no straightforward way to solve the inversion problem. There may be many candidates for any guess, and this becomes even worse when we take multiple faults into account. Note that abductive reasoning, one of the most important reasoning processes for open-ended problems, is also characterized as a general inversion formalism against deduction.

Cooperative problem solving (or distributed AI) is another important direction for future information processing. As in human society, one feasible way of dealing with large-scale problems is for a number of experts to cooperate. To exchange ideas between experts, mutual understanding is essential, for which we need complicated hypothetical reasoning to fill the gaps in terminology between them. These three examples show the need for heavy concurrent information processing in the field of knowledge information processing in the future.

2 The Role of Logic Programming

Logic programming provides a basic tool for representing and solving many non-trivial artificial intelligence problems.

1. As a knowledge representation tool, it can express situations without being limited to a closed world, as was believed until recently. The negation-by-failure rule makes it possible to express an open-ended world, which is essential for representing common sense and dealing with non-monotonic reasoning. Recently, a model theory for general logic programs, which contain negation-by-failure literals in the bodies of clauses, has been studied. The theory, called stable model semantics, associates a set of feasible models, natural extensions of least models, with each general logic program.
2. As an inference engine, logic programming provides a natural mechanism for computing search problems by automatic backtracking or by an OR-parallel search mechanism. Recent research results show the possibility of combining top-down and bottom-up strategies for searching.

3. As a syntactic tool for non-deductive inference, logic programming provides a formal and elegant formalism. Abduction, induction and analogy can be naturally formalized in terms of logic and logic programming. Inoue et al. [Inoue et al. 92] showed that abductive reasoning problems can be compiled into proof problems of first order logic. This means that non-deductive inference problems can be translated into deductive inference problems. Since abduction is a formalization of a kind of inversion problem, this provides a straightforward way to solve such problems.

There was a common belief that logic and logic programming had severe restrictions as tools for complex AI problems that require open-endedness. However, recent research results show that they are expressive enough to represent and solve such problems.

3 The Role of Concurrent Logic Programming

Concurrent logic programming is a derivative of logic programming and is good for expressing concurrency and executing in parallel. From a computational viewpoint, concurrent logic programming only supports AND-parallelism, which is essential for describing concurrent and cooperative activities. The reason why we adopted concurrent logic programming as our kernel language in the FGCS project is that we wanted simplicity in the design of our machine language for parallel processors. Since concurrent logic programming languages support only AND-parallelism, they are simpler than those languages which support both AND- and OR-parallelism. We succeeded in writing many useful and complex application programs in KL1, the extension of our concurrent logic programming language, FGHC, for practical parallel programming. These include a logic simulator and a router for VLSI-CAD, and a sequence alignment program in genome analysis. These experimental studies show the potential of our language and its parallel execution technology.

The missing computational scheme in concurrent logic programming is OR-parallelism. This comes from the very fundamental nature of concurrent logic programming languages, that is, the committed-choice mechanism. OR-parallelism plays an essential role in many AI problems because of the requirement for searching. A great deal of effort has been made to achieve OR-parallel searching in concurrent logic programming by devising programming techniques. We developed three methods for different applications: a continuation-based method for algorithmic problems, a layered stream method for parallel parsing, and a query compilation method for database problems. These three methods cover many realistic applications. Therefore, we have largely succeeded in recovering OR-parallelism. This means that there is a possibility of building parallel deductive databases in concurrent logic programming.

One of the most significant achievements using the query compilation method is a bottom-up theorem prover, MGTP [FujitaHasegawa 91]. This is based on the SATCHMO prover by [Manthey 88]. MGTP is a very efficient theorem prover which utilizes the full power of KL1 in a natural way by performing only one-way unification. SATCHMO has a restriction on the problems it can efficiently solve: range-restrictedness [Furukawa 92]. However, most real-life problems satisfy this condition and, therefore, it is very practical.
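The following Python sketch illustrates, in a deliberately simplified propositional form, the SATCHMO/MGTP style of bottom-up model generation: rules are applied forward, and a disjunctive head causes the candidate model to be split into cases. The rule encoding and the toy problem are illustrative assumptions, not the actual MGTP input language.

```python
# A rule is a pair (body, heads): if every atom of body is in the model,
# at least one disjunct of heads must hold.  An empty heads list acts as
# an integrity constraint and closes the branch.
def generate_models(rules, model=frozenset()):
    for body, heads in rules:
        if body <= model and not any(h in model for h in heads):
            if not heads:              # violated constraint: no model here
                return []
            models = []                # case-split on the disjunctive head
            for h in heads:
                models.extend(generate_models(rules, model | {h}))
            return models
    return [model]                     # all rules satisfied: a model is found

# Toy propositional problem (hypothetical):
rules = [
    (frozenset(), ["p"]),              # true -> p
    (frozenset({"p"}), ["q", "r"]),    # p -> q v r
    (frozenset({"q"}), []),            # q -> false
]
print(generate_models(rules))          # one model containing p and r
```

In the real provers the rules are first-order, and range-restrictedness guarantees that forward application only ever produces ground atoms, which is what lets MGTP get by with one-way unification; the case-splitting branches are also a natural source of OR-parallelism in KL1.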
We succeeded in computing abduction, which was translated into a theorem-proving problem in first order logic, by using MGTP. We succeeded in solving a very important class of inversion problems in parallel on our parallel inference machine, PIM.

4 Conclusion

Concurrent logic programming gained its expressive power for concurrency at the sacrifice of Prolog's search capability. By devising programming techniques we have finally almost recovered the lost search capability. This means that we now have a very expressive parallel programming language for a wide range of applications. As an example, we have shown that the technique enabled realization of an efficient parallel theorem prover, MGTP. We have also shown success in deductively solving an important class of inversion problems, formulated as abduction, with the theorem prover. Our research results indicate that our concurrent logic programming and parallel processing based technologies have great potential for solving many complex future AI problems.

References

[FujitaHasegawa 91] H. Fujita and R. Hasegawa, A Model Generation Theorem Prover in KL1 Using a Ramified-Stack Algorithm. In Proc. of the Eighth International Conference on Logic Programming, Paris, 1991.
[Furukawa 92] K. Furukawa, Summary of Basic Research Activities of the FGCS Project. In Proc. of FGCS'92, Tokyo, 1992.
[Inoue et al. 92] K. Inoue, M. Koshimura and R. Hasegawa, Embedding Negation as Failure into a Model Generation Theorem Prover. To appear in CADE-11: The Eleventh International Conference on Automated Deduction, Saratoga Springs, NY, June 1992.
[Manthey 88] R. Manthey and F. Bry, SATCHMO: A Theorem Prover Implemented in Prolog. In Proc. of CADE-88, Argonne, Illinois, 1988.

Knowledge Information Processing in the 21st Century

Shunichi Uchida
Institute for New Generation Computer Technology
4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan
uchida@icot.or.jp

1 A New Research Platform

Here in the last decade of the 20th century, the beginning of the 21st century is close enough for us to forecast the kinds of changes that will happen in new computer technologies and in the market, and to predict what kinds of research fields will be the most important. I would like to try to forecast what will happen to parallel processing and knowledge information processing (KIP) based on my experience in the FGCS project. It is quite certain that the following two events will happen:

1. Large-scale parallel hardware will be used for large-scale problem solving.
2. Symbolic processing and knowledge information processing applications will be extended greatly.

However, it is not so obvious whether these two events will be effectively combined or will remain separate. The key technology is new software technology to enable us to efficiently produce large and complex parallel programs, especially for symbolic and knowledge processing applications. If this parallel software technology is provided together with large-scale parallel hardware, a very large change will happen in the market in the 21st century. I think that the FGCS project has developed the kernel of this key technology and shown that these two events will surely be combined.
In the FGCS project, we proposed the hypothesis that the logic programming language family would be superior to any other language family in exploiting new software technology and applications, especially for symbolic and knowledge information processing. The first step in proving this hypothesis was to show that the above two events can be smoothly combined by logic programming. We decided to design and implement a logic language on large-scale parallel hardware.

In designing and implementing this logic language, the most important problem was to find an efficient method to realize the following two very complex mechanisms:

1. An automatic process synchronization mechanism based on a dataflow model
2. An automatic memory management mechanism including an efficient garbage collection method for distributed memories

These mechanisms greatly reduce the burden of parallel programming and are indispensable for implementing not only a parallel logic language but also any other high-level language, including functional languages such as a parallel version of LISP. We have developed a parallel logic language, KL1, its language processor and programming environment, and a parallel operating system, PIMOS. These are now implemented on parallel inference machine hardware, PIM hardware, which connects up to 512 processing elements. We have also developed a parallel DBMS called Kappa-P on PIMOS. We call all of these software systems FGCS basic software. Through the development of experimental parallel application systems using this basic software, we have already found that we can efficiently produce parallel programs which make full use of the power of parallel hardware.

This basic software is now available only on PIM hardware, which has some hardware support, such as tag handling support and a large-capacity main memory, to make KL1 programs run faster. However, recently, it has been announced that many interesting parallel hardware systems are to appear in the market as high-end supercomputers aiming at large-scale scientific calculations. Some of them have an MIMD architecture and employ a RISC-type general-purpose microprocessor as their processing element. It is certain that the performance and memory capacity of these processing elements will increase in the next few years. At that stage, it will be possible to implement the FGCS basic software on this MIMD parallel hardware and obtain reasonable performance for symbolic and knowledge processing applications. If this is implemented, this parallel hardware will have a high-level parallel logic programming environment combined with a conventional programming environment. This new environment should provide us with a powerful and widely usable common platform to exploit knowledge information processing technology.

2 KIP R&D in the 21st Century

2.1 Knowledge representation and knowledge base management

The first step to proving the hypothesis that the logic language family is the most suitable for knowledge information processing is to obtain a new platform for further research into knowledge information processing. For this step, a low-level logic language, namely KL1, was developed. The second step is to show that a logic language will exploit new software technology to handle databases and natural knowledge bases. The key technology in this step will be knowledge representation and knowledge base management technology.
Using a logic language as the basis for knowledge representation, it should be a natural consequence that the knowledge representation language has the capability of performing logical deduction. Users of the language will consider this capability desirable for describing knowledge fragments, such as various rules in our social systems and constraints in various machine designs. The users may also want the language to have an object-oriented modeling capability and a relational database capability as built-in functions. Currently, we do not have good criteria for combining and harmonizing these important concepts and models to realize a language having such rich functions for knowledge representation. The richness of these language capabilities will always impose a heavy overhead on its language processor. The language processor in this case is a higher-level inference engine built over a database management system. It is interesting to see how much the processing power of parallel hardware will compensate for this overhead.

In the FGCS project, we developed a database management system, Kappa-II, based on the nested relational model. It was first implemented on a sequential inference machine, PSI. Now, its parallel version, Kappa-P, written in KL1, has been built on the PIM hardware. Over Kappa-P, we have designed a knowledge representation language, Quixote, and a KBMS based on the deductive and object-oriented model. Its first implementation has been completed and is now under evaluation. Quixote is one of the high-level logic languages developed over KL1. These evaluation results should provide very interesting data for forecasting database research at the beginning of the 21st century.

Another high-level logic language developed in the FGCS project is a parallel constraint logic programming language, GDCC. GDCC has a constraint solver in its language processor which can be regarded as an inference engine dedicated to algebraic problem solving. Another kind of inference engine is a parallel theorem prover for first order logic, called a model generation theorem prover, MGTP. This prover is now used as the kernel of a rule-based reasoner in a law expert system, also known as the legal reasoning system HELIC-II. These logic languages and inference engines will be further developed during this decade. They will be implemented on large-scale parallel hardware and will be used as important components to organize a new platform to build a knowledge programming environment in the first decade of the 21st century.

2.2 Knowledge programming and knowledge acquisition

The third step to proving the hypothesis is to show that a knowledge programming environment based on logic programming will work efficiently for building knowledge bases, namely, the contents of a KBMS. Knowledge programming is a programming effort to translate knowledge fragments into internal knowledge descriptions that are kept and used in a KBMS. This process may be regarded as a conversion or compiling process from "natural" knowledge descriptions, which exist in our society for us to work with, into "artificial" knowledge descriptions, which can be kept in the KBMS and used efficiently by application systems such as expert systems. If this process is done almost automatically by some software with a powerful inference engine and knowledge base, it is called "knowledge acquisition". Some people may call it "learning".
In human society, we have many "natural" knowledge bases such as legal rules and cases, medical care records, design rules and constraints, equipment manuals, language dictionaries, various business documents, and rules and strategies for game playing. They are too abstract and too context-dependent for us to translate them into "artificial" knowledge descriptions. In the FGCS project, we developed several experimental expert systems such as a natural language processing system, a legal reasoning system, and a Go playing system. We have learned much about the problems of how to code or program a "natural" knowledge base, how to structure knowledge fragments so that they can be used in application programs, and so on. We have also learned that there is a big gap between the level of "natural" knowledge descriptions and that of the "artificial" knowledge descriptions which current software technology can handle. We were forced to realize again that "natural" knowledge bases have been built not for computers but for human beings. The existence of this large gap means that current computer technology is not intelligent enough to accept such knowledge bases.

It is obvious that more research effort is needed to build much more powerful inference engines that will provide us with much higher-level logical reasoning functions based on formal and informal models such as CBR, ATMS and inductive inference. In parallel with this effort, we have to find some new methods of preprocessing "natural" knowledge descriptions to obtain more well-ordered forms and structures for "artificial" knowledge bases. For example, we have to create new theories or modeling techniques to explicitly define context-dependent information hidden behind "natural" knowledge descriptions. Situation theory will be one of these theories. It is interesting to see how these powerful inference engines will relate to knowledge representation languages and knowledge structuring methods. Another interesting question will be to what extent the power of larger-scale parallel hardware and parallel software technology will make these higher-level inference functions practical for real applications. It is certain that research into knowledge information processing will continue to advance in the 21st century, opening many new research fields as it advances and leaving a large growing market behind it.

ICOT SESSIONS

LSI-CAD Programs on Parallel Inference Machine

Hiroshi Date†, Kazuo Taki†, Yukinori Matsumoto†, Hiroo Kato‡, Koichi Kimura†, Masahiro Hoshi‡
†Institute for New Generation Computer Technology
1-4-28, Mita, Minato-ku, Tokyo 108, Japan
{date, yumatumo, kokimura, taki}@icot.or.jp
‡Japan Information Processing Development Center
3-5-8, Shibakouen, Minato-ku, Tokyo 105, Japan
{j-kato, hoshi}@icot21.icot.or.jp

Abstract

This paper presents three kinds of parallel LSI-CAD systems developed at ICOT and describes their experimental results on a parallel inference machine. These systems are routing, placement and logic simulation. All of them are implemented in KL1, a concurrent logic language, and executed on the Multi-PSI, a distributed-memory machine with 64 processors. We regard our parallel inference machines as high-performance general-purpose machines. We show programming techniques to derive high performance on parallel inference machines.
The common objectives of these systems are, firstly, to provide speedup by extracting major parallelism, and, secondly, to show the applicability of our hardware and language system to practical applications. For this reason, our systems are evaluated using real LSI chip data. The key features are, in the routing system, concurrent object modeling of routing problems to realize a lot of concurrency; in the placement system, time-homogeneous parallel simulated annealing to optimize placement results; and in the logic simulation system, the Time Warp mechanism as a time-keeping mechanism for simulations. Experimental results of these systems show that these techniques are effective for parallel execution on large-scale MIMD machines with a distributed memory structure, like the parallel inference machines.

1 Introduction

A parallel computer system, PIM (the Parallel Inference Machine), one of the goals of the Japanese Fifth Generation Computer Systems project, has been completed, and its evaluation is starting. PIM has been developed mainly to target high-performance knowledge information processing. Since most problems in this domain are of an extremely large size, exploiting the whole power of parallel machines is important. In practice, however, it is not easy to derive their maximum power because of the non-uniformity of computation, that is, parallel computation that changes dynamically over time and space. In order to run programs efficiently on PIM, the following are important. First is to adopt good concurrent algorithms. Second is to design programs based on programming paradigms that realize high parallelism. And last is to use effective load distribution techniques, including processor mapping. We aimed at gaining experience with these techniques through large-scale practical application experiments on PIM.

PIM is an inference machine; however, its applicability should not be limited to knowledge information processing. From the viewpoint that PIM is a high-performance general-purpose machine, we chose LSI-CAD as one of the application fields. Nowadays, LSI-CAD is indispensable for LSI design. The integration density of LSI chips has increased exponentially with the progress of semiconductor process technology. The quality of LSIs depends on the performance of LSI-CAD tools. Therefore, higher performance is required. Besides, the flexibility of the tools must be kept for a variety of demands. Using hardware accelerators is one possible way of obtaining faster tools; however, it usually results in a sacrifice of flexibility. A likely alternative is to parallelize software tools. This certainly satisfies the above two requirements: making the tools faster and keeping their flexibility.

We focused on three stages of LSI-CAD: logic simulation, placement and routing, which are currently the most time-consuming in LSI design. Each system has the following features. The routing system finds paths based on the lookahead line search algorithm [Kitazawa 1985]. This algorithm provides high-quality solutions; however, it was originally proposed with the assumption of sequential execution. We introduced a new implementation method for a parallel router based on the concurrent objects model, and improved the basic algorithm to make it suitable for parallel execution. The concurrent objects model is expected to derive a lot of parallelism among small-granularity processes. We investigated the description complexity and overhead of our routing programs.
Its performance (real speed, speedup and wiring rate) was also evaluated in comparison with a sequential router on a general-purpose computer, using real LSI chip data.

The cell placement problem is a combinatorial optimization problem. Simulated annealing (SA) is a powerful algorithm for solving such problems. Cooling schedules are important for efficient execution of SA. In our placement system, the time-homogeneous parallel SA algorithm [Kimura et al. 1991] was adopted. This algorithm constructs appropriate cooling schedules automatically. We evaluated the quality of solutions in our system using MCNC benchmark data.

Logic simulation is an application of discrete event simulation. The key to its efficient execution in parallel is keeping time correctness without large overheads. We adopted the Time Warp mechanism (TW) as the time-keeping mechanism. TW has been considered to incur large rollback overheads; however, it has not been evaluated in detail yet. We not only improved the rollback process but also added some devices so that TW would become an efficient time-keeping mechanism. A Cascading-Oriented Partitioning strategy for partitioning circuits is also proposed to attain good load distribution. We evaluated our system on speedup and real speed (events/sec) as compared with systems that had other time-keeping mechanisms (Conservative and Time Wheel), using ISCAS'89 benchmark data.

These systems were implemented in KL1 [Chikayama et al. 1988, Ueda et al. 1990], a concurrent logic language, and have been experimented with on the Multi-PSI/V2 [Nakajima et al. 1989, Taki 1988], a prototype of PIM.

This paper is organized as follows: The routing system is described in Section 2. A routing algorithm based on the concurrent objects model and its implementation are presented in detail. Section 3 explains the placement system. The time-homogeneous SA algorithm is introduced and optimization in the implementation is explained. Section 4 overviews the logic simulator and reports on its evaluation. Our conclusion is given in Section 5.

2 Routing System

2.1 Background

There have been many trials to realize high-speed, good-quality router systems with parallel processing. These trials can be classified into two areas. One is the hardware engine which executes a specified routing algorithm efficiently [Kawamura et al. 1990, Nair et al. 1982, Suzuki et al. 1986]. The other is concurrent routing programs implemented on general-purpose parallel machines [Brouwer 1990, Olukotun et al. 1987, Rose 1988, Watanabe et al. 1987, Won et al. 1987]. The former approach can realize very high speeds, while the latter can provide large flexibility. We took the latter approach to realize both a high-speed and a flexible router system, targeting very large MIMD computers. In general, a lot of parallelism is needed to feed a large MIMD computer. So, we propose a completely new parallel routing method, based on a small-granularity concurrent objects model. The routing method was implemented on the distributed-memory machine, Multi-PSI, with the logic programming language KL1. We made preliminary evaluations of the new router, from the viewpoints of (1) data size vs. efficiency, (2) wiring rate vs. parallelism, and (3) comparison of execution speed with general-purpose computers.
This section contains the following: a programming paradigm based on the concurrent objects model, a router program with an explanation of the concurrent algorithm and its implementation, problems in parallelization, and preliminary measurements and evaluation results.

2.2 Programming Paradigm

Formalizing a problem based on the concurrent objects model is one of the most promising ways to embed parallelism in a given problem. This section describes our methodology for designing parallel programs, from problem formalization to parallel execution. We also show coding samples in the KL1 language.

Figure 1 shows the flow of parallel program design. Firstly, a given problem is formalized based on the concurrent objects model. That is, many objects construct a solution cooperatively by exchanging messages. At the same time, a concurrent algorithm is designed upon the model. Sometimes, the algorithm is a distributed algorithm. Through this design phase, the activities of the objects corresponding to messages are defined. Then, each object is implemented as a KL1 process. The process connection topology is decided based on input data. Usually, a much larger number of processes than processors is needed to get good load balance. Logical concurrency (the possibility of parallel processing) is designed through this flow. Secondly, the processes which exchange messages frequently are grouped to increase communication locality. When each process has a large computational amount (large granularity) and a low communication rate, this phase can be omitted. Then, the groups are assigned to processors and executed. This is called mapping. Physical parallelism is realized in this phase.

[Figure 1: Program design paradigm based on the concurrent objects model — a problem is formalized by object-oriented modeling with a concurrent (distributed) algorithm, giving a process structure (logical concurrency), which is then grouped and mapped onto processors (physical parallelism)]

The KL1 language system allows independent descriptions of the problem-solving part (logical concurrency) and the mapping part (physical parallelism) of a program. Performance tuning of parallel processing can be done only by changing the mapping part, not by changing the problem-solving algorithm. The KL1 language is quite suitable for describing concurrent objects. Processes representing the objects are written as self-recursive calls in the KL1 language. These processes can communicate with each other through message streams. Figure 2 shows a coding sample of an object. The functions of an object are defined with a set of clauses. Each clause corresponds to a message which the object receives.

[Figure 2: Implementation of a concurrent object in KL1 — each clause matches one message, renews the interior state variables, and may output messages to other objects on its stream variable]
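The pattern shown in Figure 2 can be mimicked by a deliberately sequential Python sketch: a concurrent object is a self-recursive loop over its message stream, with one handler clause per message type, its own interior state, and an output stream to other objects. The names (`counter`, the message tags) are illustrative only; real KL1 processes run concurrently and synchronize on unbound stream variables.

```python
from collections import deque

def counter(in_stream, out_stream, count=0):
    """A 'concurrent object' as a recursive loop: one clause per message,
    renewing the interior state and emitting messages on the output stream."""
    if not in_stream:                       # empty stream: the object terminates
        return
    msg, *rest = in_stream
    if msg == "up":                         # clause for message 'up'
        counter(rest, out_stream, count + 1)
    elif msg == "show":                     # clause for message 'show'
        out_stream.append(("count", count)) # send current state downstream
        counter(rest, out_stream, count)

out = deque()
counter(["up", "up", "show", "up", "show"], out)
print(list(out))   # [('count', 2), ('count', 3)]
```

Grouping many such objects and assigning the groups to processors corresponds to the mapping phase described above.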
2.3 Router Program

We used the lookahead line search method [Kitazawa 1985] as a basic algorithm. Then we reconstructed the algorithm for highly parallel execution, taking the concurrent objects model as a basic design framework.

2.3.1 Basic algorithm

The lookahead line search method is one of the line search algorithms coupled with a lookahead operation. It is, if you like, a sort of hill-climbing algorithm, looking for a good route. The algorithm also has two features. One is to escape from the local optimum point with the help of the Inhibited Expected Point (IEP) flag. The other is backtracking to retrace bad routes and to retry searching. The algorithm guarantees connection between a start point and a target point when paths exist between them.

2.3.2 Concurrent routing algorithm

In KL1 programming, an execution unit is a process corresponding to an object. Since the line search algorithm decides a route line by line, we designed the concurrent algorithm so that objects (processes) correspond to every line segment on a routing grid. Line processes exchange messages with each other to look for a good route. Each line process maintains the corresponding line's status and is, at the same time, the execution entity of the search. As Figure 3 shows, a process corresponds to each grid line (master line process) and to each line segment (line process) on it. A master line process manages the line processes on the same grid line and passes messages between those line processes and crossing line processes.

[Figure 3: Master line processes and line processes]

The routing procedure of one net is almost the same as that of the basic algorithm, except that the procedure is broken down into a sequence of messages and their operations are executed among processes. Computing the best expected point is done as follows. The expected point is the closest location to the goal on a line segment. The distance to the goal is used as the cost function in the hill-climbing method. When a line process receives a routing request message with information on a goal point, it changes its status to "under searching". Then, it sends request messages for calculation of expected points to the line processes that cross it (Figure 4). Thus computation of the expected point is executed concurrently on each line process that receives the request message. After the computation results are returned to the searching line process, it aggregates those results and determines the best expected point. When the best expected point is determined, the searching line is connected to the crossing line that includes the best expected point. The searching line process splits into an occupied part and a free part, and the status is maintained. Then, the next routing request message is sent to the connected crossing line.

[Figure 4: Parallel execution of expected points — (a) parallel execution of expected points, (b) connection of a routing path (termination condition), (c) termination condition of the parallel execution of expected points, (d) completion of routing]

[Figure 5: Example of deadlock]

Messages are sequenced at the entrance of each process. Only one message can be handled at any time in a process. No problems of exclusive access to an object or locking/unlocking of objects arise with this scheme. In our algorithm, two types of parallelism are embedded. One is concurrent computation in the lookahead operation and the other is concurrent routing of different nets.
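The lookahead step can be pictured with a small Python sketch: each crossing segment reports the point on itself closest to the goal, and the searching line picks the candidate with the smallest Manhattan distance. The segment coordinates, the grid, and the helper names are all hypothetical; the real system distributes exactly this aggregation over KL1 line processes.

```python
def expected_point(segment, goal):
    """Closest point to the goal on an axis-parallel segment ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = segment
    gx, gy = goal
    px = min(max(gx, min(x1, x2)), max(x1, x2))   # clamp the goal onto the segment
    py = min(max(gy, min(y1, y2)), max(y1, y2))
    return (px, py), abs(px - gx) + abs(py - gy)  # point and Manhattan cost

def best_expected_point(crossing_segments, goal):
    """Aggregate the replies of all crossing line processes and keep the best."""
    candidates = [expected_point(seg, goal) for seg in crossing_segments]
    return min(candidates, key=lambda c: c[1])

crossings = [((3, 0), (3, 9)), ((7, 2), (7, 5))]   # two vertical crossing segments
print(best_expected_point(crossings, goal=(8, 4))) # -> ((7, 4), 1)
```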
2.4 Problems in Parallel Execution

When we parallelize the lookahead line search method, three problems arise. The first is deadlock, the second is conflict among routing nets, and the last is memory overflow for communication between processors.

2.4.1 Avoidance of deadlock

When two or more nets are searched concurrently, deadlock may occur. Figure 5 shows an example. When line processes that intersect orthogonally send request messages to compute the expected point to each other at the same time, computation will not proceed. This is because they cannot handle the next messages until the execution of the present messages terminates. If it is guaranteed that execution of a message terminates within a fixed period, deadlock can be avoided. To satisfy this condition, we made the following modification. Firstly, messages are grouped into group A and group B. B-type messages are guaranteed to terminate execution within a fixed period. A-type messages are not guaranteed to terminate; that is, some synchronization with other processes is needed before message execution can terminate. We modify the operations of A-type messages as follows. Each process executing an A-type message observes all messages arriving subsequently. When an A-type message is found, it is left in a message queue; that is, no operations are performed. When a B-type message is found, it is processed immediately, before termination of the currently executing A-type message. For this processing of B-type messages, a temporary process status that differs from the sequential algorithm is needed. By applying this modification, deadlock can be avoided. In our router, the routing request messages are A-type, and the request messages for computing expected points are B-type.
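A minimal Python sketch of this discipline (the message tags and the `LineProcess` class are illustrative, not the actual KL1 code): while an A-type request is outstanding, further A-type messages are merely queued, whereas B-type requests are answered immediately, so every process keeps servicing the short expected-point computations and the cyclic wait of Figure 5 cannot arise.

```python
from collections import deque

A_TYPE = {"route"}              # may block waiting for other processes
B_TYPE = {"expected_point"}     # always terminates within a fixed period

class LineProcess:
    def __init__(self, name):
        self.name = name
        self.pending_a = deque()    # A-type messages deferred during an A-type execution
        self.busy_with_a = False

    def receive(self, msg):
        kind = msg[0]
        if kind in B_TYPE:
            return self.handle_b(msg)           # served at once, even while busy
        if self.busy_with_a:
            self.pending_a.append(msg)          # defer: no operation performed now
            return None
        self.busy_with_a = True                 # start the long A-type operation
        return self.handle_a(msg)

    def handle_b(self, msg):
        _, goal = msg
        return (self.name, "expected_point_reply", goal)

    def handle_a(self, msg):
        # ... would send expected_point requests to crossing lines and wait ...
        return (self.name, "routing_started", msg[1])

p = LineProcess("h7")
print(p.receive(("route", "net1")))             # A-type: starts routing
print(p.receive(("expected_point", (8, 4))))    # B-type: answered immediately
print(p.receive(("route", "net2")))             # A-type: deferred (returns None)
```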
Therefore the number of communication paths is increasing for largescale data. Experimental results show that the maximum grid size of chip data to be treated by this routing program is about 500 x 500. This size is too small for applying practical data. In order to solve this problem, we improved the process structure, as in Figure 6. Each distributer process controls communication among processors. 2.5 Measurements and Evaluation We evaluate our router from the following three points of view. (1) Data size vs. Speedup, (2) Parallelism vs. Wiring Rate, and (3) Comparison with a general purpose computer. The program was executed on a MIMD machine with distributed memory and 64 processors, the Multi-PSI. Two types of real LSI data were used. The features of these data are shown in Table 1. Terminals to be connected are distributed uniformly in DATAl. Meanwhile, terminals are concentrated locally in DATA2. DATA3 is large-scale data. Table 1: Testing data Data Grid size # of nets Presented by DATAl 262xl06 136 Hitachi Ltd. I DATA2 322x389 71 NTT Co. DATA3 2746 x3643 556 NTT Co. 2.5.1 Data size vs. speedup Generally, when data size increases, the number of processes increase too and more parallelism can be expected. Higher parallelism can lead to greater efficiency or larger speedup with a fixed number of processors. We measured the relationship between the size of data and speedup. In this experiment, we used data copying DATAl. Here we measured four cases (1 xl, 1 x2, 2x2, and 2x4). Figure 7 shows the result of measurement. This graph shows that the larger the size of data, the higher the speedup. It also shows 24-fold speedup with 64 processors for 2 x4 data it does not look saturated yet. We have to investigate the limit of speedup with increasing data size. 2.5.2 Wiring rate vs. parallelism Parallel routing of multiple nets may cause a degradation in wiring rate. We measured the relation between wiring rate and parallelism for DATAl and DATA2, as shown in Figure 8. The two vertical axes show execution time and wiring rate. The horizontal axis shows the number of nets routed concurrently. Parallelism is proportional to this. When equal to one, parallelism only arises from parallellookahead operations. It was observed that terminal-distributed data shows good wirability, even if parallelism is high, when the terminal-concentrated data is poor. Concentrated terminals tend to cause a lot more net confliction. 2.5.3 Comparison with a general purpose computer The execution time of DATA2 with a single processor was measured as 111 seconds. From Figure 8, speedup caused only by lookahead operation is calculated as 4.9. The execution time of our system was compared with a general purpose computer, the IBM 3090/400, which is a 15 MIPS machine. The sequential lookahead line search router on the IBM machine was developed by Dr. Kitazawa. (NTT Co.) before our work wa.s conducted. Table 2 shows the performance of the routers. 242 Execution time(sec.) Wiring rate(%) 30~--~--~~~~~~~~~~~~~~100 Ij Wiring rate (DATAl) 99% I ~ 25 20 1\ \ 90 "\ x 64 x 0.25). While, the actual performance is comparable with the 15MIPS machine. The degradation of actual performance must be caused by the implementation overhead of the object-oriented pr,ogram and KL1 language. ' \Wiring rate (DATA2) \ ..... ~,..,. 15 " ... ~,, 80 , , 2.6 72% ... ",,® 70 10 Execution time (DATA2) o o ~ 8 sec. 7 sec. ro 00 1~ 100 # of nets Figure 8: Wiring rate vs. 
parallelism Two cases of the Multi-PSI measurements (with 64PEs) are included in the table. One routed all nets concurrently and the other routed each net one after another. The former case shows the better execution time but worse wiring rate. The latter case accomplished the perfect wiring rate but worse execution time for DATA2. We expect to realize both good execution time and good wiring rate by controlling the number of nets wired concurrently and changing the wiring order. (In fact, on DATA2, W:100 % and E:16 sec. under the number of nets wired concurrently is equal to 2.) The evaluation for large data (DATA3) has just started. The wiring rate in the table is still insufficient but it will be improved as mentioned just above. Table 2: Comparison of performance Data\Machines IBM Multi-PSI Multi-PSI 3090/400 (64PEs) t (64PEs) t DATA2 E 7.45 7.0 20.0 W 100 72 100 DATA3 E 405.0 360.0 N.A. N.A. W 100 90 .. E:executlOll time (Sec.),W:wmng rate(%) t concurrent wiring of all nets t sequential wiring of each net Multi-PSI (lPE) 111.0 100 N.A. N.A. The execution time of the router on the Multi-PSI can be considered almost comparable with that on an IBM machine. When our router is ported to PIM machines, the next model to the Multi-PSI, the execution time will be reduced to 1/10 to 1/20 in execution with 256 to 512 processors. The performance of the bare hardware of a Multi-PSI processor is 2 to 3 MIPS. And the efficiency of parallel processing (speedup/number of processors) is 25% for the case of Multi-PSI(64PEs) with concurrent wiring of all nets on DATA2. So, bare hardware performance with 64 processors is expected to be 32 to 48 MIPS (2 to 3 Discussions We presented a new routing method based on the concurrent objects model, which can include very large concurrency and is suitable for very large parallel computers. The program was implemented on a distributed memory machine with 64 processors. Preliminary evaluation was then done with actual LSI data. The experimental results showed that the larger the data size, the higher the efficiency attained by a maximum of 24-fold speedup with 64 processors against single processor execution. The speedup curve did not look significantly saturated, that is, more speedup can be expected with more data. In experiments on parallelism and the wiring rate, a good wiring rate with large parallelism was attained for data in which terminals are distributed uniformly. However, for data with concentrated terminals, the wiring rate became significantly worse, due to the increase in parallelism. We must improve the wiring rate in the latter case. The actual performance of our router system was compared with an almost identical router on a high-end general purpose computer (IBM3090/400, 15 MIPS). Results showed that the speed of both systems was comparable. Based on a rough comparison of bare hardware speeds, the implementation overheads of the parallel object-oriented program and our language are estimated as 100 to 200% in total, against the sequential FORTRAN program on the IBM machine. 3 3.1 Placement System Background Cell placement is the initial stage of the LSI layout design process. After the functional and logical designs of the circuit are completed, the physical positions of the circuit components are determined so as to route all electrical connections between cells in a minimum area without violating any physical constraints. 
Heuristics for evaluating the quality of a placement usually promote one or more of the following: minimum estimated wire length, an even distribution of wires around the chip, minimum layout area, and a regular layout shape. The cell placement problem is well known as a difficult combinatorial optimization problem. In other words, it is not feasible to obtain the optimum placement of a circuit of practical size because it takes an excessive amount of CPU time. So efficient techniques to get a nearly optimum placement must be employed in practice.

3.2 Simulated Annealing

Approximate methods are used to solve the combinatorial optimization problem. One such method is called iterative improvement. In this algorithm, an initial solution is generated and then modified repeatedly to try to improve it. In each iteration, if the modified solution is better than the previous one, the modified solution becomes the new solution. The process of altering the solution continues until we can make no more improvement, thus yielding the final solution. The problem with this algorithm is that it can be trapped at a local optimum in the solution space. The Simulated Annealing (SA) algorithm [Kirkpatrick et al. 1983] was proposed to solve this problem. It probabilistically accepts a new solution even if the new solution may be worse temporarily. The acceptance probability is calculated according to the change in the estimated cost value of the solution and the parameter "temperature". The cost function is often referred to as "energy". In this way, it is possible to search for the global optimum without being trapped by local optima.

The details of this algorithm are as follows. It is constructed from two criteria, the inner loop criterion and the stopping criterion. At first, the initial solution and initial temperature are given. In the inner loop criterion, new solutions are generated iteratively and each solution is evaluated to decide whether it is acceptable. The unit of iteration, constructed by generating and estimating a new solution, is called a "step". In each stage of the inner loop criterion, the temperature parameter is fixed. In the stopping criterion, after a sufficient number of iterations are performed in the inner loop, the temperature is decreased gradually according to a given set of temperatures called the "cooling schedule". The stopping criterion is satisfied when the energy no longer changes. One of the most difficult things in SA is finding an appropriate cooling schedule, which largely depends on the given problem. If the cooling schedule is not adequate, satisfactory solutions will never be obtained.
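As a reference point for the parallel variant described next, here is a minimal Python sketch of the sequential SA loop just described, using the standard Metropolis acceptance rule exp(-ΔE/T); the cost function, the neighbor move, and the geometric cooling schedule are placeholders standing in for the placement-specific ones.

```python
import math
import random

def simulated_annealing(initial, energy, neighbor, t_start=10000.0, t_end=20.0,
                        cooling=0.95, steps_per_temp=100):
    """Plain sequential SA: a fixed cooling schedule with Metropolis acceptance."""
    solution, temperature = initial, t_start
    current = energy(solution)
    while temperature > t_end:
        for _ in range(steps_per_temp):            # inner loop at a fixed temperature
            candidate = neighbor(solution)
            delta = energy(candidate) - current
            if delta <= 0 or random.random() < math.exp(-delta / temperature):
                solution, current = candidate, current + delta
        temperature *= cooling                      # one step of the cooling schedule
    return solution, current

# Toy 1-D example (a placeholder for the placement cost):
best, cost = simulated_annealing(
    initial=50.0,
    energy=lambda x: (x - 3.0) ** 2,
    neighbor=lambda x: x + random.uniform(-1.0, 1.0))
print(best, cost)
```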
3.3 Parallel Simulated Annealing

A new parallel simulated annealing algorithm (PSA) has been proposed to solve the cooling schedule problem [Kimura et al. 1991]. The most important characteristic of this algorithm is that it constructs the cooling schedule automatically from a given set of temperatures. The basic idea is to use parallelism in temperature, performing SA processes concurrently at various temperatures instead of sequentially reducing the temperature. So it is scheduleless, or time-homogeneous, in the sense that there are no time-dependent control parameters. After executing a fixed number of annealing steps, the solutions at adjacent temperatures are probabilistically exchanged as follows. When the fixed number of annealing steps is denoted by k, 1/k is called the "frequency". When the energy of the solution at a higher temperature is lower than that at a lower temperature, the solutions at these temperatures are exchanged unconditionally. Otherwise they are exchanged according to a probability that is determined by the differences in their energies and temperatures [Kimura et al. 1991]. In PSA, even if a solution is trapped at a local optimum at a certain temperature, it is still possible to search for global optima because another new solution can be supplied from a higher temperature. So a nearly optimum solution will finally be found at the lowest temperature.
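The exchange step can be sketched as follows. The exact acceptance formula of [Kimura et al. 1991] is not reproduced in the text, so the sketch uses the standard replica-exchange form min(1, exp((1/T_lower - 1/T_higher)(E_lower - E_higher))), which has the two properties stated above: the swap is unconditional when the higher-temperature solution has lower energy, and otherwise it happens with a probability that falls off with the energy and temperature differences. All names and values are illustrative.

```python
import math
import random

def maybe_exchange(t_low, e_low, t_high, e_high):
    """Decide whether the solutions held at two adjacent temperatures are swapped.

    Returns True if the solution at the higher temperature should move down
    (and the other one up).  Assumes t_low < t_high.
    """
    if e_high <= e_low:                      # better solution above: always pull it down
        return True
    delta = (1.0 / t_low - 1.0 / t_high) * (e_low - e_high)   # negative in this branch
    return random.random() < math.exp(delta)

# After every k annealing steps, sweep over the temperature ladder:
temperatures = [20, 100, 500, 2500, 10000]   # illustrative ladder
energies = [12.0, 15.0, 14.0, 20.0, 18.0]    # one current solution energy per temperature
for i in range(len(temperatures) - 1):
    if maybe_exchange(temperatures[i], energies[i], temperatures[i + 1], energies[i + 1]):
        energies[i], energies[i + 1] = energies[i + 1], energies[i]
print(energies)
```

In the real system, as Section 3.5.3 below explains, it is the temperature values (and their KL1 streams) rather than the large placement data that travel between processors.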
The block length penalty estimates the difference between ideal and real block length. It is desirable to have cell blocks of a uniform length. When the solutions are exchanged between the two temperature regions, the overlap between cells in the higher temperature region is removed as the solution is passed to the lower one. 3.5.3 Load distribution and solution exchange between adjacent temperatures In PSA, each SA process is assigned to a separate processor, because executions at each temperature are highly independent and the amount of execution is nearly equal. When we try to implement the exchange mechanism of the solutions, the natural way may be to exchange the solutions between the processors. But, when an MIMD machine like Multi-PSI is used, the exchange of large placement data between processors incurs a large communication overhead. So, solutions exchange between adjacent temperatures should be done by exchanging temperature values between processors. Processors with adjacent temperatures hold a common variable and use this for communication. This is called a "stream" in KL1 [Chikayama et al. 1988] and is realized by an endless 'list'. These streams are also swapped between processors when the solutions between adjacent temperatures are exchanged. 3.5.4 Performance monitoring subsystem The monitor displays the energy value of each SA process in real- time. It is useful to overview the entire state of the system performance. This energy graph is updated . when adjacent temperatures are exchanged. As it displays the exchange in energy value in real time, it helps us to decide when to stop the execution. After several short time executions, we can decide the number of temperatures·in the two regions and the highest temperature from the dispersion of the energy graph. The monitoring subsystem is constructed on a Front End Process so that it does not incur an overhead in SA process execution. It is also possible to roll back the energy graph while SA processes are being executed. 3.6 Experimental Results and Discussions The MCNC benchmark data [MCNC 1990], consisting of 125 cells and 147 nets, was chosen for our measurements. In the initial placement, the value of energy was 911520 and a lower bound of the chip area was estimated as 1.372[mm 2]. The PSA was executed in 20,000 inner loops, with exchanges every hundred inner loops. 64 processors can be used on Multi-PSI. The number of temperatures is 63, the highest temperature is 10,000, the lowest is 20 and other temperatures are determined proportionally. 5 temperatures are assigned to the lower temperature region. The lower bound of the area of the final solution is estimated as 0.615 [mm 2 ], reduced by 56.0 % in comparison to the initial solution. The execution time was about 30 minutes and the final energy was 424478. Table 3 shows the system performance of the relation between the number of temperatures and quality of solutions. When the number of temperatures is 32, 16 or 8, with the other conditions the same, the lower bounds of the final chip are estimated as shown in Table 3. When the number of temperatures is 63, the cooling schedules adopted by the final solution were as follows. The initial temperature was 3823, the highest temperature in the process was 4487, and the number of temperatures the solution passed was 53. We observed that 10 solutions out of the initial 63 had been disposed of at the lowest temperature. This indicates that the mechanism of the automatic cooling schedules actually worked as intended. 
When the number of temperatures is 8, the results are even worse. If the energy dispersions at the different temperatures are too far from each other, the chance of an exchange becomes small, so the automatic cooling schedule will not work as intended. As a result, the algorithm cannot get out of a local optimum. To get an effective cooling schedule, it is necessary to find an appropriate value for the highest temperature so that the solution can reach a sufficiently disordered state. It is also necessary to adjust the number of temperatures according to the size of the problem. As future work, we plan to study a mechanism for deciding the initial temperatures assigned to each processor from the energy dispersion of the solutions. From the viewpoint of system performance, more speed-up and an improved ability to treat larger amounts of benchmark data are needed as the next step.

4 Logic Simulator

4.1 Background

A logic simulator is used to verify not only the functions of designed circuits but also the timing of signal propagation. Parallel logic simulation is treated as a typical application of Parallel Discrete Event Simulation (PDES). PDES can be modeled as several objects (state automata) that change their states by communicating with each other. A message carries information on an event, whose occurrence time is stamped on the message (the timestamp). In logic simulation, an object corresponds to a gate and an event is a change of a signal value. In PDES, the time-keeping mechanism is essential for efficient execution. The mechanisms broadly fall into three categories: synchronous mechanisms, conservative mechanisms and optimistic mechanisms. Their respective shortcomings are widely known: synchronous mechanisms require global synchronization, conservative mechanisms often deadlock, and optimistic mechanisms need rollback. We are targeting an efficient logic simulator on PIM, which is a distributed memory MIMD machine. We adopted an optimistic mechanism, the Time Warp mechanism (TW), whose rollback process has been considered heavy. In practice, however, TW had neither been evaluated in detail nor compared with other mechanisms on MIMD machines. We expected that TW would be suitable for logic simulation on large-scale MIMD machines, given some devices to reduce the rollback overhead. Thus a local message scheduler, an antimessage reduction mechanism and a load distribution scheme were added to our system and evaluated. Furthermore, we made two other simulators using different time-keeping mechanisms and compared these mechanisms with TW.

4.2 Time Warp Mechanism

The Time Warp mechanism [Jefferson 1985] was proposed by D. R. Jefferson. In PDES using TW, each object usually acts according to received messages and also records the history of messages and states, assuming that messages arrive chronologically. But when a message arrives at an object out of time-stamp order, the object rewinds its history (this process is called rollback) and makes adjustments as if the message had arrived in the correct time-stamp order. After rollback, ordinary computation is resumed. If there are messages which should not have been sent, the object also sends antimessages in order to cancel those messages.

4.3 System Specification

The system simulates combinational circuits and sequential circuits that have feedback loops. It handles three values: Hi, Lo, and X (unknown). A different delay time can be assigned to each gate (non-unit delay model).
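Before turning to the implementation devices, the Time Warp behaviour of Section 4.2 can be sketched in a few lines. The Python below is a toy stand-in for the KL1 implementation: it keeps a state history, rolls back on a straggler message, and returns the antimessages to be sent. Re-evaluation of queued input messages, GVT computation and fossil collection are all omitted, and the class and method names are invented for illustration.

```python
class TimeWarpObject:
    """Toy model of one simulation object (e.g. a gate) under Time Warp."""

    def __init__(self, initial_state):
        self.lvt = 0                          # local virtual time
        self.state = initial_state
        self.history = [(0, initial_state)]   # saved (time, state) pairs
        self.sent = []                        # (timestamp, message) already output

    def receive(self, timestamp, message):
        """Process one input message; returns antimessages to be delivered."""
        antimessages = []
        if timestamp < self.lvt:              # straggler message: roll back
            while len(self.history) > 1 and self.history[-1][0] > timestamp:
                self.history.pop()
            self.lvt, self.state = self.history[-1]
            # cancel outputs that now lie in the rolled-back future
            antimessages = [m for m in self.sent if m[0] > timestamp]
            self.sent = [m for m in self.sent if m[0] <= timestamp]
        # ordinary computation resumes from the restored state
        self.lvt = timestamp
        self.state = self.evaluate(self.state, message)
        self.history.append((self.lvt, self.state))
        return antimessages

    def evaluate(self, state, message):
        return state                          # gate evaluation would go here
```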
Since this simulator only treats gates, flip-flops and other functional blocks must be completely decomposed into gates.

4.4 Implementation

Since TW has overheads peculiar to it, caused by the rollback processes, some devices for reducing these overheads are needed for quick simulation. Furthermore, inter-PE communication overheads must be reduced because the simulator works on a distributed memory machine such as PIM. For these purposes, a load distribution scheme, a local message scheduler and an antimessage reduction mechanism are included in our simulator. These are expected to reduce the overheads described above and to promote efficient execution of the simulator. Each device is outlined below. Details are presented in [Matsumoto et al. 1992].

• Cascading-Oriented Partitioning
We propose the "Cascading-Oriented Partitioning" strategy for partitioning circuits to attain high-quality load distribution. This scheme provides adequate partitioning solutions that satisfy three requirements: balancing the load, keeping the inter-PE communication frequency low, and deriving a lot of parallelism.

• Local Message Scheduler
During simulation, there are usually several messages waiting to be evaluated in a PE. When the Time Warp mechanism is used, the bigger the time-stamp a message has, the more likely the message is to be rolled back. For this reason, appropriate message scheduling in each PE is needed to reduce the rollback frequency.

• Antimessage Reduction
As long as messages are sent through a KL1 stream, messages arrive at their receiver in the same order as they are transmitted. In this environment, subsequent antimessages can be reduced. We adopted this optimization, expecting that it would reduce the rollback cost.

Figure 9: Speedup (speedup curves for circuits s13207, s9234, s5378 and s1494 against the ideal line, plotted over the number of PEs)

4.5 Measurements

We executed several experimental simulations on the Multi-PSI. Four sequential circuits from ISCAS'89 were simulated in our experiments. Figure 9 shows the system performance when the circuits were simulated using various numbers of PEs. The best performance is also shown there. In the best case, a very good speedup of 48-fold was attained using 64 PEs. A performance of approximately 99K events/sec, fairly good for a full-software logic simulator, was also attained.

4.6 Comparison between Time-keeping Mechanisms

For the purpose of comparing the Time Warp mechanism with others on the same machine, we made two further simulators; one uses the synchronous mechanism and the other uses the conservative mechanism. In the synchronous mechanism, only messages with the same time-stamp can be evaluated simultaneously. Therefore, a time wheel residing in each PE must synchronize globally at every tick. On the other hand, the problem of deadlock has to be resolved [Misra 1986, Soule et al. 1989] in conservative mechanisms. Our simulator basically uses null messages to avoid deadlock. A mechanism for reducing unnecessary null messages is also added in order to improve performance. Figure 10 compares system performance when circuit s13207 was simulated under the same conditions (load distribution, input vectors, etc.). The synchronous mechanism showed good performance using comparatively few PEs; however, its performance peaked at 16 PEs. Global synchronization at every tick apparently limits performance.
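The limitation just noted is easy to see in outline. The following Python fragment (hypothetical names; the real simulator is written in KL1) sketches the synchronous scheme: within one tick all pending events are independent and could run in parallel, but the clock may only advance after a global barrier, so the barrier cost is paid at every tick regardless of how few events that tick carries.

```python
def synchronous_simulate(event_wheel, evaluate, max_time, barrier=lambda: None):
    """event_wheel: dict mapping tick -> list of (gate, value) events.
    evaluate(tick, gate, value) returns newly scheduled (time, event) pairs."""
    for tick in range(max_time + 1):
        for gate, value in event_wheel.get(tick, []):
            # events of the same tick are independent of each other
            for new_time, new_event in evaluate(tick, gate, value):
                event_wheel.setdefault(new_time, []).append(new_event)
        barrier()   # on a parallel machine, ALL PEs synchronize here, every tick
```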
The conservative mechanism indicated good speedup but poor performance: using 64 PEs, only about 1.7 k events/sec performance was obtained. We measured the number of null messages generated during the simulation and found that the number of null messages was 40 times as many as that of actual events! That definitely was the cause of the poor performance. 20 30 40 50 60 No. of PEs No. of PEs Figure 10: Performance Comparison (events/sec) This comparison substantiates that the Time Warp mechanism provides the most efficient simulation of the three mechanisms on distributed memory machines such as the Multi-PSI. 5 Concluding Remarks This paper presented ICOT-developed parallel systems for routing, placement and logic simulation, and reported on their evaluation. In the routing system, the router program was designed based on the concurrent objects model and congruent with the KLI description was introduced. As a result, appreciably good speedup was attained and the quality of the solutions was high especially for large-scale data. The parallel placement system is based on timehomogeneous SA, which realizes an automatic cooling schedule. The remarkable point of this system is that parallelization was applied not for the purpose of speedup, but to obtain high quality solutions. The parallel logic simulator simply targeted quick execution. Absolutely good speedup was attained. The experimental results for three kinds of time-keeping mechanisms revealed that the Time Warp mechanism was the most efficient time-keeping mechanism on distributed memory machines. These three systems are positive examples which support that PIM possesses high applicability to various practical problem domains as a general purpose parallel machine .. Besides them, we are currently developing a hybrid layout system in which routing and placement are performed concurrently, improving interim solutions incrementally. These experiments, including the hybrid layout system, are just the preliminary experiments in the coming epoch of parallel machines, but they must be one of the most important and fundamental experiences for the future. 247 Acknowledgement Valuable advice and suggestions were given by the members of PIC- WG, a working group in ICOT, during discussion of parallel LSI-CAD. The authors gratefully thank them. Data for the evaluation of our systems were recommended and given by NTT Co., Hitachi Ltd. and Fujitsu Ltd. We also thank these companies. References [Brouwer 1990] R. J. Brouwer and P. Banerjee. PHIGURE : A Parallel Hierarchical Global Router. In Proc. 27th Design Automation Conj., 1990. pp. 650-653. [Chikayama et al. 1988] T. Chikayama, H. Sato and T. Miyazaki. Overview of the parallel inference machine operating system (PIMOS). In Pmceedings of International Conference on Fifth Generation Computer Systems, ICOT, Tokyo, 1988. pp. 230-251. [Fukui 1989] S. Fukui. Improvement of the Virtual Time Algorithm. Transactions of Informa.tion Processing Society of Japan, Vol.30, No.12 (1989), pp. 15471554. (in Japanese) [Jefferson 1985] D. R. Jefferson. Virtual Time. ACM Transactions on Programming Languages and Systems, Vo1.7, No.3 (1985), pp. 404-425. [Kawamura et al. 1990] K. Kawamura, T. Shindo, H. Miwatari and Y. Ohki. Touch and Cross Router. In Proc. IEEE ICCAD90, 1990. pp. 56-59. [Kimura et al. 1991] K. Kimura and K. Taki. Timehomogeneous Parallel Annealing Algorithm. In Proc. IMACS'91, 1991. pp. 827-828. [Kirkpatrick et al. 1983] S. Kirkpatrick, C. D. Gellat and M. P. Vecci. 
Optimization by Simulated Annealing, Science, Vo1.220, No.4598, 1983. pp. 671-681. [Kitazawa 1985] H. Kitazawa. A Line Search Algorithm with High Wireability For Custom VLSI Design, In Pmc. ISCAS'85, 1985. pp. 1035-1038. [Matsumoto et al. 1992] Y. Matsumoto and K. Taki. Parallel logic Simulator based on Time Warp and its Evaluation. In Proc. Int. Conj. on Fifth Generation Computer Systems, ICOT, Tokyo, 1992. [MCNC 1990] P1'OC. International TiVorkshop Layout Synthesis '90 Research Triangle Park, North CaroEna, USA, May 8-11, 1990. [Misra 1986] J. Misra. Distributed Discrete-Event Simulation. ACM Computing Surveys, Vol. IS, No.1 (1986), pp. 39-64. [Nair et al. 1982] R. Nair, S. J. Hong, S. Liles and R. Villani. Global Wiring on a Wire Routing Machine. In Pmc. 19th Design Automation Conj.l 1982. pp. 224-231. [Nakajima et al.1989] K. Nakajima, Y. Inamura, N. Ichiyoshi, K. Rokusawa and T. Chikayama. Distributed Implementation of KLI on the MultiPSIjV2, In Proc. 6th Int. Conj. on Logic Programming, 1989. pp. 436-45l. [Olukotun et al. 1987] O. A. Olukotun and T. N. Mudge. A Preliminary Investigation into Parallel Routing on a Hypercube Computer, In Proc. 24th Design A utomation Con!, 1987. pp. 814-S20. [Rose 1988] J. Rose. Locusroute: A Parallel Global Router for Standard Cells, In Proc. 25th Design A utomation Con!, 1988. pp. 189-195. [Sechen et al. 1985] C. Sechen and A. SangiovanniVincentelli. The TimberWolf Placement and Routing Package, IEEE Journal of Solid-State Circuits, Vol.SC-20, No.2, (1985), pp. 510-522. [Soule et al. 1989] L. Soule and A. Gupta. Analysis of Parallelism and Deadlock in Distributed-Time Logic Simulation. Stanford University Technical Report, CSL-TR-89-378 (1989). [Suzuki et al. 1986] K. Suzuki, Y. Matsunaga, M. Tachibana and T. Ohtsuki. A Hardware Maze Router with Application to Interactive Rip-up and Reroute. IEEE Trans. on CAD, Vol.CAD-5, No.4, (1986), pp. 466-476. [Taki 1988] K. Taki. The parallel software research and development tool: Multi-PSI system, Programming of Future Generation Computers, pp. 411-426, North- Holland, 1988. [Ueda et al. 1990] K. Ueda, T. Chikayama. Design of the Kernel Language for the Parallel Inference Machine. The Computer Journal, Vol.33 , No.6, (1990), pp. 494'500. [Watanabe et al. 1987] T. Watanabe, H. Kitazawa, Y. Sugiyama. A Parallel Adaptable Routing Algorithm and its Implementation on a Two-Dimensional Array Processor. IEEE Trans. on CAD, Vol.CAD-6, No.2, (1987), pp. 241-250. [Won et a1. 1987] Y. Won, S. Sahni and Y. El-Ziq. A Hardware Accelerator for Maze Routing. In Proc. 24th Design Automation Conj., 1987. pp. SOO-806. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 248 Parallel Database Management System: Kappa-P Moto Kawamura Hiroyuki Sato t Kazutomo N aganuma Kazumasa Yokota Institute for New Generation Computer Technology (ICOT) 21F. Mita-Kokusai Bldg., 1-4-28 Mita, Minat'o-ku, Tokyo 108, Japan e-mail: {kawamura, naganuma, kyokota }@icot.or.jp Tel: +81-3-3456-3069 Fax: +81-3-3456-1618 t Mitsubishi Electric Corporation Computer & Information Systems Laboratory 5-1-1 Ofuna, Kamakura, Kanagawa 247, Japan e-mail: hiroyuki@isl.melco.co.jp Tel: +81-467-46-3665 Fax: +81-467-44-9269 Abstract A parallel database management system (DBMS) called K appa- P has been developed in order to provide efficient database management facilities fQr knowledge information processing applications in the Japanese FGeS project. 
The data model of Kappa-P is based on a nested relational model for treating complex data structures, and has some new data types. Kappa-P has features of both a parallel DBMS on a tightly-coupled multiprocessor and a distributed DBMS on a looselycoupled multiprocessor. In this paper, we describe the overview of Kappa-P. 1 Introduction In the Japanese FGeS (Fifth Generation Computer System) project, many knowledge information processing systems (KIPSs) have been designed and developed under the framework of logic and parallelism. Among them, R&D of databases and knowledge-bases[14] aims at an integrated know ledge-base management system (KBMS) under a framework of deductive objectoriented databases (DOODs). Kappa 1 is a database management system (DBMS) located in the lower layer and is also a name of the project. The objective is to provide database management facilities for many KIPSs and support efficient processing of the FGCS prototype system as the database engine. In the Kappa project, we have developed a sequential DBMS, Kappa-II and a parallel DBMS, Kappa-P. Both systems adopt a nested relational model. Kappa-II, which is a research result of the intermediate stage, is written in ESP, and works on sequential inference machines PSI and its operating system SIMpas. The system showed us that our approaches based l,Knowledge Application-Oriented Advanced Database Management System on the nested relational model are sufficient for KBMSs and KIPSs, and has been used as the DBMS on PSI machines by various KIPSs, for instance natural language processing systems with electronic dictionaries, proof checking systems with mathematical knowledge, and genetic information processing systems with molecular biological data. A parallel DBMS project called Kappa-P[7] was initiated at the beginning of the final stage. Kappa-P is based on Kappa-II from a logical point of view, and its configuration and query processing have been extended for the parallel environment. Kappa-Pis written in K11 and works on the environment of PIM machines and their operating system PIMOS. The smallest configuration of Kappa-P is almost the same as Kappa-II. Compared both systems on the same machine, KappaP works with almost the same efficiency as Kappa-II. Kappa-P is expected to work on PIM more efficiently than Kappa-II, as their environments are different. We describe the design policies in Section 2 and the features in Section 3. We -explain the features of Kappa's nested relational model that are different from others in Section 4. Then, we describe an overview of the Kappa-P system: data placement in Section 5, management of global information in Section 6, query processing in Section 7, and implementation issues of element DBMSs in Section 8. 2 Design Policies There are various data and knowledge with complex data structure in our environment. For example, molecular biological data treated by genetic information processing systems includes various kinds of information and huge amounts of sequence data. The GenBank/HGIR database[3] has a collection of nucleic acids sequences, the physical mapping data, and related bibliographic information. Amount of data has 249 been increasing exponentially. Furthermore, the length of values is extremely variable. For example, the length of sequence data ranges from a few characters to 200,000 characters and becomes longer for genome data. Conventional relational model is not sufficient for efficient data representation and efficient query processing. 
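As a concrete, and entirely hypothetical, illustration of this point, the fragment below contrasts one nested, GenBank-like entry with its first-normal-form encoding. Plain Python data is used here only to show the shape of the data; it is not Kappa-P syntax.

```python
# Hypothetical GenBank-like entry.  Under a nested relational model the whole
# entry is one tuple, with its set-valued attributes kept in place.
nested_entry = {
    "locus": "HUMEXAMPLE",                  # invented accession number
    "sequence": "ATGGCC...",                # may run to 200,000 characters
    "references": [                         # set-valued attribute
        {"author": "Smith", "year": 1990},
        {"author": "Tanaka", "year": 1991},
    ],
    "keywords": ["exon", "promoter"],
}

# A first-normal-form encoding of the same entry: every multi-valued
# attribute forces its own relation (or duplicated rows), and the entry is
# reassembled only by joins at query time.
reference_rows = [("HUMEXAMPLE", "Smith", 1990), ("HUMEXAMPLE", "Tanaka", 1991)]
keyword_rows = [("HUMEXAMPLE", "exon"), ("HUMEXAMPLE", "promoter")]
sequence_rows = [("HUMEXAMPLE", "ATGGCC...")]
```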
Since the data is increasing rapidly, more processing power and more secondary memory will be required to manage it. Such situations require us to have a data model which can efficiently treat complex structured data and huge amount of data. Parallel processing enables us to improve throughput, availability, and reliability. PIM-p is a hybrid MIMD multi-processor machine which has two aspects, a tight1y-coupl~d multi-processor with a shared memory, called a cluster, and a loosely-coupled multiprocessor connected by communication networks. Disks can be connected to each cluster directly. The architecture can be that of typical PIMs. Both applications and Kappa-P are executed on the same machine. Both KBMSs and KIPSs need a lot of processing power to improve their response, so Kappa- P should be designed to improve the throughput. The system should use resources effectively, and be adapted for the environment. For the above requirements, the system is designed as follows: • In order to treat complex structured data efficiently, a nested relational model is adopted. The model is nearly the same as Kappa-II's data model, which shows us efficient handling of complex structured data. New data types and new indexed attributes should be added to handle huge amounts of data efficiently. • The system should use system resources effectively to improve throughput. System resources are processing elements, shared memories, disks, and communication networks. The system should processors effectively. use hybrid multi- Main memory database facilities should be provided for effective utilization of (shared) main memories. Because the data structure of the nested relation with variable occurrences and strings is complex, such a structure can be handled more efficiently on main memory than secondary memory. The system should provide parallel disk access to reduce disk access overheads. The system should actively control communication among clusters in order to reduce 'communication overheads in query processing. • The system should be adapted for the environment. Though Kappa-P may be similar to database machines[l], the difference between Kappa-P and database machines is that both applications and a DBMS work together on the same machine. The system should provide functions to reduce communication overheads between applications and the system, because they work together on the same machine. The functions execute part of applications at clusters which produce input data. - The system should provide a mechanism to process some queries in an application, because internal processing of an application is parallel, and some queries can occur in parallel. The system uses the PIMOS file system, a part of which is designed for Kappa-P. The file system provides efficient access to large files, mirrored disks, and the sync mechanism for each file. 3 Features According to the policies mentioned in the previous section, Kappa-P has been implemented. The system has the following features: • Nested Relational Model Already mentioned, conventional relational model is not appropriate in our environment. In order to treat complex structured data efficiently, a nested relational model is adopted. The nested relational model with a set constructor and hierarchical attributes can represent complex data naturally, and can avoid unnecessary division of relations. The model is nearly the same as Kappa-II's data model, which shows us efficient handling of complex structured data. 
Because Kappa-P is also the database engine for the KBMS of the FGCS project, the semantics of nested relations matches the knowledge representation language, QUIxoT£[lO] of the KBMS. Term is added as a data type to store various knowledge. The character code of the PIM machine is based on 2-byte code, but the code wastes secondary memory space. In order to store a huge amount of data, such as a genome database in the near future, new data types and new indexed attributes are added. • Configuration The configuration of Kappa-P corresponds to the architecture of the PIM machine, and distinguishes 250 inter-cluster parallelism from intra-cluster parallelism. Kappa-P is constructed of a collection of element DBMSs located in clusters. These element DBMSs cooperate to process each other's queries. Figure 1 shows the overall configuration of KappaP. The global map of relations is managed by some element DBMSs called server DENISs. Sever DBMSs manage not only global map but also ordinary relations. Element DBMSs, with the exception of server DBMSs, are called local DBMSs. Interface processes are created to mediate application programs and Kappa-P, and receive queries as messages. element DBMSs. However, in the current implementation, the replicated relation can be used for the global map only, that is, for server DBMSs. Relations can be located in main memory or secondary memory in an element DBMS. Relations in main memory are temporary relations with no correspondent data in secondary memory. This means relation dis.tribution in an element DBMS. A quasi main memory database, which guarantees to reflect those modifications to secondary memory, contains a relation in secondary memory and a replica of the relation in main memory. • Query Processing There are two kinds of commands for query processing: primitive commands and 1(QL, a query language based on extended relational algebra. Primitive commands are the lowest operations for relations, and can treat relations efficiently. The KQ1 is syntactically like K11. New operations can be defined temporarily in a query. A query in KQ1 is translated into sub-queries in intermediate operations for extended relational algebra, and is submitted to relevant element DBMSs. A query in primitive commands is submitted to relevant element DBMSs. The query is processed as a distributed transaction among relevant element DBMSs, and is finished under the control of two phase commitment protocol. • Parallel Processing Figure 1: Configuration • Data Placement The placement of relations also corresponds to parallelism: inter element DBMS placement and intra element DBMS placement. In order to use inter-cluster parallelism, relations can be located in some element DBMSs. The simple case is the distribution of relations like distributed DBMSs. When a relation needs a lot of processing power and higher bandwidth of disk access, the relation can be declustered as a horizontally partitioned relation and can be located in some element DBMSs. When a relation is frequently accessed in any query, some repljcas of the relation can be made and can be located in some Parallel processing of Kappa-P corresponds to the architecture of the PIM machine: inter-cluster parallelism among element DBMSs and intra-cluster parallelism in an element DBMS. The trade-off is processing power and communication overheads. There are two kinds of parallel processing depending on data placement. Distribution of relations and horizontal partition of relations give us inter-cluster paralleljsm. 
In this case, a query is translated into sub-queries for some element DBMSs. Replication of a relation decentralizes access to the relation and improves availability. • Compatibility to Kappa-II The PIM machine is used via PSI machines acting as front-end processors. In order to use programs developed on PSI machines, such as terminal interfaces or application programs on Kappa-II, Kappa-P provides a program interface compatible to Kappa-II's primitive commands. 251 4 Nested Relational Model A nested relational model is well known to reduce the number of relations in the case of multi-value dependency and to represent complex data structures more naturally than conventional relational model. However, there have been some nested relational models[8, 9, 2] since the proposal in 1978[6]. That is, even if they are syntactically the same, their semantics are not necessarily the same. In Kappa, one of the major problems is which semantics is appropriate for the many applications in our environment. Another problem is which part of QUIXOTe should be supported by Kappa as a database engine because enriched representation is a trade-off in efficient processing. In this section, we explain the semantics of the Kappa model. Intuitively, a nested relation is defined as a subset of a Cartesian product of domains or other nested relations: NR ~ Ei .. - El x··· x En D 12NR where D is a set of atomic values 2. That is, the relation may have a hierarchical structure and a set of other relations as a value. It corresponds to introducing tuple and set constructors as in complex objects. Corresponding to syntactical and semantical restrictions, there are various subclasses, in each of which extended relational algebra is defined. In Kappa's nested relation, a set constructor is used only as an abbreviation for a set of normal relations as follows: {r[ll = a, 12 = {bI, ... , bn }]} ¢:} {r[ll = a, 12 = bd,· .. , r[ll = a, 12 = bn ]} The operation of ":::}" corresponds to an unnest operation, while "¢::" corresponds to a nest or group-by operation. "¢::", however, is not necessarily congruent for the application sequence of nest or group-by operations. That is, in Kappa, the semantics of a nested relation is the same as the corresponding relation without set constructors. The reason why we take such semantics is to retain the first order semantics for efficient processing and to remain compatible with widely used relational model. Let a nested relation NR= {ntl,···,nt n } = {til'···' tid for i = 1,···, n, where nti then the semantics of N R is {tIl'···' tlk,···, t n },· .. , tnk}. Extended relational algebra to this nested re- results according to the above semantics, which guarantees to produce the same result to the corresponding relational database, except treatment of attribute hierarchy. As a relation among facts in a database is conjunctive from a proof-theoretic point of view, we consider a query as the first order language: that is, in the form of rules. The semantics of a rule constructed by nested tuples with the above semantics is rather simple. For example, the following rule r[ll =X, 12 = {a, b, e}] ¢:: B, r'[13 = Y, 14 = {d, e}, 15 = Z], B'. can be transformed into the following set of rules without set constructors: r[ll= X, h = a] ¢:: B, r'[1 3=.Y, 14 = d, 15 = Z], r'[13 = Y, 14 = e, 15 = Z], B'. r[l1=X,12=b] ¢:: B, r'[13 = Y,14 = d, 15 = Z], r'[1 3= Y, 14 = e, 15 = Z], B'. r[11=X,12=e] ¢:: B, r'[13= Y, 14 = d, 15 = Z], r'[13= Y,1 4 = e, 15 = Z], B'. 
That is, each rule can also be unnested into a set of rules without a set constructor. The point of efficient processing of Kappa relations is how to reduce the number of unnest and nest operations, that is, how to process sets directly. Under the semantics, query processing to nested relations is different from conventional procedures. For example, consider a simple database consisting of only one tuple: r[ll = {a,b},12 = {b,e}]. For a query ?-r[ll = X,1 2 = X], we can get X = {b}, that is, an intersection of {a, b} and {b, e}. That is, a concept of unification should be extended. In order to generalize such a procedure, we must introduce two concepts into the procedural semantics[ll]: 1) Residue Goals Consider the following program and a query: r[I=S'] ¢:: B. ?-r[l=S]. If S n S' is not an empty set during unification between r[l = S] and r[l = S'], new subgoals are r[l = S \ S'], B. That is, a residue subgoal r[1 = S \ S'] is generated if Sl \ S2 is not an empty set. Otherwise, the unification fails. Note that there might be residue subgoals if there are multiple set values. 2) Binding as Constraint Consider the following database and a query: lational database is defined in Kappa and produces 2The term "atomic" does not carry its usual meaning. For example, when an atomic value has a type term, the equality must be based on unification or matching. rl[ll=Sl]. r2[12 = S2]. ?-rl[ll =XJ, r2[1 2=X]. 252 Although we can get X = 51 by u~ification between rd11 = X] and r1[11 = 51] and a new subgoal r2[12 = 51], the succeeding unification results in r2[12 = 51 n 52] and a residue subgoal r2[12 = 51 \ 52]' Such a procedure is wrong, because we should have an answer X = 51 n 52' In order to avoid such a situation, the binding information is temporary and plays the role of the constraints to be retained. The procedural semantics of extended relational algebra is defined based on the above concepts. According to the semantics, a Kappa database is allowed not necessarily to be normalized also in the sense of nested relational model, in principle: that is, it is unnecessary for users to be conscious of row nest structure. On the other hand, in order to develop deductive databases on Kappa, a logic programming language, called CRL [11] is developped on the semantics and is further extended in QUIXOTE [10]. There remains one problem such that unique representation of a nested relation is not necessarily decided in the Kappa model, as already mentioned. In order to decide a unique representation, each nested relation has a sequence of attributes to be nested in Kappa. Consider some examples of differences among some models. First, assume a relation r consisting of two tuples: {r[ll r[ll = a,12 = {b,c}l, = a,l2 = {c,d}]} By applying a row-nest operation on 12 to R, we get two possible relations: {r[ll {r[ll = a, 12 = {b, c, d}]} = a, 12 = {{b, c}, {c, d}}]} According to the semantics of Kappa and Verso[9], we get the first relation, while, according to one of DASDBS[8] and AIM-P[2l, we get the second. Secondly, consider another relation r' consisting of only one tuple: By applying selection operations Cf12=b1 and get the following two relations, respectively: CfI3=Cl' we {r'[ll = a,l2 = b1,13 = {C1,C2}]} {r'[ll = a, 12 = {b 1, bd, l3 = C1]} If we apply a union operation to the above two relations, we get two possible relations. 
According to the semantics of Verso, we get the following (original) relation: That is, although the combination of 11 = a, 12 = b2, and 13 = C2 is not selected after two selections, it comes back to life in the result of the union. On the other hand, according to the semantics of Kappa, we have one of the following: = a, 12 = b1 , 13 = {Cll cd], = a,12 = b2,13 = C1]}, or {r'[ll = a, 12 = {b 1, b2}, 13 = C1l, r'[h = a, 12 = b1 , 13 = C2]} {r'[ll r'[ll Which relation is selected depends on nested sequence defined in the schema. According to the above semantics, the Kappa model guarantees more efficient processing by reducing the number of tuples and relations, and more efficient representation by the complex construction than relational model. 5 Data Placement In order to obtain larger processing power using intercluster parallelism, relations should be located in different element DBMSs. Kappa-P provides three kinds of data placement: distribution, horizontal partition, and replication. • Distribution Distribution of relations is a simple case like distributed DBMSs. When relations are distributed in some element DBMSs, larger processing power can be obtained, but communication overheads may be generated at the same time. A database designer should be responsiple for distribution of relations, because how to distribute relations relates to relationships among relations and kinds of typical queries to the database. In typical queries, strongly related relations should be in the same element DBMS, and loosely related relations might be in different element DBMSs. A query to access these relations is divided into subqueries for some element D BMSs by an interface process (Figure 1), and each sub-query is processed as a distributed transaction. • Horizontal Partition A horizontally partitioned relation is a kind of declustered relation. It is logically one relation, but consists of some sub-relations containing distributed tuples according to some declustering criteria. A horizontally partitioned relation is effective when the relation needs a lot of processing power and higher bandwidth of disk access. For example, it is effective in a case of a molecular biological database which includes sequence data which requires homology search by a 253 pattern called motif. A database designer is also responsible for horizontal partition of relations, because horizontal partition does not always guarantees efficient processing if it does not satisfy declustering criteria. A query to access horizontally partitioned relations is converted into sub-queries to access each sub-relation. Each sub-query is processed in parallel in a different clusters with sub-relations. Especially, when the query is a unary operation or a binary operation suitable for the declustering criteria, each sub-query can be processed independently and communication overheads among clusters can be disregarded. In other cases, as communication overheads among clusters can't be disregarded and it is necessary to convert the queries to reduce the overheads. • Replication Replication of a relation in some element DBMSs enables us to decentralize to access the relation, and to improve availability. Only the global map held in server DBMSs is replicated with a voting protocol in the current implementation of Kappa-P. The replication avoids centralizing access for server DBMSs, and even if some server DBMSs would stop, server facilities can work on. of a relation name and an element DBMS name in which the relation exists. 
This information is referred in order to find relevant element DBMSs from relation names at the beginning of query processing. When a relation is created, a message to register the relation name information is sent from the transaction. When a relation is deleted, a message to erase the relation name information is sent from the transaction. Global information is replicated in order to decentralize ascesses to server DBMSs. Replication of the information is implemented by using a voting protocol. In order to access server DBMSs, a distributed transaction uses two phase commitment protocol. • Management of Physical Information Server DBMSs manage physical information, such as start-up information, current status, and stream to communicate. Sever DBMSs watch the state of element DBMSs. At the beginning of query processing, a server DBMS connects an interface process to relevant element DBMSs. 7 Query Processing 7.1 6 Management of Global Information Metadata of Kappa-P is divided into two kind of information: global information and local information. The global information consists of logical information, such as the database name and relation names, and physical information about element DBMSs, such as start-up information, current status, and stream to communicate. The local information also consists of logical information, such as the local database name and schema, and physical information, such as file names and physical structures of relations. Each element DBMS manages local information, and server DBMSs manage global information in addition ordinary relations~ The role of server DBMSs is management of global information, especially, management of relation names for query processing and establishment of communication path between an interface process and element DBMSs. • Management of Relation Names It is necessary to guarantee the uniqueness of relation names. The simplest way is that a relation name forces to contain the relevant element DBMS name. Such a name is not suitable for Kappa-P, because Kappa-P treat logically one database. Server DBMSs manage relation names centrally, and provide location independent relation names. The information consists Query Language There are two kinds of language for query processing: KQL, a query language based on extended relational algebra, and primitive commands. A query in both primitive commands and KQL is in the form of a message to an interface process, and the result is returned through the tuple stream which dose not have cursors as SQL. • KQL(Kappa Query Language) KQL is syntactically similar to KLI. Operations of extended relational algebra are written like predicates, and new operations can be defined temporarily, which take relations only as their arguments. Figure 2 shows a query in KQL. • Primitive Commands Primitive commands are the lowest operation for nested relations and a collection of unary operators for a nested relation. Figure 3 shows an example in primitive commands. 7.2 Query Processing A query in KQL is processed in the following steps. • Query Translation 254 go(Result: :resultl, Temp: :result2) :- true I selection(table2, "(from = "icot"), Temp), difference(tablel, tablel, ErnptyTable), transitive_closure(tablel, ErnptyTable, tablel , Result). The information is used to estimate amounts of intermediate results, and reduce communication costs . • The number of tuples and the. average size of tuples The information is also used to estimate amounts of intermediate results. 
transitive_closure(Delta, In, R, Out) :empty (Delta) I In = Out. transitive_closure(Delta, In, R, Out) :- true joinCIn, In, "(to = from), Inl), projectionCInl, {, 1. from' , '2.to'}, In2: : {from, to}), union(In2, R, Nextln), difference(Nextln, In, Delta), transitive_closure(Delta, Nextln, R, Out). Figure .2: Query in KQL ifp:create(pc, off, IFP, StatusO), IFP = [open( [] , Statusl), begin_ transaction( [table1(read)] ,0 ,Status2) , create_format(tablel, "(*), FMT, Status3), read_record(tablel,FMT,rid,TupleStrearnO,S4) I IFP1] , TupleStrearnO = [Bufferl I TupleStrearnl] , '1.'1. Bufferl = [Tuple1, ... TupleN] TupleStrearnl = [Buffer2 I TupleStrearn2], '1.'1. Buffer2 = [Tuple2l, ... Tuple2N] IFPl = [ end_transaction(Status5), close(Status6)] . Figure 3: Primitive Commands A query in KQL is translated into sub-queries, which is called intermediate operations (shortly, operations in the following procedures) for extended relational algebra, by an interface process of Kappa-P. 1) Get relation names by parsing a query, and get location information of the relations from randomly selected server DBMSs. 2) Get schemata and supplementary information of the relations from relevant element DBMSs. In case of horizontally partitioned relations and quasi main memory relations, information of subrelations is assembled into one. Supplementary information is followings: • List of indexed attributes The algorithm of query processing is dependent on whether an attribute is indexed or not. • Uniqueness of attribute value The algorithm is dependent on whether an attribute value is unique or not. • Kinds of attribute values and the number of attribute values 3) Replace an operation for a horizontally partitioned relation with some operations for sub-relations and add merge operations. In case of a quasi main memory relation, replace an update operation with operations both the secondary memory relation and the temporary relation. Replace a non-update operation with an operation for the temporary relation. 4) The executing order of operation in the query is extracted from the query, and operations to control executing order are embedding in the query. Since the query can include update operations, it is impossible to control the data flow graph only. 5) Using basic optimization techniques by supplementary information of relations, the query is translated, and an algorithm for processing extended relational algebra is determined. In this phase, some execution plans are produced. 6) According to the location information of relations, the candidates are divided into sub-sequences to minimize the communication costs estimated by the supplementary information. Sub-sequences with the least communication cost are chosen, and operations to transfer tuples are embedded in the su b-sequences. 7) Each sub-sequence is translated into KLI program with procedures calling intermediate operations. • Query Execution Each sub-sequence is sent to the related element DBMS, and processed. Each sub-sequence is executed as a distributed transaction with two phase commitment protocol. Although processing in an element DBMS is based on tuple streams, data in other element DBMSs are accessed via transfer operations embedded in the query translation phase. 7.3 User Process in Element DBMS Because both Kappa-P and application programs work together on the same machine, the system cannot only provide higher communication bandwidth between them, but can also reduce communication overheads between them by allocating them in the same cluster. 
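Step 3) of the translation procedure above can be illustrated with a small sketch. The Python below is hypothetical (the helper and catalogue names do not appear in Kappa-P): an operation on a horizontally partitioned relation is rewritten into one sub-query per sub-relation, followed by a merge of the partial results.

```python
# Sketch of step 3): rewriting an operation on a horizontally partitioned
# relation.  All names (plan_selection, the catalogue layout) are invented
# for illustration and are not Kappa-P's actual interface.

def plan_selection(relation, predicate, catalogue):
    """catalogue maps a relation name to [(element_dbms, sub_relation), ...];
    a non-partitioned relation simply has a single entry."""
    sub_queries = [
        {"dbms": dbms, "op": "selection", "relation": part, "where": predicate}
        for dbms, part in catalogue[relation]
    ]
    plan = {"sub_queries": sub_queries}
    if len(sub_queries) > 1:
        plan["final"] = {"op": "merge"}   # union of the partial results
    return plan

catalogue = {"sequences": [("dbms1", "sequences_p1"), ("dbms2", "sequences_p2")]}
plan = plan_selection("sequences", 'organism = "human"', catalogue)
# -> one selection sub-query per element DBMS, followed by a merge
```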
255 In primitive commands, a tuple filter are taken as the argument of a read operation. The read operation invocates the filter in the same cluster in which a relation exists. If the relation is horizontally partitioned, filters for each sub-relation is invocated, and the outputs of all filters are merged into one. In KQL, a filter is specified as one of the new operations. 8 EJ Element DBMS An element DBMS contains full database management facilities, and accepts intermediate operations for extended relational algebra and primitive commands. We are not concerned with communication overheads in element DBMSs. Kappa-P uses parallel processing on a shared memory only, but doesn't use parallel operations for secondary memory in element DBMSs, in the current implementation. • Parallel Processing by Tuple Stream Stream programming is a very typical programming style in KLI. In general, a query can be expressed as a graph, which consists of some nodes corresponding to the operations and arcs corresponding to relationships among operations. In KLl, the graph corresponds to the processing structure of the query. The nodes become processes, and arcs become streams through which tuples are sent. In KLl, the number of tuples in the streams does not only depend on the amount of intermediate results, but also the number of processes to be scheduled. So, it is very important to control the number of tuples, and to drive the streams on demand with double buffering. Figure 4 shows an example for parallel processing by tuple stream: Table 3 = 1r[a,b)Table 1 N Table 2. • Parallel Processing of Primitive Commands Primitive commands process various operations in parallel for nested relations, for instance, operations set for and index operations of temporary relations. A set is a collection of tuple identifiers, and is obtained by restriction operations. In order to parallelize set operations, a set is partitioned according to the range of tuple identifiers. Index structure of temporary relations is T-tree[5], which is more sufficient in main memory than B-tree. Range retrieval operations, we are processed in parallel. In general, leaf nodes are connected in order like B+ -tree to trace succeeding leaf node directory. In our experiments, such a structure can't work efficiently in KLI. Range retrieving on a tree, whose leaf nodes are connected, is done following steps: finding the minimum value of the range, and then, tracing through the Figure 4: Parallel Processing by Tuple Stream connection until the maximum value is found .. These steps are almost processed sequentially. Assuming that H is the height of the tree, and R is the number of leaf nodes between the range, the number of comparison of values is H + R. On the other hand, range retrieving on a tree whose leaf nodes are not connected is done following steps: finding a minimum value of the range, finding a maximum value of the range, and collecting values between them. These steps are almost processed in parallel. The number of comparison is 2H. The latter has advantages about parallelism, wide range retrieving, and efficient implementation in KLl. • Main Memory Database Facilities Each cluster of PIM has hundreds of mega bytes of main memory. In order to use such a large memory effectively, Kappa-P provides temporary relations and quasi main memory database facilities. Because tuples of nested relations with variable occurrences and strings are complex, such a structure can be handled more efficiently in main memory than in secondary memory. 
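The tuple-stream style of Figure 4 can also be mimicked with ordinary Python generators. The sketch below is only an analogy for the KL1 stream processes (nested-loop join, invented attribute names), computing Table3 = projection[a,b](Table1 join Table2); in KL1 each operator would be a process and each generator a stream, driven on demand with double buffering.

```python
# Generator analogy (Python, not KL1) for the tuple-stream processing of
# Figure 4.  Attribute names and the nested-loop join are purely illustrative.

def scan(relation):
    for tup in relation:
        yield tup

def join(left, right, left_attr, right_attr):
    right_tuples = list(right)                 # simple nested-loop join
    for l in left:
        for r in right_tuples:
            if l[left_attr] == r[right_attr]:
                yield {**l, **r}

def projection(stream, attrs):
    for tup in stream:
        yield {a: tup[a] for a in attrs}

table1 = [{"a": 1, "b": "x", "to": 10}, {"a": 2, "b": "y", "to": 20}]
table2 = [{"from": 10, "c": "p"}, {"from": 30, "c": "q"}]
table3 = list(projection(join(scan(table1), scan(table2), "to", "from"),
                         ["a", "b"]))
# table3 == [{"a": 1, "b": "x"}]
```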
Temporary relations exist only in main memory having no correspondent data in secondary memory, so modifications to the temporary relations are not reflected to secondary memory. But, temporary relations are useful for application programs which create many intermediate relations such as deductive databases. The temporary relations have the same interface as secondary memory relations. A quasi main memory database, which guarantees to reflect those modifications to secondary memory, is not a pure main memory database, but parallel processing enables the quasi main memory database to work with nearly the same throughput as a main memory database. A quasi main memory relation is a kind of 256 replicated relations consisting of a pair of a secondary memory relation and a temporary relation. Kappa-P guarantees that both relations have the same logical structure, such as tuples, indexed attributes, and the same tuple identifiers, even if the relation is updated. Operations except for update operations can be executed by the temporary relation, because the temporary relation is processed faster than the secondary memory relation. Update operations should be executed by relations in parallel and asynchronously, and synchronization is achieved by two phase commitment protocol, which guarantees the equivalence of their contents. 9 Conclusions In this paper, we described a parallel DBMS Kappa-P. In order to provide KBMSs and KIPSs with efficient database management facilities, the system adopts a nested relational model, and is designed to use parallel resources efficiently by using various parallel processing. The smallest configuration of Kappa-P is almost the same as Kappa-II. Compared both systems on the same machine, Kappa-P works with almost same efficiency as Kappa-II. Kappa-P is expected to work on PIM more efficiently than Kappa-I. We will make various experiments for efficient utilization of parallel resources, and show that the system provides KBMSs and KIPSs with efficient database management facilities in the FGCS prototype system. Acknowledgment The Kappa project has had many important contributions in addition to the listed authors. The members of biological databases of the third research laboratory: Hidetoshi Tanaka and Yukihiko Abiru, the users of Kappa-II, and Kaoru Yoshida of LBL have shown us many suggestions for improvements. We' thank Hideki Yasukawa of the third research laboratory for useful suggestions. The authors are grateful to Kazuhiro Fuchi and Shunichi Uchida for encouraging the projects. References [IJ D. J. DeWitt and J. Gray, "Parallel Database Systems: The Future of Database Processing or a Passing Fad 7", SIGMOD RECORD, Vol.l9, No.4, Dec.,1990. [2J P. Dadam, et aI, "A DBMS Prototype to Support Extended NF2 Relations: An Integrated View on Flat Tables and Hierarchies", Proc. SIGMOD, 1986. [3] "GenBank/HGIR Technical Manual", LA-UR 883038, Group T-I0, MS-K710, Los Alamos National Laboratory, 1988. [4J M. Kawamura and H. Sato, "Query Processing for Parallel Database Management System", Proc. J(Ll Programming Workshop '91, 1991. (in Japanese) [5J T. J. Lehman and M. J. Carey, "A Study of Index Structures for Main Memory Database Management Systems", Proc. VLDB, 1986. [6J A. Makinouchi, "A Consideration on Normal Form of Not-Necessarily-Normalized Relation in the Relational Data Model", Proc. VLDB, 1977. [7J H. Sato and M. Kawamura, "Towards a Parallel Database Management System (Extended Abstract)", Proc. 
Joint American-Japanese Workshop on Parallel J( nowledge Systems and Logic Programming, Tokyo, Sep. 18-20, 1990. [8] H.-J. Schek and G. Weikum, "DASDBS: Concepts and Architecture of a Database System for Advanced Applications", Tech. Univ. of Darmstadt! TR, DVSI-1986-Tl, 1986. [9J J. Verso, "VERSO: A Data Base Machine Based on Non INF Relations", INRIA-TR, 523, 1986. [10J H. Yasukawa, H. Tsuda, and K. Yokota, "Object, Properties, and Modules in QUIXOTE", Proc. FGCS'92, Tokyo, June 1-5, 1992. [l1J K. Yokota, "Deductive Approach for Nested Relations", Programming of Future Generation Computers II, eds. by K. Fuchi and L. Kott, NorthHolland, 1988. [12J K. Yokota, M. Kawamura, and A. Kanaegami, "Overview of the Knowledge Base Management System (KAPPA)", Proc. FGCS'88, Tokyo, Nov.28Dec.2, 1988. [13J K. Yokota and S. Nishio, "Towards Integration of Deductive Databases and Object-Oriented Databases - A Limited Survey", Proc. Advanced Database System Symposium, Kyoto, Dec., 1989. [14J K. Yokota and H. Yasukawa, "Towards an Integrated Knowledge-Base Management System Overview of R&D for Databases and KnowledgeBases in the FGCS project", Proc. FGCS!92, Tokyo, June 1-5, 1992. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE O~ FIFTH GENERATION COMPUTER SYSTEMS 1992, edIted by ICOT. © ICOT, 1992 257 Objects, Properties, and Modules in QUIXOTE Hideki Yas ukaw a , Hiroshi Tsuda, and Kazumasa Yokota Institute for New Generation Computer Technology (ICOT) 21F. Mita-Kokusai Bldg., 1-4-28 Mita, Minato-ku, Tokyo 108, JAPAN e-mail: yasukawa@icot.or.jp.tsuda@icot.or.jp.kyokota@icot.or.jp Abstract This paper describes a knowledge representation language QUIXOTE. QUIXOTE: is designed and developed at ICOT to support wide range of applications in the Japanese FGCS project. QUIXOTE: is basically a deductive system equipped with the facilities for representing various kinds of knowledge, and for classifying knowledge. In QUIXOTE: , basic notions for representing concepts and knowledge are objects and their properties. Objects are represented by extended terms called object terms, and their properties are represented by subsumption constraints over the domain of object terms. Another distinguished feature of QUIXOTe is its concept of modules. Modules play an important role in classifying knowledge, modularizing a program or a database, assumption-based reasoning, and so on. In this paper, the concepts of objects, properties, and modules are presented. vVe also present how modules work with objects and their properties. assumption-based reasoning, and so on. In this paper, concepts of objects, properties, and modules are presented. We also present how modules work with objects and their properties, for example, in classifying or modularizing them. Other features of QUIXOTe and the formalism appear in other papers[20, 12, 21]. Section 2 shows how objects and their proper.ties are treated in a simple version of QUIXOTE:. Section 3 shows how complex objects are introduced in QUIXOTe, and how they are used to deal with exceptions in property inheritance. Section 4 describes deductive rules in QUIXOT£, and the overview of deductive aspects of QUIXOTE:. Section 5 describes module concepts with some examples. Section 6 describes the facilities for relating modules, especially to import or to export rules among modules. Section 7 describes queries in QUIXOTE:, which provides the facilities to deal with modifications of a program, or assumption-based reasoning. Finally, Section 8 describes a brief comparison with related works. 
2 1 Introduction Logic programming is a powerful paradigm for knowledge information processing systems from the viewpoint of knowledge representation, inference, advanced databases, and so on. QUI XOTE: is designed and developed to support this wide range of applications in the Japanese FGCS project. Briefly speaking, it is a constraint logic programming language, a knowledge representation language, and a deductive object-oriented database language. In QUIXOTe, basic notions for representing concepts and knowledge are objects and their properties. An object in QUIXOTE: is represented by an extended term called an object teTm, and its properties are defined as a set of subs7.1mption constraints. Another distinguished feature of QUIXOTe is its concept of mod1tles. A module corresponds to a part of the world (situation) or the local database. In QUIXOTE, its module concepts play an important role in classifying knowledge, modularizing a program or a database, A simple system of objects and their properties Object-oriented features are very useful for applying logic programming to 'real' applications. QUIXOTE: is designed as a logic programming language with features such as: object identity, complex objects, encapsulation, inheritance, and methods, which are also appropriate for deductive object-oriented databases and situation theoretic approaches to natural language processing systems. An object is a key feature in QUIXOTE: to represent concepts and know ledge. In knowledge representation applications, it is important to identify an object or to distinguish two objects, as in the case of object-oriented languages. Object identity is the basic notion for identifying objects. QUIXOTE: precisely defines object identity, where extended terms are used as object identifiers. In this sense, extended terms in QUIXOTe are called object terms. In this section, the simplified treatment of objects and their properties are presented. That is, the case of every 258 object term is atomic. In the next section, the system of object terms is extended to non-atomic and complex cases, including the non-well-founded (circular) case. 2.1 Basic Objects At the first approximation, we assume that each object has a unique atomic symbol as its identifier. The important thing, here, is that objects are related to each other. There are some relations to be considered, such as is_a-relations, parLofrelations, and so forth. In QUIXCJTE, subsumption relations among objects are used to relate objects. A set BO of atomic symbols called basic objects is assumed. BO is partially ordered by the subsumption relation (written ~), and (BO,~, T,.1) is a lattice with T as its maximum element and .1 as its minimum element. A basic object is used as an object identifier (an object term) in this simple setting. An example of the lattice is BO* = ({animal,mammal,human,dog},~, T,..L) ° property of the object. In what follows, we say that has the attributes [ll = Vb' •. , In = vnJ when there is no confusion. For example, the following is an attribute term representing that John has the property of being 20 years old and being a male: john/rage Notice that an object identifier and its property (attribution) are separated by"/". It is useful to regard an attribute of an object as a concept. For example, John's age can be seen as a concept. In QUIXOTE, this kind of concept is represented by dotted terms . 
.A dotted term is defined as a pair of an object term and a label, and has the following form: 0.1 where ° is an object term and l is a label. For example, John's age is represented by the following dot ted term: john.age where the following holds: mammal h1lman dog ~ ~ ~ Attribute Terms mammal, mammal. Terms and Dotted = o.l O.ll In addition to the basic objects, we assume a subset L of BO, called labels. Labels are used to define the attributes of objects. An attribute of the object 0 is represented by the triple (0, I, v) where I is a label and v is an object. Thefollowing example shows that John has the attribute of his age being 20: (john, age, 20). A property of an object is represented by a set of the pairs of a label and its value, that is, a set of attributes. Thus, John's having a property of being 20 years old and being a male is represented by {(john, age, 20), (john, sex, mal e)}. Formally, a label I is interpreted as a function: [I] : BO ---7 = 20. A dotted term is treated as a global variable ranging over the domain of object terms, and interpreted as an object term. The following holds for dotted terms: animal, 01 2.2 = 20, sex = maleJ. BO. The syntactic construct for representing an object and its properties is the attribute term. An attribute term is of the form: 2.3 02 =? 01. l = 02. 1 = ol,o.l = 02 =? 01 = 02 = 01 =? 0.ll.l2 = 01. l2. Properties as Subsumption Constraints It is often the case that an object has certain attribute while its value is not fully specified. john/rage ---7 positive_integerJ The above attribute term represents that John has the property of his age being subsumed by positive integer. In this case, John's age is not specified but constrained as being subsumed by positivejnteger. Constraints in the simplified QUIXOTE are subsumption constraints over basic objects. As mentioned in 2.1, the domain of basic objects is a lattice under the subsumption relation. Thus, the rules of subsumption constraints are simply defined as follows: • x = x, • if x = y then y = x, where 0 is an object term, Ii's are labels, and Vi'S are objects. The syntactic entity [II = VI, ... , In = VnJ is called the attribution of the object term o. It specifies a • if x = y and y = z then x = z, • x ~ x, 259 • if x ~ y and y ~ z then x ~ z, • if x = y and x ~ z then y ~ z, • if x = y and z ~ x then z ~ y, the representative element of the equivalence class that is defined by the following rule: [ s] d,;j {x E s • if x ~ y and y ~ x then x • if y ~ x and z ~ x then (y i t z), and X = Y == X [{I, integer, "abc", string}] = {integer, string}, z) ~ x where (x t y) is the infimum (meet) and (x i y) is the supremum (join). Note'that x = y is equivalent to the conjunction of x ~ y and y ~ x, that is, the following holds: de! ~ Y }. Under this definition, ~wordering becomes a partial ordering, and can be used as an equivalence relation. For example, the following holds: = y, • if x ~ y and x ~ z then x ~ (y I --,:Jy E s X i= y /\ x ~ Y /\ X ~ y. A set of subsumption constraints is solvable if and only if it does not contain a = b for two distinct basic objects a and b with respect to the above rules[20]. The aforementioned attribute term is defined as a pair of the basic object john and a subsumption constraint john.age ~ positive_integer. Such a pair can also be written as: provided that 1 2.4 ~ ~ integer and "abc" string. Property Inheritance It is natural to ·assume that properties are inherited from object terms with respect to ~-ordering. 
2.4 Property Inheritance

It is natural to assume that properties are inherited from object terms with respect to the ⊑-ordering. Consider the following example:

    swallow ⊑ bird.
    bird/[canfly → yes].

Since swallow is a kind of bird (swallow ⊑ bird) and bird has the attribute [canfly → yes], swallow has the same attribute by default. The rule for inheritance of properties between objects is:

    o1 ⊑ o2  ⇒  o1.l ⊑ o2.l.

Definition 1 (Rule for inheritance) If o1 ⊑ o2 holds, then the following holds according to this rule:

    • if o2 has the attribute [l → o'], then o1 also has the same attribute,
    • if o1 has the attribute [l ← o'], then o2 also has the same attribute.

Notice that the attribute [l = o'] is the conjunction of [l → o'] and [l ← o']. As mentioned before, an attribute term consists of an object term and a set of subsumption constraints; thus, property inheritance can be considered as constraint inheritance.

3 Complex Objects

The simplified approach shown in the previous section lacks the capability to represent the complex objects required in actual applications, such as trees, graphs, proteins, chemical reactions, and so forth. A complex object has certain "structures" intrinsic to its nature. Knowledge representation languages must be able to represent such complex structures, that is, the object identifiers in the QUIXOTE language. Thus, it is important to give a facility for introducing complex object terms into QUIXOTE.

3.1 Intrinsic vs. Extrinsic Properties

The approach adopted in QUIXOTE is a natural extension of the simplified language given in the previous section. An object has a property, that is, a set of attributes, which is intrinsic to identifying that object. Thus, the properties of an object are separated into two: the intrinsic property and the other, extrinsic, properties. Similarly, the attributes of an object are divided into two: intrinsic attributes and extrinsic attributes. In QUIXOTE, the intrinsic attributes are included in the object term representation but not in the attribution of an attribute term representation. For example, the concept of a red apple is represented by the following complex object term:

    apple[color = red].

Notice the difference between this object term and the attribute term apple/[color = red]. The latter represents the concept of an apple with the attribute [color = red] as its extrinsic property.

Let o be a basic object, l1, l2, ... be labels, and o1, o2, ... be object terms.

    • Every basic object is an object term.
    • A term o[l1 = o1, l2 = o2, ...] is an object term if it contains only one value specification for each label.
• A term is an object term only if it can be shown to be an object term by the above definition. For an object term 0[11 = 01, 12 = 02, ... J, 0 is called the principal object and [11 = 01,l2 = 02, ... ] is called the intrinsic pmperty specification. The intrinsic property specification of an object term is the set of intrinsic attributes of the object term, and interpreted as the indexed set of object terms indexed by the labels. Thus, an object term is interpreted as the pair of its principal object 0 and the indexed set s, and is written as: (0, s). Let EO = {human,20,30,int,male,female}, 20 ~ int,30 ~ int, L = {age, sex}. The following terms are object terms in QUI,YO'U: human, human[age = 20, sex = male]. These two object terms are interpreted as (human, {}) and (human, {(age,20), (sex,male)}). The object term T[ll = Vb ... ] is described as [11 V1, ..• ] for convenience. By the definition· of complex object terms, the following holds: For example, human[age = 20].age = 20 holds. It is possible to have object terms containing variables ranging over ground object terms as follows: human human[age 3.2 = X, sex = Y]. Extended Subsumption Relation Given the subsumption relations ~ among basic objects, the relations can be extended into subsumption relations among complex object terms. The extended subsumption relations preserve the ordering on basic objects, and also constitute a lattice. The precise definition of a extended subsumption relation is given in [20], intuitive understanding will suffice at this point. Intuitively, 01 ~ 02 (we say 02 subsumes 01) holds between two complex object terms 01 and 02 if and only if: (1) the principal object of object of 01, (2) 01 02 subsumes the principal has more labels than 02, and (3) the value of each label of 02 subsumes the value of each label of 01. For example, the following holds: human[age = 20, sex = male] = integer], ~ animal[age because the principal object of animal[age = integer] (animal) subsumes the principal object of human[age = 20, sex = male] (human), the object term human[age = 20, sex = male] has more labels than animal[age = integer], and 20 ~ integer holds. Similarly, human[age = 20] ~ animal[age = int] holds, but human[age = 20] and human[sex = male] cannot be compared with respect to ~-ordering over complex object terms. In such extended subsumption relations over object terms, the object term T is the largest among all the object terms. In QUIXOTE, the object term ...L is the smallest of all, that is, ...L is used as the representative element ofthe class of object terms that are smaller than ...L 1. The semantic domain of object terms is a set of labeled graphs, a subclass of hypersets with urelement[2, 13]. 1 From the definition of object terms and subsumption relation over them, it is possible to have an object term of the form: 261 The reason such a domain is adopted is to allow object terms with infinite structure. Subsumption relations correspond to hereditary subset relations[2] on that domain. The rules for extended subsumption constraints are those listed in 2.3 plus the following: • if (Ol,Sl) (l, V2) E S2 ~ (02,S2) then 01 ~ 02 and for each there exists (l, VI) such that VI ~ V2, 4 Deductive Rules It is important for know ledge representation languages to provide facilities for certain types of inferences, namely, deductive inference. The deductive system of QUIXOTe is defined by deductive rules (rules, for short) . 
4.1 • if (Ol,Sl) (l, vd E = (02,S2), then 01 = there exists (l, V2) E S2 (the symmetric condition follows). Sl and for each such that VI = V2 02 These two rules correspond to the simulation and bisimulation relations in [2, 13], where the bisimulation relation is an equivalence relation. 3.3 Exception on Property Inheritance By introducing complex object terms in terms of intrinsic-extdnsic distinction, it becomes possible to define the notion of exceptions on the inheritance of properties in a clear way. Intuitively, the intrinsic property of an object is the property that distinguishes that object from others, and such properties should not be inherited. In addition to the rule for property inheritance given in 2.4, the rule for exception is defined as follows: Rules in QuIXOTE First, a literal (atomic formula) of QUIXOTe is defined to be an object term or an attribute term. The rules of QUIXOTe are defined as follows: (1) a literal H, (2) H ~ B ll ... , Bn where H, B 1 , .• . , Bn are literals. H is called the head and the "B1 , . .. ,Bn" is called the body of the rule. Rules of the form (1) are sometimes called unit rules or facts 2 • Rules of the form (2) are called non-unit rules. A fact H is shorthand for the non-unit rule whose body is empty, that is, the rule H ~. When there is no confusion non-unit clauses are simply called rules. A database or a program is defined as a finite set of rules. A fact specifies the existence of an object and its property. The following is an example of facts: john; ; Definition 2 (Rule for exception) The intrinsic attributes of an object term override the attribution inherited from the other object terms) and any of the intrinsic attributes is not inherited to the other object terms. In sum, the intrinsic attributes are out of the scope of property inheritance. For example, consider the attribute of the object term bird[canfly -+ no] with respect to the following database definition: bird/[canfly = yes], john/rage = 20];; The former fact specifies that the literal john holds (or is true), that is, the database has the object john as its member. In addition to that, the latter specifies that john has the property of [age = 20]. . The informal meaning of the rule H ~ B 1 , •.• ,Bn IS as usual, that is, if B 1 , ..• ,Bn holds then H holds. As mentioned in Section 2.3, properties are interpreted as subsumption constraints. Thus, a rule is defined as a triple (H, B, C) of the object term H in its head, the. set of object terms B in its body, and the set of constramts C. The elements of B are called subgoals. Thus, any rule can be represented by the following form: bird[canfly = no]. H~BIIC. The object term bird[canfly = no] inherits the attribute [canfly -+ yes], by the rule for inheritance. However, bird[canfly = no] contains the intrinsic specification on the label canfly. Thus, bird[canfly = no] has the attribute [canfly -+ no] as its property by the rule for exception. Thus, given in Section 3.1, the following holds even if property inheritance occurs: This form of rule is called constraint-based form. It is possible to associate constraints, other than those corresponding to attributes, with a rule as follows: john/[daughter X/[Jather f- {X}] ~ = john] II {X ~ female}. 2 Sometimes, a fact is defined to be a unit-rule having a nonparametric object term as its head. In that case, ~he set of fa~ts corresponds to an extensional database in conventlOnal deductIve databases. 
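To make the constraint-based form of rules concrete, here is a minimal sketch in Python rather than QUIXOTE: a rule is modelled as a triple of a head, a list of body subgoals, and a constraint, and a naive forward-chaining loop plays the role of the deductive system. Variables, unification, and subsumption constraints are deliberately omitted, and all names in the example are invented for the illustration.

    # Hedged sketch (Python, not QUIXOTE): rules H <= B1,...,Bn || C over
    # ground literals, evaluated by naive forward chaining.
    def forward_chain(facts, rules):
        """facts: set of ground literals; rules: list of (head, body, constraint)."""
        known = set(facts)
        changed = True
        while changed:
            changed = False
            for head, body, constraint in rules:
                if all(b in known for b in body) and constraint() and head not in known:
                    known.add(head)
                    changed = True
        return known

    facts = {"john", "john/[age = 20]"}              # unit rules (facts)
    rules = [("adult[who = john]",                   # head
              ["john/[age = 20]"],                   # body subgoals
              lambda: 20 >= 18)]                     # stands in for the constraint part C
    print(forward_chain(facts, rules))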
262 Precisely speaking, the set of constraints C of a rule is classified into two, the constraints in the head of the rule (head constraints) and the constraints in the body (body constraints). For example, the rule O/[ll = 01, l2 = 02] -¢= p/[l3 = 03], q/[l4 = 04] has {O.ll = 01, 0.l2 = 02} as its head constraints, and {P.l3 = 03, q.l4 = 04} as its body constraints. In the context of object-oriented languages, the attributes in the head of a rule correspond to the methods, and the body of the rul~ corresponds to their implementation, as in F-logic[8]. 4.2 Derivations and Answers Compared to the usual notion of the derivation of goals and answers in logic programming languages like Prolog, two points must be explained in the case of QUIXOTE. The first point is the role of object terms as object identifiers. The value of an attribute of an object must be unique, since the label of the attribute is interpreted as a function. The second point is the fact that the attributes of an object can be partially specified and they are interpreted as subsumption constraints. Consider the following database: Example 1 o[l = X]/[ll o[l = Xl/[ll -+ -+ a, l2 = b] d,13 = e] -¢= -¢= X X II II {X ~ e}i i {X ~ f}i i inconsistent, the two rules cannot be applied together, that is, the derivations given by the two rules must not be merged. Definition 3 (Derivation of a goal) A derivation of a goal Go by a program is defined as the 5tuple (G, R, 8, HC, BC) of a sequence G (= Go, G 1 , ... ) of goals) a sequence R (= R 1, ... ) of the renaming variants of the rules) a sequence 8 (= 01 , ... ) of most general unifiers3) the two sets of constraints HC and BC of all the head constraints and all the body constraints of the rules in p) such that each G H1 is derived from Gi and Ri+1 using OHlJ and (HC U BC)8 is solvable. Definition 4 (Assumed constraint set) The assumed, constraint set of a derivation D ( = (G, P, 8, HC, BC)) is defined as the set of all constraints in BC that are not satisfied by HC with respect to the substitution 8. The assumed constraint set of a derivation is the set of attributes of objects which are assumed to derive the goal. This is because some attributes of objects in a database are partially defined. Each derivation has its own derivation context defined as the consequence relation (t- c) between its assumed constraint set and its head constraints. A derivation context A t-c B of a goal represents that the goal is derived by assuming A, and as a consequence, B holds. The notion of a refutation is defined similarly as usual: a derivation that has the empty goal as the last element in its sequence. In Example 2, the two refutations of the goal 0 have the following derivation contexts, : Pi i, where both p ~ c and p ~ p.l2 ~ b t-c o.h ~ a, p.12 ~ d t-c o.h ~ c. f hold. In this case, 0[1 = p]/[ll -+ a,12 = b] holds by the first rule and o[l = pJ/[t1 -+ d, t3 = eJ holds by the second rule. Thus, by combining these two, the object term o[l = p] gains [/1 -+ (a 1 d), 12 = b, l3 = eJ as its attribute. This process is done by merging the attributes of the derived subgoals equivalent to each other. The merging pro~ess becomes complicated if we take into account the partiality of the attributes of an object. Consider the following example: Example 2 0/[/1 o/[ll -+ aJ -¢= -+ e] -¢= P/[l2 p/[l2 -+ -+ bJ;; dJ;; p;; . The subgoal p of the first rule holds with attribute [l2 -+ bJ, which is not defined in the database. 
This is because the fact Pi; in the example does not specify the value of its I-attribute. Similarly, the subgoal p of the second rule holds with [12 -+ dJ. If these two attributes are To deal with the merging of attributes discussed above, a goal must be merged into the other refutation of the same goal if the derivation contexts of the two refutations have some relation to each other, that is, if the assumed constraint set of one refutation holds in the assumed constraint set of another refutation. This means that the condition holds in a weaker assumption also holds in a stronger assumption. For example, in Example 2, if b ~ d holds, then the second refutation is merged into the first one. As a consequence, a new refutation is given instead of the first refutation, whose derivation context is as follows: p.l2 ~ b t-c 0.11 ~ (a Moreover, if b ~ d and e refutations becomes: ~ 1 e). a, then the context of both 3The most general unifier of two object terms is defined similarly to the usual one, except for the definition of terms. 263 This means that the first derivation is absorbed to the second with respect to the merge, because (a 1 c) = c holds. After merging all possible pairs of refutations, the notion of an answer to a query is defined as follows: Definition 5 (Answer) An answer to the query is defined as a pair of the answer substitution and the derivation context of a refutation. Thus, the following two answers are given to the query ?-o/[l = Xl to the dat~base shown in Example 2: b r-c o.h ~ a), r-c 0.11 ~ c), (0, p.12 ~ (O,p.12 ~ d if no condition is given among a, b, c , and d. The QUIXOT£ interpreter returns all answers at once, that is, it employs the top-down breadth-first search strategy. Modules in QUIXOTE -5 In this section, a module concept is introduced into (1) defines our conception of a module as a set of rules with the same index. Thus, if we regard an index as the identifier for a context or a situation, the set of rules can be seen as the chunk of knowledge relevant for that context or situation. As the result of introducing indexes, each literal has come to have the form: m:A where m is an index called module identifier, and A is an object" term or an attribute term. Hence, the usual consequence relation between formulas should be replaced by: Intuitively, this means that A holds in m with reference to parts ml, ... , mi of the database. In obtaining the answer, the 'choice of parts of the database can be seen as the assumptions. In QUIXOT£, an object term is used as a module identifier. The use of object terms as module identifiers enables the user to treat modules as objects, and provides meta-like programming facilities. QUIXOT£. 5.2 5.1 Need for Modules in Deductive System The goal of kriowledge representation is to provide a facility for reasoning about a problem by using given knowledge in the way that ordinary people do: we call this everyday-reasoning, or human-reasoning. Such reasoning systems can be defined as the pair (R, A) of a set of deductive rules and an algorithm for extracting all consequences from the rules. " For simplicity, fix A, and think of R as the knowledge in a reasoning system. Rules with Module Identifiers Corresponding to the constraint-based form of a rule given in Section 4, a modularized rule has the following form: mo :: 00 ¢= m1 : 0b ... ,m n : On II C where 00,01,"" On are object terms, mo, m1,' .. 
, mn are module identifiers, and C is a set of constraints 4 • This rule specifies the following two things: (1) this rule is in (or is accessible from) the module with a module identifier mo, and consistent nor complete, even though its fragments may be consistent in themselves, (2) if each subgoal mi : 0i holds with respect to a variable assignment and constraints C then mo : 00 holds. • reasoning is situation-dependent, i.e., some fragment of R is relevant or meaningful in a certain situation, Generally, the modules and their rules are defined as follows: • R is neith~r • reasoning usually requires some assumptions. One way to deal with such an aspect of reasoning is to associate an index to each literal and each rule in R. Indexes can be used: (1) to define a fragment of rules (a chunk of know ledge) which can be used in a certain situation, and (2) to clarify which assumption (set of rules) is used. where rb' .. ,7'm are rules. Note that it is possible for modules to be nested. Thus, it is easy to have a set of rules in a module as the set of all rules with the module identifier of that module. 4Precisely, this form represents the rule 00 with index mo. <= ml : 01,"" mn : On II C 264 The set of rules in the module with m as its identifier is written as Em s . In general, a module identifier may be a parametric object term, that is, an object term with variables in its description. The variables appeared in a rule are interpreted as universally quantified, thus the parametric module identifiers which are equivalent with respect to variable renaming are regarded as the same. In QUIXOTe, it is assumed that each module is consistent. It is an important feature of modules to represent inconsistent knowledge where inconsistency arises from differences in situations or context. For example, consider the situation of John's believing that Mary is 20 years old, when she is actually 21 years old. The following database shows the treatment of such a problem: johns_belief :: mary/rage = 20];; reaLworld :: maryj[age = 21];; In this case, the database is consistent as a whole unless the two modules are related to each other. The following example shows the use of parametric module identifiers to describe so-called generic modules. A parametric module identifier can be used to pass parameters to the rules in the module. Example 3 (Generic Module) sorter[cmp = ej :: { sort[l = 0, sorted = [], cmp = e];; sort[l = [AIX]' sorted = Y, cmp = ej ¢= split[l = [AIX]' base = A, cmp = e, II = L 1 , 12 sort [1 = L 1 , sorted = Yl, cmp = (7], sort[1 = L 2 , sorted = 1'2, cmp = e], list: append[ll = Y1 , 12 = 1'2, I = Y]; ; = L 2 j, ... }j j less_than :: { compare[arg1 {A < B}j j compare[arg1 = A, arg2 = B, res = yes] II = A, arg2 = B, res = no] {B < A}}; j Module sorter[cmp = e) has the definition of a quicksorting procedure which uses the argument G as the comparator, and module less_than has the definition of a comparator, where the relation < is used as the constraint relation for comparing two objects. In processing the query: ?-sorter[cmp = less_than] : sort[ I = L, sorted = R, cmp = G], -----------------5 Precisely, Em should· be defined as the set of rules that are properly in m. Taking rule inheritance into account, the set of rules in a module is the union of the proper set and sets of rules imported from the other modules. the module identifier less_than is passed to the rules in the sorting module, and used to compare two elements of list L. 
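The effect of a parametric module such as sorter[cmp = less_than] can also be mimicked outside QUIXOTE. The following Python sketch is illustrative only; the module registry and the function names are assumptions of the sketch. As in Example 3, the comparator is selected by the module identifier passed as a parameter.

    # Hedged sketch (Python, not QUIXOTE): a generic "sorter" parameterized by
    # the identifier of a comparator module, as in Example 3.
    MODULES = {
        "less_than":    lambda a, b: a < b,      # compare[...] with constraint A < B
        "greater_than": lambda a, b: a > b,
    }

    def sort(l, cmp_id):
        """Quicksort in the style of sorter[cmp = C], using comparator module cmp_id."""
        before = MODULES[cmp_id]
        if not l:
            return []
        pivot, rest = l[0], l[1:]
        l1 = [x for x in rest if before(x, pivot)]
        l2 = [x for x in rest if not before(x, pivot)]
        return sort(l1, cmp_id) + [pivot] + sort(l2, cmp_id)

    print(sort([3, 1, 2], "less_than"))      # [1, 2, 3]
    print(sort([3, 1, 2], "greater_than"))   # [3, 2, 1]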
It is possible to give module identifiers other than it for using different comparator in the sorting procedure. The next example shows the treatment of state transitions by using modules to represent states. Example 4 (State Transition) m.:: { a/ron = nil]; ; b/[on := a]; ; c/[on = nil]; ; d/[on = c]}; ; sc[sit = M,op = move[obj = A, fr G/[on = A] ¢= M : Aj[on = nil], M: B/[on = A], /vI: G/[on = nil];; B/[on = nil] ¢= M : A/[on = nil], M: B/[on = A], M: G/[on = nil];; A/ [on = nil] ¢= M : A/[on = nil], M: Bj[on = AJ, Ai: G/[on = nil]};; = B, to = eJ] :: { In the initial state m, block a is on top of block b, and block c is on top of block d. move[obj = A, fr = B, to = e] represents the operation of moving A from the top of B to the top of G. Module sc[sit = M,op = OP] defines how the state of M is changed by operation OP. At the same time, the module identifier shows the history of state transitions. For example, the following answers are obtained: ?-sc[sit = m,op X/ron = a]. Answer :X = c. = move[obj = a, fr = b, to = c]] : In this case, the module that represents the state after an operation is not included in the given program, it is possible to create new modules by adding a program to a query (Section 7) and by issuing a create_module command. Concerning modifications made by the sequence of queries and create_module commands, QUIXOTe employs transaction logic with special commands, begin_transaction, end_transaction, and abort-transaction. If some modules are created in one transaction, they are incrementally added to the program unless the transaction ends with abort-transaction. 265 6 Relating Modules 6.2 It is important to relate some modules in defining the database and when reasoning. Two ways of relating modules should be considered, that is, referring to other modules and importing/ exporting rules from other modules. As shown above, a rule of QUIXOT£ has a subgoal of the form m : A in its body. This sub goal specifies the external reference to the module with m as its identifier. In such a case, module m can be seen as encapsulated, because no rule is imported to it. 6.1 Simple Submodule Relationship Sometimes, it is useful to define databases by providing a facility to import/export among modules as in typical object-oriented languages. In QUIXOTf, importing/exporting rules are done by rule inheritance defined in terms of the binary relation ~s over modules called the submodule relation. The submodule relation is similar to the subsituation relation in PROSIT[15]6. Basically, rule inheritance is defined as follows: To treat various rule inheritance phenomena, two orthogonal modes, local and overriding, are introduced into QUIXOT£. Each rule may have these modes, which control how each rule is inherited according to submodule relations. If a rule is local, then it is not inherited to other modules. An overriding rule overrides the other rules inherited from other modules, that is, the inheritance of some rules is canceled. There are several possibilities on what rules are to be canceled by an overriding rule. Currently, the inheritance of a rule is canceled if its head has object terms with the same principal object and its labels are same as the one of the head of overriding rule. This is similar to the 'retract' predicate of Prolog. Each rule has an inheritance mode. The value of the inheritance mode is (0), (l), or (01), if explicitly specified. (0) means 'overriding, (I) means 'local', and (ol) means 'local and overriding'. 
If a rule has no inheritance mode, the rule is regarded as having 'non-local and nooverriding' by default. Consider the following example. Example 5 (Exception by Inheritance Mode) Definition 6 (Rule Inheritance) bird ::. canfly/[pol = yes];; If ml ~s m2 then module ml inherits all the rules of m2J that is} all the rules in m2 are exported to mI. Under this definition, the set of rules of ml is ~ml U ~m2' The right hand side of ~s in a submodule definition may be a formula of module identifiers with settheoretical union, intersection, or difference. For example, if we have ml :: {rn, ... , rli}, m2:: {r21, .. ·,r2j}, { m2 , m3} :: {r31, ... , r3k} , penguin :: (01) canfly/[pol = no];; super _penguin:: { ... }; ; bird penguin ~s super _penguin; ; penguin ~ bird ~ penguin ~s bird, where rules in ml should be overridden. then ml has the set of rules 6.3 Taking the rule inherita.nce into account, a special module identifier selfis also introduced as in most objectoriented programming languages. For example, consider the following: 0 ~s The inheritance of the rule of the module bird is canceled in the module penguin by its 'overriding' rule, whereas the module super _penguin gains canfly/[pol = yes], because the rule in bird is inherited to it. By introducing local and overriding modes for rule inheritance, it is possible to relate subsumption and submodule relations closely as follows: ml ~s m2 - m3 ml :: Controlling Rule Inheritance =? 01 II C. The subgoal 01 is interpreted as self: 01' In this context, self is evaluated as mI' If m ~s ml, then m ras the rule m :: 0 =? 01 \I C, and self is evaluated as m in this case. 6Considering a module as a class, a super-class of mI. ml ~s m2 means that m2 is Links between two Modules Sometimes, a facility for representing changes of state is required as shown in the example in Section 5.2. The relation between the two states before and after an operation is represented by a special form of object terms. However, simpler and more sophisticated treatment may be required for general treatment of state transitions or changes of states. The problem is how to relate modules and objects. Another kind of relations called links are provided as follows: 266 where ml and m2 are module identifiers, and 01 and 02 are object identifiers. L is called the name of a link relation. Notice that link relations are defined over module identifiers and object terms. The former links are called module-links and the latter links are called object-links. The links defined above obeys the following rule: This rule shows how module-links and object-links colaborate. According to this rule, a pair of a modulelink definition and an object-link definition can be transformed as follows: The following is an example of link usage: mdagt = a] tU~Ck mdagt = a] to_the_righLof[obj = b]1U~Ck to_theJefLof[obj =b]. This example means that b is to the right of an agent a in a module ml, while b is to the left of a in m2 after a' turns back. By traversing the used links, one can keep track of the stages of reasoning. This feature is especially important in assumption-based reasoning and plan-goal based reasoning. Most of the links appeared in semantic networks can be represented by labels in an attribute term, while some of the links accompanying inference are represented by the pairs of a module-link and an object-link. &submod; ; penguin &e_mod; ; &b_rule; ; ~s bird, . .. ; ; bird :: canfly/[pol = yes];; penguin :: color[arg = black_white]; ; ... ; ; &e_rule; ; &e_pgm. 
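Returning to Example 5, one plausible reading of rule inheritance with the local and overriding modes can be sketched as follows. The Python fragment is illustrative only: the rule representation is an assumption of the sketch, and overriding rules of intermediate modules are ignored in this simplification.

    # Hedged sketch (Python, not QUIXOTE): rule inheritance over the submodule
    # relation with 'l' (local) and 'o' (overriding) modes, after Example 5.
    RULES = {
        "bird":          [("canfly.pol", "yes", "")],
        "penguin":       [("canfly.pol", "no", "ol")],   # (ol): local and overriding
        "super_penguin": [],
    }
    SUPERS = {"bird": [], "penguin": ["bird"], "super_penguin": ["penguin"]}

    def ancestors(m):
        seen, stack = [], list(SUPERS[m])
        while stack:
            a = stack.pop()
            if a not in seen:
                seen.append(a)
                stack.extend(SUPERS[a])
        return seen

    def effective(m):
        """Own rules plus inherited ones: 'l' rules are never exported, and an
        own 'o' rule cancels inherited rules with the same head key."""
        result = list(RULES[m])
        for a in ancestors(m):
            for key, value, modes in RULES[a]:
                if "l" in modes:
                    continue
                if any(k == key and "o" in own_modes for k, _, own_modes in RULES[m]):
                    continue
                result.append((key, value, modes))
        return result

    print(effective("penguin"))        # only canfly.pol = no
    print(effective("super_penguin"))  # canfly.pol = yes, inherited from bird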
A query is defined as a pair (A, P) of a set of attribute terms A and a program definition P (=(E, M H , OH, R)). The purpose of this query is to find the answer to A in the context of adding P. Thus, a query (A, P) to a program P' (=(E',MH,OH,R')) is the same as a query (A, []) to a program (EUE',MHUMH , OHUOH,RUR'). To deal with the modification of the program, a new transaction begins just before a query is processed and ends just after the process is terminated. QUIXOTE: transactions can pe nested, and the user can specify whether the modifications or updates done in each transaction are valid for successive processes or not. This feature of adding a program fragment in a query extends the ability of the assumption-based reasoning in QUIXOTE:, as shown in the following query, to the program above. ?-super_penguin : canfly/[pol = X];; &b_pgm; ; &b_mod; ; &submod; ; penguin> -supe1'_penguin; ; &e_mod; ; &b_rule; ; 7 penguin :: (ol)canfly/[pol = no];; &e_rule; ; &e_pgm. Programs and Queries As mentioned before, a database or a program is defined as a finite set of rules. More precisely, some additional information is associated with the definition of a database or a program. A definition of a QUIXOTE program concept is defined as a 4-tuple (E, M H , OH, R) of the environment part E of the definition of macros and information on program libraries, the module part MH of the definition of the submodule relation, the object part OH of the definition of the lattice of basic objects, and a set of rules R7. The following is an example of a program definition. &b_pgm; ; &b_env; ; ... ; ; &e_env; ; &b_oq); ; &subsum; ; bird &e_obj; ; ~ penguin, ... ; ; &b_mod; ; 7Precisely, MH contains the definition of module-links, and 0 H contains the definition of object-links. 8 8.1 Related Works Objects and Properties Beginning with Ait-Kaci's work on 1/J-terms, there are a number of significant works on the formalization complex terms and feature structures [16, 13, 1, 3, 4]. Formalization of the object terms and attribute terms of QUIXOTE: is closely related to and influenced by those works, especially the work done by Mukai on CIL [14] and CLP(AFA) [13]. Compared to those works, the unique point of QUIXOTE: is its treatment of object identity that plays an important role in introducing object-orientedness into definite clause constraint languages. As for object-orientation, Kifer's F-logic is closely related to QUIXOTE:, although the treatment of object identity and property inheritance is quite different. In Flogic, object identity is not defined over complex terms 267 but over normal first-order terms. The approach taken in QUIXOTe is more fine-grained than that of F-Iogic. 8.2 Modules As module concepts are very important in knowledge representation as well as programming, several related works have been done [9, 10, 11, 15]. First, a brief comparison of the language features of these works is presented. From the viewpoint of knowledge representation, modularization corresponds to the classification of knowledge. In such sense, the flexibility to relate modules is important. QUIXOTe provides a number of ways to do this, for example, by specifying the nesting of modules. QUIXOTe supports multiple module nesting by allowing set-theoretical operators to relate modules, which are also used for the exception handling, while other languages do not mention to it. 
QUIXOTe also provides a facility for dealing with exceptions on exporting/importing rules by using the combination of modes associated with each rule (local and overriding). This covers the features described in [9, 10, 11]. Furthermore, as in most object-oriented languages, QUIXOTe introduces the special module identifier self which can be seen as a meta-level variable and plays an important role in rule inheritance, while other languages do not. On the contrary, other languages have introduced the notion of side-effects mainly to make computation efficient. This is because the others are essentially designed as programming languages. This feature, including database updates, will be enhanced in the next version of QUIXOTe. Concerning the semantics of modules and reasoning with modularized formulas, Gabbay [6] proposes a prooftheoretic framework for extending normal deductive systems called the Labeled Deductive System (LDS). In LDS, each formula is labeled, in the form of t : A, where t is a symbol called label and A is a logical formula. The consequence relation is replaced by: tt : At, ... , tn : An f- s : B. In his concatenation logic, the following inference rule is the key to relating labeled formulas: s : a, t : a J b f- (t + s) : b. This means that b is obtained by using s first and then by using t. The label (t + s) indicates the order of label use. This corresponds to the notion of links in QUIXOTe, as expla.ined in Section 6.3. It is worthwhile investigating the relationship between LDS and QUIXOTe, namely, to give a proof theory for QUIXOTe. This is work to be done in the future. 9 Concluding Remarks Version 1.0 of QUIXOTe, written in KL1 (designed by ICOT as a parallel language for parallel inference machines PIM), has been completed. It has been used for several application systems, such as legal reasoning systems[19], natural language processing systems[18], and molecular biological databases[17]. Through those experiences, the usefulness of the features of QUIXOTe are being examined. We are now working with the new version of QUIXOTe for more efficient representation and processing. In the new version, the following features are introduced: 1) Relation ~etween Subsumption and Submodule This feature is discussed briefly at the end of Section 6.2. 2) Updates In Sections 5.2 and 6.3, we show a simple example of state transition. However, such problems are closely related to updates of databases or pr~grams. Currently, only facts can be added or deleted. In the next version, the facility for adding or deleting non-unit clauses will be provided. The point is how to deal with those updates in a parallel processing environment without causing semantic problems. 3) Meta-Rule Meta-rules are useful both in programming languages and knowledge representation languages. They provide a facility to describe schemata to define generic procedures or knowledge. For example, in HiLog[5], the following general transitive closure rule can be written: tc(R)(X, Y): -R(X, Y). tc(R)(X, Y): -tc(R)(X, Z), tc(R)(Z, Y). In QUIXOTe, new variables corresponding to the principal objects of object terms would be introduced to support such a function. Acknow ledgement We would like to express our gratitude to the members of the third laboratory of ICOT, and the members of the QUIXOTe project for their discussions and cooperation. 
We are grateful to the members of the working groups of ICOT, STS (Situation Theory and Semantics) and NDB (New-generation DataBases) and IDB (Intelligent DataBases), for their stimulative discussions and useful comments. We also would like to thank Dr. Kazuhiro Fuchi, Dr. Koichi Furukawa, and Dr. Shunichi Uchida of ICOT for· their continuop.s encouragement. 268 References [1] S. Abiteboul and S. Grumbach, "COL: A LogicProc. Based language for Complex Objects", EDBT, in LNCS, 303, Springer, 1988 [2] P. Aczel, Non- Well-Founded Set Theory, CSLI Lecture Notes No. 14, 1988. [3] F. Bancilhon and S. Khoshahian, "A Calculus for Complex Objects", Proc. ACM PODS, 1985 [4J W. Chen and D. S. Warren, "Abductive Reasoning with Structured Data", Proc. the North American Conference on Logic Programming, pp.851-867, Cleveland (Oct., 1989). [5] W. Chen, M. Kifer, and D. S. Warren, "HiLog as a Platform for Database Language", Proc. the Second International Workshop on Database Programming Language, Gleneden Beach, Oregon, 1989. [6] D. Gabbay, "Labeled Deductive Systems, Part 1", CIS-Bericht-90-22, CIS, Universitat Munchen, Feb., 1991. [7] M. H6hfeld and G. Smolka, "Definite Relations Over Constraint Languages",. LILOG report 53, IBM Deutschland, Stuttgart, Germany, Oct., 1988. [8] M. Kifer, G. Lausen, and J. vVu, "Logical Foundations for Object-Oriented and Frame-Based Languages", Technical Report 90/14 (revised), June, 1990. [9] D. Miller, "A Theory of Modules for Logic Programming", The International Symposium on Logic Programming, 1986. [10] 1. Monterio and A. Porto, "Contextual Logic Programming", The International Conference on Logic Programming, 1989. [11] L. Monterio and A. Porto, "A Transformational View of Inheritance in Programming", The International Conference on Logic Programming, 1990. [12] Y. Morita, H. Haniuda, and K. Yokota, "Object Identity in Qun:oT£", Proc. SIGDES and SIGAl of IPS], Oct., 1990. [13] K. Mukai, "CLP(AFA): Coinductive semantics of horn clauses with compact constraints", In J. Barwise, G. Plotkin, and J.M. Gawron, editors, Situation Theory and Its Applications) volztme II. CSLI Publications, Stanford University, 1991. [14] K. Mukai, "Constraint Logic Programming and the Unification of Information", PhD thesis, Department of Computer Science, Faculty of Engineering, Tokyo Institute of Technology, 1991. [15] H. Nakashima, H. Suzuki, P-K. Halvorsen, S. Peters, "Towards a Computational Interpretation of Situation Theory", The International Conference on Fifth Generation Computer Systems, 1988. [16] G. Smolka, "Feature logic with subsorts", Technical Report LILOG Report 33, IWBS, IBM Deutschland GMBH, W. Germany, 1989. [17] H. Tanaka, "Protein Function Database as a Deductive and Object-Oriented Database", The Second International Conference on Database and Expert System Applications, Berlin, Apr., 1991. [18] S. Tojo and H. Yasukawa, "Temporal Situations and the Verbalization of Information", The Third International Workshop on Situation Theory and Applications (STAS), Oiso, Nov., 1991. [19] N. Yamamoto, "TRIAL: a Legal Reasoning System (Extended Abstract)", France-Japan Joint Workshop, Renne, France, July, 1991. [20J H. Yasukawa and K. Yokota, "Labeled Graphs as Semantics of Objects", Proc. SIGDBS and SIGAl of IPS], Oct., 1990. [21] K. Yokota and H. Yasukawa, "QUIXOTE: an Adventure on the Way to DOOD (Draft)", Workshop on Object-Oriented Computing'91, Hakone, Mar., 1991. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. 
© ICOT, 1992

Resource Management Mechanism of PIMOS

Hiroshi YASHIRO*†, Tetsuro FUJISE‡, Takashi CHIKAYAMA†, Masahiro MATSUO‡, Atsushi HORI and Kumiko WADA†

† Institute for New Generation Computer Technology, 1-4-28 Mita, Minato-ku, Tokyo 108, Japan
‡ Mitsubishi Research Institute, Inc., 1-8-1 Shimomeguro, Meguro-ku, Tokyo 153, Japan
* E-mail: yashiro@icot.or.jp

Abstract

The parallel inference machine operating system (PIMOS) is an operating system for the parallel inference systems developed in the Japanese Fifth Generation Computer Systems project. PIMOS is written in the concurrent logic language KL1, which adds numerous extensions to its base language, Guarded Horn Clauses, for efficient meta-level execution control of programs. Using such features, PIMOS is designed to be an efficient, robust and flexible operating system. This paper describes the resource management mechanism of PIMOS, which is characterized by its unique communication mechanism and hierarchical management policy. Hierarchical management of user tasks in a distributed fashion is mandatory in highly parallel systems so that the management overhead of the operating system can also be distributed to the processors running in parallel. The meta-level execution control structure called shoen is provided by the KL1 language and is used for providing such hierarchical management in a natural fashion. In concurrent logic languages, message streams implemented by shared logical variables are frequently utilized as the media of interprocess communication. PIMOS, based on this programming style, provides multiplexed streams with flexible control for communication between user programs and the operating system.

1 Introduction

In the Fifth Generation Computer Systems project of Japan, the parallel inference machines, PIMs, have been developed to provide the computational power required for high-performance knowledge information systems [Goto et al. 1988, Taki 1992]. The parallel inference machine operating system, PIMOS [Chikayama et al. 1988], was designed to control highly parallel programs efficiently on PIMs and to provide a comfortable software development environment for the concurrent logic language KL1. PIMOS was first developed on an experimental model of the parallel inference machine, called Multi-PSI [Nakajima et al. 1989], consisting of up to 64 processing elements connected via a two-dimensional mesh network. The system was first developed in 1988 and has been used since then to research and develop various experimental parallel application software. Later, the system was ported to several models of parallel inference machines with considerable improvements in various aspects.

1.1 Shoen Mechanism

The language in which PIMOS and all the application programs are written is called KL1. KL1 is a concurrent logic language based on Guarded Horn Clauses [Ueda 1986] with subsetting for efficient execution and extensions for making it possible to describe the full operating system in it. The greatest benefit of using a concurrent logic language in writing parallel systems is the implicit concurrency and data-flow synchronization features. With these features, one of the most difficult parts of parallel programming, synchronization, becomes automatic, making software development much easier than in conventional programming languages with explicit synchronization. An important addition by the KL1 language to regular concurrent logic languages is its meta-level execution control construct named shoen.
Shoen enables the encapsulation of exceptional events and the description of explicit execution control over a group of parallel computational activities. The execution unit of KL1 programs is a proposition called a goal, which will eventually be proven by the axiom set given as the program. This proof process is the execution process of the programs, as it is with any other logic programming language. As the proof process can proceed concurrently for each goal, the goals are fine-grained parallel processes. As no backtracking feature is provided in concurrent logic languages, all the goals in the system form one logical conjunction. Thus, if no structuring mechanism is available, failure in a user's goal means failure of the whole system. The shoen mechanism provides a way of grouping goals, isolating such failure to a particular group of goals. Such a group is called a shoen. (The shoen mechanism is an extension of the meta-call construct of Parlog [Foster 1988] and can be considered to be a language-embedded version of the meta-interpreters seen in systems based on Concurrent Prolog [Shapiro 1984].)

A shoen can be initiated by invoking the following primitive.

    execute(Code, Argv, MinPrio, MaxPrio, ExcepMask, Control, -Report)

The arguments Code and Argv represent the code and arguments of the initial goal of the shoen. This goal is reduced to simpler goals during the execution (or proof) process, and all such descendant goals will belong to this shoen. A shoen has a pair of streams named the control stream and the report stream, which are represented here by the two arguments Control and Report respectively. The control stream is used to send commands to control the gross execution of the goals belonging to the shoen, such as starting, stopping, resuming or aborting them as a group. Exceptional events internal to the shoen, such as failure, deadlock, exceptions such as arithmetical overflow, or termination of computation, are reported by messages received from the report stream (Figure 1).

Figure 1: Shoen (a group of goals with its control stream and report stream)

The two arguments MinPrio and MaxPrio specify the priority range of the goals belonging to the shoen. PIMOS does not try to control the scheduling of each fine-grained parallel process, but controls them as a group using the control stream and this priority mechanism. Shoen can be nested to arbitrary levels. Stopping a shoen, for example, will also stop all the children and grandchildren shoen inside it. The argument ExcepMask is used to determine which kinds of exceptional events should be reported to this particular level of the hierarchical structure of the shoen.

PIMOS supervises user programs using this shoen mechanism. The exception reporting mechanism is used to first establish the communication path from the user programs to PIMOS. An exceptional event to be reported can be intentionally generated using the following primitive.

    raise(Tag, Data, Info)

The argument Tag specifies the kind of event generated by this primitive. This, along with the mask specified when the shoen is created, determines at which level in the shoen hierarchy this event should be processed. The two arguments Data and Info are passed as detailed information of the event. The Data argument can be any data, instantiated, uninstantiated or partly instantiated, while the Info argument has to be instantiated before the event is generated.
The above primitive will be suspended until this argument is completely instantiated to be a ground term without any logical variables. By monitoring the report stream, PIMOS can receive the requests from the user as messages coming from the stream in the following format. exception(Kind, Eventlnfo, -NewCode, -NewArgV) The Kind argument indicates the kind of exceptional event. In this case, the fact that the event was intentionally generated can be recognized. The Eventlnfo argument is more detailed information of the event. In the above case, the Data and Info arguments of the raise primitive will be combined together through this argument. The NewCode and NewArgV arguments specify an alternative goal to be executed in the object level in place of the goal that generated the event. PIMOS utilizes such a goal for inserting a protection filter, which will be described later. 1.2 Resources In conventional systems, memory management and process management are two of the most important tasks of the operating system. In the case of PIMOS, as the underlying language implementation of KL1 provides primitives for those fundamental resources, PIMOS do not have to be concerned with such low-level management. KL1 provides automatic memory management feature including garbage collection, as is the case with Lisp or Prolog. Thus, basic memory management is automatic in the language implementation. KL1 provides implicit concurrency and data-flow synchronization, context switching or scheduling is already supported by the language. Thus, PIMOS does not deal with low-level fine-grained process management, but controls largergrained groups of processes using the priority system provided by the language. As memory and process are managed in the KL1 language implementation level, we call them languagedefined resources. On the other hand, other higherlevel resources, such as virtual I/O devices, are more directly controlled by PIMOS. We call them OS-defined resources. In what follows, we will concentrate on the management of such OS-defined resources. 271 2 Communication Mechanism ?- pimos(Req), user(Req). The basic principles of the communication mechanism are described in this section. This lays the basis for the foundation of the PIMOS resource management mechanism. 2.1 Stream Communication In a parallel environment, efficient management of various resources becomes much more difficult than in a sequential environment. When data in a particular memory area should not be overwritten while being processed by the operating system, the operating system can simply suspend the execution of user programs in a sequential system. In a highly parallel environment, this will seriously spoil the merit of fine-grained parallelism, as all the user processes sharing the memory space must be stopped irrespective of whether they actually have any possibility of changing the data. A frequently used programming technique in concurrent logic languages is the object-oriented programming style [Shapiro and Takeuchi 1983]. In this style, a process (actually a goal which becomes perpetual by recursively calling itself) can have internal data which cannot be accessed from outside and shared data containing variables which can be used for interprocess communication. Interprocess communication is effected by gradually instantiating the data shared between processes. Instantiation corresponds to sending data and observing it corresponds to receiving the data. 
When the shared data is instantiated gradually to a list structure of messages, the structure can be considered to be a communication stream. PIMOS also utilizes this technique for communication between the user programs and the operating system. For example, reading a character string from the keyboard can be effected by a program shown in Figure 2 (after establishing a communication path by generating an exceptional event as explained in a previous section). The user sends a message getb/2, that requests the reading of N characters. When PIMOS receives the message, it reads N characters from the keyboard to the variable KBDString (readFromKBD/2). Then, the user receives the String instantiated to KBDString. As the cdr of the list, ReqT, will be a new shared variable after this operation, it can be used for successive such communication. 2.2 user(Req) :true I Req = [getb(N, String) IReqT], pimos([getb(N,String)IReqT]):true I readFromKbd(N,KBDString), KBDString=String, pimos(ReqT). Figure 2: An example of interprocess communication between user and PIMOS With the simple mechanism described above, however, intentional or accidental error in user programs may cause system failure in the following ways. Multiple Writer Problem When both the system and user programs write different values to the same variable, a unification failure may occur. In a concurrent language like KL1, unifications by PIMOS and the user may be executed concurrently. Thus, this contradiction may cause PIMOS to fail if it tries to instantiate the variable later. Forsaken Reader Problem The user program may fail to instantiate the arguments of the message sent to PIMOS, in which case PIMOS may wait forever for it to be instantiated. To solve problems, a filtering process called the protection filter is inserted in the stream between PIMOS and the user program. This filter is inserted in the objectlevel (within the user's shoen) using the above described NewCode and NewArgV arguments of the exception reporting message. To solve the forsaken reader problem, the filter will :{lot send a message to PIMOS until its arguments are properly instantiated. To solve the multiple writer problem, the filter will not unify the result from the operating system with the variable supplied by the user until it is properly instantiated by the operating system (Figure 3). In the actual implementation, such filtering programs are automatically generated from the message protocol. definitions. Protection Mechanism 2.3 In a system based on a concurrent logic language, many of the problems that might arise in a conventional operating system will never be a problem. As the communication path between the user programs and the system programs can be restricted to shared logical variables, there is no way for user programs to overwrite the memory area used by the system programs. Asynchronous Communication Stream communication is simple, yet powerful enough for simple applications, but it does not provide sufficient flexibility and efficiency at the same time when controlling various I/O devices. As communication delay is a crucial factor in distributed processing, it is desirable to send messages in a 272 in their nature. After they are used, new paths can be established by sending the reset message described below through the main communication stream. filter([get(C)IUser],DS):true I as = [get(C)IOS1], wait_and_unify(Cl,C), filter(User,OSl). wait_and_unify(OSV,UserV) wait(OSV) I UserY = OSV. 
2.4 Figure 3: An example of the protection filter pipelined manner for better throughput. To allow this, it is desirable to allow messages to be sent before being sure that they are really needed and to allow them to be canceled if they are found to be unnecessary afterwards. If only one communication stream is available between the operating system and the user, this cancellation is not possible (Figure 4). Multiplexing Communication Paths It is sometimes mandatory to share some (virtual) resources among several processes. A typical example is with the terminal device shared among processes running under a shell. In such cases, only one process should be able to use the device at a time, but quick switching among processes (when a process is suspended by a terminal interrupt, for example) is essential for comfortable operation. On the other hand, the pipelining of I/O request messages is mandatory for better throughput. With only the mechanism of the "abort" and "attention" lines mentioned above, the aborted requests will merely disappear. This does not provide more flexible control, such as suspending a process and resuming it afterwards. PIMOS provides the following I/O messages to solve the problem. user process device driver Figure 4: Blocked stream To solve the problem, PIMOS provides another communication path for emergencies. We call the path abort line. This communication path is implemented as a simple shared variable. Instantiation of this variable notifies cancellation of commands already sent to the stream. Another problem is that, with only one communication stream from the user to the operating system, there is no way for the devices to send asynchronous information to the users. To solve this, besides the abovementioned two communication paths, a communication path in the reverse direction called the attention line is provided (see Figure 5). user process I) stream I ~ornmany~ ) device driver abolt - attention Figure 5: Asynchronous communication with a device These two "lines" are one-time communication paths resetCResult): The variable Result is instantiated to a term normal(-Abort, Attention, ID). The arguments Abort and Attention correspond to new abort and attention lines. An identifier for a sequence of commands subsequently sent on this stream is returned in the argument ID. resend(ID, AStatus): When I/O request messages are aborted using the abort line, the device drivers remember the aborted messages associated with the identifier. The resend command tells the device driver to retry the aborted messages associated with ID. cancel(ID, AStatus): This cancel message tells the device driver to forget about the aborted messages associated with ID. Suppose that a certain device, such as a window device, is shared by two user processes, A and B. Each user process has one communication path to the device. The communication paths connected from the user processes are merged to a "switch" process, which has another communication path connected to a "control" process (Figure 6( a)). The control process is usually a part of a program such as a command interpreter shell that lets two or more programs share one display window. When a program running under the shell is suspended by an interruption, there may remain 1/ a messages that have been already sent from the interrupted program to the device driver but have not been processed yet. 
In such a case, the control process suspends the processing through the abortion line and sends a reset message to the device through 273 the switch process (Figure 6(b)). The suspended messages are kept in the device driver with 1D. If the program resumes communication with the device, the control process commands the switch process to send a resend message with ID as its argument to make it resume the suspended 110 requests. Standard I/O Device: A task is associated with its standard 110 devices. Standard 110 devices are aliases of some devices they are associated with. The correspondence is specified when the task is generated. The resource sharing mechanism described above is attached to these tasks. Server: 1/0 subsystems of PIMOS are actually provided by corresponding tasks called servers. They are made relatively independent of the kernel of PIMOS, making the modularity of the system better. The file subsystem is typical of such servers. 3.1 Resource Management Hierarchy As mentioned above, tasks are the unit of management of user programs. All communication paths from user program to PIMOS are associated with certain tasks. Resources obtained by requests through such paths are also associated with the tasks. (a) switch for multiplexing streams abort reset,resend Process A ID =1 ID reset, resend =1 abort Process B ID = 2 Example: --- : connected communication path . .. . disconnected communication path (b) commands between the switch and the device driver Figure 6: Multiplexing streams 3 Resource Management Mechanism All the devices provided by PIMOS have the stream interface described above, with attention and abort lines when required. Thus, management of resources in PIMOS is management of these communication paths. This section describes the mechanism of the management by PIMOS. The following are the keywords to understand the mechanism. Task: Tasks are the units of management of user programs. A task consists of an arbitrary number of goals (fine-grained processes) corresponding to a shoen in the language level, and forming a hierarchical structure. General Request Device: The general request device is the top level service agent. This is the stream user programs can obtain directly from PIMOS. Request streams to all other devices are obtained by sending messages to this device. Tasks are implemented using the shoen mecha~ism of KLl. A task is a shoen with its supervisor process inside the PIMOS kernel. The kernel controls the utilization of resources within the task. Tasks are handled just like ordinary 110 devices. A task handler is a device handler whose corresponding device happens to be a shoen. Tasks are unique in that they may have children resources. As its consequence, a task can have tasks as its children resources forming a nested structure. Corresponding to this, task handlers and other resource controlling processes inside PIMOS also form a hierarchical structure, called the resource tree. This resource tree is the kernel of resource management by PIMOS. One layer of the resource tree is represented by the task handler and device monitors corresponding to its children resources connected by streams in a loop structure (Figure 7). Device monitor processes are common with all kinds of devices. Associated with each device monitor is a device handler, which depends on the category of the device. Device monitors and device handlers are dynamically created when a new virtual device is created and inserted in the loop structure. 
The device handlers can be classified as follows. Task Handler: A task handler corresponds to a shoen. As described above, usual shoens whose control and report streams are directly connected to their creator. Those streams of shoens corresponding to a task are connected to the task handler. The creator of the task (user programs) can only control and observe the behavior of tasks indirectly through requests to PIMOS. General Request Handler: General request devices are the primary devices provided by PIMOS. Through them, information on the task itself is ob- 274 Figure 7: Resource tree tained and various other devices (including children tasks) can be created. Standard 110 Handler: Standard I/O devices are aliases corresponding to some other device. They provide the resource sharing mechanism described above. Server Device Handler: Server devices are the most common form of virtual devices provided by PIMOS. The device handlers watch the status of the client task and notify its termination to the server task. 3.2 Providing Services To minimize the "kernel" of PIMOS, the kernel provides its fundamental resource management mechanism only. Other services, such as virtual devices such as files or windows, are provided by tasks called "servers". Figure 8 shows an overview of the management hierarchy of PIMOS. The basic I/O system (BIOS) provides the low-level I/O, but it does not provide the protection mechanism. To protect the system, basic I/O service is provided only for the kernel. The kernel provides the above-described resource tree, which provides the resource management mechanism for tasks. Tasks here include both user program tasks and server tasks. As described above, communication between the user programs and PIMOS can be established using the raise primitive. However, this mechanism only establishes a path to the kernel (the resource tree) and not to a server task. The communication path between a client task and a server task can be established as follows (see Figure 9, also ). 1. To start the service, servers register their service to the service table kept in the kernel of PIMOS. The table associates service names to a stream to the corresponding server. The code for the stream filter for protecting the server from clients' malfunction is also registered in the table. 275 3.3 Task Figure 8: An overview of the management hierarchy 2. The client task establishes a communication path to the PIMOS kernel and requests a service by its name. 3. The kernel searches for the name in the service table, and if a matching service is found, connects the client and the server, inserting a protection filter process inside the client. Although the above written order is typical, The order of 1 and 2 is not essential. Requests made prior to registration of the service will simply be suspended. In step 3, PIMOS inserts a device monitor and a device handler corresponding to the server device. The device handler watches for termination of the client task and notifies it to the server (Figure 10) for finalizing the service provided. This separation of the kernel and the servers in PIMOS allows flexible configuration of the system and assures system robustness. Failures in a server will not be fatal to the system; the services provided by the server will become unavailable, but the kernel of the system not to be affected. Table 1 lists standard services in the most recent version of PIMOS (Version 3.2). Each of these services is implemented using the above client/server mechanism. 
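As an illustration of steps 1 to 3 above, the fragment below models the service table in Python. The names (ServiceTable, ProtectionFilter, register, request) are invented for this sketch; in PIMOS the table lives in the kernel, the filter is a process generated from the registered filter code, and the "suspension" of early requests falls out of KL1 dataflow synchronization rather than an explicit pending list.

class ProtectionFilter:
    """Forwards only messages that conform to the registered protocol."""
    def __init__(self, allowed, server_inbox):
        self.allowed, self.server_inbox = allowed, server_inbox

    def send(self, message):
        if message[0] not in self.allowed:
            raise ValueError("filtered out ill-formed request: %r" % (message,))
        self.server_inbox.append(message)

class ServiceTable:
    def __init__(self):
        self.services = {}   # service name -> (server inbox, allowed message tags)
        self.pending = {}    # service name -> clients that asked before registration

    def register(self, name, server_inbox, allowed):        # step 1 (server side)
        self.services[name] = (server_inbox, allowed)
        for resume in self.pending.pop(name, []):            # wake suspended requests
            resume(self._connect(name))

    def request(self, name, resume):                         # step 2 (client side)
        if name in self.services:
            resume(self._connect(name))
        else:
            self.pending.setdefault(name, []).append(resume)  # simply suspended

    def _connect(self, name):                                 # step 3 (kernel)
        server_inbox, allowed = self.services[name]
        return ProtectionFilter(allowed, server_inbox)

# A request made before the "file" server registers is held, then completed:
table, inbox = ServiceTable(), []
table.request("file", lambda f: f.send(("open", "demo.txt")))
table.register("file", inbox, {"open", "read", "close"})
print(inbox)    # [('open', 'demo.txt')]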
Various other servers, such as database servers, can be added easily and canonically to these standard servers. Table 1: Standard service in PIMOS(Version 3.2) Name Service atom Database of atom identifiers and their unique printable names. file File and directory service. module Database of executable program codes. socket Internet socket service. timer Timer service. Database of user authentication information. user window Display window service. I Standard I/O PIMOS provides a management mechanism for sharing resources, which enables the sharing of resource streams between a parent task and its children tasks (and subsequent children tasks). When a task is generated normally, standard 110 devices of the parent task are inherited to the child task. Multiplexing of the request bstream is implemented as described previously. Standard 110 devices are not a usual device but a kind of alias of the device it is associated with. Since the protection mechanism of PIMOS, a messages filtering process, has to know the message protocol of the stream, the message protocol for the standard 110 device is restricted to a common subset of 110 device protocols. 3.4 Low Level I/O In the lowest level, PIMOS supports SCSI (Small Computer Standard Interface) for device control. Each operation to the SCSI bus is provided as a built-in predicate by the KL1 language implementation. For example, a primitive for sending a device command through the SCSI bus is as follows. scsi_command(SCSI, Unit, LUN, Command, Length, Direction, Data, DataP, ANewData, ATransferredLength, AID, AResult, ANewSCSI) The argument SCSI should be an object representing the state of the SCSI bus interface device at a certain moment. NewSCSI, on the other hand, represents the state of the device after sending the command. This is instantiated only after completing the operation and the value will be used in the next operation, which will be suspended until it is instantiated. The proper ordering of operations is thus maintained. The Unit and LUN arguments designate a specific device connected to the SCSI bus. Arguments Command and Direction are used to control communication on the SCSI bus. The argument ID is used for command abortion, whose mechanism is similar to one described previously. Since the KL1 processor needs garbage collection, realtime programming in KL1 is basically impossible. On the other hand, physical operations on SCSI require realtime response. The above primitive only reserves the operation and actual operation will be done eventually, with lower level real-time routines. Explicit buffers are used to synchronize the activities of their lower level routines with KL1 programs. Other arguments, Data, DataP, NewData, TransferredLength are used to specify such buffers. 276 name service file I PIMOS I Tasks I (2) request ............ .... ~===Client Task .... .... # Server Task===1==..... (3) insertion of protection filter ~~~ Figure 9: Communication between client and server (1 ) ~~====Serv~TMk======~ Figure 10: Communication between client and server(2) 277 3.5 Virtual Machine As all the communication between the user programs and PIMOS is initiated through the control and report streams of the shoen which implements the user task, a user program can emulate PIMOS and make application programs run under its supervision. This is useful for debugging application programs. 
The same technique can also be used to debug PIMOS itself by writing a BIOS emulator, as all the other parts of PIMOS communicate with BIOS through paths established using the shoen mechanism. Figure 11 depicts an actual implementation of a virtual machine on PIMOS. As the virtual machine is a usual task in PIMOS, the protection mechanism of PIMOS prevents failures in the version of PIMOS being debugged on the virtual machine from being propagated to the real PIMOS. This facility has been conveniently used in debugging the kernel of PIMOS. Physical machine Virtual machine (task) of experimental parallel application software for about three and a half years already, proving the feasibility and practicality of implementing an operating system in concurrent logic languages. Acknowledgement Many of researchers of ICOT and other related research groups. Too numerous to be listed here, participated in the design and implementation of the operating system itself and development tools. We would also like to express our thanks to Dr. S. Uchida, the manager of the research center of ICOT, and Dr. K. Fuchi, the director of the ICOT research center, for their valuable suggestions and encouragement. References [Chikayama et al. 1988] T. Chikayama, H. Sato and T. Miyazaki. Overview of the Parallel Inference Machine Operating System (PIMOS). In Proceedings of the International Conference on Fifth Generation Computer Systems, ICOT, Tokyo, 1988, pp.230-251. [Foster 1988] 1. Foster. Parlog as a Systems Programming Language. Ph. D. Thesis, Imperial College, London, 1988. BIOS simulator Kemel(resource tree) BIOS Figure 11: Virtual machine on PIMOS 4 Conclusion The resource management scheme used in PIMOS based on the concurrent logic language KLI is described. It depends heavily on the meta-level control mechanism called shoen provided by the language for efficient hierarchical resource management. PIMOS itself has a hierarchical structure, consisting of a kernel and server tasks. This structure enables a flexible system configuration and reinforces the robustness of the system. The system consisting of parallel inference machines (Multi- PSI and recently PIM) and earlier versions of PIMOS has been heavily used in research and development [Goto et al. 1988] A. Goto, M. Sato, K. Nakajima, K. Taki and A. Matsumoto. Overview of the Parallel Inference Machine Architecture (PIM). In Proceedings of the International Conference on Fifth Generation Computer Systems, ICOT, Tokyo, 1988, pp.208-229. [Nakajima et al. 1989] K. Nakajima, Y. Inamura, N. Ichiyoshi, K. Rokusawa and T. Chikayama. Distributed Implementation of KLI on the MultiPSIjV2. In Proceedings of the Sixth International Conference on Logic Programming, 1989, pp.436-451. [Shapiro and Takeuchi 1983] E. Shapiro and A. Takeuchi. Object Oriented Programming in Concurrent Prolog. In New Generation Computing, VoLl, No.l(1983), pp.25-48. [Shapiro 1984] E. Shapiro. Systems Programming in Concurrent Prolog. In Proceedings of the 11th A CM Symposium on Principles of Programming Languages, 1984. [Taki 1992] K. Taki. Parallel inference machine PIM. In Proceedings of the International Conference on Fifth Generation Computer Systems, ICOT, Tokyo, 1992. [Ueda 1986] K. Ueda. Guarded Horn Clauses: A Parallel Logic Programming Language with the Concept of a Guard. Technical Report TR-208, ICOT, 1986. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. 
© ICOT, 1992 278 The Design of the PIMOS File System Takeshi MO RI Masaki SATO Takashi CHIKAYAMA F'umihide ITOH Institute for New Generation Computer Technology 4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan Tadashi SATO Tatsuo KATO Mitsubishi Electric Computer Systems (Tokyo) Corporation 87-1, Kawakami-cho, Totsuka-ku, Yokohama 244, Japan Abstract This paper describes the design and implementation of the PIMOS file system. The file system was designed for loosely-coupled multiprocessor systems, where caching is essential for reducing not only disk accesses but also for communication between processors. To provide applications with flexible load distribution, the caching scheme has to support consistency semantics under which modifications of a shared file immediately become visible on other processors. Two different caching schemes, one for data files and the other for directories, have been designed. This is necessary because they have different access patterns. Logging the modifications of directories and other essential information secures the consistency of the file system in case of system failure. Multiple log areas reduce the time required to write logs. Buddy division of blocks enables released blocks to be collected efficiently. Hierarchically organized free block maps control buddy division. The file system has been implemented on PIM. File systems on external 110 systems poorly meet requests from multiprocessors, due to limited communication bandwidths. We, thus, constructed an internal file system on disks incorporated into multiprocessor systems. Distributed file systems are similar to our file system in that shared files are accessed from processors connected via a network, with some communication delay. However, the communication bandwidth of the network is much broader in our case. Also, processes using files are normally cooperate rather than compete. These considerations affect the design. We have clarified functions essential to file systems for loosely-coupled multiprocessor systems, and then considered how to implement them. Although the system is an experimental one, we included the essential features of practical file systems in our design, such as disk access optimization. The system has been implemented . as a part of PIMOS, in concurrent logic language KL1 [Ueda and Chikayama 1990]. 2 1 Introduction PIMOS [Chikayama et al. 1988] has been developed by the Fifth Generation Computer Systems project of Japan as the operating system for PIM [Goto 1989] as a part of the parallel inference system for knowledge information processing. PIMOS has a file system which was designed to realize a robust file system optimized for loosely-coupled multiprocessor systems like PIM. This paper describes the design and implementation of this file system. The file system for the parallel inference system should provide a bandwidth broad enough to support knowledge information processing application software running on high performance parallel computers. To allow flexible load distribution, the semantics it provides should be location-independent. That is, the contents of files should look the same to the program regardless of the processor it is running on. Design Principles In order to draw parallelism from loosely-coupled multiprocessor systems, centralizing loads to a small number of server processors with disks should be· avoided. 
The cost of communications between processors is more expensive in loosely-coupled multiprocessor systems than in tightly-coupled ones, and the cost of disk accessing is still more expensive than that of communication. Thus, both disk accessing and interprocessor communication should be reduced. This necessitates distributed caching. Data cached in memory may be lost upon system failure. For dat,a files, the loss is limited to files being modified at the time of the failure. Loss in a modified directory, however, may cause inconsistency in the file system, such as a deleted, nonexistent file still being registered in a directory, The loss may spread to files under the directory, even though they were not accessed at the time of the failure. Consequently, the file system needs pro- 279 tection against failure to preserve its consistency. Disk access optimization is one of the primary features of practical file systems. Most of the overheads in disk accesses are seeks, so a reduction in seek cost, i.e., the number of seeks and per-seek cost, is required. 3 Design Overview We allowed multiple servers to distribute the loads of file accesses. A caching mechanism was incorporated to reduce disk accesses and communication among processors. A logging mechanism secures the consistency of the file system against system failure. A disk area management scheme similar to that of conventional file systems reduces the seek time for disk accessing. An overview of these features is described in this section. 3.1 Multiple Servers In order to draw parallelism from multiprocessor systems, load centralization should be avoided. File systems have inherent centralization in that a disk can be accessed only by a processor connected to it. Multiple disks connected to multiple processors with a server running on each relaxes centralization and make the system scalable. A processor with disks can run a server, but the processor is not dedicated to it. The server processors also operate as clients when their disks are not accessed, providing better utilization of computational resources on multiprocessor systems. 3.2 Caching Mechanism In order to reduce disk accessing and interprocessor communication, data files and directories are cached onto all processors that access them. 3.2.1 Caching of Data Files Consistency semantics for caches of the same data on different processors has been realized. The execution of an application program on a multiprocessor system is distributed among processors. The strategies of distribution are diverse and depend upon the application. In order to distribute computation flexibly, file access result must be identical no matter which processor accesses the file. In other words, modification by another processor has to be visible immediately. This kind of consistency semantics is called Unix semantics [Levy and Silberschatz 1989] in distributed file systems. It was originally introduced to maintain software compatibility between distributed and conventional uniprocessor Unix systems. For the same reason, Unix semantics are indispensable to a file system for multiprocessor systems. There is no problem in sharing a file when all the sharers merely read the file. When a file is shared in write mode, the simplest way to support Unix semantics is to omit caching and centralize all accesses to the file on the server. This method is reasonable in the environments where shared files are rarely modified. 
On multiprocessor systems, where processors solve problems cooperatively, modifying shared files is quite common, since distributing the computational load between processors, including file accessing, is essential for efficient execution. Consequently, a caching mechanism is designed in which a shared file can be cached even if it can be modified and Unix semantics are preserved. 3.2.2 Caching of Directories In order to identify a file, the file path name is analyzed using directory information. The caching of directories along with the caching of data files can be used to avoid the centralization of loads to server processors and reduce communication with those processors. Accessing directories is quite different from accessing to data files. Data files are read and written by users, and the contents of files are no concern of the file system. On the other hand, the contents of directories form a vital part of the file system. Thus, a different caching mechanism for directory information was designed. 3.3 Logging Mechanism Modifications of directories and other information vital for the file system are immediately logged on disk. Modifications are made to data files much more often, and writing all modifications immediately to a disk decreases performance severely. Instead, we provide a mechanism which explicitly specifies the synchronization of a particular file. Simply writing to a disk immediately does not assure the consistency of the file system. For example, if the system fails while moving a file from one directory to another, the file may be registered in either both or none of the directories, depending on the internal movement algorithm. This inconsistency can be avoided by twophase modification. First, any modification is written as a log to an area other than the original. Second, the original is modified when logging is complete. If the system fails before the completion of the logging, the corresponding modification is canceled. If the system fails after completion of the logging but before the modification of the original, the original is modified using the log in a recovery procedure, validating the corresponding modification. In either case, the consistency of the file system is preserved. The system may fail while a log is being written, leaving an incomplete log. In order to detect this, we introduced a flag to indicate the end of a log that corresponds to an atomic modification 280 transaction. The completion of logging can be regarded as completion of the modification. The original may be modified at any time before the log is overwritten. This means that logging does not slow down response time. Rather, it improves response time. For example, when a file is moved, two directories have to be modified. The modification of the two originals may need two seeks. Writing the log needs only one seek. Moreover, we use multiple log areas and write the log to the area closest to the current disk head position to reduce the seek time. A log contains the disk block image after modification. Because the block corresponding to a more recent modification overrides the older modifications, only the newest constituent must be copied to the original. The more times the same information is modified, the less times the original is modified. Frequent modification of the same information, which is known to be the case in empirical studies [Ousterhout et al. 1985], minimizes the throughput decline caused by extra writing for logging. 
Each log area is used circularly, overwriting the oldest log with a new log. In order to reduce disk accessing, the modifications of the original should be postponed as long as possible, that is, until immediately before the corresponding log is overwritten. To detect the logical tail of a log area, namely the last complete log, each log block has a number, named a log generation, which counts the incidences of overwriting the log area. The multiplicity of log areas has caused a new problem to arise: how can the newest block be determined after a system failure. If there is only one log area, the newest log block is the closest one to the logical tail of the log area, and the log blocks are always newer than or as new as the corresponding original block. However log blocks in different areas do not show the order in which they were written. If the newest log block is overwritten after it is copied to the original block, the original block is newer than the remaining log blocks. We have solved this problem by attaching a number, named a block generation, to the log blocks and to the original blocks. The block generation counts incidences of modifying the block. 3.4 Disk Area Management To reduce the number of seeks, the unit of area allocation to files should be made larger. Larger blocks cause lower storage utilization, as a whole large block must be allocated even for small files. Our solution is to provide two or more sizes of blocks and to allocate smaller blocks to small files. To reduce the time per seek, a whole disk is divided into cylinder groups, and blocks of one file are allocated in the same cylinder group as much as is possible. The log areas mentioned in the previous subsection are placed in each cylinder group. These methods are commonly used in conventional file systems. A unique feature of the PIMOS file system is buddy division of a large block into small blocks, which reduces disk block fragmentation. 4 4.1 Implementation Multiple Servers The whole file system consists of logiCal volumes, each of which corresponds to one file system of Unix. A logical volume can occupy the whole or a part of a physical disk volume. The processor connected to the disk becomes the server of files and directories in the logical volume. Logging and disk area management in the volume is also the responsibility of the server. 4.2 4.2.1 Data File Caching Mechanism Overview To realize Unix semantics with reasonable efficiency on loosely-coupled multiprocessor systems, we decided to stress the performance of exclusive or read-only cases, and tried to minimize disk accesses and interprocessor communication in such cases. The unit of caching is a block, which is also the unit of disk 1/0. This simplifies management and makes caches on server processors unnecessary. A processor where caches are made is called a client, as in distributed file systems. Each client makes caches from all the servers together and swaps cached blocks by the least recently used (LRU) principle. Unix semantics is safeguarded by modifying the cache after excluding caching on other clients. The caching mechanism is similar to that for coherent cache memory [Archibald and Baer 1986]. While a coherent memory caching scheme depends on a synchronous bus, our platform, a loosely-coupled multiprocessor system, provides only asynchronous message communication. This means that we must consider message overlaps. 
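Before the detailed state machine is given, the following fragment sketches the server's side of this arrangement in Python: the server keeps only a map from blocks to the clients caching them, answers a fetch with a pointer to an existing holder (or to the disk), and strips the other holders when a client asks for exclusive access. All names are invented for the sketch, and the temporary states and message overlaps handled in the next subsections are ignored here.

class BlockServer:
    def __init__(self, disk):
        self.disk = disk        # block id -> data on disk
        self.holders = {}       # block id -> set of client ids caching the block

    def fetch(self, block, client, exclusive):
        holders = self.holders.setdefault(block, set())
        # The server only routes coherence traffic; the data itself can then be
        # transferred directly from the chosen source to the requesting client.
        source = next(iter(holders)) if holders else "disk"
        if exclusive:
            holders.clear()     # other caches must give up the block first
        holders.add(client)
        return source

server = BlockServer({7: b"block #7"})
print(server.fetch(7, client="A", exclusive=False))   # 'disk' (no one caches it yet)
print(server.fetch(7, client="B", exclusive=True))    # 'A'    (A is then excluded)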
A ~lient classifies each cached block into five permanent states, according to the number of sharers and the necessity of writing back to the disk. In addition, there are three more temporary states. In the temporary states, the client is awaiting a response from the server to its request. A server does not know the exact state of cached blocks, but only knows which clients are caching the blocks., Requests for data, replies to the requests, and other notifications needed for coherence are always transferred between the server and clients, rather than directly between clients. Cached data itself may be transferred directly between clients. 281 4.2.2 Cache States The principle for keeping cache coherence is simple: allowing modification by a client only when the block is cached by no other client. To realize this, "shared" and "exclusive" cache states are defined. Permanent cached block states can be as follows: Invalid (I) means that the client does not have the cache. Exclusive-clean (EC) means that the client and no other clients have the unmodified cache. Exclusive-modified (EM) means that the client and no other clients have the modified cache. Shared-modified (SM) means that the client has the modified cache, and some other clients mayor may not have cache for the same block. Shared-unconcerned (SU) means that the client has the cache but does not know whether it was modified, and some other clients mayor may not have cache for the same block. Temporary cached block states can be as follows: Waiting-data (WD) means that the client does not have and is waiting for the data to cache, and that the data can be shared with other clients. Waiting-exclusive-data (WED) means that the client does not have and is waiting for the data to cache, and that the data cannot be shared with other clients, as the client is going to modify it. Waiting-exclusion (WE) means that the client already has the cache and is waiting for the invalidation of caches on all other clients. In other words, the client is waiting to become exclusive. 4.2.3 State Transition by Client Request A request from a user to a client is either to read or to write some blocks. Another operation needed for a cache block is swap-out, i.e., to write the data back to the disk forcibly by LRU. This request or operation is accepted only in permanent states, and is suspended in temporary states as the client is still processing the previous request. The state transition for a request to read is shown in Figure 1(a). If the state is I, the client requests the data to the server, changes its state to WD, and waits. After a while, the server reports the pointer to the data and the state to change to. The pointer points to another client when it already has the data, or to the server when the server read the data from the disk because no clients have the data. The client reads the data, lets the user read it, and changes to EC, SM, or SU according to the report. If the state was originally EC, EM, SM, or SU, the client simply lets the user read the available data and stays in the same states. The state transition for a request to write is shown in Figure l(b). If the state is I, the client requests exclusive data to the server, changes to WED, and waits. After a while, the server reports the pointer. The client reads the data, lets the user modify it, and changes to EM. If the state was originally EC or EM, the client lets the user modify the data immediately, and changes to or stays in EM. 
If the state was SM or SU, the client requests the server to invalidate caches in other clients, changes to WE, and waits. Then, if the server reports completion of the invalidation, the client lets the user modify the data and changes to EM. Another client may also request the invalidation simultaneously, and its request may reach the server earlier. In this case, the server requests the invalidation of the cache, and the client abandons the cache and changes to WED. Eventually, after the server receives the request to invalidate from the client, the pointer to the data is reported. The state transition for swap-out is shown in Figure 1( c). The client reports the swap-out to the server, and changes to I. If the state is EM or SM, the pointer to the data is also reported at the same time. The server reads the data and writes it back to the disk when it cannot make any other client EM or SM. If the state is EC or SU, writing the data back to the disk is not required, as the data is either the same as that on the disk or is cached by some other client. 4.2.4 State Transition by Server Request A request from the server to a client is either to share, to yield, to invalidate or to synchronize the cache. It is accepted not only in permanent states but also in temporary states. A request to share is caused by a request to read by another client. The state transition for this is shown in Figure 2(a). If the state is EC or SU, the client reports the pointer to the data and indicates that the requesting client should change to SUo In each case, the state of the requested client after replying is SU. If the state is EM or SM, there is a question of which client should take responsibility for writing back the data. In the current design, the requesting client takes it. Consequently, the requested client reports the pointer to the data and indicates that the requesting client should change to SM. The requested client becomes SUo A client may receive a request to share in state I, if swap-out overlaps with the request. In this case, the server knows of the absence of the data when it receives the swap-out. The client consequently ignores the request. Moreover, the client will possibly receive the request while awaiting data after swapping it out in WD or WED. The client can also ignore the request. Fur- 282 (c) Swap out (b) Write (a) Read Figure 1: State transition diagrams by a request to the client (a) Share (b) Yield or invalidate (c) Synchronize Figure 2: State transition diagrams by a request from the server thermore, the client can receive the request in WE if the request to invalidate other caches overlaps with the request. In this case, the client, while waiting in WE, reports the pointer to the data and indicates that the requesting client should change to state SUo Completion of the invalidation will be reported, after the request of invalidation is received by the server. The actions of a client for requests to yield and to invalidate are the same, except when the pointer to the data is reported. State transition is shown in Figure 2(b). If the state is EC, EM, SM or SU, the client reports the pointer to the data or simply abandons the cache, and changes to I. The client can receive the requests in I, WD, or WED and ignore them, for the same reason as when a request to share is received. If the state is WE, the client reports the pointer or abandons the cache, and changes to WED, as described in the case of a request to write to the client. 
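Before turning to the remaining server request (synchronize), the client-initiated transitions of Figure 1 can be condensed into a small state machine. The sketch below is in Python purely for illustration, not the KL1 implementation; the state names follow the text, the send callback stands for a message to the server, and the completion transitions (for example WD becoming EC, SM or SU when the server's reply arrives) are only noted in comments.

PERMANENT = {"I", "EC", "EM", "SM", "SU"}

def on_read(state, send):
    if state not in PERMANENT:
        return state                    # a previous request is still in progress
    if state == "I":
        send("request_data")            # ask the server for a sharable copy
        return "WD"                     # later EC, SM or SU, as the server reports
    return state                        # EC/EM/SM/SU: the data is already cached

def on_write(state, send):
    if state not in PERMANENT:
        return state
    if state == "I":
        send("request_exclusive_data")  # ask for a copy cached by no other client
        return "WED"                    # becomes EM when the data arrives
    if state in ("EC", "EM"):
        return "EM"                     # already exclusive: modify immediately
    send("request_invalidation")        # SM/SU: exclude the other caches first
    return "WE"                         # becomes EM when exclusion is reported

def on_swap_out(state, send):
    if state in ("EM", "SM"):
        send("swap_out_with_data")      # modified data is written back via the server
    elif state in ("EC", "SU"):
        send("swap_out")                # clean or cached elsewhere: no write-back
    return "I"

print(on_write("SU", print))            # prints request_invalidation, returns WE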
On a request to synchronize, the server requests a client to send the data and write it back to the disk. State transition is shown in Figure 2(c). If the state is EM or SM, the client reports the pointer to the data and changes to EC from EM, or SU from SM. If the state is EC or SU, the client reports that writing back is unnecessary. If the state is I, WD, or WED, the swap-out has overlapped with the request. The client may ignore the request because the server will receive the swap-out message with or without the pointer to the data. If the state is WE, the client reports the pointer, while awaiting completion of the invalidation in WE. The temporary states enable message overlaps to be dealt with efficiently.

4.3 Directory Caching Mechanism

Most accesses to directories are to analyze file path names. In order to analyze a file path name on a client, directory information is cached. The unit of caching is one member of a directory. Each client swaps caches by LRU. The server maintains information on the directories and members that clients cache. The server also caches the disk block images of cached directories, and when a member is added or removed, it modifies the images and writes them to the disk as a log.

When a file path name is analyzed on a client, the members on the path are cached to the client one after another. If the same members appear in subsequent path name analyses, the cached information is used. When a member is added to a directory, the member is added to caches after the addition is logged by the server. These operations require communication only between the client and the server. When a member is removed, the removal is notified to the server. The server requests the invalidation of the cache from all the clients caching the member. After the server has received acknowledgement of invalidation from all the clients, the server writes a log, thus completing the removal. Although this removal may take time, it is not expected to affect the total throughput of directory caching because frequently updated members of directories are not likely to be cached by clients other than the one that modifies them. Information about access permission is also necessary for analyzing path names. Therefore, it is cached in the same way as directory information.

4.4 Logging Mechanism

4.4.1 Log Header

In order to manage logging, each block in the logs and the originals has a header consisting of the following items. Except for block generation, the information has no relevance in the original.

Block identifier shows the corresponding original block. It consists of a file identifier and a block offset in the file.

Log generation counts the number of times a log area is used.

Atomic modification end is a flag which shows the last block of a log corresponding to an atomic modification.

Block generation counts the number of times a block has been modified, in order to identify the newest block among the logs and the original.

Because the size of a header is limited, the maximum sizes of the numbers allowed in the log generation and block generation items are limited. However, as we will discuss, three log generations, at most, can exist in a log area at any one time. This means that the cyclic use of three or slightly more generations is sufficient.

Figure 3: Distribution of block generations

Limited numbers for block generations can also be used cyclically; a small sketch of the header record and of the newest-block test appears below.
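The record fields follow the list just given; the newest_generation function anticipates the windowed comparison that the next paragraphs develop, and the small 4-bit example is only for illustration (the implementation uses 24 bits). The field and function names are invented for this sketch and do not reflect the on-disk layout.

from dataclasses import dataclass

@dataclass
class LogHeader:
    block_id: tuple         # (file identifier, block offset) of the original block
    log_generation: int     # how many times this log area has been reused
    atomic_end: bool        # marks the last block of one atomic modification
    block_generation: int   # how many times the block itself has been modified

def newest_generation(generations, n):
    """Pick the newest among the block generations of the logs and the original.

    Generations count modulo 2n; the control described below keeps all live
    generations of one block within a window narrower than n, so the newest
    one can still be identified after wraparound.
    """
    lo, hi = min(generations), max(generations)
    if hi - lo < n:
        return hi                                        # no wraparound in sight
    return max(g for g in generations if g - lo < n)     # largest value below the gap

# With 4-bit generations (2n = 16, n = 8) the sequence 14, 15, 0, 1 wraps around:
assert newest_generation([14, 15, 0, 1], 8) == 1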
This assures that the newest block is always spotted in the following way. Suppose that 2n numbers, from 0 to 2n - 1, are used for block generation. Block generation starts from 0, increases by 1 until 2n - 1 is reached, and returns to 0. We have introduced the following control: if the absolute value of the difference between the block generation of the log block to be written next, LN, and that of the oldest existent block, Lo, is equal to n, the oldest block, whether it is in a log or is the original, is invalidated before the next log is written. Distribution of block generations under this control is shown in Figure 3. As is shown, the invariant condition is that LN - Lo < n if LN > Lo, and Lo - LN > n if Lo > LN.

Consequently, after a failure, the newest block is spotted as follows. The distribution of block generations dictates that either all of the generations are in a range narrower than n or the distribution has a gap wider than n. In the former case, the newest block is the one which has the largest generation. In the latter case, it is the one which has the largest number in the group below the gap.

In practice, the invalidation of the oldest block occurs rarely. Our current implementation allocates 24 bits for block generation. Invalidation occurs only if there are 2^23 = 8,388,608 modifications to the same block and, in addition, if the oldest block happens not to have been overwritten by modifications. However, even one modification every ten milliseconds during one whole day barely amounts to 100 x 60 x 60 x 24 = 8,640,000.

4.4.2 Logging Procedure

While the file system is in operation, logs are written as follows:

1. Create after-modification images of a set of blocks. Set block identifiers and block generations. Line up the blocks and set an atomic modification end flag in the log header of the last block.

2. Choose the log area in the cylinder group where the disk heads currently reside. Set log generations in the log headers. Write the log, the sequence of the blocks, to the log area.

3. Report the completion of logging.

4. Make room in the log area for the subsequent writing. In other words, if the newest blocks are in the part where the next log to the area will be written, copy them to the corresponding original blocks.

5. Invalidate the oldest blocks if necessary, i.e., if there are blocks whose next modifications will require them to be invalidated. Invalidation is performed by setting a null block identifier in the log header.

Making room and invalidating can be done at any time before the next log is written. It should be done immediately after logging to get the best response at the next logging. The size of the room is made to be the maximum size of a log corresponding to an atomic modification, or slightly more. When a logical volume is dismounted, all the newest blocks are written to the corresponding originals.

The following tables in memory are used to control logging:

Log area table maintains the next log position and the log generation in each log area.

Log record table maintains the block identifier corresponding to each position in the log areas.

Log block table maintains, for every block that has at least one log, the position and the block generation of each log, and the block generation of the original.

4.4.3 Recovery Procedure

After a system failure, the tables for log management are recovered as follows:

1. Find the decreasing points in log generation in each log area.

2. Choose the first of the decreasing points as the tentative logical tail of each log area.

3. Find the real logical tail of each log area by rejecting the incomplete log from the tentative logical tail.

4. Decide the logical head of each log area and recover the tables from valid log blocks.

Decreasing points in log generation show that the log blocks were logged last before the system failure or were being logged at the time of the system failure. There may be more than one decreasing point if an intelligent disk drive changes the order of writing of physical blocks to promote efficiency. In this case, there is also one less increasing point than the number of decreasing points, and the decreasing and increasing points are distributed within the range of one atomic log. Taking into account the circular use of a log area, the log generation of the physical first block is usually one larger than that of the physical last block. If the two generations are equal, the physical tail of a log area is one of the decreasing points in log generation. Examples of the distribution of log generations are shown in Figure 4. There can be one, two, or three log generations in a log area.

Figure 4: Distribution of log generations ((1) one generation, (2) two generations, (3) three generations; markers distinguish the tentative tail from the other decreasing points)

If there is only one decreasing point in log generation, it becomes the tentative logical tail. If there are two or more decreasing points, the first one is selected as the tentative logical tail. The real logical tail is immediately after the last block with an atomic modification end flag before the tentative logical tail. The two tails are identical if the block immediately before the tentative logical tail has the flag. The logical head is a certain number of blocks away from the real logical tail. The number of blocks corresponds to the room made for the next log writing. Valid log blocks consist of the blocks between the logical head and the real logical tail. After the tables are recovered, the file system can start operation.

4.5 Disk Area Management

To manage the buddy division of large blocks, we use a hierarchy of free block maps in memory as shown in Figure 5. Each free block is registered as free in only one map. We also maintain the number of free blocks registered in each map. When a free block of a certain size is required and the map of that size has enough free blocks, the map is searched. If it does not have enough free blocks to make
An original map on a disk 0: Used 1: Free Figure 5: Hierarchy of free block maps the search efficiently, the map for blocks of twice the size is searched. This continues until the map of the largest block size is reached. When a block is released and the buddy of the block is free, the two blocks are united and become one free block of twice the size. Otherwise, the released block alone becomes free. The hierarchy of maps is unfolded from the free block map on a disk whose unit is the smallest block when the logical volume is mounted. It is folded into the original map and saved on the disk when the volume is dismounted. We use the two-step allocation method common to conventional file systems. In the free block map of the largest block size in memory, only some of the free blocks are registered as free. Another map of the largest block size is made and written to the disk where, in addition to the original used blocks, the free blocks registered as free in memory are registered as used. In this way, the map ensures that the blocks registered as free on it are free, though those registered as used are not necessarily used. Consequently, the file system can start up after a system failure, using the map of the largest block size on the disk, without a time-consuming scavenging operation. When free blocks in memory become scarce, some are added to the map in memory, and the map entries on the disk corresponding to those blocks are changed to "used" . Conversely, when free blocks in memory become surplus, some are removed from the map in memory, and the map entries on the disk corresponding to those blocks are changed to "free". The scarcity and the surplus are judged based on threshold numbers of free blocks in memory. Acknow ledgement We would like to thank Mr. Masakazu Furuichi at Mitsubishi Electric Corporation and Mr. Hiroshi Yashiro at ICOT for their intensive discussions. We would also like to express our thanks to Dr. Shunichi Uchida, the manager of the research department, and Dr. Kazuhiro Fuchi, the director of the research center, both at ICOT, for their suggestions and encouragement. References [Archibald and Baer 1986] J. Archibald and J. 1. Baer. Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model. ACM Transactions on Computer Systems, Vol. 4, No.4 (1986), pp. 273-298. [Chikayama et al. 1988] T. Chikayama, H. Sato and T. Miyazaki. Overview of the Parallel Inference Machine Operating System (PIMOS). In Proc. Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, 1988, pp. 230251. [Goto 1989] A. Goto. Research and Development of the Parallel Inference Machine in FGCS Project. In M. Reeve and S. E. Zenith (Eds.), Parallel Processing and Artificial Intelligence, Wiley, Chichester, 1989, pp. 65-96. [Levy and Silberschatz 1989] E. Levy and A. Silberschatz. Distributed File Systems: Concepts and Examples. TR89-04, Department of Computer Sciences, The University of Texas at Austin, Austin, 1989. [Ousterhout et al. 1985] J. K. Ousterhout, H. D. Costa, D. Harrison, J. A. Kunze, M. Kupfer and J. G. Thompson. A Trace-Driven Analysis for the UNIX 4.2 BSD File System. In Proc. 10th ACM Symposium on Operating Systems Principles, ACM, New York, 1985, pp. 15-24. [Ueda and Chikayama 1990] K. Ueda and T. Chikayama. Design of the Kernel Language for the Parallel Inference Machine. The Computer Journal, Vol. 33, No.6 (1990), pp. 494-500. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. 
© ICOT, 1992

ParaGraph: A Graphical Tuning Tool for Multiprocessor Systems

Seiichi Aikawa   Mayumi Kamiko   Hideyuki Kubo   Fumiko Matsuzawa   Takashi Chikayama

Institute for New Generation Computer Technology, 4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan
Fujitsu Limited, 1015, Kamiodanaka, Nakahara-ku, Kawasaki 211, Japan

Abstract

Distributing computational load to many processors is a critical issue for efficient program execution on multiprocessor systems. Naive even distribution of load, however, tends to increase communication overhead considerably, which must also be minimized for efficient execution. It is almost impossible to achieve optimal load distribution automatically. It is especially so on scalable loosely-coupled multiprocessor systems, since the communication cost is relatively high. Finding a good load distribution algorithm is one of the most important research topics for parallel processing. Tools for evaluating load distribution algorithms are very useful for this kind of research. This paper describes a system called ParaGraph that gathers periodic statistics of the computational and communication load of each processor during program execution, at both the higher level of the programming language and the lower level of the implementation, and presents them graphically to the user.

1 Introduction

In the Japanese Fifth Generation Computer Systems Project, parallel inference systems have been developed for promoting parallel software research and development. The system adopts a concurrent logic programming language, KL1 [Ueda 90], as the kernel and consists of a parallel inference machine, PIM [Goto 88], and its operating system, PIMOS [Chikayama 88]. For efficient program execution, the computational load must be appropriately distributed to each processor. On scalable loosely-coupled multiprocessor systems, load balancing and minimization of communication overhead are essential, but become more difficult compared to tightly-coupled systems as communication costs increase. Although many load distribution algorithms have been developed [Furuichi 89, Kimura 89], none have been sufficient to execute every program effectively. Finding a good load distribution algorithm is one of the most important research topics for parallel processing. Tools for evaluating load distribution algorithms are very useful for this kind of research.

The objective of the ParaGraph system is to help programmers design and evaluate load distribution algorithms on loosely-coupled multiprocessor systems. ParaGraph gathers profiling information during program execution on the parallel inference machine, PIM, and displays it graphically. Many performance displays have been devised for special purposes such as processor utilization, communication, and program execution [Malony 90, Heath 91] (1). Profiling information can be viewed as having three axes: what, when, and where. We have designed graphical views based on these three axes so that every kind of information is displayed in the same form. We have also designed the graphical views so that profiling information is easy to compare, because bottlenecks are often found by comparing parts of the information against the overall execution.

Section 2 describes how load distribution can be described in KL1 on PIM. Section 3 describes the implementation of the ParaGraph system and the graphical representation of program execution, and Section 4 discusses how useful the graphical displays are for detecting performance bottlenecks, with examples of various programs. Section 5 concludes the paper.

2 Load Distribution Algorithms

2.1 Load Distribution in KL1

The parallel inference machine runs a concurrent logic programming language called KL1 [Ueda 90, Chikayama 88, Ichiyoshi 89]. A KL1 program consists of a collection of guarded Horn clauses of the form

H :- G1, ..., Gn | B1, ..., Bm.

where H, Gi, and Bi are atomic formulas. H is called the head, Gi the guard goals, and Bi the body goals. The guard part consists of the head and the guard goals, and the body consists of the body goals. They are separated by the commitment operator (|). A collection of guarded Horn clauses whose heads have the same predicate symbol P and the same arity N defines a procedure P with arity N. This is denoted as P/N.

The guard goals wait for instantiations to variables (synchronization) and test them. When the guard parts of one or more clauses succeed, one of those clauses is selected and its body goals are called. These body goals communicate with each other through their common variables. If variables are not ready for testing in the guard part because the value has not been computed yet, testing is suspended.

In addition to the above basic mechanism, there is a mapping facility. The mapping facility includes load distribution specification (2). The programmer can annotate the program by attaching pragmas to the body goals to specify a processor (specified by Goal@node(Proc)). The programmer must tell the KL1 implementation which goals to execute on which processors.

next_queen(N,I,J,B,R,D,BL) :- J > 0, D = 0 |
    BL = {BL0,BL1},
    R = {R0,R1},
    BL0 = [get(Proc)|BL2],
    try_ext(N,I,J,B,R0,D,BL2)@node(Proc),
    next_queen(N,I,-(J-1),B,R1,D,BL1).

Figure 1: A sample KL1 program

Figure 1 shows a part of a KL1 program. If the goal next_queen/7 is committed to this clause, its body goals are called. The goal try_ext/7 has a processor specification, and it is to be executed on processor number "Proc". This processor number can be dynamically computed.

2.2 Design Issues

Load balancing derives maximum performance by efficiently utilizing the processing power of the entire system. This is done by partitioning a program into mutually independent or almost independent tasks, and distributing the tasks to processors. Many load balancing studies have been devised, but they are tightly coupled to particular applications. Therefore, programmers have to build load distribution algorithms for their own applications.

To distribute the computational load efficiently, the programmer should keep in mind the following points. Since load distribution is implemented by using goals, the programmer should understand the execution behavior of each goal. When goals are executed on a loosely-coupled multiprocessor, the programmer should investigate the load on individual processors and the communication overhead between processors. For evaluating load distribution algorithms, tools must provide many graphic displays for the programmer to understand the computational and communication load of each processor at both the higher program and lower implementation levels. No single display and no single profiling level can provide the full information needed to detect performance bottlenecks.

(1) [Heath 91] describes a tool having the same name as our system, but they are quite different.
(2) The other mapping facility is priority specification, which specifies the priority at which a goal should be executed.
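The pragma only says where a goal runs; choosing a good value for Proc is the programmer's problem, which is exactly what ParaGraph is meant to help evaluate. As a purely illustrative example, not part of KL1 or PIMOS, the two Python functions below compute a processor number for a subtask under two simple policies, trading communication locality against the risk of imbalance; the names and the modulo scheme are invented for this sketch.

def round_robin(task_index, n_processors):
    """Spread consecutive subtasks over all processors."""
    return task_index % n_processors

def block_partition(task_index, n_tasks, n_processors):
    """Keep neighbouring subtasks on the same processor: coarser granularity and
    fewer inter-processor messages, but a higher risk of load imbalance."""
    return task_index * n_processors // n_tasks

# Distributing 12 subtasks over 4 processors:
# [round_robin(i, 4) for i in range(12)]         -> 0,1,2,3,0,1,2,3,0,1,2,3
# [block_partition(i, 12, 4) for i in range(12)] -> 0,0,0,1,1,1,2,2,2,3,3,3

A real KL1 program would compute such a value dynamically and attach it with @node(Proc), as in Figure 1.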
Section 5 concludes the paper. 2 2.1 Load Distribution Algorithms Load distribution in KL1 The parallel inference machine runs a concurrent logic programming language called KL1 [Ueda 90, Chikayama 88, Ichiyoshi 89]. A KL1 program consists of a collection of guarded Horn clauses of the form: where H, Gi , and Bi are atomic formulas. H is called the head, Gi , the guard goals, and Bi the body goals. The guard part consists of the head and the guard goals and the body consists of body goals. They are separated l[Heath 91J describes a tool having the same name as our system, but they are quite different. 287 by the commitment operator(I). A collection of guarded Horn clauses whose heads have the same predicate symbol P and the same arity N, define a procedure P with arity N. This is denoted as PIN. The guard goals wait for instantiations to variables (synchronization) and test them. When the guard part of one or more clauses succeed, one of those clauses is selected and its body goals are called. These body goals communicate with each other through their common variables. If variables are not ready for testing in the guard part because the value has not been computed yet, testing is suspended. In addition to the above basic mechanism, there is a mapping facility. The mapping facility includes load distribution specification2 • The programmer can annotate the program by attaching pragmas to the body goals to specify a processor (specified by Goal@node( Proc) ). The programmer must tell the KL1 implementation which goals to execute on which processors. next_queen(N,I,J,B,R,D,BL):- J>o, D=O I BL = {BLO,BL1}, R = {RO,Ri}, BLO = [get(Proc)IBL2], try_ext(N,I,J,B,RO,D,BL2)~node(Proc), next_queen(N,I,-(J-l),B,Rl,D,BL1). Figure 1: A sample KL1 program Figure 1 shows a part of a KL1 program. If the goal next_queen/7 is commit'ted to this clause, its body goals are called. The goal try_ext/7 has a processor specification, and it is to be executed on processor number "Proc". This processor number can be dynamically computed. 2.2 Design Issues Load balancing derives maximum performance by efficiently utilizing the processing power of the entire system. This is done by partitioning a program into mutually independent or almost independent tasks, and distributing tasks to processors. Many load balancing studies have been devised, but they are tightly coupled to particular applications. Therefore, programmers have to build load distribution algorithms for their own applications. To distribute the computational load efficiently, the programmer should keep in mind the following points. Since load distribution is implemented by using goals, the programmer should understand the execution behavior of each goal. When goals are executed on a looselycoupled multiprocessor, the programmer should investi2The other mapping facility is priority specification to specify what priority the goal should be executed. gate the load on individual processors and the communication overhead between processors. For evaluating load distribution algorithms, tools must provide many graphic displays for the programmer to understand the computational and communication load of each processor in both the higher program and lower implementation levels. No single display and no single profiling level can provide the full information needed to detect performance bottlenecks. 
System Overview 3 3.1 Gathering Information To statistically profile large-scale program execution, KL1 implementation provides information gathering facilities, processor profiling and shoen profiling. KL1 implementation provides these facilities as language primitives, to minimize the undesirable influence to the execution behavior of programs. These facilities have been implemented at the firmware level. The profiling facilities are summarized as follows. • Processor profiling Profiles the low-level behavior of the processor, such as how much CPU time went to the various basic operations required for program execution. • Shoen profiling Profiles the higher-level behavior of the processor, such as how many times each piece of the program was executed. To minimize the perturbation, the gathered profiling information resides in each processor's local memory during program execution, and after execution, ParaGraph collects and displays this information graphically. Since profiling information is automatically produced by the KL1 implementation, programmers do not have to modify the application programs. 3.1.1 Processor Profiling The basic low-level activities can be categorized into computation, communication, garbage collection, and idling. Computation means normal program execution such as goal's reductions and suspensions, communication means sending and receiving inter-processor messages, garbage collection means itself, and finally, idling means doing nothing. The processor profiling facility measures how much time went to each category for each processor. Such information can be periodically gathered to show gradual changes of behavior. The profiling facility can also measure frequencies of sending and receiving various kinds of interprocessor messages [Nakajima 90]. 288 • A throw_goal message transfers a KLI goal with a throw goal pragma to a specified processor. • A read message requests for some value from the remote processor when a clause selection condition requires it. • An answer_value message replies to a read message when the request value becomes available. • A unify message requests body unification (giving a value to a variable). Every type of profiling informa.tion can be easily displa.yed with the views described below with a. menuoriented user interface such as the bottom-right window in Figure 2. If the window size is too, small to displa.y everything in detail, coarser display a.ggregating several cycles or several processors together is possible to see the overall beha.vior at a glance. Scrolling on the vertical a.nd horizontal directions are also possible if details are to be examined. It is also possible to displa.y only selected "Wha.t" items. 3.2.1 3.1.2 Shoen Profiling "Shoen" [Chikayama 88P is a mechanism provided in KLI for grouping goals and controlling their execution in a meta-level. The shoen mechanism can be considered to be an interpreter for the KLI language. It also provides profiling facility at a higher level than processor profiling. Processor profiling gathers a number of important statistics from many aspects that help analyzing performance bottlenecks, but it provides no information on where in the program is the root of such a behavior. To correlate execution behavior with a portion of the program, shoen profiling measures how many times goals associated with each predicate are reduced or suspended (due to unavailability of data required for reduction). Transition of behavior can be observed by periodically gathering the information. 
3.2 Graphic Displays The profiling information can be viewed as having three axes: what, when, and where. In sequential execution, "where" is a constant and the "when" aspect is not important, since the execution order is strictly designated. Therefore, simple tools like gprof provided with UNIX4 suffice. However, all three axes are important when parallel execution is concerned. If such massive information is not presented carefully, the user might be more confused than informed. Therefore, ParaGraph provides a variety of graphic displays. We named each representation using the terms "What," "When," and "Where." The term "What" is the visualization target corresponding to the type of profiling information such as low-level processor behavior, higherlevel processor behavior, and interprocessor message frequencies. The term "When" indicates time expressed by an integer that is a cycle number. The term "Where" indicates the processor' number and is expressed by an integer. Figure 2 shows the graphic displays of ParaGraph. These displays are execution behavior of all solution search program of N queen problem. 3The word "shoen" is a Japanese word that means "manor". 4UNIX is a trademark of AT&T Bell Laboratories A WhatxWhen View There a.re two kinds of views in terms of "Wha.t" and "When" items. One is a. Wha.t x When view which shows the behavior of each "What" item during execution. A gra.ph is displayed of a "Wha.t" item in order of the total volume. The x axis is the cycle numbers, and the y axis is the rate of processor utilization, the number of messages, and the number of reductions or suspensions corresponding to the type of profiling information. Since every graph is drawn with the same scale on the vertical axis, it is easy to compare with "Wha.t" items. The other is an overall What x When view which shows the behavior of all "What" items during execution. Each "What" item is stacked in the same graph and displayed as a line. The y axis represents the average rate of processor utilization, the total number of messages, and the total number of reductions and suspensions 'corresponding to the type of profiling informa.tion. These views are helpful for example, if a progra.m has sequential bottlenecks such as tight synchroniza.tion. In this case, the number of goal reductions will be down at some portion during program execution. Such a. problem will be detected easily by observing program execution. The top-left window in Figure 2 shows received message frequencies on all processors with What x When view. In this window, four kinds of received message frequencies are displayed on each gra.ph. These messages are displayed in order of the total number of received messages. The other messages are displa.yed by scrolling vertically. From this, we know that each received message frequency on all processors is less than 6,500 times/an interval (an interval is 2 second). As this program is divided almost mutually independent subtasks, communication message frequency is very low. 3.2.2 A When x Where View A When x Where view shows the behaviors of aJl "What" items on each processor. Each processor is displayed with various color patterns that indicate volume. The rela.tionship between color patterns and volume are shown in the bottom right corner. The darker the pattern, the busier the processor. Volume means the rate of processor utilization, the number. of messages, and the number of 289 " ."....~ DGC 4000 o receive rnmm send 2000 o r •• ~ 5 7 T "::~, a 3: 2.01~ 2000 o~ ... n ••• "_ .... 
.,lu. 5 III compute 7 :~T uni+~ 5 7 ~~~~~--~~=~~7T I.:·.~.:..:.~.:.~. ~1lI ~m~ so ... 1. x:lc .... cle ~: 1 noel. Figure 2: Sample graphic displays: a What x When view (top-left window), an overall What x Where view (top-right window), and a When x Where view (bottom-left window) and a menu-oriented user interface (bottom-left window) reductions or suspensions that correspond to the type of profiling information. It's also possible to display only selected "What" items instead of all of them. The bottom-left window in Figure 2 is a When x Where view. The x axis is the cycle number, and the y axis is the processor number. This view displays the execution behavior of all goals on a 32-processor machine. The color patterns indicate the number of reductions. The relationship between the number of reductions and color pattern is displayed on the bottom right corner. From this, we know that the work load on each processor was well balanced, and this program was executed about 50,000 reductions/an interval on each processor at each moment in time. 3.2.3 A WhatxWhere View There are two kinds of views in terms of "What" and "Where" items. One is a WhatxWhere view which shows the load balance of each "What" item on each processor. A bar chart is displayed of a "What" item in order of total volume. The x axis represents the proces- sor numbers, the y axis represents the rate of processor utilization, the number of messages, and the number of reductions or suspensions that correspond to the type of the profiling information. All bar charts are drawn with the same scale on the vertical axis, so it is easy to compare with the volume of each "What" item. The other is an overall What x Where view which shows the load balances of all "What" items on each processor. Each "What" item is stacked in the same bar chart and displayed by a certain color pattern. The y axis represents the average rate of processor utilization, the total number of messages, and the number of total reductions or suspensions that correspond to the type of profiling information. The relationship between each category and color pattern was displayed on the top-right corner. The top-right window in Figure 2 shows the low-level behavior of the processor with an overall What x Where view. In this window, each categories of low-level behavior is displayed with several color pattern. From this, the average of computation took more than 80% of total execution time, and the average of commu- 290 nication on each processor was less than 5%. Thus, this view shows most of the processors run fully, and this example program was executed very efficiently on each processor. 4 Examples This section discusses which views to use to view various performance bottlenecks. For efficient program execution on multiprocessor systems, the following phases are usually repeated until a solution is reached: 1) a program is partitioned into subtasks, 2) the subtask is mapped to each processor dynamically, and 3) each processor runs subtasks while communicating with each other. Various problems are often encountered when executing a program on multiprocessor systems. We will show how graphic displays in both the higher program and lower implementation levels are helpful with performance problems. 4.1 Uneven Partitioning When the granularity between subtasks is very different, it is useful to observe the low-level processor behavior with a WhenxWhere view and the higher-level processor behavior with a What x Where view. 
From the When x Where view, we will find which processors run fully and which are idle. From the WhatxWhere view, we will determine which goals caused the load imbalances. The left window in Figure 3 shows the low-level behaviors on each processor with a When x Where view, while the right window in Figure 3 shows the higher-level behaviors of the same processors with a What x Where view on a 16-processor·machine. An example program is a logic design expert system which generates a circuit based on a behavior specification. The strategy of parallel execution is that first, the system divides a behavior specification into sub-specifications, next designs subcircuits based on the sub-specifications on each processor, and finally gathers partial results together and combines them. The When x Where view suggests that processors around No. 11 run fully, but most of the other processors were idle. The What x Where indicates the top six goals were mainly executed on processor No. II. From this, we know that very complicated tasks are allocated to processor No. 11, that is, uneven partitioning of behavior specification must cause a bottleneck in performance. 4.2 Load Imbalance If a mapping algorithm has problems such as allocating subtasks to the same processor, it is useful to observe low-level behavior of the processor with a When x Where view and higher-level behavior with a WhatxWhere view. From the WhenxWhere view, we see which processors are running fully and which are idle, and from the What x Where view, we see the load balance of each goal. Using both views, we can determine how to distribute the goals that are imbalanced to each processor. The bottom-left window of Figure 4 shows low-level behavior of the processor with a WhenxWhere view, the top-left window and the top-right window show the higher-level behavior of the processor with an overall What x Where view, a What x Where view respectively. An example program is a part of the theorem prover· which evaluates whether an input formula is a tautology. The strategy consists of 2 steps: 1) convert an input formula to clause form (i.e, conjunctive normal form), 2) evaluate its clause form and determine whether it is a tautology. The step 1 is executed in parallel as follows. First, main task partitions an input formula into subformulas. Second, it generates subtasks to convert subclause forms, and finally, distributes subtasks to many processors dynamically. These steps are repeated recursively until subformulas are converted to subclause forms. The step 2 is executed in sequential on processor No. O. The When x Where view of the bottom-left window in Figure 4 suggests that only certain processors (processor No. 6-15 and No. 23-31) run fully and that. the others were mostly idle. The overall When x Where view of the top-left window also suggests that most of the goals were executed on certain processors and the number of reduction of top five goals were higher than the other goals. We can check the load of each goal on each processor from the What x Where view of the top-right window in Fugure 4. These goals were executed on certain processors and were the cause of the load imbalances. From this, we have to change its mapping algorithm to be flatten the shape, to use all processors efficiently. 4.3 Large Communication Overhead When subtasks are not mutually independent and must communicate with each other closely, the program is less efficient because of communication overhead. 
In this case, the low-level behavior ofthe processor with an overall What x Where view and frequencies of sending and receiving messages with a What x Where view are helpful. From the overall What x Where view, we will learn how much time has been consumed on message handling for each processor, while the WhatxWhere view shows us what kind of messages each processor has sent or received. Figure 5 displays an execution behavior of an improved version of the program described in Section 4.2. The left window shows the load balances of all goals on a 32processor machine with an overall WhatxWhen view. 291 0<-: - I 1<: o o yori: : c_al terna add/6 pimos: : hasher: hashO/4 S pimos: : keyed_ba do_get_i f _any_ l1li yori:: c_al terna add_FBVa I ue/4 • subtract • o o yori:: agent_add sub/6 unify pimos:: keyed_ba keyed_bag/6 IillllD p imos: : keyed_ba rehash_each_en l1li pimos:: hasher: hash/3 ~~H+~I+I-I++H+H+H+t++H+t++H+tf:H.: .cale LUillL1..LU;:;Ll...U-'L.1.LLLLLLLLLLL.LLLLLl...L.1-U-LL..L.L.'-"-"" ~; f ~~~!. Figure 3: The low-level processor behavior (left window) and execution behavior of goals (right window) This view shows that the work load on each processor was balanced in overall execution, but was not efficient because oflarge communication overhead. It will be proved from low-level behavior of the processor with an overall What X Where view shown in the right window. The right window of Figure 5 suggests the load average on each processor was about 80 - 85%, but the average of computation on each processor was about 20%. Most of the processing power was consumed sending and receiving message handli~g time more than 60% of total execution time. Figure 6 shows the same program execution as Figure 5. The left window shows the receiving and sending message handling time rate with What X Where view, the right window shows the frequencies of four received interprocessor messages with a What x When view. The left window of Figure 6 shows the message handling time on each processor at each moment in time was almost equally, the right window shows that the read message was received about 180,000 times, answeLvalue message was about 165,000 times, unify message was 100,000 times, and throw ~oal message was about 64,000 times per interval on all processors. The tasks generated in this program communicated with each other closely among processors as compared with the result of N queen's message frequencies (see the top-left window of Figure 2). 'From this, we know that as work loads are distributed more and more, it becomes easier to balance work loads on each processor, but communication overhead also increases and performance is thus lowered. As a result, we have to redesign or improve how to divide into subtasks. Because the generated subtasks that were not mutually independent, and it caused such a problem we mentioned above. 5 Conclusion We developed the ParaGraph system on parallel inference machines to provide graphic displays of processor utilization, interprocessor communication, and execution behavior of parallel programs. Experiments with various programs have indicated that graphic displays are helpful in dividing work loads evenly and determining where the bottlenecks are on multiprocessor systems. We released a version last year as a tuning tool of PIMOS, but have experienced some problems. In the future, we will improve the system considering the following points. First, real-time performance visualization tools are needed. 
Although displaying execution behavior in realtime perturbs the program being monitored, it is useful not only in early tuning but also in debugging such as detecting deadlock status and infinite loops. To develop such a tool, low overhead instrumentation techniques and new displays that programmers would not be pressed to understand appearing in real-time must be devised. Second, tools which can visualize the portion of the performance bottlenecks directly are needed. Massively parallel machines that have thousands of processors and programs for long runs produce a large amount of profiling information, but it is difficult to process or display for simple expansion of our system because of a vast quantity of information. To solve such problems, analysis techniques indicating bottlenecks directly will be needed. We will study automatic analysis techniques and graphical displays of its result (we call this bottleneck visualization). One such approach is critical path analysis, which identifies the path through the program that consumed the most time [Miller 90]. 292 o me: : bmtp_depthl after_rewrite Ome:: bmtp_depth before_rewri te liliI!Iunify .. me: : bmtp_depthl ru)e/3 • me: : bmtp_depth after_rewrit • others ":~'::l ;..... -- -~""'~.~t••r"_r-••-rlt~._~ar~.I' ' ' 25' ' - '-~-: :30=31 '''''''~ "00E5~ L.,·o _ ~o"'-'-'~~.~.:-'-:b~.tP~_.c.P"""th"""1:b~.fC""'or-._r-••~rl"!',,""'_ar~9/S!5 3:'2002r~ ___ o Node ~~. . ~ 4: 3031 Nod. l!uniffJ 25· 3031 Noale 333'3~~o~_~"'_,o'71•• I::b•.t••_.".P~th~1:T'" , !u,~./1I!I I3"~2-5Im.-30~31 Figure 4: Low-level processor behavior (bottom-left window), the load balances of all goals (top-left window), and the load of each goal (top-right) 6 Acknowledgments We thank K. Nakao and T. Kakuta who helped us to develop this tool, and all the researchers of ICOT and other companies who tested our tool. References [Ueda 90] K. Ueda and T. Chikayama, "Design of the Kernel Language for the Parallel Inference Machine," The Computer Journal, December 1990. [Goto 88] A. Goto, M. Sato, K. Nakajima, K. Taki, and A. Matsumoto, "Overview of the Parallel Inference Machine (PIM) arc'hitecture," In Proceedings of the International Conference on Fifth Generation Computer Systems, pages 208-229, 1988. [Chikayama 88] T. Chikayama, H. Sato, and T. Miyazaki, "Overview of the Parallel Inference Machine Operating System (PIMOS)," In Proceedings of the International Conference on Fifth Generation Computer Systems, pages 230-251, 1988. [Furuichi 89] M. Furuichi, K. Taki, N. Ichiyoshi, "A Multi-Level Load Balancing Scheme for OR-Parallel Exhaustive Search Program on the Multi-PSI," ICOT TR-526, 1989. [Kimura 89] K. Kimura, and N. Ichiyoshi, "Probabilistic Analysis of the Optimal Efficiency of the MultiLevel Dynamic Load Balancing Scheme," In Proceedings of the Sixth Distributed Memory Computing Conference, 1989. [Heath 91] M. T. Heath, and J. A. Etheridge, "Visualizing the Performance of Parallel Programs," IEEE Software, pages 29-39, September 1991. [Malony 90] A. D. Malony, n. A. Reed, D. C. Rudolph, "Integrating Performance Data Collection, Analysis, and Visualization," Addison-Wesley Publishing Company, pages 73-97, 1990. 293 ~ I o DGC me: : bmtp_depth: after_rewrite_ ~ [] me: : bmtp_depth: before_rewrite l1li unify I, i EI receive l1li send ii! • compute • me: : bmtp_depth: III rule/3 I •• m:; !::~~;~~r:;~ ~, others !I Figure 5: The ~oad balances of goals (left window) and low-level processor behavior (right window) "'''y 1.00E5 o 100 .0 00 1 ,.,.,.,, ;~ ..... 30 ..... 
~g o ':.'.50~ soa.le x: 1 o~ol. ~: 1 notl. so&le x: 1 o~ol. Figure 6: Low-level processor behavior about message handling (left window) and message frequencies (right window) [Ichiyoshi 89] N. Ichiyoshi, "Research Issues in Parallel Knowledge Information Processing," ICOT TM0822, November 1989. [Nakajima 89] K. Nakajima, Y. Inamura, N. Ichiyoshi, T. Chikayama, and H. Nakashima, "Distributed Implementation of KL1 on the Multi-PSI/V2," In Proceedings of the Sixth International Conference on Logic Programming, 1989. [Nakajima 90] K. Nakajima, and N. Ichiyoshi, "Evaluation of Inter-processor Communication in the KLI Implementation on the Multi-PSI," ICOT TR-531, 1990. [Miller 90] B. P. Miller, M. Clark, J. Hollingsworth, S. Kierstead, S. Lim, and T. Torzewski, "IPS-2: The Second Generation of a Parallel Program Measurement System," IEEE Trans. Parallel and Distributed Systems Vol. 1 No.2, pages 206-217, April 1990. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 294 PROTEIN SEQUENCE ANALYSIS BY PARALLEL INFERENCE MACHINE MASATO ISHIKAWA, MASAKI HOSHIDA, MAKOTO HIROSAWA, TOMOYUKI TOYA, KENTARO ONIZUKA AND KATSUMI NITTA Institute for New Generation Computer Technology 4-28, Mita I-chame, Minato-ku, Tokyo 108, Japan ishikawa@icot.or.jp Abstract We have developed a multiple alignment system for protein sequence analysis. The system works on a parallel inference machine PIM. The merits of PIM bring prominent features to the multiple alignment system. The system consists of two major components: a parallel iterative aligner and an intelligent refiner. The aligner uses a parallel iterative search for aligning protein sequences. The search algorithm is the BergerMunson algorithm with its parallel extension. Our implementation shows that the algorithm extended in parallel can rapidly produce better solutions than the original Berger-Munson algorithm, The refiner uses condition-action rules for refining multiple sequence alignments given by the aligner. The rules help to extract motif patterns from ambiguous alignment patterns. 1 Introduction Molecular biology and genetic technology have been advancing at an astonishing rate in recent years. Major activities in these fields are closely related to DNA and protein. This is because a set of DNA molecules in a cell contain the genetic information for the complete design of the living organism. This information is embodied as protein to build up the body and to keep its mechanisms alive. Each piece of genetic information, represented by a sequence of nucleic acids, is translated into a sequence of amino acids to form protein. As the method to determine DNA or protein sequences has progressed to its current state, the amount of known sequence data has grown rapidly. For example, Genbank, one of the most widely distributed databases, contains information on more than sixty million nu- cleotides. The growing number of genetic sequences in databases inevitably makes the field of genetic information processing one of the most important application areas for computer science. The fundamental technique for analyzing genetic sequence data by computer is to examine similarities among sequences. This usually requires large amounts of computation to find the similarities, since there are a lot of sequences in the database to be examined. The computational problem can be partly solved with parallel implementation. There have been some experiments with parallel sequence analysis [Iyengar 1988]. 
Another approach to. the problem is to furnish the analysis program with biological know-how as heuristics. Many consider that logic programming languages are a profitable way of implementing heuristics. Parallel sequence analysis with a logic programming language has been tried [Butler et al. 1990]. We have developed a multiple alignment system for protein sequence analysis. The system has been implemented on a parallel inference machine PIM using a parallel logic programming language KLl. The aim of this paper is to show PIM's availability in the field of genetic information processing. The organization of the rest of this paper is as follows. In Section 2, we briefly explain our 'application problems. We present our multiple sequence alignment system in Section 3. Then, the results of experiments and comparison with other methods are discussed in Section 4. Finally, conclusions are given in Section 5. 2 Protein sequence analysis As described above, the genetic information, stored in DNA, is translated into sequences of amino acids. A chain of amino acids folds to become protein in water. 295 The structure of the protein depends on the sequence itself, that is, the same sequence will form the same structure. The function of the protein is chiefly determined by its structure, because proteins whose shapes are complementary can interact with each other. Every protein is made up of twenty kinds of amino acids which are distinguished by twenty different code letters. A protein has about two hundred amino acids on average and is represented by a linear sequence of code letters. Because every amino acid has its own properties of volume, hydrophobicity, polarity and so on, the order of the amino acids in the protein sequence gives structure and function of the protein. The protein sequence determination technique has been so established that more than twenty thousand sequences have been specified by the letters; this number is growing day by day. The structures of proteins are also being solved. Methods such as X-ray crystallography reveal how the linear chain of amino acids fold together. But this takes so many months to solve that only three hundred protein structures have been determined so far. An important way of discovering new genetic infor- . mation is inferring the unknown structure of a protein from its sequence. We do this by analyzing the sequence of amino acids, because, fortunately, proteins that have similar sequences have similar structures. Multiple sequence alignment is one of the most typical methods of sequence similarity analysis. The alignment of several protein sequences can- provide valuable information for researching the function or structure of proteins, especially if one of the aligned proteins has been well characterized. Let us show an example of multiple sequence alignment. The next set of sequences represents four parts of different protein sequences. Each letter in the sequences means an amino acid. For instance, GDVEK stands for a row of Glycine, Aspartic acid, Valine, Glutamic acid and Lysine. 
GDVEKGKIFIMKCSQCHTVEKGGKHKTGPNLHGLFG ASFAEAPAGTTGAKIFKTKCAQCHTVKGHKQGNGLFG PYAPGDEKKGASLFKTAQCHTVEKGGANKVGPNLHGVFG PPKARAPLPPGDAARGEKLRAAQCHTANQGGANGVGYGLVG A good multiple sequence alignment for the given sequences is as follows: ~---------GDVEKG-KIFIMKCSQCHTVEKGGKHKTGPNLHGLFG --ASFAEAPAG--TTGAKIFKTKCAQCHTV-KG--HKQG---NGLFG ------PYAPGDEKKGASLFKT--AQCHTVEKGGANKVGPNLHGVFG PPKARAPLPPGDAARGEKL---RAAQCHTANQGGANGVG---YGLVG * * **** * * * * Each sequence is shifted by gap insertion-dash characters. Each column of the resultant alignment has the same or similar amino acids. An identical pattern such as QCHT is considered to be an important site called a sequence motif, or simply a motif, because an important protein sequence site has been conservative along with evolutional cycles between mutation and natural selection. Multiple sequence alignment is useful not only for inferring the structure and function of proteins but also for drawing a phylogenetic tree along the evolutional histories of the creatures. A AD HE AHIE D H E c::> ADH-E A-HIE Figure 1: Pairwise dynamic programming Computers partly solve the problem of multiple sequence alignment automatically, instead of relying on the hands and eyes of experts. The results obtained by computers, however, have not been as satisfactory as those by human experts. That is because multiple sequence alignment is one of the most time and space consuming problems. The dynamic programming algorithm [Needleman and Wunsch 1970, Smith and Waterman 1981, Goad and Kanehisa 1982], theoretically, provides an optimal solution according to a given evaluation score. This, however, requires memory space for an Ndimensional array (where N is the number of sequences) and calculation time for the N - th power of the sequence length. Though a method was proposed to cut unnecessary computation in the dynamic programming algorithm [Carrillo and Lipman 1988], it still needs too much computation to solve any practical alignment problem. A number of heuristic algorithms for multiple alignment problems have been devised [Barton 1990, Johnson and Doolittle 1986] in order to obtain approximate solutions within a practical time. Most of these algorithms are based on pairwise dynamic programming. Figure 1 shows the algorithm of dynamic programming applied to a tiny pairwise alignment. The algorithm searches the best path in the figurative network from the top left node to the bottom right node minimizing the total cost of arrows. Each cost indicated on an arrow reflects the similarities between the characters being compared. The best path corresponds to the optimal alignment; each arrow in the path corresponds to each column in the alignment. Vertical and 296 PNPRI SA ARNYKIPLT~3:~ t:"u%j b:j 10 ~ G"l Q-+O-M~~~~~~-+n-~~~~ U---~IoQ-~~-U---+1:)~~~a--.'O G"lHH >tUtU Q-+O-M~~~~~~~n-~~~~ IlzJ --PNPRI-SA--ARNyKIPLT-----KFGIP-N----MFNIP-REQA--TL GA-T---- 110 .. >1 Il:"f 2 Figure 2: Iterative strategy of Berger-Munson algorithm horizontal arrows indicate the insertion of gaps. 3 Multiple alignment system We have developed a multiple alignment system for protein sequence analysis on PIM. The system consists of two components: a parallel iterative aligner and an intelligent refiner. The aligner uses a paral-. leI iterative search for aligning protein sequences. The refiner uses condition-action rules for refining multiple sequence alignments given by the aligner. 
3.1 Parallel iterative aligner The search algorithm in the iterative aligner is the Berger-Munson algorithm extended in parallel. The B-M algorithm [Berger and Munson 1991] is based on the same pairwise dynamic programming method as conventional heuristic algorithms for multiple sequence alignment. The algorithm, however, features a novel randomized iterative strategy so as to generate a highscore multiple alignment. Figure 2 illustrates the iterative strategy, whose procedure is as follows: the initially aligned sequences are randomly divided into two groups (step 1). By fixing the alignment of sequence members within each group we can optimize the alignment between the groups, us- ing the pairwise dynamic programming method (step 2). The resultant alignment, in turn, is the starting point for the next alignment of a different pair of groups (step 3). Each iteration that improves the alignment between two sequence groups will also improve the global alignment. Though the B-M algorithm often results in a much better multiple alignment than those obtained by conventional algorithms, its randomized iteration needs more than a few hours to solve multiple alignment of a practical scale. When a parallel machine is available, the iterative strategy extended in a parallel way is fairly helpful for reducing execution time. The B-M algorithm extended in parallel is as follows: a1l2n-l-1 possible partitions of n aligned sequences are respectively evaluated by the pairwise dynamic programming method. In each iteration, the evaluation is executed in parallel and the alignment which has the best score is selected as the starting point for the next iteration. 3.2 Intelligent refiner Aligning multiple protein sequences requires biological know-how, since the alignment score is not sufficient to evaluate them. The intelligent refiner holds dozens of condition-action rules that reflect the biological know-how for refinement. Part of the biological know-how has been obtained by interviewing human 297 1000~------------------------~ bad (a) Original 8-M algorithm Score (b) Parallel 8-M algorithm o· (c) Tree-based algorithm good -1000+-~--~~--~~--~~~~~ o 80 40 120 1 60 Cycles Figure 3: Comparing alignment score histories experts. Another part of it corresponds to the information contained in a motif database PROS IT E. Let us explain an example of the condition-action rule, which features a well-known motif pattern called Zinc Finger. Zinc Finger is characterized by two separated Cs, Cysteines, and two separated Hs, Histidines. The condition part of the rule checks whether an alignment has the half-aligned motif pattern of Zinc Finger or not, and if it finds the weak motif pattern, it tries, in its action part, to enhance the weak pattern to make it strong (see Figure 4). Every condition-action rule is represented with a parallel logic programming language KLl. 4 Experimental results Our multiple sequence alignment system works on PIM/m, a MIMD-type parallel machine equipped with up to 256 processing elements (PEs). We have investigated the performance of our system by testing the two components separately. 4.1 Parallel iterative aligner The B-M algorithm enables us to gradually improve global multiple alignment. Improvement is evaluated by the alignment score. We have defined the alignment score as follows. The alignment score is a total sum- mation oJ the similarity scores of every pair of aligned sequences, each of which is derived by summing up the similarity values of every ch~racter pair in the column. 
Each similarity value is given by the odds matrix. A gap penalty corresponding to each row of gaps in the two sequences is added to the similarity score. We use P AM250 [Dayhoff et al. 1978] as the odds matrix, each value of which is a logarithm of the mutation probability of a character pair; zero is the neutral value. We have reversed the sign of each value of the matrix to assimilate the habit of optimization problems. So the most similar character pair, W VS. W, gives the lowest value, -17, and the least similar pair, WVS. C, gives the highest value, 8. The gap penalty imposed on a row of k gaps is a linear relation: a + bk where a and b are parameters. We set a = 4 and b = 1 as default values. The linear relation is feasible and popular for alignment done by the dynamic programming algorithm [Gotoh 1982]. Character pairs gap VS. gap and outside gap VS. any character are ignored; they are assigned the neutral value zero. We have implemented three algorithms for comparison analysis: the original B-M algorithm, the B-M algorithm extended in parallel and the tree-based algorithm. The tree-based algorithm [Barton 1990] is one of the most typical and conventional methods for multiple sequence alignment. Figure 3 compares the histories of the alignment scores obtained by the algorithms. 298 (l)Before: ------------ILD---FHE-KLLHPGIQKT---TKLF--GET---yyFPNSQLLIQNIINECSICNLAKTEHRNTDM--P-TKTT ------------LLD---F-----LHQLTHLSFSKMKALLERSHSPyyMLNRDRTL-KNITETCKAC--AQVNASKSAVKQG-TR-LTDALLIT---PVLQ---LSP-AELHSFTHCG---QTAL--TLQ----GATTTEA--SNILRSCHAC---RGGNPQHQMPRGHI--------VADSQATFQAyPLREAKDLHTALHIG---PRAL--SKA---CNISMQQA--REVVQTCPHC------NSAPALEAG-VN-------------ISD--PIHEATQAHTLHHLN---AHTL--RLL---yKITREQA--RDIVKACKQC---VVATPVPHL--G-VN-------------ILT--ALESAQESHALHHQN---AAAL--RFQ---FHITREQA--REIVKLCPNC---PDWGSAPQL--G-VN-(score = -781) * * * (2)After: ------------ILD---F------HEKLLHPGIQKTTKLF-GET---yyFPNSQLLIQNIINECSICNLAKTEHRNTDM--P-TKTT ------------LLD---F-----LHQ-LTHLSFSKMKALLERSHSPyyMLNRDRTL-KNITETCKAC--AQVNASKSAVKQG-TR-LTDALLIT---PVLQ---LSP-AELHS-FTHCG---QTAL--TLQ----GATTTEA--SNILRSCHAC---RGGNPQHQMPRGHI--------VADSQATFQAyPLREAKDLHT-ALHIG---PRAL--SKA---CNISMQQA--REVVQTCPHC------NSAPALEAG-VN-------------ISD--PIHEATQAHT-LHHLN---AHTL--RLL---yKITREQA--RDIVKACKQC---VVATPVPHL--G-VN-------------ILT--ALESAQESHA-LHHQN---AAAL--RFQ---FHITREQA--REIVKLCPNC---PDWGSAPQL--G-VN-(score = -762) * * * * * Figure 4: Application of intelligent refiner Every algorithm solves the same small alignment problem which consists of seven sequences with eighty code letters each. The initial state of the alignment problem has no gaps inside the sequences. (a) Original B-M algorithm: The randomized iterative strategy executed by a single PE is applied to the alignment problem. Each iteration cycle takes twentyeight seconds on average.. We set thirty-two as the convergence condition; execution stops, if no variation of alignment score is found during thirty-two iteration cycles. Three runs with distinct sequences of random numbers give converged alignment scores: -752, -779 and -851. (b) Parallel B-M algorithm: The best-choice iterative strategy executed by sixty-three PEs is applied to the alignment problem. In each iteration, sixty-three possible partitions of aligned sequeI:J.ces are distributed to the PEs so that they can be evaluated at the same time. Each iteration cycle takes thirty seconds on average. The execution stops if no variation of alignment score is found. 
The final alignment, which is obtained at the fourteenth cycle with score -851, is the same alignment as one of the three obtained in (a). (c) Tree-based algorithm: The tree-based algorithm is a conventional method to rapidly produce a practical multiple alignment. The algorithm aligns sequences one after another by pairwise dynamic pro-· gramming. The order in which sequences are aligned depends on the tree-like representation that was previously determined by analyzing the distance of similarity of every pair in the sequences. Our implementation of the algorithm solves the problem in eighty seconds. The alignment score of the solution, -617, is indicated by a horizontal line. We made the following observations from these results. 1. The parallel B-M algorithm (b) solves alignment problems about ten times faster than the original B-M algorithm (a). 2. The original B-M algorithm (a) gives different alignments depending on the sequence of random numbers, whereas the parallel B-M algorithm (b) gives a constant alignment that often has a better score than obtained by (a). 3. (a) and (b) show that either ofthe B-M algorithms gives a much better alignment than the conventional tree-based algorithm (c). Thus, the parallel B-M algorithm can constantly generate high-score alignments in a small number of cycles. And PIM can execute the algorithm in a practical time. 4.2 Intelligent refiner The refiner holds dozens of condition-action rules and checks a given alignment with the condition parts in parallel. If some condition parts match the alignment, the action parts paired with the condition parts are executed so as to produce candidates for a refined alignment. After evaluation of the candidates, some of them are displayed as refined alignments. Let us show an example of the refinement. Figure 4 (1) shows an alignment which contains a weak Zinc-Finger motif pattern. Cs are aligned completely in two columns, but Hs are not aligned completely in two columns; Q exists among identical Hs in 299 a column. (* indicates a completely aligned column and ~ indicates an almost completely aligned column.) Application of the intelligent refiner to the alignment produces Figure 4 (2). The condition-action rule described in Section 3 has worked on the refinement process. The Zinc- Finger motif pattern is brought into full relief in the refined alignment. Although it has a score that is slightly worse than the previous alignment, it is a valuable alignment from a biological point of view. Thus, the intelligent refiner helps to extract motifs from ambiguous alignment patterns and to produce biologically valuable alignments. Constructing the intelligent refiner on PIM is a profitable way, since KL1, a logic programming language on PIM, is suitable for representing such biological know-how. 5 Conclusions We have developed a multiple sequence alignment system on PIM. The parallel iterative aligner of this system with the extended Berger-Munson algorithm can constantly generate better alignments than conventional methods in a practical time. The intelligent refiner of this system uses condition-action rules for refining alignments given by the aligner. The rules reflecting biological know-how help us to extract motif patterns from ambiguous alignment patterns. These results show that PIM is fairly available in the field of genetic information processing. The extended algorithm searches all 2n - 1 - 1 possibilities in parallel and selects the best one. 
There is a problem because the number of possibilities increases exponentially as the number of sequences grows. Some practical alignment problems with more than twenty sequences have about a million possibilities. In those cases, preprocessing with cluster analysis is useful for reducing the possibilities without reducing the quality ofthe resultant alignment. The cluster analysis divides given sequences into a few groups based on similarities between sequences; similar sequences gather in the same groups. One of our future works is to represent complex biological know-how as a combination of simple conditionaction rules. Acknowledgments We gratefully acknowledge Osamu Gotoh for calling our attention to the B-M algorithm. We would also like to thank Naoyuki Iwabe and Kei-ichi Kuma for providing us with the biological know-how to refine alignments. References [Barton 1990] J. G. Barton. Protein multiple sequence alignment and flexible pattern matching. In R. F. Doolittle (ed), Methods in Enzymology Vol. 183, Academic Press, 1990. pp.403-428. [Berger and Munson 1991] M. P. Berger and P. J. Munson. A novel randomized iterative strategy for aligning multiple protein sequences. Computer Applications in the Biosciences, 7, 1991. pp.479-484. [Butler et al. 1990] Butler, Foster, Karonis, Olson, Overbeek, Pfiunger, Price and Tuecke. Aligning Genetic Sequences. Strand: New Concepts in Parallel Programming, Prentice-Hall, 1990, pp.253-271. [Carrillo and Lipman 1988] H. Carrillo and D. Lipman. The multiple sequence alignment problem in biology. SIAM J. Appl. Math., 48, 1988, pp.10731082. [Dayhoff et al. 1978] M. O. Dayhoff, R. M. Schwartz and B. C. Orcutt. A model of evolutionary change in proteins. In M. O. Dayhoff (ed), Atlas of Protein Sequence and Structure Vol.5, Supp1.3, Nat. Biomed. Res. Found., Washington, D. C., 1978, pp.345-352. [Goad and Kanehisa 1982] W. B. Goad and M. 1. Kanehisa. Pattern recognition in nucleic acid sequences. 1. A general method for finding local homologies and symmetries. Nucleic Acids Res., 10, 1982, pp.247-263. [Gotoh 1982] O. Gotoh. An improved algorithm for matching biological sequences. J. Mol. BioI., 162, 1982, pp.705-708. [Iyengar 1988] A. K. Iyengar. Parallel DNA Sequence Analysis. MIT/LCS/TR-428, 1988. [Johnson and Doolittle 1986] M. S. Johnson and R. F. Doolittle. A method for the simultaneous alignment of three or more amino acids sequences. J. of Mol. Evol., 23, 1986, pp.267-278. [Needleman and Wunsch 1970] S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequences of two proteins. J. of Mol. BioI., 48, 1970, pp.443-453. [Smith and Waterman 1981] T. F. Smith and M. F. Waterman. Identification of common molecular subsequences. J. of Mol. Biol., 147, 1981, pp.195-197. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 300 Folding Simulation using Temperature Parallel Simulated Annealing Makoto Hirosawa,* Richard.J.Feldmann,t David Rawn,+ Masato Ishikawa,* Masaki Hoshida,* George Michealst hirosawa@icot.or.jp Abstract We applied temperature parallel simulated annealing to the biological problem of folding simulation. Watercounting is introduced to formulate folding simulation as an optimization problem. Nobody has ever solved the folding simulation problem. We cannot obtain biologically significant consequences either. 
However, from the viewpoint of the evaluation value of the folding simulation, we observed the effectiveness of parallel computing. 1 Introduction Folding simulation uses a computer to simulate the process of protein formation from its stretched state to its native folded state. This research topic has held the interest of biologists for a quarter of a century and has never been solved. No researche has been able to reach the native folded state by folding simulation. Three of the authors(Feldmann, Rawn and Micheals) have been interested in formulation for protein folding. They introduced the water-counting model, which requires solution by computer. Meanwhile, the other three the authors (Hirosawa, Ishikawa and Hoshida) have studied the application of Multi-PSI [Nakajima et al. 1989J parallel inference machine to biological problems. A first attempt was made to the problem of multiple alignment [Ishikawa et al. 1991J using temperature parallel simulated annealing [Kumura and Taki 1990]. It was *Institute for New Generation Computer Technology(ICOT) tNational Institutes of Health ~Towson University so successful that other biological applications were sought. As the requirements of both partners matched, we combined efforts to conduct collaborative research. The purpose of this research was to investigate the applicability of the optimization algorithm, temperature parallel simulated annealing, to folding simulation and to evaluate the effectiveness of the water-counting model. The concept of folding simulation is explained in the second section and the water-counting model and its computational formulation are introduced in the third section. Then, temperature parallel simulated annealing is explained in the fourth section. Finally, the simulation results are shown in the fifth section. 2 2.1 What is folding simulc~.tion? Biological background of folding simulation Proteins are biological substances and they are essential to the existence of all creatures, from humans to the AIDS virus. A protein is a linear chai of amino acids. It consists of 20 kinds of amino acids. The structure of protein is determined by the order of the amino acids in the sequence. The structure of protein is closely related to its function. Therefore, it is very important to know the structure of the protein. Even now, it is very difficult to determine the structure of a protein. X-ray crystallography and NMR(Nuclear Magnetic Resonance) can be used to determine structure. But the former method can only be 301 utilized when crystalization of protein is succeessful, and this crystalization is very difficult to do. The latter method can be adopted when the size of the protein is small. Both require plenty of time from months to a year. On the other hand, we can determine the order of amino acids in the sequence of protein extremely easier than we can determine a structure of a protein. A technique for determining the sequence of a protein has been established. That is· why folding simulation is important and necessary. Folding simulation simulates, by computer, the process of protein formation from its stretched state to its native folded state. Before simulation starts, information on the order of the amino acids is provided. 2.2 Folding simulation as an optimization problem Folding simulation is a research topic that has fascinated hundreds of theoretical bio-chemists for a quarter of a century. The molecular dynamic method is, theoretically, able to solve folding simulation problem. 
The method precisely simulates the movement of each atom driven by kinetic forces. However, it requires such huge amounts of computational time that actual folding simulation problems cannot be solved (it can simulate pico-second movements of a protein whereas the whole folding process takes a few seconds or more). To make the computational time tractable, we have to seek effective approximation methods. In each approximation method, abstract representation (e.g. the amino-acid ball which represents all atoms in an amino acid as a single ball) and the limited structure state (e.g. limited location or angle) are often introduced. We can regard such an approximation method as combinatorial optimization, because each discrete state is evaluated by a properly-defined potential energy to be minimized and effective transition between states is devised. One of the most frequently employed approximation methods is lattice representation [Ueda et aI. 1978] [Skolnick and Kolinsky 1991], which restricts the position of amino acids in 3-dimensional lattice cells. Although the lattice representation can remarkably reduce computational time for folding simulation, no significant result had been produced until recently. That is partly because the lattice formulation might not be good enough to simulate the folding process, and partly because the computational power required might have still been too big. The work by Skolnick and his co-researchers is the first research that did not poorly reproduce the native structure of protein by the folding simulation. However, the parameterization he employed to reproduce the native structure has drawn criticism. 3 Computational Formulation of Folding Simulation We introduced a water-counting model to approximate folding simulation concisely and to formulate folding simulation as an optimization problem based on the model. The water-counting model employs a lattice representation for protein and water. 3.1 Concept of water-counting model Coming back to basic biological knowledge, we have sought a simulation method that requires the most minimal parameterization possible. Then, we found the water-counting model as a biological model. In 1958, I.M. Klotz recognized that the folded structure of a protein depends upon its interaction with water [Koltz 1958]. At about the same time, W.Kauzmann showed that the hydrophobic effect provides the principle driving force for protein folding [Kaumann 1959]. Hydrophobicity is a measure that represents the degree to which amino acids don't favor water. Amino acids that favor water are called hydrophilic amino acids, and those that don't are called hydrophobic amino acids. Hydrophobic force is caused by the above tendency of amino acids. Since many biologists today, if not all, still recognize hydrophobic force as a primary force, we simplified folding simulation by employing hydrophobic force without using any other kind of force. Next, we investigated the origin of hydrophobic force. We concluded that !he binding and detaching of water to and from amino acids produces hydrophobic force. We can interpret the global minimum energy of protein in terms of the number of water molecules bound to proteins. Because the energy is calculated by the number of water molecules around amino acid, we 302 coined it the water-counting model. 3.2 Representation of Protein We will describe a way to represent protein on a 3dimensional lattice. In lattice celis, any place protein is not present, water will fill. 
As described earlier, proteins are linear chains of amino acids. Each amino acids is composed of two parts, namely, a main chain and a side chain. Main chains form the backbone of protein. Side chains of amino acids determine the properties of the amino acid. The main chain of an amino acid serves to connect adjacent amino acid. The relative location between two adjacent amino acids is like the move that a knight in chess makes, but on a 3-dimensionallattice (Figure 1), (±3, ±1, ±1). Every main chain .of amino acid occupies 27 (= 33 ) lattice cells. Each of the twenty kinds of amino acids has different side chains (Table 1). For example, their volume (the number of lattice cells occupied) and hydrophobicity [Janin 1979] differ. 3.3 Evaluation of State The energy of states are evaluated in the following formula. E( Ener9Y) = L~dechain"(Water Count m - 1) Water Count m = x HydrophobicitYm Number of adjacent cell" (of "ide chain) occupied by other amino acid" The number of adjacent cell" of the "ide chain In the first formulas, the terms from hydrophobic amino acids are negative and those from hydrophilic' amino acid are positive. The more the absolute value F E -0.6 -0.7 0.5 32 52 20 C 0.9 16 Amino acid I Hydrophobicity 0.7 Volume 48 K L -1.8 0.5 60 48 M 0.4 40 V Amino acid T S R Hydrophobicity -1.4 -0.1 -0.2 0.6 Volume 16 28 36 68 G H 0.3 0 -0.1 40 P N Q '-0.5 -0.3 -0.7 28 28 40 W 0.3 68 Y -0.4 56 Table 1: Characteristics of amino acid side chains: each letter signifies one of 20 amino acids, for example, E signifies glutamic acid. of the hydrophobicity of an amino acid is, the greater its contribution to the energy is. The energy can be reduced both by increasing the amount of water around the hydrophilic amino acid and by reducing the amount of water around the hydrophobic amino acid. The minimization of energy has the effect of inviting hydrophobic amino acids toward the center of the protein where there is less water and to oust hydrophilic amino acids to the surface of the protein where water is abundant. 3.4 Figure 1: Representation of a part of protein: main chains(shaded) and side chains(unshaded) D A Amino acid Hydrophobicity 0.3 12 Volume Transition between States As a transition from one state to another, we introduce two classes of transition. One is rotational transition and the other is translational transition (Figure 2). Rotational transition is the move that proteins probably take in actual folding processes. We first focus on one amino acids and select which side of the protein to rotate (in the figure, the right side is selected). Then, by regarding a connection line between the focused amino acid and its adjacent amino acid (in the figure, the adjacent amino is on the left) as an fixed axis, the selected side of the protein is rotated. Translational transition is the moving of proteins that is done for computational convenience. As with rotational transition, one amino acid is focused on and the side to move is selected. Then, the adjacent amino acid of the selected protein is moved and other amino acids on the selected side are moved translationally. (the direction to translate is specified by the move of the adjacent ~ino acid). After a new state is created by the transition se- 303 lected, a collision check is executed. If, in the next possible state, there is no multiply occupancy of any lattice cell by different parts of the protein, this state is acceptable. 
Otherwise, the state is discarded and new transitions are tested until some that is accepted iteration, the current solution Xn is randomly modified to get a candidate x~ for the next solution, and the variation of the energy llE = E(x~) - E(xn) is calculated to evaluate the candidate. When llE :S 0, the modification is good enough to accept the candidate: is found. X n+l = x~. When llE > 0, the candidate is accepted with probability p = exp( -llE /Tn ), but rejected oth- _ ---o:.......~ ...... N 0 0 ...... .. /'~~ t focused amino acid 0 t focused amino acid Figure 2: Rotational transition (left) and Translational transition (right) erwise: X n+l = X n , where {Tn }n=O,l, ... is a cooling schedule (a sequence of temperatures decreasing with n). Because solution Xn is distributed according to the Boltzmann distribution at temperature T, the distribution converges to the lowest energy state (optimal solution) as the temperature decreases to zero (Figure 3). Thus, one might expect SA to be capable of providing the optimal solution, in principle. It is well-known that the cooling schedule has great influence on SA performance. This is where the cooling schedule problem 4 Temperature Parallel Simulated Annealing arIses. T (temperature) In the proceeding section, we formulated folding simulation as the problem to search for the minimum energy in a solution space. We employed temperature parallel simulated annealing as an algorithm to find a global optimal solution. Temperature parallel simulated annealing is an algorithm that can circumvent a scheduling T2 T3 T4 Ts K2 ~!_~K-=3_--: i K4 ~ .JJ. (time) parallelize t on t on t on t on t on SA SA is a stochastic algorithm used to solve complex combinatorial optimization problems [Kirkpatrick 1983]. It searches for a global optimal solution in a solution 1 oL..------------ t problem of simulated annealing (SA), by introducing the concept of parallelism in temperature. In this section, SA is explained firstly, then temperature parallel SA is introduced. 4.1 K1 T1 PE1 PE2 PE3 PE4 PE5 Figure 3: Ordinary SA and temperature parallel SA space without being captured in local optima. SA simulates the annealing process of physical systems using a parameter, temperature, and an evaluation value, eneT'!]y. At high temperatures, the search 4.2 Temperature Parallel SA point in the solution space jumps out of local energy minimum. At low temperatures, the point falls to the nearest local energy minimum. The basic idea behind the algorithm is to use parallelism in temperature [Kumura and Taki 1990], to perform annealing processes concurrently at various tem- An outline of the SA algorithm is as follows. Given an arbitrary initial solution Xo, the algorithm generates peratures. The algorithm automatically constructs an appropriate cooling schedule from a given set of tem- a sequence of solutions {x n }n=O,1,2, ... iteratively, finally outputting Xn for a large enough value of n. In each peratures (Figure 3). Hence, it partly solves the cooling schedule problem. 304 The outline of the algorithm is as follows. Each processor maintains one solution and performs the annealing process concurrently at a constant temperature that differs between processors. After every k annealing steps, each pair of processors with adjacent temperatures performs a probabilistic exchange of solutions. Let p(T, E, T', E') denote the probability of the exchange between two solutions: one with energy E at temperature T and the other with energy E' at temperature T'. 
This is defined as follows: minimum energy value, is selected as a solution for the algorithm. ° ow------------------------------. I if 6.T·6.E < p( T, E, T', E') = { exp( _ 6.~foE) otherwise 5.1 Experimental result The minimum energy versus the cycles of simulation of those two algorithms is plotted in FigA. In the figure, the result using sequential SA, ordinary SA, is also plotted. Its energy is the average of energy obtained by sequential SAs in the simple parallel SA. -10000 where 6.T = T - T', 6.E = E - E'. The probability has been defined such that solutions with lower energy tend to be at lower temperatures. Hence, the solution at the lowest temperature is expected to be the best solution so far. The cooling schedule is invisibly embedded in the parallel execution. The temperature parallel algorithm has advantages other than the dispensability of the cooling schedule. We can stop the execution at any time and examine whether a satisfactory solution has already been obtained. The algorithm of temperature parallel SA is implemented as a tool kit. When we want to solve some problem using temperature parallel SA, if we use the tool kit, all we have to do is to write a program that just corresponds to the problem. 5 Experiment and Discussion We selected flavodoxin, whose structure is known, as the protein to simulated. This protein is of a medium size and has 138 amino acids. We ran the folding simulation program using temperature parallel SA on MultiPSI using 20 processors over 10 days. This corresponds to 30,000 cycles. We also ran the folding simulation program using simple parallel SA in 30,000 cycles, also with 20 processors. The simple parallel SA is a naive combination of sequential SAs: every available processor has one solution and anneals it sequentially using a distinct sequence of ra,ndom numbers. All resulta,nt solutions are compared with each other and the best one, the one with the -20000 -30000 Energy -40000 -50000 -60000 +----.---r--......--r----....---r---.----! 10000 20000 300pO 40000 o Steps Figure 4: Energy history of folding simulation One of the structure of flavodoxin produced by the program is shown in Figure 5. Unfortunately, its structure is not similar to the structure of real flavodoxin. However, a favorable tendency ,where hydrophobic amino acids are inside the structure while hydrophilic amino acids are outside the structure, was observed. 5.2 Discussion The effectiveness of the water-counting model will be evaluated first, then the effectiveness of the temperature parallel SA as an optimization method for practical problems will be evaluated. The structure of the flavodoxin produced was not similar to its real structure. However, this doesn't necessarily indicate a defect in the water-counting model. We, instead, think that the result is due to insufficiency of transitions we introduced. The rough structure of protein, especially that of small protein, can be reproduced by global transition 305 Figure 4 shows the tendency for energy of all methods to be minimized further after the completion of a specified cycle of simulation. Only simulation by temperature parallel SA can be resumed without rescheduling. Because two kinds of parallel SAs are almost the same, we think that temperature parall~l SA is more advantageous than simple parallel SA. Figure 5: Result structure of folding simulation (flavodoxin) that is like rotational transition and translational transition. 
There is little collision among amino acids in the path from the stretched state to the roughly formed structure. However, a fine protein 'structure is rarely reproduced by global transitions alone due to the collisions. We think that the local transition modes that can avoid collision should be incorporated to reproduce the native structure with collision check. We are planning to introduce a local transition, kink mode [Skolnick and Kolinsky 1991]. We think that the necessary mode of transition must be incorporated before we can evaluate the effectiveness of the water-counting model. Next, we evaluate the effectiveness of temperature parallel SA as an optimization method by using Figure 4. Readers who are familiar with SA should consult Appendix. We made the following observations from this energy profile in consideration of the above points. 1. Two kinds of parallel SAs made better results within a fixed time than sequential SA~ This is simply the effect of multiple processors. 2. Up to the middle stage of simulation, temperature parallel SA is always better than simple parallel SA. This is because temperature parallel SA can produce optimal solutions as that time. 3. Two kinds of parallel SAs have almost the same final energy value. Simulated annealing is most effective when states generated at higher temperatures can cover nearly all the solution space. In the case of folding simulation, this is hard to do it. We are now engaged in trying to restrict the solution space of simulated annealing by knowledge and/or heuristics to the extent that the solution space can be covered by simulated annealing. 6 Summary We studied folding simulation as an application of parallel simulated annealing. This program was written in KL1 and was executed on the parallel inference machine Multi-PSI. As the biological model the watercounting model that uses lattice representation and only hydrophobic interaction between amino acids was selected. The structure of flavodoxin produced by program is not appropriate from a biological point of view. This suggests that the program requires further improvements. The kink mode of transition is one candidate to incorporate. However, the insight was gained from the point view of computer science, namely evaluation of temperature parallel simulated annealing. The result using temperature parallel SA had almost the same final energy value (which is much better than that obtained by sequential SA) as the result using simple parallel SA. In consideration of the dispensability of rescheduling when further optimization is necessary, temperature parallel SA was proved to be advantageous. The other thing we learnt was that a module that restricts the solution space of folding simulation is required. We think knowledge engineering must be employed to do this, and also that KL1 is suitable for use. Acknowledgment The authors would like to especially thank Y. Totoki of IMS for his programming and experimentation with the 306 folding simulation. Without his endeavors which were conducted through what should have been his winter vacation, this paper couldn't have been written. We would also like to thank Kouichi Kimura, the founder of temperature parallel SA, for valuable discussions. We also thank Dr. Uchida and Dr. Nitta for their support in this international collaboration. References [Kumura and Taki 1990] Kimura, K. and Taki, K. (1990) Time-homogeneous parallel annealing algorithm. Proc. Compo Appl. Aifath. (IMACS'91j, 13, 827-828. [Nakajima et al. 
1989] Nakajima, K., Inamura, Y., Ichiyoshi, N., Rokusawa, K. and Chikayama, T. (1989) Distributed implementation of KLI on the Multi-PSIjV2. Proc. 6th Int. Conf. on Logic Programming. [Ishikawa et al. 1991] Ishikawa, M., Hoshida, M., Hirosawa, M., Toya, T., Onizuka, K. and Nitta, K. (1991) Protein Sequence Analysis by Parallel Inference Machine. Information Processing Society of Japan, TRFI-23-2, (in Japanese). [Gierasch and King 1990] Gierasch, L.M and King, J( ed) (1990) Protein folding. American association for the advance of science. [Skolnick and Kolinsky 1991] Skolnick, J. and Kolinski, A. (1991) Dynamic Monte Carlo Simulation of a New Lattice Model of Globular Protein Folding, Structure and Dynamics. Journal of Moleculer Biology Vol. 221 no.2, 499-531. [Ueda et al. 1978] Ueda, Y., Taketomi, H. and Go, N. (1978). Studies on protein folding, unfolding and fluctuations by computer simulation. A three dimensionallattice model of lysozyme. Bilpolymers Vol.11 1531-1548. [Koltz 1958] Koltz,I.M. (1958) Science Vol.128, 825[Kaumann 1959] Kauzmann (1959) Advances in Protein Chemistry, Vol. 14 , no .1. [Janin 1979] Janin, J. (1979) Surface and side volumes in globular proteins. Nature(Londonj Vol. 211, 491492. [Kirkpatrick 1983] Kirkpatrick, S., Gelatt, C.D. and Vecci, M.P. (1983) Optimization by simulated annealing. Science, vol. 220, no.4598. Appendix Readers should pay attention to the following possibility when they discuss the result of Figure 4. 1. The two energy histories obtained by the two parallel SA algorithms might include the influence of statistical fluctuation, because each parallel algo- rithm was experienced only once. The sequential algorithm, however, was done twenty times and each point in the history represents the average energy value. 2. All SA procedures may be quick quenched in- stead of annealed, because the number of steps at each temperature, 1500, would be relatively small against the size of the solution space. If so, temperature parallel SA is disadvantageous for obtaining good energy in a short time, because not all processors in temperature parallel SA will necessarily do quick quenching; some processors may often do real annealing. 3. All SA proceduresy may not reach any minima in the solution space, because every decline in energy history is not sufficiently saturated. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 307 Toward a Human Genome Encyclopedia Kaoru Yoshida 11, Cassandra Smith 2, Toni Kazic 3, George Michaels 4, Ron Taylor 4, David Zawada 5, Ray Hagstrom 6, Ross Overbeek 6 1 Division of Cell and Molecular Biology, Lawrence Berkeley Laboratory, Berkeley, CA 94720, U.S.A. Department of Molecular and Cell Biology, University of California at Berkeley, Berkeley, CA 94720, U.S.A. 3 Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, U.S.A. 4Division of Computer Research and Technolgy, National Institutes of Health, Bethesda, MD 20894, U.S.A. 5 Advanced Computer Applications Center, Argonne National Laboratory, Argonne, IL 60439-4832, U.S.A. 6Division of Mathematics and Computer Science, Argonne National Laboratory, Argonne, IL 60439-4832, U.S.A. 2 Abstract Aiming at building a human genome encyclopedia, a human genome mapping database system, Lucy, is being developed. 
Taking chromosome 21 as the first testbed, more than forty maps of different kinds have been extracted from publications, and several public and local genome databases have been integrated into the system. To our knowledge, Lucy is one of the first systems that have ever succeeded in genome database integration. The success owes .to the following key design strategies: (1) A sequential logic programming language, Prolog, has been used so that the database construction and query management could rely on the internal database facility of Prolog. (2) An object-oriented data representation has been employed, so that any kind of data could be manipulated in the same manner. (3) A mini language, map expression, has been designed, which enables map representation in a relative-addressing manner and also linkage of one map to another. These strategies are applicable for building a genome mapping database not only on human chromosome 21 but also beyond chromosomes and beyond species. 1 1.1 Introduction Why Biological Applications? The fact that only four DNA bases (adenine, thymidine, guanine, and cytosine - symbolically represented as A, T, C and G respectively) encode most of the information on current life and its history is fascinating from th~ viewpoint of computer science. More interesting is that many biological reactions are due to the property that A and T make a complementary pair as well as G and C do. Genome analysis is potentially a large application area for symbolic computation. As biological experimental methodology develops, more gene information is accumulated and analysed. This holds especially true for such large scale models as the human genome whose total genome size reaches a few billion of bases. Since NIH (National Institute of Health) and DOE (U.S. Department of Energy) embarked a joint national research initiative [30, 31], human genome projects have been initiated in many other countries and research activities are being expanded and accelerated day by day [89, 65, 83]. To proceed efficiently in the ever accelarating climate of current biological research, strong support and feedback from computer-aided analysis is mandatory [74, 53, 39]. 1.2 What is a Physical Mapping Process? Genome mapping is similar to geographical mapping. The genome mapping is now akin to the early times of geography. First of all, it is not known yet exactly how big the genome is. Continents, countries, states, cities and streets work as geographical markers which give positional information, addresses, on the earth. As well, continental-level landmarks with location-specific information such as a single copy DNA sequence (i.e., sequences that occur only once in the genome) [70] have been discovered here and there on the genome; fragmentary maps around these landmarks are being drawn, some of which are being glued one to another. Furthermore, as there are geographical maps and time-zone maps, there are different kinds of genome maps, roughly categorized into two kinds: physical maps giving physical distances (i.e., the number of bases lying) between the markers and genetic maps giving recombination frequencies between the markers. This section introduces what is genome physical mapping. It should help in understanding the genomic data which will be involved in the genome mapping databases described in later sections. For more details, consult [90]. Chop, Identify and Assemble. Figure 1 shows how physical mapping is done. 
In general, a genome is too large to be directly sequenced2 with the current sequencing technology. For example, the total size of hu2 DNA sequencing means experimentally reading a DNA sequence consisting of A, T, C, G bases. 308 man chromosome 21, the shortest human chromosome, is thought to be 50 to 65 Mb (mega bases), while the maximum length of DNA read per day is about 500 bases, including reading error corrections, and the cost of sequencing is about one dollar per base [44, 9]. If the chromosome 21 were read in a serial manner, it would take 250 years. Hence, first of all, the chromosome has to be excized into pieces (called fragments), which are small enough for further analysis, such as a 20-30 Kb (kilo bases) to a 2-3 Mb (step (1)). The excision is done by a physical method (e.g. by irradiation [27]) or by a chemical method (e.g. by digestion with restriction enzymes 3 ) (step (3)). Then, the DNA fragments are to be assembled to the original chromosome. For the assembling, there are a variety of methods, depending on (a) whether or not the DNA fragments may overlap, (b) how the overlap and adj acency of the DNA fragments are detected, and (c) whether each step of experiment is attempted against an individual fragment or to a group of fragments at a time. A Conventional Physical Mapping Method. Figure 1 shows a conventional mapping method which deals with non-overlapping restriction fragments. This method starts with chemical digestion using a restriction enzyme. The restriction fragments are sorted in size (by the electrophoresis method (step (4)) and assigned to an approximate region through hybridization4 • Each fragment is hybridized to a variety of cell lines 5 • As each cell line covers a different region, the pattern of hybridization signals against different cell lines determines which region the target fragment resides (step (2)). Next, hybridization is attempted against probes within that region. With a positive hybridization signal on with a probe, the fragment is determined to lie around the address of the probe (step (5)). A clone containing a specific restriction site is called a linking clone. A linking clone is split at the restriction site and then each half is hybridized to complete digests 6 • As the two halves are known to be ·next to each other, complete digests fished by the halves of a linking clone are found to be adjacent. Thus, linking clones introduce 3 Restriction enzymes recognize some specific DNA pattern of four to a dozen of bases and cut a double-stranded DNA at some specific position in the pattern. 4A double stranded DNA is formed if each strand contains a complementary sequence to the other. Hybridization is an attempt to make a double-stranded DNA or an RNA-DNA hybrid using this property. By labelling a probe (i.e. the counterpart) with an isotope or a dye, by means of autoradiograph or florescence one can detect if the probe has hybridized to the target or not. 5 Cell lines are DNA segments which are generated by deleting. a portion of chromosomes or by translocating between different chromosomes. 6 Complete digests are restriction fragments obtained when the restriction enzymes react to completion, Le., everyone ofthe target sites is cut. In contrast, partial digests are those which contain some fraction of the target sites uncut. the notion of adjacency that works as a strict constraint in linear-ordering restriction fragments. As a result of hybridization against a number of probes, fragments are eventually given a linear order (step (6)). 
The process (3) thru (6) is repeated until a map with the desired precision is obtained. ANew Physical Mapping Method. A new method, called clone contig assembly, is shown in Figure 2. This method uses clones of overlapping fragments of almost the same size determined by the cloning vector. By determining overlapping pairs of clones, walking is attempted from one clone to another. The resulting walking path forms an island of contiguous clones, that is called a contig. This method has variations depending on how the overlaps are detected (e.g., whether based on the restriction digest pattern of each clone or based on hybridization signals [54]). Furthermore, the overlap detection can be attempted against a group of clones at a time, in common. The feasibility of extracting the maximum amount of information in every step of biological experiment and the potential for automation are attracting much attention to these contig assembly methods [29]. In addition, given a set of overlapping clones, the variation of length and overlaps of clones gives a statisticallimit on the number of independent islands which can be constructed from the clones [69,26,52]. It should be noted that this method can be carried out, vigorously relying on statistical and computational analysis [14,37]. In summary, the physical mapping process consists of the three steps: (1) excising the whole DNA into pieces, (2) characterizing every piece through hybridization or digestion, and (3) assembling the pieces. While steps (1) and (2) are' done through biological experiments, step (3) is a probablistic combinatorial problem. In order to solve this problem, information retrieval from a variety of genome databases is required together with powerful computational tools. 1.3 Mapping Data Knowledge and Mapping Section 1.2 introduced the physical mapping process from the viewpoint. of biological experiments. The resulting data are published in the form of inventories as shown in Tables 1 and 2. Identification and Adjacency. Table 1 gives a relation between hybridization probes and restriction fragments 'obtained by digesting cell line WAY-17 with restriction enzyme NotI [88]. For instance, row 1 implies that clone 2310 which is a representative of locus D21S3 hybridizes to a 2200Kb complete digest and to two partial digests: a 2200Kb fragment and a 2600Kb fragment. Section 1.2 introduced linking clones with the notion of adjacency. HMG141 and HMG14s are the two halves 309 Table 1: Restriction fragments and hybridization probes (I) DNA frogments terge. DNA •• quente (4) (3) Frectlonetlon le.g.electroPhoresls) EMclllon 'e.g.dlgutlon) Noll digestion "II~ .) IFlr·Im mujmuuml ........' LII ~ ~:~~~:~::' uuuu......~:~ ':;.:;.~:~';.;: (2) I linking about ' ·'dJ.t.ncy- : clon.s ,,0 ~;;:;:> _____ : Figure 1: A process of restriction map construction (2) 'ortloIDlg .. "o. (3) croning In "ector ~ -+ iruf... -+ OO~=:i::::~ O P1 _,OO·l20KtI YAC _ 150· 500 IC.b ~II~~"""~=="~~ ,.53= --- (4) contlg assembly N otl restriction fragments 2200,2600 75 300,360,560,630 300,360,530,1000 300,2100,2900 75,1800 1800,2100,2300,2600 2000,2400,2700 1800 750,2100,2350 750,1200,1800,2050,2300 750 750 : !(5) WJJ II ~illjliU i; ~ :"·: : ·: (I) Ilr,l' DNR .. quI.CI 1 2 3 4 5 6 7 8 9 10 11 12 13 Probe locus/gene clone D21S3 231C HMG14 HMG14l HMG14s HMG14 6-40-3 D13s D13l *D21S101 JG373 *D21S15 pGSE8 LA1711 LAI71s *D21S51 SF93 *D21S53 512-16P *D21S39 SF13A == --(a) Hybridization Signal IargeIclone I ....... ATTGCCATAAT . . . . . . 
I 1IIIIIIIIIq I TAACGG'l'ATT_ probe (b) Restriction Site Pattern Figure 2: A process of contig assembly of linking clone HMGI4. Hence, the 75Kb fragment hybridized to HMG141 must be next to the 300Kb fragment hybridized to HMGI4s. Similarly, for a pair of D13s and D131, the 300Kb fragment and the 75Kb fragment must be adjacent; for another pair of LAI711 and LAI71s, the 1800Kb fragment and the 800Kb fragment must be adjacent. The 300Kb fragments in rows 3 to 5 can be interpreted to be identical, assuming that the 360Kb fragment (in rows 3 and 4) be a partial digest containing the 300Kb and the 75Kb fragments and also assuming that the 2100Kb fragment (in rows 4 and 5) be a partial digest containing the 300Kb and 1800Kb fragments. Thus, given a relationship of restriction fragments and hybridization probes, each restriction fragment is identified using strict constraints such as linking clones and also using its neighborhood information such as a pattern of partial digests. Confirming Information. In Table 1, the 750Kb fragments in rows 10 to 13 seem to be identical. Also, the ordering of loci D21S101 (row 7) and D21S15 (row 8) is not evident in this table, nor the ordering among loci D21S51, D21S53 and D21S39. Table 2 shows a relationship of multiple kinds of restriction fragments (of a different cell line, CHG3) and hybridization probes [17] around the same region as Table 1. With an assumption that the NotI restriction sites be rather conserved in different cell lines and considering of 10-20% errors in size, the 750Kb fragment in Table 1 can'be interpreted to correspond to the 700Kb fragment in rows 4 to 6 in Table 2. The identification of the 750Kb fragments in Table 1 is confirmed by the same set of MluI digests «200Kb, 1250Kb and 1400Kb) and NruI digests (600Kb and 2000Kb) found around the 700Kb fragment in Table 2. As for the ordering of loci D21S101 and D21S15, the 1600Kb MluI fragment in rows 1 and 2 connects ])21S101 with D21S3 and the 1400Kb MluI 310 Table 2: Multiple digests fragment in rows 3 to 6 connects D21S15 with D21S39. In general, mapping data contain a non-trivial imprecision which clouds their interpretation. Interpretation for a set of mapping data becomes less ambiguous with additional information. It is obviously efficacious to accumulate data until a convincing interpretation is acquired. dow interface for end users, both of which rather directly reflect the underlying implementation. Programmers and users must be knowledgable about implementation issues, such as how each relational table is linked to others. A high level interface is also required for easily sharing and exchanging data between different databases. Among leading database integration efforts, Genlnfo [71, 60] is notable. Three databases: Genbank (DNA sequence database), PIR (protein sequence database) and MEDLINE (medical/biological literature database) are converted into the form of an object-oriented data representation language, ASN .17 [72], so that data can be easily exchanged among the databases. ASN.l has also been applied to the construction of a metabolic compound database [49]. In summary, various kinds of information are involved in the genome mapping process. The integration of different databases is a key issue in proceeding further biological research. 
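Before moving on, note that the cross-checking described above, such as identifying the 750Kb fragment of Table 1 with the 700Kb fragment of Table 2, is exactly the kind of reasoning that lends itself to symbolic programming. The following Prolog fragment is a minimal sketch, not part of Lucy, built on hypothetical predicates sizes_compatible/3 and may_correspond/2: it tests whether two restriction fragments observed in different cell lines might be the same piece of DNA, allowing for the 10-20% size error and requiring a shared digest size obtained with a second enzyme.

    % Illustrative sketch only; these predicates are not part of Lucy.
    % Two measured sizes (in Kb) are compatible if they differ by at most
    % the given relative tolerance (20% here, matching the error quoted above).
    sizes_compatible(SizeA, SizeB, Tolerance) :-
        Diff is abs(SizeA - SizeB),
        Limit is Tolerance * max(SizeA, SizeB),
        Diff =< Limit.

    % Two NotI fragments from different cell lines may correspond if their
    % sizes are compatible and they share at least one compatible digest
    % size obtained with another enzyme (e.g. MluI or NruI).
    may_correspond(fragment(SizeA, OtherDigestsA),
                   fragment(SizeB, OtherDigestsB)) :-
        sizes_compatible(SizeA, SizeB, 0.2),
        member(DigestA, OtherDigestsA),
        member(DigestB, OtherDigestsB),
        sizes_compatible(DigestA, DigestB, 0.2).

For example, a 750Kb fragment and a 700Kb fragment, each annotated with illustrative MluI sizes, are accepted as possibly identical:

    ?- may_correspond(fragment(750, [1250, 1400]),
                      fragment(700, [1250, 1400])).
    true.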
1.4 1.5 Probe clone D21S3 pPW231 D21S101 JG373 D21S15 E8 D21S39 SF13A MxA/B D21S51 SF93 Notl 700,1800 1400 1400 700 700 700 : : locus/gene 1 2 3 4 5 6 : : Restriction fragments Mlul Nrul 700,1600 600 1000,1600 1400,2000 1250,1400 1400,2000 <200,1250,1400 600,2000 <200,1250,1400 600,2000 <200,700,1400 500 : : Genome Mapping Databases Public Databases and Laboratory Notebooks. The constantly growing population of genome databases [80] contains precious few mapping databases even considering different species, such as mouse [47], Caenorhabditis elegance [32] and Eschericia coli [6]. As for human, GDB (Genome Data Base) [40] is the only public mapping database. It contains information about genes, loci (landmarks), clones, contacts and maps. As for maps, consensus maps are collected each of which contains merely the consensus order of loci, without information on physical/genetic distance between loci yet [75]. Laboratory data that are primary or secondary level of experimental data including image films will someday be available in so-called laboratory notebook databases which are now under development [54, 68, 4, 58]. Especially for the contig assembly mapping method for which a computer analysis environment is essential, system development efforts are intensive and have been applied for mapping chromosomes X, 16 and 19 [54, 23, 66]. What seems to be missing in genome databases is a continuous link between public databases and laboratory notebook databases. There is a strong need to compare laboratory mapping efforts against those reported in publications and public databases. Implementations, Interfaces and Integrations. In terms of implementation strategies, most genome databases, including the above, have been implemented using relational database management systems which are based on a normal form (or fiat) relational model [24]. Also these databases provide a query language (usually SQL) interface for programmers and an interactive win- Goal and Strategies Many queries issued in the physical mapping process are imprecise, e.g., "Get all information around this locus", and "What are the consensus and differences around this locus in all collected maps?" To address these queries, all related information must first be collected from publications and various databases into a map in which all available information is woven at every location of human genome, i.e. a human genome encyclopedia. Then, using this encyclopedic map, a genomic grammar [81] will interface to the user. The construction of a human genome database system, LucY', has started. Taking chromosome 21 as the first test bed, more than forty maps of different types have been collected from publications, and several public and local databases have been int~grated into the system. Currently, the system is ready to answer rather general queries such as shown above. To our knowledge, this is the first integrated physical mapping database that has ever been implemented. The key design features which have enabled the prototyping of Lucy are: • logic programming, • object-oriented data representation and query interface, • map representation language. The following sections will describe each of these features in detail. 7 Abstract Syntax Notation 1, ISO 8824. 8The name is derived from the nickname given to the first fossile of hominid [48]. The motto herein is "For any question on human, ask Lucy". 
311 Representation of Genome Information 2 2.1 Exploitation of Logic ming Featuers Program- Lucy has been implemented in a sequencial logic programming language, Prolog, for its following features: 1. Database Facility and Inference Mechanism: Its internal database facility and inference mechanism enable validation of biological data and rules as knowledge immediately when they are expressed as Prolog predicates (programs). Even if they were expressed as Prolog terms (data) as second order predicates, the inference mechanism could be implemented rather easily in Prolog. 2. Declarative Expresssion and Set Operations: Its declarative expression and (built-in) search utilities (e.g., built-in set operations such as setof and bagof) minimize the amount of programming effort for knowledge representation and database retrieval. 3. Recursive Queries: Its capability of handling recursive programming and recursive data structures enables a straightfoward implementation of recursive queries that are hard to be implemented with normal form relational databases and conventional query languages such as SQL [28]. 4. Foreign Language Interface: It is necessary to have a foreign language interface (which is provided in several Prolog implementations) to other conventional but efficient languages, such as C and Fortran, in order to import and develop the computationally intensive sequence analysis and statistical mapping tools. 5. Portability: Lucy should be developed as a real system to be used for biological analysis. The stability and po~tability of the system are the first priority. 2.2 Object-Oriented and Interface Representation '/.--------------------------------------------------------------------'/. table2(Locus, Clone, flotldigests, !Uuldigests, flruldigests). '/,--------------------------------------------------------------------table2(>D21S3', 'pPW231', [700,1800], table2('D21S101', 'JG373, [1400], table2('D21S15', 'E8', [1400], [700,1600], [600]). [1000,1600], [1400,2000]). [1250,1400], [1400,2000]). Then, for every element involved in these tables, such as loci, clones and enzymes, information collected from publications and public databases was stored similarly in a flat relational form. Obviously, as the number and variety of relations increased, it will be accordingly difficult to program and maintain the database in this format, and to remember the exact form of each relation. Another burden handling various different tables becomes obvious when encoding mapping rules. For example, the following program defines the notion of adjacency introduced with linking clones, namely that two restriction fragments are adjacent if one fragment is hybridized to one half linking clone and the other fragment to the other half linking clone and if the restriction fragments are both complete digests: is_adj acent_ to (FragmentA, FragmentB) is_half_linking_clone(HalfLinkingCloneP, LinkingClone), is_half_linking_clone (HalfLinkingCloneQ, LinkingClone), HalfLinkingCloneP \= HalfLinkingCloneQ, is_hybridized_to (HalfLinkingCloneP, FragmentA, Enzyme), is_hybridized_ to (HalfLinkingCloneQ, FragmentB, Enzyme), FragmentA \= FragmentB, is_coMplete_digest (FragmentA) , is_coMplete_digest (FragmentB) . Here troublesome is that if hybridization results were stored in various forms, predicate is_hybridized_ to/3 would have to be defined for each kind of digests in each different table, as follows: is_hybridized_to(Probe, Fragment, 'BotI') tablel C, Probe, BotIFragments), member (Fragment , BotIFragments). 
is_hybridized_ to (Probe, Fragment, 'BotI') table2C, Probe, flotIFragments, _, _), member (Fragment , flotIFragments). is_hybridized_ to (Probe, Fragment, '!UuI') :table2 Probe, _, MluIFragments, _), member (Fragment , MluIFragments). <-, is_hybridized_to(Probe, Fragment, 'BruI') :table2 C, Probe, _, _, flruIFragments), member (Fragment , flruIFragments). The hybridization results shown in Tables 1 and 2 in Section 1.3 could be represented as Prolog facts of a flat reiational form as follows: where member(X, Y) is a built-in predicate which succeeds if X is a member of Y. To relieve these difficulties, an object-oriented data representation has been adopted in Lucy. The hybridization relationship between a fragment and probes has been embedded as an attribute of the fragment. '/,--------------------------------------------------------------------- 2.2 .1 '/, tablet(Locus, Clone, Principle BotIdigests). '/.--------------------------------------------------------------------tablel(>D21S3', '231C', [2200,2600]). tablel (>HMG14, , 'HMG14l, [75]). tablet( 'HMG14', 'HMG14s, [300,360,560,630]). First of all, we recognize that any kind of datum is an object composed of attributes and represented as a Prolog fact, obj ect/2, consisting of a functor, obj ect, and two arguments, as follows: 312 object(Objld, Attributes). where • Obj Id is an object identifier which is unique in the entire system and is formed of a class and a local identifier unique within the class; • Attributes is a set of attributes which constitute the object. The internal representation of attributes is encapsulated in the variable, Attributes. Next, we construct general interface methods which allow retrieval of information from an object without knowing how that object is internally represented as follows: • class(Objld, Class) returns the class of the object. • id(Objld, LocalId) returns the local identifier of the object. • attribute(Obj Id, Attribute) returns an attribute composed of an attribute name and an attribute value. 2.2.2 Examples Starting with a restriction fragment, let us consider several objects related with this fragment and see what kinds of information are associated with them. Note that, in this paper, the attributes are represented in the form of a list merely for ease of explanation; a different data structure, more efficient in space and access, is used in the real implementation. 1. Restriction Fragment: This defines the 750Kb NotI fragment appearing in rows 10 to 13 in Table 1, that has been digested from cell line WAV-17 with restriction enzyme NotI. This fragment was hybridized to four probes: LA171s, SF93, 512-16P and SF13A. This information was obtaipined in an experiment done by Denan Wang, April 1991, and appears in a literature, Saito et al (1991). object ('LUCY : fragment ' ('Denan1991 :WAV-17 /NotI/750#2'), [input_date (1991/4/24), digested_from(cell_line( 'WAV-17'», digested_by(restriction_enzyme( 'NotI'», probes( [half_linking( 'LA171s') ,clone(>SF93'), clone(>512-16P') ,clone(>SF13a')]), size( 'ltb' (750», source(ref('Denan Wang (April 1991)'». references([ref(>Saito et al (1991) ')]) ]) . 2. Probe: One of the probes, SF93, was offerred by Cox, and registerred in a local clone logbook, Plasmid Book, with a local name, CLS3048. It has an EcoRI site at one end and a SalI site at the other end and is cloned in a pUC1S vector, and is resi.stant to ampicillin. object( 'PB:clone' ('SF93'). [input_date (1991/8/8) • symbol ( 'SF93'). information_source (db( 'PB/ver.89-11-8'». 
if_confirmed(yes) • l.ab_number ( 'CLS : cl.one ' ( , CLS3048' » • vithin( [locus ( 'D21S51') .region( '21q22')]). size( 'ltb' (2 .1». clone_sites ([restriction_site( 'EcoRI'). restriction_site ( 'Sal!')]) , vector(vector( 'pUC18'». vector_size ( 'ltb' (2.7». ant ibiotic (amp) • source ( 'PB: contact' ('Cox'» ]). 3. Locus/Gene: Clone SF93 is a representative of locus D21S51 whose information is found in public database GDB. object( 'GDB : locus ' ('D21S51'). [input_date (1991/7/5) • informat ion_source (db ( , GDB/ver .1 .0'» • sources(['GDB:source'('ltorenberg et al (1987)'). 'GDB:source'('Burmeister et al (1990) ,)]). probes ( ['GDB :probe' ('SF-93')]). symbol ( 'D21S51'). full_name (' DIA Segment. single copy probe SF-93'). Ilithin( [region('21q22 .3')]). locus_type( 'DIA'). if_cloned(yes) • assignment_modes ( ['GDB: assigrunent..mode' ('I') • . 'GDB:assignment_mode' ('S'»)). certainty(con:firmed) • report (include) • create_date('Apr 171990 1:20:46:000AK'), modify_date('lov 25 1990 2:01:29:460Pl!'), approved_date ( 'Sep 8 1990 11 :06: 13 : 320Pll')]) . 4. Contact: The person simply referred to as Cox in the Plasmid Book is David R. Cox whose detailed information is found also in public database GDB. object(>PB: contact' ('Cox') • Attributes) :object(contact('David R. Cox'), Attributes). object (>GDB: contact' ('David R. Cox'), [input_date (1991/7/5) , information_source (db( 'GDB/ver.l.0'», 'GDB: idx' (>GDB : contact ' (1148», symbol(>David R. Cox'). contact_address(['Univ. of California at San Francisco'. 'Dept. of Pediatrics/Psych/Biochem', '505 Parnassus Ave .• Box 0106'). city_address( 'San Francisco'), state_address( 'CA'), post_code( '94143'), country_name ( 'USA'), email_address(.rjbflcanctr.mc . dUke .edu'), phone_number ( '1-(415) 476-4212'), 'FAX_number' ('1-(415) 476-9843') ]). 5. Literature: The mapping effort concerning the above restriction fragment and clones was presented in the literature, Saito et al (1991), as follows: object('LUCY:reference'('Saito et al (1991)'), [input_date (1991/4/24) , kind(paper) • authors(['Akihiko Saito'. 'Jose P. Abado', 'Denan Vang', 'Kisao Ohki', 'Charl.es R. Cantor', 'Cassandra L. Smith']), titl.e('Construction ~d Characterization of a lotI Linking Library of Human Chromosome 21'), journal ( 'Genomics') • vol.ume(10) • year(1991) ]). 313 Thus, not only biological data but also personal information and literature references are all represented in an object-oriented manner. Lucy (chromosome 21 only) 2.2.3 Restricting Classifications Many biological terms have been introduced so far, such as chromosome, locus, gene, probe, clone and restriction fragment, but each of them represents just a piece of DNA. For example, when a restriction fragment is cloned, it is called a clone. When it is used for hybridization and gives information as a landmark, it is called a probe. The more biological experiments are applied to an object, the more names and attributes are given to it. Also a set of constraints over attributes forms a new category (or class). For example, when a clone is sequenced and found to contain some restriction site in it, it is called a linking clone; if the restriction site is an NotI, then it is called an NotI linking clone. object (linking_clone (Id) , Attributes) :object(clone(Id), Attributes), find_attribute (Attributes, categories (Categories» , is_member (linking_clone , Categories). 
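The generic interface methods named in Section 2.2 (class/2, id/2 and attribute/2) can be realized directly over the object/2 representation. The following is a minimal sketch under the list-based attribute representation used in the examples above; since the real Lucy implementation encapsulates a different internal structure, this is illustrative only.

    % Sketch only: accessors over object/2, assuming attributes form a list.
    % An object identifier is a compound Class(LocalId), e.g.
    % 'PB:clone'('SF93'), so class/2 and id/2 simply decompose it.
    class(ObjId, Class) :-
        object(ObjId, _),
        ObjId =.. [Class, _LocalId].

    id(ObjId, LocalId) :-
        object(ObjId, _),
        ObjId =.. [_Class, LocalId].

    attribute(ObjId, Attribute) :-
        object(ObjId, Attributes),
        member(Attribute, Attributes).

Combined with the classification rules above, such accessors let a query range over a derived class without knowing how its members are stored, for instance

    ?- attribute('NotI_linking_clone'(Id), within(Regions)).

which, given those rules, would enumerate NotI linking clones together with the regions recorded for them.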
object ('lotI_linking_clone' (Id), Attributes) :object (linking(Id), Attributes), find_attribute (Attributes, linking_site(restriction_enzyme( 'lotI '») . ::::l Iiio. aequenca •• le.tures, literatures Genbank 68.0 transcrlptionconlrol Iiio. - sequenees,leatures. DTCS89 literatures t.strictlonenzyme recognition patterns, methylation sites, literatures REBASE 91.07 _ Plasmid Book (local) J Figure 3: Information sources on chromosome 21, integrated in Lucy their original class name is preceeded with the database name, such as PB: clone. It preserves its origin as an additional attribute, self (Obj) object (clone(Id) , Attributes) :member(Class, ['GDB:probe', 'PB:clone', 'CLS:clone', 'LA: clone', 'LL: clone', 'YZ: clone' , 'Saltalti:clone', 'LUCY: clone']) , object_id(Obj, Class, Id), object(Obj, AttributesO), add_attribute(AttributesO, self(Obj), Attributes). ) Broading Classifications The information sources constituting Lucy cover more than forty maps collected from publications and several different kinds of public and local biological databases, as shown in Figure 3. In general, in integrating databases, mainly two kinds of strategies are considered: (1) one is to distill the source databases and unite them into a single database, and (2) the other is to preserve the original form of the source database and provide a bridging interface over them. Similar biological experiments are being done in parallel at different places. As a result, similar data are accumulated in different databases or even in a single database independently. Also, each datum stored in a version of a database might be corrected or changed through later experiments and reported in a later version of the database. In integrating a genome database, preserving the redundancy and inconsistency of data is a substantial effort. As a result, the second integration strategy taken in Lucy keeps track of the redundancy and inconsistency. The following program provides a bridging interface to bundle clones which are stored in various sources. Any clone can be referred to with a class name, clone, while literatures GDB1.0 Iiio. ... lOCi, genes, clones, contacts, HGM10.5 As a principle, objects may have no class when they are created. Classification is made as more attributes are accumulated and properties are found through later experiements. 2.2.4 Integrated Databases Collected maps . In summary, the notion of class introduced in Lucy is loose unlike such a stringent notion as "class-astemplate" which is-'widely adopted in object-oriented programming languages [41, 78, 91]. 3 Constructing a Global Map from Fragmentary Maps In order to understand mapping information in a visual form, a general graphic interface, GenoGraphics [95,43], has been hooked up to the Lucy database system. As shown in Figure 3, those maps collected in Lucy have a variety of range and scaling unit. Some maps cover q-telomeric regions, some do centromeric regions, and many others do some specific region (or island) such as locus D21S13 that is concerned with the Alzheimer disease. Also physical maps are measured their coordinates in Kb (kilo base), genetic maps are in cM (centiMorgans), and cytognetic maps are in ratio (%). For the moment, even the total genome size of chromosome 21 is not precisely determined. If every object in maps measured in percentage were specified with an 314 absolute coordinate, the coordinate would have to be modified every time the total genome size is corrected through later experiments. 
Similarly, the exact position of the D21S13 locus is not fixed, either. Every time a more precise position were determined for locus D21S13, the coordinates of all maps around the locus would have to be changed. 3.1 [ H <: I, A, := G I <: B <: C <: -500 <: #L 1 <: D <: E <: -300 <: F =: qter -500 G gap L1 gap Map P Map Expression Link First of all, objects in each fragmentary map should be addressed in a local coordinate system within the map, so that the specification of coordinates of objects does not need to be modified in the event that their island floats around. Namely, a relative addressing coordinate system is required. Next, for those fragmentary maps associated with some landmark, when the landmark moves around, they should follow without modification in their coordinate system. In Lucy, a map representation language called an map e:cpression has been introduced, which allows a map to be represented in a local coordinate system and in a relative addressing manner, and to be linked to another with an anchoring mechanism. The syntax of a map expression is defined as follows: ::= , : =, I '=:' ': <, I '<:' , [' , '" , ' ] ' 1. Relative Addressing: Two notions are associated with a map expression: one is the current position and the other is the current direction. (a) Linear-Ordering Expressions A : < B and A <: B mean, in common, that A is left of B; additionaly, the former means that B is evaluated after A is done, while the latter does that A is evaluated after B is done. (b) Changing the Current Evaluation Direction Expression: = A means to put the left bound of A at the current position and proceed the evaluation rightward, while expressin A =: means to put the right bound of A at the current position and proceed the evaluation leftward. (c) Multi-Pinning Expression [A, BJ <: C means that A is left of C as well as B is left of C. 2. Anchoring: Objects constituting a map expression include positions and anchors (positions associated with labels). A label is globally accessible beyond a map expression so that it connects one map expression with another. w MapQ Figure 4: Map expressions (a) Memorizing an Anchor Expression #L : < B means to memorize the left bound of B under the label L. (b) Referring to an Anchor Expression A <: ?L means to refer to an anchor labelled with L to take it as the right bound of A. Figure 4 illustrates an example of expressing two fragmentary maps, P and Q, which are linked up at the middle. Map P starts with the q-telomere which is followed by fragment F, a 300Kb gap, fragment E, fragment D, a 500Kb gap, fragment C and fragment B. At the left bound of fragment B, three other fragmentary maps start: one map proceeds pinning leftward on fragment I and then on fragment H, one map goes leftward from fragment A, and the other map goes rightward from fragment G. The position of the left bound of fragment D is labelled L1 to be an anchor for map Q. Map Q contains two fragmentary maps starting with the anchor labelled L1. One map proceeds pinning with Y and then X leftward from the anchor, and the .other does with Z and then W rightward from the anchor. Figures 5, 6 and 7 show those maps represented in map expressions, using GenoGraphics. Figure 5: this is an NotI restriction map around the qtelomere region of chromosome 21, some of whose data have been introduced in Table 1. Notl fragments and sites are shown in light green; gray lines denote hybridization signals between fragments and probes. 
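To make the relative-addressing idea concrete, here is a small Prolog sketch; it is an illustration only, not Lucy's map-expression evaluator. It lays out a chain of fragments and gaps leftward from a single anchored coordinate, so that correcting the anchor (for example, a revised telomere position) relocates the whole island without editing the map itself. The fragment names, their sizes, and the telomere coordinate below are invented for the example; only the 300Kb and 500Kb gaps come from the description of map P.

    % Sketch only: place fragments and gaps leftward from an anchor (in Kb).
    % Each element is fragment(Name, Size) or gap(Size); the output lists
    % placed(Name, Left, Right) intervals derived from the anchor alone.
    layout_leftward(_Anchor, [], []).
    layout_leftward(Right, [fragment(Name, Size) | Rest],
                    [placed(Name, Left, Right) | Placed]) :-
        Left is Right - Size,
        layout_leftward(Left, Rest, Placed).
    layout_leftward(Right, [gap(Size) | Rest], Placed) :-
        Left is Right - Size,
        layout_leftward(Left, Rest, Placed).

    % Map P of Figure 4, read leftward from the q-telomere:
    % ?- layout_leftward(42000,
    %        [fragment(f, 800), gap(300), fragment(e, 600), fragment(d, 900),
    %         gap(500), fragment(c, 400), fragment(b, 700)], Placed).
    % Placed = [placed(f, 41200, 42000), placed(e, 40300, 40900), ...]

Anchors in real map expressions play the same role across maps: map Q above needs only the label L1, not an absolute coordinate, so it floats together with map P.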
Thus an interpretation of biological' data is visualized to help understanding and verifying the mapping process. Figure 6: in [34], regions are defined based on breakpoints (bounds) of various cell lines. The map of regions 315 r-_ _ ~-~=-~ ____~Y~'iM~iM~i!=&~-~ t.,~;_:·_"'_"'_" _'_"_"'_'5'_"'_"'_' n W_I]_~_ ,IJ: [ - '-~~~~~~:~If~'"-t;l; ,ji ~~.:i ____ 1111:~,=~~-::-=__ ._=-= ___ ~"~!,~_~=~_;n _ ~"~ _ _ mj~~ , : .... f----:";-~-_1HIL---++---!;I-.:-;---.:,---.;.:..---:..m---;..~~.;........j1rl11 aw..,,,,,..;... T~ ~ ~:~' __ l __ ._______ -. .. .-...... _,~.,.l .. oIi:l _I----'---';'----+---+--fl-~-\-J--.-+-,-_+_I+II: I=:=___-=:=--=~_~~~---~~_:.~ ---._. . - . _- --- iL__ ._--------------._----_ .. _--- .. t _ __ ~-=-~=--==---________ .__~:____ "..... i~' r-- -"'~'-"'---"-'~-~~''''''' ____: "__ _~---,-------Hi------jh-------j!--------jhll,--+---+1i~--H: I~~·~~: ;;,~:,-;::;~'<;~==~~~_=.~il Figure 7: Three maps around locus D21S13 Figure 5: Visualizing a restriction map with hybridization signals (labelled Gar90) is expressed with the breakpoints of the cell line panel (labelled bkpt st) as anchors. For example, region A7 is formed with the left bound of cell line 1 ; 21 and the left bound of cell line ACEM- 2. Figure 7: three restriction maps, labelled IchS13, CoxS13 and Raf91, are those around the D21S13 locus whose position is given in the chromosome 21 physical anchor map (labelled 21phy), scaled in percentage. rr 3 'J~O 11)Q)Cj~1 4 Inquiry to Lucy This section presents the current status of queries Lucy can currently handle. Concering the map visualized in Figure 5, the mapping effort started with the q-telomere and reached around locus D21S17 where a 1300Kb NotI fragment was pinned down. The following queries are issued to retrieve information related with this region so that the mapping effort can be advanced toward the centromere. ~'Ir---------------------- II III III ---f---: 1. Regarding locus D21S17, its regional information is retrieved. I I I I I +-+-1+-11 I ••••• ' ---a~h·MM!t."_~_tr.~.'l.~"'.('.{ I ?- get_attributes Clocus<'D21S17'), [sel:f(S», llithin(Rs)]), print_object (Rs-S), :fail. Object Id: - ([region(21q22 .1-q22. 2)], GDB: locus(D21S17» Figure 6: Gardiner's region map generated from a cell line panel 1 Ji$!$.mMW :- ! I II' Object Id: -( [region(21q21. 2-qter)], KGlUO. 5 :locus(D21S17) Object Id: - ([gardiner_region(B1), region(21q22. 3)], LUCY :locus (D21S17» 316 The region recorded in GDB is narrower than the one in HGMIO.5 which is the predecessor of GDB. Also the D21S17 locus is assigned to region B1 In Gardiner's map shown in Figure 6. 2. Then, objects which occur left of D21S17 in all maps on which D21S17 occurs are retrieved. I ?- setof(Obj-MID, 05·( occurs_on(map(MID), locus(>D21S17'», ordered_obj ects_on_map (map (MID) , Os), left_to(Obj, locus('D21S17'), Os) ), OMs) , keymerge(OMs, KOMs) , !, member(OM, KOMs) , print_object (OM), fail. Object Id: -(clone (pGSH8), [chr21_Denan1991_physical_around_21q22 .3]) Object Id: - (gardiner_region(Bl), [chr21_Gardiner1990]) Object Id: -(locus(D21S58), [chr21_Burmeister1991_RH, chr21_Petersen1991 _female_meiosis, chr21_Petersen1991_maleJlleiosis, chr21_Tanzi 1988_female, chr21_Tanzi1988_male, chr21_ Tanzi1988_sex_averag ed, chr21_physical_anchors]) Object Id: -(locus (D21S82), [chr21_Warren1989_female_meiosis, chr21_Warr en1989_male_meiosis] ) Beside clone pGSH8 and region B1, loci D21S58 and D21S82 are reported. 3. For the D21S58 locus, its regional information is retrieved. 
I ?- get_attributes (locus ('D21S58'), [self(S), llithin(Rs)]), print_object(Rs-S), fail. Object Id: -( [region(21q22 .1-q22 .2)], GDB :locus(D21S58» Object Id: -( [region(21q21)], HGM10. 5 :locus (D21S58» Object Id: - ([gardiner_region(D4)], LUCY: locus (D21S58» 5. Finally, detailed information about locus D21S58 is retrived. - ? print·object(locus(,D21S58'». Ca.tegories: [1] locus Input Date: 1991/8/5 Investiga.tors: [1] contact(P. C. Watkins) Object Id: locus(D21S58) Probes: [1] clone(524.5P) References: [1] Katheleen Ga.rdiner, Michel Horisberger, Ja.n Kra.us, Umadev i Tantravahi, Julie Korenberg, Veena. Rao, Shyam Reddy, Da.vid Pa.Uerson, "Ana.lysis of huma.n chromosome 21: correla.don of p hysica.l a.nd cytogenetic ma.ps; gene a.nd CpG isla.nd distributio ns", The EMBO Jounal, 9, 25.34, 1990 [2] Michael B. Petersen, Susan A. Slaugenhaupt, John G. Lewis I Andrew C. Wa.rren, Aravinda. Cha.kra.va.rti, Stylianos Antonarak is, "A Genetic Linkage Map of 27 Markers on Human Chromosom 2 I", Genomics, 9, 407.419, 1991 Self: LUCY:locus(D21S58) Within: [1] gardinerregion(D4) [GDB/ver.1.0] Approved date: Sep 8 1990 10:57:11:140PM [GDB/ver.1.0] Assignment modes: [1] somatic cell hybrids [GDB/ver.1.0] Certainty: confirmed [GDB/ver.1.0] Create date: Jun 18 1989 9:42:08:000AM [GDB/ver.1.0] Full name: DNA Segment, single copy probe pPW524-5P [GDB/ver.1.0] GDB:idx: GDB:locus(8242) [GDB/ver.1.0] If cloned: yes [GDB/ver.1.0] Information source: db(GDB/ver.1.0) [GDB/ver.1.0] Input Date: 1991/7/5 IGDB/ver.1.0] Locus type: DNA [GDB/ver.1.0] Modify date: Nov 25 1990 2:01:47:640PM [GDB/ver.1.0] Polymorphism type: polymorphic Although the answers from GDB and HGMIO.5 conflict, the locus is assigned to region D4 in Gardiner's map, which is to the left of region B1. 4. In order to grasp what more loci reside further left, all loci not only in region D4 but also in every D region are retrieved. ?- setof(R-Id, Rs·( get_attribute (locus (Id) , llithin(Rs», member(gardiner_region(R), Rs), substring(R, "D") ), RIs) , keymerge(RIs, KRIs) , !, member(KRI, KRIs) , print_object (KRI) , fail. Object Id: -(Dl, [D21S54]) Object Id: - (D2, [D21S93]) Object Id: - (D3, [D21S63, SOD1]) Object Id: -(D4, [D21S58, D21S65]) [GDB/ver.1.0] Probes: [1] GDB:probe(pPW524.5P) [GDB/ver.1.0] Report: include [GDB/ver.1.0] Self: GDB:locus(D21S58) [GDB/ver.1.0] Sources: [1] P. C. Watkins, R. E. Tanzi, J. Roy, N. Stuart, P. Stanisl ovitis, J. F. Gusella, "A cosmid clone genetic linkage ma.p of chromosome 21 and localization of the brea.st cancer estrogen ·inducible (BCEI) gene.", Cytogenet Cell Genet, 46, 712, 1987 [2] M. Van Keuren, H. Drabkin, P. Watkins, J. Gusella, D. Pat terson, "Regional ma.pping of DNA sequences to chromosome 21." , Cytogenet Cell Genet, 40, 768.769, 1985 [3] P. C. Watkins, P. A. Watkins, N. Hoffman, P. Stanisloviti s, "Isolation of single-copy probes detecting DNA polymorphis ms from a. cosmid libra.ry of chromosome 21.", Cytogenet Cell G enet, 40, 773· 774, 1985 [4] M. L. Van Keuren, P. C. Watkins, H. A. Drabkin, E. W. Jab J. F. Gusella, D. Pa.tterson, "Regional localization of DNA sequences on chromosome 21 using somatic cell hybrids.", Am J Hum Genet, 38, 793-804, Jun 1986 5, [5] M. Burmeister, S. Kim, R. Price, T. hi, R. M. Myers, D. R. Cox, "A map of hromosome 21 constructed by ra.dia.tion ed-field gel electrophoresis", Genomics, [GDB/ver.1.0] Symbol: D21S58 [GDB/ver.1.0] Within: [1] region(21q22.1.q22.2) de La.nge, U. 
Tantrava the long arm of huma.n c hybrid mapping a.nd puIs In Press, ??, 1990 317 [HGMI0.,] single # of copies: [HGMI0.,] Assignmenl mode.: [I] somatic cell hybrids [HGMI0.,] Calegories: [1] locus [HGMI0.5] Cerlainly: provisiona.l [HGMI0.S] Information source: db(HGMI0.5) [HGMI0.5] Inpul Dale: 1991/3/27 [HGMI0.S] Probes: [I] clone(pPWS24.SP) [HGMI0.S] References: [I] ref(Walkins et al (HGM8)) [2] P. C. Watkins, R. E. Tanzi, K. T. Gibbons, J. V. Tricoti, G. Landes, R. Eddy, T. B. Shows, J. F. Gusella, "Isola.tion 0 f polymorphic DNA segments from huma.n chromosome 21.", Nuclei c Acid:; Res, 13, 6075·88, Sep 1985 [3] M. L. Van Keuren, P. C. Watkins, H. A. Drabkin, E. W. Jab J. F. Gusella., D. Pa.tterson, "Regiona.l loca.liza.tion of DNA sequences on chromosome 21 using soma.tic cell hybrids." I Am J Hum Genet, 38, 793.804, Jun 1986 5, [4] ref(Nakai el at (HGM9)) [HGMI0.'] Self: HGMI0.5:locus(D21S,8) [HGMI0.'] Within: [I] region(21q21) field of programming languages, it has been widely disseminated over the past ten years [78, 91]. The heart of object-orientation, that is encapsulating the internal details of an object, is important for the implementation and retrieval of various kinds of data involved in genome databases. Lucy has only adopted an object-oriented data representation. Other ramparts have not been constructed yet: neither object-specific methods nor class inheritance. They will be future work. Since the object-orientation was introduced to Lucy, some cases have been found where the framework does not fit naturally but where a nested (N F2: non-first normal form) relational model [38, 1, 76] would. Here is an example. Given a table of linking clones, an entry for LAl71 has been represented as follows: ------+---------+------------------------+----------------# of occurrences name I region I KluI BssHII I cloned fragments SacII I large small ------+---------+------------------------+----------------LA171 I 21q22.3 I LA179 I 21cen I I 3 0 3 3.0 1.1 2.1 0.96 I ------+---------+------------------------+----------------- Information are reported from publications, GDB and HGM10.5 in that order. 5 Concluding Remarks Promoted by requirements in variOl~s application areas as well as in biology, steady progress in database technology has been made in the last few years [82]. Since the normal-form (lNF or flat) relational model [24] was proposed, practice over the years has pointed out its inefficiency in data access and its verbosity in inquiry [25, 28]. The source of both problems is the primitive data structure, the flat relation. In genome databases implemented upon normal form relational database systems, these problems are cast in relief, since the volume and variety of involved data are large and growing. In fact, the number of tables constituting a genome mapping database is apt to be quite large (e.g., 68 tables in LLNL Genome Database [4] and over 100 tables in GDB). The present work could be regarded as one of the first to have successfully integrated public and local genome databases. The success greatly reflects the application of an object-oriented data representation and logic programming features, which should be the preliminary steps toward object-oriented databases [5, 36, 3, 7, 33, 85, 19] and deductive databases respectively. Through an experience with Lucy, it should be reasonable to conclude that these database technologies will contribute to the development and practice of genome databases. Object-Oriented Database Technology. 
Since the notion of object-orientation [41] was invented in the object ('LA: clone' ('LA171'), [input_date(1991/2/11) , categories( [linking_clone]), Ilithin( [region( '21q22 .3')]), cloning_ vector (].ambda) , linking_site (restriction_enzyme ('lotI'» , digested_from(genomic_DIA(human» , digested_bye [restict ion_enzyme ( , KcoRI ')]), contains ([times (restrictio]Lenzyme ('KluI'), 1), times (restriction_enzyme( 'BssHII'), 2), times (restriction_enzyme ('SacI!'), 3) ]), parts ( ['LA: clone' ('LA171l'), 'LA: clone' ('LA171s')]), references([ref('Saito et el (1991) ')]) ]). obj ect <'LA: clone' <'LA1711') , [input_date(1991/3/30) , categories ([half_Iinking_clone]) , linking_site (restriction_enzyme ('lotI'» , size('Kb'(3.0» ]). object <'LA: clone' <'LA171s') , [input_date(1991/3/30) , categories( [half_l inking_clone] ) , size<'Kb' (2 .1» J). As shown in Table 1, LA1711 and LA171s are those half linking clones which hybridized to fragments, 1800Kb and 750Kb, respectively. When these half linking clones were identified as objects, their sizes, 3.0Kb and 2.1Kb, were encapsulated in these objects. In contrast, consider the number of occurrences of restriction sites. It is questionable that an object should be created for the number of occurrences, such as once, twice or three times. Being part of an attribute, contains, the occurrences are stored as a nested relation of the form, tirnes/2. For the third and forth columns in the example above, their relational structures are similar, but the meanings of their data imply different implementations. Further studies will be necessary to clarify this problem. 318 Deductive Database Technology. The necessity of loading an inference mechanism into a database system has been claimed in knowledge-intensive applications [16, 56, 63, 84]. Because most biological knowledge is symbolic rules on the four characters of DNA, there is a potential requireiment for rule processing capability. A couple of genome database systems are being developed abreast of Lucy, exploiting logic programming facilities [42, 73, 6, 45]. In Lucy, the inference capability is being used mainly for query management. Few pieces of biological rules have been implemented. References [lJ S. Abiteboul and N. Bidoit. Non first normal form relations to represent hierarchically organized data. In Proceedings of the Third ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pp.191-200, Waterloo, April 1984. [13J R. J. Brachman, A. Borgida, D. 1. McGuinness, and 1. A. Resnick. The CLASSIC knowledge representation system, or, KL-ONE: The next generation. Workshop on Formal Aspects of Semantic Networks, Santa Catalina Island, CA, February 1989. [14J E. Branscomb, T. Slezak, R. Pae, D. Galas, A. V. Carrano and M. Waterman. Optimizing restriction fragment fingerprinting methods for ordering large genomic libraries. Genomics, 8: 351-66 (1990). [15J Ivan Bratko. Prolog: programming for artificial interlligence. Second Edition, Addison Wesley, 1990. [16J Bruce G. Buchanan and Edward H. Shortliffe. Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Reading, Addison-Wesley, 1984. [17J Margit Burmeister, Suwon Kim, E. Roydon Price, Titia de Lange, Umadevi Tantravahi, Richard M. Myers, and David R. Cox. A Map of the Distal Regin of the Long Arm of Human Chromosome 21 Constructed by Radiation Hybrid Mapping and Pulsed-Field Gel Electophoresis. Genomics, 9:19-30,199l. [2J Serge Abiteboul and Paris C. Kanellakis. 
Integrated System for Protein Information Processing

Hidetoshi Tanaka
Institute for New Generation Computer Technology (ICOT)
1-4-28, Mita, Minato-ku, Tokyo 108 Japan
htanaka@icot.or.jp

Abstract

This paper describes requirements for databases and DBMSs in protein information processing, and an experiment on a privately integrated protein knowledge base. We consider an integrated DBMS-KRL system for an integrated protein KB-DB. In order to clarify unknown functions of proteins via empirical approaches, existing public databases should be integrated as part of a private knowledge base, which can promote biological knowledge discovery. The DBMS-KRL system should support representability, parallel processing, information retrieval, advanced query processing, and quality management techniques for protein information. The DBMS Kappa-P and the KRL QUIXOTE, both designed at ICOT, perform a useful role in processing protein information. Kappa-P provides efficiency through its parallel processing and extensibility, while QUIXOTE provides advanced query processing and quality management through the representational flexibility of its object identification and modules.

1 Introduction

Molecular biological information processing is increasing in importance, as biological laboratories are improving their computational environments and biological data are growing far faster than biologists' understanding. To speed up converting such data into biological knowledge, biological databases should help biologists by providing conveniences in storing, browsing, and query processing. Molecular biological databases fall into two categories: public databases and private databases. Public databases hold hundreds of Mbytes of various data: sequence, structure, functions, and other indispensable auxiliary information on DNA, RNA, and protein. They vary in their sizes and data structures. As for the sizes, PIR Release 30 (1991) has proteins of 1 residue through to 6048 residues. GenBank Release 70 (1991) has regions (loci) of DNA sequences of 3 bases through to 229354 bases.
As for the data structures, for example, the feature descriptions of proteins or loci require multiple nested structures. A protein often consists of plural amino acid sequences. A sequence of eucaryote is often coded in several separate DNA regions (exons) with separate expression regulatory region. The structure of regulatory regions is so unclear that we can only describe them as nested patterns of DNAs. Public databases are maintained under international cooperation or specific volunteers, and provide most molecular biological data freely. Although the amount of data is increasing rapidly, recent dynamic improvements of machine environments allow biologists to store such data in their own small systems, and to use them as part of value-added private databases. Such environments also allow them to create a database including their own experimental results, make cross-references between public and private databases, add customized query processing facilities, and try to conduct knowledge discovery by extracting rules from data. In this paper, we focus on such privately integrated databases which are developed as part of the molecular biological information processing system of the FGCS project. As an example, we are building an integrated protein knowledge base in the framework of deductive object oriented database (DOOD), which consists of a knowledge representation language (KRL) QU'IXOT£ and a DBMS Kappa-P. The reason why we choose protein information is due to their moderate amount for storage and study. As biological applications are very new, we had to check the appropriateness of the system and request to add several facilities to it. We have developed Kappa-P and QU'IXOT£ on a parallel inference machine PIM. Kappa-P employs a nested relational model, and has a facility of extensible DBMS, which appears to be suitable for parallel processing and sequence retrieval. QU'IXOT£ is based on a concept of DOOD. It provides a capability of advanced query processing, rich concepts such as module, identification, subsumption relation, and flexibility in describing knowledge. 322 This paper describes two types of system integration. One is database integration of protein information. The other is DB-KB (database and knowledge base) integration in Kappa-P + QUIXOTE. These are shown in Section 6. We describe the requirements for databases and DBMSs in Section 2. Overviews of the protein databases we are focusing on are described in Section 3. The suitability of Kappa-P and QUIXOTE as ingredients of our integrated know ledge base system is discussed in Sections 4 and 5. 2 2.1 Requirements for Biological Database Systems Requirements for Databases Most of the biologists' requirements for existing molecular biological databases are concentrated into the problem: it is difficult to access several databases at once. It is because the differences between the databases: the attributes' meanings, the values' variations, and their relations, must be understood beforehand. Such requirements are solved by integrating them. There are three approaches to database integration. Standardization Standardization is the most fundamental integration. It provides the simplest environment for the wide use of databases. CODATA (Committee on Data for Science and Technology) in ICSU (International Council of Scientific Unions) proposed standardization of attributes to realize the virtual integrated database [JIPID 90]. The schema of every public database should be a subset of the virtual schema. 
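To make the idea of a virtual integrated schema concrete, here is a minimal Python sketch (not CODATA's actual specification): each source database maps its own attribute names onto a shared set of virtual attributes, so that one query form works over all sources. The attribute names and mappings below are illustrative assumptions, not the published schemas.

    # Hypothetical sketch of a "virtual schema" above heterogeneous databases:
    # each source's schema is mapped onto a subset of shared virtual attributes.
    VIRTUAL_SCHEMA = {"accession", "protein_name", "organism", "sequence"}

    SOURCE_MAPPINGS = {               # assumed per-database attribute names
        "PIR":        {"ENTRY": "accession", "TITLE": "protein_name",
                       "ORGANISM": "organism", "SEQUENCE": "sequence"},
        "Swiss-Prot": {"AC": "accession", "DE": "protein_name",
                       "OS": "organism", "SQ": "sequence"},
    }

    def to_virtual(source, record):
        """Rename a source record's attributes into the virtual schema."""
        mapping = SOURCE_MAPPINGS[source]
        return {mapping[k]: v for k, v in record.items() if k in mapping}

    # A query phrased against the virtual schema now works over both sources.
    entry = to_virtual("PIR", {"ENTRY": "P08478", "ORGANISM": "Xenopus laevis"})

The point of the sketch is only that each source schema stays a subset of the virtual one; real standardization must also reconcile value conventions, not just attribute names.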
NLM (National Library of Medicine) provides GenInfo Backbone Database [NCBI 90]. According to [NCBI 90], it is built as a standardized primary database, which is assumed to be a basis for secondary, value-added databases for the specialized interests of different biologists. Determining a standard, however, is expensive. It is almo'lt impossible to make the widest virtual schema that covers all attributes of protein information. We should accumulate experiences in creating and using most private databases as before. Moreover, both standardizations started so recently that they have not so widely distributed yet. Integrated User Interface Making an integrated user interface is the fastest way of getting an integrated environment. It generally provides not only query processing facilities but also visual browsing facilities, which are quite attractive and useful for biologists. It used to need a lot of cost, however, to . remake an interface when a new database is to be added. GeneWorks 1 and Entrez 2 provide integrated environments to enable access to existing DNA,/ amino acid sequence databases, although they are packaged for browsing only, from a PC or Mac. They are not for adding new applications or new databases. S. Smith et al. (Harvard U. ) are developing an environment for genetic data analysis (GDE3) which will help access several databases at once by providing data exchange tools between representative databases. The first version is based on a multiple alignment editor and allows tools for sequence analysis to be included in the system. It can reduce efforts for the interface remaking by rich widgets. It has also just started and further improvement is expected. Integrated Knowledge Base The integrated knowledge base is our approach. It consists of two stages: to represent all facts in one language, and to supplement the rules necessary to get and use the facts (see Fig. 1). The former corresponds to standardization, and realizes a syntactically integrated database. Not only existing public databases but also private databases are integrated. In order to provide an useful integrated database system, efficient DBMS which allows to store and to access complex data easily. The latter stage converts a database into a knowledge base, by accumulating supplementary knowledge, which are rules or facts recognized by biologists themselves. It seems almost impossible to define common operations to all knowledge just as relational algebra to relational database. Thus, new concepts had better be introduced to the mechanism, so that each (or a cluster of) knowledge can have intrinsic methods. DOOD is a promising concept for the mechanism. 2.2 Requirements for DBMS Among the requirements for databases we can find ones for DBMS or data models which require improvement in retrieval and identification. Traditional ways are not so appropriate for some molecular biological applications, e.g., sequence retrieval and quality management. Information Retrieval DBMS is expected to support the facilities of information retrieval for the sequences of DNA, RNA, and amino acids. It is partly because a concept of DBMS is rather wider for biologists than traditional one for database researchers. 
Footnotes: 1. IntelliGenetics, Inc., Mountain View, CA. 2. National Institutes of Health, Bethesda, MD. 3. Genetic Data Environment.

Figure 1: Integrated Knowledge Base of Proteins (knowledge base integration — accumulation of supplementary knowledge, advanced query processing, and knowledge base customization, realized by DOOD — built on top of syntactical integration — "standardization" of public databases and integration of public and private databases, realized by DBMS).

Although sequence retrieval is like full-text search, there are some differences in the search criteria: similarity search via dynamic programming (DP) or other algorithms (e.g., BLAST [Altshul et al. 90]), with given similarities between characters (namely, amino acids). "Keyword" extraction from a sequence is far more difficult than from text. In order to process large sequences, they should be preprocessed by an information retrieval technique, just as strings are preprocessed to make an alphabetical index. Keyword extraction corresponds to such preprocessing. However, we should first consider what a word is in DNA or amino acid sequences. If a gene is a sentence and a DNA base is a character, then a word might be a specific DNA pattern closely related to some function, which is not so clear at present. If an amino acid sequence is a sentence and an amino acid is a character, then a word may be a structural block or a functional block, either of which is represented as a pattern of amino acids including some variations. Determining and extracting "keywords" is one of the big problems in biology. At present, the sequences should be regarded as character strings, and not as paragraphs or sentences which consist of words. Moreover, we have to consider the following features of the sequences.
• DNA and RNA sequences consist of only four characters.
• Proteins mostly consist of 20 characters, but there are some exceptions.
• Similarities between the characters are defined.

Identification Facilities
A DBMS is required to provide rich identification facilities. In treating molecular biological data, we should consider at least two kinds of errors: experimental errors and identification errors. Experimental errors are inevitable in molecular biological databases. It is necessary to repeat experiments to reduce them. In reading DNA sequences, for example, the same region should be repeatedly read to verify the result. In such cases, GenBank is useful in reducing the effort of verifying, though it has sequences of various qualities. It contains many staff-reviewed sequences with many references, while it also contains a lot of sequences each of which has simply been registered by a researcher. Identification errors can occur in this verification process. It is not so clear whether we can get the sequence of the same region, because of slightly different repeating regions, or natural errors such as diseases or mutations. Representation of relations between proteins and functions is more ambiguous than relations between loci and DNA sequences in the example above. We should always consider identification errors in both proteins and functions. As experimental facts are accumulated, for example, "cytochromes transfer electrons" may turn into relations such as "cytochromes and ubiquinone transfer electrons" (protein identification is relaxed) or "cytochromes transfer electrons to generate energy" (function identification is detailed). A concept of object identity is important in such cases.
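As a concrete illustration of this use of object identity, the following hypothetical Python sketch keeps each experimental reading as a separate object and only merges them when a "verified" result is requested. It is not the paper's QUIXOTE code, and the consensus-by-majority rule is an assumption made for the example.

    # Hypothetical sketch: repeated readings of the same region stay distinct
    # objects (conflicting data never overwrite each other); a "verified" view
    # merges them on demand.
    from collections import Counter

    runs = [
        {"region": "chr21:q22.3", "run": 1, "sequence": "ACGTAC"},
        {"region": "chr21:q22.3", "run": 2, "sequence": "ACGTAC"},
        {"region": "chr21:q22.3", "run": 3, "sequence": "ACGAAC"},  # experimental error
    ]

    def verified_sequence(runs, region):
        """Merge the runs for one region into a single verified object (majority vote per position)."""
        seqs = [r["sequence"] for r in runs if r["region"] == region]
        consensus = "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))
        return {"region": region, "sequence": consensus, "evidence": len(seqs)}

    print(verified_sequence(runs, "chr21:q22.3"))   # one object derived from three stored objects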
Results including experimental errors should be treated as different objects when storing them, to avoid integrity problems, whereas they should be treated as one object when we ask for the "verified" result. Furthermore, identifiers should be flexible enough that we can change them with as little difficulty as possible.

3 Protein Databases

In Section 2, general issues on molecular biological databases and DBMSs were shown. This section focuses on protein databases and overviews the reasons for using existing public databases, as well as their general use, in order to consider the necessity of an integrated knowledge base.

3.1 Public Protein Databases

Protein information includes amino acid sequences, 3D structures, and functions. Protein functions include thermodynamic, chemical, and organic functions of whole or partial proteins. In addition, there is important auxiliary information such as the authors, titles, and journals of the references relating to the data, source organisms, and experimental conditions. Public protein databases have been trying to cover all protein information: amino acid sequences (PIR, Swiss-Prot), structures (PDB), partial patterns (ProSite), enzyme functions, and restriction enzymes (REBASE). All databases contain the auxiliary information mentioned above. The amount of information contained in each is shown in Table 1. It seems possible that whole databases can be held privately in order to catalyze a change in their use: from databases accepting biological applications to knowledge bases including public databases and processing advanced biological queries.

Table 1: Public Protein Databases
database    | release     | entries | size
PIR         | 30.0 (9/91) | 33,989  | 9,697,617 residues
Swiss-Prot  | 18.0 (5/91) | 20,772  | 6,792,034 residues
PDB         | (7/91)      | 688     | 153 Mbytes
ProSite     | 7.0 (5/91)  | 508     | 1 Mbytes
REBASE      | (1/92)      | 1975    | 16 Mbytes

3.2 Purposes of Protein Databases

The final goal of using protein databases is to predict the unknown functions of a protein. Biologists gather enough known relations between functions and proteins to predict the unknown functions of a known/unknown protein as accurately as possible. Its subgoals are important for molecular biology:
• Understanding of the relation between protein structure and function. Most protein functions are due to their structure. Structure might be predicted from the amino acid sequence, by molecular dynamics or several kinds of empirical approaches.
• Prediction of the 3D structure from the amino acid sequence. Theoretically, most protein 3D structures can be calculated by molecular dynamics. At present, however, this costs an enormous amount of computation time. Thus various empirical approaches have been tried, and will continue to be tried. Sequence similarity search, especially for the extraction of common sequence patterns or similar regions (which are often called 'consensus sequences' in molecular biology), is a first step of the empirical approaches. How to represent and how to use biological knowledge to predict unknown structures or unknown functions are other early problems.

3.3 General Use of Protein Databases

Similarity Search
Most traditional uses of protein databases are supported by a traditional DBMS, except for similarity searches in the sequence database. Biologists ask the database for a set of proteins whose name is, for example, "cytochrome c", or proteins which are found in "E.Coli." This type of retrieval is supported by a traditional DBMS. They often want to examine such a set of sequences to discover a description of the similarity of a certain protein set, such as the existence of consensus sequences. They use multiple alignment [Ishikawa et al. 92] after they get all the sequences they want. In this case, we consider the interaction between application and database at a rather higher level (see Section 6). Another important use is similarity search. They search for amino acid sequences in the database that are similar to an unknown sequence they have. The unknown sequence may be a fragment or a whole sequence. The former is motif search, which is regarded as text content search, while the latter is homology search. Biologists want to get accurate results in these searches and examinations, because the accuracy affects the quality of function prediction and structure prediction. They would like to retrieve several of the best sequences of similar function in the database. In order to improve recall and precision ratios in protein similarity search, plenty of biologists' empirical knowledge and experimental results are indispensable. In addition, two problems have to be solved: finding an efficient algorithm for the homology and motif searches, and speeding up basic retrieval. The former needs the cooperation of biologists and computer scientists, whereas the latter could be devised independently by computer scientists, for some basic operations might be taken from techniques for partial string matching in text databases.

Data Management
Data management, such as designing schemas, storing data, and checking integrity, is owed to the great efforts of the staff of the public databases. Recently, the schemas of existing public databases have gradually been standardized (as shown in Section 2); however, each existing database still employs independent naming rules using alphanumeric symbols such as 'P08478' (PIR), 'AMD1$XENLA' (Swiss-Prot), and '1.14.17.3' (Enzyme DB). Biologists are annoyed by updating cross-references among public databases and private ones. As for storing, public databases accept an electronic form of registration to reduce the staff effort needed for quick storing. The U.S. National Institutes of Health proposes a standard format for data exchange (ASN.1 [NCBI 90]), which simplifies registration procedures and is useful in gathering data into personal systems. In order to distribute recent data as quickly as possible, PIR distributes less verified data to biologists. Thus, it reduces the staff effort needed for quick checking. When such data are used, the verification process is owed to the biologists who would like to use them. PIR has three kinds of indications of verification level: 'Annotated and Classified', 'Preliminary', and 'Unverified'. It is obvious that such indications are not enough for biologists' private data management. Cooperation with biologists is indispensable in settling how to identify data with their quality and how to make cross-reference data, although some management can be devised independently for advanced uses.

4 Kappa-P: An Extensible Parallel DBMS

We use Kappa-P as an ingredient in our integrated system. Kappa-P provides several facilities suitable for protein information. The efficiency of the nested relational model of Kappa is shown in [Yokota et al. 89], where efficient usage of storage and flexibility of schema evolution are described. In this section, we show the effectiveness of Kappa-P as an extensible DBMS for protein information processing and how to embed information retrieval facilities into Kappa-P.
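To illustrate what a nested relational record looks like in practice, here is a small Python sketch of a protein entry with nested sub-relations and a selection over a nested attribute. The field names and the select helper are hypothetical; they stand in for Kappa-P's actual schema and command interface, which are not shown here.

    # Illustrative sketch only: a nested-relational protein entry modeled with
    # plain Python structures. Field names are assumptions, not Kappa-P's schema.
    protein_entry = {
        "id": "P08478",
        "name": "cytochrome c",
        "source": {"organism": "E. coli", "strain": None},
        "sequences": [                      # a protein may consist of several chains
            {"chain": "A", "residues": "MGDVEKGKKIF..."},
        ],
        "references": [                     # nested set of citation records
            {"authors": ["..."], "journal": "...", "year": 1991},
        ],
    }

    def select(entries, path, value):
        """Select entries whose nested attribute (given as a key path) equals value."""
        def get(entry, path):
            node = entry
            for key in path:
                node = node[key]
            return node
        return [e for e in entries if get(e, path) == value]

    # Example: all entries whose source organism is E. coli.
    hits = select([protein_entry], ["source", "organism"], "E. coli")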
4.1 Parallel DBMS

As the sequence search is executed exhaustively on a full sequence, its parallel processing is obviously effective. Fig. 2 shows the configuration of our system.

Figure 2: Configuration of Kappa-P (Kappa: sequential DBMS; Kappa-P: parallel/distributed DBMS; application commands (AC) embedded in the DBMS).

The aim of the extensibility of Kappa (the sequential DBMS) is to reduce interaction with applications, and to customize the command interface for each application. The modules of an application which frequently use Kappa commands are included in the Kappa system so that the number of communications among the processes on the left-hand side of Fig. 2 decreases. Besides these facilities, another kind of extensibility must be considered in a parallel system. The parallel DBMS Kappa-P consists of a server DBMS and local DBMSs, where the server DBMS has a global map of the local DBMSs and coordinates them to deal with users' requests, while the local DBMSs hold the users' data [Kawamura et al. 92]. In our environment, most applications work on the same PIM as Kappa-P. So, if the server DBMS merges all the answers from the local DBMSs into one answer, the effectiveness of parallel processing is reduced. In order to avoid such a situation, user-defined commands, for example DP or BLAST, are sent to every local DBMS, and they play the role of filters from the local databases to their server. The filters select data satisfying the given conditions and send them to the server processor. It is obviously efficient to send application procedures to every local DBMS. The extensibility of Kappa thus contributes to efficient parallel processing of sequence search, as in Fig. 2.

4.2 Information Retrieval

An extensible DBMS is also suitable for supporting information retrieval, obviously because it allows the command interface to be customized for applications. Sequence similarity searches, which correspond to full-text searches, are implemented easily as "Application Commands (AC)" in Fig. 2. We have developed a character-pair based index system, especially for motif search. This kind of index system is also implemented as application commands, while the indexes are held in each local DBMS. Thus, the number of communications between local and server DBMSs decreases. A motif dictionary such as the public database ProSite could also be used as another useful index for sequence similarity searches. An extensible DBMS is so flexible that when such an improved index is developed it can easily be added to the system.

5 QUIXOTE: A Deductive Object-Oriented Database

We use QUIXOTE as another ingredient in our system. Though advanced query processing is available in any logic programming language, the facilities of QUIXOTE are more suitable for representing protein information, especially protein functions [Tanaka 91]. In this section we focus on its representation facilities in data management: schema flexibility and powerful identification.

5.1 Objects and Modules of QUIXOTE

Object Identifier
Objects in QUIXOTE are represented by extended terms called object terms [Yasukawa et al. 92]. An object term is of the following form:

    o[l1 = v1, ..., ln = vn]

where o is called the head of the object term, li is a label, and vi is the value of li in the object term. The labels and their values of an object term represent the properties of the object which are intrinsic to identifying it. In this sense, object terms play the role of object identifiers. An object may have properties other than those specified in its object term. To represent such (extrinsic) properties of an object, a special form of term representation called an attribute term is used:

    o[l1 = v1, ..., ln = vn] / [l'1 = v'1, ..., l'm = v'm]

This attribute term represents that the object identified by the object term o[l1 = v1, ..., ln = vn] has the properties [l'1 = v'1, ..., l'm = v'm]. It is important to distinguish intrinsic properties from extrinsic ones. Simplified examples are shown in Fig. 3: (1) and (2) describe the same object and their attribute terms contradict each other, while (1') and (2') represent different objects.

    object_head[ol1 = ov1, ol2 = ov2, ...] / [al1 = av1, al2 = av2, ...]
    (1)  fact[label1=v1] / [label2=v2].
    (2)  fact[label1=v1] / [label2=v3].
    (1') fact[label1=v1, label2=v2].
    (2') fact[label1=v1, label2=v3].
Figure 3: Examples of QUIXOTE objects

Such representation of protein information is quite useful, for only the attributes whose values are determined can be used for identification. It is also useful in repeating local integrity checking, as the data set never stops increasing in amount.

Module
Modules in QUIXOTE help object management. Simplified examples are shown in Fig. 4.

    (1) module1 :: object1.
    (2) module2 >- module1.
    (3) module2 :: {{ object2. object3. }}
    (4) module3 >- module1.
    (5) module3 :: object3.
Figure 4: Examples of QUIXOTE modules

"object1 is an object in module1" is represented as (1), and (3) is an abbreviated form. (2) represents an order between modules, specifying that module2 inherits all the objects of module1, and (4) represents another inheritance. Therefore, module2 has object1, object2, and object3, whereas module3 has object1 and object3. Although object3 is in both module2 and module3, it may have different properties in each module, because no relation between module2 and module3 is defined. We can give different properties to the same object in different modules. Thus, we can use different modules to avoid database inconsistency when we get different results from different experiments.

5.2 Identifiers of Proteins

Requirements
Since it is impossible to give the clearest identifier instantly, identification requires that the following be satisfied.
(1) Subsumption relation: An identifier sometimes has to be generalized or specialized. For example, the sentence "cytochromes have a certain feature" sometimes has to be reconsidered as "cytochromes and hemoglobins have a certain feature" or "cytochrome c has a certain feature." It seems rare to misidentify completely different objects; most erroneous identifiers have to change only their abstraction level, and need not be altered completely.
(2) Flexibility: In the process of determining the clearest identifier, it is useful if the DBMS accepts tentative identifiers which can be specialized or generalized at any time. We can then use trial and error to determine the proper identifier.
(3) Module: To distinguish tentative identifiers from fixed ones, or experimental results from derived ones, a facility for making modules is required. It allows local integrity to be checked within the module, and the global uniqueness of the labels of the identifiers to be ignored.

Flexibility along the Subsumption Relation
Proteins need identification transitions along the subsumption relation, as shown above. Fig. 5 is an example of how they are represented in QUIXOTE.

    (fact1) cytochrome[lifename=E.Coli] / [feature=featX].
    (fact2) cytochrome[type=c] / [feature=featX].
    (hyp1)  cytochrome / [feature=featX].
    (hyp2)  protein[name={cytochrome, hemoglobin}] / [feature=featX].
    (cf.1)  cytochrome(E.Coli, _, featX).  cytochrome(_, c, featX).
    (cf.2)  cytochrome([lifename(E.Coli), type(c)], featX).
Figure 5: Proteins as objects in QUIXOTE

Provided that there is a feature named 'featX', the identifier of the protein whose feature is 'featX' may be changed as experiments are repeated. The QUIXOTE expressions (fact1) and (fact2) are examples of the identification of experimental results; (fact1) mentions nothing about the attribute "type", and vice versa. In the relational data model or in Prolog, it is necessary to redesign the attributes of tables or the arguments of facts to reflect such schema changes, since attributes have to be fixed. This is shown in (cf.1). In Prolog, we can reflect them by using lists, as shown in (cf.2) [Yoshida et al. 91]. However, it is then necessary to support a particular unifier for the list, and users must manage the meaning of the list (e.g., whether elements are connected by 'and' or 'or') carefully. QUIXOTE allows set concepts with particular semantics to avoid such mismatches. When we consider what sort of protein has 'featX' and get (fact1) and (fact2), we can easily think of a hypothesis (hyp1). We can also get this hypothesis in QUIXOTE, using the object lattice of the subsumption relation. Moreover, if we give some relations among 'cytochrome', 'hemoglobin', and 'protein', another hypothesis such as (hyp2) is available.

Modules for Data Management
Objects of experimental results, verified results, and public databases have to be distinguished by modules, to be checked by different integrity checking methods. Fig. 6 shows an example of the verification process. Upper modules inherit all facts and rules of their lower modules. 'PIR', 'Swiss-Prot', and 'Experimental Results' are modules, each of which allows local integrity checking. If identifiers conflict between these modules, they can be settled at their upper module. 'Sequence' has some rules and cross-references between PIR and Swiss-Prot so that it can select and reply with a specific set of protein sequences contained in these public databases. 'Integrated' has some rules to verify experimental results by merging the selected sequences from the public databases. It also has cross-references between the public databases and the experimental results (but these are ignored in Fig. 6 to simplify the example).

Figure 6: Modules for Verification Process ('Integrated': verified_sequence[id=protA], verified by merging; 'Experimental Results': sequence[id=protA]; 'PIR': sequence[id=A08478]; 'Swiss-Prot': sequence[id=AMD1$XENLA]).

6 An Integrated System for an Integrated Knowledge Base

This section shows a system integration and a DB-KB integration, as to their configuration and their uses. We are considering two kinds of integration: Kappa-P and QUIXOTE (DBMS and KRL), and existing public databases and biological knowledge (DB and KB).

6.1 Configurations of Integration

DBMS and KRL
There are three interactions in the integrated system of the database management system Kappa-P and the knowledge representation language QUIXOTE (see Fig. 7).

Figure 7: Integrated System of Kappa-P and QUIXOTE (molecular biological applications on top of the integrated knowledge base). Figure 8: Integrated Knowledge Base of Proteins.

(1) Interactions between Kappa-P and QUIXOTE: All facts (non-temporal objects) in QUIXOTE are stored in Kappa-P, and Kappa-P activates the necessary objects as the result of retrieval.
(2) Applications and QUIXOTE: The following are supported or will be provided (are under development).
• advanced query processing facility (inference): As QUIXOTE is an extension of Prolog, it provides a more flexible and powerful query processing facility.
• a standard for molecular biological data: New databases and new rules (knowledge) are easily made available by supporting ASN.1.
• graphical user interfaces: Ad-hoc uses are quite important for biologists. The system should support ad-hoc queries with graphical, user-friendly interfaces. Kappa supports user interfaces for the nested relation and for PIR on X-Window.
• class libraries for biological use: These would include sequence retrieval and data management (see Sections 4 and 5).

(3) Applications and Kappa-P: The system should support direct access to databases for simple queries. It currently supports a graphical user interface to access amino acid sequences and some libraries to maintain biological data.

Protein Databases and Knowledge Bases
There are many public protein databases (see Section 3). We are holding several databases, including such public ones as those shown at the bottom of Fig. 8. An oval represents a module of rules and facts, while a rectangle represents a Kappa-P database. Modules in the upper two levels are mostly rules in QUIXOTE, while those at the bottom are mostly facts in Kappa-P. The user may ask the top-level module any queries. It can also be integrated with private databases and customized to be a private knowledge base. An example of such integration and customization is shown in Fig. 6.

6.2 Use of the System

Application of Sequence Analysis
Ishikawa et al. (ICOT) have developed a parallel processing algorithm for protein multiple alignment [Ishikawa et al. 92]. When the multiple alignment system and the knowledge base are connected, and a new multiple alignment algorithm using motifs is developed, it becomes an integrated application and knowledge base system. This is expected to enable automatic motif extraction and motif accumulation.

Advanced Query Processing
The query processing facilities of QUIXOTE realize a data pool of experimental results with query processing. They act as a prototype database or knowledge base for the experiment, which accumulates queries and shows the tendency of its usage in the integrated environment.

Graphical User Interface
The system has a user interface which allows it to use both an advanced query processing interface to QUIXOTE and a browsing and query-by-example interface for Kappa-P. The query interface provides or will provide facilities for displaying examples of queries, or graphs of answers such as the relations of objects given by a recursive query. The browsing interface also provides or will provide graphical display facilities. We have developed a visual feature exhibition of sequences from both GenBank and PIR.

7 Conclusion

The requirements of molecular biology, especially protein engineering, which is a brand-new field for DBMS/KRL, were overviewed. Biological applications are now shown to be stimulating for DBMS and KRL, which are required to have various functions: information retrieval, deduction, identification, module concepts, extensibility, and parallel processing. Such facilities of DBMS/KRL should be requested by (computer-)biologists. It is important to cooperate with them to conduct further research. A private knowledge base including various existing public databases will promote biological knowledge discovery.
Although we have not mentioned it in this paper, a distributed DBMS is also necessary in case databases and knowledge bases exceed the capacity of a personal system. We think DOOD with an extensible DBMS will also play an important role, but this will be considered in the future.

Acknowledgments
The author wishes to thank Kazumasa Yokota, Hideki Yasukawa, Moto Kawamura and the other people in the QUIXOTE and Kappa-P projects for their valuable comments on earlier versions of this paper. The author also wishes to thank the people on the computer-biology mailing list for their suggestions from the viewpoint of biology.

References
[Altshul et al. 90] Altshul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J.: "Basic Local Alignment Search Tool", J. Mol. Biol., 215, pp.403-410 (1990).
[Ishikawa et al. 92] Ishikawa, M., Hoshida, M., Hirosawa, M., Toya, T., Onizuka, K. and Nitta, K.: "Protein Sequence Analysis by Parallel Inference Machine", FGCS'92 (Jun 1992).
[JIPID 90] PIR-International (JIPID): PIR Newsletter, No.3 (June 1990).
[Kawamura et al. 92] Kawamura, M., Sato, H., Naganuma, K. and Yokota, K.: "Parallel Database Management System: Kappa-P", FGCS'92 (Jun 1992).
[NCBI 90] NCBI: "GenInfo Backbone Database", Version 1.59, Draft (Apr 1990).
[Tanaka 91] Tanaka, H.: "Protein Function Database as a Deductive and Object-Oriented Database", Database and Expert Systems Applications, Springer-Verlag, pp.481-486 (Aug 1991).
[Yasukawa et al. 92] Yasukawa, H., Tsuda, H. and Yokota, K.: "Objects, Properties, and Modules in QUIXOTE", FGCS'92 (Jun 1992).
[Yokota et al. 89] Yokota, K. and Tanaka, H.: "GenBank in Nested Relation", Joint Japanese-American Workshop on Future Trends in Logic Programming (Oct 1989).
[Yoshida et al. 91] Yoshida, K., Overbeek, R., Zawada, D., Cantor, C.R. and Smith, C.L.: "Prototyping a Mapping Database of Chromosome 21", Proceedings of the Genome Mapping & Sequencing Meeting, Cold Spring Harbor Laboratory (1991).

Parallel Constraint Logic Programming Language GDCC and its Parallel Constraint Solvers

Satoshi Terasaki, David J. Hawley*, Hiroyuki Sawada, Ken Satoh, Satoshi Menju, Taro Kawagishi, Noboru Iwayama and Akira Aiba
Institute for New Generation Computer Technology
4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan
*Current address: Compuflex Japan Inc., 12-4, Kasuya 4-chome, Setagaya-ku, Tokyo 157, Japan

Abstract

Parallelization of a constraint logic programming (CLP) language can be considered at two major levels: the execution of an inference engine and a solver in parallel, and the execution of a solver itself in parallel. GDCC is a parallel CLP language that satisfies this two-level parallelism. It is implemented in KL1 and is currently running on the Multi-PSI, a loosely coupled distributed-memory parallel machine. GDCC has multiple solvers and a block mechanism that enables meta-operations on a constraint set. Currently there are three solvers: an algebraic solver for nonlinear algebraic equations using the Buchberger algorithm, a boolean solver for boolean equations using the Boolean Buchberger algorithm, and a linear integer solver for mixed integer programming. The Buchberger algorithm is a basic technology for symbolic algebra, and several attempts at its parallelization have appeared in the recent literature, with some good results for shared-memory machines. The algorithm we present is designed for a distributed-memory machine, but nevertheless shows consistently good performance and speedups for a number of standard benchmarks from the literature.
1 Introduction

Constraint logic programming (CLP) is an extension of logic programming that introduces a facility to write and solve constraints over a certain domain, where constraints are relations among objects. The CLP paradigm was proposed by Colmerauer [Colmerauer 87], and by Jaffar and Lassez [Jaffar and Lassez 87]. A similar paradigm (and languages) was proposed by the ECRC group [Dincbas et al. 88]. A sequential CLP language, CAL (Contrainte avec Logique), was also developed at ICOT [Aiba et al. 88]. The CLP paradigm is a powerful programming methodology that allows users to specify what (declarative knowledge) without specifying how (procedural knowledge). This abstraction allows programs to be more concise and more expressive. Unfortunately, the generality of constraint programs brings with it a higher computational cost. Parallelization is an effective way of making CLP systems efficient. There are two major levels of parallelizing CLP systems. One is the execution of an inference engine and constraint solvers in parallel. The other is the execution of a constraint solver itself in parallel. Several works have been published on extending this work from the sequential to the concurrent frame. Among them are a proposal of ALPS [Maher 87] that introduces constraints into a committed-choice language, a report on some preliminary experiments in integrating constraints into the PEPSys parallel logic system [Hentenryck 89], and a framework for a concurrent constraint (cc) language to integrate constraint programming with concurrent logic programming languages [Saraswat 89]. GDCC [Hawley 91b], Guarded Definite Clauses with Constraints, which satisfies this two-level parallelism, is a parallel CLP language that introduces the framework of cc into the committed-choice language KL1 [Ueda and Chikayama 90], and is currently running on the Multi-PSI, a loosely coupled distributed-memory parallel logic machine. GDCC has multiple solvers to enable a user to easily specify a proper solver for a domain: an algebraic solver, a boolean solver and a linear integer solver. The incremental evaluation facility is very important for CLP language solvers. That is, a solver must consider cases where constraints are dynamically added to it during execution, not only those cases where all are given statically prior to execution. The algebraic solver is used to solve non-linear algebraic equations, and can be applied to fields such as computational geometry and handling robot design problems [S. Sato and Aiba 90]. The solver uses the Buchberger algorithm [Buchberger 83, Buchberger 85], which is a method of solving multivariate polynomial equations. This algorithm is widely used in computer algebra and also fits reasonably well into the CLP scheme since it is incremental and (almost) satisfaction-complete, as shown in [Aiba et al. 88, Sakai and Aiba 89]. Re-
The boolean solver is used to solve boolean equations and can be applied to a wide range of applications such as logic circuit design. It uses the Boolean Buchberger algorithm [Yo Sato and Sakai 88]. It is different from the original Buchberger algorithm in load-balance of the internal processes, although they are basically similar. We implemented the parallel version of this algorithm, based on behavior analyses, using some example problems. The target problems for the linear integer solver are combinatorial optimization . problems such as scheduling problems, that obtain the minimum (or maximum) value with respect to an objective function in a discrete value domain under a certain constraint set. There are many kinds of formalization to solve the optimization problem, among them an integer programming that can be widely used for various problems. Integer programming still offers many methods of increasing search speed depending on the structures of problems, even if we focus on solving strictly optimized solutions only. The Branchand-Bound method can apply to wide extent of problems independently to problem structures. We developed a parallel Branch-and-Bound algorithm, aiming to implement a high-speed constraint solver for large problems, and to perform experiments for describing parallel search problem in KLI. The rest of this paper is organized as follows. We first mention the GDCC language and its system, and describe its parallel constraint solvers. Then, program examples in GDCC are shown using simple problems. Q ~ True or False We will present a brief summary of the basic concepts of cc[Saraswat 89]. The cc programming language paradigm models computation as the interaction of multiple cooperating agents through the exchange of information via querying and asserting the information into a (consistent) global database of constraints called the store. Constraints occurring in program text are classified by whether they are querying or asserting information, into the Ask and Tell constraints as shown in Figure 1. lStrand88 is similar to 1(L1, although somewhat less powerful in that it does not support full unification. Answer constraint \!:!!) This paradigm is embedded in a guarded (conditional) reduction system, where guards contain the Ask and Tell. Control is achieved by requiring that the Ask constraints in a guard are true (entailed), and that the Tell con~ straints are consistent (satisfiable), with respect to the current state of the store. Thus, this paradigm has a high affinity with KL1. . 2.1 GDCp Language GDCC is a member of the cc language family, although it does not support Tell in a guard part. The GDCC language includes most of KLI as a subset; KLI builtin predicates and unification can be regarded as the constraints of distinguished domain HERBRAND[Saraswat 89]. Now we define the logical semantics of GDCC as follows. S is a finite set of sorts, including the distinguished sort HERBRAND, F a set of function symbols, C a set of constraint symbols, P a set of predicate symbols, and V a set of variables. A sort i.s assigned to each variable and function symbol. A finite sequence of sorts, called a signature, is assigned to each function, predicate and constraint symbol. We define the following notations. • We write v : s if variable v has sort s, -7 S if functor and sort s, and S1 S 2··· Sn S1S2 ••. Sn Parallel eLP Language '--.,..,...----.""- Figure 1: The cc language schema • f : 2 Add constrain~ Query f has signature • p: S1S2 ... 
Sn if predicate or constraint symbols p has signature S1S2 ... Sn. We require that terms be well-sorted, according to the standard inductive definitions. An atomic constraint is a well-sorted term of the form C(t1' t 2, . .. , t n ) where c is a constraint symbol, and a constraint is a set of atomic constraints. Let 2::: be the many-sorted vocabulary F U CuP. A constraint system is a tuple (2:::, 6., V, C), where 6. is a class of 2::: structures. We define the following meta-variables: c ranges over constraints and g,h range over atoms. We can now define the four relations entails, accepts, rejects, and suspends. Let Xg be the variables in constraints c and C/. 332 Definition 2.1.1 c entails Cl Definition 2.1.2 c accepts Definition 2.1.3 c rejects c[ Cl ~ 61= (VXg)(c => Cl) Query ~f 6 1= (3)(c A Cl) ~6 1= (VXg)(c => -,cd Note that the property entails is strictly stronger than accepts, and that accepts and rejects are complementary. Definition 2.1.4 c suspends Cl ~ C accepts Cl A -, (c entails Cl ). A GDCC program is comprised of clauses that are defined as tuples (head, ask, tell, body), where "head" is a term having unique variables as arguments, "body" is a set of terms, "ask" is said to be Ask constraint, and "tell" is said to be Tell constraint. The "head" is the head part of the KLI clause, "ask" corresponds to the guard part 2 , and "tell" and "body" are the body part. A clause (h, a, c, b) is a candidate for goal 9 in the presence of store s if sAg = h entails a. A goal g commits to candidate clause (h, a, c, b), by adding t u c to the store s, and replacing 9 with b. A goal fails if the all candidate clauses are rejected. The determination of entailment for multiple clauses and commitment for multiple goals can be done in parallel. Below is a program of pony _and_man written in GD CC. pony_and_man(Heads,Legs,Ponies,Men) alg# Heads= Ponies + Men, alg# Legs= 4*Ponies + 2*Men. true I VVhere, pony_and_man(Heads,Legs,Ponies,Men) is the head of the clause, "I" is the commit operator, true is an Ask constraint, equations that begin with alg# are Tell constraints. alg# indicates that the constraints are solved by the algebraic solver. In a body part, not only Tell constraints, but normal KLI methods can also be written. In a guard part, we can only. write read-only constraints that never change the content of the store, i:(1 the same way as the KLI guard where active unification that binds a new value/structure to an undefined variable is inhibited. But, bi-directionality in the evaluation of constraints, the important characteristic of CLP, is not spoiled by this limitation. For example, the query ?- pony_and_rnan(5,14,Ponies,Men). will return Ponies=2, Men=3. Thus, we can evaluate a constraint bi-directionally as Tell constraints have no limitations like Ask. 2.2 GDCC System The GDCC system supports multiple plug-in constraint solvers with a standard stream-based interface, so that users can add new domains and solvers. Solve guard constraints Gncc source Figure 2: System Construction The system is shown in Figure 2. The components are concurrent processes. Specifically, a GDCC program and the constraint solvers may execute in parallel, "synchronizing" only and to the extent necessary, at the program's guard constraints. The GDCC system consists of: (i) Compiler Translates a GDCC source program into KLI code. (ii) Shell Translates queries and provides rudimentary debugging facilities. 
The debugging facilities comprise the standard KLI trace and spy functions, together with solverlevel event logging. The shell also provides limited support for incremental querying, in the form of inter-query variable and constraint persistence. (iii) Interface Interacts with a GDCC program (object code), sends body constraints to a solver and checks guard constraints using the results from a solver. (iv) Constraint Solvers Interact with the interface module and evaluate body constraints. The decision of entailment using a constraint solver is described in each solver's section, as it differs from each algorithm adopted by a solver. 2.3 Block A handling robot design support system [So Sato and Aiba 90] has been used as an experimental application of our CLP systems for a few years. In applying GDCC to this problem, two problems arose. These were the handling of multiple contexts and the synchronization between an inference engine and solvers. 2"ask" contains constraints in the HERBRAND domain, that is, it includes the normal guards in KLl. 333 To clarify the backgrounds to these problems, we explain the handling of multiple contexts in sequential CLP language CAL. CAL has a function to compute approximated real roots in univariate non-linear equations. For instance, it can obtain values X = ±.J2 from X 2 = 2. Using this facility, the handling robot design support system can solve a givE;n problem in detail. In this example, there are two constraint sets, one that includes X = .J2, and another that includes X = -.J2. CAL selects one constraint set from these two and solves it. Then the other is computed by backtracking (i.e., the system forces a failure). In other words, CAL handles these two contexts one- by-one, not simultaneously. In committed-choice language GDCC, however, we cannot use backtracking to handle multiple contexts. There are same problems in implementing hierarchical CLP language[K. Satoh and Aiba 90, K. Satoh 90b] in GDCC. The ether problem is the synchronization between an inference engine and solvers. It is necessary to describe to the timing and the target constraints to execute a function to find approximated real roots. In a sequential CLP, it is possible to control where this description is written in a program. While in GDCC, we need another kind of mechanism to specify a synchronization point, as a clause sequence in a program does not relate to the execution sequence. A similar situation occurs when a meta operation to constraint sets is required, such as computing a maximum value with respect to a given objective function. Constraint sets in GDCC are basically treated as global. Introducing local constraint sets, however, independence of the global ones, can eliminate these problems. Multiple contexts are realized by considering each local constraint as one context. An inference engine and solvers can be synchronized at the end point of the evaluation of a local constraint set. Therefore, we introduced a mechanism, called block, to describe the scope of a constraint set. We can solve a certain goal sequence with respect to a local constraint set. The block is represented in a program by a,builtin predicate call, as follows. call( Goals) using Solver-Package for Domain initial Input-Con giving Output-Con Constraints in goal sequence Goals are computed in a local constraint set. "using Solver-Package for Domain" denotes the use of Solver-Package for Domain in this block. "initial Input-Con" specifies the initial constraint set. 
"giving Output-Con" indicates that the result of computing in the block is Output-Con. Both local variables and global variables can be used in a block where the local variables are only valid within the block and the global ones are valid even outside the block. Local variables are specified by the builtin predicate alloc/2 that assigns variables to a block. Variables that are not allocated in a block are assumed to be global. Top level block Child block GDCCShell / KLl/PIMOS Listener ~.- .-. ri::-: ........ ". Create process ------~ Access streams ~ Constraints sets .......... -~ Figure 3: Implementation of block in GDCC A block is executed by evaluating Goals with respect to Input-Con. The result of Output-Con is a local constraint set, that is, it is never merged with the global ones unless specified explicitly by a user. Let us consider the next program. test:- true I alloc(200,A) , alg#A=-1, call( alg#A=1 call( alg#A=O initial nil giving CO, initial nil giving C1. This program returns the constraint set {A = I} as CO and the constraint set {A = a} as C1. The block mechanism is implemented by the three modules shown in Figure 3; an inference engine(block), a block handler and constraint solvers. To encapsulate failure in a block, the shoen mechanism of PIMOS[Chikayama et al. 88] is used. The block handler creates a block process, sends constraints from a block to a constraint solver, and goals to other processors. Each GDCC goal has a stream connecting to the block handler to which the goal belongs. 3 Parallel Constraint Solvers 3.1 Algebraic Solver 3.1.1 Domain of Constraint A constraint system that is the target domain of the algebraic solver is generally called a nonlinear algebraic polynomial equation. According to the definitions in Section 2.1, this can be formalized as the constraint system (~= FuCUP,6, V,C), where: S F {A} {x : AA -+ A, + : AA -+ A} U {fraction :-+ A} 334 n c {=} P {string starting with a lowercase letter} {string starting with an uppercase letter} axioms of complex numbers V 6. {:? n L: O:'i < L: /3;, ;=1 i=l n {:? or, n (L: O:'i, -O:'n, .. ·, -0:'2) - f' It(p') then F:=FU {It(p')+Rest(p)}, R:=R- {p} else R:= (R- {p}) U {Lt(p)+Rest(pHRU{fl}} endif end for F := F U Spoly(J', R)t, R:= R U {f'} endif endwhile output R (R is G(E» t: Spoly(J', R) is to be generated by S-polynomials between polynomial f' and all elements in rule set R. Definition 3;1.6 (Grabner basis [Buchberger 83]) Figure 4: Sequential Buchberger algorithm The Grabner basis G(E) is a finite set that satisfies the following properties. (i) X(E) = X(G(E)) (ii) For all f, g) f -g E X(E) iff f la= g la) especially) f E X(E) iff f la= 0) and) (iii) G is reduced if every element of the basis is irreducible w. r. t. all the others. From Theorem 3.1.1, the reduced G(E) can be regarded as being the canonical form of the solution of E = 0, because the reduced Grabner basis with respect to a given admissible ordering is unique. Moreover, when E = 0 does not have a solution, {I} E G(E) is deduced from Corollary 3.1.1. Definition 3.1.7 (Critical pair, S-polynomial) If two rewriting rules iI, h are not mutually prime) that is Lp(Jl) and Lp(h) have a' greatest common divisor other than 1) the pair fl, h is called the critical pair) and the polynomial made from this critical pair in the following way: L (f )lcm(Jl,f2) . f - L (f ) lcm(Jl,f2) . f c 2 Lp(iI) 1 c 1 Lp(J2) 2 is called S-polynomial and denoted by SpOly(Jl,h). where) lcm(Jl,fz) is the least common multiple of Lp(fl) and Lp(f2). 
Figure 4 shows the sequential version of the Buchberger algorithm. E denotes the input polynomial equation set, and R is the output Grabner basis. Line (4) indicates the rewriting process using R. Lines (7), (8) and (9) are the subsumption test in which the old rule set is updated by the newly generated rule. If the left hand side of an old rule is rewritten by the new rule, the rewritten rule goes back to equation set F. Line (12) is the S-polynomial generation. 3.1.3 Satisfiability, Entailment Based on the above results, we could determine satisfiability by using the Buchberger algorithm to incorporate the polynomial into the Grabner bases as per Corollary 3.1.1. But the method of Definition 3.1.6(ii) is incomplete in terms of deciding entailment, since the relation between the solutions and the ideal described in Theorem 3.1.1 is incomplete. For example, the Grabner basis of {X2 = O} is {X2 -+ O}, and rewriting using this Grabner basis cannot show that X = 0 is entailed. There are several approaches solving the entailment problem: (a) Use the Grabner basis of the radical of the generated ideal, X, i.e. {plpn E X}. Although it is theoretically computable, efficient implementation is not possible. (b) As a negation of p = 0, add pet. to the Grabner basis and use the Buchberger algorithm, where et. is a new variable. Iff 1 is included in the new Grabner basis, p= 0 is held in the old Grabner basis. This has the unfortunate side-effect of changing the Grabner basis. (c) Find n such that pn is rewritten to 0 by the Grabner basis of the generated ideal. Since n is bounded[Cangilia et a!. 88], this is a complete decision procedure. The bound, however, is very large. When there are a lot of resources to compute, and no more computation can be done, according to the method described in (c) we may adopt the incremental solution of repeatedly raising p from a small positive integer power and rewriting it by the Grabner basis. On the other hand, the total efficiency of the system is greatly affected by the 336 computation time in deciding entailment. Therefore, we determine the entailment by rewriting using a Grabner basis from the view point of efficiency, even though this method is incomplete. This decision procedure runs on the interface module parallel with the solver execution, as shown in Figure 2. Whenever a new rule is generated, the solver sends the new rule to the interface module via a communication stream. The interface determines entailment while storing (intermediate) rules to a self database. The interface updates the database by itself whenever a new rule from the solver arrives. It can also handle constraints such as inequalities in the guard parts, if they can be solved by passive evaluation. 3.1.4 Parallel Algebraic Solver There are two main sources of parallelism in the Buchberger algorithm, the parallel rewriting of a set of polynomials, and the parallel testing for stibsumption of a new rule against the other rules. Since the latter is inexpensive, we should concentrate on parallelizing the coarse-grained reduction component for the distributed memory machine. However, since the convergence rate of the Buchberger algorithm is very sensitive to the order in which polynomials are converted into rules, an implementation must be careful to select "small" polynomials early. Three different architectures have been implemented; namely, a pipeline, a distributed, and a master-slave architecture. 
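Before turning to those architectures, the satisfiability and entailment behaviour just described can be reproduced with an off-the-shelf Gröbner-basis package. The sketch below uses SymPy and is an illustration only, not the GDCC solver or its interface module.

    # Satisfiability and (incomplete) entailment checks via Groebner bases, using SymPy.
    from sympy import symbols, groebner, reduced

    x, y = symbols('x y')

    # Corollary 3.1.1 in action: an unsatisfiable system yields the basis {1}.
    print(groebner([x - 1, x - 2], x, y, order='grevlex'))   # -> GroebnerBasis([1], ...)

    # Incompleteness of entailment by rewriting: the basis of {x**2 = 0} is {x**2},
    # and the guard constraint x = 0 does not rewrite to 0 although it holds in
    # every solution -- the example discussed above.
    G = groebner([x**2], x, y, order='grevlex')
    quotients, remainder = reduced(x, list(G), x, y, order='grevlex')
    print(remainder)    # -> x, so this (deliberately incomplete) test misses the entailment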
The distributed architecture was already reported in [Hawley 91a, Hawley 91b], however, it has been greatly refined since then. The master-slave architecture also offers comparatively good performance. Thus, we touch on the distributed and master-slave architectures in the following sections. Distributed architecture The key idea underlying the distributed architecture is that of sorting a distributed set of polynomials. Each processor contains a complete set of rewriting rules and polynomials, and a load-distribution function w that logically partitions the polynomials by specifying which processor "owns" which polynomials. The position in the output rule sequence of each polynomial is calculated by its owning processor, based on an associated key (the leading power product), identical in every processor, and which does not change during reduction. A polynomial is output once it becomes the smallest remaining. The S-polynomials and subsumptions are calculated independently by each processor, so that the processors' sets of polynomials stay synchronized. As a background task, each processor rewrites the polynomials it owns, starting with those lowest in the sorted order. Termination of the algorithm is detected independently by each engine, when the input equation stream is closed, and when there are no polynomials remaining to be rewritten. Input Eqns Legend = Multi-writer stream c:=::J Not owned 1::::::::::::::::::::1 Owned Figure 5: Architecture of distributed type solver Figure 5 shows the architecture. The central data structures in the implementation are two work item lists: the global list and the local list. The global list, that contains all polynomials including both owned and not owned polynomials, is used to decide the order in which a processor can output a new rule based on the keys of polynomials. On the other hand, the local list consists of owned polynomials only. Items in the local list are rearranged by each processor to maintain increasing key order, whenever an owned polynomial is rewritten. There will be a situation where, when a processor is busy rewriting polynomials, another processor outputs a new rule. In such a case, any processor that receives a new rule must quit the current task as soon as possible to check subsumption and to update the old rule set. Continuing tasks while using the old rule set without interruption increases the number of useless tasks. To manage such interruption and resumption of rewriting, the complete execution of one piece of work is broken down into a three-stage pipeline; first polynomials are rewritten until the leading power products can be reduced no further, they are fully reduced, and thirdly the coefficients are reduced by taking the greatest common divisor among all coefficients of a polynomial. Based on this breakdown, we pipeline the execution of the entire list, giving us.maximum overlap between communication and local computation. Table 1 shows the results of benchmark problems to show the performance of this parallel algorithm, the benchmark problems are adopted from [Boege et al. 86, Backelin and Froberg91]. 
The monomial ordering is degree reverse lexicographic, and low level bignum (mul- 337 Table 1: Timing (sec) and speedup obtained with distributed architecture Problems Katsura-4 Katsura-5 Cyc5-roots Cyc6-roots 1 9.86 1 94.89 1 37.24 1 1268.96 1 Number of processors 2 4 8 7.48 5.34 4.82 2.05 1.32 1.85 62.43 48.20 39.95 1.52 1.97 2.38 20.02 22.52 33.33 1.12 1.86 1.65 1396.37 1555.58 817.07 0.909 0.816 1.55 16 5.94 1.66 40.52 2.34 29.73 1.25 3266.68 0.388 tiple precision integer) support on PIMOS is used for coefficient calculation. The method of detecting unnecessary S-polynomials proposed by [Gebauer and Moller 88] is implemented. Examples and their variable ordering are shown below. generating S-polynomials and detecting unnecessary Spolynomials are overlapped with every processor, second, the selection criteria of the next new rule is only a rough approximation as the keys of not owned polynomials are never updated during rewriting. Master-slave architecture In the distributed architecture, if the keys of other polynomials are updated according to their rewriting such that the global smallest polynomial can be found, then much communication between the processors is required. One simple way of avoiding such communication overhead is to have each processor output the local minimum polynomial and another processor decide the global minimum among them. Our third trial, therefore, is the masterslave architecture shown in Figure 6. < UI < U2 < U3 < U4) U6 - uo + 2ul + 2ui + 2U§ + 2Ul = 0 2UOUI + 2UI U2 + 2U2U3 + 2U3U4 - UI = 0 2UOU2 + 2Ul + 2U I U3 + 2U2U4 - U2 = 0 2UoU3 + 2U I U2 + 2UI U4 - U3 = 0 Uo + 2U I + 2U2 + 2U3 + 2U4 - 1 = 0 Katsura-4: (Uo < UI < U2 < U3 < U4 < Us) U6 - uo + 2Ul + 2ui + 2U§ + 2ul + 2Ug = 0 2UoUI + 2U I U2 + 2U2U3 + 2U3U4 + 2U4US - UI = 0 2UOU2 + 2Ul + 2U I U3 + 2U2U4 + 2U3US - U2 = 0 2UOU3 + 2U I U2 + 2UI U4+ 2U2US - U3 = 0 2UOU4 + 2U I U3 + 2UI U5 + U? - U4 = 0 Uo + 2U I + 2U2 + 2U3 + 2U4 + 2Us - 1 0 Katsura-5: (Uo Figure 6: Architecture of master-slave type solver = Cyclic 5-roots: (Xl < X 2 < X3 < X 4 < X 5) Xl +X2 +X3 +X4 +X5 = 0 X I X 2 + X 2X 3 + X 3X 4 + X 4X s + XSXI = 0 X I X 2X 3 + X 2X 3X 4 + X 3X 4X S + X 4X SX 1 + X SX 1 X 2 = 0 X1X2X3X4 + X 2X 3X 4X S +X3X 4X 5X 1 + X 4X SX 1 X 2 + X SX 1 X 2X 3 = 0 XIX2X3X4X5 = 1 Cyclic 6-roots: (Xl < X 2 < X3 < X 4 < Xs < X 6) Xl + X 2 + X3 + X 4 + Xs + X6 = 0 X I X 2 + X 2X 3 + X 3X 4 + X 4X S + XSX6 + X 6X I , X 1 X 2X3 + X 2X 3X 4 + X 3X 4X S +X4X 5X 6 + XSX6Xl + X 6X 1 X 2 = 0 =0 X 1X 2X 3X 4 + X 2X 3X 4X S + X3 X 4XSX6 +X4XSX6 X l + X 5X 6X 1 X 2 + X 6X 1 X 2X 3 = 0 XtX2X3X4XS + X2X3X4XSX6 + X3X4X5X6Xl +X4X5X6XIX2 + XSX6XIX2X3 + X6XIX2X3X4 XtX2X3X4X5X6 = 1 =0 Sometimes parallel execution is slower than sequential execution. Moreover a serious draw back occurs in the case of "cyclic 6-roots". The reasons are; first, redundant tasks increase in parallel since updating a rule set, The set of polynomials E is physically partitioned and each slave has a different part of them. The initial rule set of G(E) is duplicated and assigned to all slaves. New input polynomials are distributed to the slaves by the master. The reduction cycle proceeds as follows. Each slave rewrites its own polynomials by the G(E), selects the local minimum polynomial from them, and sends its leading power product to the master. The master processor awaits reports from all the slaves, and selects the global minimum power product. The minimum polynomial can be decided only after all the slaves have reported to the master. 
Those that are not minimums can be decided quickly, however. Thus, the not-minimum message is sent to the slaves as soon as possible, and- the processors receive the not-minimum message reduce polynomial by the old rule set while waiting for a new rule. On one hand, the slave that receives the minimum message converts the polynomial into a new rule and sends it to the master, the master sends the new rule to all the slaves except the owner. If several candidates are equal power products, all candidates are converted to rules by owner slaves and they go to final selection by the master. To make load balance during rewriting, each slave reports the number of polynomials it owns, piggybacked 338 onto leading power product information. The master sorts these numbers into increasing order and decides the order in which to distribute S-polynomials. After applying the unnecessary S-polynomial criterion, each slave generates the S-polynomials it should own corresponding to the order decided by the master. Subsumption test and rule update are done independently by each slave. Table 2 lists the results of the benchmark problems. The monomial ordering, bignum support and variable ordering are same as for the distributed architecture. Both absolute performance and speedup are improved compared with the distributed architecture. Speedup appears to become saturated at 4 or 8 processors except for "cyclic 6-roots". However, these problems are too small to obtain a good speedup because it takes about half a minute until all the processors become fully operational as the unnecessary S-polynomial criterion works well. Therefore, the relation between an ideal and solution and the relation between a solution and a Grabner basis is complete in a boolean polynomial. Thus, entailment can be decided by rewriting a guard constraint by a Grabner basis. The Boolean Buchberger algorithm differs from the (algebraic) Buchberger algorithm in the following points. That is, we have to consider self-critical pairs as well as critical pairs, where a self-critical pair polynomial (SCpolynomial) of boolean polynomial f is defined as X f + f for every variable X of Lp(J). As shown (ii) above, the coefficient calculation in the boolean solver is much cheaper than the algebraic solver, while self-critical pairs have to be considered. Thus, the load-balance of this algorithm is completely different from that of the algebraic solver. 3.2.1 Table 2: Timing and speedup of the master-slave architecture Problems Katsura-4 (sec) Katsura-5 (sec) Cyc.5-roots (sec) Cyc.6-roots (sec) 1 8.90 1 86.74 1 27.58 1 1430.18 1 Number of processors 2 4 8 5.83 6.53 7.00 1.27 1.53 1.36 31.89 57.81 39.88 2.72 2.18 1.50 21.08 19.27 19.16 1.43 1.44 1.31 863.62 433.73 333.25 3.30 4.29 1.66 16 9.26 0.96 36.00 2.41 25.20 1.10 323.38 4.42 Analysis of Sequential Algorithm and Parallel Architecture The sequential Boolean Buchberger algorithm is shown in Figure 7. Here EQlist is a list of input boolean constraints and GB is a Boolean Grabner basis. Numbers (1) to (6) indicate the step number of the algorithm. From Figure 7 we can see that the following are possible for parallel execution; (i) polynomial rewriting in step 6, (ii) monomial rewriting (lower granularity of (i)), (iii) subsumption test in step 4, (iv) SC-polynomial generation in step 5, and 3.2 Boolean Constraint Solver An algorithm called the Boolean Buchberger algorithm [Yo Sato and Sakai 88] has been proposed for boolean constraints. 
Boolean constraints are handled differently from algebraic constraints in the following points. (i) Multiplication and addition are logical-and and exclusive-or, respectively, in boolean constraints. (ii) Coefficients are boolean values, that is, 1 and O. So, a monomial is a product of variables. (iii) The power of a variable is equal to the variable itself (xn = X). So, a monomial is actually a product of distinct variables. From the property (iii), the theorem of a boolean polynomial that corresponds to Theorem 3.1.1 is as follows. Theorem 3.2.1 (Zero point theorem) Let f be a boolean polynomial. Every solution of E = 0 is also a solution of f=O, iff f E I(E). (v) S-polynomial generation in step 5. Since there is a communication overhead in the distributed memory machine, we have to exploit the most coarse-grained parallelism. To design a parallel execution model, we measured the execution time in each step in Figure 7 using two kinds of example program. One is a logic circuit problem for a counter circuit that counts the number of l's in a three-bit input and outputs the results as a binary code. The other is the n-queens problem where 4 queens have 80 equations with 16 variables, 5 queens have 165 equations with 25 variables, and 6 queens have 296 equations with 36 variables. The time ratio for each step is shown in Table 3. Table 3: Time ratio of each step (%) Problem 4queens 5queens 6queens circuit 1 25.8 6.4 1.0 2.1 2 3.2 3.4 1.5 4.2 Step number 3 4 5 6 8,4 17.3 25.4 19.0 22.3 3.7 14.5 50.4 15.0 2.0 2.6-- 77.7 8.2 3.3 8.0 74.0 Total(sec) 1.8 53.5 2240.0 70.7 339 Table 4: Time ratio in modified algorithm (%) input EQlist, GB EQlist := {p E EQlist I p 1GB# O} while EQlist (1) #0 q:= min{Lp(p) I p E EQlist} choose e E {p E EQlist I Lp(p) EQlist := EQlist - {e} r = e 1GB, RWlist := 0 { (2) = q} for every p E GB if Lp(p) =>r pi then GB:= GB - {p} RWlist := RWlist U {pi else GB:= (GB - {p}) (3) Problem 4queens 5queens 6queens circuit U {Lp(p) + Rest(p)} + Rest(p) 1GBu{r}} endif endfor Update Info Reducible EQs GB:=GBU{r} RWlist := RWlist U {pi + Rest(p)} ~ Reduced EQs • • Update Info Reducible EQs ~ • Figure 8: Parallel execution model endif endfor RWlist := RWlist U SCpoly(r)t U Spoly(r, GB) (5) while RWlist # 0 choose p E RWlist RWlist := RWlist - {p} (6) ~ ~ for every p E EQlist if Lp(p) =>r pi then EQlist := EQlist - {p} Step number 3' 4' 5' 12.1 23.5 36.4 24.7 11.2 54.0 15.5 39.9 41.9 8.2 33.5 51.7 ifp # 0 then if Lp(p) =>GB pi then RWlist := RT¥list U {pi else EQlist := EQlist U {p} endif endif endwhile endwhile output GB t: + Rest(p)} SCpoly( r) indicates the set of all self-crirical pair polynomials for r. Figure 7: Booelan Buchberger algorithm We can consider another parallel execution model by modifying the algorithm. Although Figure 7 shows all the reducible polynomials lumped together and rewritten in step 6, this reduction may be distributed to steps 3, 4 and 5. Moreover, reduction may be done in each step independently. Let steps 3', 4' and 5' denote the modified steps 3, 4 and 5. If execution times of steps 3', 4' and 5' are balanced after applying the modification to the algorithm, this model is also a good parallel execution model. However, as shown in Table 4, the times are not balanced. So, we can discard this possibility of parallelization. From the above analysis, it becomes clear that step 6 is the largest part of the execution, the other parts being small. 
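Returning briefly to properties (i)-(iii) above, a boolean polynomial can be modelled as a set of monomials, each monomial being a set of distinct variables. The Python sketch below is only an illustration of that representation, not the KL1 implementation, and the helper names b_add and b_mul are invented here.

    # Boolean polynomials as sets of monomials; a monomial is a frozenset of variables.
    def b_add(p, q):
        return p ^ q                        # addition is exclusive-or: equal monomials cancel

    def b_mul(p, q):
        out = set()
        for m1 in p:
            for m2 in q:
                out ^= {m1 | m2}            # multiplication is logical-and; x*x = x by set union
        return out

    X = frozenset({'x'})
    one = frozenset()                       # the empty monomial is the constant 1
    p = {X, one}                            # the polynomial x + 1
    print(b_mul(p, p) == p)                 # (x+1)*(x+1) = x + 1, since x**2 = x and 2x = 0
    print(b_add(p, {one}) == {X})           # (x+1) + 1 = x

Because coefficients are only 0 or 1 and exponents collapse, the cheap coefficient arithmetic noted above comes for free in such a representation.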
Therefore, we can determine the masterslave parallel execution model to make the best use of parallelism in step 6, as shown in Figure 8. The. controller (master) is in charge of step 1 to step 5 in the algorithm and the other reducers (slaves) reduce polynomials by G B. The message from the controller to the reducers consists of update information for G Band the polynomials to be rewritten. After receiving the message, the reducer first updates its current GB according to the update information, rewrites the polynomials from the controller, and finally sends the results of the reduction to the controller. As the controller becomes idle after sending the message, the controller also acts as a reducer during the reduction process. The number of polynomials sent to each reducer is kept as equal as possible to balance the loads for each processor. 3.2.2 Implementation and Evaluation Having implemented the above parallel execution model in KL1, the following improvement was made. Improvement 1 We can remove redundant equations from EQlisi, produced by deleting rules in step 3, prior to their distribution. Although this removal can be done in each reducer, the distributed tasks may not be well balanced since the removal of tasks is much less involved than reduction. Improvement 2 We can distinguish rules of the form "x = A" ( "A" is variable) from other rules since these rules express assignments only and we need not consider SC-polynomials nor S-polynomials for these rules. These rules are stored differently in the controller and, if a new equation is input, we first apply these assignments in the controller to the equation. 340 By this application, reducers do not have to store such rules and the time needed to generate an SCpolynomial and S-polynomial can be saved. Improvement 3 If the right hand side (RHS) of a rule is 0, then no SC-polynomial can be produced. If both RHSs of two rules are 0, then an S-polynomial cannot be produced. Therefore, the RHS of a rule is checked first. This technique is also effective for the sequential version. Table 5 lists the execution times and the improvement ratio for the 6 queens. problem. Table 5: Timing and improvement ratio Number of PEs Original version (sec) Improved version (sec) Improvement ratlO(%) 1 3735 2489 66.6 2 2400 1706 71.1 4 1745 1223 70.1 8 1539 1142 74.2 16 1262 1092 86.5 Let a purely sequential part in the parallel execution model be a and its parallel executable part be b. Then, we can approximate the execution time for n PEs as (a+b)/n. By calculating a and bfrom the data, we obtain a = 1130, b = 2590 for the original version, and a = 930, b = 1540 for the improved version. This means that the parallel executable part constitutes 70% to the entire execution for the original version and 62% for the improved version. Since we parallelized the sequential algorithm to obtain the original version, 70% is a satisfactory ratio for parallel execution since this ratio is very near to the upper bound value calculated from the analysis of the sequential algorithm. The difference is caused by the task distribution overhead. In the improved version, the ratio of the parallel executable part is decreased because of the increase in the number of controller tasks. However, this result is encouraging since the overall performance is improved. 
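Reading the execution-time model above as T(n) ≈ a + b/n (a fixed serial part a plus a parallel part b shared evenly by n processors, which is how the reported a and b fit the measurements), the two coefficients can be recovered from Table 5 by an ordinary least-squares fit. The sketch below only reproduces that arithmetic with NumPy; it is not part of the system.

    # Fitting T(n) = a + b/n to the original-version timings of Table 5 (a sketch).
    import numpy as np

    pes   = np.array([1, 2, 4, 8, 16], dtype=float)
    times = np.array([3735, 2400, 1745, 1539, 1262], dtype=float)   # seconds

    b, a = np.polyfit(1.0 / pes, times, 1)          # linear in 1/n: slope b, intercept a
    print(f"a (serial) ~ {a:.0f} sec, b (parallel) ~ {b:.0f} sec")  # close to 1130 and 2590
    print(f"parallel executable fraction ~ {b / (a + b):.0%}")      # roughly 70%, as stated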
3.3 Integer Linear Constraint Solver The constraint solver for the integer linear domain checks the consistency of the given equalities and inequalities of rational coefficients, and gives the maximum or minimum values of the objective linear function under these constraint conditions. The integer linear solver utilizes the rational linear solver for the optimization procedure to obtain the evaluation of relaxed linear problems created as part of the solution. A rational linear solver is realized by the simplex algorithm. The purpose of this constraint solver is to provide a fast solver for the integer optimization domain by achieving a computation speedup by incorporating the search process into a parallel program. These solvers can determine satisfiability and entailment. Satisfiability can be easily checked by the simplex algorithm. Entailment is equivalent to negation failure with respect to a constraint set. In the following we discuss the parallel search method employed in this integer linear constraint solver. The problem we are addressing is a mixed integer programming problem, to find a maximum or minimum value of a given linear function under integer linear constraints. The method we use is the Branch-and-Bound algorithm. The Branch-and-Bound algorithms proceed by dividing the original problem into two child problems successively, prod ucing a tree-structured search space. If a certain node gives an actual integer solution (that is not necessarily optimal), and if other search nodes are guaranteed to have lower objective function values than that solution, then the latter nodes need not be searched. In this way, this method prunes sub-nodes through the search space to effectively cut down computation costs, but those costs still become quite high for large-scale problems, since the costs increase in an exponentially with the size of the problem. As a parallelization of the Branch-and-Bound algorithm, we distribute the search nodes created through the branching process to different processors, and let these processors work on their own sub-problems sequentially. Each sequential search process communicates with other processes to prune the search nodes. Many search algorithms utilize heuristics to control the schedule of the order of the sub-nodes to be searched, thus reducing the number of nodes needed to obtain the final result. Therefore it is important, in parallel search algorithms, to balance the distributed load among processors, and to communicate information for pruning as quickly as possible between these processors. We adopted one of the best search heuristics used in sequential algorithms. 3.3.1 Formulation of Problems We consider the following mixed-integer linear optimization problems. Problem - ILP Minimize the following objective function of real variables x j and integer variables Yj, n Z = 2: PiXi + 2: qiYi i=l i=l under the linear constraint conditions: n m 2: aijXj + 2: bijYj 2: ej i=l for 1 S:.j S:.l i=l n m i=l i=l 2: CijXj + 2: dijYj = Ii for 1 S:.j S:. k where is:. Xi E R and Xi 2: 0 for 1 S:. n YiEZ where lis:'Yis:'ui and li,uiEZ for 1S:.iS:.m aij, bij , Cij, dij , ei, fi are real constants. 341 In practical situations integer variables Yj often take only 0,1, but here we consider the general case. 3.3.2 Sequential Branch-and-Bound Algorithm As a preparation to solve the above mixed-integer linear problems I LP, we consider the continuously-relaxed problem LP. 
Problem - LP Minimize the following ob jective function of real variables Xj,Yj, Z = n m i=l i=l I: PiXi + I: qiYi under the linear constraint conditions: n m i=l i=l m I: aijXj + ~ bijyj ~ ej n I: CijXj + I: dijYj = Ii i=l for 1 S;j S; 1 While the above branching process can only enumerate integer solutions, if we have a means of guaranteeing that a sub-problem cannot have a better solution than the already obtained integer solutions in terms of the optimum value of the objective function, then we can skip these sub-problems and need only search for the remaining nodes. For mixed-integer linear problems we can use the solutions for continuously relaxed problems as a criterion for pruning. Continuously relaxed problems always have a better optimum value for the objective function than the original integer problems. Sub-problems whose continuously relaxed problems have no better optimum than the already obtained integer solution cannot give a better optimum value, hence it becomes unnecessary to search further (bounding procedure). Branch-and-Bound methods repeat branching and bounding in this way to obtain the final optimum. These sub-problems obtained through the branching process denote search nodes. for 1 S;j 5: k Sequential algorithm i=l where E R and Xi ~ 0 for 1 S; i 5: n Yi E R where Ii S; Yi S; Ui and li, Ui E Z for 1 S; i S; m Xi aij, bij , Cij, d ij , ei, Ii are real constants. LP can be solved by the simplex algorithm. If the values of original integer variables are exact integers, then it also gives the solution of ILP. Otherwise, we take a non-integer value Ys for the solution of LP, and impose two new interval constraints Ys, Is S; Ys S; [ys] and [ys] + 1 S; Ys S; Us, where Ys is an integer variable, and obtain two child problems (Figure 9). Continuing this procedure, called branching, we continue to divide the search space to produce more constrained sub-problems as we proceed deeper into the tree structured search space. Eventually this process leads to a sub-problem having a continuous solution that is also an integer solution to the problem. Also we can select the best integer solution from those found in the process. Step 0 Initial setting Let ILPo mean the original problem ILP, and N mean the set of search nodes. Set N to {ILPo }, and solve a continuously relaxed problem LPo ' If an integer solution is obtained go to Step5. Otherwise set the incumbent solution z to (X) and go to Step1. Step 1 Selecting branching no de If N = 0, then go to Step5. If N =I- 0, then select the next branching node I LPk out of N following the heuristics, and go to Step2. Step 2 Selecting branching variable and branch Select the integer variable Ys to be used for the branching process to work on I LPk according to the heuristics, and branch with respect to it. Let the resulting two nodes be ILPk " ILPk " Go to Step3. Step 3 Continuously relax two nodes Solve two continuously relaxed problems LPk , and LPk " by the simplex algorithm. Go to Step4. z: ::; Ys ::; [ii:J W:J+l::; Ys::; ii:' = W:J ii:" = W:J +1 u: Figure 9: Branching of nodes Step 4 Fathom two children nodes If relaxed problem LPk , does not have a solution, or gives a solution Zk' that is no better than the incumbent solution, in other words Zk' > z, then stop searching (bounding operation). If the point (x k ', f/J to achieve a solution Zk' has integer value y and moreover gives a better solution than the incumbent solution obtained so far, in other words Zk' < Z, then let z = Zk', X = xk' and fJ = yk' (revision of the incumbent). 
342 If (X k ', i/) is not an integer solution and gives a better optimum value than the incumbent, then add this node, N:= N U {I LPk ,} (Addition of a node). Do the same thing to I LPk ", and go to Step1. Step 5 End step If Z i 00, then let the incumbent mum solution. (x, y) be the opti- If Z = 00, then problem I LP has no solution. 3.3.3 Heuristics for Branching The following two factors determine the schedule of the order in which the sequential search process goes through the nodes in the search space: Selection of the branching variable To select the branching variable when trying to branch at the node I LPk , v-i. If no incumbent solution is found, select the variable yj from those integer variables that do not take exact integer values in (xk, f/), and which gives the greatest difference between the two increases in the heuristic value, namely the one to attain maxj{lpup(j)(1- fj) - Pdown(j)!i Ii !inon-integer} v-ii. If an incumbent solution is found, select the variable yj out of those integer variables that do not take exact integer values in (xk, f/), and which gives the maximum of the minimum value of the left and right side heuristic values, namely that to attain maxj{min{pup(j)(l-!i),Pdown(j)fii finon-integer} 1. The priorities of sub-problems(nodes) to decide the next node on which the branching process operates. 3.3.4 2. Selection of a variable out of the integer variables with which the search space is divided. It is preferable that the above selections are done in such a way that the actual nodes, searched in the process of finding the optimal, form as small a part as possible within the total search space. We adopted one of the best heuristics of this type from operations research as a basis of our parallel algorithm([Benichou et al. 71]). Selection of sub-problems We use a combination of depth-first strategy and bestfirst strategy(w.r.t. heuristic function). In each branching process, what is called the pseudo-costs Pup(j), Pdown(j) of integer variables Yj are computed. These are the increase ratios of the optimum value of the continuously relaxed problem with regard to those integer variables. In the next heuristic function h( I LPk ) of the node is calculated: h(I LPk) = Zk + Lj~l min{pup(j)(l- fJ,Pdown(j)!i}, !i = YJ - [YJ]' Suppose the node I LPk is divided into I LPk , and ILPk ". n-I. When at least one of these two nodes is not yet terminated, select the one having a better(i.e., smaller) heuristic value h(I LP) as the next branching node (depth-first). n-ii. When both have terminated, a. if no incumbent solution has yet been found, select the latest node to which branching has been done (depth-first). b. if an incumbent solution has already been found, select the node having the best heuristic function value (best-first). Parallel Branch-and-Bound Method The parallel algorithm derived from the above sequential algorithm is implemented on Multi-PSI. Our parallel algorithm exploits the independence of many sub-processes created through branching in the sequential algorithm, distributing these processes to different processors. What is necessary here is that the search space is divided as evenly as possible among processors to achieve good load balance, and that the pruning operation is performed by all the processors simultaneously. Also, incumbent solutions found in each processor need to be communicated between processors. The details of the parallel algorithm is described in the following. 
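As a point of reference for the parallel scheme described below, a minimal sequential version of Steps 0 to 5 can be sketched with SciPy's linprog standing in for the simplex solver of the continuous relaxations. This is an illustration under simplifying assumptions (inequality constraints only, last-in-first-out node selection instead of the pseudo-cost heuristics, and the invented function name branch_and_bound); it is not ICOT's KL1 solver.

    # A sequential branch-and-bound sketch for Steps 0-5 of Section 3.3.2 (an illustration).
    import math
    from scipy.optimize import linprog

    def branch_and_bound(c, A_ub, b_ub, integer_idx, bounds):
        best_z, best_x = math.inf, None
        nodes = [bounds]                       # each node is a list of (lower, upper) bounds
        while nodes:
            nb = nodes.pop()                   # Step 1: select a branching node (LIFO here)
            res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=nb, method='highs')
            if not res.success or res.fun >= best_z:
                continue                       # Step 4: bounding (infeasible, or no better)
            fractional = [i for i in integer_idx
                          if abs(res.x[i] - round(res.x[i])) > 1e-6]
            if not fractional:                 # an integer solution: revise the incumbent
                best_z, best_x = res.fun, res.x
                continue
            i = fractional[0]                  # Step 2: choose a branching variable
            lo, hi = nb[i]
            down, up = list(nb), list(nb)
            down[i] = (lo, math.floor(res.x[i]))        # y_i <= [y_i*]
            up[i]   = (math.floor(res.x[i]) + 1, hi)    # y_i >= [y_i*] + 1
            nodes += [down, up]                # Step 3: keep both children for relaxation
        return best_z, best_x                  # Step 5: the incumbent is the optimum

    # Example: minimize -Y1 - Y2 subject to 2*Y1 + Y2 <= 5, Y1 + 2*Y2 <= 5, Y integer in [0, 4].
    print(branch_and_bound([-1, -1], [[2, 1], [1, 2]], [5, 5], [0, 1], [(0, 4), (0, 4)]))
    # -> roughly (-3.0, array([2., 1.]))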
Load balancing One parent processor works on the sequential algorithm up to a certain depth d of the search tree. It then creates 2d child nodes and distributes them to other processors as shown in Figure 10. These search nodes are allocated to different processors cyclically, where each of the processors works on these sub-problems sequentially. Therefore, load balancing is static in this case. Distribution is done only at a certain depth of the search tree, to prevent the granularity of a node from being too small and to decrease the communication costs. Heuristics for pruning Each processor has a share of a certain number of subproblems assigned, and works on these nodes with the same heuristics of branching node selection and branching variable selection as those of the sequential case. For the node selection heuristics, we use the priority control facility of KL1, to assign priorities to the search nodes on which the best-first strategy with the heuristic function can depend. (See lOki et al. 89] for details of this /' technique.) 343 Table 6: Speedup Processors Speedup N umber of nodes 1 1.0 242 2 1.5 248 4 1.9 395 8 2.3 490 One of the problems in parallel search algorithms is how to decrease the growth of the size of the total search space compared with the sequential search algorithms. Figure 10: Generation of parallel processes Transfer of global data While the search space is distributed among different processors, if the information to prune nodes is not communicated well among them, then the processor has to work on unnecessary nodes, and the overall work becomes larger compared with the sequential version. This causes a reduction in the computation speed. Therefore, incumbent solutions are transferred between processors to be shared so that each processor can update the current incumbent 'solution as soon as possible (Figure 11). This is realized by assigning a higher priority to the goal responsible for data transfer in the program. o t o Parent node Downt up Down2 0 Child node Figure 11: Report stream between nodes 3.3.5 GDCC Program Examples Example 1 : integer programming The following program is a simplified version of the integer programming used to find the integer solution that gives the minimum (or maximum) value of an objective function under given constraints. This program shows the basic structure of the Branch-and-Bound method. module pseudo_integer_programming. public integer_pro/3. integer_pro(X,Y,Z):- true call ((simplexIX>=5, simplexIX+2*Y>=-3, simplexIX+Y-Z<=5)) initial nil giving Co, take_min(Co). take_min(Co):- true cal1(simplexlmin(X+Y,Ans)) initial Co giving Coi, (Ans={minusinfinite,_} -> error; otherwise; Ans={_, [X=ValX!_]} -> check(ValX,Co)). Node ILPk Down~pl~p2 o 4 Experimental Results We implemented the above parallel algorithm in KLl, and experimented with job-shop scheduling problem. Table 6 shows. a result of computation speedups for a 4job3machine problem and the total number of searched nodes to get to the solution. The situation often occurs where a processor visits an unnecessary node before the processor receives pruning information. This is because communication takes a time, and certainly cannot be instantaneous, in a distributed memory machine. Table 6 shows a case where this actually happens. check(ValX,Co):- kli!integer(ValX) solve_another_variables(Co). otherwise. check(ValX,Co):- true! floor(ValX, SupX, InfX) , call (simplexIX=
=SupX) initial Co giving Co2, take_min(Co2). The block in the clause integer _pro solves a set of constraints. The block in the clause take..min finds the minimum value of the given objective function. If the minimum value exists (not -00), check is called. In clause check, if the value of X, that gives the minimum value of the objective function is not an integer, two new constraints are added in order to the X become integer (for instance, if X = 3.4 then X >= 4 and X <= :3), and the minimum values with respect to the new constraints are solved again. Method k11! integer decides whether the value X is an integer. Where, kl1! indicates KLI 344 method calling, a KL1 method is called from the GDCC program using this notation. Synchronization between the inference engine and the solver to get the minimum value is achieved by the blocks in integer _pro and take...Jllin. Multiple contexts are shown by t~e two blocks of check. CA Figure 12: A triangle and its parameters Example 2 : geometric problem Next, we show how to use a function to find the approximated roots of uni-variate equations and how to handle multiple contexts using an example which is also used in [Aiba et al. 88]. :- module heron. :- public tri/4, testl/4, test2/4. tri(A,B,C,S) :- true I alloc(10,CA,CB,H), alg#C=CA+CB, alg#CA**2+H**2=A**2, alg#CB**2+H**2=B**2, alg#H*C=2*S. testl(A,B,C,S) :- true I call( tri(A,B,C,S) ) initial nil giving GB, output1(GB). 1. output to a window screen test2(A,B,C,S) :- true call( tri(A,B,C,S) initial nil giving GB, Err= 1/100000000, kll!find:find(GB,Err,1 ,SubGB,UniEqs,UniSols), kl1!find:sol(SubGB,UniSols,Err,l,FGB), check (FGB , S). check([], _) :- true I true. check([FGBIFGBs], S) :- true call( check_ask(S,Ans) ) initial FGB giving Sol, check_sub(Ans, Sol, FGBs, S). check_sub (true , Sol, FGBs, S) :- true output (Sol) , 1. output to a 'window screen check (FGBs, S). check_sub (false , _, FGBs, S) check (FGBs, S). check_ask(S, Ans) check_ask(S, Ans) alg.S > 0 I Ans = true. alg.S =< 0 I Ans = false. Figure 12 shows the meaning of the constraints set contained in clause tri, where ** in equations indicates a power operation. CA,CB,H are local variables, A, B, C represents the three edges of a triangle, and S is its area. alloc(Pre JVarl J. " J YarN) is a declaration to give precedence Pre to variables Vari""J YarN. A monomial in:. eluding a variable that has the highest Pre is the highest monomial, that is the precedence of variables is stronger than the degree in comparison. If the goal, CB ?- alloc(0,A,B,C),alloc(5,S), heron:testl(A,B,C,S). is given, in which all parameters are free, this program outputs a Grabner basis consisting of seven rules. Among them is the following rule that contains only A, B, C and ' s. S**2= -1/16*C**4+1/8*C**2*B**2+1/8*C**2*A**2 -1/16*B**4+1/8*B**2*A**2-1/16*A**4. This is equivalent to Heron's formula. Of course, this program can be executed by a goal with concrete parameters. For example, when the goal, ?- alloc(5,S), heron:testl(3,4,5,S). is given, the program produces S**2= 36. However, the Buchberger algorithm cannot extract discrete values from this equation, as shown in section 3.1.2. Method test2 approximates the real roots from a Grobner basis, if the basis contains uni-variate equations. If the goal ?- alloc(5,S), heron:test2(3,4,5,S) is given, first the constraint set is solved to obtain Grobner basis GB using the call predicate, then univariate equations are extracted from GB using the method find: kll!find:find(GB,Err,l,SubGB,UniEqs,UniSols). 
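The Heron example above can be checked independently with any Gröbner-basis package: eliminating the local variables CA, CB and H from the constraints of clause tri reproduces the rule relating S to A, B and C. The SymPy sketch below is an illustration of that computation, not the CAL/GDCC solver.

    # Eliminating CA, CB, H from the triangle constraints with a lex Groebner basis (SymPy).
    from sympy import symbols, groebner

    CA, CB, H, A, B, C, S = symbols('CA CB H A B C S')
    constraints = [C - (CA + CB),           # alg# C = CA + CB
                   CA**2 + H**2 - A**2,     # alg# CA**2 + H**2 = A**2
                   CB**2 + H**2 - B**2,     # alg# CB**2 + H**2 = B**2
                   H*C - 2*S]               # alg# H*C = 2*S

    G = groebner(constraints, CA, CB, H, A, B, C, S, order='lex')
    heron = [g for g in G if not ({CA, CB, H} & g.free_symbols)]
    print(heron)                                           # Heron's formula, as in the basis above
    print([g.subs({A: 3, B: 4, C: 5}) for g in heron])     # reduces to 16*S**2 - 576, i.e. S**2 = 36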
Where, UniSols contains the all combinations of solutions with precision Err, UniEqs is a set of the uni-variate equations extracted from Grobner basis GB, and SubGB is the basis remaining after removing the uni-variate equations. The next method sol obtains a new Grobner basis FGB by asserting the combinations of approximated solutions UniSols into SubGB. It is necessary to modify the Buchberger algorithm to handle approximated solutions, as explained in [Aiba et al. 91]. FGB contains plural Grobner bases in list format, and these bases are filtered by the method check, which checks whether S> 0 is satisfied at the guard of the sub-block check_ask. 345 5 Conclusion GDCC is an instance of the cc language and satisfies two levels of parallelism: the execution of an inference engine and solvers in parallel, and the execution of a solver in parallel. A characteristic of a cc language is that it is more declarative than sequential CLP languages, since the guard part is the only synchronization point between an inference engine and solvers. GDCC inherits this characteristic and, moreover, it has a block mechanism to synchronize meta-operations with constraints. In the latest (master-slave) version of the parallel algebraic solver, the parallel execution of "cyclic 6-roots" with 16 processors is 4.42 times faster than execution with a single processor. With the boolean solver, parallel execution of the 6 queens problem with 16 processor is 2.28 times faster than with a single processor. We also show the realiza.tion of fast parallel search for mixed integer programming using the Branch-and-Bound algorithm. The following items are yet to be studied. As shown in the program examples, current users must describe everything explicitly to handle multiple contexts. Thus, support faculties and utilities to handle multiple contexts are required. We will also improve the parallel constraint solvers to obtain both good absolute performance and better parallel speedup. The algebraic solver requires parallel speedup. The boolean solver needs to increase the parallel executable parts of its algorithm. The linear integer solver has to improve the ratio of pruning in parallel execution. Through these refinements and experiments using the handling robot design system, we can realize a parallel CLP language system that has high functionality in both its language facilities and performance. {) Acknowledgments We would like to thank Professor Makoto Amamiya at Kyushu University and the members of the CLP working group for their discussions and suggestions. We would also like to thank Dr. Fuchi, Director of the ICOT Research Center, and Dr. Hasegawa, Chief of the Fourth and Fifth Laboratory, for their encouragement and support in this work. References [Aiba et al. 88] A. Aiba, K. Sakai, Y. Sato, D. Hawley and R. Hasegawa. Constraint Logic Programming Language CAL. In International Conference on Fifth Generation Computer Systems, pages 263-276, 1988. [Aiba et al. 91] A. Aiba, S. Sato, S. Terasaki, M. Sakata and K. Machida. CAL: A Constraint Logic Programming Langua.ge - Its Enhancement for Application to Handling Robots -. Technical Report TR-729, Institute for New Generation Computer Technology, 1991. [Backelin and Froberg 91] J. Backelin and R. Froberg. How we proved that there are exactly 924 cyclic 7roots. In S. M. Watt, editor, Proc. ISSAC'91 pages 103-111, ACM, July 1991. [Benichou et al. 71] M. Benichou, L. M. Gauthier, ,P. Girodet, G. Hentges, G. Ribiere and O. Vincent. 
Experiments in Mixed-Integer Linear Programming. In Mathematical Programming 1 pages 76-94. 1971. [Boege et al. 86] W. Boege, R. Gebauer and H. Kredel. Some Examples for Solving Systems of Algebraic Equations by Calculating Groebner Bases. Symbolic Computation, 2( 1 ):83-98, 1986. [Buchberger 83] B. Buchberger. Grobner bases:An Algorithmic Method in Polynomial Ideal Theory. Technical report, CAMP-LINZ, 1983. [Buchberger 85] B. Buchberger. Grabner bases:An Algorithmic Method in Polynomial Ideal Theory. In N. K. Bose, editor, Multidimensional Systems Theory, pages 184-232. D. Reidel Publishing Company, 1985. [Cangilia et ai. 88] L. Caniglia, A. Galligo and J. Heintz. Some new effectivity bounds in computational geometry. In Applied Algebra, Algebraic Alg01'ithms and Error-CorrEcting Codes - 6th International Conference, pages 131-151. Springer-Verlag, 1988. Lecture Notes in Computer Science 357. [Chikayama et al. 88] T. Chikayama, H. Sato and T. Miyazaki. Overview of Parallel Inference Machine Operationg System (PIMOS). In International Conference on Fifth Generation Compute1' Systems, pages 230-251, 1988. [Clarke et ai. 90] E. M. Clarke, D. E. Long, S. Michaylov, S. A. S.chwab, J. P. Vidal, and S. Kimura. Parallel Symbolic Computation Algorithms. Technical Report CMU-CS-90-182, Computer Science Department, Carnegie Mellon University, October 1990. [Colmerauer 87] A. Colmerauer. Opening the Prolog III Universe: A new generation of Prolog promises some powerful capabilities. BYTE, pages 177-182, August 1987. [Dincbas et al. 88] M. Dincbas, P. Van Hentenryck. H. Simonis, A. Aggoun, T. Graf and F. Bert heir . The Constraint Logic Programming Language CHIP. In International Conference on Fifth Generation Computer Systems, pages 693-702, 1988. 346 [Gebauer and Maller 88] R. Gebauer and H. M. Maller. On an installation of Buchberger's algorithm. Symbolic Computation, 6:275-286, 1988. [Hawley 91a] D. J. Hawley. A Buchberger Algorithm for Distributed Memory Multi-Processors. In The first International Conference of the A usirian Center' for Parallel Computation, Salzburg, September, 1991. Also in Technical Report TR-677 Institute for New Generation Computer Technology, 1991. [Hawley 91b] D. J. Hawley. The Concurrent Constraint Language GDCC and Its Parallel Constraint Solver. Technical Report TR-713 Institute for New Generation Computer Technology, 1991. [Hentenryck 89] P. Van Hentenryck. Parallel Constraint Satisfaction in Logic Programming: Prelimiary Results of CHIP within PEPSys. In 6th International Conference on Logic Programming, pages 165-180, 1989. [Jaffar and Lassez 87] J. Jaffar and J-L. Lassez. Constraint Logic Programming. In 4th IEEE Symposium on Logic Programming, 1987. [Li 86] G.-J. Li and W. W. Benjamin. Coping with Anomalies in Parallel Branch-and-Bound Algorithms, IEEE Trans. on Computers, 35(6): 568-,573, June 1986. [Maher 87] M. J. Maher. Logic Semantics for a Class of Committed-choice Programs. In Proceedings of the Fourth International Conference on Logic Programming, pages 858-876, Melbourne, May 1987. lOki et al. 89] H. Oki, K. Taki, S. Sei, and M. Furuichi. Implementation and evaluation of parallel Tsumego program on the Multi-PSI. In Proceedings of the Joint Parallel Processing Symposium (JSSP'89), pages 351-357, 1989. (In Japanese). [Ponder 90] C. G. Ponder. Evaluation of 'Performance Enhancements' in algebraic manipulation systems. In J. D. Dora and J. Fitch, editors, Computer Algebra and Parallelism, pages 51-74, Academic Press, 1990. [Quinn 90] M. J. Quinn. 
Analysis and Implementation of Branch-and-Bound Algorithms on a Hypercube Multiprocessor. IEEE Trans. on Computers, 39(3 ):384387, March 1990. [Sakai and Aiba 89] K. Sakai and A. Aiba. CAL: A Theoritical Background of Constraint Logic Programming and its Applications. Symbolic Computation, 8(6):589-603, 1989. [Saraswat 89] V. Saraswat. Concurrent Constraint Programming Languages. PhD thesis, Carnegie-Mellon University, Computer Science Department, January 1989. [K. Satoh and Aiba 90] K. Satoh and A. Aiba. Hierarchical Constraint Logic Language: CHAL. Technical Report TR-592, Institute for New Generation Computer Technology, 1990. [K. Satoh 90b] K. Satoh. Computing Soft Constraints by Hierarchical Constraint Logic Programming. Technical Report TR-610, Institute for New Generation Computer Technology, 1990. [So Sato and Aiba 90] S. Sato and A. Aiba. An Application of CAL to Robotics. Technical Memorandum TM-1032, Institute for New Generation Computer Technology, 1990. [Y. Sato and Sakai 88] Y. Sato and K. Sakai. Boolean Grabner Base, February 1988. LA-Symposium in winter, RIMS, Kyoto University. [Senechaud· 90] P. Senechaud. Implementation of a Parallel Algorithm to Compute a Grabner Basis on Boolean Polynomials. In J. D. Dora and J. Fitch, editors, Computer Algebra and Parallelism, pages 159166. Academic Press, 1990. [Siegl 90] K. Siegl. Grabner Bases Computation in STRAND: A Case Study for Concurrent Symbolic Computation in Logic Programming Languages. Master's thesis, CAMP-LINZ, November 1990. [Takeda et al. 88] Y. Takeda, H. Nakashima, K. Masuda, T. Chikayama and K. Taki. A load balancing mechanism for large scale multiprocessor systems and its implementation. In International Conference on Fifth Generation Computer Systems, pages 978-986, 1988. [Ueda and Chikayama 90] K. Ueda and T. Chikayama. Design of the Kernel Language for the Parallel Inference Machine. Computer Journal, 33(6):494-500, December 1990. [Vidal 90] J. P. Vidal. The Computation of Grabner Bases on a Shared Memory Multi-:-processor. Technical Report CM'U-CS-90-163, Computer Science Department. Carnegie Mellon University, August 1990. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 347 cu-Prolog for Constraint-Based Grammar Hiroshi TSCDA Institute for New Generation Computer Technology (leOT) 1-4-28 Mita, NIinato-ku. Tokyo 108. Japan E-mail: tsudaiQ!icot.or.jp Abstract cu-Prolog is a constraint logic programming (CLP) language appropriate for natural language processing such as a Japanese parser based on JPSG. Compared to other CLP languages, cu- Prolog has several unique features. Most CLP languages take algebraic equa.tions or inequations as constraints. cu-Prolog, on the other hand, takes the Prolog atomic formulas of userdefined predicates. cu-Prolog, thus. c,an describe symbolic and combinatorial constraints that are required for constraint-based natural language grammar description. As a constraint solver. cu-Prolog uses unfold/fold transformation dynamically with some heuristics, JPSG (Japanese Phrase Structure Grammar) is a constraint-based and unification-based Japanese grammar formalism beging developed by the PSG-working group at ICOT. Like HPSG (Head-driven PhrasE" Structure Grammar), JPSG is a phrase structure \vhose nodes are feature structures. Its grammar description is mainly formalized by local constraints in phrase structures. This paper outlines cu- Prolog and its application to the disjunctive feature structure and JPSG parser. 
1 Introduction Two aspects are considered to classify contemporar:v natural language grammatical theories[Carpenter ft al. 91]. Firstly, They must be classified according to whether they have transformation operations among different structure levels. One current version of transformational gramma,. is GB (Government and Binding) theory[Chomsky 81]. So called unification-based grammars[Shieber 86]. such as GPSG (Generalized Phrase Structure Grammar). LFG (Lexical Functional Grammar). HPSG (Head-driwn Phrase Structure Grammar) [Pollard and Sag 87]. and JPSG (Japanese Phrase Structure Grammar)[Gunji 86] are categorized as non-transformational gramrna;s. Unification-based grammar is a phrase structure grammar whose nodes are feature structures. It uses unification as its basic operation. In this respect, it is congenial to logic programming. Secondly, classification must be made as to whether a language's grammar description is rule-based or constraint-based 1 . GPSG and LFG fall into the former category. The latter includes GB theory. HPSG. and JPSG. From the viewpoint of procedural computation. rule-based approaches are better. However. by constraint- based approaches. more general and richer grammar formalisms are possible because morphology. syntax. semantics. and pragmatics are all uniformly treated as constraints. Also. the most important feature of constraints. the declarative grammar description. allows various information £lovvs during processing. Consiclf'1' t he programming languages used t.o implement tlwse grammatical theories. For rule- based grammars. many approaches have beell attempted. such as Fl~G[I\.ay 8!5] and PATR-Il[Shieber 86]. As yet. however. no leading work has been done on constraintbased gramrnars. Our constraint logic programming language (,1lProlog [Tsuda et al. 89b. Tsuda et al. 89a] aims to provide an implementation framework for constraintbased gram.mars. "('nlike most eLP languages. cuProlog takes the Prolog atomic formulas of user-defined predicates as constraints. cu-Prolog originated from the technique of cOr/strained unification (or conditioned unifi:cat iOIl [Hasida and Sirai 86]) - a unification between t\ovo constrained Prolog patterns. The basic component of ClI- Prolog is a Constrained Horn Clausf (CHe) that adds constraints in terms of user-defined Prolog preclica tes to Hom clauses. Their domain is suitahle for symbolic ami combinatorial linguistic const raints. The COllst raint solver of ell-Prolog uses the unfold/fold [Tamaki alld Sato ~'n] transformation d~'namicalh' with certaiu lwurist ies. This paper illustrates • the outline of Cll-Prolog. • treatment of disjunctive feature structures with PST(Partially Specified Term) [Mukai 88] ill cuProlog. and lConstraint-based approaches are also called tnformatioll- based or principle-baSEd approaches. 348 • the JPSG parser as its most successful application. 2 Linguistic Constructions As an introduction, this section explains the various types of linguistic constraints in constraint-based grammar formalisms. 2.1 Disjwlctive Feature Structure Unification-based grammars utilize feature structures as basic information structures. A feature structure consists of a set of pairs of labels and their values. In (1), pos and sc are called features and their values are n and a singleton set < [pos = p] >. pos = n [ sc = ( [pos = p ] ) 1 (1 ) Morphological, syntactic, semantic, and pragmatic information are all uniformly stored in a feature structure. 
Moreover, natural language descriptions essentially require some framework for handling ambiguities such as polysemic words, homonyms, and so on. Disjunctive feature structures are widely used to handle disjunctions in feature structures [Kay 85]. Disjunctive feature structures involve the following two types of disjunction.

Value disjunction: A value disjunction specifies the alternative values of a single feature. For example, a value disjunction can state that the value of the pos feature is n or v, and that the value of the sc feature is <> (the empty set) or <[pos = p]>.

General disjunction: A general disjunction specifies alternative groups of multiple features. In the following structure, sem = love(X, Y) is common, and the rest is ambiguous.

    [ { [ pos = n ]
        [ pos = v
          vform = vs
          sc  = < [pos = p] > ] }
      sem = love(X, Y) ]                                 (3)

One serious problem in treating disjunctive feature structures is the computational complexity of their unification, which is essentially NP-complete [Kasper and Rounds 86]. Practically efficient algorithms for handling disjunctions have been studied by [Kasper 87] and [Eisele and Dorre 88].

2.2 Structural Principles

Unification-based grammars are phrase structure grammars whose nodes are feature structures. Their grammar descriptions consist of both phrase structure rules and local constraints on a phrase structure. In current unification-based grammars, such as HPSG and JPSG, the phrase structure rules have become very general, and grammars are mainly described with a set of local constraints called structural principles. JPSG has only one phrase structure rule:

    M -> D H

M, D, and H are the mother, the dependent daughter, and the head daughter respectively. This phrase structure is applicable to both the complementation structure and the adjunction structure of Japanese². In complementation structures, D acts as a complement; in adjunction structures, D works as a modifier. Structural principles are relations between the features of the three nodes (M, D, and H) in a local tree. In the following, we explain some features and their constraints.

mod: The mod feature specifies the function of D in a phrase structure. When the value is +, D works as a modifier; when it is -, D works as a complement.

head features: Features such as pos, gr, case, and infl are called head features. They conform to the following head feature principle: the value of a head feature of M unifies with that of H.

subcat features: The features subcat and adjacent are called subcat features. They take a set of feature structures that specify adjacent categories such as complements and nouns. The subcat feature principle is: in the complementation structure, the value of a subcat feature of M unifies with that of H minus D; in the adjunction structure, the value of a subcat feature of M unifies with that of H.

sem: The sem feature specifies semantic information. In the complementation structure, the sem value of M unifies with that of H; in the adjunction structure, the sem value of M unifies with that of D.

²For example, "Ken-ga aisuru (Ken loves)" is the complementation structure, and "ooki-na yama (big mountain)" is the adjunction structure.

Below is the analysis for "Ken-ga hashiru (Ken runs)".

                      [ pos = v
                        sc  = <>
                        sem = run(ken) ]
                     /                  \
      [ pos = p                [ pos = v
        gr  = ga                 sc  = < [ pos = p
        sem = ken ]                        gr  = ga
                                           sem = X ] >
                                 sem = run(X) ]
        Ken-ga                    hashiru

3 cu-Prolog

3.1 Constrained Horn Clause (CHC)

The basic component of cu-Prolog is the Constrained Horn Clause (CHC)³.

[Def] 1 (CHC) A Constrained Horn Clause (CHC) has the form

    HEAD :- B1, B2, ..., Bn ; C1, C2, ..., Cm.
    (head)   (---- body ----)  (--- constraint ---)

HEAD, called the head, is an atomic formula; B1, ..., Bn, called the body, is a sequence of atomic formulas; C1, ..., Cm, called the constraint, is a sequence of atomic formulas or equational constraints of the form Variable = Term. The body or the constraint can be empty.

³Or Constraint Added Horn Clause (CAHC).

From the viewpoint of declarative semantics, the above clause is equivalent to the following Horn clause:

    HEAD :- B1, B2, ..., Bn, C1, C2, ..., Cm.

3.2 Conventional Approaches

Prolog is often used as an implementation language for unification-based grammars. However, its execution strategy is fixed and procedural: always from left to right for AND processes, and from top to bottom for OR processes. Prolog programmers have to align goals so that they are solved efficiently. Prolog is therefore not well suited to constraint-based grammars, because it is impossible to stipulate in advance which types of linguistic constraints are to be processed in what order. Some Prolog-like systems, such as Prolog II and CIL [Mukai 88], have bind-hook mechanisms that can delay some goals (constraints) until certain variables are bound. As this mechanism can only check constraints by executing them, it is not always efficient. Most CLP languages, such as CLP(R) [Jaffar and Lassez 87], Prolog III, and CAL, take constraints over algebraic domains with equations or inequations. Their constraint solvers are based on algebraic algorithms such as Gröbner bases and equation solving. For AI applications, however, and especially for natural language processing systems, symbolic and combinatorial constraints are far more desirable than algebraic ones. cu-Prolog, on the other hand, can use symbolic and combinatorial constraints because its constraint domain is the Herbrand universe.

3.3 Derivation Rule

cu-Prolog extends the derivation rule of Prolog with a constraint transformation operation:

    goal:            A, K : C
    program clause:  A' :- L : D          theta = mgu(A, A')   (substitution)
    ----------------------------------------------------------------------
    new goal:        L theta, K theta : C'
                     where C' = mf(C theta + D theta)   (constraint transformation)

A and A' are heads, K and L are bodies, and C, D, and C' are constraints. mgu(A, A') is the most general unifier of A and A'. mf(Cstr) is a canonical form of a constraint that is equivalent to Cstr. As a computational rule, when the transformation of C theta + D theta fails, the above derivation rule is not applied.

3.4 PST

cu-Prolog adopts the PST (Partially Specified Term) [Mukai 88] as the data structure corresponding to the feature structure of unification-based grammars.

[Def] 2 (Partially Specified Term (PST)) A PST is a term of the following form: {l1/t1, l2/t2, ..., ln/tn}. Each li, called a label, is an atom, with li distinct from lj (i distinct from j); each ti, called a value, is a term. A recursive PST structure is not allowed.

[Def] 3 (constrained PST) In cu-Prolog, a PST is stored as an equational constraint together with the other relevant constraints:

    X = PST, c1(X), c2(X), ..., cn(X)

We call the above type of constraint a constrained PST. X = PST corresponds to [Kasper 87]'s unconditional conjunct, and c1(X), c2(X), ..., cn(X) correspond to the conditional conjuncts.

The constrained PST can naturally describe the disjunctive feature structures of unification-based grammars. In the next subsection, we give its canonical form, called modular.
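As a concrete picture of a constrained PST, here is a hedged Python sketch (the encoding and names are illustrative, not cu-Prolog's actual data structures): the unconditional conjunct is a plain partial record, each conditional conjunct is a disjunction of alternative partial records, and solving the constraint keeps only the combinations that merge consistently.

```python
from itertools import product

def merge(a, b):
    """Merge two flat partial records; None signals a label clash."""
    out = dict(a)
    for label, val in b.items():
        if label in out and out[label] != val:
            return None
        out[label] = val
    return out

# Unconditional conjunct: what is already known about X.
x = {"sem": "love"}

# Conditional conjuncts: each constraint on X is a disjunction of alternatives,
# e.g. a value disjunction on pos plus an independent constraint.
constraints = [
    [{"pos": "n"}, {"pos": "v", "vform": "vs2"}],   # c1(X)
    [{"pos": "v"}, {"pos": "p", "gr": "ga"}],       # c2(X)
]

# Solve by trying every combination of alternatives and keeping the consistent
# merges -- the naive, exponential reading of the NP-complete disjunctive
# unification problem mentioned above.
solutions = []
for combo in product(*constraints):
    cand = x
    for alt in combo:
        cand = cand and merge(cand, alt)
    if cand:
        solutions.append(cand)

print(solutions)   # [{'sem': 'love', 'pos': 'v', 'vform': 'vs2'}]
```

cu-Prolog avoids this blow-up in practice by keeping the disjunctions inside user-defined predicate constraints and transforming them into modular form, as defined next.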
3.5 Canonical form of a constraint

The canonical form of a constraint in CHC is called modular. First, we give an intuitive definition of modular without PST.

[Def] 4 (modular (without PST)) A sequence of atomic formulas C1, ..., Cm (m >= 1) is modular when all of its arguments are different variables. User-defined predicates in a constraint must be defined with modular Horn clauses.

For example, member(X, Y), member(U, V) is modular; member(X, Y), member(Y, Z) is not modular; and append(X, Y, [a, b, c, d]) is not modular. We expand the definition of modular to constrained PSTs.

[Def] 5 (component) The component of an argument of a predicate is the set of labels to which the argument may bind. Here, an atom or a complex term is regarded as a PST of the label [].

Cmp(p, n) stands for the component of the n-th argument of a predicate p, and Cmp(T) represents the set of labels of a PST T. In a constraint of the form X = t, the variable X is regarded as taking Cmp(t). Components can be computed by static analysis of the program [Tsuda 91]. Vacuous argument places [Tsuda and Hasida 90] are arguments whose components are empty.

3.6 Constraint Transformation

Figure 1: Disjunctive feature unification. A cu-Prolog session: lines beginning with ":-" are user inputs; to each input, cu-Prolog returns an equivalent modular constraint together with the definition clauses of the newly defined predicates, and finally the answer of the unification, X = {voice/active, trans/trans, subj/{person/second, numb/pl, case/nom, lex/you}, goal/{person/third}, actor/{person/second, numb/pl, case/nom, lex/you}, numb/pl, rank/clause}.

Figure 2: JPSG parser: disambiguation. The parse of "Ken ga ai-suru", which has two readings, "Ken loves (someone)" and "(someone) whom Ken loves". The parser draws the corresponding parse tree and returns the category of the top node together with its constraint (c40); the ambiguity of the sentence appears as the two solutions of c40. Parsing took 2.217 sec of CPU time, of which 1.950 sec was constraint handling.
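The modularity condition of [Def] 4 is simple enough to check mechanically. The following hedged Python sketch uses an illustrative tuple encoding of atomic formulas (functor first, then arguments, with Prolog-style variable names); it is not cu-Prolog code.

```python
def is_variable(arg):
    """Prolog convention: variables start with an upper-case letter or '_'."""
    return isinstance(arg, str) and (arg[0].isupper() or arg[0] == "_")

def is_modular(atoms):
    """[Def] 4 restated: every argument of every atom is a variable,
    and no variable occurs in two argument positions."""
    seen = set()
    for functor, *args in atoms:
        for a in args:
            if not is_variable(a) or a in seen:
                return False
            seen.add(a)
    return True

print(is_modular([("member", "X", "Y"), ("member", "U", "V")]))   # True
print(is_modular([("member", "X", "Y"), ("member", "Y", "Z")]))   # False: Y repeated
print(is_modular([("append", "X", "Y", ["a", "b", "c", "d"])]))   # False: non-variable argument
```

A modular constraint places no interaction between its arguments, which is what allows cu-Prolog to postpone it safely; the constraint transformer's job is to rewrite non-modular constraints into this form by introducing new predicates.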
Model Generation Theorem Provers on a Parallel Inference Machine

Masayuki Fujita, Miyuki Koshimura*, Ryuzo Hasegawa, Hiroshi Fujita†
Institute for New Generation Computer Technology
4-28, Mita 1-chome, Minato-ku, Tokyo 108, Japan
{mfujita, hasegawa, koshi}@icot.or.jp, fujita@sys.crl.melco.co.jp

Abstract

This paper describes the results of the research and development on parallel theorem provers being conducted at ICOT. We have implemented a model-generation based parallel theorem prover called MGTP in KL1 on a distributed memory multi-processor, Multi-PSI, and on a parallel inference machine with the same architecture, PIM/m. Currently, we have two versions of MGTP: one is MGTP/G, which is used for dealing with ground models, and the other is MGTP/N, used for dealing with non-ground models. While conducting research and development on the MGTP provers, we have developed several techniques to improve the efficiency of forward reasoning theorem provers. These include model generation and hyper-resolution theorem provers. First, we developed KL1 compilation techniques to translate the given clauses to KL1 clauses, thereby achieving good efficiency. To avoid redundancy in conjunctive matching, we developed the RAMS, MERC, and Δ-M methods. To reduce the amount of computation and space required for obtaining proofs, we proposed the idea of Lazy Model Generation. Lazy model generation is a new method that avoids the generation of unnecessary atoms that are irrelevant to obtaining proofs, and provides flexible control for the efficient use of resources in a parallel environment. For MGTP/G, we exploited OR parallelism with a simple allocation scheme, thereby achieving good performance on the Multi-PSI. For MGTP/N, we exploited AND parallelism, which is rather harder to obtain than OR parallelism. With the lazy model generation method, we have achieved a more than one-hundred-fold speedup on a PIM/m consisting of 128 PEs.

*Present address: Toshiba Information Systems, 2-1 Nissin-cho, Kawasaki-ku, Kawasaki, Kanagawa 210, Japan
†Present address: Mitsubishi Electric Corporation, 8-1-1 Tsukaguchi-honmachi, Amagasaki, Hyogo 661, Japan

1 Introduction

The research on parallel theorem proving systems has been conducted under the Fifth Research Laboratory at ICOT as a part of research and development on the problem-solving programming module. This research aims at the realization of highly parallel advanced inference mechanisms that are indispensable in building intelligent knowledge information systems. The immediate goal of this research project is to develop a parallel automated reasoning system on the parallel inference machine, PIM, based on KL1 and PIMOS technology [Chikayama et. al. 88]. We aim at applying this system to various fields such as intelligent database systems, natural language processing, and automated programming.

The motive for the research is twofold. From the viewpoint of logic programming, we try to further extend logic programming techniques that provide the foundation for the Fifth Generation Computer System. The research will help those aiming at extending languages and/or systems from Horn clause logic to full first-order logic.
In addition, theorem proving is one of the most important applications that could effectively be built upon logic programming systems. In particular, it is a good application for evaluating the abilities of KL1 and PIM. From the viewpoint of automated reasoning, on the other hand, it seems that the logic programming community is ready to deal with more classical and difficult problems [Wos et. al. 84] [Wos 88] that remain unsolved or have been abandoned. We might achieve a breakthrough in the automated reasoning field if we apply logic programming technology to theorem proving. In addition, this trial would also provide feedback to logic programming technology.

Recent developments in logic programming languages and machines have shed light upon the problem of how to implement these classical but powerful methods efficiently. For instance, Stickel developed a model-elimination [Loveland 78] based theorem prover called PTTP [Stickel 88]. PTTP is able to deal with any first-order formula in Horn clause form (augmented by contrapositives) without loss of completeness or soundness. It works by employing unification with occurrence check, the model elimination reduction rule, and iterative deepening depth-first search. A parallel version of PTTP, called PARTHENON [Bose et. al. 89], has been implemented by Clarke et al. on a shared memory multiprocessor. Schumann et al. built a connection-method [Bibel 86] based theorem-proving system, SETHEO [Schumann 89], in which a method identical to model elimination is used as the main proof mechanism. Manthey and Bry presented a tableaux-like theorem prover, SATCHMO [Manthey and Bry 88], which is a very short and simple program in Prolog.

As a first step in developing KL1-technology theorem provers, we adopted the model generation method on which SATCHMO is based. Our reasons were as follows: (1) A useful feature of SATCHMO is that full unification is not necessary, and that matching suffices when dealing with range-restricted problems. This makes it very convenient for us to implement provers in KL1, since KL1, as a committed-choice language, provides us with very fast one-way unification. (2) It is easier to incorporate mechanisms for lemmatization, subsumption tests, and other deletion strategies that are indispensable in solving difficult problems such as condensed detachment problems [Wos 88] [Overbeek 90] [McCune and Wos 91].

In implementing model generation based provers, it is important to avoid redundancy in the conjunctive matching of clauses against atoms in model candidates. For this, we proposed the RAMS [Fujita and Hasegawa 91] and MERC [Hasegawa 91a] methods. A more important issue with regard to the efficiency of model generation based provers is how to reduce the total amount of computation and memory required for proof processes. This problem becomes more critical if we try to solve harder problems that require deeper inferences (longer proofs), such as Lukasiewicz problems. To solve this problem, it is important to recognize that proving processes can be viewed as generation-and-test processes and that generation should be performed only when testing requires it. We proposed the Lazy Model Generation method, in which the idea of demand-driven computation, or 'generate-only-at-test', is implemented. Lazy model generation is a new method that avoids the generation of unnecessary atoms that are irrelevant to obtaining proofs, and provides flexible control for the efficient use of resources in a parallel environment.
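The 'generate-only-at-test' idea can be pictured with a tiny, hedged Python sketch; this is generic demand-driven code, not MGTP's KL1 implementation. A producer is run only as far as the consumer's tests actually demand.

```python
def generate_candidates(seeds):
    """Produce candidate atoms one at a time; nothing is computed
    until the consumer asks for the next element."""
    for s in seeds:
        print(f"expanding {s}")            # visible side effect: work done on demand
        yield f"derived({s})"

def refute(candidates, contradicts):
    """Pull candidates only until one closes the proof."""
    for atom in candidates:
        if contradicts(atom):
            return atom
    return None

found = refute(generate_candidates(range(1_000_000)),
               contradicts=lambda a: a == "derived(2)")
print(found)   # only seeds 0, 1 and 2 are ever expanded
```

In MGTP the same effect is obtained with explicit delay/force control between generator and tester processes, as described in Section 4.3.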
We have implemented two types of model generation prover: one is used for ground models (MGTP/G) and the other is used for non-ground models (MGTP/N). In implementing MGTP/G, we developed a compiling technique to translate the given clauses into KL1 clauses by using advantage (1) listed above. This makes MGTP/G very simple and efficient. MGTP/G can prove non-Horn problems very efficiently on a distributed memory multi-processor, the Multi-PSI, by exploiting OR parallelism. MGTP/N, on the other hand, aims at proving difficult Horn problems by exploiting AND parallelism. For MGTP/N, we developed new parallel algorithms based on the lazy model generation method. They run with optimal load balancing on a distributed memory architecture, and require a minimal amount of computation and memory to obtain proofs.

In the next section, we explain the model generation method on which our MGTP provers are based. In Section 3, we discuss the problem of meta-programming in KL1, and outline the characteristics of MGTP/G and MGTP/N. In Section 4, we describe the essence of the main techniques developed for improving the efficiency of model generation theorem provers. In Section 5, we present the OR parallelization and AND parallelization methods developed for MGTP/G and MGTP/N. Section 6 provides a conclusion.

2 Model Generation Theorem Prover

Throughout this paper, a clause is represented in an implicational form:

    A1, A2, ..., An -> C1; C2; ...; Cm

where Ai (1 <= i <= n) and Cj (1 <= j <= m) are atoms; the antecedent is a conjunction of A1, A2, ..., An; the consequent is a disjunction of C1, C2, ..., Cm. A clause is said to be positive if its antecedent is true (n = 0), and negative if its consequent is false (m = 0). A clause is also said to be a tester if its consequent is false (m = 0); otherwise it is called a generator.

The model generation method incorporates the following two rules:

• Model extension rule: If there is a generator clause, A -> C, and a substitution σ such that Aσ is satisfied in a model candidate M and Cσ is not satisfied in M, then extend M by adding Cσ to M.

• Model rejection rule: If a tester clause has an antecedent Aσ that is satisfied in a model candidate M, then reject M.

We call the process of obtaining Aσ a conjunctive matching of the antecedent literals against elements in a model. Note that the antecedent (true) of a positive clause is satisfied by any model.

The task of model generation is to try to construct a model for a given set of clauses, starting with the empty set as a model candidate. If the clause set is satisfiable, a model should be found. The method can also be used to prove that the clause set is unsatisfiable, by exploring every possible model candidate to see that no model exists for the clause set. For example, consider the following set of clauses, S1 [Manthey and Bry 88]:

    C1: p(X), s(X) -> false.
    C2: q(X), s(Y) -> false.
    C3: q(X) -> s(f(X)).
    C4: r(X) -> s(X).
    C5: p(X) -> q(X); r(X).
    C6: true -> p(a); q(b).
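To make the two rules concrete, here is a small, hedged Python sketch of the method applied to S1. The tuple encoding of atoms and all helper names are our own; the real MGTP is written in KL1 and is far more refined. The sketch performs model extension with case splitting and model rejection exactly as described above; it assumes range-restricted clauses, and on satisfiable sets with function symbols a naive search like this need not terminate.

```python
def is_var(t):
    return isinstance(t, str) and t.isupper()      # variables are upper-case strings

def match(pat, atom, env):
    """One-way matching of an antecedent pattern against a ground atom."""
    if is_var(pat):
        if pat in env:
            return env if env[pat] == atom else None
        return {**env, pat: atom}
    if isinstance(pat, tuple) and isinstance(atom, tuple) and len(pat) == len(atom):
        for p, a in zip(pat, atom):
            env = match(p, a, env)
            if env is None:
                return None
        return env
    return env if pat == atom else None

def subst(pat, env):
    if is_var(pat):
        return env[pat]                             # range-restriction: always bound
    if isinstance(pat, tuple):
        return tuple(subst(p, env) for p in pat)
    return pat

def matches(antecedent, model, env=None):
    """Yield substitutions making every antecedent literal an element of model."""
    env = env or {}
    if not antecedent:
        yield env
        return
    first, rest = antecedent[0], antecedent[1:]
    for atom in model:
        new_env = match(first, atom, dict(env))
        if new_env is not None:
            yield from matches(rest, model, new_env)

def satisfiable(clauses, model=frozenset()):
    """Try to complete `model`; return a model, or None when every branch closes."""
    for antecedent, consequent in clauses:
        for env in matches(antecedent, model):
            disjuncts = [subst(c, env) for c in consequent]
            if not disjuncts:                       # model rejection rule
                return None
            if any(d in model for d in disjuncts):  # clause already satisfied
                continue
            for d in disjuncts:                     # model extension rule + case splitting
                extended = satisfiable(clauses, model | {d})
                if extended is not None:
                    return extended
            return None                             # no extension survives: reject
    return model                                    # every clause satisfied

S1 = [  # C1..C6, written as (antecedent, consequent)
    ([("p", "X"), ("s", "X")], []),
    ([("q", "X"), ("s", "Y")], []),
    ([("q", "X")], [("s", ("f", "X"))]),
    ([("r", "X")], [("s", "X")]),
    ([("p", "X")], [("q", "X"), ("r", "X")]),
    ([], [("p", "a"), ("q", "b")]),
]
print(satisfiable(S1))   # None: every branch is rejected, so S1 is unsatisfiable
```

Running the sketch closes every branch of the search, mirroring the proof tree discussed next.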
    ∅
    |-- p(a)               (C6)
    |    |-- q(a)          (C5)
    |    |    '-- s(f(a))  (C3)   x  closed by C2
    |    '-- r(a)          (C5)
    |         '-- s(a)     (C4)   x  closed by C1
    '-- q(b)               (C6)
         '-- s(f(b))       (C3)   x  closed by C2

    Figure 1: A proof tree for S1

A proof tree for the S1 problem is depicted in Fig. 1. We start with an empty model, M0 = ∅. M0 is first expanded into two cases, M1 = {p(a)} and M2 = {q(b)}, by applying the model extension rule to C6. Then M1 is expanded by C5 into two cases: M3 = {p(a), q(a)} and M4 = {p(a), r(a)}. M3 is further extended by C3 to M5 = {p(a), q(a), s(f(a))}. Now, with M5, the model rejection rule is applicable to C2, so M5 is rejected and marked as closed. On the other hand, M4 is extended by C4 to M6 = {p(a), r(a), s(a)}, which is rejected by C1. In a similar way, the remaining model candidate M2 is extended by C3 to M7 = {q(b), s(f(b))}, which is rejected by C2. Now that there is no way to construct any model candidate, we can conclude that the clause set S1 is unsatisfiable.

The model generation method, as its name suggests, is closely related to the model elimination method. However, the model generation method is a restricted version of the model elimination method in the sense that the polarity of literals in a clause of implicational form is fixed to either positive or negative in the model generation method, whereas it is allowed to be both positive and negative in the model elimination method. Moreover, from the procedural point of view, model generation is restricted to proceeding bottom-up (as in forward reasoning), starting at positive clauses (facts). These restrictions, however, do not hurt the refutation completeness of the method. Model generation can also be viewed as unit hyper-resolution. Our calculus, however, is much closer to the tableaux calculus in the sense that it explores a tree, or a tableau, in the course of finding a proof. Indeed, a branch in a proof tree obtained by the tableaux method corresponds exactly to a model candidate.

3 Two Versions of MGTP

3.1 Meta-programming in KL1

Prolog-Technology Theorem Provers such as PTTP and SATCHMO utilize the fact that Horn clause problems can be solved very efficiently. In these systems, the theorem being proven is represented by Prolog clauses, and most deductions are performed as normal Prolog execution. However, that approach cannot be taken in KL1, because a KL1 clause is not just a Horn clause; it has extra-logical constructs such as a guard and a commit operator. We should, therefore, treat the clause set as data rather than as a KL1 program. In this case, the inevitable problem is how to represent the variables appearing in a given clause set. Two approaches can be considered for this problem: (1) representing object-level variables with KL1 ground terms, or (2) representing object-level variables with KL1 variables.

The first approach might be the right path in meta-programming, where object and meta levels are strictly separated, thereby providing clear semantics. However, it forces us to write routines for unification, substitution, renaming, and all the other intricate operations on variables and environments. These routines would become extremely large and complex compared to the main program, and would make the overhead bigger. In the second approach, most operations on variables and environments can be performed by the underlying system, rather than as routines running on top of it. This means that a meta-programmer does not have to write tedious routines, and gains high efficiency. Also, a programmer can use the Prolog var predicate to write routines such as occurrence checks in order to make built-in unification sound, if such routines are necessary. This approach makes the program much more simple and efficient, even though it makes the distinction between object and meta levels ambiguous.

In KL1, however, the second approach is not always possible. This is because the semantics of KL1 never allow us to use a predicate like var.
In addition, KL1 built-in unification is not the same as its Prolog counterpart in that unification in the guard part of a KL1 clause can only be one-way, and a unification failure in the body part is regarded as a program error or exception that cannot be backtracked. 3.2 Characteristics of MGTP jG and MGTP/N Taking the above discussions into consideration, we decided to develop both the 1\1GTP IG and MGTP IN provers so that we can use effectively them according to the problem domains dealt with. The ground version, MGTP IG, aims to support finite problem domains, which include most problems in various fields, such as database processing and natural language processing. For ground model cases, the model generation method makes it possible to use just matching, rather than full unification, if the problem clauses satisfy the rangeTestTicteciness condition 1 [J\,1anthey and Bry 88]. This suggests that it is sufficient to use KL1 's head unification. Thus we can take the KL1 variable approach for representing object-level variables, thereby achieving good performance. The key points of KL1 programming techniques developed for MGTP/G are as follows: (Details are described in the next section.) c(1,p(X),[], R):-trueIR=cont. c(1,s(X),[p(X)],R):-trueIR=false. c(2,q(X),[], R):-trueIR=cont. c(2,s(Y),[q(X)],R):-trueIR=false. c(3,q(X),[], R):-trueIR=[s(f(X))]. c(4 r(X),[], R):-trueIR=[s(X)]. c(5:p(X),[], R):-trueIR=[q(X),r(X)]. c(6,true,[], R):-trueIR=[p(a),q(b)]. otherwise. c(_,_,_,R):-trueIR=fail. Figure 2: Sl problem transformed to KL1 clauses For non-ground model cases, where full unification with occurrence check is required, we are forced to follow the KL1 ground terms approach. However, we do not necessarily have to maintain variable-binding pairs as processes in KLl. We can maintain them by using the vector facility supported by KL1, as is often used in ordinary language processing systems. Experimental results show that vector implementation is several hundred times faster than process implementation. In this case, however, we cannot use the programming techniques developed for MGTP /G. Instead, we have to use a conventional technique, that is, interpreting a given set of clauses instead of compiling it into KL1 clauses. To ease the programmer's burden, we developed MetaLibTaTy[Koshimura et. al. 90]. This is a collection of KL1 programs to support meta-programming in KLl. The meta-library includes facilities such as full unification with occurrence check, variable management routines, and term memory[Stickel 89][Hasegawa 91c]. 4 Technologies Efficiency Developed for • First, we translate a given set of clauses into a corresponding set of KL1 clauses. This translation is quite simple. 4.1 • Second, we perform conjunctive matching of a literal in a clause against a model element by using KL1 head unification. This section presents the compiling techniques developed for MGTP /G to translate given clauses to KL1 clauses. It also shows a simple MGTP /G interpreter obtained by using the techniques[Fuchi 90][Fujita and Hasegawa 90][Hasegawa et. al. 90a]. • Third, at the head unification, we can automatically obtain fresh variables for a different instance of the literal used. The non-ground version, MGTP IN, supports infinite problem domains. Typical examples are mathematical theorems, such as group theory and implicational calculus. 1 A clause is said to be range-restricted if every variable in the clause has at least one occurrence in its antecedent. 
For example, in the S I problem, all the clauses, GI-G6, are range-restricted since no va.riable appears in clause G6; the variable X in clauses GI, G3, G4 a.nd G5 has an occurrence in their antecedents; and variables X and Y in G2 have their occurrences in its antecedent. 4.1.1 KL1 Compiling Method Transforming problem clauses clauses to KL1 Our MGTP /G prover program consists of two parts: an interpreter written in KL1, and a set of KL1 clauses representing a set of clauses for the given problem. During conjunctive matching, an antecedent literal expressed in the head of a KL1 clause is matched against a model element chosen from a model candidate which is retained in the interpreter. Although conjunctive matching can be implemented simply in KL1, we need a programming trick for support- 361 ing variables shared among literals in a problem clause. The trick is to propagate the binding for a shared variable from one Ii teral to another. To understand this, consider the previous example, Sl. The original clause set is transformed into a set of KL1 clauses, as shown in Figure 2. In c (N ,P , S ,R) , N indicates clause number; P is an antecedent literal to be matched against an element t.aken from a model candidate; S is a pattern for receiving from the interpreter a stack of literal ins tances appearing to the left of P, which have already matched model elements; and R is the result returned to the interpreter when the match succeeds. Notice that original clause C1 (p(X), s(X) ----+ false.) is translated to the first two KLl clauses. The conjunctive matching for C1 proceeds as follows. First, the interpreter picks up a model element, E}, from a model candidate, and tries to match the first literal p(X) in Cl against E1 by initiating a goal, c(l,E},[],Rd. If the matching fails, then the resul t R1 = fail is returned by the last KL1 clause. If the matching succeeds, then the result R} = cont is returned by the first KL1 clause and the interpreter proceeds to the next literal s(X) in C1, picking up another model element, E 2 , from the model candidate and initiating a goal, c(l, E2 , [Ed, R2)' Since the literal instance in the third argument, [Ed, is ground, the variable X in [p(X)] in the head of the second KL1 clause gets instantiated to a ground term. At the same time, the term s(X) in that head is also instantiated due to the shared variable X. Under this instantiation, s(X) is checked to see whether it matches E 2 , and if the matching succeeds then the result, R2 = false, is returned. 4.1.2 a stack, S, of literal instances. If the match succeeds at a literal, Li, with a model element, P, then P is pushed onto the stack S, and the task proceeds to matching the next literal, Li+b together with the stack, [PIS]. According to the result of c/4: fail, cont, false or list (F), an ante1/9 process determines what to do next. If the result is cont, for example, ante1 will fork multiple ante' processes to try to make every possible combination of elements out of the current model for the conjunctive matching. If the conjunctive matching for all the antecedent literals of a clause succeeds, a cnsq/6 process is called to check the satisfiability of the consequent of the clause. cnsq1/8 checks whether a literal in the consequent is a member of the current model. If no literal in the consequent is a member of the current model, the current model cannot satisfy the clause. In this case, the model will be extended with each disjunct literal in the consequent of the clause by calling an extend/5 process. 
After extending the current model, a clauses/5 process is called for each extension of the model, and the results are combined by unsat/4. When a clauses process for some of the extended models returns sat as the result, it means that a model is found and the clause set is known to be satisfiable. If every extension of the model leads to unsat, the current model is not a part of any model for the given set of clauses. Thus, if the top-level clauses/5 process returns sat as the result, then the given clause set has a model and is satisfiable; if it returns unsat, then the given clause set has no model and is unsatisfiable.

4.1.2 A simple MGTP/G interpreter

With the problem clauses transformed to KL1 clauses as above, a simple interpreter is developed as shown in Figure 3². The interpreter, given a list of numbers identifying problem clauses and a model candidate, checks whether the clauses are satisfiable or not. The top-level predicate, clauses/5, dispatches a task, ante/7, to check whether each clause is satisfied or not in the current model. If all the clauses are satisfied in the current model, the result, sat, is returned by sat/4, combining the results from the ante processes. For each clause in the given clauses, conjunctive matching is performed between the elements in the model candidate and the literals in the antecedent of the clause, with ante/7 and ante1/9 processes. The conjunctive matching for the antecedent literals proceeds from left to right, by calling c/4 one by one. An ante process retains a stack, S, of literal instances.

²In the program, 'alternatively' is a KL1 compiler directive which gives a preference among clauses: the guards of clauses above alternatively are evaluated before those below it. The preference, however, may not be strictly obeyed; this depends on the implementation.

4.2 Avoiding Redundant Conjunctive Matching

To improve the performance of the model generation provers, it is essential to avoid, as much as possible, redundant computation in conjunctive matching. Let us consider a clause, C, having two antecedent literals. To perform conjunctive matching for the clause, we need to pick a pair of atoms out of the current model candidate, M. Imagine that, as a result of a satisfiability check of the clause, we are to extend the model candidate with Δ, which is an atom in the consequent of the clause C but not in M. Then, in the conjunctive matching for the clause C in the next phase, we need to pick a pair of atoms from M ∪ Δ. The number of pairs amounts to:

    (M ∪ Δ)² = M × M ∪ M × Δ ∪ Δ × M ∪ Δ × Δ.
cnsql(D,[_IM2],Ds,F,C,M,A,B):-truel cnsql(D,M2,Ds,F,C,M,A,B). extendC, _, _,_ ,qu.it) : -true Itrue. alternatively. extend([DIDs],M,C,A,B):-truel clauses(C,C,[DIM],Al,_), unsat(Al,A2,A,B), extend(Ds,M,C,A2,B). extend([],_,_,A,_):-trueIA=unsat. sat(sat,sat,A,_):-trueIA=sat. sat(unsat,_,A,B):-trueIA=unsat,B=quit. sat(_,unsat,A,B):-trueIA=unsat,B=quit. unsat(unsat,unsat,A,_):-trueIA=unsat. unsat(sat,_,A,B):-trueIA=sat,B=quit. unsat(_,sat,A,B):-trueIA=sat,B=quit. Figure 3: A simple MGTP jG interpreter M It should be noted here that 1\11 x !II[ pairs were already considered in the previous phase of conjunctive matching. If they were chosen in this phase, the result would contribute nothing since the model candidate need not be extended with the same jj.. Hence, redundant consideration on M x-!l1[ pairs should be avoided at this time. Instead, we have to choose only the pairs which contain at least one jj.. This discussion can be generalized for cases in which we have more than two antecedent literals, any number of clauses, and any number of model candidates. Vie have taken two approaches to avoid the above redundancy. One approach uses a stack to keep the intermediate results obtained by matching a literal against an element out of the model candidate. The other approach recomputes the intermediate matching results without keeping them. 01 {~D2[ A1 , 01 X 11 01 X M 81 X 11 02 X 11 03 !! 02 x M i i 82 X 11 82 it .. _..........83 _.... _......... __..... A2, A3 ~ ~C Figure 4: RAMS method 4.2.1 RAMS Method The RAMS ( ramified-stack) method [Hasegawa et. al. 90a][Hasegawa et. al. 90b][Fujita and Hasegawa 91] retains in a stack an instance which is a result of matching a literal against a model element. The use of this method 363 for a Horn clause case is illustrated in Figure 4, where M is a model candidate and 6. is an atom picked from a model-extending candidate. • A stack called a literal instance stack (LIS), is assigned to each antecedent literal, A, in a clause for storing literal instances. Note that LIS for the last literal expressed in dashed boxes needs not actually be allocated. • LIS is divided into tVI/O parts: Di and Si where D i ( i ~ 1) is a set of literal instances generated at the current stage triggered by 6.; and Si is those created in previous stages. Al A2 A3 := Di x 6. U Di X M U Si X ~ ~ ~ * For ground,and A1 =1= A2 =1= A 3 ( =1= means not-unifiable) Figure 5: MERC method 6. (i ~ 1) where A x B denotes a set of pairs of an instance taken from A and B. The above tasks are performed from left to right. For non-Horn clause cases, each LIS branches to make a tree-structured stack when case splitting occurs. The name 'RAMS' comes from this. The idea is as follows: • A model is represented by a branch of a ramified stack, and the model is extended only at the top of the current stack. • After applying the model extension rule to a nonHorn clause, the current model may be extended to multiple descendant models. • Every descendant model that is extended.from a parent model can share its ancestors with other sibling models just by pointing to the top of the stack corresponding to the parent. • Each descendant model can extend the stack for itself, independent of other sibling models. The ramified-stack method not only avoids redundancy in conjunctive matching but also enables us to share a common model. However, it has one drawback: it tends to require a lot of memory to retain intermediate literal instances. 
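The idea shared by RAMS and the methods described next is to skip the M × M part of this product and enumerate only the combinations that involve at least one Δ element. A hedged Python sketch of that enumeration (illustrative encoding only; the KL1 implementations avoid even forming the all-M tuples):

```python
from itertools import product

def fresh_tuples(model, delta, n):
    """Enumerate the n-tuples over (model | delta) that contain at least one
    element of delta, i.e. (M u D)^n minus M^n -- the only tuples worth
    matching after the model candidate has been extended with delta."""
    union = list(model) + list(delta)
    for combo in product(union, repeat=n):
        if any(x in delta for x in combo):
            yield combo

M = {"p(a)", "q(a)"}
D = {"s(a)"}
print(sorted(fresh_tuples(M, D, 2)))
# 5 of the 9 pairs over {p(a), q(a), s(a)}; the 4 pairs drawn purely from M are skipped
```

This filter-after-generate version still touches every tuple; the MERC and Δ-M methods instead enumerate the placements of Δ positions directly (the combination patterns such as [Δ, M], [M, Δ], [Δ, Δ]), so the all-M tuples are never constructed.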
4.2.2 Generator M M M ~ M M M ~ ~ M ~ M ~ ~ M ~ ~ ~ • A task, being performed at each literal, Ai, computes the following: Di+ I M MERe Method The MERC (Multi-Entry Repeated Combination) method [Hasegawa 91a] tries to solve the above problem using the RAMS method. This method does not need a memory to retain intermediate results obtained in the conjunctive matching. Instead, it needs to prepare 2n - 1 clauses for the given clause having n literals as its antecedent. An outline of the MERC method is shown in Figure 5. For a clause having three antecedent literals, AI, A 2 , A3 ~ C, we prepare seven clauses. Each of these clauses corresponds to a repeated combination of 6. and 111, and performs conjunctive matching using the combination pattern. For example, a clause corresponding to a combination pattern [M, 6., 111] first matches literal A2 against 6.. If the match succeeds, it proceeds to match the remaining literals, Al and A 3 , against an element picked from M. Note that each combination pattern includes at least one 6., and that the [M, M, A1] pattern is excluded. For ground model cases, optimization can be used to reduce the number of clauses by testing the unifiability of antecedent literals. For example, if any antecedent literal in the given clause is not unifiable with the other antecedent literal in that clause, it is sufficient to consider the following three combination patterns: [6.,111, M],[1I1, 6., M] and [1\11, M, 6.] . The righthand side in Figure 5 shows the clauses obtained after making the unifiability test. 4.2.3 6.-M Method The' problem with the MERC method is that the number of prepared clauses increases exponentially as the number of antecedent literals increases. In actual implementation, we adopted a modified version of the MERC method, which we call the 6.-M method. In place of multiple entry clauses, the 6.-M method prepares a template like: {(6., 6.], [6., MJ, [M, 6.]) for clauses with two antecedent literals, and {[6., 6.,6.], [6.,6., M], [6., M, 6.], [M, 6., 6.], [6.,111, M], [M, 6., M], [M, M, 6.]} 364 for clauses with three antecedent literals, and so forth. According to this pattern, we enumerate all possible combinations of atoms for matching the antecedent literals of given clauses. There are some trade-off's between the RAMS method and the MERC and 6-M methods. In the RAMS method, every successful result of matching a literal Ai against model elements is memorized so that the same literal is not rematched against the same model element. On the other hand, both the MERC and 6-M methods do not need to memorized information on partial matching. However, they still contain a redundant computation. For instance, in the computation for [M,6,6] and [M, 6, A1] patterns, the common subpattern [M,6]' will be recomputed. The RAMS method can eliminate this sort of redundancy. 4.3 Lazy Model Generation Model-generation based provers must perform the following three operations. • create new model elements by applying the model extension rule to the given clauses using a set of model-extending atoms 6 and a model candidate set M (model extension). • make a subsumption test for a created atom to check if it is su bsumecl by the set of atoms already being created, usually by the current model candidate. • make a false check to see if the unsubsumed model element derives false by applying the model extension rule to the tester clauses (rejection test). 
The problem with the model generation method is the huge growth in the number of generated atoms and in the computational cost in time and space, which is incurred by the generation processes. To solve this problem, it is important to recognize that proving processes are viewed as generation-and-test processes, and that generation should be performed only when testing requires it. For this we proposed a lazy model generation algorithm [Hasegawa 91b][Hasegawa 91d][Hasegawa et. al. 92a][Hasegawa et. ai. 92b] that can reduce the amount of computation and space necessary for obtaining proofs. This section presents several algorithms, including the lazy algorithm, for the model generation method, and compares them in terms of time and space. To simplify the presentation, we assume that the problem is given only in Horn clauses. However, the principle behind these algorithms can be applicable to non-Horn clauses as well. 4.3.1 Basic Algorithm The basic algorithm shown in Figure 6 performs model generation with a search strategy in a breadth-first fash- M:= cp; D := {A I (true while D A) E a set of given clauses}; begin -t =I cp do D :=D-6; if CJMTester(6,M) :3 false then return(success); new:= CJA1Generator(6,M); M :=MU6; new' := subsumption(new, MUD); D:= D Unew'; end return(fail) Figure 6: Basic algorithm ion. This is essentially the same algorithm as the hyperresolution algorithm taken by OTTER [McCune 90] 3. In the algorithm, M represents model candidate, D represents the model-extending candidate (a set of model-extending atoms which are generated as a result of the application of the model extension rule and are going to be added to M), and 6 represents a subset of D. Initially, M is set to an empty set, and D is a set of positive (unit) clauses of the given problem. In each cycle of the algori thm, 1) 6 is selected from D, 2) a rejection test (conjunctive matching for the tester clauses) is performed on 6 and M, 3) if the test succeeds then the algorithm terminates, 4) if the test fails then model extension (conjunctive matching on the generator clauses) is performed on 6 and M, and 5) a subsumption test is performed on new against MU D. If D is empty at the beginning of a cycle, then the algorithm terminates as the refutation fails (In other words, a model is found for the given set of clauses). The conjunctive matching and subsumption test is represented by the following functions on sets of atoms. CJMCs(6,M) = {(TC I (TAl,'" ,(TAn - t (TC A AI," .,An - t C E Cs A (TAi = C7B(B EMU 6)(1 ~ Vi ~ n) A 3i(1 ~ i ~ n)C7Ai = C7B(B E .6.)} subsumption(6,M) = {C E 6 I VB E M(B dosen't subsume C)} 30TTER is a slightly optimized version of the basic algorithm where negative unit clauses are tested on literals in new as soon as they are generated as the full-test algorithm described in the next section. 365 process tester: repeat forever reqtlest(generator, 6.); 6.' := stlbstlmption(6., MUD); if CJMT (6.',M U D) E false then return(success); D:=DU6.'. ltd := >; D := {A I (trtle ~ A) E a set of given clauses}; while D =I- > do begin D:= D -~; new := C J ltdcenerator(6., ltd); !vI := ltd U 6.; new' := subsmnption(new, ltd U D); if CJ.~dTester(new', ltd U D) 3 false then return(success); D:= D Unew'; end return(fail) process generator: repeat forever while Btlf = > do begin D:=D-{e}; Btlf :=delayCJMc({e},M); M:= Mu {e} end; wait( tester); 6. :=forceBuf; until D = > and Buf = >. 
Figure 7: Full-test algorithm 4.3.2 Full-Test Algorithm Figure 7 shows a refined version of the basic algorithm called the full-test algorithm. The algorithm 1) selects 6. from D, 2) performs model extension using 6. and j\1 generating new for the next generation of 6., 3) performs a subsumption test on new against AI U D, and 4) performs a rejection test on new', which passed the su bsumption test, together with ltd U D. Though this refinement seems to be very small on the text level, the complexity of time and space is significantly reduced, as explained later. The points are as follows. The algorithm performs subsumption and rejection tests on all elements of new rather than on 6., a subset of new generated in the past cycles. As a result, if a falsifying atom 4, X, is included in new, the algorithm can terminate as soon as false is derived from X. That is, the algorithm neither overlooks the falsifying atom nor puts it into D as the basic algorithm does. Thus, it never generates atoms which are superfluous after X is found. 4.3.3 Lazy Algorithm Figure 8 shows another refinement of the basic algorithm, the lazy algorithm. In this algorithm, it is assumed that t'vvo processes, one for generator clauses and the other for tester clauses, run in parallel and communicate with each other. The tester process 1) requests 6. to the generator process, 2) performs a subsumption test on 6. against ltdUD, and 3) performs a rejection test on 6.. For the generator process, 1) if a buffer, Buf, used for storing a set of atoms which are the results of an application of the model extension rule, is empty, the generator selects an atom, e, from D and sets a code for model extension (delay CJM) for e and ltd onto Btlf, 2) waits for a request of 6. from the tester process, and 4 A falsifying atom, X, is an atom that satisfies the antecedent of a negative clause by itself or in combination with MUD. Figure 8: Lazy algorithm 3) forces the buffer, Buf, to generate 6.. delay (above) is an operator which delays the execution of its operand (a function call). Hence, the function call, CJ Mc( {e}, M), will not be activated during 1), but will be stored in Buf as a code. Later, at 3), when the force operator is applied to Buf, the delayed function call is activated. This generates the values that are demanded. Using this mechanism, it is possible to generate only the 6. that is demanded by the tester process. After the required amount of 6. is generated, a delayed function call for generating the rest of the atoms is put into Buf as a continuation. The atoms are stored in M and D in a way that makes the order of generating and testing the atoms exactly the same as in the basic algorithm. The point of the refinement in the lazy algorithm is, therefore, to equalize the speed of generation and testing while keeping the order of atoms that are generated and tested the same as that of the basic algorithm. This eliminates any excess consumption of time and space due to over-generation of redundant atoms. 4.4 Optimization Clauses of Unit Tester Given the unit tester clauses in the problem, the three algorithms above can be further optimized. There are two ways to do this. One is a dynamic way called the lookahead method. In this method, atoms are generated excessively in the generation process in order to apply the rejection rule with unit tester clauses. More precisely, immediately after generating new, the generator process generates 'newnext, which would be regenerated in a succeeding step. 
Then 366 newnext is tested with unit tester clauses. If the test fails, then newnezt is discarded whereas new is stored. < 6..,NI >::::} generate(A1 ,A2 - ? C)::::} new < new, j\1 U D >::::} generate(A ll A2 newnext ::::} test(A -? -? C) ::::} newnext false) The reason why neWnext is not stored is that testing with unit tester clauses does not require M or D, but can be done with only neWnext itself. On the other hand, for tester clauses with more than one literal, testing cannot be completed, since testing for combinations of atoms from neWnext would not be performed. newnext will be regenerated as new in the succeeding step. This means that some conjunctive matching will be performed twice for the identical combination of atoms in a model candidate. However, the increase in computational cost due to this redundancy is negligible compared to the order of total computational cost. The other method is a static one which uses partial evaluation. This is used to obtain non-unit tester clauses from a unit tester clause and a set of generator clauses by resolving the two. Generator: AI, A2 Unit tester: A N on-unit tester: -? -? C. false. 0' AI, 0' A2 - ? where O'C false. = 0' A The computational complexity for conjunctive matching using the partial evaluation method is exactly the same as that using the lookahead method. The partial evaluation method, however, is simpler than the lookahead method, since the former does not need any modification of the prover itself whereas the latter does. Moreover, the partial evaluation method may be able to reduce the search space significantly, since it can provide propagating goal information to generater clauses. However, in general, partial evaluation results in an increase in the number of clauses, Hence it may make performance worse. The two optimization techniques are equally effective, and will optimize the model generation algorithms to the same order of magnitude when they are applied to unit tester clauses. 4.4.1 Summary of Complexity Analysis In this section, we briefly describe the time and space complexity of the algorithms described above. The details are discussed in [Hasegawa et. al. 92a]. For simplicity, we assumed the following. 1) The problem consists of generator clauses with two antecedent literals and one consequent literal, and tester clauses with at most two literals. 2) 6.. is a singleton set of an atom selected from D. 3) The rate at which conjunctive matchings succeed for a generator clause, and atoms generated as the result pass a subsumption test, the survival rate, is p(O :::; p :::; 1). 4) The order in which 6.. is selected and atoms are generated according to 6.. is fixed for all of the three algori thms. Table 1 summarizes the complexity analysis. T/S/G stands for complexity entry of rejection test /subsumption test/model extension, and M stands'for the required memory space. The value of a(l :::; a :::; 2) represents the efficiency factor of the subsumption test. a = 1 means that a subsumption test is performed in a constant order, because the hashing effect is perfect. a = 2 means that a subsumption test is performed in a time proportional to the number of elemen.ts, perhaps because a linear search was made in the list. As for the condensed detachment pro blem, the hashing effect is very poor and a is very close to two. The memory space required for the basic, full-test/lazy and lazy lookahead algOrIthms decreases along this order by a square root for each. 
This means that the number of atoms generated decreases as the algorithm changes, which in turn implies that the number of subsumption tests decreases accordingly. In the case of a = 2, the most expensive computation of all is a subsumption test, and a decrease in its complexity means a decrease in total complexity. On the other hand, in the case of a = 1, the most expensive computation of all is the rejection test with two-literal tester clauses. This situation, however, is the same for all of the algorithms and adopting lazy computation will result in speedup by a constant factor. In any case, by adopting lazy computation, the complexity of the total computation is dominated by that of the rejection test . 4.4.2 Performance Experiment An experimental.result is shown in Table 2. The example, Theorem 4, is taken from [Overbeek 90]. We did not use heuristics such as weighting and sorting, but only limited term size and eliminated tautologies. Every algorithm is implemented in KLI and run on a pseudo Multi-PSI in PSI-II [Nakashima and Nakajima 87]. The OTTER entry represents the basic algorithm optimized for unit tester clauses and implemented in KLl. The figures in parentheses are of algorithms for tester clauses with two literals as a result of applying partial evaluation to unit tester clauses. In unify entries, 367 Table 2: Experimental result (Theorem 4) basic >14000 ( 463.86) Time (sec) full-test 409.17 (82.40) 1656+74800 (43981+4158) .5736 (.596) 272 (6:3) 1384 (209) - Unify (43981+74254) Subsumption test M Memory - (.5674) (272) D - (1:375) lazy 407.58 (81.82) 1656+74737 (43981+4158) 5736 (.596) 272 (63) 1384 (209) lazy lookahead 210.45 (81.69) 81956+409.5 (43981 +409.5) 593 (59:3) 63 (63) 209 (209) OTTER 409.16 (462.1:3) 1656+74800 (43981+74254) 5736 (5674) 272 (272) 1384 (1375) Table 1: Summary of complexity analysis Unit tester clause G S T basic full- tes t / lazy lazy lookahead pm 2 pm-2 m 2 pp 2m 4CY. f.Lm'IOi (f.L/p)m Ol M p'lm 4 m2 pJ m 4 pm 2 m/p m 2-literal tester clause basic full-test / lazy T S G M p'lm4 p21n4 f.L p'lm 401 p1n 201 p 2m4 rn 2 p3 m 4 pm 2 t m is the number of elements in model ca.ndidatE' when false is detected in the basic algorithm. t p is the survival rate of a generated atom. f.L is the rate of successful conjunctive matchings (p ~ f.L), and a is the efficiency factor of a subsumption test. a figure to the left of + represents the number of conjunctive matchings performed in tester clauses, and a figure to the right of + represents the number of conjunctive matchings performed in generator clauses. These results are a fair reflection of the complexity analysis shown in Table 1. For instance, to solve Theorem 4 without partial evaluation optimization, the basic algorithm did not reach a goal within 14,000 seconds, whereas the full-test and lazy algorithms reached the goal in about 400 seconds. The most time-consuming computation in all of the three algorithms (basic, full-test and lazy), is rejection testing. The difference in the time complexity between the basic algorithm and the other two algorithms is (/l-p2 m 401)/(/l-m 201 ) = p2m 201 , which results in the time difference mentioned above. The basic algorithm and the full-test/lazy algorithm do not differ in the number of unifications performed in the tester clauses. However, the number of unifications performed in the generator clauses and the number of su bsumption tests decreases as we move from the basic algorithm to the full-test and lazy algorithms. 
The decrease is about one hundredth when partial evaluation is not applied, and about one tenth when it is applied. By applying lookahead optimization, the lazy algorithm is further improved. Though the lookahead optimization and the partial evaluation optimization are theoretically comparable in their order of improvement, their actual performance is sometimes very different. For Theorem 4, the lazy algorithm optimized with partial evaluation took 81.82 seconds, whereas the same algorithm optimized with lookahead optimization took 210.45 seconds. This difference is caused by the difference in the number of unifications performed in the tester clauses. This is because in the lazy algorithm with lookahead optimization, the generator clause, p(X), p(e(X, Y)) -> p(Y), generates an atom before the unit tester clause, p(A) -> false, tests the atom. In the same algorithm with the partial evaluation optimization, the instantiation information of A is propagated to the antecedent of p(X), p(e(X, A)) -> false, and the unification failure can be detected earlier. Partial evaluation optimization is effective for all the algorithms except OTTER. This is because lookahead optimization, in the OTTER algorithm, is already applied to unit tester clauses, and the algorithm remains the basic one for non-unit tester clauses.

5 Parallelizing MGTP

There are several ways to parallelize the proving process in the MGTP prover. These are to exploit parallelism in:

• conjunctive matching in the antecedent part,
• subsumption tests, and
• case splitting.

For ground non-Horn cases, it is sufficient to exploit the OR parallelism induced by case splitting. Here we use OR parallelism to seek multiple models, which produces multiple solutions in parallel. For Horn clause cases, we have to exploit AND parallelism. The main source of AND parallelism is conjunctive matching. Performing subsumption tests in parallel is also very effective for Horn clause cases. In the current MGTP, we have not yet considered non-ground and non-Horn cases.

5.1 OR Parallelization for MGTP/G

With the current version of MGTP/G, we have only attempted to exploit OR parallelism [Fujita and Hasegawa 90] on the Multi-PSI machine [Nakajima et. al. 89].

5.1.1 Processor Allocation

The processor allocation methods we have adopted achieve 'bounded-OR' parallelism in the sense that OR-parallel forking in the proving process is suppressed so as to meet restricted resource circumstances. One simple way of doing this, called simple allocation, is depicted in Figure 9. We expanded model candidates, starting with an empty model, using a single master processor until the number of candidates exceeded the number of available processors. We then distributed the remaining tasks to slave processors. Each slave processor explored the branches assigned to it without further distributing tasks to any other processors. This simple allocation scheme for task distribution works fairly well, since the communication cost can be minimized.

Figure 9: Simple allocation scheme (a single master processor distributing branches to slave processors 1 to N-1).

5.1.2 Performance of MGTP/G on Multi-PSI

One of the examples we used was the N-queens problem. This problem can be expressed by the following clause set:

    true -> p(1, 1); p(1, 2); ... ; p(1, n).
    true -> p(2, 1); p(2, 2); ... ; p(2, n).
        ...
    true -> p(n, 1); p(n, 2); ... ; p(n, n).
    p(X1, Y1), p(X2, Y2), unsafe(X1, Y1, X2, Y2) -> false.

The first N clauses simply express every possibility of placing queens on the N by N chess board. The last clause expresses the constraint that a pair of queens must satisfy.
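The clause set is regular enough to be produced mechanically. A hedged Python sketch that emits it for a given n follows; the encoding is ours, not MGTP input syntax, and since the paper does not spell out the definition of unsafe, the attack relation below is one plausible reading.

```python
def n_queens_clauses(n):
    """Return the N-queens clause set in the implicational form used above:
    n positive clauses enumerating the placements per row, plus one tester clause."""
    clauses = []
    for row in range(1, n + 1):
        # true -> p(row,1); p(row,2); ... ; p(row,n).
        clauses.append(([], [("p", row, col) for col in range(1, n + 1)]))
    # p(X1,Y1), p(X2,Y2), unsafe(X1,Y1,X2,Y2) -> false.
    clauses.append(([("p", "X1", "Y1"), ("p", "X2", "Y2"),
                     ("unsafe", "X1", "Y1", "X2", "Y2")], []))
    return clauses

def unsafe(x1, y1, x2, y2):
    """Assumed attack relation: two distinct squares in the same column or on
    the same diagonal (rows are distinct by construction of the clause set)."""
    return (x1, y1) != (x2, y2) and (y1 == y2 or abs(x1 - x2) == abs(y1 - y2))

for antecedent, consequent in n_queens_clauses(4):
    print(antecedent, "->", consequent or "false")
print(unsafe(1, 1, 2, 2))   # True: same diagonal
```

Each positive clause forces a case split over the columns of one row, so the OR-parallel search explores the board placements directly.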
The problem can be solved when either a model (one solution) or all of the models (all solutions)⁵ are obtained for the clause set. Performance was measured on the MGTP/G prover running on the Multi-PSI with the simple allocation method. Table 3 gives the result of the all-solution search on the N-queens problem.

Table 3: Performance of MGTP/G on Multi-PSI

                              Number of processors
  Problem                  1        2        4        8        16
  4-queens   Time (msec)   40       40       39       44       44
             Speedup       1.00     1.00     1.02     0.90     0.90
             Kred          1.45     1.47     1.48     1.50     1.50
  6-queens   Time (msec)   650      407      266      189      154
             Speedup       1.00     1.59     2.44     3.44     4.22
             Kred          23.7     23.7     23.7     23.8     23.8
  8-queens   Time (msec)   12,538   6,425    3,336    1,815    1,005
             Speedup       1.00     1.95     3.76     6.91     12.5
             Kred          460      460      460      460      460
  10-queens  Time (msec)   315,498  159,881  79,921   40,852   21,820
             Speedup       1.00     1.97     3.94     7.72     14.5
             Kred          11,117   11,117   11,117   11,117   11,117

⁵All models can be obtained, if they are finite, by the MGTP interpreter in all-solution mode.

Here we should note that the total number of reductions stays almost constant, even though the number of processors used increases. This means that no extra computation is introduced by distributing tasks. Speedup obtained by using up to 16 processors is shown in Figures 10 and 11. For the 10-queens and 7-pigeons problems, the speedup obtained as the number of processors increases is almost linear. The speedup rate is small only for the 4-queens problem. This is probably because the constant amount of interpretation overhead in such a small problem dominates the tasks required for the proving process.

Figure 10: Speedup of MGTP/G on Multi-PSI (N-queens), plotted against the number of PEs.
Figure 11: Speedup of MGTP/G on Multi-PSI (pigeon hole: 5, 6, and 7 pigeons), plotted against the number of PEs, with the ideal linear speedup for reference.

5.2 AND Parallelization for MGTP/N

We have several choices when parallelizing model-generation based theorem provers: 1) proofs which change or remain unchanged according to the number of PEs used, 2) model sharing (copying, in a distributed memory architecture) or model distribution, and 3) master-slave or masterless configurations.

The proof obtained by a proof changing prover may be changed according to a change in the number of PEs. We might get super-linear speedup if the length of a proof depended on the number of PEs used. However, we cannot always expect an increase in speed as the number of PEs increases. On the other hand, a proof unchanging prover does not change the length of the proof, no matter how many PEs we use. Hence, we could always expect greater speedup as the number of PEs increased, though we would only get linear speedup at best.

With model sharing, each PE has a copy of the model candidates and distributed model-extending candidates. With model distribution, both the model candidates and the model-extending candidates are distributed to each PE. Model sharing and model distribution both have advantages and disadvantages. From the distributive processing point of view, with model distribution, we can obtain memory scalability and more parallelism than with the model sharing method. For a newly created atom δ, there are n parallelisms in the model distribution method, since we can perform conjunctive matchings and subsumption tests for it in parallel, where n is the number of processors. On the other hand, in the model sharing method, we cannot exploit this kind of parallelism for a single created atom unless conjunctive matchings and subsumption tests are made for different regions of the model candidates.
From the communication point of view, however, the communication cost with model sharing is less than with model distribution. The communication cost with model distribution increases as the number of PEs increases, since generated atoms need to flow to all PEs for subsumption testing. For example, if the size of the set of model elements finally obtained is M, the number of communications amounts to O(M²) for a clause having two antecedent literals. On the other hand, with model sharing, we do not have to flow the generated atoms to all PEs. In this case, time-consuming subsumption tests and conjunctive matchings can be performed independently at each PE, with minimal inter-PE communication.

The master-slave configuration makes it easy to build a parallel system by simply connecting a sequential version of MGTP/N on a slave PE to the master PE. However, its devices must be designed to minimize the load on the master process. On the other hand, a masterless configuration such as a ring connection allows us to achieve pipeline effects with better load balancing, whereas it becomes harder to implement suitable control to manage collaborative work among PEs.

Our policy in developing parallel theorem provers is that we should distinguish between the speedup effect caused by parallelization and the search-pruning effect caused by strategies. In proof changing parallelization, changing the number of PEs is merely betting, and may cause a strategy to be changed for the worse even if it results in the finding of a shorter proof. In order to ensure the validity of our policy, we implemented proof changing and unchanging versions. In the following sections, we describe actual parallel implementations and compare them.

5.2.1 Proof Changing Implementation

1. Model Sharing

This implementation uses model sharing, and a ring architecture in which process_i (1 ≤ i < n) is connected to process_{i+1} and process_n is connected to process_1, where n is the number of PEs [Hasegawa 91a]. process_i has a copy of the model candidates M and distributed model-extending candidates D_i. A rough sketch of the operations performed in process_i (1 < i ≤ n) follows.

(1) Receive Δ_{i-1} from process_{i-1}.
(2) Pick up an atom δ_i from D_i such that δ_i is not subsumed by any elements in M and Δ_{i-1}. D_i := D_i - {δ_i}.
(3) Δ_i := Δ_{i-1} ∪ {δ_i}.
(4) If CJM-Tester({δ_i}, M ∪ Δ_{i-1}) ∋ false then send a termination message to all processes, otherwise,
(5) D_i := D_i ∪ CJM-Generator({δ_i}, M ∪ Δ_{i-1}).
(6) M := M ∪ Δ_i (update M in process_i).
(7) Send Δ_i to process_{i+1}.

For process_1, instead of actions (3) and (6), the following actions are performed.

(3') Δ_1 := {δ_1}, and
(6') M := M ∪ Δ_n.

Note that actions (4)-(7) can be performed in parallel.

Figure 12: Proof Changing and Model Sharing

Figure 12 shows how models are copied and conjunctive matching is executed in a pipeline manner in the case of n = 4. A letter denotes a model candidate element and an asterisk indicates an element on which conjunctive matching is performed. For example, process_1 on PE_1 selects an unsubsumed model element a (from its own model-extending candidates) at time t_1 and sends it to process_2 on PE_2.
process_2 stores element a into the model candidates in PE_2, proposes a model-extending element b, sends a and b to process_3, and starts conjunctive matching of b and {a} ∪ M. Note that conjunctive matching in a process can be overlapped. For example, the conjunctive matching in stage 6 does not have to wait for the completion of the conjunctive matching in stage 2. This exploits pipeline effects very well, resulting in low communication cost compared to the computation cost for conjunctive matching.

2. Model Distribution

This implementation takes model distribution and a ring architecture. Each process has its own distributed model candidates and distributed model-extending candidates. The algorithm for each process is similar to the sequential basic algorithm. They differ in that: 1) conjunctive matching cannot be completed in one process because the model candidates are distributed, so the continuations of conjunctive matching in each process need to go around the ring, and 2) newly created atoms have to go around the ring for subsumption testing.

5.2.2 Proof Unchanging Implementation

We implemented a proof unchanging version in a master-slave configuration, with model sharing based on lazy model generation. In this implementation, generator and subsumption processes run in a demand-driven mode, while tester processes run in a data-driven mode. The main advantages of this implementation are as follows:

1) Proof unchanging allows us to obtain greater speedup as the number of PEs increases.

2) By utilizing the synchronization mechanism supported by KL1, sequentiality in subsumption testing is minimized.

3) Since slave processes spontaneously obtain tasks from the master, and the size of each task is well equalized, good load balancing is achieved.

4) By utilizing the KL1 stream data type, demand-driven control is easily and efficiently implemented.

By using demand-driven control, we can not only suppress unnecessary model extensions and subsumption tests but also maintain a high running rate, which is the key to achieving linear speedup.

Figure 13: Lazy Implementation

The model generation method consists of three tasks: 1) generation, 2) subsumption test, and 3) rejection test. We provided three processes to cope with this:

• G (generator),
• S (subsumption tester), and
• T (rejection tester).

The G/T/S process has a pointer i/j/k which indicates an element of the stack shown in Figure 13. The stack elements are model candidates or model-extending candidates. In the figure, M denotes the model candidates for which conjunctive matching performed by G is completed, and D denotes the model-extending candidates on which the subsumption test is completed. The G/T/S processes iterate the following actions.

G: performs model extensions by using the i-th element (Δ) and the 1, ..., i-1-th elements (M), and sends the newly created atoms to S. i := i + 1.

S: performs subsumption tests on the newly created atoms against the 1, ..., k-1-th elements (M ∪ D), and pushes the unsubsumed atoms onto the stack. k := k + l, where l is the number of unsubsumed atoms.

T: performs model rejection tests on the j-th element and the 1, ..., j-1-th elements.

Figure 14 shows a process structure for the proof unchanging parallel implementation. The central box represents the shared model and model-extending candidates.
The upper boxes represent atoms generated by the generator G_i, and the arrows indicate the order in which the atoms are sent to the master process. Proof unchanging is realized by keeping this order. To make the system proof unchanging, the sequence in which M and D are updated must remain the same as the sequence in the sequential case. The master process sends an atom generated by a generator process to a subsumption tester process in the same order as the master receives the atom; that is, the master aligns the elements generated by the generator processes so as to be in the same order as in the sequential case. Many G/T/S processes work simultaneously. The master process is introduced to control task distribution, that is, giving a different task (Δ) to a different process. Each S process requests a Δ from a G process through the master process. This means that the communication between G and S processes is indirect.

Figure 14: Proof Unchanging

The critical resource for S processes is the model-extending candidates D. The critical regions are the updating of D by D := D ∪ new and a part of subsumption(new, M ∪ D) (see Figure 8). Most elements of M ∪ D have already been determined by some subsumption tester process, and synchronization in subsumption testing can be minimized so that most parts of the subsumption tests are not critical. To exclusively access the critical resource D, each S process requests from the master a pair of a Δ and a key which indicates the right to update. If the Δ is subsumed by the already determined elements in M ∪ D, the key is returned to the master process without being referred to. In this case, there is no synchronization with other S processes. If the Δ is not subsumed by the already determined elements in M ∪ D, the S process refers to the key to see if it has the right to update, and updates D by D := D ∪ Δ if it has. Otherwise, the process waits until the other S process updates D. If the other S process updates D, the subsumption test is performed on the added elements.

The critical resources for the G processes are both the model candidates M and the model-extending candidates D. This is similar to the tester processing.

5.2.3 Performance of MGTP/N on Multi-PSI and PIM

Some experimental results for the proof changing and unchanging versions in model sharing are shown in Tables 4 and 5, and Figures 15 and 16. Each program is implemented in KL1 and runs on the Multi-PSI. Table 4 shows a performance comparison between the two versions with 16 PEs. In the proof unchanging version (PU column), we limited the term size and eliminated tautologies. In addition to the above, in the proof changing version (PC column), we used heuristics such as weighting and sorting. All problems are condensed detachment problems [McCune and Wos 91]. We measured performance with 1, 2, 4, 8 and 16 PEs. In the PC time entry column, the number of PEs in parentheses indicates the number of PEs which yielded the best performance. In the proof unchanging version, we always got the best performance with 16 PEs, whereas we sometimes got the best performance with 8 PEs in the proof changing version. We also have an example in which we got the best performance with 2 PEs.

Table 4: Performance Comparison (16 PEs)

  Problem                 PU                 PC
  #3    Time (sec)      218.77       6766   (16 PEs)
        KRPS/PE          34.68        25.99
        Speedup          13.27       157.63
  #6    Time (sec)        3.75              (16 PEs)
        KRPS/PE          12.47        17.75
        Speedup           3.65         6.75
  #56   Time (sec)        3.53        10.37 (8 PEs)
        KRPS/PE          13.39         3.97
        Speedup           3.53       415.57
  #58   Time (sec)       12.80        27.32 (16 PEs)
        KRPS/PE          27.51         3.75
        Speedup           9.23        66.32
  #63   Time (sec)        4.56        48.37 (16 PEs)
        KRPS/PE          20.01        15.24
        Speedup           6.06        11.07
  #69   Time (sec)        6.07        23.41 (16 PEs)
        KRPS/PE          16.69         4.52
        Speedup           4.98         2.90
  #72   Time (sec)        3.62        12.17 (16 PEs)
        KRPS/PE          14.02         2.10
        Speedup           4.47        45.51
  #77   Time (sec)       37.10        62.07 (8 PEs)
        KRPS/PE          36.66        25.62
        Speedup          12.65       109.24

This comparison implies that super-linear speedup does not always signify an advantage in a parallelization method, because the proof unchanging version always beats the proof changing version in absolute speed on the problems used in the table. Figures 15 and 16 display the speedup ratio for the problems #3, #58, #77, #66, #92, and #112 using the proof unchanging version. There is no saturation in performance up to 16 PEs, and greater speedup is obtained for the problems which consume more time.

Figure 15: Speedup ratio I (problems #3, #58 and #77; speedup vs. number of PEs, with the ideal line)
Figure 16: Speedup ratio II (problems #66, #92 and #112; speedup vs. number of PEs, with the ideal line)

Table 5 shows the performance obtained by running the proof unchanging version for Theorems 5 and 7 [Overbeek 90] on the Multi-PSI with 64 PEs. We did not use heuristics such as sorting, but merely limited term size and eliminated tautologies.

Table 5: Performance for 16/64 PEs

  Problem                   16 PEs           64 PEs
  Th 5   Time (sec)       41725.98         11056.12
         Reductions    38070940558      40759689419
         KRPS/PE            57.03            57.60
         Speedup             1.00             3.77
  Th 7   Time (sec)       48629.93         13514.47
         Reductions    31281211417      37407531427
         KRPS/PE            40.20            43.25
         Speedup             1.00             3.60

Note that the average running rate per PE for 64 PEs is actually a little higher than that for 16 PEs. With this and other results, we were able to obtain almost linear speedup. Recently we obtained a proof of Theorem 5 on PIM/m [Nakashima et al. 92] with 127 PEs in 2870.62 sec and nearly 44 billion reductions[6] (thus 120 KRPS/PE). Taking into account the fact that the PIM/m CPU is about twice as fast as the Multi-PSI CPU, we found that near-linear speedup can be achieved, at least up to 128 PEs.

[6] The exact figure was 43,939,240,329 reductions.

6 Conclusion

We have presented two versions of the model-generation theorem prover MGTP implemented in KL1: MGTP/G for ground models and MGTP/N for non-ground models. We evaluated their performance on the distributed memory multi-processors Multi-PSI and PIM. When dealing with range-restricted problems in model-generation theorem provers, we only need matching rather than full unification, and can make full use of the language features of KL1, thereby achieving good efficiency. The key techniques for implementing MGTP/G in KL1 are as follows:

(1) A given set of input clauses of implicational form is compiled into a corresponding set of KL1 clauses.

(2) Generated models are held by the prover program instead of being asserted.

(3) Conjunctive matching of the antecedent literals of an input clause against a model element is performed by very fast KL1 head unification.

(4) Searching for a model element that matches the antecedent is performed by computing repeated combinations of model elements by means of loop executions instead of backtracking.

(5) Fresh variables for a different instance of an antecedent literal are obtained automatically just by calling a KL1 clause.

These techniques are very simple and straightforward, yet effective. For solving non-range-restricted problems, however, we cannot use the above techniques developed for MGTP/G.
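Before turning to the non-range-restricted case, techniques (1) and (3) above can be pictured with a hypothetical hand-compilation in Prolog; this is our own sketch, not the actual KL1 code emitted by MGTP/G, and the predicate name cjm/3 and the list representation of the model are illustrative. Each clause matches one antecedent literal of the input clause p(X), q(X) → r(X) against a newly added atom through head unification and looks the remaining literal up in the current model.

```prolog
% Hypothetical hand-compilation of the input clause  p(X), q(X) -> r(X).
% NewAtom is matched by the clause head; Model lists the earlier model elements.
cjm(p(X), Model, r(X)) :-        % head unification matches the p/1 literal
    member(q(X), Model).         % the other antecedent literal is found in the model
cjm(q(X), Model, r(X)) :-        % symmetric case: the new atom matches q/1
    member(p(X), Model).
```

For instance, cjm(p(a), [q(a), q(b)], New) succeeds with New = r(a), the consequent instance to be added to the model-extending candidates; because the model elements are ground in the range-restricted case, the member/2 lookup behaves as matching rather than full unification.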
If the given problem is Horn, it can be solved by the MGTP prover extended by incorporating unification with occurs check, without changing the basic structure of the prover. For non-Horn problems, however, substantial changes in the structure of the prover would be required in order to manage shared variables appearing in the consequent literals of a clause. Accordingly, we restricted MGTP/N to Horn problems, and developed a set of KL1 meta-programming tools called the Meta-Library to support full unification and the other functions for variable management.

To improve the efficiency of the MGTP provers, we developed the RAMS, MERC, and Δ-M methods that enable us to avoid redundant computations in conjunctive matching. We have obtained good performance results by using these methods on the PSI. Moreover, it is important to avoid very great increases in the amount of time and space consumed when proving hard theorems which require deep inferences. For this we proposed the lazy model generation method, which can decrease the time and space complexity of the basic algorithm by orders of magnitude. Experimental results show that significant amounts of computation and memory can be saved by using the lazy algorithm.

The parallelization of MGTP is one of the most important issues in our research project. For non-Horn ground problems, a lot of OR parallelism caused by case splitting can be expected. This kind of problem is well-suited to a local memory multi-processor such as the Multi-PSI, on which it is necessary to make the granularity as large as possible so that communication costs can be minimized. We obtained almost linear speedup for the n-queens, pigeon hole, and other problems on the Multi-PSI, using a simple allocation scheme for task distribution.

For Horn problems, on the other hand, we had to exploit the AND parallelism inherent in conjunctive matching and subsumption. Though the parallelism is large enough, it seemed rather harder to exploit than OR parallelism, since the Multi-PSI is not suited to this kind of fine-grained parallelism. Nevertheless, we found that we could obtain good performance and scalability by using the AND parallelization methods mentioned in this paper. In particular, the recent results obtained by running the MGTP/N prover on PIM/m showed that we could achieve linear speedup for condensed detachment problems, at least up to 128 PEs. The key technique is the lazy model generation method, which avoids unnecessary computation and use of memory space while maintaining a high running rate.

For MGTP/N, full unification is written in KL1, which is thirty to one hundred times slower than the same operation written in C on SUN/3s and SPARCs. To further improve the performance of MGTP/N, we need to incorporate built-in firmware functions for supporting full unification, or to develop KL1 compiling techniques for non-ground models.

Through the development of the MGTP provers, we confirmed that KL1 is a powerful tool for the rapid prototyping of concurrent systems, and that parallel automated reasoning systems can be easily and effectively built on the parallel inference machine, PIM.

Acknowledgment

We would like to thank Dr. Kazuhiro Fuchi, the director of ICOT, and Dr. Koichi Furukawa, the deputy director of ICOT, for giving us the opportunity to do this research and for their helpful comments. Many fruitful discussions took place at the PTP Working Group meeting. Thanks are also due to Prof.
Fumio Mizoguchi of the Science University of Tokyo, who chaired the PTP-WG, and many people at the cooperating manufacturers in charge of the joint research.

References

[Bibel 86] W. Bibel, Automated Theorem Proving, Vieweg, 1986.

[Bose et al. 89] S. Bose, E. M. Clarke, D. E. Long and S. Michaylov, PARTHENON: A Parallel Theorem Prover for Non-Horn Clauses, in Proc. of 4th Annual Symp. on Logic in Computer Science, 1989.

[Chikayama et al. 88] T. Chikayama, H. Sato and T. Miyazaki, Overview of the Parallel Inference Machine Operating System (PIMOS), in Proc. of FGCS'88, 1988.

[Fuchi 90] K. Fuchi, Impression on KL1 programming - from my experience with writing parallel provers -, in Proc. of KL1 Programming Workshop '90, pp. 131-139, 1990 (in Japanese).

[Fujita and Hasegawa 90] H. Fujita and R. Hasegawa, Implementing a Parallel Theorem Prover in KL1, in Proc. of KL1 Programming Workshop '90, pp. 140-149, 1990 (in Japanese).

[Fujita et al. 90] H. Fujita, M. Koshimura, T. Kawamura, M. Fujita and R. Hasegawa, A Model-Generation Theorem Prover in KL1, Joint US-Japan Workshop, 1990.

[Fujita and Hasegawa 91] H. Fujita and R. Hasegawa, A Model-Generation Theorem Prover in KL1 Using Ramified Stack Algorithm, in Proc. of the Eighth International Conference on Logic Programming, The MIT Press, 1991.

[Hasegawa et al. 90a] R. Hasegawa, H. Fujita and M. Fujita, A Parallel Theorem Prover in KL1 and Its Application to Program Synthesis, Italy-Japan-Sweden Workshop, ICOT-TR-588, 1990.

[Hasegawa et al. 90b] R. Hasegawa, T. Kawamura, M. Fujita, H. Fujita and M. Koshimura, MGTP: A Hyper-Matching Model-Generation Theorem Prover with Ramified Stacks, Joint UK-Japan Workshop, 1990.

[Hasegawa 91a] R. Hasegawa, A Parallel Model Generation Theorem Prover: MGTP and Further Research Plan, in Proc. of the Joint American-Japanese Workshop on Theorem Proving, Argonne, Illinois, 1991.

[Hasegawa 91b] R. Hasegawa, A Parallel Model-Generation Theorem Prover in KL1, Workshop on Parallel Processing for AI, IJCAI'91, 1991.

[Hasegawa 91c] R. Hasegawa, A Parallel Model Generation Theorem Prover with Ramified Term-Indexing, Joint France-Japan Workshop, Rennes, 1991.

[Hasegawa 91d] R. Hasegawa, A Lazy Model-Generation Theorem Prover and Its Parallelization, Joint Germany-Japan Workshop on Theorem Proving, GMD, Bonn, 1991.

[Hasegawa et al. 92a] R. Hasegawa, M. Koshimura and H. Fujita, Lazy Model Generation for Improving the Efficiency of Forward Reasoning Theorem Provers, ICOT-TR-751, 1992.

[Hasegawa et al. 92b] R. Hasegawa, M. Koshimura and H. Fujita, MGTP: A Parallel Theorem Prover Based on Lazy Model Generation, to appear in Proc. of CADE 92 (System Abstract), 1992.

[Koshimura et al. 90] M. Koshimura, H. Fujita and R. Hasegawa, Meta-Programming in KL1, ICOT-TR-623, 1990 (in Japanese).

[Loveland 78] D. W. Loveland, Automated Theorem Proving: A Logical Basis, North-Holland, 1978.

[Manthey and Bry 88] R. Manthey and F. Bry, SATCHMO: a theorem prover implemented in Prolog, in Proc. of CADE 88, Argonne, Illinois, 1988.

[McCune 90] W. W. McCune, OTTER 2.0 Users Guide, Argonne National Laboratory, 1990.

[McCune and Wos 91] W. W. McCune and L. Wos, Experiments in Automated Deduction with Condensed Detachment, Argonne National Laboratory, 1991.

[Nakajima et al. 89] K. Nakajima, Y. Inamura, N. Ichiyoshi, K. Rokusawa and T. Chikayama, Distributed Implementation of KL1 on the Multi-PSI/V2, in Proc. of 6th ICLP, 1989.

[Nakashima and Nakajima 87] H. Nakashima and K. Nakajima, Hardware architecture of the sequential inference machine PSI-II, in Proc. of 1987 Symposium on Logic Programming, Computer Society Press of the IEEE, 1987.

[Nakashima et al. 92] H. Nakashima, K. Nakajima, S. Kondoh, Y. Takeda and K. Masuda, Architecture and Implementation of PIM/m, in Proc. of FGCS'92, 1992.

[Overbeek 90] R. Overbeek, Challenge Problems, private communication, 1990.

[Schumann 89] J. Schumann, SETHEO: User's Manual, Technische Universität München, 1989.

[Slaney and Lusk 91] J. K. Slaney and E. L. Lusk, Parallelizing the Closure Computation in Automated Deduction, in Proc. of CADE 90, 1990.

[Stickel 88] M. E. Stickel, A Prolog Technology Theorem Prover: Implementation by an Extended Prolog Compiler, in Journal of Automated Reasoning, 4:353-380, 1988.

[Stickel 89] M. E. Stickel, The Path-Indexing Method for Indexing Terms, Technical Note 473, Artificial Intelligence Center, SRI International, Menlo Park, California, October 1989.

[Wos et al. 84] L. Wos, R. Overbeek, E. Lusk and J. Boyle, Automated Reasoning: Introduction and Applications, Prentice-Hall, 1984.

[Wos 88] L. Wos, Automated Reasoning - 33 Basic Research Problems -, Prentice-Hall, 1988.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

On A Grammar Formalism, Knowledge Bases and Tools for Natural Language Processing in Logic Programming

SANO, Hiroshi and FUKUMOTO, Fumiyo
Institute for New Generation Computer Technology (ICOT), Sixth Research Laboratory
sano@icot.or.jp  fukumoto@icot.or.jp

Abstract

This paper gives an overview of Natural Language Processing (NLP) by adopting the framework of logic programming at ICOT. First, we introduce a grammar formalism called SFTB, a new grammar formalism which has evolved from the latest research work on contemporary Japanese. SFTB was designed and developed following the outcome of this research and incorporates computational features. Two grammar studies are in current use at the laboratory. One, Localized Unification Grammar (LUG), is based on the phrase-based approach. The other, Restricted Dependency Grammar (RDG), belongs to the family of dependency grammars. Computer-based dictionaries should be thought of as knowledge bases. We have built a dictionary, in the form of LUG, which is available for sentence processing. In addition to the hand-built database, we have developed computer-based dictionaries. Finally, a tool for developing grammar rules which run on a computer is introduced. Basic grammar rules, described in the LUG form, have been made using the tool. The tool makes it possible to extend the basic grammar rules in order to create adequate grammar rules for user applications. We believe that this set of tools is applicable to a well-integrated NLP system as a whole. Readers who are interested in NLP systems that are not described here should refer to [1].

1 Introduction

In this paper, we describe NLP research activities conducted at ICOT's sixth laboratory. The overall goal is to provide a set of power tools that use NLP and consist of (1) a framework of grammar structures for a language (Japanese), (2) grammar formalisms for writing grammar structures for computational use, (3) non-trivial-sized grammar rules running on a computer, (4) computer-based dictionaries that help build dictionary entries and can be used in grammar rules, and (5) tools for analyzing sentences, such as a syntactic parser, morphological analyzer, grammar writer's workbench, and a dictionary editor.
The power tools contain the results of our research activities on NLP through the underlying logic programming and may be thought of as a well-integrated NLP system. One of the major problems in the area of NLP is an essential lack of cooperation by the power tools with each other and subsequent shared data, tools and systems. Because of the wide variety of (1) grammar formalisms, (2) parsing mechanisms which are independent developments and (3) forms in which dictionary entries are written, most researchers develop parsing systems individually, write several grammar rules and construct dictionaries as they go. If the many tools for computationallinguistics described above can be made available in common, we will be able to make progress through data sharing. To develop a well-integrated NLP system, we have conducted the following research. A Grammar Formalism First, we should clarify what we mean by sentence. To do so we introduce a grammar formalism which provides a criterion for defining a relationship between the meaning of sentences and a sequence of words. The grammar formalism described here is called SFTB. The underlying framework draws inspiration from Japanese language morphology [10] and from the latest research work on contemporary Japanese such as Japanese Phrase Structure Grammar (JPSG)[3]. The framework is intended for use in computational grammar. Two Grammar Descriptive Frameworks The next objective is to investigate a grammar descriptive framework to hand-code grammar rules as linguistic information supplied by means of SFTB. These rules should be applicable to computer processing in the framework of logic programming. There are two grammar structures in current use at the laboratory. One, Localized Unification Grammar (LUG), is based on the phrase-base approach and aims at unification based formalisms. Another, Restricted Dependency Grammar (RDG), belongs to the family of dependency grammars and processes sentences in a traditional way. 377 Dictionaries We shall describe three kinds of dictionaries. Lexical information used in a computer was required in an implementation of LUG's grammar. This is characterized as the finite syntactic information of attribute value pairs and results in a hand-built dictionary. Although the computer resident dictionary, consisting of about 7,000 entries, is hand-coded, the lexical database for morphological analysis created from existing computer-based resources by a conversion program consists of 150,000 entries. We have a diction,ary where each entry has a large amount of syntactic information and sense-related semantic information. It may be thought of as a linguistic knowledge base. A Tool Finally, we introduce a grammar rule development system called LINGUIST. LINGUIST consists of a bottomup parser (BUP-translator[8]) and debugging facilities with a cooperative response coordinator for user interfaces. The system is being integrated into an environme~t of support tools for developing, testing, and debugging grammar rules written in LUG. With this system, we have developed the basic grammar, which has 800 grammar rules, for contemporary Japanese language including morphological analysis rules. 2.1 The SFTB grammar system In our grammar system, the central units will be that of morphemes but not words, which are classified into parts of speech generally. The grammar proposed in this paper has morphemics as a part of its grammar system. This is a grammar system which is rooted in [10] (see the survey). 
Morphemes come in several varieties. A basic unit called goki stands for a concept in which a certain relation holds between the state of affairs and the idea that the object belongs to in the real (cognitive) universe. Units of this kind can be divided into two types. One, the free form jiritsu-goki, forms words by itself. We call the bound form ketugou-goki, since such a unit must combine with an affix (or affixes) in order to form words. A unit called setsuji performs a grammatical function where states of affairs are linked with certain relations. For example, verb endings are morphemes of this type. This approach uses the neutral unit goki in relation to grammatical behavior under syntax. Example 1 shows a phrase structure produced by the SFTB framework for the phrase "Uresi sa no taigen" ("Expression of happiness" in English). In this case, the word Uresi-sa (happiness) is derived from Uresi (a Keiyou-goki) and the affix sa. For academic use, tapes are obtainable free from ICOT. These tapes serve as the linguistic tools mentioned above.

2 SFTB - A Grammar formalism for the Japanese language

The aim of developing SFTB is to investigate linguistic frameworks for computational processing of the Japanese language. This framework is a necessary grammatical basis for processing Japanese sentences by computer. A grammatical basis for a language should provide a concrete and coherent framework for relating the linguistic form and the content conveyed by sentence expression. A computer must operate on the information structure expressing the content of a sentence produced by the framework. In the SFTB framework, the syntactic structure derived from the linguistic form is based on a compositional line of sentence analysis, but not on a flat dependency analysis, which is the traditional way of handling sentence structure. It is able to cope with the problems of subject and object omission, ellipsis, interdependence on context and so forth. It may also end the structureless and patternless Japanese language confusion caused by the lack of a proper syntactic framework.

Figure 1: Example of phrase structure

  Nominal
    Adnominal
      Nominal
        Keiyou-goki: Uresi (happy)
        affix: sa (-ness)
      joji: no (of)
    Nominal
      Taigen-goki: Taigen (expression)

Note that, although the phrase structure is a constituent structure representation from a structural point of view, the nodes include feature information and are more complex, i.e., feature bundles instead of category names. Semantic information is also added to the feature bundles. To make this approach applicable to sentence analysis, all that needs to be done is to integrate morphology and syntax into a comprehensive system.

2.2 Basic patterns and sentence structure

The integration required reconsideration of not only the relation between morphology and syntax but also the general framework of the Japanese language, such as conjugation lists of verbs, word classification, basic sentence patterns and so forth. In this section, we concentrate on the grammatical basis for syntactic analysis of the Japanese language applicable to computer processing. First, we should clarify what we mean by a sentence as a criterion. No sentence is uttered or written without the speaker or writer expressing their view on a subject. This may suggest that we ought to extract not only the contents of the sentence but also the intention of the originator from the surface structure of the sentence.
Above all, the syntactic forms mapped to verb endings are closely related to the meaning of the sentence. However, this is an ideal case, where actual usage depends on context and discourse. As you know, speech is often rambling. One may ask how the content and intuition of graffiti can be extracted. Of course, the criterion mentioned above has to be applied to a limited range of linguistic phenomena. While bearing the above in mind, we offer the criteria and seek a sentence structure to map the surface string to an internal representation that is used for NLP by computers. We characterize the properties of the sentence structures we have been investigating, as illustrated in Figure 2.

Figure 2: The Sentence Structure (the Speaker, [Topic], [Content] with [Predicate], (Agent), Subject, Time, Location and Complements, [Modal], and connections between states of affairs)

The basic patterns of Japanese sentences are presented in Table 1. The verb "suru", to do, is taken up as a model.

Table 1: Basic patterns

  Pattern        Sentence endings       Intention
  Indicative     suru/sita              Neutral
  Presumptive    surudarou/sitadarou    Presumptive
  Volitional     siyou/sitadarou        Volitional
  Imperative     siro/suruna            Imperative

Several types of conjugation lists for verbs are available, and the elements of the verb endings will be tailored to meet the sentence endings of the basic patterns. The lists consist of sentence endings and phrase endings appearing in loosely dependent clauses. The derivation of the conjugation lists illustrated in Table 2 is based on the assumption that verb endings are proportional to the linguistic clues depending on the meaning that the sentence conveys.

Table 2: Conjugation lists - Hanasu (To speak)

  Lists (SFTB)                               Lists (School grammar)†
  Form        Type                           Form    Type
  su          Non-perfect indicative         sa      1
  sita        Perfect                        so      2
  sudarou     Non-perfect presumptive        si      3
  sitadarou   Perfect presumptive            su      4
  sou         Positive volitional            su      5
  sumai       Negative volitional            se      6
  se          Positive imperative            se      7
  suna        Negative imperative
  si          Connective form 1
  site        Connective form 2
  seba        Non-perfect conditional
  sitara      Perfect conditional
  sitari      Coordinate form

  † School grammar: 1: Mizen-kei, 2: Mizen-kei, 3: Renyou-kei, 4: Shuusi-kei,
    5: Rentai-kei, 6: Katei-kei, 7: Meirei-kei

Compare SFTB's conjugation lists with the school grammar lists. The number of verb endings in the school grammar lists is less than seven; in the SFTB lists, there are over a dozen. For each verb ending form, however, a type name provides sentence patterns, a syntactic function linked to the meaning expressed by the sentence.

3 Linguistic Knowledge Bases

Grammar The well understood way of processing the Japanese language is that you read the technical papers and understand the most important parts of the framework. Since linguistic knowledge about grammar and dictionaries is described in natural language itself, there is no straightforward means of applying the linguistic knowledge investigated by SFTB to map the underlying descriptions onto grammar rules that run on computers. The weakness is in the representations of the programs underlying the parsers available for language processing on computers. To solve this problem, we present two computational grammar descriptive frameworks for developing grammar rules built from the SFTB framework. For computer systems with a parser developed in the logic programming framework, these grammar structures give the grammar writer a descriptive framework for writing grammar rules to be used by a parser.
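To make the target of such descriptive frameworks concrete before introducing them, the following is a minimal DCG fragment in Prolog for the phrase "Uresi sa no taigen" of Figure 1; it is our own illustration, not the LUG rule format defined below, and the nonterminal names and the of/2 and nominalize/1 semantic terms are assumptions made only for this sketch.

```prolog
% Minimal DCG sketch for "uresi sa no taigen":
% an adjectival goki plus the affix "sa" forms a nominal, and the joji "no"
% turns that nominal into the modifier of a following nominal head.
nominal(S)                        --> simple_nominal(N), nominal_tail(N, S).
nominal_tail(Mod, of(Head, Mod))  --> [no], nominal(Head).   % "... no <head>"
nominal_tail(N, N)                --> [].                    % bare nominal
simple_nominal(nominalize(uresi)) --> keiyou_goki(uresi), [sa].
simple_nominal(taigen)            --> taigen_goki(taigen).
keiyou_goki(uresi)                --> [uresi].
taigen_goki(taigen)               --> [taigen].
```

Loaded into a Prolog system, the query phrase(nominal(S), [uresi, sa, no, taigen]) succeeds with S = of(taigen, nominalize(uresi)), a compositional structure of roughly the kind the LUG and RDG frameworks below are designed to build on a larger scale.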
In this section, we describe two computational grammar descriptive frameworks, one is LUG formalism, the other is RDG formalism (RDG), used for writing grammar rules that run on computers. The former is based on the unification grammar that belongs to the phrase structure base grammar. The formalism of the latter takes its stand on the dependency structure grammar. Much has been done in writing grammar rules to parse natural languages. These grammar rules generally make use of an assumed grammar formalism. Almost no attempt, however, has been made to utilize different ways that base their underlying formalisms on the same grammar for processing a natural language. Thus, although processing methods have been discussed that contrast parsing speeds, required memories and so forth, we have not talked about the merits and the demerits of these grammar structures. In order to ascertain what kind of grammar formalism is appropriate for processing Japanese language, different approaches must be applied to the language. The original goal of SFTB was to provide a new Japanese language grammar formalism for contemporary Japanese. As applicability to computational linguistics look possible, we are now concentrating on writing grammar rules in terms of LUG and RDG, in the framework of SFTB. mar writers with little computer experience. Thus, LUG is a grammar specification language designed for users to develop non-trivial grammar expressed in the DCG. The basic data of LUG is a feature syntax. Categories are expressed as feature sets. Since the feature sets are represented as Prolog lists, the grammar is written in D CG formalisms, allowing users to make use of the BUP[8], BUP-XG[12], SAX(9] translators being developed in the framework of logic programming. As a sample LUG, we present, in Figure 3, an. informal representation of the phrase "Uresi sa no taigen". Morph taigen Category Taigen Marker no uresi/ sa ] Morph Category Taigen oftype Marker sa [ Sem nominalize( Y) Sem oftype( X,nominalize( Y)) Figure 3: LUG for the phrase The form produced by LUG can be described as a complex constituent that is the result of compositional and functional application. The functional application is used to limit compositional ambiguities caused by unification-oriented structural description. The contextual information of knowledge bases is dealing with the world or pragmatic knowledge about words, for example can be re-unified later. Thus, complete resolution of constituent structures depends on semantic-based and pragmatic-based accounts of subsequent information. With this formalism, the Japanese language grammar is written independently of the task of the application domain. 3.1.1 3.1 A Phrase Based Approach - LUG Definite Clause Grammar (DCG)[6] is one of the bridges connecting NLP and logic programming. Most of our grammar research activities can be regarded as improvements and extensions of DCG. A wide range of parsing techniquea have been suggested based on DCG through underlying context-free grammar. Although the computational effectiveness of DCG is powerful enough to write grammar rules that run directly on a computer, it can be thought of as programming rather than the description of the grammar of a language. It may be said that DCG is less expressive than other grammar formalisms in the sense of mathematical measures. If the process of developing grammar rules is tied to the description of DCG, it will be difficult to develop large grammar structures manual by using 'the DCG description. 
To overcome this pro blem, we have designed LUG to be accessible to gram- The Basic Grammars The LUG formalism has been used to build grammar structures for basic coverage of contemporary Japanese. As of now, grammar structures of 800 grammar rules are usable and are under development for the purpose of increasing coverage. Remember, however, that the readability of grammar structures is sacrificed when its rules are extended. It is often difficult to keep a large number of grammar rules under control. Even a small loss of attention causing inconsistent grammar can cause ambiguities to increase and analysis to become useless. An important characteristic of the basic grammar structure is that it is orderly divided into 12 groups to the following standards: o Difficulty in analyzing sentences According to Figure 2, a complete analysis of sentences comes from success in understanding the syntactic elements in the structure when in the parsing 380 process. This is directly and indirectly related to the basic sentence patterns and the sequence that words appear in sentences. The fewer syntactic elements omitted, the easier it is to parse its structure. The greater the difference between the syntactic elements and their corresponding morphemes, the more it costs to analyze a sentence correctly and to grasp its meaning. Grammar structures can be loosely divided into three levels: elementary, intermediate and advanced levels. vVe list here some samples of the kinds of grammar structure levels, but are limited to the following. Elementary level decision( declaratives), supposition, conjectural form ( declaratives), command(imperative), aspect operators, negation, polite form, complements, mood auxiliaries. Intermediate level passives, causatives, modal adverbs, spacio-temporal adverbs, topicalized phrases, relatives. Advanced level conditional phrases, causal phrases, some connectives, conjunctions and disjunctions of nominal phrases. 3.2 A Dependency Based Approach RDG Japanese word order is said to be free. Thus, dependency grammar that only focuses on the relation between two arbitrary constituents as a syntactic structure in a sentence has been well studied. Many NLP systems for the Japanese language have adopted the dependency paradigm as an approach for syntactic analysis. However, the problem of the dependency structure (it is not a tree but a connected graph structure) which is used in these NLP systems is that useless solutions are generated, which bring about a combinatorial explosion. This is because whether one constituent of a sentence modifies another constituent concerns only the localized information between the two constituents, such as the selectional restriction between a verb and its complements. In this section, we propose a dependency grammar formalism for the Japanese language called Restricted Dependency Grammar (RDG). A characteristic of RDG is (1) The interpretation of whether one constituent modifies the other or not depends on global information based on the word order of a sentence. So we can suppress the generation of useless solutions. (2) Every constituent of a sentence except the last should modify at least one constituent on its right. So, some linguistic phenomena, themes or ellipses can be treated easily in our approach. RDG is currently implemented in the SICStus prolog, and is being evaluated by using a Japanese newspaper editorial, with specially attention given to the number of solutions. 
In the following subsections, we introduce an outline of RDG formalism, concentrating on the constraint based on the word order of a sentence. 3.2.1 Modifiability rank A sentence consists of many constituents. We call these phrases. Every phrase of a sentence has two syntactic contrastive aspects, that is, one phrase modifies the other and one phrase is modified by the other. We call these aspects the modifier (henceforth Mer) and the modificand (Mcand). When the Mer of one phrase and the Mcand of another phrase match, we can connect the two phrases by an arc. In RDG formalism, every phrase and atc have a modifiability rank value. The phrase rank is a classification of a phrase based on the number of Mer and Mcand phrases. For example, a manner adverb such as "yukkuri" (slowly) can modify verbs such as "yomi-nagara" (reading a book), "yomeba" (if you are reading), "yomu-node" (as you read), and "yonda-keredo" (though you read). On the other hand, of all these verbs, a modal adverb, such as "tabun" (probably), can only modify "yonda-keredo" (though you read). This means that the number of Mer "yukkuri" (slowly) is more than that of "tabun" (probably). In a similar way, if a phrase can be modified more than another phrase, the number of Mcand phrases is more than that of the other phrase. The phrase rank consists of 7 Mer and 7 Mcand ranks. Every phrase has a Mer and Mcand rank. The classification of phrases based on these ranks is given in Table 3. Table 3: Classification of phrases based on rank Mer al a2 a3 a4 b c d phrase example deictic pronoun manner adverb noun continuous verb condition verb, temporal adverb causal verb contrastive verb, modal adverb Mcand Al A2 A3 A4 B C D phrase example deictic pronoun manner adverb noun continuous verb condition verb causal verb contrastive verb Conditions between phrase ranks are formulated as in (1), and (2). >- aa >- a4 >- b >- c >- d -« aa -« a4 -« B -< C -< D a2 a2 (1) (2) 381 ----? (1) shows the Mer rank. (2) shows the Mcand rank. For example, when a manner adverb "yukkuri" (slowly) is classified as a2, and a modal adverb "tabun" (probably) is classified as d from Table 3 Mer Rank, we found that the number of Mer "yukkuri" (slowly) is more than "tabun" (probably) from formula (1). The arc rank is a classification of an arc based on the phrase's modifiability rank. It is incorporated in the word order of a sentence. We assume that a sentence consists of three phrases, Pi, Pj, and Pk (i < j < k). We can get the dependency structures (Fig 4 (a), and (b)) from this sequence. The arc between Pi and Pj is shown ----? ----? as PiPj . In Fig 4 (a), we call PiPj an adjacent arc of ----? ----? ----? Table 4: Rank of Pi, Pj and PiPj a1 a2 a a a~ a a a a a a4 b c a a a a a a a a a b b b c d c d In (3), for example, when a >- b we say that b is lower than a. PjPk. In Fig 4 (b), we call PjPk the inside arc of PiPk. (a) (b) I IE] Figure 4: Dependency structure (1) (1)' (3) a>-b>-c>-d Now we show the constraints of word order using the rank of an arc. When the dependency structure is as in Figure 4 (a), the rank between the two arcs sh<;>uld be satisfied by formula (4). When the dependency structure is as in Figure 4 (b), the rank between the two arcs should be satisfied by formula (5). [Kare-ga] yobu-to heya-kara dete-kita. When he called change[obj2=bus-route, loc=L]. 2. change[obj2=bus-route,loc=L] => decrease[obj2=passenger, loc=L]. 3. decrease[obj2=passenger[mod=bus]J, loc=L] => abolish[bus, loc=L]. 4. 
enforce[obj=two-way-lane, loc=L] => dangerous[obj=pedestrian, loc=L]. 5. enforce[obj=two-way-lane, loc=LJ, turn-on[act=bus, obj=lights, loc=L] => dangerous[obj=pedestrian, loc=L, pol=O]. 6. enforce[obj=two-way-lane, loc=L] => dangerous[obj=enteringcar, loc=L]. 1. enforce[obj=one-way-system, loc=Midosuji] 2. enforce[obj=two-way-lane, loc=Midosuji, pol=O]. 3. change[obj2=bus-route, loc=Midosuji]. 4. change[obj2=passenger[mod=bus]J, loc=Midosuji]. 5. enforce[obj=two-way-lane, loc=London]. 6. dangerous[obj=pedestrian, loc=London, pol=O]. must be • Ground The grounds are necessary to justify the argument goal. The type of argument goal and the beliefs are used to select the proper plan for generating the grounds. The plans that are very restricted based on the reasons for the states of affairs will be described in detail in Section 3. An example of the grounds is given below. Ex. 2 Because the bus service stops if a two-way 7. enforce[obj=two-way-lane, loc=LJ, set-up[obj=road-sign, loc=L] => dangerous[obj=entering-car, loc=L, pol=O]. • Facts 1 enforced. lane is not enforced) " . • Opposing argument and its refutation Showing the grounds is enough to justify the argument goal. But, add to this any expected opposing argument and its refutation increases the persuasiveness of the text. The system adds the pseudoground of the opposite argument goal as the opposing argument, and points out that it is incorrect by refuting it. 7. turn-on[act=bus, obj=lights, loc=London]. Ex. 3 Indeed enforcing the two-way lane seems to • Judgments be dangerous for pedestrians) but they are safe if the buses tum their lights on. l. ng[obj=abolish[mod=bus]]. 2. ng[obj=dangerous[obj=_]]. Figure 1: Contents of the Beliefs • Example The argument text with the example is more persuasive. Ex. 4 For example) the number of passengers using of affairs to the system as the argument goals. In the table, A means some state of affairs, and A means the negative state of affairs of A. If a judgment g(A) exists in the system's belief, then the system believes A to be good. Correspondent Argument goal Assertion Judgement must(A) It must be A ng(A) It had better be A g(A) hb(A) -,ng(A) It may be A may(A) Dulcinea converts the given argument goal to the correspondent judgment, and shows that this judgment is supported by its beliefs. 2.5 Constituents of the Argument Text The argument texts consist of the grounds for the argument goal, the expected opposing arguments, its refutation and the examples. The plans are prepared for each constituent to generate its content. A brief description of the constituent follows. • Argument goal The argument goal is given to the system first. It provides the system's standpoint. It is the conclusion of the text, which is mostly placed at the end of the text. the bus decreased) because the two-way lane was not enforced. 2.6 Argument Graph The semantic contents that consist of the above parts are represented by the argument graph. Figure 2 is an example of the argument graphs, which insist the argument goal "Tbe two-way lane must be enforced". Each node in the graph represents a state of affairs. Ng in the nodes indicates that the state of affairs is regarded as no_good, and af indicates that the system assumes that this is true. Nodes (2)rv(5) with the assumed node (1) represent the grounds for justifying of the argument goal. The cause link in the graph means a general causation, and the p link is used to represent the assumed node. 
The term in the node (2) enforce (two-way-lane; 0) is the negative state of affairs of enforce (two-way-lane). The system regards node (5) as no_good and node (2), the negative state of affairs of the argument goal which causes the no_good state of affairs, as node (5). Therefore, this causal relation is the ground for the argument goal. The ant i link means the linked graph has contents opposite to the ground, and the deny link shows that the node seems to be caused, but is denied by the linked graph. The details of the ground, the opposing argument and the refutation of it are given in Section 3. 1 A two-way lane is a lane which allows buses to drive the wrong way up a one-way street. 388 (3)change(bu.-route) ng (6)enforce(two-wa.y-lane) ng c~e (4)decrea..e(pa •• enger) ng c~e (5 )a.bolish(bu.-service) ng c~e Figure 2: Argument Graph 2.7 A Structural Gap between the Argument Graph and a Linguistic Text Structure The system realizes the semantic contents by natural language expression. However, the semantic text structure does not always correspond to the linguistic text structure. The various relations in the argument graph such as causation, temporal sequence, condition and assumption are expressed by various natural language connective expressions by considering the system's standpoint and beliefs. The direct realization of the propositional content makes for unnatural text. Therefore, the system must consider the judgment in relation to the propositional content and the role of the block that the propositional content is placed on such as the grounds and the opposing argument. One of the correspondent natural language expressions to the argument graph in Figure 2 is given below. natural language expression, the direct realization from the semantic data structure needs complicated processing because their is too much information to be referred to and a large variety of decision orders_ To realize the proper expressions, as in the example above, it is necessary to not only refer to the relations between each state of affairs, but to consider how the partial structure relates to the whole text. A limited natural language expression is likely to be realized to avoid complexity, and such expressions cannot affect the reader_ In the early stages of the generation process, the organization of the linguistic text structure based on the linguistic strategy for the argument is important to realize the persuasive text, which utilizes the rich expressiveness of natural language. The linguistic strategy consists of the prescriptive descriptions of the developing topics and the local plans for each constituent to represent the local relations. In addition, we need the abstract representation form to represent the text structure generated as a result of structure planning. Ex. 5 If the one-way system is introduced into a street(l) , and a two-way lane is not enforced in the street(2) then the route of the bus service changes(3), the number of passengers decreases(4) , and, unfortunately, the bus service eventually stops (5)Indeed enforcing the two-way lane(6) seems to be dangerous for pedestrians(7) , but they are safe(lo) if the buses turn their lights On(8), even if the two-way lane is enforced(9)' Nodes (2)""""(5) are linked by the cause link, but the surface expressions connect them naturally in a variety of ways. In (5), "unfortunately" is used to express the writer's negative attitude, this word communicates efficiently that nodes (2)"'(5) are the grounds. 
The ex...: pression "Indeed rv seems to be "," and "even if "," are used, because (6).......,(7) represent content that opposes the argument goal. These are denied by (8)"'(10). Since (6)"'(10) have a different content from (1)"'(5), two separate paragraphs are formed. All these things make it clear that the surface natural language expression reflects the system's standpoint and beliefs besides the propositional contents. However, since there is a gap between the semantic contents and the 2.8 FTS (Functional Text Structure) We introduced the FTS as the abstract representation form. The FTS is able to represent information such as the writer's judgments, necessary to generate coherent text, and to reflect the writer's standpoint besides the propositional contents. Both the local relations between the states of affairs and the global construction of the text are described together. FTS is a text structure representation form which represents the functional relations that hold within a piece of text. FTS consists of the FTS-term, order constraints and gravitational constraints. The order constraints and the gravitational constraints are optional. FTS-term: The data structure that represents functional dependencies that hold within a piece of text. FTS does not fix the order of the sentences_ Order' constraint: The constraint of the order between two sentences. The order constraint Sl
~ anti enforce(two-way-Iane) da.ngerous(pedestrian, London;O) f c~e ng Figure 6: Generation of the Example 3.2 Figure 5: Generation of Refutation of Opposing Argument A 1 ,A 2 , ••. ,A n :::} B Al rv An B The result of application of rules is represented in an argument graph. Figure 4 shows an example. (b) goal type 2 (may) A goal of the form may(A) corresponds to a judgment --.ng(A). We cannot obtain this type of judgment using the schema above. we define the semantics of --.ng(A) as follows, "There seems to be grounds for a judgment --.ng(A). But, in fact, there is a refutation to the argument" This idea is also used in the creation of refutations of an opposing argument. 2. Generation of opposing arguments and their refutation An argument Al whose goal is contrary to the goal of argument A2 is called an opposing argument of A 2 • Its goal and its opposing argument's goal are listed below. Argument goal must(A) (= ng(A)) hb(A) (= g(A)) Opposing goal ng(A),g(A) ng(A),g(A) The module creates the pseudo-ground for the goal opposing the original goal. Find, then, creates the refutation to the opposing argument. Figure 5 shows an example of the refutation of the opposing argument. 3. Generation of examples We define a pair of facts which are unifiable to a rule as an "example" of the rule. By attaching examples to the rules in an argument, we can reinforce the argument (See Figure 6). Linguistic Organization with Argument Strategy This process applies some linguistic argument strategies to an argument graph and constructs an FTS of an argument text. Since the argument graph expresses only the semantic content of the argument, the structure of the graph is independent of the natural language expressions to be generated. Therefore, in order to generate the argument text, the argument graph should be translated into the FTS, which can be transformed into suitable natural language expressions. First of all, basic constituents in the argument graph such as the ground, the example and the refutation of the opposing argument are recognized. Then the order of these constituents is decided according to the prescriptive knowledge. The order is described using the order constraints of the FTS. For instance, the opposing arguments ate placed before the argument goal. The examples are placed after the ground. The ground is realized earlier than its opposing arguments and refutations of it. At the same time, each constituent is transformed into the FTS-term according to the transformation rules. Those constituents which cannot be used for the argument or would make the text unnatural are ignored. The following shows the transformation rules defined for each constituent of the argument graph. 1. Generation of the ground A causal relation in the argument graph is transformed into a new term which has a pair of labels cause and result. If the causal relation has precondition link p, then the content of the precondition with a label p_cond is added to the term. For example, the argument graph in Figure 4 is transformed into the following FTS-term. The FTS-term which represents the ground for the given argument goal has an attribute fts_type and its value main. [thesis= [set={ [thesis=[p_cond=enforce(two-way-lane), cause= enforce(two-way-lane;O), result=change(bus-route)]], [thesis=[cause= change(bus-route), result=decrease(passenger)]], [thesis=[cause= decrease(passenger), result=abolish(bus-service)]]}], fts_type= main] 391 d.cr .....(p ..... ng.r) ewe nr; ..boli.h(bu.- •• rvic.) 
nr; Figure 4: Generation of the Ground Table 2: Criteria for Connection Relation 2. Generation of the opposing argument and the refutation of it The FTS-term which represents the opposing argument is generated with a label antLt in the FTSterm which represents the ground. The contents of the opposing argument in the argument graph are transformed into the FTS-term in the same way as the transformation of the ground part. The FTS-term which expresses refutation of the opposing argument has an attribute fts_type with the value anti-deny. The following shows the FTSterm corresponding to the argument graph in Figure 5. [set={ [thesis=t1: ... , fts_type= main], [thesis=t2:[though=enforce(two-way-lane), assume=turn-on(lights), Depth of a memory stack Nwnber of bad dependency structures Structural similarity Nwnber of connectives with negative statements Nwnber of connecting two clauses Gravitational constraint Stability of topics Connecting two implications Sentence order similarity between the ground and the example The depth of a memory stack should be shorter. The number of bad dependency structures should be smaller in a text. The structure of the surface text should be similar to the FTS. Not more than two connectives to intraduce negative statement should appear in a sentence. Two clauses should not be connected more than a certain number of times. Two sentences under the gravitational constraint should be placed close. Sentences should be ordered so as not to change the topic frequently. Two implications A-+B and B-+C should be realized in this order The sentence order of the example should be similar to the ground. result~dangerous(pedestrian;O)], anti_t=t3:[thesis= [seem= [thesis= [cause=enforce(two-way-lane), result=dangerous(pedestrian)]]]], ft s_type=ant i_deny] }] order constraints: tl .-? t-c.. -t (J)M*, /{ AJV- 17J~~{t L t-c.. -t (J) t-c. '1>, /{ A (J)*~7J~ 40% ~1;- L L L'"f -? t-c.. t. (J) J: 5 K, -~~fi~ y ~~~ ~~TQc!K, ~fiv-Y~~~L7j:~n ti, /{ A Jv- 1- 7J~~{t L, /{ A (J)*~7J~~1; ~ t-c., /{ A7J~~ll:: ~ nL L'"f 5. T.o. -~, 7J~fi1:~ 7j: ~fiv-Y~~~Tn~, ~fi~ J: 5 K lj. ~ Q. L 7J>. L, /{ A (J) 71 1- ~J~,flTnti, ~fiv- Y~~~LL~, L t-c. 7J~ -? L , Y ~~1i'tIi L 7j: ~nrf7j: ~ 7j: 0. ~fi~ t± fi1:~-C~ 7j: 0. ~fi v - Generated Text Translated from Japanese Figure 8: Argument Text To generate more coherent and natural texts using the rich expressiveness of natural language, some user model and more complicate conversational settings will be necessary in the future. Acknow ledgments Thanks are due to TANAKA Yuichi for reading the draft and making a number of helpful suggestions. We also wish to thank all the members of the Sixth Research Laboratory and the members of working group NL U for valuable advice. References (Appelt 1988] Douglas E. Appelt. Planning naturallanguage,referring expressions. In David D. McDonald and Leonard Bole, editors, Natural Language Generation Systems. Springer-Verlag, 1988. (Danlos 1984] Laurence Danlos. Conceptual and linguistic decisions in generation. In the Proceedings of the International Conference on Computational Linguistics, 1984. (Hovy 1985] Eduard H. Hovy. Integrating text planning and production in generation. In the Proceedings of the International Joint Conference on Artificial Intelligence. (Hovy 1987] Eduard H. Hovy. Interpretation in generation. In the Proceedings of 6th AAAI Conference. [Hovy 1988] Eduard H. Hovy. Generating Natural Language under Pragmatic Constraints. Lawrence Erlbaum Associates, Publishers, 1988. [Hovy 1990a] Eduard H. Hovy. 
Pragmatics and natural language generation. Artificial Intelligence, 43:153197, 1990. [Hovy 1990b] Eduard H. Hovy. Unresolved issues in paragraph planning. In Current Research in Natural Language Generation. Academic Press, 1990. [Joshi 1987] Aravind K. Joshi. Word-order variation in natural language generation. In the Proceedings of 6th AAAI Conference. [Mann and Thompson 1987] W. C. Mann and S. A. Thompson. Rhetorical structure theory: Description and construction of text structures. In Natural Language Generation: New Results in Artificial Intelligence, Psychology, and Linguistics. Dordrecht: Martinus Nijhoff Publishers, 1987. [McDonald and Pustejovsky 1985] David D. McDonald and James D. Pustejovsky. Description-directed natural language generation. In the Proceedings of the International Joint Conference on Artificial Intelligence. [McKeown 1985] K. R. McKeown. Text Generation: Using Discourse Strategies and Focus Constraints to Generate Natural Language Text. Cambridge University Press, 1985. [McKeown and Swartout 1988] K. R. McKeown and W. R. Swartout. Language generation and explanation. In Michael Zock and Gerard Sabah, editors, Advances in Natural Language Generation, volume 1. Ablex Publishing Corporation, 1988. [M~tter 1990] Marie W. Meteer. The 'generation gap' the problem of expressibility in text planning. Technical report, BBN Systems and Technologies Corporation, 1990. 394 cause ==> decrease( p ... senger) (af) cause ==> abolish(bus-service) (af,ng) anti enforce( two- way-Ia.ne) (af) cause ==> cause ==> Figure 9: The Argument Graph [thesis= 1 :must [cont= enforce [obj=tvo-vay-lane]] , reason= 2: [set:{ 1: [thesis= [set={ 1: [thesis= 1: [set={ 1: [thesis= [p_cond= 1: (enforce [obj=one-vay-system] ,af), cause= 2: (enforce [obj=tvo-vay-lane ,pol=O] ,af) , result= 3: (change [obj2=bus-route] ,af), order= [1,2,3], conn" [(condition,c) ,(implication ,c)]]] , 2: [thesis= [cause= 1: (change [obj2=bus-route] ,af) , result= 2: (decrease [obj2=passenger [mod=bus]] ,af) , order= [1,2], conn= [(implication ,c)]]]), order= [1,2], conn= [(development,c)]], exampl= 2: [thesis= 1: (decrease [obj2=passenger[mod=bus] ,amo=40%, ten=prec ,loc=Midosuji] ,f ,ng) , crecog= 2: [thesis= 1: (change [obj2=bus-route ,loc=Midosuj i] ,f) , crecog= 2: [set= { 1: [thesis= (enforce [obj=one-vay-system , loc=Midosuj i] ,f) ], 2: [thesis= (enforce [obj=tvo-vay-lane ,loc=Midosuji ,pol=O] ,f)]}, order= [1,2], conn= [(juxtaposition,c)]] , order= [2,1], conn= [(causation,s)]], attent= [{loc ,Midosuji}] , . order= [2,1], conn= [(causation,s)]]]), order=[2,1] , conn=[(generalization,s)]], 2: [thesis= [cause= 1: (decrease [obj2=passenger[mod=bus]] ,af), result= 2: (abolish [obj=bus] ,af ,ng) , order= [1,2], conn=[(implication,c)]] , order: [1.,2], conn= [(juxtaposition,s)]], ftst_ type= main] , 2: [thesis= 1: [though= 1: (enforced[obj=tvo-vay-lane] ,af) , assume= 2: (turn-on [obj=lights [mod"'bus]] ,af) , result= 3: (dangerous [obj2=pedestrian,pol=O] ,af), order= [2,1,3], conn: [(implication ,c), (concession,c)]] , ftst_type= anti_deny,' attent= [ {obj2 ,pedestrian}] , anti_t= 2: [thesis= [seem= [cause= (enforce [obj=tvo-vay-lane] ,af) , result= (dangerous [obj2=pedestrian] ,af ,ng), order= [1,2], conn= [(implication,c)]] order= [1], conn= []]], order= [2,1], conn= [(negation1,s)]] }, order= [1,2], conn= [(change,s)]], order= [2,1], conn= [(deduction,s)]] Figure 10: FTS with order and conn attributes PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. 
© ICOT, 1992

Situated Inference of Temporal Information

Satoshi Tojo*    Hideki Yasukawa†

* Mitsubishi Research Institute, Inc., 8-1, Shimomeguro 1, Meguro-ku, Tokyo 153, JAPAN, (phone) +81-3-3536-5813, (e-mail) tojo@nui.co.jp
† Institute of New Generation Computer Technology (ICOT), Mita Kokusai Bldg. 21F, 4-28, Mita 1, Minato-ku, Tokyo 108, JAPAN, (phone) +81-3-3456-3194, (e-mail) yasukawa@icot.or.jp

Abstract

Representations of natural language describing the same state of affairs may differ between speakers because of their different viewpoints. In this paper we propose the concept of perspectives that are applied to situations, to explain this variety of the representation of an infon with regard to time. We define the perspective by the relation theory of meaning, namely the relative locations in mind between the described situation, the utterance situation, and the infon. Our aim is to model a situated inference system that infers temporal features of a sentence from the partial temporal information each lexical item carries. For this purpose, we apply our notion of perspectives to actual tense and aspects that are used as lexical temporal features, and in addition we inspect the validity of our formalization for verbs. We show the inference system in the logic programming paradigm, and introduce an ambiguity solver for Japanese -teiru that may have multiple meanings, as an experiment of our framework.

1 Introduction

The most common way to represent time is to assume that it is a one-dimensional line which extends both to the eternal past and to the eternal future, and a point called 'now' which moves along the line at a fixed speed. The semantics of time in natural languages, so often associated with this physical time parameter t, has been dealt with. However, the introduction of parameter 't' seems too strong for natural languages, contrary to the case of physical equations. Actually, we cannot always map the temporal property of verbs or temporal anaphora on the time axis correctly.

In opposition to this view of the time, we have been forced to loosen the strongest topology of physical time in some ways. One of the most famous works to represent so-called 'coarse' time is the interval-based theory by Allen [Allen 84]. Kamp [Kamp 79] proposed event calculus where he claimed that 'an instant' was relatively defined by all the known events in his DRT. Our work is to extend this temporal relativity. We will mainly pay attention to the temporal structures of tense, aspects, and verbs within the framework.

From the viewpoint of the history of situation theory, the notion of a spatio-temporal location, or simply a location, was proposed to represent the four-dimensional concept of time and place. In the early stages of situation theory, situations and spatio-temporal locations were distinguished [Barwise 83] as: in s, at l, σ holds. However the consideration of spatio-temporal location seems to have been rather neglected since then, and we have only found Cooper's work [Cooper 85] [Cooper 86] to give a significant interpretation of locations for time semantics.

The authors have worked for the formalization of a temporal location as a meaning carrier of temporal information [Tojo 90]. In this paper, we aim to show a paradigm of an inference system that merges temporal as a real situation [Barwise83].
According to the notion information carried by each lexical item and resolves any of real situations, we can assume that there is proto- temporal ambiguity that a word may have. First we re- lexicon in the world though there are many different ways view the role of the temporal location following Cooper's to verbalize them. We may call those infons that are not work. Our' position is to regard temporal locations of yet verbalized proto-infon.s. 1 We can regard proto-infons infons and situations as mental locations. We define a as the genotype of infons; to describe a proto-infon to temporal perspective toward a situation, which decides make the phenotype one is to give rel and roles in natural how an infon is verbalized, in terms of relative locations language with a certain viewpoint. We propose an idea of situations and infons. In the following section, we will of perspective which gives this notion of view next. give accounts for several important temporal features of In the scheme of individuation, Barwise regarded all tense and aspects by the perspectives, not only to de- linguistic labels as being encoded in the situation itselt fine the basic information for the intended situated in- already [Barwise 89J. We will not discuss the adequacy ference but also to see the validity of our formalization. of this idea in this paper, however it gives us a way to In the following section, we discuss the computation sys- formalize situations with perpsectives as follows. In or- tem that infers temporal features of a natural language der to state a formula of the form .s sentence. We have implemented an experimentation sys- to assume a certain observer who has cut out s as a part tem of the ambiguity solver in Japanese -teiru with a of the world and has paid attention to information knowledge representation language QUIXOTE, devel- that the formula must already contain someone's view or oped at ICOT (Institute of New Generation Computer perspective. In that meaning, in S or in F= 0', we are required 0'. 0', so the basic lex- icon must be included as linguistic labels. For example, Technology, Japan). if the observer is a Japanese. Japanese language labels Situations with perspectives 2 should be used to describe the information. From this point of view, we assume that in the formula of the sup- The temporal information in our mind seems preserved port relation between a situation and an infon someone's in a quite abstract way, and temporal span or duration perspective already exist. are relative to events in the mind. In this section, we will s F= 0' {::} P(,s' F= 0") discuss the structure of those subjective views for time. It is an open question whether we can strip off all the and for the real situation. external perspectives from a support relation, as below: 2.1 Real situations and perspectives "Ve often write down an infon in the following way: Even if we can. this must not be the only way to choose a sequence of ~ relation, parameter'8 Pi'S. ~ However none have been concerned with those labels for 2.2 Temporal perspective the relation in an infon. For example. should we admit We concern ourselves with the temporal part of the such relations as those contains tense and aspect'? If so. spatio-temporal location of the situation theory here. 
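Before turning to the temporal details, the idea of the previous subsection, that a perspective P mediates between a proto-infon and the infon actually verbalized, can be pictured with a small Python sketch. This is an illustration only, not the paper's formalism or its later QUIXOTE encoding; the class names, the VERBALIZE table and the tense and aspect labels are assumptions introduced just for the example.

from dataclasses import dataclass

@dataclass(frozen=True)
class ProtoInfon:
    rel: str      # bare, tenseless relation of a proto-infon
    args: tuple   # role fillers

@dataclass(frozen=True)
class Perspective:
    tense: str    # e.g. "past" or "present"
    aspect: str   # e.g. "perfective" or "progressive"

# Hypothetical verbalization table for the single relation used in the example.
VERBALIZE = {
    ("swim", "past", "perfective"): "swam",
    ("swim", "present", "progressive"): "is-swimming",
}

def apply_perspective(infon: ProtoInfon, p: Perspective) -> ProtoInfon:
    # Applying P to s |= <<swim, john>> yields an infon whose relation label
    # already carries tense and aspect.
    rel = VERBALIZE.get((infon.rel, p.tense, p.aspect), infon.rel)
    return ProtoInfon(rel, infon.args)

if __name__ == "__main__":
    swim = ProtoInfon("swim", ("john",))
    print(apply_perspective(swim, Perspective("past", "perfective")))       # swam
    print(apply_perspective(swim, Perspective("present", "progressive")))   # is-swimming

Under this toy picture, the same proto-infon is supported by the same situation; only the perspective applied to the support relation changes which surface expression is produced.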
and if the following supporting relations are valid: The natural way to do this is to assume that there is a support relation: 8 F=~ 8wim,john ~ s' F=~ 8warn,john ~ s" F=~ is-swimming,john ~ then how should we describe the relation in 8 F~'" ~ that is already verbalized even though the relation inside and .sit,? the infon do not. have tense nor aspects. Our formaliza- Are we making different expressions for the same real tion is as follows. There is a perspective P for a support situation? relation that adds tense and aspect. 8,8', We hypothesize a virtual physical world, or in other words an ontological world which was originally proposed 1 This is a reinterpretation of the concept of information in the real situation. 397 P(s P~ rei,··· ~) Pt ~ reLwith_tense_aspect, ... ~ duration of to both sides of the supporting relation, the meaning temporal. We may omit the subscript hereafter to avoid confusion. The next work we need to do is to define the structure of a P. 0'. 0'. the mental time of the namely from the beginning of to the finishing point: of which is assumed to be independent of each perspective though we add the subscript 'f to represent if P is in-progress state [Parsons 90] of In-progress state: Here, we assumed that the perspective is decomposable 2.3 1I00lit an We call -lJ, P( s) 0' 1I001it. 'V\ie need another component for our temporal perspective, that decides tense. Tense should be decided in accOl'dance with the relative position between the described situation and the utterance situation (that offers ·now·). in terms of the 'relation theory of meaning'. that is, a nat ural language sentence .(J)' is interpreted as the rela- Relation theory of meaning with regard to time tion between the utterance situation u and the described Suppose that the mental descripton of the temporal this theory. we can say that the standpoint of vievv is the length, or the duration, of an infon can be written as mental location of the utterance situation. situation c denoted by u.[ in our definition. However the speaker of the sentence does not necessarily know whether Ken ceased to run at the point this utterance was made. Therefore. whether the verb .is past or present does not depend on whether the deed finished. Instead, the only required condition for past is that the area the speaker paid attention to (= Ilslid precedes the point at which this utterance was made (= Ilulid. Fig. 3 depicts this case; in that figure, the deed (= lia/ld mayor may not finish at the utterance point (= IIUllt) however both situations can support the same ~ infon: • the finishing of the deed is recognized / llOt recog- was-running, ken ::?>. In summary. perfective/imperfective is distinguished. dependent on if the finishing point of Iialit is before II ullt- Durative /non-durative depends on if the field of view wraps up Durative lIalit or not. The most important feature of our method is the distinction of dumtiv(; and non-dumtivf aspects II alit by the relation between and IISlit. 'We interpret the p'l'ogressiv(; feature in terms of our formalization as follows. The state of progressive is to see the deed as a durative. and seeing a part of the inside of the deed. Namely the speaker does not pay attention to when the deed began, nor to when it will finish. The state is shown in Fig. 4. ) o o Figure 1: progressive c=J o Figure 3: past On the contrary. let us consider the case we do not pay attention to the inside of the deed. \'"hen The past VIew for affairs is represented by /lullt. 
/lsllt jt On the contrary, the present tense is represented by IIUlit c IIslltWe formalize the feature of past and present as follows: s[j u] F~ past, a::?> ¢= S s[:::> u] F~ present, a::?> ¢= Fa s Fa lis lit contains the whole time of the deed, we can conclude that the observer recognized the event as a non-durative one. in which case the event was regarded as a point with no breadth on the mental time axis (Fig ..5). if there is no interaction with other events. 6 (1 ) (2) Jj. 3.3 G Aspects The study on how we see the temporal features of event::; has been done in linguistics and we know the variety of Figure :'): compression to lloll-durative aspects. Among the taxonomy, it seems rather proper to pay attention to the following two important features [Comrie 76] though other features may be omitted. because those distinctions can be found in any language. A lexical item for the durative \'iew becomes t he fol- lowing: One is: • the deed is recognized as a duration of time / a point on the time scale of time (durativejnon-durative) I)From the topological point of view, a set which does not contain other sets inside nor has intersections with other sets is identified with one of the smallest sets of the space. viz. a point. 400 Perfect and time of reference As Reichenbach considered to be the basic rules. The accumulation of claimed in [Dowty 79], present perfect in English refers to temporal information must not be a mere addition; in the current state. We have shown that present is repre- our case, it must be the merger of different topologies in sented by the relation that IISllt includes IIUlit. Therefore, IIUllt IISlit 's, 1I001I~'s, and IIUlit's. The computation, therefore, to must be done in the dual mode; one mode is conventional represent the present perfect. In Fig. 6, we have shown unification and backward chaining, and the other is the the perspective for the present perfect. merger of s[X] with consistency. This is the reason why to satisfy this issue, our 1I001It must precede the we chose the QUIXOTE language with its concept of modules (situations, in our case) inside of which features case (1) o can be defined. We will mention the specification later. o -teiru that can have three different kinds of meaning that We have developed an ambiguity solver for Japanese case (2) depend on the context. c=:l Figure 6: present perfect o The problem of Japanese '- teiru' 4.2 In Fig. 6, case (1) shows that perfect is interpreted as a terminative aspect although case (2) shows that perfect Prior to introducing the ambiguity solver we have devel- is read as an experience. oped, we need to give a short tip on Japanese grammar and the problem we tackled. The perfect view becomes the following: s[O'::S u] F~ perfect, 0' ~ ~ SF 0' In the Japanese language, auxiliary verbs are aggluti(4) nated at the tail of the syntactic main verb that is the original meaning carrier. In order to compose a progres- Inference of temporal information 4 sive sentence. we need to add an auxiliary verb '-tei-' . and after that we are required to affix a tense marker. We summarize them below for a Japanese verb' kiru (to Situated inference 4.1 wear)'. In terms of situated inference, we expect the inference with the following rules: I part of speech I meaning lexical entry This sample rule can be interpreted as: if SI supports 0'1, S2 supports supports 0'2, and so on, then we can infer that So 0'0' This kind of rule can be read as backward chaining from the head, just as with the inference rules of Prolog. 
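As a rough illustration of this backward-chaining reading, the toy Python sketch below proves a head goal by proving its body goals and merging the partial temporal information each one contributes, failing on inconsistency, in the spirit of the dual-mode computation described above. It is not the QUIXOTE program: the rule base, the fact table and the attribute names are invented stand-ins for lexical items such as '-tei-' and '-ru'.

# Invented fact table: partial temporal information carried by each lexical item.
FACTS = {
    "tei": {"fov": "durative"},
    "ru":  {"pov": "pres"},
}

# Invented rule base: a head goal is proved by proving its body goals in turn,
# roughly as  s[X, Y] |= sigma0  <=  s[X] |= sigma1, s[Y] |= sigma2.
RULES = {
    "ki_tei_ru": ["tei", "ru"],
}

def merge(acc, piece):
    """Merge two pieces of information; return None on an inconsistency."""
    merged = dict(acc)
    for key, value in piece.items():
        if key in merged and merged[key] != value:
            return None
        merged[key] = value
    return merged

def prove(goal):
    """Backward chaining from the head, accumulating temporal information."""
    if goal in FACTS:
        return dict(FACTS[goal])
    if goal not in RULES:
        return None
    acc = {}
    for subgoal in RULES[goal]:
        piece = prove(subgoal)
        if piece is None:
            return None
        acc = merge(acc, piece)
        if acc is None:
            return None
    return acc

if __name__ == "__main__":
    print(prove("ki_tei_ru"))   # {'fov': 'durative', 'pov': 'pres'}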
ki-tei-ru -ta lma zutto san-nen-mae-ni b-tei-ta verb aux. verb affix affix adv. adv. adv. phrase I verb phrase \\t'e would like to devise a system that computes temporal information, asking questions that corresponds to the head of a rule, and accumulating the temporal information from its body. For example. assume X. }-, ... are variables for temporal information. s[X, y, ... ] F 0'0 ~ s[X] F 0'1' s[Y] F 0'2,···· where all the basic. lexical information such as (1). (2). (:3), (4), and so on, defined in the previous section. are to wear be -ing* present past or perfect now all the time 3 years ago I was wearing* The problem lies in the places marked with table above. The meaning marked by meaning of -ftC i: * is * in the not the sole actually we can interpret the auxiliary verb ill thref' different vvays, depending on the context. vVe show sample sentences below 7 . 'This example was shown by the members of the JPSG working group in ICOT 401 ima ki-tei-ru (be putting on currently) zutto ki-tei-ru (wear all the time) san-nen-mae-ni ki-tei-ru (have worn three years ago) where 1/1. 1771' . . . . mil are special extended terms called lTIodule identifiers. and 0". TI • . . . . Tn are extended terms. and (. is a set of constraints. Representation in QUI,YOTE We will build our ambiguity solver, focusing on the area 4.3.2 of a deed in JPSG framework that corresponds to our There are several points to be explained. that is. how IIsllt- The partial information that each lexical item car- ries, as defined in the previous section, is utilized. according to the Japanese lexicon table above. Namely we use the inference rules of past (1) I present (2) and per- fect (4), for Japanese '-rul -ta'. We use the inference rule of durative (3) for Japanese '-tei-'. the notions introduced in the preceding sections are represented. Object terms are used to represent situations. infons. perspective. and so forth. First, verbalized infons are represented by object terms of the following form: The ambiguity of '-tei-' is solved as in Fig. 7 where .&' is the merger of information: X 2 [8 C 0"] in Xs and X6 ible with value of Xs necessarily becomes [0" C u], and this gives '-tei' the interpretation of the resultant state. = [I'd = R. cls = CIS. pET = P], = Args]. inflv-I'cl = [0" -< u] is incompat= [u C s], so that the aTgs The CIS' takes the symbols act 1. act 2, act 3 .... as its value which indicates the classifications of verbs 011 what state. that is. in-progress. target. resultant. each verb 4.3 Implementation can introduce. Here's a list of the classification and the This section shows an implementation of the treatment of temporal information discussed in this paper. states introduced: The act} program is written in the knowledge representation lan- act] =}ip. rt8 guage QUIxoT£[Yasukawa90],[Yasukawa92]. 4.3.1 ip. tar'., 7'e8 =} act 3 =} tar. rES. The relationship among these three classes QUIXOTE IS given by the following subsumption definition. Terms in QUIXOTE are extended terms on an order-sorted signature called object terms. and written in general as: Thus. the verb "ki-" can introduce all the threE' .'itates. where ° is an atom called basic object. 11.12 are atoms called labels, and 01,02 possibly be object terms. The domain of atoms (BO) is ordered and constitutes a lattice (BO, -:!C. T, while the verbs like "hashi-" could not introduce fal'state. For example. the wrbalized infon correspouding to tlw sentence ··.John is running" is represented as follows: ~). The subsurnption reiation(r;;;.) 
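The verb classification above (act1, act2, act3) is what drives the resolution of '-tei-'. Purely for clarity, the Python fragment below mirrors the (verb class, aspect) to field-of-view mapping that the system itself encodes as map facts in the appendix; it is an illustration, not the implementation, and the class assignments follow the lexical entries used in the experiment (act1 for 'ki-', act2 for 'hashi-', act3 for 'waka-').

# (verb class, aspect of the auxiliary) -> possible fields of view,
# mirroring the appendix map facts: ip = in-progress, tar = target state,
# res = resultant state.
FOV_MAP = {
    ("act1", "state"): {"ip", "tar", "res"},   # e.g. "ki-"    (to wear)
    ("act2", "state"): {"ip", "res"},          # e.g. "hashi-" (to run)
    ("act3", "state"): {"tar", "res"},         # e.g. "waka-"  (to understand)
}

def teiru_readings(verb_class):
    """Possible interpretations of VERB + '-tei-ru'."""
    return FOV_MAP.get((verb_class, "state"), set())

if __name__ == "__main__":
    print(teiru_readings("act1"))   # three readings, as for "ki-tei-ru"
    print(teiru_readings("act2"))   # two readings,   as for "hashi-tei-ru"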
is a binary relation on-'!' isa-relation. Intuitively, holds if of 02 01 01 r;;;. has more arcs than 02 02 (we say 02 subsumes = [nl = nm.cls = act}.ptI'S = Pl. = [agt = john]]. illf[t,-1'cl the domain of object terms. and corresponds to so-called (ll'g,'-.; 01) and the value of a node is larger than the value of the corresponding node when' P is the 1emporal lwrspective \\'hosp iiplcl of view is in-progress and point of vie\',' is the pn Sfld, A PP1'- of 01 with respect to -:!c-ordering. In QUIXL'lTE, subsump- spective is also represented. by a pa.ir of two object terms tion constraints can be used to specify an object or the as follows: relation among objects. A rule of QUIXOTE ([foe = Fml [POl' = POt']). is a prolog-like clause of the form: where For E {ip. tal'. I'ts} represents the field of view. and P or E {pns, past} represents the point of view. 402 s I=~~ s I=~~ S[X5]1=~ tei ~ S[X6]1=~ ru ~ = X 2&X3 = -< u = X 4 &XS &X6 = [] X.5 = -< s or s C X6 = U C s Xl X2 X3 X4 (J (J (J Figure 7: the inference tree The ip and res correspond to in-progress state, target (aet 2 , state) -+ {ip,tar}, state, and resultant state, respectively. (act3, state) -+ {tar, res}. Among the situations of several kinds, discourse situations are represented by object terms of the following For example, the expression "ki-tei-" has three interpretations, while "hashi-tei-" has two interpretations. The form: target and resultant are the states after an event's havdsit[jov = Fov,pov = Pov, src = U] ing culminated. Thus, it is possible to disambiguate the where U is the object term representing the utterance interpretations if some evidence that the event has cul- situation. minated or not. for example. the successive utterance Thus the propositional content of the sentence "John that the first sentence "ki-tei-ru" should be interpreted is running" is represented by = ip, pov = pn:.s, src = Ul] : inf[v]el = [reI = run. cis = act 2 • pep, = [j01' = ip.por = pres]], args = [agt = john]]. dsit[jov For simplicity, the object term [rel [jot' = ip,pov = pres]] = is written as is]unning. three words. The lexical entries are as follows. = acf1.T'el = puLon, form = h]:: v[cls = aet 2 , rel = run. fonn = hashi]:: aU;Tv[asp = state. for-m = tei]; : afliJ~[pov = pres. forTH = ru]: : diet :: v[cis diet :: The ambiguity is processed by the mapping from a pair of the class of a verb and the aspect of an auxilially verb: The definition of a tiny interpreter is given in the appendix ·5. A toplevel query corresponds to the definition of the meaning of a sentence in Situation Semantics. and is of the following form: example, the Japanese expression "ki-tei-ru" consist of diet :: as having ip as its field of view. run, peT'S Next, the lexical entries of Japanese are defined. For diet :: of .. mi ni tuke-tei-nai" (does not wear) makes it clear ?- rni [u=ul ,exp=Exp,e=E,infon=Infon] I I {E=dsit[fov=Fov,pov=Pov,src=ul]}. This query says that the meaning of the expression "Exp" in an utterance situation "ul" is represented by the described (temporal) situation "E" and the infon "Infon" where the variable "Fov" and "Pov" represent the temporal perspective. For example. the following result is given by this interpreter: ?- rni[u=ul,exp=[ki,tei,ru], e=dsit[fov=Fov,pov=Pov,src=ul], infon=Infon] . (act 1, state) -+ {ip, tar. res }, 403 Answer: assumed a topological relation between the three param- Fov {ip,tar,res} eters of the standpoint of view Pov pres (1Islld. 
Infon (liullt), the field of view and the duration of information (liallt), each of which is the mentally recognized location of the utter- inf[v~rel=[rel=put_on,cls=act1,pers=PJ , args=_J ance situation. the described situation, and the infon, respectively in terms of the relation theory of meaning. P = [fov=Fov,pov=PovJ VV'e have defined tense and several important aspectual distinctions such as perfective /imperfective and dura- This means that the expression "ki-tei-ru" has three interpretations depending on which fov is applied, because the verb "ki-" introduces all the three states8 . On the contrary, expressions like "hashi-lei-ru" and "waka-tei-ru" has two interpretations, because those verbs can not introduce all the three states. ?- mi[u=u1,exp=[hashi,tei,ruJ, e=dsit[fov=Fov,pov=Pov,src=u1J, infon=InfonJ . tive /non-durative, with the perspective, that should be used as the partial information for the situated inference system. In addition, we tried to define other temporal features of verbs such as telic / atelic and temporal welljill-foundedness, to see the validity of our formalization. Our framework for the situated inference of temporal information is to infer the whole temporal features of phrases or sentences, collecting the partial information that is carried by each lexical item, and to solve the ambiguity partial phrases they may have. vVe are required to have mechanisms for that system. both Prolog- Answer: Fav {ip,res} Pov pres like backward chaining and maintenance of consistency in modules (situations. in our case), so that we can utilize the knowledge representation language QUTYOT£ Infon inf[v_rel=[rel=run,cls=act1,pers=pJ, args=_J P = [fov=Fov,pov=PovJ in IeOT. '0.fe have implemented an inference system to solve the ambiguity of Japanese '-teiru'. the lIallt which may refer to different parts of a deed. In that of experiment, the problem is solved together with another ?- mi[u=u1,exp=[waka,tei,ruJ, e=dsit[fov=Fov,pov=Pov,src=u1J, infon=InfonJ . lexical item which offers the information 'which incides with Iialit co- IISllt'. Our inference system is still small. and needs to be developed to cover many other kinds of lexicon and temporal ambiguity. According to this future work, we might Answer: Fov Pov {tar,res} be required to reconsider the structure of perspectives. pres VV'e are still trying to determine other temporal features of verbs. and, as a task in the near future, we are going Infon inf[v_rel=[rel=understand,cls=act1,pers=PJ, to try tto define the temporal perspectives of sentence adverbs. args=_J P = [fov=Fov,pov=PovJ Acknowledgment 5 Conclusion Thanks to the members of STS- WG (Working Group We have introduced the idea of temporal perspectives for for Situation Theory and Semantics) of IeOT for their situations, to explain the variety of language expressions stimulating discussions and suggestions. Special thanks for information in real situations. As a perspective, we to Kuniaki Mukai who gave the authors many important BIn QUIXOTE, the fact 0[1 = {a,b}] is interpreted as the two facts, o{l a] and 0(1 b]. uation Theory, to Kazumasa Yokota who gave authors = = suggestions about the treatment of perspectives in Sit- 404 many important comments, and to Koiti Hasida who explained the JPSG theory for verbs to the authors. Spe- Appendix - The interpreter cial thanks also to Mitsuo Ikeda of ICOT, who has stud- XX ied the analysis of Japanese -teiru. aet1 >= References XX Lexical Entry diet .. v[els=aet_1,rel=put_on,form=ki];; [Barwise89] J. Barwise. 
The Situation in Logic. CSLI Subsumption Definition aet2;; aet1 >= aet3;; diet .. v[els=aet_2,rel=run,form=hashi];; diet v [els=aet_3,rel=understand,form=waka] ;; diet .. auxv [asp=state ,form=tei] ;; Lecture Notes 17, 1989. [Barwise83] J. Barwise and J. Perry. Situations and At~ tdudes. MIT Press, 1983. diet .. affix[pov=pres,form=ru];; diet .. affix [pov=past ,form=ru] ;; [Comrie 76] B. Comrie. Aspect. Cambridge University Press, 1976. XX Top level mi[u=U,exp=[] ,e=D,infon=Infon];; [Cooper 85] R. Cooper. semantics. AspeCtual classes in situation Technical Report CSLI-84-14C, Center for the Study of Language and Information, 1985. [Dowty 79] D. Dowty. Word Meaning and Montague Grammar. D.Reidel, 1979. [JPSG91] JPSG-WG. The minutes of Japanese phrase structure grammar. 1987-91. [Allen 84] J.F.Allen. Towards a general theory of action and time. ATtificial Intelligence, 1984. [Kamp 79] H. Kamp. Events, Instants. and Temporal Refe1'ences, pages 376-417. Springer Verlag, 1979. in Semantics from Different Points of View. [Cooper 86] R.Cooper. Tense and discourse loca.tion in situation semantics. Linguistics and Philosophy, 9(1):17-36, February 1986. [Tojo90] S. Tojo. A temporal representaion by a topology between situations. In Proc. of SICONLP '90. mi[u=U,exp=[ExpIR],e=D,infon=Infon] <= d_eont[exp=Exp,e=D,infon=Infon] , mi[u=U,exp=R,e=D,infon=Infon];; %%%% Interpretation Rules d_eont[exp=Exp,e=dsit[fov=Fov,pov=Pov,sre=U], infon=inf [v_rel=V_rel, args=Args]] <= diet: v[els=CLS,rel=Rel,form=Exp] I I {V_rel=[rel=Rel,els=CLS,pers=P]};; d_eont[exp=Exp,e=dsit[fov=Fov,pov=Pov,sre=U], infon=inf[v_rel=V_rel,args=Args]] <= diet: auxv[asp=ASP,form=ExpJ, map[els=CLS,asp=ASP,fov=Fov] I I {V_rel=[rel=_,els=CLS,pers=P] , P = [fov=Fov,pov=_]} ;; d_eont[exp=Exp,e=dsit[fov=Fov,pov=Pov,sre=U], infon=inf[v_rel=V_rel,args=Args]] <= diet : affix[pov=Pov,form=ru] I I {V_rel=[rel=_,els=_,pers=P] , P = [fov=_,pov=Pov]} " University of Seoul, 1990. [Parsons 90] T .Parsons. Events in the Semantics of English. MIT press, 1990. [Yasukawa90] H. Yasukawa and K. Yokota, "Labeled Graphs as Semantics of Objects", In Proc. SIGDBS and SIGAl of IPSJ, Oct., 1990. [Yasukawa92] H. Yasukawa, K. Yokota, H. Tsuda, Objects, Properties, and Modules in FGCS '92, Tokyo, June, 1992. QUIXOTE, In Proc. XX Field of view Mapping map[els=aet1,asp=state,fov={ip,tar,res}] " map[els=aet2,asp=state,fov={ip,res}] ;; map[els=aet3,asp=state,fov={tar,res}] ;; PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992 405 A Parallel Cooperation Model for Natural Language Processing Shigeichiro Yamasaki, Michiko Turuta, Ikuko Nagasawa, Kenji Sugiyama Fujitsu LTD. 1015, Kamikodanaka Nakahara-Ku, Kawasaki 211, Japan Abstract This paper describes the result of a study of a natural language processing tool called "Laputa", which is based on parallel processing. This study was done as a part of the 5th generation computer project. The purpose of this study is to develop a software technology which integrates every part of natural language processing: morphological analysis, syntactic analysis, semantic analysis and so on, to make the best use of the special features of the Parallel Inference Machine. To accomplish this purpose, we propose a parallel cooperation model for natural language processing that is constructed from ·a common processor which performs every sub-process of natural language processing in the same way. 
As a framework for such a common processor, we adopt a type inference system of record-like type structures similar to Hassan AitKaci's psi-term [Ait-Kaci 86], Gert Smolka's sorted feature logic [Smolka 88] or Yasukawa and Yokota's objecttem [Yasukawa 90]. We found that we can utilize parallel parsing algorithms and th~ir speed-up technology to construct our type inference system, and we then built a type inference system using an algorithm similar to a context-free chart parser.' As a result of experimentation to evaluate the performance of our system on Multi-PSI, the simulator of the Parallel Inference Machine, we have been able to achieve a speed-up of a factor 13 when utilizing 32 processors of Multi-PSI. 1 Introduction With the. advance of semiconductor technologies, computers can be made smaller and cheaper, so that we can increase the value of a computer by giving it multiprocessor abilities. However, the software application technology for a parallel machine is still at an unsatisfactory level except for some special cases. The Parallel Inference Machine which is being developed in the 5th generation computer project has some special features such as an automatic synchronization mechanism and a logic programming language allowing declarative interpretations. Such features make complicated parallel processing tasks, that used to be practically impossible, possible to realize. Knowledge Information Processing is one such application which needs lots of computational power and consists of very complicated problems, but we expect that the Parallel Inference Machine will make these problems amenable to parallel processing. The purpose of this study is to propose a parallel cooperation model which makes natural language processing more natural by making use of the parallel inference machine features. In this paper, we will discuss the schema of the parallel cooperation model, as well as its realization and show the experiment results of the model capacity evaluations on the Multi-PSI, the simulator of the parallel inference machine. 2 Parallel cooperation model The advantages of using the Parallel Inference Machine lie not only in the processing speed, but also in the problem solving techniques. We were able to find more natural ways of solving a problem by looking at it from the parallel processing point of view. In recent years, system integration has often been suggested in the field of natural language processing. This involves the integration of morphological analysis, syntactic analysis, semantic analysis and speech recognition, and the integration of analysis and generation. The implication is that the various natural processing mechanisms at every stage must be linked to each other in order to understand natural language entirely. [Hishida 91] As the basis of this way of thinking, it is emphasized that our information processing has been carried out under "partialness of information", in other words, incompleteness of information. From the above, we can derive that a system which aims at integrating natural language processing could adopt parallel processing because it disregards the processing sequence. We adopted a mechanism which integrates all natural language analysis processing stages and makes them cooperate in parallel as the fundamental processing model. 406 Also we have added a priority process as an extension, in order to improve the processing efficiency. This priority process is the combination of both load balance and the parallel priority control. 
We call this process 'competition' and we call the extended parallel cooperation model the "model of cooperation and competition". However, we shall not discuss 'competition' in this paper. 3 Realization of automatic parallel cooperation It is well known that the integration of natural language processing and parallel cooperation is a natural model. However, very few systems based on this model are reported to have been actually built. One of the main problems has been modularity. Various research projects in natural language processing have been achieving good results in the fields of morphological analysis and syntactic analysis. However, these systems were designed as independent modules and very often their interfaces are very restricted and internal informatiot:l is normally invisible from the outside. To carry out efficient parallel cooperation, all processes must be able to exchange all of their information with each other. Therefore construction of methods of information exchange between the various modules and the control of these exchanges will be serious and complicated problems. One way to solve this problem is to make an abstraction of the processing framework, so that analysis phases such as morphological analysis, syntactic analysis etc. are carried out by one single processing mechanism. One such approach is Hashida's Constraint Transformation [Hasida 90]. We have adopted an approach similar to that of among others Hashida, in the sense that all levels of processing are carried out by one and the same processing mechanism. Our processing framework, however, does not utilize Constraint Transformation, but rather Type Inferencing with respect to record-like type structures, which is comparable to Hassan Ait-Kaci's LOGIN , Gert Smolka's Feature Logic, or the Object Terms in Yasukawa and Yokota's QUIXOTE. In our system, the usage of type inferencing can be seen to have two aspects: it works as a framework for analysis processing as well as for cooperation. Analysis processing employs a vertical kind of type judgment, as exemplified by .. the cooperation between morphological analysis and syntactic analysis. In morphological analysis characters are considered to be objects, and morphemes are to be taken as types; but when we perform syntactic analysis, morphemes are considered to be objects. The usage of type inferencing as a framework for cooperation, the second aspect of this usage mentioned above, is as a means for exchanging information between objects and types and for structuring the contents of this information. Here' both objects and types are represt\nted as typed record structures containing shared or common variables, and information exchange is implemented through the unification of shared variables in two typed record structures representing an object and a type. This unification mechanism of typed record structures has a mechanism to judge the types of objects that are instantiated to field elements through communication, and this is what was called the vertical type judgment mechanism. Parallel cooperation between syntactic and semantic processing is expressed through the unification mechanism of typed record structures. Even if we .treat all phases of natural language processing as similar in kind, it is still natural, for the sake of ease of grammar development and debugging, to do the development of the distinct processing phases separately. 
For this reason, we have structured our system so that concept organization rules for morphological, syntactic and· semantic processing can be developed separately. Parallel cooperation is then realized automatically by merging these diverse rules and definitions. 4 4.1 The realization of parallel analysis processing Type judgment mechanism Efficient algorithms exist for morphological and syntactic processing, and we cannot afford to ignore such knowledge in developing a practical system, even in the case of an integrated natural language processing system. Luckily we have found that there is a strict correspondence between our vertical type judgment and known syntactic analysis methods. Matsumoto's parallel syntactic analysis system PAX [Matsumoto 86] performs syntactic processing in parallel through a method called the "layered stream method", which is an efficient processing mechanism for search problems involving parallel logic programming languages. PAX employs what is basically a chart parsing algorithm. Our vertical type judgment processing formalism involves a reversal of the relationship between process and communication data in PAX. A syntactic analysis system using a similar processing method to ours is being considered by Icot's Taki [Sato 90]. Whereas PAX is strongly concerned with the clause indexing mechanism of logic programming languages, our method concentrates on increasing OR-parallellism and reducing the amount of data communicati~n in parallel execution. How we interpret phrase structure rules, using the type ordering relation "<" and type variables, is shown below. s (- np,vp 407 This is rewritten based on the rightmost element as follows. vp < (np -> s) Here the ordering relation "<" expresses a superordinate-subordinate relationship between types. Intuitively this means that the object that is judged to be a subordinate type can also be judged to be a superordinate type. It follows that the meaning of this rule is that the object that can be judged to be the vp, can also be judged to be a function of type np to s. s <- advp,np,vp In a case like this one, we embed functions to produce the following. vp < (np -> (advp -> s» When there are several possibilities, this is expressed in a direct sum format as follows. vp < (np -> ((advp -> s) + s» The dictionary is a collection of type declarations as follows. (in,the,end):advp love:np wins:vp Analysis is exe~uted as a process of type judgment of a word string .. In other words, analysis is the execution of the judgment of a type assignment such as the one below. (in,the,end,love,wins):s The execution is bottom-up. First the type of every word is looked up among the type declarations. The words then send these type judgments to their right adjacent element. If these types again have superordinate types, then they are treated as follows. If the superordinate type is a function, then a process is generated which checks the possibility that the typed object received from the left is appropriate. If it is not a function (in which case it is atomic), then this type judgment formula is sent to the right adjacent element, and also it is checked whether it has a superordinate type. When the result of a superordinate type or function appropriateness is a direct sum, then this result is handled in OR-parallel form. Repeating this kind of processing over and over, we get as answers all the combinations of elements from the leftmost to the rightmost that satisfy "s" . 
One of the special features of this processing formalism is that, when sending an object of atomic type, the pointer to the 'position of the leftmost of the elements that make up this object is sent along as the "exit of communication path". Hereby the partial tree that is constructed upon reception of this object in fact is capable of including all atomic-type objects that are structured to the left of the received object. If we translate this to structure sharing in sequential computation, we see that we can avoid unnecessarily repeating the same computation while retaining the computational efficiency of a chart parsing algorithm for context-free grammars. Below we give the KLI program for the fundamental part of vertical type judgment. Note however that the notation we have used above is transformed to KLI notation, in the manner explained directly below. direct sum type+, ... ,+type ==> [type, ... ,type] type declaration object:type ==> type(object,T) :true I T=type. type ordering ==> upper(type,T) type < type truel T=type. input format (in,the,end,love,wins):s ==>judgment([in,the,end,love,wins] ,s,R). Note: R will contain the result of computation Also we use "*,, for the operator that constructs the pair of the sending atomic type and the stream, and the atom "Leftmost" as an identifier for the leftmost position of the input. judgment(Objects,Type,Result) :- truel objects(Objects,'Leftmost',R), judged_as(R,Type,Result). objects([] ,L,R) :- trueIL=R. objects([WordIZ] ,L,R) :- true I type(Word,Type) , sum_type(Type,L,Rl), objects(Z, [Word\Rl] ,R). sum_type([] ,L,R) :- true\L=R-. sum_type([Type -> Type2\Z],L,R) :- true \ function_type(Type ->Type2,L,Rl), sum_type(Z,L,R2), merge({Rl,R2},R). otherwise. sum_type([Type!Z],L,R) :- true! atomic_type(Type,L,Rl), sum_type(Z,L,R2), merge({Rl,R2},R). function_type(Type -> Type2, [],R) true\R=[] . function_type(Type -> Type2,'Leftmost' ,R) true \R=[] . function_type(Type-> Type2,[Type *Ll!L],R) :- 408 ture. However in our system, a value of a feature is not a record-like type structure but a description and only the terminal nodes of a feature structure tree are typed objects. This is to improve the efficiency of calculation. In our system an object is represented as a pair of a record-like type structure and an identifier of the object. The value of a feature can be a variable. However the unification of descriptions involves merging feature structures rather than instantiating variables. Variables of feature value play the role of a tag for the merging point in feature unification. In our system such variables also play the role of communication pass to exchange information for our parallel cooperation. Sometimes a variable can be assigned a type. When a variable is assigned a type such a typed variable must be instantiated by an object. Below we give an example of a record-like type structure. true I sum_type(Type2,Ll,Rl), function_type(Type -) Type2,L,R2), merge({Rl,R2},R). otherwise. function_type(Type -) Type2, [_IL] ,R) true I function_type(Type -) Type2,L,R). atomic_type(Type,L,R) :true I upper(Type,Upper_Type), R= [Type*L IRl] , sum_type(Upper_Type,L,Rl). judged_as([] ,Type,Result) :trueIResult=[]. judged_as([Type*'Leftmost' IL] ,Type,Result) true I Result=[TypeIR] , judged_as(L,Type,R). otherwise. judged_as([_IL] ,Type,Result) :- true I judged_as(L,Type,Result). {human, [parents=[father={human,211, [name=taro]}, mother=X:{human,[]}]]} %Example of dictionary: type(love,Type) :- true I Type=[np] . type(wins,Type) :- true I Type=[vp] . 
type(end,Type) :- true I Type=[the -> [in -) [advp]]]. % Example of grammar: upper(vp,Upper_Type) :- true I Upper_Type=[np -) [advp -> [s], s]]. 4.2 Unification mechanism of record-like type structure the A record-like type structure is a pair of a sort symbol and a description. A sort symbol denotes the sort to which the type belongs. A description is a so-called record structure formed by pairs of feature names and their values. The feature value is also a description or an object. However a description is unlike an ordinary record structure in that its feature and value pairs are not always apparent. Indeed, the purpose of this structure is to obtain incremental precision from partial information, just like the feature structures used in unification grammar formalisms such as LFG. In systems like Ait-Kaci's psi-term, Smolka's sorted feature structure'or Yasukawa & Yokota's object term, the value of a feature is also a record-like type struc- This example shows a type which is sorted "human" and satisfies some constraints as a description. The description has a feature "parents" and the value of the feature is also a description that contains the feature of "father" and feature of "mother". The value of the feature "father" is an object that is of sort "human" and named "taro" and its object identifier is "211". The value of feature "mother" is a typed variable. The type of the variable is sorted "human" and its description has no information. The unification mechanism for the record-like type structure is realized as the addition of information to the table of the pairs of the tag and the structure to which the tag referred. The unification process is the merging process to construct the details of the record-like type structure. When the typed variable is instantiated by an object, the type judgment process is invoked. This is in concreto how our parallel cooperation mechanism for semantic analysis and syntactic analysis works. 4.2.1 Parallel cooperation and record-like type structure A type can be seen as a program which can process an object. This implies that there is a close relation between merging of information using record-like type structures on the one hand, and the "living link" between objects or programs on the other. As an example, imagine that a graph object which was created by a spread sheet program is passed on to an object which is a word processor document. If we want such a graph object to be a "living" object, re-computable by the creator program, then it must be annotated by its creator as a data type in the record structure of the word processor document. Now 409 when the data is re-computed, the system will invoke its creator application program automatically. The type theory of record-like type structures can be viewed as a framework for this kind of cooperation of different application programs. Laputa's principle of automatic parallel cooperation is a parallel version of this "live linking". Vertical type judgment of morphological analysis and syntactic analysis is an application program in this sense. We can extend this live linking even further by using variables that are shared between objects and types, so that we can propagate information to objects within objects. For example, if a graph object from a spread sheet is pasted to a word processor document, and some of the data within the graph is shared with a part of the document text, then re-computation of the spread sheet program will happen when that part of the document text is modified. 
In our system, such re-computation is realized by communication of processes. 5 The grammar and lexicon for parallel cooperation In this section we explain the syntax and description method for the grammar and lexicon for our system. Grammar rules and lexical items are described as a type definition or an ordinal relation of types. Our parallel processing mechanism treats morphological data and syntactic data in a uniform way. However it is not efficient to use exactly the same algorithm for morphological processing as for syntactic processing, because morphological processing examines only immediately adjacent items and therefore does not need context-free grammar. Our processing mechanism treats characters and morphemes in a slightly different way. For this reason characters and morphemes are distinguished as data types. Another, more essential reason for this is the problem posed by morphemes that consist of only one character. If there is no difference between a character and a morpheme, then our type judgement process will never be able to stop. 5.1 Some examples of grammatical and lexical description 5.1.1 Dynamic determination of semantic relation of subject and object The semantic categories of subject and object are not determined only by the verb with which they belong. In many, if not most cases, the adequacy of the semantic category of the object changes according to the subject. Because of this, the required semantic categories of subject and object should not be fixed in the lexical description of the verb. The example grammar rules below show how the adequacy of the semantic category of the grammatical object can vary dynamically depending on the subject. {np,[sem=Ob]} < ({vt,VT} -> {vp,VT=[obj=Ob]}) {vp,VP} < ({np,[sem=Ag]} -> {s,VP=[agent=Ag]}) In this exam pIe the first grammar rule shows that the superordinate type of the type "np" is a function of type "vt- > vp". This rule means that an object which is judged as the type "np" is also a function which, if applied to an object of type "vt", results in an object of type "vp". In this rule every description of "vt" is merged to "vp" and the value of the feature "sem" of "np" is unified with the value of feature "obj" of the type "vp". The next grammar rule means that an object of type "vp" is also a function of type "np- > s". This rule means that the value of the feature "sem" of subject "np" is unified with the value of the feature "agent" of "vp". Now we also show some lexicon entries to go with these rules. eats:{vt,[agent=Ag:{animal,[eat_obj=Ob]},obj=Ob]} john:{np, [sem={human,Id,[name='John ' ]}]} the_tiger :{np,[sem={tiger,Id,[]}]} In the lexicon the object "eats" has type "vt" and complex description. In the description the value of the feature "agent" is a typed variable and the type of the variable is sorted "animal" and the value of the feature of "eat-obj" is unified with the value of the feature of "obj" of type "vp". The rules specifying semantic categories look as follows. {tiger,[]} < {animal, [eat_obj=E:{animal,[]}]} {human,[]} < {animal, [eat_obj=E:{food, []}]} These rules mean that a tiger is an animal which eats animals and a human is an animal which eats food, respectively. 
Although under these rules of grammar, lexicon and semantic categories 'john' and 'the_tiger' are both animals, the judgment (the_tiger,eatsjohn):s succeeds but (john,eats,the_tiger):s fails, because John is a human and a human is an animal which eats food but a tiger cannot be judged as food from the rules governing semantic categories. 5.1.2 Subcategorization The next example is the lexical entry for the Japanese verb "hanasu" (to speak). This verb is subcategorized by 3 "np"s which are marked for case by the particles "ga", "wo" and "ni". 410 hanasu:{vp,[subcat=Case:{ga,wo,ni}, predicate=[ga=[gram_rel=subj, sem=G] , wo=[gram_rel=comp, sem=W] , ni=[gram_rel=obj, sem=N] , sem=[rel=speak, agent=G:{human,[]}, object=N:{human,[]}, topic=W:{event,[]}]]]}. 6th Laboratory [Sano 91]. We made the conceptual system rules in accordance with the conceptual system of the Japan Electronic Dictionary Research Institute EDR. The experiment We used 22 test sentences and examined 3 types of cooperation pattern: (1) syntactic analysis only, (2) cooperation of morphological analysis and syntactic analysis, and (3) cooperation of morphological analysis, syntactic analysis and semantic analysis. In addition to the above, suppose that we also have the following lexical entries and grammar rules. We checked the relation between the number of processor elements utilized and the number of reductions and processing time for each of these 3 cases. ga:{noun,N} -> {np,N=[case_marker=ga]} ni:{noun,N} -> {np,N=[case_marker=no]} wo:{noun,N} -> {np,N=[case_marker=wo]} All the tests have been performed three times, and the measurements given here are the averages computed from these three processing runs. {vp,VP=[subcat=Case:SUB]} < {np,[case_marker=Case]} -> {vp,VP=[subcat=New:SUB-{Case}]} Example of analysis result To indicate the level of processing of this experiment, I will show the result of analysis of a example sentence. In that case the type judgments for the sentences below will be successful. Example sentence (john,ga,mary,ni,anokoto,wo,hanasu):{vp,[]} (mary,ni,anokoto,wo,john,ga,hanasu):{vp,[]} (He inherited his father's business.) 5.1.3 Example of the conceptual system rules The conceptual system rules are sets of rules which determine superordinate and subordinate relations of concepts. The semantic analysis of Laputa uses these conceptual rules when it performs semantic judgemant. {object,O} < {'Top',O} {event,[]} < {'Top' ,[]} {concrete-object,[]} < {object,[]} {creature,[]} < lconcrete-object, []} {human,[]} < {creature,[]} {student, []} < {human,[]} 6 6.1 An experiment using Laputa Conditions of the experiment computer Multi-PSI 32PE construction as PIMOS 3.0.1 The size of the grammar and dictionary grammar rules 651 words 14,613 morphemes 8,268 concepts 770 We used the syntactic grammar and morphological grammar which were developed by Sano of ICOT's r~Btil~X:O).~ ~~It' t'=J Analysis result vp(l, [subcat=SUB: [], infl=u_ga, predicate=[ lex= ~, soa=[ga=[sem={man,l,[]}, gram_rel=subj], wo=[sem={job,6, [of_type={man,2,[]}]}, gram_rel=comp], sem={tugu,8, [agent={man,l,[]}, object={job,6, [of_type= {man,2, 0 } ] } ]} tenseless=action], polarity_of_soa=true, judgment=affirmation, aspect=not_continuous], mood=finished, recognition=[modality=descriptive, acceptance=affirmative]]) 6.2 Outcome of the experiment The the following graph shows, for the analysis of example sentence 12, how the speed-up ratio changes as the number of processors is increased from 1 to 32. 411 40 .9 40 30 30 .9 § Q.. ::J -6
Figure 1: Example 12, processors and speed-up ratio
Figure 2: Example 14, processors and speed-up ratio

The behavior of the cooperative process of morphological, syntactic and semantic analysis is almost identical to that of syntactic analysis alone, while the cooperation of just morphological and syntactic analysis shows a relatively much better speed-up ratio. This might lead one to think that cooperation of just morphological and syntactic analysis makes for a better speed-up ratio. However, examples involving a greater amount of calculation do not show this difference. Figure 2 is the result of the analysis experiment on example sentence 14. This graph shows that all three types of cooperation have the same speed-up ratio, which is a different result than the one we deduced from example sentence 12. We can interpret this difference as resulting from a difference in the amount of calculation. Both the syntax-only calculation and the cooperative syntactic and morphological analysis process for example sentence 12 simply do not involve enough computation to fully show the potential speed-up ratio. Sentence 14, on the other hand, requires enough computation for any of the three types of cooperation, so that we can more clearly see the speed-up ratio.

To verify this assumption, we plot the speed-up ratio against the amount of calculation for the three types of cooperation. As the graph shows, all three cooperation types show similar behavior for this relation. We can understand why this is so if we recall that in our system, all modes of processing employ the same basic processing mechanism. In the graph, we can see that the speed-up ratio rises steeply while the number of reductions remains small, but gradually becomes saturated as the number of reductions grows. In the bar graph of Figure 4, we can see that the number of reductions for sentence 12 in the case of cooperative morphological and syntactic analysis is about 1,200,000, while in the other two cases it is about half as much (approximately 600,000). Because the number of reductions as a whole is small, this difference is important. Example 14, on the other hand, involves enough computation that the effect is minimized and the similarities between the processing modalities are allowed to come out.

[Figures 3 and 4 (speed-up ratio versus number of reductions, and number of reductions for each cooperation pattern) are not reproduced here.]

7 Conclusion

In this paper, we proposed a model for integrated natural language processing on a parallel inference machine. This model is realized by choosing similar processing schemes for morphological, syntactic and semantic analysis and having these cooperate in parallel. Also, we have carried out an experiment to evaluate the practicality of our processing model. As a result of our experiment we have been able to realize a speed-up to a factor of about 13 when utilizing 32 processor elements. The results also showed that the speed-up ratio is determined only by the amount of computation, and is not influenced by the configuration of cooperating analysis processes. If our processing model is to be practical as a method for a real parallel inference machine, the object of analysis should require a great amount of calculation, because when the amount of calculation is low we cannot expect a satisfactory speed-up ratio. We think that our processing model has the potential
Example sentences (English glosses; the Japanese originals are not reproduced here):

(Since only I hadn't solved that equation, my friends who knew how to solve it helped me out and I solved the problem and went to bed.)
Example 1: (I will get fat.)
Example 18: (He inherited his father's business.)
Example 4: (Don't hang up the telephone in the middle of a conversation.)
Example 5: (Today's Nanako is fatter than before.)
Example 6: (Turn off the light when you leave the room.)
Example 7: (Because I called and scolded him, it is progressing well.)
Example 8: (She began to tell me that that was the reason why she cut her hair.)
Example 9: (When the train that she, who was going to Tokyo, was on started moving, snow began to fall.)
Example 10: (Though I believed her, Nanako didn't return until night on purpose.)
Example 11: (Because only I expressed my opinion at that meeting, later at the company we had this kind of problem.)
Example 12: (While waiting for the train, Nanako told him about the friends that she would invite to the party.)
Example 13: (First my father opened the package in question and showed it to him, and later we got a telephone call from a woman from the Kingdom of Saudi Arabia.)
Example 14: (As I talked to my father while waiting for Nanako to come home from school on Christmas Eve, it started snowing.)
Example 15: (Since only I hadn't solved that equation, my friends who knew how to solve it helped me out and I solved the problem.)

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

Architecture and Implementation of PIM/p

Kouichi KUMON, Akira ASATO, Susumu ARAI, Tsuyoshi SHINOGI, Akira HATTORI
Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki 211, Japan
Kiyoshi HIRANO, Hiroyoshi HATAZAWA
Fujitsu Social Science Laboratory Ltd.
Institute for New Generation Computer Technology

Abstract

In the FGCS project, we have developed a parallel inference machine, PIM/p, as one of the final outputs of the project [Taki 1992]. PIM/p has up to 512 processing elements (PEs) organized in a two-level hardware structure. Each PE has a local memory and a cache system to reduce bus traffic. The special cache control instructions and the macro-call mechanism reduce the common bus traffic, which may become the performance bottleneck in shared-memory multi-processor systems. Eight PEs and a main memory are connected by a common bus using a parallel cache protocol; we call this unit a cluster. The PIM/p system consists of sixty-four clusters, which are connected by dual sixth-order hypercube networks.
The KL1 processing system on PIM/p has two components: the compiler and the run-time support routines. The compiler uses templates to generate PIM/p native codes from KL1-B codes. Each KL1-B instruction has a corresponding template. The codes are optimized after the expansion from KL1-B to native codes. The run-time support routines are placed in the internal instruction memory, in the local memory, or in the shared memory, according to their calling frequencies. The preliminary evaluation results are presented. Corresponding to the hierarchy of PIM/p, two different system configurations, the network-connected system and the common-bus-connected system, are compared. The results show that the speedup ratio compared to one PE is nearly equal to the number of PEs for both configurations. Hence, the bus traffic is not a performance bottleneck in PIM/p, and the automatic load-balancing mechanism appropriately distributes loads among PEs within a cluster in this evaluation.

1 Introduction

A parallel inference machine prototype (PIM/p) is now in use. It is tailored to KL1 [Ueda and Chikayama 1990], and includes up to 512 processors. A two-level hierarchical structure is used in the new system: a processing element and a cluster (Figure 1). Eight processing elements form a cluster, which communicates with a shared memory through a common bus using snooping cache protocols. The clusters are connected with dual hypercube packet switching networks through network interface co-processors and packet routers. Each chassis consists of four clusters, and the maximum PIM/p system includes sixteen chassis. A single clock is delivered to all processing elements, maintaining the phase between different chassis. Some of the features introduced in the PIM/p system are:

• A two-level hierarchical structure, to allow parallel programming with common memory and to facilitate system expansion with the hypercube network.
• The macro-call instructions, which have the advantages of both hard-wired RISC computers and micro-programmable instruction set computers.
• Architectural support for incremental garbage collection by the Multiple Reference Bit (MRB), which reduces memory consumption when executing parallel logic programming languages such as KL1.
• A local memory in each processing element, which can reduce bus traffic if the accessed data are placed in the local memory.
• A coherent cache and dedicated cache commands for KL1 parallel execution, which can also reduce common bus traffic.
• Generation of native instruction codes from intermediate KL1-B codes by an optimizing compiler.
• The optimizer analyses data flow for both the tag parts and the data parts independently, which can eliminate unnecessary tag operations.

The processing element (PE) consists of an Instruction Processing Unit (IPU), a Cache Control Unit (CCU) and a Network Interface Unit (NIU); Figure 2 is a schematic diagram of a PE. In this paper, the hardware architecture and the KL1 processing system are described. In Section 2 to Section 4 we describe the IPU, the cache and the network system. Then, the run-time support routines for KL1, and the KL1-B compiler code generation and its optimization, are described in Section 5. Finally, in Section 6, preliminary performance evaluation results are presented.

Figure 1: PIM/p system configuration (PE: Processing Element; NIU: Network Interface Unit)
Figure 2: PIM processing element configuration

2 IPU Architecture

The instruction processing unit (IPU) executes RISC-like instructions which have been tailored to KL1 execution. The instruction set has many features which facilitate efficient KL1 program execution. In this section, we describe these features.
2.1 Tagged data and type checking

To execute KL1 programs, a dynamic data type checking mechanism is needed to provide:

• Transparent pointer dereferencing.
• Polymorphic operations for data types.
• Incremental garbage collection support.

Dereference is required at the beginning of most unification operations in KL1. In dereference, a register is first tested to see whether its content is an indirect pointer or not. If it is an indirect pointer, the cell pointed to is fetched into the register and its data type is tested again. Many operations in KL1 include run-time data type checks even after dereferencing has been completed. Unifications include polymorphic operations for data whose type is not known until run-time. In addition, incremental garbage collection by MRB is embedded in dereferencing (see Section 2.5 for details). Therefore, a tagged architecture is indispensable for KL1 processing.

In PIM/p, data is represented as 40-bit words (an 8-bit tag plus 32-bit data), and the general-purpose register has both a data part and a tag part. The MRB is assigned to one bit of the 8-bit tag. Tag conditions are specified as bit-wise logical operations between the tag of a register and an 8-bit immediate tag value in the instruction. An instruction can specify the logical operation as AND, OR, or XOR, or the negation of one of these. If an instruction specifies XOR as its logical operation, it checks whether the tag of the register matches the immediate value supplied in the instruction. The xormask operation does this matching under an immediate mask supplied in the instruction, which enables various groups of data types to be specified in a conditional instruction if the data types are appropriately assigned to tag bits (see Section 5.1 for details). Various hardware flags, like the condition code of ALU operations or hardware exception flags, can be checked as the tags of dedicated registers, so these flags can be examined by a method similar to data type checking.
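As a rough illustration of this tag-checking style, the following C sketch models a 40-bit tagged word and the XOR and xormask tests. The bit assignments, type names and helper functions here are assumptions chosen for illustration, not the actual PIM/p encoding.

#include <stdint.h>
#include <stdbool.h>

/* 40-bit tagged word: 8-bit tag plus 32-bit data (the MRB is one tag bit). */
typedef struct { uint8_t tag; uint32_t data; } Word;

/* XOR test: the tag must match the immediate value exactly. */
static bool tag_is(Word w, uint8_t imm) {
    return (uint8_t)(w.tag ^ imm) == 0;
}

/* Xormask test: compare only the tag bits selected by the mask, so a single
   conditional instruction can accept a whole group of data types when the
   types are assigned to tag bits appropriately. */
static bool tag_in_group(Word w, uint8_t imm, uint8_t mask) {
    return ((uint8_t)(w.tag ^ imm) & mask) == 0;
}

For example, if one tag bit were reserved to mean "structured data", tag_in_group could accept both lists and vectors with a single test while tag_is would accept only one exact type.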
Figure 3 shows the pipeline stages in conditional branch/delayedbranch/skip instructions. In the PIM/ p pipeline, all instructions write their results at the B stage and ALU or memory write instructions require source operands at the beginning of the B stage. The bypass from the B stage can eliminate interlocks. Conditional branch instructions test the condition at the B stage, the bypass also eliminates condition test interlocks. However, when the register is used by address calculation at the A stage when the value of the register has just been changed, an interlock may occur even if a bypass from B to A is prepared. Figure 4 shows this address calculation interlock. The compiler must recognize such interlock conditions and should eliminate them as far as possible.(See section 5.2.3) 2.3 Macro call and internal instructions A RISC or RISC-like instruction set has advantages in both low hard ware design cost and fast exec~tion pipelining. However, naive expansion of KL1-B to low-level RISC instructions produces a very large compiled code. When conditional branch is taken: condition tested at B D A T B : condo branch instruction D A T canceled : next external instruction D A canceled : 2nd external instruction D canceled : 3rd external instruction D A T : branch target instruction condition tested at B : condo branch instruction : next external instruction : 2nd external instruction : 3rd external instruction A T : branch target instruction When delayed branch is used: D A T B D A T B D A canceled D canceled D When conditional skip is taken: condition tested at B A T B : condo skip instruction D A T canceled : next external instruction D A T B : 2nd external instruction D A T B : 3rd external instruction D Pipeline stages of conditional branch/skip instructions Figure 3: D A T B D D D A D T A B T : register write instruction. : inter-lock occurs : next instruction Figure 4: Interlock caused by address calculation This may cause frequent instruction cache miss-hits and may fill up the common bus band width with instruction feed, especially in tightly-coupled multiprocessors such as a PIM/p cluster. Here, reducing common bus traffic is a most important design issue as is reducing the cache miss-hit ratio. On the other hand, the static code size can be small in a high-level instruction set computer with micro-programs, such as PSI. To meet both requirements, the processing element of PIM/p has two kinds of instruction streams, ex,~ernal and internal. External instructions are mostly RISC-like instructions with KL1 tag support[Shinogi et al. 1988]. Internal instructions are fed from internal instruction memory like micro-instructions. The external instruction set includes macro-call instructions, which first test the data type of a register given as an operand, then invoke programs in the internal instruction memory(IIM) or simply execute the next external instruction, depending on the test result. Every time a macro-call instruction is executed, the corresponding macro-body instruction is fetched from IIM to reduce the calling overhead, but it is not executed unless a macro-call test condition is met (See the Sand C stages of Table 1). Figure 5 shows the pipeline stages of macro-call instructions. A macro-call instruction can be regarded as a light-weight conditional subroutine call or 417 . 
Table 1: Pipeline stages of ALU, memory access and branch instructions (S) (C) D A T B ALU operation Memory access Branch Set IIM address, valid only for m-call or internal instructions Fetch instruction from IIM, valid only for m-call or internal instructions Decode 7 Decode 7 Decode Register read for address Register read for address Memory address Branch address calculation calculation Register read Cache tag access Cache tag access AL U operation 7 Cache data access 7 Cache data access 7 Register write Register write Condition test When the condition met: condition test at A D A : macro-call instruction : next external instruction D canceled : first internal instruction S C D A T B S C D A T B : 2nd internal instruction When the condition is not met: condition test at A D A : macro-call instruction D A T B : next external instruction D A T B : 2nd external instruction Figure 5: Pipeline stages of macro-call instructions as a high-level instruction with data type checking. To reduce the overhead of passing parameters from a macro-call instruction to the macro-body, the PIMjp processing element has three indirect registers. The indirect registers are pseudo registers whose real register numbers are obtained from the corresponding macro-call instruction parameters. These mechanisms may appear to be similar to those of conventional micro-programmable computers. Programs stored in IIM are written by system designers into internal instruction memory, like micro-programs. However, the internal instruction set is almost the same as the external instruction set, so a designer can use same development tools to generate both external and internal programs. Therefore, system designers can specify internal or external at the machine-language level, without writing complicated micro-instructions, as in conventional micro-programmable computers. 2.4 Dynamic test stage change As discussed in the Section 2.3, internal instruction executions require an additional two pipeline stages, Sand C, before the D stage, internal conditional branch causes a five clock cycle branch penalty when the branch is taken. In the case of an external branch instruction, target instruction fetch starts at A as an operand and the fetch finishes at the B stage, thus testing the condition before the B stage cannot reduce branch penalty. However, internal instructions must use the Sand Table 2: The advantages and disadvantages of B and A condition check Test stage B A Advantages No interlock 1r branch penalty Disadvantages 5r branch penalty O/1/2r interlock lr=l clock cycle C pipeline stages to fetch the target internal instruction. It cannot not start before the condition test. If the branch condition is determined earlier, say at stage A, target fetch can be started earlier. This reduces the branch penalty. However, an early condition test causes interlocking, which is common to memory address calculation, and this will occur even if the branch is not taken. Table 2 shows the advantages and disadvantages of both B stage and A stage condition tests. Some sample codings show internal conditional branches are often placed just after memory read or ALU operation instructions, and it is hard to insert non-related instructions between them. To minimize pipeline stall, an A stage test should be used if the previous instruction does not interlock the condition test, otherwise B stage test should be used. 
Preparing two sets of branch instructions, a B stage test and an A stage test, would add instructions to the PIM/p instruction set, because the PIM/p instruction set already has many conditional branch instructions for various tag checks. Without adding instructions, the PIM/p pipeline controller decides between test stage A or B for an internal conditional branch [Asato et al. 1991]. When some instructions interlock the test stage A of a successive internal conditional branch, the test stage is changed to B to avoid the interlock; otherwise the test is done at the A stage. We call this a dynamic conditional branch test stage change. If a compiler or a programmer can put two or more instructions between a register write instruction and a conditional branch based on that register, the test is done at the A stage.

2.5 MRB support

Incremental garbage collection support is one of the most important issues in parallel inference machines. The PIM/p instruction set includes several instructions for efficient execution of MRB garbage collection [Chikayama and Kimura 1987]. Using MRB incremental garbage collection, value cells or structures are allocated from free lists, and when those allocated areas are reclaimed, the areas are linked back to free lists. To support these free list operations, the push and pop instructions are used.

The MRB of each pointer and data object has to be maintained in all unification instructions. Especially in dereference, the MRB of the dereferenced result is off if and only if the MRBs of both the pointer on a register and the pointed cell are off. The MRB is assigned to one of the eight tag bits; MRB-on means the bit is 1 and MRB-off means it is 0. Therefore, the logical OR of the pointer's MRB bit and the pointed data's MRB bit represents the pointed data's multiple-reference status. The dedicated instructions ReadTagWordMrbor and Deref support this operation. ReadTagWordMrbor loads the memory data pointed to by the address register into the destination register, accumulates the address register's MRB with the destination register's MRB (that is, the MRB of the memory data), and sets the resulting status in the destination register. Deref is similar to the ReadTagWordMrbor instruction, but it loads the memory data into the address register while the old address register value is saved to the destination register simultaneously. Therefore, succeeding instructions can examine whether the pointed data can be reclaimed or not by testing the destination register's MRB bit. These dedicated instructions minimize the overhead of adopting MRB incremental garbage collection.

3 Memory Architecture

3.1 Cache and bus protocols

Each PIM/p processing element has two 64K byte caches, one for instructions and one for data. PIM/p uses copy-back cache protocols, which have proved effective for reducing common bus traffic in shared-memory multiprocessors. To maintain cache coherence, there are basically two mechanisms: invalidating the modified block, and broadcasting the new data to the others. PIM/p uses the invalidation method for the following reasons. With MRB incremental garbage collection, a reclaimed memory area need not be shared; the next time the area is used it may not be shared with the same processors which previously shared the area. In addition, KL1 load distribution is achieved by distributing goal records within a cluster from one processor to another, and usually the distributed goals will not be referred to by the source processor.
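The MRB bookkeeping during dereference can be pictured with the following C sketch. It is only a simplified software model of what ReadTagWordMrbor and Deref do in hardware, with invented tag-bit positions and a flat heap array standing in for memory.

#include <stdint.h>

#define TAG_MRB 0x80u    /* assumed position of the MRB within the 8-bit tag */
#define TAG_REF 0x01u    /* assumed tag value for an indirect pointer        */

typedef struct { uint8_t tag; uint32_t data; } Word;

/* Dereference with MRB accumulation: the result is MRB-off only if both the
   pointer and every cell followed are MRB-off (logical OR of the MRB bits). */
Word deref_mrb(Word reg, const Word *heap) {
    while ((reg.tag & (uint8_t)~TAG_MRB) == TAG_REF) {
        Word cell = heap[reg.data];                         /* load the pointed cell */
        uint8_t mrb = (uint8_t)((reg.tag | cell.tag) & TAG_MRB);
        cell.tag = (uint8_t)((cell.tag & (uint8_t)~TAG_MRB) | mrb);
        reg = cell;   /* the old pointer cell could be reclaimed here if mrb is off */
    }
    return reg;
}

A succeeding instruction would then test the MRB bit of the result (or of the saved old pointer, in the case of Deref) to decide whether the cell can be pushed back onto a free list with the push instruction.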
In both of these cases (writes to a newly allocated, reclaimed area and writes to distributed goals), the broadcast method would produce unnecessary write commands on the common bus on every write. The invalidation method is much more efficient. The PIM/p cache protocol is similar to the Illinois protocol. However, the PIM/p protocol has the following cache commands optimized for KL1. In normal write operations, a fetch-on-write strategy is used; however, it is not necessary to fetch the contents of shared memory when the block is allocated for a new data structure, because the old data in the block is completely unnecessary. In KL1, when free lists are recreated after grand garbage collection, the old contents of memory have no meaning. To accomplish this, Direct_Write is used.

Direct_Write: If the cache misses at a block boundary, write the data into the cache without fetching data from memory.

The following instructions are used for inter-processor communication through shared memory, for example goal distribution.

Read_Invalidate: When the cache misses, fetch the block and invalidate the cache block on the other CPUs. This operation guarantees that the block is exclusive unless another CPU subsequently requests the block.

Read_Purge: After the CPU reads a block, it is simply discarded, even if it is modified.

Exclusive_read: Same as Read_Invalidate except for the last word in a cache block. When it is used to read the last word in a cache block, it purges the block like Read_Purge.

Using these instructions, unnecessary swap-in and swap-out can be avoided by invalidating the sender's cache block after the receiver gets the block, and by purging the receiver's cache block after the receiver reads all the data in the block. Ill-behaved software may cause these instructions to destroy cache coherency. However, these instructions are used only in the KL1 processing system, and only systems programmers use them. There are hardware switches which can change the actions of these special read/write instructions to normal read/write actions. By using these switches, the systems programmer can examine the consistency of their programs.

3.2 Exclusive control operation

To build a shared-memory parallel processor system, lock and unlock operations are essential for guarding critical sections. KL1 requires fine-grain parallel processing, and the frequency of locking and unlocking operations needed for shared data is estimated at more than 5% of all memory accesses. Thus these operations must be executed with low overhead by using hardware support. However, locking operations should seldom conflict with each other. It is therefore useful to introduce a hardware lock mechanism which has low overhead when there are no lock conflicts.
For KL1, no bus cycles are needed for most of the lock reads hitting exclusive cache blocks. To build a packet, the IPU first makes a header which contains the packet destination and mode for broadcasting. It then building a packet body by executing coprocessor write instructions, which packs data one, two, or four bytes at a time. Finally the IPU puts a end of packet marker to send the packet to RTR. A whole packet of data is stored in packet memory before sending it, to minimize RTR busy time. The send and receive packet memories are both 16K bytes long. Each cluster has four SCSI ports which are connected to the PEs. Two have non-differential SCSI interface ports, and the other two have differential SCSI interface ports. The differential SCSI interface is able to extend the interface cable up to twenty five meters. It is used to connect SCSI disks which need not be placed beside the cluster. The PIM/p FEP is connected to a non-differential interface, and various other SCSI devices, such as an ether-net transceiver, can be connected through the SCSI bus. This extends PIM/p's application domain. 4.2 4 Network Architecture 4.1 Network interface unit Multiple clusters are connected by a hypercube topology network. At the design stage, we assumed that ten logical reductions require a hundred-bytes packet transfer. The target speed of PIM/p PE will be between 200K LIPS to 500K LIPS. This means 2M to 5M bytes per second network bandwidth is required by each PE. Thus 16M to 40M bytes per second network bandwidth is required to a cluster which contains eight PEs. If this data flows into the common bus, network packet data occupies about 10% to 25% of the total bandwidth of the common bus. Providing a network interface to each processing element reduces such common bus traffic. Each cluster has 8 PEs, and each PE has a network interface co-processor called a network interface unit (NIU). By attaching a NIU to each PE, a PE can send to or receive from a packet without using the common bus. The NIU performs the following functions: • Builds a packet into the NIU's packet memory, and sends it to the network router(RTR). • Receives a packet from the RTR, stores it to the packet memory. and signals the arrival of a packet to IPU. • Communicates to a SCSI bus driver chip which connects to PIM/p front-end processors(FEPs) or disks. All these actions are controlled by the IPU's co-processor instructions. Inter-cluster network connection While the NIU sends and receives packets, the network packet router(RTR) actually delivers packets. Each RTR connects four NIUs and up to six other RTRs to build a sixth order hypercube network topology. Thus each cluster has two RTRs which construct two independent hypercube networks to improve the total network throughput. The RTR can connect a maximum of sixty-four clusters(512 PEs). RTR uses the wormhole routing method to reduce traveling time when the network is not so busy, to avoid packet length restrictions caused by RTR packet buffer limitation. Between RTRs data is transferred at system clock rate. RTR has approximately 1K bytes of packet buffer for every output port, in order to reduce network congestion. The static routing method is used and deadlocks are avoided by the routing method. Broadcasting to the sub-cube is available. This can be used when the system is at the initial program stage. In the PIM/p system, one chassis contains four clusters. The maximum 512PE PIM/p system is sixteen chassis. Building for such a large system can be problematic. 
Transferring data between these chassis by a synchronous, phase-matched clock is impossible, because the system occupies an area of about sixteen meters square. This means that the traveling time of data is about one system clock tick. Introducing another hierarchy between inner-chassis communication and inter-chassis communication would complicate the distribution strategies of the KL1 processing systems, and this should be avoided. One of the main features of the RTR is the interconnection between PIM/p chassis. To attain a transfer rate equal to the system clock rate ...

Figure 10: Speedups for best-path and pentomino. (a) best-path (90 K-nodes; also shown for 250 K-nodes and 1 M-nodes); (b) pentomino (8 x 5 box; also shown for a 10 x 6 box); horizontal axis: number of PEs.

Cache miss penalty should be the major degradation factor in best-path, which has a large working set. Even in Multi-PSI/v2, cache misses degrade the performance by 10 to 20%, as reported in [Nakajima and Ichiyoshi 1990]. Thus, the penalty relative to the machine cycle becomes more critical, because the cache size and physical memory access time of PIM/m have not greatly evolved from those of Multi-PSI/v2.

4.2 System Performance

System performance is strongly related to the load distribution strategy and communication cost. Since PIM/m has four times as many PEs as Multi-PSI/v2 has, it might become difficult to balance the loads distributed to PEs. As for communication cost, we evaluated that the network capacity of Multi-PSI/v2 is much larger than required [Nakajima and Ichiyoshi 1990]. Therefore, we designed PIM/m's network making its throughput and bandwidth almost equal to those of Multi-PSI/v2, expecting that the network still has enough capacity. The frequency of message passing, however, might run contrary to our expectation, because of underestimation of the hot-spot effect and so on. The speedup, which is obtained by dividing the execution time for a single processor by that for n processors, may give preliminary answers to those questions.

Figure 10 shows the speedups of PIM/m and Multi-PSI/v2 for best-path and pentomino. Up to the 64 PE system, the speedup of PIM/m is quite similar to, or slightly better than, that of Multi-PSI/v2. In particular, the result for best-path shows a surprising super-linear speedup, probably because partitioning the problem makes the memory space required on a PE small and reduces the cache miss rate and/or the frequency of batch-mode garbage collection. These results show that the network of PIM/m withstands the increase of message passing frequency caused by the improvement of PE performance. Thus, the performance of the single-cabinet minimum system is greatly improved over Multi-PSI/v2; the M/P-speedup is 5.6 for best-path and 8.3 for pentomino.

On the other hand, the speedup of the 128 PE system is considerably low, especially for best-path. Thus, the M/P-speedups for the 4-cabinet system, half of the maximum configuration, are 3.7 for best-path and 6.4 for pentomino. This implies that the problem size is too small to distribute loads to 128 PEs and/or the message passing frequency exceeds the network capacity. As for best-path, the reason for the low speedup seems to be the small size of the problem, which takes only 1.8 sec on the 128 PE system, because a PE transmits messages only to its adjacent PEs. For example, when the problem is scaled up by increasing the number of nodes from 90 K to 250 K and 1 M, the speedups for the 128 PE system become 87 and 109 respectively, as shown in the figure*.
*Since large problems cannot run on small-size systems, the speedups are estimated by multiplying the 32 PE speedups for the small problems by the 32-to-128 PE speedup ratios for the large problems.

In pentomino, its load distribution strategy might cause hot-spot PEs, which pool loads and distribute them in a demand-driven manner. The hot spot, however, is possibly one of computation for load generation rather than of communication for distribution. The problem size may also limit the speedup, because the execution time on the 128 PE system is only 1.3 sec. The speedup for a larger problem, the 10 x 6 box, which takes 211 sec on the 128 PE system, is 105, as shown in the figure*. We are now planning further evaluation and analysis to confirm these observations or find out other reasons.

As for the 15-puzzle, we measured the speedups of the 64 and 128 PE systems while changing the problem size, as shown in Figure 11. The figure also shows the number of nodes in the search space for each of seven initial states of the game board. The results for the 64 PE system of PIM/m are also quite similar to those of Multi-PSI/v2. The speedups of the 128 PE system, 38.7 to 109.2, are tightly related to the size of the problems. The analysis of this relation is also left as future work.

Figure 11: Speedup for the 15-puzzle (PIM/m 128 PE, PIM/m 64 PE, and Multi-PSI/v2; speedup plotted together with the number of nodes for each initial state).

5 Concluding Remarks

This paper presented the hardware architecture of the PIM/m system, its processor element, and the pipelined microprocessor dedicated to the fast execution of KL1 programs. The KL1 implementation issues, focusing on their relation with garbage collection, were also described. Then preliminary performance evaluation results were shown with brief discussions on them. We are now planning research concentrating on further evaluation of the performance of PIM/m and the behavior of various KL1 programs. The evaluation results and detailed analysis of them should greatly contribute not only to the performance tune-up of PIM/m but also to research on parallel inference machines in the next step.

Acknowledgment

We would like to thank all those who contributed to the development of the PIM/m system in ICOT, Mitsubishi Electric Corp. and related companies. We also wish to thank Vu Phan and Jose Uemura for their contribution to this paper.

References

[Chikayama 1984] T. Chikayama. Unique Features of ESP. In Proc. Intl. Conf. on Fifth Generation Computer Systems 1984, pp. 292-298, Nov. 1984.
[Chikayama and Kimura 1987] T. Chikayama and Y. Kimura. Multiple Reference Management in Flat GHC. In Proc. 4th Intl. Conf. on Logic Programming, pp. 276-293, 1987.
[Chikayama et al. 1988] T. Chikayama, H. Sato, and T. Miyazaki. Overview of the Parallel Inference Machine Operating System (PIMOS). In Proc. Intl. Conf. on Fifth Generation Computer Systems 1988, pp. 230-251, 1988.
[Furuichi et al. 1990] M. Furuichi, K. Taki, and N. Ichiyoshi. A Multi-Level Load Balancing Scheme for OR-Parallel Exhaustive Search Programs on the Multi-PSI. In Proc. 2nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 50-59, Mar. 1990.
[Ichiyoshi et al. 1987] N. Ichiyoshi, T. Miyazaki, and K. Taki. A Distributed Implementation of Flat GHC on the Multi-PSI. In Proc. 4th Intl. Conf. on Logic Programming, pp. 257-275, 1987.
[Ichiyoshi et al. 1988] N. Ichiyoshi, K. Rokusawa, K. Nakajima, and Y. Inamura.
A New External Reference Management and Distributed Unification for KL1. In Proc. Intl. Conf. on Fifth Generation Computer Systems 1988, pp. 904-913, Nov. 1988.
[ICOT 1990] ICOT. Proc. Workshop on Concurrent Programming and Parallel Processing, 1990.
[Inamura et al. 1989] Y. Inamura, N. Ichiyoshi, K. Rokusawa, and K. Nakajima. Optimization Technique Using the MRB and Their Evaluation on the Multi-PSI/V2. In Proc. North American Conf. on Logic Programming 1989, pp. 907-921, 1989.
[Kimura and Chikayama 1987] Y. Kimura and T. Chikayama. An Abstract KL1 Machine and Its Instruction Set. In Proc. 4th IEEE Symp. on Logic Programming, pp. 468-477, Sept. 1987.
[Machida et al. 1991] H. Machida, H. Andou, C. Ikenaga, H. Nakashima, A. Maeda, and M. Nakaya. A 1.5 MLIPS 40-bit AI Processor. In Proc. Custom Integrated Circuits Conf., pp. 15.3.1-15.3.4, May 1991.
[Masuda et al. 1988] K. Masuda, H. Ishizuka, H. Iwayama, K. Taki, and E. Sugino. Preliminary Evaluation of the Connection Network for the Multi-PSI system. In Proc. 8th European Conf. on Artificial Intelligence, pp. 18-23, 1988.
[Nakajima et al. 1989] K. Nakajima, Y. Inamura, N. Ichiyoshi, K. Rokusawa, and T. Chikayama. Distributed Implementation of KL1 on the Multi-PSI/V2. In Proc. 6th Intl. Conf. and Symp. on Logic Programming, 1989.
[Nakajima and Ichiyoshi 1990] K. Nakajima and N. Ichiyoshi. Evaluation of Inter-Processor Communication in the KL1 Implementation on the Multi-PSI. In Proc. 1990 Intl. Conf. on Parallel Processing, Vol. 1, pp. 613-614, Aug. 1990.
[Nakashima and Nakajima 1987] H. Nakashima and K. Nakajima. Hardware Architecture of the Sequential Inference Machine: PSI-II. In Proc. 4th IEEE Symp. on Logic Programming, pp. 104-113, Sept. 1987.
[Nakashima et al. 1990] H. Nakashima, Y. Takeda, K. Nakajima, H. Andou, and K. Furutani. A Pipelined Microprocessor for Logic Programming Languages. In Proc. 1990 Intl. Conf. on Computer Design, pp. 355-359, Sept. 1990.
[Rokusawa et al. 1988] K. Rokusawa, N. Ichiyoshi, T. Chikayama, and H. Nakashima. An Efficient Termination Detection and Abortion Algorithm for Distributed Processing Systems. In Proc. 1990 Intl. Conf. on Parallel Processing, Vol. I, pp. 18-22, Aug. 1988.
[Takeda et al. 1988] Y. Takeda, H. Nakashima, K. Masuda, T. Chikayama, and K. Taki. A Load Balancing Mechanism for Large Scale Multiprocessor Systems and Its Implementation. In Proc. Intl. Conf. on Fifth Generation Computer Systems 1988, pp. 978-986, Sept. 1988.
[Taki et al. 1984] K. Taki, M. Yokota, A. Yamamoto, H. Nishikawa, S. Uchida, H. Nakashima, and A. Mitsuishi. Hardware Design and Implementation of the Personal Sequential Inference Machine (PSI). In Proc. Intl. Conf. on Fifth Generation Computer Systems 1984, pp. 398-409, Nov. 1984.
[Taki 1988] K. Taki. The Parallel Software Research and Development Tool: Multi-PSI System. In K. Fuchi and M. Nivat, editors, Programming of Future Generation Computers. North-Holland, 1988.
[Uchida et al. 1988] S. Uchida, K. Taki, K. Nakajima, A. Goto, and T. Chikayama. Research and Development of the Parallel Inference System in the Intermediate Stage of the FGCS Project. In Proc. Intl. Conf. on Fifth Generation Computer Systems 1988, pp. 16-36, Nov. 1988.
[Ueda 1985] K. Ueda. Guarded Horn Clauses. Technical Report 103, ICOT, 1985. (Also in Concurrent Prolog: Collected Papers, The MIT Press, 1987).
[Wada-K and Ichiyoshi 1989] K. Wada and N. Ichiyoshi. A Study of Mapping of Locally Message Exchanging Algorithms on a Loosely-Coupled Multiprocessor. Technical Report 587, ICOT, 1989.
[Wada-M and Ichiyoshi 1991] M. Wada and N. Ichiyoshi. A Parallel Iterative-Deepening A* and its Evaluation. In Proc. KL1 Programming Workshop '91, pp. 68-74, May 1991. (In Japanese).
[Warren 1983] D. H. D. Warren. An Abstract Prolog Instruction Set. Technical Report 309, Artificial Intelligence Center, SRI International, Oct. 1983.
[Watson and Watson 1987] P. Watson and I. Watson. An Efficient Garbage Collection Scheme for Parallel Computer Architecture. In Proc. Parallel Architecture and Languages Europe, June 1987.

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON FIFTH GENERATION COMPUTER SYSTEMS 1992, edited by ICOT. © ICOT, 1992

Parallel and Distributed Implementation of Concurrent Logic Programming Language KL1

Keiji HIRATA, Reki YAMAMOTO, Akira IMAI, Hideo KAWAI, Kiyoshi HIRANO, Tsuneyoshi TAKAGI, Kazuo TAKI
Institute for New Generation Computer Technology, 4-28 Mita 1-chome, Minato-ku, Tokyo 108, Japan
Akihiko NAKASE, TOSHIBA Corporation
Kazuaki ROKUSAWA, OKI Electric Industry Co., Ltd.

Abstract

This paper focuses on a parallel and distributed implementation method for a concurrent logic programming language, KL1, on a parallel inference machine, PIM. The KL1 language processor is systematically designed and implemented. First, the language specification of KL1 is deliberately analyzed and properly decomposed. As a result, the language functions are categorized into unification, inter-cluster processing, memory management, goal scheduling, meta control facilities, and an intermediate instruction set. Next, the algorithms and program modules for realizing the decomposed requirements are developed by considering the features of the PIM architecture on which the algorithms work. The features of the PIM architecture include a loosely-coupled network with messages possibly overtaken, and a cluster structure, i.e. a shared-memory multiprocessor portion. Lastly, the program modules are combined to construct the language processor. For each implementation issue, the design and implementation methods are discussed, with proper assumptions given. This paper concentrates on several implementation issues that have been the subjects of intense ICOT research since 1988.

1 Introduction

In the Fifth Generation Computer Systems Project, ICOT has been, simultaneously, developing a large-scale parallel machine PIM [Goto et al. 1988] [Imai et al. 1991], designing a concurrent logic programming language KL1 [Ueda and Chikayama 1990], and investigating the efficient parallel implementation of KL1 on PIM [ICOT 1st Res. Lab. 1991]. These subjects are closely related and have been evolving together.

The KL1 language has several good features: a declarative description, simple representation of synchronization and communication, symbol manipulation, parallelism control, and portability. Similarly, the PIM architecture has a number of good features: high scalability, general-purpose applicability, and efficient symbolic computing. When implementing KL1 on PIM, various difficulties appear. However, the parallel and distributed implementation of KL1 must bridge the semantic gap between PIM and KL1 so that programmers can enjoy the KL1 language as an interface for general-purpose concurrent/parallel processing [Taki 1992]. ICOT has implemented KL1 on Multi-PSI (a distributed-memory MIMD machine) and has been accumulating experience in KL1 implementation [Nakajima et al. 1989]. The implementation of KL1 on Multi-PSI was a preliminary experiment for our implementation.
This paper primarily focuses on a parallel and distributed implementation method for the concurrent logic programming language KL1 on a parallel inference machine, PIM. Section 2 gives readers some brief background knowledge on PIM and KL1. Section 3 systematically investigates the complex connections between the parts of the language specification and the component(s) of the KL1 language processor that support them. Among these components, Section 4 focuses on and discusses several key implementation issues: efficient parallel implementation within a shared-memory portion, inter-cluster processing, a parallel copying garbage collector, meta control facilities, and a KL1 compiler. Section 5 concludes this paper.

Figure 1: PIM Architecture (processing elements grouped into clusters, connected by an inter-cluster network)
Figure 2: KL1 Execution Image (current goal, ready goals and suspended goals; creation by goal rewriting, suspension by passive unification, resumption by active unification)

2 Overviews of PIM and KL1

2.1 PIM

Figure 1 shows the PIM architecture [Goto et al. 1988] [Imai et al. 1991]. PIM architecture assumptions and features are described below. One of the features of the PIM architecture is its hierarchy. Up to about ten processing elements (PEs) are interconnected by a single bus to form a structure called a "cluster", in which main memory is shared. Here, the bus can be regarded as a local network. Many clusters can be interconnected by a global network. Within a cluster, inter-PE communication can be realized by short-delay, high-throughput data transfer via the bus and the shared memory. Thus, PEs within a cluster share their address spaces, and each PE has its own snooping cache. The instruction set of a PE includes lock&read, write&unlock, and unlock as basic memory operations. Inter-cluster communication, though, may pass messages through some relay nodes and over long distances. Thus, inter-cluster communication increases the time delay and decreases the throughput. The address spaces of distinct clusters are separated, of course. The network delivers message packets to destinations while reading their header and trailer information.

The PIM architecture assumes the following property for the inter-cluster loosely-coupled network: if PEs send and/or other PEs receive message packets, the order of packets does not obey the FIFO rule. Even in one-PE-to-one-PE communication, the FIFO rule is not obeyed. This assumption comes from the following hardware characteristics of the PIM architecture. One is that there may be more than one path between two clusters (the routing of the PIM network, however, is not adaptive). The other is that when more than one PE within a cluster simultaneously sends message packets, it is not determined which packet will be launched first into the network. In this sense, in the loosely-coupled network of PIM, messages are possibly overtaken in the network.

2.2 KL1

KL1 is a kernel language for the PIM based on the GHC (Guarded Horn Clauses) language [Ueda and Chikayama 1990]. Figure 2 shows our KL1 execution image. A clause of a KL1 program can be viewed as a rewrite rule, which rewrites into its body goals a goal that succeeds in the guard unification and satisfies the condition (guard), and has the following form:

p :- g1, ..., gm | q1, ..., qn.

where g1, ..., gm is the guard part, q1, ..., qn is the body part, and p, gi and qi stand for predicates. This rewriting of a goal is also called reduction.
The execution model has a goal pool which holds the goals to be rewritten. Goals are regarded as lightweight processes. Basically, guard goals gl, ... ,gm and body goals are reduced concurrently, thus yielding parallelism. Goal (process) communication is realized as follows. Suppose that more than one goal shares a variable. When a goal binds a value to the shared variable, a clause for rewriting the other goal that shares the variable may be determined. The value which is instantiated to the shared variable controls the clause selection; this is the communication between KLI goals. Synchronization is realized as follows. When a goal is going to determine which clause can be used for rewriting, and the variables included in the goal are uninstantiated, the unification and the guard execution may be deferred since there is not enough information for the clause selection. The uninstantiated variables are supposed to be shared and the other goal is expected to bind 438 a value to the variable afterwards. Consequently, the suspended goal reduction waits for variable binding for the clause selection. That is, variable instantiation realizes data-flow synchronization. Actually, the KL1 language processor must deal efficiently with frequent suspension and resumption. Even if more than one clause can be used for rewriting, just one clause is selected indeterminately. A vertical bar between the guard part and the body part 'I', called a commit operator, designates indeterminacy. Since it is sufficient to hold a single environment for each variable, efficient implementation is expected. One of features of the KLI language is the provision of simple yet powerful meta control facilities as follows: goal execution control, computation resource management, and exception handling. These are essential for designing efficient parallel algorithms and enabling flexible parallel programming. Usually, operating systems perform meta-control on a process basis. However, the KL1language aims at fine-grain parallelism, and the KL1 language processor reduces a large number of goals in parallel. Therefore, it is inefficient and impossible for a programmer or th: runtime system 2 to control the execution of each goal. Consequently, KL1 introduces the concept of a shoen 3 [Chikayama et at. 1988]. A shoen is regarded as a goal group or a task with meta-control facilities. An initial goal is given as an argument to the built-in predicate shoen; descendant goals belonging to the shoen are controlled as a whole. Descendant goals inherit the shoen of the parent goal. Shoens are possibly nested as well; the structure connecting shoens is a tree. Moreover, to realize sophisticated mapping of parallel computation, priority and location specification are introduced; that is, they can be used for programming speculative computation and load balancing. If a programmer attaches an annotation to a body goal e.g. p@priority (N) , this tells the runtime system to execute goal p at priority N. Moreover, a goal can have a location specification e.g. p@cluster(M); this designates the runtime system to execute the goal p in the M th cluster. These two specifications are called pragmas. These pragmas never change the correctness of a program although they change the performance drastically. 3 Systematic Design of KLI Lang!lage Processor When implementing KL1 on PIM, various kinds of difficulties appear. 
Firstly, although the PIM architecture adopts a hierarchical configuration, the KL1 implementation has to provide a uniform view of the machine to programmers. Secondly, it is difficult to determine to what extent a runtime system should support the functions of KL1, and which functions it should support, within the specification of KL1. For instance, since the KL1 language does not specify the goal-scheduling strategy, a runtime system can employ any scheduling algorithm; however, an algorithm that is both general-purpose and efficient is generally difficult to develop. Thirdly, for efficient implementation, it is important to employ algorithms which include fewer bottlenecks in terms of parallel execution. Lastly, the KL1 language processor is complex and of a large scale.

Therefore, it is a promising idea to overcome these difficulties by systematically designing the language processor as follows. Firstly, the given language specification must be deliberately analyzed and properly decomposed. Then, the algorithms and the program modules for realizing the decomposed requirements must be developed by considering the machine architecture on which the algorithms work. Lastly, the designer must construct the language processor by combining the program modules. A good combination of these modules will yield an efficient implementation. We designed the KL1 language processor on a loosely-coupled shared-memory multiprocessor system (PIM) by following these guidelines.

(2) The software modules of the KL1 language processor executed at run time are called a runtime system as a whole. For instance, the runtime system may include an interpreter, firmware in microcode, and libraries. On the contrary, compilers, assemblers and optimizers are not included in a runtime system.
(3) "Shoen" is pronounced 'show'-'n'.

3.1 Requirements

At first, we summarize the required functions of the KL1 language processor into the four items in the leftmost column of Table 1. These items are the result of analysis and decomposition of the KL1 language specification. The KL1 language processor may look like the kernel of an operating system. Next, the mechanisms which satisfy these requirements are divided into those supported by a compiler and those supported by a runtime system. Furthermore, the mechanisms of the runtime system are divided into two levels according to the machine configuration of PIM: the shared-memory level and the distributed-memory level (the topmost row of Table 1).

Some of the technologies used for KL1 implementation on single-processor systems may be expanded to shared-memory multiprocessor systems. That is because both systems suppose a linear memory address space. However, it may not be straightforward to expand the single-processor technologies to distributed-memory multiprocessor systems in general. Of course, that is mainly because distributed-memory systems provide a non-linear memory address space. Thus, the techniques used for distributed-memory systems are possibly quite different from those for a single-processor system. The contents of Table 1 show our solutions, that is, what techniques are used for parallel and distributed KL1 implementation. Each item in the leftmost column of the table is mentioned below.

Table 1: Implementation Issues of this Paper

  Unification:        Compiler: Decomposition.  Runtime system, shared-memory level: Suspension and Resumption.  Runtime system, distributed-memory level: Message Protocol.
  Memory Management:  Compiler: Reuse inst.  Shared-memory level: Local GC.  Distributed-memory level: Export and Import Tables, Weighted Export Count.
  Goal Scheduling:    Compiler: TRO.  Shared-memory level: Automatic Load Balancing.
  Meta-control (Execution Control, Resource Management, Exception Handling):  Runtime system: Termination Detection, Resource Caching, Foster-parent, Weighted Throw Count, Message Protocol.
3.1.1 Unification

Goals are distributed all over the system for load balancing and may share data (variables and ground data) for communication. Logical variables remain resident at their original location. Consequently, not only intra-cluster but also inter-cluster data references appear. During unification, goals have to read and write the shared data consistently, independently of the timings and locations of goals and data. Thus, mechanisms for preserving data consistency are needed. As described above, goals are rewritten in parallel and, thus, variable instantiations occur independently of each other. Suspension and resumption mechanisms based on variable bindings control goal execution and realize data-flow synchronization. Hence, our KL1 implementation must realize the mechanisms for data consistency, synchronization, and unification in a parallel and distributed environment. Moreover, since a major portion of the CPU time is spent on unification, the algorithm should be concerned with efficiency.

3.1.2 Memory Management

Logical variables inherently have the single-assignment property. The single-assignment property is very useful to programmers, but gives rise to heavy memory consumption. Since the KL1 language does not backtrack, KL1 cannot perform memory reclamation during execution as Prolog does. Thus, an efficient memory management mechanism is indispensable for the KL1 language processor. The issues associated with memory management are allocation, reclamation, working-set size, and garbage collection. To achieve high efficiency, not only must the algorithms and the data structures of the runtime system be improved, but also the compiler has to generate effective code by predicting the dynamic behavior of a user program as much as possible.

3.1.3 Goal Scheduling

The KL1 language defines goal execution as concurrent. Thus, the system is responsible for the exploitation of actual parallelism. One implementation issue associated with goal scheduling is determining which goal scheduling strategies have high data locality, yet keep the number of idle PEs to a minimum. Further, the KL1 language provides the concept of goal priority; each KL1 goal has its own priority, as explicitly designated by a programmer. Goals with higher priorities are then likely to be reduced first. Goal prioritization in KL1 is weak in some respects. Under the goal priority restriction, it is crucial to achieve load balancing.

3.1.4 Meta Control Facilities

The goals of a shoen may actually be distributed over any clusters, and, thus, goals may be reduced on any PE in the system. Since the system operates in parallel, shoens are loosely managed; it is simply guaranteed that each operation will finish eventually. That is, it is impossible to execute a command simultaneously on all the goals of a shoen. A shoen has two streams as arguments of the shoen built-in predicate; one is for controlling shoen execution, and the other is for reporting information from inside the shoen. A shoen communicates with outside KL1 processes through these two streams. Messages such as start, stop, and add_resource enter the control stream from the outside. Messages such as terminated, resource_low, and exception return on the report stream from the inside. It is very difficult to evaluate the CPU time and memory space spent on computation when goals are distributed and executed in parallel. Therefore, the current system regards the number of reductions as a measure of the computing resources consumed within the shoen.
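To make the reduction-counting idea concrete, here is a schematic C sketch of a per-shoen resource budget. The structure, the field names and the report_resource_low hook are invented for illustration and are not the actual data structures of the PIM runtime system.

typedef struct Shoen {
    struct Shoen *parent;    /* shoens form a tree                          */
    long          resource;  /* remaining reductions supplied from outside  */
    int           stopped;   /* set by a stop message or on exhaustion      */
} Shoen;

/* Charge one reduction of a goal belonging to shoen s; returns nonzero if
   the reduction may proceed. */
static int charge_reduction(Shoen *s) {
    if (s->stopped) return 0;
    if (--s->resource <= 0) {
        s->stopped = 1;
        /* report_resource_low(s): put a resource_low message on the report
           stream, i.e. a supply request for a new resource.                 */
        return 0;
    }
    return 1;
}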
Messages, such as start, stop, and add_resource, enter the control stream from the outside. Messages, such as terminated, resource_low, and exception return to the report stream from the inside. It is very difficult to evaluate the CPU time and memory space spent for computation when goals are distributed and executed in parallel. Therefore, the current· system regards the number of reductions as a measure of the computing resources consumed within the shoen. 440 The exceptions reported from a shoen include illegal input data, unification failure 4 , and perpetual suspension. Some examples of shoen functions are shown below. Stop message: When a stop message is issued in the control stream of a shoen, the system has to check whether or not the goals to be reduced belong to the shoen, and, if they do, the shoen changes its status to stop as soon as possible. The stop message is propagated to the nested descendant shoens. Resource Observation: The system always watches the consumption of computation resources, that is, the total number of times goals belonging to each shoen are reduced over the entire system. If the amount of consumption within a shoen is going to exceed the initial amount of supplied resources, the system stops the reduction of shoen goals and, then, issues the resource_lot,]' message on the report stream, viz. a supply request for a new resource. Exception Handling: When a programmer or the system creates an exception during the reduction of a goal in a shoen, the shoen responsible recognizes the exception and converts the exception information to a report stream message 5. The exception of the KL1 language is concerned with illegal arguments, arithmetic, failure, perpetual suspension and debugging. An exception message on the report stream indicates which goal caused what exception and where. Additionally, the exception message includes variables for a continuation given from the outside; the other process can designate a substitute goal to be executed, instead of the goal causing the exception. 3.2 Overview of Implementation Techniques IeOT developed the Multi-PSI system in 1988 [Nakajima et al. 1989]. The KLI system is running on the Multi-PSI. The architecture of PIM is very different from that of Multi-PSI in the following two points. One is that PIM has a loosely-coupled network with messages possibly overtaken. The other is that PIM has cluster structures that are shared-memory multiprocessors. Due to these features, PIM attains high performance, and, at the same time, the complexity of the KLI language processor increases. This section describes many of the implementation techniques we have been developing for such an archi4Notice that the unification failure of a KLI goal does not influence the outside of a shoen. In this sense, the reduction of a KLI goal never fails, unlike GHC. 5The mechanism for creating and recognizing exceptions is similar to catch-and-throw in LISP. tecture. Among these techniques, the issues which this paper focuses on are listed in Table 1. 3.2.1 Unification The synchronization and conununication of KLI are realized by read/write operations to variables and suspension/resumption of goal reduction during unification. These operations are described below. Passive Unification and Suspension: Passive unification is unification issued in the guard part of KLI programs. The KL1 language does not allow instantiation of variables in its guard part. The guard part unification is nonatomic. 
Since KL1 is a single-assignment language, once a variable is instantiated, the value never changes. This means that passive unification is simply the reading' and comparing of two values. From the implementational point of view, basically only read operations to variables are performed. Thus, no mutual exclusion is needed in the guard part. If goal reduction during the guard part is suspended, the goal is hooked to variables. Here, we have an assumption that almost all goals wait for a single variable to be instantiated afterwards. Therefore, an optimization may be taken into account; the operation for the goal suspension is just to link the goal to the original variable. If multiple uninstantiated variables suspend goal reduction, however, the goal is linked to the variables through a special structure for multiple suspension. During passive unification, only these suspension operations modify variables; the operations are realized by the compare & swap primitive. Active Unification and Resumption: Active unification is unification issued in the body part of KL1 programs. The KLI variables are allowed to be instantiated only in the body part. When an instantiation of a shared variable occurs, if goals are already hooked to the variable, these goals have to be resumed as well as the value assignment. When instantiating a variable, since other PEs might be instantiating the variable simultaneously, mutual exclusion is required. We also adopt compare & swap as the mutual exclusion primitive. ' When unifying two variables, one variable has to be linked to another to make the two variables identical. At this time, other PEs might be unifying the same two variables. Therefore, imprudent unification operation might turn out to generate a loop structure and/or dangling references. To avoid these, the following linking rule should be obeyed: the variable with the lowest address is linked to the one with the highest. Section 4.1 describes the implementation of unification in detail. 441 3.2.2 Inter-cluster Processing In a KLI multi-cluster system, more than one PE in each cluster reduces goals in parallel. If a goal reduction succeeds, there are two kinds of new goal destination: the cluster that the parent goal belongs to and the other cluster. If the other cluster is designated for load balancing, the runtime system throws the new goals to the clusters. If the arguments of a goal to be thrown are references to variables and structures, the references across clusters consequently appear, these are called external references. Here, suppose that a new goal with reference to data in cluster A is thrown to cluster B. Then, original cluster A exports the reference to the data to cluster B, and foreign cluster B imports the reference to the data from cluster A. Exportation and importation are also implemented by message sending. Multiple reference across clusters inevitably occurs. An external reference is straightforwardly represented by using the pair where cl is the cluster number in which the exported data resides, and addr is the memory address of the exported data. This representation of an external reference provides programmers with a linear memory space. However, this implementation causes a crucial problem; efficient local garbage collection is impossible. Here, local means that garbage collection is performed locally within a cluster. See Section 4.3 for more details on garbage collection. 
Since our local garbage collector adopts a stop and copy algorithm (Section 4.3), the locations of data move after garbage collection. At that time, all of the new addresses of moved data should be announced to all other clusters. Thus, straightforward representation would make cluster-local garbage collection very inefficient. Section 4.2 shows our solution to this problem and discusses more detailed inter-cluster processing subjects. 3.2.3 Memory Management As described in Section 3.1.2, the implementation of memory management should pay close attention to allocation, reclamation, working set size, and garbage collection. Allocation and Reclamation: A cluster has a set of free lists for pages and supports any number of contiguous pages 6. These are called global free lists. The size of 'pages is uniform; supposedly the integral power of two 7. A PE has a set of free lists for data objects, the sizes of which are less than the page size. These are called private free lists. Actual object size is rounded up to the closest integral power of two; the private free lists 6Currently, there are 15 kinds of free lists for supported pages: ,.., 15 - and - more. 7The size of a page is currently 256 words. just support the quantum sizes of 2n. Moreover, objects contained in a page are uniform in size. A PE allocates an object as follows. When a PE requires an object which is smaller than a page, the PE first tries to take an object from an appropriate private free list. If a PE runs out of a private free list and fails to take an object, then the PE tries to take a new page from the global free lists. If it succeeds, the PE partitions the page area into objects of the size the PE requires, recovers the starved free list and, then, uses an object. Otherwise, if a PE cannot take a proper page area from a global free list, the PE tries to extend the heap to allocate a new page area on demand. When a PE requires an object which is larger than a page, the PE tries to take new contiguous pages from global free lists. Otherwise, the PE tries to extend a heap to allocate new contiguous pages as above. When a PE reclaims a large or small object, it is linked to the proper free list. The features of this scheme are as follows: • Since a PE has its own private free lists for small objects, the access contention to global free lists and the heap is alleviated. • A PE usually just links garbage objects to and takes new objects from appropriate free lists; it leads the small runtime overhead for allocation and reclamation 8. • Since every PE handles its private free lists using push and pop operations (obeying the LIFO rule), the working set size can be kept small. • Since the size of small objects is rounded up to the nearest 2n, the number of private free lists to be managed decreases, and the deviation of private free list lengths can be alleviated to some extent. Additionally, the fragmentation within a page is prevented, though some objects might contain unused areas. • Since this scheme does not join two contiguous objects, unlike the buddy system, its runtime overhead of reclamation is kept small. On the other hand, when the free list of some size run out, our KLI language processor does not partition a . large object into smaller ones, but allocates a new page. This is mainly because, due to too much partitioning, it is likely that garbage collection will be invoked even if only slightly large object is required. The other reasons are as follows. 
In general, it is inefficient to incrementally partition a small object into even smaller objects, and an overhead for searching for an object to be partitioned is needed. Also, in our KL1 language processor, a local stop-and-copy garbage collector (described just below as (2)) collects garbage and rearranges the heap area efficiently.

[Footnote 8] A module of PIM, PIM/p, has dedicated machine instructions, push and pop, for handling free lists.

Furthermore, a KL1 compiler optimizes memory management by generating code not only for allocation and reclamation but also for reusing data structures utilizing the MRB scheme [Chikayama and Kimura 1987] (Section 4.6.4).

Garbage Collection: Our KL1 language processor performs three kinds of garbage collection: (1) local real-time garbage collection using the MRB scheme, (2) a local stop-and-copy garbage collector, and (3) real-time garbage collection of distributed data structures across clusters. Since (1) can reclaim almost all garbage objects, (2) is needed only eventually; (1) has a very small overhead and can defer the invocation of (2). Moreover, in a shared-memory multiprocessor, it is important that (1) does not destroy data on the snooping caches and keeps the working set size of an application program small [Nishida et al. 1990], unlike (2). Section 4.3 discusses the parallel copying garbage collector (2) in detail. Section 4.2.2 discusses our method for reclaiming data structures referred to by external references (3) in detail.

3.2.4 Goal Scheduling

The aim of goal scheduling is to finish the execution of application programs earlier. It is impossible for a programmer to schedule all goals strictly during execution. In particular, in the knowledge processing field, there are many programs whose dynamic behavior is difficult to predict. The optimum goal scheduling depends on the application, and, thus, there are no general-purpose goal scheduling algorithms. Hence, a programmer cannot avoid leaving part of the goal scheduling to a runtime system. Then, PEs within a cluster share their address spaces, and the communication between them is realized with a relatively low overhead. Optimistically thinking, the performance will pay for the overhead of the automated goal scheduling within a cluster as the number of PEs increases. However, when automated inter-cluster goal scheduling does not work well, the penalty is even greater. Consequently, the KL1 language processor adopts automated goal scheduling within a cluster and manual goal scheduling among clusters. Furthermore, the runtime system should schedule goals fairly by managing priorities. Section 4.4 discusses the implementation of goal scheduling.

3.2.5 Meta Control Facilities

The meta control facilities of KL1 are provided by a shoen. The implementation model for a shoen in a distributed environment introduces a foster-parent to prevent bottlenecks and to reduce communication. A foster-parent is a kind of proxy shoen or a branch of a shoen; the foster-parents of a shoen are located on the clusters where the goals of the shoen are reduced. A shoen and a foster-parent are realized by record structures which store their details, such as status, resources, and number of goals. Figure 3 shows the relationship between shoens, foster-parents and goals.

[Figure 3: Relationship of Shoen and Foster-parents. Legend: shoen = shoen record, fp = foster-parent record, G = goal; clusters 0, 1 and 2 are shown.]
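The shoen and foster-parent records are only described abstractly in the text. As a rough illustration, the following C sketch shows one possible layout of such records; all field names, the MAX_CLUSTERS constant, and the grouping of fields are assumptions made for this example, not the actual PIM data structures.

  /* Illustrative sketch only: possible shoen / foster-parent records. */
  #define MAX_CLUSTERS 64                       /* assumed system size           */

  typedef enum { SHOEN_STARTED, SHOEN_STOPPED } shoen_status;

  typedef struct shoen_record {
      shoen_status  status;
      long          resource_limit;             /* reductions the shoen may use  */
      long          resource_consumed;          /* reductions consumed so far    */
      unsigned long weight;                     /* for WTC termination (4.5.1)   */
      void         *control_stream;             /* start, stop, add_resource ... */
      void         *report_stream;              /* terminated, resource_low ...  */
      unsigned char has_foster_parent[MAX_CLUSTERS]; /* point-to-point table     */
  } shoen_record;

  typedef struct foster_parent_record {
      shoen_record *shoen;                      /* the shoen this branch serves  */
      int           cluster_id;                 /* cluster where it resides      */
      long          childcount;                 /* #goals + #descendant shoens   */
      long          resource_cache;             /* locally cached resource share */
      int           start_stop_count;           /* counter of start/stop (4.5.1) */
  } foster_parent_record;

In a real system these records live in different clusters and are manipulated through the message protocols of Section 4.5 rather than by direct pointer access.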
As in Figure 3, a shoen controls its goals and the descendant shoens resident in a cluster through a fosterparent of the cluster. A shoen directly manages its fosterparents only. Then, a foster-parent manages the descendant shoens and goals. A shoen is created by the invocation of the shoen predicate. At that time, a shoen record is allocated in the cluster to which the PE executing the shoen predicate belongs. Next, when a goal arrives at a cluster but the foster-parent of its shoen does not yet exist, a fosterparent is created for the goal execution automatically. During execution, new goals and new descendant shoens are repeatedly created and terminated. When all goals and descendant shoens belonging to a foster-parent are terminated, the foster-parent is terminated, too. Further, when all foster-parents belonging to a shoen are terminated, the shoen is terminated. On comparing a shoen record and a foster-parent record of our implementation with those of the MultiPSI system, ours must hold more information because of the PIM network with messages possibly overtaken. That IS, in our KL1 system, the automatons to control a shoen and a foster-parent require more transition states. Consequently, in terms of implementing a shoen and a foster-parent, we have to pay special attention to efficient protocols between a shoen and its foster-parents 443 which work on the loosely-coupled network of PIM (messages are possibly overtaken in the PIM). Another point requiring attention is that, since parallel accessing might become a bottleneck, the system should be designed so that such data do not appear, i.e. less access contention. Section 4.5 describes the parallel implementation of a shoen and a foster-parent in more detail. 3.2.6 Intermediate Instruction Set As described so far, the KL1 language processor is too large and complex to be implemented directly in hardware or firmware. To overcome this problem, we adopted a method suggested by Prolog's Warren Abstract Machine (WAM) [Warren 1983] where the functions of the KL1 language processor are performed via an intermediate language, KL1-B. The advantages of introduction of an intermediate language include: code optimization, ease of system design and modification, and high portabilty. The optimization achieved at the W AM level brings about more benefits than the peep-hole optimization since the intermediate instruction sequence reflects the meanings of the source Prolog program. Similarly, the optimization at the KL1-B level gains more than the peep-hole optimization. Details on the optimization are described in Sections 4.6.4 and 4.6.5. If the specification of the KL1-B instruction set is fixed, it is possible to independently develop a compiler for compiling KL1 into KL1-B and a runtime system executing the KL1-B instructions. If a runtime system can be designed so that it absorbs the differences in hardware architecture, the machine-dependent parts of the KL1 language processor are made clear, and portability is improved. 3.2.7 Built-in Predicates This section mentions the optimization techniques on the implementation of the built-in predicates merge and set_vectoLelement. These techniques were originally invented for the Multi-PSI system. Our KL1 language processor basically inherits the techniques. merge: The merger predicate merges more than one stream into another. It is useful for representing indeterminacy; actually, the merge predicate is invoked frequently in practical KL1 programs, such as the PIMOS operating system [Chikayama et al. 
1988]. Although a program for a stream merger can be written in KL1, the delay is large. Thus, it is profitable to implement the merger function with a constant delay by introducing the merge built-in predicate. Let us consider a part of a KL1 program:

  ..., p(X), q(Y), merge(X,Y,Z), ...

When predicate p is to unify X and its output value, a system merger is invoked automatically within the unifier of X. The same thing happens for Y of q. See [Inamura et al. 1988] for a more detailed discussion.

set_vector_element: To write efficient algorithms without disturbing the single-assignment property of logical variables, this primitive can be used as follows in the KL1 language:

  set_vector_element(Vect, Index, Elem, NewElem, NewVect)

When an array Vect, its index value Index, and a new element value NewElem are given, this predicate binds Elem to the value at the position of Index, and NewVect to a new array which is the same as Vect except that the element at Index is substituted by NewElem. Using the MRB scheme, our KL1 language processor detects, in constant time, the situation in which NewVect can be obtained destructively. That is, the situation is that the reference to Vect is single, and, thus, destructive updating of the array is allowed. See [Inamura et al. 1988] for a more detailed discussion.

4 Implementation Issues

This section focuses on several important implementation issues which ICOT has been working on intensively for the past four years. Our implementation mainly takes the following into account:
- Smaller and shorter mutual exclusion within a cluster: If the locking operation is effective over a wide area or for a long time, system performance is seriously degraded due to serialization. To avoid this, scattered and distributed data structures are designed, and only the compare & swap operation is adopted as a low-level primitive for light mutual exclusion (higher-level software locks contain this primitive).
- Less communication, i.e., fewer messages: Since inter-cluster communication costs more than intra-cluster communication, mechanisms for eliminating redundant messages are effective.
- Main path optimized, while enduring low efficiency in rare cases: Since the efficiency of rare cases does not affect total performance, the implementation for handling the rare cases is simplified and low efficiency is endured. This is important for reducing code size.

Important hardware restrictions to be taken into account are:
- Snooping caches within a cluster; data locality has a great effect: It is important to keep the working set of each PE small. This leads to a reduction in the shared bus traffic and an increase in the hit ratio of the snooping caches.
- Messages are possibly overtaken in the loosely-coupled network of PIM: The number of shoen states and foster-parent states to be maintained increases. The message protocol between clusters should be carefully designed.

4.1 Unification

The unification of variables shared by goals realizes synchronization and communication among goals. Since more than one PE within a cluster performs unification in parallel, mutual exclusion is required when writing a value to a variable. Since unification is a basic operation of the KL1 system, its efficiency greatly affects total performance. First, this section shows simple and efficient implementation methods of unification. Next, since problems associated with the loosely-coupled network of PIM occur, a distributed unification algorithm which works consistently and efficiently on the network is presented.

4.1.1 Simplification Methods

There are two ways to simplify the unification algorithm, as follows.

Structure Decomposition: A KL1 compiler decomposes the unification of a clause head. For example, (a) of the following program is decomposed into (b) at compile time.

  p([f(X)|L]) :- true | q(X), p(L).                (a)
  p(A) :- A = [Y|L], Y = f(X) | q(X), p(L).        (b)

Thus, the compiler can generate more efficient KL1-B code corresponding to (b).

Substitution for System Goals: In rare cases, a runtime system automatically substitutes part of the unification process with special KL1 goals. This can alleviate the complexity of the unification algorithm; implementors need not pay attention to mutual exclusion for that part. For example, let us consider the following two rare cases.

• A compare & swap failure (another PE has modified the value): If this happens, then the following KL1 goal is automatically created and scheduled, as if it were defined by a user:

  unify_retry(X,Y) :- true | X = Y.

The above X and Y are unified to variables at least one of which has failed a compare & swap during unification.

• Active unification of two structures is invoked: All elements of the two structures should be unified; however, the operation is rather complex (the ordinary implementation uses stacks, as in Prolog). To simplify the operation for this rare case, a special KL1 goal is ordinarily created and scheduled. For example, if the two active unification arguments are both lists, the following goal is created:

  list_unifier([X1|X2], [Y1|Y2]) :- true | X1 = Y1, X2 = Y2.

4.1.2 Distributed Implementation Based on Message Passing

The principle of the protocol for distributed unification is as follows. A read/write operation on an external reference cell (Section 4.2.1) basically causes a corresponding request message to be launched onto the network. However, redundant messages are eliminated as much as possible.

Distributed Passive Unification: Passive unification has two phases: reading and comparing. First, to execute the read operation on an external reference cell is to send a read message to the foreign exported data. If the exported data has become a ground term (an instantiated variable), an answer_value message returns. If the exported data is still a variable, the request message is hooked to the variable. If the data is an external reference cell, the read message is forwarded to the cluster to which the cell refers. Next, the answer_value message arrives at the original cluster. Then, the returned value is assigned to the external reference cell, and the goal waiting for the reply message is resumed. Eventually, the goal reduction is going to compare the two values. Moreover, the import table entry for the cell can be released. The efficient implementation of inter-cluster message passing itself is presented in Section 4.2.

Safe and Unsafe Attributes: If an argument of active unification is an external reference cell, the active unification has to realize the assignment in a remote cluster. Sending a unify message to the exported data assigns a value to the original exported data. However, in general, the unification of two variables from distinct clusters may generate a reference loop across clusters. In order to avoid creating such a reference loop, we introduce the concept of safe/unsafe external references [Ichiyoshi et al. 1988]. When there is active unification between a variable and an external reference cell, and the external reference cell is safe, it is possible that the variable is bound to the external reference cell. If the external reference cell is unsafe, a unify message is sent to the exported data.
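Sections 3.2.1 and 4.1 describe the intra-cluster side of unification only in prose: guard-part reads need no locking, suspension and binding modify a variable cell with compare & swap, and when two unbound variables are unified, the one at the lower address is linked to the one at the higher address. The following C fragment is a rough sketch of that discipline under simplifying assumptions (a one-word tagged cell, C11 atomics standing in for the PIM compare & swap primitive, suspension hooks omitted); it is not the PIM code.

  #include <stdatomic.h>
  #include <stdint.h>
  #include <stdbool.h>

  /* A variable cell: UNBOUND, a reference to another cell, or a ground value. */
  #define UNBOUND     ((uintptr_t)0)                /* assumed encoding          */
  #define IS_REF(w)   ((w) != UNBOUND && ((w) & 1u) == 0)
  #define MAKE_REF(p) ((uintptr_t)(p))
  typedef _Atomic uintptr_t cell;

  static cell *deref(cell *v) {                     /* follow reference chains   */
      uintptr_t w = atomic_load(v);
      while (IS_REF(w)) { v = (cell *)w; w = atomic_load(v); }
      return v;
  }

  /* Active unification of two cells.  Returns false when the compare & swap
     fails; the runtime would then schedule a unify_retry goal (Section 4.1.1)
     instead of retrying in place.  Resumption of hooked goals is omitted.    */
  static bool active_unify(cell *x, cell *y) {
      x = deref(x); y = deref(y);
      if (x == y) return true;
      uintptr_t wx = atomic_load(x), wy = atomic_load(y), expect = UNBOUND;
      if (wx == UNBOUND && wy == UNBOUND) {
          /* link the variable at the lower address to the one at the higher */
          cell *lo = x < y ? x : y, *hi = x < y ? y : x;
          return atomic_compare_exchange_strong(lo, &expect, MAKE_REF(hi));
      }
      if (wx == UNBOUND)
          return atomic_compare_exchange_strong(x, &expect, wy);
      if (wy == UNBOUND)
          return atomic_compare_exchange_strong(y, &expect, wx);
      return wx == wy;                              /* both ground: compare only */
  }

Passive (guard) unification would only perform the deref and the final comparison, which is why it needs no mutual exclusion; only suspension hooks and bindings go through the compare & swap.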
4.2 Inter-cluster Processing

4.2.1 Export and Import Tables

[Figure 4: Export and Import Tables (Cluster A and Cluster B)]

Export Table: As described in Section 3.2.2, a straightforward implementation of an external reference makes cluster-local garbage collection very inefficient. In order to overcome this problem, each cluster introduces an export table to register all locations of data which are referenced from other clusters (Figure 4). That is, exported data should be accessed indirectly via the export table. Thus, the external reference is represented by the pair <cl, ent>, called the external reference ID, where ent is the entry number in the export table. As the export table is located in an area which is not moved by local garbage collection, the external reference ID is not affected by local garbage collection. Changes in the location of exported data modify only the contents of export table entries. Since exported data is identified by its external reference ID, distinct external reference IDs are regarded as distinct data even if they are identical. To eliminate redundant inter-cluster messages, exported data should not have more than one external reference ID. Thus, every time the system exports an external reference ID, it has to check whether or not the external reference ID is already registered in the export table.

Import Table: In order to decrease inter-cluster traffic, the same exported data should be accessed as few times as possible. Hence, each cluster maintains an import table to register all imported external reference IDs. The same external references in a cluster are gathered into the same internal references to an external reference cell (EX in Figure 4). Then, exported data is accessed indirectly via the external reference cell, the import table, and the export table. The external reference cell is introduced so that it can be treated just like a variable; operations on a variable are substituted by operations on the external reference cell. Every time the system imports an external reference ID, the system has to check whether or not the external reference ID is already registered in the import table. Thus, the import table entry and the external reference cell point to each other.

4.2.2 Reclamation of Table Entries

As described above, the export table is located in an area which is not moved by local garbage collection. During local garbage collection, data referred to by an export table entry should be regarded as active data, because it is difficult to know immediately whether or not the export table entry is referred to by other clusters. Therefore, without an efficient garbage collection scheme for the export table, many copies of non-active data would survive, reducing the effective heap space and decreasing garbage collection performance. One way of managing table entries efficiently is for table entries to be reclaimed incrementally. Below, we describe a method for reclaiming table entries in detail.

Let us consider utilizing local garbage collection. Execution of local garbage collection might release the external reference cells.
This leads to the release of import table entries and the issuing of release messages to the corresponding export table entries. When an export table entry is no longer accessed, the entry is released.

However, the reference count scheme cannot be used to manage the export table entries. This is because the increase and decrease messages for the reference counters of the export table entries are transferred through a network. Then, the arrival order of two messages issued by two distinct clusters is not determined in the PIM global network. This destroys the consistency of the reference counters. Additionally, in the PIM network, messages are possibly overtaken. Although the reference count scheme has been improved so that it requires an acknowledgment of each increase and decrease message, this increases the network traffic.

A more efficient scheme, the weighted export counting (WEC) scheme, has been invented [Ichiyoshi et al. 1988]. This is an extension of the weighted reference counting scheme [Watson and Watson 1987] [Bevan 1989] in the sense that the messages being transmitted in the loosely-coupled network also have weights. With the WEC scheme, every export table entry E holds the following invariant relation (Figure 5):

  Weight of E = the sum of the weights of all external references x to E

A weight is an integer. When a new export table entry is allocated, the same weight is assigned to both the export table entry and the external reference. When an import table entry is released, its weight is returned to the corresponding export table entry by the release message. The weight of the export table entry is decreased by the returned weight. The export table entry is detected as no longer being accessed when the weight of the entry becomes zero. Then, the entry is released from the export table. See [Ichiyoshi et al. 1988] for more details on the operation of the WEC scheme.

[Figure 5: WEC Invariant Relation. An export table entry with WEC = 100 in one cluster is referenced by import table entries with WEC = 50, 20 and 30 in other clusters.]

It is important that the WEC scheme is not affected by the order in which messages arrive, and there is no need to give acknowledgments. Furthermore, the WEC scheme alleviates the cost of splitting external references.
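As a rough illustration of this bookkeeping, the following C sketch shows the weight manipulations for exporting a reference, splitting an imported weight when a reference is passed on, and processing a release message. The structures, the initial weight, and the function names are assumptions made for the example, not the actual PIM runtime interface.

  /* Illustrative WEC bookkeeping sketch; not the actual PIM runtime code. */
  #define INITIAL_WEC 0x10000L                /* assumed initial weight      */

  typedef struct export_entry {
      void *exported_data;                    /* datum referenced from outside */
      long  wec;                              /* weight held by this entry     */
  } export_entry;

  typedef struct import_entry {
      int   cluster;                          /* cl  of the external ref ID    */
      int   entry;                            /* ent of the external ref ID    */
      long  wec;                              /* weight held by this entry     */
  } import_entry;

  /* Exporting: the entry and the outgoing reference start with equal weight. */
  void export_reference(export_entry *e, long *weight_on_message) {
      e->wec = INITIAL_WEC;
      *weight_on_message = INITIAL_WEC;
  }

  /* Re-exporting an imported reference: split the local weight so that the
     invariant (entry weight = sum of reference weights) is preserved.      */
  int split_weight(import_entry *imp, long *weight_on_message) {
      if (imp->wec <= 1)
          return 0;                           /* cannot split: WEC supply, 4.2.3 */
      *weight_on_message = imp->wec / 2;
      imp->wec -= *weight_on_message;
      return 1;
  }

  /* A release message returns the weight of a discarded import entry.  When
     the export entry's weight reaches zero, no cluster references it.      */
  int handle_release(export_entry *e, long returned_weight) {
      e->wec -= returned_weight;
      return e->wec == 0;                     /* caller may now free the entry */
  }

Because the invariant is maintained purely by adding and subtracting weights, release messages may be processed in any arrival order and no acknowledgments are needed, which is exactly the property required by the non-FIFO PIM network.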
4.2.3 Supply of Weighted Export Count

In terms of the WEC scheme, the problem remains of how to manage WEC when the weight of an import table entry cannot be split (when the weight reaches 1). In order to overcome this problem, we developed a WEC supply mechanism, which is an application of the bind hook technique. The bind hook technique suspends and resumes goals of the KL1 language (Section 2.2) [Goto et al. 1988].

The WEC supply mechanism works as shown in Figures 6 and 7. The current situation is that the weight of an import table entry in Cluster B reaches 1, and a goal in Cluster B issues an access command to the data in Cluster A. In this case, the message related to the access command cannot be sent, because the weight to be put on the message cannot be taken from the import table entry.

In the WEC supply mechanism, the remaining WEC (the weight of 1) is first taken from the import table entry, and the import table entry is reclaimed. After that, in Cluster B, an export table entry for the external reference cell is allocated. This new external reference ID is supposed to be the return address for the reply to the following WEC supply request. At that time, the goal is hooked to the external reference cell. Eventually, Cluster B sends a RequestWEC message to request a new weight from Cluster A. Of course, the weight taken from the import table entry described above is returned to the corresponding export table entry by this message. Figure 6 shows the situation at that time.

[Figure 6: WEC Request Phase]

When Cluster A receives the RequestWEC message, Cluster A adds a weight, say W, to the corresponding export table entry and returns a SupplyWEC message to Cluster B. The SupplyWEC message tells Cluster B to add the weight W to a new import table entry. In Cluster B, the suspended goal is resumed when the new import table entry is allocated. Then, the export table entry for the return address is reclaimed. Figure 7 shows the situation at that time.

[Figure 7: WEC Supply Phase]

This mechanism allows the originating goal to be hooked and resumed inexpensively, without additional data structures. The KL1 language processor on Multi-PSI copes with this situation using indirect exportation and a zero WEC message [Ichiyoshi et al. 1988]. However, the zero WEC message is a technique which is applicable only to a FIFO network. As described earlier, the PIM network does not obey the FIFO rule, so the zero WEC message cannot be used in PIM. Therefore, PIM uses indirect exportation and the WEC supply mechanism.

4.2.4 Mutual Exclusion of Table Entries

In order to check whether or not an external reference is already registered in the export table, a hash table is used. When an export table entry is allocated, it is registered in the hash table. When a cluster receives a release message, a PE in the cluster decreases the weight of the corresponding export table entry. If the weight reaches zero, the export table entry is removed from the hash table. Figure 8 shows the data structure of the export table and its hash table. Its hash key is the address of the exported datum. Since up to about ten PEs within a cluster share these structures and access them in parallel, efficient mutual exclusion should be realized.

[Figure 8: Data Structures of Export Table. Location (1) denotes the hash table and export table as a whole, (2) a hash table entry, and (3) an export table entry; an export table entry holds a WEC field and a pointer to the exported data, and entries with the same hash key are linked on a hash chain.]

Here, let us consider how to realize efficient mutual exclusion in the following two cases, which are typical cases of release message processing.

Case 1: A PE decreases the weight of an export table entry and the weight does not reach zero. In this case, only an export table entry is directly accessed. The export table entry should be locked when manipulating its weight. The corresponding hash table entry does not need to be locked, because the hash chain does not change.

Case 2: A PE decreases the weight of an export table entry and the weight reaches zero. In this case, the export table entry is released from the hash table entry. Therefore, the export table entry should be locked for the same reason as in Case 1. The hash table entry should also be locked when the export table entry is released from the hash chain, because other PEs may access the same hash chain simultaneously.

The problem is how to lock these structures efficiently. Here, we implemented the following three methods and evaluated their efficiency.

Method 1: Locking the entire hash table and export table. Whenever a PE accesses the export table, the export table and the hash table are entirely locked. In Figure 8, location (1) is locked. Since the implementation of this method is simple, the total execution time is short. However, this method occupies a large locking region for a long time. Thus, access contention occurs very frequently.

Method 2: Locking one hash table entry. When a PE decreases the weight of an export table entry, the corresponding hash table entry ((2) in Figure 8) is locked. In this method, the data structure to be locked is obviously smaller than in Method 1. However, this method has an overhead for computing the hash value of the exported data even when the hash chain is not modified.

Method 3: Locking one hash table entry and one export table entry. When a PE decreases the weight of an export table entry, the export table entry ((3) in Figure 8) is locked. If the weight becomes zero, the corresponding hash table entry ((2) in Figure 8) is locked. Then, the export table entry is released from the hash chain. In this method, the locking of data structures is at a minimum and the frequency of access contention is low. However, the implementation of this method is complicated.
As we described briefly in Section 3, an incremental GC method based on the MRB scheme was already proposed and implemented on Multi-PSI [Inamura et al. 1988], however since it cannot reclaim all garbage objects, it is still important to implement an efficient GC to supplement MRB GC. We invented a new parallel execution scheme of stop and copy garbage collector, based on Baker's sequential stop-;;md-copy algorithm[Baker 1978] for shared memory multiprocessors. The algorithm allocates two heaps although only one heap is actively used during program execution. When one heap is exhausted, all of its active data objects are copied to the other heap during GC. Thus, since Baker's algorithm accesses active objects this algorithm is simple and efficient. Innovative ideas in our algorithm are the methods which reduce access contention and distribute work among PEs during cooperative GC. Also no inter-cluster synchronization is needed since we use the export table described in Section 4.2. A more detailed algorithm is described in [Imai and Tick 1991]. 4.3.1 Parallel Algorithm Parallelization: There is potential parallelism inherent in the copying and scanning actions, of Baker's algorithm, i.e., accessing Sand B. Here pointer S represents the scanning point and B points to the bottom of the new heap. A naive method of exploiting this parallelism is to allow multiple PEs to scan successive cells at S, and copy them into B. Such a scheme is bottlenecked by the PEs vying to atomically read and increment S by one cell and atomically write B by many cells. Such a contention is unacceptable. Private Heap: One way to alleviate this bottleneck is to create multiple heaps corresponding to multiple PEs. This is the structure used in both Concert Multilisp[Halstead 1985] and JAM Parlog[Crammond 1988] garbage collectors. Consider a model where each PE( i) is allocated private sections of the new heap, managed with private Si and Bi pointers. Copying from the old space could proceed in parallel with each PE copying into its private new sections. As long as the mark operation in the old space is atomic, there will be no erroneous duplication of cells. Managing private heaps during copying, however, presents some significant design problems: • Allocating multiple heaps within the fixed space causes fragmentation . • It is difficult to distribute the work among the PEs throughout the GC. To efficiently allocate the heaps, each PE extends its heap incrementally in chunks. A chunk is defined as a unit of contiguous space, that is a constant number of HEU cells (HEU == Heap Extension Unit). We first consider a simple model, wherein each PE operates on a single heap, managed by a single pair of S and B pointers. The Bglobal pointer is a state variable pointing to the global bottom of the new allocated space shared by all PEs. Allocation of new chunks is always performed at Bglobal' Global Pool for Discontiguous Areas: When a chunk has been filled, the B pointer reaches the top of the next chunk (possibly not its own!). At this point a new chunk must be allocated to allow copying to continue. There are two cases where B overflows: either it overflows from the same chunk as S, or it overflows from a discontiguous chunk. In both cases, a new chunk is allocated. In the former case, nothing more needs to be done because S points into B's previous chunk, permitting its full scan. However, in the latter case, B's previous chunk will be lost if it is separated from S's by extraneous chunks (of other PEs, for instance). 
The problem of how to 'link' the discontiguous areas, to allow S to freely scan the heap, is solved in the following manner. In fact, the discontiguous areas are not linked at all. When a new chunk is allocated, B's previous chunk is simply added to a global pool. This pool holds chunks for load distribution, to balance the garbage collection among the PEs. Unscanned chunks in the pool are scanned by idle PEs, which resume work (see Figure 9).

[Figure 9: Chunk Management in Simple Heap Model. The shaded portions of the heap are owned by a PE(i) which manages S and B. Other portions are owned by any PE(j) where j is not i. The two chunks shaded as '/' are referenced by PE(i) via S and B. The other chunks belonging to PE(i), shaded as '\', are not referenced; to avoid losing these chunks, they are registered in the global pool.]

Uniform Objects in Size: We now extend the previous simple model into a more sophisticated scheme that reduces the fragmentation caused by dividing the heap into chunks of uniform size. Imprudent packing of objects, which come in various sizes, into chunks might cause fragmentation, leaving useless area at the bottom of chunks. To avoid this problem, each object is allocated the closest quantum of 2^n cells (for integer n < log(HEU)) that will contain it. Larger objects are allocated the smallest multiple of HEU chunks that can contain them. When copying objects smaller than HEU into the new heap, the following rule is observed: "All objects in a chunk are always uniform in size." If HEU is an integral power of two, then no portion of any chunk is wasted. When allocating heap space for objects of size greater than one HEU, contiguous chunks are used. In this refined model, chunks are categorized by the size of the objects they contain. To effectively manage this added complexity, a PE manipulates multiple {S,B} pairs (called {S1,B1}, {S2,B2}, {S4,B4}, ..., and {S_HEU,B_HEU}). Initially, each PE allocates multiple chunks, with Si and Bi set to the top of each chunk. Referring back to Figure 9, recall that shaded chunks of the heap are owned by PE(i) and non-shaded chunks are owned by other PEs. The chunks shaded as '/', in the extended model, contain objects of some fixed size k, and are managed with a pointer pair {Sk,Bk}. Chunks shaded as '\' are either directly referenced by other pointer pairs of PE(i) (if they hold objects of size m, with m different from k), or are kept in the global pool.

Load Balancing: In the previous algorithm, it is a difficult choice to select an optimal HEU. As HEU increases, Bglobal accesses become less frequent (which is desirable, since contention is reduced); however, the average distance between S and B (in units of chunks) decreases. This means that the chance of load balancing decreases with increasing HEU. One solution to this dilemma is to introduce an independent, constant-size unit for load balancing. The load distribution unit (LDU) is this predefined constant, which is distinct from HEU and enables more frequent load balancing during GC. (We assume that HEU/#PEs = kLDU, for integer k > 0.) In general, the optimized algorithm incorporates a new rule, wherein if (Bk - Sk > LDU), then the region between the two pointers (i.e., the region to be scanned later) is pushed onto the global pool.
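The chunk management just described, with allocation at Bglobal and distribution of unscanned regions larger than LDU through a global pool, can be summarized in the following C sketch. Names such as global_pool_push, global_pool_pop, scan_cell and fetch_and_add are invented for this illustration and many details (size classes, chunk boundaries, termination) are omitted; this is not the PIM collector.

  #include <stddef.h>

  #define HEU 256     /* Heap Extension Unit (words); value used in Section 4.3.2 */
  #define LDU  32     /* Load Distribution Unit (words)                           */

  typedef struct { size_t *from, *to; } region;     /* copied but not yet scanned */

  extern size_t *new_space;                          /* base of the new heap       */
  extern size_t  bglobal;                            /* shared bottom of new space */
  extern size_t  fetch_and_add(size_t *p, size_t n); /* assumed atomic primitive   */
  extern void    global_pool_push(region r);         /* register work for idle PEs */
  extern int     global_pool_pop(region *r);         /* 0 when the pool is empty   */
  extern void    scan_cell(size_t *cell);            /* may copy objects and push
                                                        the newly filled regions   */

  /* New chunks are always taken at the shared global bottom, so this is the
     single point of contention whose update rate Section 4.3.2 measures.   */
  size_t *alloc_chunk(void) {
      return new_space + fetch_and_add(&bglobal, HEU);
  }

  /* Cooperative scanning: any region longer than LDU is split so that the
     far part can be picked up by an idle PE.                               */
  void gc_worker(void) {
      region r;
      while (global_pool_pop(&r)) {
          while (r.from < r.to) {
              if ((size_t)(r.to - r.from) > LDU) {
                  region rest = { r.from + LDU, r.to };
                  global_pool_push(rest);            /* share the remaining work  */
                  r.to = r.from + LDU;
              }
              scan_cell(r.from++);
          }
      }
  }

A smaller LDU makes the split happen more often, which is the trade-off between load-balancing opportunities and pool-access overhead evaluated next.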
4.3.2 Evaluation

The parallel GC algorithm was evaluated for a large set of benchmark programs (from [Tick 1991] etc.) executing on a parallel KL1 emulator on a Sequent Symmetry. Statistics in the tables were measured on eight PEs with HEU=256 words and LDU=32 words, unless specified otherwise. A more detailed evaluation is given in [Imai and Tick 1991].

To evaluate load balancing during GC, we define the workload of a PE and the speedup of a system as follows:

  workload(PE) = number of cells copied + number of cells scanned
  speedup = (sum of workloads) / max(workload of PEs)

The workload value approximates the GC time, which cannot be accurately measured because it is affected by DYNIX scheduling on the Symmetry. Workload is measured in units of cells referenced. Speedup is calculated with the assumption that the PE with the maximum workload determines the total GC time. Note that speedup only represents how well load balancing is performed and does not take into account any extra overheads of load balancing (which are tackled separately). We also define the ideal speedup of a system:

  ideal speedup = min( #PEs, (sum of workloads) / max(workload for one object) )

Ideal speedup is meant to be an approximate measure of the fastest that n PEs can perform GC. Given a perfect load distribution where 1/n of the sum of the workloads is performed on each PE, the ideal speedup is n. There is an obvious case when an ideal speedup of n cannot be achieved: when a single data object is so large that its workload is greater than 1/n of the sum of the workloads. In this case, GC can complete only after the workload for this object has completed. These intuitions are formulated in the above definition.

Table 4: Average Workload and Speedup (8 PEs, HEU=256 words)
                avg. WL          Speedup (size of LDU)
  Benchmark     x 1000      32w     64w    128w    256w    ideal
  BestPath        165       7.15    6.36    7.06    6.46    8.00
  Boyer            47       5.67    4.12    5.83    4.38    8.00
  Cube            139       7.74    6.83    7.67    7.35    8.00
  Life            101       7.10    6.31    6.29    6.86    8.00
  MasterMind        4       2.50    2.48    2.58    2.48    2.87
  MaxFlow          95       4.06    2.86    3.84    3.70    8.00
  Pascal            5       2.67    2.91    3.45    2.77    7.25
  Pentomino         3       4.34    3.34    4.21    3.67    8.00
  Puzzle           17       2.63    2.84    2.58    2.61    2.92
  SemiGroup       496       7.75    7.28    7.49    7.02    8.00
  TP               17       2.49    2.39    2.43    2.33    2.79
  Turtles         203       7.79    7.44    7.20    7.22    8.00
  Waltz            32       4.38    2.31    2.92    1.64    8.00
  Zebra           167       6.27    6.42    6.28    6.04    8.00

Table 5: Accesses of the Global Pool (8 PEs, HEU=256 words)
                          LDU (words)
  Benchmark        32        64       128      256
  BestPath        421.0     139.6     84.4     45.8
  Boyer           208.8     131.3     24.3     12.8
  Cube            609.4     241.6     96.0     55.5
  Life            145.8      66.5     29.8     14.8
  MasterMind        3.9       1.5      1.1      1.0
  MaxFlow         211.3      37.0     75.0     10.0
  Pascal            1.6       1.0      1.0      1.0
  Pentomino       134.3      21.0     65.3      7.5
  Puzzle           51.6      30.6     10.5      4.9
  SemiGroup     1,700.7     910.8    439.3     29.6
  TP               44.4      19.8      8.8      4.6
  Turtles       1,427.0     640.0    314.0    136.0
  Waltz            76.0      36.0     11.5      1.4
  Zebra         2,127.9     920.2    467.7    222.4

Speedup: Table 4 summarizes the average workload and speedup metrics for the benchmarks. The table shows that benchmarks with larger workloads display higher speedups. This illustrates that the algorithm is quite practical. It also shows that the smaller the LDU, the higher the speedup obtained. This means there are more chances to distribute unscanned regions, as we hypothesized. In some benchmarks, such as MasterMind, Puzzle and TP, the ideal speedup is limited (2-3). This limitation is due to an inability of PEs to cooperate in accessing a single large structure. The biggest structure in each of the benchmark programs is the program module. A program module is actually a first-class structure and therefore subject to garbage collection (necessary for a self-contained KL1 system which includes a debugger and incremental compiler). In practice, application programs consist of many modules, as opposed to the benchmarks measured here, which have only a single module per program. Thus the limitation of ideal speedup in MasterMind and Puzzle is peculiar to these toy programs. In benchmarks such as Pascal and Waltz, the achieved speedup is significantly less than the ideal speedup. These programs create many long, flat lists. When copying such lists, S and B are incremented at the same rate. The proposed load distribution mechanism does not work well in these degenerate cases. Our method works best for deeper structures, so that B is incremented at a faster rate than S. In this case, ample work is uncovered and added to the global pool for distribution.

Contention at the Global Heap Bottom: We analyzed the frequency with which the global heap-bottom pointer, Bglobal, is updated (for allocation of new chunks). This action is important because Bglobal is shared by all the PEs, which must lock each other out of the critical sections that manage the pointer. For instance, in Zebra (given HEU = 256 words and LDU = 32 words), Bglobal is updated 3,885 times by GCs. If Bglobal were updated whenever a single object was copied to the new heap, the value would be updated 126,761 times. Thus, the update frequency is reduced by a factor of over 32 compared to this naive update scheme. In the other benchmarks, the ratios range from 15 to 114.

Global-Pool Access Behavior: Table 5 shows the average number of global-pool accesses made by the benchmarks, and the average number of cells referenced (in thousands) by the benchmarks per global-pool access. These statistics are shown with varying LDU sizes. The data confirms that, except for Pascal and MasterMind, the smaller the LDU, the more chances there are to distribute unscanned regions, as we hypothesized. The amount of distribution overhead is at least two orders of magnitude below the useful GC work, and in most cases, at least three orders of magnitude below. As described above, to achieve efficient garbage collection on a shared-memory multiprocessor system, load distribution and the working set size should also be carefully considered.
In our KLI runtime system, once argument data are allocated to a memory, the locations are not moved (only a garbage collector can move them). Hence, it is desirable that a goal that includes references to the argument data is reduced by a PE in which the cache already contains the data. Furthermore, in terms of KLI goal reduction, suspension and r~sumption during unification give rise to expensive context switching. If context switching occurs frequently, the hit ratio of snooping caches decreases and, consequently, the total performance is seriously degraded. Less Access Contention: To schedule goals properly, each PE has to access shared resources in parallel. For instance, there is a goal pool that stores goals to be reduced and priority information that must be exchanged among PEs. Since expensive mutual exclusion is required when PEs within a cluster access these shared resources, access conflicts should be decreased as much as possible. No Disturbance of Busy Processing Elements: From the load balancing point of view, it is better to have as many idle PEs as possible involved in work associated with goal scheduling. Moreover, when an idle PE tries to find a new goal, it is desirable that the idle PE should neither interrupt nor disturb the execution of busy PEs. Consequently, well-distributed data structures and algorithms should be designed so that these criteria are satisfied as much as possible. 4.4.1 Goal Pool Let us consider two ways of implementing a goal pool: centralized implementation and distributed implementation. That is, one queue in a cluster or one queue for every PE. If centralized implementation is used, priority is strictly managed. However, every time a goal is picked up and new goals are stored, the access contention may occur. Thus, our KLI implementation adopts the distributed implementation method. It turns out that transmission of goals between PEs for load balancing is required "and priority is loosely managed. On the contrary, however, distributed queue management is necessarily loose for priority. The distributed goal queues are managed using a depth-first rule to keep the data locality high. Under depth-first (LIFO) management, it is presumed that the same PE will often write and read the same data and that the number of suspensions and resumptions invoked will be less. Therefore, the cache hit ratio increases. Further, when a PE resumes goal unification, the PE sends the goal to the queue of the PE which suspended the goal previously. This also contributes to keeping the data locality high. As described above, since goals are accompanied with priorities, in our KLI implementation, a PE has its own goal queues for each priority. Figure 10 shows the goal queues with priorities. high ~t ..., J...t ~ low priority-wise stacks Figure 10: Goal Queue with Priorities 4.4.2 Transmission of Goals As soon as a PE becomes or may become idle, it must take a new goal with higher priority from the queue of . a PE with a small overhead to avoid going into an idle state. An idle PE triggers the transmission of a new goal. 452 Here, two design decisions are needed. One decision is deciding whether the PE that transmits a new goal with high priority is a request sender (idle PE) or a request receiver (busy PE). Another decision is deciding whether a new goal is to be picked from the top of a queue or the end. If an idle PE has the initiative, access contention may occur in the queue of a busy PE. If a busy PE has the initiative, the CPU time of the busy PE must be consumed. 
If a new goal is picked from the top of a queue, it may destroy the data locality of the busy PE's cache. If a new goal is at the end, it will often happen that the goal reduction of an idle PE is immediately suspended; the potential load of the goal may be small under LIFO management. Thus, this method may frequently trigger transmission. The current implementation uses dedicated PIM hardware which broadcasts requests to all PEs within a cluster, in order to issue a request for a new goal to the other PEs. Each busy PE executes an event handler once a reduction and the event handler may catch the request. Then, the busy PE which catches the request first. picks up the goal with the highest priority from the top of its goal queue. Our implementation should be evaluated for comparison. 4.4.3 Priority Balancing A PE always reduces goals which belong to its local queue and have the highest priority. There are two problems; one is how to detect the priority imbalance, and the other is how to correct the imbalance by cooperating with the other PEs. Our priority balancing scheme was designed so that fewer shared resources are required and busy PEs do less work concerned with priority balancing (Figure 11). Our scheme requires only one shared priority ~ ...... number of va,riables 11 In as the number of PEs to record a current integral value for each PE. A current priority of each PE is represented by Pi. There are two constants, max (> 0) and min « 0). Every PE will always calculate the integral Ii of Pi - Pa along time. When Ii > max, the PE(i) adjusts Pa to the current Pi and resets Ii to zero. When Ii < min, the PE(i) issues a goal request, adjusts Pa to the priority of a transmitted goal, and resets Ii to zero. The mechanism of the goal transmission described above is used as well, since the goal with the highest PE priority is picked up. More details on this algorithm are described in [Nakagawa et al. 1989]. The features of this scheme are as follows. The calculation of the integral reduces the frequency of shared resource Pa updating and busy PEs do some work only when I > max. The disadvantages are as follows. It may happe~ that the priority of a transmitted goal is even lower, that Pa decreases unreasonably, and that the frequency of the high-priority goal transmission decreases. Our priority balancing scheme utilizes the goal transmission mechanism (Section 4.4.2), which does not always transfer the goal with the most appropriate priority. Accordingly, a load imbalance may be sustained for a while. How well this method works depends on the priority of the goals transmitted upon requests. In other words, there is a tradeoff between loose priority management and the frequency of high-priority goal transmission. Further, in this scheme, a busy PE (a PE satisfying Ii > max) has to write its current priority Pi to the shared variable Pa. This may cause access conflict and disturb the busy PE. A new scheme which we will design should overcome these problems. However, we think that calculation of the integral along time is essential even in new schemes. I'V 4.5 . ........ ,.:~~ Meta Control Facilities When designing the implementation for a shoen, we assume that the following dynamic behavior applies in the KLI system: ~ . Pa integral o min Time Time Figure 11: Priority Balancing Scheme variable P a to record a.n average priority, and the same • Shoen statuses change infrequently. • Shoen operations are not executed immediately but within a finite time. 
4.5 Meta Control Facilities

When designing the implementation of a shoen, we assume that the following dynamic behavior applies in the KL1 system:

• Shoen statuses change infrequently.
• Shoen operations are not executed immediately but within a finite time.
• Messages transferred over the inter-cluster network may overtake one another.

Under these assumptions, our implementation must satisfy the following requirements:

• The fewer inter-cluster messages, the better.
• No bottleneck should appear; algorithms and protocols that do not frequently access shoen records and foster-parent records are desirable.
• The processing associated with meta control should not degrade the performance of reduction.

Many techniques for realizing a shoen have been developed to achieve high efficiency. This section concentrates on execution control and resource management. From now on, stream messages on the control and report streams for communication with the outside are represented in a typewriter typeface, such as start, add_resource, and ask_statistics.

4.5.1 Execution Control

This section describes schemes for implementing the functions for execution control. Schemes (1) and (2) are effective in a shared-memory environment (intra-cluster). Schemes (3) to (5) are effective in a distributed-memory environment (inter-cluster).

(1) Change of Foster-parent Status: Since goal reduction cannot be started when the status of the foster-parent to which the goal belongs is not "started", a naive implementation would need to check the status of the foster-parent before every goal reduction. To avoid such frequent checking, a status change of the foster-parent is notified through the interruption mechanism. When a cluster receives a message that changes a foster-parent's status to non-executable, an interruption is issued to every PE in the cluster. When a PE catches the interruption, the PE checks whether the current goal belongs to the target foster-parent. If so, the foster-parent is to be stopped, and the PE suspends execution of the current goal and starts to reduce a goal of another active foster-parent. Otherwise, the PE continues the reduction. Since the newly scheduled goal belongs to another foster-parent, the context of the goal reduction (the childcount cache and the resource cache described below) must be switched, too. The assumption that the status of a foster-parent is switched infrequently implies that interruptions happen rarely. Thus, an advantage of the scheme is that the ordinary reduction process rarely suffers from foster-parent checking.

(2) Foster-parent Termination Detection: To detect the termination of a foster-parent efficiently, a counter called childcount is introduced. The childcount represents the sum of the number of goals and the number of shoens which belong to the foster-parent. When the childcount of a foster-parent reaches zero, all goals of the foster-parent are finished. The childcount area is allocated in a foster-parent record, and all PEs in a cluster must access the area. Since this counter must be updated whenever a goal is created or terminated, frequent exclusive updating of this counter might become a bottleneck. To reduce such access contention, a cache area for the childcount is allocated on each PE. The operations go as follows. At first, a counter is allocated on the childcount cache of each PE and initialized with the value zero. Every time a new goal is spawned, the counter is incremented, and the counter is decremented upon the end of a goal reduction. When the reduction of a new goal whose foster-parent differs from the previous one begins, the current foster-parent must be switched; that is, the value of the counter on the childcount cache is brought back to the previous foster-parent record, and the counter is reinitialized. The foster-parent terminates when it detects that the counter on the foster-parent record is zero.
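A minimal C sketch of this childcount caching, with invented names and without the real system's locking details: each PE accumulates a local delta and flushes it to the shared foster-parent record only when it switches foster-parents.

    #include <stddef.h>
    #include <stdbool.h>

    struct foster_parent {
        volatile long childcount;       /* shared counter in the foster-parent record */
        /* ... execution status, resource fields, etc. ... */
    };

    struct pe_cache {
        struct foster_parent *current;  /* foster-parent of the goal being reduced */
        long                  delta;    /* locally cached childcount contribution  */
    };

    void goal_spawned(struct pe_cache *c)  { c->delta++; }  /* new child goal       */
    void goal_finished(struct pe_cache *c) { c->delta--; }  /* goal reduction ended */

    /* Called when the next goal belongs to a different foster-parent. */
    bool switch_foster_parent(struct pe_cache *c, struct foster_parent *next)
    {
        bool terminated = false;
        if (c->current != NULL) {
            /* write back the cached delta; the real system does this under
               mutual exclusion (or with an atomic add)                      */
            c->current->childcount += c->delta;
            terminated = (c->current->childcount == 0);  /* all children done */
        }
        c->current = next;
        c->delta   = 0;
        return terminated;              /* caller triggers termination handling */
    }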
This scheme is expected to work efficiently if foster-parents are not switched often.

(3) Point-to-point Message Protocol: Basically, message protocols between a shoen and its foster-parents are based on point-to-point communication rather than broadcasting [Rokusawa et al. 1988]. If almost all clusters always contained foster-parents of a shoen, protocols based on broadcast would be worth considering. However, the current implementation does not assume this, although it depends on the application, and it would be inefficient to broadcast messages to all clusters in the system every time. Instead, a shoen provides a table that indicates, for each cluster, whether or not one of its foster-parents exists there. The table is maintained by receiving foster-parent creation and termination messages from the other clusters. Accordingly, a shoen can send messages only to the clusters where its foster-parents reside.

(4) Lazy Management of Foster-parents: A shoen controls its foster-parents by exchanging messages, such as start and stop messages. However, these messages may overtake one another, and thus a foster-parent could go into an incorrect state. To keep the status correct and to minimize the maintenance cost, received start/stop messages are managed by a counter. If a start message arrives, the foster-parent increments the counter. If a stop message arrives, the foster-parent decrements the counter. Then, when the counter value crosses zero, the foster-parent changes its execution status accordingly.

(5) Shoen Termination Detection: To detect the termination of a shoen efficiently, a Weighted Throw Count (WTC) scheme was introduced [Rokusawa et al. 1988] [Rokusawa and Ichiyoshi 1992]. This scheme is also an application of the weighted reference count scheme [Watson and Watson 1987][Bevan 1989]. Logically, a shoen is terminated when there are no foster-parents. However, simply maintaining the number of foster-parents is not sufficient, since goals thrown by a foster-parent may still be in transit in the network. Thus, a foster-parent gives a portion of its weight to every goal it throws and to every message exchanged between the shoen and its foster-parents. On terminating a foster-parent, all of the foster-parent's weight is returned to the shoen. If the foster-parent has already terminated when a message arrives, the message's weight is also sent back to the shoen so that the total weight is preserved. Then, when all weights have been returned to the shoen, the shoen terminates itself. An advantage of this scheme is that it is free from sending acknowledgement messages. Thus, since a shoen need not keep shared resources locked until an acknowledgement returns, the scheme not only reduces network traffic but also alleviates mutual exclusion.
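A simplified C sketch of weighted throw counting follows; it is illustrative only, and the names, the initial weight, and the halving rule are assumptions. Weight travels with thrown goals and messages, and the shoen may terminate as soon as all weight has come home, without acknowledgement traffic.

    #include <assert.h>

    #define INITIAL_WEIGHT (1L << 20)   /* assumed total weight held by the shoen */

    struct shoen {
        long outstanding;        /* weight lent to foster-parents, thrown goals,
                                    and in-transit messages, not yet returned   */
    };

    /* The shoen lends weight when a foster-parent is created or a control
       message is sent; the receiver later passes portions of it along.      */
    void shoen_lend(struct shoen *s, long w)
    {
        s->outstanding += w;
    }

    /* A holder (foster-parent) splits off part of its weight and attaches it
       to a goal it throws or a message it sends.                            */
    long split_weight(long *holder)
    {
        long granted = *holder / 2;     /* assumed heuristic; the real scheme
                                           obtains more weight when a holder's
                                           weight runs low                     */
        assert(granted > 0);
        *holder -= granted;
        return granted;
    }

    /* Weight comes back when a foster-parent terminates, or when a message
       arrives after its target foster-parent has already terminated.        */
    int shoen_return(struct shoen *s, long w)
    {
        s->outstanding -= w;
        return s->outstanding == 0;     /* 1: no weight outstanding, so the
                                           shoen may terminate                 */
    }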
4.5.2 Resource Management

As described above, a shoen is also used as a unit of resource management. In the KL1 language, reduction time is regarded as the computation resource. A shoen consumes the supplied resources while passing them on. Moreover, since a shoen works in parallel, lazy resource management is inevitable, as in shoen execution control (Section 4.5.1).

A shoen has a limited amount of resources which it can consume. Upon exceeding the limit, goals in the shoen cannot be reduced. When the runtime system detects that the total amount of resources consumed so far is approaching the limit, a resource_low message is automatically issued on the shoen's report stream. The shoen stops its execution when its resources are exhausted. On the other hand, the add_resource message on the control stream raises the limit, and the shoen can then utilize resources up to the new limit. Furthermore, a shoen which accepts the ask_statistics message reports the resources consumed so far. This section describes our resource management implementation schemes.

(1) Distributed Management: The scheme is briefly described below. Figure 12 shows the resource flow between a shoen and its foster-parents. A shoen has a limit value, which indicates that the shoen can consume resources up to the limit. Initially, the resource limit is zero; only the add_resource message can raise it. When a shoen receives the add_resource message, the shoen requests new resources from the foster-parent above it, up to the limit value designated by the add_resource message. We call this foster-parent the parent foster-parent. Note that a shoen and its parent foster-parent reside in the same cluster, and thus the resource request is implemented by read and write operations on shared memory. After a shoen has obtained new resources from its parent foster-parent, the shoen further supplies resources to those of its foster-parents which requested resources, by sending the supply_resource message across clusters. The supplied resources may in turn be supplied to descendant shoens and foster-parents. Then, those foster-parents consume the supplied resources.

Figure 12: Resource Flow Between a Shoen and its Foster-parents (add_resource, request, supply, and return paths, and the resource_low report; fp: foster-parent, G: goal)

The shoen has a buffer for resources; the excess resources returned from terminated foster-parents are stored in the shoen buffer. When the remaining resources of a foster-parent are about to run out, a resource request message is sent to the shoen above it. If the shoen cannot afford to supply the requested resources, the shoen issues the resource_low message on its report stream. Otherwise, if the shoen has sufficient resources in the buffer, the resources are supplied to the foster-parent immediately. If the buffered resources are insufficient, the shoen requests new resources, within the current limit value, from its parent foster-parent. As described here, the resource buffer of a shoen prevents the resource_low message from being issued more frequently than necessary. If the resources of a foster-parent are exhausted, goal reduction stops. The scheduled goals are then hooked onto the foster-parent record, in preparation for rescheduling when new resources are supplied from the shoen. Furthermore, each PE has a resource cache area for the foster-parent, and hence a counter is actually decremented every time a goal is reduced. This mechanism is similar to the childcount mechanism (Section 4.5.1). However, when the foster-parent of a goal to be reduced changes, the caches on the PEs must be brought back to the foster-parent record.
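The per-PE resource cache can be pictured with the C fragment below; the names, the refill granularity, and the stubbed request path to the shoen are assumptions for illustration. Each reduction spends one unit from a locally cached allotment, and the foster-parent record is touched only when the allotment runs out or the foster-parent changes.

    struct fp_resource {
        long remaining;                 /* resources left in the foster-parent record */
    };

    struct pe_resource_cache {
        struct fp_resource *fp;         /* foster-parent of the current goal */
        long                allotment;  /* locally cached share of fp->remaining */
    };

    #define REFILL 1024                 /* assumed refill granularity */

    extern void request_resources_from_shoen(struct fp_resource *fp);

    /* Charge one reduction against the cache; refill from the foster-parent
       record (and ultimately from the shoen) only when the cache is empty.  */
    void charge_reduction(struct pe_resource_cache *c)
    {
        if (c->allotment == 0) {
            if (c->fp->remaining >= REFILL) {   /* done under mutual exclusion
                                                   in the real system          */
                c->fp->remaining -= REFILL;
                c->allotment      = REFILL;
            } else {
                request_resources_from_shoen(c->fp);  /* may lead to resource_low */
                return;                               /* goal reduction stops here */
            }
        }
        c->allotment--;                 /* one reduction consumes one unit */
    }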
(2) Resource Statistics: While the system enjoys lazy resource management, it becomes harder to collect resource information over the entire system. A shoen accepts the ask_statistics message, which reports the current total of consumed resources. The scheme used to collect this information is as follows. A shoen issues inquiry messages to each foster-parent. When an inquiry message arrives at a foster-parent, the foster-parent informs each PE of this using the interruption mechanism; this part is similar to the mechanism of Section 4.5.1 (1). The PEs which catch the interruption check whether their current goals belong to the target foster-parent. If so, the PE puts the resource count on its cache back to the foster-parent record; when all corresponding PEs have written back, the subtotal of resources on the foster-parent is available. If not, the PEs do nothing and reduction continues. Then, the foster-parent reports the subtotal to the shoen and re-distributes some resources back to the PEs. As a result, the PEs resume goal reduction. We assume that the ask_statistics message is issued infrequently, and under that assumption this scheme works well.

(3) Point-to-point Resource Delivery: The destination of new resources, when a shoen receives resource request messages from its foster-parents, is a design decision. It must be decided whether the shoen delivers the new resources only to the foster-parents which have requested them, or delivers them to all foster-parents. A protocol based on broadcast may be preferable when the foster-parents in nearly all clusters always possess the same amount of resources and consume them at the same speed. The current method is similar to the one in Section 4.5.1 (3). Our assumptions are based on experience with the Multi-PSI system. Goal scheduling within a cluster, however, differs, and there is no guarantee that every cluster has a foster-parent of the shoen. Therefore, in the current implementation the shoen sends the resource supply message only to the clusters which have sent resource request messages.

4.6 Intermediate Instruction Set

The KL1 compiler for PIM has two phases. The first phase compiles a KL1 program into an intermediate instruction code; the instruction set is called KL1-B. The second phase translates the intermediate code into native code. KL1-B is designed for an abstract KL1 machine [Kimura and Chikayama 1987], interfacing between the KL1 language and the PIM hardware, just as the Warren Abstract Machine [Warren 1983] does for Prolog. KL1-B for PIM is extended from KL1-B for Multi-PSI to efficiently exploit the PIM hardware.

4.6.1 Abstract KL1 Machine

The abstract KL1 machine is simple virtual hardware used to describe the KL1 execution mechanism. It has a single PE with a heap memory and basically expresses the execution inside a PE. However, every KL1-B instruction implicitly supports multi-PE processing. Further, some KL1-B instructions are added for inter-cluster processing.

A goal is represented by a goal record on a heap. The goal record consists of arguments and an execution environment which includes the number of arguments and the address of the predicate code. A ready goal is managed in the ready goal pool, which has entries for each priority; each entry indicates a linked stack of goal records. Suspended goals are hooked on the responsible variable. Each data word consists of a value part, a type part, and an MRB part [Chikayama and Kimura 1987]. The MRB part is valid if the value part is a pointer, and indicates whether its object is single-referenced or multiply-referenced. It is used for incremental garbage collection and destructive structure updating.
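To make the data representation concrete, here is a hedged C sketch of a tagged data word with an MRB bit; the field widths, tag values, and names are assumptions, not the PIM layout, and the real machine packs the parts into a single word.

    #include <stdint.h>
    #include <stdbool.h>

    /* One heap word: value part, type part, and the MRB part. */
    enum kl1_type { T_ATOM, T_INTEGER, T_LIST, T_VECTOR, T_REF, T_UNBOUND };

    typedef struct {
        uint64_t      value;  /* atomic value or pointer (packed into one word
                                 on the real machine; unpacked here for clarity) */
        enum kl1_type type;
        bool          mrb;    /* false: single-referenced, true: possibly
                                 multiply-referenced; meaningful only when the
                                 value part is a pointer                         */
    } kl1_word;

    /* The MRB test that guards incremental collection and in-place reuse. */
    static bool can_reuse(const kl1_word *w)
    {
        return (w->type == T_LIST || w->type == T_VECTOR) && !w->mrb;
    }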
4.6.2 Overview of KL1-B

The intermediate instruction set KL1-B was designed according to the following principles:

• Memory based scheme - goal arguments are basically kept on a goal record at the beginning of a reduction, and each of them is read onto a register explicitly just before it is needed. Thus, almost all registers are used only temporarily (Section 4.6.3).
• Optimization using the MRB scheme - some instructions to reuse structures are supported to reduce execution cost (Section 4.6.4).
• Clause indexing - the compiler collects the clauses which test the same variables and compiles them into an instruction module. All guard parts of a predicate are then compiled together into code with branch instructions forming a tree structure (Section 4.6.5).
• Each body is compiled into a sequence of instructions which runs straight ahead without branching.

The basic KL1-B instruction set is shown in Table 6.

Table 6: Basic KL1-B Instruction Set

  Instruction                Operands                     Specification
  For passive unification:
  load_wait                  Rgp, Pos, Rx, Lsus           Read a goal argument onto Rx and check its binding.
  read_wait                  Rsp, Pos, Rx, Lsus           Read a structure element onto Rx and check its binding.
  is_atom/integer/list/...   Rx, Lfail                    Test the data type of Rx.
  test_atom/integer          Rx, Const, Lfail             Test the data value of Rx.
  equal                      Rx, Ry, Lsus, Lfail          General unification.
  suspend                    Lpred, Arity                 Suspend the current goal.
  For argument/element preparation:
  load                       Rgp, Pos, Rx                 Read a goal argument onto Rx.
  read                       Rsp, Pos, Rx                 Read a structure element onto Rx.
  put_atom/integer           Const, Rx                    Put the atomic constant onto Rx.
  alloc_variable             Rx                           Allocate a new variable and put the pointer onto Rx.
  alloc_list/vector          (Arity,) Rx                  Allocate a new list/vector structure and put the pointer onto Rx.
  write                      Rx, Rsp, Pos                 Write Rx onto a structure element.
  For incremental garbage collection:
  mark                       Rx                           Mark the MRB of Rx.
  collect_value              Rx                           Collect the structure recursively unless its MRB is marked.
  collect_list/vector        (Arity,) Rx                  Collect the list/vector structure unless its MRB is marked.
  reuse_list/vector          (Arity,) Rx                  collect_list/vector + alloc_list/vector.
  For active unification:
  unify_atom/integer         Const, Rx                    Unify Rx with the atomic constant.
  unify_bound_value          Rsp, Rx                      Unify Rx with the newly allocated structure.
  unify                      Rx, Ry                       General unification.
  For goal manipulation and event handling:
  collect_goal               Arity, Rgp                   Reclaim the goal record.
  alloc_goal                 Arity, Rgp                   Allocate a new goal record.
  store                      Rx, Rgp, Pos                 Write Rx onto a goal argument.
  get_code                   CodeSpec, Rcode              Get the code address of the predicate onto Rcode.
  push_goal                  Rgp, Rcode, Arity            Push the goal to the current priority entry of the ready goal pool.
  push_goal_with_priority    Rgp, Rcode, Rprio, Arity     Push the goal to the specified priority entry of the ready goal pool.
  throw_goal                 Rgp, Rcode, Rcls, Arity      Throw the goal to the specified cluster.
  execute                    Rcode, Arity                 Handle the event if it occurs and execute the goal repeatedly.
  proceed                                                 Handle the event if it occurs and take a new goal from the ready goal pool to start the new reduction.

4.6.3 Memory Based Scheme

The Multi-PSI system executes a KL1 program using the register based scheme - all arguments of the current goal are loaded onto argument registers before reduction begins, just as the WAM does for Prolog. Here, let us compare the following two methods in terms of the argument manipulation cost:

• In the memory based scheme, the arguments referred to in a reduction are loaded and the modified arguments are stored at every reduction. There is no cost for goal switching.
• In the register based scheme, all arguments of the swapped-out goal are stored and all arguments of the swapped-in goal are loaded at every goal switching.
Some arguments may also be moved between registers at every reduction. Therefore, the memory based scheme is better than the register based scheme when:

• goal switching occurs frequently,
• a goal has many arguments, or
• a goal does not refer to many of its arguments in a reduction.

These cases are expected to occur often in large KL1 programs. Thus, we have to verify the memory based scheme with many practical KL1 applications. Additionally, the number of goal arguments is limited by the number of argument registers - 32 in the case of Multi-PSI. This limitation is too tight and is not favorable to KL1 programmers. The memory based scheme can alleviate this limitation to some extent. On the other hand, the naive memory based scheme necessarily writes back all arguments to the goal record, even when tail recursion is employed. Since this is very wasteful, an optimization that keeps frequently referenced arguments on registers is mandatory during tail recursion.

4.6.4 Optimization

Two optimization techniques are introduced: tail recursive optimization and the reuse of data structures. We describe these using the following sample code.

• source code:

    app([H|L],T,X) :- true | X=[H|Y], app(L,T,Y).
    app([],T,X)    :- true | X=T.

• intermediate code:

    app_entry:
            load               CGP, 0, R1       % Load up
            load               CGP, 2, R2       % arguments
    app_loop:
            wait               R1, sus_or_fail
            is_list            R1, next
            commit
        *   read               R1, car, R3      % H
            read               R1, cdr, R4      % L
            reuse_list         R1
        *   write              R3, R1, car      % H
            alloc_variable     R5               % Y
            write              R5, R1, cdr
            unify_bound_value  R1, R2
            move               R4, R1
            move               R5, R2
            execute_tro        app_loop
    next:
            is_atom            R1, sus_or_fail
            test_atom          [], R1
            commit
            load               CGP, 1, R3       % T
            unify              R3, R2
            collect_goal       3, CGP
            proceed
    sus_or_fail:
            store              R1, CGP, 0       % Write back
            store              R2, CGP, 2       % arguments
            suspend            app_entry, 3

Tail Recursive Optimization: Some instructions are added for this optimization. Wait tests whether an argument on a register is instantiated. Move prepares arguments for the next reduction. Execute_tro executes a goal while some arguments are kept on registers. In the above source code, the first and third arguments of the first clause are used in tail recursion. These arguments are loaded at the beginning of the reduction by the load instructions placed before the tail recursive loop.
There is no need to write them into the goal record during tail recursion. However, they must be written back to the goal record explicitly before, say, switching the goal because of the suspend instruction. Since the second argument is not used in tail recursion, it is kept on the goal record until it is referred to in the second clause. In this example, two write instructions and two read instructions are replaced with two move instructions. Thus, assuming a cache hit ratio of 100%, this optimization saves two steps on each recursion loop.

Reuse of Data Structures: KL1-B for PIM supports the reuse of data structures. The reuse_list and reuse_vector instructions realize this. These instructions reuse an area in the heap on which the structure unified in a guard part was allocated, but only if the MRB of the reference to the area is not marked. However, the area for the element data of the reused structure is not reused. In KL1 applications, it often happens that the areas of reclaimed structures can be reused for successive allocation. This is frequent in programs for list processing and programs written in a message driven style. In the sample codes in Section 4.6.3, element H of the passively unified list [H|L] is used as element H of the new list [H|Y], and is read and written by the instructions marked with stars ("*"). However, if the MRB of the passively unified list is not marked, element H can actually be used in the new list as is, and therefore the read and write instructions can be eliminated. For this, the following new optimized instructions are introduced:

    reuse_list_with_elements      Reg, [Fcar|Fcdr]
    reuse_vector_with_elements    Arity, Reg, {F0, F1, ..., Fn}

These instructions do nothing when the MRB of the structure pointer on Reg is not marked. If it is marked, they allocate a new structure, copy the specified elements of the structure referenced by Reg to the new structure, and put the pointer to the new structure onto Reg. Thus, reuse of data structures reduces the number of memory operations and, accordingly, keeps the size of the working set small. Sample code is shown below:

• optimized intermediate code:

    app_loop:
            wait                       R1, sus_or_fail
            is_list                    R1, next
            commit
            read                       R1, cdr, R4     % L
            reuse_list_with_elements   R1, [1|0]
            alloc_variable             R5              % Y
            write                      R5, R1, cdr
            unify_bound_value          R1, R2
            move                       R4, R1
            move                       R5, R2
            execute_tro                app_loop

In this code, reuse_list and the instructions marked with stars ("*") are replaced with the reuse_list_with_elements instruction. The second argument [1|0] specifies that the head element has to be copied if the MRB of the list pointer on R1 is marked. If the MRB is not marked, the instruction does nothing and is equivalent to a nop. Therefore, the list structure [H|Y] is produced by only the following write R5,R1,cdr instruction; the instruction works like the rplacd function in LISP. Consequently, in this example, reuse optimization saves one read and one write instruction and is worth approximately two machine steps.

4.6.5 Clause Indexing

The KL1 language neither defines the testing order for clause selection nor has a backtracking mechanism. Thus, to attain quick suspension detection and quick clause selection, the compiler may arrange the testing order of KL1 clauses; this is called clause indexing. First, the compiler collects the clauses which test the same variable and compiles them into shared instructions. Most of these work as test-and-branch instructions, with branch labels occurring in the instruction code. All guard parts of a predicate are then compiled into a tree structure of instructions.

Our KL1 programming experience up to now has told us that a clause is infrequently selected according to the type of an argument but is often selected according to its value. Further, even if multi-way switching KL1-B instructions on data types were introduced, these instructions would eventually be implemented by a combination of native binary branch instructions in general. Consequently, we decided that KL1-B does not provide a multi-way switching instruction on data types, but only binary-branch KL1-B instructions on a data type. Additionally, KL1-B provides a multi-way jump instruction on the value of an instantiated variable. Two instructions are added for multi-way jumps on a value:

    switch_atom      Reg, [{X1,L1},{X2,L2},...,{Xn,Ln}]
    switch_integer   Reg, [{X1,L1},{X2,L2},...,{Xn,Ln}]

Switch_atom is used for multi-way switching on an atom value, and switch_integer is used for multi-way switching on an integer value. They test the value on the register Reg, and if it is equal to the value Xi, a branch to the instruction specified by the label Li occurs. Since the internal algorithm implementing these switching instructions is not defined in KL1-B, the translator to native code may choose the most suitable switching method.
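For illustration, a native-code translator could realize such a multi-way jump with a small binary search over sorted case values, as in the hedged C sketch below; the table layout and names are assumptions, and a real translator might equally choose a hash table or a jump table.

    #include <stddef.h>

    typedef void (*label_t)(void);      /* stand-in for a branch target */

    struct switch_case {
        long    value;                  /* Xi: an atom code or an integer */
        label_t target;                 /* Li: the label to branch to     */
    };

    /* Multi-way jump on an instantiated value: binary search over cases
       sorted by value; fall through to 'otherwise' if no case matches.   */
    label_t switch_value(long reg, const struct switch_case *cases,
                         size_t n, label_t otherwise)
    {
        size_t lo = 0, hi = n;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (cases[mid].value == reg)
                return cases[mid].target;
            if (cases[mid].value < reg)
                lo = mid + 1;
            else
                hi = mid;
        }
        return otherwise;
    }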
The current KL1-B instruction set was designed under several assumptions about KL1 programs. Thus, we have to investigate how correct our assumptions are and how effective our KL1-B instruction set is.

5 Conclusion

This paper discussed design and implementation issues of the KL1 language processor. The PIM architecture differs from the Multi-PSI architecture because of its loosely-coupled network, in which messages may overtake one another, and because of its cluster structure (i.e., its shared-memory multiprocessor portion). These differences greatly influence the KL1 language processor and are essential to a parallel and distributed implementation of the KL1 language. Several of the implementation issues focused on in this paper are more or less associated with these features, and our implementation is a solution to this situation.

ICOT has been working on these implementation issues intensively for the past four years, since 1988. In this paper, we began by making several assumptions and then tailored our implementation to them. The assumptions came from our experiences with the Multi-PSI system. Thus, we have to evaluate our implementation, accumulate experience with our system, and verify the appropriateness of the assumptions. We will then be able to reflect the results in the KL1 language processor of the next generation. In this development cycle, the systematic design concept is effective, and it yields high modularity of the language processor, which turns out to be easy to improve and highly testable. Our KL1 language processor is provided on the PIM systems (PIM/p, PIM/c, PIM/i, PIM/k), which are being demonstrated at FGCS'92.

Acknowledgment

We would like to thank all ICOT researchers and company researchers who have been involved in the implementation of the KL1 language so far, especially Dr. Atsuhiro Goto, Mr. Takayuki Nakagawa, and Mr. Masatoshi Sato. We also wish to thank the R&D members of Fujitsu Social Science Laboratory. Through their valuable contributions, we have achieved a practical KL1 language processor. Thanks also to Dr. Evan Tick of the University of Oregon for his great efforts in evaluating the parallel garbage collector with us. We would also like to thank Dr. Kazuhiro Fuchi, Director of ICOT Research Center, and Dr. Shunichi Uchida, Manager of the Research Department, ICOT, for giving us the opportunity to develop the KL1 language processor.

References

[Baker 1978] H. G. Baker. List Processing in Real Time on a Serial Computer. Communications of the ACM, 21(4), 1978, pp. 280-294.

[Bevan 1989] D. I. Bevan. Distributed Garbage Collection Using Reference Counting. Parallel Computing, 9(2), 1989, pp. 179-192.

[Chikayama et al. 1988] T. Chikayama, H. Sato and T. Miyazaki. Overview of the Parallel Inference Machine Operating System PIMOS. In Proc. of the Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, 1988, pp. 230-251.

[Chikayama and Kimura 1987] T. Chikayama and Y. Kimura. Multiple Reference Management in Flat GHC. In Proc. of the Fourth Int. Conf. on Logic Programming, 1987, pp. 276-293.

[Crammond 1988] J. A. Crammond. A Garbage Collection Algorithm for Shared Memory Parallel Processors. Int. Journal of Parallel Programming, 17(6), 1988, pp. 497-522.
[Goto et al. 1988] A. Goto, M. Sato, K. Nakajima, K. Taki and A. Matsumoto. Overview of the Parallel Inference Machine Architecture (PIM). In Proc. of the Int. Conf. on Fifth Generation Computer Systems, ICOT, Tokyo, 1988, pp. 208-229.

[Halstead 1985] R. H. Halstead Jr. Multilisp: A Language for Concurrent Symbolic Computation. ACM Transactions on Programming Languages and Systems, 7(4), 1985, pp. 501-538.

[Ichiyoshi et al. 1988] N. Ichiyoshi, K. Rokusawa, K. Nakajima and Y. Inamura. A New External Reference Management and Distributed Unification for KL1. New Generation Computing, Ohmsha Ltd., 1990, pp. 159-177.

[ICOT 1st Res. Lab. 1991] ICOT 1st Research Laboratory. Tutorial on VPIM Implementation. ICOT Technical Memorandum, TM-1044, 1991 (in Japanese).

[Imai et al. 1991] A. Imai, K. Hirata and K. Taki. PIM Architecture and Implementations. In Proc. of the Fourth Franco-Japanese Symposium, ICOT, Rennes, France, 1991.

[Imai and Tick 1991] A. Imai and E. Tick. Evaluation of Parallel Copying Garbage Collection on a Shared-Memory Multiprocessor. ICOT Technical Report, TR-650, 1991. (To appear in IEEE Transactions on Parallel and Distributed Systems)

[Inamura et al. 1988] Y. Inamura, N. Ichiyoshi, K. Rokusawa and K. Nakajima. Optimization Techniques Using the MRB and Their Evaluation on the Multi-PSI/V2. In Proc. of the North American Conf. on Logic Programming, 1989, pp. 907-921 (also ICOT Technical Report, TR-466, 1989).

[Kimura and Chikayama 1987] Y. Kimura and T. Chikayama. An Abstract KL1 Machine and its Instruction Set. In Proc. of the Symposium on Logic Programming, 1987, pp. 468-477.

[Nakagawa et al. 1989] T. Nakagawa, A. Goto and T. Chikayama. Slit-Check Feature to Speed Up Interprocessor Software Interruption Handling. In IPSJ SIG Reports, 89-ARC-77-3, 1989 (in Japanese).

[Nakajima et al. 1989] K. Nakajima, Y. Inamura, N. Ichiyoshi, K. Rokusawa and T. Chikayama. Distributed Implementation of KL1 on the Multi-PSI/V2. In Proc. of the Sixth Int. Conf. on Logic Programming, 1989, pp. 436-451.

[Nishida et al. 1990] K. Nishida, Y. Kimura, A. Matsumoto and A. Goto. Evaluation of MRB Garbage Collection on Parallel Logic Programming Architectures. In Proc. of the Seventh Int. Conf. on Logic Programming, 1990, pp. 83-95.

[Rokusawa et al. 1988] K. Rokusawa, N. Ichiyoshi, T. Chikayama and H. Nakashima. An Efficient Termination Detection and Abortion Algorithm for Distributed Processing Systems. In Proc. of the 1988 Int. Conf. on Parallel Processing, Vol. 1 Architecture, 1988, pp. 18-22.

[Rokusawa and Ichiyoshi 1992] K. Rokusawa and N. Ichiyoshi. A Scheme for State Change in a Distributed Environment Using Weighted Throw Counting. In Proc. of the Sixth Int. Parallel Processing Symposium, IEEE, 1992.

[Sato and Goto 1988] M. Sato and A. Goto. Evaluation of the KL1 Parallel System on a Shared Memory Multiprocessor. In Proc. of the IFIP Working Conf. on Parallel Processing, 1988, pp. 305-318.

[Takagi and Nakase 1991] T. Takagi and A. Nakase. Evaluation of VPIM: A Distributed KL1 Implementation - Focusing on Inter-cluster Operations -. In IPSJ SIG Reports, 91-ARC-89-27, 1991 (in Japanese).

[Taki 1992] K. Taki. Parallel Inference Machine PIM. In Proc. of the Int. Conf. on Fifth Generation Computer Systems, 1992.

[Tick 1991] E. Tick. Parallel Logic Programming. Logic Programming, MIT Press, 1991.

[Ueda and Chikayama 1990] K. Ueda and T. Chikayama. Design of the Kernel Language for the Parallel Inference Machine. The Computer Journal, 33(6), 1990, pp. 494-500.
[Warren 1983] D. H. D. Warren. An Abstract Prolog Instruction Set. Technical Note 309, Artificial Intelligence Center, SRI, 1983.

[Watson and Watson 1987] P. Watson and I. Watson. An Efficient Garbage Collection Scheme for Parallel Computer Architectures. In Proc. of Parallel Architectures and Languages Europe, LNCS 259, Vol. II, 1987, pp. 432-443.