Manual

User Manual:

Open the PDF directly: View PDF .
Page Count: 27

MetaC 1.0

David McAllester

September 11, 2018

Abstract

MetaC provides a read-eval-print loop (a REPL) and notebook inter-

active development environment (a NIDE) for C programming. MetaC

can also be described as a Lisp-inspired programming environment for C.

The REPL and the NIDE are capable of incrementally compiling and exe-

cuting C statements and incremental procedure deﬁnitions in a persistent

C process. This is done by compiling and loading dynamic load libraries.

MetaC also extends C with Lisp-like features. More speciﬁcally, MetaC

provides a C-like universal syntax, computed macros, backquote [1], and

pattern-matching. The implementation is bootstrapped — it is imple-

mented in itself. These C extensions are intended to support the de-

velopment of high level frameworks with notebook interfaces as direct

extensions of C. This is in contrast to scripting-C hybrids such as Matlab,

Numpy, TensorFlow or PyTorch.

The NIDE is implemented in Emacs piped to a MetaC REPL. Pre-

sumably notebook IDEs for C or C++ could also be implemented as

extensions of Atom or Visual Studio Code piped to a persistent process.

Dynamic linking is diﬃcult to implement robustly. MetaC release 1.0

runs under both Ubuntu 18.04.1 / gcc 7.3.0 and MAC OSX 10.11.4 / gcc

Apple LLVM version 7.3.0 (clang-703.0.31).

Acknowledgement: I would like to thank Bob Givan for his aid in exercising

and polishing MetaC and his continuing eﬀorts on MathZero.

Contents

1 Introduction and Overview 4

1.1 Installation .............................. 4

1.2 HelloWorld.............................. 4

1.3 C Statements and C Expressions . . . . . . . . . . . . . . . . . . 5

1.4 Deﬁnitions of Types and Procedures . . . . . . . . . . . . . . . . 6

1.5 The Global Variable Array Restriction . . . . . . . . . . . . . . . 7

1.6 The Single Symbol Type Restriction . . . . . . . . . . . . . . . . 7

1.7 Backquote and Universal Syntax . . . . . . . . . . . . . . . . . . 8

1.8 PatternMatching........................... 8

1.9 Gensym and Macro Deﬁnitions . . . . . . . . . . . . . . . . . . . 10

1.10TheNotebookIDE.......................... 11

2 Universal Syntax 13

2.1 Universal Expressions: cons-car-cdr . . . . . . . . . . . . . . . . . 13

2.2 Universal Expressions: Phantom Brackets . . . . . . . . . . . . . 14

2.3 The Reader Speciﬁcation . . . . . . . . . . . . . . . . . . . . . . 15

2.4 Backquote and Pattern Matching Revisited . . . . . . . . . . . . 19

3 Miscellaneous Features 20

3.1 Interning and Expression Properties . . . . . . . . . . . . . . . . 20

3.2 MemoryFrames............................ 21

3.3 UndoFrames ............................. 21

3.4 Bootstrapping and File Expansion . . . . . . . . . . . . . . . . . 22

3.5 MacroEﬀects............................. 23

3.6 Macro Preliminary Deﬁnitions . . . . . . . . . . . . . . . . . . . . 23

4 List of MetaC Primitives 23

4.1 Expressions .............................. 24

4.2 Properties............................... 24

4.3 Errors and Breakpoints . . . . . . . . . . . . . . . . . . . . . . . 25

4.4 Reading and Printing . . . . . . . . . . . . . . . . . . . . . . . . 25

4.5 MacroExpansion........................... 25

4.6 ListProcessing ............................ 26

4.7 MemoryFrames............................ 26

4.8 UndoFrames ............................. 26

4.9 Bootstrapping and File Expansion . . . . . . . . . . . . . . . . . 26

1 Introduction and Overview

1.1 Installation

To install the REPL:

1. Clone the MetaC repository in a local MetaC directory.

2. From a shell running in your MetaC directory type make MC.

3. From the same shell type MC to get the command line prompt MC>.

To install the NIDE:

1. Clone the repository into a repository directory.

2. From a shell running in your MetaC directory type make NIDE.

3. Add (load "<MetaC directory>/mc.el") to your .emacs ﬁle.

4. Edit the ﬁrst line of mc.el to give the path to your gdb executable (the

Gnu debugger).

5. Edit the second line of mc.el to give the path to your MetaC directory.

6. Start or restart Emacs. The ﬁle mc.el adds hooks such that it is safest to

restart Emacs rather than load em.el into a running Emacs.

The NIDE can be run on any ﬁle with extension .mc in any directory. To

experiment with the NIDE use Emacs to visit the ﬁle NIDE-examples.mc from

the MetaC repository. Type C-M-s (control-meta-s) to start (or restart) the

C kernel. Typing C-c C-c will run the cell containing the cursor and print the

result in a C comment at the bottom of the cell. This is substantially equivalent

to typing the cell contents into the REPL. More Emacs key bindings and other

features of the NIDE are described in section 1.10.

The following examples use the REPL. However, all of these examples can be

run as cells in the NIDE.

1.2 Hello World

bash$ make MC

...

bash$ MC

MC> ‘{hello world}

hello world

MC>

In the above example the backquote expression given to the REPL macro-

expands to a C expression which evaluates to the expression “hello world”.

Backquote is described in more detail below.

1.3 C Statements and C Expressions

We can also declare global variables and execute statements from the REPL.

MC> int x[10];

done

MC> for(int i = 0; i < 10; i++)x[i] = i;

done

MC> for(int i = 0; i < 10; i++)fprintf(stdout,"%d",x[i]);

0123456789done

MC>int_exp(x[5])

MC>

In C syntax one distinguishes C expressions from C statements. A C statement

is executed for eﬀect while a C expression is evaluated for value. The REPL

can be given either a statement or an expression. When given a C statement,

the REPL simply prints “done” after executing the statement. When given a C

expression the REPL computes and prints the value. The REPL assumes that

the value of a C expression is a universal syntax tree (also called an expression,

giving a second, and confusing, sense of the term “expression”). The procedure

int exp above converts an integer to a universal syntax expression. Universal

syntax expressions are abstract syntax trees but can viewed as representations

of strings.

If a C statement executes a return outside of any procedure then that value

is returned to the REPL and printed. For example, the above session can be

extended with the following.

MC>{int sum = 0; for(int i = 0; i < 10; i++)sum += x[i]; return int_exp(sum);}

MC>

1.4 Deﬁnitions of Types and Procedures

One can also deﬁne new data types and procedures from the REPL.

MC>typedef struct myexpstruct{

char * label;

struct myexpstruct * car;

struct myexpstruct * cdr;} myexpstruct, *myexp;

done

MC> myexp mycons(char*s,myexp x,myexp y){

myexp cell=malloc(sizeof(myexpstruct));

cell->label=s;

cell->car=x;

cell->cdr=y;

return cell;}

done

MC> expptr myexp_exp(myexp x){

if(x==NULL)return string_atom("nil");

return

‘{${string_atom(x->label)}

${myexp_exp(x->car)}

${myexp_exp(x->cdr)}};

}

done

MC> myexp_exp(mycons("foo",mycons("bar",NULL,NULL),NULL))

foo bar nil nil nil

MC>

Procedures can also be redeﬁned at the REPL provided that the signature (argu-

ment types and return type) remains the same. Changing a procedure signature

requires restarting the MetaC REPL. This restricting ensures meaningful C type

checking in the presence of dynamic linking.

1.5 The Global Variable Array Restriction

To simplify dynamaic linking, all global data variables must be arrays. One can

always use single element arrays to represent non-array variables. To make this

more convenient we allow single element arrays to be declared with an initial

value in a manner similar to non-array data variables. For example, we can

declare and assign a variable yas follows.

MC> int y[0] = 2;

done

MC> y[0] += 1;

done

MC> int_exp(y[0])

MC>

Here assignments to y[0] are allowed but assignments to yare not — assignment

to array variables are not allowed in C. As noted above, this restriction greatly

simpliﬁes dynamic linking of data variables.

1.6 The Single Symbol Type Restriction

To simplify the implementation of MetaC all type expressions appearing in

procedure signatures and global array declarations must be single symbols. Ar-

bitrary types can be given single symbol names using typedef.

1.7 Backquote and Universal Syntax

We now consider backquote and universal syntax in more detail. Backquote can

be used in a manner analogous to string formatting.

MC> expptr friend[0] = ‘{Bob Givan};

done

MC> int height[0] = 6;

done

MC> ‘{My friend ${friend[0]} is ${int_exp(height[0])} feet tall.}

My friend Bob Givan is 6 feet tall.

MC>

While universal syntax expressions can be viewed as representations of strings,

universal syntax expressions are actually abstract syntax trees. The tree struc-

ture plays an important role in pattern matching. The tree structure is more

apparent in the following example.

MC> expptr e[0] = ‘{a+b};

done

MC> ‘{bar(${e[0]})}

bar(a+b)

MC>

As described in the next section, the pattern foo($x) will match the expression

foo(a+b) with xbound to the expression (tree) a+b.

1.8 Pattern Matching

MetaC provides a pattern matching case construct ucase (universal case). The

general form of the case construct is

ucase{e;

{<pattern1>}:{<body1>}

...

{<patternn>}:{<bodyn>}}

Variables are marked in patterns by the preﬁx $.

MC>int numeralp(expptr x)

{if(!atomp(x))return 0;

char*s=atom_string(x);

for(int i=0;s[i]!=’\0’;i++){if(s[i]<’0’||s[i]>’9’)return 0;}

return 1;

}

done

MC> int value(expptr e)

{ucase

{e;

{$x+$y}:{return value(x)+value(y);}

{$x*$y}:{return value(x)*value(y);}

{($x)}:{return value(x);}

{$z}.(numeralp(z)):{return atoi(atom_string(z));}

}

return 0; //this dead code convinces the compiler there is a return value.

}

done

MC>int_exp(value(‘{5+2*10}))

MC>int_exp(value(‘{foo}))

match error: the value

foo

does not match any of

{$z}.(numeralp(z)){($x)}{$x*$y}{$x+$y}

MC>

In many situations the choice of the particular tree structure imposed by the

MetaC reader is not important as long as the same conventions are used in

both the patterns and the expressions they are matching. For example, the

expression a+b+c will match the pattern $x+$y+$zindependent of whether + is

left associative or right associative. But the tree structure does matter in other

cases. The pattern $x*$ywill not match the expression a+b*c because the

reader brackets a+b*c as ha+hb∗cii. The pattern $x*$ydoes match (a+b)*c.

The variable $any is special. It acts as a wild card and is not bound to a value.

When a ucase is executed the patterns are tried sequentially and stops after the

ﬁrst match. If no pattern matches an error is generated. One can always add a

ﬁnal pattern of {$any}to provide a default catch-all case if that is desired.

In release 1.0 repeated variables, such as in the pattern {$x + $x}, are not

supported (and not checked for). If a variable is repeated it will be matched at

its last occurrences under depth-ﬁrst left to right traversal.

1.9 Gensym and Macro Deﬁnitions

MetaC supports computed macros. Pattern matching and backquote greatly

facilitate the writing of computed macros. Backquote and ucase are both im-

plemented as computed macros in the bootstrapped implementation. As a very

simple example we can deﬁne a dolist macro at the REPL.

MC>umacro{dolist($x,$L){$body}}{

expptr rest=gensym("rest");

return

‘{for(expptr $rest = $L; cellp($rest); $rest = cdr($rest);){

expptr $x=car($rest);

$body}};

}

done

MC>macroexpand(‘{dolist(item,list){f(item);}})

for

(expptr _genrest33=list;

cellp(_genrest33);

_genrest33=cdr(_genrest33);

){expptr item=car(_genrest33);f(item);}

MC>

The general form of a macro deﬁnition is

umacro{<pattern>}{<body>}

where instances of <pattern> in MetaC code are to be replaced by the value

returned by <body> under the variable bindings determined by the match. The

pattern is restricted so as to have an identiﬁable head symbol to which the

macro is attached. The procedure macroexpand(exptr e) takes a universal

syntax expression and returns the result of repeatedly expanding outermost

macros until no further macro expansion is possible.

The macro expansion of the above deﬁnition of dolist deﬁnes a procedure to

compute the macro expansion and installs that procedure on a macro property

of the symbol dolist. A typical macro pattern is a symbol followed by one or

more parenthesis expressions in which case the macro is attached to the initial

symbol. The pattern can also be a binary connective expressions such as x. >y.

In this case the macro is attached to the connective.

1.10 The Notebook IDE

Starting the kernel with the key C-M-s creates a running C process. When a cell

is executed with the command C-c C-c the text of the cell is (invisibly) passed

to Cprocess and a value returned y the process is printed inside a comment

below the cell. If a comment already occurs directly below the cell this comment

is replaced by a new comment containing the response of the REPL. As in a

Jupyter notebook, the values returned from the REPL are numbered so that

one can see when the cell was executed relative to other cell execution in the

notebook. If C-M-s is run when a kernel is already running the kernel process

is killed and a new one started (as in a Jupyter notebook). When the kernel is

restarted the execution numbering is reset to start with 1. Changing a procedure

signature (argument and return types) requires restarting the kernel.

Cell Boundaries. A cell begins at any nonempty line whose ﬁrst character

is other than white space (space or tab), a close character ’)’, ’}’, or ’]’, or a

slash ’/’. The cell ends at the begging of the next non-empty line whose ﬁrst

character is other than white space or a a close character. Note that a comment

starting at the beginning of a line terminates a cell.

Compile-Time Errors. If an error occurs while attempting to read or compile

the cell expression an error statement is displayed — typically the output of the

C compiler — and an error message is printed in the code buﬀer as the value of

cell.

Run Time Errors and Breakpoints. The MetaC procedure berror(char

*) enters the gdb debugger. Typing the continue command to gdb will return to

the NIDE (or REPL) with the C process intact. MetaC primitives call berror

when safety checks fail. The MetaC procedure breakpt(char *) can be used

for breakpoints from which execution can be continued. When this procedure is

called gdb will be entered. Giving the continue command to gdb will continue

from the breakpoint.

Segmentation Faults. Segmentation faults, and other types of errors, are

trapped by gdb directly. In this case the NIDE will enter the gdb debugger.

However, in this case the normal gdb continue command will not return to the

code buﬀer. To recover one types print return to NIDE() to gdb which will

typically return to the code buﬀer with the persistent C process intact. When

memory corruption is a possibility it is safest to restart the C process.

Printing. The procedure mcprint is like fprintf but omitting the ﬁle argu-

ment. The expression mcprint("x = %s\n",s) will print to the *Messages*

buﬀer in Emacs. The expression mcpprint(e)will pretty-print the expression

eto the *Messages* buﬀer.

Key Bindings. mc.el provides the following key bindings.

C-M-s starts or restarts the persistent C process.

C-c C-c executes the cell containing the cursor.

C-c C-r executes all cells in the current region sequentially with a diﬀerent execu-

tion number for each cell execution.

C-M-a moves to the begging of the current cell.

C-M-p moves to the beginning of the preceding cell.

C-M-n moves to the beginning of the next cell.

C-M-c removes all cell values. This is useful before a git commit as it can reduce

or eliminate diﬀerences in ﬁles between commits.

Include Statements. A cell in a code ﬁle containing #include(<filename>)

(with no semicolon) will load the cells of the named ﬁle into the persistent C

process. The extension ”.mc” will be automatically added and should not be

explicitly given.

2 Universal Syntax

It is not obvious how to implement light weight expression quotation and ex-

pression pattern matching for C expressions. The syntax of C is complex. We

bypass this complexity by introducing a syntax with the following three “uni-

versal” properties.

1. Universal syntax trees are semantics-free. They are simply trees viewed

as representations of character strings.

2. The universal syntax reader can read any parenthesis-balanced charac-

ter string. Here parenthesis-balanced means that every open parenthesis,

brace or bracket has a matching closing character and that string quota-

tions are properly closed.

3. The printer inverts the reader. If we read a string sinto a universal syntax

expression eand then print eback into a string s0we have that sand s0

are equivalent in a strong sense. Here we want sand s0to be “whitespace

equivalent” so as to be treated equivalently by the C lexer.

The emphasis here is on the representation of strings. The reader does not

always invert the printer — the expression hhone twoithreeiprints as one

two three which reads as hone htwo threeii. But the represented string is

preserved. This is fundamentally diﬀerent from most programming languages

supporting symbolic computation (such as Lisp) where it is assumed that the

tree structure, rather than the represented string, is fundamental and hence

that the reader should invert the printer.

The above universality properties allow one to assign semantics to universal

syntax expressions based on the strings that they represent. We can assign C

semantics to an expression by printing the expression and passing the resulting

string to a C compiler. This string-based semantics is not always compositional

with respect to the universal syntax tree structure. However, the tree structure

imposed by the reader is designed to approximate the compositional structure

of C syntax. In most cases pattern matching on universal syntax expressions

recovers substructure that is semantically compositional under C semantics.

Parentheses and semicolons can be helpful in aligning universal syntax trees

with C semantics. For languages and frameworks implemented as macro pack-

ages in MetaC one can guarantee that the universal syntax tree structure has

compositional semantics.

2.1 Universal Expressions: cons-car-cdr

A universal syntax expression is either an atom (a wrapper around a string), a

pair he1e2iwhere e1and e2are expressions, or a “parenthesis expression” the

form (e), {e}, or [e] where eis an expression. All three of these datatypes have

the same C type expptr (Expression Pointer). For each of the types atom, pair

and parenthesis expression there is a constructor procedure, a predicate, and

accessor functions as follows.

expptr string_atom(char *);

int atomp(expptr);

char * atom_string(expptr); //the argument must be an atom.

expptr cons(expptr,expptr);

int cellp(expptr);

expptr car(expptr); //the argument must be a cell. returns the first component.

expptr cdr(expptr); //the argument must be a cell. returns the second component.

expptr intern_paren(char,expptr); // the char must be one of ’(’, ’{’ or ’[’

int parenp(expptr);

expptr paren_inside(expptr);

2.2 Universal Expressions: Phantom Brackets

The MetaC reader maps a character string to a universal syntax expression. A

speciﬁcation of the reader is given in section 2.3. The expression produced by

the reader can be represented by “phantom brackets” around the given string

where there is a pair of brackets for each cons cell. For example the expression

{one two three}reads as {hone htwo threeii}. Examples of strings and the

phantom bracking imposed by the reading those strings is given in ﬁgure 1.

The printer simply removes the phantom brackets and prints the string that the

expression represents.

To reduce the clutter of brackets we will adopt two conventions for supress-

ing brackets when exhibiting phantom brackets. The ﬁrst convention is the

Lisp right-branching convention of not showing all the cell brackets for right-

associative (right-branching) sequences. For example hzero hone htwo threeiii.

will be written as hzero one two threei. Note that these “lists” are atom-

terminated rather than the Lisp convention of nil termination.

The second convention is that we will sometimes write hha1oia2iwhere ois a

connective atom as the left-branching structure ha1o a2i. For example, the ex-

pression {a+b}reads as {hha+ibi}which is then abbreviated as {ha+bi}.

Left-branching for binary connectives gives a C-consistent treatment of semi-

colon as a binary connective while also supporting the interpretation of semi-

colon as a statement terminator. This is discussed in more detail below.

It will generally be clear from context whether we using the Lisp right-branching

list convention or the left-branching connective convention. A set of examples

Hello World ⇒hHello Worldi

one two three ⇒hone htwo threeii

=hone two threei

x+y⇒hhx+iyi

=hx+yi

x+y∗z⇒hx+hy∗zii

(x+y)∗z⇒h(hx+yi)∗zi

foo(int x)⇒hfoo (hint xi)i

foo(int x,float y)⇒hfoo (hhint xi,hfloat yii)i

Figure 1: Examples of Reader Bracketings. Bracketings are shown for the

expression that results form reading the given strings. A complete bracketing

shows a pair of brackets for every expression pair (cons cell). The second and

third example show two somewhat informal conventions for dropping some of

the brackets — general sequences are assumed to be right-associative and binary

connective applications are assumed to be left-associative. These conventions

are used in other examples. Expressions are printed without the brackets — the

brackets are “phantoms” that show the tree structure. The bracketing (parsing)

done by the reader is speciﬁed formally in section 2.3. Since the bracketing does

not aﬀect the represented string, the precise bracketing is often unimportant.

of strings and the phantom brackets generated by the reader is given in ﬁgure 1.

To emphasize the signiﬁcance of left-branching binary connectives we note that

the MetaC reader bracketing of

{e1;e2;e3;e4}

can be written using either the left-braching binary connective convention as

{he1;he2;he3;e4iii}

or the right-braching sequence convention as

{hhe1;i he2;i he3;ie4i},

both of which abbreviate the same full bracketing

{hhe1;i hhe2;i hhe3;ie4iii}.

2.3 The Reader Speciﬁcation

The MetaC reader can be described in three phases — preprocessing, lexical-

ization, and parsing.

The MetaC preprocessor replaces each C-style comment with a space char-

acter. When processing an entire ﬁle, as is done in the MetaC procedure

file expressions described in section 3.4, the preprocessor divides the ﬁle

in cells using the same conventions as is used in the NIDE. A ﬁle must be

parenthesis-balanced within each cell.

Lexicalization segments a pre-processed character sting into to a sequence of

atoms. The MetaC lexer preserves all non-white characters. For the MetaC

lexer each atom is one of the following.

•Symbols. A symbol is a character string consisting of only alpha-numeric

characters — upper and lower case letters of the alphabet, plus the decimal

numerals, plus underbar. For example foo bar1.

•Connectives. A connective is a character string consisting of only “connec-

tive characters” — the connective characters consist of all non-alphanumeric,

non-white, non-parenthesis, characters other than the three special char-

acters $,‘and \.

•Quoted strings. A character string starting and ending with the same

string quotation character. For example "foo bar","!:&?;" or ’a’.

•Special character atoms. These are atoms whose strings are one character

long where that character is one of the special characters ‘,$, and \.

Two strings will be called whitespace-equivalent if they lexicalize to the same

sequence of atoms.

The reader is speciﬁed by a the grammar shown in ﬁgure 2. To simplify the

speciﬁcation of the reader we enclose the given string in parentheses so that,

without loss of generality, the input is assumed to be of the form {s}where sis

the lexical sequence to be read. The grammar has a nonterminal Pfor paren-

thesis expression which we take as the top level nonterminal. The nonterminals

SYM and QUOTE range over alpha-numeric string atoms and quoted string

atoms respectively. For each positive integer pwe have a nonterminal CONNp

ranging over connective atoms of precedence p. We have a nonterminal Epfor

expressions formed with connectives of precedence p. We also has a nonterminal

E∞for expressions that are formed at higher precedence than any connective

expression. The precedence levels are divided into a set Lof left-associative

precedence levels and a set Rof right associative precedence levels. There is a

special precedence level 3 for combining expressions with the “null connective”.

The null connective is left-associative. The null connective is higher precedence

that semicolon and comma but lower precedence than all other connectives. The

nonterminal E∞includes an epsilon production allowing null arguments.

The nonterminal Sranges over symbols or variables where VAR ranges over

variables. Variables are handles specially in pattern matching (ucase) and pat-

P::= (Ep)| {Ep} | [Ep]

Ep::= hhE`CONNpiErip∈ R, ` > p, r ≥p

Ep::= hhE`CONNpiErip∈ L, ` ≥p, r > p

E3::= hE`Eri` > 3, r ≥3

E∞::= QUOTE |hS P ∗i|h‘Pi|P||JUNK

S::= SYM |VAR

VAR ::= h$ SYMi|h$Pi|h\VARi

P∗::= |hP P ∗i

JUNK ::= ‘ |SJUNK

SJUNK ::= $ |\|h\SJUNKi

Figure 2: The grammar deﬁning the MetaC reader. The input string is

assumed to be of the form (s) so that endpoints are explicitly labeled. The non-

terminal SYM generates non-empty alpha-numeric strings; CONNpgenerates

non-empty connective strings of precedence p; and QUOTE generates string

quotations. Ris a set of right-associative precedence levels and Lis a set of

left-associative levels. Only the largest ﬁnite precedence level is left-associative.

MetaC classiﬁes all printable ASCII characters as being either alpha-numeric,

connective, string quotations, parenthesis characters or one of the three special

characters ‘, $ or \. The reader will accept any parenthesis-balanced string. 

denotes the empty string. Generated cells of the form hw ior h wiare re-

placed by w. Although the grammar is ambiguous, the reader is deterministic

as speciﬁed by the constraints that E∞expressions must be maximal and that

the use of the production for E∞must be minimized. See the text for details.

tern instantiation (backquote). Expression of the form hS P ∗iinclude foo,

foo(x),foo(int x){return x;},$x,$f(x), and ${f(x)}.

The above grammar is ambiguous. For example the string {foo (a) (b)}can

be parsed as either the left-branching structure {hhSYM P∗iPi}or the right-

branching structure {hSYM P∗i}. The reader is of course deterministic — {foo

(a) (b)}is read as right-branching. While a deterministic grammar for the

reader can be given, it is simpler to specify a deterministic parsing process

for the above ambiguous grammar. The implementation runs a deterministic

left-to-right shift-reduce process. However, the parser is easier to specify as

operating in global stages. In the ﬁrst stage one identiﬁes all maximal E∞

substrings. The maximal requirement implies that {foo (a) (b)}is parsed

as the single (right-branching) E∞expression hfoo (a) (b)i. The maximil

requirement also implies that the string {$foo (a) (b)}is read as the single

E∞expression hh$fooi(a) (b)i.

After identifying maximal E∞expressions we are left with a sequence of argu-

ments (the E∞expressions) and connectives. However, it is possible that this

sequence contains multiple consecutive connectives or multiple consecutive ar-

guments. We then place an argument between any two consecutive connectives

using the production for E∞and also add an term at the beginning of the

sequence if the sequence starts with a connective and an epsilon argument at

the end if the sequence ends in a connective. We now have a sequence of ar-

guments and connectives starting and ending with arguments and where there

are no two consecutive connectives. Next set kto be the largest precedence of

any connective and form all the Eksubstrings. We then repeatedly decrement

kand identify all the Eksubstrings until we have identiﬁes all Epsubstrings

for all p. At this point the entire string must be parsed as Epwhere pis the

smallest precedence of the connectives where we think of adjacent arguments as

having an implicit precedence 4 connetive between them.

This process can parse any parenthesis-balanced string.

The above speciﬁcation of the reader can result in cells of the form he ior h ei.

The implementation avoids producing such cells and the reader can be viewed

as replacing he ior h eiby eso that all cells are of the form he1e2iwhere e1

and e2are expressions representing non-empty strings. An empty parenthesis

string () is read as a parenthesis expression containing an atom for the empty

string.

Grouped by precedence, the binary connective characters are

{;} {,} {|} {} {&,?,!} {=∼,<,>} {+,-} {*,/} {%, ˆ, .} {:,@,#}.

The precedence of a binary connective is determined by its ﬁrst character. The

characters have low to high (outer to inner) precedence in the order given with

symbols in the same group having the same precedence. Here represents the

null connective at precedence level 4 The precedence can be summarized in

outer to inner order as semicolon, comma, bar, the null connective, Boolean

connectives, binary predicates, three levels of binary functions, and innermost

connectives which can be viewed as providing a way of building structured

atoms. This universal syntax precedence conventions has various divergences

with C syntax. However, when writing a macro package in MetaC one can

simply adopt the MetaC precedence conventions for the source code and use

parentheses where necessary in the generated C code.

2.4 Backquote and Pattern Matching Revisited

The semantics of backquote is deﬁned in terms of cons-car-cdr view of expres-

sions independent of the MetaC reader. In the typical case, the C value of a

backquote expression h‘{e}iis the expression (tree) ewith subexpressions (sub-

trees) of the form h$xi, where xis a symbol, replaced by the C value of xand

subexpressions (subtrees) of the form h${w}ireplaced by the expression which

is the C value of the string represented by the expression w.

Unfortunately this evaluation rule for backquote expressions is incomplete. One

of the most confusing situations is where the expansion of a macro contains a

backquote. Writing such a macro typically involves nested backquotes. While

nested backquotes are confusing, and should be avoided when possible, MetaC

supports nested backquotes. We consider a series of backquote expressions each

of which evaluates to the previous one.

First we have

‘{a+b+ ${z}} (1)

If the value of variable zis the expression cthen the value of expression (1) is

the expression a+b+c. The symbol $can be included in the value of a backquote

expression by quoting it. This gives our second expression.

‘{‘{a+ ${y}+\${z}}} (2)

If the value of variable yis the expression bthen the value expression of (2) is

expression (1). We can even have multiple layers of quotation as in the following.

‘{‘{‘{${x}+\${y}+\\${z}}}} (3)

If the value of variable xis the expression athen the value of expression (3) is

expression (2).

As with backquote, pattern matching is deﬁned in terms of the cons-car-cdr

view of expressions independent of the MetaC reader. We deﬁne a substitution

to be a mapping from symbol atoms to expressions. For a substitution σand

a pattern {e}we deﬁne the expressions σ(e) to be the result of replacing each

subexpression (subtree) of eof the form h$xiwhere xis a symbol atom with

σ(x). A pattern expression (tree) {e}matches an expression (tree) wwith

substitution σif σ(e) = w. An exception to this is the case where a variable

occurs multiple times in the pattern. In release 1.0 multiple occurrences of a

variable in a pattern is not supported.

3 Miscellaneous Features

3.1 Interning and Expression Properties

In MetaC expressions are interned and have property lists. Interning means that

two expressions which have the same tree structure and sequence of leaves —

that are “equal” in Lisp terminology – represented by the same data structure

with the same memory address — are “eq”. Hence we have

MC> int_exp(‘{foo(a)} == ‘{foo(a)})

Interned expressions with properties provide the functionality of dictionary data

structures such as C++ hashmaps or Python dictionaries. Interning also facili-

tates the implementation of forward chaining inference algorithms — we do not

want to waste noticing the same consequence over and over again. Interned

expressions with property lists can be viewed as a form of no-SQL database. In-

terning also allows directed acyclic graphs (dags) to be represented compactly as

expressions. This is very convenient for the representation of machine-generated

justiﬁcations (proofs).

The procedure setprop(expptr exp, expptr prop, void * value) sets the

given property of the given expression to the given value. The procedure

getprop(expptr exp, expptr prop, void * default) the given property of

the given value or the default value if no property value has been assigned.

MetaC also provides the procedures setprop int(expptr e, expptr p, int

n) and getprop int(expptr e, expptr p, int default). It is not diﬃcult

to deﬁne polymorphic macros for setting and extracting properties where the

type of the property is passed as an argument. It is also not diﬃcult to as-

sign types to particular properties so that the type argument is not needed in

polymorphic property setting and retrieving. However, these features are not

present in release 1.0.

3.2 Memory Frames

A memory frame is analogous to a stack frame but is independent of the C stack.

The procedure push memory frame() pushes a memory frame. The procedure

stack alloc(int nbytes) returns a pointer to a block of the given size allo-

cated from the current memory frame. The procedure pop memory frame()

pops the current memory-management frame by resetting a free pointer (a

“stack pointer”) so as to free all allocation done in that frame. MetaC pre-

allocates the heap (stack space) used for memory frames.

3.3 Undo Frames

Undo frames are similar to memory frames but with the added feature of undoing

eﬀects tied to the frame. The procedure push undo frame() enters a new undo

frame. Memory is allocated from the current undo frame with the procedure

undo alloc(int nbytes). Eﬀect are tied to the current undo frame with the

C preprocessor macro undo set(void * loc, void * val). On exit from an

undo frame all memory locations assigned by undo set are restored to the values

they had before entry to the frame.

In MetaC all expression allocation and property assignments are tied to the

current undo frame. On exit from the undo frame the database represented by

expression properties is restored to the state that existing on frame entry.

The procedure exp from undo(expptr exp) pops the current undo frame copyng

the given expression into the parent frame and returning the parent-frame copy

as a value. This is used for returning the “value” or “result” of a large compu-

tation while freeing the allocation and undoing the eﬀects of the computation.

This can be implemented as follows.

exptr exp_from_undo_frame(expptr e){

push_memory_frame();

expptr temp = copy_exp_to_stack(e);

pop_undo_frame();

expptr result = copy_exp_from_stack(e);

pop_memory_frame();

return result;

}

The procedure clean undo frame(expptr e) is similar to exp from undo frame

but replaces the current undo frame with a fresh one into which the given ex-

pression is placed. In this case there is no eﬀect on the parent frame.

exptr clean_undo_frame(expptr e){

push_memory_frame();

expptr temp = copy_exp_to_stack();

pop_undo_frame();

push_undo_frame();

result = copy_exp_from_stack(temp);

pop_momory_frame();

return result;

}

The procedure clean undo frame(expptr e) can be used for garbage collection

as in (install live stuff(clean undo frame(compute live stuff()))).

The procedures exp from undo frame and clean undo frame both run in time

proportional to the dag size of the given expression. The dag size can be expo-

nentially smaller than the tree size.

3.4 Bootstrapping and File Expansion

MetaC is bootstrapped. The makeﬁle for MetaC includes the following.

mcB.c : expandA mcB.mc

./expandA mcB.mc mcB.c

The executable expandA is implemented entirely in Cbut implements the back-

quote macro. The executable expandA expands backquote expressions appearing

in the input ﬁle. The ﬁle mcB.mc implements ucase using backquote and the

excutable expandB expands both backquote expressions and ucase statements.

The makeﬁle also includes.

mcC.c : expandB mcC.mc

./expandB mcC.mc mcC.c

Here the ﬁle mcC.mc implements additional macros using both backquote and

ucase. In the MetaC makeﬁle this is continued up to mcE.mc and expandE.

MetaC provides the procedure mcexpand(char * f1, char * f2) which is

used in the code for the expand commands. This procedure macro-expands

the ﬁle f1 and writes the result to ﬁle f2. This macro-expansion is done using

whatever procedures and operators currently have macro expansion procedure

pointers on their property lists, i.e., the expansion is done using whatever macros

are deﬁned in the current state.

The procedure mcexpand(char * f1, char * f2) is implemented using the

MetaC procedure file expressions(char * f). This procedure takes a ﬁle

name and returns a list of the expressions contained in the cells of that ﬁle.

3.5 Macro Eﬀects

Macros expansion can have eﬀects. The macro umacro expands to a procedure

deﬁnition of the macro expansion code but also generates a statement which

installs that procedure pointer in the macro property of the head symbol. During

macro expansion the MetaC primitive add init form(expptr statement) is

called with a statement that installs the procedure pointer on the appropriate

property list.

In ﬁle expansion the initialization forms generated during macro expansion are

incorporated into an initialization procedure. The macro init fun(<fname>)

macro expands to a deﬁnition of the given procedure name with no arguments

but with the sequence of initialization forms in its body. In the REPL and IDE

the initialization forms are run before the macro expansion of the input or cell

is run.

3.6 Macro Preliminary Deﬁnitions

Macro expansion can also generate preliminary deﬁnitions. It is possible to

implement syntactic closures (lambda) as a macro. A closure consists of a pro-

cedure pointer together with values for the free variables in the body of the

lambda expression. The lambda macro must ﬁrst create the procedure that ex-

ecutes the body of the lambda expression and then incorporate that procedure

pointer into the expansion of lambda macro. The construction of the procedure

deﬁnition is a “preamble” to the macro expansion. MetaC provides the primi-

tive add preamble(expptr definition) for creating preambles during macro

expansion. In ﬁle expansion these preamble deﬁnitions are inserted into the

output ﬁle before the expression generated by the macro. In the REPL and

NIDE these preamble deﬁnitions are installed along with the expression created

by the macro. MetaC 1.0 supports preambles but does not provide a macro for

lambda expressions.

4 List of MetaC Primitives

We now give a list of the MetaC primitives and pre-installed type deﬁnitions.

The macros backquote, ucase, and umacro expand to code built on these prim-

itves. We start with type deﬁnitions needed for the single atom type restriction

needed for the limiting type parsing abilities of MetaC.

typedef char * charptr;

typedef void * voidptr;

typedef FILE * FILEptr;

typedef struct expstruct{...} * expptr;

4.1 Expressions

The macros backquote and ucase expand into C code using these procedures.

expptr string_atom(charptr s);

int atomp(expptr e);

charptr atom_string(expptr a);

expptr cons(expptr x, expptr y);

int cellp(expptr e);

expptr car(expptr x);

expptr cdr(expptr x);

expptr intern_paren(char openchar, expptr arg); \\ openchar must be one of ’(’, ’{’ or ’[’.

int parenp(expptr e);

expptr paren_inside(expptr e);

char constructor(expptr e); //used for paren expressions.

We also have integer-expression conversions. The REPL and NIDE require that

inputs and cells respectively are expression-valued.

expptr int_exp(int i);

int exp_int(expptr s);

4.2 Properties

The macro umacro expands to code that sets the macro property of some atom

to a procedure pointer.

void setprop(expptr e, expptr key, voidptr val);

expptr getprop(expptr e, expptr key, expptr defaultval);

expptr gensym(charptr s);

4.3 Errors and Breakpoints

void berror(charptr s);

void breakpt(charptr s);

void return_to_NIDE();

4.4 Reading and Printing

The procedure file expressions takes a ﬁle name and returns a list of the

expressions (in universal syntax) contained in the cells of the ﬁle. The last two

print to the emacs *Messages* buﬀer in the NIDE. The last expressions is a

macro.

expptr file_expressions(charptr name);

void pprint(expptr e, FILEptr f, int indent); //indent is typically 0

void mcpprint(epptr e);

void mcprint(...)

4.5 Macro Expansion

expptr macroexpand(expptr e);

expptr macroexpand1(expptr e); //this result may contain unexpanded macros

void add_init_form(expptr statement);

void add_preamble(expptr aux_definition);

4.6 List Processing

expptr append(expptr l1, expptr l2);

expptr reverse(expptr l);

typedef expptr exp_to_exp(expptr);

typedef void exp_to_void(expptr);

expptr mapcar(exp_to_exp f, expptr l);

void mapc(exp_to_void f, expptr l);

int length(expptr l);

4.7 Memory Frames

push_memory_frame();

voidptr stack_alloc(int nbytes);

pop_memory_frame();

4.8 Undo Frames

void push_undo_frame();

voidptr undo_alloc(int nbytes);

void undo_set(voidptr loc, voidptr val);

void exp_from_undo_frame(expptr e);

void clean_undo_frame(expptr e);

4.9 Bootstrapping and File Expansion

file_expressions(char * fname);

mcexpand(char * fname1, char * fname2);

init_fun(fname)

References

[1] A. Bawden. Quasiquotation in lisp. In Proceedings of the 1999 ACM SIG-

PLAN Workshop on Partial Evaluation and Semantics-Based Program Ma-

nipulation, San Antonio, Texas, USA, January 22-23, 1999. Technical report

BRICS-NS-99-1, pages 4–12, 1999.

Manual

Navigation menu

Versions of this User Manual:

Views

Navigation