Manual
User Manual:
Open the PDF directly: View PDF
.
Page Count: 22
MetaC 1.0
David McAllester
September 8, 2018
Abstract
MetaC provides a read-eval-print loop (a REPL) and notebook inter-
active development environment (a NIDE) for C programming. MetaC
can also be described as a Lisp-inspired programming environment for C.
The REPL and the NIDE are capable of incrementally evaluating (com-
piling and running) C statements and incremental procedure definitions
in a persistent C process. This is implemented by incrementally compiling
dynamic load libraries which are then linked into the persistent process.
MetaC also extends C with Lisp-like features. More specifically, MetaC
provides a C-like universal syntax, computed macros, backquote [1], and
pattern-matching. The implementation is bootstrapped — it is imple-
mented in itself. These C extensions are intended to support the de-
velopment of high level frameworks with notebook interfaces as direct
extensions of C. This is in contrast to scripting-C hybrids such as Matlab,
Numpy, TensorFlow or PyTorch.
The NIDE is implemented in Emacs piped to a MetaC REPL. Pre-
sumably notebook IDEs for C or C++ could also be implemented as
extensions of Atom or Visual Studio Code piped to a persistent process.
Dynamic linking is difficult to implement robustly. MetaC release 1.0
runs under both Ubuntu 18.04.1 / gcc 7.3.0 and MAC OSX 10.11.4 / gcc
Apple LLVM version 7.3.0 (clang-703.0.31).
Acknowledgement: I would like to thank Bob Givan for his aid in exercising
and polishing MetaC and his continuing efforts on MathZero.
1
Contents
1 Introduction and Overview 3
1.1 Installation .............................. 3
1.2 HelloWorld.............................. 3
1.3 C Statements, C expressions, and Universal Syntax . . . . . . . . 4
1.4 Definitions of Types and Procedures . . . . . . . . . . . . . . . . 5
1.5 The Global Variable Array Restriction . . . . . . . . . . . . . . . 6
1.6 The Single Symbol Type Restriction . . . . . . . . . . . . . . . . 6
1.7 Backquote and Universal Syntax Expressions . . . . . . . . . . . 7
1.8 PatternMatching........................... 7
1.9 Gensym and Macro Definitions . . . . . . . . . . . . . . . . . . . 9
1.10TheNotebookIDE.......................... 10
2 Universal Syntax 11
2.1 Universal Expressions: cons-car-cdr . . . . . . . . . . . . . . . . . 12
2.2 Universal Expressions: Phantom Brackets . . . . . . . . . . . . . 13
2.3 The Reader Specification . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Backquote and Pattern Matching Revisited . . . . . . . . . . . . 18
3 Expression Interning, Expression Properties and Undo Frames 19
4 The MetaC Primitives 20
2
1 Introduction and Overview
1.1 Installation
To install the REPL:
1. Clone the MetaC repository in a local MetaC directory.
2. From a shell running in your MetaC directory type make MC.
3. From the same shell type MC to get the command line prompt MC>.
To install the NIDE:
1. Clone the repository into a repository directory.
2. From a shell running in your MetaC directory type make NIDE.
3. Add (load "<MetaC directory>/mc.el") to your .emacs file.
4. Edit the first line of mc.el to give the path to your gdb executable (the
Gnu debugger).
5. Edit the second line of mc.el to give the path to your MetaC directory.
6. Start or restart Emacs. The file mc.el adds hooks such that it is safest to
restart Emacs rather than load em.el into a running Emacs.
The NIDE can be run on any file with extension .mc in any directory. To
experiment with the NIDE use Emacs to visit the file NIDE-examples.mc from
the MetaC repository. Type C-M-s (control-meta-s) to start (or restart) the
C kernel. Typing C-c C-c will run the cell containing the cursor and print the
result in a C comment at the bottom of the cell. This is substantially equivalent
to typing the cell contents into the REPL. More Emacs key bindings and other
features of the NIDE are described in section 1.10.
The following examples use the REPL. However, all of these examples can be
run as cells in the NIDE.
1.2 Hello World
bash$ make MC
...
3

bash$ MC
MC> ‘{hello world}
hello world
MC>
In the above example the backquote expression given to the REPL macro-
expands to a C expression which evaluates to the expression “hello world”.
Backquote is described in more detail below.
1.3 C Statements, C expressions, and Universal Syntax
We can also declare global variables and execute statements from the REPL.
MC> int x[10];
done
MC> for(int i = 0; i < 10; i++)x[i] = i;
done
MC> for(int i = 0; i < 10; i++)fprintf(stdout,"%d",x[i]);
0123456789done
MC>int_exp(x[5])
5
MC>
In C syntax one distinguishes C expressions from C statements. A C statement
is executed for effect while a C expression is evaluated for value. The REPL
can be given either a statement or an expression. When given a C statement,
the REPL simply prints “done” after executing the statement. When given a C
expression the REPL computes and prints the value. The REPL assumes that
the value of a C expression is a universal syntax tree (also called an expression,
giving a second, and confusing, sense of the term “expression”). The procedure
int exp above converts an integer to a universal syntax expression. Universal
4
syntax expressions are abstract syntax trees but can viewed as representations
of strings.
If a C statement executes a return outside of any procedure then that value
is returned to the REPL and printed. For example, the above session can be
extended with the following.
MC>{int sum = 0; for(int i = 0; i < 10; i++)sum += x[i]; return int_exp(sum);}
45
MC>
1.4 Definitions of Types and Procedures
One can also define new data types and procedures from the REPL.
MC>typedef struct myexpstruct{
char * label;
struct myexpstruct * car;
struct myexpstruct * cdr;} myexpstruct, *myexp;
done
MC> myexp mycons(char*s,myexp x,myexp y){
myexp cell=malloc(sizeof(myexpstruct));
cell->label=s;
cell->car=x;
cell->cdr=y;
return cell;}
done
MC> expptr myexp_exp(myexp x){
if(x==NULL)return string_atom("nil");
return
‘{${string_atom(x->label)}
${myexp_exp(x->car)}
${myexp_exp(x->cdr)}};
}
done
5
MC> myexp_exp(mycons("foo",mycons("bar",NULL,NULL),NULL))
foo bar nil nil nil
MC>
Procedures can also be redefined at the REPL provided that the signature (argu-
ment types and return type) remains the same. Changing a procedure signature
requires restarting the MetaC REPL. This restricting ensures meaningful C type
checking in the presence of dynamic linking.
1.5 The Global Variable Array Restriction
To simplify dynamaic linking, all global data variables must be arrays. One can
always use single element arrays to represent non-array variables. To make this
more convenient we allow single element arrays to be declared with an initial
value in a manner similar to non-array data variables. For example, we can
declare and assign a variable yas follows.
MC> int y[0] = 2;
done
MC> y[0] += 1;
done
MC> int_exp(y[0])
3
MC>
Here assignments to y[0] are allowed but assignments to yare not — assignment
to array variables are not allowed in C. As noted above, this restriction greatly
simplifies dynamic linking of data variables.
1.6 The Single Symbol Type Restriction
To simplify the implementation of MetaC all type expressions appearing in
procedure signatures and global array declarations must be single symbols. Ar-
bitrary types can be given single symbol names using typedef.
6
1.7 Backquote and Universal Syntax Expressions
We now consider backquote and universal syntax expressions in more detail.
Backquote can be used in a manner analogous to string formatting.
MC> expptr friend[0] = ‘{Bob Givan};
done
MC> int height[0] = 6;
done
MC> ‘{My friend ${friend[0]} is ${int_exp(height[0])} feet tall.}
My friend Bob Givan is 6 feet tall.
MC>
While universal syntax expressions can be viewed as representations of strings,
universal syntax expressions are actually abstract syntax trees. The tree struc-
ture plays an important role in pattern matching. The tree structure is more
apparent in the following example.
MC> expptr e[0] = ‘{a+b};
done
MC> ‘{bar(${e[0]})}
bar(a+b)
MC>
As described in the next section, the pattern foo($x) will match the expression
foo(a+b) with xbound to the expression (tree) a+b.
1.8 Pattern Matching
MetaC provides a pattern matching case construct ucase (universal case). The
general form of the case construct is
7
ucase{e;
{<pattern1>}:{<body1>}
...
{<patternn>}:{<bodyn>}}
Variables are marked in patterns by the prefix $.
MC>int numeralp(expptr x)
{if(!atomp(x))return 0;
char*s=atom_string(x);
for(int i=0;s[i]!=’\0’;i++){if(s[i]<’0’||s[i]>’9’)return 0;}
return 1;
}
done
MC> int value(expptr e)
{ucase
{e;
{$x+$y}:{return value(x)+value(y);}
{$x*$y}:{return value(x)*value(y);}
{($x)}:{return value(x);}
{$z}.(numeralp(z)):{return atoi(atom_string(z));}
}
return 0; //this dead code convinces the compiler there is a return value.
}
done
MC>int_exp(value(‘{5+2*10}))
25
MC>int_exp(value(‘{foo}))
match error: the value
foo
does not match any of
{$z}.(numeralp(z)){($x)}{$x*$y}{$x+$y}
8
MC>
In many situations the choice of the particular tree structure imposed by the
MetaC reader is not important as long as the same conventions are used in
both the patterns and the expressions they are matching. For example, the
expression a+b+c will match the pattern $x+$y+$zindependent of whether + is
left associative or right associative. But the tree structure does matter in other
cases. The pattern $x*$ywill not match the expression a+b*c because the
reader brackets a+b*c as ha+hb∗cii. The pattern $x*$ydoes match (a+b)*c.
The variable $any is special. It acts as a wild card and is not bound to a value.
When a ucase is executed the patterns are tries sequentially and stops after the
first match. If no pattern matches an error is generated. One can always add a
final pattern of {$any}to provide a default catch-all case if that is desired.
In release 1.0 repeated variables, such as in the pattern {$x + $x}, are not
supported (and not checked for). If a variable is repeated it will be matched at
its last occurrences under depth-first left to right traversal.
1.9 Gensym and Macro Definitions
MetaC supports computed macros which can exploit pattern matching, back-
quote pattern instantiation, and other (bootstrapped) high level language fea-
tures. As a very simple example we can define a dolist macro at the REPL.
MC>umacro{mydolist($x,$L){$body}}{
expptr rest=gensym("rest");
return
‘{for(explist$rest=$L; cellp($rest);$rest=cdr($rest);){
expptr$x=car($rest);
$body}};
}
done
MC>macroexpand(‘{mydolist(item,list){f(item);}})
for
(explist _genrest33=list;
cellp(_genrest33);
_genrest33=cdr(_genrest33);
){expptr item=car(_genrest33);f(item);}
9
MC>
The general form of a macro definition is
umacro{<pattern>}{<body>}
where instances of <pattern> in MetaC code are to be replaced by the value
returned by <body> under the variable bindings determined by the match. The
procedure macroexpand takes a universal syntax expression and returns the re-
sult of repeatedly expanding outermost macros until no further macro expansion
is possible. Macro expansion can generate effects as well as return an expansion.
The REPL performs macro expansion on the given expression and also performs
the effects of that expansion. umacro is itself a MetaC macro and we can feed
the above macro definition to the REPL.
The macro expansion of the above macro definition defines a procedure to com-
pute the macro expansion and installs that procedure on a macro property of
the symbol mydolist. In a typical use a macro patterns is a symbol followed by
a sequence of parenthesis expressions., in which case the expansion procedure
is attached to the given symbol, or generalized application expression or a bi-
nary connection expressions. Universal syntax expression types are discussed in
the next section. The procedure for macro expansion is attached to either the
head symbol of the application or the binary operator of the binary connection
expression.
1.10 The Notebook IDE
Starting the kernel with the key C-M-s creates a running C process. When a
cell is executed with the command C-c C-c the text of the cell is (invisibly)
passed to Cprocess by the process is placed inside a comment below the cell. If
a comment already occurs directly below the cell this comment is replaced by a
new comment containing the response of the REPL. As in a Jupyter notebook,
the values returned from the REPL are numbered so that one can see when the
cell was executed relative to other cell execution in the notebook. If C-M-s is
run when a kernel is already running the kernel process is killed and a new one
started (as in a Jupyter notebook). When the kernel is restarted the execution
numbering is reset to start with 1. Changing a procedure signature (argument
and return types) requires restarting the kernel.
Compile-Time Errors. If an error occurs while attempting to read or compile
the cell expression an error statement is displayed — typically the output of the
C compiler — and an error message is printed in the code buffer as the value of
cell.
10

Run Time Errors and Breakpoints. The procedure berror(char *) can
be called from C code when a dynamic safety check fails. MetaC primitives call
berror when safety checks fail. When a berror is called during execution gdb is
entered in a interactive debugging REPL. Giving the continue command to gdb
will abort the cell execution and return to the code buffer with the persistent
C process intact.
The MetaC procedure breakpt(char *) can be used for breakpoints from which
execution can be continued. When this procedure is called gdb will be entered.
Giving the continue command to gdb will continue from the breakpoint.
Segmentation faults are trapped by gdb directly. In this case the NIDE will
move to the gdb REPL for investigating the error. But the normal gdb con-
tinue command will not return to the code buffer. In this case typing print
return to NIDE() will in most cases return to the code buffer with the persis-
tent C process intact. When memory corruption is a possibility it is safest to
restart the C process.
Printing. The procedure mcprint is like fprintf but omitting the file argu-
ment. The expression mcprint("x = %s\n",s) will print to the *Messages*
buffer in Emacs. The expression mcpprint(e)will pretty-print the expression
eto the *Messages* buffer.
Key Bindings. mc.el provides the following key bindings.
C-M-s starts or restarts the persistent C process.
C-c C-c executes the cell containing the cursor.
C-c C-r executes all cells in the current region sequentially with a different execu-
tion number for each cell execution.
C-M-a moves to the begging of the current cell.
C-M-p moves to the beginning of the preceding cell.
C-M-n moves to the beginning of the next cell.
Executing the eLisp expression (MC:strip-cell-values) in a code buffer will
remove the cell values. This is useful before a git commit as it can reduce or
eliminate differences in files between commits.
2 Universal Syntax
It is not obvious how to implement light weight expression quotation and ex-
pression pattern matching for C expressions. The syntax of C is complex. We
11
bypass this complexity by introducing a syntax with the following three “uni-
versal” properties.
1. Universal syntax trees are semantics-free. They are simply trees viewed
as representations of character strings.
2. The universal syntax reader can read any parenthesis-balanced charac-
ter string. Here parenthesis-balanced means that every open parenthesis,
brace or bracket has a matching closing character and that string quota-
tions are properly closed.
3. The printer inverts the reader. If we read a string sinto a universal syntax
expression eand then print eback into a string s0we have that sand s0
are equivalent in a strong sense. Here we want sand s0to be “whitespace
equivalent” so as to be treated equivalently by the C lexer.
The emphasis here is on the representation of strings. The reader does not
always invert the printer — the expression hhone twoithreeiprints as one
two three which reads as hone htwo threeii. But the represented string is
preserved. This is fundamentally different from most programming languages
supporting symbolic computation (such as Lisp) where it is assumed that the
tree structure, rather than the represented string, is fundamental and hence
that the reader should invert the printer.
The above universality properties allow one to assign semantics to universal
syntax expressions based on the strings that they represent. We can assign C
semantics to an expression by printing the expression and passing the resulting
string to a C compiler. This string-based semantics is not always compositional
with respect to the universal syntax bracketing. However, the tree structure
imposed by the reader is designed to approximate the compositional structure
of C syntax. In most cases pattern matching on universal syntax expressions
recovers substructure that is semantically compositional under C semantics.
Parentheses and semicolons can be helpful in aligning universal syntax trees
with C semantics.
2.1 Universal Expressions: cons-car-cdr
A universal syntax expression is either an atom (a wrapper around a string),
a pair he1e2iwhere e1and e2are expressions, or a “parenthesis expression”
the form (e), {e}, or [e] where eis an expression. All three of these datatypes
have the same C type expptr (Expression Pointer). Each has a constructor
procedure, a predicate, and accessor functions. The constructor, predicate and
accessors for atoms, pairs and parenthesis expressions are the following. The
atom data type has the constructor, test and accessor function for atoms are
the following.
12
expptr string_atom(char *);
int atomp(expptr);
char * atom_string(expptr); //the argument must be an atom.
expptr cons(expptr,expptr);
int cellp(expptr);
expptr car(expptr); //the argument must be a cell. returns the first component.
expptr cdr(expptr); //the argument must be a cell. returns the second component.
expptr intern_paren(char,expptr); // the char must be one of ’(’, ’{’ or ’[’
int parenp(expptr);
expptr paren_inside(expptr);
2.2 Universal Expressions: Phantom Brackets
The MetaC reader maps a character string to a universal syntax expression. A
specification of the reader is given in section 2.3. The expression produced by
the reader can be represented by “phantom brackets” around the given string
where there is a pair of brackets for each cons cell. For example the expression
{one two three}reads as {hone htwo threeii}. Examples of strings and the
phantom bracking imposed by the reading those strings is given in figure ref-
fig:reader. The printer simply removes the phantom brackets and prints the
string that the expression represents.
To reduce the clutter of brackets we will adopt two conventions for supress-
ing brackets when exhibiting phantom brackets. The first convention is the
Lisp right-branching convention of not showing all the cell brackets for right-
associative (right-branching) sequences. For example hzero hone htwo threeiii.
will be written as hzero one two threei. Note that these “lists” are atom-
terminated rather than the Lisp convention of nil termination.
The second convention is that we will sometimes write hha1oia2iwhere ois a
connective atom as the left-branching structure ha1o a2i. For example, the ex-
pression {a+b}reads as {hha+ibi}which is then abbreviated as {ha+bi}.
Left-branching for binary connectives gives a C-consistent treatment of semi-
colon as a binary connective while also supporting the interpretation of semi-
colon as a statement terminator. This is discussed in more detail below.
It will generally be clear from context whether we using the Lisp right-branching
list convention or the left-branching connective convention. A set of examples
of strings and the phantom brackets generated by the reader is given in figure 1.
To emphasize the significance of left-branching binary connectives we note that
the MetaC reader bracketing of
{e1;e2;e3;e4}
13
Hello World ⇒hHello Worldi
one two three ⇒hone htwo threeii
=hone two threei
x+y⇒hhx+iyi
=hx+yi
x+y∗z⇒hx+hy∗zii
(x+y)∗z⇒h(hx+yi)∗zi
foo(int x)⇒hfoo (hint xi)i
foo(int x,float y)⇒hfoo (hhint xi,hfloat yii)i
Figure 1: Examples of Reader Bracketings. Bracketings are shown for the
expression that results form reading the given strings. A complete bracketing
shows a pair of brackets for every expression pair (cons cell). The second and
third example show two somewhat informal conventions for dropping some of
the brackets — general sequences are assumed to be right-associative and binary
connective applications are assumed to be left-associative. These conventions
are used in other examples. Expressions are printed without the brackets — the
brackets are “phantoms” that show the tree structure. The bracketing (parsing)
done by the reader is specified formally in section 2.3. Since the bracketing does
not affect the represented string, the precise bracketing is often unimportant.
can be written using either the left-braching binary connective convention as
{he1;he2;he3;e4iii}
or the right-braching sequence convention as
{hhe1;i he2;i he3;ie4i},
both of which abbreviate the same full bracketing
{hhe1;i hhe2;i hhe3;ie4iii}.
2.3 The Reader Specification
The MetaC reader can be described in three phases — preprocessing, lexical-
ization, and parsing.
The MetaC preprocessor removes C-style comments and divides files into seg-
ments where each segment is to be read as a separate expression. A new segment
starts at the beginning of any line whose first character is not a space, tab or
14

close character (right parenthesis, brace or bracket). A file must be parenthesis-
balanced within each segment. For the REPL a segment ends at the first return
character not occurring inside parentheses.
Lexicalization segments a pre-processed character sting into to a sequence of
atoms. The MetaC lexer preserves all non-white characters. For the MetaC
lexer each atom is one of the following.
•Symbols, A symbol is a character string consisting of only alpha-numeric
characters — upper and lower case letters of the alphabet, plus the decimal
numerals, plus underbar. For example foo bar1.
•Connectives. A connective is a character string consisting of only “connec-
tive characters” — the connective characters consist of all non-alphanumeric,
non-white, non-parenthesis, characters other than the three special char-
acters $,‘and \.
•Quoted strings. A character string starting and ending with the same
string quotation character. For example "foo bar","!:&?;" or ’a’.
•Single character atoms for parenthesis characters and the three special
characters ‘,$, and \.
Two strings will be called whitespace-equivalent if they lexicalize to the same
sequence of atoms. Two strings that lexicalize equivalently under the MetaC
lexer will also lexicalize equivalently under the C lexer.
The reader is specified by a the grammar shown in figure 2. To simplify the
specification of the reader we enclose the given string in parentheses so that,
without loss of generality, the input is assumed to be of the form {s}where sis
the lexical sequence to be read. The grammar has a nonterminal Pfor paren-
thesis expression which we take as the top level nonterminal. The nonterminals
SYM and QUOTE range over alpha-numeric string atoms and quoted string
atoms respectively. For each positive integer pwe have a nonterminal CONNp
ranging over connective atoms of precedence p. We have a nonterminal Epfor
expressions formed with connectives of precedence p. We also has a nonterminal
E∞for expressions that are formed at higher precedence than any connective
expression. The precedence levels are divided into a set Lof left-associative
precedence levels and a set Rof right associative precedence levels. There is a
special precedence level 3 for combining expressions with the “null connective”.
The null connective is left-associative. The null connective is higher precedence
that semicolon and comma but lower precedence than all other connectives. The
nonterminal E∞includes an epsilon production allowing null arguments.
The nonterminal Sranges over symbols or variables where Vranges over vari-
ables. Variables are handles specially in pattern matching (ucase) and pattern
15
P::= (Ep)| {Ep} | [Ep]
Ep::= hhE`CONNpiErip∈ R, ` > p, r ≥p
Ep::= hhE`CONNpiErip∈ L, ` ≥p, r > p
E3::= hE`Eri` > 3, r ≥3
E∞::= QUOTE |hS P ∗i|h‘Pi|P||JUNK
S::= SYM |V
V::= h$ SYMi|h$Pi|h\Vi
P∗::= |hP P ∗i
JUNK ::= ‘ |SJUNK
SJUNK ::= $ |\|h\JUNKi
Figure 2: The grammar defining the MetaC reader. The input string is
assumed to be of the form (s) so that endpoints are explicitly labeled. The non-
terminal SYM generates non-empty alpha-numeric strings; CONNpgenerates
non-empty connective strings of precedence p; and QUOTE generates string
quotations. Ris a set of right-associative precedence levels and Lis a set
of left-associative levels. Only the highest precedence level is left-associative.
MetaC classifies all printable ASCII characters as being either alpha-numeric,
connective, string quotations, parenthesis characters or one of the three special
characters ‘, $ or \. The reader will accept any parenthesis-balanced string.
denotes the empty string. Generated cells of the form hw ior h wiare re-
placed by w. Although the grammar is ambiguous, the reader is deterministic
as specified by the constraints that E∞expressions must be maximal and that
the use of the production for E∞must be minimized. See the text for details.
16
instantiation (backquote). Expression of the form hS P ∗iinclude foo,foo(x),
foo(int x){return x;},$x,$f(x), and ${f(x)}.
The above grammar is ambiguous. For example the string {foo (a) (b)}can
be parsed as either the left-branching structure {hhSYM P∗iPi}or the right-
branching structure {hSYM P∗i}. The reader is of course deterministic — {foo
(a) (b)}is read as right-branching. While a deterministic grammar for the
reader can be given, it is simpler to specify a deterministic parsing process for the
above ambiguous grammar. The implementation runs a deterministic left-to-
right shift-reduce process. However, the parser is easier to specify as operating in
global stages. In the first stage one identifies all maximal E∞substrings. The
maximal E∞requirement specifies that {foo (a) (b)}is parsed as a single
(right-branching) E∞expression. However the string {$foo (a) (b)}is read
as the longer E∞expression {hh!fooi h(a) (b)ii. The grammar does have the
property that any given string has a unique segmentation into maximal E∞
expressions each of which has a unique structure.
After identifying maximal E∞expressions we are left with a sequence of argu-
ments (the E∞expressions) and connectives. However, it is possible that this
sequence contains multiple consecutive connectives or multiple consecutive ar-
guments. We then place an argument between any two consecutive connectives
using the production for E∞and also add an term at the beginning of the
sequence if the sequence starts with a connective and an epsilon argument at
the end if the sequence ends in a connective. We now have a sequence of ar-
guments and connectives starting and ending with arguments and where there
are no two consecutive connectives. Next set kto be the largest precedence of
any connective and form all the Eksubstrings. We then repeatedly decrement
kand identify all the Eksubstrings until we have identifies all Epsubstrings
for all p. At this point the entire string must be parsed as Epwhere pis the
smallest precedence of the connectives where we think of adjacent arguments as
having an implicit precedence 3 connetive between them.
This process can parse any parenthesis-balanced string.
The above specification of the reader can result in cells of the form he ior h ei.
The implementation avoids producing such cells and the reader can be viewed
as replacing he ior h eiby eso that all cells are of the form he1e2iwhere e1
and e2are expressions representing non-empty strings. An empty parenthesis
string () is read as a parenthesis expression containing an atom for the empty
string.
Grouped by precedence, the binary connective characters are
{;} {,} {|} {} {&,?,!} {=∼,<,>} {+,-} {*,/} {%, ˆ, .} {:,@,#}.
The precedence of a binary connective is determined by its first character. The
characters have low to high (outer to inner) precedence in the order given with
symbols in the same group having the same precedence. Here represents the
17
null connective at precedence level 3. The precedence can be summarized in
outer to inner order as semicolon, comma, bar, the null connective, Boolean
connectives, binary predicates, three levels of binary functions, and innermost
connectives which can be viewed as providing a way of building structured
atoms. This universal syntax precedence conventions has various divergences
with C syntax. However, when writing a macro package in MetaC one can
simply adopt the MetaC precedence conventions for the source code and use
parentheses where necessary in the generated C code.
2.4 Backquote and Pattern Matching Revisited
The semantics of backquote is defined in terms of cons-car-cdr view of expres-
sions independent of the MetaC reader. In the typical case, the C value of a
backquote expression h‘{e}iis the expression (tree) ewith subexpressions (sub-
trees) of the form h$xi, where xis a symbol, replaced by the C value of xand
subexpressions (subtrees) of the form h${w}ireplaced by the expression which
is the C value of the string represented by the expression w.
Unfortunately this evaluation rule for backquote expressions is incomplete. One
of the most confusing situations is where the expansion of a macro contains a
backquote. Writing such a macro typically involves nested backquotes. While
nested backquotes are confusing, and should be avoided when possible, MetaC
supports nested backquotes. We consider a series of backquote expressions each
of which evaluates to the previous one.
First we have
‘{a+b+ ${z}} (1)
If the value of variable zis the expression cthen the value of expression (1) is
the expression a+b+c. The symbol $can be included in the value of a backquote
expression by quoting it. This gives our second expression.
‘{‘{a+ ${y}+\${z}}} (2)
If the value of variable yis the expression bthen the value expression of (2) is
expression (1). We can even have multiple layers of quotation as in the following.
‘{‘{‘{${x}+\${y}+\\${z}}}} (3)
If the value of variable xis the expression athen the value of expression (3) is
expression (2).
As with backquote, pattern matching is defined in terms of the cons-car-cdr
view of expressions independent of the MetaC reader. We define a substitution
to be a mapping from symbol atoms to expressions. For a substitution σand
18

a pattern {e}we define the expressions σ(e) to be the result of replacing each
subexpression (subtree) of eof the form h$xiwhere xis a symbol atom with
σ(x). A pattern expression (tree) {e}matches an expression (tree) wwith
substitution σif σ(e) = w. An exception to this is the case where a variable
occurs multiple times in the pattern. In release 1.0 multiple occurrences of a
variable in a pattern is not supported.
3 Expression Interning, Expression Properties
and Undo Frames
In MetaC expressions are interned. This means that two expressions which
print as the same strings will be represented by the same data structure with
the same meory address (the same pointer). Hence we have
MC> int_exp(‘{foo(a)} == ‘{foo(a)})
1
Expressions also have property lists. The expression
setprop(e,p,v)
sets the property pof expression eto the value vwhere pand vare also expres-
sions. The expression
getprop(e,p,v)
returns property pof expression eor the expression vif no p-property has been
assigned to e. C coersions can be used to store property values other than
expressions but this raises the standard safety issues of C type coercions. It is
not difficult to define macros supporting properties of different types, but this
has not been done in release 1.0.
Interned expressions with properties provide the functionality of dictionary data
structures such as C++ hashmaps or Python dictionaries. They also facilitate
bottom-up logic programming implementations of dynamic programming infer-
ence algorithms.
Undo Frames. Creating expressions and setting properties allocates memory.
However this allocation is “undone” on exit from an “undo frame”. The state-
ment push undo frame(); enters an undo frame and the expression return from undo(e)
exits the undo frame returning the value of the expression e. All expression al-
location and property setting that occurs in an undo frame is undone on exit
from the frame. The expression value ereturned from the frame is copied into
an ephemeral return memory before the undoing occurs, is then copied into
the parent undo frame after the undoing and that copy in the parent frame is
19
returned as the value of the return expression. This undo architecture facili-
tates the implementation of case-analysis in automated reasoning systems such
as SMT solvers. It also provides a very efficient form of memory management
analogous to, but quite different from, stack allocation.
4 The MetaC Primitives
We now give a list of the MetaC primitives and pre-installed type definitions.
The macros backquote, ucase, and umacro expand to code built on these prim-
itves. We start with type definitions needed for the single atom type restriction
needed for the limiting type parsing abilities of MetaC.
typedef char * charptr;
typedef void * voidptr;
typedef FILE * FILEptr;
Next we have the lowest level interface functions for expressions. The macros
backquote and ucase expand into C code using these procedures.
typedef struct expstruct{...}*expptr;
expptr string_atom(char * s);
int atomp(expptr e);
charptr atom_string(expptr a);
expptr cons(expptr x, expptr y);
int cellp(expptr e);
expptr car(expptr x);
expptr cdr(expptr x);
expptr intern_paren(char openchar, expptr arg); \\ openchar must be one of ’(’, ’{’ or ’[’.
int parenp(expptr e);
20

expptr paren_inside(expptr e);
char constructor(expptr e); //used for paren expressions.
We also have integer-expression conversions. The REPL and NIDE require that
inputs and cells are expression-valued.
expptr int_exp(int i);
int exp_int(expptr s);
Next we have expression properties. The macro umacro expands to code that
sets the macro property of some atom to a procedure pointer. Gensym is also
important in macro definitions.
void setprop(expptr e, expptr key, expptr val);
expptr getprop(expptr e, expptr key, expptr defaultval);
expptr gensym(char * s);
We also have error exceptions and break points.
void berror(charptr s);
void breakpt(charptr s);
We now consider reading and printing. The procedure file expressions takes
a file name and returns a list of the expressions (in universal syntax) contained
in the file. The last two print to the *Messages* buffer in emacs in the NIDE.
The last expressions is a macro.
expptr file_expressions(char *name);
void pprint(expptr e, FILEptr f, int indent); //indent is typically 0
void mcpprint(epptr e);
void mcprint(...)
We have the following two macro-expansion procedures. The first does full
macro expansion — the result is guaranteed not to contain macros. The second
does one step of macro expansion — the result may contain more macros.
21
expptr macroexpand(expptr e);
expptr macroexpand1(expptr e); //this result may contain unexpanded macros
We also have the following list-processing procedures and macros.
expptr append(expptr l1, expptr l2);
expptr reverse(expptr l);
typedef expptr exp_to_exp(expptr);
typedef void exp_to_void(expptr);
expptr mapcar(exp_to_exp f, expptr l);
void mapc(exp_to_void f, expptr l);
int length(expptr l);
Finally we have the procedures related to undo frames.
void push_undo_frame();
void return_from_undo(expptr e);
void undo_set(voidptr loc, voidptr val);
voidptr undo_alloc(int size);
References
[1] A. Bawden. Quasiquotation in lisp. In Proceedings of the 1999 ACM SIG-
PLAN Workshop on Partial Evaluation and Semantics-Based Program Ma-
nipulation, San Antonio, Texas, USA, January 22-23, 1999. Technical report
BRICS-NS-99-1, pages 4–12, 1999.
22