CCGlab manual
version 3.6.1, July 23, 2018 (coloured text means changes from the previous version)
Cem Bozşahin
Home: github.com/bozsahin/ccglab
Lisboa, Ankara, 2015–2018
1 Introduction
CCGlab is a tool for experimenting with Combinatory Categorial Grammar (CCG; Steedman 1996, 2000, 2012, Clark and Curran 2003, Zettlemoyer and Collins 2005, Steedman and Baldridge 2011, Bozşahin 2012).
We write CCGlab grammars almost in paper-style and look at the results in almost paper format.
It is written in COMMON LISP. There is an installer script at the home site of CCGlab to install all the software that is required. It implements all the established combinators of CCG, namely application, composition and substitution, including their finite powers (quadratic for S, cubic for B), in all their directional variants. Every rule has a switch for experimentation. CCGlab also implements subcomposition and its directional variants up to cubic power for further experimentation (variously called D and O by Hoyt and Baldridge 2008 and Bozşahin 2012, respectively). Hereafter we assume intermediate familiarity with CCG.
CCGlab is designed with linguists, cognitive scientists and computational linguists in mind. Raw input
to the system is lexical specifications and lexical rules, much like in Steedman papers. Unlike unification-
inspired systems, the logical form (LF) is associated with the overall syntactic category, and the underlying
lambda-calculus is visible at all times. Wide-coverage parsing with CCGlab models is a long-term goal.
The examples below show raw input to CCGlab.¹ The first three are lexical items, and the last one is a unary rule. Order of specifications and whitespacing are not important, except for unary rules; they apply in order. They do not apply to the same category twice. Later unary rules can see the results of earlier ones.
John n := np[agr=3s] : !john ;
likes v := (s\np[agr=3s])/^np : \x\y. !like x y;
and x := (@X\*@X)/*@X : \p\q\x. !and (p x)(q x);
(L1) np[agr=?x] : lf --> s/(s\np[agr=?x]) : \p\lf. p lf ;
Unification plays no role in CCGlab because we want to see how far grammatical inference can be carried
out by combinators. Features are checked for atomic category consistency only, and no feature is passed
non-locally. All features are simple, feature variables are for values only, they are local to a category, and
they are atomic-valued. For example, if we have the sequent (a) below, we get the result (b) by composition,
not (c). We could in principle compile all finite feature values for any basic category in a grammar and get
rid of its features (but nobody does that).
(a) s[f1=?x1,f2=v2]/s[f1=?x1] s[f1=v1,f2=?x2]/np[f2=?x2]
(b) s[f1=v1,f2=v2]/np[f2=?x2]
(c) s[f1=v1,f2=v2]/np[f2=v2]
Because type-raising is unary, it is left to unary rules, or to lexical generalizations. Lexical rules are unary rules. Slash modalities are implemented too (see Baldridge 2002, Steedman and Baldridge 2011).
Meta-categories such as (X\X)/X are allowed with application only, which maintains a very important property of CCG: it is procedurally neutral (Pareschi and Steedman, 1987). Given two substantive categories and CCG's understanding of universal rules, there is only one way to combine them, so that the parser can eschew other rules for that sequent if it succeeds in one rule. For this reason, 'X$' categories have not been incorporated. They require structured unification and jeopardize procedural neutrality.
Singleton categories are now supported by CCGlab; see Bozşahin and Güven (2018) for their syntax and semantics. They allow us to continue to avoid wrapping in verb-particle constructions, and they render idiomatically combining expressions and phrasal idioms as simple categorial possibilities in CCG. For example, in CCGlab notation, the categories for the heads of these expressions are respectively:
(a) picked := (s\np)/"up"/np : \x\y\z.!pick _ y x z;
(b) kicked := (s\np)/"the bucket" : \y\z.!die _ y z;
(c) spilled := (s\np)/np[h=beans,spec=p] : \y\z.!reveal _ y !secret z;
where "up" and "the bucket" are singleton categories (i.e. surface strings turned into singleton category,
with constant value), and np[h=beans,spec=p] in (c) means this NP must be headed by beans, which
¹ The mapping of modalities in CCGlab to those in papers is: . ↦ ·, ^ ↦ ⋄, * ↦ ⋆, + ↦ ×.
makes it a special subcategorization, by spec=p, meaning +special. These are not built-in features (in fact
there is no such thing in CCGlab). The underscore is just a reminder that what follows is not an argument of
the predicate; it is an event modality. This is taken care of here by a notational convention.
Below is the output for the Latin phrase for 'Balbus is building the wall'. The first column is the rule index.
The rest is about the current step: a lexical assumption, result of a lexical rule, or one step of a derivation.
Derivation 1
-----------------
LEX MUR := N : WALL
LEX UM := (S/(S\NP))\N : (LAM X (LAM P (P X)))
< (MUR)(UM) := S/(S\NP) : ((LAM X (LAM P (P X))) WALL)
LEX BALB := N : BALB
LEX US := (S/(S\NP))\N : (LAM X (LAM P (P X)))
< (BALB)(US) := S/(S\NP) : ((LAM X (LAM P (P X))) BALB)
LEX AEDIFICAT := (S\NP)\NP : (LAM X (LAM Y ((BUILD X) Y)))
>Bx (BALB US)(AEDIFICAT) := S\NP : (LAM X
(((LAM X (LAM P (P X))) BALB)
((LAM X (LAM Y ((BUILD X) Y))) X)))
> (MUR UM)(BALB US AEDIFICAT) := S : (((LAM X (LAM P (P X))) WALL)
(LAM X
(((LAM X (LAM P (P X))) BALB)
((LAM X (LAM Y ((BUILD X) Y)))
X))))
Final LF, normal-order evaluated: ((BUILD WALL) BALB)
The inductive component prints the associated parameter-weighted local counts of features and the final count, for verification. Every lexical entry, word or lexical rule, is assigned a parameter. In the example below, all parameters were set to unity so that the weighted counting described in (2) below can be seen easily. The only feature is the number of times lexical items are used in a derivation; 18 is the total here.
LEX 1.0 MUR := N : WALL
LEX 1.0 UM := (S/(S\NP))\N : (LAM X (LAM P (P X)))
< 2.0 (MUR)(UM) := S/(S\NP) : ((LAM X (LAM P (P X))) WALL)
LEX 1.0 BALB := N : BALB
LEX 1.0 US := (S/(S\NP))\N : (LAM X (LAM P (P X)))
< 2.0 (BALB)(US) := S/(S\NP) : ((LAM X (LAM P (P X))) BALB)
>O 4.0 (MUR UM)(BALB US) := S/((S\NP)\NP) : (LAM H
(((LAM X (LAM P (P X)))
WALL)
(LAM X
(((LAM X (LAM P (P X)))
BALB)
(H X)))))
LEX 1.0 AEDIFICAT := (S\NP)\NP : (LAM X (LAM Y ((BUILD X) Y)))
> 18.0 (MUR UM BALB US)(AEDIFICAT) := S : ((LAM H
(((LAM X (LAM P (P X)))
WALL)
(LAM X
(((LAM X (LAM P (P X)))
BALB)
(H X)))))
(LAM X
(LAM Y ((BUILD X) Y))))
Normalized LF: ((BUILD WALL) BALB)
The deductive component computes all the constituents and their LFs that are consequences of lexical as-
sumptions. The inductive component computes (i) the most likely LF for a given string, (ii) the most proba-
ble derivation for that LF, and (iii) the highest-weighted derivation for any LF. We assume that the inductive
component is supplied with lexical parameters after parameter estimation. (The deductive component starts
with the assumption that they are unity.)
The algorithm for the basic inductive component is more or less standard in Probabilistic CCG (PCCG).
The one we use is summarized from Zettlemoyer and Collins (2005) throughout the manual:
\[
\operatorname*{argmax}_{L} \; P(L \mid S; \bar{\theta}) \;=\; \operatorname*{argmax}_{L} \; \sum_{D} P(L, D \mid S; \bar{\theta}) \tag{1}
\]
where $S$ is the sequence of words to be parsed, $L$ is a logical form for it, $D$ is a sequence of CCG derivations for the $(L,S)$ pair, and $\bar{\theta}$ is the $n$-dimensional parameter vector for a grammar of size $n$ (the total number of lexical items and rules). The term on the right-hand side is induced from the following relation of probabilities and parameters in PCCG (ibid.);² where $\bar{f}$ is a vector of 3-argument functions $\langle f_1(L,D,S), \cdots, f_n(L,D,S) \rangle$:
\[
P(L, D \mid S; \bar{\theta}) \;=\; \frac{e^{\bar{f}(L,D,S)\cdot\bar{\theta}}}{\sum_{L}\sum_{D} e^{\bar{f}(L,D,S)\cdot\bar{\theta}}} \tag{2}
\]
The functions of $\bar{f}$ count local substructure in $D$. By default, $f_i$ is the number of times the lexical element $i$ (item or rule) is used in $D$, sometimes called the feature $i$. If you want to count other substructures in $D$ or $L$, as in Clark and Curran (2003), you need to write some code. A plug-in is provided. That was one motivation for giving detailed specs. There is a growing body of literature on the topic, starting with Clark and Curran.
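As a small worked illustration with invented numbers: suppose a sentence $S$ has two derivations $D_1, D_2$ yielding LF $L_1$, and one derivation $D_3$ yielding LF $L_2$, with weight-sums $\bar{f}\cdot\bar{\theta}$ of $1.0$, $0.5$ and $0.8$ respectively. Then by (2):
\[
P(L_1 \mid S; \bar{\theta}) = \frac{e^{1.0} + e^{0.5}}{e^{1.0} + e^{0.5} + e^{0.8}} \approx 0.66,
\qquad
P(L_2 \mid S; \bar{\theta}) = \frac{e^{0.8}}{e^{1.0} + e^{0.5} + e^{0.8}} \approx 0.34,
\]
so $L_1$ is the argmax in (1).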
You can think of CCGlab as Lisp code in three layers, Paul Graham 1994-style: (i) the representational layer, which is based on (name value) pairs, lists of such pairs, and so on; (ii) the parsing layer, which is based on hash tables and the representational layer; and (iii) the post-parsing layer, which is based on λ-calculus and the parsing layer, and which is used for checking LF equivalence.³ Combinators are Lisp macros on the last layer. Because our lambda layer has nothing to do with Lisp's lambdas (the internals of LFs are always visible), you can use the top layer as a debugging tool for your LF assumptions. It has two LF evaluators: normal-order and applicative-order. If they do not return the same LF on the same parse result, then there is something strange about your LF.⁴
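As a quick check on LF assumptions, you can run both evaluators on a fresh parse with the functions listed in Table 5. A minimal sketch, assuming the Latin grammar of the example above is loaded:

(ccg-deduce '(mur um balb us aedificat))  ; CKY-parse; returns T on success
(cky-show-lf-eqv)                         ; compare normal-order and applicative-order
                                          ;  evaluation of the resulting LFs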
2 CCGlab projects
We suggest that you create a separate directory for each project to keep things clean. CCGlab will need
around 6–8 files in the working directory to set up a CCG grammar and a model.
By a grammar we mean a set of CCG assumptions like the ones above which you want to subject to CCG's universal combinatorics. By a model we mean a version of the grammar which you have subjected to empirical training and parameter estimation, the results of which you want to subject to CCG's universal combinatorics.
They have the same representation in CCGlab.
A project with name P consists of the following files (we explain their format in §4):
P.ccg : The CCG grammar as text file, much like in papers. You write this one.
P.lisptokens : The version of the text file suited for the Lisp reader, and wrapped in (..). The sed
script called tokens does this by taking into account Lisp's hypersensitivity to special symbols such
as '(', ')', '\' etc., and CCGlab's stubbornness that e.g. ':=' is one token, not two. Such tokens are wrapped
in vertical bars, e.g. |:=|. Keep in mind that Lisp has its own tokenizer and whitespace preprocessor,
so this step is crucial to turn your grammar into Lisp code in the way you intended, rather than the way
Lisp would. Your grammar becomes a Lisp object at one fell swoop because of this wrapping.
P.ded : Lisp translation of the input grammar P.lisptokens, as input to deduction. The deducer
just closes universal syntax-semantics of CCG on your lexical assumptions.
P.ind : The model-trained version of the grammar, as input to induction and parse ranking. It has
the same format as P.ded. This is possible because the deducer already knows how to compute the
parameter-weighted sums of features in a derivation. We assume that model training and parameter
estimation set the parameters right.
P.sup : The supervision file. This is the training data for PCCG parameter estimation. It consists of
sentence-LF pairs. Syntax and derivations are hidden variables in PCCG, and training is done solely
on sentence-LF pairs.
P.supervision : A simpler way to create the .sup file. Its content is a semicolon-separated sequence of
training pairs such as the one below. The LF is much easier to specify this way than in .sup; CCGlab will do the conversion.
words:lf;
P.lisp : Optional. Project-specific code developed by you. For example, your plug-in for feature
counts can go in here if you have one. In fact you can call this file anything you like as long as you
know what to load into Lisp. A standard name is recommended for others to keep track of your work.
² You may be alarmed by the exponentiation in formula (2) potentially causing floating-point overflow, and may worry about what your value range should be for $\theta_i$ to avoid that. We recommend starting with $0 < \theta_i \leq 1$. Keep also in mind that the $\theta_i$ are not probabilities but weights. They can be negative. Formula (2) takes care of any weight.
³ This layer is post-parsing in the sense that although parsing builds LFs, it does not reduce them till the very end. So unevaluated LFs of CCGlab are available for exploration.
⁴ It doesn't follow that your LF is correct if both evaluations return the same result. If it did, we wouldn't need empirical sciences like linguistics and cognitive science. Your categories, and derivations with them, can tell you more.
P.testsuite.lisp : Optional. I put my test sentences and their output display code in such files to
keep things tidy.
3 Workflows
There are basically two workflows. Model development cannot be fully automated, so it is harder.⁵ In the cases below, the Lisp parts must be done after the ccglab command. The README file explains how.
3.1 Grammar development and testing
If you start with a linguistically-motivated grammar, you’d probably take the following steps, assuming your
project’s name is example:
1. In your working directory, write your grammar and save it as plain text e.g. example.ccg
2. In CCGlab do e.g. (load-grammar "example" :maker 'sbcl).
This means Lisp tokens will be generated by sed from within Lisp, called in this case by SBCL Lisp.
This step prepares the example.ded file and loads it onto Lisp.
If you omit the ":maker 'sbcl" part, it will be assumed that the .lisptokens file is already fresh,
and sed is not called.
3. Do: (ccg-deduce '(a sequence of words)) to run the CKY parser.
4. Do: (cky-show-deduction) to see all the results.
You can also do (cky-pprint) to see the CKY table. It prints all the detail.
If there were errors in your grammar file, step 2 would fail to load the grammar, and you would need to go back to editing to fix them. The partially generated example.ded file will tell you where the first error was found. If you change the example.ccg file, go back to step 2. Other functionalities of CCGlab are explained in §5.
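Put together, a typical session for this workflow looks roughly like the sketch below; the functions are those of Table 5, and the word sequence is whatever your grammar covers:

(load-grammar "example" :maker 'sbcl) ; run sed on example.ccg, build and load example.ded
(ccg-deduce '(a sequence of words))   ; CKY-parse; returns T if there is at least one analysis
(cky-show-deduction)                  ; show all derivations in paper style
(cky-pprint)                          ; optionally, dump the whole CKY table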
3.2 Model testing
At some point, you may turn to modeling. It usually means taking the example.ded file and saving it as
example.ind first, then adjusting its parameters (because they were set to default in .ded) by parameter
estimation, and playing with its categories.
Model training is not easy to reduce to a standard workflow because it depends on what your model is
intended to do (whereas we all know what a grammar is supposed to do—get to semantics). This process is
up to you. CCGlab helps with the basics (lexical parameters) to compute (1–2) and other formulae like (3–6)
below. A plug-in called (plugin-count-more-substructure <resultcell>) is provided, where CKY
result cells and their derivation sequences are at your disposal.
In the end, you can create an example.ind file in the same format that you started. This means that
every lexical entry (lexical item or lexical rule) is associated with a parameter. This is the minimum. If
you have more parameters, you must write some code above the minimal machinery provided by CCGlab to
change induction. The model testing workflow is:
1. In CCGlab, do: (load-model "example").
This will load the grammar in example.ind with its parameter set.
2. Do: (ccg-induce '(a sequence of words)) to CKY-parse and rank the parses.
3. Do: (cky-show-induction) to see three results for the sentence: (i) most likely LF for it, (ii) its
most likely derivation, and (iii) most likely derivation for any LF for the sentence. You can also do
(cky-pprint) to see the CKY table. It prints all the detail. If you like, do (cky-show-deduction)
to see all the results. It does not recompute anything.
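In compact form, a model-testing session might look like the sketch below, assuming example.ind is already in place:

(load-model "example")              ; load example.ind with its parameter set
(ccg-induce '(a sequence of words)) ; CKY-parse and rank; returns T on success
(cky-show-induction)                ; the three results described above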
There is an on/off switch to control what to do with out-of-vocabulary (OOV) items. If you turn it on
(see Table 5), it will create two lexical entries with categories X\⋆X and X/⋆X for every unknown item so that
the rest can be parsed along with the unknown items as far as is possible with application. This much of a
knowledge-poor strategy is automated.
Their LFs are the same: λp.unknown′p. In training it seems best to keep the switch off so that OOV
items are complained about by CCGlab, for model debugging; in testing, wide-coverage parsers might opt to
switch it on.
⁵ Grammar development cannot be fully automated either, but that's another story. Beware of claims to the contrary, sometimes emanating from CL/NLP/logic circles. This is good for business, for us grammarians.
3.3 Model development: parameter estimation
Parameters of an .ind file can be re-estimated from training data of $(L_i, S_i)$ pairs, where $L_i$ is the logical form associated with sentence $S_i$. The log-likelihood of the training data of size $n$ is:
\[
O(\bar{\theta}) \;=\; \sum_{i=1}^{n} \log P(L_i \mid S_i; \bar{\theta}) \;=\; \sum_{i=1}^{n} \log \Big( \sum_{T} P(L_i, T \mid S_i; \bar{\theta}) \Big) \tag{3}
\]
Notice that syntax is marginalized by summing over all derivations $T$ of $(L_i, S_i)$.
For individual parameters we look at the partial derivative of (3) with respect to parameter $\theta_j$. The local gradient of $\theta_j$ with feature $f_j$ for the training pair $(L_i, S_i)$ is the difference of two expected values:
\[
\frac{\partial O_i}{\partial \theta_j} \;=\; E\,f_j(L_i, T, S_i) \;-\; E\,f_j(L, T, S_i) \tag{4}
\]
The gradient will be negative if feature $f_j$ contributes more to any parse than it does to the correct parses of $(L_i, S_i)$. It will be zero if all parses are correct, and positive otherwise. Expected values of $f_j$ are therefore calculated under the distributions $P(T \mid S_i, L_i; \bar{\theta})$ and $P(L, T \mid S_i; \bar{\theta})$. For the overall training set, using sums, the partial derivative is:
\[
\frac{\partial O}{\partial \theta_j} \;=\; \sum_{i=1}^{n} \sum_{T} f_j(L_i, T, S_i)\, P(T \mid S_i, L_i; \bar{\theta}) \;-\; \sum_{i=1}^{n} \sum_{L} \sum_{T} f_j(L, T, S_i)\, P(L, T \mid S_i; \bar{\theta}) \tag{5}
\]
Once we have the derivative, we use stochastic gradient ascent to re-estimate the parameters:
\[
\begin{aligned}
&\text{Initialize } \bar{\theta} \text{ to some value.}\\
&\textbf{for } k = 0 \cdots N-1\\
&\qquad \textbf{for } i = 1 \cdots n\\
&\qquad\qquad \bar{\theta} \;=\; \bar{\theta} \;+\; \frac{\alpha_0}{1 + c\,(i + kn)} \; \frac{\partial \log P(L_i \mid S_i; \bar{\theta})}{\partial \bar{\theta}}
\end{aligned} \tag{6}
\]
where $N$ is the number of passes over the training set, $n$ is the training set size, and $\alpha_0$ and $c$ are learning-rate parameters (learning rate and learning rate rate). The function update-model computes (6) by taking these as arguments. We use the inside-outside algorithm; that is, non-zero counts are found before the loop above, and the rest is eschewed. Both formulae can be beam-searched to make large models with long training feasible. You can turn the beam off to see the complete update of the gradient; prepare for a long wait in a large model.
This is gradient ascent, so initialize $\bar{\theta}$ accordingly. You can standardize the parameters as z-scores, if you like.
The partial derivative in (6) is $\partial O_i / \partial \bar{\theta}$, for the training pair $i$, i.e. without the outermost sums in (5). It is what update-model computes first, then (6).
An example workflow is provided for the ‘corner’ project in the package. The steps were:
1. Write corner.ccg and create corner.ded.
2. Copy corner.ded to corner.ind.
3. Initialize the parameters in corner.ind.
4. Design the training set corner.supervision, from which we generate corner.sup.
5. (update-model "corner" 10 1.0 1.0 :load t :verbose t :debug t). Notice that the train-
ing set size is not an argument to the function since it can be found from corner.sup.
6. (show-training)
7. (save-training "corner.ind.re-estimated"). The trained model is now in the output file.
You can go back to workflow in §3.2 to parse and rank the parses with the trained model if you
rename the output corner.ind.
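In Lisp, steps 4–7 correspond roughly to the calls below (a sketch using the functions of Table 5, with the arguments of step 5):

(make-supervision "corner" :maker 'sbcl)  ; corner.supervision -> corner.sup
(update-model "corner" 10 1.0 1.0 :load t :verbose t :debug t)
(show-training)                           ; parameters before and after training
(save-training "corner.ind.re-estimated") ; write the trained model to the output file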
4 CCGlab representations
4.1 .ccg format for grammars
This file defines lexical items and lexical rules in Steedman style, with the following amendments (stylistic
ones are marked with ‘–’, and the grammatical ones with ‘?’):
– ‘;’ is the terminator of a lexical specification. It is required. Each spec must start on a new line.
– ‘-->’ is the lexical rule marker.
? Lexical rules take an LF on the left in one fell swoop and do something with it on the right. That
means you have to have a lambda for the same name on the right to make it substitutable. Here is an
example (type raising by a lexical rule):
(L1) np[agr=?x] : lf --> s/(s\np[agr=?x]) : \p\lf. p lf ;
Here is another one (verb raising to take adjuncts as arguments):
(d-shift) vp : lf --> vp/(vp\vp) : \q\lf. q lf;
Because there is no structured unification in CCGlab, it is a forced assumption to have just one thing
on the left. Why not structured unification? Because we want to see how far combinators can be put
to linguistic work and nothing else.
? A part-of-speech tag comes before ':='. Its value is up to you. (This is the only way CCG can tell
whether e.g. (S\NP)/(S\NP) can be a verb, say 'want', rather than an adjunct, which is crucial for
type-raising.)
? Special categories are pseudo-atomic. They start with @, e.g. @X. They must be lexically headed, and
they must be across the board in a category. For example, and := (@X\*@X)/*@X:.. is fine but
so := (@X/*@Y)/*(@X/*@Y):.. is not. And := (S\*NP)/*@X:.. is bad too. They do not have
extra features apart from what is inside the @Xs, which are imposed in term match. We therefore
eliminate the need for structured unification.
? If you use an @X category in a lexical rule, it will be treated as an ordinary category.
Non-variables in LF must be prefixally exclaimed to avoid substitution/application by Lisp. Write hit′
as !hit. It will be converted to the Lisp string constant "hit" by a Lisp reader macro.
? Avoid names for LF variables that start with the special symbol '&'. Combinators use it. The only
exception is the identity combinator, &i, which you may need in an LF when a functor subcategorizes
for a type-raised argument rather than the argument itself.⁶ The 'tokens' script converts your &i to
(lam x x); see §4.4.
? The double slash is implemented. In X//Y Y and Y X\\Y, Y must be lexical to succeed. The modality
of \\ and // is always application only. The result X is assumed to be lexical too.
Phonological strings that span more than one word must be double-quoted. You must use them as such
in your parse string as well. The contents of a string are not capitalized by the Lisp reader whereas
everything else is made case-insensitive, which must be kept in mind. We assume that there are no
special symbols in the string. One or two punctuations (comma, period, exclamation or question mark)
are allowed.
Features need names. The basic categories in S_fin/(S_fin\NP_3s) could be s[type=fin] and np[agr=3s].
The order of features is not important if you have more than one. They must be comma-separated.
Capitalization is not important for names, unless they are in a string constant. This is also true of
atomic categories, feature names, and values. NP is the same as np. Lisp does it, not CCGlab.
Comments start with ‘%’. The rest of the line is ignored.
? Because CCGlab knows nothing about non-lambda internal abstractions such as the x in the logical
form \p\q.(!forall x)(!implies (p x)(q x)), it cannot alpha-normalize them to rename x to
something unique. This opens the way to accidental variable capture if some CCGlab combinator
happens to abstract over the same variable, say λx.f(g x) for composition. We wouldn't want this x to
replace the LF x above. If you intend to reduce (q x) to substitute for some lambda in q via x, you must
abstract over it to ensure alpha-normalization; say \p\q\x.(!forall x)(!implies (p x)(q x)).
Assuming that x is not a syntactic variable but p, q are, it will substitute the variable in p, q and keep
the (!forall x). If this is not what is intended, then use a naming convention for non-lambda
variables which will not clash with CCGlab variables. Doubling the name as xx is my convention,
e.g. \p\q.(!forall xx)(!implies (p xx)(q xx)). Prefixing or postfixing the variable with an
underscore is also a safe convention. Prefixing it with & is not. Combinators use this convention.
⁶ An example of this formal need can be given as follows. Suppose that we want to subcategorize for a type-raised NP, e.g. f :=
S/(S/(S\NP)) : λp.f′(p(λx.x)). Type-raised arguments are universally λp.p a′, so an argument could be e.g. a := S/(S\NP) : λp.p a′.
Application of f to a would be odd if we didn't have λx.x inside the LF of f, because f seems to be claiming, by its syntactic
category, that its predicate-argument structure is f′a′, not f′(λp.p a′). Neutralization of type-raising by a head is a forced move in
theories like CCG because some languages seem to subcategorize for type-raised arguments syntactically, e.g. Dyirbal, Turkish, and
Inuit, where there is no determiner to value-raise an argument NP modified by a relative clause. Dyirbal's absolutive relative marker
bears the category -nu := ((S/(S\NPabs))\(S/(S\NPabs)))\(S\NPabs), rather than -nu := (N\N)\(S\NPabs).
The rules for lexical specifications are given in Table 1. They are used by the LALR parser, which converts
the .lisptokens version of your textual grammar to .ded. Notice that lambdas can be grouped together, or
written separately. Both λxλy.hit′xy and λx.λy.hit′xy are fine, and mean the same thing. As standard, CCG
slashes and sequencing in the body of an LF are left-associative, and lambda binding is right-associative.
All LF bodies will be curried internally. For example, you can write '\x\y\z.!give x y z' in the
.ccg file for convenience. It will be converted to '\x\y\z.((!give x) y)z'. The beta normalizer wants that.
4.2 .ded and .ind format for Lisp-internal grammar
These files are lists of (name value) pair lists bound to the global variable called *ccg-grammar*. We
chose such pairs because Lisp's usual choice for such simple mappings, association lists, is difficult to
read.⁷ Table 2 describes the format. For CCGlab, the order is not important in these lists (it does matter to
Lisp) because the values are unique.
4.3 .sup and .supervision files for model building and training
These are the supervision files for training. The .sup file is the native format, which is difficult to type because
LFs must be curried. You can generate it from the .supervision file, which does the currying for you and has the
syntax of line-separated specs of the form
data : lf ;
where each lf has the same format as in the .ccg file. Take a look at the make-supervision function in Table 5.
Native input to the trainer is a list of lists, where each member has first a list of whitespace-separated tokens
in the data, and second a list which is the expected LF for the data, in the format described next.
The repository github.com/bozsahin/ccglab-models has lots of examples to copy from.
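For illustration, a hypothetical .supervision line reusing the Latin example of §1 (assuming that lexicon) would be:

mur um balb us aedificat : !build !wall !balb ;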
4.4 The logical form’s form
All LFs are curried. Lambda is not Lisp's 'lambda', to avoid confusion. Formally, the mapping from
λ-calculus to CCGlab's LFs is:⁸

x or c    Lisp symbol or constant
(e e)     (e e)
λx.e      (lam x e)
You can see from Table 1's non-terminal called 'lf' that lambdas of CCGlab LFs are grouped to the left, the
only exception being &i (see footnote 6). So CCGlab's LFs are a subset of the lambda terms defined above.
They correspond to supercombinators, which epitomize argument-taking; see Bozşahin (2012).
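For instance, the lexical LF \x\y. !hit x y of the hits entry in Table 3 is represented internally as the curried term below; "hit" is the Lisp string constant produced from !hit:

(lam x (lam y (("hit" x) y)))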
4.5 Parser’s representations
All parser objects are represented as hash tables. COMMON LISP's hash tables are expressive devices. They
do not support hash collision or chaining (we like them this way), and the keys can be anything, even a
function or another hash table. We harness this property. Unlike association lists, access to a key's value
does not involve search; e.g. formula (1) is computed without CKY search because the beta-normalized LF is
the key for that formula's table. When you have a Catalan number of logical forms to check, you will appreciate
this property. (We employ alpha-equivalence of lambda-calculus in counting different LFs.)
There are five kinds of hash tables in CCGlab. Table 3 shows example snapshots during parsing. Their
keys and structure are explained in Table 4.
5 Top-level functions
The basic parsing functions were explained in §3. Others are quite useful for grammar development and
testing. A more complete list is provided in Table 5. The code includes commentary to make use of them and others.
⁷ In Lisp terminology an association list is (name . value), rather than (name value). It is easy to feel annoyed when there are
too many of these '.'s to look at. Since we do sequential search only during pre-parsing, efficiency is not the concern here; legibility is.
rest returns the value of an association pair, whereas second returns the value of a name-value pair.
⁸ If you really think COMMON LISP has λ-calculus built in and this is redundant, think again. Try (((lambda (x)(lambda
(y)(list x y))) 'a) 'b). You will get an error message! We don't live in John McCarthy's world anymore, because of Lisp-2. It
has lambdas but not lambda-calculus. Now try (beta-normalize-outer '(((lam x (lam y (x y))) 'a) 'b)).
All CCGlab functions are callable if you know what you're doing.
6 CCGlab in the large
Some comments on public COMMON LISPs, their suitability for large-scale development, and the beam.
So far SBCL has proven itself for that task. First, it compiles all Lisp code by default. More importantly,
although CCL has a dynamically growing heap, its implementation of sort is very slow compared to SBCL's.
Sort is essential to the beam, which is set to a default that you can change. The formula is n^b where
0 < b ≤ 1. The n here is the number of parses of the current input. For example b = .9 eliminates very few
analyses if n is small, and large amounts when it is large. Before you play with the beam system (the b value and its
on/off switch), I'd suggest you experiment with the learning parameters N, n, α0, c in (6).
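A sketch of how the beam can be inspected and changed from the top level, using the switches documented in Tables 4 and 5:

(beam-value)           ; current values of *beamp* and *beam-exp*
(setf *beam-exp* 0.85) ; change the exponent b of n^b before training
(beam-on)              ; make sure the beam is actually in use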
One easy way to get the best of both worlds of fast sort and big heap is to reset the CCGLAB_LISP variable
after install. If you do the following before running CCGlab, it will set the heap to 6GB in SBCL.
export CCGLAB_LISP='/usr/bin/sbcl --dynamic-space-size 6000'
CCL won't give you trouble in long training sessions; it won't go out of memory. You have to check whether
it is doing useful work rather than garbage-collecting or thrashing. Some control over SBCL is provided.
SBCL gives you two options if you use too much memory: (i) recompile SBCL, or (ii) increase the maximum
number of memory maps with e.g. 'sudo echo 262144 > /proc/sys/vm/max_map_count'. The second option
seems to work well without a recompile.⁹ This is the number of maps, not memory size. CLISP is not fully
ANSI. Non-ANSI Common Lisps are not compatible with CCGlab.
One way to avoid excessive garbage collection is to increase its cycle limit, which is 51MB by default in
SBCL. As usual, making it too large may be counterproductive.
A very rough estimate for training time with defaults is O(GSN) milliseconds; the constant here seems
to be 0 < c ≤ 1; G is the number of lexical items in the model; S is the number of supervision pairs; and N
is the number of passes in the stochastic gradient. This is still quite a range; we will try to provide a better
time estimate, with less than an order of magnitude of error, one day. In any case you need all the help you can get.
One option is to use nohup, which runs CCGlab code immune to hangups.
The bash script ccglab.nohup.sbcl which is provided in the repository addresses these concerns. It is
a 7-argument monster to work completely offline and standalone; please study the script in detail. It assumes
that constraints not within control of SBCL are already handled, such as /proc/sys/vm/max_map_count
above.
You can work with many Lisps for CCGlab at the same time by setting the shell variable CCGLAB_LISP
to different Lisp binaries in a shell. The default is set by you during install.
The hash tables for the CKY parser can grow very big. To work with very long sentences without
rehashing all the time, change the variable *hash-data-size* to a bigger value in ccg.lisp source.
Tables of these sizes (two tables: one for CKY parses, one for different LFs in argmax calculation) are
created once, during the first load, and cleared before every parse. The default size is 65,536.
There is a path language for nested hashtables, i.e. hashtables which take hashtables as values of features.
Rather than cascaded gethash calls, you can use the machash macro. See Table 5. By design, CKY hash
tables are doubled in size when full. The logic here is that, if the sentence is long enough to overflow a big
hash table, chances are that the table is going to grow just as fast, and we don’t want to keep rehashing in
long training sessions.
Because of the exponentiation in formula (2) you have to watch out for floating-point overflow. If the
parameters become too large, which may happen if you run many training sessions on the same grammar,
you can z-score the entire grammar as explained in Table 5. In a z-scored grammar all parameters are a fraction
of a standard deviation away from the mean. Z-scoring is better than normalization because it maintains
the relative structure of the data distribution. Don't forget to save that updated model.
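A minimal sketch of that recovery step, using the functions of Table 5 (the file name is hypothetical):

(z-score-grammar)            ; rescale the parameters of the currently loaded grammar
(save-grammar "example.ind") ; save the updated model under the project's .ind name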
There are two companion repositories to CCGlab. They are separated from tool development for eas-
ier updates. Github.com/bozsahin/ccglab-grammars is for grammars which do not necessarily have a
probabilistic component, and github.com/bozsahin/ccglab-models is for grammars which are turned
into probabilistic models of parse ranking. Both are public repositories; please feel free to contribute.
CCGlab is GPL licensed public software and it will stay free; so you can modify and use it as part of a
software system, but you cannot copyright CCGlab; you must pass on the GPL license as is.
⁹ If you get permission errors even with sudo, try this: 'echo 262144 | sudo tee /proc/sys/vm/max_map_count'.
Acknowledgments
Translations from .ccg to .ded formats and from .supervision to .sup are made possible by Mark
Johnson’s LALR parser. Translation of the LFs of paper-style representations in .ccg to a self-contained
lambda-calculus processor is based on Alessandro Cimatti’s abstract data structure implementation. Thanks
to both gentlemen, and to Luke Zettlemoyer for help with the PCCG paper. Without the lambda translation,
you’d be at the mercy of a particular Lisp implementation of lambdas and closures to get the formula (1)
right, or to verify derived LFs. Lisp implementations can represent closures differently, if you can believe
that. I also thank the Lisp community and Stack Exchange for answering questions before I ask them.
I am grateful to my university at Ankara, ODTÜ, and to the Cognitive Science Department of the Informatics
Institute, for allowing me to work on this project; to the Turkish TÜBİTAK for a sabbatical grant (number
1059B191500737), which provided the financial support for the research of which this work is a part; and
to the ANAGRAMA group of the CLUL lab of the University of Lisbon, in particular to Amália Mendes, for
hosting me in a friendly and relaxed academic environment, all of which made CCGlab possible.
References
Baldridge, Jason. 2002. Lexically Specified Derivational Control in Combinatory Categorial Grammar.
Doctoral Dissertation, University of Edinburgh.
Bozşahin, Cem. 2012. Combinatory Linguistics. Berlin: De Gruyter Mouton.
Bozşahin, Cem, and Arzu Burcu Güven. 2018. Paracompositionality, MWEs, and argument substitution. In
23rd Formal Grammar Conference. Sofia, Bulgaria.
Clark, Stephen, and James R. Curran. 2003. Log-linear models for wide-coverage CCG parsing. In
Proceedings of the Conference on Empirical Methods in Natural Language Processing, 97–104. Sapporo, Japan.
Graham, Paul. 1994. On Lisp. Englewood Cliffs, NJ: Prentice Hall.
Hoyt, Frederick M., and Jason Baldridge. 2008. A logical basis for the D combinator and normal form in
CCG. In Proc. of the Annual Meeting of the ACL. Columbus, OH.
Pareschi, Remo, and Mark Steedman. 1987. A lazy way to chart-parse with categorial grammars. In
Proceedings of the 25th Annual Meeting of the ACL, 81–88.
Steedman, Mark. 1996. Surface Structure and Interpretation. Cambridge, MA: MIT Press.
Steedman, Mark. 2000. The Syntactic Process. Cambridge, MA: MIT Press.
Steedman, Mark. 2012. Taking Scope. Cambridge, MA: MIT Press.
Steedman, Mark, and Jason Baldridge. 2011. Combinatory Categorial Grammar. In Non-transformational
Syntax, eds. R. Borsley and Kersti Börjars, 181–224. Oxford: Blackwell.
Zettlemoyer, Luke, and Michael Collins. 2005. Learning to map sentences to logical form: Structured
classification with probabilistic categorial grammars. In Proc. of the 21st Conf. on Uncertainty in Artificial
Intelligence. Edinburgh.
Table 1: .ccg format specification.
Special cases: T: any token; @A: special category; ?A: value variable; !A: LF constant. Default modality: '.'.  '%': start of comment.

start    → start lex ; | lex ;
lex      → T mtag := cat | lrule
lrule    → (T) cat --> cat
mtag     → T
cat      → sync : lf
sync     → basic | parentd | sync slash syn
syn      → basic | parentd
basic    → T feats
parentd  → (sync)
slash    → dir mod | dir | ddir
feats    → [eqns] | ε
eqns     → eqns , eqn | eqn
eqn      → T = T
dir      → \ | /
mod      → . | ^ | * | +
ddir     → \\ | //
lf       → bodys | lterm
lterm    → \T. lbody
lbody    → lterm | bodys
bodys    → bodys body | body
body     → (bodys) | T
Table 2: .ded and .ind feature names in name-value pair lists.
BCAT: basic category. FEATS: basic category features as a name-value pairs list.
DIR: directionality of a complex category. MODAL: slash modality (ALL CROSS HARMONIC STAR).

Lex item features:                     Lex rule features:
PHON    phonological form              INSYN    input category
MORPH   pos tag                        INSEM    input lf
SYN     syntactic type                 OUTSYN   output category
SEM     logical form                   OUTSEM   output lf
PARAM   parameter value                PARAM    parameter value
TABLES 11
Table 3: (a): CKY representation for 'hits v := (s\+ np[agr=3s])/np : \x\y. !hit x y;' during
(ccg-deduce '(hits)). A '#' means the value is a hash table; it is not part of the name. (b): ARGMAX
representation for 'hits v := (s\+ np[agr=3s])/np : \x\y. !hit x y;' after (ccg-deduce '(hits)). The LF
shown is the key. It is beta-normalized, given this input to the parser.
(a) (1 1 1) LEFT (1 1 1)
RIGHT (1 1 1)
SOLUTION#PHON HITS
MORPH V
SYN#RESULT#RESULT#BCAT S
ARG#BCAT NP
AGR 3S
DIR BS
MODAL CROSS
ARG#BCAT NP
DIR FS
MODAL ALL
SEM (LAM X (LAM Y (("HIT" X)Y)))
INDEX LEX
PARAM 1.0
LEX T
(b) (LAM X (LAM Y (("HIT" X)Y))) (1.0 ((1 1 1)))
Table 4: Keys and value types for CCGlab hash tables, and the global variables.
Table type Key Value Description
CKY (I J K) (LEFT RIGHT SOLUTION LEX) LEFT/RIGHT's value: component cells of the solution in (I J K).
I: length of derivation. J: starting position. K: analysis number.
SOLUTION's value: LEX or CKY-ENTRY table.
LEX value: true or false.
CKY-ENTRY SYN table of type BCAT or CCAT Syntactic category hash table
SEM Logical form See §4.4 and Table 1
INDEX Rule index LEX for lex entries, rule name for lex rules, >Bx for >B×, etc.
PARAM real number Cumulative for partial results, lexical for entries/rules
BCAT BCAT symbol or constant Name of the basic category: either constant or ‘@’ variable
<name> symbol or constant atomic value of the feature <name>: either constant or ‘?’ variable
CCAT RESULT table of type BCAT or CCAT Result of a complex category
ARG table of type BCAT or CCAT Argument of a complex category
DIR one of FS BS Directionality, corresponding to ‘/’ and ‘\’ respectively
MODAL name Slash modality. name: one of ALL CROSS HARMONIC STAR
LEX SYN table of type BCAT or CCAT Syntactic category hash table
SEM Logical form See §4.4 and Table 1
INDEX Rule index LEX
PARAM real number Parameter’s value for that lex entry (set to unity in .ded)
PHON symbol or string Phonological string for the lex entry
MORPH symbol Morphological category (part of speech) of the lex entry
LRULE INSYN table of type BCAT or CCAT Input syntactic category
INSEM Logical form Input LF as one object. It is the only input to OUTSEM.
OUTSYN table of type BCAT or CCAT Output syntactic category
OUTSEM Logical form Output LF as a function of INSEM. See §4.4 and Table 1
INDEX Rule index Rule name (given by you)
PARAM real number parameter value for the rule (set to unity in .ded)
ARGMAX <lf> (number cells) key: beta-normalized LF of the result.
number: weight-sum, cells: list of cells for the sum
Global variable Type or value (see above)
*ccg-grammar* List of list of name-value pairs, where current grammar is internally represented
*loaded-grammar* The name of the currently loaded project file (.ded or .ind)
*cky-hashtable* CKY table
*cky-lf-hashtable* ARGMAX table for computing formula (1)
*cky-lf* the maximum of the numerator of formula (2) without exponentiation, and its LF
*cky-lf-hashtable-sum* the value of the denominator of formula (2) without exponentiation
*cky-argmax-lf-max* CKY index of the argmax result’s highest weight-summed derivation
*cky-argmax-lf* List of CKY indices which have the same LF as argmax result
*cky-max* CKY index of the highest weight-summed derivation for any LF
*lex-rules-table* List of LRULE tables
*training-hashtable* Keeps progress of parameter updates.
Key is the item’s key. Value is the list of original and current value.
*training-non0-hashtable* Keeps non-zero counts of features for the current parses. Key is the
supervision pair index (base 1), and value is a list of item keys with non-zero counts
in all current parses of the pair. Calculated once before the loop in formula (6).
*supervision-pairs-list* List of sentence-LF pairs, in this order.
grammar LALR's preset grammar to convert your textual specs to Lisp
lexicon LALR parser’s token values (i.e. its lexicon, not yours)
lexforms LALR parser’s token types (ID is any name, others are special symbols)
*endmarker* LALR parser's end-of-input designator, currently set to $ in two places
(by the LALR parser)
*beam-exp* The exponent of the beam. Default is .9, for n^.9. Lower than 4/5 is not recommended.
*oovp* Set it to nil (default) if you want the tool to complain about OOV; t otherwise.
Table 5: Some useful functions of CCGlab.
tokens <pname> From <pname>.ccg file generates <pname>.lisptokens at command line. Also called
from within CCGlab. Self-contained bash script which runs sed.
suptokens <pname> From <pname>.supervision file generates <pname>.sup at command line. Also
called from within CCGlab. Self-contained bash script which runs sed.
(load-grammar <pname> :maker <m>) Loads the <pname>.ded file if <m> is nil or if :maker key argument is not supplied.
Generates it from the .lisptokens file by the Lisp <m> if not nil. Recognized values
for <m> are 'SBCL, 'CCL and 'CLISP. There is no default maker.
(load-model <pname>) Loads the <pname>.ind file.
(cky-pprint) Pretty-prints the CKY table with row, column, solution indices so that you can see the
values of syntactic features, LFs, partial results, etc. Strings print unquoted.
(ccg-deduce <list>) Parses the input words in <list>. Returns true/false.
(ccg-induce <list>) First parses then ranks the derivations for the input words in <list>. Returns
true/false.
(cky-show-deduction <cat>) Shows all derivations in paper style. Final LF is displayed normal-evaluated. If a
basic category is supplied in <cat>, only solutions with that category are reported
(but all are still computed). Without <cat> or if <cat> is nil all solutions are printed.
(cky-show-induction) Shows 3 derivations mentioned in §3.2, in paper style.
(cky-show-lf-eqv) Applies normal-order and applicative-order evaluation to the results, and reports
the differences. For this reason, the LFs of intermediate derivations are not beta-
normalized during parsing. We know that they look ugly, but you can in fact cut/paste
them and simplify; see cky-show-normal-forms.
(cky-show-normal-forms <r> <c>) Shows normal forms of all results in CKY table entry row <r> column <c>.
(cky-sem <cell>) Returns the LF in CKY table entry row <r> column <c> result <z>. An argument of
type <cell> is the list (<r> <c> <z>). Use cky-pprint beforehand for help.
(cky-reveal-cell <cell>) Prints the deduction sequence ending in CKY table cell <cell>; see cky-sem for
<cell>. Final LF is normalized for mortals; the table is kept intact.
(cky-pprint-probs <cell>) Prints the induction sequence ending in CKY <cell>. See cky-sem for <cell> argu-
ment.
(alpha-equivalent <e1> <e2>) Returns true iff two input LFs are structurally equivalent, i.e. if they have the same
structure modulo bound-variable names.
(status <tag>) Returns a summary of the current state (loaded grammar, its size, CKY info etc.). If
no <tag> is supplied it just prints out highest ranked solution. If <tag> is supplied all
solutions are returned in a list. Useful for re-ranking.
(reset-globals) Resets the current state of CCGlab grammar to almost tabula rasa.
(count-local-structure <cell>) Computes the final weight-sum f̄·θ̄ at the result cell <cell>, from its constituent
cells; cf. formula (2). If you change the formula, argmax (1) will be computed cor-
rectly if you save the result in result <cell>.
(plugin-count-more-substructure <cell>) A plug-in to give you access to all derivational history to extract more features
from result cell <cell>. NB where it is called. Does nothing by default.
(update-model <pname> <N> <alpha0> <c> Updates a model’s parameters after loading the supervision set <pname>.sup.
:load <l> :verbose <v> :debug <d>) See §3.3 for meanings of arguments. If <l> is not nil, the model <pname>.ind is
first loaded. If <v> is not nil, it displays progress of stochastic gradient ascent through
iterations, for every lexical item. If <d> is not nil, it displays stepwise update of the
derivative in algorithm (6).
(show-training) Shows parameter values for every lexical item before and after training.
(save-training <file>) Saves the currently trained model with the new parameter values to file.
(make-supervision <pname> :maker <m>) Makes a lisp-ready pname.sup file from pname.supervision and pname.suptokens.
See §4.3. For makers see load-grammar above.
(save-grammar <file>) Saves the currently loaded grammar to file.
(switches) Prints the current values of rule switches. Follow the instructions there to (re)set them.
(z-score-grammar) Z-scores the parameter values of the currently loaded grammar.
(show-lf) All LFs will be shown in display of a derivation. The last one is simplified for readability.
(hide-lf) Only the final LF of a derivation is shown.
(beam-on) Turns the beam on.
(beam-off) Turns the beam off.
(beam-value) Current values of *beamp* and *beam-exp*. Use setf to change the exponent before training.
(machash f1 ... fn ht) Retrieves feature f1 from .. fn of the hashtable ht. Features f2..fn must be hash-valued.
(ab) List of shortcuts for top level functions.
(oov-on) Sets *oovp* to t (true).
(oov-off) Sets *oovp* to nil (default).
