ChatScript System Functions Manual
© Bruce Wilcox, gowilcox@gmail.com www.brilligunderstanding.com Revision
2/18/2018 cs8.1
• Topic Functions
• Marking Functions
• Input Functions
• Number Functions
• Output Functions
• Control Flow Functions
• External Access Functions
• JSON Functions
• Word Manipulation Functions
• Multipurpose Functions
• Facts Functions

System functions are predefined and can be intermixed with direct output.
Generally they are used from the output side of a rule, but in many cases nothing
prevents you from invoking them from inside a pattern. When used in a pattern,
they do not write out any text output to the user. But their output will be tested
the same as it would from an if statement, meaning 0 and false are failures.
You can write them with or without a ˆ in front of their name. Using the ˆ is
clearer, but it is optional. The one case where it is required is when the first
thing you want to do in a gambit is call a function (unlikely).
t: name(xxx)
This is ambiguous. Is it function call or label and pattern?
The above is treated as a label and pattern. You can force it to be a function
call by one of these:
t: ^name(xxx) # explicitly say it is a function
t: () name(xxx) # explicitly add an empty pattern

Rule Tags
Some functions output or take “rule tags”. All rules have an internal label consisting
of ~topic.toplevelindex.rejoinderindex. E.g.
~introductions.0.5
stands for the 0th rule in the ~introductions topic, rejoinder #5.

Topic Functions
ˆaddtopic ( topicname )
adds the named topic as a pending topic at the head of the list. Typically you
don’t need to do this, because finding a reaction from a topic which is not a
system, disabled, or nostay topic will automatically add the topic to the pending
list. Never returns a fail code even if the topic name is bad.
ˆavailable ( ruletag optionalfail )
Sees if the named rule is available (1) or used up (0). If you supply the optional
argument, the function will fail if the rule is not available.
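As a minimal sketch of how ˆavailable might be used from output: the tag ~introductions.0.0 (top level rule 0 of an assumed ~introductions topic) and the reply wording are illustrative only. It relies on the rule stated earlier that a result of 0 counts as failure when tested.

```chatscript
# Hypothetical responder. ~introductions.0.0 is an assumed rule tag.
u: ( do you have more to say )
    if ( ^available(~introductions.0.0) ) { I still have my opening line in reserve. }
    else { I already used my opening line. }
```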
ˆcleartopics()
Empty the pending topics list.
ˆcounttopic ( topic what )
For the given topic, return how many rules match what.
what can be gambit, available, rules, or used.
That is: how many gambits exist, how many available gambits exist (not erased),
how many top level rules (gambits + responders) exist, and how many top level
rules have been erased.
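A short sketch of one way the count could be used, assuming a topic named ~pets exists (the topic name and the replies are made up for illustration):

```chatscript
# Hypothetical responder. ~pets is an assumed topic name.
u: ( how much more * pets )
    $$left = ^counttopic(~pets available)
    if ( $$left > 0 ) { I still have $$left things to say about pets. }
    else { I have run out of pet talk. }
```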
ˆgambit ( value value ... )
If value is a topic name, runs the topic in gambit mode to see if any gambits
arise. If none arise from the first value, it will try the second, and so on. It
does not fail unless a rule forces it to fail or the named topic doesn’t exist or
is disabled. You can supply an optional last argument FAIL, in which case it
will return FAILRULE_BIT if it didn’t fail but it didn’t generate any new output
either.
The value may be ~, which means use the current topic you are within. It can
also be PENDING, which means pick a topic from the pending topics stack (they
are all pending being returned to but not including the current topic). Or it can
be any other word, which will be a keyword of some topic to pick. E.g.,
^gambit(~ PENDING ~mygeneraltopic FAIL)

ˆgetrule ( what label )
For the given rule label or tag, return some fragment of the rule.
what can be tag, type, label, pattern, output, topic, or usable.
The type will be t, ?, s, a, etc.
If a rule label is involved, an optional third argument, if given, means only find
enabled rules with that label. For usable, it returns 1 if the rule can be used or
null if it has been erased. The label ~ means the current rule. The label 0 means
the top level rule above us (if we are a rejoinder, otherwise it is the same as ~).
ˆhasgambit ( topic )
Fails if topic does not have any gambits left unexecuted.
Even if it does, they may not execute if they have patterns and they don’t match.
With an optional second argument (any value), it returns normally if topic has any
gambits (executed or not) and fails the rule if topic has no gambits (a reactor topic).
ˆkeep()
do not erase this top level rule when it executes its output part (you could
declare a topic to be this, although it wouldn’t affect gambits).
Doing keep() on a gambit is quite risky since gambits after it may not ever fire.
ˆlastused ( topic what )
given a topic name, get the volley of the last what, where what is GAMBIT,
RESPONDER, REJOINDER, ANY. If it has never happened, the value is 0.
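For illustration, a sketch that tests the documented "never happened" value of 0. The topic ~pets and the replies are assumptions:

```chatscript
# Hypothetical responder. ~pets is an assumed topic name.
u: ( have we discussed pets )
    $$volley = ^lastused(~pets ANY)
    if ( $$volley == 0 ) { Not yet. }
    else { Yes, most recently at volley $$volley . }
```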
ˆnext ( what {label} )
Given what of GAMBIT or RESPONDER or REJOINDER or RULE and a rule label or
tag, find the next rule of that what. Fails if none is found.
REJOINDER will fail if it reaches the next top level rule.
If label is ~, it will use the last call’s answer as the starting point, enabling you
to walk rules in succession.
There is also ˆnext(FACT @xxx) – see fact manual.
For ˆnext(INPUT) the system will read the next sentence and prep the system
with it. This means that all patterns and code executing thereafter will be in
the context of the next input sentence. That sentence is now used up, and will
not be seen next when the current revised sentence finishes.
Sample code might be:
t: Do you have any pets
    a: ( ~yes ) refine()
        b: ( %more ) ^next(input) refine()
            c: ( ~pets ) ... # react to pet
            c: () ^retry(SENTENCE) # return to try input from scratch
        b: () What kind do you have?
            c: ( ~pets ) ... # react to pet
If label is LOOP, the system will stop processing code in the current loop and
return to the next iteration of it, like continue in C++/Java, except that it will
stop all code and return to however high up the loop really is, exiting topics and
functions along the way if need be.
ˆpoptopic ( topicname )
Removes the named topic as a pending topic. The intent is not to automatically
return here in future conversation. If topicname is omitted, removes the current
topic AND makes the current topic fail execution at this point.
ˆrefine ( ? )
This is like a switch statement in C. It tests, in order, the rejoinders
attached to its rule.
When the pattern of one matches, it executes that output and is done, regardless
of whether or not the output fails or generates nothing. It does not “fail”, unless
you add an optional FAIL argument. You can also provide a rule tag. Normally
it uses the rule the refine is executing from, but you can direct it to refine from
any rule.
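A small sketch of the switch-like behavior described above. The pattern and replies are invented for illustration, and ~animals is assumed to be a defined concept:

```chatscript
# Hypothetical responder: only the first rejoinder whose pattern matches runs.
u: ( I * like * ~animals )
    ^refine()
    a: ( cat ) Cats are independent.
    a: ( dog ) Dogs are loyal.
    a: () Animals are good company.
```

The final rejoinder with an empty pattern acts like a default case, since it matches whenever the earlier ones did not.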
ˆrejoinder ( {tag/label} )
Without argument, see if the prior input ended with a potential rejoinder rule,
and if so test it on the current sentence. If we match and don’t fail on a rejoinder,
the rejoinder is satisfied. If we fail to match on the 1st input sentence, the
rejoinder remains in place for a second sentence. If that doesn’t match, it is
canceled. It is also canceled if output matching the first sentence sets a rejoinder.
You can give an optional tag or label to pretend the named rule had been the
one to set a rejoinder and so therefore execute its rejoinders explicitly.

ˆrespond ( value value ... )
Tests the sentence against the named value topic in responder mode to see if any
rule matches (executes the rule when matched). It does not fail (though it may
not generate any output), unless a rule forces it to fail or the topic requested
does not exist or is disabled.
This rule will not erase but the responding rule might. If the first value fails to
generate an answer, it tries the second, and so on. You can supply an optional
last argument FAIL, in which case it will return FAILRULE_BIT if it didn’t fail
but it didn’t generate any new output either. You could instead supply an
optional last argument TEST, in which case a topic is executed to see if a rule
will match. If so, the tag is returned and no output is made from the topic (and
no rule is used up).
If a value designates a labelled or tagged rule (e.g., ~mytopic.mylabel or
~mytopic.1.0) then the system will skip over all rules until it reaches that rule,
then begin linear scanning, even if the topic is designated random.
The value may be ~, which means use the current topic you are within.
It can also be PENDING, which means pick a topic from the pending topics stack
(they are all pending being returned to but not including the current topic). Or
it can be any other word, which will be a keyword of some topic to pick.
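As a hedged sketch of how ˆrespond and ˆgambit might be chained from a controlling rule: the topic ~pets is an assumption, and %response is assumed here to be the system variable counting responses generated so far (it is not described in this section).

```chatscript
# Hypothetical controlling rule. ~pets is an assumed topic name.
# %response is assumed to hold the number of responses generated so far.
u: ()
    $$before = %response
    ^respond(~pets PENDING)
    if ( %response == $$before ) { ^gambit(~pets PENDING) }
```

The idea is to try responders first and fall back to a gambit only when responding produced no new output.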
ˆretry ( item )
If item is RULE reexecute the current rule. It will automatically try to match
one word later than its first match previously.
If item is TOPIC it will try the topic over again.
If item is SENTENCE it will retry doing the sentence again. To prevent infinite
loops, it will not perform more than 5 retries during a volley. SENTENCE is
particularly useful with changing the tokenflags to get input processing done
differently. If item is INPUT it will retry all input again.
ˆretry(TOPRULE) will return back to the top level rule (not of the topic but of
a rejoinder set) and retry.
It’s the same if the current rule was a top level rule, but if the current rule is
from ˆrefine(), then it returns to the outermost rule to restart. If the current
rule is not from ˆrefine(), then TOPRULE means the lexically placed toprule
above the current rule and a ˆreuse() will be performed to go to it.
ˆreuse ( rule label optional-enable optional-FAIL )
Uses the output script of another rule. The label can either be a simple rule
label within the current topic, or it can be a dotted pair of a topic name and a
label within that topic or it can be a rule tag.
ˆreuse stops at the first correctly labeled rule it can find and issues a RULE
fail if it cannot find one. Assuming nothing fails, it will return 0 regardless of
whether or not any output was generated.
When it executes the output of the other rule, that rule is credited with matching
and is disabled if it is allowed. If not allowed, the calling rule will be disabled if
it can be.
t: NAME () My name is Bob.
?: ( << what you name >> ) ^reuse(NAME)
?: ( << what you girlfriend name >> ) ^reuse(~SARAH.NAME)
Normally reuse will use the output of a rule whether or not the rule has been
disabled. But, if you supply a 2nd argument (whatever it is), then it will ignore
disabled ones and try to find one with the same label that is not disabled. You
can also supply a FAIL argument (as either 2nd or 3rd) which indicates the
system should issue a RULE FAIL if it doesn’t generate any output.
If you want to use a common rule to hold an answer and ONLY fire when reused,
perhaps with rejoinders, the most efficient way to do that is with a rule whose
pattern can never match. E.g. like this:
s: COMMON (?) some answer
    a: () some rejoinder...
You make ˆreuses go to COMMON (or whatever you name it) or even
ˆsetrejoinder on it. The rule itself can never trigger because it only considers
its pattern when the input is a statement, but the pattern says the input must
be a question. So this rule never matches on its own.
There are also a variety of functions that return facts about a topic, but you
have to read the facts manual to learn about them.
ˆsequence ( ? )
This is like ˆrefine, except instead of only executing the first rejoinder that
matches, it executes all matching rejoinders in order. If one of the rule outputs
fails, it stops by failing the calling rule.
Normally ˆsequence uses the rejoinders of the rule that it is executing from,
but you can direct it to ˆsequence the rejoinders of any rule.
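To illustrate the difference from ˆrefine, here is a sketch in which more than one rejoinder can fire for the same input. The pattern and replies are invented:

```chatscript
# Hypothetical responder: every rejoinder whose pattern matches runs, in order.
u: ( describe * pet )
    ^sequence()
    a: ( dog ) Dogs need walks.
    a: ( cat ) Cats need a litter box.
```

Under these assumptions, an input mentioning both a dog and a cat would be expected to produce both sentences.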

ˆsetrejoinder ( {kind} tag )
Force the output rejoinder to be set to the given tag or rule label. It’s as though
that rule had just executed, so the rules beneath it will be the rejoinders to try.
If kind is input then the input rejoinder is set.
If kind is output or is omitted, then it sets the output rejoinder.
ˆsetrejoinder does not jump anywhere. It establishes the context for
ˆrejoinder.

When you do:
t: what is your name
    a: ATX(_~propernoun) Hi, '_0
the outputrejoinder is set to ATX. You can change that if you want. When the
next volley comes in, the outputrejoinder is now the inputrejoinder and used for
ˆrejoinder. You can modify that as well. Both can exist simultaneously, you
have the input context and you set an output context before having used up the
inputrejoinder.
Setting a rejoinder on a rule means starting with the rejoinder immediately after
it. If you were trying to copy a rejoinder that had already been established and
redo it later, eg.
^setrejoinder(output %inputrejoinder)
this would be problematic, because it would set it to the rule after, which would
be wrong. For this, use a kind of copy, which does not have this issue.
^setrejoinder(copy %inputrejoinder)
If kind is output or copy and no tag is given or the tag is null, the output
rejoinder is cleared (analogous to ˆdisable).
If the kind is input and no tag is given or the tag is null, the input rejoinder is
cleared.
ˆtopicflags ( topic )
Given a topic name, return the control bits for that topic. The bits are mapped
in dictionary_system.h as TOPIC_*.
ˆsleep ( milliseconds )
This stalls the engine for that many milliseconds. If this is a server, the server
is unavailable until sleep is done. Use with care. A good use is when starting
up a server instance and the boot process involves reading from an API. If your
machine runs 30 instances of ChatScript launched at once (to use max CPU),
then all of them hitting the same API at once may be bad for the API and
forcing a randomized sleep based on processid is a good use.

Marking Functions
ˆmark ( {"SINGLE"} word location )
Marking and unmarking words and concepts is fundamental to the pattern matching mechanism, so the system provides both an automatic marking mechanism
and manual override abilities. You can manually mark or unmark something.
Automatic system marking marks all concepts implied by chasing up membership
in other concepts, as does this call ˆmark. word can be any word, which also
means you can mark something with a concept name whether or not the concept
actually is defined anywhere.
There are two mechanisms supported using ˆmark and ˆunmark: specific and
generic.
With specific, you name words or concepts to mark or unmark, either at a
particular point in the sentence or throughout the sentence.
With generic you disable or reenable all existing marks on a word or words
in the sentence. In fact, you go beyond that because during pattern matching
words you disable are entirely invisible, and matching proceeds as if they do not
exist.
Specific: effects are permanent for the volley and cross over to other rules. In
documentation below, use of _0 symbolizes use of any match variable.
ˆmark ( ~meat _0 )
This marks ~meat as though it has been seen at whatever sentence location _0
is bound to (start and end).
ˆmark ( ~meat n )
Assuming n is between 1 and the sentence word limit, this marks ~meat at the nth
word location. If n was gotten from ˆposition of a match variable, it is the range
of that match variable.

ˆmark ( tomboy _0 )
This marks the word tomboy as visible at the location designated, even though
this word is not actually in the sentence. While patterns will react to its presence,
it will not show up in any memorizations using _.
While usually you mark a concept, you can also mark a word (though you should
generally use the canonical form of the word to trigger all its normal concept
hierarchy markings as well).
Although ˆconceptlist (see Facts manual) normally only reports concepts
marked at a word, if you explicitly mark using a word and not a concept, that
will also be reported in ˆconceptlist.
ˆmark ( ~meat )
With location omitted, this marks ~meat as though it has been seen at sentence
start (location 1).
ˆmark()
Clears all global unmarks. restore a global ˆunmark(0) exactly as it was before
the global unmark.
ˆunmark ( word _0 )
The inverse of specific ˆmark, this takes a matchvariable that was filled at the
position in the sentence you want erased and removes the mark on the word
or concept set or topic name given. Pattern matching for it in that position
will now fail. But it is not symmetric to ˆmark because it does not remove all
implied marks that mark may have set.
ˆunmark ( * n )
Assuming n is within 1 and sentence word limit, this unmarks all concepts at
nth word location. If n was gotten from ˆposition of a match variable, it is the
range of that match variable.
ˆunmark ( word all )
All references to word (or ~concept if you named one) are removed from anywhere
in the sentence.
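A sketch of specific marking, assuming you want the word tofu to count as ~meat for the rest of the volley. The word choice and replies are illustrative only:

```chatscript
# Hypothetical rule: mark ~meat at the position where tofu was seen.
# It generates no text, so later rules are still tried.
u: ( _tofu ) ^mark(~meat _0)

# A later rule in the same volley can now match on ~meat at that position.
u: ( I * eat * ~meat ) So you are a meat eater.
```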

Generic: effects are transient if done inside a pattern, last the volley if done
in output. When you are trying to analyze pieces of a sentence, you may want
to have a pattern that finds a kind of word, notes information, then hides that
kind of word and reanalyzes the input again looking for another of that ilk.
Being able to temporarily hide marks can be quite useful, and this means typically
you use ˆunmark of some flavor to hide words, and then ˆmark later to reenable
access to those hidden words.
ˆunmark ( * _0 )
Turns off ALL matches on this location temporarily. The word becomes
invisible. It disables matching at any of the words spanned by the match variable.
This unmark will also block subsequent specific marking using ˆmark at their
locations.
ˆmark ( * _0 )
To restore all marks to some location.
ˆunmark ( * )
Turns off matching on all words of the sentence.
ˆmark ( * )
Restores all marks of the sentence.
Reminder: If you do a generic unmark from within a pattern, it is transient and
will be turned off when the pattern match finishes (so you don’t ruin later rules),
whereas when you do it from output, then the change persists for the rest of the
volley. Furthermore it is handy to flip specific collections of generic unmarks on
and off.
ˆmark() memorizes the set of all * unmarks (generic unmarks) and then turns
them off so normal matching will occur.
ˆunmark() will restore the set of generic unmarks that were flipped off using
ˆmark().
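The hide-and-rescan idea can be sketched as follows. This is an untested illustration: ~animals is assumed to be a defined concept, and the variable name and replies are made up.

```chatscript
# Hypothetical responder: find one animal, hide it, then look for another.
u: ( _~animals )
    $$first = '_0
    ^unmark(* _0)      # done in output, so the word stays hidden for the volley
    ^refine()
    ^mark(*)           # restore all marks of the sentence afterwards
    a: ( _~animals ) You mentioned $$first and '_0 .
    a: () You mentioned $$first .
```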
ˆposition ( how matchvariable )
This returns the integer representing where the named match variable is located.

how can be START, END, or BOTH. BOTH means an encoding of where the start
and end of the match were. See @_n in pattern matching to set a position, or
the ˆsetposition function.
ˆmarked ( word )
returns 1 if word is marked, returns FAILRULE_BIT if the given word is not
currently marked from the current sentence.
ˆsetposition ( _var start end )
Sets the match location data of a match var to the number values given.
Alternatively you can do ˆsetposition ( _var _var1 ), which is redundant
with just doing _var = _var1.
ˆsetcanon ( wordindex value )
Changes the canonical value for this word.
ˆsettag ( wordindex value )
Changes the pos tag for the word.
ˆsetoriginal ( wordindex value )
Changes the original value for this word.
ˆsetrole ( wordindex value )
Changes the parse role for this word. These are used in conjunction with
$cs_externaltag to replace the CS inbuilt English postagger and parser with
one from outside. See end of ChatScript PosParser manual.
ˆsavesentence ( label ) / ˆrestoresentence ( label )
These two functions save and restore the current entire sentence preparation
context. That means everything that pattern matching depends upon from
the current sentence can be saved, you can go on to a new sentence (either via
ˆnext(INPUT) or ˆanalyze() or whatever), and then rapidly flip back to some
previous sentence analysis. Label is a value used to label the saved analysis.
This only works during the current volley.
Cannot be used in document mode. ˆsavesentence returns the number of
4-byte words the save took.
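A sketch of the save, detour, and restore cycle. The label userinput, the analyzed text, and the variable names are assumptions; the return value of ˆsavesentence is assigned so that it does not land in the output.

```chatscript
# Hypothetical responder: examine another sentence, then return to the user's.
u: ( compare this )
    $$size = ^savesentence(userinput)
    ^analyze(I walked my dog)
    if ( ^marked(~animals) ) { $$petseen = 1 }
    ^restoresentence(userinput)
```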

Input Functions
ˆanalyze ( stream )
The stream generates output (not printed to user) and then prepares the content
as though it were current input sentence. This means the current sentence
flagging and marking are all replaced by this one’s. It does not affect any
pending input still to be processed. If the stream is quoted string, the quotes
are removed. This would be common, for example, when analyzing output from
the chatbot gotten via grabbing facts with “chatoutput” as the verb.
Note that the stream is considered a single sentence. If you want to supply
multiple sentences, you need to call ˆtokenize and then loop on the facts
created.
Note that ˆanalyze does not call any prepass topic you may have, but you can
invoke that topic directly afterwards yourself.
ˆtokenize ( {WORD SENTENCE} stream )
WORD or SENTENCE are optional parameters (SENTENCE is default).
If SENTENCE, then splits the stream into sentences and creates facts of each like
this: (sentence ˆtokenize ˆtokenize).
If WORD, then splits it entirely into words paying no attention to sentence
boundaries.
ˆcapitalized ( n )
Returns 1 if the nth word of the sentence starts with a capital letter in user
input, else returns 0.
If n is alphabetic, it returns whether or not it starts with a capital letter. Illegal
values of n return failrule.

ˆinput ( ... )
The arguments, separated by spaces, are injected back into the input stream as
the next input, processed before any pending additional input. Typically this
command is then followed by ˆfail(SENTENCE) to cancel current processing
and move onto the revised input.
Since the sentence is fed in immediately after the current input, if you want to
feed in multiple sentences, you must reverse the order so the last sentence to be
processed is submitted via input first. You can detect that the current sentence
comes from ˆinput and not from the user by %revisedInput (bool) being true
(1).
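A minimal sketch of rewriting an input and abandoning the current sentence, as described above. The pattern and the replacement wording are invented for illustration:

```chatscript
# Hypothetical responder: turn a fragment into a full question and reprocess.
u: ( what about _~animals )
    ^input(do you like '_0 ?)
    ^fail(SENTENCE)
```

The rewritten sentence does not match this rule's pattern, so it will not loop back into the same rewrite.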
ˆoriginal ( _n )
The argument is the name of a match variable. Whatever it has memorized will
be used to locate the corresponding series of words in the original raw input
from the user that led to this match.
E.g., if the input was: I lick ice crem, the converted input became I lick
ice_cream and you’d memorized the food onto a match variable, then you could
do ˆoriginal(_0) and get back ice crem.
Another example:
# get foreign language proper name, without any CS standard processing.
u: what's your first name?
    #! Anna Lisa
    a: ( _* )
        $firstname = ^original ( _0 )
        Nice to meet you, $firstname
ˆposition ( which _var )
If which is start this returns the starting index of the word matched in the
named _var.
If which is end this returns the ending index. E.g.,
if the value of _1 was the fox, it might be that start was 3 and end was 4 in the
sentence it was the fox .
ˆremovetokenflags ( value )
Removes these flags from the tokenflags returned from the preprocessing stage.

ˆsettokenflags ( value )
Adds these flags to the tokenflags returned from the preprocessing stage.
Particularly useful for setting the #QUESTIONMARK flag indicating the input was
perceived to be a question.
For example, I treat tell me about cars sentences as questions by marking them
as such from script (equivalent to what do you know about cars?).
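The tell me about case could be sketched like this. It assumes the rule sits somewhere that runs before the rules that care whether the input is a question:

```chatscript
# Hypothetical rule: flag "tell me about ..." inputs as questions.
# It generates no text, so processing continues with later rules.
u: ( tell me about ) ^settokenflags(#QUESTIONMARK)
```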
ˆsetwildcardindex ( value )
Tells the system to start at value for future allocations of wildcard slots. This is
only useful inside some pattern where you are trying to protect data from some
previous match. Eg.
u: (_~animals) refine()
    a: ( ^setwildcardindex(_1) _~color)
_0 is set to an animal. Normally the rejoinder would set a color onto _0 and
clobber it, but the call to ˆsetwildcardindex forces it to use _1 instead, so
both _0 and _1 have values.
ˆisnormalword (value)
Fails if value has a character that is not alphabetic, numeric, a hyphen, an
underscore, or an apostrophe.

Number Functions
ˆcompute ( number operator number )
Performs arithmetic and puts the result into the output stream.
Numbers can be integer or float and will convert appropriately. There are a
range of operators that have synonyms, so you can pass in directly what the
user wrote. The answer will be ? if the operation makes no sense and infinity if
you divide by 0.
~numberOperator recognizes these operations:
operator symbol    description
+                  plus add and (addition)
-                  minus subtract deduct (subtraction)
*                  x time multiply (multiplication)
/                  divide quotient (float division)
%                  remainder modulo mod (integer only, modulo)
root               square_root (square root)
^^                 power exponent (exponent)
<< and >>          shift (limited to shifting 31 bits or less)
random             ( 0 random 7 means 0,1,2,3,4,5,6 - integer only)

Basic operations can be done directly in assignment statements like:
$var = $x + 43 - 28
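Because the operators have synonyms, what the user wrote can be handed to ˆcompute directly. A sketch, assuming ~number is a defined concept alongside the ~numberOperator concept named above (the pattern and reply are illustrative):

```chatscript
# Hypothetical responder: pass the user's own operand and operator words through.
u: ( what is _~number _~numberOperator _~number )
    The answer is ^compute(_0 _1 _2) .
```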
ˆtimefromseconds ( seconds {offset} )
This converts time in seconds (Unix epoch time) from the given time in whatever
timezone, to a string like %time returns. You can compute a difference in times
by merely doing a subtraction of the two times. %fulltime will give you the
current time that you could plug in here. The optional second argument will
displace that time by the hours offset (can be plus or minus).
ˆtimeinfofromseconds ( seconds )
This converts time in seconds (Unix epoch time) into its component bits, spread
across 7 match variables. Starting by default at _0, if you assign it like this:
_3 = ^timeinfofromseconds(%fulltime)
it will start at _3. The items you get are: seconds, minutes, hours, date in
month, month name, year, day name of week, month index (jan==0), dayofweek
index (sun==0).
ˆtimetoseconds ( seconds minutes hours date-of-month month year )
This converts time data into seconds since 1970 (Unix epoch time). Analogous to
%fulltime, which returns the current time in seconds. Month can be a number 1-12,
the name of a month, or the abbreviation of a month. Date-of-month must be 1 or
more. Year must be 1970 or later and less than 2100. An optional 7th argument
indicates whether the time is within daylight savings or not; values can be 1 or 0,
t or f, T or F. Default is false.
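A sketch combining the time functions in this section. The date, the variable names, and the reply are arbitrary; the argument order follows the signature above (seconds minutes hours date-of-month month year).

```chatscript
# Hypothetical responder: build an epoch time, print it, and measure the gap to now.
u: ( when was that christmas )
    $$then = ^timetoseconds(0 0 9 25 December 2017)
    $$elapsed = %fulltime - $$then
    It was ^timefromseconds($$then) , which is $$elapsed seconds ago.
```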
ˆisnumber ( value )
Fails if value is not an integer, float, or currency.

Output Functions
The following functions cannot be used during postprocessing since output has
been finished in theory and you can now analyze it.
ˆflushoutput()
Takes any current pending output stream data and sends it out. If the rule later
fails, the output has been protected and will still go out (though the rule will
not erase itself).
ˆinsertprint ( where stream )
The stream will be put into output, but it will be placed before output number
where or before output issued by the topic named by where. The output is safe
in that even if the rule later fails, this output will go out. Before the where, you
may put in output control flags as either a simple value or a value list in parens.
ˆkeephistory ( who count )
The history of either BOT or USER (values of who) will be cut back to the count
given. This affects detecting repeated input on the part of the user or detecting
repeating output by the chatbot.
ˆlastsaid ()
Returns what the bot said last volley.
ˆprint ( stream )
Sends the results of outputting that stream to the user. It is isolated from the
normal output stream, and goes to the user whether or not one later generates
a failure code from the rule. Before the output you may put in output control
flags as either a simple value without a # (e.g., OUTPUT_EVALCODE ) or a value
list in parens.
Flags include:
Flag                       description
OUTPUT_EVALCODE            is automatic, so not particularly useful. Useful ones
                           would control how print decides to space things
OUTPUT_RAW                 does not attempt to interpret ( or { or [ or "
OUTPUT_RETURNVALUE_ONLY    does not go to the user, is merely returned as an
                           answer. Print normally stores directly into the
                           response system, meaning failing the rule later has
                           no effect. Print normally does not return a value so
                           you can’t store it into a variable. And print has a
                           number of flags that can affect its formatting that
                           don’t exist with normal output. This flag converts
                           print into an ordinary function returning a value,
                           reversing all those differences
OUTPUT_NOCOMMANUMBER       don’t add commas to numbers
OUTPUT_NOQUOTES            remove quotes from strings
OUTPUT_NOUNDERSCORE        convert underscores to blanks

These flags apply to output as it is sent to the user:
Flag                               description
RESPONSE_NONE                      turn off all default response conversions
RESPONSE_UPPERSTART                force 1st character of output to be uppercase
RESPONSE_REMOVESPACEBEFORECOMMA    as the name says
RESPONSE_ALTERUNDERSCORES          convert underscores to spaces
RESPONSE_REMOVETILDE               remove leading ~ on class names
RESPONSE_NOCONVERTSPECIAL          don’t convert escaped n, r, and t into ascii
                                   direct characters
RESPONSE_CURLYQUOTES               change simple quotes to curly quotes
                                   (starting and ending)

ˆpreprint ( stream )
The stream will be put into output, but it will be placed before all previously
generated outputs instead of after, which is what usually happens. The output
is safe in that even if the rule later fails, this output will go out. Before the
output you may put in output control flags as either a simple value or a value
list in parens.

ˆrepeat ()
Allows this rule to generate output that may repeat what has been said recently
by the chatbot.
ˆreviseOutput ( n value )
Allows you to replace a generated response with the given value.
n is one based and must be within range of given responses. One can use this,
for example, to alter output to create accents. Using ˆresponse to get an output,
you can then use ˆsubstitute to generate a revised one and put it back using
this function.

Output Access
These functions allow you to find out what the chatbot has said and why.
ˆresponse ( id )
What the chatbot said for this response. Id 1 will be the first output.
ˆresponsequestion ( id )
Boolean 1 if response ended in ?, null otherwise.
ˆresponseruleid ( id )
The rule tag generating this response, from which you can get the topic. May be
a joined pair of rule tags if the rule was relayed (reuse) from a different rule. The
final rule will be first and the relay second, e.g. ~keywordless.30.0.~control.3.4.
If the id is -1, then all output generated will be included, analogous to what
happens in the log file for why in the entries.

PostProcessing Functions
These functions are only available during postprocessing.

ˆpostprintbefore ( stream )
It prints the stream prepended to the existing output. You will not be able to
analyze or retrieve information about this, like you would from a normal print
because it generates no facts representing it. This is useful for adding outofband
messages [ ] to the front of input for controlling avatars and such. Or for adding
transitional phrases or other personality coloring before the main output.
ˆpostprintafter ( stream )
It prints the stream appended to the existing output. You will not be able
to analyze or retrieve information about this, like you would from a normal
print because it generates no facts representing it. This is useful for adding
summarizing data after output, e.g., when running the document reader.

Control Flow Functions
ˆargument ( n )
Retrieves the nth argument of the calling outputmacro (1-based).
ˆargument ( n ˆfn )
Looks backward in the callstack for the named outputmacro, and if found returns
the nth argument passed to it. Failure will be reported for n out of range or ˆfn
not in the call path.
This is an alternative access to function variable arguments, useful in a loop
instead of having to access by variable name.
If n is 0, the system merely tests whether the caller exists and fails if the caller
is not in the path of this call.
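The loop use mentioned above might look like the following sketch. The macro name ^listargs, its parameter names, and the local variable $_i are all hypothetical:

```chatscript
# Hypothetical outputmacro: print each of its three arguments by index.
outputmacro: ^listargs(^first ^second ^third)
    $_i = 1
    loop (3)
    {
        ^argument($_i)
        $_i += 1
    }
```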
ˆcallstack ( @n )
Generates a list of transient facts into the named factset. The facts represent
the callstack and have as subject the critical value (the verb is callstack and
the object is the rule tag responsible for this entry). Items include function calls
(ˆxxxx) and topic calls (~xxxx) and internal calls (no prefix).

ˆcommand ( args )
Execute this stream of arguments through the : command processor. You can
execute debugging commands through here. E.g.,
^command(:execute ^print("Hello") )
Note that it is hard to turn on :trace this way, because the system resets it
internally at various points. The correct way to manipulate trace is to do
$cs_trace = -1 in regular script, outside of ˆcommand.
ˆend ( code )
Takes 1 argument and returns a code which will stop processing. Any data
pending in the output stream will be shipped to the user. If ˆend is contained
within the condition of an if, it merely stops that condition. An end rule inside a loop merely
stops the loop. All other codes propagate past the loop. The codes are:
code       description
CALL       stops the current outputmacro w/o failing it. See also ˆreturn
RULE       stops the current rule. Whether the next rule triggers depends upon whether or not output was generated
LOOP       stops the current loop but not the rule containing it. Can pass up through topics to find the loop. If there is no loop, it will fail you all the way to the top
TOPIC      stops the current topic
SENTENCE   stops the current rule, topic, and sentence
INPUT      stops all the way through all sentences of the current input
PLAN       succeeds a plan – (only usable within a plan)

ˆeval ( flags stream )
To evaluate a stream as though it were output (like to assign a variable). Can
be used to execute :commands from script as well.
Flags are optional and match the flag capabilities of ˆprint.
One common flag would be OUTPUT_NOQUOTES if you wanted to strip enclosing
“” from a value. E.g.,
$$tmp = ^eval(OUTPUT_NOQUOTES ^arg1)
ˆeval is also particularly used with variables, when you know the value of a
variable is itself a variable name and you want its actual value, e.g.

$nox = 1
$$tmp = join($ no x)
$$val = eval($$tmp) # $$val = 1
ˆfail ( code )
Takes 1 argument and returns a failure code which will stop processing. How
extensive that stop is, depends on the code. If ˆfail is contained within the
condition of an if, it merely stops that and not anything broader. A fail or end
rule inside a loop merely stops the loop; other forms propagate past the loop.
The failure codes are:
code       description
RULE       stops the current rule and cancels pending output
LOOP       stops a containing loop and fails the rule calling it. If you have no containing loop, this can crawl up through all enclosing topics and make no output
TOPIC      stops not only the current rule also the current topic and cancels pending output. Rule processing stops for the topic, but as it exits, it passes up to the caller a downgraded fail(rule), so the caller can just continue executing other rules
SENTENCE   stops the current rule, the current topic, and the current sentence and cancels pending output
INPUT      stops processing anything more from this user’s volley. Does not cancel pending output. It’s the same as END(INPUT)

Output that has been recorded via ˆprint, ˆpreprint, etc. is never canceled;
only pending output is.

ˆload ( name )
Normally CS takes all the data you have compiled as :build 0 and :build
whatever as layers 0 and 1, and loads them when CS starts up. They are then
permanently resident. However, you can also compile files named filesxxx2.txt
which will NOT be loaded automatically.
You can write script that calls ˆload, naming the xxx part and they will be
dynamically loaded, for that user only, and stay loaded for that user across all
volleys until you call ˆload again. Calling load again with a different name will
load that new name. Calling ˆload(null) will merely unload the dynamic layer
previously loaded.
WARNING
It’s erroneous (you get whatever happens to you), if you call ˆload from within
topics you have loaded via ˆload.
ˆclearmatch()
This clears all match variables to empty.
ˆmatch ( what )
This does a pattern match using the contents of what (usually a variable reference).
It fails if the match against current input fails. It operates on the current analyzed
sentence which is usually the current input, but since you can call ˆnext(input)
or ˆanalyze() it is whatever the current analysis data is.
if (%more AND ^match(^"(< ![~emocurse ~emothanks] ~interjections >)" ) )
{FAIL(SENTENCE)}
or
$$newrule = GetRule(pattern $$newtag)
$$newtype = GetRule(type $$newtag)
if ($$newtype == $$type AND match($$newrule)) # we would match this rule
ˆmatch can also take a rule tag for what, in which case it uses the pattern of
the rule given it. ˆmatch will normally take your pattern and compile it with
the script compiler during execution.
If you have discarded the script compiler in your build, it will run your pattern
directly and pray. In that case every token should be separated by a space: eg
not this:
[my you]
but this

[ my you ]
and relational tests won’t work so you can’t do _0>5 or _0? or things like that.
If you know your pattern in advance, you can put it on a rule and then execute
that since it will have been compiled. E.g.
s: TEST (some fancy pattern)
and later
^match(~mytopic.test)
You can also just say ˆmatch(~someconcept) and it will test the current input
for that concept.
$$csmatch_start and $$csmatch_end are assigned to provide the range of
words that ˆmatch used.
ˆmatches ()
Returns a string of indices of words that matched the most recent pattern match.
The indices are in order, so you can know the range of the match or the specific
word indices that were seen. Currently matches only include the words/concepts
that were matched, not things like
(sag*)
where the word is not fully named.
ˆnofail ( code ... script ... )
The antithesis of ˆfail(). It takes a code and any number of script elements,
executes the script and removes all failure codes through the listed code.
This is important when calling ˆrespond and ˆgambit from a control script.
You would want a control script to pass along codes at the sentence level, but if
the respond call generated a fail-rule return, you don’t want that to stop all the
code of a control script responder.
The nofail codes are:
code       description
RULE       a rule failure within the script does not propagate outside of nofail
LOOP       a loop failure or end within the script does not propagate outside of nofail
TOPIC      a topic or rule failure within the script does not propagate outside of nofail
SENTENCE   a topic or rule or sentence failure within the script does not propagate outside of nofail
INPUT      no failure propagates outside of the script
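One way to read the nofail codes is as a lookup from the code you pass to the set of failure kinds it absorbs. The Python sketch below is only a model of that table, not how ChatScript implements it; the names ABSORBED and propagates are hypothetical, and LOOP failures are listed only where the table mentions them.

```python
# Model of the nofail codes: which failure kinds each nofail code absorbs.
# ABSORBED and propagates are illustrative names, not ChatScript internals.
ABSORBED = {
    "RULE": {"RULE"},
    "LOOP": {"LOOP"},
    "TOPIC": {"RULE", "TOPIC"},
    "SENTENCE": {"RULE", "TOPIC", "SENTENCE"},
    "INPUT": {"RULE", "LOOP", "TOPIC", "SENTENCE", "INPUT"},
}


def propagates(nofail_code, failure):
    """Return True if `failure` would escape a nofail(nofail_code ...) wrapper."""
    return failure not in ABSORBED[nofail_code]


# A control script that wraps respond in nofail(TOPIC ...) keeps running after
# a rule failure, but still sees a sentence-level failure.
print(propagates("TOPIC", "RULE"))
print(propagates("TOPIC", "SENTENCE"))
```

Read this way, the code you choose is the highest level of failure you are willing to swallow; anything broader still reaches the caller.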

ˆnotnull ( stream )
Execute the stream and if it returns no text value whatsoever, the call fails.
The text value is not used anywhere, just tested for existence. Useful in IF
conditions.
ˆnorejoinder ()
Prevents this rule from assigning a rejoinder.
ˆnotrace ( ... )
Suppresses normal tracing if :trace all is on, for the duration of evaluation
of the contents of the parens. It does not block explicit traces of functions or
topics.
ˆreturn ( ... )
Evaluates its data and returns any output from the most recent calling outputmacro. It is nominally equivalent to:
here is some outputting
^end(CALL)
My personal coding convention is to use ˆreturn when the function is supposed
to return a value to a caller who will assign it somewhere. And not to use it if
the function is directly creating output to the user or is just being executed for
side effects.
You can return the contents of a variable ˆreturn($$myvar) or the name of a
factset ˆreturn(@19) or just some literal value ˆreturn(test). Returning a
factset just returns its name. But if you have
@0 = ^myfunc()
and ˆmyfunc returns a factset name, you have done the equivalent of
@0 = @19

which means copy the elements of set 19 into set 0.
Note that ˆreturn() and ˆreturn(null) are treated the same. An empty
string is returned. This is similar to assigning a variable by saying $var = null
which assigns the empty string.
ˆaddcontext ( topic label )
Sets a topic and context name for use by ˆincontext.
The label doesn’t have to correspond to any real label.
The topic can be a topic name or ~ meaning current topic.
ˆauthorized ()
Uses the same authorizedIP.txt file and rules that debug commands use to validate
the current user.
ˆclearcontext ()
Erases all context data (see ˆaddcontext).
ˆincontext ( label )
label can be a simple text label or a topicname.textlabel. The system tracks
rule labels that generated output to the user or rules starting with the label CX_
whether or not the rule generates output as long as it didn’t fail during output.
ˆincontext will return how many volleys have happened since the referenced
rule (normal return) if the label has output within the 5 prior volleys and will
fail if not. It’s like an extension of rejoinders. Rejoinders have a 1 volley context
and must be placed immediately after a rule. This has a 5 volley context and
is used in normal rule patterns.
u: (^incontext(PLAYTENNIS) why) because it was fun.

External Access Functions
ˆenvironment ( variablename )
Access environment variables of the operating system. E.g.
^environment(path)

ˆsystem ( any number of arguments )
The arguments, separated by spaces, are passed as a text string to the operating
system for execution as a command. The function always succeeds, returning
the return code of the call. You can transfer data back and forth via files by
using ˆimport and ˆexport of facts.
ˆpopen ( commandstring 'function )
The command string is a string to pass to the OS shell to execute. That will return
output strings (some number of them), each of which will have any \r or \n changed to
blanks and then be stripped of leading and trailing blanks.
The string is then wrapped in double quotes so it looks like a standard ChatScript
single argument string, and sent to the declared function, which must be an
output macro or system function name, preceded by a quote.
The function can do whatever it wants. Any output it prints to the output buffer
will be concatenated together to be the output from ChatScript. If you need a
doublequote in the command string, use a backslash in front of each one. They
will be removed prior to sending the command. E.g.,
outputmacro: ^myfunc(^arg)
^arg \n
topic: ~test( testing )
u: () popen( "dir *.* /on" '^myfunc)
output this:
Volume in drive C is OS
Volume Serial Number is 24CB-C5FC
Directory of C:ChatScript
06/15/2013 12:50 PM  .
06/15/2013 12:50 PM  ..
12/30/2010 02:50 PM 5 authorizedIP.txt
06/15/2013 12:19 PM 10,744 changes.txt
05/08/2013 03:29 PM  DICT
...( additional lines omitted)
49 File(s) 29,813,641 bytes
24 Dir(s) 566,354,685,952 bytes free
'Function can be null if you are not needing to look at output.

ˆtcpopen ( kind url data 'function )
Analogous in spirit to popen.
You name the kind of service (POST, GET), the url (not including http://) but
including any subdirectory, the text string to send as data, and the quoted
function in ChatScript you want to receive the answer.
The answer will be read as strings of text (newlines separate and are stripped off
with carriage returns) and each string is passed in turn to your function which
takes a single argument (that text).
:trace TRACE_TCP
can be enabled to log what happens during the call.
Likely you will prefer ˆjsonopen which can deal with more complex web communication scenarios and returns structured data so you don’t have to write
script yourself to parse the text.
'function can be null if you are not needing to look at output.
The system will set $$tcpopen_error with error information if this function
fails.
When you look at a webpage you often see its URL looking like this:
http://xml.weather.com/weather/local/4f33?cc=*&unit ="+vunit+"&dayf=7"
There are three components to it.
The host: xml.weather.com.
The service or directory: /weather/local/4f33.
The arguments: everything AFTER the ?.
The arguments are URLencoded, so spaces have been replaced by +, special
characters will be converted to %xx hex numbers.
If there are multiple values, they will be separated by & and the left side of an =
is the argument name and the right side is the value.
When you call ˆtcpopen, normally you provide the host and service as a single
argument (everything to the left of ?) and the data as another argument
(everything to the right of ?).
Since ChatScript URL-encodes the data for you, you don’t have to. If you don’t know the unencoded form
of the data or you don’t think CS will get it right, you can provide URL-encoded
data yourself, in which case make your first argument either POSTU or GETU,
meaning you are supplying URL-encoded data so CS should not do anything to
your arguments.
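To make the host/service/arguments split concrete, here is a minimal Python sketch that takes a full URL apart the way the text describes: everything left of the ? (minus http://) becomes the service argument and everything right of it becomes the data argument. The helper name split_for_tcpopen and the unit=m value in the sample URL are assumptions for illustration; this uses Python's standard urllib.parse, not anything inside ChatScript.

```python
from urllib.parse import urlsplit, parse_qsl, quote_plus


def split_for_tcpopen(url):
    """Split a URL into (service, data, pairs); hypothetical helper."""
    parts = urlsplit(url)
    service = parts.netloc + parts.path  # host plus directory, without http://
    data = parts.query                   # everything after the ?
    # Decode name=value pairs separated by & (the left side of = is the name).
    pairs = parse_qsl(data, keep_blank_values=True)
    return service, data, pairs


service, data, pairs = split_for_tcpopen(
    "http://xml.weather.com/weather/local/4f33?cc=*&unit=m&dayf=7"
)
print(service)
print(data)
print(pairs)

# URL encoding replaces spaces with + and special characters with %xx.
print(quote_plus("san francisco"))
```

The first two values are what you would hand to ˆtcpopen as the url and data arguments; the decoded pairs are only there to show how the argument string is structured.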

Below is sample code to find current conditions and temperature in San Francisco
if you have an API key to the service. It calls the service, gets back all the JSON
formatted data from the request, and line by line passes it to ˆmyfunc.
This, in turn, calls a topic to hunt selectively for fragments and save them, and
when all the fragments we want have been found, ˆmyfunc outputs a message
and stops further processing by calling ˆEND(RULE).
Note that in this example there is no data to pass, everything is in the service
named, so the data value is “”.
outputmacro: ^myfunc (^value)
$$tmp = ^value
nofail(RULE respond(~tempinfo))
if ($$currentCondition AND $$currentTEMP)
{
print( It is $$currentCondition. )
print(The temperature is $$currentTemp. )
^END(RULE)
}
topic: ~tempinfo system repeat keep()
u: (!$$currentCondition)
$$start = findtext($$tmp $$pattern1 0)
$$findtext_start = findtext($$tmp ^"\"" $$start)
$$currentCondition = extract($$tmp $$start $$findtext_start )
u: ($$currentCondition)
$$start = findtext($$tmp $$pattern2 0)
$$findtext_start = findtext($$tmp , $$start)
$$currentTemp = extract($$tmp $$start $$findtext_start)
topic: ~INTRODUCTIONS repeat keep (~emogoodbye ~emohello ~emohowzit name )
t: ^keep() Ready. Type "weather" to see the data.
u: (weather)
$$pattern1 = ^"\"weather\":\""
$$pattern2 = ^"\"temp_f\":"
if ( tcpopen(GET api.wunderground.com/api/yourkey/conditions/q/CA/San_Francisco.json ""
{ hi }
else
{ $$tcpopen_error }
There is a subtlety in the ˆmyfunc code in that it uses ˆprint to put out the
result. Just writing:
if ($$currentCondition AND $$currentTEMP)
{
It is $$currentCondition.
The temperature is $$currentTemp.
^END(RULE)
}
will not work, because that output is being generated by the call to ˆtcpopen,
which is in the test part of the if, so everything it does is purely for effect of
testing a condition. The generated output is discarded.
If you moved the output generation to the { } of the if, things would be fine.
E.g.,

if ( tcpopen(GET api.wunderground.com/api/yourkey/conditions/q/CA/San_Francisco.json "" '^my
{
It is $$currentCondition.
The temperature is $$currentTemp.
}
else { $$tcpopen_error }
Doing the output without using ˆprint is my preferred style; it is easier to see
what is going on for output if it is not hidden deep inside some if test.
ˆexport ( name from )
From must be a fact set to export. Name is the file to write them to. An optional
3rd argument append means to add to the file at the end, rather than recreate
the file from scratch.
Obviously, you must first have done something like ˆquery to populate the fact
set. Eg.
^query(direct_sv item label ? -1 ? @3)
^export(myfacts.txt @3)
If the name includes the substring “ltm”, then the file will not be appendable, but
will be encryptable and routes to databases if the filesystem has been overridden
by Mongo, Postgres, or MySQL.
ˆimport ( name set erase transient )
name is the file to read from. Set is where to put the read facts.
erase can be erase meaning delete the file after use or keep meaning leave the
file alone.

transient can be transient meaning mark facts as temporary (to self erase at
end of volley) or permanent meaning keep the facts as part of user data. Eg
^import(myfacts.txt @3).
If set is null, then facts are created but not stored into any fact-set and the subject
of the first fact is returned as the answer (presumed to be a json structure).
If the name includes the substring “ltm”, then the file will be decryptable and
routes to databases if the filesystem has been overridden by Mongo, Postgres, or
MySQL.

Debugging Function ˆdebug ()
As a last ditch, you can add this function call into a pattern or the output and
it will call DebugCode in functionExecute.cpp so you know exactly where you
are and can use a debugger to follow code thereafter if you can debug C code.

Logging Function ˆlog ( ... )
This allows you to print something directly to the user’s log file. If you want
it echoed to the console as well, you can do ˆlog(OUTPUT_ECHO This is my
message).
You can actually append to any file by putting at the front of your output the
word FILE in capital letters followed by the name of the file. E.g.,
^log(FILE TMP/mylog.txt This is my log output.)
Logging appends to the file. If you want to clear it first, issue a log command
like this:
^log(FILE TMP/mylog.txt NEW This is my log output)
The new tells it to initialize the file to empty.
Additionally you can optimize log file behavior. If you expect to write to a file a
lot during a volley (eg during :document mode), you can leave the file open by
using
^log(OPEN TMP/mylog.txt This is my log output.)
which caches the file ptr. After which you can write with OPEN or FILE
equivalently. To close the file use
^log(CLOSE TMP/mylog.txt)
By default, ˆlog acts like output to user, converting escaped \n, \r, and \t into their
actual ASCII characters. The flag RESPONSE_NOCONVERTSPECIAL passed
in will block this.

ˆmemorymark ()
Reading a document consists of performing a single volley of the entire document.
This can tie up a lot of memory in keeping facts, dictionary entries, user variables,
etc. If you are careful in what you do, you can make the memory burden go away.
ˆmemoryMark() notes where memory is currently at, and is best done within the
document_pre topic. Then you can release memory after every sentence of the
document, so it doesn’t accumulate.
ˆmemoryfree ()
This releases memory back to the last ˆmemorymark(). It is best done after your
main control of the document bot has finished processing a sentence. Partly
because the analysis of the sentence is lost and so no later rules can pattern
match to it (though you can call ˆanalyze to reacquire your sentence). E.g.,
topic: ~document_pre system repeat()
t: ^memorymark() # note start
Log(OUTPUT_ECHO \n Begin $$document ) # instant display
topic: ~main_control system repeat () # executed each sentence of document
u: (%document)
respond(~filter)
^memoryfree()
Caveats and warnings about how this works: whenever you free memory,
the system will clear all fact sets. It will clear all user variables set after the
memory mark (leaving the ones before alone). It will then release facts, text,
and dictionary nodes created after the mark.
The only data you can pass out from a memoryMark/ memoryfree zone is data
stored on match variables (which have size limitations) or on the count field of a
dictionary word of a preexisting word.
ˆmemorygc ()
This can function in either document mode or chat mode. It does what it can to
release unused memory. It has restrictions: it does not work if you have facts
with facts as fields or are in planning mode. It also discards saved sentence data,
all of your analysis data for the current sentence, and all data in
factsets.

JSON Functions
JSON functions and JSON are described more fully in the ChatScript JSON
manual.
ˆjsonarrayinsert ( arrayname value )
Given the name of a json array and a value, it adds the value to the end of
the array. SAFE protects any nested JSON data from being deleted. See JSON
manual.
ˆjsonarraydelete ( [INDEX, VALUE] arrayname value {ALL} )
This deletes a single entry from a JSON array. It does not damage the thing
deleted, just its member in the array. If the first argument is INDEX, then value
is a number which is the array index (0 . . . n-1). If the first argument is VALUE,
then value is the value to find and remove as the object of the json fact.
You can delete every matching VALUE entry by adding the optional 4th argument
ALL.
If there are numbered elements after this one, then those elements immediately
renumber downwards so that the array indexing range is contiguous.
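The INDEX, VALUE, and ALL behaviour can be pictured with an ordinary list standing in for the JSON array. This Python sketch is a model under that assumption (real arrays are stored as facts, and array_delete is a hypothetical name); it shows the three deletion modes and that the remaining elements stay contiguously indexed.

```python
def array_delete(items, mode, value, delete_all=False):
    """Model of INDEX / VALUE deletion on a list standing in for a JSON array."""
    if mode == "INDEX":
        # value is a 0-based position; later elements shift down by one.
        return [v for i, v in enumerate(items) if i != value]
    if mode == "VALUE":
        result, removed = [], False
        for v in items:
            # Drop the first match, or every match when delete_all (ALL) is set.
            if v == value and (delete_all or not removed):
                removed = True
                continue
            result.append(v)
        return result
    raise ValueError("mode must be INDEX or VALUE")


print(array_delete(["a", "b", "c"], "INDEX", 1))
print(array_delete(["a", "b", "a"], "VALUE", "a"))
print(array_delete(["a", "b", "a"], "VALUE", "a", delete_all=True))
```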
ˆjsoncreate ( type )
Type is either array or object and a json composite with no content is created
and its name returned.
ˆjsondelete ( factid )
Deprecated in favor of ˆdelete.
ˆjsongather ( {factset} jsonid )
Takes the facts involved in the json data (as returned by ˆjsonparse or
ˆjsonopen) and stores them in the named factset. This allows you to remove
their transient flags or save them in the user’s permanent data file.
You can omit fact-set as an argument if you are using an assignment statement:
@1 = ^jsongather(jsonid)

ˆJsongather normally gathers all levels of the data recursively. You can limit
how far down it goes by supplying level. Level 0 is all. Level 1 is the top level
of data. Etc.
ˆjsonlabel ( label )
Assigns a text sequence to add to jo- and ja- items created thereafter. E.g.
ˆjsonlabel(x) generates jo-x1 and ja-x1. You can turn it back off again with
ˆjsonlabel("")
This allows you to create json namespaces which will not conflict. Eg, you may
load a bunch of json during a system bootup (ˆcsboot) under one naming and
then use a different naming for user json created later and code can determine
the source of the data.
ˆjsonreadcvs ( TAB filepath )
reads a tsv (tab delimited spreadsheet file) and returns a JSON array representing
it. The lines are all objects in an array. The line is an object where non-empty
fields are given as field indexes. The first field is 0. Empty fields are skipped
over and their number omitted.
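As a rough picture of the resulting structure, the Python sketch below turns tab-delimited text into a list of objects keyed by field index, skipping empty fields. It is a model of the description above, not the engine's code; tsv_to_objects is a hypothetical name, and using string keys for the indexes is an assumption.

```python
def tsv_to_objects(text):
    """Model: each line becomes an object whose keys are the non-empty field indexes."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        # The first field is 0; empty fields are skipped and their number omitted.
        rows.append({str(i): f for i, f in enumerate(fields) if f != ""})
    return rows


sample = "apple\t\tred\nbanana\tyellow\t"
print(tsv_to_objects(sample))
```

Note how the first line has no key 1: the gap in numbering is how an empty spreadsheet cell shows up.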
ˆjsonundecodestring ( string )
Removes all json escape markers back to normal for possible printout to a user.
This translates \n to newline, \r to carriage return, \t to tab, and \" to a
simple quote.
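The four translations listed above can be modelled in a few lines of Python. This is a sketch of the described mapping only (json_undecode is a hypothetical name), and it deliberately leaves any other backslash sequence untouched, since the text names just these four.

```python
def json_undecode(text):
    """Model: turn the escape markers \\n, \\r, \\t and \\" back into real characters."""
    table = {"n": "\n", "r": "\r", "t": "\t", '"': '"'}
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in table:
            out.append(table[text[i + 1]])
            i += 2  # consume the backslash and the marker character
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(json_undecode('She said \\"hi\\"\\tand left'))
```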
ˆjsonobjectinsert ( {DUPLICATE} objectname key value )
Inserts the key value pair into the object named. The key does not require
quoting. Inserting a json string as value requires a quoted string. Duplicate keys
are ignored unless the optional 1st argument DUPLICATE is given. SAFE protects
any nested JSON data from being deleted. See JSON manual.
ˆjsonopen ( {UNIQUE} kind url postdata header )
This function queries a website and returns a JSON datastructure as facts. It
uses the standard CURL library, so its arguments and how to use them are
generally defined by CURL documentation and the website you intend to access.
See ChatScript JSON manual for details.

ˆjsontree ( name )
name is the value returned by ˆJSONparse, ˆJSONopen, or some query into
such structures. It prints out a tree of elements, one per line, where depth is
represented as more deeply indented. Objects are marked with { } as they are
in JSON. Arrays are marked with [].
ˆjsonwrite ( name )
name is the name from a json fact set (returned by ˆJSONparse, ˆJSONopen, or
some query into such structures). Result is the corresponding JSON string (as a
website might emit), without any linefeeds.
ˆjsonparse ( {UNIQUE} string )
string is a json text string (as might be returned from a website) and this
parses into facts exactly as ˆjsonopen would do, just not retrieving the string
from the web. It returns the name of the root node. One use for this is to pass
JSON data as a quoted string within out-of-band data, and have the system
parse that into facts you can use.
You can add NOFAIL before the string argument, to tell it to return null but not
fail if a dereference path cannot be found.
^jsonparse(transient NOFAIL "{ a: $var, b: _0.e[2] }")
ˆjsonparse automatically converts any \unnnn into the corresponding
utf8 character.
ˆjsonkind ( something )
If something is a JSON object, the function returns object. If it is a JSON
array it returns array. Otherwise it fails.
ˆjsonpath ( string id )
string is a description of how to walk JSON. Id is the name of the node you
want to start at (typically returned from ˆjsonopen or ˆjsonparse).
Array values are accessed using typical array notation like ja-1[3] and object
fields using dotted notation like jo-7.id.
A simple path access might look like this: [1].id which means take the root
object passed as id, e.g., ja-1, get the 2nd index value (arrays are 0-based in
JSON). That value is expected to be an object, so return the value corresponding
to the id field of that object. In more complex situations, the value of id
might itself be an object or an array, which you could continue indexing like
[1].id.firstname.
ˆJsonpath can also return the actual factid of the match, instead of the object
of the fact. This would allow you to see the index of a found array element, or
the json object/array name involved. Or you could use ˆrevisefact to change
the specific value of that fact (not creating a new fact). Just add * after your
final path, eg
^jsonpath(.name* $$obj)
^jsonpath(.name[4]* $$obj)
If you need to handle the full range of legal keys in json, you can use text string
notation like this
^jsonpath(."st. helen".data $tmp)
You may omit the leading . of a path and CS will by default assume it:
^jsonpath("st. helen".data $tmp)
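The path notation can be illustrated by walking ordinary parsed JSON in Python. The sketch below is a model under stated assumptions: nested dicts and lists stand in for the jo-/ja- fact structures, walk and _STEP are hypothetical names, and the trailing * form that returns a fact id is not modelled.

```python
import json
import re

# One path step: [index], an optionally dotted "quoted key", or an optionally dotted plain key.
_STEP = re.compile(r'\[(\d+)\]|\.?"([^"]*)"|\.?([^.\[\]"]+)')


def walk(root, path):
    """Follow a path such as [1].id.firstname through parsed JSON."""
    node = root
    for index, quoted, plain in _STEP.findall(path):
        if index:
            node = node[int(index)]      # array access, 0-based
        else:
            node = node[quoted or plain]  # object field access
    return node


people = json.loads('[{"id": 7}, {"id": {"firstname": "Ann"}}]')
print(walk(people, "[1].id.firstname"))

places = {"st. helen": {"data": 5}}
print(walk(places, '."st. helen".data'))
print(walk(places, '"st. helen".data'))  # the leading . may be omitted
```

Quoting a key is what lets it contain characters such as a period or a blank that would otherwise end the step.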

Word Manipulation Functions
ˆburst ( {count once} data-source burst-character-string )
Takes the data source text and hunts within it for instances of the burst-character-string. If it is being dumped to the output stream then only the first piece is
dumped.
If it is being assigned to a fact set (like @2) then a series of transient facts are
created for the pieces, with the piece as the subject and ˆburst ˆburst as the
verb and object.
If it is being assigned to a match variable, then pieces are assigned starting at
that variable and moving on to successively higher ones.
If burst does not find a separator, it puts out the original value. For assignment
to match variables, it also clears the next match variable so the end of the list
will be a null match variable.
If burst_character is omitted, it is presumed to be BOTH _ (which joins
composite words and names) and " " (a blank), which separates words.
If burst_character is the null string “”, it means burst into characters.
ˆburst takes an optional first parameter count, which tells it to return how
many items it would return if you burst, but not to do the burst.
ˆburst takes an optional first parameter once which says split only into the first
burst and then the leftover rest.

ˆburst has a special burst value digitsplit which will split a number-text
thing or a text-number thing into two pieces (text thing and number thing).
This is good for splitting a currency thing like USD25 or 25$.
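The separator rules for ˆburst (default separators, the null string, count, and once) can be sketched in Python as follows. This is a model of the description rather than the engine's algorithm: burst here is a hypothetical Python function returning a list, digitsplit is not modelled, and how ChatScript treats consecutive separators is not covered.

```python
import re


def burst(source, separator=None, mode=None):
    """Model of burst; mode may be None, "count", or "once"."""
    if separator == "":
        pieces = list(source)  # the null string bursts into characters
    else:
        # With no separator given, both _ and blank split the text.
        pattern = r"[_ ]" if separator is None else re.escape(separator)
        pieces = re.split(pattern, source, maxsplit=1 if mode == "once" else 0)
    # count reports how many pieces there would be instead of returning them.
    return len(pieces) if mode == "count" else pieces


print(burst("New_York city"))
print(burst("a,b,c", ",", "once"))
print(burst("a,b,c", ",", "count"))
print(burst("cat", ""))
print(burst("abc", ","))  # no separator found: the original value comes back
```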
ˆwords ( someword )
Looks up the given word and returns all words matching it. Matching includes
the lower case form of it and any number of uppercase forms of it. E.g, you
might say ˆwords(ted) and get back facts for ted, Ted, TED.
The answers are a series of facts of the form (someword words words). In addition
to case switching, the system will automatically convert between underscores
and blanks in either direction (since CS stores phrases
with underscores). So ˆwords("I love you") can match phrases already in the
dictionary of: I_love you I_love_you I love you I LOVE You
etc., depending on which words are actually there (for example because they are
parts of a fact).
ˆcanon ( word canonicalform )
Same as :canon during a :build from a table. Fails during normal execution
not involving compiling.
ˆexplode ( word )
Convert a word into a series of facts of its letters.
ˆextract ( source start end )
Return the substring with the designated offset range (exclusive of end location).
Useful for data extraction using ˆpopen and ˆtcpopen when combined with
ˆfindtext.
In addition to absolute unsigned values, start and end can take on offsets or
relative values. A signed end is a length to extract plus a direction or shift in
start:
^extract($$source 5 +2) # to extract 2 characters beginning at position
^extract($$source 5 -2) # to extract 2 characters ending at position 5
A negative start is a backwards offset from end.

^extract($$source -1 +1) # from end, 1 character before and get 1 character

^extract($$source -5 -1) # from end, 5 characters before and get 1 character before. i.e. th
ˆfindtext ( source substring offset {insensitive} )
Find case sensitive substring within source+offset and return offset starting
immediately after match. Useful for data extraction using ˆpopen and ˆtcpopen
when combined with ˆextract. $$findtext_start is bound to the actual start
of the match. $$findtext_word is bound to the word index in which the match
was found where one or more blanks separate words. Indexing starts at 1 (same
as sentence positional notation).
An optional fourth argument insensitive will match insensitively.
Failing to match will generate a rule failure. If the source or substring contains
an _, these will be converted to blanks before execution, to allow that or the
space notation to be considered equivalent (unless your source or substring is
literally an underscore only).
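How ˆfindtext and ˆextract combine for data extraction can be sketched with plain Python strings. The model below assumes 0-based character offsets (as the offset 0 in the weather sample suggests) and returns the two offsets as a tuple instead of setting $$findtext_start; the function names mirror the ChatScript ones but are ordinary Python, the sample line is made up, and $$findtext_word is not modelled.

```python
def findtext(source, substring, offset=0, insensitive=False):
    """Return (offset just past the match, start of the match), or None on failure."""
    def normalize(text):
        # Underscores are treated as blanks unless the text is a lone underscore.
        return text if text == "_" else text.replace("_", " ")

    hay, needle = normalize(source), normalize(substring)
    if insensitive:
        hay, needle = hay.lower(), needle.lower()
    start = hay.find(needle, offset)
    if start < 0:
        return None
    return start + len(needle), start


def extract(source, start, end):
    """Substring from start up to, but not including, end."""
    return source[start:end]


line = '{"weather":"Clear","temp_f":66.3,"wind":"Calm"}'
after, _ = findtext(line, '"weather":"')   # offset just past the label
_, quote_at = findtext(line, '"', after)   # start of the closing quote
print(extract(line, after, quote_at))
```

This is the same two-step pattern the weather sample uses: the return value of the first search marks where the data begins, and the match start of the second search marks where it ends.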
ˆflags ( word )
Gets the 64-bit systemflags of a word.
ˆintersectwords ( arg1 arg2 optional )
Given two “sentences”, finds words in common in both of them. Output facts
will go to the set assigned to, or @0 if not an assignment statement. If the optional
third argument is canonical, it will match the canonical forms of each
word.
ˆjoin ( any number of arguments )
Concatenates them all together, putting the result into the output stream. If
the first argument is AUTOSPACE, it will put a single space between each of the
joined arguments automatically.
ˆactualinputrange ( start end )
Given the starting and ending word positions of an original input (what CS had
after tokenization but before adjustments), this returns the range of where the
words arose in the actual input. The return is a range whose start is shifted 8
bits left and ORed with the end position.

ˆoriginalinputrange ( start end )
Given the starting and ending word positions of an actual input (what CS sees
after adjustments and what you normally pattern match on), this returns the
range of where the words came from in the original input. The return is a range
whose start is shifted 8 bits left and ORed with the end position.
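Both range functions return a single number that packs two positions. If you receive such a value in host code, a minimal Python sketch for packing and unpacking it looks like this; the helper names are hypothetical, and the 0xFF mask assumes the end position fits in the low 8 bits, as the shift implies.

```python
def pack_range(start, end):
    """Combine two word positions: start shifted 8 bits left, ORed with end."""
    return (start << 8) | end


def unpack_range(packed):
    """Recover (start, end) from a packed range value."""
    return packed >> 8, packed & 0xFF


packed = pack_range(3, 5)
print(packed)
print(unpack_range(packed))
```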
ˆproperties ( word )
Returns the 64bit properties of a word or fail-rule if the word is not already in
the dictionary.
ˆpos( part-of-speech word supplemental-data )
Generates a particular form of a word in any form and puts it in the output
stream. If it cannot generate the request, it issues a RULE failure. Most
combinations of arguments are obvious. Here are the 1st & 3rd choices. For
verbs with irregular pronoun conjugation, supply 4th argument of pronoun to
use.

part-of-speech          word/verb/number (+ supplement-data argument)  action
conjugate               pos-integer (as returned from ˆpartofspeech)   returns the word with that part of speech (eg conjugate go #VERB_PAST_PARTICIPLE)
raw                     integer 1 .. %length                           returns the original word in sentence
syllable                word                                           tells you how many syllables a word has
hex64                   integer-word                                   converts a number to 64bit hex
hex32                   integer-word                                   converts a number to 32 bit hex
ismodelnumber           word                                           return 1 if it is (mixed alpha/numeric). Fails otherwise.
isinteger               word                                           return 1 if it is all digits, fails otherwise
isfloat                 word                                           return 1 if it is float, fails otherwise
isuppercase             word                                           return 1 if it begins with an uppercase letter, fails otherwise
isalluppercase          word                                           return 1 if it starts uppercase, and consists of entirely uppercase letters, hyphen, underscore and ampersand, fails otherwise
type                    word                                           returns concept, number, word, or unknown
common                  word                                           returns level of commonness of the word
verb                    verb                                           given verb in any form, return requested form
  present_participle    verb
  past_participle       verb
  infinitive            verb
  past                  verb
  present3ps            verb
  present               verb
verb                    match noun                                     returns verb form matching noun (sing./plural). e.g. (walk match dog) -> walks
aux                     auxverb pronoun                                returns verb form matching pronoun supplied. for do, have, be
pronoun                 word flip                                      changes person form for 1st and 2nd person
adjective               word more                                      writes the adjective in its comparative form: fast -> faster
  most                  word                                           the superlative form. beautiful -> most beautiful
adverb                  word more                                      writes comparative form: strong -> strongly
noun                    word proper                                    return word as a proper noun (appropriately cased)
  lowercaseexist        word
  uppercaseexist        word
  singular              word or a number                               == 1
  plural                word or a number                               >1
  irregular             word                                           return value only for irregular nouns
determiner              word noun                                      add a determiner “a/an” if it needs one
place                   integer                                        return place number of integer
capitalize              word
uppercase               word
lowercase               word
allupper                word
canonical               word                                           see notes
integer                 floatnumber                                    generate integer if float is exact integer

Example:
# get first name (in a non-English language), and capitalize it
u: what's your first name?
#! giuditta
a: ( _* )
$_name = ^original(_0)
Nice to meet you, ^pos(capitalize $_name)
# if the user enters giuditta, the rejoinder outputs: Nice to meet you, Giuditta
For ˆpos(canonical), there is an optional third argument which is the concept
name of the pos-tag. Foreign words may have multiple lemma forms based on
part of speech. E.g., in the German dictionary you can find this entry:

Informationstechnische ( NOUN ADJECTIVE NOUN_SINGULAR NOUN_PLURAL ) lemma=`informationst
which says there are two forms of canonical, one for ADJA (adjective) and one
for NN (noun). If you don’t specify a 3rd argument, you get the first one (ADJA).
If you specify ~ADJA you get the first and if you specify ~NN you get the second.
If your third argument is all, then the list of all canonical forms is returned
with | separating the entries.

ˆdecodeInputtoken ( number )
Displays the text values of tokenflag bits. You can pass it %token to see the
meanings of the current sentence analysis or $cs_token to see what you have
currently set as token controls.
ˆdecodepos ( pos location )
Translates into text the 64-bit pos data at the given location. location can be a
position in the sentence (1 ... number of words) or a match variable found from
some location in the sentence. See dictionary.h for meanings of bits. Type word
will classify word as concept, word, number, or unknown.
ˆdecodepos ( role location )
Returns the text of the role data of the given location.
ˆlayer ( word )
Reports when this word was entered into the dictionary. Answers are: wordnet, 0, 1, 2,
user.
ˆpartofspeech ( location )
Gets the 64-bit part-of-speech information about a word at location, resulting
from parsing. Location can be a position in the sentence (1 ... number of words)
or a match variable found from some location in the sentence. See dictionary.h
for meanings of bits.
ˆphrase ( type matchvar )
Can be used to retrieve all of a prepositional phrase or a noun phrase. type is
one of noun, prepositional, verbal, adjective. An optional 3rd argument canonical
will return the canonical phrase rather than the original phrase. E.g., given the rule:
u: (I ~verb _~directobject) $tmp = ^phrase(noun _0)
and the input I love red herring, $tmp is set to red herring.

ˆrole ( location )
Gets the 32-bit role information about a word at location, resulting from parsing.
Location can be a position in the sentence (1 ... number of words) or a match
variable found from some location in the sentence. See dictionary.h for meanings
of bits.
ˆtally ( word {value} )
Only valid during current volley. You can associate a 32-bit number with a word
by ˆtally(test 35) and retrieve it via ˆtally(test).
ˆrhyme ( word )
Finds a word in the dictionary which is the same except for the first letter (a
cheap rhyme).
ˆsubstitute ( mode find oldtext newtext)
Outputs the result of substitution. Mode can be character or word or insensitive.
In the text given by find, the system will search for oldtext and replace it with
newtext, for all occurrences. This is non-recursive, so it does not also substitute
within replaced text. Since find is a single argument, you pass a phrase or
sentence by using underscores instead of spaces. ˆsubstitute will convert all
underscores to spaces before beginning substitution and will output the spaced
results.
In character mode, the system finds oldtext as characters anywhere in the find text.
In word mode it only finds it as whole words in the find text. Finding is case sensitive,
unless you use the argument insensitive, which will do a character-mode insensitive
match. You can select insensitive word match by making the first argument be
a text string containing the normal 1st argument values, e.g. "insensitive word".
^substitute(word "I love lovely flowers" love hate)
outputs I hate lovely flowers
^substitute(character "I love lovely flowers" love hate)
outputs I hate hately flowers
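The difference between the modes can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not the engine's implementation: the function name `substitute` and its argument names are hypothetical, whole-word matching is approximated with regular-expression lookarounds, and underscores are converted to spaces only in the text being searched (an assumption).

```python
import re


def substitute(mode: str, text: str, old: str, new: str) -> str:
    """Model of ^substitute: mode is 'character', 'word', 'insensitive',
    or a combined string such as 'insensitive word'."""
    # A phrase passed as a single argument uses underscores for spaces.
    text = text.replace("_", " ")
    modes = mode.lower().split()
    flags = re.IGNORECASE if "insensitive" in modes else 0
    pattern = re.escape(old)
    if "word" in modes:
        # Match only when not glued to neighbouring word characters.
        pattern = r"(?<!\w)" + pattern + r"(?!\w)"
    # re.sub makes a single left-to-right pass, so text that has already
    # been replaced is never searched again (non-recursive substitution).
    return re.sub(pattern, lambda match: new, text, flags=flags)


if __name__ == "__main__":
    print(substitute("word", "I love lovely flowers", "love", "hate"))
    print(substitute("character", "I love lovely flowers", "love", "hate"))
```

Run as a script, this prints the same two results as the examples above: word mode leaves lovely alone, character mode turns it into hately.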
ˆspell ( pattern fact-set )
Given a pattern, finds words from the dictionary that meet it and creates facts
for them that get stored in the referenced fact set. The facts are created with
subject 1, verb word, and object the found word. The pattern is a text string
describing, possibly, the length and the letter constraints.
If there is an exact length of word, it must be first in the pattern. After which
the system matches the letters you provide against the start of the word up until
your pattern either ends or has an asterisk or a period. A period means match
any letter.
An asterisk matches any number of letters and would normally be followed by
more letters. The * will swallow letters in the dictionary word until it can match
the rest of your given pattern. It will keep trying as needed. E.g.,
^spell(4the @1) will find them but not their
^spell(am*ic @1) will find American
^spell(a*ent @1) will find abasement
^spell(h.l.o @1) will find hello
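The matching rules above can be approximated by translating a pattern into a regular expression. The Python sketch below is a model under stated assumptions, not ChatScript's code: the names `spell_matches` and `spell` are hypothetical, and both case-insensitive comparison and allowing extra letters after the pattern ends are inferred from the am*ic / American example.

```python
import re
from typing import Iterable, List


def spell_matches(pattern: str, word: str) -> bool:
    """Return True if word satisfies a ^spell-style pattern."""
    # Leading digits, if present, give the exact length of the word.
    head = re.match(r"(\d*)(.*)", pattern)
    length, letters = head.group(1), head.group(2)
    if length and len(word) != int(length):
        return False
    parts = []
    for ch in letters:
        if ch == "*":
            parts.append(".*")          # asterisk: any number of letters
        elif ch == ".":
            parts.append(".")           # period: exactly one letter
        else:
            parts.append(re.escape(ch))  # literal letter
    # Anchored at the start of the word only (assumption, see lead-in).
    return re.match("".join(parts), word, re.IGNORECASE) is not None


def spell(pattern: str, dictionary: Iterable[str]) -> List[str]:
    """Collect the words of a small stand-in dictionary that match."""
    return [word for word in dictionary if spell_matches(pattern, word)]


if __name__ == "__main__":
    words = ["them", "their", "American", "abasement", "hello"]
    for pat in ("4the", "am*ic", "a*ent", "h.l.o"):
        print(pat, spell(pat, words))
```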
ˆsexed ( word he-choice she-choice it-choice )
Given a word, the system outputs one of the three choices given, depending on
the word's sex. An unrecognized word uses the it-choice.
^sexed(Georgina he she it)
would return she
ˆuppercase ( word )
Does the given word start with an uppercase letter? Match variable bindings usually
reflect how the user entered the word, so this allows you to see what case they
entered it in. Returns 1 if yes and 0 otherwise.
ˆformat ( integer/float formatstring value )

This is a thin wrapper over sprintf. The first argument tells ChatScript what
kind of argument you are passing (since everything is a string to ChatScript).
The second argument is a string which is the format string for sprintf. The third
argument is the number to convert. For floats, you will always be passing a
double float, so bear that in mind with your formatting. For integer, if you use
a %d format, you will be using a 32-bit value. For ll formats you will be using
64-bit, but this won’t work well on Windows output because Windows uses its
own sprintf notation.
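The role of the first argument can be illustrated with Python's % operator, which follows sprintf-style format strings closely but not identically. This is a sketch only: `cs_format` is a hypothetical name, and the 32-bit/64-bit and Windows caveats above do not apply to Python, whose integers are not limited in size.

```python
def cs_format(kind: str, fmt: str, value: str) -> str:
    """Convert the string value according to kind, then apply the format string."""
    if kind == "integer":
        number = int(value)      # the text is parsed as a whole number
    elif kind == "float":
        number = float(value)    # the text is parsed as a double
    else:
        raise ValueError("kind must be 'integer' or 'float'")
    return fmt % number


if __name__ == "__main__":
    print(cs_format("float", "%.2f", "3.14159"))
    print(cs_format("integer", "%05d", "42"))
```

Because every value arrives as text, the kind argument is what decides whether the text is parsed as an integer or as a float before formatting.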

ˆaddproperty ( word flag1 ... flagn )
Given the word, marks its dictionary entry with additional properties: the flags
given, which must match property flags or system flags in dictionarySystem.h.
Typically used to mark up titles of books and things when building world data.
In particular, however, if you are adding phrases or words not in the dictionary which will be used as patterns in match, you should mark them with
PATTERN_WORD. To create a dynamic concept, mark the set name as CONCEPT.
You can also add fact properties to all members of a set of facts via
^addproperty(@4 flag1 ... flagn).
These flags are also predefined in dictionarySystem.h, and you can use some of the
predefined but meaningless ones for your own purposes. These are User_flag4,
User_flag3, User_flag2, User_flag1.
ˆdefine( word )
Output the definition of the word. An optional second argument is the part of
speech: noun verb adjective adverb, which will limit the definition to just that
part of speech. Never fails but may return null.
The second argument can also be all which means list all definitions per part
of speech, not just the first. And it can be the third optional argument so you
can get all meanings of a word as a noun, for example.
ˆhasanyproperty ( word value )
Does this word have any of these property or systemflag bits? You can have up
to 5 values as arguments, e.g.,
^hasanyproperty(dog NOUN VERB ADJECTIVE ADVERB PREPOSITION)
If the word is not in the dictionary, it will infer it, allowing it to handle things
like verb tenses. If you want to ensure the word already exists first, you should
do ˆproperties(dog) AND ˆhasanyproperty(dog xxx), since ˆproperties fails if the
word is not found.
ˆhasallproperty( word value )
Does this word have all of the property or systemflag bits mentioned? You can have
up to 5 values as arguments, e.g.,
^hasallproperty(dog NOUN VERB ADJECTIVE ADVERB PREPOSITION)

Values should be all upper case. If the word is not in the dictionary, it will infer
it, allowing it to handle things like verb tenses. If you want to ensure the word
already exists first, you should do
^properties(dog) AND ^hasallproperty(dog xxx)
since ^properties fails if the word is not found.
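The difference between the "any" and "all" tests is ordinary bit-mask logic. In the Python sketch below, the bit values and helper names are hypothetical and chosen only for illustration; the real flag values are the ones defined in dictionarySystem.h.

```python
# Hypothetical bit assignments, for illustration only.
NOUN = 1 << 0
VERB = 1 << 1
ADJECTIVE = 1 << 2
ADVERB = 1 << 3


def combine(flags) -> int:
    """OR the requested flags together into a single mask."""
    mask = 0
    for flag in flags:
        mask |= flag
    return mask


def has_any_property(word_bits: int, *flags: int) -> bool:
    """True if the word has at least one of the requested bits."""
    return (word_bits & combine(flags)) != 0


def has_all_property(word_bits: int, *flags: int) -> bool:
    """True only if the word has every one of the requested bits."""
    mask = combine(flags)
    return (word_bits & mask) == mask


if __name__ == "__main__":
    dog_bits = NOUN | VERB  # assume "dog" is marked as both noun and verb
    print(has_any_property(dog_bits, NOUN, ADJECTIVE))
    print(has_all_property(dog_bits, NOUN, ADJECTIVE))
```

With these assumed bits, the "any" test succeeds for NOUN ADJECTIVE because one bit is present, while the "all" test fails because ADJECTIVE is missing.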
ˆremoveinternalflag ( word value )
Removes named internal flag from word.
Currently the only value is HAS_SUBSTITUTE, which allows you to disable a
word/phrase substitution. Use as word the full text of the left entry in a
substitutions file. E.g.,
 maps to ~yes normally. If you do ˆremoveinternalflag(
 HAS_SUBSTITUTE) then it will no longer do that.
This change to the resident dictionary remains in effect until
the system is reloaded.
ˆremoveproperty ( word value )
Remove this property bit from this word.
This effect lasts until the system is reloaded. Value should be all upper case. Value
is normally a system flag value or a property value from dictionarySystem.h,
which does not need a hash in front of it (the system will look up the name).
word can be in doublequotes. There are also two internal bits that are
allowed to be removed: CONCEPT and HAS_SUBSTITUTE.
You can use HAS_SUBSTITUTE to disable some standard substitution in LIVEDATA,
but you can’t apply this at build time because the system won’t remember.
Instead call it from ˆcsboot during startup. For example, in the LIVEDATA
interjections file, there is an entry:
