Pawn Language Guide

Pawn_Language_Guide

User Manual:

Open the PDF directly: View PDF PDF.
Page Count: 194

DownloadPawn Language Guide
Open PDF In BrowserView PDF
Pawn

embedded scripting language

The Language

January 2016

CompuPhase

ii
“CompuPhase” and “Pawn” are trademarks of ITB CompuPhase.
“Java” is a trademark of Sun Microsystems, Inc.
“Microsoft” and “Microsoft Windows” are registered trademarks of Microsoft
Corporation.
“Linux” is a registered trademark of Linus Torvalds.
“Unicode” is a registered trademark of Unicode, Inc.

c 1997–2016, ITB CompuPhase
Copyright ⃝
Eerste Industriestraat 19–21, 1401VL Bussum The Netherlands
telephone: (+31)-(0)35 6939 261
e-mail: info@compuphase.com
WWW: http://www.compuphase.com
This manual and the associated software are made available under the conditions listed in appendix D of this manual.
Typeset with TEX in the “DejaVu” typeface family.

iii

Contents
Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
A tutorial introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Arithmetic and expressions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5
Arrays and constants. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7
Using functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Rational numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Symbolic subscripts (structured data) . . . . . . . . . . . . . . . . . . . . . . 19
Bit operations to manipulate “sets” . . . . . . . . . . . . . . . . . . . . . . . . . 22
A simple RPN calculator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Event-driven programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
State programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Program verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Documentation comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Warnings and errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
In closing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Data and declarations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
State variable declarations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .59
Static local declarations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Static global declarations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Stock declarations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Public declarations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Constant variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Arrays (single dimension) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Progressive initiallers for arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Symbolic subscripts for arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Multi-dimensional arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Arrays and the sizeof operator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .64
Tag names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Function arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Coercion rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Calling functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Forward declarations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
State classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Public functions, function main . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Static functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Stock functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

iv

— Table of contents

Native functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
User-defined operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
The preprocessor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

General syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Operators and expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Notational conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Bit manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Assignment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .104
Relational . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Boolean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Miscellaneous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Operator precedence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .107

Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Directives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Proposed function library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Core functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Console functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .125
Date/time functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
File input/output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Fixed point arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Floating point arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Process and library call interface . . . . . . . . . . . . . . . . . . . . . . . . . . 130
String manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

Pitfalls: differences from C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Assorted tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Working with characters and strings . . . . . . . . . . . . . . . . . . . . . . 134
Internationalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Working with tags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Concatenating lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
A program that generates its own source code . . . . . . . . . . . 144
Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
A: Error and warning messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
B: The compiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
C: Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
D: License . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

1

Foreword
PAWN is a simple, typeless, 32-bit “scripting” language with a
C-like syntax. Execution speed, stability, simplicity and a small
footprint were essential design criteria for both the language and
the interpreter/abstract machine that a PAWN program runs on.
An application or tool cannot do or be everything for all users.
This not only justifies the diversity of editors, compilers, operating systems and many other software systems, it also explains
the presence of extensive configuration options and macro or
scripting languages in applications. My own applications have
contained a variety of little languages; most were very simple,
some were extensive. . . and most needs could have been solved
by a general purpose language with a special purpose library.
Hence, PAWN.
The PAWN language was designed as a flexible language for manipulating objects in a host application. The tool set (compiler,
abstract machine) were written so that they were easily extensible and would run on different software/hardware architectures.

PAWN is a descendent of the original Small C by Ron Cain and
James Hendrix, which at its turn was a subset of C. Some of the
modifications that I did to Small C, e.g. the removal of the type
system and the substitution of pointers by references, were so
fundamental that I could hardly call my language a “subset of
C” or a “C dialect” any more. Therefore, I stripped off the “C”
from the title and used the name “SMALL” for the name of the
language in my publication in Dr. Dobb’s Journal and the years
since. During development and maintenance of the product, I
received many requests for changes. One of the frequently requested changes was to use a different name for the language
—searching for information on the SMALL scripting language on
the Internet was hindered by “small” being such a common word.
The name change occurred together with a significant change in
the language: the support of “states” (and state machines).
I am indebted to Ron Cain and James Hendrix (and more recently,
Andy Yuen), for their work on Small C and to Dr. Dobb’s Journal
for publishing it. Although I must have touched nearly every line
of the original code multiple times, the Small C origins are still
clearly visible.

2

—

Foreword

A detailed treatise of the design goals and compromises is in appendix C; here I would like to summarize a few key points. As
written in the previous paragraphs, PAWN is for customizing applications (by writing scripts), not for writing applications. PAWN
is weak on data structuring because PAWN programs are intended
to manipulate objects (text, sprites, streams, queries, . . . ) in the
host application, but the PAWN program is, by intent, denied direct access to any data outside its abstract machine. The only
means that a PAWN program has to manipulate objects in the host
application is by calling subroutines, so called “native functions”,
that the host application provides.
PAWN is flexible in that key area: calling functions. PAWN supports default values for any of the arguments of a function, callby-reference as well as call-by-value, and “named” as well as
“positional” function arguments. PAWN does not have a “type
checking” mechanism, by virtue of being a typeless language,
but it does offer in replacement a “classification checking” mechanism, called “tags”. The tag system is especially convenient for
function arguments because each argument may specify multiple
acceptable tags.
For any language, the power (or weakness) lies not in the individual features, but in their combination. For PAWN, I feel that
the combination of named arguments —which lets you specify
function arguments in any order, and default values —which allows you to skip specifying arguments that you are not interested
in, blend together to a convenient and “descriptive” way to call
(native) functions to manipulate objects in the host application.

3

A tutorial introduction
PAWN is a simple programming language with a syntax reminiscent to the “C” programming language. A PAWN program consists of a set of functions and a set of variables. The variables
are data objects and the functions contain instructions (called
“statements”) that operate on the data objects or that perform
tasks.
The first program in almost any computer language is one that
prints a simple string; printing “Hello world” is a classic example.
In PAWN, the program would look like:
LISTING:

Compiling and running scripts: page
167

hello.p

main()
printf "Hello world\n"

This manual assumes that you know how to run a PAWN program;
if not, please consult the application manual or appendix B).
A PAWN program starts execution in an “entry” function∗ —in
nearly all examples of this manual, this entry function is called
“main”. Here, the function main contains only a single instruction, which is at the line below the function head itself. Line
breaks and indenting are insignificant; the invocation of the function print could equally well be on the same line as the head of
function main.
The definition of a function requires that a pair of parentheses
follow the function name. If a function takes parameters, their
declarations appear between the parentheses. The function main
does not take any parentheses. The rules are different for a function invocation (or a function call); parentheses are optional in
the call to the print function.
The single argument of the print function is a string, which must
be enclosed in double quotes. The characters \n near the end of
the string form an escape sequence, in this case they indicate a
“newline” symbol. When print encounters the newline escape
sequence, it advances the cursor to the first column of the next
line. One has to use the \n escape sequence to insert a “newline”
into the string, because a string may not wrap over multiple lines.
∗

This should not be confused with the “state” entry functions, which are
called entry, but serve a different purpose —see page 39.

String literals: 98
Escape sequence:
97

4

—

A tutorial introduction

PAWN is a “case sensitive” language: upper and lower case letters are considered to be different letters. It would be an error to
spell the function printf in the above example as “PrintF”. Keywords and predefined symbols, like the name of function “main”,
must be typed in lower case.
If you know the C language, you may feel that the above example
does not look much like the equivalent “Hello world” program in
C/C++ . PAWN can also look very similar to C, though. The next
example program is also valid PAWN syntax (and it has the same
semantics as the earlier example):
LISTING:

hello.p — C style

#include 
main()
{
printf("Hello world\n");
}

These first examples also reveal a few differences between PAWN
and the C language:
⋄ there is usually no need to include any system-defined “header
file”;
⋄ semicolons are optional (except when writing multiple statements on one line);
⋄ when the body of a function is a single instruction, the braces
(for a compound instruction) are optional;
⋄ when you do not use the result of a function in an expression
or assignment, parentheses around the function argument are
optional.
As an aside, the few preceding points refer to optional syntaxes.
It is your choice what syntax you wish to use: neither style is
“deprecated” or “preferred”. The examples in this manual position the braces and use an indentation that is known as the
“Whitesmith’s style”, but PAWN is a free format language and
other indenting styles are just as good.

More function descriptions at page
121

Because PAWN is designed to be an extension language for applications, the function set/library that a PAWN program has at its
disposal depends on the host application. As a result, the PAWN
language has no intrinsic knowledge of any function. The print
function, used in this first example, must be made available by
the host application and be “declared” to the PAWN parser.† It is
†

In the language specification, the term “parser” refers to any implementation that processes and runs on conforming Pawn programs —either inter-

Arithmetic and expressions

— 5

assumed, however, that all host applications provide a minimal
set of common functions, like print and printf.
In some environments, the display or terminal must be enabled
before any text can be output onto it. If this is the case, you
must add a call to the function “console” before the first call to
function print or printf. The console function also allows you
to specify device characteristics, such as the number of lines and
columns of the display. The example programs in this manual do
not use the console functions, because many platforms do not
require or provide it.

Arithmetic and expressions
Fundamental elements of most programs are calculations, decisions (conditional execution), iterations (loops) and variables
to store input data, output data and intermediate results. The
next program example illustrates many of these concepts. The
program calculates the greatest common divisor of two values
using an algorithm invented by Euclides.
LISTING: gcd.p
/*
The greatest common divisor of two values,
using Euclides' algorithm.
*/
main()
{
print "Input two values\n"
new a = getvalue()
new b = getvalue()
while (a != b)
if (a > b)
a = a - b
else
b = b - a
printf "The greatest common divisor is %d\n", a
}

Function main now contains more than just a single “print” statement. When the body of a function contains more than one statement, these statements must be embodied in braces —the “{”
and “}” characters. This groups the instructions to a single compound statement. The notion of grouping statements in a compound statement applies as well to the bodies of if–else and loop
instructions.
preters or compilers.

Compound statement: 109

6
Data declarations
are covered in
detail starting at
page 59

—

Arithmetic and expressions

The new keyword creates a variable. The name of the variable
follows new. It is common, but not imperative, to assign a value
to the variable already at the moment of its creation. Variables
must be declared before they are used in an expression. The
getvalue function (also common predefined function) reads in a
value from the keyboard and returns the result. Note that PAWN
is a typeless language, all variables are numeric cells that can
hold a signed integral value.
The getvalue function name is followed by a pair of parentheses.
These are required because the value that getvalue returns is
stored in a variable. Normally, the function’s arguments (or parameters) would appear between the parentheses, but getvalue
(as used in this program) does not take any explicit arguments.
If you do not assign the result of a function to a variable or use
it in a expression in another way, the parentheses are optional.
For example, the result of the print and printf statements are
not used. You may still use parentheses around the arguments,
but it is not required.

“while” loop: 113
“if–else”: 111

Loop instructions, like “while”, repeat a single instruction as
long as the loop condition (the expression that follows the while
keyword) is “true”. One can execute multiple instructions in a
loop by grouping them in a compound statement. The if–else
instruction has one instruction for the “true” clause and one for
the “false”.
Observe that some statements, like while and if–else, contain
(or “fold around”) another instruction —in the case of if–else
even two other instructions. The complete bundle is, again, a
single instruction. That is:
⋄ the assignment statements “a = a - b” below the if and “b =
b - a” below the else are statements;
⋄ the if–else statement folds around these two assignments and
forms a single statement of itself;
⋄ the while statement folds around the if–else statement and
forms, again, a single statement.
It is common to make the nesting of the statements explicit by
indenting any sub-statements below a statement in the source
text. In the “Greatest Common Divisor” example, the left margin
indent increases by four space characters after the while statement, and again after the if and else keywords. Statements that
belong to the same level, such as both printf invocations and the
while loop, have the same indentation.

Arrays and constants

— 7

The loop condition for the while loop is “(a != b)”; the symbol
!= is the “not equal to” operator. That is, the if–else instruction
is repeated until “a” equals “b”. It is good practice to indent the
instructions that run under control of another statement, as is
done in the preceding example.
The call to printf, near the bottom of the example, differs from
the print call right below the opening brace (“{”). The “f” in
printf stands for “formatted”, which means that the function
can format and print numeric values and other data (in a userspecified format), as well as literal text. The %d symbol in the
string is a token that indicates the position and the format that
the subsequent argument to function printf should be printed.
At run time, the token %d is replaced by the value of variable “a”
(the second argument of printf).
Function print can only print text; it is quicker than printf. If
you want to print a literal “%” at the display, you have to use
print, or you have to double it in the string that you give to
printf. That is:
print "20% of the personnel accounts for 80% of the costs\n"

and
printf "20%% of the personnel accounts for 80%% of the costs\n"

print the same string.

Arrays and constants
Next to simple variables with a size of a single cell, PAWN supports “array variables” that hold many cells/values. The following example program displays a series of prime numbers using
the well known “sieve of Eratosthenes”. The program also introduces another new concept: symbolic constants. Symbolic
constants look like variables, but they cannot be changed.
LISTING:

sieve.p

/* Print all primes below 100, using the "Sieve of Eratosthenes" */
main()
{
const max_primes = 100
new series[max_primes] = [ true, ... ]
for (new i = 2; i < max_primes; ++i)
if (series[i])
{
printf "%d ", i
/* filter all multiples of this "prime" from the list */
for (new j = 2 * i; j < max_primes; j += i)

Relational operators: 105

8

—

Using functions
series[j] = false
}

}

Constant declaration: 100

Progressive initiallers: 62

“for” loop: 110

An overview of all
operators: 102

When a program or sub-program has some fixed limit built-in, it is
good practice create a symbolic constant for it. In the preceding
example, the symbol max_primes is a constant with the value 100.
The program uses the symbol max_primes three times after its
definition: in the declaration of the variable series and in both
for loops. If we were to adapt the program to print all primes
below 500, there is now only one line to change.
Like simple variables, arrays may be initialized upon creation.
PAWN offers a convenient shorthand to initialize all elements to
a fixed value: all hundred elements of the “series” array are
set to true —without requiring that the programmer types in the
word “true” a hundred times. The symbols true and false are
predefined constants.
When a simple variable, like the variables i and j in the primes
sieve example, is declared in the first expression of a for loop,
the variable is valid only inside the loop. Variable declaration has
its own rules; it is not a statement —although it looks like one.
One of the special rules for variable declaration is that the first
expression of a for loop may contain a variable declaration.
Both for loops also introduce new operators in their third expression. The ++ operator increments its operand by one; meaning
that, ++i is equal to i = i + 1. The += operator adds the expression on its right to the variable on its left; that is, j += i is equal
to j = j + i.
There is an “off-by-one” issue that you need to be aware if when
working with arrays. The first element in the series array is series[0], so if the array holds max_primes elements, the last element in the array is series[max_primes-1]. If max_primes is 100,
the last element, then, is series[99]. Accessing series[100] is
invalid.

Using functions
Larger programs separate tasks and operations into functions.
Using functions increases the modularity of programs and functions, when well written, are portable to other programs. The
following example implements a function to calculate numbers
from the Fibonacci series.

Using functions

— 9

The Fibonacci sequence was coined by Leonardo “Fibonacci” of
Pisa, an Italian mathematician of the 13th century —whose greatest achievement was popularizing the Hindu-Arabic numerals in
the Western world. The goal of the sequence was to describe the
growth of a population of (idealized) rabbits; and the sequence
is 1, 1, 2, 3, 5, 8, 13, 21,. . . (every next value is the sum of its two
predecessors).
LISTING:

fib.p

/* Calculation of Fibonacci numbers by iteration */
main()
{
print "Enter a value: "
new v = getvalue()
if (v > 0)
printf "The value of Fibonacci number %d is %d\n",
v, fibonacci(v)
else
printf "The Fibonacci number %d does not exist\n", v
}
fibonacci(n)
{
assert n > 0
new a = 0, b = 1
for (new i = 2; i < n; i++)
{
new c = a + b
a = b
b = c
}
return a + b
}

The assert instruction at the top of the fibonacci function deserves explicit mention; it guards against “impossible” or invalid
conditions. A negative Fibonacci number is invalid, and the assert statement flags it as a programmer’s error if this case ever
occurs. Assertions should only flag programmer’s errors, never
user input errors.
The implementation of a user-defined function is not much different than that of function main. Function fibonacci shows two
new concepts, though: it receives an input value through a parameter and it returns a value (it has a “result”).
Function parameters are declared in the function header; the
single parameter in this example is “n”. Inside the function, a
parameter behaves as a local variable, but one whose value is
passed from the outside at the call to the function.

“assert” statement: 109

Functions: properties & features:
68

10

—

Using functions

The return statement ends a function and sets the result of the
function. It need not appear at the very end of the function; early
exits are permitted.
Native function
interface: 82

The main function of the Fibonacci example calls predefined “native” functions, like getvalue and printf, as well as the userdefined function fibonacci. From the perspective of calling a
function (as in function main), there is no difference between
user-defined and native functions.
The Fibonacci numbers sequence describes a surprising variety
of natural phenomena. For example, the two or three sets of spirals in pineapples, pine cones and sunflowers usually have consecutive Fibonacci numbers between 5 and 89 as their number
of spirals. The numbers that occur naturally in branching patterns (e.g. that of plants) are indeed Fibonacci numbers. Finally,
although the Fibonacci sequence is not a geometric sequence,
the further the sequence is extended, the more closely the ratio
between successive terms approaches the Golden Ratio, of 1.618
that appears so often in art and architecture.∗
• Call-by-reference and call-by-value
Dates are a particularly rich source of algorithms and conversion
routines, because the calenders that a date refers to have known
such a diversity, through time and around the world.
The “Julian Day Number” is attributed to Josephus Scaliger† and
it counts the number of days since November 24, 4714 BC (proleptic Gregorian calendar‡ ). Scaliger chose that date because it
marked the coincidence of three well-established cycles: the 28year Solar Cycle (of the old Julian calendar), the 19-year Metonic
Cycle and the 15-year Indiction Cycle (periodic taxes or governmental requisitions in ancient Rome), and because no literature
√
∗
The exact value for the Golden Ratio is 1/2( 5 + 1). The relation between
Fibonacci numbers and the Golden Ratio also allows for a “direct” calculation of any sequence number, instead of the iterative method described
here.
†

‡

There is some debate on exactly what Josephus Scaliger invented and who
or what he called it after.
The Gregorian calendar was decreed to start on 15 October 1582 by pope
Gregory XIII, which means that earlier dates do not really exist in the Gregorian calendar. When extending the Gregorian calendar to days before 15
October 1582, we refer to it as the proleptic Gregorian calendar.

Using functions

— 11

or recorded history was known to pre-date that particular date in
the remote past. Scaliger used this concept to reconcile dates in
historic documents, later astronomers embraced it to calculate
intervals between two events more easily.
Julian Day numbers (sometimes denoted with unit “JD”) should
not be confused with Julian Dates (the number of days since the
start of the same year), or with the Julian calendar that was introduced by Julius Caesar.
Below is a program that calculates the Julian Day number from a
date in the (proleptic) Gregorian calendar, and vice versa. Note
that in the proleptic Gregorian calendar, the first year is 1 AD
(Anno Domini) and the year before that is 1 BC (Before Christ):
year zero does not exist! The program uses negative year values
for BC years and positive (non-zero) values for AD years.
LISTING:

julian.p

/* calculate Julian Day number from a date, and vice versa */
main()
{
new d, m, y, jdn
print "Give a date (dd-mm-yyyy): "
d = getvalue(_, '-', '/')
m = getvalue(_, '-', '/')
y = getvalue()
jdn = DateToJulian(d, m, y)
printf "Date %d/%d/%d = %d JD\n", d, m, y, jdn
print "Give a Julian Day Number: "
jdn = getvalue()
JulianToDate jdn, d, m, y
printf "%d JD = %d/%d/%d\n", jdn, d, m, y
}
DateToJulian(day, month, year)
{
/* The first year is 1. Year 0 does not exist: it is 1 BC (or -1) */
assert year != 0
if (year < 0)
year++
/* move January and February to the end of the previous year */
if (month <= 2)
year--, month += 12
new jdn = 365*year + year/4 - year/100 + year/400
+ (153*month - 457) / 5
+ day + 1721119
return jdn
}
JulianToDate(jdn, &day, &month, &year)
{
jdn -= 1721119

12

—

Using functions

/* approximate year, then adjust in a loop */
year = (400 * jdn) / 146097
while (365*year + year/4 - year/100 + year/400 < jdn)
year++
year-/* determine month */
jdn -= 365*year + year/4 - year/100 + year/400
month = (5*jdn + 457) / 153
/* determine day */
day = jdn - (153*month - 457) / 5
/* move January and February to start of the year */
if (month > 12)
month -= 12, year++
/* adjust negative years (year 0 must become 1 BC, or -1) */
if (year <= 0)
year-}

Function main starts with creating the variables to hold the day,
month and year, and the calculated Julian Day number. Then
it reads in a date —three calls to getvalue— and calls function
DateToJulian to calculate the day number. After calculating the
result, main prints the date that you entered and the Julian Day
number for that date.

“Call by value”
versus “call by
reference”: 69

Near the top of function DateToJulian, it increments the year
value if it is negative; it does this to cope with the absence of a
“zero” year in the proleptic Gregorian calendar. In other words,
function DateToJulian modifies its function arguments (later, it
also modifies month). Inside a function, an argument behaves
like a local variable: you may modify it. These modifications remain local to the function DateToJulian, however. Function main
passes the values of d, m and y into DateToJulian, who maps them
to its function arguments day, month and year respectively. Although DateToJulian modifies year and month, it does not change
y and m in function main; it only changes local copies of y and m.
This concept is called “call by value”.
The example intentionally uses different names for the local variables in the functions main and DateToJulian, for the purpose
of making the above explanation easier. Renaming main’s variables d, m and y to day, month and year respectively, does not
change the matter: then you just happen to have two local variables called day, two called month and two called year, which is
perfectly valid in PAWN.
The remainder of function DateToJulian is, regarding the PAWN
language, uninteresting arithmetic.

Rational numbers

— 13

Returning to the second part of the function main we see that it
now asks for a day number and calls another function, JulianToDate, to find the date that matches the day number. Function
JulianToDate is interesting because it takes one input argument
(the Julian Day number) and needs to calculate three output values, the day, month and year. Alas, a function can only have a
single return value —that is, a return statement in a function may
only contain one expression. To solve this, JulianToDate specifically requests that changes that it makes to some of its function
arguments are copied back to the variables of the caller of the
function. Then, in main, the variables that must hold the result
of JulianToDate are passed as arguments to JulianToDate.
Function JulianToDate marks the appropriate arguments for being “copied back to caller” by prefixing them with an & symbol.
Arguments with an & are copied back, arguments without is are
not. “Copying back” is actually not the correct term. An argument tagged with an & is passed to the function in a special way
that allows the function to directly modify the original variable.
This is called “call by reference” and an argument that uses it is
a “reference argument”.
In other words, if main passes y to JulianToDate —who maps it
to its function argument year— and JulianToDate changes year,
then JulianToDate really changes y. Only through reference arguments can a function directly modify a variable that is declared
in a different function.
To summarize the use of call-by-value versus call-by-reference: if
a function has one output value, you typically use a return statement; if a function has more output values, you use reference
arguments. You may combine the two inside a single function,
for example in a function that returns its “normal” output via a
reference argument and an error code in its return value.
As an aside, many desktop application use conversions to and
from Julian Day numbers (or varieties of it) to conveniently calculate the number of days between to dates or to calculate the
date that is 90 days from now —for example.

Rational numbers
All calculations done up to this point involved only whole numbers —integer values. PAWN also has support for numbers that
can hold fractional values: these are called “rational numbers”.

14

—

Rational numbers

However, whether this support is enabled depends on the host
application.
Rational numbers can be implemented as either floating-point
or fixed-point numbers. Floating-point arithmetic is commonly
used for general-purpose and scientific calculations, while fixedpoint arithmetic is more suitable for financial processing and applications where rounding errors should not come into play (or
at least, they should be predictable). The PAWN toolkit has both
a floating-point and a fixed-point module, and the details (and
trade-offs) for these modules in their respective documentation.
The issue is, however, that a host application may implement
either floating-point or fixed-point, or both or neither.∗ The program below requires that at least either kind of rational number
support is available; it will fail to run if the host application does
not support rational numbers at all.
LISTING:

c2f.p

#include 
main()
{
new Rational: Celsius
new Rational: Fahrenheit
print "Celsius\t Fahrenheit\n"
for (Celsius = 5; Celsius <= 25; Celsius++)
{
Fahrenheit = (Celsius * 1.8) + 32
printf "%r \t %r\n", Celsius, Fahrenheit
}
}

The example program converts a table of degrees Celsius to degrees Fahrenheit. The first directive of this program is to import
definitions for rational number support from an include file. The
file “rational” includes either support for floating-point numbers or for fixed-point numbers, depending on what is available.
Tag names: 65

The variables Celsius and Fahrenheit are declared with a tag
“Rational:” between the keyword new and the variable name.
A tag name denotes the purpose of the variable, its permitted
use and, as a special case for rational numbers, its memory layout. The Rational: tag tells the PAWN parser that the variables
Celsius and Fahrenheit contain fractional values, rather than
whole numbers.
∗

Actually, this is already true of all native functions, including all native functions that the examples in this manual use.

Strings

— 15

The equation for obtaining degrees Fahrenheit from degrees Celsius is
9
◦
F = + 32 ◦ C
5
The program uses the value 1.8 for the quotient 9/5. When rational number support is enabled, PAWN supports values with a
fractional part behind the decimal point.
The only other non-trivial change from earlier programs is that
the format string for the printf function now has variable placeholders denoted with “%r” instead of “%d”. The placeholder %r
prints a rational number at the position; %d is only for integers
(“whole numbers”).
I used the include file “rational” rather than “float” or “fixed”
in an attempt to make the example program portable. If you
know that the host application supports floating point arithmetic,
it may be more convenient to “#include” the definitions from the
file float and use the tag Float: instead of Rational —when doing so, you should also replace %r by %f in the call to printf. For
details on fixed point and floating point support, please see the
application notes “Fixed Point Support Library” and “Floating
Point Support Library” that are available separately.

Strings
PAWN has no intrinsic “string” type; character strings are stored
in arrays, with the convention that the array element behind the
last valid character is zero. Working with strings is therefore
equivalent with working with arrays.
Among the simplest of encryption schemes is one called “ROT13”
—actually the algorithm is quite “weak” from a cryptographical
point of view. It is most widely used in public electronic forums
(BBSes, Usenet) to hide texts from casual reading, such as the solution to puzzles or riddles. ROT13 simply “rotates” the alphabet
by half its length, i.e. 13 characters. It is a symmetric operation:
applying it twice on the same text reveals the original.
LISTING:

rot13.p

/* Simple encryption, using ROT13 */
main()
{
printf "Please type the string to mangle: "
new str[100]

16

—

Strings

getstring str, sizeof str, .pack = false
rot13 str
printf "After mangling, the string is: \"%s\"\n", str
}
rot13(string[])
{
for (new index = 0; string[index]; index++)
if ('a' <= string[index] <= 'z')
string[index] = (string[index] - 'a' + 13) % 26 + 'a'
else if ('A' <= string[index] <= 'Z')
string[index] = (string[index] - 'A' + 13) % 26 + 'A'
}

In the function header of rot13, the parameter “string” is declared as an array, but without specifying the size of the array —
there is no value between the square brackets. When you specify
a size for an array in a function header, it must match the size of
the actual parameter in the function call. Omitting the array size
specification in the function header removes this restriction and
allows the function to be called with arrays of any size. You must
then have some other means of determining the (maximum) size
of the array. In the case of a string parameter, one can simply
search for the zero terminator.
The for loop that walks over the string is typical for string processing functions. The loop condition is “string[index]”, the
rule for true/false conditions in PAWN is that any value is “true”,
except zero. That is, when the array cell at string[index] is
zero, it is “false” and the loop aborts.
The ROT13 algorithm rotates only letters; digits, punctuation
and special characters are left unaltered. Additionally, upper
and lower case letters must be handled separately. Inside the
for loop, two if statements filter out the characters of interest.
The way that the second if is chained to the “else” clause of
the first if is noteworthy, as it is a typical method of testing for
multiple non-overlapping conditions.
A function that
takes an array
as an argument
and that does not
change it, may
mark the argument as “const”;
see page 70

Earlier in this chapter, the concept of “call by value” versus “call
by reference” was discussed. When you are working with strings,
or arrays in general, note that PAWN always passes arrays by reference. It does this to conserve memory and to increase performance —arrays can be large data structures and passing them
by value requires a copy of this data structure to be made, taking both memory and time. Due to this rule, function rot13 can
modify its function parameter (called “string” in the example)
without needing to declare as a reference argument.

Strings

— 17

Another point of interest are the conditions in the two if statements. The first if, for example, holds the condition “'a' <=
string[index] <= 'z'”, which means that the expression is true
if (and only if) both 'a' <= string[index] and string[index] <=
'z' are true. In the combined expression, the relational operators are said to be “chained”, as they chain multiple comparisons
in one condition.
Finally, note how the last printf in function main uses the escape
sequence \" to print a double quote. Normally a double quote
ends the literal string; the escape sequence “\"” inserts a double
quote into the string.

Staying on the subject of strings and arrays, below is a program
that separates a string of text into individual words and counts
them. It is a simple program that shows a few new features of
the PAWN language.
LISTING: wcount.p
/* word count: count words on a string that the user types */
#include 
main()
{
print "Please type a string: "
new string[100]
getstring string, sizeof string, false
new count = 0
new word[20]
new index
for ( ;; )
{
word = strtok(string, index)
if (strlen(word) == 0)
break
count++
printf "Word %d: '%s'\n", count, word
}
printf "\nNumber of words: %d\n", count
}
strtok(const string[], &index)
{
new length = strlen(string)
/* skip leading white space */
while (index < length && string[index] <= ' ')
index++
/* store the word letter for letter */
new offset = index
/* save start position of token */
new result[20]
/* string to store the word in */
while (index < length

Relational operators: 105

Escape sequence:
97

18

—

Strings
&& string[index] > ' '
&& index - offset < sizeof result - 1)

{
result[index - offset] = string[index]
index++
}
result[index - offset] = EOS
/* zero-terminate the string */
return result
}

“for” loop: 110

Function main first displays a message and retrieves a string that
the user must type. Then it enters a loop: writing “for (;;)” creates a loop without initialisation, without increment and without
test —it is an infinite loop, equivalent to “while (true)”. However, where the PAWN parser will give you a warning if you type
“while (true)” (something along the line “redundant test expression; always true”), “for (;;)” passes the parser without
warning.
A typical use for an infinite loop is a case where you need a loop
with the test in the middle —a hybrid between a while and a
do. . . while loop, so to speak. PAWN does not support loops-witha-test-in-the middle directly, but you can imitate one by coding an
infinite loop with a conditional break. In this example program,
the loop:
⋄ gets a word from the string —code before the test;
⋄ tests whether a new word is available, and breaks out of the
loop if not —the test in the middle;
⋄ prints the word and its sequence number —code after the test.
As is apparent from the line “word = strtok(string, index)”
(and the declaration of variable word), PAWN supports array assignment and functions returning arrays. The PAWN parser verifies that the array that strtok returns has the same size and
dimensions as the variable that it is assigned into.
Function strlen is a native function (predefined), but strtok is
not: it must be implemented by ourselves. The function strtok
was inspired by the function of the same name from C/C++ , but
it does not modify the source string. Instead it copies characters
from the source string, word for word, into a local array, which
it then returns.

A common operation is to clear a string. There are various ways
to do so. The recommended way to clear a string is to assign a
zero-length literal string to the variable.

Symbolic subscripts (structured data)
LISTING:

— 19

clearing a string

my_string = ""

// assuming my_string is declared as packed array

Symbolic subscripts (structured data)
In a typeless language, we might assign a different purpose to
some array elements than to other elements in the same array.
PAWN supports symbolic substripts that allow to assign specific
tag names or ranges to individual array elements.
The example to illustrate symbolic subscripts is longer than previous PAWN programs, and it also displays a few other features,
such as global variables and named parameters.
LISTING:

queue.p

/* Priority queue (for simple text strings) */
#include 
main()
{
new msg[.text{40}, .priority]
/* insert a few items (read from console input) */
printf "Please insert a few messages and their priorities; " ...
"end with an empty string\n"
for ( ;; )
{
printf "Message: "
getstring msg.text, .pack = true
if (strlen(msg.text) == 0)
break
printf "Priority: "
msg.priority = getvalue()
if (!insert(msg))
{
printf "Queue is full, cannot insert more items\n"
break
}
}
/* now print the messages extracted from the queue */
printf "\nContents of the queue:\n"
while (extract(msg))
printf "[%d] %s\n", msg.priority, msg.text
}
const queuesize = 10
new queue[queuesize][.text{40}, .priority]
new queueitems = 0
insert(const item[.text{40}, .priority])
{
/* check if the queue can hold one more message */
if (queueitems == queuesize)
return false
/* queue is full */

20

—

Symbolic subscripts (structured data)

/* find the position to insert it to */
new pos = queueitems
/* start at the bottom */
while (pos > 0 && item.priority > queue[pos-1].priority)
--pos
/* higher priority: move up a slot */
/* make place for the item at the insertion spot */
for (new i = queueitems; i > pos; --i)
queue[i] = queue[i-1]
/* add the message to the correct slot */
queue[pos] = item
queueitems++
return true
}
extract(item[.text{40}, .priority])
{
/* check whether the queue has one more message */
if (queueitems == 0)
return false
/* queue is empty */
/* copy the topmost item */
item = queue[0]
--queueitems
/* move the queue one position up */
for (new i = 0; i < queueitems; ++i)
queue[i] = queue[i+1]
return true
}

Function main starts with a declaration of array variable msg. The
array has two fields, “.text” and “.priority”; the “.text” field
is declared as a sub-array holding 40 characters. The period is
required for symbolic subscripts and there may be no space between the period and the name.
When an array is declared with symbolic subscripts, it may only
be indexed with these subscripts. It would be an error to say,
for example, “msg[0]”. On the other hand, since there can only
be a single symbolic subscript between the brackets, the brackets become optional. That is, you can write “msg.priority” as a
shorthand for “msg.[priority]”.
Further in main are two loops. The for loop reads strings and
priority values from the console and inserts them in a queue.
The while loop below that extracts element by element from the
queue and prints the information on the screen. The point to
note, is that the for loop stores both the string and the priority
number (an integer) in the same variable msg; indeed, function
main declares only a single variable. Function getstring stores
the message text that you type starting at array msg.text while
the priority value is stored (by an assignment a few lines lower)

Symbolic subscripts (structured data)

— 21

in msg.priority. The printf function in the while loop reads the
string and the value from those positions as well.
At the same time, the msg array is an entity on itself: it is passed
in its entirety to function insert. That function, in turn, says near
the end “queue[queueitems] = item”, where item is an array with
the same declaration as the msg variable in main, and queue is a
two-dimensional array that holds queuesize elements, with the
minor dimension having symbolic subscripts. The declaration of
queue and queuesize are just above function insert.
At several spots in the example program, the same symbolic subscripts are repeated. In practice, a program would declare the
list of symbolic constants in a #define directive and declare the
arrays using this text-substition macro. This saves typing and
makes modifications of the declaration easier to maintain. Concretely, when adding near the top of the program the following
line:
#define MESSAGE[.text{40}, .priority]

you can declare an array as “msg[MESSAGE]” and subsequently
access the symbolic subscripts.
The example implements a “priority queue”. You can insert a
number of messages into the queue and when these messages
all have the same priority, they are extracted from the queue
in the same order. However, when the messages have different
priorities, the one with the highest priority comes out first. The
“intelligence” for this operation is inside function insert: it first
determines the position of the new message to add, then moves
a few messages one position upward to make space for the new
message. Function extract simply always retrieves the first element of the queue and shifts all remaining elements down by
one position.
Note that both functions insert and extract work on two shared
variables, queue and queueitems. A variable that is declared inside a function, like variable msg in function main can only be
accessed from within that function. A “global variable” is accessible by all functions, and that variable is declared outside
the scope of any function. Variables must still be declared before they are used, so main cannot access variables queue and
queueitems, but both insert and extract can.
Function extract returns the messages with the highest priority via its function argument item. That is, it changes its function argument by copying the first element of the queue array

22

—

Bit operations to manipulate “sets”

into item. Function insert copies in the other direction and it
does not change its function argument item. In such a case, it
is advised to mark the function argument as “const”. This helps
the PAWN parser to both check for errors and to generate better
(more compact, quicker) code.
Named parameters: 71
getstring: 126

A final remark on this latest sample is the call to getstring in
function main: if you look up the function declaration, you will
see that it takes three parameters, two of which are optional. In
this example, only the first and the last parameters are passed
in. Note how the example avoids ambiguity about which parameter follows the first, by putting the argument name in front of
the value. By using “named parameters” rather than positional
parameters, the order in which the parameters are listed is not
important. Named parameters are convenient in specifying —
and deciphering— long parameter lists.

Bit operations to manipulate “sets”
A few algorithms are most easily solved with “set operations”,
like intersection, union and inversion. In the figure below, for example, we want to design an algorithm that returns us the points
that can be reached from some other point in a specified maximum number of steps. For example, if we ask it to return the
points that can be reached in two steps starting from B, the algorithm has to return C, D, E and F, but not G because G takes
three steps from B.

Our approach is to keep, for each point in the graph, the set of
other points that it can reach in one step —this is the “next_step”
set. We also have a “result” set that keeps all points that we
have found so far. We start by setting the result set equal to
the next_step set for the departure point. Now we have in the

Bit operations to manipulate “sets” — 23
result set all points that one can reach in one step. Then, for
every point in our result set, we create a union of the result set
and the next_step set for that point. This process is iterated for

a specified number of loops.
An example may clarify the procedure outlined above. When the
departure point is B, we start by setting the result set to D and
E —these are the points that one can reach from B in one step.
Then, we walk through the result set. The first point that we encounter in the set is D, and we check what points can be reached
from D in one step: these are C and F. So we add C and F to the
result set. We knew that the points that can be reached from D
in one step are C and F, because C and F are in the next_step set
for D. So what we do is to merge the next_step set for point D
into the result set. The merge is called a “union” in set theory.
That handles D. The original result set also contained point E,
but the next_step set for E is empty, so no more point is added.
The new result set therefore now contains C, D, E and F.
A set is a general purpose container for elements. The only information that a set holds of an element is whether it is present
in the set or not. The order of elements in a set is insignificant
and a set cannot contain the same element multiple times. The
PAWN language does not provide a “set” data type or operators
that work on sets. However, sets with up to 32 elements can
be simulated by bit operations. It takes just one bit to store a
“present/absent” status and a 32-bit cell can therefore maintain
the status for 32 set elements —provided that each element is
assigned a unique bit position.
The relation between set operations and bitwise operations is
summarized in the following table. In the table, an upper case
letter stands for a set and a lower case letter for an element from
that set.
concept
intersection
union
complement
empty set
membership

mathematical notation PAWN expression
A∩B
A & B
A∪B
A|B
A
~A
ε
0
x∈A
(1 << x) & A

To test for membership —that is, to query whether a set holds a
particular element, create a set with just one element and take
the intersection. If the result is 0 (the empty set) the element is
not in the set. Bit numbering starts typically at zero; the lowest

24

—

Bit operations to manipulate “sets”

bit is bit 0 and the highest bit in a 32-bit cell is bit 31. To make
a cell with only bit 7 set, shift the value 1 left by seven —or in a
PAWN expression: “1 << 7”.
Below is the program that implements the algorithm described
earlier to find all points that can be reached from a specific departure in a given number of steps. The algorithm is completely
in the findtargets function.
LISTING:

set.p

/* Set operations, using bit arithmetic */
const
{ A
B
C
D
E
F
G
}

=
=
=
=
=
=
=

0b0000001,
0b0000010,
0b0000100,
0b0001000,
0b0010000,
0b0100000,
0b1000000

main()
{
new nextstep[] =
[ C | E,
/* A can reach C and E */
D | E,
/* B "
"
D and E */
G,
/* C "
"
G */
C | F,
/* D "
"
C and F */
0,
/* E "
"
none */
0,
/* F "
"
none */
E | F,
/* G "
"
E and F */
]
print "The departure point: "
new start = clamp( .value = toupper(getchar()) - 'A',
.min = 0,
.max = sizeof nextstep - 1
)
print "\nThe number of steps: "
new steps = getvalue()
/* make the set */
new result = findtargets(start, steps, nextstep)
printf "The points in range of %c in %d steps: ", start + 'A', steps
for (new i = 0; i < sizeof nextstep; i++)
if (result & 1 << i)
printf "%c ", i + 'A'
}
findtargets(start, steps, nextstep[], numpoints = sizeof nextstep)
{
new result = 0
new addedpoints = nextstep[start]
while (steps-- > 0 && result != addedpoints)
{
result = addedpoints
for (new i = 0; i < numpoints; i++)

A simple RPN calculator

— 25

if (result & 1 << i)
addedpoints |= nextstep[i]
}
return result
}

The const statement just below the header of the main function
declares the constants for the nodes A to G, using binary radix so
that that only a single bit is set in each value.

“const” statement:
100

When working with sets, a typical task that pops up is to determine the number of elements in the set. A straightforward
function that does this is below:

cellbits: 100

LISTING:

simple bitcount function

bitcount(set)
{
new count = 0
for (new i = 0; i < cellbits; i++)
if (set & (1 << i))
count++
return count
}

With a cell size of 32 bits, this function’s loop iterates 32 times
to check for a single bit at each iteration. With a bit of binary
arithmetic magic, we can reduce it to loop only for the number
of bits that are “set”. That is, the following function iterates only
once if the input value has only one bit set:
LISTING:

improved bitcount function

bitcount(set)
{
new count = 0
if (set)
do
count++
while ((set = set & (set - 1)))
return count
}

A simple RPN calculator
The common mathematical notation, with arithmetic expressions
like “26−3×(5+2)”, is known as the algebraic notation. It is a compact notation and we have grown accustomed to it. PAWN and by
far most other programming languages use the algebraic notation for their programming expressions. The algebraic notation
does have a few disadvantages, though. For instance, it occasionally requires that the order of operations is made explicit by

Algebraic notation
is also called “infix” notation

26

—

A simple RPN calculator

folding a part of the expression in parentheses. The expression at
the top of this paragraph can be rewritten to eliminate the parentheses, but at the cost of nearly doubling its length. In practice,
the algebraic notation is augmented with precedence level rules
that say, for example, that multiplication goes before addition
and subtraction.∗ Precedence levels greatly reduce the need for
parentheses, but it does not fully avoid them. Worse is that when
the number of operators grows large, the hierarchy of precedence levels and the particular precedence level for each operator becomes hard to memorize —which is why an operator-rich
language as APL does away with precedence levels altogether.

Reverse Polish
Notation is also
called “postfix”
notation

Around 1920, the Polish mathematician Jan Łukasiewicz demonstrated that by putting the operators in front of their operands,
instead of between them, precedence levels became redundant
and parentheses were never necessary. This notation became
known as the “Polish Notation”.† Later, Charles Hamblin proposed to put operators behind the operands, calling it the “Reverse Polish Notation”. The advantage of reversing the order is
that the operators are listed in the same order as they must be
executed: when reading the operators from the left to the right,
you also have the operations to perform in that order. The algebraic expression from the beginning of this section would read
in RPN as:
26 3 5 2 + × −
When looking at the operators only, we have: first an addition,
then a multiplication and finally a subtraction. The operands of
each operator are read from right to left: the operands for the +
operator are the values 5 and 2, those for the × operator are the
result of the previous addition and the value 3, and so on.
It is helpful to imagine the values to be stacked on a pile, where
the operators take one or more operands from the top of the pile
and put a result back on top of the pile. When reading through
the RPN expression, the values 26, 3, 5 and 2 are “stacked” in
that order. The operator + removes the top two elements from
∗

†

These rules are often summarized in a mnemonic like “Please Excuse My
Dear Aunt Sally” (Parentheses, Exponentiation, Multiplication, Division, Addition, Subtraction).
Polish Notation is completely unrelated to “Hungarian Notation” —which is
just the habit of adding “type” or “purpose” identification warts to names
of variables or functions.

A simple RPN calculator

— 27

the stack (5 and 2) and pushes the sum of these values back —
the stack now reads “26 3 7”. Then, the × operator removes 3
and 7 and pushes the product of the values onto the stack —the
stack is “26 21”. Finally, the − operator subtracts 21 from 26
and stores the single value 5, the end result of the expression,
back onto the stack.
Reverse Polish Notation became popular because it was easy to
understand and easy to implement in (early) calculators. It also
opens the way to operators with more than two operands (e.g.
integration) or operators with more than one result (e.g. conversion between polar and Cartesian coordinates).
The main program for a Reverse Polish Notation calculator is
below:
LISTING: rpn.p
/* a simple RPN calculator */
#include strtok
#include stack
#include rpnparse
main()
{
print "Type expressions in Reverse Polish Notation " ...
"(or an empty line to quit)\n"
new string{100}
while (getstring(string, .pack = true))
rpncalc string
}

The main program contains very little code itself; instead it includes the required code from three other files, each of which
implements a few functions that, together, build the RPN calculator. When programs or scripts get larger, it is usually advised
to spread the implementation over several files, in order to make
maintenance easier.
Function main first puts up a prompt and calls the native function
getstring to read an expression that the user types. Then it calls
the custom function rpncalc to do the real work. Function rpncalc is implemented in the file rpnparse.inc, reproduced below:
LISTING:

rpnparse.i

/* main rpn parser and lexical analysis, part of the RPN calculator */
#include 
#include 
#define Token [
.type,
Rational: .value,
.word{20},

/* operator or token type */
/* value, if t_type is "Number" */
/* raw string */

28

—

A simple RPN calculator

]
const Number
= '0'
const EndOfExpr = '#'
rpncalc(const string{})
{
new index
new field[Token]
for ( ;; )
{
field = gettoken(string, index)
switch (field.type)
{
case Number:
push field.value
case '+':
push pop() + pop()
case '-':
push - pop() + pop()
case '*':
push pop() * pop()
case '/', ':':
push 1.0 / pop() * pop()
case EndOfExpr:
break
/* exit "for" loop */
default:
printf "Unknown operator '%s'\n", field.word
}
}
printf "Result = %r\n", pop()
if (clearstack())
print "Stack not empty\n", red
}
gettoken(const string{}, &index)
{
/* first get the next "word" from the string */
new word{20}
word = strtok(string, index)
/* then parse it */
new field[Token]
field.word = word
if (strlen(word) == 0)
{
field.type = EndOfExpr /* special "stop" symbol */
field.value = 0
}
else if ('0' <= word{0} <= '9')
{
field.type = Number
field.value = rval(word)
}
else
{
field.type = word{0}
field.value = 0
}
return field

A simple RPN calculator

— 29

}

The RPN calculator uses rational numbers and rpnparse.inc includes the “rational” file for that purpose. Almost all of the
operations on rational numbers is hidden in the arithmetic. The
only direct references to rational numbers are the “%r” format
code in the printf statement near the bottom of function rpncalc and the call to rationalstr halfway function gettoken.
Near the top in the file rpnparse.inc is a preprocessor macro
that declares the symbolic subscripts for an array. The macro
name, “Token” will be used throughout the program to declare
arrays with those fields. For example, function rpncalc declares
variable field as an array using the macro to declare the field
names.

Rational numbers,
see also the “Celsius to Fahrenheit”
example on page
14

Preprocessor: 91

Arrays with symbolic subscripts were already introduced in the
section Arrays and symbolic subscripts on page 19; this script
shows another feature of symbolic subscripts: individual substripts may have a tag name of their own. In this example, .type
is a simple cell, .value is a rational value (with a fractional part)
that is tagged as such, and .word can hold a string of 20 characters (includding the terminating zero byte). See, for example,
the line:
printf "Unknown operator '%s'\n", field.word
how the .word subscript of the field variable is used as a string.
If you know C/C++ or Java, you may want to look at the switch
statement. The switch statement differs in a number of ways
from the other languages that provide it. The cases are not fallthrough, for example, which in turn means that the break statement for the case EndOfExpr breaks out of the enclosing loop,
instead of out of the switch.
On the top of the for loop in function rpncalc, you will find the
instruction “field = gettoken(string, index)”. As already exemplified in the wcount.p (“word count”) program on page 17,
functions may return arrays. It gets more interesting for a similar line in function gettoken:
field.word = word
where word is an array for 20 characters and field is an array
with 3 (symbolic) subscripts. However, as the .word subscript
is declared as having a size of 20 characters, the expression
“field.word” is considered a sub-array of 20 characters, precisely matching the array size of word.

“switch” statement: 112

30

—

LISTING:

A simple RPN calculator
strtok.i

/* extract words from a string (words are separated by white space) */
#include 
strtok(const string{}, &index)
{
new length = strlen(string)
/* skip leading white space */
while (index < length && string{index} <= ' ')
index++
/* store the word letter for letter */
new offset = index
/* save start position of token */
const wordlength = 20
/* maximum word length */
new result{wordlength}
/* string to store the word in */
while (index < length
&& string{index} > ' '
&& index - offset < wordlength)
{
result{index - offset} = string{index}
index++
}
result{index - offset} = EOS
/* zero-terminate the string */
return result
}

wcount.p: 17

Function strtok is the same as the one used in the wcount.p example. It is implemented in a separate file for the RPN calculator
program. Note that the strtok function as it is implemented here
can only handle words with up to 19 characters —the 20th character is the zero terminator. A truly general purpose re-usable
implementation of an strtok function would pass the destination
array as a parameter, so that it could handle words of any size.
Supporting both packed and unpack strings would also be a useful feature of a general purpose function.
When discussing the merits of Reverse Polish Notation, I mentioned that a stack is both an aid in “visualizing” the algorithm
as well as a convenient method to implement an RPN parser. This
example RPN calculator, uses a stack with the ubiquitous functions push and pop. For error checking and resetting the stack,
there is a third function that clears the stack.
LISTING: stack.i
/* stack functions, part of the RPN calculator */
#include 
static Rational: stack[50]
static stackidx = 0
push(Rational: value)
{
assert stackidx < sizeof stack
stack[stackidx++] = value
}

Event-driven programming

— 31

Rational: pop()
{
assert stackidx > 0
return stack[--stackidx]
}
clearstack()
{
assert stackidx >= 0
if (stackidx == 0)
return false
stackidx = 0
return true
}

The file stack.inc includes the file rational again. This is technically not necessary (rpnparse.inc already included the definitions for rational number support), but it does not do any harm
either and, for the sake of code re-use, it is better to make any
file include the definitions of the libraries that it depends on.
Notice how the two global variables stack and stackidx are declared as “static” variables; using the keyword static instead of
new. Doing this makes the global variables “visible” in that file
only. For all other files in a larger project, the symbols stack and
stackidx are invisible and they cannot (accidentally) modify the
variables. It also allows the other modules to declare their own
private variables with these names, so it avoids name clashing.
The RPN calculator is actually still a fairly small program, but
it has been set up as if it were a larger program. It was also
designed to demonstrate a set of elements of the PAWN language
and the example program could have been implemented more
compactly.

Event-driven programming
All of the example programs that were developed in this chapter
so far, have used a “flow-driven” programming model: they start
with main and the code determines what to do and when to request input. This programming model is easy to understand and
it nicely fits most programming languages, but it is also a model
does not fit many “real life” situations. Quite often, a program
cannot simply process data and suggest that the user provides
input only when it is ready for him/her. Instead, it is the user who
decides when to provide input, and the program or script should

32

—

Event-driven programming

be prepared to process it in an acceptable time, regardless of
what it was doing at the moment.
The above description suggests that a program should therefore
be able to interrupt its work and do other things before picking
up the original task. In early implementations, this was indeed
how such functionality was implemented: a multi-tasking system
where one task (or thread) managed the background tasks and
a second task/thread that sits in a loop continuously requesting
user input. This is a heavy-weight solution, however. A more
light-weight implementation of a responsive system is what is
called the “event-driven” programming model.
In the event-driven programming model, a program or script decomposes any lengthy (background) task into short manageable
blocks and in between, it is available for input. Instead of having the program poll for input, however, the host application (or
some other sub-system) calls a function that is attached to the
event —but only if the event occurs.
A typical event is “input”. Observe that input does not only come
from human operators. Input packets can arrive over serial cables, network stacks, internal sub-systems such as timers and
clocks, and all kinds of other equipment that you may have attached to your system. Many of the apparatus that produce input, just send it. The arrival of such input is an event, just like a
key press. If you do not catch the event, a few of them may be
stored in an internal system queue, but once the queue is saturated the events are simply dropped.
PAWN directly supports the event-driven model, because it supports multiple entry points. The sole entry point of a flow-driven
program is main; an event-driven program has an entry point
for every event that it captures. When compared to the flowdriven model, event-driven programs often appear “bottom-up”:
instead of your program calling into the host application and deciding what to do next, your program is being called from the
outside and it is required to respond appropriately and promptly.

Public functions:
80

PAWN does not specify a standard library, and so there is no guarantee that in a particular implementation, functions like printf
and getvalue are available. Although it is suggested that every implementation provides a minimal console/terminal interface with a these functions, their availability is ultimately implementation dependent. The same holds for the public functions
—the entry points for a script. It is implementation-dependent

Event-driven programming

— 33

which public functions a host application supports. The script in
this section may therefore not run on your platform (even if all
previous scripts ran fine). The tools in the standard distribution
of the PAWN system support all scripts developed in this manual,
provided that your operating system or environment supports
standard terminal functions such as setting the cursor position.
An early programming language that was developed solely for
teaching the concepts of programming to children was “Logo”.
This dialect of LISP made programming visual by having a small
robot, the “turtle”, drive over the floor under control of a simple
program. This concept was then copied to moving a (usually triangular) cursor of the computer display, again under control of a
program. A novelty was that the turtle now left a trail behind it,
allowing you to create drawings by properly programming the
turtle —it became known as turtle graphics. The term “turtle
graphics” was also used for drawing interactively with the arrow
keys on the keyboard and a “turtle” for the current position. This
method of drawing pictures on the computer was briefly popular
before the advent of the mouse.
LISTING:

turtle.p

@keypressed(key)
{
/* get current position */
new x, y
wherexy x, y
/* determine how the update the current position */
switch (key)
{
case 'u': y-/* up */
case 'd': y++
/* down */
case 'l': x-/* left */
case 'r': x++
/* right */
case '\e': exit /* Escape = exit */
}
/* adjust the cursor position and draw something */
moveturtle x, y
}
moveturtle(x, y)
{
gotoxy x, y
print "*"
gotoxy x, y
}

The entry point of the above program is @keypressed —it is called
on a key press. If you run the program and do not type any key,
the function @keypressed never runs; if you type ten keys, @keypressed runs ten times. Contrast this behaviour with main: func-

34

—

Event-driven programming

tion main runs immediately after you start the script and it runs
only once.
It is still allowed to add a main function to an event-driven program: the main function will then serve for one-time initialization. A simple addition to this example program is to add a main
function, in order to clear the console/terminal window on entry
and perhaps set the initial position of the “turtle” to the centre.
Support for function keys and other special keys (e.g. the arrow keys) is highly system-dependent. On ANSI terminals, these
keys produce different codes than in a Windows “DOS box”. In
the spirit of keeping the example program portable, I have used
common letters (“u” for up, “l” for left, etc.). This does not mean,
however, that special keys are beyond PAWN’s capabilities.
In the “turtle” script, the “Escape” key terminates the host application through the instruction exit. For a simple PAWN run-time
host, this will indeed work. With host applications where the
script is an add-on, or host-applications that are embedded in a
device, the script usually cannot terminate the host application.

• Multiple events
The advantages of the event-driven programming model, for creating reactive programs, become apparent in the presence of
multiple events. In fact, the event-driven model is only useful
if you have more that one entry point; if your script just handles
a single event, it might as well enter a polling loop for that single event. The more events need to be handled, the harder the
flow-driven programming model becomes. The script below implements a bare-bones “chat” program, using only two events:
one for sending and one for receiving. The script allows users
on a network (or perhaps over another connection) to exchange
single-line messages.
The script depends on the host application to provide the native and public functions for sending and receiving “datagrams”
and for responding to keys that are typed in. How the host application sends its messages, over a serial line or using TCP/IP,
the host application may decide itself. The tools in the standard
PAWN distribution push the messages over the TCP/IP network,
and allow for a “broadcast” mode so that more than two people
can chat with each other.

Event-driven programming
LISTING:

— 35

chat.p

#include 
const cellchars = cellbits / charbits
@receivestring(const message[], const source[])
printf "[%s] says: %s\n", source, message
@keypressed(key)
{
static string{100}
static index
if (key == '\e')
exit

/* quit on 'Esc' key */

echo key
if (key == '\r' || key == '\n' || index == sizeof string * cellchars)
{
string{index} = '\0'
/* terminate string */
sendstring string
index = 0
string{index} = '\0'
}
else
string{index++} = key
}
echo(key)
{
new string{2} = { 0 }
string{0} = key == '\r' ? '\n' : key
printf string
}

The bulk of the above script handles gathering received keypresses into a string and sending that string after seeing the
ENTER key. The “Escape” key ends the program. The function
echo serves to give visual feedback of what the user types: it
builds a zero-terminated string from the key and prints it.
Despite its simplicity, this script has the interesting property that
there is no fixed or prescribed order in which the messages are
to be sent or received —there is no query–reply scheme where
each host takes its turn in talking & listening. A new message
may even be received while the user is typing its own message.∗
∗

As this script makes no attempt to separate received messages from typed
messages (for example, in two different scrollable regions), the terminal/
console will look confusing when this happens. With an improved userinterface, this simple script could indeed be a nice message-base chat program.

36

—

State programming

State programming
In a program following the event-driven model, events arrive individually, and they are also responded to individually. On occasion, though, an event is part of a sequential flow, that must
be handled in order. Examples are data transfer protocols over,
for example, a serial line. Each event may carry a command, a
snippet of data that is part of a larger file, an acknowledgement,
or other signals that take part in the protocol. For the stream of
events (and the data packets that they carry) to make sense, the
event-driven program must follow a precise hand-shaking protocol.
To adhere to a protocol, an event-driven program must respond
to each event in compliance with the (recent) history of events received earlier and the responses to those events. In other words,
the handling of one event may set up a “condition” or “environment” for the handling any one or more subsequent events.
A simple, but quite effective, abstraction for constructing reactive systems that need to follow (partially) sequential protocols,
is that of the “automaton” or state machine. As the number of
states are usually finite, the theory often refers to such automatons as Finite State Automatons or Finite State Machines. In an
automaton, the context (or condition) of an event is its state. An
event that arrives may be handled differently depending on the
state of the automaton, and in response to an event, the automaton may switch to another state —this is called a transition. A
transition, in other words, as a response of the automaton to an
event in the context of its state.
Automatons are very common in software as well as in mechanical devices (you may see the Jacquard Loom as an early state
machine). Automatons, with a finite number of states, are deterministic (i.e. predictable in behaviour) and their relatively simple
design allows a straightforward implementation from a “state diagram”.
In a state diagram, the states are usually represented as circles
or rounded rectangles and the arrows represent the transitions.
As transitions are the response of the automaton to events, an
arrow may also be seen as an event “that does something”. An
event/transition that is not defined in a particular state is assumed to have no effect —it is silently ignored. A filled dot represents the entry state, which your program (or the host application) must set in start-up. It is common to omit in a state diagram

State programming

— 37

all event arrows that drop back into the same state, but for the
preceding figure I have chosen to make the response to all events
explicit.

The above state diagram is for “parsing” comments that start
with “/*” and end with “*/”. There are states for plain text and
for text inside a comment, plus two states for tentative entry into
or exit from a comment. The automaton is intended to parse
the comments interactively, from characters that the user types
on the keyboard. Therefore, the only events that the automaton reacts on are key presses. Actually, there is only one event
(“key-press”) and the state switches are determined by event’s
parameter: the key.
PAWN supports automatons and states directly in the language.
Every function∗ may optionally have one or more states assigned
to it. PAWN also supports multiple automatons, and each state is
part of a particular automaton. The following script implements
the preceding state diagram (in a single, anonymous, automaton). To differentiate plain text from comments, both are output
in a different colour.
LISTING:

comment.p

/* parse C comments interactively, using events and a state machine */
main()
state plain
@keypressed(key) 
{
state (key == '/') slash
if (key != '/')
echo key
}
∗

With the exception of “native functions” and user-defined operators.

38

—

State programming

@keypressed(key) 
{
state (key != '/') plain
state (key == '*') comment
echo '/'
/* print '/' held back from previous state */
if (key != '/')
echo key
}
@keypressed(key) 
{
echo key
state (key == '*') star
}
@keypressed(key) 
{
echo key
state (key != '*') comment
state (key == '/') plain
}
echo(key) 
printchar key, yellow
echo(key) 
printchar key, green
printchar(ch, colour)
{
setattr .foreground = colour
printf "%c", ch
}

Function main sets the starting state to main and exits; all logic
is event-driven. When a key arrives in state plain, the program
checks for a slash and conditionally prints the received key. The
interaction between the states plain and slash demonstrates a
complexity that is typical for automatons: you must decide how
to respond to an event when it arrives, without being able to
“peek ahead” or undo responses to earlier events. This is usually
the case for event-driven systems —you neither know what event
you will receive next, nor when you will receive it, and whatever
your response to the current event, there is a good chance that
you cannot erase it on a future event and pretend that it never
happened.
In our particular case, when a slash arrives, this might be the
start of a comment sequence (“/*”), but it is not necessarily so.
By inference, we cannot decide on reception of the slash character what colour to print it in. Hence, we hold it back. However,
there is no global variable in the script that says that a character is held back —in fact, apart from function parameters, no
variable is declared at all in this script. The information about a

State programming

— 39

character being held back is “hidden” in the state of the automaton.
As is apparent in the script, state changes may be conditional.
The condition is optional, and you can also use the common if–
else construct to change states.
Being state-dependent is not reserved for the event functions.
Other functions may have state declarations as well, as the echo
function demonstrates. When a function would have the same
implementation for several states, you just need to write a single
implementation and mention all applicable states. For function
echo there are two implementations to handle the four states.∗
That said, an automaton must be prepared to handle all events
in any state. Typically, the automaton has neither control over
which events arrive nor over when they arrive, so not handling
an event in some state could lead to wrong decisions. It frequently happens, then, that a some events are meaningful only
in a few specific states and that they should trigger an error or
“reset” procedure in all other cases. The function for handling
the event in such “error” condition might then hold a lot of state
names, if you were to mention them explicitly. There is a shorter
way: by not mentioning any name between the angle brackets,
the function matches all states that have not explicit implementation elsewhere. So, for example, you could use the signature
“echo(key) <>” for either of the two implementations (but not for
both).
A single anonymous automaton is pre-defined. If a program contains more than one automaton, the others must be explicitly
mentioned, both in the state classifier of the function and in the
state instruction. To do so, add the name of the automaton in
front of the state name and separate the names of the automaton and the state with a colon. That is, “parser:slash” stands
for the state slash of the automaton parser. A function can only
be part of a single automaton; you can share one implementation
of a function for several states of the same automaton, but you
cannot share that function for states of different automatons.
• Entry functions and automata theory
∗

A function that has the same implementation for all states, does not need a
state classifier at all —see printchar.

40

—

State programming

State machines, and the foundation of “automata theory”, originate from mechanical design and pneumatic/electric switching
circuits (using relays rather than transistors). Typical examples
are coin acceptors, traffic light control and communication lines
switching circuits. In these applications, robustness and predictability are paramount, and it was found that in this context
it was best to link actions (output) to the states rather than to
the events (input). In this design, entering a state causes activity —events cause state changes, but do not directly carry out
operations.
In a pedestrian crossing lights system, the lights for the vehicles
and the pedestrians must be synchronized. Technically, there
are six possible combinations, but obviously the combination of a
green light for the traffic and a “walk” sign for the pedestrians is
recipe for disaster. We can also immediately dismiss the combination of yellow/walk as too dangerous. Thus, four combinations
remain to be handled. The figure below is a state diagram for the
pedestrian crossing lights. The entire process is activated with
a button, and operates on a timer.

State programming

— 41

When the state red/walk times out, the state cannot immediately
go back to green/wait, because the pedestrians that are busy
crossing the road at that moment need some time to clear the
road —the state red/wait allows for this. For purpose of demonstration, this pedestrian crossing has the added functionality that
when a pedestrian pushes the button while the light for the traffic is already red, the time that the pedestrian has for crossing is
lengthened. If the state is red/wait and the button is pressed, it
switches back to red/walk. The enfolding box around the states
red/walk and red/wait for handling the button event is just a notational convenience: I could also have drawn two arrows from
either state back to red/walk. The script source code (which follows below) reflects this same notational convenience, though.
In the implementation in the PAWN language, the event functions now always have a single statement, which is either a state
change or an empty statement. Events that do not cause a state
change are absent in the diagram, but they must be handled in
the script; hence, the “fall-back” event functions that do nothing. The output, in this example program only messages printed
on the console, is all done in the special functions entry. The
function entry may be seen as a main for a state: it is implicitly
called when the state that it is attached to is entered. Note that
the entry function is also called when “switching” to the state
that the automaton is already in: when the state is red_walk an
invocation of the @keypressed sets the state to red_walk (which
it is already in) and causes the entry function of red_walk to run
—this is a re-entry of the state.
LISTING:

traffic.p

/* traffic light synchronizer, using states in an event-driven model */
#include 

Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.5
Linearized                      : No
Page Mode                       : UseOutlines
Page Count                      : 194
Creator                         :  XeTeX output 2017.06.05:1618
Producer                        : xdvipdfmx (20140317)
Create Date                     : 2017:06:05 16:18:29+01:00
EXIF Metadata provided by EXIF.tools

Navigation menu