Pawn Language Guide



Pawn

embedded scripting language

The Language

January 2016 CompuPhase

“CompuPhase” and “Pawn” are trademarks of ITB CompuPhase.

“Java” is a trademark of Sun Microsystems, Inc.

“Microsoft” and “Microsoft Windows” are registered trademarks of Microsoft Corporation.

“Linux” is a registered trademark of Linus Torvalds.

“Unicode” is a registered trademark of Unicode, Inc.

© 1997–2016, ITB CompuPhase

Eerste Industriestraat 19–21, 1401VL Bussum The Netherlands

telephone: (+31)-(0)35 6939 261

e-mail: info@compuphase.com

WWW: http://www.compuphase.com

This manual and the associated software are made available under
the conditions listed in appendix D of this manual.

Typeset with TeX in the “DejaVu” typeface family.


Contents

Foreword
A tutorial introduction
    Arithmetic and expressions
    Arrays and constants
    Using functions
    Rational numbers
    Strings
    Symbolic subscripts (structured data)
    Bit operations to manipulate “sets”
    A simple RPN calculator
    Event-driven programming
    State programming
    Program verification
    Documentation comments
    Warnings and errors
    In closing
Data and declarations
    State variable declarations
    Static local declarations
    Static global declarations
    Stock declarations
    Public declarations
    Constant variables
    Arrays (single dimension)
    Initialization
    Progressive initiallers for arrays
    Symbolic subscripts for arrays
    Multi-dimensional arrays
    Arrays and the sizeof operator
    Tag names
Functions
    Function arguments
    Coercion rules
    Calling functions
    Recursion
    Forward declarations
    State classifiers
    Public functions, function main
    Static functions
    Stock functions
    Native functions
    User-defined operators
The preprocessor
General syntax
Operators and expressions
    Notational conventions
    Arithmetic
    Bit manipulation
    Assignment
    Relational
    Boolean
    Miscellaneous
    Operator precedence
Statements
Directives
Proposed function library
    Core functions
    Console functions
    Date/time functions
    File input/output
    Fixed point arithmetic
    Floating point arithmetic
    Process and library call interface
    String manipulation
Pitfalls: differences from C
Assorted tips
    Working with characters and strings
    Internationalization
    Working with tags
    Concatenating lines
    A program that generates its own source code
Appendices
    A: Error and warning messages
    B: The compiler
    C: Rationale
    D: License
Index

Foreword

PAWN is a simple, typeless, 32-bit “scripting” language with a

C-like syntax. Execution speed, stability, simplicity and a small

footprint were essential design criteria for both the language and

the interpreter/abstract machine that a PAWN program runs on.

An application or tool cannot do or be everything for all users.

This not only justiﬁes the diversity of editors, compilers, operat-

ing systems and many other software systems, it also explains

the presence of extensive conﬁguration options and macro or

scripting languages in applications. My own applications have

contained a variety of little languages; most were very simple,

some were extensive... and most needs could have been solved

by a general purpose language with a special purpose library.

Hence, PAWN.

The PAWN language was designed as a flexible language for
manipulating objects in a host application. The tool set (compiler,
abstract machine) was written to be easily extensible and to run
on different software/hardware architectures.

PAWN is a descendant of the original Small C by Ron Cain and
James Hendrix, which in turn was a subset of C. Some of the
modifications that I made to Small C, e.g. the removal of the type
system and the substitution of pointers by references, were so
fundamental that I could hardly call my language a “subset of
C” or a “C dialect” any more. Therefore, I stripped off the “C”
from the title and used the name “SMALL” for the language in my
publication in Dr. Dobb’s Journal and in the years since. During
development and maintenance of the product, I received many
requests for changes. One of the frequently requested changes was
to use a different name for the language: searching for information
on the SMALL scripting language on the Internet was hindered by
“small” being such a common word. The name change occurred
together with a significant change in the language: the support of
“states” (and state machines).

I am indebted to Ron Cain and James Hendrix (and more recently,

Andy Yuen), for their work on Small C and to Dr. Dobb’s Journal

for publishing it. Although I must have touched nearly every line

of the original code multiple times, the Small C origins are still

clearly visible.


A detailed treatment of the design goals and compromises is in
appendix C; here I would like to summarize a few key points. As
noted in the previous paragraphs, PAWN is for customizing
applications (by writing scripts), not for writing applications. PAWN
is weak on data structuring because PAWN programs are intended
to manipulate objects (text, sprites, streams, queries, ...) in the
host application, but the PAWN program is, by intent, denied
direct access to any data outside its abstract machine. The only
means that a PAWN program has to manipulate objects in the host
application is by calling subroutines, so-called “native functions”,
that the host application provides.

PAWN is ﬂexible in that key area: calling functions. PAWN sup-

ports default values for any of the arguments of a function, call-

by-reference as well as call-by-value, and “named” as well as

“positional” function arguments. PAWN does not have a “type

checking” mechanism, by virtue of being a typeless language,

but it does oﬀer in replacement a “classiﬁcation checking” mech-

anism, called “tags”. The tag system is especially convenient for

function arguments because each argument may specify multiple

acceptable tags.

For any language, the power (or weakness) lies not in the
individual features, but in their combination. For PAWN, I feel that
named arguments (which let you specify function arguments in any
order) and default values (which allow you to skip arguments that
you are not interested in) blend together into a convenient and
“descriptive” way to call (native) functions to manipulate objects
in the host application.

A tutorial introduction

PAWN is a simple programming language with a syntax reminiscent
of the “C” programming language. A PAWN program consists of a
set of functions and a set of variables. The variables are data
objects and the functions contain instructions (called
“statements”) that operate on the data objects or that perform
tasks.

[Margin note: Compiling and running scripts: page 167]
The first program in almost any computer language is one that
prints a simple string; printing “Hello world” is a classic example.
In PAWN, the program would look like:

LISTING: hello.p

main()
    printf "Hello world\n"

This manual assumes that you know how to run a PAWN program;
if not, please consult the application manual (or appendix B).

A PAWN program starts execution in an “entry” function∗; in
nearly all examples of this manual, this entry function is called
“main”. Here, the function main contains only a single
instruction, which is on the line below the function head itself. Line
breaks and indenting are insignificant; the invocation of the
function printf could equally well be on the same line as the head
of function main.

The definition of a function requires that a pair of parentheses
follow the function name. If a function takes parameters, their
declarations appear between the parentheses. The function main
does not take any parameters. The rules are different for a
function invocation (or a function call); parentheses are optional
in the call to the printf function.

[Margin note: String literals: 98]
[Margin note: Escape sequence:]
The single argument of the printf function is a string, which must
be enclosed in double quotes. The characters \n near the end of
the string form an escape sequence; in this case they indicate a
“newline” symbol. When printf encounters the newline escape
sequence, it advances the cursor to the first column of the next
line. One has to use the \n escape sequence to insert a “newline”
into the string, because a string may not wrap over multiple lines.

∗This should not be confused with the “state” entry functions, which are

called entry, but serve a diﬀerent purpose —see page 39.


PAWN is a “case sensitive” language: upper and lower case let-

ters are considered to be diﬀerent letters. It would be an error to

spell the function printf in the above example as “PrintF”. Key-

words and predeﬁned symbols, like the name of function “main”,

must be typed in lower case.

If you know the C language, you may feel that the above example

does not look much like the equivalent “Hello world” program in

C/C++. PAWN can also look very similar to C, though. The next

example program is also valid PAWN syntax (and it has the same

semantics as the earlier example):

LISTING: hello.p — C style

#include <console>

main()
    {
    printf("Hello world\n");
    }

These ﬁrst examples also reveal a few diﬀerences between PAWN

and the C language:

⋄ there is usually no need to include any system-defined “header
  file”;
⋄ semicolons are optional (except when writing multiple
  statements on one line);
⋄ when the body of a function is a single instruction, the braces
  (for a compound instruction) are optional;
⋄ when you do not use the result of a function in an expression
  or assignment, parentheses around the function argument are
  optional.
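As a minimal sketch of these optional syntaxes, the two functions below behave identically. The function names terse and verbose are made up for this illustration, and the sketch assumes that the host application provides the print function.

```pawn
/* minimal syntax: no braces, no parentheses, no semicolon */
terse()
    print "Hello\n"

/* C-like syntax: braces, parentheses and a semicolon */
verbose()
    {
    print("Hello\n");
    }

main()
    {
    /* two statements on one line must be separated by a semicolon */
    terse(); verbose()
    }
```

Both calls print the same line, so a host that provides print should show “Hello” twice.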

As an aside, the few preceding points refer to optional syntaxes.

It is your choice what syntax you wish to use: neither style is

“deprecated” or “preferred”. The examples in this manual po-

sition the braces and use an indentation that is known as the

“Whitesmith’s style”, but PAWN is a free format language and

other indenting styles are just as good.

[Margin note: More function descriptions at page 121]
Because PAWN is designed to be an extension language for
applications, the function set/library that a PAWN program has at its
disposal depends on the host application. As a result, the PAWN
language has no intrinsic knowledge of any function. The printf
function, used in this first example, must be made available by
the host application and be “declared” to the PAWN parser.† It is
assumed, however, that all host applications provide a minimal
set of common functions, like print and printf.

† In the language specification, the term “parser” refers to any
implementation that processes and runs conforming Pawn programs:
either interpreters or compilers.

In some environments, the display or terminal must be enabled
before any text can be output onto it. If this is the case, you
must add a call to the function “console” before the first call to
function print or printf. The console function also allows you
to specify device characteristics, such as the number of lines and
columns of the display. The example programs in this manual do
not use the console function, because many platforms do not
require or provide it.

Arithmetic and expressions

Fundamental elements of most programs are calculations,
decisions (conditional execution), iterations (loops) and variables
to store input data, output data and intermediate results. The
next program example illustrates many of these concepts. The
program calculates the greatest common divisor of two values
using an algorithm invented by Euclid.

LISTING: gcd.p
/*
   The greatest common divisor of two values,
   using Euclid's algorithm.
*/
main()
    {
    print "Input two values\n"
    new a = getvalue()
    new b = getvalue()
    while (a != b)
        if (a > b)
            a = a - b
        else
            b = b - a
    printf "The greatest common divisor is %d\n", a
    }

[Margin note: Compound statement: 109]
Function main now contains more than just a single “print”
statement. When the body of a function contains more than one
statement, these statements must be enclosed in braces: the “{”
and “}” characters. This groups the instructions into a single
compound statement. The notion of grouping statements in a
compound statement applies as well to the bodies of if–else and
loop instructions.

[Margin note: Data declarations are covered in detail starting at page 59]
The new keyword creates a variable. The name of the variable
follows new. It is common, but not imperative, to assign a value
to the variable already at the moment of its creation. Variables
must be declared before they are used in an expression. The
getvalue function (also a common predefined function) reads in a
value from the keyboard and returns the result. Note that PAWN
is a typeless language: all variables are numeric cells that can
hold a signed integral value.

The getvalue function name is followed by a pair of parentheses.

These are required because the value that getvalue returns is

stored in a variable. Normally, the function’s arguments (or pa-

rameters) would appear between the parentheses, but getvalue

(as used in this program) does not take any explicit arguments.

If you do not assign the result of a function to a variable or use
it in an expression in another way, the parentheses are optional.
For example, the results of the print and printf statements are
not used. You may still use parentheses around the arguments,
but they are not required.

[Margin note: “while” loop: 113]
[Margin note: “if–else”: 111]
Loop instructions, like “while”, repeat a single instruction as
long as the loop condition (the expression that follows the while
keyword) is “true”. One can execute multiple instructions in a
loop by grouping them in a compound statement. The if–else
instruction has one instruction for the “true” clause and one for
the “false”.

Observe that some statements, like while and if–else, contain

(or “fold around”) another instruction —in the case of if–else

even two other instructions. The complete bundle is, again, a

single instruction. That is:

⋄ the assignment statements “a = a - b” below the if and
  “b = b - a” below the else are statements;
⋄ the if–else statement folds around these two assignments and
  forms a single statement of itself;
⋄ the while statement folds around the if–else statement and
  forms, again, a single statement.
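The nesting can be made visible with braces. The sketch below wraps the loop from gcd.p in a function; the function name gcd and its parameters are introduced here purely for illustration. Each brace pair encloses exactly one of the statements listed above, and the braces do not change what the loop does.

```pawn
/* subtraction form of the algorithm; assumes both inputs are positive */
gcd(a, b)
    {
    while (a != b)          /* the while statement contains one if-else */
        {
        if (a > b)          /* the if-else contains two assignments */
            {
            a = a - b
            }
        else
            {
            b = b - a
            }
        }
    return a
    }

main()
    {
    printf "The greatest common divisor is %d\n", gcd(12, 18)
    }
```

For the inputs 12 and 18 the loop passes through (12, 6) and (6, 6), so the program should report 6.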

It is common to make the nesting of the statements explicit by
indenting any sub-statements below a statement in the source
text. In the “Greatest Common Divisor” example, the left margin
indent increases by four space characters after the while
statement, and again after the if and else keywords. Statements that
belong to the same level, such as the print and printf invocations
and the while loop, have the same indentation.


[Margin note: Relational operators: 105]
The loop condition for the while loop is “(a != b)”; the symbol
!= is the “not equal to” operator. That is, the if–else instruction
is repeated until “a” equals “b”. It is good practice to indent the
instructions that run under control of another statement, as is
done in the preceding example.

The call to printf, near the bottom of the example, differs from
the print call right below the opening brace (“{”). The “f” in
printf stands for “formatted”, which means that the function
can format and print numeric values and other data (in a
user-specified format), as well as literal text. The %d symbol in the
string is a token that indicates the position and the format in
which the subsequent argument to function printf should be
printed. At run time, the token %d is replaced by the value of
variable “a” (the second argument of printf).

Function print can only print text; it is quicker than printf. If
you want to print a literal “%” on the display, you have to use
print, or you have to double it in the string that you give to
printf. That is:

    print "20% of the personnel accounts for 80% of the costs\n"

and

    printf "20%% of the personnel accounts for 80%% of the costs\n"

print the same string.
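A short sketch that combines both points: several %d tokens in one format string, plus a doubled percent sign. The variable names are chosen for this illustration only.

```pawn
main()
    {
    new part = 20, total = 80
    /* each %d consumes the next argument, in order; %% prints a single % */
    printf "%d%% of %d is %d\n", part, total, part * total / 100
    }
```

With these values the line printed should read “20% of 80 is 16”.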

Arrays and constants

Besides simple variables with a size of a single cell, PAWN
supports “array variables” that hold many cells/values. The
following example program displays a series of prime numbers using
the well-known “sieve of Eratosthenes”. The program also
introduces another new concept: symbolic constants. Symbolic
constants look like variables, but they cannot be changed.

LISTING: sieve.p

/* Print all primes below 100, using the "Sieve of Eratosthenes" */
main()
    {
    const max_primes = 100
    new series[max_primes] = [ true, ... ]
    for (new i = 2; i < max_primes; ++i)
        if (series[i])
            {
            printf "%d ", i
            /* filter all multiples of this "prime" from the list */
            for (new j = 2 * i; j < max_primes; j += i)
                series[j] = false
            }
    }

[Margin note: Constant declaration: 100]
When a program or sub-program has some fixed limit built-in, it is
good practice to create a symbolic constant for it. In the preceding
example, the symbol max_primes is a constant with the value 100.
The program uses the symbol max_primes three times after its
definition: in the declaration of the variable series and in both
for loops. If we were to adapt the program to print all primes
below 500, there is now only one line to change.

[Margin note: Progressive initiallers: 62]
Like simple variables, arrays may be initialized upon creation.
PAWN offers a convenient shorthand to initialize all elements to
a fixed value: all hundred elements of the “series” array are
set to true, without requiring that the programmer type in the
word “true” a hundred times. The symbols true and false are
predefined constants.

[Margin note: “for” loop: 110]
When a simple variable, like the variables i and j in the primes
sieve example, is declared in the first expression of a for loop,
the variable is valid only inside the loop. Variable declaration has
its own rules; it is not a statement, although it looks like one.
One of the special rules for variable declaration is that the first
expression of a for loop may contain a variable declaration.

[Margin note: An overview of all operators: 102]
Both for loops also introduce new operators in their third
expression. The ++ operator increments its operand by one; that is,
++i is equivalent to i = i + 1. The += operator adds the
expression on its right to the variable on its left; that is, j += i is
equivalent to j = j + i.
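The two operators can be seen side by side in a pair of small loops. This is only a sketch; the loop bounds and the step of five are arbitrary choices made for this illustration.

```pawn
main()
    {
    /* ++i is shorthand for i = i + 1: counts 0, 1, 2 */
    for (new i = 0; i < 3; ++i)
        printf "%d ", i
    print "\n"

    /* j += 5 is shorthand for j = j + 5: counts 0, 5, 10, 15, 20 */
    for (new j = 0; j <= 20; j += 5)
        printf "%d ", j
    print "\n"
    }
```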

There is an “off-by-one” issue that you need to be aware of when
working with arrays. The first element in the series array is
series[0], so if the array holds max_primes elements, the last
element in the array is series[max_primes-1]. If max_primes is 100,
the last element, then, is series[99]. Accessing series[100] is
invalid.
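A minimal sketch of the index range, using a smaller array than the sieve; the names count and squares are made up for this example. The loop condition uses “<” rather than “<=”, so the index never reaches the array size.

```pawn
main()
    {
    const count = 5
    new squares[count]

    /* valid indices run from 0 up to and including count - 1 */
    for (new i = 0; i < count; ++i)
        squares[i] = i * i

    printf "first: %d, last: %d\n", squares[0], squares[count - 1]
    /* squares[count] would lie outside the array */
    }
```

The first element holds 0 and the last element, squares[4], holds 16.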

Using functions

Larger programs separate tasks and operations into functions.

Using functions increases the modularity of programs and func-

tions, when well written, are portable to other programs. The

following example implements a function to calculate numbers

from the Fibonacci series.


The Fibonacci sequence was introduced by Leonardo “Fibonacci” of
Pisa, an Italian mathematician of the 13th century, whose
greatest achievement was popularizing the Hindu-Arabic numerals in
the Western world. The goal of the sequence was to describe the
growth of a population of (idealized) rabbits; the sequence
is 1, 1, 2, 3, 5, 8, 13, 21, ... (every next value is the sum of its two
predecessors).

LISTING: fib.p
/* Calculation of Fibonacci numbers by iteration */
main()
    {
    print "Enter a value: "
    new v = getvalue()
    if (v > 0)
        printf "The value of Fibonacci number %d is %d\n",
               v, fibonacci(v)
    else
        printf "The Fibonacci number %d does not exist\n", v
    }

fibonacci(n)
    {
    assert n > 0

    new a = 0, b = 1
    for (new i = 2; i < n; i++)
        {
        new c = a + b
        a = b
        b = c
        }
    return a + b
    }

[Margin note: “assert” statement: 109]
The assert instruction at the top of the fibonacci function
deserves explicit mention; it guards against “impossible” or invalid
conditions. A Fibonacci number with a zero or negative sequence
number is invalid, and the assert statement flags it as a
programmer’s error if this case ever occurs. Assertions should only
flag programmer’s errors, never user input errors.

[Margin note: Functions: properties & features:]
The implementation of a user-defined function is not much
different from that of function main. Function fibonacci shows two
new concepts, though: it receives an input value through a
parameter and it returns a value (it has a “result”).

Function parameters are declared in the function header; the

single parameter in this example is “n”. Inside the function, a

parameter behaves as a local variable, but one whose value is

passed from the outside at the call to the function.


The return statement ends a function and sets the result of the

function. It need not appear at the very end of the function; early

exits are permitted.
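As an illustration of early exits, the sketch below uses a hypothetical function sign (it is not part of any library described in this manual) that returns as soon as the result is known.

```pawn
/* returns -1, 0 or 1 depending on the sign of the argument */
sign(value)
    {
    if (value < 0)
        return -1       /* early exit: the rest of the function is skipped */
    if (value > 0)
        return 1
    return 0
    }

main()
    {
    printf "%d %d %d\n", sign(-7), sign(0), sign(42)
    }
```

The three calls should yield -1, 0 and 1 respectively.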

[Margin note: Native function interface: 82]
The main function of the Fibonacci example calls predefined
“native” functions, like getvalue and printf, as well as the
user-defined function fibonacci. From the perspective of calling a
function (as in function main), there is no difference between
user-defined and native functions.

The Fibonacci numbers sequence describes a surprising variety
of natural phenomena. For example, the two or three sets of
spirals in pineapples, pine cones and sunflowers usually have
consecutive Fibonacci numbers between 5 and 89 as their number
of spirals. The numbers that occur naturally in branching
patterns (e.g. that of plants) are indeed Fibonacci numbers. Finally,
although the Fibonacci sequence is not a geometric sequence,
the further the sequence is extended, the more closely the ratio
between successive terms approaches the Golden Ratio
(approximately 1.618), which appears so often in art and architecture.∗
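The connection with the Golden Ratio can be stated precisely. With the numbering used in this chapter (the first two sequence numbers are both 1), the standard closed-form expression known as Binet's formula gives each Fibonacci number directly, and the ratio of successive terms converges to the Golden Ratio. The symbols φ and F_n are notation introduced here, not taken from the program listing.

```latex
\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.618, \qquad
F_n = \frac{\varphi^{n} - (1 - \varphi)^{n}}{\sqrt{5}}, \qquad
\lim_{n \to \infty} \frac{F_{n+1}}{F_n} = \varphi
```

As a check, n = 1 gives (2φ − 1)/√5 = √5/√5 = 1, which matches the first term of the sequence.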

• Call-by-reference and call-by-value

Dates are a particularly rich source of algorithms and conversion
routines, because the calendars that a date refers to have shown
such diversity, through time and around the world.

The “Julian Day Number” is attributed to Josephus Scaliger† and
it counts the number of days since November 24, 4714 BC
(proleptic Gregorian calendar‡). Scaliger chose that date because it
marked the coincidence of three well-established cycles: the
28-year Solar Cycle (of the old Julian calendar), the 19-year Metonic
Cycle and the 15-year Indiction Cycle (periodic taxes or
governmental requisitions in ancient Rome), and because no literature
or recorded history was known to pre-date that particular date in
the remote past. Scaliger used this concept to reconcile dates in
historic documents; later astronomers embraced it to calculate
intervals between two events more easily.

∗ The exact value for the Golden Ratio is (√5 + 1)/2. The relation between
Fibonacci numbers and the Golden Ratio also allows for a “direct”
calculation of any sequence number, instead of the iterative method described
here.

† There is some debate on exactly what Josephus Scaliger invented and who
or what he named it after.

‡ The Gregorian calendar was decreed to start on 15 October 1582 by Pope
Gregory XIII, which means that earlier dates do not really exist in the
Gregorian calendar. When extending the Gregorian calendar to days before 15
October 1582, we refer to it as the proleptic Gregorian calendar.

Julian Day numbers (sometimes denoted with unit “JD”) should

not be confused with Julian Dates (the number of days since the

start of the same year), or with the Julian calendar that was in-

troduced by Julius Caesar.

Below is a program that calculates the Julian Day number from a

date in the (proleptic) Gregorian calendar, and vice versa. Note

that in the proleptic Gregorian calendar, the ﬁrst year is 1 AD

(Anno Domini) and the year before that is 1 BC (Before Christ):

year zero does not exist! The program uses negative year values

for BC years and positive (non-zero) values for AD years.

LISTING: julian.p

/* calculate Julian Day number from a date, and vice versa */
main()
    {
    new d, m, y, jdn

    print "Give a date (dd-mm-yyyy): "
    d = getvalue(_, '-', '/')
    m = getvalue(_, '-', '/')
    y = getvalue()

    jdn = DateToJulian(d, m, y)
    printf "Date %d/%d/%d = %d JD\n", d, m, y, jdn

    print "Give a Julian Day Number: "
    jdn = getvalue()
    JulianToDate jdn, d, m, y
    printf "%d JD = %d/%d/%d\n", jdn, d, m, y
    }

DateToJulian(day, month, year)
    {
    /* The first year is 1. Year 0 does not exist: it is 1 BC (or -1) */
    assert year != 0
    if (year < 0)
        year++

    /* move January and February to the end of the previous year */
    if (month <= 2)
        year--, month += 12
    new jdn = 365*year + year/4 - year/100 + year/400
              + (153*month - 457) / 5
              + day + 1721119
    return jdn
    }

JulianToDate(jdn, &day, &month, &year)
    {
    jdn -= 1721119

    /* approximate year, then adjust in a loop */
    year = (400 * jdn) / 146097
    while (365*year + year/4 - year/100 + year/400 < jdn)
        year++
    year--

    /* determine month */
    jdn -= 365*year + year/4 - year/100 + year/400
    month = (5*jdn + 457) / 153

    /* determine day */
    day = jdn - (153*month - 457) / 5

    /* move January and February to start of the year */
    if (month > 12)
        month -= 12, year++

    /* adjust negative years (year 0 must become 1 BC, or -1) */
    if (year <= 0)
        year--
    }

Function main starts with creating the variables to hold the day,

month and year, and the calculated Julian Day number. Then

it reads in a date —three calls to getvalue— and calls function

DateToJulian to calculate the day number. After calculating the

result, main prints the date that you entered and the Julian Day

number for that date.

[Margin note: “Call by value” versus “call by reference”: 69]
Near the top of function DateToJulian, it increments the year
value if it is negative; it does this to cope with the absence of a
“zero” year in the proleptic Gregorian calendar. In other words,
function DateToJulian modifies its function arguments (later, it
also modifies month). Inside a function, an argument behaves
like a local variable: you may modify it. These modifications
remain local to the function DateToJulian, however. Function main
passes the values of d, m and y into DateToJulian, which maps them
to its function arguments day, month and year respectively.
Although DateToJulian modifies year and month, it does not change
y and m in function main; it only changes local copies of y and m.
This concept is called “call by value”.

The example intentionally uses different names for the local
variables in the functions main and DateToJulian, to make the
above explanation easier. Renaming main’s variables d, m and y
to day, month and year respectively does not change the matter:
then you just happen to have two local variables called day, two
called month and two called year, which is perfectly valid in PAWN.

The remainder of function DateToJulian is, regarding the PAWN

language, uninteresting arithmetic.


Returning to the second part of the function main, we see that it
now asks for a day number and calls another function,
JulianToDate, to find the date that matches the day number. Function
JulianToDate is interesting because it takes one input argument
(the Julian Day number) and needs to calculate three output
values: the day, month and year. Alas, a function can only have a
single return value; that is, a return statement in a function may
only contain one expression. To solve this, JulianToDate
specifically requests that changes that it makes to some of its function
arguments are copied back to the variables of the caller of the
function. Then, in main, the variables that must hold the result
of JulianToDate are passed as arguments to JulianToDate.

Function JulianToDate marks the appropriate arguments for
being “copied back to caller” by prefixing them with an & symbol.
Arguments with an & are copied back, arguments without it are
not. “Copying back” is actually not the correct term. An
argument tagged with an & is passed to the function in a special way
that allows the function to directly modify the original variable.
This is called “call by reference” and an argument that uses it is
a “reference argument”.

In other words, if main passes y to JulianToDate (which maps it
to its function argument year) and JulianToDate changes year,
then JulianToDate really changes y. Only through reference
arguments can a function directly modify a variable that is declared
in a different function.
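The difference between the two calling conventions can be seen in a
minimal sketch. The function names increment_copy and
increment_original are made up for this illustration; they are not
part of the date example.

```pawn
main()
{
    new a = 10
    new b = 10

    increment_copy a        /* "a" is passed by value */
    increment_original b    /* "b" is passed by reference */

    printf "a = %d, b = %d\n", a, b
}

increment_copy(value)
{
    value++                 /* changes only the local copy */
    return value
}

increment_original(&value)
{
    value++                 /* changes the variable of the caller */
}
```

With the rules described above, this should print “a = 10, b = 11”:
only the argument that is marked with & reaches back into main.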

To summarize the use of call-by-value versus call-by-reference: if

a function has one output value, you typically use a return state-

ment; if a function has more output values, you use reference

arguments. You may combine the two inside a single function,

for example in a function that returns its “normal” output via a

reference argument and an error code in its return value.
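As a sketch of that last combination, the hypothetical function
divide below (not part of the date example) passes its result back
through a reference argument and uses its return value only to
signal success or failure.

```pawn
main()
{
    new quotient

    if (divide(17, 5, quotient))
        printf "17 / 5 = %d\n", quotient

    if (!divide(1, 0, quotient))
        print "cannot divide by zero\n"
}

/* returns true on success and false on error; the quotient itself
 * goes back to the caller through the reference argument "result"
 */
divide(numerator, denominator, &result)
{
    if (denominator == 0)
        return false
    result = numerator / denominator    /* integer division */
    return true
}
```

The caller tests the return value first and only looks at quotient
when the call succeeded.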

As an aside, many desktop applications use conversions to and
from Julian Day numbers (or variants of them) to conveniently
calculate the number of days between two dates, or to calculate
the date that is 90 days from now, for example.

Rational numbers

All calculations done up to this point involved only whole num-

bers —integer values. PAWN also has support for numbers that

can hold fractional values: these are called “rational numbers”.

However, whether this support is enabled depends on the host

application.

Rational numbers can be implemented as either ﬂoating-point

or ﬁxed-point numbers. Floating-point arithmetic is commonly

used for general-purpose and scientiﬁc calculations, while ﬁxed-

point arithmetic is more suitable for ﬁnancial processing and ap-

plications where rounding errors should not come into play (or

at least, they should be predictable). The PAWN toolkit has both
a floating-point and a fixed-point module, and the details (and
trade-offs) for these modules are covered in their respective
documentation.

The issue is, however, that a host application may implement
either floating-point or fixed-point, or both or neither.∗ The
program below requires that at least one kind of rational number
support is available; it will fail to run if the host application does
not support rational numbers at all.

LISTING: c2f.p

#include <rational>

main()
{
    new Rational: Celsius
    new Rational: Fahrenheit

    print "Celsius\t Fahrenheit\n"
    for (Celsius = 5; Celsius <= 25; Celsius++)
    {
        Fahrenheit = (Celsius * 1.8) + 32
        printf "%r \t %r\n", Celsius, Fahrenheit
    }
}

The example program converts a table of degrees Celsius to
degrees Fahrenheit. The first directive of this program is to import
definitions for rational number support from an include file. The
file “rational” includes either support for floating-point
numbers or for fixed-point numbers, depending on what is available.

The variables Celsius and Fahrenheit are declared with a tag
“Rational:” between the keyword new and the variable name.
A tag name denotes the purpose of the variable, its permitted
use and, as a special case for rational numbers, its memory
layout. The Rational: tag tells the PAWN parser that the variables
Celsius and Fahrenheit contain fractional values, rather than
whole numbers.

Tag names: 65

∗Actually, this is already true of all native functions, including all native func-

tions that the examples in this manual use.

The equation for obtaining degrees Fahrenheit from degrees
Celsius is

$$^{\circ}\mathrm{F} = \frac{9}{5}\,^{\circ}\mathrm{C} + 32$$

The program uses the value 1.8 for the quotient 9/5. When
rational number support is enabled, PAWN supports values with a
fractional part behind the decimal point.

The only other non-trivial change from earlier programs is that

the format string for the printf function now has variable place-

holders denoted with “%r” instead of “%d”. The placeholder %r

prints a rational number at the position; %d is only for integers

(“whole numbers”).

I used the include ﬁle “rational” rather than “float” or “fixed”

in an attempt to make the example program portable. If you

know that the host application supports ﬂoating point arithmetic,

it may be more convenient to “#include” the deﬁnitions from the

ﬁle float and use the tag Float: instead of Rational —when do-

ing so, you should also replace %r by %f in the call to printf. For

details on ﬁxed point and ﬂoating point support, please see the

application notes “Fixed Point Support Library” and “Floating

Point Support Library” that are available separately.
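Applying the three substitutions just described (the include file
float, the tag Float: and the placeholder %f) to the conversion
program gives roughly the following. This is a sketch that assumes
the host application provides floating point support; it has not
been checked against a particular host.

```pawn
#include <float>

main()
{
    new Float: Celsius
    new Float: Fahrenheit

    print "Celsius\t Fahrenheit\n"
    for (Celsius = 5; Celsius <= 25; Celsius++)
    {
        Fahrenheit = (Celsius * 1.8) + 32
        printf "%f \t %f\n", Celsius, Fahrenheit
    }
}
```

Apart from those three changes, the program is identical to the
portable version; the price is that it no longer runs on a host
that only offers fixed point arithmetic.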

Strings

PAWN has no intrinsic “string” type; character strings are stored
in arrays, with the convention that the array element after the
last valid character is zero. Working with strings is therefore
equivalent to working with arrays.

Among the simplest of encryption schemes is one called “ROT13”

—actually the algorithm is quite “weak” from a cryptographical

point of view. It is most widely used in public electronic forums

(BBSes, Usenet) to hide texts from casual reading, such as the so-

lution to puzzles or riddles. ROT13 simply “rotates” the alphabet

by half its length, i.e. 13 characters. It is a symmetric operation:

applying it twice on the same text reveals the original.

LISTING: rot13.p

/* Simple encryption, using ROT13 */

main()
{
    printf "Please type the string to mangle: "

    new str[100]
    getstring str, sizeof str, .pack = false
    rot13 str

    printf "After mangling, the string is: \"%s\"\n", str
}

rot13(string[])
{
    for (new index = 0; string[index]; index++)
        if ('a' <= string[index] <= 'z')
            string[index] = (string[index] - 'a' + 13) % 26 + 'a'
        else if ('A' <= string[index] <= 'Z')
            string[index] = (string[index] - 'A' + 13) % 26 + 'A'
}

In the function header of rot13, the parameter “string” is de-

clared as an array, but without specifying the size of the array —

there is no value between the square brackets. When you specify

a size for an array in a function header, it must match the size of

the actual parameter in the function call. Omitting the array size

speciﬁcation in the function header removes this restriction and

allows the function to be called with arrays of any size. You must

then have some other means of determining the (maximum) size

of the array. In the case of a string parameter, one can simply

search for the zero terminator.

The for loop that walks over the string is typical for string
processing functions. The loop condition is “string[index]”; the
rule for true/false conditions in PAWN is that any value is “true”,
except zero. That is, when the array cell at string[index] is
zero, the condition is “false” and the loop ends.

The ROT13 algorithm rotates only letters; digits, punctuation

and special characters are left unaltered. Additionally, upper

and lower case letters must be handled separately. Inside the

for loop, two if statements ﬁlter out the characters of interest.

The way that the second if is chained to the “else” clause of

the ﬁrst if is noteworthy, as it is a typical method of testing for

multiple non-overlapping conditions.

Earlier in this chapter, the concept of “call by value” versus “call
by reference” was discussed. When you are working with strings,
or arrays in general, note that PAWN always passes arrays by
reference. It does this to conserve memory and to increase
performance: arrays can be large data structures and passing them
by value requires a copy of this data structure to be made, taking
both memory and time. Due to this rule, function rot13 can
modify its function parameter (called “string” in the example)
without needing to declare it as a reference argument.

A function that takes an array as an argument and that does not
change it, may mark the argument as “const”; see page 70

Another point of interest are the conditions in the two if
statements. The first if, for example, holds the condition “'a' <=
string[index] <= 'z'”, which means that the expression is true
if (and only if) both 'a' <= string[index] and string[index] <=
'z' are true. In the combined expression, the relational operators
are said to be “chained”, as they chain multiple comparisons
in one condition.

Relational operators: 105
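A small sketch of a chained comparison next to its spelled-out
equivalent; the variable name and the range are arbitrary choices
for this illustration.

```pawn
main()
{
    new value = 7

    /* chained form: true if value lies in the range 0..9 */
    if (0 <= value <= 9)
        printf "%d is a single digit (chained test)\n", value

    /* the same test written as two separate comparisons */
    if (0 <= value && value <= 9)
        printf "%d is a single digit (separate tests)\n", value
}
```

Both conditions are true for the value 7, so both messages appear.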

Finally, note how the last printf in function main uses the escape
sequence \" to print a double quote. Normally a double quote
ends the literal string; the escape sequence “\"” inserts a double
quote into the string.

Escape sequence:

Staying on the subject of strings and arrays, below is a program

that separates a string of text into individual words and counts

them. It is a simple program that shows a few new features of

the PAWN language.

LISTING: wcount.p

/* word count: count words on a string that the user types */
#include <string>

main()
{
    print "Please type a string: "
    new string[100]
    getstring string, sizeof string, false

    new count = 0

    new word[20]
    new index
    for ( ;; )
    {
        word = strtok(string, index)
        if (strlen(word) == 0)
            break
        count++
        printf "Word %d: '%s'\n", count, word
    }

    printf "\nNumber of words: %d\n", count
}

strtok(const string[], &index)
{
    new length = strlen(string)

    /* skip leading white space */
    while (index < length && string[index] <= ' ')
        index++

    /* store the word letter for letter */
    new offset = index              /* save start position of token */
    new result[20]                  /* string to store the word in */
    while (index < length
           && string[index] > ' '
           && index - offset < sizeof result - 1)
    {
        result[index - offset] = string[index]
        index++
    }
    result[index - offset] = EOS    /* zero-terminate the string */

    return result
}

Function main first displays a message and retrieves a string that
the user must type. Then it enters a loop: writing “for (;;)”
creates a loop without initialisation, without increment and without
test; it is an infinite loop, equivalent to “while (true)”.
However, where the PAWN parser will give you a warning if you type
“while (true)” (something along the lines of “redundant test
expression; always true”), “for (;;)” passes the parser without
warning.

“for” loop: 110

A typical use for an infinite loop is a case where you need a loop
with the test in the middle: a hybrid between a while and a
do...while loop, so to speak. PAWN does not support loops with
a test in the middle directly, but you can imitate one by coding an
infinite loop with a conditional break. In this example program,
the loop:

⋄ gets a word from the string (code before the test);
⋄ tests whether a new word is available, and breaks out of the
  loop if not (the test in the middle);
⋄ prints the word and its sequence number (code after the test).
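The same three-part shape can be sketched in a smaller program
that adds up numbers until the user enters zero. The sketch assumes
that the console function getvalue is available in the host
application.

```pawn
main()
{
    new sum = 0

    for ( ;; )
    {
        /* code before the test */
        print "Value (0 to stop): "
        new value = getvalue()

        /* the test in the middle */
        if (value == 0)
            break

        /* code after the test */
        sum += value
    }

    printf "Sum: %d\n", sum
}
```

The prompt and the input must happen before the test can be made,
and the addition must be skipped for the terminating zero; that is
why neither a plain while loop nor a do...while loop fits as neatly.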

As is apparent from the line “word = strtok(string, index)”

(and the declaration of variable word), PAWN supports array as-

signment and functions returning arrays. The PAWN parser ver-

iﬁes that the array that strtok returns has the same size and

dimensions as the variable that it is assigned into.

Function strlen is a native function (predefined), but strtok is
not: we must implement it ourselves. The function strtok
was inspired by the function of the same name from C/C++, but
it does not modify the source string. Instead it copies characters
from the source string, one word at a time, into a local array, which
it then returns.

A common operation is to clear a string. There are various ways

to do so. The recommended way to clear a string is to assign a

zero-length literal string to the variable.

LISTING: clearing a string

my_string = "" // assuming my_string is declared as packed array

Symbolic subscripts (structured data)

In a typeless language, we might assign a different purpose to
some array elements than to other elements in the same array.
PAWN supports symbolic subscripts that allow you to assign specific
tag names or ranges to individual array elements.

The example to illustrate symbolic subscripts is longer than pre-

vious PAWN programs, and it also displays a few other features,

such as global variables and named parameters.

LISTING: queue.p

/* Priority queue (for simple text strings) */
#include <string>

main()
{
    new msg[.text{40}, .priority]

    /* insert a few items (read from console input) */
    printf "Please insert a few messages and their priorities; " ...
           "end with an empty string\n"
    for ( ;; )
    {
        printf "Message: "
        getstring msg.text, .pack = true
        if (strlen(msg.text) == 0)
            break
        printf "Priority: "
        msg.priority = getvalue()
        if (!insert(msg))
        {
            printf "Queue is full, cannot insert more items\n"
            break
        }
    }

    /* now print the messages extracted from the queue */
    printf "\nContents of the queue:\n"
    while (extract(msg))
        printf "[%d] %s\n", msg.priority, msg.text
}

const queuesize = 10
new queue[queuesize][.text{40}, .priority]
new queueitems = 0

insert(const item[.text{40}, .priority])
{
    /* check if the queue can hold one more message */
    if (queueitems == queuesize)
        return false            /* queue is full */

    /* find the position to insert it to */
    new pos = queueitems        /* start at the bottom */
    while (pos > 0 && item.priority > queue[pos-1].priority)
        --pos                   /* higher priority: move up a slot */

    /* make place for the item at the insertion spot */
    for (new i = queueitems; i > pos; --i)
        queue[i] = queue[i-1]

    /* add the message to the correct slot */
    queue[pos] = item
    queueitems++

    return true
}

extract(item[.text{40}, .priority])
{
    /* check whether the queue has one more message */
    if (queueitems == 0)
        return false            /* queue is empty */

    /* copy the topmost item */
    item = queue[0]
    --queueitems

    /* move the queue one position up */
    for (new i = 0; i < queueitems; ++i)
        queue[i] = queue[i+1]

    return true
}

Function main starts with a declaration of array variable msg. The

array has two ﬁelds, “.text” and “.priority”; the “.text” ﬁeld

is declared as a sub-array holding 40 characters. The period is

required for symbolic subscripts and there may be no space be-

tween the period and the name.

When an array is declared with symbolic subscripts, it may only
be indexed with these subscripts. It would be an error to say,
for example, “msg[0]”. On the other hand, since there can only
be a single symbolic subscript between the brackets, the brackets
become optional. That is, you can write “msg.priority” as a
shorthand for “msg[.priority]”.
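A minimal sketch of an array with symbolic subscripts, separate
from the queue example: the array name rect and its four field
names are invented for this illustration.

```pawn
main()
{
    /* one array, four named cells */
    new rect[.left, .top, .right, .bottom]

    rect.left = 10
    rect.top = 20
    rect.right = 110
    rect.bottom = 80

    printf "width = %d, height = %d\n",
           rect.right - rect.left, rect.bottom - rect.top
}
```

Each field is an ordinary cell, so the subtraction yields a width
of 100 and a height of 60.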

Further in main are two loops. The for loop reads strings and
priority values from the console and inserts them in a queue.
The while loop below that extracts element by element from the
queue and prints the information on the screen. The point to
note is that the for loop stores both the string and the priority
number (an integer) in the same variable msg; indeed, function
main declares only a single variable. Function getstring stores
the message text that you type starting at array msg.text while
the priority value is stored (by an assignment a few lines lower)
in msg.priority. The printf function in the while loop reads the
string and the value from those positions as well.

At the same time, the msg array is an entity in itself: it is passed
in its entirety to function insert. That function, in turn, says near
the end “queue[pos] = item”, where item is an array with
the same declaration as the msg variable in main, and queue is a
two-dimensional array that holds queuesize elements, with the
minor dimension having symbolic subscripts. The declarations of
queue and queuesize are just above function insert.

At several spots in the example program, the same symbolic
subscripts are repeated. In practice, a program would declare the
list of symbolic constants in a #define directive and declare the
arrays using this text-substitution macro. This saves typing and
makes modifications of the declaration easier to maintain.
Concretely, when adding near the top of the program the following
line:

#define MESSAGE[.text{40}, .priority]

you can declare an array as “msg[MESSAGE]” and subsequently

access the symbolic subscripts.

The example implements a “priority queue”. You can insert a

number of messages into the queue and when these messages

all have the same priority, they are extracted from the queue

in the same order. However, when the messages have diﬀerent

priorities, the one with the highest priority comes out ﬁrst. The

“intelligence” for this operation is inside function insert: it ﬁrst

determines the position of the new message to add, then moves

a few messages one position upward to make space for the new

message. Function extract simply always retrieves the ﬁrst el-

ement of the queue and shifts all remaining elements down by

one position.

Note that both functions insert and extract work on two shared

variables, queue and queueitems. A variable that is declared in-

side a function, like variable msg in function main can only be

accessed from within that function. A “global variable” is ac-

cessible by all functions, and that variable is declared outside

the scope of any function. Variables must still be declared be-

fore they are used, so main cannot access variables queue and

queueitems, but both insert and extract can.

Function extract returns the message with the highest priority
via its function argument item. That is, it changes its function
argument by copying the first element of the queue array
into item. Function insert copies in the other direction and it
does not change its function argument item. In such a case, it
is advised to mark the function argument as “const”. This helps
the PAWN parser both to check for errors and to generate better
(more compact, quicker) code.

A final remark on this latest sample is the call to getstring in
function main: if you look up the function declaration, you will
see that it takes three parameters, two of which are optional. In
this example, only the first and the last parameters are passed
in. Note how the example avoids ambiguity about which parameter
follows the first, by putting the argument name in front of
the value. By using “named parameters” rather than positional
parameters, the order in which the parameters are listed is not
important. Named parameters are convenient in specifying (and
deciphering) long parameter lists.

Named parameters: 71
getstring: 126
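To show named parameters on a function of our own, here is a
sketch with a hypothetical function volume that has two optional
arguments. The function and its default values are assumptions
made for this illustration only.

```pawn
main()
{
    /* positional: all three arguments, in declaration order */
    printf "%d\n", volume(2, 3, 4)

    /* named: "height" is skipped and keeps its default value */
    printf "%d\n", volume(.width = 2, .depth = 4)

    /* named arguments may be listed in any order */
    printf "%d\n", volume(.depth = 4, .height = 3, .width = 2)
}

volume(width, height = 1, depth = 1)
{
    return width * height * depth
}
```

The three calls print 24, 8 and 24 respectively. The second call is
the interesting one: without the names there would be no way to
pass depth while leaving height at its default.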

Bit operations to manipulate “sets”

A few algorithms are most easily solved with “set operations”,
like intersection, union and inversion. In the figure below, for
example, we want to design an algorithm that returns the points
that can be reached from some other point in a specified maximum
number of steps. For example, if we ask it to return the
points that can be reached in two steps starting from B, the
algorithm has to return C, D, E and F, but not G, because G takes
three steps from B.

Our approach is to keep, for each point in the graph, the set of

other points that it can reach in one step —this is the “next_step”

set. We also have a “result” set that keeps all points that we

have found so far. We start by setting the result set equal to

the next_step set for the departure point. Now we have in the

result set all points that one can reach in one step. Then, for

every point in our result set, we create a union of the result set

and the next_step set for that point. This process is iterated for

a speciﬁed number of loops.

An example may clarify the procedure outlined above. When the
departure point is B, we start by setting the result set to D and
E; these are the points that one can reach from B in one step.
Then, we walk through the result set. The first point that we
encounter in the set is D, and we check what points can be reached
from D in one step: these are C and F. So we add C and F to the
result set. We knew that the points that can be reached from D
in one step are C and F, because C and F are in the next_step set
for D. So what we do is to merge the next_step set for point D
into the result set. The merge is called a “union” in set theory.
That handles D. The original result set also contained point E,
but the next_step set for E is empty, so no further points are added.
The new result set therefore now contains C, D, E and F.

A set is a general purpose container for elements. The only in-

formation that a set holds of an element is whether it is present

in the set or not. The order of elements in a set is insigniﬁcant

and a set cannot contain the same element multiple times. The

PAWN language does not provide a “set” data type or operators

that work on sets. However, sets with up to 32 elements can

be simulated by bit operations. It takes just one bit to store a

“present/absent” status and a 32-bit cell can therefore maintain

the status for 32 set elements —provided that each element is

assigned a unique bit position.

The relation between set operations and bitwise operations is

summarized in the following table. In the table, an upper case

letter stands for a set and a lower case letter for an element from

that set.

concept        mathematical notation    PAWN expression
intersection   A ∩ B                    A & B
union          A ∪ B                    A | B
complement     Ā                        ~A
empty set      ε                        0
membership     x ∈ A                    (1 << x) & A

To test for membership (that is, to query whether a set holds a
particular element), create a set with just one element and take
the intersection. If the result is 0 (the empty set) the element is
not in the set. Bit numbering typically starts at zero; the lowest
bit is bit 0 and the highest bit in a 32-bit cell is bit 31. To make
a cell with only bit 7 set, shift the value 1 left by seven, or in a
PAWN expression: “1 << 7”.
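The operations from the table can be tried out on a single cell. In
the sketch below the element numbers 3, 5 and 7 are arbitrary;
removing an element (intersecting with the complement of a
one-element set) is not in the table, but follows from it.

```pawn
main()
{
    new set = 0                 /* the empty set */

    set |= 1 << 3               /* union with the set {3} */
    set |= 1 << 7               /* union with the set {7} */

    if (set & (1 << 7))         /* membership: non-empty intersection */
        print "7 is in the set\n"
    if (!(set & (1 << 5)))
        print "5 is not in the set\n"

    set &= ~(1 << 3)            /* remove 3: intersect with complement */
    printf "set = %d\n", set
}
```

After element 3 is removed, only bit 7 remains set, so the final
line prints the value 128.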

Below is the program that implements the algorithm described

earlier to ﬁnd all points that can be reached from a speciﬁc de-

parture in a given number of steps. The algorithm is completely

in the findtargets function.

LISTING: set.p

/* Set operations, using bit arithmetic */

const
    { A = 0b0000001,
      B = 0b0000010,
      C = 0b0000100,
      D = 0b0001000,
      E = 0b0010000,
      F = 0b0100000,
      G = 0b1000000
    }

main()
{
    new nextstep[] =
        [ C | E,    /* A can reach C and E */
          D | E,    /* B " " D and E */
          G,        /* C " " G */
          C | F,    /* D " " C and F */
          0,        /* E " " none */
          0,        /* F " " none */
          E | F,    /* G " " E and F */
        ]

    print "The departure point: "
    new start = clamp( .value = toupper(getchar()) - 'A',
                       .min = 0,
                       .max = sizeof nextstep - 1
                     )

    print "\nThe number of steps: "
    new steps = getvalue()

    /* make the set */
    new result = findtargets(start, steps, nextstep)
    printf "The points in range of %c in %d steps: ", start + 'A', steps
    for (new i = 0; i < sizeof nextstep; i++)
        if (result & 1 << i)
            printf "%c ", i + 'A'
}

findtargets(start, steps, nextstep[], numpoints = sizeof nextstep)
{
    new result = 0
    new addedpoints = nextstep[start]
    while (steps-- > 0 && result != addedpoints)
    {
        result = addedpoints
        for (new i = 0; i < numpoints; i++)
            if (result & 1 << i)
                addedpoints |= nextstep[i]
    }
    return result
}

The const statement at the top of the program, just above function
main, declares the constants for the nodes A to G, using binary
radix so that only a single bit is set in each value.

“const” statement: 100

cellbits: 100

When working with sets, a typical task that pops up is to
determine the number of elements in the set. A straightforward
function that does this is below:

LISTING: simple bitcount function

bitcount(set)
{
    new count = 0
    for (new i = 0; i < cellbits; i++)
        if (set & (1 << i))
            count++
    return count
}

With a cell size of 32 bits, this function’s loop iterates 32 times

to check for a single bit at each iteration. With a bit of binary

arithmetic magic, we can reduce it to loop only for the number

of bits that are “set”. That is, the following function iterates only

once if the input value has only one bit set:

LISTING: improved bitcount function

bitcount(set)
{
    new count = 0
    if (set)
    {
        count++
        /* "set & (set - 1)" clears the lowest bit that is set */
        while ((set = set & (set - 1)))
            count++
    }
    return count
}
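The “magic” in the improved function is the expression
set & (set - 1), which clears the lowest bit that is set. The
sketch below only traces that expression on an arbitrarily chosen
value with two bits set; it is an illustration, not part of the
function itself.

```pawn
main()
{
    new set = 0b0101000         /* bits 3 and 5 are set: 8 + 32 = 40 */

    while (set)
    {
        printf "%d\n", set
        set &= set - 1          /* drop the lowest bit that is set */
    }
}
```

The loop prints 40 and then 32: subtracting 1 from 0b0101000 gives
0b0100111, and the bitwise “and” of the two leaves 0b0100000. One
more round reduces the value to zero, so the loop runs exactly as
often as there are bits set.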

A simple RPN calculator

The common mathematical notation, with arithmetic expressions
like “26 − 3 × (5 + 2)”, is known as the algebraic notation. It is a
compact notation and we have grown accustomed to it. PAWN and by
far most other programming languages use the algebraic notation
for their programming expressions. The algebraic notation
does have a few disadvantages, though. For instance, it
occasionally requires that the order of operations is made explicit by
folding a part of the expression in parentheses. The expression at
the top of this paragraph can be rewritten to eliminate the
parentheses, but at the cost of nearly doubling its length. In practice,
the algebraic notation is augmented with precedence level rules
that say, for example, that multiplication goes before addition
and subtraction.∗ Precedence levels greatly reduce the need for
parentheses, but they do not fully avoid them. Worse is that when
the number of operators grows large, the hierarchy of precedence
levels and the particular precedence level for each operator
become hard to memorize, which is why an operator-rich
language such as APL does away with precedence levels altogether.

Algebraic notation is also called “infix” notation

Around 1920, the Polish mathematician Jan Łukasiewicz demonstrated
that by putting the operators in front of their operands,
instead of between them, precedence levels became redundant
and parentheses were never necessary. This notation became
known as the “Polish Notation”.† Later, Charles Hamblin proposed
to put operators behind the operands, calling it the “Reverse
Polish Notation”. The advantage of reversing the order is
that the operators are listed in the same order as they must be
executed: when reading the operators from the left to the right,
you also have the operations to perform in that order. The
algebraic expression from the beginning of this section would read
in RPN as:

26 3 5 2 + × −

When looking at the operators only, we have: first an addition,
then a multiplication and finally a subtraction. The operands of
each operator are read from right to left: the operands for the +
operator are the values 5 and 2, those for the × operator are the
result of the previous addition and the value 3, and so on.

Reverse Polish Notation is also called “postfix” notation

∗ These rules are often summarized in a mnemonic like “Please Excuse My
Dear Aunt Sally” (Parentheses, Exponentiation, Multiplication, Division,
Addition, Subtraction).

† Polish Notation is completely unrelated to “Hungarian Notation”, which is
just the habit of adding “type” or “purpose” identification warts to names
of variables or functions.

It is helpful to imagine the values to be stacked on a pile, where
the operators take one or more operands from the top of the pile
and put a result back on top of the pile. When reading through
the RPN expression, the values 26, 3, 5 and 2 are “stacked” in
that order. The operator + removes the top two elements from
the stack (5 and 2) and pushes the sum of these values back;
the stack now reads “26 3 7”. Then, the × operator removes 3
and 7 and pushes the product of the values onto the stack; the
stack is “26 21”. Finally, the − operator subtracts 21 from 26
and stores the single value 5, the end result of the expression,
back onto the stack.
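The walk-through above can be replayed with a few lines of code.
The sketch below uses a deliberately simple integer stack; the
names stack, top, push and pop are assumptions made for this
illustration, there is no overflow checking, and it is not the
implementation that the calculator in this section uses.

```pawn
new stack[8]
new top = 0

push(value)
{
    stack[top++] = value
}

pop()
{
    return stack[--top]
}

main()
{
    /* the operands of "26 3 5 2 + x -" in the order they are read */
    push(26)
    push(3)
    push(5)
    push(2)

    push(pop() + pop())         /* "+": the stack is now 26 3 7 */
    push(pop() * pop())         /* "x": the stack is now 26 21 */

    new right = pop()           /* "-": the top of the stack is the */
    new left = pop()            /* right-hand operand */
    push(left - right)          /* the stack is now 5 */

    printf "Result = %d\n", pop()
}
```

For addition and multiplication the order of the two pop calls
does not matter; for subtraction it does, which is why the sketch
takes the operands off the stack in two separate steps. The program
prints “Result = 5”.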

Reverse Polish Notation became popular because it was easy to

understand and easy to implement in (early) calculators. It also

opens the way to operators with more than two operands (e.g.

integration) or operators with more than one result (e.g. conver-

sion between polar and Cartesian coordinates).

The main program for a Reverse Polish Notation calculator is

below:

LISTING: rpn.p

/* a simple RPN calculator */
#include strtok
#include stack
#include rpnparse

main()
{
    print "Type expressions in Reverse Polish Notation " ...
          "(or an empty line to quit)\n"
    new string{100}
    while (getstring(string, .pack = true))
        rpncalc string
}

The main program contains very little code itself; instead it in-

cludes the required code from three other ﬁles, each of which

implements a few functions that, together, build the RPN calcu-

lator. When programs or scripts get larger, it is usually advised

to spread the implementation over several ﬁles, in order to make

maintenance easier.

Function main ﬁrst puts up a prompt and calls the native function

getstring to read an expression that the user types. Then it calls

the custom function rpncalc to do the real work. Function rpn-

calc is implemented in the ﬁle rpnparse.inc, reproduced below:

LISTING: rpnparse.inc

/* main rpn parser and lexical analysis, part of the RPN calculator */
#include <rational>
#include <string>

#define Token [
              .type,            /* operator or token type */
              Rational: .value, /* value, if t_type is "Number" */
              .word{20},        /* raw string */
            ]

const Number = '0'
const EndOfExpr = '#'

rpncalc(const string{})
{
    new index
    new field[Token]
    for ( ;; )
    {
        field = gettoken(string, index)
        switch (field.type)
        {
            case Number:
                push field.value
            case '+':
                push pop() + pop()
            case '-':
                push - pop() + pop()
            case '*':
                push pop() * pop()
            case '/', ':':
                push 1.0 / pop() * pop()
            case EndOfExpr:
                break           /* exit "for" loop */
            default:
                printf "Unknown operator '%s'\n", field.word
        }
    }
    printf "Result = %r\n", pop()
    if (clearstack())
        print "Stack not empty\n", red
}

gettoken(const string{}, &index)
{
    /* first get the next "word" from the string */
    new word{20}
    word = strtok(string, index)

    /* then parse it */
    new field[Token]
    field.word = word
    if (strlen(word) == 0)
    {
        field.type = EndOfExpr  /* special "stop" symbol */
        field.value = 0
    }
    else if ('0' <= word{0} <= '9')
    {
        field.type = Number
        field.value = rval(word)
    }
    else
    {
        field.type = word{0}
        field.value = 0
    }
    return field
}

The RPN calculator uses rational numbers and rpnparse.inc
includes the “rational” file for that purpose. Almost all of the
operations on rational numbers are hidden in the arithmetic. The
only direct references to rational numbers are the “%r” format
code in the printf statement near the bottom of function rpncalc
and the call to rval halfway through function gettoken.

Rational numbers, see also the “Celsius to Fahrenheit” example on page

Near the top in the file rpnparse.inc is a preprocessor macro
that declares the symbolic subscripts for an array. The macro
name, “Token”, will be used throughout the program to declare
arrays with those fields. For example, function rpncalc declares
variable field as an array using the macro to declare the field
names.

Preprocessor: 91

Arrays with symbolic subscripts were already introduced in the
section Symbolic subscripts (structured data) on page 19; this script
shows another feature of symbolic subscripts: individual
subscripts may have a tag name of their own. In this example, .type
is a simple cell, .value is a rational value (with a fractional part)
that is tagged as such, and .word can hold a string of 20 characters
(including the terminating zero byte). See, for example,
how the .word subscript of the field variable is used as a string
in the line:

printf "Unknown operator '%s'\n", field.word

If you know C/C++ or Java, you may want to look at the switch statement (“switch” statement: see page 112). The switch statement differs in a number of ways from the other languages that provide it. The cases are not fall-through, for example, which in turn means that the break statement for the case EndOfExpr breaks out of the enclosing loop, instead of out of the switch.
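As a minimal sketch of this difference (the function name leadingdigits is made up for this illustration and is not part of the RPN calculator), the loop below stops at the first operator, because the break in the second case leaves the for loop rather than only the switch:

```pawn
/* count the digits in front of the first '+' or '-' in a packed string */
leadingdigits(const text{})
    {
    new digits = 0
    for (new i = 0; text{i} != EOS; i++)
        {
        switch (text{i})
            {
            case '0' .. '9':
                digits++    /* no "break" needed: cases never fall through */
            case '+', '-':
                break       /* leaves the for loop, not just the switch */
            }
        }
    return digits
    }

main()
    printf "%d\n", leadingdigits("12+3")
```

A C programmer would expect the loop to run to the end of the string and count three digits; here the count stops at the digits in front of the "+".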

At the top of the for loop in function rpncalc, you will find the instruction “field = gettoken(string, index)”. As already shown in the wcount.p (“word count”) program on page 17, functions may return arrays. It gets more interesting for a similar line in function gettoken:

field.word = word

where word is an array for 20 characters and field is an array

with 3 (symbolic) subscripts. However, as the .word subscript

is declared as having a size of 20 characters, the expression

“field.word” is considered a sub-array of 20 characters, pre-

cisely matching the array size of word.
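The sub-array assignment can be tried in isolation. The sketch below does not use the Token macro; it declares the symbolic subscripts directly, and the names item, .id and .name are assumptions chosen for this illustration only:

```pawn
main()
    {
    new word{20} = "pawn"           /* a packed string of 20 characters */
    new item[.id, .name{20}]        /* .name is a sub-array of 20 characters */

    item.id = 1
    item.name = word                /* sizes match, so the array is copied */
    printf "%d: %s\n", item.id, item.name
    }
```

If the size of .name is declared differently from that of word, the sizes no longer match and I would expect the compiler to reject the assignment.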


LISTING: strtok.inc

/* extract words from a string (words are separated by white space) */

#include <string>

strtok(const string{}, &index)

{

new length = strlen(string)

/* skip leading white space */

while (index < length && string{index} <= ' ')

index++

/* store the word letter for letter */

new offset = index /* save start position of token */

const wordlength = 20 /* maximum word length */

new result{wordlength} /* string to store the word in */

while (index < length

&& string{index} > ' '

&& index - offset < wordlength - 1) /* keep room for the terminator */

{

result{index - offset} = string{index}

index++

}

result{index - offset} = EOS /* zero-terminate the string */

return result

}

Function strtok is the same as the one used in the wcount.p example (wcount.p: see page 17). It is implemented in a separate file for the RPN calculator program. Note that the strtok function as it is implemented here can only handle words with up to 19 characters; the 20th character is the zero terminator. A truly general-purpose, re-usable implementation of an strtok function would pass the destination array as a parameter, so that it could handle words of any size. Supporting both packed and unpacked strings would also be a useful feature of a general-purpose function.
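One way to pass the destination array as a parameter is sketched below. This is not part of the RPN calculator: the function name strtok_to, the parameter names dest and size, and the choice to truncate (rather than split) words that do not fit are all assumptions of this sketch. It handles packed strings only.

```pawn
#include <string>

const cellchars = cellbits / charbits

/* extract the next word from "string" into the caller-supplied packed
 * array "dest"; "size" is the size of "dest" in cells; returns the number
 * of characters stored (zero when there are no more words)
 */
strtok_to(const string{}, &index, dest{}, size = sizeof dest)
    {
    new length = strlen(string)
    new maxchars = size * cellchars - 1     /* keep room for the terminator */

    /* skip leading white space */
    while (index < length && string{index} <= ' ')
        index++

    /* copy the word; characters that do not fit are skipped */
    new count = 0
    while (index < length && string{index} > ' ')
        {
        if (count < maxchars)
            dest{count++} = string{index}
        index++
        }
    dest{count} = EOS                       /* zero-terminate the string */
    return count
    }

main()
    {
    new index = 0
    new word{32}
    while (strtok_to("the quick brown fox", index, word) > 0)
        printf "%s\n", word
    }
```

Because the caller owns the array, the maximum word length is no longer fixed inside the function; the default value of size picks up the size of whatever array is passed.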

When discussing the merits of Reverse Polish Notation, I mentioned that a stack is both an aid in “visualizing” the algorithm and a convenient method to implement an RPN parser. This example RPN calculator uses a stack with the ubiquitous functions push and pop. For error checking and resetting the stack, there is a third function that clears the stack.

LISTING: stack.inc

/* stack functions, part of the RPN calculator */

#include <rational>

static Rational: stack[50]

static stackidx = 0

push(Rational: value)

{

assert stackidx < sizeof stack

stack[stackidx++] = value

}


Rational: pop()

{

assert stackidx > 0

return stack[--stackidx]

}

clearstack()

{

assert stackidx >= 0

if (stackidx == 0)

return false

stackidx = 0

return true

}

The ﬁle stack.inc includes the ﬁle rational again. This is tech-

nically not necessary (rpnparse.inc already included the deﬁni-

tions for rational number support), but it does not do any harm

either and, for the sake of code re-use, it is better to make any

ﬁle include the deﬁnitions of the libraries that it depends on.

Notice how the two global variables stack and stackidx are de-

clared as “static” variables; using the keyword static instead of

new. Doing this makes the global variables “visible” in that ﬁle

only. For all other ﬁles in a larger project, the symbols stack and

stackidx are invisible and they cannot (accidentally) modify the

variables. It also allows the other modules to declare their own

private variables with these names, so it avoids name clashing.

The RPN calculator is actually still a fairly small program, but

it has been set up as if it were a larger program. It was also

designed to demonstrate a set of elements of the PAWN language

and the example program could have been implemented more

compactly.

Event-driven programming

All of the example programs that were developed in this chapter so far have used a “flow-driven” programming model: they start with main and the code determines what to do and when to request input. This programming model is easy to understand and it nicely fits most programming languages, but it is also a model that does not fit many “real life” situations. Quite often, a program cannot simply process data and suggest that the user provides input only when it is ready for him/her. Instead, it is the user who decides when to provide input, and the program or script should be prepared to process it in an acceptable time, regardless of what it was doing at the moment.

The above description suggests that a program should therefore

be able to interrupt its work and do other things before picking

up the original task. In early implementations, this was indeed how such functionality was implemented: a multi-tasking system where one task (or thread) managed the background tasks and a second task/thread sat in a loop, continuously requesting user input. This is a heavy-weight solution, however. A more light-weight implementation of a responsive system is what is called the “event-driven” programming model.

In the event-driven programming model, a program or script de-

composes any lengthy (background) task into short manageable

blocks and in between, it is available for input. Instead of hav-

ing the program poll for input, however, the host application (or

some other sub-system) calls a function that is attached to the

event —but only if the event occurs.

A typical event is “input”. Observe that input does not only come

from human operators. Input packets can arrive over serial ca-

bles, network stacks, internal sub-systems such as timers and

clocks, and all kinds of other equipment that you may have attached to your system. Many of the devices that produce input just send it. The arrival of such input is an event, just like a key press. If you do not catch the event, a few of them may be stored in an internal system queue, but once the queue is saturated the events are simply dropped.

PAWN directly supports the event-driven model, because it sup-

ports multiple entry points. The sole entry point of a ﬂow-driven

program is main; an event-driven program has an entry point

for every event that it captures. When compared to the ﬂow-

driven model, event-driven programs often appear “bottom-up”:

instead of your program calling into the host application and de-

ciding what to do next, your program is being called from the

outside and it is required to respond appropriately and promptly.

PAWN does not specify a standard library, and so there is no guarantee that in a particular implementation, functions like printf and getvalue are available. Although it is suggested that every implementation provides a minimal console/terminal interface with these functions, their availability is ultimately implementation-dependent. The same holds for the public functions, the entry points for a script (public functions: see page 80). It is implementation-dependent which public functions a host application supports. The script in this section may therefore not run on your platform (even if all previous scripts ran fine). The tools in the standard distribution of the PAWN system support all scripts developed in this manual, provided that your operating system or environment supports standard terminal functions such as setting the cursor position.

An early programming language that was developed solely for

teaching the concepts of programming to children was “Logo”.

This dialect of LISP made programming visual by having a small

robot, the “turtle”, drive over the ﬂoor under control of a simple

program. This concept was then copied to moving a (usually triangular) cursor on the computer display, again under control of a

program. A novelty was that the turtle now left a trail behind it,

allowing you to create drawings by properly programming the

turtle —it became known as turtle graphics. The term “turtle

graphics” was also used for drawing interactively with the arrow

keys on the keyboard and a “turtle” for the current position. This

method of drawing pictures on the computer was brieﬂy popular

before the advent of the mouse.

LISTING: turtle.p

@keypressed(key)

{

/* get current position */

new x, y

wherexy x, y

/* determine how to update the current position */

switch (key)

{

case 'u': y-- /* up */

case 'd': y++ /* down */

case 'l': x-- /* left */

case 'r': x++ /* right */

case '\e': exit /* Escape = exit */

}

/* adjust the cursor position and draw something */

moveturtle x, y

}

moveturtle(x, y)

{

gotoxy x, y

print "*"

gotoxy x, y

}

The entry point of the above program is @keypressed —it is called

on a key press. If you run the program and do not type any key,

the function @keypressed never runs; if you type ten keys, @key-

pressed runs ten times. Contrast this behaviour with main: func-


tion main runs immediately after you start the script and it runs

only once.

It is still allowed to add a main function to an event-driven pro-

gram: the main function will then serve for one-time initializa-

tion. A simple addition to this example program is to add a main

function, in order to clear the console/terminal window on entry

and perhaps set the initial position of the “turtle” to the centre.
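Such an addition could look like the sketch below. It is meant to be added to turtle.p (it calls the function moveturtle from that listing), and the coordinates assume a console window of 80 columns by 25 lines, which your terminal may not have:

```pawn
main()
    {
    clrscr                  /* clear the console/terminal window */
    moveturtle 40, 12       /* assumed centre of an 80 x 25 window */
    }
```

Function main runs once, before the first key press arrives; after that, all activity comes from @keypressed.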

Support for function keys and other special keys (e.g. the ar-

row keys) is highly system-dependent. On ANSI terminals, these

keys produce diﬀerent codes than in a Windows “DOS box”. In

the spirit of keeping the example program portable, I have used

common letters (“u” for up, “l” for left, etc.). This does not mean,

however, that special keys are beyond PAWN’s capabilities.

In the “turtle” script, the “Escape” key terminates the host appli-

cation through the instruction exit. For a simple PAWN run-time

host, this will indeed work. With host applications where the

script is an add-on, or host-applications that are embedded in a

device, the script usually cannot terminate the host application.

•Multiple events

The advantages of the event-driven programming model, for cre-

ating reactive programs, become apparent in the presence of

multiple events. In fact, the event-driven model is only useful

if you have more than one entry point; if your script just handles

a single event, it might as well enter a polling loop for that sin-

gle event. The more events need to be handled, the harder the

ﬂow-driven programming model becomes. The script below im-

plements a bare-bones “chat” program, using only two events:

one for sending and one for receiving. The script allows users

on a network (or perhaps over another connection) to exchange

single-line messages.

The script depends on the host application to provide the na-

tive and public functions for sending and receiving “datagrams”

and for responding to keys that are typed in. Whether the messages travel over a serial line or over TCP/IP is for the host application to decide. The tools in the standard

PAWN distribution push the messages over the TCP/IP network,

and allow for a “broadcast” mode so that more than two people

can chat with each other.


LISTING: chat.p

#include <datagram>

const cellchars = cellbits / charbits

@receivestring(const message[], const source[])

printf "[%s] says: %s\n", source, message

@keypressed(key)

{

static string{100}

static index

if (key == '\e')

exit /* quit on 'Esc' key */

echo key

if (key == '\r' || key == '\n' || index == sizeof string * cellchars - 1) /* buffer full: keep room for the terminator */

{

string{index} = '\0' /* terminate string */

sendstring string

index = 0

string{index} = '\0'

}

else

string{index++} = key

}

echo(key)

{

new string{2} = { 0 }

string{0} = key == '\r' ? '\n' : key

printf string

}

The bulk of the above script handles gathering received key-

presses into a string and sending that string after seeing the

ENTER key. The “Escape” key ends the program. The function

echo serves to give visual feedback of what the user types: it

builds a zero-terminated string from the key and prints it.

Despite its simplicity, this script has the interesting property that

there is no ﬁxed or prescribed order in which the messages are

to be sent or received —there is no query–reply scheme where

each host takes its turn in talking & listening. A new message may even be received while the user is typing his/her own message.∗

∗ As this script makes no attempt to separate received messages from typed messages (for example, in two different scrollable regions), the terminal/console will look confusing when this happens. With an improved user interface, this simple script could indeed be a nice message-based chat program.


State programming

In a program following the event-driven model, events arrive in-

dividually, and they are also responded to individually. On oc-

casion, though, an event is part of a sequential ﬂow, that must

be handled in order. Examples are data transfer protocols over,

for example, a serial line. Each event may carry a command, a

snippet of data that is part of a larger ﬁle, an acknowledgement,

or other signals that take part in the protocol. For the stream of

events (and the data packets that they carry) to make sense, the

event-driven program must follow a precise hand-shaking proto-

col.

To adhere to a protocol, an event-driven program must respond

to each event in compliance with the (recent) history of events re-

ceived earlier and the responses to those events. In other words,

the handling of one event may set up a “condition” or “environment” for the handling of one or more subsequent events.

A simple, but quite eﬀective, abstraction for constructing reac-

tive systems that need to follow (partially) sequential protocols,

is that of the “automaton” or state machine. As the number of states is usually finite, the theory often refers to such automatons as Finite State Automatons or Finite State Machines. In an automaton, the context (or condition) of an event is its state. An event that arrives may be handled differently depending on the state of the automaton, and in response to an event, the automaton may switch to another state; this is called a transition. A transition, in other words, is a response of the automaton to an event in the context of its state.

Automatons are very common in software as well as in mechan-

ical devices (you may see the Jacquard Loom as an early state

machine). Automatons, with a ﬁnite number of states, are deter-

ministic (i.e. predictable in behaviour) and their relatively simple

design allows a straightforward implementation from a “state di-

agram”.

In a state diagram, the states are usually represented as circles

or rounded rectangles and the arrows represent the transitions.

As transitions are the response of the automaton to events, an

arrow may also be seen as an event “that does something”. An

event/transition that is not defined in a particular state is assumed to have no effect; it is silently ignored. A filled dot represents the entry state, which your program (or the host application) must set at start-up. It is common to omit in a state diagram all event arrows that drop back into the same state, but for the preceding figure I have chosen to make the response to all events explicit.

The above state diagram is for “parsing” comments that start

with “/*” and end with “*/”. There are states for plain text and

for text inside a comment, plus two states for tentative entry into

or exit from a comment. The automaton is intended to parse

the comments interactively, from characters that the user types

on the keyboard. Therefore, the only events that the automaton reacts to are key presses. Actually, there is only one event (“key-press”) and the state switches are determined by the event’s parameter: the key.

PAWN supports automatons and states directly in the language.

Every function∗ may optionally have one or more states assigned to it. PAWN also supports multiple automatons, and each state is part of a particular automaton. The following script implements the preceding state diagram (in a single, anonymous, automaton). To differentiate plain text from comments, both are output in a different colour.

∗ With the exception of “native functions” and user-defined operators.

LISTING: comment.p

/* parse C comments interactively, using events and a state machine */

main()

state plain

@keypressed(key) <plain>

{

state (key == '/') slash

if (key != '/')

echo key

}

@keypressed(key) <slash>

{

state (key != '/') plain

state (key == '*') comment

echo '/' /* print '/' held back from previous state */

if (key != '/')

echo key

}

@keypressed(key) <comment>

{

echo key

state (key == '*') star

}

@keypressed(key) <star>

{

echo key

state (key != '*') comment

state (key == '/') plain

}

echo(key) <plain, slash>

printchar key, yellow

echo(key) <comment, star>

printchar key, green

printchar(ch, colour)

{

setattr .foreground = colour

printf "%c", ch

}

Function main sets the starting state to plain and exits; all logic

is event-driven. When a key arrives in state plain, the program

checks for a slash and conditionally prints the received key. The

interaction between the states plain and slash demonstrates a

complexity that is typical for automatons: you must decide how

to respond to an event when it arrives, without being able to

“peek ahead” or undo responses to earlier events. This is usually

the case for event-driven systems —you neither know what event

you will receive next, nor when you will receive it, and whatever

your response to the current event, there is a good chance that

you cannot erase it on a future event and pretend that it never

happened.

In our particular case, when a slash arrives, this might be the

start of a comment sequence (“/*”), but it is not necessarily so.

By inference, we cannot decide on reception of the slash charac-

ter what colour to print it in. Hence, we hold it back. However,

there is no global variable in the script that says that a char-

acter is held back —in fact, apart from function parameters, no

variable is declared at all in this script. The information about a


character being held back is “hidden” in the state of the automa-

ton.

As is apparent in the script, state changes may be conditional.

The condition is optional, and you can also use the common if–

else construct to change states.
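As a sketch of the second form, the handler for state slash in comment.p could be written with if–else instead of conditional state changes. The fragment below is meant to replace that handler in the listing; it should behave the same, under the assumption that passing briefly through state plain on the way to state comment has no side effects (there are no entry functions in comment.p):

```pawn
@keypressed(key) <slash>
    {
    if (key == '*')
        state comment       /* "/" followed by "*" opens a comment */
    else if (key != '/')
        state plain         /* a lone slash: back to plain text */
    /* on a second '/', the automaton stays in state slash */

    echo '/'                /* print '/' held back from previous state */
    if (key != '/')
        echo key
    }
```

The conditional form is more compact; the if–else form makes it easier to see that at most one transition is taken.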

Being state-dependent is not reserved for the event functions.

Other functions may have state declarations as well, as the echo

function demonstrates. When a function would have the same

implementation for several states, you just need to write a single

implementation and mention all applicable states. For function

echo there are two implementations to handle the four states.∗

That said, an automaton must be prepared to handle all events

in any state. Typically, the automaton has neither control over

which events arrive nor over when they arrive, so not handling

an event in some state could lead to wrong decisions. It frequently happens, then, that some events are meaningful only in a few specific states and that they should trigger an error or “reset” procedure in all other cases. The function for handling the event in such an “error” condition might then hold a lot of state names, if you were to mention them explicitly. There is a shorter way: by not mentioning any name between the angle brackets, the function matches all states that have no explicit implementation elsewhere. So, for example, you could use the signature

“echo(key) <>” for either of the two implementations (but not for

both).
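Applied to comment.p, the fall-back form could look like the fragment below, which is a sketch of one of the two possibilities: the implementation for plain text keeps its explicit states and the other one becomes the fall-back. It replaces the two echo functions in the listing and relies on printchar from that listing.

```pawn
echo(key) <plain, slash>
    printchar key, yellow

echo(key) <>                /* fall-back: covers states comment and star */
    printchar key, green
```

Any state added to the automaton later would silently use the fall-back as well, which is convenient but also easy to overlook.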

A single anonymous automaton is pre-deﬁned. If a program con-

tains more than one automaton, the others must be explicitly

mentioned, both in the state classiﬁer of the function and in the

state instruction. To do so, add the name of the automaton in

front of the state name and separate the names of the automa-

ton and the state with a colon. That is, “parser:slash” stands

for the state slash of the automaton parser. A function can only

be part of a single automaton; you can share one implementation

of a function for several states of the same automaton, but you

cannot share that function for states of diﬀerent automatons.
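As an illustration of the notation, the transitions of the comment parser are repeated below for an automaton that is explicitly named parser. This is a sketch only: the output (the echo calls) is left out, so that only the state classifiers and the state instructions remain.

```pawn
main()
    state parser:plain

@keypressed(key) <parser:plain>
    state (key == '/') parser:slash

@keypressed(key) <parser:slash>
    {
    state (key != '/') parser:plain
    state (key == '*') parser:comment
    }

@keypressed(key) <parser:comment>
    state (key == '*') parser:star

@keypressed(key) <parser:star>
    {
    state (key != '*') parser:comment
    state (key == '/') parser:plain
    }
```

The automaton name appears in both places: in the classifier behind the function header and in every state instruction.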

∗ A function that has the same implementation for all states does not need a state classifier at all; see printchar.

•Entry functions and automata theory

State machines, and the foundation of “automata theory”, orig-

inate from mechanical design and pneumatic/electric switching

circuits (using relays rather than transistors). Typical examples

are coin acceptors, traﬀic light control and communication lines

switching circuits. In these applications, robustness and pre-

dictability are paramount, and it was found that in this context

it was best to link actions (output) to the states rather than to

the events (input). In this design, entering a state causes activ-

ity —events cause state changes, but do not directly carry out

operations.

In a pedestrian crossing lights system, the lights for the vehicles

and the pedestrians must be synchronized. Technically, there

are six possible combinations, but obviously the combination of a

green light for the traffic and a “walk” sign for the pedestrians is a recipe for disaster. We can also immediately dismiss the combination of yellow/walk as too dangerous. Thus, four combinations

remain to be handled. The ﬁgure below is a state diagram for the

pedestrian crossing lights. The entire process is activated with

a button, and operates on a timer.


When the state red/walk times out, the state cannot immediately

go back to green/wait, because the pedestrians that are busy

crossing the road at that moment need some time to clear the

road; the state red/wait allows for this. For the purpose of demonstration, this pedestrian crossing has the added functionality that when a pedestrian pushes the button while the light for the traffic is already red, the time that the pedestrian has for crossing is lengthened. If the state is red/wait and the button is pressed, it switches back to red/walk. The enclosing box around the states red/walk and red/wait for handling the button event is just a notational convenience: I could also have drawn two arrows, one from each state, back to red/walk. The script source code (which follows below) reflects this same notational convenience.

In the implementation in the PAWN language, the event func-

tions now always have a single statement, which is either a state

change or an empty statement. Events that do not cause a state

change are absent in the diagram, but they must be handled in

the script; hence, the “fall-back” event functions that do nothing. The output, in this example program only messages printed on the console, is all done in the special entry functions. The function entry may be seen as a main for a state: it is implicitly called when the state that it is attached to is entered. Note that the entry function is also called when “switching” to the state that the automaton is already in: when the state is red_walk, an invocation of @keypressed sets the state to red_walk (which it is already in) and causes the entry function of red_walk to run; this is a re-entry of the state.

LISTING: traffic.p

/* traffic light synchronizer, using states in an event-driven model */

#include <time>

main() state green_wait

@keypressed(key) <green_wait> state yellow_wait

@keypressed(key) <red_walk, red_wait> state red_walk

@keypressed(key) <> {} /* fallback */

@timer() <yellow_wait> state red_walk

@timer() <red_walk> state red_wait

@timer() <red_wait> state green_wait

@timer() <> {} /* fallback */

entry() <green_wait>

print "Green / Don't walk\n"

entry() <yellow_wait>

{

print "Yellow / Don't walk\n"

settimer 2000

}


entry() <red_walk>

{

print "Red / Walk\n"

settimer 5000

}

entry() <red_wait>

{

print "Red / Don't walk\n"

settimer 2000

}

This example program has an additional dependency on the host

application/environment: in addition to the “@keypressed” event

function, the host must also provide a “@timer” event with an

adjustable delay. Because of the timing functions, the script in-

cludes the system ﬁle time.inc near the top of the script.

The event functions with the state changes are all on the top

part of the script. The functions are laid out to take a single

line each, to suggest a table-like structure. All state changes are

unconditional in this example, but conditional state changes may

be used with entry functions too. The bottom part holds the entry functions.

Two transitions to the state red_walk exist (or three if you consider the assignment of multiple states to a single event function as a mere notational convenience): from yellow_wait and from

the combination of red_walk and red_wait. These transitions all

pass through the same entry function, thereby reducing and sim-

plifying the code.

In automata theory, an automaton that associates activity with

state entries, such as this pedestrian traﬀic lights example, is a

“Moore automaton”; an automaton that associates activity with

(state-dependent) events or transitions is a “Mealy automaton”.

The interactive comment parser on page 37 is a typical Mealy au-

tomaton. The two kinds are equivalent: a Mealy automaton can

be converted to a Moore automaton and vice versa, although a

Moore automaton may need more states to implement the same

behaviour. In practice, the models are often mixed, with an over-

all “Moore automaton” design, and a few “Mealy states” where

that saves a state.

•State variables

The model of a pedestrian crossing light in the previous example

is not very realistic (its only goal is to demonstrate a few prop-


erties of state programming with PAWN). The ﬁrst thing that is

lacking is a degree of fairness: pedestrians should not be able to

block car traﬀic indeﬁnitely. The car traﬀic should see a green

light for a period of some minimum duration after pedestrians

have had their time slot for crossing the road. Secondly, many

traﬀic lights have a kind of remote control ability, so that emer-

gency traﬀic (ambulance, ﬁre truck, . . . ) can force green lights

on their path. A well-known example of such remote control is

the MIRT system (Mobile Infra-Red Transmitter) but other sys-

tems exist —the Netherlands use a radiographic system called

VETAG for instance.

The new state diagram for the pedestrian crossing light has two

more states, but more importantly: it needs to save data across

events and share it between states. When the pedestrian presses the button while the state is red_wait, we want neither to react to the button immediately (this was our “fairness rule”), nor to have the button press ignored or “forgotten”. In other words, we move to

the state green_wait_interim regardless of the button press, but

memorize the press for a decision made at the point of leaving

state green_wait_interim.

Automatons excel in modelling control ﬂow in reactive and inter-

active systems, but data ﬂow has traditionally been a weak point.

To see why, consider that each event is handled individually by

a function and that the local variables in that function disappear

when the function returns. Local variables can, hence, not be

used to pass data from one event to the next. Global variables,

while providing a work-around, have drawbacks: global scope

and an “eternal” lifespan. If a variable is used only in the event

handlers of a single state, it is desirable to hide it from the other

states, in order to protect it from accidental modiﬁcation. Like-

wise, shortening the lifespan to the state(s) that the variable is


active in, reduces the memory footprint. “State variables” pro-

vide this mix of variable scope and variable lifespan that are tied

to a series of states, rather than to functions or modules.

PAWN enriches the standard ﬁnite state machine (or automaton)

with variables that are declared with a state classiﬁer. These

variables are only accessible from the listed states, and the memory that these variables hold may be reused for other purposes while the automaton is in a different state (different from the ones listed). Apart from the state classifier, the declaration of a state

variable is similar to that of a global variable. The declaration of

the variable button_memo in the next listing illustrates the con-

cept.

To reset the memorized button press, the script uses an “exit”

function. Just like an entry function is called when entering a

state, the exit function is called when leaving a state.

LISTING: traffic2.p

/* a more realistic traffic light synchronizer, including an

* "override" for emergency vehicles

*/

#include <time>

main()

state green_wait_interim

new bool: button_memo <red_wait, green_wait_interim>

@keypressed(key)

{

switch (key)

{

case ' ': button_press

case '*': mirt_detect

}

}

button_press() <green_wait>

state yellow_wait

button_press() <red_wait, green_wait_interim>

button_memo = true

button_press() <> /* fallback */

{}

mirt_detect()

state mirt_override

@timer() <yellow_wait>

state red_walk

@timer() <red_walk>

state red_wait

@timer() <red_wait>

state green_wait_interim

@timer() <green_wait_interim>


{

state (!button_memo) green_wait

state (button_memo) yellow_wait

}

@timer() <mirt_override>

state green_wait

@timer() <> /* fallback */

{}

entry() <green_wait_interim>

{

print "Green / Don't walk\n"

settimer 5000

}

exit() <green_wait_interim>

button_memo = false

entry() <yellow_wait>

{

print "Yellow / Don't walk\n"

settimer 2000

}

entry() <red_walk>

{

print "Red / Walk\n"

settimer 5000

}

entry() <red_wait>

{

print "Red / Don't walk\n"

settimer 2000

}

entry() <mirt_override>

{

print "Green / Don't walk\n"

settimer 5000

}

•State programming wrap-up

The common notation used in state diagrams is to indicate tran-

sitions with arrows and states with circles or rounded rectan-

gles. The circle/rounded rectangle optionally also mentions the

actions of an entry or exit function and of events that are han-

dled internally —without causing a transition. The arrow for a

transition contains the name of the event (or pseudo-event), an

optional condition between square brackets and an optional ac-

tion behind a slash (“/”).

States are ubiquitous, even if we do not always recognize them

as such. The concept of ﬁnite state machines has traditionally


been applied mostly to programs mimicking mechanical appa-

ratus and software that implements communication protocols.

With the appearance of event-driven windowing systems, state

machines now also appear in the GUI design of desktop pro-

grams. States abound in web programs, because the browser

and the web-site scripting host have only a weak link. That said,

the state machine in web applications is typically implemented

in an ad-hoc manner.

States can also be recognized in common problems and riddles.

In the well-known riddle of the man who must move a cabbage, a sheep and a wolf across a river,∗ the states are obvious; the trick of the riddle is to avoid the forbidden states.

But now that we are discovering “states” everywhere, we must

be careful not to overdo it. For example, in the second imple-

mentation of a pedestrian crossing light, see page 44, I used a

variable (button_memo) to hold a criterion for a decision made at a

later time. An alternative implementation would be to throw in a couple more states to hold the situations “red-wait-&-button-pressed” and “green-wait-interim-&-button-pressed”. No more

variable would then be needed, but at the cost of a more com-

plex state diagram and implementation. In general, the number

of states should be kept small.

Although automata provide a good abstraction to model reac-

tive and interactive systems, coming to a correct diagram is not

straightforward —and sometimes just outright hard. Too often,

the “sunny day scenario” of states and events is plotted out ﬁrst,

and everything straying from this path is then added on an im-

promptu basis. This approach carries the risk that some combi-

nations of events & states are forgotten, and indeed I have en-

countered two comment parser diagrams (like the one at page

37) by diﬀerent book/magazine authors that were ﬂawed in such

way. Instead, I advise to focus on the events and on the re-

sponses for individual events. For every state, every event should

be considered; do not route events through a general purpose

fall-back too eagerly.

It has become common practice, unfortunately, to introduce au-

tomata theory with applications for which better solutions exist.

∗A man has to ferry a wolf, a sheep and a cabbage across a river in a boat,

but the boat can only carry the man and a single additional item. If left

unguarded, the wolf will eat the sheep and the sheep will eat the cabbage.

How can the man ferry them across the river?

Program veriﬁcation — 47

One, oft repeated, example is that of an automaton that accu-

mulates the value of a series of coins, or that “calculates” the

remainder after division by 3 of a binary number. These appli-

cations may have made sense in mechanical/pneumatic design

where “the state” is the only memory that the automaton has,

but in software, using variables and arithmetic operations is the

better choice. Another typical example is that of matching words

or patterns using a state machine: every next letter that is input

switches to a new state. Lexical scanners, such as the ones that

compilers and interpreters use to interpret source code, might

use such state machines to ﬁlter out “reserved words”. How-

ever, for any practical set of reserved words, such automatons

become unwieldy, and no one will design them by hand. In addi-

tion, there is no reason why a lexical scanner cannot peek ahead

in the text or jump back to a mark that it set earlier —which is

one of the criteria for choosing a state implementation in the ﬁrst

place, and ﬁnally, solutions like “trie lookups” are likely simpler

to design and implement while being at least as quick.
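To make the remark about coins and remainders concrete, the sketch below does both jobs with one variable and ordinary arithmetic, with no automaton at all. The coin values and the variable names are made up for this illustration.

```pawn
main()
    {
    new coins[] = [ 5, 10, 10, 25 ]     /* hypothetical series of coin values */
    new total = 0

    /* accumulate the value of the coins in a plain variable */
    for (new i = 0; i < sizeof coins; i++)
        total += coins[i]

    /* the remainder needs a single operator, not three states */
    printf "Total: %d, remainder after division by 3: %d\n", total, total % 3
    }
```

An automaton for the same task would need one state per possible intermediate result, which is exactly the bookkeeping that a variable does for free.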

Program veriﬁcation

Should the compiler/interpreter not catch all bugs? This rhetorical
question has both technical and philosophical sides. I will forgo all
non-technical aspects and only mention that, in practice, there is a
trade-off between the “expressiveness” of a computer language and the
“enforced correctness” (or “provable correctness”) of programs in that
language. Making a language very “strict” is not a solution if work needs
to be done that exceeds the size of a toy program. An overly strict
language leaves the programmer struggling with the language, whereas the
“problem to solve” should be the real struggle and the language should be
a simple means to express the solution in.

The goal of the PAWN language is to provide the developer with an

informal, and convenient to use, mechanism to test whether the

program behaves as was intended. This mechanism is called “as-

sertions” and, although the concept of assertions pre-dates the

idea of “design by contract”, it is most easily explained through

the design-by-contract methodology.

The “design by contract” paradigm provides an alternative approach for
dealing with erroneous conditions. The premise is that the programmer
knows the task at hand, the conditions under which the software must
operate and the environment. In such a setting, each function specifies
the conditions, in the form of assertions, that must hold true before a
client may execute the function. In addition, the function may also
specify any conditions that hold true after it completes its operation.
This is the “contract” of the function.

The name “design by contract” was coined by Bertrand Meyer

and its principles trace back to predicate logic and algorithmic

analysis.

⋄Preconditions specify the valid values of the input parameters

and environmental attributes;

⋄Postconditions specify the output and the (possibly modiﬁed)

environment;

⋄Invariants indicate the conditions that must hold true at key

points in a function, regardless of the path taken through the

function.

For example, a function that computes a square root of a number may
specify that its input parameter be non-negative. This is a precondition.
It may also specify that its output, when squared, is the input value
±0.01%. This is a postcondition; it verifies that the routine operated
correctly. A convenient way to calculate a square root is via “bisection”
(for an example square root function using bisection, see page 75). At
each iteration, this algorithm gives at least one extra bit (binary
digit) of accuracy. This is an invariant (it might be an invariant that
is hard to check, though).

Preconditions, postconditions and invariants are similar in the sense
that they all consist of a test and that a failed test indicates an error
in the implementation. As a result, you can implement preconditions,
postconditions and invariants with a single construct: the “assertion”.
For preconditions, write assertions at the very start of the routine; for
invariants, write an assertion where the invariant should hold; for
postconditions, write an assertion before each “return” statement or at
the end of the function.
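As an illustration of where the three kinds of assertions go, here is a minimal sketch of an integer square root by bisection. It is not the function from page 75: the name isqrt, the integer arithmetic and the choice of invariant (the root stays bracketed between low and high) are assumptions of this sketch, and it presumes that the input is small enough for mid * mid not to overflow a cell.

```pawn
/* Returns the largest integer r for which r*r <= value (hypothetical helper) */
isqrt(value)
    {
    assert value >= 0                   /* precondition */

    new low = 0
    new high = value + 1
    while (high - low > 1)
        {
        new mid = (low + high) / 2
        if (mid * mid <= value)
            low = mid
        else
            high = mid

        /* invariant: the root is bracketed by low and high */
        assert low * low <= value < high * high
        }

    /* postcondition: low is the integer square root of value */
    assert low * low <= value < (low + 1) * (low + 1)
    return low
    }

main()
    printf "isqrt(30) = %d\n", isqrt(30)
```

The precondition sits at the very start, the invariant at the end of each iteration and the postcondition just before the only return statement. For the value 30, the loop narrows the bracket down to 5 and 6, so the function returns 5.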

In PAWN, the instruction is called assert; it is a simple statement

that contains a test. If the test outcome is “true”, nothing hap-

pens. If the outcome is “false”, the assert instruction terminates

the program with a message containing the details of the asser-

tion that failed.

Assertions are checks that should never fail. Genuine errors,

such as user input errors, should be handled with explicit tests in

the program, and not with assertions. As a rule, the expressions

contained in assertions should be free of side eﬀects: an asser-

tion should never contain code that your application requires for

correct operation.
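The sketch below shows what can go wrong when an assertion has a side effect. The variable pending and the function takeitem are hypothetical names that serve only this illustration.

```pawn
new pending = 3         /* hypothetical count of queued items */

/* hypothetical helper: removes one item and returns how many remain */
takeitem()
    return --pending

main()
    {
    /* Risky: the call inside the assertion does real work, and that
     * work is lost when the script is built without assertions */
    assert takeitem() >= 0

    /* Better: do the work unconditionally and assert on the result */
    new remaining = takeitem()
    assert remaining >= 0

    printf "%d items left\n", pending
    }
```

With the first form, the number of items left depends on whether assertions are enabled; with the second form it does not.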

This does have the effect, however, that assertions never fire in a
bug-free program: they just make the code fatter and slower, without any
user-visible benefit. It is not that bad, though. An additional feature
of assertions is that you can build the source code without assertions
simply by using a flag or option to the PAWN parser. The idea is that you
enable assertions during development and build the “retail version” of
the code without assertions. This is a better approach than removing the
assertions, because all assertions are automatically “back” when
recompiling the program, e.g. for maintenance.

During maintenance, or even during the initial development, you may catch
a bug that was not trapped by an assertion. Before fixing the bug, think
of how an assertion could have trapped this error. Then add this
assertion and test whether it indeed catches the bug; only then fix the
bug. By doing this, the code will gradually become sturdier and more
reliable.

Documentation comments

When programs become larger, documenting the program and

the functions becomes vital for its maintenance, especially when

working in a team. The PAWN language tools have some fea-

tures to assist you in documenting the code in comments. Docu-

menting a program or library in its comments has a few advan-

tages —for example: documentation is more easily kept up to

date with the program, it is eﬀicient in the sense that program-

ming comments now double as documentation, and the parser

helps your documentation eﬀorts in generating syntax descrip-

tions and cross references.

Every comment that starts with three slashes (“/// ”) followed by
white-space, or that starts with a slash and two stars (“/** ”) followed
by white-space, is a special documentation comment. The PAWN compiler
extracts documentation comments and optionally writes these to a “report”
file. See the application documentation, or appendix B, for how to enable
the report generation.

As an aside, comments that start with “/**” must still be closed

with “*/”. Single line documentation comments (“///”) close at

the end of the line.
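A minimal sketch of both comment forms follows; the functions larger and smaller are hypothetical and exist only to have something to document.

```pawn
/// <summary>Returns the larger of two values.</summary>
larger(a, b)
    return a > b ? a : b

/**
 * <summary>Returns the smaller of two values.</summary>
 */
smaller(a, b)
    return a < b ? a : b

main()
    printf "%d %d\n", larger(3, 7), smaller(3, 7)
```

The single-line form ends with the line, whereas the block form runs until the closing “*/”.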

The report ﬁle is an XML ﬁle that can subsequently be trans-

formed to HTML documentation via an XSL/XSLT stylesheet,∗or

be run through other tools to create printed documentation. The

syntax of the report ﬁle is compatible with that of the “.Net” de-

veloper products —except that the PAWN compiler stores more

information in the report than just the extracted documentation

strings.

The example below illustrates documentation comments in a simple script
that has a few functions. You may write documentation comments for a
function above its declaration or in its body. All documentation comments
that appear before the end of the function are attributed to the
function. You can also add documentation comments to global variables and
global constants; these comments must appear above the declaration of the
variable or constant. Figure 1 shows part of the output for this (rather
long) example. The style of the output is adjustable in the cascading
style sheet (CSS file) associated with the XSLT transformation file.

LISTING: weekday.p

/**
 * This program illustrates Zeller's congruence algorithm to calculate
 * the day of the week given a date.
 */

/**
 * <summary>
 *   The main program: asks the user to input a date and prints on
 *   what day of the week that date falls.
 * </summary>
 */
main()
    {
    new day, month, year
    if (readdate(day, month, year))
        {
        new wkday = weekday(day, month, year)
        printf "The date %d-%d-%d falls on a ", day, month, year
        switch (wkday)
            {
            case 0:
                print "Saturday"
            case 1:
                print "Sunday"
            case 2:
                print "Monday"
            case 3:
                print "Tuesday"
            case 4:
                print "Wednesday"
            case 5:
                print "Thursday"
            case 6:
                print "Friday"
            }
        }
    else
        print "Invalid date"

    print "\n"
    }

/**
 * <summary>
 *   The core function of Zeller's congruence algorithm. The function
 *   works for the Gregorian calendar.
 * </summary>
 *
 * <param name="day">
 *   The day in the month, a value between 1 and 31.
 * </param>
 * <param name="month">
 *   The month: a value between 1 and 12.
 * </param>
 * <param name="year">
 *   The year in four digits.
 * </param>
 *
 * <returns>
 *   The day of the week, where 0 is Saturday and 6 is Friday.
 * </returns>
 *
 * <remarks>
 *   This function does not check the validity of the date; when the
 *   date in the parameters is invalid, the returned "day of the week"
 *   will hold an incorrect value.
 *
 *   This equation fails in many programming languages, notably most
 *   implementations of C, C++ and Pascal, because these languages have
 *   a loosely defined "remainder" operator. Pawn, on the other hand,
 *   provides the true modulus operator, as defined in mathematical
 *   theory and as was intended by Zeller.
 * </remarks>
 */
weekday(day, month, year)
    {
    /**
     * <remarks>
     *   For Zeller's congruence algorithm, the months January and
     *   February are the 13th and 14th month of the preceding
     *   year. The idea is that the "difficult month" February (which
     *   has either 28 or 29 days) is moved to the end of the year.
     * </remarks>
     */
    if (month <= 2)
        month += 12, --year

    new j = year % 100
    new e = year / 100
    return (day + (month+1)*26/10 + j + j/4 + e/4 - 2*e) % 7
    }

/**
 * <summary>
 *   Reads a date and stores it in three separate fields.
 * </summary>
 *
 * <param name="day">
 *   Will hold the day number upon return.
 * </param>
 * <param name="month">
 *   Will hold the month number upon return.
 * </param>
 * <param name="year">
 *   Will hold the year number upon return.
 * </param>
 *
 * <returns>
 *   true if the date is valid, false otherwise;
 *   if the function returns false, the values of
 *   <paramref name="day"/>, <paramref name="month"/> and
 *   <paramref name="year"/> cannot be relied upon.
 * </returns>
 */
bool: readdate(&day, &month, &year)
    {
    print "Give a date (dd-mm-yyyy): "
    day = getvalue(_,'-','/')
    month = getvalue(_,'-','/')
    year = getvalue()
    return 1 <= month <= 12 && 1 <= day <= daysinmonth(month,year)
    }

/**
 * <summary>
 *   Returns whether a year is a leap year.
 * </summary>
 *
 * <param name="year">
 *   The year in 4 digits.
 * </param>
 *
 * <remarks>
 *   A year is a leap year:
 *   <ul>
 *     <li> if it is divisible by 4, </li>
 *     <li> but not if it is divisible by 100, </li>
 *     <li> but it is if it is divisible by 400. </li>
 *   </ul>
 * </remarks>
 */
bool: isleapyear(year)
    return year % 400 == 0 || year % 100 != 0 && year % 4 == 0

/**
 * <summary>
 *   Returns the number of days in a month (the month is an integer
 *   in the range 1 .. 12). One needs to pass in the year as well,
 *   because the function takes leap years into account.
 * </summary>
 *
 * <param name="month">
 *   The month number, a value between 1 and 12.
 * </param>
 * <param name="year">
 *   The year in 4 digits.
 * </param>
 */
daysinmonth(month, year)
    {
    static daylist[] = [ 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 ]
    assert 1 <= month <= 12
    return daylist[month-1] + _:(month == 2 && isleapyear(year))
    }

∗ The report file contains a reference to the “SMALLDOC.XSL” stylesheet.

FIGURE 1: Documentation generated from the source code

The format of the XML file created by “.Net” developer products is
documented in the Microsoft documentation. The PAWN parser creates a
minimal description of each function or global variable or constant that
is used in a project, regardless of whether you used documentation
comments on that function, variable or constant. The parser also
generates a few tags of its own:

attribute   Attributes for a function, such as “native” or “stock”.

automaton   The automaton that the function belongs to (if any).

dependency  The names of the symbols (other functions, global variables
            and/or global constants) that the function requires. If
            desired, a call tree can be constructed from the
            dependencies.

param       Function parameters. When you add a parameter description in
            a documentation comment, this description is combined with
            the auto-generated content for the parameter.

paraminfo   Tags and array or reference information on a parameter.

referrer    All functions that refer to this symbol; i.e., all functions
            that use or call this variable/function. This information is
            sufficient to serve as a “cross-reference”: the “referrer”
            tree is the inverse of the “dependency” tree.

stacksize   The estimated number of cells that the function will
            allocate on the stack and heap. This stack usage estimate
            excludes the stack requirements of any functions that are
            “called” from the function to which the documentation
            applies. For example, function readdate is documented as
            taking 6 cells on the stack, but it also calls daysinmonth,
            which takes 4 additional cells and in turn calls isleapyear.
            To calculate the total stack requirements for function
            readdate, the call tree should be considered.

            In addition to the local variables and function parameters,
            the compiler also uses the stack for storing intermediate
            results in complex expressions. The stack space needed for
            these intermediate results is also excluded from this
            report. In general, the required overhead for the
            intermediate results is not cumulative (over all functions),
            which is why it would be inaccurate to add a “safety margin”
            to every function. For the program as a whole, a safety
            margin would be highly advised. See appendix B (page 167)
            for the -v option, which can tell you the maximum estimated
            stack usage, based on the call tree.

tagname     The tag of the constant, variable, function result or
            function parameter(s).

transition  The transitions that the function provokes and their
            conditions; see the section State programming on page 36.

All text in the documentation comment(s) is also copied to each

function, variable or constant to which it is attached. The text in

the documentation comment is copied without further process-

ing —with one exception, see below. As the rest of the report

ﬁle is in XML format, and the most suitable way to process XML

to on-line documentation is through an XSLT processor (such as

a modern browser), you may choose to do any formatting in the

documentation comments using HTML tags. Note that you will

often need to explicitly close any HTML tags; the HTML standard

does not require this, but XML/XSLT processors usually do. The

PAWN toolkit comes with an example XSLT ﬁle (with a matching

style sheet) which supports the following XML/HTML tags:

<code> </code>
            Formatted source code in a monospaced font; although the
            “&”, “<” and “>” must be typed as “&amp;”, “&lt;” and
            “&gt;” respectively.

<example> </example>
            Text set under the topic “Example”.

<param name="..."> </param>
            A parameter description, with the parameter name appearing
            inside the opening tag (the “name=” option) and the
            parameter description following it.

<paramref name="..." />
            A reference to a parameter, with the parameter name
            appearing inside the opening tag (the “name=” option).

<remarks> </remarks>
            Text set under the topic “Remarks”.

<returns> </returns>
            Text set under the topic “Returns”.

<seealso> </seealso>
            Text set under the topic “See also”.

<summary> </summary>
            Text set immediately below the header of the symbol.

<section> </section>
            Sets the text in a header. This should only be used in
            documentation that is not attached to a function or a
            variable.

<subsection> </subsection>
            Sets the text in a sub-header. This should only be used in
            documentation that is not attached to a function or a
            variable.

The following additional HTML tags are supported for general-purpose
formatting of text inside any of the above sections:

<c> </c>        Text set in a monospaced font.

<em> </em>      Text set emphasized, usually in italics.

<p> </p>        Text set in a new paragraph. Instead of wrapping <p> and
                </p> around every paragraph, inserting <p/> as a
                separator between two paragraphs produces the same
                effect.

<para> </para>  An alternative for <p> </p>.

<ul> </ul>      An unordered (bulleted) list.

<ol> </ol>      An ordered (numbered) list.

<li> </li>      An item in an ordered or unordered list.

As stated, there is one exception in the processing of documen-

tation comments: if your documentation comment contains a

<param ...> tag (and a matching </param>), the PAWN parser

looks up the parameter and combines your description of the pa-

rameter with the contents that it has automatically generated.

Warnings and errors

The big hurdle that I have stepped over is how to actually compile the
code snippets presented in this chapter. The reason is that the procedure
depends on the system that you are using: in some applications there is a
“Make” or “Compile script” command button or menu option, while in other
environments you have to type a command like “pawncc myscript” on a
command prompt. If you are using the standard PAWN toolset, you will find
instructions on how to use the compiler and run-time in the companion
booklet “The PAWN booklet — Implementer’s Guide”. If you are using
Microsoft Windows, it may prove most convenient to use the Quincy IDE
that comes with PAWN for writing, running and debugging scripts.

Regardless of the differences in launching the compile, the results are
likely to be very similar on all systems:

⋄either the compile succeeds and produces an executable pro-

gram —that may or may not run automatically after the com-

pile;

⋄or the compile gives a list of warning and error messages.

Mistakes happen and the PAWN parser tries to catch as many of

them as it can. When you inspect the code that the PAWN parser

complains about, it may on occasion be rather diﬀicult for you

to see why the code is erroneous (or suspicious). The following

hints may help:

⋄ Each error or warning message is numbered. You can look up the message
  with this number in appendix A, along with a brief description of what
  the message really means.

⋄ If the PAWN parser produces a list of errors, the first error in this
  list is a true error, but the diagnostic messages below it may not be
  errors at all.

  After the PAWN parser sees an error, it tries to step over it and
  complete the compilation. However, stumbling on the error may have
  confused the PAWN parser so that subsequent legitimate statements are
  misinterpreted and reported as errors too.

  When in doubt, fix the first error and recompile.

⋄ The PAWN parser checks only the syntax (spelling/grammar), not the
  semantics (i.e. the “meaning”) of the code. When it detects code that
  does not comply with the syntactical rules, there may actually exist
  different ways in which the code can be changed to be “correct”, in the
  syntactical sense of the word, even though many of these “corrections”
  would lead to nonsensical code. The result is, though, that the PAWN
  parser may have difficulty locating the error precisely: it does not
  know what you meant to write. Hence, the parser often outputs two line
  numbers and the error is somewhere in the range (between the line
  numbers).

⋄ Remember that a program that has no syntactical errors (the PAWN parser
  accepts it without error & warning messages) may still have semantic
  and logical errors which the PAWN parser cannot catch. The assert
  instruction (page 109) is meant to help you catch these “run-time”
  errors.

In closing

If you know the C programming language, you will have seen

many concepts that you are familiar with, and a few new ones.

If you don’t know C, the pace of this introduction has probably

been quite high. Whether you are new to C or experienced in C, I

encourage you to read the following pages carefully. If you know

C or a C-like language, by the way, you may want to consult the

chapter Pitfalls (page 131) ﬁrst.

This booklet attempts to be both an informal introduction and

a (more formal) language speciﬁcation at the same time, per-

haps succeeding at neither. Since it is also the standard book on

PAWN,∗the focus of this booklet is on being accurate and com-

plete, rather than being easy to grasp.

The double nature of this booklet shows through in the order

in which it presents the subjects. The larger conceptual parts

of the language, variables and functions, are covered ﬁrst. The

operators, the statements and general syntax rules follow later

—not that they are less important, but they are easier to learn,

to look up, or to take for granted.

∗It is no longer the only book on Pawn.

Data and declarations

PAWN is a typeless language. All data elements are of type “cell”, and a
cell can hold an integral number. The size of a cell (in bytes) is system
dependent; usually, a cell is 32 bits.

The keyword new declares a new variable. For special declarations, the
keyword new is replaced by static, public or stock (see below). A simple
variable declaration creates a variable that occupies one “cell” of data
memory. Unless it is explicitly initialized, the value of the new
variable is zero.

A variable declaration may occur:

⋄at any position where a statement would be valid —local vari-

ables;

⋄at any position where a function declaration (native function

declarations) or a function implementation would be valid —

global variables;

⋄in the first expression of a for loop instruction (see page 110): also
local variables.

Local declarations

A local declaration appears inside a compound statement (see page 109).
A local variable can only be accessed from within the compound
statement, and from nested compound statements. A declaration in the
first expression of a for loop instruction is also a local declaration.

Global declarations

A global declaration appears outside a function. A global

variable is accessible to any function. Global data objects

can only be initialized with constant expressions.
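The three places for a declaration can be seen side by side in the following sketch; all variable names are made up for this illustration.

```pawn
new total = 0                   /* global: accessible to every function */

main()
    {
    new limit = 5               /* local to the body of main() */
    for (new i = 1; i <= limit; i++)    /* i is local to the loop */
        {
        new square = i * i      /* local to this compound statement */
        total += square
        }
    printf "Sum of squares: %d\n", total
    }
```

Neither i nor square can be used after the closing brace of the loop, whereas total remains available to any other function in the script.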

State variable declarations

A state variable is a global variable with a state classiﬁer ap-

pended at the end. The scope and the lifespan of the variable

are restricted to the states that are listed in the classiﬁer. Fall-

back state speciﬁers are not permitted for state variables.

State variables may not be initialized. In contrast to normal vari-

ables (which are zero after declaration —unless explicitly initial-

ized), state variables hold an indeterminate value after declara-

tion and after ﬁrst entering a state in its classiﬁer. Typically, one

uses the state entry function(s) to properly initialize the state

variable, and the exit function(s) to reset these variables.
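A minimal sketch of this idiom follows. The states measuring and idle, the variable samples and the functions addsample and report are hypothetical names chosen for this illustration, and the script uses the anonymous automaton.

```pawn
new samples <measuring>         /* state variable: valid only in "measuring" */

main()
    {
    state measuring             /* runs the entry function for "measuring" */
    addsample()
    addsample()
    report()
    state idle                  /* runs the exit function for "measuring" */
    report()
    }

entry() <measuring>
    samples = 0                 /* indeterminate until explicitly set */

exit() <measuring>
    samples = 0                 /* reset on leaving the state */

addsample() <measuring>
    samples++

report() <measuring>
    printf "%d samples\n", samples

report() <idle>
    print "not measuring\n"
```

Because samples carries the classifier of the state measuring, the implementation of report for the state idle cannot refer to it.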

Static local declarations

A local variable is destroyed when the execution leaves the com-

pound block in which the variable was created. Local variables in

a function only exist during the run time of that function. Each

new run of the function creates and initializes new local vari-

ables. When a local variable is declared with the keyword static

rather than new, the variable remains in existence after the end of

a function. This means that static local variables provide private,

permanent storage that is accessible only from a single function

(or compound block). Like global variables, static local variables

can only be initialized with constant expressions.
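The difference between new and static for a local variable shows in the sketch below; the function callcount and its variables are hypothetical.

```pawn
/* hypothetical helper: reports how often it has been called */
callcount()
    {
    static calls = 0    /* initialized once; keeps its value between calls */
    new scratch = 0     /* created and set to zero on every call */
    calls++
    scratch++
    printf "calls = %d, scratch = %d\n", calls, scratch
    }

main()
    {
    callcount()
    callcount()
    callcount()
    }
```

On each call, calls is one higher than before, while scratch is 1 every time.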

Static global declarations

A static global variable behaves the same as a normal global vari-

able, except that its scope is restricted to the ﬁle that the decla-

ration resides in. To declare a global variable as static, replace

the keyword new by static.

Stock declarations

A global variable may be declared as “stock”. A stock declaration is one
that the parser may remove or ignore if the variable turns out not to be
used in the program.

Stock variables are useful in combination with stock functions (see page
82). A public variable may be declared as “stock” as well: declaring
public variables as “public stock” enables you to declare all public
variables that a host application provides in an include file, with only
those variables that the script actually uses winding up in the P-code
file.

Public declarations

Global “simple” variables (no arrays) may be declared “public”

in two ways:

⋄declare the variable using the keyword public instead of new;

⋄start the variable name with the “@” symbol.

Public variables behave like global variables, with the addition that the
host program can also read and write public variables. A (normal) global
variable can only be accessed by the functions in your script; the host
program is unaware of it. As such, a host program may require that you
declare a variable with a specific name as “public” for special purposes,
such as the most recent error number or the general program state.

Constant variables

It is sometimes convenient to be able to create a variable that is
initialized once and that may not be modified. Such a variable behaves
much like a symbolic constant (see page 100), but it still is a variable.

To declare a constant variable, insert the keyword const between the
keyword that starts the variable declaration (new, static, public or
stock) and the variable name.

Examples:

new const address[4] = { 192, 0, 168, 66 }

public const status /* initialized to zero */

Three typical situations where one may use a constant variable are:

⋄ To create an “array” constant; symbolic constants cannot be indexed.

⋄ For a public variable that should be set by the host application, and
  only by the host application. See the preceding section for public
  variables.

⋄ A special case is to mark array arguments to functions as const. Array
  arguments are always passed by reference; declaring them as const
  guards against unintentional modification. Refer to page 70 for an
  example of const function arguments.

Arrays (single dimension)

The syntax name[constant] declares name to be an array of “constant”
elements, where each element is a single cell. The name is a placeholder
for an identifier name of your choosing and constant is a positive
non-zero value; constant may be absent. If there is no value between the
brackets, the number of elements is set equal to the number of
initializers (see the example below). See also “multi-dimensional
arrays”, page 64, and “symbolic subscripts”.

The array index range is “zero based”, which means that the first element
is at name[0] and the last element is name[constant-1].

The syntax name{constant} also declares name as an array of constant
elements, but now the elements are characters rather than cells. The
number of characters that fit in a cell depends on the configuration of
the PAWN parser.
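The two declaration forms can be compared in a short sketch; the array names and sizes are arbitrary choices for this illustration.

```pawn
main()
    {
    new values[4]           /* four cells: values[0] .. values[3] */
    new text{8}             /* eight characters: text{0} .. text{7} */

    values[3] = 40          /* the last valid cell index is 4 - 1 */
    text{0} = 'O'
    text{1} = 'K'

    printf "%d %d\n", values[0], values[3]
    printf "%c%c\n", text{0}, text{1}
    }
```

Square brackets index cells and braces index characters, both in the declaration and in the expressions that follow.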

Initialization

Data objects can be initialized at their declaration. The initializer of
a global data object must be a constant (see page 96). Arrays, global or
local, must also be initialized with constants.

Uninitialized data defaults to zero.

Examples:

LISTING: good declaration

new i = 1

new j /* j is zero */

new k = 'a' /* k has character code for letter 'a' */

new a[] = [1,4,9,16,25] /* a has 5 elements */

new s1[20] = ['a','b'] /* the other 18 elements are 0 */

new s2[] = ''Hello world...'' /* an unpacked string */

Examples of invalid declarations:

LISTING: bad declarations

new c[3] = 4 /* an array cannot be set to a value */

new i = "Good-bye" /* only an array can hold a string */

new q[] /* unknown size of array */

new p[2] = { i + j, k - 3 } /* array initializers must be constants */

Progressive initializers for arrays

The ellipsis operator continues the progression of the initialization
constants for an array, based on the last two initialized elements. The
ellipsis operator (three dots, or “...”) initializes the array up to its
declared size.

Examples:

LISTING: array initializers

new a[10] = { 1, ... } // sets all ten elements to 1

new b[10] = { 1, 2, ... } // b = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

new c[8] = { 1, 2, 40, 50, ... } // c = 1, 2, 40, 50, 60, 70, 80, 90

new d[10] = { 10, 9, ... } // d = 10, 9, 8, 7, 6, 5, 4, 3, 2, 1

Symbolic subscripts for arrays

An array may be declared with a list of symbols instead of a value for
its size: an example of this is the “priority queue” sample program on
page 19. An individual subscript may also be interpreted as a sub-array;
for an example, see the RPN calculator program on page 27.

The sub-array syntax applies as well to the initialization of an array
with symbolic subscripts (page 19 shows how to use a #define for a
convenient declaration). Referring again to the “priority queue” sample
program, to initialize a “message” array with fixed values, the syntax
is:

LISTING: array initializers

new msg[.text{40}, .priority] = { "new message", 1 }

The initializer consists of a string (a literal array) and a value;
these go into the fields “.text” and “.priority” respectively.

An array dimension that is declared as a list of symbolic sub-

scripts, may only be indexed with these subscripts. From the

above declaration of variable “msg”, we may use:

LISTING: array initializers

msg[.text] = "another message"

msg[.priority] = 10 - msg[.priority]

It is an error, however, to use a (numeric) expression to index

“msg”. For example, “msg[1]” is an invalid expression.

Since an array with symbolic subscripts may not be indexed with an expression, the square brackets that enclose the subscript are optional. The snippet below is equivalent to the previous snippet.

LISTING: array initializers

msg.text = "another message"

msg.priority = 10 - msg.priority
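The snippets above can be combined into a small program. This is only a sketch: it reuses the declaration of “msg” exactly as shown earlier in this section, and it assumes that the console function printf is available, as in the other examples of this manual.

```pawn
main()
{
    /* declaration and initializer as shown earlier in this section */
    new msg[.text{40}, .priority] = { "new message", 1 }

    /* the bracketed and the bracket-less notation are interchangeable */
    msg[.text] = "another message"
    msg.priority = 10 - msg.priority    /* 10 - 1 */

    printf "%s (priority %d)\n", msg.text, msg.priority
}
```

An expression such as “msg[1]” in this program would be rejected by the compiler, because the dimension was declared with symbolic subscripts.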

A subscript may have an explicit tag name as well (see “Tag names”, page 65). This tag will then override the default tag for array elements. The RPN calculator program makes use of this feature to mark one of the subscripts as a rational value. In the declaration in the snippet below, the expression “field.type” is a plain integer (without tag), but the expression “field.value” has tag Rational:.

LISTING: array initializers

new field[ .type,            /* operator or token type */
           Rational: .value, /* value, if t_type is "Number" */
           .word{20}         /* raw string */
         ]

Multi-dimensional arrays

Multi-dimensional arrays are arrays that contain references to the sub-arrays. That is, a two-dimensional array is an “array of single-dimensional arrays”.∗ Below are a few examples of declarations of two-dimensional arrays.

LISTING: two-dimensional arrays

new a[4][3]

new b[3][2] = [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ]

new c[3][3] = [ [ 1 ], [ 2, ...], [ 3, 4, ... ] ]

new d[2]{10} = [ "agreement", "dispute" ]

new e[2][] = [ ''OK'', ''Cancel'' ]

new f[][] = [ ''OK'', ''Cancel'' ]

As the last two declarations (variables “e” and “f”) show, the final dimension of an array may have an unspecified length, in which case the length of each sub-array is determined from the related initializer. Every sub-array may have a different size; in this particular example, “e[1][5]” contains the letter “l” from the word “Cancel”, but “e[0][5]” is invalid because the length of the sub-array “e[0]” is only three cells (containing the letters “O”, “K” and a zero terminator).

The difference between the declarations for arrays “e” and “f” is that for “f” we let the compiler count the number of initializers for the major dimension: “sizeof f” is 2, like “sizeof e” (see the next section on the sizeof operator).
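As a minimal sketch of how such an array may be used, the program below walks over the major dimension of array “f” and prints each sub-array as a string. It assumes the console function printf, as in the other examples of this manual.

```pawn
main()
{
    new f[][] = [ ''OK'', ''Cancel'' ]

    /* "sizeof f" is the number of sub-arrays (the major dimension) */
    for (new i = 0; i < sizeof f; i++)
        printf "%d: %s\n", i, f[i]
}
```

Because the compiler counts the initializers, adding a third string to the initializer is enough to make the loop print three lines.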

Arrays and the sizeof operator

The sizeof operator returns the size of a variable in “elements”. For a simple (non-compound) variable, the result of sizeof is always 1, because an element is a cell for a simple variable.

An array with one dimension holds a number of cells and the sizeof operator returns that number. The snippet below would therefore print “5” on the display, because the array “msg” holds four characters (each in one cell) plus a zero-terminator:

LISTING: sizeof operator

new msg[] = ''Help''

printf(''%d'', sizeof msg);

∗ The current implementation of the Pawn compiler supports only arrays with up to three dimensions.

The sizeof operator always returns the number of cells, even

for a packed array. That is, in the next snippet, the value printed

would be less than “5” —although there are ﬁve characters in

the array, those are packed in fewer cells.

LISTING: sizeof operator

new msg{} = "Help"

printf(''%d'', sizeof msg);
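The two cases can be put side by side, as in the sketch below. It relies on the predefined constants cellbits and charbits to show how many characters fit in a cell; the actual number printed for the packed array depends on the cell size of the implementation. With 32-bit cells and 8-bit characters, the five characters occupy two cells.

```pawn
main()
{
    new unpacked[] = ''Help''   /* one character per cell */
    new packed{} = "Help"       /* several characters per cell */

    printf "unpacked: %d cells\n", sizeof unpacked
    printf "packed: %d cells, at %d characters per cell\n",
           sizeof packed, cellbits / charbits
}
```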

With multi-dimensional arrays, the sizeof operator can return

the number of elements in each dimension. For the last (minor)

dimension, an element will again be a cell, but for the major di-

mension(s), an element is a sub-array. In the following code snip-

pet, observe that the syntax sizeof matrix refers to the major

dimension of the two-dimensional array and the syntax sizeof

matrix[] refers to the minor dimension of the array. The val-

ues that this snippet prints are 3 and 2 (for the major and minor

dimensions respectively):

LISTING: sizeof operator and multidimensional arrays

new matrix[3][2] = { { 1, 2 }, { 3, 4 }, { 5, 6 } }

printf(''%d %d'', sizeof matrix, sizeof matrix[]);

The application of the sizeof operator on multi-dimensional arrays is especially convenient when used as a default value for function arguments (see “Default function arguments and sizeof”, page 73).
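A sketch of this use is below. The function sumcells is a hypothetical helper that is not part of any library; its arguments “rows” and “columns” default to the major and minor dimensions of whatever array the caller passes in.

```pawn
/* hypothetical helper: adds up all cells of a two-dimensional array */
sumcells(const m[][], rows = sizeof m, columns = sizeof m[])
{
    new total = 0
    for (new i = 0; i < rows; i++)
        for (new j = 0; j < columns; j++)
            total += m[i][j]
    return total
}

main()
{
    new matrix[3][2] = { { 1, 2 }, { 3, 4 }, { 5, 6 } }

    /* no sizes passed: rows becomes 3 and columns becomes 2 */
    printf "%d\n", sumcells(matrix)
}
```

The caller does not need to repeat the dimensions, so the call stays correct when the declaration of “matrix” changes.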

Tag names

A tag is a label that denotes the objective of, or the meaning of, a variable, a constant or a function result. Tags are optional; their only purpose is to allow stronger compile-time error checking of operands in expressions, of function arguments and of array indices.

A tag consists of a symbol name followed by a colon; it has the same syntax as a label (see “Label syntax”, page 109). A tag precedes the symbol name of a variable, constant or function. In an assignment, only the right-hand side of the “=” sign may be tagged.

Examples of valid tagged variable and constant deﬁnitions are:

LISTING: tag names

new bool:flag = true    /* "flag" can only hold "true" or "false" */
const error:success = 0
const error:fatal = 1
const error:nonfatal = 2
new error:errno = fatal

A sequence of constants such as success, fatal and nonfatal can more conveniently be declared by grouping the constants in a compound block, as illustrated below (see the “const” statement, page 100). The declaration below creates four constants, all with the tag error:. A value must be specified for the first constant of the list; each subsequent constant is automatically assigned the value of the previous constant plus 1, unless an explicit value is present.

LISTING: enumerated constants

const error: {
    notice = 0,
    warning,
    nonfatal,
    fatal,
}

new error: code

After declaring variable “code” with tag name “error:”, you can assign any of the constants with that same tag name to it; however, writing “code = 2” will give a parser diagnostic (a warning or error message). A tag override (or a tag cast) adjusts an expression to the desired tag name. As a somewhat contrived example, the next snippet elevates “code” to a higher level (a “more serious error”); note how the literal value 1 is forced to the tag name “error:”.

LISTING: tag override

if (code < fatal)
    code = code + error:1

Tag names introduced so far started with a lower case letter;

these are “weak” tags. Tag names that start with an upper case

letter are “strong” tags. The diﬀerence between weak and strong

tags is that weak tags may, in a few circumstances, be dropped

implicitly by the PAWN parser —so that a weakly tagged expres-

sion becomes an untagged expression. The tag checking mech-

anism veriﬁes the following situations:

⋄ When the expressions on both sides of a binary operator have a different tag, or when one of the expressions is tagged and the other is not, the compiler issues a “tag mismatch” diagnostic. There is no difference between weak and strong tags in this situation.

⋄ There is a special case for the assignment operator: the compiler issues a diagnostic if the variable on the left side of an assignment operator (the “lvalue”, see page 102) has a tag, and the expression on the right side either has a different tag or is untagged. However, if the variable on the left of the assignment operator is untagged, it accepts an expression (on the right side) with a weak tag. In other words, a weak tag is dropped in an assignment when the lvalue is untagged.

⋄ Passing arguments to a function follows the rule for assignments. The compiler issues a diagnostic when the formal parameter (in a function definition) has a tag and the actual parameter (in the function call) either is untagged or has a different tag. However, if the formal parameter is untagged, it also accepts a parameter with any weak tag.
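The rules above can be illustrated with the weak tag “error:” from the earlier listing. The program below is a sketch that only contains statements that should pass the tag check; the statements that would give a diagnostic are listed in a comment. The variable “plain” is introduced here for illustration only.

```pawn
const error: {
    notice = 0,
    warning,
    nonfatal,
    fatal,
}

main()
{
    new error: code = warning
    new plain

    plain = code            /* accepted: weak tag, untagged lvalue */
    printf "%d\n", plain    /* warning has the value 1 */

    code = error:2          /* accepted: tag override on the literal */
    code = code + error:1   /* accepted: both operands carry "error:" */

    /* each of the following would give a "tag mismatch" diagnostic:
     *   code = 2           (tagged lvalue, untagged expression)
     *   if (code == 3)     (binary operator, one operand untagged)
     */

    plain = code
    printf "%d\n", plain    /* 2 + 1, the value of fatal */
}
```

Had the tag been a strong tag (starting with an upper case letter), the assignment “plain = code” would have given a diagnostic as well.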

Functions

A function declaration speciﬁes the name of the function and,

between parentheses, its formal parameters. A function may also

return a value. A function declaration must appear on a global

level (i.e. outside any other functions) and is globally accessible.

If a semicolon follows the function declaration (rather than a statement), the declaration denotes a forward declaration of the function. (The preferred way to declare forward functions is described on page 79.)

The return statement sets the function result. For example, func-

tion sum (see below) has as its result the value of both its argu-

ments added together. The return expression is optional for a

function, but one cannot use the value of a function that does

not return a value.

LISTING: sum function

sum(a, b)
    return a + b

Arguments of a function are (implicitly declared) local variables

for that function. The function call determines the values of the

arguments.

Another example of a complete deﬁnition of a function is below:

function leapyear returns true for a leap year and false for a

non-leap year.

LISTING: leapyear function

leapyear(y)
    return y % 4 == 0 && y % 100 != 0 || y % 400 == 0

The logical and arithmetic operators used in the leapyear exam-

ple are covered on pages 105 and 102 respectively.

Usually a function contains local variable declarations and consists of a compound statement. In the following example, note the assert statement (see page 109) to guard against negative values for the exponent.

LISTING: power function (raise to a power)

power(x, y)
{
    /* returns x raised to the power of y */
    assert y >= 0
    new r = 1
    for (new i = 0; i < y; i++)
        r *= x
    return r
}

A function may contain multiple return statements —one usually

does this to quickly exit a function on a parameter error or when

it turns out that the function has nothing to do. If a function re-

turns an array, all return statements must specify an array with

the same size and the same dimensions.
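A short sketch of a function with an early exit is below. The function average is a hypothetical example; returning zero for an empty array is a convention chosen for this illustration, not a rule of the language.

```pawn
/* hypothetical helper: integer average of the first "size" cells */
average(const values[], size)
{
    if (size <= 0)
        return 0            /* nothing to do: exit early */

    new total = 0
    for (new i = 0; i < size; i++)
        total += values[i]
    return total / size
}

main()
{
    new data[4] = [ 2, 4, 6, 8 ]
    printf "%d %d\n", average(data, 4), average(data, 0)
}
```

Both return statements specify a value, so the result of the function can be used regardless of which path was taken.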

Function arguments

The “faculty” function in the next program has one parameter, which it uses in a loop to calculate the faculty (the factorial) of that number. What deserves attention is that the function modifies its argument. (Another example is function JulianToDate.)

LISTING: faculty.p

/* Calculation of the faculty of a value */

main()
{
    print "Enter a value: "
    new v = getvalue()
    new f = faculty(v)
    printf "The faculty of %d is %d\n", v, f
}

faculty(n)
{
    assert n >= 0
    new result = 1
    while (n > 0)
        result *= n--
    return result
}

Whatever (positive) value that “n” had at the entry of the while

loop in function faculty, “n” will be zero at the end of the loop.

In the case of the faculty function, the parameter is passed “by

value”, so the change of “n” is local to the faculty function. In

other words, function main passes “v” as input to function fac-

ulty, but upon return of faculty, “v” still has the same value as

before the function call.

• Call-by-value versus call-by-reference

Arguments that occupy a single cell can be passed by value or by

reference. The default is “pass by value”. To create a function

argument that is passed by reference, preﬁx the argument name

with the character &.

Example:

LISTING: swap function

swap(&a, &b)
{
    new temp = b
    b = a
    a = temp
}
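To see the effect of the reference arguments, the sketch below repeats the swap function and calls it from main. The variables “x” and “y” are named for this illustration only.

```pawn
swap(&a, &b)
{
    new temp = b
    b = a
    a = temp
}

main()
{
    new x = 1, y = 2
    swap(x, y)
    printf "%d %d\n", x, y  /* x and y have exchanged their values */
}
```

If the “&” prefixes were removed from the definition, swap would only exchange its local copies and the values of “x” and “y” in main would stay as they were.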

To pass an array to a function, append a pair of brackets to the argument name. You may optionally indicate the size of the array; doing so improves error checking by the parser.

Example:

LISTING: addvector function

addvector(a[], const b[], size)
{
    for (new i = 0; i < size; i++)
        a[i] += b[i]
}

Arrays are always passed by reference. As a side note, array b in the above example does not change in the body of the function. The function argument has been declared as const to make this explicit (see “Constant variables”, page 61). In addition to improving error checking, it also allows the PAWN parser to generate more efficient code.

To pass an array of literals to a function, use the same syntax as for array initializers: a literal string or the series of array elements enclosed in braces (see page 98; the ellipsis for progressive initializers cannot be used). Literal arrays can only have a single dimension.

The following snippet calls addvector to add ﬁve to every element

of the array “vect”:

LISTING: addvector usage

new vect[3] = [ 1, 2, 3 ]

addvector(vect, [5, 5, 5], 3)

/* vect[] now holds the values 6, 7 and 8 */

The call to function printf with the string "Hello world\n" in the first ubiquitous program is another example of passing a literal array to a function (see the “Hello world” program, page 3).

• Named parameters versus positional parameters

In the previous examples, the order of parameters of a function

call was important, because each parameter is copied to the func-

tion argument with the same sequential position. For example,

with the function weekday (which uses Zeller’s congruence algo-

rithm) deﬁned as below, you would call weekday(12,31,1999) to

get the week day of the last day of the preceding century.

LISTING: weekday function

weekday(month, day, year)
{
    /* returns the day of the week: 0=Saturday, 1=Sunday, etc. */
    if (month <= 2)
        month += 12, --year
    new j = year % 100
    new e = year / 100
    return (day + (month+1)*26/10 + j + j/4 + e/4 - 2*e) % 7
}

Date formats vary according to culture and nation. While the

format month/day/year is common in the United States of Amer-

ica, European countries often use the day/month/year format,

and technical publications sometimes standardize on the year/

month/day format (ISO/IEC 8824). In other words, no order of

arguments in the weekday function is “logical” or “conventional”.

That being the case, the alternative way to pass parameters to a

function is to use “named parameters”, as in the next examples

(the three function calls are equivalent):

LISTING: weekday usage (named parameters)

new wkday1 = weekday( .month = 12, .day = 31, .year = 1999)

new wkday2 = weekday( .day = 31, .month = 12, .year = 1999)

new wkday3 = weekday( .year = 1999, .month = 12, .day = 31)

With named parameters, a period (“.”) precedes the name of the function argument. The function argument can be set to any expression that is valid for the argument. In the case of a named parameter, the equal sign (“=”) does not indicate an assignment; rather, it links the expression that follows the equal sign to one of the function arguments.

One may mix positional parameters and named parameters in a

function call with the restriction that all positional parameters

must precede any named parameters.
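A mixed call could look like the sketch below, which repeats the weekday function from the earlier listing. The month is passed by position; the day and the year are passed by name, in an arbitrary order.

```pawn
weekday(month, day, year)
{
    /* returns the day of the week: 0=Saturday, 1=Sunday, etc. */
    if (month <= 2)
        month += 12, --year
    new j = year % 100
    new e = year / 100
    return (day + (month+1)*26/10 + j + j/4 + e/4 - 2*e) % 7
}

main()
{
    /* positional parameter first, named parameters after it */
    new wkday = weekday(12, .year = 1999, .day = 31)
    printf "%d\n", wkday    /* 6: a Friday */
}
```

Writing the call as weekday(.year = 1999, 12, 31) is not allowed, because a positional parameter would then follow a named one.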

• Default values of function arguments

A function argument may have a default value. The default value for a function argument must be a constant. To specify a default value, append the equal sign (“=”) and the value to the argument name. (Public functions do not support default argument values.)

When the function call speciﬁes an argument placeholder instead

of a valid argument, the default value applies. The argument

placeholder is the underscore character (“_”). The argument

placeholder is only valid for function arguments that have a de-

fault value.

The rightmost argument placeholders may simply be stripped

from the function argument list. For example, if function in-

crement is deﬁned as:

LISTING: increment function —default values

increment(&value, incr=1) value += incr

the following function calls are all equivalent:

LISTING: increment usage

increment(a)

increment(a, _)

increment(a, 1)

Default argument values for passed-by-reference arguments are useful to make an output argument optional. For example, if the function divmod is designed to return both the quotient and the remainder of a division operation through its arguments, default values make these arguments optional:

LISTING: divmod function (default values for reference parameters)

divmod(a, b, &quotient=0, &remainder=0)
{
    quotient = a / b
    remainder = a % b
}

With the preceding deﬁnition of function divmod, the following

function calls are now all valid:

LISTING: divmod usage

new p, q

divmod(10, 3, p, q)

divmod(10, 3, p, _)

divmod(10, 3, _, q)

divmod(10, 3, p)

divmod 10, 3, p, q

Default arguments for array arguments are often convenient to

set a default string or prompt to a function that receives a string

argument. For example:

LISTING: print error function

print_error(const message[], const title[] = "Error: ")
{
    print title
    print message
    print "\n"
}
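The sketch below shows three ways to call this function; the message texts are made up for the illustration. The last call uses named parameters, so that the title can be given before the message.

```pawn
print_error(const message[], const title[] = "Error: ")
{
    print title
    print message
    print "\n"
}

main()
{
    print_error("file not found")           /* default title */
    print_error("disk full", "Fatal: ")     /* explicit title */
    print_error(.title = "Warning: ", .message = "low memory")
}
```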

The next example adds the elements of one array to another array, and by default increments the first three elements of the destination array by one:

LISTING: addvector function, revised

addvector(a[], const b[] = {1, 1, 1}, size = 3)
{
    for (new i = 0; i < size; i++)
        a[i] += b[i]
}
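A few calls to the revised function are sketched below. The definition is repeated verbatim from the listing above; the calls combine default values, a literal array and the argument placeholder.

```pawn
addvector(a[], const b[] = {1, 1, 1}, size = 3)
{
    for (new i = 0; i < size; i++)
        a[i] += b[i]
}

main()
{
    new vect[3] = [ 1, 2, 3 ]

    addvector(vect)                 /* both defaults: vect holds 2, 3, 4 */
    addvector(vect, [10, 20, 30])   /* explicit array: 12, 23, 34 */
    addvector(vect, _, 2)           /* default array, two cells: 13, 24, 34 */

    printf "%d %d %d\n", vect[0], vect[1], vect[2]
}
```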

• sizeof operator and default function arguments

A default value of a function argument must be a constant, and its value is determined at the point of the function’s declaration. Using the “sizeof” operator (see page 107) to set the default value of a function argument is a special case: the calculation of the value of the sizeof expression is delayed to the point of the function call and it takes the size of the actual argument rather than that of the formal argument. When the function is used several times in a program, with different arguments, the outcome of the “sizeof” expression is potentially different at every call, which means that the “default value” of the function argument may change.

Below is an example program that draws ten random numbers in the range of 0–51 without duplicates. One application for drawing random numbers without duplicates is in card games: those ten numbers could represent the cards for two “hands” in a poker game. The virtues of the algorithm used in this program, invented by Robert W. Floyd, are that it is efficient and unbiased, provided that the pseudo-random number generator is unbiased as well. (“random” is a proposed core function, see page 121.)

LISTING: randlist.p

main()
{
    new HandOfCards[10]
    FillRandom(HandOfCards, 52)

    print "A draw of 10 numbers from a range of 0 to 51 " ...
          "(inclusive) without duplicates:\n"
    for (new i = 0; i < sizeof HandOfCards; i++)
        printf "%d ", HandOfCards[i]
}

FillRandom(Series[], Range, Number = sizeof Series)
{
    assert Range >= Number  /* cannot select 50 values
                             * without duplicates in the
                             * range 0..40, for example */
    new Index = 0
    for (new Seq = Range - Number; Seq < Range; Seq++)
    {
        new Val = random(Seq + 1)
        new Pos = InSeries(Series, Val, Index)
        if (Pos >= 0)
        {
            Series[Index] = Series[Pos]
            Series[Pos] = Seq
        }
        else
            Series[Index] = Val
        Index++
    }
}

InSeries(Series[], Value, Top = sizeof Series)
{
    for (new i = 0; i < Top; i++)
        if (Series[i] == Value)
            return i
    return -1
}

Function main declares the array HandOfCards with a size of ten cells (see “Array declarations”, page 61) and then calls function FillRandom with the purpose that it draws ten non-negative random numbers below 52. Observe, however, that the only two parameters that main passes into the call to FillRandom are the array HandOfCards, where the random numbers should be stored, and the upper bound “52”. The number of random numbers to draw (“10”) is passed implicitly to FillRandom.

The definition of function FillRandom below main specifies for its third parameter “Number = sizeof Series”, where “Series” refers to the first parameter of the function. Due to the special case of a “sizeof default value”, the default value of the Number argument is not the size of the formal argument Series, but that of the actual argument at the point of the function call: HandOfCards.

Note that inside function FillRandom, asking the “sizeof” the function argument Series would (still) evaluate to zero, because the Series array is declared with unspecified length (see page 107 for the behaviour of sizeof). Using sizeof as a default value for a function argument is a specific case. If the formal parameter Series were declared with an explicit size, as in Series[10], it would be redundant to add a Number argument with the array size of the actual argument, because the parser would then enforce that both formal and actual arguments have the same size and dimensions.
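A smaller sketch of the same mechanism is below. The function total and the arrays “few” and “many” are hypothetical names chosen for this illustration; the point is that the default value of “count” differs between the calls.

```pawn
/* hypothetical helper: adds up the first "count" cells of an array */
total(const values[], count = sizeof values)
{
    new result = 0
    for (new i = 0; i < count; i++)
        result += values[i]
    return result
}

main()
{
    new few[3] = [ 1, 2, 3 ]
    new many[5] = [ 1, 2, 3, 4, 5 ]

    /* count defaults to 3 for "few" and to 5 for "many" */
    printf "%d %d\n", total(few), total(many)

    /* an explicit value overrides the default */
    printf "%d\n", total(many, 2)
}
```

The first line printed holds the sums 6 and 15; the second line holds 3, the sum of the first two cells of “many”.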

• Arguments with tag names

A tag optionally precedes a function argument (see “Tag names”, page 65). Using tags improves the compile-time error checking of the script and it serves as “implicit documentation” of the function. For example, a function that computes the square root of an input value in fixed point precision may require that the input parameter is a fixed point value and that the result is fixed point as well. The function below uses the fixed point extension module (see “Fixed point arithmetic”, page 88), and an approxima-
