Perl Programmers Reference Guide Version 5.005_02


Installing Perl
- INSTALL
The Perl FAQ (Frequently Asked Questions)
The Core Perl Manual
Core Modules
POD Translators
- pod2man
- pod2html
Porting Information
- patching
- pumpkin
Perl Utilities
Documentation files for various platforms
Other Core distributed files
Popular Modules (Win32, libwww, DBD, Sybase)
Table of Contents

Perl Programmers Reference Guide

Version 5.005_02

18−Oct−1998

"There’s more than one way to do it."

−− Larry Wall, Author of the Perl Programming Language

Author: Perl5−Porters


INSTALL Perl Programmers Reference Guide INSTALL

NAME

Install − Build and Installation guide for perl5.

SYNOPSIS

The basic steps to build and install perl5 on a Unix system are:

    rm -f config.sh Policy.sh
    sh Configure
    make
    make test
    make install

    # You may also wish to add these:
    (cd /usr/include && h2ph *.h sys/*.h)
    (installhtml --help)
    (cd pod && make tex && <process the latex files>)

Each of these is explained in further detail below.

For information on non-Unix systems, see the section on "Porting information" below.

For information on what's new in this release, see the pod/perldelta.pod file. For more detailed information about specific changes, see the Changes file.

DESCRIPTION

This document is written in pod format as an easy way to indicate its structure. The pod format is described

in pod/perlpod.pod, but you can read it as is with any pager or editor. Headings and items are marked by

lines beginning with ‘=’. The other mark−up used is

B<text> embolden text, used for switches, programs or commands

C<code> literal code

L<name> A link (cross reference) to name

You should probably at least skim through this entire document before proceeding.

If you're building Perl on a non-Unix system, you should also read the README file specific to your operating system, since this may provide additional or different instructions for building Perl.

If there is a hint file for your system (in the hints/ directory) you should also read that hint file for specific

information for your system. (Unixware users should use the svr4.sh hint file.)

WARNING: This version is not binary compatible with Perl 5.004.

Starting with Perl 5.004_50 there were many deep and far-reaching changes to the language internals. If you have dynamically loaded extensions that you built under perl 5.003 or 5.004, you can continue to use them with 5.004, but you will need to rebuild and reinstall those extensions to use them with 5.005. See the discussions below on "Coexistence with earlier versions of perl5" and "Upgrading from 5.004 to 5.005" for more details.

The standard extensions supplied with Perl will be handled automatically.

In a related issue, old extensions may possibly be affected by the changes in the Perl language in the current release. Please see pod/perldelta.pod for a description of what's changed.

Space Requirements

The complete perl5 source tree takes up about 10 MB of disk space. The complete tree after completing

make takes roughly 20 MB, though the actual total is likely to be quite system−dependent. The installation

directories need something on the order of 10 MB, though again that value is system−dependent.


Start with a Fresh Distribution

If you have built perl before, you should clean out the build directory with the command

    make distclean

or

    make realclean

The only difference between the two is that make distclean also removes your old config.sh and Policy.sh files.

The results of a Configure run are stored in the config.sh and Policy.sh files. If you are upgrading from a

previous version of perl, or if you change systems or compilers or make other significant changes, or if you

are experiencing difficulties building perl, you should probably not re−use your old config.sh. Simply

remove it or rename it, e.g.

mv config.sh config.sh.old
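As a minimal sketch of the renaming step above, the following shell function sets aside stale Configure results. The function name stash_old_config and its optional "all" argument are hypothetical and not part of the perl distribution; the demonstration runs in a scratch directory (created with mktemp -d, which is assumed to be available) so that no real build tree is touched.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with perl): set aside old Configure results.
# $1 = build directory (default: current directory)
# $2 = "all" to set aside Policy.sh as well; anything else keeps Policy.sh
stash_old_config() {
    dir=${1:-.}
    mode=${2:-keep-policy}
    if [ -f "$dir/config.sh" ]; then
        mv "$dir/config.sh" "$dir/config.sh.old"
    fi
    if [ "$mode" = all ] && [ -f "$dir/Policy.sh" ]; then
        mv "$dir/Policy.sh" "$dir/Policy.sh.old"
    fi
}

# Demonstration in a scratch directory standing in for a perl build tree.
demo=$(mktemp -d)
: > "$demo/config.sh"
: > "$demo/Policy.sh"
stash_old_config "$demo"
ls "$demo"
rm -rf "$demo"
```

Keeping Policy.sh by default mirrors the advice in this section that installation choices can be preserved through Policy.sh even when config.sh is discarded.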

If you wish to use your old config.sh, be especially attentive to the version and architecture−specific

questions and answers. For example, the default directory for architecture−dependent library modules

includes the version name. By default, Configure will reuse your old name (e.g. /opt/perl/lib/i86pc-solaris/5.003) even if you're running Configure for a different version, e.g. 5.004. Yes, Configure should probably check and correct for this, but it doesn't, presently. Similarly, if you used a shared libperl.so (see below) with version numbers, you will probably want to adjust them as well.

Also, be careful to check your architecture name. Some Linux systems (such as Debian) use i386, while

others may use i486, i586, or i686. If you pick up a precompiled binary, it might not use the same name.

In short, if you wish to use your old config.sh, I recommend running Configure interactively rather than

blindly accepting the defaults.

If your reason to reuse your old config.sh is to save your particular installation choices, then you can

probably achieve the same effect by using the new Policy.sh file. See the section on

"Site−wide Policy settings" below.

Run Configure

Configure will figure out various things about your system. Some things Configure will figure out for itself,

other things it will ask you about. To accept the default, just press RETURN. The default is almost always

okay. At any Configure prompt, you can type &-d and Configure will use the defaults from then on.

After it runs, Configure will perform variable substitution on all the *.SH files and offer to run make depend.

Configure supports a number of useful options. Run Configure -h to get a listing. See the Porting/Glossary

file for a complete list of Configure variables you can set and their definitions.

To compile with gcc, for example, you should run

    sh Configure -Dcc=gcc

This is the preferred way to specify gcc (or another alternative compiler) so that the hints files can set

appropriate defaults.

If you want to use your old config.sh but override some of the items with command line options, you need to

use Configure -O.

By default, for most systems, perl will be installed in /usr/local/{bin, lib, man}. You can specify a different

'prefix' for the default installation directory, when Configure prompts you or by using the Configure command line option -Dprefix='/some/directory', e.g.

    sh Configure -Dprefix=/opt/perl


If your prefix contains the string "perl", then the directories are simplified. For example, if you use

prefix=/opt/perl, then Configure will suggest /opt/perl/lib instead of /opt/perl/lib/perl5/.

NOTE: You must not specify an installation directory that is below your perl source directory. If you do,

installperl will attempt infinite recursion.
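The restriction in the note above can be checked before running Configure. The following is a sketch only: the function name is_below and the example paths are hypothetical, and both arguments are assumed to be absolute paths with no symlinks or ".." components.

```shell
#!/bin/sh
# Hypothetical check: succeed if path $1 is the same as, or lies below,
# directory $2. Both are assumed to be absolute, already-normalized paths.
is_below() {
    case "${1%/}/" in
        "${2%/}/"*) return 0 ;;
        *)          return 1 ;;
    esac
}

src=/home/me/perl5.005_02          # example source directory (an assumption)
for prefix in /opt/perl "$src/inst"; do
    if is_below "$prefix" "$src"; then
        echo "$prefix: inside the source tree, choose another prefix"
    else
        echo "$prefix: ok"
    fi
done
```

Appending a slash to both paths before comparing keeps a sibling directory such as /home/me/perl5.005_02-install from being mistaken for a subdirectory.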

It may seem obvious to say, but Perl is useful only when users can easily find it. It's often a good idea to have both /usr/bin/perl and /usr/local/bin/perl be symlinks to the actual binary. Be especially careful, however, of overwriting a version of perl supplied by your vendor. In any case, system administrators are strongly encouraged to put (symlinks to) perl and its accompanying utilities, such as perldoc, into a directory typically found along a user's PATH, or in another obvious and convenient place.

By default, Configure will compile perl to use dynamic loading if your system supports it. If you want to

force perl to be compiled statically, you can either choose this when Configure prompts you or you can use

the Configure command line option -Uusedl.

If you are willing to accept all the defaults, and you want terse output, you can run

    sh Configure -des

For my Solaris system, I usually use

    sh Configure -Dprefix=/opt/perl -Doptimize='-xpentium -xO4' -des

GNU−style configure

If you prefer the GNU−style configure command line interface, you can use the supplied configure.gnu

command, e.g.

CC=gcc ./configure.gnu

The configure.gnu script emulates a few of the more common configure options. Try

    ./configure.gnu --help

for a listing.

Cross compiling is not supported.

(The file is called configure.gnu to avoid problems on systems that would not distinguish the files

"Configure" and "configure".)

Extensions

By default, Configure will offer to build every extension which appears to be supported. For example,

Configure will offer to build GDBM_File only if it is able to find the gdbm library. (See examples below.)

B, DynaLoader, Fcntl, IO, and attrs are always built by default. Configure does not contain code to test for

POSIX compliance, so POSIX is always built by default as well. If you wish to skip POSIX, you can set the

Configure variable useposix=false either in a hint file or from the Configure command line. Similarly, the

Opcode extension is always built by default, but you can skip it by setting the Configure variable

useopcode=false either in a hint file or from the command line.

You can learn more about each of these extensions by consulting the documentation in the individual .pm

modules, located under the ext/ subdirectory.

Even if you do not have dynamic loading, you must still build the DynaLoader extension; you should just

build the stub dl_none.xs version. (Configure will suggest this as the default.)

In summary, here are the Configure command−line variables you can set to turn off each extension:

    B            (Always included by default)
    DB_File      i_db
    DynaLoader   (Must always be included as a static extension)
    Fcntl        (Always included by default)
    GDBM_File    i_gdbm
    IO           (Always included by default)
    NDBM_File    i_ndbm
    ODBM_File    i_dbm
    POSIX        useposix
    SDBM_File    (Always included by default)
    Opcode       useopcode
    Socket       d_socket
    Threads      usethreads
    attrs        (Always included by default)

Thus to skip the NDBM_File extension, you can use

    sh Configure -Ui_ndbm

Again, this is taken care of automatically if you don't have the ndbm library.

Of course, you may always run Configure interactively and select only the extensions you want.

Note: The DB_File module will only work with version 1.x of Berkeley DB or newer releases of version 2. Configure will automatically detect this for you and refuse to try to build DB_File with earlier releases of version 2.

If you re−use your old config.sh but change your system (e.g. by adding libgdbm) Configure will still offer

your old choices of extensions for the default answer, but it will also point out the discrepancy to you.

Finally, if you have dynamic loading (most modern Unix systems do) remember that these extensions do not

increase the size of your perl executable, nor do they impact start-up time, so you might as well

build all the ones that will work on your system.
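The table of Configure variables above can be turned into command-line options mechanically. The sketch below is illustrative only: the function name skip_flag is hypothetical, the =false form for POSIX and Opcode follows the wording earlier in this section, and the -U form for the remaining variables is an assumption extrapolated from the -Ui_ndbm example.

```shell
#!/bin/sh
# Hypothetical helper: print the Configure option that skips one extension,
# following the table in this section. Returns non-zero for extensions the
# table lists as always included, and for unknown names.
skip_flag() {
    case "$1" in
        DB_File)   printf '%s\n' "-Ui_db" ;;
        GDBM_File) printf '%s\n' "-Ui_gdbm" ;;
        NDBM_File) printf '%s\n' "-Ui_ndbm" ;;
        ODBM_File) printf '%s\n' "-Ui_dbm" ;;
        Socket)    printf '%s\n' "-Ud_socket" ;;
        Threads)   printf '%s\n' "-Uusethreads" ;;
        POSIX)     printf '%s\n' "-Duseposix=false" ;;
        Opcode)    printf '%s\n' "-Duseopcode=false" ;;
        *)         return 1 ;;
    esac
}

# Build (but do not run) a Configure command that skips two extensions.
flags=
for ext in NDBM_File ODBM_File; do
    flags="$flags $(skip_flag "$ext")"
done
echo "sh Configure -des$flags"
```

The command is only printed, so it can be reviewed before being run in a real source tree.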

Including locally−installed libraries

Perl5 comes with interfaces to a number of database extensions, including dbm, ndbm, gdbm, and Berkeley

db. For each extension, if Configure can find the appropriate header files and libraries, it will automatically

include that extension. The gdbm and db libraries are not included with perl. See the library documentation

for how to obtain the libraries.

Note: If your database header (.h) files are not in a directory normally searched by your C compiler, then

you will need to include the appropriate -I/your/directory option when prompted by Configure. If your database library (.a) files are not in a directory normally searched by your C compiler and linker, then you will need to include the appropriate -L/your/directory option when prompted by Configure. See the

examples below.

Examples

gdbm in /usr/local

Suppose you have gdbm and want Configure to find it and build the GDBM_File extension. This example assumes you have gdbm.h installed in /usr/local/include/gdbm.h and libgdbm.a installed in /usr/local/lib/libgdbm.a. Configure should figure all the necessary steps out automatically.

Specifically, when Configure prompts you for flags for your C compiler, you should include -I/usr/local/include.

When Configure prompts you for linker flags, you should include -L/usr/local/lib.

If you are using dynamic loading, then when Configure prompts you for linker flags for dynamic loading, you should again include -L/usr/local/lib.

Again, this should all happen automatically. If you want to accept the defaults for all the questions and have Configure print out only terse messages, then you can just run

    sh Configure -des

and Configure should include the GDBM_File extension automatically.

This should actually work if you have gdbm installed in any of (/usr/local, /opt/local, /usr/gnu,

/opt/gnu, /usr/GNU, or /opt/GNU).


gdbm in /usr/you

Suppose you have gdbm installed in some place other than /usr/local/, but you still want Configure to

find it. To be specific, assume you have /usr/you/include/gdbm.h and /usr/you/lib/libgdbm.a. You still

have to add -I/usr/you/include to cc flags, but you have to take an extra step to help Configure find

libgdbm.a. Specifically, when Configure prompts you for library directories, you have to add

/usr/you/lib to the list.

It is possible to specify this from the command line too (all on one line):

    sh Configure -des \
        -Dlocincpth="/usr/you/include" \
        -Dloclibpth="/usr/you/lib"

locincpth is a space-separated list of include directories to search. Configure will automatically add the appropriate -I directives.

loclibpth is a space-separated list of library directories to search. Configure will automatically add the appropriate -L directives. If you have some libraries under /usr/local/ and others under /usr/you, then you have to include both, namely

    sh Configure -des \
        -Dlocincpth="/usr/you/include /usr/local/include" \
        -Dloclibpth="/usr/you/lib /usr/local/lib"
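To illustrate what "automatically add the appropriate -I and -L directives" amounts to, here is a small sketch that expands a space-separated directory list into compiler flags. The function name dirs_to_flags is hypothetical, and this is not Configure's own code; in particular it does not check that the directories exist, and it assumes the directory names contain no spaces or shell wildcard characters.

```shell
#!/bin/sh
# Hypothetical illustration: turn a space-separated directory list into
# a flag list. $1 = flag prefix (-I or -L), $2 = space-separated directories.
dirs_to_flags() {
    flag=$1
    out=
    for d in $2; do                     # deliberate word splitting on spaces
        out="$out${out:+ }$flag$d"      # add a separating space after the first
    done
    printf '%s\n' "$out"
}

dirs_to_flags -I "/usr/you/include /usr/local/include"
dirs_to_flags -L "/usr/you/lib /usr/local/lib"
```

The order of the list is preserved, so directories named first are searched first.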

Installation Directories

The installation directories can all be changed by answering the appropriate questions in Configure. For

convenience, all the installation questions are near the beginning of Configure.

I highly recommend running Configure interactively to be sure it puts everything where you want it. At any

point during the Configure process, you can answer a question with &-d and Configure will use the

defaults from then on.

By default, Configure will use the following directories for library files for 5.005 (archname is a string like

sun4−sunos, determined by Configure).

    Configure variable  Default value
    $archlib            /usr/local/lib/perl5/5.005/archname
    $privlib            /usr/local/lib/perl5/5.005
    $sitearch           /usr/local/lib/perl5/site_perl/5.005/archname
    $sitelib            /usr/local/lib/perl5/site_perl/5.005

Some users prefer to append a "/share" to $privlib and $sitelib to emphasize that those directories

can be shared among different architectures.

By default, Configure will use the following directories for manual pages:

    Configure variable  Default value
    $man1dir            /usr/local/man/man1
    $man3dir            /usr/local/lib/perl5/man/man3

(Actually, Configure recognizes the SVR3−style /usr/local/man/l_man/man1 directories, if present, and uses

those instead.)

The module man pages are stuck in that strange spot so that they don't collide with other man pages stored in /usr/local/man/man3, and so that Perl's man pages don't hide system man pages. On some systems, man less would end up calling up Perl's less.pm module man page, rather than the less program. (This default location will likely change to /usr/local/man/man3 in a future release of perl.)

Note: Many users prefer to store the module man pages in /usr/local/man/man3. You can do this from the command line with

    sh Configure -Dman3dir=/usr/local/man/man3

Some users also prefer to use a .3pm suffix. You can do that with

    sh Configure -Dman3ext=3pm

If you specify a prefix that contains the string "perl", then the directory structure is simplified. For example, if you Configure with -Dprefix=/opt/perl, then the defaults for 5.005 are

    Configure variable  Default value
    $archlib            /opt/perl/lib/5.005/archname
    $privlib            /opt/perl/lib/5.005
    $sitearch           /opt/perl/lib/site_perl/5.005/archname
    $sitelib            /opt/perl/lib/site_perl/5.005
    $man1dir            /opt/perl/man/man1
    $man3dir            /opt/perl/man/man3

The perl executable will search the libraries in the order given above.

The directories under site_perl are empty, but are intended to be used for installing local or site−wide

extensions. Perl will automatically look in these directories.
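The two sets of library defaults tabulated above follow one rule, which can be sketched as a shell function. This is an illustration of the rule as this section describes it, not Configure's actual code; the function name default_libdirs is hypothetical, and treating the "perl" match as case-sensitive is an assumption.

```shell
#!/bin/sh
# Hypothetical illustration of the default library directories.
# $1 = prefix, $2 = version, $3 = archname (all supplied by the caller).
default_libdirs() {
    prefix=$1 version=$2 archname=$3
    case "$prefix" in
        *perl*) base="$prefix/lib" ;;        # prefix mentions perl: no perl5/ level
        *)      base="$prefix/lib/perl5" ;;
    esac
    echo "archlib=$base/$version/$archname"
    echo "privlib=$base/$version"
    echo "sitearch=$base/site_perl/$version/$archname"
    echo "sitelib=$base/site_perl/$version"
}

default_libdirs /usr/local 5.005 sun4-sunos
default_libdirs /opt/perl 5.005 sun4-sunos
```

The four lines are printed in the same order as the tables, which is also the search order mentioned above.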

In order to support using things like #!/usr/local/bin/perl5.005 after a later version is released, architecture-dependent libraries are stored in a version-specific directory, such as /usr/local/lib/perl5/5.005/archname/.

Further details about the installation directories, maintenance and development subversions, and about

supporting multiple versions are discussed in "Coexistence with earlier versions of perl5" below.

Again, these are just the defaults, and can be changed as you run Configure.

Changing the installation directory

Configure distinguishes between the directory in which perl (and its associated files) should be installed and

the directory in which it will eventually reside. For most sites, these two are the same; for sites that use AFS,

this distinction is handled automatically. However, sites that use software such as depot to manage software

packages may also wish to install perl into a different directory and use that management software to move

perl to its final destination. This section describes how to do this. Someday, Configure may support an

option -Dinstallprefix=/foo to simplify this.

Suppose you want to install perl under the /tmp/perl5 directory. You can edit config.sh and change all the

install* variables to point to /tmp/perl5 instead of /usr/local/wherever. Or, you can automate this process by

placing the following lines in a file config.over before you run Configure (replace /tmp/perl5 by a directory

of your choice):

    installprefix=/tmp/perl5
    test -d $installprefix || mkdir $installprefix
    test -d $installprefix/bin || mkdir $installprefix/bin
    installarchlib=`echo $installarchlib | sed "s!$prefix!$installprefix!"`
    installbin=`echo $installbin | sed "s!$prefix!$installprefix!"`
    installman1dir=`echo $installman1dir | sed "s!$prefix!$installprefix!"`
    installman3dir=`echo $installman3dir | sed "s!$prefix!$installprefix!"`
    installprivlib=`echo $installprivlib | sed "s!$prefix!$installprefix!"`
    installscript=`echo $installscript | sed "s!$prefix!$installprefix!"`
    installsitelib=`echo $installsitelib | sed "s!$prefix!$installprefix!"`
    installsitearch=`echo $installsitearch | sed "s!$prefix!$installprefix!"`

Then, you can Configure and install in the usual way:

    sh Configure -des
    make
    make test
    make install

Beware, though, that if you go to try to install new add-on extensions, they too will get installed under '/tmp/perl5' if you follow this example. The next section shows one way of dealing with that problem.
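To see what the sed substitution in the config.over lines does to a single value, the following sketch applies it to a few sample paths. The function name relocate and the sample values are hypothetical. Note that sed treats the prefix as a regular expression and replaces only its first occurrence, so a prefix containing "!" or regular-expression metacharacters would need extra care.

```shell
#!/bin/sh
# Hypothetical illustration of the config.over substitution.
# $1 = current value, $2 = prefix, $3 = installprefix
relocate() {
    echo "$1" | sed "s!$2!$3!"
}

prefix=/usr/local
installprefix=/tmp/perl5
for value in /usr/local/bin /usr/local/lib/perl5/5.005 /usr/local/man/man1; do
    relocate "$value" "$prefix" "$installprefix"
done
```

Each printed path keeps its layout below the prefix; only the leading /usr/local is replaced by /tmp/perl5.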

Creating an installable tar archive

If you need to install perl on many identical systems, it is convenient to compile it once and create an archive

that can be installed on multiple systems. Here's one way to do that:

    # Set up config.over to install perl into a different directory,
    # e.g. /tmp/perl5 (see previous part).
    sh Configure -des
    make
    make test
    make install
    cd /tmp/perl5
    # Edit $archlib/Config.pm to change all the
    # install* variables back to reflect where everything will
    # really be installed.
    # Edit any of the scripts in $scriptdir to have the correct
    # #!/wherever/perl line.
    tar cvf ../perl5-archive.tar .
    # Then, on each machine where you want to install perl,
    cd /usr/local  # Or wherever you specified as $prefix
    tar xvf perl5-archive.tar
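The archive step can be rehearsed without a perl build. The sketch below packs a staging directory and unpacks it elsewhere, using scratch directories in place of /tmp/perl5 and /usr/local. The function name pack_and_unpack and the placeholder file are hypothetical, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Hypothetical rehearsal of the archive round trip.
# $1 = staging directory, $2 = archive file (absolute path, outside $1),
# $3 = target directory
pack_and_unpack() {
    (cd "$1" && tar cf "$2" .) &&
    (cd "$3" && tar xf "$2")
}

work=$(mktemp -d)
mkdir -p "$work/stage/bin" "$work/target"
echo placeholder > "$work/stage/bin/perl"     # stands in for the real binary
pack_and_unpack "$work/stage" "$work/perl5-archive.tar" "$work/target"
ls "$work/target/bin"
rm -rf "$work"
```

Writing the archive outside the staging directory, as the instructions above do with ../perl5-archive.tar, keeps tar from trying to include the archive in itself.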

Site−wide Policy settings

After Configure runs, it stores a number of common site−wide "policy" answers (such as installation

directories and the local perl contact person) in the Policy.sh file. If you want to build perl on another

system using the same policy defaults, simply copy the Policy.sh file to the new system and Configure will

use it along with the appropriate hint file for your system.

Alternatively, if you wish to change some or all of those policy answers, you should

    rm -f Policy.sh

to ensure that Configure doesn't re-use them.

Further information is in the Policy_sh.SH file itself.

Configure−time Options

There are several different ways to Configure and build perl for your system. For most users, the defaults are

sensible and will work. Some users, however, may wish to further customize perl. Here are some of the

main things you can change.

Threads

On some platforms, perl5.005 can be compiled to use threads. To enable this, read the file

README.threads, and then try

    sh Configure -Dusethreads

Currently, you need to specify -Dusethreads on the Configure command line so that the hint files can make

appropriate adjustments.

The default is to compile without thread support.

Selecting File IO mechanisms

Previous versions of perl used the standard IO mechanisms as defined in stdio.h. Versions 5.003_02 and

later of perl allow alternate IO mechanisms via a "PerlIO" abstraction, but the stdio mechanism is still the

default and is the only supported mechanism.


This PerlIO abstraction can be enabled either on the Configure command line with

    sh Configure -Duseperlio

or interactively at the appropriate Configure prompt.

If you choose to use the PerlIO abstraction layer, there are two (experimental) possibilities for the underlying

IO calls. These have been tested to some extent on some platforms, but are not guaranteed to work

everywhere.

1. AT&T's "sfio". This has superior performance to stdio.h in many cases, and is extensible by the use

of "discipline" modules. Sfio currently only builds on a subset of the UNIX platforms perl supports.

Because the data structures are completely different from stdio, perl extension modules or external

libraries may not work. This configuration exists to allow these issues to be worked on.

This option requires the 'sfio' package to have been built and installed. A (fairly old) version of sfio is in CPAN.

You select this option by

    sh Configure -Duseperlio -Dusesfio

If you have already selected -Duseperlio, and if Configure detects that you have sfio, then sfio will be the default suggested by Configure.

Note: On some systems, sfio's iffe configuration script fails to detect that you have an atexit function (or equivalent). Apparently, this is a problem at least for some versions of Linux and SunOS 4.

You can test if you have this problem by trying the following shell script. (You may have to add some

extra cflags and libraries. A portable version of this may eventually make its way into Configure.)

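    #!/bin/sh
    # Write a tiny C program that prints 42 and relies on exit-time flushing.
    cat > try.c <<'EOCP'
    #include <stdio.h>
    main() { printf("42\n"); }
    EOCP
    # Link it against sfio and capture what it prints.
    cc -o try try.c -lsfio
    val=`./try`
    if test X$val = X42; then
        echo "Your sfio looks ok"
    else
        echo "Your sfio has the exit problem."
    fi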

If you have this problem, the fix is to go back to your sfio sources and correct iffe's guess about atexit.

There also might be a more recent release of Sfio that fixes your problem.

2. Normal stdio IO, but with all IO going through calls to the PerlIO abstraction layer. This configuration

can be used to check that perl and extension modules have been correctly converted to use the PerlIO

abstraction.

This configuration should work on all platforms (but might not).

You select this option via:

    sh Configure -Duseperlio -Uusesfio

If you have already selected -Duseperlio, and if Configure does not detect sfio, then this will be the

default suggested by Configure.


Building a shared libperl.so Perl library

Currently, for most systems, the main perl executable is built by linking the "perl library" libperl.a with

perlmain.o, your static extensions (usually just DynaLoader.a) and various extra libraries, such as -lm.

On some systems that support dynamic loading, it may be possible to replace libperl.a with a shared

libperl.so. If you anticipate building several different perl binaries (e.g. by embedding libperl into different

programs, or by using the optional compiler extension), then you might wish to build a shared libperl.so so

that all your binaries can share the same library.

The disadvantages are that there may be a significant performance penalty associated with the shared

libperl.so, and that the overall mechanism is still rather fragile with respect to different versions and

upgrades.

In terms of performance, on my test system (Solaris 2.5_x86) the perl test suite took roughly 15% longer to

run with the shared libperl.so. Your system and typical applications may well give quite different results.

The default name for the shared library is typically something like libperl.so.3.2 (for Perl 5.003_02) or

libperl.so.302 or simply libperl.so. Configure tries to guess a sensible naming convention based on your C

library name. Since the library gets installed in a version−specific architecture−dependent directory, the

exact name isn‘t very important anyway, as long as your linker is happy.

For some systems (mostly SVR4), building a shared libperl is required for dynamic loading to work, and

hence is already the default.

You can elect to build a shared libperl by

    sh Configure -Duseshrplib

To actually build perl, you must add the current working directory to your LD_LIBRARY_PATH environment variable before running make. You can do this with

    LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH

for Bourne-style shells, or

    setenv LD_LIBRARY_PATH `pwd`

for Csh-style shells. You *MUST* do this before running make. Folks running NeXT OPENSTEP must

substitute DYLD_LIBRARY_PATH for LD_LIBRARY_PATH above.
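For Bourne-style shells, a slightly more careful variant of the assignment above avoids leaving a trailing colon when LD_LIBRARY_PATH was previously unset. This is only a sketch; the function name prepend_path is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print $1 prepended to the colon-separated list $2,
# without a trailing colon when $2 is empty.
prepend_path() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Put the current (build) directory first, then export for make to see.
LD_LIBRARY_PATH=$(prepend_path "$(pwd)" "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"
```

As with the one-line form, this must be done in the same shell session before running make.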

There is also a potential problem with the shared perl library if you want to have more than one "flavor" of the same version of perl (e.g. with and without -DDEBUGGING). For example, suppose you build and install a standard Perl 5.004 with a shared library. Then, suppose you try to build Perl 5.004 with -DDEBUGGING enabled, but everything else the same, including all the installation directories. How can you ensure that your newly built perl will link with your newly built libperl.so.4 rather than with the installed libperl.so.4? The answer is that you might not be able to. The installation directory is encoded in the perl binary with the LD_RUN_PATH environment variable (or equivalent ld command-line option). On Solaris, you can override that with LD_LIBRARY_PATH; on Linux you can't. On Digital Unix, you can override LD_LIBRARY_PATH by setting the _RLD_ROOT environment variable to point to the perl build directory.

The only reliable answer is that you should specify a different directory for the architecture−dependent

library for your -DDEBUGGING version of perl. You can do this by changing all the *archlib* variables in

config.sh, namely archlib, archlib_exp, and installarchlib, to point to your new architecture−dependent

library.

Malloc Issues

Perl relies heavily on malloc(3) to grow data structures as needed, so perl's performance can be noticeably affected by the performance of the malloc function on your system.

The perl source is shipped with a version of malloc that is very fast but somewhat wasteful of space. On the other hand, your system's malloc function may be a bit slower but also a bit more frugal. However, as of 5.004_68, perl's malloc has been optimized for the typical requests from perl, so there's a chance that it may be both faster and use less memory.

For many uses, speed is probably the most important consideration, so the default behavior (for most

systems) is to use the malloc supplied with perl. However, if you will be running very large applications

(e.g. Tk or PDL) or if your system already has an excellent malloc, or if you are experiencing difficulties

with extensions that use third-party libraries that call malloc, then you might wish to use your system's malloc. (Or, you might wish to explore the malloc flags discussed below.)

To build without perl's malloc, you can use the Configure command

    sh Configure -Uusemymalloc

or you can answer 'n' at the appropriate interactive Configure prompt.

Malloc Performance Flags

If you are using Perl's malloc, you may add one or more of the following items to your ccflags config.sh

variable to change its behavior. You can find out more about these and other flags by reading the

commentary near the top of the malloc.c source. The defaults should be fine for nearly everyone.

-DNO_FANCY_MALLOC
    Undefined by default. Defining it returns malloc to the version used in Perl 5.004.

-DPLAIN_MALLOC
    Undefined by default. Defining it in addition to NO_FANCY_MALLOC returns malloc to the version used in Perl version 5.000.

Building a debugging perl

You can run perl scripts under the perl debugger at any time with perl -d your_script. If, however, you want to debug perl itself, you probably want to do

    sh Configure -Doptimize='-g'

This will do two independent things: First, it will force compilation to use cc -g so that you can use your system's debugger on the executable. (Note: Your system may actually require something like cc -g2. Check your man pages for cc(1) and also any hint file for your system.) Second, it will add -DDEBUGGING to your ccflags variable in config.sh so that you can use perl -D to access perl's internal state. (Note: Configure will only add -DDEBUGGING by default if you are not reusing your old config.sh. If you want to reuse your old config.sh, then you can just edit it and change the optimize and ccflags variables by hand and then propagate your changes as shown in "Propagating your changes to config.sh" below.)

You can actually specify -g and -DDEBUGGING independently, but usually it's convenient to have both.

If you are using a shared libperl, see the warnings about multiple versions of perl under

Building a shared libperl.so Perl library.

Other Compiler Flags

For most users, all of the Configure defaults are fine. However, you can change a number of factors in the

way perl is built by adding appropriate -D directives to your ccflags variable in config.sh.

For example, you can replace the rand() and srand() functions in the perl source by any other random

number generator by a trick such as the following (this should all be on one line):

    sh Configure -Dccflags='-Dmy_rand=random -Dmy_srand=srandom' \
        -Drandbits=31

or you can use the drand48 family of functions with

    sh Configure -Dccflags='-Dmy_rand=lrand48 -Dmy_srand=srand48' \
        -Drandbits=31

or by adding the -D flags to your ccflags at the appropriate Configure prompt. (Read pp.c to see how this works.)

You should also run Configure interactively to verify that a hint file doesn't inadvertently override your ccflags setting. (Hints files shouldn't do that, but some might.)

What if it doesn't work?

Running Configure Interactively

If Configure runs into trouble, remember that you can always run Configure interactively so that you

can check (and correct) its guesses.

All the installation questions have been moved to the top, so you don't have to wait for them. Once you've handled them (and your C compiler and flags) you can type &-d at the next Configure prompt and Configure will use the defaults from then on.

If you find yourself trying obscure command line incantations and config.over tricks, I recommend you run Configure interactively instead. You'll probably save yourself time in the long run.

Hint files

The perl distribution includes a number of system−specific hints files in the hints/ directory. If one of

them matches your system, Configure will offer to use that hint file.

Several of the hint files contain additional important information. If you have any problems, it is a

good idea to read the relevant hint file for further information. See hints/solaris_2.sh for an extensive

example. More information about writing good hints is in the hints/README.hints file.

*** WHOA THERE!!! ***

Occasionally, Configure makes a wrong guess. For example, on SunOS 4.1.3, Configure incorrectly

concludes that tzname[] is in the standard C library. The hint file is set up to correct for this. You will

see a message:

*** WHOA THERE!!! ***

The recommended value for $d_tzname on this machine was "undef"!

Keep the recommended value? [y]

You should always keep the recommended value unless, after reading the relevant section of the hint

file, you are sure you want to try overriding it.

If you are re−using an old config.sh, the word "previous" will be used instead of "recommended".

Again, you will almost always want to keep the previous value, unless you have changed something on

your system.

For example, suppose you have added libgdbm.a to your system and you decide to reconfigure perl to

use GDBM_File. When you run Configure again, you will need to add −lgdbm to the list of libraries.

Now, Configure will find your gdbm include file and library and will issue a message:

*** WHOA THERE!!! ***

The previous value for $i_gdbm on this machine was "undef"!

Keep the previous value? [y]

In this case, you do not want to keep the previous value, so you should answer ‘n’. (You‘ll also have

to manually add GDBM_File to the list of dynamic extensions to build.)

Changing Compilers

If you change compilers or make other significant changes, you should probably not re−use your old

config.sh. Simply remove it or rename it, e.g. mv config.sh config.sh.old. Then rerun Configure with

the options you want to use.

This is a common source of problems. If you change from cc to gcc, you should almost always

remove your old config.sh.
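The rename step is easy to forget, so some people wrap it in a small function. This is a minimal sketch; the function name set_aside_config is an invention for this example, and the Configure invocation is left as a comment because it starts a full reconfiguration.

```shell
# set_aside_config [DIR]
# Rename DIR/config.sh to DIR/config.sh.old so that the next Configure
# run starts from scratch. Hypothetical helper name; the rename itself
# is the step recommended in the text.
set_aside_config() {
    dir=${1:-.}
    if [ -f "$dir/config.sh" ]; then
        mv "$dir/config.sh" "$dir/config.sh.old"
    fi
}

# Typical use from the top of the perl source tree:
#   set_aside_config
#   sh Configure -Dcc=gcc
```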

Propagating your changes to config.sh

If you make any changes to config.sh, you should propagate them to all the .SH files by running

sh Configure −S

You will then have to rebuild by running

make depend

make

config.over

You can also supply a shell script config.over to over−ride Configure‘s guesses. It will get loaded up

at the very end, just before config.sh is created. You have to be careful with this, however, as

Configure does no checking that your changes make sense. See the section on

"Changing the installation directory" for an example.

config.h

Many of the system dependencies are contained in config.h. Configure builds config.h by running the

config_h.SH script. The values for the variables are taken from config.sh.

If there are any problems, you can edit config.h directly. Beware, though, that the next time you run

Configure, your changes will be lost.

cflags

If you have any additional changes to make to the C compiler command line, they can be made in

cflags.SH. For instance, to turn off the optimizer on toke.c, find the line in the switch structure for

toke.c and put the command optimize=‘−g’ before the ;; . You can also edit cflags directly, but beware

that your changes will be lost the next time you run Configure.

To explore various ways of changing ccflags from within a hint file, see the file hints/README.hints.

To change the C flags for all the files, edit config.sh and change either $ccflags or $optimize,

and then re−run

sh Configure −S

make depend
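The per-file override described above for toke.c is an ordinary shell case statement. The fragment below is a self-contained sketch of that pattern only: the function name flags_for and the default value are invented for illustration, and the real switch in cflags.SH differs in detail.

```shell
# flags_for FILE
# Print the optimizer setting to use for one source file (name without
# the .c suffix). Illustrative only: the real switch lives in cflags.SH.
flags_for() {
    optimize='-O'                # assumed global default
    case "$1" in
    toke) optimize='-g' ;;       # override placed before the ;;
    *) ;;                        # every other file keeps the default
    esac
    printf '%s\n' "$optimize"
}
```

Because the assignment sits inside the branch for one file, every other file still gets the global setting.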

No sh

If you don‘t have sh, you‘ll have to copy the sample file Porting/config_H to config.h and edit the

config.h to reflect your system‘s peculiarities. You‘ll probably also have to extensively modify the

extension building mechanism.

Porting information

Specific information for the OS/2, Plan9, VMS and Win32 ports is in the corresponding README

files and subdirectories. Additional information, including a glossary of all those config.sh variables,

is in the Porting subdirectory.

Ports for other systems may also be available. You should check out http://www.perl.com/CPAN/ports

for current information on ports to various other operating systems.

make depend

This will look for all the includes. The output is stored in makefile. The only difference between Makefile

and makefile is the dependencies at the bottom of makefile. If you have to make any changes, you should

edit makefile, not Makefile since the Unix make command reads makefile first. (On non−Unix systems, the

output may be stored in a different file. Check the value of $firstmakefile in your config.sh if in

doubt.)

Configure will offer to do this step for you, so it isn‘t listed explicitly above.

make

This will attempt to make perl in the current directory.

If you can‘t compile successfully, try some of the following ideas. If none of them help, and careful reading

of the error message and the relevant manual pages on your system doesn‘t help, you can send a message to

either the comp.lang.perl.misc newsgroup or to perlbug@perl.com with an accurate description of your

problem. See "Reporting Problems" below.

hints

If you used a hint file, try reading the comments in the hint file for further tips and information.

extensions

If you can successfully build miniperl, but the process crashes during the building of extensions, you

should run

make minitest

to test your version of miniperl.

locale

If you have any locale−related environment variables set, try unsetting them. I have some reports that

some versions of IRIX hang while running ./miniperl configpm with locales other than the C locale.

See the discussion under "make test" below about locales and the whole "Locale problems" section in

the file pod/perllocale.pod. The latter is especially useful if you see something like this

perl: warning: Setting locale failed.

perl: warning: Please check that your locale settings:

LC_ALL = "En_US",

LANG = (unset)

are supported and installed on your system.

perl: warning: Falling back to the standard locale ("C").

at Perl startup.
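Before unsetting anything, it can help to see which locale variables are actually set. The sketch below checks the four names this file mentions in its "make test" discussion; your system may honour others, and the function name locale_vars_set is made up for this example.

```shell
# locale_vars_set
# Print NAME=VALUE for each locale-related variable that is set and
# non-empty. The list of names is an assumption; extend it as needed.
locale_vars_set() {
    for name in LC_ALL LC_CTYPE LC_COLLATE LANG; do
        eval "value=\${$name-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        fi
    done
}

# To clear them for the current shell before rebuilding:
#   unset LC_ALL LC_CTYPE LC_COLLATE LANG
```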

malloc duplicates

If you get duplicates upon linking for malloc et al, add −DEMBEDMYMALLOC to your ccflags

variable in config.sh.

varargs

If you get varargs problems with gcc, be sure that gcc is installed correctly and that you are not passing

−I/usr/include to gcc. When using gcc, you should probably have i_stdarg=‘define’ and

i_varargs=‘undef’ in config.sh. The problem is usually solved by running fixincludes correctly. If you

do change config.sh, don’t forget to propagate your changes (see

"Propagating your changes to config.sh" above). See also the "vsprintf" item below.

util.c

If you get error messages such as the following (the exact line numbers and function name may vary in

different versions of perl):

util.c: In function ‘Perl_form’:

util.c:1107: number of arguments doesn’t match prototype

proto.h:125: prototype declaration

it might well be a symptom of the gcc "varargs problem". See the previous "varargs" item.

Solaris and SunOS dynamic loading

If you have problems with dynamic loading using gcc on SunOS or Solaris, and you are using GNU as

and GNU ld, you may need to add −B/bin/ (for SunOS) or −B/usr/ccs/bin/ (for Solaris) to your

$ccflags, $ldflags, and $lddlflags so that the system‘s versions of as and ld are used.

Note that the trailing ‘/’ is required. Alternatively, you can use the GCC_EXEC_PREFIX environment

variable to ensure that Sun‘s as and ld are used. Consult your gcc documentation for further

information on the −B option and the GCC_EXEC_PREFIX variable.

One convenient way to ensure you are not using GNU as and ld is to invoke Configure with

sh Configure −Dcc=’gcc −B/usr/ccs/bin/’

for Solaris systems. For a SunOS system, you must use −B/bin/ instead.

Alternatively, recent versions of GNU ld reportedly work if you include −Wl,−export−dynamic

in the ccdlflags variable in config.sh.

ld.so.1: ./perl: fatal: relocation error:

If you get this message on SunOS or Solaris, and you‘re using gcc, it‘s probably the GNU as or GNU

ld problem in the previous item "Solaris and SunOS dynamic loading".

LD_LIBRARY_PATH

If you run into dynamic loading problems, check your setting of the LD_LIBRARY_PATH

environment variable. If you‘re creating a static Perl library (libperl.a rather than libperl.so) it should

build fine with LD_LIBRARY_PATH unset, though that may depend on details of your local set−up.

dlopen: stub interception failed

The primary cause of the ‘dlopen: stub interception failed’ message is that the LD_LIBRARY_PATH

environment variable includes a directory which is a symlink to /usr/lib (such as /lib).

The reason this causes a problem is quite subtle. The file libdl.so.1.0 actually *only* contains

functions which generate ‘stub interception failed’ errors! The runtime linker intercepts links to

"/usr/lib/libdl.so.1.0" and links in internal implementation of those functions instead. [Thanks to Tim

Bunce for this explanation.]
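To find the offending entry, you can walk the path list and report any directory that is itself a symbolic link. This is a rough sketch with a made-up function name (symlinked_dirs); it flags every symlinked entry, not only those that resolve to /usr/lib, and an entry written with a trailing slash will not be reported.

```shell
# symlinked_dirs PATHLIST
# Print each entry of a colon-separated directory list that is itself
# a symbolic link. Hypothetical helper for inspecting LD_LIBRARY_PATH.
symlinked_dirs() {
    old_ifs=$IFS
    IFS=:
    for dir in $1; do
        if [ -L "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
    IFS=$old_ifs
}
```

Run it as symlinked_dirs "${LD_LIBRARY_PATH-}" and consider removing anything it prints from the variable before retrying.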

nm extraction

If Configure seems to be having trouble finding library functions, try not using nm extraction. You

can do this from the command line with

sh Configure −Uusenm

or by answering the nm extraction question interactively. If you have previously run Configure, you

should not reuse your old config.sh.

umask not found

If the build process encounters errors relating to umask(), the problem is probably that Configure

couldn‘t find your umask() system call. Check your config.sh. You should have d_umask=‘define’.

If you don‘t, this is probably the "nm extraction" problem discussed above. Also, try reading the hints

file for your system for further information.

vsprintf

If you run into problems with vsprintf in compiling util.c, the problem is probably that Configure failed

to detect your system‘s version of vsprintf(). Check whether your system has vprintf().

(Virtually all modern Unix systems do.) Then, check the variable d_vprintf in config.sh. If your

system has vprintf, it should be:

d_vprintf=’define’

If Configure guessed wrong, it is likely that Configure guessed wrong on a number of other common

functions too. This is probably the "nm extraction" problem discussed above.
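Since a wrong guess rarely comes alone, it may be worth checking several symbols in one pass. The following sketch prints the names that are not set to define; the function name not_defined is hypothetical, and it assumes the usual single-quoted one-line assignments in config.sh.

```shell
# not_defined FILE NAME...
# Print every NAME whose entry in FILE is not exactly NAME='define'
# (including names that are missing from FILE altogether).
not_defined() {
    file=$1
    shift
    for name in "$@"; do
        if ! grep -q "^$name='define'\$" "$file"; then
            printf '%s\n' "$name"
        fi
    done
}
```

For the items above you might run not_defined config.sh d_umask d_vprintf; no output means both were detected.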

do_aspawn

If you run into problems relating to do_aspawn or do_spawn, the problem is probably that Configure

failed to detect your system‘s fork() function. Follow the procedure in the previous item on

"nm extraction".

__inet_* errors

If you receive unresolved symbol errors during Perl build and/or test referring to __inet_* symbols,

check to see whether BIND 8.1 is installed. It installs a /usr/local/include/arpa/inet.h that refers to

these symbols. Versions of BIND later than 8.1 do not install inet.h in that location and avoid the

errors. You should probably update to a newer version of BIND. If you can‘t, you can either link with

the updated resolver library provided with BIND 8.1 or rename /usr/local/include/arpa/inet.h during the

Perl build and test process to avoid the problem.

Optimizer

If you can‘t compile successfully, try turning off your compiler‘s optimizer. Edit config.sh and change

the line

optimize=’−O’

to

optimize=’ ’

then propagate your changes with sh Configure −S and rebuild with make depend; make.
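If you would rather script the edit than open config.sh in an editor, something along these lines may do. It is a sketch under stated assumptions: set_config_var is an invented name, the variable must be a single-quoted one-line assignment, and the new value must not contain single quotes, '|', '&' or backslashes.

```shell
# set_config_var FILE NAME VALUE
# Rewrite the single-quoted assignment of NAME in FILE to VALUE,
# keeping a copy of the original in FILE.bak. Hypothetical helper.
set_config_var() {
    cp "$1" "$1.bak" &&
    sed "s|^$2='.*'\$|$2='$3'|" "$1.bak" > "$1"
}

# Example, followed by the propagate-and-rebuild steps from the text:
#   set_config_var config.sh optimize ' '
#   sh Configure -S
#   make depend; make
```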

CRIPPLED_CC

If you still can‘t compile successfully, try adding a −DCRIPPLED_CC flag. (Just because you get no

errors doesn‘t mean it compiled right!) This simplifies some complicated expressions for compilers

that get indigestion easily.

Missing functions

If you have missing routines, you probably need to add some library or other, or you need to undefine

some feature that Configure thought was there but is defective or incomplete. Look through config.h

for likely suspects. If Configure guessed wrong on a number of functions, you might have the

"nm extraction" problem discussed above.

toke.c

Some compilers will not compile or optimize the larger files (such as toke.c) without some extra

switches to use larger jump offsets or allocate larger internal tables. You can customize the switches

for each file in cflags. It‘s okay to insert rules for specific files into makefile since a default rule only

takes effect in the absence of a specific rule.

Missing dbmclose

SCO prior to 3.2.4 may be missing dbmclose(). An upgrade to 3.2.4 that includes libdbm.nfs

(which includes dbmclose()) may be available.

Note (probably harmless): No library found for −lsomething

If you see such a message during the building of an extension, but the extension passes its tests anyway

(see "make test" below), then don‘t worry about the warning message. The extension Makefile.PL

goes looking for various libraries needed on various systems; few systems will need all the possible

libraries listed. For example, a system may have −lcposix or −lposix, but it‘s unlikely to have both, so

most users will see warnings for the one they don‘t have. The phrase ‘probably harmless’ is intended

to reassure you that nothing unusual is happening, and the build process is continuing.

On the other hand, if you are building GDBM_File and you get the message

Note (probably harmless): No library found for −lgdbm

then it‘s likely you‘re going to run into trouble somewhere along the line, since it‘s hard to see how

you can use the GDBM_File extension without the −lgdbm library.

It is true that, in principle, Configure could have figured all of this out, but Configure and the extension

building process are not quite that tightly coordinated.

sh: ar: not found

This is a message from your shell telling you that the command ‘ar’ was not found. You need to check

your PATH environment variable to make sure that it includes the directory with the ‘ar’ command.

This is a common problem on Solaris, where ‘ar’ is in the /usr/ccs/bin directory.
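A small check like the following can tell you whether the tool is reachable before you start a long build. The function name find_tool is hypothetical; the fallback directory is whatever you pass in.

```shell
# find_tool NAME DIR
# Print the path of NAME if it is on PATH; otherwise fall back to
# DIR/NAME when that file is executable. Returns non-zero if neither
# works. Hypothetical helper name.
find_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        command -v "$1"
    elif [ -x "$2/$1" ]; then
        printf '%s\n' "$2/$1"
    else
        return 1
    fi
}

# On Solaris, where ar is in /usr/ccs/bin:
#   find_tool ar /usr/ccs/bin || echo "ar not found; check PATH"
#   PATH=$PATH:/usr/ccs/bin; export PATH
```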

db−recno failure on tests 51, 53 and 55

Old versions of the DB library (including the DB library which comes with FreeBSD 2.1) had broken

handling of recno databases with modified bval settings. Upgrade your DB library or OS.

Bad arg length for semctl, is XX, should be ZZZ

If you get this error message from the lib/ipc_sysv test, your System V IPC may be broken. The XX

typically is 20, and that is what ZZZ also should be. Consider upgrading your OS, or reconfiguring

your OS to include the System V semaphores.

lib/ipc_sysv........semget: No space left on device

Either your account or the whole system has run out of semaphores. Or both. Either list the

semaphores with "ipcs" and remove the unneeded ones (which ones these are depends on your system

and applications) with "ipcrm −s SEMAPHORE_ID_HERE" or configure more semaphores to your

system.

Miscellaneous

Some additional things that have been reported for either perl4 or perl5:

Genix may need to use libc rather than libc_s, or #undef VARARGS.

NCR Tower 32 (OS 2.01.01) may need −W2,−Sl,2000 and #undef MKDIR.

UTS may need one or more of −DCRIPPLED_CC, −K or −g, and undef LSTAT.

FreeBSD can fail the lib/ipc_sysv.t test if SysV IPC has not been configured to the kernel. Perl tries to

detect this, though, and you will get a message telling what to do.

If you get syntax errors on ‘(’, try −DCRIPPLED_CC.

Machines with half−implemented dbm routines will need to #undef I_ODBM.

make test

This will run the regression tests on the perl you just made (you should run plain ‘make’ before ‘make test’

otherwise you won‘t have a complete build). If ‘make test’ doesn‘t say "All tests successful" then something

went wrong. See the file t/README in the t subdirectory.

Note that you can‘t run the tests in background if this disables opening of /dev/tty. You can use ‘make

test−notty’ in that case but a few tty tests will be skipped.

What if make test doesn‘t work?

If make test bombs out, just cd to the t directory and run ./TEST by hand to see if it makes any difference. If

individual tests bomb, you can run them by hand, e.g.,

./perl op/groups.t

Another way to get more detailed information about failed tests and individual subtests is to cd to the t

directory and run

./perl harness

(this assumes that most basic tests succeed, since harness uses complicated constructs).

You should also read the individual tests to see if there are any helpful comments that apply to your system.

locale

Note: One possible reason for errors is that some external programs may be broken due to the

combination of your environment and the way make test exercises them. For example, this may

happen if you have one or more of these environment variables set: LC_ALL LC_CTYPE

LC_COLLATE LANG. In some versions of UNIX, the non−English locales are known to cause

programs to exhibit mysterious errors.

If you have any of the above environment variables set, please try

setenv LC_ALL C

(for C shell) or

LC_ALL=C;export LC_ALL

(for Bourne or Korn shell) from the command line and then retry make test. If the tests then succeed,

you may have a broken program that is confusing the testing. Please run the troublesome test by hand

as shown above and see whether you can locate the program. Look for things like: exec, `backquoted

command`, system, open("|...") or open("...|"). All these mean that Perl is trying to run some external

program.
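A plain text search is often enough to find the candidates. The filter below is a rough sketch, not a parser: the function name external_calls is made up, and it will also match comments or strings that merely contain these words, so expect some false positives.

```shell
# external_calls FILE
# List numbered lines of a Perl test script that appear to start an
# external program: exec, system, backquotes, or a piped open.
external_calls() {
    grep -n -E 'exec|system|`|open[ (].*\|' "$1"
}
```

For example, from the t directory you could run external_calls op/groups.t and then try the reported commands by hand with and without LC_ALL=C.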

Out of memory

On some systems, particularly those with smaller amounts of RAM, some of the tests in t/op/pat.t may

fail with an "Out of memory" message. Specifically, in perl5.004_64, tests 74 and 78 have been

reported to fail on some systems. On my SparcStation IPC with 8 MB of RAM, test 78 will fail if the

system is running any other significant tasks at the same time.

Try stopping other jobs on the system and then running the test by itself:

cd t; ./perl op/pat.t

to see if you have any better luck. If your perl still fails this test, it does not necessarily mean you have

a broken perl. This test tries to exercise the regular expression subsystem quite thoroughly, and may

well be far more demanding than your normal usage.

make install

This will put perl into the public directory you specified to Configure; by default this is /usr/local/bin. It will

also try to put the man pages in a reasonable place. It will not nroff the man pages, however. You may need

to be root to run make install. If you are not root, you must own the directories in question and you should

ignore any messages about chown not working.

Installing perl under different names

If you want to install perl under a name other than "perl" (for example, when installing perl with special

features enabled, such as debugging), indicate the alternate name on the "make install" line, such as:

make install PERLNAME=myperl

Installed files

If you want to see exactly what will happen without installing anything, you can run

./perl installperl −n

./perl installman −n

make install will install the following:

perl,
perl5.nnn       where nnn is the current release number. This
                will be a link to perl.
suidperl,
sperl5.nnn      If you requested setuid emulation.
a2p             awk−to−perl translator
cppstdin        This is used by perl −P, if your cc −E can’t
                read from stdin.
c2ph, pstruct   Scripts for handling C structures in header files.
s2p             sed−to−perl translator
find2perl       find−to−perl translator
h2ph            Extract constants and simple macros from C headers
h2xs            Converts C .h header files to Perl extensions.
perlbug         Tool to report bugs in Perl.
perldoc         Tool to read perl’s pod documentation.
pl2pm           Convert Perl 4 .pl files to Perl 5 .pm modules
pod2html,       Converters from perl’s pod documentation format
pod2latex,      to other useful formats.
pod2man, and
pod2text
splain          Describe Perl warnings and errors
library files   in $privlib and $archlib specified to
                Configure, usually under /usr/local/lib/perl5/.
man pages       in the location specified to Configure, usually
                something like /usr/local/man/man1.
module          in the location specified to Configure, usually
man pages       under /usr/local/lib/perl5/man/man3.
pod/*.pod       in $privlib/pod/.

Installperl will also create the library directories $sitelib and $sitearch listed in config.sh. Usually,

these are something like

/usr/local/lib/perl5/site_perl/5.005

/usr/local/lib/perl5/site_perl/5.005/archname

where archname is something like sun4−sunos. These directories will be used for installing extensions.

Perl‘s *.h header files and the libperl.a library are also installed under $archlib so that any user may later

build new extensions, run the optional Perl compiler, or embed the perl interpreter into another program even

if the Perl source is no longer available.

Coexistence with earlier versions of perl5

WARNING: The upgrade from 5.004_0x to 5.005 is going to be a bit tricky. See

"Upgrading from 5.004 to 5.005" below.

In general, you can usually safely upgrade from one version of Perl (e.g. 5.004_04) to another similar version

(e.g. 5.004_05) without re−compiling all of your add−on extensions. You can also safely leave the old

version around in case the new version causes you problems for some reason. For example, if you want to be

sure that your script continues to run with 5.004_04, simply replace the ‘#!/usr/local/bin/perl’ line at the top

of the script with the particular version you want to run, e.g. #!/usr/local/bin/perl5.00404.
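If you have several scripts to pin to one version, the #! replacement can be automated. This is a minimal sketch with a made-up function name (pin_perl); note that it replaces the whole first line, so any switches on the old #! line (such as −w) are dropped and must be added back by hand.

```shell
# pin_perl SCRIPT INTERPRETER
# Replace the #! line of SCRIPT (first line only) with INTERPRETER,
# saving the original as SCRIPT.orig. Hypothetical helper.
pin_perl() {
    cp -p "$1" "$1.orig" &&
    sed '1s|^#!.*|#!'"$2"'|' "$1.orig" > "$1"
}

# Example:
#   pin_perl myscript /usr/local/bin/perl5.00404
```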

Most extensions will probably not need to be recompiled to use with a newer version of perl. Here is how it

is supposed to work. (These examples assume you accept all the Configure defaults.)

The directories searched by version 5.005 will be

Configure variable Default value

$archlib /usr/local/lib/perl5/5.005/archname

$privlib /usr/local/lib/perl5/5.005

$sitearch /usr/local/lib/perl5/site_perl/5.005/archname

$sitelib /usr/local/lib/perl5/site_perl/5.005

while the directories searched by version 5.005_01 will be

$archlib /usr/local/lib/perl5/5.00501/archname

$privlib /usr/local/lib/perl5/5.00501

$sitearch /usr/local/lib/perl5/site_perl/5.005/archname

$sitelib /usr/local/lib/perl5/site_perl/5.005

When you install an add−on extension, it gets installed into $sitelib (or $sitearch if it is

architecture−specific). This directory deliberately does NOT include the sub−version number (01) so that

both 5.005 and 5.005_01 can use the extension. Only when a perl version changes to break backwards

compatibility will the default suggestions for the $sitearch and $sitelib version numbers be

increased.

However, if you do run into problems, and you want to continue to use the old version of perl along with

your extension, move those extension files to the appropriate version directory, such as $privlib (or

$archlib). (The extension‘s .packlist file lists the files installed with that extension. For the Tk

extension, for example, the list of files installed is in $sitearch/auto/Tk/.packlist.) Then use

your newer version of perl to rebuild and re−install the extension into $sitelib. This way, Perl 5.005

will find your files in the 5.005 directory, and newer versions of perl will find your newer extension in the

$sitelib directory. (This is also why perl searches the site−specific libraries last.)
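Because the .packlist is just a list of installed paths, it can drive a quick sanity check before and after you move an extension. The sketch below assumes one plain path per line with nothing after the path; the function name packlist_missing is hypothetical.

```shell
# packlist_missing PACKLIST
# Print every path listed in PACKLIST that does not exist on disk.
# Assumes one plain path per line, with no attributes after the path.
packlist_missing() {
    while IFS= read -r path; do
        [ -z "$path" ] && continue
        if [ ! -e "$path" ]; then
            printf '%s\n' "$path"
        fi
    done < "$1"
}
```

Before the move it should print nothing; after the move, the paths it prints are the ones you relocated.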

Alternatively, if you are willing to reinstall all your extensions every time you upgrade perl, then you can

include the subversion number in $sitearch and $sitelib when you run Configure.

Maintaining completely separate versions

Many users prefer to keep all versions of perl in completely separate directories. One convenient way to do

this is by using a separate prefix for each version, such as

sh Configure −Dprefix=/opt/perl5.004

and adding /opt/perl5.004/bin to the shell PATH variable. Such users may also wish to add a symbolic link

/usr/local/bin/perl so that scripts can still start with #!/usr/local/bin/perl.

Others might share a common directory for maintenance sub−versions (e.g. 5.004 for all 5.004_0x versions),

but change directory with each major version.

If you are installing a development subversion, you probably ought to seriously consider using a separate

directory, since development subversions may not have all the compatibility wrinkles ironed out yet.

Upgrading from 5.004 to 5.005

Extensions built and installed with versions of perl prior to 5.004_50 will need to be recompiled to be used

with 5.004_50 and later. You will, however, be able to continue using 5.004 even after you install 5.005.

The 5.004 binary will still be able to find the extensions built under 5.004; the 5.005 binary will look in the

new $sitearch and $sitelib directories, and will not find them.

Coexistence with perl4

You can safely install perl5 even if you want to keep perl4 around.

By default, the perl5 libraries go into /usr/local/lib/perl5/, so they don‘t override the perl4 libraries in

/usr/local/lib/perl/.

In your /usr/local/bin directory, you should have a binary named perl4.036. That will not be touched by the

perl5 installation process. Most perl4 scripts should run just fine under perl5. However, if you have any

scripts that require perl4, you can replace the #! line at the top of them by #!/usr/local/bin/perl4.036 (or

whatever the appropriate pathname is). See pod/perltrap.pod for possible problems running perl4 scripts

under perl5.

cd /usr/include; h2ph *.h sys/*.h

Some perl scripts need to be able to obtain information from the system header files. This command will

convert the most commonly used header files in /usr/include into files that can be easily interpreted by perl.

These files will be placed in the architecture−dependent library ($archlib) directory you specified to

Configure.

Note: Due to differences in the C and perl languages, the conversion of the header files is not perfect. You

will probably have to hand−edit some of the converted files to get them to parse correctly. For example,

h2ph breaks spectacularly on type casting and certain structures.

installhtml −−help

Some sites may wish to make perl documentation available in HTML format. The installhtml utility can be

used to convert pod documentation into linked HTML files and install them.

The following command−line is an example of one used to convert perl documentation:

./installhtml \

−−podroot=. \

−−podpath=lib:ext:pod:vms \

−−recurse \

−−htmldir=/perl/nmanual \

−−htmlroot=/perl/nmanual \

−−splithead=pod/perlipc \

−−splititem=pod/perlfunc \

−−libpods=perlfunc:perlguts:perlvar:perlrun:perlop \

−−verbose

See the documentation in installhtml for more details. It can take many minutes to execute a large

installation and you should expect to see warnings like "no title", "unexpected directive" and "cannot

resolve" as the files are processed. We are aware of these problems (and would welcome patches for them).

You may find it helpful to run installhtml twice. That should reduce the number of "cannot resolve"

warnings.

cd pod && make tex && (process the latex files)

Some sites may also wish to make the documentation in the pod/ directory available in TeX format. Type

(cd pod && make tex && <process the latex files>)

Reporting Problems

If you have difficulty building perl, and none of the advice in this file helps, and careful reading of the error

message and the relevant manual pages on your system doesn‘t help either, then you should send a message

to either the comp.lang.perl.misc newsgroup or to perlbug@perl.com with an accurate description of your

problem.

Please include the output of the ./myconfig shell script that comes with the distribution. Alternatively, you

can use the perlbug program that comes with the perl distribution, but you need to have perl compiled before

you can use it. (If you have not installed it yet, you need to run ./perl −Ilib utils/perlbug

instead of a plain perlbug.)

You might also find helpful information in the Porting directory of the perl distribution.

DOCUMENTATION

Read the manual entries before running perl. The main documentation is in the pod/ subdirectory and should

have been installed during the build process. Type man perl to get started. Alternatively, you can type

perldoc perl to use the supplied perldoc script. This is sometimes useful for finding things in the library

modules.

Under UNIX, you can produce a documentation book in postscript form, along with its table of contents, by

going to the pod/ subdirectory and running (either):

./roffitall −groff # If you have GNU groff installed

./roffitall −psroff # If you have psroff

This will leave you with two postscript files ready to be printed. (You may need to fix the roffitall command

to use your local troff set−up.)

Note that you must have performed the installation already before running the above, since the script collects

the installed files to generate the documentation.

AUTHOR

Original author: Andy Dougherty <doughera@lafayette.edu>, borrowing very heavily from the original

README by Larry Wall, with lots of helpful feedback and additions from the perl5−porters@perl.org folks.

If you have problems, corrections, or questions, please see "Reporting Problems" above.

REDISTRIBUTION

This document is part of the Perl package and may be distributed under the same terms as perl itself.

If you are distributing a modified version of perl (perhaps as part of a larger package) please do modify these

installation instructions and the contact information to match your distribution.

LAST MODIFIED

$Id: INSTALL,v 1.42 1998/07/15 18:04:44 doughera Released $

perlfaq Perl Programmers Reference Guide perlfaq

NAME

perlfaq − frequently asked questions about Perl ($Date: 1998/08/05 12:09:32 $)

DESCRIPTION

This document is structured into the following sections:

perlfaq: Structural overview of the FAQ.

This document.

perlfaq1: General Questions About Perl

Very general, high−level information about Perl.

perlfaq2: Obtaining and Learning about Perl

Where to find source and documentation to Perl, support, and related matters.

perlfaq3: Programming Tools

Programmer tools and programming support.

perlfaq4: Data Manipulation

Manipulating numbers, dates, strings, arrays, hashes, and miscellaneous data issues.

perlfaq5: Files and Formats

I/O and the "f" issues: filehandles, flushing, formats and footers.

perlfaq6: Regexps

Pattern matching and regular expressions.

perlfaq7: General Perl Language Issues

General Perl language issues that don’t clearly fit into any of the other sections.

perlfaq8: System Interaction

Interprocess communication (IPC), control over the user−interface (keyboard, screen and pointing

devices).

perlfaq9: Networking

Networking, the Internet, and a few on the web.

Where to get this document

This document is posted regularly to comp.lang.perl.announce and several other related newsgroups. It is

available in a variety of formats from CPAN in the /CPAN/doc/FAQs/FAQ/ directory, or on the web at

http://www.perl.com/perl/faq/ .

How to contribute to this document

You may mail corrections, additions, and suggestions to perlfaq-suggestions@perl.com. This alias should

not be used to ask FAQs. It‘s for fixing the current FAQ.

What will happen if you mail your Perl programming problems to the authors

Your questions will probably go unread, unless they‘re suggestions of new questions to add to the FAQ, in

which case they should have gone to perlfaq-suggestions@perl.com instead.

You should have read section 2 of this FAQ. There you would have learned that comp.lang.perl.misc is the

appropriate place to go for free advice. If your question is really important and you require a prompt and

correct answer, you should hire a consultant.

Credits

When I first began the Perl FAQ in the late 80s, I never realized it would grow to over a hundred

pages, nor that Perl would ever become so popular and widespread. This document could not have been

written without the tremendous help provided by Larry Wall and the rest of the Perl Porters.

Author and Copyright Information

Bundled Distributions

When included as part of the Standard Version of Perl, or as part of its complete documentation whether

printed or otherwise, this work may be distributed only under the terms of Perl‘s Artistic License. Any

distribution of this file or derivatives thereof outside of that package requires that special arrangements be

made with the copyright holder.

Irrespective of its distribution, all code examples in these files are hereby placed into the public domain.

You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit.

A simple comment in the code giving credit would be courteous but is not required.

Disclaimer

This information is offered in good faith and in the hope that it may be of use, but is not guaranteed to be

correct, up to date, or suitable for any particular purpose whatsoever. The authors accept no liability in

respect of this information or its use.

Changes

22/June/98

Significant changes throughout in preparation for the 5.005 release.

24/April/97

Style and whitespace changes from Chip, new question on reading one character at a time from a

terminal using POSIX from Tom.

23/April/97

Added http://www.oasis.leo.org/perl/ to perlfaq2. Style fix to perlfaq3. Added floating point

precision, fixed complex number arithmetic, cross−references, caveat for Text::Wrap, alternative

answer for initial capitalizing, fixed incorrect regexp, added example of Tie::IxHash to perlfaq4.

Added example of passing and storing filehandles, added commify to perlfaq5. Restored variable

suicide, and added mass commenting to perlfaq7. Added Net::Telnet, fixed backticks, added

reader/writer pair to telnet question, added FindBin, grouped module questions together in perlfaq8.

Expanded caveats for the simple URL extractor, gave LWP example, added CGI security question,

expanded on the mail address answer in perlfaq9.

25/March/97

Added more info to the binary distribution section of perlfaq2. Added Net::Telnet to perlfaq6. Fixed

typos in perlfaq8. Added mail sending example to perlfaq9. Added Merlyn‘s columns to perlfaq2.

18/March/97

Added the DATE to the NAME section, indicating which sections have changed.

Mentioned SIGPIPE and perlipc in the forking open answer in perlfaq8.

Fixed description of a regular expression in perlfaq4.

17/March/97 Version

Various typos fixed throughout.

Added new question on Perl BNF to perlfaq7.

Initial Release: 11/March/97

This is the initial release of version 3 of the FAQ; consequently there have been no changes since its

initial release.

perlfaq1 Perl Programmers Reference Guide perlfaq1

NAME

perlfaq1 − General Questions About Perl ($Revision: 1.15 $, $Date: 1998/08/05 11:52:24 $)

DESCRIPTION

This section of the FAQ answers very general, high−level questions about Perl.

What is Perl?

Perl is a high−level programming language with an eclectic heritage written by Larry Wall and a cast of

thousands. It derives from the ubiquitous C programming language and to a lesser extent from sed, awk, the

Unix shell, and at least a dozen other tools and languages. Perl‘s process, file, and text manipulation facilities

make it particularly well−suited for tasks involving quick prototyping, system utilities, software tools,

system management tasks, database access, graphical programming, networking, and world wide web

programming. These strengths make it especially popular with system administrators and CGI script authors,

but mathematicians, geneticists, journalists, and even managers also use Perl. Maybe you should, too.

Who supports Perl? Who develops it? Why is it free?

The original culture of the pre−populist Internet and the deeply−held beliefs of Perl‘s author, Larry Wall,

gave rise to the free and open distribution policy of perl. Perl is supported by its users. The core, the

standard Perl library, the optional modules, and the documentation you‘re reading now were all written by

volunteers. See the personal note at the end of the README file in the perl source distribution for more

details. See perlhist (new as of 5.005) for Perl‘s milestone releases.

In particular, the core development team (known as the Perl Porters) are a rag−tag band of highly altruistic

individuals committed to producing better software for free than you could hope to purchase for money.

You may snoop on pending developments via news://genetics.upenn.edu/perl.porters−gw/ and

http://www.frii.com/~gnat/perl/porters/summary.html.

While the GNU project includes Perl in its distributions, there‘s no such thing as "GNU Perl". Perl is not

produced nor maintained by the Free Software Foundation. Perl‘s licensing terms are also more open than

GNU software‘s tend to be.

You can get commercial support of Perl if you wish, although for most users the informal support will more

than suffice. See the answer to "Where can I buy a commercial version of perl?" for more information.

Which version of Perl should I use?

You should definitely use version 5. Version 4 is old, limited, and no longer maintained; its last patch

(4.036) was in 1992. The most recent production release is 5.005_01. Further references to the Perl

language in this document refer to this production release unless otherwise specified. There may be one or

more official bug fixes for 5.005_01 by the time you read this, and also perhaps some experimental versions

on the way to the next release.

What are perl4 and perl5?

Perl4 and perl5 are informal names for different versions of the Perl programming language. It‘s easier to

say "perl5" than it is to say "the 5(.004) release of Perl", but some people have interpreted this to mean

there‘s a language called "perl5", which isn‘t the case. Perl5 is merely the popular name for the fifth major

release (October 1994), while perl4 was the fourth major release (March 1991). There was also a perl1 (in

January 1988), a perl2 (June 1988), and a perl3 (October 1989).

The 5.0 release is, essentially, a complete rewrite of the perl source code from the ground up. It has been

modularized, object−oriented, tweaked, trimmed, and optimized until it almost doesn‘t look like the old

code. However, the interface is mostly the same, and compatibility with previous releases is very high.

To avoid the "what language is perl5?" confusion, some people prefer to simply use "perl" to refer to the

latest version of perl and avoid using "perl5" altogether. It‘s not really that big a deal, though.

See perlhist for a history of Perl revisions.

How stable is Perl?

Production releases, which incorporate bug fixes and new functionality, are widely tested before release.

Since the 5.000 release, we have averaged only about one production release per year.

Larry and the Perl development team occasionally make changes to the internal core of the language, but all

possible efforts are made toward backward compatibility. While not quite all perl4 scripts run flawlessly

under perl5, an update to perl should nearly never invalidate a program written for an earlier version of perl

(barring accidental bug fixes and the rare new keyword).

Is Perl difficult to learn?

No, Perl is easy to start learning — and easy to keep learning. It looks like most programming languages

you‘re likely to have experience with, so if you‘ve ever written a C program, an awk script, a shell script, or

even a BASIC program, you‘re already part way there.

Most tasks only require a small subset of the Perl language. One of the guiding mottos for Perl development

is "there‘s more than one way to do it" (TMTOWTDI, sometimes pronounced "tim toady"). Perl‘s learning

curve is therefore shallow (easy to learn) and long (there‘s a whole lot you can do if you really want).

Finally, Perl is (frequently) an interpreted language. This means that you can write your programs and test

them without an intermediate compilation step, allowing you to experiment and test/debug quickly and

easily. This ease of experimentation flattens the learning curve even more.

Things that make Perl easier to learn: Unix experience, almost any kind of programming experience, an

understanding of regular expressions, and the ability to understand other people‘s code. If there‘s something

you need to do, then it‘s probably already been done, and a working example is usually available for free.

Don‘t forget the new perl modules, either. They‘re discussed in Part 3 of this FAQ, along with the CPAN,

which is discussed in Part 2.

How does Perl compare with other languages like Java, Python, REXX, Scheme, or Tcl?

Favorably in some areas, unfavorably in others. Precisely which areas are good and bad is often a personal

choice, so asking this question on Usenet runs a strong risk of starting an unproductive Holy War.

Probably the best thing to do is try to write equivalent code to do a set of tasks. These languages have their

own newsgroups in which you can learn about (but hopefully not argue about) them.

Can I do [task] in Perl?

Perl is flexible and extensible enough for you to use on almost any task, from one−line file−processing tasks

to complex systems. For many people, Perl serves as a great replacement for shell scripting. For others, it

serves as a convenient, high−level replacement for most of what they‘d program in low−level languages like

C or C++. It‘s ultimately up to you (and possibly your management ...) which tasks you‘ll use Perl for and

which you won‘t.

If you have a library that provides an API, you can make any component of it available as just another Perl

function or variable using a Perl extension written in C or C++ and dynamically linked into your main perl

interpreter. You can also go the other direction, and write your main program in C or C++, and then link in

some Perl code on the fly, to create a powerful application.

That said, there will always be small, focused, special−purpose languages dedicated to a specific problem

domain that are simply more convenient for certain kinds of problems. Perl tries to be all things to all

people, but nothing special to anyone. Examples of specialized languages that come to mind include prolog

and matlab.

When shouldn‘t I program in Perl?

When your manager forbids it — but do consider replacing them :−).

Actually, one good reason is when you already have an existing application written in another language

that‘s all done (and done well), or you have an application language specifically designed for a certain task

(e.g. prolog, make).

For various reasons, Perl is probably not well−suited for real−time embedded systems, low−level operating

systems development work like device drivers or context−switching code, complex multithreaded

shared−memory applications, or extremely large applications. You‘ll notice that perl is not itself written in

Perl.

The new native−code compiler for Perl may reduce the limitations given in the previous statement to some

degree, but understand that Perl remains fundamentally a dynamically typed language, and not a statically

typed one. You certainly won‘t be chastised if you don‘t trust nuclear−plant or brain−surgery monitoring

code to it. And Larry will sleep easier, too, Wall Street programs notwithstanding. :−)

What‘s the difference between "perl" and "Perl"?

One bit. Oh, you weren‘t talking ASCII? :−) Larry now uses "Perl" to signify the language proper and "perl"

the implementation of it, i.e. the current interpreter. Hence Tom‘s quip that "Nothing but perl can parse

Perl." You may or may not choose to follow this usage. For example, parallelism means "awk and perl" and

"Python and Perl" look ok, while "awk and Perl" and "Python and perl" do not.

Is it a Perl program or a Perl script?

It doesn‘t matter.

In "standard terminology" a program has been compiled to physical machine code once, and can then be

run multiple times, whereas a script must be translated by a program each time it‘s used. Perl programs,

however, are usually neither strictly compiled nor strictly interpreted. They can be compiled to a byte code

form (something of a Perl virtual machine) or to completely different languages, like C or assembly

language. You can‘t tell just by looking whether the source is destined for a pure interpreter, a parse−tree

interpreter, a byte code interpreter, or a native−code compiler, so it‘s hard to give a definitive answer here.

What is a JAPH?

These are the "just another perl hacker" signatures that some people sign their postings with. About 100 of

the earlier ones are available from http://www.perl.com/CPAN/misc/japh .

Where can I get a list of Larry Wall witticisms?

Over a hundred quips by Larry, from postings of his or source code, can be found at

http://www.perl.com/CPAN/misc/lwall−quotes .

How can I convince my sysadmin/supervisor/employees to use version (5/5.005/Perl instead of

some other language)?

If your manager or employees are wary of unsupported software, or software which doesn‘t officially ship

with your Operating System, you might try to appeal to their self−interest. If programmers can be more

productive using and utilizing Perl constructs, functionality, simplicity, and power, then the typical

manager/supervisor/employee may be persuaded. Regarding using Perl in general, it‘s also sometimes

helpful to point out that delivery times may be reduced using Perl, as compared to other languages.

If you have a project which has a bottleneck, especially in terms of translation or testing, Perl almost

certainly will provide a viable and quick solution. In conjunction with any persuasion effort, you should not

fail to point out that Perl is used, quite extensively, and with extremely reliable and valuable results, at many

large computer software and/or hardware companies throughout the world. In fact, many Unix vendors now

ship Perl by default, and support is usually just a news−posting away, if you can‘t find the answer in the

comprehensive documentation, including this FAQ.

If you face reluctance to upgrade from an older version of perl, then point out that version 4 is utterly

unmaintained and unsupported by the Perl Development Team. Another big sell for Perl5 is the large

number of modules and extensions which greatly reduce development time for any given task. Also mention

that the difference between version 4 and version 5 of Perl is like the difference between awk and C++.

(Well, ok, maybe not quite that distinct, but you get the idea.) If you want support and a reasonable

guarantee that what you‘re developing will continue to work in the future, then you have to run the supported

version. That probably means running the 5.005 release, although 5.004 isn‘t that bad (it‘s just one year and

one release behind). Several important bugs were fixed from the 5.000 through 5.003 versions, though, so

try upgrading past them if possible.

Of particular note is the massive bughunt for buffer overflow problems that went into the 5.004 release. All

releases prior to that, including perl4, are considered insecure and should be upgraded as soon as possible.

AUTHOR AND COPYRIGHT

When included as an integrated part of the Standard Distribution of Perl or of its documentation (printed or

otherwise), this work is covered under Perl‘s Artistic License. For separate distributions of all or part of

this FAQ outside of that, see perlfaq.

Irrespective of its distribution, all code examples here are public domain. You are permitted and encouraged

to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A

simple comment in the code giving credit to the FAQ would be courteous but is not required.

perlfaq2 Perl Programmers Reference Guide perlfaq2

NAME

perlfaq2 − Obtaining and Learning about Perl ($Revision: 1.25 $, $Date: 1998/08/05 11:47:25 $)

DESCRIPTION

This section of the FAQ answers questions about where to find source and documentation for Perl, support,

and related matters.

What machines support Perl? Where do I get it?

The standard release of Perl (the one maintained by the perl development team) is distributed only in source

code form. You can find this at http://www.perl.com/CPAN/src/latest.tar.gz, which is in standard Internet

format (a gzipped archive in POSIX tar format).

Perl builds and runs on a bewildering number of platforms. Virtually all known and current Unix derivatives

are supported (Perl‘s native platform), as are proprietary systems like VMS, DOS, OS/2, Windows, QNX,

BeOS, and the Amiga. There are also the beginnings of support for MPE/iX.

Binary distributions for some proprietary platforms, including Apple systems, can be found in the

http://www.perl.com/CPAN/ports/ directory. Because these are not part of the standard distribution, they

may and in fact do differ from the base Perl port in a variety of ways. You‘ll have to check their respective

release notes to see just what the differences are. These differences can be either positive (e.g. extensions for

the features of the particular platform that are not supported in the source release of perl) or negative (e.g.

they might be based upon a less current source release of perl).

A useful FAQ for Win32 Perl users is

http://www.endcontsw.com/people/evangelo/Perl_for_Win32_FAQ.html

How can I get a binary version of Perl?

If you don‘t have a C compiler because for whatever reasons your vendor did not include one with your

system, the best thing to do is grab a binary version of gcc from the net and use that to compile perl with.

CPAN only has binaries for systems that are terribly hard to get free compilers for, not for Unix systems.

Your first stop should be http://www.perl.com/CPAN/ports to see what information is already available. A

simple installation guide for MS−DOS is available at http://www.cs.ruu.nl/~piet/perl5dos.html , and

similarly for Windows 3.1 at http://www.cs.ruu.nl/~piet/perlwin3.html .

I don‘t have a C compiler on my system. How can I compile perl?

Since you don‘t have a C compiler, you‘re doomed and your vendor should be sacrificed to the Sun gods.

But that doesn‘t help you.

What you need to do is get a binary version of gcc for your system first. Consult the Usenet FAQs for your

operating system for information on where to get such a binary version.

I copied the Perl binary from one machine to another, but scripts don‘t work.

That‘s probably because you forgot libraries, or library paths differ. You really should build the whole

distribution on the machine it will eventually live on, and then type make install. Most other

approaches are doomed to failure.

One simple way to check that things are in the right place is to print out the hard−coded @INC which perl is

looking for.

perl -e 'print join("\n",@INC)'

If this command lists any paths which don‘t exist on your system, then you may need to move the

appropriate libraries to these locations, or create symlinks, aliases, or shortcuts appropriately.
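
The same check can be automated. What follows is only a sketch, not part of the perl distribution: the subroutine name missing_dirs is made up for this illustration. It reports every entry of @INC that is not an existing directory on the machine where it is run.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not a core function): return the entries of a
# search path that are not existing directories. Reference entries
# (such as code-reference hooks) are ignored.
sub missing_dirs {
    my @path = @_;
    return grep { !ref($_) && !-d $_ } @path;
}

my @missing = missing_dirs(@INC);
if (@missing) {
    print "Directories listed in \@INC but not found:\n";
    print "  $_\n" foreach @missing;
} else {
    print "Every directory in \@INC exists.\n";
}
```

A missing directory is not always an error, since perl is often built with version- or site-specific library paths that are simply never created. Treat the list as a starting point for investigation.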

You might also want to check out How do I keep my own module/library directory? in perlfaq8.

I grabbed the sources and tried to compile but gdbm/dynamic loading/malloc/linking/... failed.

How do I make it work?

Read the INSTALL file, which is part of the source distribution. It describes in detail how to cope with most

idiosyncrasies that the Configure script can‘t work around for any given system or architecture.

What modules and extensions are available for Perl? What is CPAN? What does CPAN/src/...

mean?

CPAN stands for Comprehensive Perl Archive Network, a huge archive replicated on dozens of machines all

over the world. CPAN contains source code, non−native ports, documentation, scripts, and many

third−party modules and extensions, designed for everything from commercial database interfaces to

keyboard/screen control to web walking and CGI scripts. The master machine for CPAN is

ftp://ftp.funet.fi/pub/languages/perl/CPAN/, but you can use the address

http://www.perl.com/CPAN/CPAN.html to fetch a copy from a "site near you". See

http://www.perl.com/CPAN (without a slash at the end) for how this process works.

CPAN/path/... is a naming convention for files available on CPAN sites. CPAN indicates the base directory

of a CPAN mirror, and the rest of the path is the path from that directory to the file. For instance, if you‘re

using ftp://ftp.funet.fi/pub/languages/perl/CPAN as your CPAN site, the file CPAN/misc/japh is

downloadable as ftp://ftp.funet.fi/pub/languages/perl/CPAN/misc/japh .
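
To make the convention concrete, here is a small sketch of the mapping in Perl. The subroutine name cpan_url is hypothetical and exists only for this illustration; the mirror address is the one used in the example above.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: expand a "CPAN/..." name into a full URL
# for a given mirror base directory.
sub cpan_url {
    my ($mirror, $name) = @_;
    $mirror =~ s{/+\z}{};               # drop any trailing slash from the mirror base
    (my $rel = $name) =~ s{^CPAN/}{};   # strip the symbolic "CPAN/" prefix
    return "$mirror/$rel";
}

print cpan_url("ftp://ftp.funet.fi/pub/languages/perl/CPAN", "CPAN/misc/japh"), "\n";
```

Substituting the base directory of a different mirror gives the location of the same file on that mirror.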

Considering that there are hundreds of existing modules in the archive, one probably exists to do nearly

anything you can think of. Current categories under CPAN/modules/by−category/ include perl core modules;

development support; operating system interfaces; networking, devices, and interprocess communication;

data type utilities; database interfaces; user interfaces; interfaces to other languages; filenames, file systems,

and file locking; internationalization and locale; world wide web support; server and daemon utilities;

archiving and compression; image manipulation; mail and news; control flow utilities; filehandle and I/O;

Microsoft Windows modules; and miscellaneous modules.

Is there an ISO or ANSI certified version of Perl?

Certainly not. Larry expects that he‘ll be certified before Perl is.

Where can I get information on Perl?

The complete Perl documentation is available with the perl distribution. If you have perl installed locally,

you probably have the documentation installed as well: type man perl if you‘re on a system resembling

Unix. This will lead you to other important man pages, including how to set your $MANPATH. If you‘re not

on a Unix system, access to the documentation will be different; for example, it might be only in HTML

format. But all proper perl installations have fully−accessible documentation.

You might also try perldoc perl in case your system doesn‘t have a proper man command, or it‘s been

misinstalled. If that doesn‘t work, try looking in /usr/local/lib/perl5/pod for documentation.

If all else fails, consult the CPAN/doc directory, which contains the complete documentation in various

formats, including native pod, troff, html, and plain text. There‘s also a web page at

http://www.perl.com/perl/info/documentation.html that might help.

Many good books have been written about Perl — see the section below for more details.

What are the Perl newsgroups on USENET? Where do I post questions?

The now defunct comp.lang.perl newsgroup has been superseded by the following groups:

comp.lang.perl.announce             Moderated announcement group
comp.lang.perl.misc                 Very busy group about Perl in general
comp.lang.perl.moderated            Moderated discussion group
comp.lang.perl.modules              Use and development of Perl modules
comp.lang.perl.tk                   Using Tk (and X) from Perl
comp.infosystems.www.authoring.cgi  Writing CGI scripts for the Web.

Actually, the moderated group hasn‘t passed yet, but we‘re keeping our fingers crossed.

There is also a USENET gateway to the mailing list used by the crack Perl development team (perl5−porters)

at news://news.perl.com/perl.porters−gw/ .

Where should I post source code?

You should post source code to whichever group is most appropriate, but feel free to cross−post to

comp.lang.perl.misc. If you want to cross−post to alt.sources, please make sure it follows their posting

standards, including setting the Followup−To header line to NOT include alt.sources; see their FAQ for

details.

If you‘re just looking for software, first use Alta Vista, Deja News, and search CPAN. This is faster and

more productive than just posting a request.

Perl Books

A number of books on Perl and/or CGI programming are available. A few of these are good, some are ok,

but many aren‘t worth your money. Tom Christiansen maintains a list of these books, some with extensive

reviews, at http://www.perl.com/perl/critiques/index.html.

The incontestably definitive reference book on Perl, written by the creator of Perl, is now in its second

edition:

Programming Perl (the "Camel Book"):

Authors: Larry Wall, Tom Christiansen, and Randal Schwartz

ISBN 1−56592−149−6 (English)

ISBN 4−89052−384−7 (Japanese)

URL: http://www.oreilly.com/catalog/pperl2/

(French, German, Italian, and Hungarian translations also

available)

The companion volume to the Camel containing thousands of real−world examples, mini−tutorials, and

complete programs (first premiering at the 1998 Perl Conference), is:

The Perl Cookbook (the "Ram Book"):

Authors: Tom Christiansen and Nathan Torkington,

with Foreword by Larry Wall

ISBN: 1−56592−243−3

URL: http://perl.oreilly.com/cookbook/

If you‘re already a hard−core systems programmer, then the Camel Book might suffice for you to learn Perl

from. But if you‘re not, check out:

Learning Perl (the "Llama Book"):

Authors: Randal Schwartz and Tom Christiansen

with Foreword by Larry Wall

ISBN: 1−56592−284−0

URL: http://www.oreilly.com/catalog/lperl2/

Despite the picture at the URL above, the second edition of the "Llama Book" really has a blue cover, and is

updated for the 5.004 release of Perl. Various foreign language editions are available, including Learning

Perl on Win32 Systems (the Gecko Book).

If you‘re not an accidental programmer, but a more serious and possibly even degreed computer scientist

who doesn‘t need as much hand−holding as we try to provide in the Llama or its defurred cousin the Gecko,

please check out the delightful book, Perl: The Programmer‘s Companion, written by Nigel Chapman.

You can order O‘Reilly books directly from O‘Reilly & Associates, 1−800−998−9938. Local/overseas is

1−707−829−0515. If you can locate an O‘Reilly order form, you can also fax to 1−707−829−0104. See

http://www.ora.com/ on the Web.

What follows is a list of the books that the FAQ authors found personally useful. Your mileage may (but, we

hope, probably won‘t) vary.

Recommended books on (or muchly on) Perl follow; those marked with a star may be ordered from

O‘Reilly.

References

*Programming Perl

by Larry Wall, Tom Christiansen, and Randal L. Schwartz

*Perl 5 Desktop Reference

By Johan Vromans

Tutorials

*Learning Perl [2nd edition]

by Randal L. Schwartz and Tom Christiansen

with foreword by Larry Wall

*Learning Perl on Win32 Systems

by Randal L. Schwartz, Erik Olson, and Tom Christiansen,

with foreword by Larry Wall

Perl: The Programmer’s Companion

by Nigel Chapman

Cross−Platform Perl

by Eric F. Johnson

MacPerl: Power and Ease

by Vicki Brown and Chris Nandor, foreword by Matthias Neeracher

Task−Oriented

*The Perl Cookbook

by Tom Christiansen and Nathan Torkington

with foreword by Larry Wall

Perl5 Interactive Course [2nd edition]

by Jon Orwant

*Advanced Perl Programming

by Sriram Srinivasan

Effective Perl Programming

by Joseph Hall

Special Topics

*Mastering Regular Expressions

by Jeffrey Friedl

How to Set up and Maintain a World Wide Web Site [2nd edition]

by Lincoln Stein

Perl in Magazines

The first and only periodical devoted to All Things Perl, The Perl Journal contains tutorials, demonstrations,

case studies, announcements, contests, and much more. TPJ has columns on web development, databases,

Win32 Perl, graphical programming, regular expressions, and networking, and sponsors the Obfuscated Perl

Contest. It is published quarterly under the gentle hand of its editor, Jon Orwant. See http://www.tpj.com/

or send mail to subscriptions@tpj.com.

Beyond this, magazines that frequently carry high−quality articles on Perl are Web Techniques (see

http://www.webtechniques.com/), Performance Computing (http://www.performance−computing.com/), and

Usenix‘s newsletter/magazine to its members, login:, at http://www.usenix.org/. Randal‘s Web Techniques

columns are available on the web at http://www.stonehenge.com/merlyn/WebTechniques/.

Perl on the Net: FTP and WWW Access

To get the best (and possibly cheapest) performance, pick a site from the list below and use it to grab the

complete list of mirror sites. From there you can find the quickest site for you. Remember, the following list

is not the complete list of CPAN mirrors.

http://www.perl.com/CPAN (redirects to another mirror)

http://www.perl.org/CPAN

ftp://ftp.funet.fi/pub/languages/perl/CPAN/

http://www.cs.ruu.nl/pub/PERL/CPAN/

ftp://ftp.cs.colorado.edu/pub/perl/CPAN/

What mailing lists are there for perl?

Most of the major modules (tk, CGI, libwww−perl) have their own mailing lists. Consult the documentation

that came with the module for subscription information. The following is a list of mailing lists related to

perl itself.

If you subscribe to a mailing list, it behooves you to know how to unsubscribe from it. Strident pleas to the

list itself to get you off will not be favorably received.

MacPerl

There is a mailing list for discussing Macintosh Perl. Contact "mac−perl−request@iis.ee.ethz.ch".

Also see Matthias Neeracher‘s (the creator and maintainer of MacPerl) webpage at

http://www.iis.ee.ethz.ch/~neeri/macintosh/perl.html for many links to interesting MacPerl sites, and

the applications/MPW tools, precompiled.

Perl5−Porters

The core development team have a mailing list for discussing fixes and changes to the language. Send

mail to "perl5−porters−request@perl.org" with help in the body of the message for information on

subscribing.

NTPerl

This list is used to discuss issues involving Win32 Perl 5 (Windows NT and Win95). Subscribe by

mailing ListManager@ActiveWare.com with the message body:

subscribe Perl-Win32-Users

The list software, also written in perl, will automatically determine your address, and subscribe you

automatically. To unsubscribe, mail the following in the message body to the same address like so:

unsubscribe Perl-Win32-Users

You can also check http://www.activeware.com/ and select "Mailing Lists" to join or leave this list.

Perl−Packrats

Discussion related to archiving of perl materials, particularly the Comprehensive Perl Archive

Network (CPAN). Subscribe by emailing majordomo@cis.ufl.edu:

    subscribe perl-packrats

The list software, also written in perl, will determine your address and subscribe you

automatically. To unsubscribe, simply prepend "un" to the same command and mail it to the same

address, like so:

    unsubscribe perl-packrats

Archives of comp.lang.perl.misc

Have you tried Deja News or Alta Vista?

ftp.cis.ufl.edu:/pub/perl/comp.lang.perl.*/monthly has an almost complete collection dating back to 12/89

(missing 08/91 through 12/93). They are kept as one large file for each month.

You'll probably want a more sophisticated query and retrieval mechanism than a file listing, preferably one

that allows you to retrieve articles using fast-access indices, keyed on at least author, date, subject, thread

(as in "trn") and probably keywords. The best solution the FAQ authors know of is the MH pick command,

but it is very slow to select on 18000 articles.

If you have, or know where can be found, the missing sections, please let perlfaq−suggestions@perl.com

know.

Where can I buy a commercial version of Perl?

In a sense, Perl already is commercial software: It has a licence that you can grab and carefully read to your

manager. It is distributed in releases and comes in well−defined packages. There is a very large user

community and an extensive literature. The comp.lang.perl.* newsgroups and several of the mailing lists

provide free answers to your questions in near real−time. Perl has traditionally been supported by Larry,

dozens of software designers and developers, and thousands of programmers, all working for free to create a

useful thing to make life better for everyone.

However, these answers may not suffice for managers who require a purchase order from a company whom

they can sue should anything go wrong. Or maybe they need very serious hand−holding and contractual

obligations. Shrink−wrapped CDs with perl on them are available from several sources if that will help.

Or you can purchase a real support contract. Although Cygnus historically provided this service, they no

longer sell support contracts for Perl. Instead, the Paul Ingram Group will be taking up the slack through The

Perl Clinic. The following is a commercial from them:

"Do you need professional support for Perl and/or Oraperl? Do you need a support contract with defined

levels of service? Do you want to pay only for what you need?

"The Paul Ingram Group has provided quality software development and support services to some of the

world‘s largest corporations for ten years. We are now offering the same quality support services for Perl at

The Perl Clinic. This service is led by Tim Bunce, an active perl porter since 1994 and well known as the

author and maintainer of the DBI, DBD::Oracle, and Oraperl modules and author/co−maintainer of The Perl

5 Module List. We also offer Oracle users support for Perl5 Oraperl and related modules (which Oracle is

planning to ship as part of Oracle Web Server 3). 20% of the profit from our Perl support work will be

donated to The Perl Institute."

For more information, contact The Perl Clinic:

Tel: +44 1483 424424

Fax: +44 1483 419419

Web: http://www.perl.co.uk/

Email: perl−support−info@perl.co.uk or Tim.Bunce@ig.co.uk

See also www.perl.com for updates on training and support.

Where do I send bug reports?

If you are reporting a bug in the perl interpreter or the modules shipped with perl, use the perlbug program in

the perl distribution or mail your report to perlbug@perl.com.

If you are posting a bug with a non−standard port (see the answer to "What platforms is Perl available for?"),

a binary distribution, or a non−standard module (such as Tk, CGI, etc), then please see the documentation

that came with it to determine the correct place to post bugs.

Read the perlbug(1) man page (perl5.004 or later) for more information.

What is perl.com? perl.org? The Perl Institute?

The perl.com domain is managed by Tom Christiansen, who created it as a public service long before

perl.org came about. Despite the name, it‘s a pretty non−commercial site meant to be a clearinghouse for

information about all things Perlian, accepting no paid advertisements, bouncy happy gifs, or silly java

applets on its pages. The Perl Home Page at http://www.perl.com/ is currently hosted on a T3 line courtesy

of Songline Systems, a software−oriented subsidiary of O‘Reilly and Associates.

perl.org is the official vehicle for The Perl Institute. The motto of TPI is "helping people help Perl help

people" (or something like that). It‘s a non−profit organization supporting development, documentation, and

dissemination of perl.

How do I learn about object−oriented Perl programming?

perltoot (distributed with 5.004 or later) is a good place to start. Also, perlobj, perlref, and perlmod are

useful references, while perlbot has some excellent tips and tricks.

AUTHOR AND COPYRIGHT

When included as an integrated part of the Standard Distribution of Perl or of its documentation (printed or

otherwise), this work is covered under Perl's Artistic Licence. For separate distributions of all or part of

this FAQ outside of that, see perlfaq.

Irrespective of its distribution, all code examples here are public domain. You are permitted and encouraged

to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A

simple comment in the code giving credit to the FAQ would be courteous but is not required.

perlfaq3 Perl Programmers Reference Guide perlfaq3

NAME

perlfaq3 − Programming Tools ($Revision: 1.29 $, $Date: 1998/08/05 11:57:04 $)

DESCRIPTION

This section of the FAQ answers questions related to programmer tools and programming support.

How do I do (anything)?

Have you looked at CPAN (see perlfaq2)? The chances are that someone has already written a module that

can solve your problem. Have you read the appropriate man pages? Here‘s a brief index:

    Basics          perldata, perlvar, perlsyn, perlop, perlsub
    Execution       perlrun, perldebug
    Functions       perlfunc
    Objects         perlref, perlmod, perlobj, perltie
    Data Structures perlref, perllol, perldsc
    Modules         perlmod, perlmodlib, perlsub
    Regexps         perlre, perlfunc, perlop, perllocale
    Moving to perl5 perltrap, perl
    Linking w/C     perlxstut, perlxs, perlcall, perlguts, perlembed
    Various         http://www.perl.com/CPAN/doc/FMTEYEWTK/index.html
                    (not a man-page but still useful)

perltoc provides a crude table of contents for the perl man page set.

How can I use Perl interactively?

The typical approach uses the Perl debugger, described in the perldebug(1) man page, on an "empty"

program, like this:

    perl -de 42

Now just type in any legal Perl code, and it will be immediately evaluated. You can also examine the

symbol table, get stack backtraces, check variable values, set breakpoints, and perform other operations

typically found in symbolic debuggers.

Is there a Perl shell?

In general, no. The Shell.pm module (distributed with perl) makes perl try commands which aren‘t part of

the Perl language as shell commands. perlsh from the source distribution is simplistic and uninteresting, but

may still be what you want.

How do I debug my Perl programs?

Have you used −w? It enables warnings for dubious practices.

Have you tried use strict? It prevents you from using symbolic references, makes you predeclare any

subroutines that you call as bare words, and (probably most importantly) forces you to predeclare your

variables with my or use vars.
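
As a rough sketch of what use strict buys you, the following declares its variables with my and then shows a misspelled name being rejected. The variable names are made up for illustration, and use warnings (the pragma form of -w) assumes perl 5.6 or later; on older perls, run with -w instead.

```perl
use strict;
use warnings;   # pragma form of -w; assumes perl 5.6 or later

my $total = 0;                 # declared with my, so strict is satisfied
for my $n (1 .. 4) {
    $total += $n;
}
print "total = $total\n";

# A misspelled name is rejected when the code is compiled instead of
# silently becoming a new global. The string eval defers that compile
# to run time so the failure can be shown without stopping the script.
my $ok = eval q{ $totl = 5; 1 };
print "strict caught: $@" unless $ok;
```

Without use strict, the assignment to the misspelled variable would quietly succeed and the bug would surface much later, if at all.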

Did you check the returns of each and every system call? The operating system (and thus Perl) tells you

whether they worked or not, and if not why.

    open(FH, "> /etc/cantwrite")
        or die "Couldn't write to /etc/cantwrite: $!\n";

Did you read perltrap? It‘s full of gotchas for old and new Perl programmers, and even has sections for

those of you who are upgrading from languages like awk and C.

Have you tried the Perl debugger, described in perldebug? You can step through your program and see what

it‘s doing and thus work out why what it‘s doing isn‘t what it should be doing.

How do I profile my Perl programs?

You should get the Devel::DProf module from CPAN, and also use Benchmark.pm from the standard

distribution. Benchmark lets you time specific portions of your code, while Devel::DProf gives detailed

breakdowns of where your code spends its time.

Here‘s a sample use of Benchmark:

    use Benchmark;

    @junk = `cat /etc/motd`;
    $count = 10_000;

    timethese($count, {
        'map' => sub { my @a = @junk;
                       map { s/a/b/ } @a;
                       return @a
                     },
        'for' => sub { my @a = @junk;
                       local $_;
                       for (@a) { s/a/b/ };
                       return @a },
    });

This is what it prints (on one machine—your results will be dependent on your hardware, operating system,

and the load on your machine):

Benchmark: timing 10000 iterations of for, map...

for: 4 secs ( 3.97 usr 0.01 sys = 3.98 cpu)

map: 6 secs ( 4.97 usr 0.00 sys = 4.97 cpu)

How do I cross−reference my Perl programs?

The B::Xref module, shipped with the new, alpha−release Perl compiler (not the general distribution prior to

the 5.005 release), can be used to generate cross−reference reports for Perl programs.

    perl -MO=Xref[,OPTIONS] scriptname.plx

Is there a pretty−printer (formatter) for Perl?

There is no program that will reformat Perl as much as indent(1) does for C. The complex feedback between

the scanner and the parser (this feedback is what confuses the vgrind and emacs programs) makes it

challenging at best to write a stand−alone Perl parser.

Of course, if you simply follow the guidelines in perlstyle, you shouldn‘t need to reformat. The habit of

formatting your code as you write it will help prevent bugs. Your editor can and should help you with this.

The perl−mode for emacs can provide a remarkable amount of help with most (but not all) code, and even

less programmable editors can provide significant assistance.

If you are used to using the vgrind program for printing out nice code to a laser printer, you can take a stab at

this using http://www.perl.com/CPAN/doc/misc/tips/working.vgrind.entry, but the results are not particularly

satisfying for sophisticated code.

Is there a ctags for Perl?

There‘s a simple one at http://www.perl.com/CPAN/authors/id/TOMC/scripts/ptags.gz which may do the

trick.

Where can I get Perl macros for vi?

For a complete version of Tom Christiansen‘s vi configuration file, see

http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/toms.exrc, the standard benchmark file for vi

emulators. This runs best with nvi, the current version of vi out of Berkeley, which incidentally can be built

with an embedded Perl interpreter — see http://www.perl.com/CPAN/src/misc.

Where can I get perl−mode for emacs?

Since Emacs version 19 patchlevel 22 or so, there have been both a perl−mode.el and support for the perl

debugger built in. These should come with the standard Emacs 19 distribution.

In the perl source directory, you‘ll find a directory called "emacs", which contains a cperl−mode that

color-codes keywords, provides context-sensitive help, and does other nifty things.

Note that the perl-mode of emacs will have fits with "main'foo" (single quote), and mess up the

indentation and highlighting. You should be using "main::foo" in new Perl code anyway, so this shouldn't

be an issue.

How can I use curses with Perl?

The Curses module from CPAN provides a dynamically loadable object module interface to a curses library.

A small demo can be found at the directory

http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/rep; this program repeats a command and

updates the screen as needed, rendering rep ps axu similar to top.

How can I use X or Tk with Perl?

Tk is a completely Perl−based, object−oriented interface to the Tk toolkit that doesn‘t force you to use Tcl

just to get at Tk. Sx is an interface to the Athena Widget set. Both are available from CPAN. See the

directory http://www.perl.com/CPAN/modules/by−category/08_User_Interfaces/

Invaluable for Perl/Tk programming are: the Perl/Tk FAQ at

http://w4.lns.cornell.edu/~pvhp/ptk/ptkTOC.html , the Perl/Tk Reference Guide available at

http://www.perl.com/CPAN−local/authors/Stephen_O_Lidie/ , and the online manpages at

http://www−users.cs.umn.edu/~amundson/perl/perltk/toc.html .

How can I generate simple menus without using CGI or Tk?

The http://www.perl.com/CPAN/authors/id/SKUNZ/perlmenu.v4.0.tar.gz module, which is curses−based,

can help with this.

What is undump?

See the next question, which covers undump in its answer.

How can I make my Perl program run faster?

The best way to do this is to come up with a better algorithm. This can often make a dramatic difference.

Chapter 8 in the Camel has some efficiency tips in it you might want to look at. Jon Bentley‘s book

‘‘Programming Pearls‘’ (that‘s not a misspelling!) has some good tips on optimization, too. Advice on

benchmarking boils down to: benchmark and profile to make sure you‘re optimizing the right part, look for

better algorithms instead of microtuning your code, and when all else fails consider just buying faster

hardware.

A different approach is to autoload seldom−used Perl code. See the AutoSplit and AutoLoader modules in

the standard distribution for that. Or you could locate the bottleneck and think about writing just that part in

C, the way we used to take bottlenecks in C code and write them in assembler. Similar to rewriting in C is

the use of modules that have critical sections written in C (for instance, the PDL module from CPAN).
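
To give a feel for the autoloading idea mentioned above, here is a hand-rolled sketch that compiles a sub only when it is first called. This is not how the AutoLoader module itself is used (it reads split-out files prepared by AutoSplit); the package name Lazy, the %source table, and the report sub are all hypothetical.

```perl
package Lazy;
use strict;
use warnings;
our $AUTOLOAD;

# Source text of rarely used subs; nothing here is compiled at startup.
my %source = (
    report => 'sub { return "report for @_" }',
);

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                       # strip the package prefix
    return if $name eq 'DESTROY';
    my $code = $source{$name} or die "No such sub: $name";
    my $sub  = eval $code or die $@;         # compile on first use
    no strict 'refs';
    *{"Lazy::$name"} = $sub;                 # install it so later calls are direct
    goto &$sub;                              # run it with the original arguments
}

package main;
print Lazy::report("Q3"), "\n";
```

After the first call the sub is installed in the symbol table, so AUTOLOAD is not consulted again for that name. The payoff is startup time, which only matters when the deferred code is large and seldom needed.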

In some cases, it may be worth it to use the backend compiler to produce byte code (saving compilation

time) or compile into C, which will certainly save compilation time and sometimes a small amount (but not

much) of execution time. See the question about compiling your Perl programs for more on the compiler; the

wins aren't as obvious as you'd hope.

If you‘re currently linking your perl executable to a shared libc.so, you can often gain a 10−25%

performance benefit by rebuilding it to link with a static libc.a instead. This will make a bigger perl

executable, but your Perl programs (and programmers) may thank you for it. See the INSTALL file in the

source distribution for more information.

Unsubstantiated reports allege that Perl interpreters that use sfio outperform those that don‘t (for IO intensive

applications). To try this, see the INSTALL file in the source distribution, especially the ‘‘Selecting File IO

mechanisms‘’ section.

The undump program was an old attempt to speed up your Perl program by storing the already−compiled

form to disk. This is no longer a viable option, as it only worked on a few architectures, and wasn‘t a good

solution anyway.

How can I make my Perl program take less memory?

When it comes to time−space tradeoffs, Perl nearly always prefers to throw memory at a problem. Scalars in

Perl use more memory than strings in C, arrays take more than that, and hashes use even more. While there's still

a lot to be done, recent releases have been addressing these issues. For example, as of 5.004, duplicate hash

keys are shared amongst all hashes using them, so require no reallocation.

In some cases, using substr() or vec() to simulate arrays can be highly beneficial. For example, an

array of a thousand booleans will take at least 20,000 bytes of space, but it can be turned into one 125−byte

bit vector for a considerable memory savings. The standard Tie::SubstrHash module can also help for

certain types of data structure. If you're working with specialist data structures (matrices, for instance),

modules that implement these in C may use less memory than equivalent Perl modules.
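
The bit-vector trick described above might look something like this with vec(). It is a minimal sketch; the variable names and the particular flag numbers are arbitrary.

```perl
use strict;
use warnings;

my $flags = '';                  # the bit vector lives in an ordinary string
vec($flags, 999, 1) = 0;         # touching the last bit sizes it for 1000 bits

vec($flags, $_, 1) = 1 for (3, 42, 998);    # set three of the flags

print "bytes used: ", length($flags), "\n"; # 1000 bits packed 8 to a byte
print "flag 42 is ", vec($flags, 42, 1), "\n";
print "flag 43 is ", vec($flags, 43, 1), "\n";

# The "%32b*" checksum form of unpack counts the bits that are set.
my $set = unpack("%32b*", $flags);
print "flags set: $set\n";
```

The cost is convenience: you give up array operations such as push and splice, and every access goes through vec().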

Another thing to try is learning whether your Perl was compiled with the system malloc or with Perl‘s builtin

malloc. Whichever one it is, try using the other one and see whether this makes a difference. Information

about malloc is in the INSTALL file in the source distribution. You can find out whether you are using

perl‘s malloc by typing perl −V:usemymalloc.

Is it unsafe to return a pointer to local data?

No, Perl‘s garbage collection system takes care of this.

    sub makeone {
        my @a = ( 1 .. 10 );
        return \@a;
    }

    for $i ( 1 .. 10 ) {
        push @many, makeone();
    }

    print $many[4][5], "\n";

    print "@many\n";

How can I free an array or hash so my program shrinks?

You can‘t. On most operating systems, memory allocated to a program can never be returned to the system.

That‘s why long−running programs sometimes re−exec themselves. Some operating systems (notably,

FreeBSD) allegedly reclaim large chunks of memory that are no longer used, but it doesn't appear to happen

with Perl (yet). The Mac appears to be the only platform that will reliably (albeit, slowly) return memory to

the OS.

However, judicious use of my() on your variables will help make sure that they go out of scope so that Perl

can free up their storage for use in other parts of your program. A global variable, of course, never goes out

of scope, so you can‘t get its space automatically reclaimed, although undef()ing and/or delete()ing it

will achieve the same effect. In general, memory allocation and de−allocation isn‘t something you can or

should be worrying about much in Perl, but even this capability (preallocation of data types) is in the works.
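
A small sketch of the difference, using made-up names: the lexical array is released for reuse when the sub returns, while the globals have to be emptied by hand. Note that this frees the storage for reuse within the program; as said above, it does not necessarily shrink the process.

```perl
use strict;
use warnings;

our @cache = (1 .. 100_000);          # package global: never goes out of scope
our %seen  = (alpha => 1, beta => 2);

sub summarize {
    my @scratch = map { $_ * 2 } 1 .. 1000;   # lexical: released when the sub returns
    return scalar @scratch;
}

my $count = summarize();
print "summarized $count items\n";

undef @cache;            # hand the global array's storage back to perl
delete $seen{alpha};     # drop a single hash entry
print "cache holds ", scalar(@cache), " items, seen has ",
      scalar(keys %seen), " key(s)\n";
```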

How can I make my CGI script more efficient?

Beyond the normal measures described to make general Perl programs faster or smaller, a CGI program has

additional issues. It may be run several times per second. Given that each time it runs it will need to be

re−compiled and will often allocate a megabyte or more of system memory, this can be a killer. Compiling

into C isn‘t going to help you because the process start−up overhead is where the bottleneck is.

There are two popular ways to avoid this overhead. One solution involves running the Apache HTTP server

(available from http://www.apache.org/) with either of the mod_perl or mod_fastcgi plugin modules.

With mod_perl and the Apache::Registry module (distributed with mod_perl), httpd will run with an

embedded Perl interpreter which pre−compiles your script and then executes it within the same address

space without forking. The Apache extension also gives Perl access to the internal server API, so modules

written in Perl can do just about anything a module written in C can. For more on mod_perl, see

http://perl.apache.org/

With the FCGI module (from CPAN), a Perl executable compiled with sfio (see the INSTALL file in the

distribution) and the mod_fastcgi module (available from http://www.fastcgi.com/) each of your perl scripts

becomes a permanent CGI daemon process.

Both of these solutions can have far−reaching effects on your system and on the way you write your CGI

scripts, so investigate them with care.

See http://www.perl.com/CPAN/modules/by−category/15_World_Wide_Web_HTML_HTTP_CGI/ .

A non-free, commercial product, "The Velocity Engine for Perl" (http://www.binevolve.com/ or

http://www.binevolve.com/bine/vep), might also be worth looking at. It will allow you to increase the

performance of your perl scripts, up to 25 times faster than normal CGI perl by running in persistent perl

mode, or 4 to 5 times faster without any modification to your existing CGI scripts. Fully functional

evaluation copies are available from the web site.

How can I hide the source for my Perl program?

Delete it. :−) Seriously, there are a number of (mostly unsatisfactory) solutions with varying levels of

‘‘security‘’.

First of all, however, you can‘t take away read permission, because the source code has to be readable in

order to be compiled and interpreted. (That doesn‘t mean that a CGI script‘s source is readable by people on

the web, though, only by people with access to the filesystem.) So you have to leave the permissions at the

socially friendly 0755 level.

Some people regard this as a security problem. If your program does insecure things, and relies on people

not knowing how to exploit those insecurities, it is not secure. It is often possible for someone to determine

the insecure things and exploit them without viewing the source. Security through obscurity, the name for

hiding your bugs instead of fixing them, is little security indeed.

You can try using encryption via source filters (Filter::* from CPAN), but crackers might be able to decrypt

it. You can try using the byte code compiler and interpreter described below, but crackers might be able to

de−compile it. You can try using the native−code compiler described below, but crackers might be able to

disassemble it. These pose varying degrees of difficulty to people wanting to get at your code, but none can

definitively conceal it (this is true of every language, not just Perl).

If you‘re concerned about people profiting from your code, then the bottom line is that nothing but a

restrictive licence will give you legal security. License your software and pepper it with threatening

statements like ‘‘This is unpublished proprietary software of XYZ Corp. Your access to it does not give you

permission to use it blah blah blah.‘’ We are not lawyers, of course, so you should see a lawyer if you want

to be sure your licence‘s wording will stand up in court.

How can I compile my Perl program into byte code or C?

Malcolm Beattie has written a multifunction backend compiler, available from CPAN, that can do both these

things. It is included in the perl5.005 release, but is still considered experimental. This means it‘s fun to play

with if you‘re a programmer but not really for people looking for turn−key solutions.

Merely compiling into C does not in and of itself guarantee that your code will run very much faster. That‘s

because except for lucky cases where a lot of native type inferencing is possible, the normal Perl run time

system is still present and so your program will take just as long to run and be just as big. Most programs

save little more than compilation time, leaving execution no more than 10−30% faster. A few rare programs

actually benefit significantly (like several times faster), but this takes some tweaking of your code.

You‘ll probably be astonished to learn that the current version of the compiler generates a compiled form of

your script whose executable is just as big as the original perl executable, and then some. That‘s because as

currently written, all programs are prepared for a full eval() statement. You can tremendously reduce this

cost by building a shared libperl.so library and linking against that. See the INSTALL podfile in the perl

source distribution for details. If you link your main perl binary with this, it will make it minuscule. For

example, on one author‘s system, /usr/bin/perl is only 11k in size!

In general, the compiler will do nothing to make a Perl program smaller, faster, more portable, or more

secure. In fact, it will usually hurt all of those. The executable will be bigger, your VM system may take

longer to load the whole thing, the binary is fragile and hard to fix, and compilation never stopped software

piracy in the form of crackers, viruses, or bootleggers. The real advantage of the compiler is merely

packaging, and once you see the size of what it makes (well, unless you use a shared libperl.so), you‘ll

probably want a complete Perl install anyway.

How can I get #!perl to work on [MS−DOS,NT,...]?

For OS/2 just use

    extproc perl -S -your_switches

as the first line in the *.cmd file (-S due to a bug in cmd.exe's 'extproc' handling). For DOS one should first

invent a corresponding batch file, and codify it in ALTERNATIVE_SHEBANG (see the INSTALL file in the

source distribution for more information).

The Win95/NT installation, when using the ActiveState port of Perl, will modify the Registry to associate the

.pl extension with the perl interpreter. If you install another port (Gurusamy Sarathy's is the

recommended Win95/NT port), or (eventually) build your own Win95/NT Perl using WinGCC, then you‘ll

have to modify the Registry yourself.

Macintosh perl scripts will have the appropriate Creator and Type, so that double-clicking them will

invoke the perl application.

IMPORTANT!: Whatever you do, PLEASE don‘t get frustrated, and just throw the perl interpreter into your

cgi−bin directory, in order to get your scripts working for a web server. This is an EXTREMELY big

security risk. Take the time to figure out how to do it correctly.

Can I write useful perl programs on the command line?

Yes. Read perlrun for more information. Some examples follow. (These assume standard Unix shell

quoting rules.)

    # sum first and last fields
    perl -lane 'print $F[0] + $F[-1]' *

    # identify text files
    perl -le 'for(@ARGV) {print if -f && -T _}' *

    # remove (most) comments from C program
    perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c

    # make file a month younger than today, defeating reaper daemons
    perl -e '$X=24*60*60; utime(time(),time() + 30 * $X,@ARGV)' *

    # find first unused uid
    perl -le '$i++ while getpwuid($i); print $i'

    # display reasonable manpath
    echo $PATH | perl -nl -072 -e '
        s![^/+]*$!man!&&-d&&!$s{$_}++&&push@m,$_;END{print"@m"}'

Ok, the last one was actually an obfuscated perl entry. :−)

Why don‘t perl one−liners work on my DOS/Mac/VMS system?

The problem is usually that the command interpreters on those systems have rather different ideas about

quoting than the Unix shells under which the one−liners were created. On some systems, you may have to

change single−quotes to double ones, which you must NOT do on Unix or Plan9 systems. You might also

have to change a single % to a %%.

For example:

    # Unix
    perl -e 'print "Hello world\n"'

    # DOS, etc.
    perl -e "print \"Hello world\n\""

    # Mac
    print "Hello world\n"
     (then Run "Myscript" or Shift-Command-R)

    # VMS
    perl -e "print ""Hello world\n"""

The problem is that none of this is reliable: it depends on the command interpreter. Under Unix, the first two

often work. Under DOS, it‘s entirely possible neither works. If 4DOS was the command shell, you‘d

probably have better luck like this:

    perl -e "print <Ctrl-x>"Hello world\n<Ctrl-x>""

Under the Mac, it depends which environment you are using. The MacPerl shell, or MPW, is much like

Unix shells in its support for several quoting variants, except that it makes free use of the Mac‘s non−ASCII

characters as control characters.

There is no general solution to all of this. It is a mess, pure and simple. Sucks to be away from Unix, huh?

:−)

[Some of this answer was contributed by Kenneth Albanowski.]

Where can I learn about CGI or Web programming in Perl?

For modules, get the CGI or LWP modules from CPAN. For textbooks, see the two especially dedicated to

web stuff in the question on books. For problems and questions related to the web, like ‘‘Why do I get 500

Errors‘’ or ‘‘Why doesn‘t it run from the browser right when it runs fine on the command line‘’, see these

sources:

WWW Security FAQ

http://www.w3.org/Security/Faq/

Web FAQ

http://www.boutell.com/faq/

CGI FAQ

http://www.webthing.com/page.cgi/cgifaq

HTTP Spec

http://www.w3.org/pub/WWW/Protocols/HTTP/

HTML Spec

http://www.w3.org/TR/REC−html40/

http://www.w3.org/pub/WWW/MarkUp/

CGI Spec

http://www.w3.org/CGI/

CGI Security FAQ

http://www.go2net.com/people/paulp/cgi−security/safe−cgi.txt

Where can I learn about object−oriented Perl programming?

perltoot is a good place to start, and you can use perlobj and perlbot for reference. Perltoot didn‘t come out

until the 5.004 release, but you can get a copy (in pod, html, or postscript) from

http://www.perl.com/CPAN/doc/FMTEYEWTK/ .

Where can I learn about linking C with Perl? [h2xs, xsubpp]

If you want to call C from Perl, start with perlxstut, moving on to perlxs, xsubpp, and perlguts. If you want

to call Perl from C, then read perlembed, perlcall, and perlguts. Don‘t forget that you can learn a lot from

looking at how the authors of existing extension modules wrote their code and solved their problems.

I've read perlembed, perlguts, etc., but I can't embed perl in my C program, what am I doing wrong?

Download the ExtUtils::Embed kit from CPAN and run 'make test'. If the tests pass, read the pods again

and again and again. If they fail, see perlbug and send a bug report with the output of

make test TEST_VERBOSE=1 along with perl -V.

When I tried to run my script, I got this message. What does it mean?

perldiag has a complete list of perl‘s error messages and warnings, with explanatory text. You can also use

the splain program (distributed with perl) to explain the error messages:

    perl program 2>diag.out
    splain [-v] [-p] diag.out

or change your program to explain the messages for you:

    use diagnostics;

or

    use diagnostics -verbose;

What‘s MakeMaker?

This module (part of the standard perl distribution) is designed to write a Makefile for an extension module

from a Makefile.PL. For more information, see ExtUtils::MakeMaker.
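
For orientation, a bare-bones Makefile.PL might look roughly like this. The module name My::Module and its file location are hypothetical; NAME, VERSION_FROM, and PREREQ_PM are standard WriteMakefile attributes, but check ExtUtils::MakeMaker for what your version supports.

```perl
# Makefile.PL for a hypothetical module called My::Module
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME         => 'My::Module',
    VERSION_FROM => 'lib/My/Module.pm',   # file that sets $VERSION
    PREREQ_PM    => {},                   # other modules required, if any
);
```

Running perl Makefile.PL writes the Makefile, after which the usual make, make test, and make install steps apply.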

AUTHOR AND COPYRIGHT

When included as an integrated part of the Standard Distribution of Perl or of its documentation (printed or

otherwise), this work is covered under Perl's Artistic Licence. For separate distributions of all or part of

this FAQ outside of that, see perlfaq.

Irrespective of its distribution, all code examples here are public domain. You are permitted and encouraged

to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A

simple comment in the code giving credit to the FAQ would be courteous but is not required.

perlfaq4 Perl Programmers Reference Guide perlfaq4

NAME

perlfaq4 − Data Manipulation ($Revision: 1.26 $, $Date: 1998/08/05 12:04:00 $)

DESCRIPTION

This section of the FAQ answers questions related to the manipulation of data as numbers, dates, strings,

arrays, hashes, and miscellaneous data issues.

Data: Numbers

Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?

The infinite set that a mathematician thinks of as the real numbers can only be approximated on a computer,

since the computer only has a finite number of bits to store an infinite number of, um, numbers.

Internally, your computer represents floating−point numbers in binary. Floating−point numbers read in from

a file or appearing as literals in your program are converted from their decimal floating−point representation

(eg, 19.95) to the internal binary representation.

However, 19.95 can‘t be precisely represented as a binary floating−point number, just like 1/3 can‘t be

exactly represented as a decimal floating−point number. The computer‘s binary representation of 19.95,

therefore, isn‘t exactly 19.95.

When a floating−point number gets printed, the binary floating−point representation is converted back to

decimal. These decimal numbers are displayed in either the format you specify with printf(), or the

current output format for numbers (see $# in perlvar) if you use print. $# has a different default value

in Perl5 than it did in Perl4. Changing $# yourself is deprecated.

This affects all computer languages that represent decimal floating−point numbers in binary, not just Perl.

Perl provides arbitrary−precision decimal numbers with the Math::BigFloat module (part of the standard Perl

distribution), but mathematical operations are consequently slower.

To get rid of the superfluous digits, just use a format (eg, printf("%.2f", 19.95)) to get the required

precision. See Floating−point Arithmetic in perlop.
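
A minimal sketch of the effect described above, assuming IEEE double-precision floats (which is what most platforms give Perl). The variable names and the tolerance value are illustrative only:

```perl
use strict;
use warnings;

# 19.95 has no exact binary representation; asking for more digits
# than the value really carries exposes the stored approximation.
my $price  = 19.95;
my $stored = sprintf("%.20f", $price);   # long tail of extra digits
my $shown  = sprintf("%.2f",  $price);   # rounded for display

print "stored: $stored\n";
print "shown:  $shown\n";

# Compare floating-point values with a tolerance rather than ==.
# $epsilon is an arbitrary illustrative choice, not a universal constant.
my $epsilon      = 1e-9;
my $sum          = 0.1 + 0.2;
my $close_enough = abs($sum - 0.3) < $epsilon ? 1 : 0;
print "0.1 + 0.2 is close enough to 0.3\n" if $close_enough;
```

Formatting only changes what is printed; the stored value keeps its binary approximation.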

Why isn‘t my octal data interpreted correctly?

Perl only understands octal and hex numbers as such when they occur as literals in your program. If they are

read in from somewhere and assigned, no automatic conversion takes place. You must explicitly use oct()

or hex() if you want the values converted. oct() interprets both hex ("0x350") numbers and octal ones

("0350" or even without the leading "0", like "377"), while hex() only converts hexadecimal ones, with or

without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".

This problem shows up most often when people try using chmod(), mkdir(), umask(), or

sysopen(), which all want permissions in octal.

chmod(644, $file); # WRONG -- perl -w catches this

chmod(0644, $file); # right
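
As a rough illustration of the conversions described above, here is a sketch that converts strings as they might arrive from a file or the command line. The input string "0644" is a made-up example:

```perl
use strict;
use warnings;

my $mode_string = "0644";              # hypothetical input read as text
my $mode        = oct($mode_string);   # 420 decimal, same as the literal 0644

my $hex_via_oct = oct("0x350");        # oct() also understands a 0x prefix
my $no_leading0 = oct("377");          # leading 0 is optional for oct()
my $from_hex    = hex("ff");           # hex() works with or without 0x

printf "%s -> %d decimal (octal %o)\n", $mode_string, $mode, $mode;
# chmod($mode, $file) would now do what chmod(0644, $file) does.
```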

Does perl have a round function? What about ceil() and floor()? Trig functions?

Remember that int() merely truncates toward 0. For rounding to a certain number of digits, sprintf()

or printf() is usually the easiest route.

printf("%.3f", 3.1415926535); # prints 3.142

The POSIX module (part of the standard perl distribution) implements ceil(), floor(), and a number of

other mathematical and trigonometric functions.

use POSIX;

$ceil = ceil(3.5); # 4

$floor = floor(3.5); # 3

In 5.000 to 5.003 Perls, trigonometry was done in the Math::Complex module. With 5.004, the Math::Trig

module (part of the standard perl distribution) implements the trigonometric functions. Internally it uses the

Math::Complex module and some functions can break out from the real axis into the complex plane, for

example the inverse sine of 2.

Rounding in financial applications can have serious implications, and the rounding method used should be

specified precisely. In these cases, it probably pays not to trust whichever system rounding is being used by

Perl, but to instead implement the rounding function you need yourself.
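
If you do write your own rounding function, it might look something like the sketch below. The name round_half_away and the choice of "round half away from zero" are assumptions made for illustration; your application may need a different rule (banker's rounding, for instance):

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Round $number to $places decimal places, rounding halves away from zero.
# Hypothetical helper, not part of any standard module.
sub round_half_away {
    my ($number, $places) = @_;
    my $factor = 10 ** $places;
    my $scaled = $number * $factor;
    my $whole  = $scaled >= 0 ? floor($scaled + 0.5) : ceil($scaled - 0.5);
    return $whole / $factor;
}

my $r1 = round_half_away( 2.5,     0);   # 3
my $r2 = round_half_away(-2.5,     0);   # -3
my $r3 = round_half_away( 3.14159, 2);   # 3.14
print "$r1 $r2 $r3\n";
```

Because the input is still a binary floating-point number, a value that looks like an exact half in decimal may be stored slightly below it. For money, keeping amounts as integer cents avoids that problem altogether.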

How do I convert bits into ints?

To turn a string of 1s and 0s like 10110110 into a scalar containing its binary value, use the pack()

function (documented in pack in perlfunc):

$decimal = ord(pack('B8', '10110110'));

Here's an example of going the other way:

$binary_string = join('', unpack('B*', "\x29"));
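
A short sketch of the round trip. Note that pack() by itself produces a one-byte string; ord() or unpack() is what turns that byte into a number. The 32-bit padding idiom follows the pack documentation in perlfunc:

```perl
use strict;
use warnings;

my $bits = '10110110';

# Eight bits: pack to one byte, then take its ordinal value.
my $number = ord(pack('B8', $bits));                 # 182

# Up to 32 bits: left-pad with zeros, pack, unpack as a big-endian long.
my $padded   = substr(('0' x 32) . $bits, -32);
my $number32 = unpack('N', pack('B32', $padded));    # 182

# And back again.
my $back = unpack('B8', chr($number));               # '10110110'

print "$bits = $number = $number32 = $back\n";
```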

How do I multiply matrices?

Use the Math::Matrix or Math::MatrixReal modules (available from CPAN) or the PDL extension (also

available from CPAN).

How do I perform an operation on a series of integers?

To call a function on each element in an array, and collect the results, use:

@results = map { my_func($_) } @array;

For example:

@triple = map { 3 * $_ } @single;

To call a function on each element of an array, but ignore the results:

foreach $iterator (@array) {

&my_func($iterator);

}

To call a function on each integer in a (small) range, you can use:

@results = map { &my_func($_) } (5 .. 25);

but you should be aware that the .. operator creates an array of all integers in the range. This can take a lot

of memory for large ranges. Instead use:

@results = ();

for ($i=5; $i < 500_005; $i++) {

push(@results, &my_func($i));

}

How can I output Roman numerals?

Get the http://www.perl.com/CPAN/modules/by-module/Roman module.

Why aren‘t my random numbers random?

The short explanation is that you‘re getting pseudorandom numbers, not random ones, because computers

are good at being predictable and bad at being random (despite appearances caused by bugs in your programs

:−). A longer explanation is available on http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy

of Tom Phoenix. John von Neumann said, ‘‘Anyone who attempts to generate random numbers by

deterministic means is, of course, living in a state of sin.‘’

You should also check out the Math::TrulyRandom module from CPAN. It uses the imperfections in your

system‘s timer to generate random numbers, but this takes quite a while. If you want a better pseudorandom

generator than comes with your operating system, look at ‘‘Numerical Recipes in C‘’ at

http://nr.harvard.edu/nr/bookc.html .
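
To see the "pseudo" in pseudorandom for yourself, seed the generator twice with the same value. The seed 42 is arbitrary:

```perl
use strict;
use warnings;

# The same seed always produces the same sequence.
srand(42);
my @first  = map { rand() } 1 .. 3;

srand(42);
my @second = map { rand() } 1 .. 3;

my $identical = ("@first" eq "@second") ? 1 : 0;
print "sequences are ", ($identical ? "identical" : "different"), "\n";
```

This is handy for reproducible tests and exactly what you don't want for anything security-related.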

Data: Dates

How do I find the week−of−the−year/day−of−the−year?

The day of the year is in the array returned by localtime() (see localtime in perlfunc):

$day_of_year = (localtime(time()))[7];

or more legibly (in 5.004 or higher):

use Time::localtime;

$day_of_year = localtime(time())->yday;

You can find the week of the year by dividing this by 7:

$week_of_year = int($day_of_year / 7);

Of course, this believes that weeks start at zero. The Date::Calc module from CPAN has a lot of date

calculation functions, including day of the year, week of the year, and so on. Note that not all businesses

consider "week 1" to be the same; for example, American businesses often consider the first week with a

Monday in it to be Work Week #1, despite ISO 8601, which considers WW1 to be the first week with a

Thursday in it.
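
Here is a small worked example using a fixed timestamp, so the result does not depend on when or where you run it. It uses gmtime() rather than localtime() purely to keep the output independent of your time zone:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my $epoch = 1005613200;                        # Tue Nov 13 01:00:00 2001 UTC

my $day_of_year  = (gmtime($epoch))[7];        # zero-based: 316
my $week_of_year = int($day_of_year / 7);      # 45, counting from zero

# strftime's %j gives the one-based day of the year as a string.
my $julian = strftime("%j", gmtime($epoch));   # "317"

print "day $day_of_year, week $week_of_year, %j $julian\n";
```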

How can I compare two dates and find the difference?

If you‘re storing your dates as epoch seconds then simply subtract one from the other. If you‘ve got a

structured date (distinct year, day, month, hour, minute, seconds values) then use one of the Date::Manip and

Date::Calc modules from CPAN.
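
For the epoch-seconds case, a sketch using the standard Time::Local module to build the two timestamps. It uses timegm() instead of timelocal() only so that the answer is the same in every time zone; the dates are arbitrary:

```perl
use strict;
use warnings;
use Time::Local;

# Arguments are (sec, min, hour, mday, mon, year); months count from 0.
my $start = timegm(0, 0, 0,  1,  0, 2001);   # 2001-01-01 00:00:00 UTC
my $end   = timegm(0, 0, 0, 13, 10, 2001);   # 2001-11-13 00:00:00 UTC

my $seconds = $end - $start;
my $days    = $seconds / 86_400;             # 316

print "$days days apart\n";
```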

How can I take a string and turn it into epoch seconds?

If it‘s a regular enough string that it always has the same format, you can split it up and pass the parts to

timelocal in the standard Time::Local module. Otherwise, you should look into the Date::Calc and

Date::Manip modules from CPAN.
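
For instance, assuming your strings always look like "YYYY-MM-DD HH:MM:SS" (a format picked here purely for illustration) and are in UTC, something like this would do:

```perl
use strict;
use warnings;
use Time::Local;

my $string = "2001-11-13 01:00:00";   # hypothetical, fixed-format input

my ($year, $mon, $mday, $hour, $min, $sec) =
    $string =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
    or die "unrecognized date: $string";

# Months are zero-based. Use timelocal() instead if the string is local time.
my $epoch = timegm($sec, $min, $hour, $mday, $mon - 1, $year);

print "$string is $epoch\n";
```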

How can I find the Julian Day?

Neither Date::Manip nor Date::Calc deals with Julian days. Instead, there is an example of Julian date

calculation that should help you in

http://www.perl.com/CPAN/authors/David_Muir_Sharnoff/modules/Time/JulianDay.pm.gz .

Does Perl have a year 2000 problem? Is Perl Y2K compliant?

Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is Y2K compliant. The programmers

you've hired to use it, however, probably are not.

Long answer: Perl is just as Y2K compliant as your pencil—no more, and no less. The date and time

functions supplied with perl (gmtime and localtime) supply adequate information to determine the year well

beyond 2000 (2038 is when trouble strikes for 32−bit machines). The year returned by these functions when

used in an array context is the year minus 1900. For years between 1910 and 1999 this happens to be a

2−digit decimal number. To avoid the year 2000 problem simply do not treat the year as a 2−digit number. It

isn‘t.

When gmtime() and localtime() are used in scalar context they return a timestamp string that

contains a fully−expanded year. For example, $timestamp = gmtime(1005613200) sets

$timestamp to "Tue Nov 13 01:00:00 2001". There‘s no year 2000 problem here.

That doesn‘t mean that Perl can‘t be used to create non−Y2K compliant programs. It can. But so can your

pencil. It‘s the fault of the user, not the language. At the risk of inflaming the NRA: ‘‘Perl doesn‘t break

Y2K, people do.‘’ See http://language.perl.com/news/y2k.html for a longer exposition.
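
A quick sketch of the right and the wrong way to handle the year field, using the same timestamp as above:

```perl
use strict;
use warnings;

my @parts = gmtime(1005613200);     # Tue Nov 13 01:00:00 2001 UTC

my $year  = $parts[5] + 1900;       # right: 2001
my $wrong = "19" . $parts[5];       # WRONG: "19101", the classic Y2K bug

my $stamp = sprintf("%04d-%02d-%02d", $year, $parts[4] + 1, $parts[3]);
print "$stamp (and not the year $wrong)\n";
```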

Data: Strings

How do I validate input?

The answer to this question is usually a regular expression, perhaps with auxiliary logic. See the more

specific questions (numbers, mail addresses, etc.) for details.

How do I unescape a string?

It depends just what you mean by ‘‘escape‘’. URL escapes are dealt with in perlfaq9. Shell escapes with the

backslash (\) character are removed with:

s/\\(.)/$1/g;

This won‘t expand "\n" or "\t" or any other special escapes.
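
For example (the input string is made up):

```perl
use strict;
use warnings;

# In a single-quoted string these backslashes are kept literally.
my $escaped = 'Hello\ World\!';

(my $plain = $escaped) =~ s/\\(.)/$1/g;   # drop each escaping backslash

print "$plain\n";
```

A literal "\n" in the input would simply become the letter "n", which is the limitation noted above.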

How do I remove consecutive pairs of characters?

To turn "abbcccd" into "abccd":

s/(.)\1/$1/g;

How do I expand function calls in a string?

This is documented in perlref. In general, this is fraught with quoting and readability problems, but it is

possible. To interpolate a subroutine call (in list context) into a string:

print "My sub returned @{[mysub(1,2,3)]} that time.\n";

If you prefer scalar context, similar chicanery is also useful for arbitrary expressions:

print "That yields ${\($n + 5)} widgets\n";

Version 5.004 of Perl had a bug that gave list context to the expression in ${...}, but this is fixed in

version 5.005.

See also ‘‘How can I expand variables in text strings?‘’ in this section of the FAQ.
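
A runnable sketch of both tricks; mysub() here is a stand-in that simply doubles its arguments:

```perl
use strict;
use warnings;

sub mysub { return map { $_ * 2 } @_ }   # hypothetical example function

my $n = 7;

# @{[ ... ]} builds an anonymous array and dereferences it (list context).
my $list_form   = "My sub returned @{[ mysub(1, 2, 3) ]} that time.";

# ${\ ... } takes a reference to a scalar value and dereferences it.
my $scalar_form = "That yields ${\($n + 5)} widgets";

print "$list_form\n$scalar_form\n";
```

In most code a plain concatenation or a temporary variable is easier to read than either form.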

How do I find matching/nesting anything?

This isn‘t something that can be done in one regular expression, no matter how complicated. To find

something between two single characters, a pattern like /x([^x]*)x/ will get the intervening bits in $1.

For multiple ones, then something more like /alpha(.*?)omega/ would be needed. But none of these

deals with nested patterns, nor can they. For that you‘ll have to write a parser.

If you are serious about writing a parser, there are a number of modules or oddities that will make your life a

lot easier. There is the CPAN module Parse::RecDescent, the standard module Text::Balanced, the byacc

program, and Mark−Jason Dominus‘s excellent py tool at http://www.plover.com/~mjd/perl/py/ .

One simple destructive, inside−out approach that you might try is to pull out the smallest nesting parts one at

a time:

while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {

# do something with $1

}
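
To make the inside-out idea concrete, here is a sketch that removes one innermost BEGIN...END pair per pass and remembers what was inside. It drops the /g modifier so that each pass handles exactly one pair; the sample text is invented:

```perl
use strict;
use warnings;

my $text = "BEGIN outer BEGIN inner END tail END";
my @parts;

# Each pass deletes the leftmost pair that contains no other BEGIN or END,
# so nested pairs are peeled off from the inside out.
while ($text =~ s/BEGIN((?:(?!BEGIN)(?!END).)*)END//s) {
    push @parts, $1;
}

print "found: [$_]\n" for @parts;
```

Note that this destroys $text as it goes, so work on a copy if you need the original.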

How do I reverse a string?

Use reverse() in scalar context, as documented in reverse.

$reversed = reverse $string;

How do I expand tabs in a string?

You can do it yourself:

1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the Text::Tabs module (part of the standard perl distribution).

use Text::Tabs;

@expanded_lines = expand(@lines_with_tabs);

How do I reformat a paragraph?

Use Text::Wrap (part of the standard perl distribution):

use Text::Wrap;

print wrap("\t", ' ', @paragraphs);

The paragraphs you give to Text::Wrap should not contain embedded newlines. Text::Wrap doesn‘t justify

the lines (flush−right).

How can I access/change the first N letters of a string?

There are many ways. If you just want to grab a copy, use substr():

$first_byte = substr($a, 0, 1);

If you want to modify part of a string, the simplest way is often to use substr() as an lvalue:

substr($a, 0, 3) = "Tom";

Although those with a pattern matching kind of thought process will likely prefer:

$a =~ s/^.../Tom/;

How do I change the Nth occurrence of something?

You have to keep track of N yourself. For example, let‘s say you want to change the fifth occurrence of

"whoever" or "whomever" into "whosoever" or "whomsoever", case insensitively.

$count = 0;

s{((whom?)ever)}{

++$count == 5 # is it the 5th?

? "${2}soever" # yes, swap

: $1 # renege and leave it there

}igex;

In the more general case, you can use the /g modifier in a while loop, keeping count of matches.

$WANT = 3;

$count = 0;

while (/(\w+)\s+fish\b/gi) {

if (++$count == $WANT) {

print "The third fish is a $1 one.\n";

# Warning: don't `last' out of this loop

}

}

That prints out: "The third fish is a red one." You can also use a repetition count and

repeated pattern like this:

/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
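
Both approaches, run against a sample string (assumed here to be the one the "red" answer above was based on):

```perl
use strict;
use warnings;

my $text  = "One fish two fish red fish blue fish";
my $want  = 3;
my $count = 0;
my $nth;

# Walk the matches with /g and remember the one we are after.
while ($text =~ /(\w+)\s+fish\b/gi) {
    $nth = $1 if ++$count == $want;
}

# Or skip two occurrences with a repetition count and capture the third.
my ($direct) = $text =~ /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

print "The third fish is a $nth one ($direct).\n";
```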

How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency: If you want a count of a certain single character (X)

within a string, you can use the tr/// function like so:

$string = "ThisXlineXhasXsomeXx'sXinXit";

$count = ($string =~ tr/X//);

print "There are $count X characters in the string";

This is fine if you are just looking for a single character. However, if you are trying to count multiple

character substrings within a larger string, tr/// won‘t work. What you can do is wrap a while() loop

around a global pattern match. For example, let‘s count negative integers:

$string = "-9 55 48 -2 23 -76 4 14 -44";

$count = 0;

while ($string =~ /-\d+/g) { $count++ }

print "There are $count negative numbers in the string";
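
Both counts in runnable form. The second one uses a list assignment in scalar context to count the matches, an alternative to the explicit while loop:

```perl
use strict;
use warnings;

my $string  = "ThisXlineXhasXsomeXx'sXinXit";
my $x_count = ($string =~ tr/X//);             # counts upper-case X only

my $numbers   = "-9 55 48 -2 23 -76 4 14 -44";
my $negatives = () = $numbers =~ /-\d+/g;      # number of matches

print "$x_count X characters, $negatives negative numbers\n";
```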

How do I capitalize all the words on one line?

To make the first letter of each word upper case:

$line =~ s/\b(\w)/\U$1/g;

This has the strange effect of turning "don‘t do it" into "Don‘T Do It". Sometimes you might want

this, instead (Suggested by Brian Foy):

$string =~ s/ (

(^\w) #at the beginning of the line

| # or

(\s\w) #preceded by whitespace

)

/\U$1/xg;

$string =~ s/([\w']+)/\u\L$1/g;

To make the whole line upper case:

$line = uc($line);

To force each word to be lower case, with the first letter upper case:

$line =~ s/(\w+)/\u\L$1/g;

You can (and probably should) enable locale awareness of those characters by placing a use locale

pragma in your program. See perllocale for endless details on locales.
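
The difference between the naive pattern and the apostrophe-aware one is easiest to see side by side:

```perl
use strict;
use warnings;

my $line = "don't do it";

(my $naive  = $line) =~ s/\b(\w)/\U$1/g;         # "Don'T Do It"
(my $better = $line) =~ s/([\w']+)/\u\L$1/g;     # "Don't Do It"

print "$naive\n$better\n";
```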

How can I split a [character] delimited string except when inside [character]? (Comma-separated files)

Take the example case of trying to split a string that is comma−separated into its different fields. (We‘ll

pretend you said comma−separated, not comma−delimited, which is different and almost never what you

mean.) You can‘t use split(/,/) because you shouldn‘t split if the comma is inside quotes. For

example, take a data line like this:

SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex problem. Thankfully, we have Jeffrey Friedl,

author of a highly recommended book on regular expressions, to handle these for us. He suggests (assuming

your string is contained in $text):

@new = ();

push(@new, $+) while $text =~ m{

"([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes

| ([^,]+),?

| ,

}gx;

push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a quotation−mark−delimited field, escape them with

backslashes (eg, "like \"this\""). Unescaping them is a task addressed earlier in this section.

Alternatively, the Text::ParseWords module (part of the standard perl distribution) lets you say:

use Text::ParseWords;

@new = quotewords(",", 0, $text);
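
Applied to the sample data line from above, the Text::ParseWords route looks like this:

```perl
use strict;
use warnings;
use Text::ParseWords;

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# Second argument 0 means: strip the quotes from the returned fields.
my @new = quotewords(',', 0, $text);

printf "%d fields; third is [%s], last is [%s]\n",
    scalar(@new), $new[2], $new[-1];
```

Keep in mind that quotewords() treats single quotes as quoting characters too, so data containing apostrophes may need different handling.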

How do I strip blank space from the beginning/end of a string?

Although the simplest approach would seem to be:

$string =~ s/^\s*(.*?)\s*$/$1/;

This is unnecessarily slow, destructive, and fails with embedded newlines. It is much faster to do this

in two steps:

$string =~ s/^\s+//;

$string =~ s/\s+$//;

Or more nicely written as:

for ($string) {

s/^\s+//;

s/\s+$//;

}

This idiom takes advantage of the foreach loop‘s aliasing behavior to factor out common code. You can

do this on several strings at once, or arrays, or even the values of a hash if you use a slice:

# trim whitespace in the scalar, the array,

# and all the values in the hash

foreach ($scalar, @array, @hash{keys %hash}) {

s/^\s+//;

s/\s+$//;

}
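
With some sample data filled in (the values are invented), the loop behaves like this:

```perl
use strict;
use warnings;

my $scalar = "  padded  ";
my @array  = (" a ", "b  ");
my %hash   = (key => "\tvalue\n");

# foreach aliases $_ to each element, so the substitutions edit in place.
foreach ($scalar, @array, @hash{keys %hash}) {
    s/^\s+//;
    s/\s+$//;
}

print "[$scalar] [@array] [$hash{key}]\n";
```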

How do I extract selected columns from a string?

Use substr() or unpack(), both documented in perlfunc. If you prefer thinking in terms of columns

instead of widths, you can use this kind of thing:

# determine the unpack format needed to split Linux ps output

# arguments are cut columns

my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);

sub cut2fmt {

my(@positions) = @_;

my $template = '';

my $lastpos = 1;

for my $place (@positions) {

$template .= "A" . ($place - $lastpos) . " ";

$lastpos = $place;

}

$template .= "A*";

return $template;

}
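
A brief usage sketch: the function is repeated so the example runs on its own, and the input line and cut columns are made up rather than real ps output:

```perl
use strict;
use warnings;

sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    $template .= "A*";
    return $template;
}

# Columns start at 1, 8, 14 and 20, giving widths 7, 6, 6 and "the rest".
my $fmt    = cut2fmt(8, 14, 20);
my $line   = sprintf("%-7s%6s%6s %s", "alice", 1234, "0.3", "/bin/sh");
my @fields = unpack($fmt, $line);

print "format: $fmt\n";
print "field: [$_]\n" for @fields;
```

The A format strips trailing spaces but keeps leading ones, so right-justified fields come back with their padding.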

How do I find the soundex value of a string?

Use the standard Text::Soundex module distributed with perl.

How can I expand variables in text strings?

Let‘s assume that you have a string like:

$text = 'this has a $foo in it and a $bar';

If those were both global variables, then this would suffice:

$text =~ s/\$(\w+)/${$1}/g;

But since they are probably lexicals, or at least, they could be, you‘d have to do this:

$text =~ s/(\$\w+)/$1/eeg;

die if $@; # needed on /ee, not /e

It‘s probably better in the general case to treat those variables as entries in some special hash. For example:

%user_defs = (

foo => 23,

bar => 19,

);

$text =~ s/\$(\w+)/$user_defs{$1}/g;

See also ‘‘How do I expand function calls in a string?‘’ in this section of the FAQ.
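
The hash-based approach as a complete example. Leaving unknown names untouched, as done here, is just one possible policy:

```perl
use strict;
use warnings;

my %user_defs = (
    foo => 23,
    bar => 19,
);

my $text = 'this has a $foo in it and a $bar, but no $baz';

# Replace $name only when the hash knows about it.
$text =~ s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$text\n";
```

Since only the hash is consulted, the template cannot reach into your program's real variables.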

What‘s wrong with always quoting "$vars"?

The problem is that those double−quotes force stringification, coercing numbers and references into strings,

even when you don‘t want them to be.

If you get used to writing odd things like these:

print "$var"; # BAD

$new = "$old"; # BAD

somefunc("$var"); # BAD

You‘ll be in trouble. Those should (in 99.8% of the cases) be the simpler and more direct:

print $var;

$new = $old;

somefunc($var);

Otherwise, besides slowing you down, you‘re going to break code when the thing in the scalar is actually

neither a string nor a number, but a reference:

func(\@array);

sub func {

my $aref = shift;

my $oref = "$aref"; # WRONG

}

You can also get into subtle problems on those few operations in Perl that actually do care about the

difference between a string and a number, such as the magical ++ autoincrement operator or the

syscall() function.

Stringification also destroys arrays.

@lines = `command`;

print "@lines"; # WRONG - extra blanks

print @lines; # right
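
The reference case is the one that bites hardest, so here is a small demonstration:

```perl
use strict;
use warnings;

my $aref   = [1, 2, 3];
my $copy   = $aref;      # still a reference to the same array
my $string = "$aref";    # just text that looks like ARRAY(0x...)

my $copy_is_ref   = ref($copy)   eq 'ARRAY' ? 1 : 0;
my $string_is_ref = ref($string)            ? 1 : 0;

print "copy is a reference:   $copy_is_ref\n";
print "string is a reference: $string_is_ref\n";
print "string looks like:     $string\n";
```

Once stringified, there is no way to get the original array back from $string.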

Why don‘t my <<HERE documents work?

Check for these three things:

1. There must be no space after the << part.

2. There (probably) should be a semicolon at the end.

3. You can‘t (easily) have any space in front of the tag.

If you want to indent the text in the here document, you can do this:

# all in one

($VAR = <<HERE_TARGET) =~ s/^\s+//gm;

your text

goes here

HERE_TARGET

But the HERE_TARGET must still be flush against the margin. If you want that indented also, you‘ll have to

quote in the indentation.

($quote = <<'    FINIS') =~ s/^\s+//gm;

        ...we will have peace, when you and all your works have

        perished--and the works of your dark master to whom you

        would deliver us. You are a liar, Saruman, and a corrupter

        of men's hearts. --Theoden in /usr/src/perl/taint.c

    FINIS

$quote =~ s/\s*--/\n--/;

A nice general−purpose fixer−upper function for indented here documents follows. It expects to be called

with a here document as its argument. It looks to see whether each line begins with a common substring, and

if so, strips that off. Otherwise, it takes the amount of leading white space found on the first line and

removes that much off each subsequent line.

sub fix {

local $_ = shift;

my ($white, $leader); # common white space and common leading string

if (/^\s*(?:([^\w\s]+)(\s*).*\n)(?:\s*\1\2?.*\n)+$/) {

($white, $leader) = ($2, quotemeta($1));

} else {

($white, $leader) = (/^(\s+)/, '');

}

s/^\s*?$leader(?:$white)?//gm;

return $_;

}

This works with leading special strings, dynamically determined:

$remember_the_main = fix<<'    MAIN_INTERPRETER_LOOP';

        @@@ int

        @@@ runops() {

        @@@     SAVEI32(runlevel);

        @@@     runlevel++;

        @@@     while ( op = (*op->op_ppaddr)() ) ;

        @@@     TAINT_NOT;

        @@@     return 0;

        @@@ }

    MAIN_INTERPRETER_LOOP

Or with a fixed amount of leading white space, with remaining indentation correctly preserved:

$poem = fix<<EVER_ON_AND_ON;

Now far ahead the Road has gone,

And I must follow, if I can,

Pursuing it with eager feet,

Until it joins some larger way

Where many paths and errands meet.

And whither then? I cannot say.

--Bilbo in /usr/src/perl/pp_ctl.c

EVER_ON_AND_ON
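
The simplest of the techniques above, the all-in-one substitution, in runnable form:

```perl
use strict;
use warnings;

my $var;
($var = <<HERE_TARGET) =~ s/^\s+//gm;
    your text
    goes here
HERE_TARGET

print $var;
```

Because \s also matches newlines, this version swallows blank lines in the text as well as the indentation. Use something like s/^[ \t]+//gm if you need to keep them.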

Data: Arrays

What is the difference between $array[1] and @array[1]?

The former is a scalar value, the latter an array slice, which makes it a list with one (scalar) value. You

should use $ when you want a scalar value (most of the time) and @ when you want a list with one scalar

value in it (very, very rarely; nearly never, in fact).

Sometimes it doesn‘t make a difference, but sometimes it does. For example, compare:

$good[0] = `some program that outputs several lines`;

with

@bad[0] = `same program that outputs several lines`;

The -w flag will warn you about these matters.
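
The difference comes down to context: assigning to $array[0] calls the right-hand side in scalar context, assigning to @array[0] calls it in list context. A sketch using wantarray() in place of an external program makes this visible:

```perl
use strict;

# Hypothetical helper that reports the context it was called in.
sub context { return wantarray ? "list" : "scalar" }

my (@good, @bad);

$good[0] = context();    # scalar assignment
@bad[0]  = context();    # slice assignment: list context

print "good: $good[0], bad: $bad[0]\n";
```

With backticks, scalar context returns all the output as one string, while list context returns one element per line, of which the slice keeps only the first.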

How can I extract just the unique elements of an array?

There are several possible ways, depending on whether the array is ordered and whether you wish to

preserve the ordering.

a) If @in is sorted, and you want @out to be sorted:

(this assumes all true values in the array)

$prev = 'nonesuch';

@out = grep($_ ne $prev && ($prev = $_), @in);

This is nice in that it doesn‘t use much extra memory, simulating uniq(1)‘s behavior of removing only

adjacent duplicates. It‘s less nice in that it won‘t work with false values like undef, 0, or ""; "0 but

true" is ok, though.

b) If you don‘t know whether @in is sorted:

undef %saw;

@out = grep(!$saw{$_}++, @in);

c) Like (b), but @in contains only small integers:

@out = grep(!$saw[$_]++, @in);

d) A way to do (b) without any loops or greps:

undef %saw;

@saw{@in} = ();

@out = sort keys %saw; # remove sort if undesired

e) Like (d), but @in contains only small positive integers:

undef @ary;

@ary[@in] = @in;

@out = @ary;
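
Method (b) is the one most people end up using, since it keeps the first occurrence of each element in its original position. With sample data:

```perl
use strict;
use warnings;

my @in = qw(b a b c a);

my %saw;
my @out = grep { !$saw{$_}++ } @in;   # keeps first occurrences, in order

print "@out\n";
```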

How can I tell whether a list or array contains a certain element?

Hearing the word "in" is an indication that you probably should have used a hash, not a list or array, to store

your data. Hashes are designed to answer this question quickly and efficiently. Arrays aren‘t.

That being said, there are several ways to approach this. If you are going to make this query many times

over arbitrary string values, the fastest way is probably to invert the original array and keep an associative

array lying about whose keys are the first array‘s values.

@blues = qw/azure cerulean teal turquoise lapis-lazuli/;

undef %is_blue;

for (@blues) { $is_blue{$_} = 1 }

Now you can check whether $is_blue{$some_color}. It might have been a good idea to keep the

blues all in a hash in the first place.

If the values are all small integers, you could use a simple indexed array. This kind of an array will take up

less space:

@primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);

undef @is_tiny_prime;

for (@primes) { $is_tiny_prime[$_] = 1; }

Now you check whether $is_tiny_prime[$some_number].

If the values in question are integers instead of strings, you can save quite a lot of space by using bit strings

instead:

@articles = ( 1..10, 150..2000, 2017 );

undef $read;

for (@articles) { vec($read,$_,1) = 1 }

Now check whether vec($read,$n,1) is true for some $n.

Please do not use

$is_there = grep $_ eq $whatever, @array;

or worse yet

$is_there = grep /$whatever/, @array;

These are slow (checks every element even if the first matches), inefficient (same reason), and potentially

buggy (what if there are regexp characters in $whatever?).
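
A sketch of the two alternatives: a lookup hash for repeated queries, and a loop that stops at the first hit for a one-off check. The sample data is illustrative:

```perl
use strict;
use warnings;

my @blues = qw(azure cerulean teal turquoise);

# Many lookups: build the hash once, then each query is a single probe.
my %is_blue = map { $_ => 1 } @blues;
my $teal_is_blue = $is_blue{teal} ? 1 : 0;
my $red_is_blue  = $is_blue{red}  ? 1 : 0;

# One-off lookup: stop as soon as the element is found.
my $found = 0;
for my $color (@blues) {
    if ($color eq 'cerulean') {
        $found = 1;
        last;
    }
}

print "teal: $teal_is_blue, red: $red_is_blue, cerulean found: $found\n";
```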

How do I compute the difference of two arrays? How do I compute the intersection of two arrays?

Use a hash. Here‘s code to do both and more. It assumes that each element is unique in a given array:

@union = @intersection = @difference = ();

%count = ();

foreach $element (@array1, @array2) { $count{$element}++ }

foreach $element (keys %count) {

push @union, $element;

push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;

}
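
Run against two small sample arrays, the code gives the results below. Note that the "difference" it computes is the symmetric difference: elements that appear in exactly one of the two arrays.

```perl
use strict;
use warnings;

my @array1 = (1, 2, 3);
my @array2 = (2, 3, 4);

my (@union, @intersection, @difference);
my %count;

$count{$_}++ for @array1, @array2;

for my $element (sort { $a <=> $b } keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}

print "union: @union\nintersection: @intersection\ndifference: @difference\n";
```

The keys are sorted here only to make the output predictable; hash order is otherwise arbitrary.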

How do I find the first array element for which a condition is true?

You can use this if you care about the index:

for ($i=0; $i < @array; $i++) {

if ($array[$i] eq "Waldo") {

$found_index = $i;

last;

}

}

Now $found_index has what you want.

How do I handle linked lists?

In general, you usually don‘t need a linked list in Perl, since with regular arrays, you can push and pop or

shift and unshift at either end, or you can use splice to add and/or remove an arbitrary number of elements at

arbitrary points. Both pop and shift are O(1) operations on perl's dynamic arrays. In the absence of

shifts and pops, push in general needs to reallocate on the order of every log(N) times, and unshift will need to

copy pointers each time.

If you really, really wanted, you could use structures as described in perldsc or perltoot and do just what the

algorithm book tells you to do.

How do I handle circular lists?

Circular lists could be handled in the traditional fashion with linked lists, or you could just do something like

this with an array:

unshift(@array, pop(@array)); # the last shall be first

push(@array, shift(@array)); # and vice versa

How do I shuffle an array randomly?

Use this:

# fisher_yates_shuffle( \@array ) :

# generate a random permutation of @array in place

sub fisher_yates_shuffle {

my $array = shift;

my $i;

for ($i = @$array; --$i; ) {

my $j = int rand ($i+1);

next if $i == $j;

@$array[$i,$j] = @$array[$j,$i];

}

}

fisher_yates_shuffle( \@array ); # permutes @array in place

You've probably seen shuffling algorithms that work using splice, randomly picking another element to

swap the current element with:

srand;

@new = ();

@old = 1 .. 10; # just a demo

while (@old) {

push(@new, splice(@old, rand @old, 1));

}

This is bad because splice is already O(N), and since you do it N times, you just invented a quadratic

algorithm; that is, O(N**2). This does not scale, although Perl is so efficient that you probably won‘t notice

this until you have rather largish arrays.
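
A self-contained variant of the Fisher-Yates shuffle for experimenting. It differs slightly from the listing above in that the loop counts down from the last index and stops at 1, which also makes it safe to call on an empty array:

```perl
use strict;
use warnings;

# Shuffle the array referenced by $array in place, in O(N) time.
sub fisher_yates_shuffle {
    my $array = shift;
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = int rand($i + 1);           # 0 <= $j <= $i
        next if $i == $j;
        @$array[$i, $j] = @$array[$j, $i];  # swap via slices
    }
}

my @deck = (1 .. 10);
fisher_yates_shuffle(\@deck);

my @sorted = sort { $a <=> $b } @deck;
print "shuffled: @deck\n";
```

Every element is swapped at most once, so no element is lost or duplicated; sorting the result gives back the original list.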

How do I process/modify each element of an array?

Use for/foreach:

for (@lines) {

s/foo/bar/; # change that word

y/XZ/ZX/; # swap those letters

}

Here‘s another; let‘s compute spherical volumes:

for (@volumes = @radii) { # @volumes has changed parts

$_ **= 3;

$_ *= (4/3) * 3.14159; # this will be constant folded

}

If you want to do the same thing to modify the values of the hash, you may not use the values function,

oddly enough. You need a slice:

for $orbit ( @orbits{keys %orbits} ) {

($orbit **= 3) *= (4/3) * 3.14159;

}

How do I select a random element from an array?

Use the rand() function (see rand):

# at the top of the program:

srand; # not needed for 5.004 and later

# then later on

$index = rand @array;

$element = $array[$index];

Make sure you only call srand once per program, if then. If you are calling it more than once (such as before

each call to rand), you‘re almost certainly doing something wrong.

How do I permute N elements of a list?

Here‘s a little program that generates all permutations of all the words on each line of input. The algorithm

embodied in the permute() function should work on any list:

#!/usr/bin/perl -n

# tsc-permute: permute each word of input

permute([split], []);

sub permute {

my @items = @{ $_[0] };

my @perms = @{ $_[1] };

unless (@items) {

print "@perms\n";

} else {

my(@newitems,@newperms,$i);

foreach $i (0 .. $#items) {

@newitems = @items;

@newperms = @perms;

unshift(@newperms, splice(@newitems, $i, 1));

permute([@newitems], [@newperms]);

}

}

}
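
The same recursive idea, reworked as a sketch that collects the permutations into an array instead of printing them. The third argument and the use of append rather than unshift are changes made for this illustration:

```perl
use strict;
use warnings;

# permute(\@remaining, \@chosen_so_far, \@output)
sub permute {
    my ($items, $perm, $out) = @_;
    unless (@$items) {
        push @$out, "@$perm";      # nothing left: one permutation is complete
        return;
    }
    for my $i (0 .. $#$items) {
        my @rest = @$items;
        my @next = (@$perm, splice(@rest, $i, 1));   # move item $i across
        permute(\@rest, \@next, $out);
    }
}

my @all;
permute([qw(a b c)], [], \@all);
print "$_\n" for @all;
```

Remember that N items have N! permutations, so this is only practical for short lists.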

How do I sort an array by (anything)?

Supply a comparison function to sort() (described in sort):

@list = sort { $a <=> $b } @list;

The default sort function is cmp, string comparison, which would sort (1, 2, 10) into (1, 10, 2).

<=>, used above, is the numerical comparison operator.

If you have a complicated function needed to pull out the part you want to sort on, then don‘t do it inside the

sort function. Pull it out first, because the sort BLOCK can be called many times for the same element.

Here‘s an example of how to pull out the first word after the first number on each item, and then sort those

words case−insensitively.

@idx = ();

for (@data) {

($item) = /\d+\s*(\S+)/;

push @idx, uc($item);

}

@sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];

Which could also be written this way, using a trick that‘s come to be known as the Schwartzian Transform:

@sorted = map { $_->[0] }

sort { $a->[1] cmp $b->[1] }

map { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] } @data;

If you need to sort on several fields, the following paradigm is useful.

@sorted = sort { field1($a) <=> field1($b) ||

field2($a) cmp field2($b) ||

field3($a) cmp field3($b)

} @data;

This can be conveniently combined with precalculation of keys as given above.

See http://www.perl.com/CPAN/doc/FMTEYEWTK/sort.html for more about this approach.

See also the question below on sorting hashes.
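
The Schwartzian Transform on some sample data (invented for the purpose), sorting by the first word after the number and ignoring case:

```perl
use strict;
use warnings;

my @data = ("3 pears", "1 Apple", "2 banana");

my @sorted =
    map  { $_->[0] }                               # 3. unwrap the original item
    sort { $a->[1] cmp $b->[1] }                   # 2. sort on the cached key
    map  { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] }      # 1. pair each item with its key
    @data;

print "$_\n" for @sorted;
```

Read it from the bottom up: each key is computed exactly once, however many comparisons the sort needs.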

How do I manipulate arrays of bits?

Use pack() and unpack(), or else vec() and the bitwise operations.

For example, this sets $vec to have bit N set if $ints[N] was set:

$vec = '';

foreach(@ints) { vec($vec,$_,1) = 1 }

And here‘s how, given a vector in $vec, you can get those bits into your @ints array:

sub bitvec_to_list {

my $vec = shift;

my @ints;

# Find null−byte density then select best algorithm

if ($vec =~ tr/\0// / length $vec > 0.95) {

use integer;

my $i;

# This method is faster with mostly null−bytes

while($vec =~ /[^\0]/g ) {

$i = -9 + 8 * pos $vec;

push @ints, $i if vec($vec, ++$i, 1);

}

} else {

# This method is a fast general algorithm

use integer;

my $bits = unpack "b*", $vec;

push @ints, 0 if $bits =~ s/^(\d)// && $1;

push @ints, pos $bits while($bits =~ /1/g);

}

return \@ints;

}

This method gets faster the more sparse the bit vector is. (Courtesy of Tim Bunce and Winfried Koenig.)
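
For small vectors, a much simpler (if slower) round trip is often good enough. This sketch just tests every bit position in turn:

```perl
use strict;
use warnings;

my @ints = (1, 3, 5);

# Set one bit per integer.
my $vec = '';
vec($vec, $_, 1) = 1 for @ints;

# Recover the list by testing each bit position in the vector.
my @back = grep { vec($vec, $_, 1) } 0 .. 8 * length($vec) - 1;

# unpack "b*" shows the bits in ascending order within each byte.
my $bits = unpack("b*", $vec);

print "bits: $bits, set: @back\n";
```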

Why does defined() return true on empty arrays and hashes?

See defined in the 5.004 release or later of Perl.

Data: Hashes (Associative Arrays)

How do I process an entire hash?

Use the each() function (see each) if you don‘t care whether it‘s sorted:

while ( ($key, $value) = each %hash) {

print "$key = $value\n";

}

If you want it sorted, you‘ll have to use foreach() on the result of sorting the keys as shown in an earlier

question.

What happens if I add or remove keys from a hash while iterating over it?

Don‘t do that.
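
If you do need to prune a hash, one common workaround (a sketch, not the only way) is to loop over a list of the keys taken up front rather than using each(). The list is a copy, so changing the hash does not disturb the loop:

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3, d => 4);

# keys() returns its list before the loop starts.
for my $key (keys %hash) {
    delete $hash{$key} if $hash{$key} % 2 == 0;   # drop even values
}

my $remaining = join(',', sort keys %hash);
print "remaining: $remaining\n";
```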

How do I look up a hash element by value?

Create a reverse hash:

%by_value = reverse %by_key;

$key = $by_value{$value};

That‘s not particularly efficient. It would be more space−efficient to use:

while (($key, $value) = each %by_key) {

$by_value{$value} = $key;

}

If your hash could have repeated values, the methods above will only find one of the associated keys. This

may or may not worry you.
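
If it does worry you, one option is to keep a list of keys for every value. A sketch with made-up data:

```perl
use strict;
use warnings;

my %by_key = (apple => 'red', cherry => 'red', lime => 'green');

# Each value maps to an array of all the keys that carry it.
my %by_value;
while (my ($key, $value) = each %by_key) {
    push @{ $by_value{$value} }, $key;
}

my @red   = sort @{ $by_value{red} };
my @green = sort @{ $by_value{green} };
print "red: @red; green: @green\n";
```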

How can I know how many entries are in a hash?

If you mean how many keys, then all you have to do is take the scalar sense of the keys() function:

$num_keys = scalar keys %hash;

In void context it just resets the iterator, which is faster for tied hashes.

How do I sort a hash (optionally by value instead of key)?

Internally, hashes are stored in a way that prevents you from imposing an order on key−value pairs. Instead,

you have to sort a list of the keys or values:

@keys = sort keys %hash; # sorted by key

@keys = sort {

$hash{$a} cmp $hash{$b}

} keys %hash; # and by value

Here we'll do a reverse numeric sort by value, and if two values are identical, sort by length of key, and if that

fails, by straight ASCII comparison of the keys (well, possibly modified by your locale — see perllocale).

@keys = sort {

$hash{$b} <=> $hash{$a}

length($b) <=> length($a)

$a cmp $b

} keys %hash;

How can I always keep my hash sorted?

You can look into using the DB_File module and tie() using the $DB_BTREE hash bindings as

documented in In Memory Databases in DB_File. The Tie::IxHash module from CPAN might also be

instructive.

What‘s the difference between "delete" and "undef" with hashes?

Hashes are pairs of scalars: the first is the key, the second is the value. The key will be coerced to a string,
although the value can be any kind of scalar: string, number, or reference. If a key $key is present in the
hash %array, exists($array{$key}) will return true. The value for a given key can be undef, in which case
$array{$key} will be undef while exists($array{$key}) will return true. This corresponds to ($key,
undef) being in the hash.

Pictures help... here‘s the %ary table:

      keys  values
    +------+------+
    |  a   |  3   |
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

And these conditions hold

    $ary{'a'}                       is true
    $ary{'d'}                       is false
    defined $ary{'d'}               is true
    defined $ary{'a'}               is true
    exists $ary{'a'}                is true (perl5 only)
    grep ($_ eq 'a', keys %ary)     is true

If you now say

    undef $ary{'a'}

your table now reads:

      keys  values
    +------+------+
    |  a   | undef|
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $ary{'a'}                       is FALSE
    $ary{'d'}                       is false
    defined $ary{'d'}               is true
    defined $ary{'a'}               is FALSE
    exists $ary{'a'}                is true (perl5 only)
    grep ($_ eq 'a', keys %ary)     is true

Notice the last two: you have an undef value, but a defined key!

Now, consider this:

    delete $ary{'a'}

your table now reads:

      keys  values
    +------+------+
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+

and these conditions now hold; changes in caps:

    $ary{'a'}                       is false
    $ary{'d'}                       is false
    defined $ary{'d'}               is true
    defined $ary{'a'}               is false
    exists $ary{'a'}                is FALSE (perl5 only)
    grep ($_ eq 'a', keys %ary)     is FALSE

See, the whole entry is gone!

Why don‘t my tied hashes make the defined/exists distinction?

They may or may not implement the EXISTS() and FETCH() methods in a way that keeps the two apart (a tied
hash has no separate DEFINED() method; defined on an element goes through FETCH()). For example, there
isn't the concept of undef with hashes that are tied to DBM* files. This means the true/false tables above will
give different results when used on such a hash. It also means that exists and defined do the same thing with
a DBM* file, and what they end up doing is not what they do with ordinary hashes.

How do I reset an each() operation part−way through?

Using keys %hash in scalar context returns the number of keys in the hash and resets the iterator

associated with the hash. You may need to do this if you use last to exit a loop early so that when you

re−enter it, the hash iterator has been reset.
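
A minimal sketch of that reset, with made-up hash contents. It relies only on each() and keys() behaving as described above:

```perl
use strict;

my %hash = (a => 1, b => 2, c => 3);

# Leave the first loop early; the iterator now sits part-way through.
while (my ($key, $value) = each %hash) {
    last;
}

# keys() resets the iterator; in void context the count is simply discarded.
keys %hash;

# This loop therefore starts from the beginning and sees every pair.
my $seen = 0;
while (my ($key, $value) = each %hash) {
    $seen++;
}
print "saw $seen pairs\n";
```

Without the keys %hash line, the second loop would pick up where the first one stopped and miss a pair.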

How can I get the unique keys from two hashes?

First extract the keys from the hashes into lists, then solve the problem of removing duplicates from a list,
described above. For example:

%seen = ();

for $element (keys(%foo), keys(%bar)) {

$seen{$element}++;

}

@uniq = keys %seen;

Or more succinctly:

@uniq = keys %{{%foo,%bar}};

Or if you really want to save space:

%seen = ();

while (defined ($key = each %foo)) {

$seen{$key}++;

}

while (defined ($key = each %bar)) {

$seen{$key}++;

}

@uniq = keys %seen;

How can I store a multidimensional array in a DBM file?

Either stringify the structure yourself (no fun), or else get the MLDBM (which uses Data::Dumper) module

from CPAN and layer it on top of either DB_File or GDBM_File.

How can I make my hash remember the order I put elements into it?

Use the Tie::IxHash module from CPAN.

    use Tie::IxHash;
    tie(%myhash, 'Tie::IxHash');

for ($i=0; $i<20; $i++) {

$myhash{$i} = 2*$i;

}

@keys = keys %myhash;

# @keys = (0,1,2,3,...)

Why does passing a subroutine an undefined element in a hash create it?

If you say something like:

somefunc($hash{"nonesuch key here"});

Then that element "autovivifies"; that is, it springs into existence whether you store something there or not.

That‘s because functions get scalars passed in by reference. If somefunc() modifies $_[0], it has to be

ready to write it back into the caller‘s version.

This has been fixed as of perl5.004.

Normally, merely accessing a key‘s value for a nonexistent key does not cause that key to be forever there.

This is different than awk‘s behavior.

How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?

Use references (documented in perlref). Examples of complex data structures are given in perldsc and

perllol. Examples of structures and object−oriented classes are in perltoot.

How can I use a reference as a hash key?

You can't do this directly, but you could use the standard Tie::RefHash module distributed with perl.
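
Here is a short sketch of how that might look. The %color_of hash and the $point data are invented for the example:

```perl
use strict;
use Tie::RefHash;

tie my %color_of, 'Tie::RefHash';

my $point = [3, 4];                 # hypothetical data: an array ref as key
$color_of{$point} = 'blue';

# With an ordinary hash, keys() would hand back strings like "ARRAY(0x...)".
# With Tie::RefHash the keys come back as usable references.
for my $key (keys %color_of) {
    print "(@$key) is $color_of{$key}\n";
}
```

The point of the module is the loop: each $key can still be dereferenced, which a stringified key in a plain hash cannot.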

Data: Misc

How do I handle binary data correctly?

Perl is binary clean, so this shouldn‘t be a problem. For example, this works fine (assuming the files are

found):

    if (`cat /vmunix` =~ /gzip/) {
        print "Your kernel is GNU-zip enabled!\n";
    }

On some systems, however, you have to play tedious games with "text" versus "binary" files. See

binmode in perlfunc.

If you‘re concerned about 8−bit ASCII data, then see perllocale.

If you want to deal with multibyte characters, however, there are some gotchas. See the section on Regular

Expressions.

How do I determine whether a scalar is a number/whole/integer/float?

Assuming that you don‘t care about IEEE notations like "NaN" or "Infinity", you probably just want to use a

regular expression.

    warn "has nondigits"        if     /\D/;
    warn "not a natural number" unless /^\d+$/;             # rejects -3
    warn "not an integer"       unless /^-?\d+$/;           # rejects +3
    warn "not an integer"       unless /^[+-]?\d+$/;
    warn "not a decimal number" unless /^-?\d+\.?\d*$/;     # rejects .2
    warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
    warn "not a C float"
        unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;

If you're on a POSIX system, Perl supports the POSIX::strtod function. Its semantics are somewhat
cumbersome, so here's a getnum wrapper function for more convenient access. This function takes a string
and returns the number it found, or undef for input that isn't a C float. The is_numeric function is a
front end to getnum if you just want to say, "Is this a float?"

    sub getnum {
        use POSIX qw(strtod);
        my $str = shift;
        $str =~ s/^\s+//;           # strip leading whitespace
        $str =~ s/\s+$//;           # strip trailing whitespace
        $! = 0;
        my($num, $unparsed) = strtod($str);
        if (($str eq '') || ($unparsed != 0) || $!) {
            return undef;           # empty, trailing junk, or range error
        } else {
            return $num;
        }
    }

    sub is_numeric { defined getnum($_[0]) }

Or you could check out http://www.perl.com/CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz
instead. The POSIX module (part of the standard Perl distribution) provides strtod and strtol for
converting strings to doubles and longs, respectively.

How do I keep persistent data across program calls?

For some specific applications, you can use one of the DBM modules. See AnyDBM_File. More generically,

you should consult the FreezeThaw, Storable, or Class::Eroot modules from CPAN.

How do I print out or copy a recursive data structure?

The Data::Dumper module on CPAN is nice for printing out data structures, and FreezeThaw for copying

them. For example:

use FreezeThaw qw(freeze thaw);

$new = thaw freeze $old;

Where $old can be (a reference to) any kind of data structure you‘d like. It will be deeply copied.
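
If FreezeThaw isn't installed, the same idea can be sketched with Data::Dumper for printing and Storable's dclone() for copying. Note that this swaps in Storable (mentioned in the previous answer) for FreezeThaw, and the sample structure is made up:

```perl
use strict;
use Data::Dumper;
use Storable qw(dclone);

# Hypothetical nested structure.
my $old = { name => 'config', ports => [80, 443] };

# dclone() returns a deep copy: nested arrays and hashes are duplicated too.
my $new = dclone($old);
push @{ $new->{ports} }, 8080;      # changes the copy only

$Data::Dumper::Sortkeys = 1;        # stable key order when printing
print Dumper($old);
```

Because the copy is deep, $old still holds two ports after the push; a plain assignment of the reference would have shared the inner array.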

How do I define methods for every class/object?

Use the UNIVERSAL class (see UNIVERSAL).

How do I verify a credit card checksum?

Get the Business::CreditCard module from CPAN.

AUTHOR AND COPYRIGHT

When included as part of the Standard Version of Perl, or as part of its complete documentation whether
printed or otherwise, this work may be distributed only under the terms of Perl's Artistic License. Any
distribution of this file or derivatives thereof outside of that package requires that special arrangements be
made with the copyright holder.

Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You

are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A

simple comment in the code giving credit would be courteous but is not required.

perlfaq5 Perl Programmers Reference Guide perlfaq5

NAME

perlfaq5 − Files and Formats ($Revision: 1.24 $, $Date: 1998/07/05 15:07:20 $)

DESCRIPTION

This section deals with I/O and the "f" issues: filehandles, flushing, formats, and footers.

How do I flush/unbuffer an output filehandle? Why must I do this?

The C standard I/O library (stdio) normally buffers characters sent to devices. This is done for efficiency
reasons, so that there isn't a system call for each byte. Any time you use print() or write() in Perl,
you go through this buffering. syswrite() circumvents stdio and buffering.

In most stdio implementations, the type of output buffering and the size of the buffer vary according to the
type of device. Disk files are block buffered, often with a buffer size of more than 2k. Pipes and sockets are
often buffered with a buffer size between 1/2 and 2k. Serial devices (e.g. modems, terminals) are normally
line-buffered, and stdio sends the entire line when it gets the newline.

Perl does not support truly unbuffered output (except insofar as you can syswrite(OUT, $char, 1)).

What it does instead support is "command buffering", in which a physical write is performed after every

output command. This isn‘t as hard on your system as unbuffering, but does get the output where you want

it when you want it.

If you expect characters to get to your device when you print them there, you'll want to autoflush its handle.
Use select() and the $| variable to control autoflushing (see $| in perlvar and select in perlfunc):

$old_fh = select(OUTPUT_HANDLE);

$| = 1;

select($old_fh);

Or using the traditional idiom:

select((select(OUTPUT_HANDLE), $| = 1)[0]);

Or if you don't mind slowly loading several thousand lines of module code just because you're afraid of the $|
variable:

    use FileHandle;
    open(DEV, "+</dev/tty");      # ceci n'est pas une pipe
    DEV->autoflush(1);

or the newer IO::* modules:

    use IO::Handle;
    open(DEV, ">/dev/printer");   # but is this?
    DEV->autoflush(1);

or even this:

    use IO::Socket;               # this one is kinda a pipe?
    $sock = IO::Socket::INET->new(PeerAddr => 'www.perl.com',
                                  PeerPort => 'http(80)',
                                  Proto    => 'tcp');
    die "$!" unless $sock;

    $sock->autoflush();
    print $sock "GET / HTTP/1.0" . "\015\012" x 2;
    $document = join('', <$sock>);
    print "DOC IS: $document\n";

Note the bizarrely hardcoded carriage return and newline in their octal equivalents. This is the ONLY way
(currently) to assure a proper flush on all platforms, including Macintosh. That's the way things work in
network programming: you really should specify the exact bit pattern of the network line terminator. In
practice, "\n\n" often works, but this is not portable.

See perlfaq9 for other examples of fetching URLs over the web.

How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to

the beginning of a file?

Although humans have an easy time thinking of a text file as being a sequence of lines that operates much

like a stack of playing cards — or punch cards — computers usually see the text file as a sequence of bytes.

In general, there‘s no direct way for Perl to seek to a particular line of a file, insert text into a file, or remove

text from a file.

(There are exceptions in special circumstances. You can add or remove at the very end of the file. Another

is replacing a sequence of bytes with another sequence of the same length. Another is using the

$DB_RECNO array bindings as documented in DB_File. Yet another is manipulating files with all lines the

same length.)

The general solution is to create a temporary copy of the text file with the changes you want, then copy that

over the original. This assumes no locking.

    $old = $file;
    $new = "$file.tmp.$$";
    $bak = "$file.bak";

    open(OLD, "< $old")         or die "can't open $old: $!";
    open(NEW, "> $new")         or die "can't open $new: $!";

    # Correct typos, preserving case
    while (<OLD>) {
        s/\b(p)earl\b/${1}erl/i;
        (print NEW $_)          or die "can't write to $new: $!";
    }

    close(OLD)                  or die "can't close $old: $!";
    close(NEW)                  or die "can't close $new: $!";

    rename($old, $bak)          or die "can't rename $old to $bak: $!";
    rename($new, $old)          or die "can't rename $new to $old: $!";

Perl can do this sort of thing for you automatically with the -i command-line switch or the closely-related
$^I variable (see perlrun for more details). Note that -i may require a suffix on some non-Unix systems;
see the platform-specific documentation that came with your port.

    # Renumber a series of tests from the command line
    perl -pi -e 's/(^\s+test\s+)\d+/ $1 . ++$count /e' t/op/taint.t

    # from a script
    local($^I, @ARGV) = ('.bak', glob("*.c"));

while (<>) {

if ($. == 1) {

print "This line should appear at the top of each file\n";

}

s/\b(p)earl\b/${1}erl/i; # Correct typos, preserving case

print;

close ARGV if eof; # Reset $.

}

If you need to seek to an arbitrary line of a file that changes infrequently, you could build up an index of byte

positions of where the line ends are in the file. If the file is large, an index of every tenth or hundredth line

end would allow you to seek and read fairly efficiently. If the file is sorted, try the look.pl library (part of the

standard perl distribution).
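
The index idea might be sketched as follows. This version records where every line starts rather than every tenth, and the build_index() and read_line() names are hypothetical helpers, not part of any library:

```perl
use strict;

# Build an index of the byte offset at which each line starts.
sub build_index {
    my $path = shift;
    my @offset = (0);                       # line 1 starts at byte 0
    open(IDX, "< $path") or die "can't open $path: $!";
    while (<IDX>) {
        push @offset, tell(IDX);            # where the next line would start
    }
    close(IDX);
    pop @offset;                            # last entry is just end of file
    return \@offset;
}

# Fetch line number $n (counting from 1) by seeking straight to it.
sub read_line {
    my ($path, $index, $n) = @_;
    return undef if $n < 1 || $n > @$index;
    open(LINES, "< $path") or die "can't open $path: $!";
    seek(LINES, $index->[$n - 1], 0) or die "can't seek in $path: $!";
    my $line = <LINES>;
    close(LINES);
    return $line;
}
```

The index is only valid while the file is unchanged, which is why the text above restricts this to files that change infrequently. Rebuild it whenever the file is rewritten.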

In the unique case of deleting lines at the end of a file, you can use tell() and truncate(). The

following code snippet deletes the last line of a file without making a copy or reading the whole file into

memory:

open (FH, "+< $file");

while ( <FH> ) { $addr = tell(FH) unless eof(FH) }

truncate(FH, $addr);

Error checking is left as an exercise for the reader.
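
For readers who would rather not do the exercise, here is one way the checks might be added. The drop_last_line() wrapper name is invented for this sketch; the logic is the same tell() and truncate() approach:

```perl
use strict;

# Remove the last line of $file in place, checking each step.
sub drop_last_line {
    my $file = shift;
    my $addr = 0;                           # offset where the last line begins
    open(FH, "+< $file")    or die "can't update $file: $!";
    while (<FH>) {
        # After reading a line that is not the last one, tell() is the
        # start of the following line; the final value is the last line's start.
        $addr = tell(FH) unless eof(FH);
    }
    truncate(FH, $addr)     or die "can't truncate $file: $!";
    close(FH)               or die "can't close $file: $!";
}
```

Initializing $addr to 0 means a one-line file is truncated to empty instead of passing an undefined length to truncate().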

How do I count the number of lines in a file?

One fairly efficient way is to count newlines in the file. The following program uses a feature of tr///, as

documented in perlop. If your text file doesn‘t end with a newline, then it‘s not really a proper text file, so

this may report one fewer line than you expect.

$lines = 0;

open(FILE, $filename) or die "Can't open '$filename': $!";

while (sysread FILE, $buffer, 4096) {

$lines += ($buffer =~ tr/\n//);

}

close FILE;

This assumes no funny games with newline translations.

How do I make a temporary file name?

Use the new_tmpfile class method from the IO::File module to get a filehandle opened for reading and

writing. Use this if you don‘t need to know the file‘s name.

use IO::File;

    $fh = IO::File->new_tmpfile()
        or die "Unable to make new temporary file: $!";

Or you can use the tmpnam function from the POSIX module to get a filename that you then open yourself.

Use this if you do need to know the file‘s name.

use Fcntl;

use POSIX qw(tmpnam);

    # try new temporary filenames until we get one that didn't already
    # exist; the check should be unnecessary, but you can't be too careful
    do { $name = tmpnam() }
        until sysopen(FH, $name, O_RDWR|O_CREAT|O_EXCL);

    # install atexit-style handler so that when we exit or die,
    # we automatically delete this temporary file
    END { unlink($name) or die "Couldn't unlink $name : $!" }

# now go on to use the file ...

If you‘re committed to doing this by hand, use the process ID and/or the current time−value. If you need to

have many temporary files in one process, use a counter:

    BEGIN {
        use Fcntl;
        my $temp_dir = -d '/tmp' ? '/tmp' : $ENV{TMP} || $ENV{TEMP};
        my $base_name = sprintf("%s/%d-%d-0000", $temp_dir, $$, time());
        sub temp_file {
            local *FH;
            my $count = 0;
            until (defined(fileno(FH)) || $count++ > 100) {
                # bump the trailing counter to get the next candidate name
                $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
                sysopen(FH, $base_name, O_WRONLY|O_EXCL|O_CREAT);
            }
            if (defined(fileno(FH))) {
                return (*FH, $base_name);
            } else {
                return ();
            }
        }
    }

How can I manipulate fixed−record−length files?

The most efficient way is using pack() and unpack(). This is faster than using substr() when you extract
many, many strings. It is slower for just a few.

Here is a sample chunk of code to break up and put back together again some fixed−format input lines, in

this case from the output of a normal, Berkeley−style ps:

    # sample input line:
    # 15158 p5 T 0:00 perl /home/tchrist/scripts/now-what
    $PS_T = 'A6 A4 A7 A5 A*';
    open(PS, "ps|");
    print scalar <PS>;
    while (<PS>) {
        ($pid, $tt, $stat, $time, $command) = unpack($PS_T, $_);
        for $var (qw!pid tt stat time command!) {
            print "$var: <$$var>\n";
        }
        print 'line=', pack($PS_T, $pid, $tt, $stat, $time, $command),
                "\n";
    }

We've used $$var in a way that is forbidden by use strict 'refs'. That is, we've promoted a string to
a scalar variable reference using symbolic references. This is ok in small programs, but doesn't scale well.
It also only works on global variables, not lexicals.
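
One way around the symbolic references is to unpack into a hash slice instead of separate variables. This is only a sketch: the %field and @fields names are invented here, and the input line is built with pack() rather than read from ps so the example stands alone:

```perl
use strict;

my $PS_T   = 'A6 A4 A7 A5 A*';
my @fields = qw(pid tt stat time command);

# Build a fixed-width record with the same template, standing in for ps output.
my $line = pack($PS_T, 15158, 'p5', 'T', '0:00',
                'perl /home/tchrist/scripts/now-what');

# Hash slice: one key per column, no symbolic references needed.
my %field;
@field{@fields} = unpack($PS_T, $line);

for my $name (@fields) {
    print "$name: <$field{$name}>\n";
}
```

This runs under use strict, works with lexicals, and adding a column means touching only the template and the @fields list.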

How can I make a filehandle local to a subroutine? How do I pass filehandles between

subroutines? How do I make an array of filehandles?

The fastest, simplest, and most direct way is to localize the typeglob of the filehandle in question:

local *TmpHandle;

Typeglobs are fast (especially compared with the alternatives) and reasonably easy to use, but they also have

one subtle drawback. If you had, for example, a function named TmpHandle(), or a variable named

%TmpHandle, you just hid it from yourself.

sub findme {

local *HostFile;

open(HostFile, "</etc/hosts") or die "no /etc/hosts: $!";

local $_; # <− VERY IMPORTANT

while (<HostFile>) {

print if /\b127\.(0\.0\.)?1\b/;

}

# *HostFile automatically closes/disappears here

}

Here‘s how to use this in a loop to open and store a bunch of filehandles. We‘ll use as values of the hash an

ordered pair to make it easy to sort the hash in insertion order.

@names = qw(motd termcap passwd hosts);

my $i = 0;

foreach $filename (@names) {

local *FH;

open(FH, "/etc/$filename") || die "$filename: $!";

$file{$filename} = [ $i++, *FH ];

}

# Using the filehandles in the array

foreach $name (sort { $file{$a}[0] <=> $file{$b}[0] } keys %file) {

my $fh = $file{$name}[1];

my $line = <$fh>;

print "$name $. $line";

}

For passing filehandles to functions, the easiest way is to preface them with a star, as in func(*STDIN). See
Passing Filehandles in perlfaq7 for details.

If you want to create many, anonymous handles, you should check out the Symbol, FileHandle, or

IO::Handle (etc.) modules. Here‘s the equivalent code with Symbol::gensym, which is reasonably

light−weight:

foreach $filename (@names) {

use Symbol;

my $fh = gensym();

open($fh, "/etc/$filename") || die "open /etc/$filename: $!";

$file{$filename} = [ $i++, $fh ];

}

Or here using the semi−object−oriented FileHandle, which certainly isn‘t light−weight:

use FileHandle;

foreach $filename (@names) {

my $fh = FileHandle->new("/etc/$filename") or die "$filename: $!";

$file{$filename} = [ $i++, $fh ];

}

Please understand that whether the filehandle happens to be a (probably localized) typeglob or an anonymous

handle from one of the modules, in no way affects the bizarre rules for managing indirect handles. See the

next question.

How can I use a filehandle indirectly?

An indirect filehandle is using something other than a symbol in a place that a filehandle is expected. Here

are ways to get those:

    $fh = SOME_FH;        # bareword is strict-subs hostile
    $fh = "SOME_FH";      # strict-refs hostile; same package only
    $fh = *SOME_FH;       # typeglob
    $fh = \*SOME_FH;      # ref to typeglob (bless-able)
    $fh = *SOME_FH{IO};   # blessed IO::Handle from *SOME_FH typeglob

Or you can use the new method from the FileHandle or IO modules to create an anonymous filehandle, store that
in a scalar variable, and use it as though it were a normal filehandle.

    use FileHandle;
    $fh = FileHandle->new();

    use IO::Handle;       # 5.004 or higher
    $fh = IO::Handle->new();

Then use any of those as you would a normal filehandle. Anywhere that Perl is expecting a filehandle, an
indirect filehandle may be used instead. An indirect filehandle is just a scalar variable that contains a
filehandle. Functions like print, open, seek, or the <FH> diamond operator will accept
either a real filehandle or a scalar variable containing one:

    ($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR);
    print $ofh "Type it: ";
    $got = <$ifh>;
    print $efh "What was that: $got";

If you're passing a filehandle to a function, you can write the function in two ways:

sub accept_fh {

my $fh = shift;

print $fh "Sending to indirect filehandle\n";

}

Or it can localize a typeglob and use the filehandle directly:

sub accept_fh {

local *FH = shift;

print FH "Sending to localized filehandle\n";

}

Both styles work with either objects or typeglobs of real filehandles. (They might also work with strings

under some circumstances, but this is risky.)

accept_fh(*STDOUT);

accept_fh($handle);

In the examples above, we assigned the filehandle to a scalar variable before using it. That is because only

simple scalar variables, not expressions or subscripts into hashes or arrays, can be used with built−ins like

print, printf, or the diamond operator. These are illegal and won‘t even compile:

@fd = (*STDIN, *STDOUT, *STDERR);

print $fd[1] "Type it: "; # WRONG

$got = <$fd[0]> # WRONG

print $fd[2] "What was that: $got"; # WRONG

With print and printf, you get around this by using a block and an expression where you would place

the filehandle:

print { $fd[1] } "funny stuff\n";

printf { $fd[1] } "Pity the poor %x.\n", 3_735_928_559;

# Pity the poor deadbeef.

That block is a proper block like any other, so you can put more complicated code there. This sends the

message out to one of two places:

$ok = -x "/bin/cat";

print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n";

print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";

This approach of treating print and printf like object method calls doesn't work for the diamond
operator. That's because it's a real operator, not just a function with a comma-less argument. Assuming
you've been storing typeglobs in your structure as we did above, you can use the built-in function named
readline to read a record just as <> does. Given the initialization shown above for @fd, this would
work, but only because readline() requires a typeglob. It doesn't work with objects or strings, which
might be a bug we haven't fixed yet.

$got = readline($fd[0]);

Let it be noted that the flakiness of indirect filehandles is not related to whether they're strings, typeglobs,
objects, or anything else. It's the syntax of the fundamental operators. Playing the object game doesn't help
you at all here.

How can I set up a footer format to be used with write()?

There‘s no builtin way to do this, but perlform has a couple of techniques to make it possible for the intrepid

hacker.

How can I write() into a string?

See perlform for an swrite() function.

How can I output my numbers with commas added?

This one will do it for you:

sub commify {

local $_ = shift;

1 while s/^(-?\d+)(\d{3})/$1,$2/;

return $_;

}

$n = 23659019423.2331;

print "GOT: ", commify($n), "\n";

GOT: 23,659,019,423.2331

You can‘t just:

s/^(-?\d+)(\d{3})/$1,$2/g;

because you have to put the comma in and then recalculate your position.

Alternatively, this commifies all numbers in a line regardless of whether they have decimal portions, are

preceded by + or −, or whatever:

# from Andrew Johnson <ajohnson@gpu.srv.ualberta.ca>

sub commify {

my $input = shift;

$input = reverse $input;

$input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g;

return scalar reverse $input;   # scalar: reverse the string even in list context

}
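
The reverse-and-insert technique can be checked against a few sample strings. This sketch restates it with an explicit scalar on the final reverse, since a caller such as print supplies list context, in which a bare reverse would return its one-element list unchanged and hand back the still-reversed string. The test inputs are made up:

```perl
use strict;

sub commify {
    my $input = shift;
    $input = reverse $input;                        # scalar assignment: string reversal
    # After reversal the fraction comes first; (?!\d*\.) skips those digits.
    $input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g;
    return scalar reverse $input;                   # force string reversal for any caller
}

for my $sample ('1234567.891', '-1234', '999', 'cost: 1000000 units') {
    print commify($sample), "\n";
}
```

Numbers are passed as strings here so that floating-point formatting doesn't alter the digits before the substitution sees them.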

How can I translate tildes (~) in a filename?

Use the <> (glob()) operator, documented in perlfunc. This requires that you have a shell installed that

groks tildes, meaning csh or tcsh or (some versions of) ksh, and thus may have portability problems. The

Glob::KGlob module (available from CPAN) gives more portable glob functionality.

Within Perl, you may use this directly:

    $filename =~ s{
      ^ ~             # find a leading tilde
      (               # save this in $1
          [^/]        # a non-slash character
                *     # repeated 0 or more times (0 means me)
      )
    }{
      $1
          ? (getpwnam($1))[7]
          : ( $ENV{HOME} || $ENV{LOGDIR} )
    }ex;

How come when I open a file read−write it wipes it out?

Because you‘re using something like this, which truncates the file and then gives you read−write access:

open(FH, "+> /path/name"); # WRONG (almost always)

Whoops. You should instead use this, which will fail if the file doesn't exist:

    open(FH, "+< /path/name");    # open for update

Using ">" always clobbers or creates. Using "<" never does either. The "+" doesn't change this.

Here are examples of many kinds of file opens. Those using sysopen() all assume

use Fcntl;

To open file for reading:

open(FH, "< $path") || die $!;

sysopen(FH, $path, O_RDONLY) || die $!;

To open file for writing, create new file if needed or else truncate old file:

open(FH, "> $path") || die $!;

sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT) || die $!;

sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT, 0666) || die $!;

To open file for writing, create new file, file must not exist:

sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT) || die $!;

sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT, 0666) || die $!;

To open file for appending, create if necessary:

open(FH, ">> $path") || die $!;

sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT) || die $!;

sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT, 0666) || die $!;

To open file for appending, file must exist:

sysopen(FH, $path, O_WRONLY|O_APPEND) || die $!;

To open file for update, file must exist:

open(FH, "+< $path") || die $!;

sysopen(FH, $path, O_RDWR) || die $!;

To open file for update, create file if necessary:

sysopen(FH, $path, O_RDWR|O_CREAT) || die $!;

sysopen(FH, $path, O_RDWR|O_CREAT, 0666) || die $!;

To open file for update, file must not exist:

sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT) || die $!;

sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT, 0666) || die $!;

To open a file without blocking, creating if necessary:

    sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT)
        or die "can't open /tmp/somefile: $!";

Be warned that neither creation nor deletion of files is guaranteed to be an atomic operation over NFS. That
is, two processes might both successfully create or unlink the same file! Therefore O_EXCL isn't so exclusive
as you might wish.

Why do I sometimes get an "Argument list too long" when I use <*>?

The <> operator performs a globbing operation (see above). By default glob() forks csh(1) to do the actual
glob expansion, but csh can't handle more than 127 items and so gives the error message Argument list
too long. People who installed tcsh as csh won't have this problem, but their users may be surprised by
it.

To get around this, either do the glob yourself with Dirhandles and patterns, or use a module like

Glob::KGlob, one that doesn‘t use the shell to do globbing.

Is there a leak/bug in glob()?

Due to the current implementation on some operating systems, when you use the glob() function or its

angle−bracket alias in a scalar context, you may cause a leak and/or unpredictable behavior. It‘s best

therefore to use glob() only in list context.

How can I open a file with a leading ">" or trailing blanks?

Normally perl ignores trailing blanks in filenames, and interprets certain leading characters (or a trailing "|")

to mean something special. To avoid this, you might want to use a routine like this. It makes incomplete

pathnames into explicit relative ones, and tacks a trailing null byte on the name to make perl leave it alone:

sub safe_filename {

local $_ = shift;

return m#^/#

? "$_\0"

: "./$_\0";

}

$fn = safe_filename("<<<something really wicked ");

open(FH, "> $fn") or die "couldn't open $fn: $!";

You could also use the sysopen() function (see sysopen).

How can I reliably rename a file?

Well, usually you just use Perl‘s rename() function. But that may not work everywhere, in particular,

renaming files across file systems. If your operating system supports a mv(1) program or its moral

equivalent, this works:

rename($old, $new) or system("mv", $old, $new);

It may be more compelling to use the File::Copy module instead. You just copy the file to the new
name (checking return values), then delete the old one. This isn't really the same semantics as a real
rename(), though, which preserves metainformation like permissions, timestamps, inode info, etc.

Newer versions of File::Copy export a move() function.
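
A minimal sketch using move(), assuming a File::Copy recent enough to provide it. The safe_move() wrapper name is made up for the example:

```perl
use strict;
use File::Copy qw(move);

# move() renames when it can and otherwise copies the file and
# deletes the original, so it also works across file systems.
sub safe_move {
    my ($old, $new) = @_;
    move($old, $new) or die "can't move $old to $new: $!";
}
```

As with the hand-rolled copy, a move that falls back to copying gives you a new file, so don't count on the inode surviving.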

How can I lock a file?

Perl‘s builtin flock() function (see perlfunc for details) will call flock(2) if that exists, fcntl(2) if it doesn‘t

(on perl version 5.004 and later), and lockf(3) if neither of the two previous system calls exists. On some

systems, it may even use a different form of native locking. Here are some gotchas with Perl‘s flock():

1 Produces a fatal error if none of the three system calls (or their close equivalent) exists.

2 lockf(3) does not provide shared locking, and requires that the filehandle be open for writing (or

appending, or read/writing).

3 Some versions of flock() can‘t lock files over a network (e.g. on NFS file systems), so you‘d need

to force the use of fcntl(2) when you build Perl. See the flock entry of perlfunc, and the INSTALL file

in the source distribution for information on building Perl to do this.
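
With those caveats in mind, basic use of flock() might look like the sketch below. It uses the symbolic constants that Fcntl exports under the :flock tag instead of bare numbers; the append_locked() name is a made-up helper:

```perl
use strict;
use Fcntl qw(:flock);       # LOCK_SH, LOCK_EX, LOCK_UN, LOCK_NB

# Append one line to $path while holding an exclusive lock.
sub append_locked {
    my ($path, $text) = @_;
    open(LOG, ">> $path")   or die "can't open $path: $!";
    flock(LOG, LOCK_EX)     or die "can't lock $path: $!";
    # Someone may have appended while we waited for the lock.
    seek(LOG, 0, 2)         or die "can't seek $path: $!";
    print LOG $text, "\n"   or die "can't write $path: $!";
    close(LOG)              or die "can't close $path: $!";  # closing releases the lock
}
```

The handle is opened for appending, which also satisfies the lockf(3) requirement in gotcha 2 that the file be open for writing. Remember that flock() locks are advisory: they only help if every program touching the file asks for one.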

Why can't I just open(FH, ">file.lock")?

A common bit of code NOT TO USE is this:

sleep(3) while −e "file.lock"; # PLEASE DO NOT USE

open(LCK, "> file.lock"); # THIS BROKEN CODE

This is a classic race condition: you take two steps to do something which must be done in one. That's why
computer hardware provides an atomic test-and-set instruction. In theory, this "ought" to work:

    sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT)
        or die "can't open file.lock: $!";

except that lamentably, file creation (and deletion) is not atomic over NFS, so this won't work (at least, not
every time) over the net. Various schemes involving link() have been suggested, but these tend
to involve busy-wait, which is also subdesirable.

I still don‘t get locking. I just want to increment the number in the file. How can I do this?

Didn‘t anyone ever tell you web−page hit counters were useless? They don‘t count number of hits, they‘re a

waste of time, and they serve only to stroke the writer‘s vanity. Better to pick a random number. It‘s more

realistic.

Anyway, this is what you can do if you can‘t help yourself.

    use Fcntl;
    sysopen(FH, "numfile", O_RDWR|O_CREAT)  or die "can't open numfile: $!";
    flock(FH, 2)                            or die "can't flock numfile: $!";
    $num = <FH> || 0;
    seek(FH, 0, 0)                          or die "can't rewind numfile: $!";
    truncate(FH, 0)                         or die "can't truncate numfile: $!";
    (print FH $num+1, "\n")                 or die "can't write numfile: $!";
    # DO NOT UNLOCK THIS UNTIL YOU CLOSE
    close FH                                or die "can't close numfile: $!";

Here‘s a much better web−page hit counter:

    $hits = int( (time() - 850_000_000) / rand(1_000) );

If the count doesn‘t impress your friends, then the code might. :−)

How do I randomly update a binary file?

If you‘re just trying to patch a binary, in many cases something as simple as this works:

    perl -i -pe 's{window manager}{window mangler}g' /usr/bin/emacs

However, if you have fixed sized records, then you might do something more like this:

$RECSIZE = 220; # size of record, in bytes

$recno = 37; # which record to update

open(FH, "+<somewhere") || die "can’t update somewhere: $!";

seek(FH, $recno * $RECSIZE, 0);

read(FH, $record, $RECSIZE) == $RECSIZE || die "can’t read record $recno: $!";

# munge the record

seek(FH, $recno * $RECSIZE, 0);

print FH $record;

close FH;

Locking and error checking are left as an exercise for the reader. Don‘t forget them, or you‘ll be quite sorry.
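
For readers who would rather not do the exercise, here is one way the fixed-size record update might look with locking and error checking added. This is a sketch under stated assumptions: the routine name update_record and its callback argument are invented for illustration, and flock() must actually work on your system and filesystem.

```perl
use strict;
use Fcntl qw(:flock);

# Overwrite record number $recno (counting from 0) in $path, where every
# record is exactly $recsize bytes. $munge receives the old record and
# must return a replacement of the same length.
sub update_record {
    my ($path, $recsize, $recno, $munge) = @_;
    local *FH;
    open(FH, "+<$path")               or die "can't update $path: $!";
    binmode(FH);
    flock(FH, LOCK_EX)                or die "can't lock $path: $!";
    seek(FH, $recno * $recsize, 0)    or die "can't seek in $path: $!";
    my $record;
    my $got = read(FH, $record, $recsize);
    defined($got) && $got == $recsize or die "can't read record $recno";
    $record = $munge->($record);
    length($record) == $recsize       or die "replacement has wrong size";
    seek(FH, $recno * $recsize, 0)    or die "can't seek in $path: $!";
    print FH $record                  or die "can't write $path: $!";
    close(FH)                         or die "can't close $path: $!";
}
```

The lock is released by close(), which also flushes the buffered write first.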

How do I get a file‘s timestamp in perl?

If you want to retrieve the time at which the file was last read, written, or had its meta−data (owner, etc)

changed, you use the -A, -M, or -C filetest operations respectively, as documented in perlfunc. These retrieve the age of

the file (measured against the start−time of your program) in days as a floating point number. To retrieve the

"raw" time in seconds since the epoch, you would call the stat function, then use localtime(),

gmtime(), or POSIX::strftime() to convert this into human−readable form.

Here‘s an example:

$write_secs = (stat($file))[9];

printf "file %s updated at %s\n", $file,

scalar localtime($write_secs);

If you prefer something more legible, use the File::stat module (part of the standard distribution in version

5.004 and later):

use File::stat;

use Time::localtime;

    $date_string = ctime(stat($file)->mtime);

print "file $file updated at $date_string\n";

Error checking is left as an exercise for the reader.
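
The POSIX::strftime() route mentioned earlier, with the error check filled in, might look something like this. The routine name mtime_string and the choice of UTC and of this particular date format are assumptions made for the sake of the example.

```perl
use strict;
use POSIX qw(strftime);

# Return the modification time of $file as "YYYY-MM-DD HH:MM:SS" in UTC.
sub mtime_string {
    my ($file) = @_;
    my @st = stat($file) or die "can't stat $file: $!";
    return strftime("%Y-%m-%d %H:%M:%S", gmtime($st[9]));
}
```

Swap gmtime() for localtime() if you want the time in your own zone.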

How do I set a file‘s timestamp in perl?

You use the utime() function documented in the utime entry of perlfunc. By way of example, here's a little program that copies

the read and write times from its first argument to all the rest of them.

if (@ARGV < 2) {

die "usage: cptimes timestamp_file other_files ...\n";

}

$timestamp = shift;

($atime, $mtime) = (stat($timestamp))[8,9];

utime $atime, $mtime, @ARGV;

Error checking is left as an exercise for the reader.

Note that utime() currently doesn‘t work correctly with Win95/NT ports. A bug has been reported.

Check it carefully before using it on those platforms.

How do I print to more than one file at once?

If you only have to do this once, you can do this:

for $fh (FH1, FH2, FH3) { print $fh "whatever\n" }

To connect one filehandle to several output filehandles, it's easiest to use the tee(1) program if you

have it, and let it take care of the multiplexing:

open (FH, "| tee file1 file2 file3");

Or even:

# make STDOUT go to three files, plus original STDOUT

open (STDOUT, "| tee file1 file2 file3") or die "Teeing off: $!\n";

print "whatever\n" or die "Writing: $!\n";

close(STDOUT) or die "Closing: $!\n";

Otherwise you‘ll have to write your own multiplexing print function — or your own tee program — or use

Tom Christiansen‘s, at http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz, which is written in Perl

and offers much greater functionality than the stock version.
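
If you do decide to write your own multiplexing print function, a bare-bones version could look like the sketch below. The name multi_print and its calling convention (an array reference of handles, then the text) are made up for this example.

```perl
use strict;

# Print the same list to every filehandle given in the array reference.
# Returns true only if every print succeeded.
sub multi_print {
    my ($handles, @text) = @_;
    my $ok = 1;
    foreach my $fh (@$handles) {
        print {$fh} @text or $ok = 0;
    }
    return $ok;
}
```

You would call it as multi_print([\*STDOUT, \*LOG], "whatever\n"), passing globs or references to globs for the handles.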

How can I read in a file by paragraphs?

Use the $/ variable (see perlvar for details). You can either set it to "" to eliminate empty paragraphs

("abc\n\n\n\ndef", for instance, gets treated as two paragraphs and not three), or "\n\n" to accept

empty paragraphs.
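
To see the difference between the two settings, you could try something like the following sketch. The routine name read_records is invented here; it simply slurps a file into a list using whatever separator it is given.

```perl
use strict;

# Read $path in records separated by $sep and return the records.
sub read_records {
    my ($path, $sep) = @_;
    local $/ = $sep;           # restored automatically on return
    local *IN;
    open(IN, "<$path") or die "can't open $path: $!";
    my @records = <IN>;
    close(IN);
    return @records;
}
```

For a file containing "abc\n\n\n\ndef\n", a separator of "" yields two records, while "\n\n" yields three, the middle one consisting of nothing but two newlines.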

How can I read a single character from a file? From the keyboard?

You can use the builtin getc() function for most filehandles, but it won‘t (easily) work on a terminal

device. For STDIN, either use the Term::ReadKey module from CPAN, or use the sample code in the getc entry of perlfunc.

If your system supports POSIX, you can use the following code, which you‘ll note turns off echo processing

as well.

    #!/usr/bin/perl -w
    use strict;
    $| = 1;
    for (1..4) {
        my $got;
        print "gimme: ";
        $got = getone();
        print "--> $got\n";
    }
    exit;

    BEGIN {
        use POSIX qw(:termios_h);

        my ($term, $oterm, $echo, $noecho, $fd_stdin);

        $fd_stdin = fileno(STDIN);

        $term     = POSIX::Termios->new();
        $term->getattr($fd_stdin);
        $oterm    = $term->getlflag();

        $echo     = ECHO | ECHOK | ICANON;
        $noecho   = $oterm & ~$echo;

        sub cbreak {
            $term->setlflag($noecho);
            $term->setcc(VTIME, 1);
            $term->setattr($fd_stdin, TCSANOW);
        }

        sub cooked {
            $term->setlflag($oterm);
            $term->setcc(VTIME, 0);
            $term->setattr($fd_stdin, TCSANOW);
        }

        sub getone {
            my $key = '';
            cbreak();
            sysread(STDIN, $key, 1);
            cooked();
            return $key;
        }
    }

    END { cooked() }

The Term::ReadKey module from CPAN may be easier to use:

use Term::ReadKey;

open(TTY, "</dev/tty");

print "Gimme a char: ";

ReadMode "raw";

$key = ReadKey 0, *TTY;

ReadMode "normal";

printf "\nYou said %s, char number %03d\n",

$key, ord $key;

For DOS systems, Dan Carson <dbc@tc.fluke.COM> reports the following:

To put the PC in "raw" mode, use ioctl with some magic numbers gleaned from msdos.c (Perl source file)

and Ralf Brown‘s interrupt list (comes across the net every so often):

$old_ioctl = ioctl(STDIN,0,0); # Gets device info

$old_ioctl &= 0xff;

ioctl(STDIN,1,$old_ioctl | 32); # Writes it back, setting bit 5

Then to read a single character:

sysread(STDIN,$c,1); # Read a single character

And to put the PC back to "cooked" mode:

ioctl(STDIN,1,$old_ioctl); # Sets it back to cooked mode.

So now you have $c. If ord($c) == 0, you have a two byte code, which means you hit a special key.

Read another byte with sysread(STDIN,$c,1), and that value tells you what combination it was

according to this table:

    # PC 2-byte keycodes = ^@ + the following:

    # HEX     KEYS
    # ---     ----
    # 0F      SHF TAB
    # 10-19   ALT QWERTYUIOP
    # 1E-26   ALT ASDFGHJKL
    # 2C-32   ALT ZXCVBNM
    # 3B-44   F1-F10
    # 47-49   HOME,UP,PgUp
    # 4B      LEFT
    # 4D      RIGHT
    # 4F-53   END,DOWN,PgDn,Ins,Del
    # 54-5D   SHF F1-F10
    # 5E-67   CTR F1-F10
    # 68-71   ALT F1-F10
    # 73-77   CTR LEFT,RIGHT,END,PgDn,HOME
    # 78-83   ALT 1234567890-=
    # 84      CTR PgUp

This is all trial and error I did a long time ago, I hope I‘m reading the file that worked.

How can I tell if there‘s a character waiting on a filehandle?

The very first thing you should do is look into getting the Term::ReadKey extension from CPAN. It now

even has limited support for closed, proprietary (read: not open systems, not POSIX, not Unix, etc) systems.

You should also check out the Frequently Asked Questions list in comp.unix.* for things like this: the

answer is essentially the same. It‘s very system dependent. Here‘s one solution that works on BSD systems:

sub key_ready {

my($rin, $nfd);

vec($rin, fileno(STDIN), 1) = 1;

return $nfd = select($rin,undef,undef,0);

}

If you want to find out how many characters are waiting, there‘s also the FIONREAD ioctl call to be looked

at.

The h2ph tool that comes with Perl tries to convert C include files to Perl code, which can be required.

FIONREAD ends up defined as a function in the sys/ioctl.ph file:

    require 'sys/ioctl.ph';

    $size = pack("L", 0);
    ioctl(FH, FIONREAD(), $size) or die "Couldn't call ioctl: $!\n";
    $size = unpack("L", $size);

If h2ph wasn‘t installed or doesn‘t work for you, you can grep the include files by hand:

% grep FIONREAD /usr/include/*/*

/usr/include/asm/ioctls.h:#define FIONREAD 0x541B

Or write a small C program using the editor of champions:

    % cat > fionread.c
    #include <stdio.h>
    #include <sys/ioctl.h>
    int main(void) {
        printf("%#08x\n", FIONREAD);
        return 0;
    }
    ^D
    % cc -o fionread fionread.c
    % ./fionread
    0x4004667f

And then hard−code it, leaving porting as an exercise to your successor.

$FIONREAD = 0x4004667f; # XXX: opsys dependent

$size = pack("L", 0);

ioctl(FH, $FIONREAD, $size) or die "Couldn’t call ioctl: $!\n";

$size = unpack("L", $size);

FIONREAD requires a filehandle connected to a stream, meaning sockets, pipes, and tty devices work, but

not files.

How do I do a tail −f in perl?

First try

seek(GWFILE, 0, 1);

The statement seek(GWFILE, 0, 1) doesn‘t change the current position, but it does clear the

end-of-file condition on the handle, so that the next <GWFILE> makes Perl try again to read something.

If that doesn‘t work (it relies on features of your stdio implementation), then you need something more like

this:

for (;;) {

for ($curpos = tell(GWFILE); <GWFILE>; $curpos = tell(GWFILE)) {

# search for some stuff and put it into files

}

# sleep for a while

seek(GWFILE, $curpos, 0); # seek to where we had been

}

If this still doesn‘t work, look into the POSIX module. POSIX defines the clearerr() method, which

can remove the end of file condition on a filehandle. The method: read until end of file, clearerr(), read

some more. Lather, rinse, repeat.

How do I dup() a filehandle in Perl?

If you check the open entry of perlfunc, you'll see that several of the ways to call open() should do the trick. For example:

open(LOG, ">>/tmp/logfile");

open(STDERR, ">&LOG");

Or even with a literal numeric descriptor:

$fd = $ENV{MHCONTEXTFD};

open(MHCONTEXT, "<&=$fd"); # like fdopen(3S)

Note that "<&STDIN" makes a copy, but "<&=STDIN" makes an alias. That means if you close an aliased

handle, all aliases become inaccessible. This is not true with a copied one.

Error checking, as always, has been left as an exercise for the reader.
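
For the impatient, here is roughly what a dup with the error checking put back might look like. The routine name with_stdout_to is invented for this sketch; it saves a copy of STDOUT, points STDOUT at a file while some code runs, and then restores the original.

```perl
use strict;

# Run $code with STDOUT temporarily redirected to $path, then restore it.
sub with_stdout_to {
    my ($path, $code) = @_;
    local *SAVEOUT;
    open(SAVEOUT, ">&STDOUT")  or die "can't dup STDOUT: $!";
    open(STDOUT, ">$path")     or die "can't redirect STDOUT to $path: $!";
    $code->();
    close(STDOUT)              or die "can't close $path: $!";
    open(STDOUT, ">&SAVEOUT")  or die "can't restore STDOUT: $!";
    close(SAVEOUT);
}
```

Because SAVEOUT is a copy rather than an alias, closing STDOUT in the middle leaves the saved handle usable for the restore.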

How do I close a file descriptor by number?

This should rarely be necessary, as the Perl close() function is to be used for things that Perl opened

itself, even if it was a dup of a numeric descriptor, as with MHCONTEXT above. But if you really have to,

you may be able to do this:

    require 'sys/syscall.ph';
    $rc = syscall(&SYS_close, $fd + 0);  # must force numeric
    die "can't sysclose $fd: $!" if $rc == -1;

Why can't I use "C:\temp\foo" in DOS paths? Why doesn't `C:\temp\foo.exe` work?

Whoops! You just put a tab and a formfeed into that filename! Remember that within double quoted strings

("like\this"), the backslash is an escape character. The full list of these is in the "Quote and Quote-like

Operators" section of perlop. Unsurprisingly, you don't have a file called "c:(tab)emp(formfeed)oo" or

"c:(tab)emp(formfeed)oo.exe" on your DOS filesystem.

Either single−quote your strings, or (preferably) use forward slashes. Since all DOS and Windows versions

since something like MS−DOS 2.0 or so have treated / and \ the same in a path, you might as well use the

one that doesn‘t clash with Perl — or the POSIX shell, ANSI C and C++, awk, Tcl, Java, or Python, just to

mention a few.

Why doesn‘t glob("*.*") get all the files?

Because even on non−Unix ports, Perl‘s glob function follows standard Unix globbing semantics. You‘ll

need glob("*") to get all (non−hidden) files. This makes glob() portable.

Why does Perl let me delete read−only files? Why does −i clobber protected files? Isn‘t this a

bug in Perl?

This is elaborately and painstakingly described in the "Far More Than You Ever Wanted To Know" in

http://www.perl.com/CPAN/doc/FMTEYEWTK/file-dir-perms .

The executive summary: learn how your filesystem works. The permissions on a file say what can happen to

the data in that file. The permissions on a directory say what can happen to the list of files in that directory.

If you delete a file, you‘re removing its name from the directory (so the operation depends on the

permissions of the directory, not of the file). If you try to write to the file, the permissions of the file govern

whether you‘re allowed to.

How do I select a random line from a file?

Here‘s an algorithm from the Camel Book:

srand;

rand($.) < 1 && ($line = $_) while <>;

This has a significant advantage in space over reading the whole file in. A simple proof by induction is

available upon request if you doubt its correctness.
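
In case you'd like to see it spelled out, here is the same idea as a routine over a list, with the reasoning in the comments. The name pick_random_line is made up for this sketch. The induction goes like this: when line N arrives it is chosen with probability 1/N; each earlier line was the choice with probability 1/(N-1) and survives with probability (N-1)/N, which again gives 1/N.

```perl
use strict;

# Pick one element uniformly at random while seeing each element once.
# When the Nth line arrives it replaces the current choice with
# probability 1/N, because rand(N) < 1 is true one time in N.
sub pick_random_line {
    my (@lines) = @_;
    my ($chosen, $n);
    foreach my $line (@lines) {
        $n++;
        $chosen = $line if rand($n) < 1;
    }
    return $chosen;
}
```

The one-liner above does the same thing with $. playing the part of the counter, and it never holds more than two lines in memory.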

AUTHOR AND COPYRIGHT

When included as an integrated part of the Standard Distribution of Perl or of its documentation (printed or

otherwise), this work is covered under Perl's Artistic Licence. For separate distributions of all or part of

this FAQ outside of that, see perlfaq.

Irrespective of its distribution, all code examples here are public domain. You are permitted and encouraged

to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A

simple comment in the code giving credit to the FAQ would be courteous but is not required.

perlfaq6 Perl Programmers Reference Guide perlfaq6

NAME

perlfaq6 − Regexps ($Revision: 1.22 $, $Date: 1998/07/16 14:01:07 $)

DESCRIPTION

This section is surprisingly small because the rest of the FAQ is littered with answers involving regular

expressions. For example, decoding a URL and checking whether something is a number are handled with

regular expressions, but those answers are found elsewhere in this document (in the sections on data

manipulation and on networking, to be precise).

How can I hope to use regular expressions without creating illegible and unmaintainable code?

Three techniques can make regular expressions maintainable and understandable.

Comments Outside the Regexp

Describe what you‘re doing and how you‘re doing it, using normal Perl comments.

# turn the line into the first word, a colon, and the

# number of characters on the rest of the line

s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg;

Comments Inside the Regexp

The /x modifier causes whitespace to be ignored in a regexp pattern (except in a character class), and

also allows you to use normal comments there, too. As you can imagine, whitespace and comments

help a lot.

/x lets you turn this:

    s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;

into this:

    s{ <                    # opening angle bracket
        (?:                 # Non-backreffing grouping paren
             [^>'"] *       # 0 or more things that are neither > nor ' nor "
                |           #    or else
             ".*?"          # a section between double quotes (stingy match)
                |           #    or else
             '.*?'          # a section between single quotes (stingy match)
        ) +                 #   all occurring one or more times
       >                    # closing angle bracket
    }{}gsx;                 # replace with nothing, i.e. delete

It‘s still not quite so clear as prose, but it is very useful for describing the meaning of each part of the

pattern.

Different Delimiters

While we normally think of patterns as being delimited with / characters, they can be delimited by

almost any character. perlre describes this. For example, the s/// above uses braces as delimiters.

Selecting another delimiter can avoid quoting the delimiter within the pattern:

s/\/usr\/local/\/usr\/share/g; # bad delimiter choice

s#/usr/local#/usr/share#g; # better

I‘m having trouble matching over more than one line. What‘s wrong?

Either you don‘t have more than one line in the string you‘re looking at (probably), or else you aren‘t using

the correct modifier(s) on your pattern (possibly).

There are many ways to get multiline data into a string. If you want it to happen automatically while reading

input, you'll want to set $/ (probably to '' for paragraphs or undef for the whole file) to allow you to read

more than one line at a time.

Read perlre to help you decide which of /s and /m (or both) you might want to use: /s allows dot to

include newline, and /m allows caret and dollar to match next to a newline, not just at the end of the string.

You do need to make sure that you‘ve actually got a multiline string in there.
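
A quick way to convince yourself of what /s and /m each do is to try them on a two-line string. The variable names in this sketch are arbitrary.

```perl
use strict;

my $text = "first line\nsecond line\n";

# Without /s, dot refuses to cross the newline; with /s it may.
my $dot_plain = ($text =~ /first.*second/)  ? 1 : 0;   # 0
my $dot_s     = ($text =~ /first.*second/s) ? 1 : 0;   # 1

# Without /m, ^ matches only at the start of the string; with /m it
# also matches just after each embedded newline.
my $caret_plain = ($text =~ /^second/)  ? 1 : 0;       # 0
my $caret_m     = ($text =~ /^second/m) ? 1 : 0;       # 1

print "dot: $dot_plain $dot_s, caret: $caret_plain $caret_m\n";
```

The two modifiers are independent, so a pattern that needs both behaviours simply takes both.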

For example, this program detects duplicate words, even when they span line breaks (but not paragraph

ones). For this example, we don‘t need /s because we aren‘t using dot in a regular expression that we want

to cross line boundaries. Neither do we need /m because we aren‘t wanting caret or dollar to match at any

point inside the record next to newlines. But it‘s imperative that $/ be set to something other than the

default, or else we won‘t actually ever have a multiline record read in.

    $/ = '';                # read in a whole paragraph, not just one line
    while ( <> ) {
        while ( /\b([\w'-]+)(\s+\1)+\b/gi ) {   # word starts alpha
            print "Duplicate $1 at paragraph $.\n";
        }
    }

Here‘s code that finds sentences that begin with "From " (which would be mangled by many mailers):

    $/ = '';                # read in a whole paragraph, not just one line
    while ( <> ) {
        while ( /^From /gm ) {      # /m makes ^ match next to \n
            print "leading from in paragraph $.\n";
        }
    }

Here‘s code that finds everything between START and END in a paragraph:

    undef $/;               # read in whole file, not just one line or paragraph
    while ( <> ) {
        while ( /START(.*?)END/sgm ) {   # /s makes . cross line boundaries
            print "$1\n";
        }
    }

How can I pull out lines between two patterns that are themselves on different lines?

You can use Perl‘s somewhat exotic .. operator (documented in perlop):

    perl -ne 'print if /START/ .. /END/' file1 file2 ...

If you wanted text and not lines, you would use

    perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...

But if you want nested occurrences of START through END, you‘ll run up against the problem described in

the question in this section on matching balanced text.
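
The same range trick works on lines you already have in memory. In this sketch the routine name lines_between is invented, and the patterns are passed in as strings.

```perl
use strict;

# Return the lines from the first one matching $start through the next
# one matching $end, inclusive, for every such range in @lines.
# Note: the .. operator remembers its state between calls, so a list
# that ends inside an unfinished range affects the next call.
sub lines_between {
    my ($start, $end, @lines) = @_;
    my @out;
    foreach (@lines) {
        push @out, $_ if /$start/ .. /$end/;
    }
    return @out;
}
```

Each range runs from a START line up to and including the next END line; lines outside any range are dropped.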

Here‘s another example of using ..:

    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof();
        # now choose between them
    } continue {
        close ARGV if eof;      # reset $. at the end of each file
    }

I put a regular expression into $/ but it didn‘t work. What‘s wrong?

$/ must be a string, not a regular expression. Awk has to be better for something. :−)

Actually, you could do this if you don‘t mind reading the whole file into memory:

undef $/;

@records = split /your_pattern/, <FH>;

The Net::Telnet module (available from CPAN) has the capability to wait for a pattern in the input stream, or

timeout if it doesn‘t appear within a certain time.

    ## Create a file with three lines.
    open FH, ">file";
    print FH "The first line\nThe second line\nThe third line\n";
    close FH;

    ## Get a read/write filehandle to it.
    use FileHandle;
    $fh = new FileHandle "+<file";

    ## Attach it to a "stream" object.
    use Net::Telnet;
    $file = new Net::Telnet (-fhopen => $fh);

    ## Search for the second line and print out the third.
    $file->waitfor('/second line\n/');
    print $file->getline;

How do I substitute case insensitively on the LHS, but preserving case on the RHS?

It depends on what you mean by "preserving case". The following script makes the substitution have the

same case, letter by letter, as the original. If the substitution has more characters than the string being

substituted, the case of the last character is used for the rest of the substitution.

# Original by Nathan Torkington, massaged by Jeffrey Friedl

    sub preserve_case($$)
    {
        my ($old, $new) = @_;
        my ($state) = 0; # 0 = no change; 1 = lc; 2 = uc
        my ($i, $oldlen, $newlen, $c) = (0, length($old), length($new));
        my ($len) = $oldlen < $newlen ? $oldlen : $newlen;

        for ($i = 0; $i < $len; $i++) {
            if ($c = substr($old, $i, 1), $c =~ /[\W\d_]/) {
                $state = 0;
            } elsif (lc $c eq $c) {
                substr($new, $i, 1) = lc(substr($new, $i, 1));
                $state = 1;
            } else {
                substr($new, $i, 1) = uc(substr($new, $i, 1));
                $state = 2;
            }
        }
        # finish up with any remaining new (for when new is longer than old)
        if ($newlen > $oldlen) {
            if ($state == 1) {
                substr($new, $oldlen) = lc(substr($new, $oldlen));
            } elsif ($state == 2) {
                substr($new, $oldlen) = uc(substr($new, $oldlen));
            }
        }
        return $new;
    }

$a = "this is a TEsT case";

$a =~ s/(test)/preserve_case($1, "success")/gie;

print "$a\n";

This prints:

this is a SUcCESS case

How can I make \w match national character sets?

See perllocale.

How can I match a locale−smart version of /[a−zA−Z]/?

One alphabetic character would be /[^\W\d_]/, no matter what locale you‘re in. Non−alphabetics would

be /[\W\d_]/ (assuming you don‘t consider an underscore a letter).
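
As a small illustration of the double negative at work, the following sketch counts the alphabetic characters in a string. The routine name count_alpha is made up for the example.

```perl
use strict;

# Count the alphabetic characters in a string: word characters that
# are neither digits nor underscores.
sub count_alpha {
    my ($s) = @_;
    my $n = 0;
    $n++ while $s =~ /[^\W\d_]/g;
    return $n;
}
```

The class reads "not a non-word character, not a digit, not an underscore", which leaves whatever the current locale counts as a letter.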

How can I quote a variable to use in a regexp?

The Perl parser will expand $variable and @variable references in regular expressions unless the

delimiter is a single quote. Remember, too, that the right−hand side of a s/// substitution is considered a

double−quoted string (see perlop for more details). Remember also that any regexp special characters will

be acted on unless you precede the variable with \Q. Here's an example:

$string = "to die?";

$lhs = "die?";

$rhs = "sleep no more";

$string =~ s/\Q$lhs/$rhs/;

# $string is now "to sleep no more"

Without the \Q, the regexp would also spuriously match "di".
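
The spurious match is easy to demonstrate. This sketch reuses $lhs from the example above and tries it against the made-up string "dig"; quotemeta() is the function form of \Q.

```perl
use strict;

my $lhs = "die?";

# Unquoted, "?" makes the "e" optional, so the pattern matches "di".
my $plain  = ("dig" =~ /$lhs/)   ? 1 : 0;   # 1
# With \Q every metacharacter is backslashed, so only "die?" matches.
my $quoted = ("dig" =~ /\Q$lhs/) ? 1 : 0;   # 0
# quotemeta() returns the backslashed string itself.
my $escaped = quotemeta($lhs);               # die\?
```

Use quotemeta() when you need the escaped text as a value, for instance to build up a larger pattern piece by piece.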

What is /o really for?

Using a variable in a regular expression match forces a re−evaluation (and perhaps recompilation) each time

through. The /o modifier locks in the regexp the first time it‘s used. This always happens in a constant

regular expression, and in fact, the pattern was compiled into the internal format at the same time your entire

program was.

Use of /o is irrelevant unless variable interpolation is used in the pattern, and if so, the regexp engine will

neither know nor care whether the variables change after the pattern is evaluated the very first time.

/o is often used to gain an extra measure of efficiency by not performing subsequent evaluations when you

know it won‘t matter (because you know the variables won‘t change), or more rarely, when you don‘t want

the regexp to notice if they do.

For example, here‘s a "paragrep" program:

    $/ = '';                # paragraph mode

$pat = shift;

while (<>) {

print if /$pat/o;

}
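
As of this release you can get a similar effect with the qr// operator (see perlop), which compiles a pattern once and hands you back something you can match against repeatedly. This is an alternative to /o rather than a description of it, and the routine name grep_paragraphs is invented for the sketch.

```perl
use strict;

# Compile the pattern once, then reuse the compiled form for every
# paragraph. Returns the paragraphs that match.
sub grep_paragraphs {
    my ($pat, @paragraphs) = @_;
    my $re = qr/$pat/;
    return grep { $_ =~ $re } @paragraphs;
}
```

Unlike /o, the compiled pattern here belongs to the variable, so a second call with a different $pat gets a fresh regexp.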

How do I use a regular expression to strip C style comments from a file?

While this actually can be done, it‘s much harder than you‘d think. For example, this one−liner

    perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c

will work in many but not all cases. You see, it‘s too simple−minded for certain kinds of C programs, in

particular, those with what appear to be comments in quoted strings. For that, you‘d need something like

this, created by Jeffrey Friedl:

    $/ = undef;
    $_ = <>;
    s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|\n+|.[^/"'\\]*)#$2#g;
    print;

This could, of course, be more legibly written with the /x modifier, adding whitespace and comments.

Can I use Perl regular expressions to match balanced text?

Although Perl regular expressions are more powerful than "mathematical" regular expressions, because they

feature conveniences like backreferences (\1 and its ilk), they still aren‘t powerful enough. You still need to

use non−regexp techniques to parse balanced text, such as the text enclosed between matching parentheses or

braces, for example.
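
A non-regexp technique can be as simple as walking the string and keeping a depth count. The sketch below handles parentheses only and deliberately ignores quoting and backslash escapes; the routine name first_balanced is made up.

```perl
use strict;

# Return the first balanced parenthesised chunk of $text, including the
# outer parentheses, or undef if there is none. Quoting and escapes are
# deliberately ignored in this sketch.
sub first_balanced {
    my ($text) = @_;
    my ($depth, $start) = (0, undef);
    for (my $i = 0; $i < length($text); $i++) {
        my $c = substr($text, $i, 1);
        if ($c eq '(') {
            $start = $i if $depth == 0;
            $depth++;
        } elsif ($c eq ')' && $depth > 0) {
            $depth--;
            return substr($text, $start, $i - $start + 1) if $depth == 0;
        }
    }
    return undef;
}
```

The counter is exactly the piece of memory that a regular expression lacks: it can grow without bound as the nesting gets deeper.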

An elaborate subroutine (for 7-bit ASCII only) to pull out balanced and possibly nested single chars, like `

and ', { and }, or ( and ) can be found in

http://www.perl.com/CPAN/authors/id/TOMC/scripts/pull_quotes.gz .

The C::Scan module from CPAN contains such subs for internal usage, but they are undocumented.

What does it mean that regexps are greedy? How can I get around it?

Most people mean that greedy regexps match as much as they can. Technically speaking, it‘s actually the

quantifiers (?, *, +, {}) that are greedy rather than the whole pattern; Perl prefers local greed and immediate

gratification to overall greed. To get non−greedy versions of the same quantifiers, use (??, *?, +?, {}?).

An example:

$s1 = $s2 = "I am very very cold";

$s1 =~ s/ve.*y //; # I am cold

$s2 =~ s/ve.*?y //; # I am very cold

Notice how the second substitution stopped matching as soon as it encountered "y ". The *? quantifier

effectively tells the regular expression engine to find a match as quickly as possible and pass control on to

whatever is next in line, like you would if you were playing hot potato.

How do I process each word on each line?

Use the split function:

    while (<>) {
        foreach $word ( split ) {
            # do something with $word here
        }
    }

Note that this isn‘t really a word in the English sense; it‘s just chunks of consecutive non−whitespace

characters.

To work with only alphanumeric sequences, you might consider

    while (<>) {
        foreach $word (m/(\w+)/g) {
            # do something with $word here
        }
    }

How can I print out a word−frequency or line−frequency summary?

To do this, you have to parse out each word in the input stream. We‘ll pretend that by word you mean chunk

of alphabetics, hyphens, or apostrophes, rather than the non−whitespace chunk idea of a word given in the

previous question:

    while (<>) {
        while ( /(\b[^\W_\d][\w'-]+\b)/g ) {   # misses "`sheep'"
            $seen{$1}++;
        }
    }
    while ( ($word, $count) = each %seen ) {
        print "$count $word\n";
    }

If you wanted to do the same thing for lines, you wouldn‘t need a regular expression:

while (<>) {

$seen{$_}++;

}

while ( ($line, $count) = each %seen ) {

print "$count $line";

}

If you want these output in a sorted order, see the section on Hashes.
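
For the common case of "most frequent first", a sort on the counts is all it takes. This is a sketch: the routine name frequency_report is invented, and it takes the %seen hash from above as a flat list.

```perl
use strict;

# Return "count word" lines, most frequent first; ties are broken
# alphabetically so the output is stable.
sub frequency_report {
    my (%seen) = @_;
    return map  { "$seen{$_} $_\n" }
           sort { $seen{$b} <=> $seen{$a} || $a cmp $b }
           keys %seen;
}
```

You would use it as print frequency_report(%seen); once the counting loop has finished.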

How can I do approximate matching?

See the module String::Approx available from CPAN.

How do I efficiently match many regular expressions at once?

The following is super−inefficient:

    while (<FH>) {
        foreach $pat (@patterns) {
            if ( /$pat/ ) {
                # do something
            }
        }
    }

Instead, you either need to use one of the experimental Regexp extension modules from CPAN (which might

well be overkill for your purposes), or else put together something like this, inspired from a routine in Jeffrey

Friedl‘s book:

sub _bm_build {

my $condition = shift;

my @regexp = @_; # this MUST not be local(); need my()

my $expr = join $condition => map { "m/\$regexp[$_]/o" } (0..$#regexp);

my $match_func = eval "sub { $expr }";

die if $@; # propagate $@; this shouldn’t happen!

return $match_func;

}

    sub bm_and { _bm_build('&&', @_) }
    sub bm_or  { _bm_build('||', @_) }

$f1 = bm_and qw{

xterm

(?i)window

};

$f2 = bm_or qw{

\b[Ff]ree\b

\bBSD\B

(?i)sys(tem)?\s*[V5]\b

};

    # feed me /etc/termcap, prolly
    while ( <> ) {
        print "1: $_" if &$f1;
        print "2: $_" if &$f2;
    }

Why don‘t word−boundary searches with \b work for me?

Two common misconceptions are that \b is a synonym for \s+, and that it‘s the edge between whitespace

characters and non−whitespace characters. Neither is correct. \b is the place between a \w character and a

\W character (that is, \b is the edge of a "word"). It‘s a zero−width assertion, just like ^, $, and all the

other anchors, so it doesn‘t consume any characters. perlre describes the behaviour of all the regexp

metacharacters.

Here are examples of the incorrect application of \b, with fixes:

"two words" =~ /(\w+)\b(\w+)/; # WRONG

"two words" =~ /(\w+)\s+(\w+)/; # right

" =matchless= text" =~ /\b=(\w+)=\b/; # WRONG

" =matchless= text" =~ /=(\w+)=/; # right

Although they may not do what you thought they did, \b and \B can still be quite useful. For an example of

the correct use of \b, see the example of matching duplicate words over multiple lines.

An example of using \B is the pattern \Bis\B. This will find occurrences of "is" on the insides of words

only, as in "thistle", but not "this" or "island".
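
Both assertions are easy to try out. The word list and variable names in this sketch are arbitrary.

```perl
use strict;

# Keep the words that contain "is" strictly inside them.
my @inside = grep { /\Bis\B/ } qw(thistle this island is);
# Only "thistle" qualifies: the others have "is" touching a word edge.

# \b consumes nothing, so these two words still need \s+ between them.
my ($first, $second) = "two words" =~ /\b(\w+)\b\s+\b(\w+)\b/;
```

In the second pattern the \b assertions are redundant but harmless; it is the \s+ that actually eats the space.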

Why does using $&, $`, or $' slow my program down?

Because once Perl sees that you need one of these variables anywhere in the program, it has to provide them

on each and every pattern match. The same mechanism that handles these provides for the use of $1, $2,

etc., so you pay the same price for each regexp that contains capturing parentheses. But if you never use $&,

etc., in your script, then regexps without capturing parentheses won't be penalized. So avoid $&, $', and

$` if you can, but if you can't (and some algorithms really appreciate them), once you've used them once,

use them at will, because you‘ve already paid the price.

What good is \G in a regular expression?

The notation \G is used in a match or substitution in conjunction with the /g modifier (and ignored if there's no

/g) to anchor the regular expression to the point just past where the last match occurred, i.e. the pos()

point.

For example, suppose you had a line of text quoted in standard mail and Usenet notation, (that is, with

leading > characters), and you want to change each leading > into a corresponding :. You could do so in this

way:

    s/^(>+)/':' x length($1)/gem;

Or, using \G, the much simpler (and faster):

s/\G>/:/g;
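
To see why this touches only the leading markers, wrap it in a routine and feed it a line with a > further along. The routine name requote is made up for this sketch. The first match must start at position 0, each later match must start exactly where the previous one ended, and the first character that isn't a > ends the run.

```perl
use strict;

# Turn each leading ">" quote marker into ":" and leave later ">" alone.
sub requote {
    my ($line) = @_;
    $line =~ s/\G>/:/g;
    return $line;
}
```

Note that this form works on one line at a time, whereas the /m version above handles a multiline string in a single pass.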

A more sophisticated use might involve a tokenizer. The following lex−like example is courtesy of Jeffrey

Friedl. It did not work in 5.003 due to bugs in that release, but does work in 5.004 or better. (Note the use of

/c, which prevents a failed match with /g from resetting the search position back to the beginning of the

string.)

    while (<>) {
        chomp;
        PARSER: {
            m/ \G( \d+\b    )/gcx   && do { print "number: $1\n"; redo; };
            m/ \G( \w+      )/gcx   && do { print "word: $1\n";   redo; };
            m/ \G( \s+      )/gcx   && do { print "space: $1\n";  redo; };
            m/ \G( [^\w\d]+ )/gcx   && do { print "other: $1\n";  redo; };
        }
    }

Of course, that could have been written as

    while (<>) {
        chomp;
        PARSER: {
            if ( /\G( \d+\b )/gcx ) {
                print "number: $1\n";
                redo PARSER;
            }
            if ( /\G( \w+ )/gcx ) {
                print "word: $1\n";
                redo PARSER;
            }
            if ( /\G( \s+ )/gcx ) {
                print "space: $1\n";
                redo PARSER;
            }
            if ( /\G( [^\w\d]+ )/gcx ) {
                print "other: $1\n";
                redo PARSER;
            }
        }
    }

But then you lose the vertical alignment of the regular expressions.

Are Perl regexps DFAs or NFAs? Are they POSIX compliant?

While it‘s true that Perl‘s regular expressions resemble the DFAs (deterministic finite automata) of the

egrep(1) program, they are in fact implemented as NFAs (non−deterministic finite automata) to allow

backtracking and backreferencing. And they aren‘t POSIX−style either, because those guarantee worst−case

behavior for all cases. (It seems that some people prefer guarantees of consistency, even when what‘s

guaranteed is slowness.) See the book "Mastering Regular Expressions" (from O‘Reilly) by Jeffrey Friedl

for all the details you could ever hope to know on these matters (a full citation appears in perlfaq2).

What‘s wrong with using grep or map in a void context?

Both grep and map build a return list, regardless of their context. This means you‘re making Perl go to the

trouble of building up a return list that you then just ignore. That‘s no way to treat a programming language,

you insensitive scoundrel!
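
If all you want is the side effect, say so with foreach; keep map and grep for when you actually want the list back. The data in this sketch is made up.

```perl
use strict;

my @words = qw(apple banana cherry);
my %length_of;

# Side effects only: foreach says so and builds no return list.
foreach my $w (@words) {
    $length_of{$w} = length $w;
}

# A result list is wanted here, so map is the natural fit.
my @lengths = map { length($_) } @words;
```

Besides sparing Perl the wasted effort, the foreach version tells the next reader that no result is expected.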

How can I match strings with multibyte characters?

This is hard, and there‘s no good way. Perl does not directly support wide characters. It pretends that a byte

and a character are synonymous. The following set of approaches was offered by Jeffrey Friedl, whose

article in issue #5 of The Perl Journal talks about this very matter.

Let‘s suppose you have some weird Martian encoding where pairs of ASCII uppercase letters encode single

Martian letters (i.e. the two bytes "CV" make a single Martian letter, as do the two bytes "SG", "VS", "XX",

etc.). Other bytes represent single characters, just like ASCII.

So, the string of Martian "I am CVSGXX!" uses 12 bytes to encode the nine characters ‘I‘, ’ ‘, ‘a‘, ‘m‘, ’ ‘,

‘CV‘, ‘SG‘, ‘XX‘, ‘!’.

Now, say you want to search for the single character /GX/. Perl doesn‘t know about Martian, so it‘ll find the

two bytes "GX" in the "I am CVSGXX!" string, even though that character isn‘t there: it just looks like it is

because "SG" is next to "XX", but there‘s no real "GX". This is a big problem.

Here are a few ways, all painful, to deal with it:

    $martian =~ s/([A-Z][A-Z])/ $1 /g;  # Make sure adjacent "martian" bytes
                                        # are no longer adjacent.
    print "found GX!\n" if $martian =~ /GX/;

Or like this:

    @chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g;
    # above is conceptually similar to:     @chars = $text =~ m/(.)/g;
    foreach $char (@chars) {
        print "found GX!\n", last if $char eq 'GX';
    }

Or like this:

    while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) {  # \G probably unneeded
        print "found GX!\n", last if $1 eq 'GX';
    }

Or like this:

    die "sorry, Perl doesn't (yet) have Martian support )-:\n";

In addition, a sample program which converts half-width to full-width katakana (in Shift-JIS or EUC

encoding) is available from CPAN.

There are many double− (and multi−) byte encodings commonly used these days. Some versions of these

have 1−, 2−, 3−, and 4−byte characters, all mixed.

AUTHOR AND COPYRIGHT

When included as part of the Standard Version of Perl, or as part of its complete documentation whether

printed or otherwise, this work may be distributed only under the terms of Perl‘s Artistic License. Any

distribution of this file or derivatives thereof outside of that package require that special arrangements be

made with copyright holder.

Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You

are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A

simple comment in the code giving credit would be courteous but is not required.

perlfaq7 Perl Programmers Reference Guide perlfaq7

NAME

perlfaq7 − Perl Language Issues ($Revision: 1.21 $, $Date: 1998/06/22 15:20:07 $)

DESCRIPTION

This section deals with general Perl language issues that don‘t clearly fit into any of the other sections.

Can I get a BNF/yacc/RE for the Perl language?

There is no BNF, but you can paw your way through the yacc grammar in perly.y in the source distribution if

you‘re particularly brave. The grammar relies on very smart tokenizing code, so be prepared to venture into

toke.c as well.

In the words of Chaim Frenkel: "Perl‘s grammar can not be reduced to BNF. The work of parsing perl is

distributed between yacc, the lexer, smoke and mirrors."

What are all these $@%* punctuation signs, and how do I know when to use them?

They are type specifiers, as detailed in perldata:

$ for scalar values (number, string or reference)

@ for arrays

% for hashes (associative arrays)

* for all types of that symbol name. In version 4 you used them like

pointers, but in modern perls you can just use references.

While there are a few places where you don‘t actually need these type specifiers, you should always use

them.

A couple of others that you‘re likely to encounter that aren‘t really type specifiers are:

<> are used for inputting a record from a filehandle.

\ takes a reference to something.

Note that <FILE> is neither the type specifier for files nor the name of the handle. It is the <> operator

applied to the handle FILE. It reads one line (well, record - see $/ in perlvar) from the handle FILE in scalar context,

or all lines in list context. When performing open, close, or any other operation besides <> on files, or even

talking about the handle, do not use the brackets. These are correct: eof(FH), seek(FH, 0, 2) and

"copying from STDIN to FILE".

Do I always/never have to quote my strings or use semicolons and commas?

Normally, a bareword doesn‘t need to be quoted, but in most cases probably should be (and must be under

use strict). But a hash key consisting of a simple word (that isn‘t the name of a defined subroutine)

and the left−hand operand to the => operator both count as though they were quoted:

    This             is like this
    ------------     ---------------
    $foo{line}       $foo{"line"}
    bar => stuff     "bar" => stuff

The final semicolon in a block is optional, as is the final comma in a list. Good style (see perlstyle) says to

put them in except for one−liners:

    if ($whoops) { exit 1 }
    @nums = (1, 2, 3);

    if ($whoops) {
        exit 1;
    }
    @lines = (
        "There Beren came from mountains cold",
        "And lost he wandered under leaves",
    );

How do I skip some return values?

One way is to treat the return values as a list and index into it:

$dir = (getpwnam($user))[7];

Another way is to use undef as an element on the left−hand−side:

($dev, $ino, undef, undef, $uid, $gid) = stat($file);

How do I temporarily block warnings?

The $^W variable (documented in perlvar) controls runtime warnings for a block:

{

local $^W = 0; # temporarily turn off warnings

$a = $b + $c; # I know these might be undef

}

Note that like all the punctuation variables, you cannot currently use my() on $^W, only local().

A new use warnings pragma is in the works to provide finer control over all this. The curious should

check the perl5−porters mailing list archives for details.

What‘s an extension?

A way of calling compiled C code from Perl. Reading perlxstut is a good place to learn more about

extensions.

Why do Perl operators have different precedence than C operators?

Actually, they don‘t. All C operators that Perl copies have the same precedence in Perl as they do in C. The

problem is with operators that C doesn‘t have, especially functions that give a list context to everything on

their right, eg print, chmod, exec, and so on. Such functions are called "list operators" and appear as such in

the precedence table in perlop.

A common mistake is to write:

unlink $file || die "snafu";

This gets interpreted as:

unlink ($file || die "snafu");

To avoid this problem, either put in extra parentheses or use the super low precedence or operator:

(unlink $file) || die "snafu";

unlink $file or die "snafu";

The "English" operators (and, or, xor, and not) deliberately have precedence lower than that of list

operators for just such situations as the one above.

Another operator with surprising precedence is exponentiation. It binds more tightly even than unary minus,

making -2**2 produce a negative, not a positive, four. It is also right-associating, meaning that 2**3**2 is

two raised to the ninth power, not eight squared.
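
To make those two rules concrete, here is a small sketch (the variable names are invented for the example) that spells out how each expression is grouped:

```perl
use strict;

my $neg     = -2**2;       # parsed as -(2**2)
my $squared = (-2)**2;     # parentheses force the other reading
my $tower   = 2**3**2;     # parsed as 2**(3**2), that is 2**9
my $left    = (2**3)**2;   # eight squared

print "$neg $squared $tower $left\n";
```

When in doubt, add the parentheses yourself, as the second and fourth lines do.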

Although it has the same precedence as in C, Perl‘s ?: operator produces an lvalue. This assigns $x to

either $a or $b, depending on the trueness of $maybe:

($maybe ? $a : $b) = $x;

How do I declare/create a structure?

In general, you don‘t "declare" a structure. Just use a (probably anonymous) hash reference. See perlref and

perldsc for details. Here‘s an example:

    $person = {};                   # new anonymous hash
    $person->{AGE}  = 24;           # set field AGE to 24
    $person->{NAME} = "Nat";        # set field NAME to "Nat"

If you‘re looking for something a bit more rigorous, try perltoot.

How do I create a module?

A module is a package that lives in a file of the same name. For example, the Hello::There module would

live in Hello/There.pm. For details, read perlmod. You‘ll also find Exporter helpful. If you‘re writing a C

or mixed−language module with both C and Perl, then you should study perlxstut.

Here's a convenient template you might wish to use when starting your own module. Make sure to change

the names appropriately.

    package Some::Module;  # assumes Some/Module.pm

    use strict;

    BEGIN {
        use Exporter   ();
        use vars       qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);

        ## set the version for version checking; uncomment to use
        ## $VERSION     = 1.00;

        # if using RCS/CVS, this next line may be preferred,
        # but beware two-digit versions.
        $VERSION = do{my@r=q$Revision: 1.21 $=~/\d+/g;sprintf '%d.'.'%02d'x$#r,@r};

        @ISA         = qw(Exporter);
        @EXPORT      = qw(&func1 &func2 &func3);
        %EXPORT_TAGS = ( );     # eg: TAG => [ qw!name1 name2! ],

        # your exported package globals go here,
        # as well as any optionally exported functions
        @EXPORT_OK   = qw($Var1 %Hashit);
    }
    use vars      @EXPORT_OK;

    # non-exported package globals go here
    use vars      qw( @more $stuff );

    # initialize package globals, first exported ones
    $Var1   = '';
    %Hashit = ();

    # then the others (which are still accessible as $Some::Module::stuff)
    $stuff  = '';
    @more   = ();

    # all file-scoped lexicals must be created before
    # the functions below that use them.

    # file-private lexicals go here
    my $priv_var    = '';
    my %secret_hash = ();

    # here's a file-private function as a closure,
    # callable as &$priv_func;  it cannot be prototyped.
    my $priv_func = sub {
        # stuff goes here.
    };

    # make all your functions, whether exported or not;
    # remember to put something interesting in the {} stubs
    sub func1      {}    # no prototype
    sub func2()    {}    # proto'd void
    sub func3($$)  {}    # proto'd to 2 scalars

    # this one isn't exported, but could be called!
    sub func4(\%)  {}    # proto'd to 1 hash ref

    END { }       # module clean-up code here (global destructor)

    1;            # modules must return true

How do I create a class?

See perltoot for an introduction to classes and objects, as well as perlobj and perlbot.
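
For the impatient, here is a minimal sketch of one common style of class, a blessed anonymous hash. The Counter class and its methods are hypothetical names chosen for illustration; they are not part of any module, and perltoot covers the many variations.

```perl
use strict;

{
    package Counter;    # hypothetical class name, for illustration only

    # Constructor: bless an anonymous hash into the class.
    sub new {
        my ($class, %args) = @_;
        my $self = { count => $args{start} || 0 };
        return bless $self, $class;
    }

    # Instance methods receive the object as their first argument.
    sub increment { my $self = shift; return ++$self->{count} }
    sub value     { my $self = shift; return $self->{count} }
}

my $c = Counter->new(start => 5);
$c->increment;
$c->increment;
print "counter is now ", $c->value, "\n";
```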

How can I tell if a variable is tainted?

See Laundering and Detecting Tainted Data in perlsec. Here‘s an example (which doesn‘t use any system

calls, because the kill() is given no processes to signal):

    sub is_tainted {
        return ! eval { join('',@_), kill 0; 1; };
    }

This is not −w clean, however. There is no −w clean way to detect taintedness − take this as a hint that you

should untaint all possibly−tainted data.

What‘s a closure?

Closures are documented in perlref.

Closure is a computer science term with a precise but hard−to−explain meaning. Closures are implemented

in Perl as anonymous subroutines with lasting references to lexical variables outside their own scopes. These

lexicals magically refer to the variables that were around when the subroutine was defined (deep binding).

Closures make sense in any programming language where you can have the return value of a function be

itself a function, as you can in Perl. Note that some languages provide anonymous functions but are not

capable of providing proper closures; the Python language, for example. For more information on closures,

check out any textbook on functional programming. Scheme is a language that not only supports but

encourages closures.

Here‘s a classic function−generating function:

    sub add_function_generator {
        return sub { shift + shift };
    }

    $add_sub = add_function_generator();
    $sum = $add_sub->(4,5);                # $sum is 9 now.

The closure works as a function template with some customization slots left out to be filled later. The

anonymous subroutine returned by add_function_generator() isn‘t technically a closure because it

refers to no lexicals outside its own scope.

Contrast this with the following make_adder() function, in which the returned anonymous function

contains a reference to a lexical variable outside the scope of that function itself. Such a reference requires

that Perl return a proper closure, thus locking in for all time the value that the lexical had when the function

was created.

sub make_adder {

my $addpiece = shift;

return sub { shift + $addpiece };

}

$f1 = make_adder(20);

$f2 = make_adder(555);

Now &$f1($n) is always 20 plus whatever $n you pass in, whereas &$f2($n) is always 555 plus

whatever $n you pass in. The $addpiece in the closure sticks around.
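
Each call to a generator like make_adder() gets its own copy of the lexical, which also means a closure can carry private state that changes between calls. A small sketch, using a hypothetical make_counter():

```perl
use strict;

# make_counter() is a made-up name for this sketch.
sub make_counter {
    my $count = shift;                 # a fresh lexical on every call
    return sub { return $count++ };    # the closure keeps $count alive
}

my $from_one = make_counter(1);
my $from_ten = make_counter(10);

# The two closures advance independently of each other.
my @seen = ($from_one->(), $from_one->(), $from_ten->(), $from_one->());
print "@seen\n";
```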

Closures are often used for less esoteric purposes. For example, when you want to pass in a bit of code into

a function:

my $line;

timeout( 30, sub { $line = <STDIN> } );

If the code to execute had been passed in as a string, '$line = <STDIN>', there would have been no

way for the hypothetical timeout() function to access the lexical variable $line back in its caller‘s

scope.

What is variable suicide and how can I prevent it?

Variable suicide is when you (temporarily or permanently) lose the value of a variable. It is caused by

scoping through my() and local() interacting with either closures or aliased foreach() iterator

variables and subroutine arguments. It used to be easy to inadvertently lose a variable‘s value this way, but

now it‘s much harder. Take this code:

my $f = "foo";

sub T {

while ($i++ < 3) { my $f = $f; $f .= "bar"; print $f, "\n" }

}

print "Finally $f\n";

The $f that has "bar" added to it three times should be a new $f (my $f should create a new local variable

each time through the loop). It isn‘t, however. This is a bug, and will be fixed.

How can I pass/return a {Function, FileHandle, Array, Hash, Method, Regexp}?

With the exception of regexps, you need to pass references to these objects. See

Pass by Reference in perlsub for this particular question, and perlref for information on references.

Passing Variables and Functions

Regular variables and functions are quite easy: just pass in a reference to an existing or anonymous

variable or function:

func( \$some_scalar );

func( \@some_array );

func( [ 1 .. 10 ] );

func( \%some_hash );

func( { this => 10, that => 20 } );

func( \&some_func );

func( sub { $_[0] ** $_[1] } );
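
On the receiving end, the function has to dereference whatever it was handed. Here is a rough sketch; describe() and its arguments are invented for the example:

```perl
use strict;

# describe() is a hypothetical helper, not part of any module.
sub describe {
    my ($aref, $href, $cref) = @_;
    my $sum = 0;
    $sum += $_ for @$aref;                 # dereference the array
    my @keys = sort keys %$href;           # dereference the hash
    return $cref->($sum, scalar @keys);    # call through the code reference
}

my @nums = (1 .. 4);
my %opts = (this => 10, that => 20);
my $result = describe(\@nums, \%opts, sub { "$_[0] from $_[1] keys" });
print "$result\n";
```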

Passing Filehandles

To pass filehandles to subroutines, use the *FH or \*FH notations. These are "typeglobs" − see

Typeglobs and Filehandles in perldata and especially Pass by Reference in perlsub for more

information.

Here‘s an excerpt:

If you‘re passing around filehandles, you could usually just use the bare typeglob, like *STDOUT, but

typeglob references would be better because they'll still work properly under use strict

'refs'. For example:

    splutter(\*STDOUT);
    sub splutter {
        my $fh = shift;
        print $fh "her um well a hmmm\n";
    }

    $rec = get_rec(\*STDIN);
    sub get_rec {
        my $fh = shift;
        return scalar <$fh>;
    }

If you‘re planning on generating new filehandles, you could do this:

    sub openit {
        my $path = shift;
        local *FH;
        return open (FH, $path) ? *FH : undef;
    }
    $fh = openit('< /etc/motd');
    print <$fh>;

Passing Regexps

To pass regexps around, you‘ll need to either use one of the highly experimental regular expression

modules from CPAN (Nick Ing−Simmons‘s Regexp or Ilya Zakharevich‘s Devel::Regexp), pass

around strings and use an exception-trapping eval, or else be very, very clever. Here's an example

of how to pass in a string to be regexp compared:

    sub compare($$) {
        my ($val1, $regexp) = @_;
        my $retval = eval { $val1 =~ /$regexp/ };
        die if $@;
        return $retval;
    }

    $match = compare("old McDonald", q/d.*D/);

Make sure you never say something like this:

return eval "\$val =~ /$regexp/"; # WRONG

or someone can sneak shell escapes into the regexp due to the double interpolation of the eval and the

double−quoted string. For example:

    $pattern_of_evil = 'danger ${ system("rm -rf * &") } danger';
    eval "\$string =~ /$pattern_of_evil/";

Those preferring to be very, very clever might see the O‘Reilly book, Mastering Regular Expressions,

by Jeffrey Friedl. Page 273‘s Build_MatchMany_Function() is particularly interesting. A

complete citation of this book is given in perlfaq2.

Passing Methods

To pass an object method into a subroutine, you can do this:

    call_a_lot(10, $some_obj, "methname");
    sub call_a_lot {
        my ($count, $widget, $trick) = @_;
        for (my $i = 0; $i < $count; $i++) {
            $widget->$trick();
        }
    }

Or you can use a closure to bundle up the object and its method call and arguments:

    my $whatnot = sub { $some_obj->obfuscate(@args) };
    func($whatnot);
    sub func {
        my $code = shift;
        &$code();
    }

You could also investigate the can() method in the UNIVERSAL class (part of the standard perl

distribution).
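
As a sketch of how can() might be used here (Widget and its spin() method are hypothetical): it returns a code reference when the object's class supports the named method, so you can check before you call.

```perl
use strict;

{
    package Widget;    # hypothetical class for this sketch
    sub new  { return bless {}, shift }
    sub spin { return "spinning" }
}

my $widget = Widget->new;

# can() returns a code reference if the method exists, false otherwise.
my @results;
foreach my $name (qw(spin obfuscate)) {
    if (my $method = $widget->can($name)) {
        push @results, "$name: " . $widget->$method();
    } else {
        push @results, "$name: not supported";
    }
}
print "$_\n" for @results;
```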

How do I create a static variable?

As with most things in Perl, TMTOWTDI. What is a "static variable" in other languages could be either a

function−private variable (visible only within a single function, retaining its value between calls to that

function), or a file−private variable (visible only to functions within the file it was declared in) in Perl.

Here‘s code to implement a function−private variable:

    BEGIN {
        my $counter = 42;
        sub prev_counter { return --$counter }
        sub next_counter { return $counter++ }
    }

Now prev_counter() and next_counter() share a private variable $counter that was initialized

at compile time.
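
A short usage sketch, repeating the block above so that it runs on its own, shows the two functions working on the same hidden variable:

```perl
use strict;

BEGIN {
    my $counter = 42;                         # private to the two subs below
    sub prev_counter { return --$counter }
    sub next_counter { return $counter++ }
}

# next_counter() returns the old value and then increments;
# prev_counter() decrements first and returns the new value.
my @values = (next_counter(), next_counter(), prev_counter());
print "@values\n";
```

Nothing outside the BEGIN block can name $counter, so the only way to change it is through the two functions.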

To declare a file−private variable, you‘ll still use a my(), putting it at the outer scope level at the top of the

file. Assume this is in file Pax.pm:

package Pax;

my $started = scalar(localtime(time()));

sub begun { return $started }

When use Pax or require Pax loads this module, the variable will be initialized. It won‘t get

garbage−collected the way most variables going out of scope do, because the begun() function cares about

it, but no one else can get it. It is not called $Pax::started because its scope is unrelated to the package.

It‘s scoped to the file. You could conceivably have several packages in that same file all accessing the same

private variable, but another file with the same package couldn‘t get to it.

See Persistent Private Variables in perlsub for details.

What‘s the difference between dynamic and lexical (static) scoping? Between local() and

my()?

local($x) saves away the old value of the global variable $x, and assigns a new value for the duration

of the subroutine, which is visible in other functions called from that subroutine. This is done at run−time,

so is called dynamic scoping. local() always affects global variables, also called package variables or

dynamic variables.

my($x) creates a new variable that is only visible in the current subroutine. This is done at compile−time,

so is called lexical or static scoping. my() always affects private variables, also called lexical variables or

(improperly) static(ly scoped) variables.

For instance:

    sub visible {
        print "var has value $var\n";
    }

    sub dynamic {
        local $var = 'local';   # new temporary value for the still-global
        visible();              #   variable called $var
    }

    sub lexical {
        my $var = 'private';    # new private variable, $var
        visible();              # (invisible outside of sub scope)
    }

    $var = 'global';

    visible();                  # prints global
    dynamic();                  # prints local
    lexical();                  # prints global

Notice how at no point does the value "private" get printed. That‘s because $var only has that value within

the block of the lexical() function, and it is hidden from the called subroutine.

In summary, local() doesn‘t make what you think of as private, local variables. It gives a global variable

a temporary value. my() is what you‘re looking for if you want private variables.
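
The same split shows up with anonymous subroutines: a my() variable is fixed when the sub is created, while a global given a temporary value by local() is looked up when the sub runs. A rough sketch, with make_reporter() and call_with_local() as invented names:

```perl
use strict;
use vars qw($dynamic);

$dynamic = "outer dynamic";

sub make_reporter {
    my $lexical = shift;
    # $lexical is bound when the sub is created;
    # $dynamic is looked up each time the sub runs.
    return sub { return "$lexical / $dynamic" };
}

my $report = make_reporter("captured lexical");

sub call_with_local {
    local $dynamic = "inner dynamic";     # temporary value, visible to callees
    my $lexical    = "unrelated lexical"; # a different variable; the closure ignores it
    return $report->();
}

my $inside  = call_with_local();
my $outside = $report->();
print "$inside\n$outside\n";
```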

See "Private Variables via my()" and "Temporary Values via local()" in perlsub for excruciating details.

How can I access a dynamic variable while a similarly named lexical is in scope?

You can do this via symbolic references, provided you haven‘t set use strict "refs". So instead of

$var, use ${'var'}.

    local $var = "global";
    my    $var = "lexical";

    print "lexical is $var\n";

    no strict 'refs';
    print "global  is ${'var'}\n";

If you know your package, you can just mention it explicitly, as in $Some_Pack::var. Note that the

notation $::var is not the dynamic $var in the current package, but rather the one in the main package,

as though you had written $main::var. Specifying the package directly makes you hard−code its name,

but it executes faster and avoids running afoul of use strict "refs".

What‘s the difference between deep and shallow binding?

In deep binding, lexical variables mentioned in anonymous subroutines are the same ones that were in scope

when the subroutine was created. In shallow binding, they are whichever variables with the same names

happen to be in scope when the subroutine is called. Perl always uses deep binding of lexical variables (i.e.,

those created with my()). However, dynamic variables (aka global, local, or package variables) are

effectively shallowly bound. Consider this just one more reason not to use them. See the answer to

"What‘s a closure?".

Why doesn't "my($foo) = <FILE>;" work right?

my() and local() give list context to the right hand side of =. The <FH> read operation, like so many of

Perl‘s functions and operators, can tell which context it was called in and behaves appropriately. In general,

the scalar() function can help. This function does nothing to the data itself (contrary to popular myth) but

rather tells its argument to behave in whatever its scalar fashion is. If that function doesn‘t have a defined

scalar behavior, this of course doesn‘t help you (such as with sort()).

To enforce scalar context in this particular case, however, you need merely omit the parentheses:

local($foo) = <FILE>; # WRONG

local($foo) = scalar(<FILE>); # ok

local $foo = <FILE>; # right

You should probably be using lexical variables anyway, although the issue is the same here:

my($foo) = <FILE>; # WRONG

my $foo = <FILE>; # right
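
The same distinction applies to anything that is sensitive to context, not just <FILE>. A self-contained sketch, using an array and a hypothetical context() function in place of a filehandle:

```perl
use strict;

my @lines = ("first\n", "second\n", "third\n");

my ($head) = @lines;    # list context: $head gets the first element
my $count  = @lines;    # scalar context: $count gets the number of elements

# A sub can ask which context it was called in with wantarray().
sub context { return wantarray ? "list" : "scalar" }

my ($in_parens) = context();    # parentheses give list context
my $no_parens   = context();    # no parentheses: scalar context

print "$count $in_parens $no_parens\n";
```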

How do I redefine a builtin function, operator, or method?

Why do you want to do that? :−)

If you want to override a predefined function, such as open(), then you‘ll have to import the new definition

from a different module. See Overriding Builtin Functions in perlsub. There‘s also an example in

Class::Template in perltoot.

If you want to overload a Perl operator, such as + or **, then you‘ll want to use the use overload

pragma, documented in overload.

If you‘re talking about obscuring method calls in parent classes, see Overridden Methods in perltoot.

What‘s the difference between calling a function as &foo and foo()?

When you call a function as &foo, you allow that function access to your current @_ values, and you

by−pass prototypes. That means that the function doesn‘t get an empty @_, it gets yours! While not strictly

speaking a bug (it‘s documented that way in perlsub), it would be hard to consider this a feature in most

cases.

When you call your function as &foo(), then you do get a new @_, but prototyping is still circumvented.

Normally, you want to call a function using foo(). You may only omit the parentheses if the function is

already known to the compiler because it already saw the definition (use but not require), or via a

forward reference or use subs declaration. Even in this case, you get a clean @_ without any of the old

values leaking through where they don‘t belong.
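
The difference is easy to see with a function that simply counts its arguments. In this sketch, count_args() and caller_sub() are made-up names:

```perl
use strict;

sub count_args { return scalar @_ }

sub caller_sub {
    my $shared = &count_args;      # no parentheses: count_args sees our @_
    my $empty  = &count_args();    # parentheses: a fresh, empty @_
    my $normal = count_args();     # the usual form: also a fresh @_
    return ($shared, $empty, $normal);
}

my @counts = caller_sub("a", "b", "c");
print "@counts\n";
```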

How do I create a switch or case statement?

This is explained in more depth in perlsyn. Briefly, there's no official case statement, because of the

variety of tests possible in Perl (numeric comparison, string comparison, glob comparison, regexp matching,

overloaded comparisons, ...). Larry couldn‘t decide how best to do this, so he left it out, even though it‘s

been on the wish list since perl1.

The general answer is to write a construct like this:

for ($variable_to_test) {

if (/pat1/) { } # do something

elsif (/pat2/) { } # do something else

elsif (/pat3/) { } # do something else

else { } # default

}

Here‘s a simple example of a switch based on pattern matching, this time lined up in a way to make it look

more like a switch statement. We‘ll do a multi−way conditional based on the type of reference stored in

$whatchamacallit:

    SWITCH: for (ref $whatchamacallit) {
        /^$/     && die "not a reference";

        /SCALAR/ && do {
            print_scalar($$whatchamacallit);
            last SWITCH;
        };

        /ARRAY/  && do {
            print_array(@$whatchamacallit);
            last SWITCH;
        };

        /HASH/   && do {
            print_hash(%$whatchamacallit);
            last SWITCH;
        };

        /CODE/   && do {
            warn "can't print function ref";
            last SWITCH;
        };

        # DEFAULT
        warn "User defined type skipped";
    }

See perlsyn/"Basic BLOCKs and Switch Statements" for many other examples in this style.

Sometimes you should change the positions of the constant and the variable. For example, let‘s say you

wanted to test which of many answers you were given, but in a case−insensitive way that also allows

abbreviations. You can use the following technique if the strings all start with different characters, or if you

want to arrange the matches so that one takes precedence over another, as "SEND" has precedence over

"STOP" here:

chomp($answer = <>);

if ("SEND" =~ /^\Q$answer/i) { print "Action is send\n" }

elsif ("STOP" =~ /^\Q$answer/i) { print "Action is stop\n" }

elsif ("ABORT" =~ /^\Q$answer/i) { print "Action is abort\n" }

elsif ("LIST" =~ /^\Q$answer/i) { print "Action is list\n" }

elsif ("EDIT" =~ /^\Q$answer/i) { print "Action is edit\n" }

A totally different approach is to create a hash of function references.

    my %commands = (
        "happy" => \&joy,
        "sad"   => \&sullen,
        "done"  => sub { die "See ya!" },
        "mad"   => \&angry,
    );

    print "How are you? ";
    chomp($string = <STDIN>);
    if ($commands{$string}) {
        $commands{$string}->();
    } else {
        print "No such command: $string\n";
    }

How can I catch accesses to undefined variables/functions/methods?

The AUTOLOAD method, discussed in Autoloading in perlsub and

AUTOLOAD: Proxy Methods in perltoot, lets you capture calls to undefined functions and methods.
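
Here is a minimal sketch of an AUTOLOAD that catches calls to undefined functions in the current package. The names undefined_func() and defined_func() are invented for the example, and a real handler would likely do something more useful than return a string:

```perl
use strict;
use vars qw($AUTOLOAD);

sub AUTOLOAD {
    my $name = $AUTOLOAD;            # fully qualified name of the missing sub
    $name =~ s/.*:://;               # strip the package qualifier
    return if $name eq 'DESTROY';    # don't intercept object destruction
    return "no such function: $name(@_)";
}

sub defined_func { return "I exist" }

my $hit  = defined_func();
my $miss = undefined_func(1, 2);     # deliberately never defined
print "$hit\n$miss\n";
```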

When it comes to undefined variables that would trigger a warning under −w, you can use a handler to trap

the pseudo−signal __WARN__ like this:

    $SIG{__WARN__} = sub {

        for ( $_[0] ) {         # voici un switch statement

            /Use of uninitialized value/  && do {
                # promote warning to a fatal
                die $_;
            };

            # other warning cases to catch could go here;

            warn $_;
        }

    };

Why can‘t a method included in this same file be found?

Some possible reasons: your inheritance is getting confused, you‘ve misspelled the method name, or the

object is of the wrong type. Check out perltoot for details on these. You may also use print

ref($object) to find out the class $object was blessed into.

Another possible reason for problems is that you've used the indirect object syntax (eg, find Guru

"Samy") on a class name before Perl has seen that such a package exists. It‘s wisest to make sure your

packages are all defined before you start using them, which will be taken care of if you use the use

statement instead of require. If not, make sure to use arrow notation (eg, Guru->find("Samy"))

instead. Object notation is explained in perlobj.

Make sure to read about creating modules in perlmod and the perils of indirect objects in

WARNING in perlobj.

How can I find out my current package?

If you‘re just a random program, you can do this to find out what the currently compiled package is:

my $packname = __PACKAGE__;

But if you‘re a method and you want to print an error message that includes the kind of object you were

called on (which is not necessarily the same as the one in which you were compiled):

sub amethod {

my $self = shift;

my $class = ref($self) || $self;

warn "called me from a $class object";

}

How can I comment out a large block of perl code?

Use embedded POD to discard it:

# program is here

=for nobody

This paragraph is commented out

# program continues

=begin comment text

all of this stuff

here will be ignored

by everyone

=end comment text

=cut

This can‘t go just anywhere. You have to put a pod directive where the parser is expecting a new statement,

not just in the middle of an expression or some other arbitrary yacc grammar production.

AUTHOR AND COPYRIGHT

When included as part of the Standard Version of Perl, or as part of its complete documentation whether

printed or otherwise, this work may be distributed only under the terms of Perl‘s Artistic License. Any

distribution of this file or derivatives thereof outside of that package require that special arrangements be

made with copyright holder.

Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You

are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A

simple comment in the code giving credit would be courteous but is not required.

perlfaq8 Perl Programmers Reference Guide perlfaq8

NAME

perlfaq8 − System Interaction ($Revision: 1.26 $, $Date: 1998/08/05 12:20:28 $)

DESCRIPTION

This section of the Perl FAQ covers questions involving operating system interaction. This involves

interprocess communication (IPC), control over the user−interface (keyboard, screen and pointing devices),

and most anything else not related to data manipulation.

Read the FAQs and documentation specific to the port of perl to your operating system (eg, perlvms,

perlplan9, ...). These should contain more detailed information on the vagaries of your perl.

How do I find out which operating system I‘m running under?

The $^O variable ($OSNAME if you use English) contains the operating system that your perl binary was

built for.

How come exec() doesn‘t return?

Because that‘s what it does: it replaces your currently running program with a different one. If you want to

keep going (as is probably the case if you‘re asking this question) use system() instead.
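
A small sketch of the difference, assuming only that the perl running the script can be re-invoked through $^X:

```perl
use strict;

# Run a child perl ($^X is the path of the running interpreter) and wait for it.
my @cmd = ($^X, '-e', 'exit 3');
system(@cmd);
my $exit_status = $? >> 8;     # the high byte of $? holds the child's exit code

print "child exited with status $exit_status; the parent is still running\n";

# By contrast, exec(@cmd) would replace this program, and nothing
# after a successful exec would ever run.
```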

How do I do fancy stuff with the keyboard/screen/mouse?

How you access/control keyboards, screens, and pointing devices ("mice") is system−dependent. Try the

following modules:

Keyboard

Term::Cap Standard perl distribution

Term::ReadKey CPAN

Term::ReadLine::Gnu CPAN

Term::ReadLine::Perl CPAN

Term::Screen CPAN

Screen

Term::Cap Standard perl distribution

Curses CPAN

Term::ANSIColor CPAN

Mouse

Tk CPAN

Some of these specific cases are shown below.

How do I print something out in color?

In general, you don‘t, because you don‘t know whether the recipient has a color−aware display device. If

you know that they have an ANSI terminal that understands color, you can use the Term::ANSIColor module

from CPAN:

use Term::ANSIColor;

print color("red"), "Stop!\n", color("reset");

print color("green"), "Go!\n", color("reset");

Or like this:

use Term::ANSIColor qw(:constants);

print RED, "Stop!\n", RESET;

print GREEN, "Go!\n", RESET;

How do I read just one key without waiting for a return key?

Controlling input buffering is a remarkably system-dependent matter. On most systems, you can just use the

stty command as shown in getc, but as you see, that‘s already getting you into portability snags.

    open(TTY, "+</dev/tty") or die "no tty: $!";
    system "stty  cbreak </dev/tty >/dev/tty 2>&1";
    $key = getc(TTY);           # perhaps this works
    # OR ELSE
    sysread(TTY, $key, 1);      # probably this does
    system "stty -cbreak </dev/tty >/dev/tty 2>&1";

The Term::ReadKey module from CPAN offers an easy−to−use interface that should be more efficient than

shelling out to stty for each key. It even includes limited support for Windows.

    use Term::ReadKey;
    ReadMode('cbreak');
    $key = ReadKey(0);
    ReadMode('normal');

However, that requires that you have a working C compiler and can use it to build and install a CPAN

module. Here's a solution using the standard POSIX module, which is already on your system (assuming

your system supports POSIX).

use HotKey;

$key = readkey();

And here‘s the HotKey module, which hides the somewhat mystifying calls to manipulate the POSIX

termios structures.

# HotKey.pm

package HotKey;

@ISA = qw(Exporter);

@EXPORT = qw(cbreak cooked readkey);

use strict;

use POSIX qw(:termios_h);

my ($term, $oterm, $echo, $noecho, $fd_stdin);

$fd_stdin = fileno(STDIN);

$term = POSIX::Termios->new();

$term->getattr($fd_stdin);

$oterm = $term->getlflag();

$echo = ECHO | ECHOK | ICANON;

$noecho = $oterm & ~$echo;

sub cbreak {

    $term->setlflag($noecho); # ok, so i don't want echo either

    $term->setcc(VTIME, 1);

    $term->setattr($fd_stdin, TCSANOW);

}

sub cooked {

    $term->setlflag($oterm);

    $term->setcc(VTIME, 0);

    $term->setattr($fd_stdin, TCSANOW);

}

sub readkey {

    my $key = '';

cbreak();

sysread(STDIN, $key, 1);

cooked();

return $key;

}

END { cooked() }

1; # a module file must end by returning a true value

How do I check whether input is ready on the keyboard?

The easiest way to do this is to read a key in nonblocking mode with the Term::ReadKey module from

CPAN, passing it an argument of -1 to indicate not to block:

use Term::ReadKey;

ReadMode('cbreak');

if (defined ($char = ReadKey(-1)) ) {

# input was waiting and it was $char

} else {

# no input was waiting

}

ReadMode('normal'); # restore normal tty settings

How do I clear the screen?

If you only have to do so infrequently, use system:

system("clear");

If you have to do this a lot, save the clear string so you can print it 100 times without calling a program 100

times:

$clear_string = `clear`;

print $clear_string;

If you're planning on doing other screen manipulations, like cursor positioning, you might wish to use

the Term::Cap module:

use Term::Cap;

$terminal = Term::Cap->Tgetent( {OSPEED => 9600} );

$clear_string = $terminal->Tputs('cl');

How do I get the screen size?

If you have the Term::ReadKey module installed from CPAN, you can use it to fetch the width and height in

characters and in pixels:

use Term::ReadKey;

($wchar, $hchar, $wpixels, $hpixels) = GetTerminalSize();

This is more portable than the raw ioctl, but not as illustrative:

require 'sys/ioctl.ph';

die "no TIOCGWINSZ " unless defined &TIOCGWINSZ;

open(TTY, "+</dev/tty") or die "No tty: $!";

unless (ioctl(TTY, &TIOCGWINSZ, $winsize='')) {

    die sprintf "$0: ioctl TIOCGWINSZ (%08x: $!)\n", &TIOCGWINSZ;

}

($row, $col, $xpixel, $ypixel) = unpack('S4', $winsize);

print "(row,col) = ($row,$col)";

print " (xpixel,ypixel) = ($xpixel,$ypixel)" if $xpixel || $ypixel;

print "\n";

How do I ask the user for a password?

(This question has nothing to do with the web. See a different FAQ for that.)

There's an example of this in crypt in perlfunc. First, you put the terminal into "no echo" mode, then just read the

password normally. You may do this with an old-style ioctl() function, POSIX terminal control (see

POSIX, and Chapter 7 of the Camel), or a call to the stty program, with varying degrees of portability.

You can also do this for most systems using the Term::ReadKey module from CPAN, which is easier to use

and in theory more portable.

use Term::ReadKey;

ReadMode('noecho');

$password = ReadLine(0);

ReadMode('restore'); # turn echo back on

How do I read and write the serial port?

This depends on which operating system your program is running on. In the case of Unix, the serial ports

will be accessible through files in /dev; on other systems, the device names will doubtless differ. Several

problem areas common to all device interaction are the following:

lockfiles

Your system may use lockfiles to control multiple access. Make sure you follow the correct protocol.

Unpredictable behaviour can result from multiple processes reading from one device.

open mode

If you expect to use both read and write operations on the device, you‘ll have to open it for update (see

open in perlfunc for details). You may wish to open it without running the risk of blocking by using

sysopen() and O_RDWR|O_NDELAY|O_NOCTTY from the Fcntl module (part of the standard perl

distribution). See sysopen in perlfunc for more on this approach.

end of line

Some devices will be expecting a "\r" at the end of each line rather than a "\n". In some ports of perl,

"\r" and "\n" are different from their usual (Unix) ASCII values of "\015" and "\012". You may have

to give the numeric values you want directly, using octal ("\015"), hex ("\x0D"), or as a

control-character specification ("\cM").

print DEV "atv1\012"; # wrong, for some devices

print DEV "atv1\015"; # right, for some devices

Even though with normal text files, a "\n" will do the trick, there is still no unified scheme for

terminating a line that is portable between Unix, DOS/Win, and Macintosh, except to terminate ALL

line ends with "\015\012", and strip what you don‘t need from the output. This applies especially to

socket I/O and autoflushing, discussed next.

flushing output

If you expect characters to get to your device when you print() them, you‘ll want to autoflush that

filehandle. You can use select() and the $| variable to control autoflushing (see $| in perlvar

and select in perlfunc):

$oldh = select(DEV);

$| = 1;

select($oldh);

You‘ll also see code that does this without a temporary variable, as in

select((select(DEV), $| = 1)[0]);

Or if you don‘t mind pulling in a few thousand lines of code just because you‘re afraid of a little $|

variable:

use IO::Handle;

DEV->autoflush(1);

As mentioned in the previous item, this still doesn‘t work when using socket I/O between Unix and

Macintosh. You‘ll need to hardcode your line terminators, in that case.

non−blocking input

If you are doing a blocking read() or sysread(), you‘ll have to arrange for an alarm handler to

provide a timeout (see alarm). If you have a non−blocking open, you‘ll likely have a non−blocking

read, which means you may have to use a 4−arg select() to determine whether I/O is ready on that

device (see select in perlfunc).
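The readiness test from the last item can be sketched with a 4-arg select(). This is only an illustration: the helper name input_ready is made up for the example, and a pipe stands in for the real device so the code can be run anywhere.

```perl
use strict;

# Hypothetical helper: true if the handle has data ready within $timeout seconds.
sub input_ready {
    my ($fh, $timeout) = @_;
    my $rin = '';
    vec($rin, fileno($fh), 1) = 1;      # bit mask naming the handle to watch
    my $rout = $rin;                    # select() overwrites its arguments
    my $nfound = select($rout, undef, undef, $timeout);
    return $nfound > 0;
}

# A pipe stands in for the device here.
pipe(READER, WRITER) or die "pipe: $!";
my $before = input_ready(\*READER, 0.1);    # nothing written yet
syswrite(WRITER, "x", 1);
my $after  = input_ready(\*READER, 0.1);    # one byte is now waiting
print "ready before write: ", ($before ? "yes" : "no"),
      ", after: ", ($after ? "yes" : "no"), "\n";
```

A timeout of 0 makes this a pure poll; undef would block until the handle becomes readable.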

While trying to read from his caller-id box, the notorious Jamie Zawinski <jwz@netscape.com>, after much

gnashing of teeth and fighting with sysread, sysopen, POSIX‘s tcgetattr business, and various other functions

that go bump in the night, finally came up with this:

sub open_modem {

use IPC::Open2;

    my $stty = `/bin/stty -g`;

    open2( \*MODEM_IN, \*MODEM_OUT, "cu -l$modem_device -s2400 2>&1");

    # starting cu hoses /dev/tty's stty settings, even when it has

    # been opened on a pipe...

    system("/bin/stty $stty");

    $_ = <MODEM_IN>;

    chop;

    if ( !m/^Connected/ ) {

        print STDERR "$0: cu printed `$_' instead of `Connected'\n";

    }

}

How do I decode encrypted password files?

You spend lots and lots of money on dedicated hardware, but this is bound to get you talked about.

Seriously, you can‘t if they are Unix password files − the Unix password system employs one−way

encryption. It‘s more like hashing than encryption. The best you can check is whether something else

hashes to the same string. You can‘t turn a hash back into the original string. Programs like Crack can

forcibly (and intelligently) try to guess passwords, but don‘t (can‘t) guarantee quick success.

If you‘re worried about users selecting bad passwords, you should proactively check when they try to change

their password (by modifying passwd(1), for example).

How do I start a process in the background?

You could use

system("cmd &")

or you could use fork as documented in fork in perlfunc, with further examples in perlipc. Some things to be

aware of, if you‘re on a Unix−like system:

STDIN, STDOUT, and STDERR are shared

Both the main process and the backgrounded one (the "child" process) share the same STDIN,

STDOUT and STDERR filehandles. If both try to access them at once, strange things can happen.

You may want to close or reopen these for the child. You can get around this with opening a pipe

(see open in perlfunc) but on some systems this means that the child process cannot outlive the parent.

Signals

You‘ll have to catch the SIGCHLD signal, and possibly SIGPIPE too. SIGCHLD is sent when the

backgrounded process finishes. SIGPIPE is sent when you write to a filehandle whose child process

has closed (an untrapped SIGPIPE can cause your program to silently die). This is not an issue with

system("cmd&").

Zombies

You have to be prepared to "reap" the child process when it finishes:

$SIG{CHLD} = sub { wait };

See Signals in perlipc for other examples of code to do this. Zombies are not an issue with

system("prog &").
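For the fork route, a minimal sketch might look like the following. The helper name start_background is an assumption of the example, and the command it launches is just the running perl ($^X) so that the code is self-contained; substitute your own program.

```perl
use strict;

# Hypothetical helper: run @cmd in a child process and return its pid at once.
sub start_background {
    my @cmd = @_;
    defined(my $pid = fork()) or die "cannot fork: $!";
    if ($pid == 0) {                                    # child
        open(STDIN, "</dev/null") or die "cannot reopen STDIN: $!";
        exec(@cmd) or die "cannot exec $cmd[0]: $!";
    }
    return $pid;                                        # parent carries on immediately
}

my $pid = start_background($^X, "-e", "exit 0");
# ... the parent is free to do other work here ...
waitpid($pid, 0);                                       # reap it so no zombie is left
my $status = $? >> 8;
print "child $pid exited with status $status\n";
```

Here the parent reaps the child explicitly with waitpid(); a long-running parent would more likely install a SIGCHLD handler as described under Zombies.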

How do I trap control characters/signals?

You don‘t actually "trap" a control character. Instead, that character generates a signal which is sent to your

terminal‘s currently foregrounded process group, which you then trap in your process. Signals are

documented in Signals in perlipc and chapter 6 of the Camel.

Be warned that very few C libraries are re−entrant. Therefore, if you attempt to print() in a handler that

got invoked during another stdio operation your internal structures will likely be in an inconsistent state, and

your program will dump core. You can sometimes avoid this by using syswrite() instead of print().

Unless you‘re exceedingly careful, the only safe things to do inside a signal handler are: set a variable and

exit. And in the first case, you should only set a variable in such a way that malloc() is not called (eg, by

setting a variable that already has a value).

For example:

$Interrupted = 0; # to ensure it has a value

$SIG{INT} = sub {

    $Interrupted++;

    syswrite(STDERR, "ouch\n", 5);

};

However, because syscalls restart by default, you‘ll find that if you‘re in a "slow" call, such as <FH>,

read(), connect(), or wait(), that the only way to terminate them is by "longjumping" out; that is, by

raising an exception. See the time−out handler for a blocking flock() in Signals in perlipc or chapter 6 of

the Camel.

How do I modify the shadow password file on a Unix system?

If perl was installed correctly, and your shadow library was written properly, the getpw*() functions

described in perlfunc should in theory provide (read−only) access to entries in the shadow password file. To

change the file, make a new shadow password file (the format varies from system to system − see passwd(5)

for specifics) and use pwd_mkdb(8) to install it (see pwd_mkdb(8) for more details).

How do I set the time and date?

Assuming you‘re running under sufficient permissions, you should be able to set the system−wide date and

time by running the date(1) program. (There is no way to set the time and date on a per−process basis.) This

mechanism will work for Unix, MS−DOS, Windows, and NT; the VMS equivalent is set time.

However, if all you want to do is change your timezone, you can probably get away with setting an

environment variable:

$ENV{TZ} = "MST7MDT"; # unixish

$ENV{'SYS$TIMEZONE_DIFFERENTIAL'}="-5"; # vms

system "trn comp.lang.perl.misc";

How can I sleep() or alarm() for under a second?

If you want finer granularity than the 1 second that the sleep() function provides, the easiest way is to use

the select() function as documented in select in perlfunc. If your system has itimers and syscall()

support, you can check out the old example in

http://www.perl.com/CPAN/doc/misc/ancient/tutorial/eg/itimers.pl .
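A minimal sketch of the select() approach; the helper name nap is an assumption of the example. With all three handle masks undefined, select() has nothing to watch and simply waits out the (possibly fractional) timeout.

```perl
use strict;

# Sleep for a fractional number of seconds using the 4-arg select().
sub nap {
    my ($seconds) = @_;
    select(undef, undef, undef, $seconds);
    return;
}

nap(0.25);      # pause for roughly a quarter of a second
```

The actual granularity depends on your system's timer, so treat the delay as approximate.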

How can I measure time under a second?

In general, you may not be able to. The Time::HiRes module (available from CPAN) provides this

functionality for some systems.

Failing that, if your system supports both the syscall() function in Perl as

well as a system call like gettimeofday(2), then you may be able to do something like this:

require 'sys/syscall.ph';

$TIMEVAL_T = "LL";

$done = $start = pack($TIMEVAL_T, ());

syscall( &SYS_gettimeofday, $start, 0) != -1

    or die "gettimeofday: $!";

##########################

# DO YOUR OPERATION HERE #

##########################

syscall( &SYS_gettimeofday, $done, 0) != -1

or die "gettimeofday: $!";

@start = unpack($TIMEVAL_T, $start);

@done = unpack($TIMEVAL_T, $done);

# fix microseconds

for ($done[1], $start[1]) { $_ /= 1_000_000 }

$delta_time = sprintf "%.4f", ($done[0] + $done[1]) - ($start[0] + $start[1]);

How can I do an atexit() or setjmp()/longjmp()? (Exception handling)

Release 5 of Perl added the END block, which can be used to simulate atexit(). Each package‘s END

block is called when the program or thread ends (see the perlmod manpage for more details).

For example, you can use this to make sure your filter program managed to finish its output without filling

up the disk:

END {

close(STDOUT) || die "stdout close failed: $!";

}

The END block isn‘t called when untrapped signals kill the program, though, so if you use END blocks you

should also use

use sigtrap qw(die normal-signals);

Perl‘s exception−handling mechanism is its eval() operator. You can use eval() as setjmp and die()

as longjmp. For details of this, see the section on signals, especially the time−out handler for a blocking

flock() in Signals in perlipc and chapter 6 of the Camel.
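As a small illustration of the eval()/die() pairing, here is a sketch in which the function risky is a made-up stand-in for any code that may fail. The eval block plays the role of setjmp, and die() unwinds back to it like longjmp, leaving the message in $@.

```perl
use strict;

# Hypothetical function that fails by raising an exception.
sub risky {
    my ($n) = @_;
    die "negative input\n" if $n < 0;
    return sqrt($n);
}

my $result = eval { risky(-1) };    # control returns here if risky() dies
my $error  = $@;                    # set by die, empty on success
if ($error) {
    print "caught: $error";
} else {
    print "result: $result\n";
}
```

Copying $@ into a variable straight away is a precaution: later evals, including ones run implicitly by cleanup code, can overwrite it.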

If exception handling is all you‘re interested in, try the exceptions.pl library (part of the standard perl

distribution).

If you want the atexit() syntax (and an rmexit() as well), try the AtExit module available from

CPAN.

Why doesn‘t my sockets program work under System V (Solaris)? What does the error message

"Protocol not supported" mean?

Some Sys−V based systems, notably Solaris 2.X, redefined some of the standard socket constants. Since

these were constant across all architectures, they were often hardwired into perl code. The proper way to

deal with this is to "use Socket" to get the correct values.

Note that even though SunOS and Solaris are binary compatible, these values are different. Go figure.

How can I call my system‘s unique C functions from Perl?

In most cases, you write an external module to do it − see the answer to "Where can I learn about linking C

with Perl? [h2xs, xsubpp]". However, if the function is a system call, and your system supports

syscall(), you can use the syscall function (documented in perlfunc).

Remember to check the modules that came with your distribution, and CPAN as well − someone may

already have written a module to do it.

Where do I get the include files to do ioctl() or syscall()?

Historically, these would be generated by the h2ph tool, part of the standard perl distribution. This program

converts cpp(1) directives in C header files to files containing subroutine definitions, like

&SYS_getitimer, which you can use as arguments to your functions. It doesn‘t work perfectly, but it

usually gets most of the job done. Simple files like errno.h, syscall.h, and socket.h were fine, but the hard

ones like ioctl.h nearly always need to be hand-edited. Here's how to install the *.ph files:

1. become super−user

2. cd /usr/include

3. h2ph *.h */*.h

If your system supports dynamic loading, for reasons of portability and sanity you probably ought to use

h2xs (also part of the standard perl distribution). This tool converts C header files to Perl extensions. See

perlxstut for how to get started with h2xs.

If your system doesn‘t support dynamic loading, you still probably ought to use h2xs. See perlxstut and

ExtUtils::MakeMaker for more information (in brief, just use make perl instead of a plain make to rebuild

perl with a new static extension).

Why do setuid perl scripts complain about kernel problems?

Some operating systems have bugs in the kernel that make setuid scripts inherently insecure. Perl gives you

a number of options (described in perlsec) to work around such systems.

How can I open a pipe both to and from a command?

The IPC::Open2 module (part of the standard perl distribution) is an easy−to−use approach that internally

uses pipe(), fork(), and exec() to do the job. Make sure you read the deadlock warnings in its

documentation, though (see IPC::Open2). See also

Bidirectional Communication with Another Process in perlipc and

Bidirectional Communication with Yourself in perlipc.

You may also use the IPC::Open3 module (part of the standard perl distribution), but be warned that it has a

different order of arguments from IPC::Open2 (see IPC::Open3).
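Here is a sketch of an open2() conversation that sidesteps the deadlock by sending everything, closing the write side, and only then reading. The child is simply another perl that upper-cases its input, chosen so the example is self-contained; the handle names are arbitrary.

```perl
use strict;
use IPC::Open2;

# First handle reads the child's output, second handle writes to its input.
my $pid = open2(\*FROM_CHILD, \*TO_CHILD,
                $^X, "-e", 'print uc($_) while <STDIN>');

print TO_CHILD "hello\n";
close(TO_CHILD);                # send EOF so the child can finish its output
my @lines = <FROM_CHILD>;       # only now read, to avoid waiting on each other
close(FROM_CHILD);
waitpid($pid, 0);               # open2 does not reap the child for you

print @lines;
```

This pattern only suits small exchanges. If either side can produce more than a pipe buffer's worth of data before the other reads, you are back to the deadlock the IPC::Open2 documentation warns about.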

Why can‘t I get the output of a command with system()?

You're confusing the purpose of system() and backticks (``). system() runs a command and returns

exit status information (as a 16 bit value: the low 7 bits are the signal the process died from, if any, and the

high 8 bits are the actual exit value). Backticks (``) run a command and return what it sent to STDOUT.

$exit_status = system("mail-users");

$output_string = `ls`;
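The 16-bit status can be taken apart with a shift and a mask. In this sketch the helper name decode_status is an assumption of the example; the extra bit between the signal number and the exit value is the core-dump flag described under system in perlfunc.

```perl
use strict;

# Split the status returned by system() (also left in $?) into its parts.
sub decode_status {
    my ($status) = @_;
    return (
        exit   => $status >> 8,                 # high 8 bits: exit value
        signal => $status & 127,                # low 7 bits: signal number, if any
        core   => ($status & 128) ? 1 : 0,      # remaining bit: core was dumped
    );
}

my $rc   = system($^X, "-e", "exit 3");         # $^X is the running perl binary
my %info = decode_status($rc);
print "exit=$info{exit} signal=$info{signal} core=$info{core}\n";
```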

How can I capture STDERR from an external command?

There are three basic ways of running external commands:

system $cmd; # using system()

$output = `$cmd`; # using backticks (``)

open (PIPE, "cmd |"); # using open()

With system(), both STDOUT and STDERR will go to the same place as the script's versions of these,

unless the command redirects them. Backticks and open() read only the STDOUT of your command.

With any of these, you can change file descriptors before the call:

open(STDOUT, ">logfile");

system("ls");

or you can use Bourne shell file−descriptor redirection:

$output = `$cmd 2>some_file`;

open (PIPE, "cmd 2>some_file |");

You can also use file−descriptor redirection to make STDERR a duplicate of STDOUT:

$output = `$cmd 2>&1`;

open (PIPE, "cmd 2>&1 |");

Note that you cannot simply open STDERR to be a dup of STDOUT in your Perl program and avoid calling

the shell to do the redirection. This doesn‘t work:

open(STDERR, ">&STDOUT");

$alloutput = `cmd args`; # stderr still escapes

This fails because the open() makes STDERR go to where STDOUT was going at the time of the

open(). The backticks then make STDOUT go to a string, but don‘t change STDERR (which still goes to

the old STDOUT).

Note that you must use Bourne shell (sh(1)) redirection syntax in backticks, not csh(1)! Details on why

Perl‘s system() and backtick and pipe opens all use the Bourne shell are in

http://www.perl.com/CPAN/doc/FMTEYEWTK/versus/csh.whynot . To capture a command‘s STDERR and

STDOUT together:

$output = `cmd 2>&1`; # either with backticks

$pid = open(PH, "cmd 2>&1 |"); # or with an open pipe

while (<PH>) { } # plus a read

To capture a command‘s STDOUT but discard its STDERR:

$output = `cmd 2>/dev/null`; # either with backticks

$pid = open(PH, "cmd 2>/dev/null |"); # or with an open pipe

while (<PH>) { } # plus a read

To capture a command‘s STDERR but discard its STDOUT:

$output = `cmd 2>&1 1>/dev/null`; # either with backticks

$pid = open(PH, "cmd 2>&1 1>/dev/null |"); # or with an open pipe

while (<PH>) { } # plus a read

To exchange a command‘s STDOUT and STDERR in order to capture the STDERR but leave its STDOUT

to come out our old STDERR:

$output = `cmd 3>&1 1>&2 2>&3 3>&-`; # either with backticks

$pid = open(PH, "cmd 3>&1 1>&2 2>&3 3>&-|");# or with an open pipe

while (<PH>) { } # plus a read

To read both a command‘s STDOUT and its STDERR separately, it‘s easiest and safest to redirect them

separately to files, and then read from those files when the program is done:

system("program args 1>/tmp/program.stdout 2>/tmp/program.stderr");

Ordering is important in all these examples. That‘s because the shell processes file descriptor redirections in

strictly left to right order.

system("prog args 1>tmpfile 2>&1");

system("prog args 2>&1 1>tmpfile");

The first command sends both standard out and standard error to the temporary file. The second command

sends only the old standard output there, and the old standard error shows up on the old standard out.

Why doesn‘t open() return an error when a pipe open fails?

It does, but probably not how you expect it to. On systems that follow the standard fork()/exec()

paradigm (such as Unix), it works like this: open() causes a fork(). In the parent, open() returns with

the process ID of the child. The child exec()s the command to be piped to/from. The parent can‘t know

whether the exec() was successful or not − all it can return is whether the fork() succeeded or not. To

find out if the command succeeded, you have to catch SIGCHLD and wait() to get the exit status. You

should also catch SIGPIPE if you‘re writing to the child — you may not have found out the exec() failed

by the time you write. This is documented in perlipc.

On systems that follow the spawn() paradigm, open() might do what you expect − unless perl uses a

shell to start your command. In this case the fork()/exec() description still applies.
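One way to see this in practice, without writing a SIGCHLD handler, is to check the result of close() on the pipe: close() waits for the child and leaves its status in $?. The sketch below uses that simpler route rather than the handler described above. The command name is deliberately bogus, and the redirection is there so that perl starts it through the shell.

```perl
use strict;

# The command does not exist, yet open() still succeeds because the
# fork (and the exec of the shell) went fine.
my $cmd = "no_such_command_xyzzy 2>/dev/null";

my $pid = open(PH, "$cmd |") or die "cannot fork: $!";
my @output = <PH>;                  # nothing arrives from the failed command
my $closed_ok = close(PH);          # waits for the child and sets $?
my $exit = $? >> 8;

print "open returned pid $pid\n";
print "command failed with exit status $exit\n" unless $closed_ok;
```

The exact non-zero status comes from the shell that tried to run the command, so test for failure rather than for a particular number.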

What‘s wrong with using backticks in a void context?

Strictly speaking, nothing. Stylistically speaking, it‘s not a good way to write maintainable code because

backticks have a (potentially humongous) return value, and you're ignoring it. It may also not be very

efficient, because you have to read in all the lines of output, allocate memory for them, and then throw them

away. Too often people are lulled into writing:

`cp file file.bak`;

And now they think "Hey, I‘ll just always use backticks to run programs." Bad idea: backticks are for

capturing a program‘s output; the system() function is for running programs.

Consider this line:

`cat /etc/termcap`;

You haven‘t assigned the output anywhere, so it just wastes memory (for a little while). Plus you forgot to

check $? to see whether the program even ran correctly. Even if you wrote

print `cat /etc/termcap`;

in most cases this could and probably should be written as

system("cat /etc/termcap") == 0

or die "cat program failed!";

which will get the output quickly (as it's generated, instead of only at the end) and also check the return

value.

system() also provides direct control over whether shell wildcard processing may take place, whereas

backticks do not.

How can I call backticks without shell processing?

This is a bit tricky. Instead of writing

@ok = `grep @opts '$search_string' @filenames`;

You have to do this:

my @ok = ();

if (open(GREP, "-|")) {

while (<GREP>) {

chomp;

push(@ok, $_);

}

close GREP;

} else {

    exec 'grep', @opts, $search_string, @filenames;

}

Just as with system(), no shell escapes happen when you exec() a list.

There are more examples of this in Safe Pipe Opens in perlipc.

Why can‘t my script read from STDIN after I gave it EOF (^D on Unix, ^Z on MS−DOS)?

Because some stdio‘s set error and eof flags that need clearing. The POSIX module defines clearerr()

that you can use. That is the technically correct way to do it. Here are some less reliable workarounds:

1 Try keeping around the seekpointer and go there, like this:

$where = tell(LOG);

seek(LOG, $where, 0);

2 If that doesn‘t work, try seeking to a different part of the file and then back.

3 If that doesn‘t work, try seeking to a different part of the file, reading something, and then seeking

back.

4 If that doesn‘t work, give up on your stdio package and use sysread.

How can I convert my shell script to perl?

Learn Perl and rewrite it. Seriously, there‘s no simple converter. Things that are awkward to do in the shell

are easy to do in Perl, and this very awkwardness is what would make a shell−perl converter nigh−on

impossible to write. By rewriting it, you‘ll think about what you‘re really trying to do, and hopefully will

escape the shell‘s pipeline datastream paradigm, which while convenient for some matters, causes many

inefficiencies.

Can I use perl to run a telnet or ftp session?

Try the Net::FTP, TCP::Client, and Net::Telnet modules (available from CPAN).

http://www.perl.com/CPAN/scripts/netstuff/telnet.emul.shar will also help for emulating the telnet protocol,

but Net::Telnet is quite probably easier to use.

If all you want to do is pretend to be telnet but don‘t need the initial telnet handshaking, then the standard

dual−process approach will suffice:

use IO::Socket; # new in 5.004

$handle = IO::Socket::INET->new('www.perl.com:80')

    || die "can't connect to port 80 on www.perl.com: $!";

$handle->autoflush(1);

if (fork()) { # XXX: undef means failure

select($handle);

print while <STDIN>; # everything from stdin to socket

} else {

print while <$handle>; # everything from socket to stdout

}

close $handle;

exit;

How can I write expect in Perl?

Once upon a time, there was a library called chat2.pl (part of the standard perl distribution), which never

really got finished. If you find it somewhere, don‘t use it. These days, your best bet is to look at the Expect

module available from CPAN, which also requires two other modules from CPAN, IO::Pty and IO::Stty.

Is there a way to hide perl‘s command line from programs such as "ps"?

First of all note that if you‘re doing this for security reasons (to avoid people seeing passwords, for example)

then you should rewrite your program so that critical information is never given as an argument. Hiding the

arguments won‘t make your program completely secure.

To actually alter the visible command line, you can assign to the variable $0 as documented in perlvar. This

won‘t work on all operating systems, though. Daemon programs like sendmail place their state there, as in:

$0 = "orcus [accepting connections]";

I {changed directory, modified my environment} in a perl script. How come the change

disappeared when I exited the script? How do I get my changes to be visible?

Unix

In the strictest sense, it can‘t be done — the script executes as a different process from the shell it was

started from. Changes to a process are not reflected in its parent, only in its own children created after

the change. There is shell magic that may allow you to fake it by eval()ing the script‘s output in

your shell; check out the comp.unix.questions FAQ for details.

How do I close a process‘s filehandle without waiting for it to complete?

Assuming your system supports such things, just send an appropriate signal to the process (see

kill in perlfunc). It's common to first send a TERM signal, wait a little bit, and then send a KILL signal to

finish it off.

How do I fork a daemon process?

If by daemon process you mean one that‘s detached (disassociated from its tty), then the following process is

reported to work on most Unixish systems. Non−Unix users should check their Your_OS::Process module

for other solutions.

Open /dev/tty and use the TIOCNOTTY ioctl on it. See tty(4) for details. Or better yet, you can

just use the POSIX::setsid() function, so you don‘t have to worry about process groups.

Change directory to /

Reopen STDIN, STDOUT, and STDERR so they‘re not connected to the old tty.

Background yourself like this:

fork && exit;
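Put together, the steps above might be sketched as follows. The helper name daemonize is an assumption of the example, and so is the ordering: the fork comes first here because setsid() fails if the caller is already a process group leader, which the freshly forked child is not.

```perl
use strict;
use POSIX qw(setsid);

# Hypothetical helper following the steps above; call it once at startup.
# Only the detached child returns from it.
sub daemonize {
    defined(my $pid = fork()) or die "cannot fork: $!";
    exit 0 if $pid;                                     # parent leaves

    my $sid = setsid();                                 # drop the controlling tty
    die "cannot start a new session: $!"
        if !defined($sid) || $sid == -1;

    chdir("/") or die "cannot chdir to /: $!";
    open(STDIN,  "</dev/null") or die "cannot reopen STDIN: $!";
    open(STDOUT, ">/dev/null") or die "cannot reopen STDOUT: $!";
    open(STDERR, ">&STDOUT")   or die "cannot reopen STDERR: $!";
    return $$;
}
```

A real daemon would usually point STDOUT and STDERR at a log file rather than /dev/null, so that its complaints are not lost.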

How do I make my program run with sh and csh?

See the eg/nih script (part of the perl source distribution).

How do I find out if I‘m running interactively or not?

Good question. Sometimes -t STDIN and -t STDOUT can give clues, sometimes not.

if (-t STDIN && -t STDOUT) {

print "Now what? ";

}

On POSIX systems, you can test whether your own process group matches the current process group of your

controlling terminal as follows:

use POSIX qw/getpgrp tcgetpgrp/;

open(TTY, "/dev/tty") or die $!;

$tpgrp = tcgetpgrp(TTY);

$pgrp = getpgrp();

if ($tpgrp == $pgrp) {

print "foreground\n";

} else {

print "background\n";

}

How do I timeout a slow event?

Use the alarm() function, probably in conjunction with a signal handler, as documented in Signals in perlipc

and chapter 6 of the Camel. You may instead use the more flexible Sys::AlarmCall module available from

CPAN.
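A sketch of the alarm() technique, wrapped in a helper whose name (with_timeout) is an assumption of the example. The handler dies, the surrounding eval catches it, and any other exception is passed along untouched. The slow operation is simulated with a 4-arg select() rather than sleep(), since mixing sleep() and alarm() is unreliable on some systems.

```perl
use strict;

# Hypothetical helper: run $code, giving up after $seconds whole seconds.
# Returns the code's value, or nothing at all on timeout.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $value;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        $value = $code->();
        alarm(0);                       # finished in time: cancel the alarm
        1;
    };
    my $err = $@;
    alarm(0);                           # in case $code died for another reason
    return $value if $ok;
    die $err unless $err eq "timeout\n";    # propagate unrelated errors
    return;
}

my $fast = with_timeout(5, sub { "done" });
my $slow = with_timeout(1, sub { select(undef, undef, undef, 10); "never" });
print "fast: $fast; slow: ", (defined $slow ? $slow : "timed out"), "\n";
```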

How do I set CPU limits?

Use the BSD::Resource module from CPAN.

How do I avoid zombies on a Unix system?

Use the reaper code from Signals in perlipc to call wait() when a SIGCHLD is received, or else use the

double-fork technique described in fork in perlfunc.
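A reaper along those lines might be sketched like this. It is not the code from perlipc: it uses waitpid() with WNOHANG in a loop, so that several children exiting close together are all collected, and it only bumps a counter that already has a value, in keeping with the advice about doing very little inside a signal handler.

```perl
use strict;
use POSIX qw(:sys_wait_h);

my $reaped = 0;                     # pre-set so the handler only updates a value

# Collect every child that has finished, without blocking.
sub reaper {
    local ($!, $?);                 # don't disturb the main program's status
    while (waitpid(-1, WNOHANG) > 0) {
        $reaped++;
    }
}
$SIG{CHLD} = \&reaper;

defined(my $kid = fork()) or die "cannot fork: $!";
exit 0 if $kid == 0;                # child: finish straight away

# Parent: nap in short steps until the handler has seen the child.
my $tries = 0;
select(undef, undef, undef, 0.1) until $reaped || ++$tries > 50;
print "reaped $reaped child process(es)\n";
```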

How do I use an SQL database?

There are a number of excellent interfaces to SQL databases. See the DBD::* modules available from

http://www.perl.com/CPAN/modules/dbperl/DBD . A lot of information on this can be found at

http://www.hermetica.com/technologia/perl/DBI/index.html .

How do I make a system() exit on control−C?

You can‘t. You need to imitate the system() call (see perlipc for sample code) and then have a signal

handler for the INT signal that passes the signal on to the subprocess. Or you can check for it:

$rc = system($cmd);

if ($rc & 127) { die "signal death" }

How do I open a file without blocking?

If you‘re lucky enough to be using a system that supports non−blocking reads (most Unixish systems do),

you need only to use the O_NDELAY or O_NONBLOCK flag from the Fcntl module in conjunction with

sysopen():

use Fcntl;

sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT, 0644)

    or die "can't open /tmp/somefile: $!";

How do I install a CPAN module?

The easiest way is to have the CPAN module do it for you. This module comes with perl version 5.004 and

later. To manually install the CPAN module, or any well−behaved CPAN module for that matter, follow

these steps:

1 Unpack the source into a temporary area.

2 perl Makefile.PL

3 make

4 make test

5 make install

If your version of perl is compiled without dynamic loading, then you just need to replace step 3 (make)

with make perl and you will get a new perl binary with your extension linked in.

See ExtUtils::MakeMaker for more details on building extensions. See also the next question.

What‘s the difference between require and use?

Perl offers several different ways to include code from one file into another. Here are the deltas between the

various inclusion constructs:

1) do $file is like eval `cat $file`, except the former:

1.1: searches @INC and updates %INC.

1.2: bequeaths an *unrelated* lexical scope on the eval'ed code.

2) require $file is like do $file, except the former:

2.1: checks for redundant loading, skipping already loaded files.

2.2: raises an exception on failure to find, compile, or execute $file.

3) require Module is like require "Module.pm", except the former:

3.1: translates each "::" into your system's directory separator.

3.2: primes the parser to disambiguate class Module as an indirect object.

4) use Module is like require Module, except the former:

4.1: loads the module at compile time, not run−time.

4.2: imports symbols and semantics from that package to the current one.

In general, you usually want use and a proper Perl module.

How do I keep my own module/library directory?

When you build modules, use the PREFIX option when generating Makefiles:

perl Makefile.PL PREFIX=/u/mydir/perl

then either set the PERL5LIB environment variable before you run scripts that use the modules/libraries (see

perlrun) or say

use lib '/u/mydir/perl';

See Perl‘s lib for more information.

How do I add the directory my program lives in to the module/library search path?

use FindBin;

use lib "$FindBin::Bin";

use your_own_modules;

How do I add a directory to my include path at runtime?

Here are the suggested ways of modifying your include path:

the PERLLIB environment variable

the PERL5LIB environment variable

the perl -Idir command line flag

the use lib pragma, as in

use lib "$ENV{HOME}/myown_perllib";

The latter is particularly useful because it knows about machine dependent architectures. The lib.pm

pragmatic module was first included with the 5.002 release of Perl.
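
Note that use lib does its work at compile time. If the directory is only known once the program is running, one option is to change @INC directly before calling require. A minimal sketch follows; the directory name and Some::Module are placeholders, not real paths or modules.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical directory, computed while the program runs.
my $home = defined $ENV{HOME} ? $ENV{HOME} : '.';
my $dir  = "$home/myown_perllib";

# Put it at the front of the search path so it is consulted first.
@INC = grep { $_ ne $dir } @INC;
unshift @INC, $dir;

print "search path now starts with: $INC[0]\n";

# Any later run-time load will look in $dir first, for example:
#   require Some::Module;    # Some::Module is a made-up name
```

Unlike use lib, this does nothing about architecture-specific subdirectories, so it is best kept for pure-Perl modules.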

AUTHOR AND COPYRIGHT

When included as part of the Standard Version of Perl, or as part of its complete documentation whether

printed or otherwise, this work may be distributed only under the terms of Perl‘s Artistic License. Any

distribution of this file or derivatives thereof outside of that package require that special arrangements be

made with copyright holder.

Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You

are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A

simple comment in the code giving credit would be courteous but is not required.

perlfaq9 Perl Programmers Reference Guide perlfaq9

NAME

perlfaq9 − Networking ($Revision: 1.20 $, $Date: 1998/06/22 18:31:09 $)

DESCRIPTION

This section deals with questions related to networking, the internet, and a few on the web.

My CGI script runs from the command line but not the browser. (500 Server Error)

If you can demonstrate that you‘ve read the following FAQs and that your problem isn‘t something simple

that can be easily answered, you‘ll probably receive a courteous and useful reply to your question if you post

it on comp.infosystems.www.authoring.cgi (if it‘s something to do with HTTP, HTML, or the CGI

protocols). Questions that appear to be Perl questions but are really CGI ones that are posted to

comp.lang.perl.misc may not be so well received.

The useful FAQs and related documents are:

CGI FAQ

http://www.webthing.com/page.cgi/cgifaq

Web FAQ

http://www.boutell.com/faq/

WWW Security FAQ

http://www.w3.org/Security/Faq/

HTTP Spec

http://www.w3.org/pub/WWW/Protocols/HTTP/

HTML Spec

http://www.w3.org/TR/REC-html40/

http://www.w3.org/pub/WWW/MarkUp/

CGI Spec

http://www.w3.org/CGI/

CGI Security FAQ

http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt

How can I get better error messages from a CGI program?

Use the CGI::Carp module. It replaces warn and die, plus the normal Carp module's carp, croak, and

confess functions with more verbose and safer versions. It still sends them to the normal server error log.

use CGI::Carp;

warn "This is a complaint";

die "But this one is serious";

The following use of CGI::Carp also redirects errors to a file of your choice, placed in a BEGIN block to

catch compile−time warnings as well:

BEGIN {
    use CGI::Carp qw(carpout);
    open(LOG, ">>/var/local/cgi-logs/mycgi-log")
        or die "Unable to append to mycgi-log: $!\n";
    carpout(*LOG);
}

You can even arrange for fatal errors to go back to the client browser, which is nice for your own debugging,

but might confuse the end user.

use CGI::Carp qw(fatalsToBrowser);

die "Bad error here";

Even if the error happens before you get the HTTP header out, the module will try to take care of this to

avoid the dreaded server 500 errors. Normal warnings still go out to the server error log (or wherever you‘ve

sent them with carpout) with the application name and date stamp prepended.

How do I remove HTML from a string?

The most correct way (albeit not the fastest) is to use HTML::Parse from CPAN (part of the libwww−perl

distribution, which is a must−have module for all web hackers).

Many folks attempt a simple−minded regular expression approach, like s/<.*?>//g, but that fails in many

cases because the tags may continue over line breaks, they may contain quoted angle−brackets, or HTML

comments may be present. Plus, folks forget to convert entities, like &lt; for example.

Here‘s one "simple−minded" approach, that works for most files:

#!/usr/bin/perl -p0777
s/<(?:[^>'"]*|(['"]).*?\1)*>//gs

If you want a more complete solution, see the 3−stage striphtml program in

http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz .

Here are some tricky cases that you should think about when picking a solution:

<IMG SRC = "foo.gif"

ALT = "A > B">

<!-- <A comment> -->

<# Just data #>

<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>

If HTML comments include other tags, those solutions would also break on text like this:

<!-- This section commented out.

-->
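
To make the difference between the two regular expressions concrete, here is a small sketch that runs both against the first tricky case listed above. The sample string and variable names are invented for the illustration.

```perl
#!/usr/bin/perl -w
use strict;

# A tag that spans two lines and contains a quoted ">".
my $html = qq{<IMG SRC = "foo.gif"\n     ALT = "A > B">visible text};

# Naive approach: stops at the first ">", even inside quotes.
(my $naive = $html) =~ s/<.*?>//gs;

# Quote-aware approach quoted earlier in this answer.
(my $better = $html) =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;

print "naive:  [$naive]\n";
print "better: [$better]\n";
```

The naive version leaves part of the ALT attribute behind, while the quote-aware one removes the whole tag. Neither copes with comments or entities, which is why a real parser remains the more correct choice.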

How do I extract URLs?

A quick but imperfect approach is

#!/usr/bin/perl -n00
# qxurl - tchrist@perl.com
print "$2\n" while m{
    < \s*
      A \s+ HREF \s* = \s* (["']) (.*?) \1
    \s* >
}gsix;

This version does not adjust relative URLs, understand alternate bases, deal with HTML comments, deal

with HREF and NAME attributes in the same tag, or accept URLs themselves as arguments. It also runs

about 100x faster than a more "complete" solution using the LWP suite of modules, such as the

http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program.

How do I download a file from the user‘s machine? How do I open a file on another machine?

In the context of an HTML form, you can use what‘s known as multipart/form−data encoding. The

CGI.pm module (available from CPAN) supports this in the start_multipart_form() method, which

isn‘t the same as the startform() method.

How do I make a pop−up menu in HTML?

Use the <SELECT> and <OPTION> tags. The CGI.pm module (available from CPAN) supports this

widget, as well as many others, including some that it cleverly synthesizes on its own.

How do I fetch an HTML file?

One approach, if you have the lynx text−based HTML browser installed on your system, is this:

$html_code = `lynx -source $url`;
$text_data = `lynx -dump $url`;

The libwww−perl (LWP) modules from CPAN provide a more powerful way to do this. They work through

proxies, and don‘t require lynx:

# simplest version
use LWP::Simple;
$content = get($URL);

# or print HTML from a URL
use LWP::Simple;
getprint "http://www.sn.no/libwww-perl/";

# or print ASCII from HTML from a URL
use LWP::Simple;
use HTML::Parse;
use HTML::FormatText;
my ($html, $ascii);
$html = get("http://www.perl.com/");
defined $html
    or die "Can't fetch HTML from http://www.perl.com/";
$ascii = HTML::FormatText->new->format(parse_html($html));
print $ascii;

How do I automate an HTML form submission?

If you‘re submitting values using the GET method, create a URL and encode the form using the

query_form method:

use LWP::Simple;
use URI::URL;
my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
$url->query_form(module => 'DB_File', readme => 1);
$content = get($url);

If you‘re using the POST method, create your own user agent and encode the content appropriately.

use HTTP::Request::Common qw(POST);
use LWP::UserAgent;
$ua = LWP::UserAgent->new();
my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod',
               [ module => 'DB_File', readme => 1 ];
$content = $ua->request($req)->as_string;

How do I decode or create those %−encodings on the web?

Here‘s an example of decoding:

$string = "http://altavista.digital.com/cgi−bin/query?pg=q&what=news&fmt=.&q=%2Bc

$string =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge;

Encoding is a bit harder, because you can't just blindly change all the non-alphanumunder characters (\W)

into their hex escapes. It‘s important that characters with special meaning like / and ? not be translated.

Probably the easiest way to get this right is to avoid reinventing the wheel and just use the URI::Escape

module, which is part of the libwww−perl package (LWP) available from CPAN.
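
If the module is not at hand, the idea can still be sketched with core Perl. The two helper names below, escape_value and unescape_value, are invented for this illustration and are not part of any library. The sketch assumes single-byte characters and is meant for one query-string value at a time, never for a whole URL.

```perl
#!/usr/bin/perl -w
use strict;

# Escape one query-string *value*: everything except letters, digits,
# "_", "." and "-" becomes %XX. Running this over a complete URL
# would wrongly escape "/" and "?" as well.
sub escape_value {
    my $s = shift;
    $s =~ s/([^A-Za-z0-9_.-])/sprintf("%%%02X", ord($1))/ge;
    return $s;
}

# Decoding, using the same substitution as the decoding example.
sub unescape_value {
    my $s = shift;
    $s =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge;
    return $s;
}

my $value   = 'C++ & Perl/CGI?';
my $escaped = escape_value($value);

print "$escaped\n";
print unescape_value($escaped), "\n";
```

Escaping each value separately and then joining the pieces with "&" and "=" is what keeps the characters with special meaning intact.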

How do I redirect to another page?

Instead of sending back a Content−Type as the headers of your reply, send back a Location: header.

Officially this should be a URI: header, so the CGI.pm module (available from CPAN) sends back both:

Location: http://www.domain.com/newpage

URI: http://www.domain.com/newpage

Note that relative URLs in these headers can cause strange effects because of "optimizations" that servers do.

$url = "http://www.perl.com/CPAN/";

print "Location: $url\n\n";

exit;

To be correct to the spec, each of those "\n" should really be "\015\012", but unless you're stuck

on MacOS, you probably won‘t notice.

How do I put a password on my web pages?

That depends. You‘ll need to read the documentation for your web server, or perhaps check some of the

other FAQs referenced above.

How do I edit my .htpasswd and .htgroup files with Perl?

The HTTPD::UserAdmin and HTTPD::GroupAdmin modules provide a consistent OO interface to these

files, regardless of how they're stored. Databases may be text, dbm, Berkeley DB or any database with a DBI

compatible driver. HTTPD::UserAdmin supports files used by the ‘Basic’ and ‘Digest’ authentication

schemes. Here‘s an example:

use HTTPD::UserAdmin ();
HTTPD::UserAdmin
    ->new(DB => "/foo/.htpasswd")
    ->add($username => $password);

How do I make sure users can‘t enter values into a form that cause my CGI script to do bad

things?

Read the CGI security FAQ, at http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html, and

the Perl/CGI FAQ at http://www.perl.com/CPAN/doc/FAQs/cgi/perl-cgi-faq.html.

In brief: use tainting (see perlsec), which makes sure that data from outside your script (eg, CGI parameters)

are never used in eval or system calls. In addition to tainting, never use the single−argument form of

system() or exec(). Instead, supply the command and arguments as a list, which prevents shell

globbing.
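
A short sketch of the list form follows. The "user input" is made up, and $^X (the perl that is currently running) merely stands in for whatever external program you would really call.

```perl
#!/usr/bin/perl -w
use strict;

# Pretend this string arrived from a form field.
my $user_input = 'hello; echo INJECTED';

# List form: the first element is the program, the rest are its
# arguments. No shell is involved, so ";" has no special meaning
# and the whole string arrives as a single argument.
my @cmd = ($^X, '-e', 'print "got: $ARGV[0]\n"', $user_input);

my $status = system(@cmd);
die "command failed: $?\n" unless $status == 0;
```

Had the same text been pasted into a single command string, the shell would have been free to treat everything after the semicolon as a second command.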

How do I parse a mail header?

For a quick−and−dirty solution, try this solution derived from page 222 of the 2nd edition of "Programming

Perl":

$/ = '';
$header = <MSG>;
$header =~ s/\n\s+/ /g;      # merge continuation lines
%head = ( UNIX_FROM_LINE, split /^([-\w]+):\s*/m, $header );

That solution doesn‘t do well if, for example, you‘re trying to maintain all the Received lines. A more

complete approach is to use the Mail::Header module from CPAN (part of the MailTools package).
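
To see what the quick-and-dirty version produces, here is a self-contained sketch that applies it to a made-up header held in a string instead of reading from a filehandle. The addresses and subject are invented for the illustration.

```perl
#!/usr/bin/perl -w
use strict;

# A made-up header with one continuation line.
my $header = join '',
    "From: gnat\@frii.com\n",
    "Subject: Testing a rather\n",
    "\tlong subject\n",
    "To: someuser\@host.com\n";

$header =~ s/\n\s+/ /g;      # merge continuation lines

# Whatever precedes the first "Name:" field (nothing, in this case)
# is stored under UNIX_FROM_LINE; name/value pairs follow.
my %head = ('UNIX_FROM_LINE', split /^([-\w]+):\s*/m, $header);
chomp $head{$_} for keys %head;

print "$_ => $head{$_}\n" for sort keys %head;
```

Because the result is an ordinary hash, a header name that occurs more than once keeps only its last value, which is exactly the weakness with Received lines mentioned above.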

How do I decode a CGI form?

You use a standard module, probably CGI.pm. Under no circumstances should you attempt to do so by

hand!

You‘ll see a lot of CGI programs that blindly read from STDIN the number of bytes equal to

CONTENT_LENGTH for POSTs, or grab QUERY_STRING for decoding GETs. These programs are very

poorly written. They only work sometimes. They typically forget to check the return value of the read()

system call, which is a cardinal sin. They don‘t handle HEAD requests. They don‘t handle multipart forms

used for file uploads. They don‘t deal with GET/POST combinations where query fields are in more than

one place. They don‘t deal with keywords in the query string.

In short, they‘re bad hacks. Resist them at all costs. Please do not be tempted to reinvent the wheel.

Instead, use the CGI.pm or CGI_Lite.pm (available from CPAN), or if you‘re trapped in the module−free

land of perl1 .. perl4, you might look into cgi−lib.pl (available from

http://www.bio.cam.ac.uk/web/form.html).

Make sure you know whether to use a GET or a POST in your form. GETs should only be used for

something that doesn‘t update the server. Otherwise you can get mangled databases and repeated feedback

mail messages. The fancy word for this is "idempotency". This simply means that there should be no

difference between making a GET request for a particular URL once or multiple times. This is because the

HTTP protocol definition says that a GET request may be cached by the browser, or server, or an intervening

proxy. POST requests cannot be cached, because each request is independent and matters. Typically, POST

requests change or depend on state on the server (query or update a database, send mail, or purchase a

computer).

How do I check a valid mail address?

You can‘t, at least, not in real time. Bummer, eh?

Without sending mail to the address and seeing whether there's a human on the other end to answer you,

you cannot determine whether a mail address is valid. Even if you apply the mail header standard, you can

have problems, because there are deliverable addresses that aren‘t RFC−822 (the mail header standard)

compliant, and addresses that aren‘t deliverable which are compliant.

Many are tempted to try to eliminate many frequently−invalid mail addresses with a simple regexp, such as

/^[\w.-]+\@([\w.-]\.)+\w+$/. It's a very bad idea: this also throws out many valid

ones, and says nothing about potential deliverability, so it is not suggested. Instead, see

http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz , which actually checks against the

full RFC spec (except for nested comments), looks for addresses you may not wish to accept mail to (say,

Bill Clinton or your postmaster), and then makes sure that the hostname given can be looked up in the DNS

MX records. It‘s not fast, but it works for what it tries to do.
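
To see how easily a simple pattern misfires, the sketch below runs the expression exactly as printed above against a few sample addresses, all of them made up. As written, each repetition of its parenthesized group matches only a single character followed by a dot, so even an ordinary-looking address is turned away.

```perl
#!/usr/bin/perl -w
use strict;

# The "simple regexp" from the text, unchanged.
my $pattern = qr/^[\w.-]+\@([\w.-]\.)+\w+$/;

my @samples = (
    'someuser@host.com',            # ordinary address
    'fred&barney@stonehenge.com',   # unusual but legal local part
    'a@b.c',                        # one-letter domain labels
);

my %verdict;
for my $address (@samples) {
    $verdict{$address} = ($address =~ $pattern) ? 'accepted' : 'rejected';
    print "$address: $verdict{$address}\n";
}
```

Only the last sample passes, which says more about the pattern than about the addresses.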

Our best advice for verifying a person‘s mail address is to have them enter their address twice, just as you

normally do to change a password. This usually weeds out typos. If both versions match, send mail to that

address with a personal message that looks somewhat like:

Dear someuser@host.com,

Please confirm the mail address you gave us Wed May 6 09:38:41

MDT 1998 by replying to this message. Include the string

"Rumpelstiltskin" in that reply, but spelled in reverse; that is,

start with "Nik...". Once this is done, your confirmed address will

be entered into our records.

If you get the message back and they‘ve followed your directions, you can be reasonably assured that it‘s

real.

A related strategy that‘s less open to forgery is to give them a PIN (personal ID number). Record the address

and PIN (best that it be a random one) for later processing. In the mail you send, ask them to include the PIN

in their reply. But if it bounces, or the message is included via a "vacation" script, it'll be there anyway. So

it‘s best to ask them to mail back a slight alteration of the PIN, such as with the characters reversed, one

added or subtracted to each digit, etc.
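
The PIN idea might be sketched as follows. The helper name reply_confirms and the six-digit length are arbitrary choices for the illustration, and the built-in rand is adequate for a sketch but not for anything security-sensitive.

```perl
#!/usr/bin/perl -w
use strict;

# Make a random six-digit PIN, avoiding palindromes so that the
# reversed form always differs from the original.
my $pin;
do {
    $pin = join '', map { int rand 10 } 1 .. 6;
} while ($pin eq scalar reverse $pin);

# The reply must contain the PIN reversed, not the PIN itself, so a
# bounce or vacation message quoting our mail does not count.
my $expected = scalar reverse $pin;

sub reply_confirms {
    my ($reply_text, $want) = @_;
    return index($reply_text, $want) >= 0 ? 1 : 0;
}

print "send PIN $pin, expect $expected in the reply\n";
```

Store the address together with $expected and, when a reply arrives, pass its body to reply_confirms.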

How do I decode a MIME/BASE64 string?

The MIME−tools package (available from CPAN) handles this and a lot more. Decoding BASE64 becomes

as simple as:

use MIME::Base64;
$decoded = decode_base64($encoded);

A more direct approach is to use the unpack() function‘s "u" format after minor transliterations:

tr#A-Za-z0-9+/##cd;                   # remove non-base64 chars
tr#A-Za-z0-9+/# -_#;                  # convert to uuencoded format
$len = pack("c", 32 + 0.75*length);   # compute length byte
print unpack("u", $len . $_);         # uudecode and print
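
One caveat, offered as an assumption about how the "u" format behaves and not as something stated above: the length byte of a uuencoded line appears to hold only a small count, so the direct approach suits short strings. A sketch that splits longer input into 60-character chunks, each with its own length byte, might look like this; decode_base64_sketch is a name invented for the illustration.

```perl
#!/usr/bin/perl -w
use strict;

sub decode_base64_sketch {
    my $str = shift;
    $str =~ tr#A-Za-z0-9+/##cd;     # remove non-base64 chars, padding too
    $str =~ tr#A-Za-z0-9+/# -_#;    # convert to the uuencode alphabet

    my $uu = '';
    while (length $str) {
        # 60 encoded characters stand for 45 decoded bytes.
        my $chunk = substr($str, 0, 60);
        $str = length($str) > 60 ? substr($str, 60) : '';
        # Prefix each chunk with its own length byte.
        $uu .= chr(32 + int(0.75 * length($chunk))) . $chunk;
    }
    return unpack("u", $uu);
}

print decode_base64_sketch("SGVsbG8sIFdvcmxkIQ=="), "\n";
```

For anything beyond a quick script, the module shown first is the safer route.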

How do I return the user‘s mail address?

On systems that support getpwuid, the $< variable and the Sys::Hostname module (which is part of the

standard perl distribution), you can probably try using something like this:

use Sys::Hostname;
$address = sprintf('%s@%s', scalar getpwuid($<), hostname);

Company policies on mail address can mean that this generates addresses that the company‘s mail system

will not accept, so you should ask for users’ mail addresses when this matters. Furthermore, not all systems

on which Perl runs are so forthcoming with this information as is Unix.

The Mail::Util module from CPAN (part of the MailTools package) provides a mailaddress() function

that tries to guess the mail address of the user. It makes a more intelligent guess than the code above, using

information given when the module was installed, but it could still be incorrect. Again, the best way is often

just to ask the user.

How do I send mail?

Use the sendmail program directly:

open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq")
    or die "Can't fork for sendmail: $!\n";
print SENDMAIL <<"EOF";
From: User Originating Mail <me\@host>
To: Final Destination <you\@otherhost>
Subject: A relevant subject line

Body of the message goes here, in as many lines as you like.
EOF
close(SENDMAIL) or warn "sendmail didn't close nicely";

The −oi option prevents sendmail from interpreting a line consisting of a single dot as "end of message".

The −t option says to use the headers to decide who to send the message to, and −odq says to put the

message into the queue. This last option means your message won‘t be immediately delivered, so leave it

out if you want immediate delivery.

Or use the CPAN module Mail::Mailer:

use Mail::Mailer;

$mailer = Mail::Mailer->new();
$mailer->open({ From    => $from_address,
                To      => $to_address,
                Subject => $subject,
              })
    or die "Can't open: $!\n";
print $mailer $body;
$mailer->close();

The Mail::Internet module uses Net::SMTP which is less Unix−centric than Mail::Mailer, but less reliable.

Avoid raw SMTP commands. There are many reasons to use a mail transport agent like sendmail. These

include queueing, MX records, and security.

How do I read mail?

Use the Mail::Folder module from CPAN (part of the MailFolder package) or the Mail::Internet module

from CPAN (also part of the MailTools package).

# sending mail
use Mail::Internet;
use Mail::Header;

# say which mail host to use
$ENV{SMTPHOSTS} = 'mail.frii.com';

# create headers
$header = new Mail::Header;
$header->add('From', 'gnat@frii.com');
$header->add('Subject', 'Testing');
$header->add('To', 'gnat@frii.com');

# create body
$body = 'This is a test, ignore';

# create mail object (Body takes a reference to an array of lines)
$mail = new Mail::Internet(undef, Header => $header, Body => [$body]);

# send it
$mail->smtpsend or die;

Often a module is overkill, though. Here‘s a mail sorter.

#!/usr/bin/perl
# bysub1 - simple sort by subject
my(@msgs, @sub);
my $msgno = -1;
$/ = '';                    # paragraph reads
while (<>) {
    if (/^From/m) {
        /^Subject:\s*(?:Re:\s*)*(.*)/mi;
        $sub[++$msgno] = lc($1) || '';
    }
    $msgs[$msgno] .= $_;
}
for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) {
    print $msgs[$i];
}

Or more succinctly,

#!/usr/bin/perl -n00
# bysub2 - awkish sort-by-subject
BEGIN { $msgno = -1 }
$sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m;
$msg[$msgno] .= $_;
END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }

How do I find out my hostname/domainname/IP address?

The normal way to find your own hostname is to call the `hostname` program. While sometimes

expedient, this has some problems, such as not knowing whether you‘ve got the canonical name or not. It‘s

one of those tradeoffs of convenience versus portability.

The Sys::Hostname module (part of the standard perl distribution) will give you the hostname after which

you can find out the IP address (assuming you have working DNS) with a gethostbyname() call.

use Socket;
use Sys::Hostname;
my $host = hostname();
my $addr = inet_ntoa(scalar gethostbyname($host || 'localhost'));

Probably the simplest way to learn your DNS domain name is to grok it out of /etc/resolv.conf, at least under

Unix. Of course, this assumes several things about your resolv.conf configuration, including that it exists.

(We still need a good DNS domain name−learning method for non−Unix systems.)

How do I fetch a news article or the active newsgroups?

Use the Net::NNTP or News::NNTPClient modules, both available from CPAN. This can make tasks like

fetching the newsgroup list as simple as:

perl -MNews::NNTPClient \
    -e 'print News::NNTPClient->new->list("newsgroups")'

How do I fetch/put an FTP file?

LWP::Simple (available from CPAN) can fetch but not put. Net::FTP (also available from CPAN) is more

complex but can put as well as fetch.

How can I do RPC in Perl?

A DCE::RPC module is being developed (but is not yet available), and will be released as part of the

DCE−Perl package (available from CPAN). No ONC::RPC module is known.

AUTHOR AND COPYRIGHT

When included as part of the Standard Version of Perl, or as part of its complete documentation whether

printed or otherwise, this work may be distributed only under the terms of Perl‘s Artistic License. Any

distribution of this file or derivatives thereof outside of that package require that special arrangements be

made with copyright holder.

Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You

are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A

simple comment in the code giving credit would be courteous but is not required.

perl Perl Programmers Reference Guide perl

NAME

perl − Practical Extraction and Report Language

SYNOPSIS

perl [ -sTuU ]
     [ -hv ] [ -V[:configvar] ]
     [ -cw ] [ -d[:debugger] ] [ -D[number/list] ]
     [ -pna ] [ -Fpattern ] [ -l[octal] ] [ -0[octal] ]
     [ -Idir ] [ -m[-]module ] [ -M[-]'module...' ]
     [ -P ]
     [ -S ]
     [ -x[dir] ]
     [ -i[extension] ]
     [ -e 'command' ] [ -- ] [ programfile ] [ argument ]...

For ease of access, the Perl manual has been split up into a number of sections:

perl Perl overview (this section)

perldelta Perl changes since previous version

perlfaq Perl frequently asked questions

perltoc Perl documentation table of contents

perldata Perl data structures

perlsyn Perl syntax

perlop Perl operators and precedence

perlre Perl regular expressions

perlrun Perl execution and options

perlfunc Perl builtin functions

perlvar Perl predefined variables

perlsub Perl subroutines

perlmod Perl modules: how they work

perlmodlib Perl modules: how to write and use

perlmodinstall Perl modules: how to install from CPAN

perlform Perl formats

perllocale Perl locale support

perlref Perl references

perldsc Perl data structures intro

perllol Perl data structures: lists of lists

perltoot Perl OO tutorial

perlobj Perl objects

perltie Perl objects hidden behind simple variables

perlbot Perl OO tricks and examples

perlipc Perl interprocess communication

perldebug Perl debugging

perldiag Perl diagnostic messages

perlsec Perl security

perltrap Perl traps for the unwary

perlport Perl portability guide

perlstyle Perl style guide

perlpod Perl plain old documentation

perlbook Perl book information

perlembed Perl ways to embed perl in your C or C++ application

perlapio Perl internal IO abstraction interface

perlxs Perl XS application programming interface

perlxstut Perl XS tutorial

perlguts Perl internal functions for those doing extensions

perlcall Perl calling conventions from C

perlhist Perl history records

(If you‘re intending to read these straight through for the first time, the suggested order will tend to reduce

the number of forward references.)

By default, all of the above manpages are installed in the /usr/local/man/ directory.

Extensive additional documentation for Perl modules is available. The default configuration for perl will

place this additional documentation in the /usr/local/lib/perl5/man directory (or else in the man subdirectory

of the Perl library directory). Some of this additional documentation is distributed standard with Perl, but

you‘ll also find documentation for third−party modules there.

You should be able to view Perl‘s documentation with your man(1) program by including the proper

directories in the appropriate start−up files, or in the MANPATH environment variable. To find out where

the configuration has installed the manpages, type:

perl -V:man.dir

If the directories have a common stem, such as /usr/local/man/man1 and /usr/local/man/man3, you need

only to add that stem (/usr/local/man) to your man(1) configuration files or your MANPATH environment

variable. If they do not share a stem, you‘ll have to add both stems.

If that doesn‘t work for some reason, you can still use the supplied perldoc script to view module

information. You might also look into getting a replacement man program.

If something strange has gone wrong with your program and you‘re not sure where you should look for help,

try the −w switch first. It will often point out exactly where the trouble is.

DESCRIPTION

Perl is a language optimized for scanning arbitrary text files, extracting information from those text files, and

printing reports based on that information. It‘s also a good language for many system management tasks.

The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant,

minimal).

Perl combines (in the author‘s opinion, anyway) some of the best features of C, sed, awk, and sh, so people

familiar with those languages should have little difficulty with it. (Language historians will also note some

vestiges of csh, Pascal, and even BASIC−PLUS.) Expression syntax corresponds quite closely to C

expression syntax. Unlike most Unix utilities, Perl does not arbitrarily limit the size of your data—if you‘ve

got the memory, Perl can slurp in your whole file as a single string. Recursion is of unlimited depth. And

the tables used by hashes (previously called "associative arrays") grow as necessary to prevent degraded

performance. Perl uses sophisticated pattern matching techniques to scan large amounts of data very

quickly. Although optimized for scanning text, Perl can also deal with binary data, and can make dbm files

look like hashes. Setuid Perl scripts are safer than C programs through a dataflow tracing mechanism which

prevents many stupid security holes.

If you have a problem that would ordinarily use sed or awk or sh, but it exceeds their capabilities or must

run a little faster, and you don‘t want to write the silly thing in C, then Perl may be for you. There are also

translators to turn your sed and awk scripts into Perl scripts.

But wait, there‘s more...

Perl version 5 is nearly a complete rewrite, and provides the following additional benefits:

Many usability enhancements

It is now possible to write much more readable Perl code (even within regular expressions).

Formerly cryptic variable names can be replaced by mnemonic identifiers. Error messages are more

informative, and the optional warnings will catch many of the mistakes a novice might make. This

cannot be stressed enough. Whenever you get mysterious behavior, try the −w switch!!! Whenever

you don‘t get mysterious behavior, try using −w anyway.

Simplified grammar

The new yacc grammar is one half the size of the old one. Many of the arbitrary grammar rules have

been regularized. The number of reserved words has been cut by 2/3. Despite this, nearly all old Perl

scripts will continue to work unchanged.

Lexical scoping

Perl variables may now be declared within a lexical scope, like "auto" variables in C. Not only is this

more efficient, but it contributes to better privacy for "programming in the large". Anonymous

subroutines exhibit deep binding of lexical variables (closures).

Arbitrarily nested data structures

Any scalar value, including any array element, may now contain a reference to any other variable or

subroutine. You can easily create anonymous variables and subroutines. Perl manages your

reference counts for you.

Modularity and reusability

The Perl library is now defined in terms of modules which can be easily shared among various

packages. A package may choose to import all or a portion of a module‘s published interface.

Pragmas (that is, compiler directives) are defined and used by the same mechanism.

Object−oriented programming

A package can function as a class. Dynamic multiple inheritance and virtual methods are supported

in a straightforward manner and with very little new syntax. Filehandles may now be treated as

objects.

Embeddable and Extensible

Perl may now be embedded easily in your C or C++ application, and can either call or be called by

your routines through a documented interface. The XS preprocessor is provided to make it easy to

glue your C or C++ routines into Perl. Dynamic loading of modules is supported, and Perl itself can

be made into a dynamic library.

POSIX compliant

A major new module is the POSIX module, which provides access to all available POSIX routines

and definitions, via object classes where appropriate.

Package constructors and destructors

The new BEGIN and END blocks provide means to capture control as a package is being compiled,

and after the program exits. As a degenerate case they work just like awk‘s BEGIN and END when

you use the −p or −n switches.

Multiple simultaneous DBM implementations

A Perl program may now access DBM, NDBM, SDBM, GDBM, and Berkeley DB files from the

same script simultaneously. In fact, the old dbmopen interface has been generalized to allow any

variable to be tied to an object class which defines its access methods.

Subroutine definitions may now be autoloaded

In fact, the AUTOLOAD mechanism also allows you to define any arbitrary semantics for undefined

subroutine calls. It‘s not for just autoloading.

Regular expression enhancements

You can now specify nongreedy quantifiers. You can now do grouping without creating a

backreference. You can now write regular expressions with embedded whitespace and comments for

readability. A consistent extensibility mechanism has been added that is upwardly compatible with

all old regular expressions.

Innumerable Unbundled Modules

The Comprehensive Perl Archive Network described in perlmodlib contains hundreds of

plug−and−play modules full of reusable code. See http://www.perl.com/CPAN for a site near you.

Compilability

While not yet in full production mode, a working perl−to−C compiler does exist. It can generate

portable byte code, simple C, or optimized C code.

Okay, that‘s definitely enough hype.
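
For readers who prefer code to hype, here is a small sketch touching a few of the features listed above: lexical closures, nested data structures, a tied scalar, AUTOLOAD, and the newer regular expression syntax. All package, subroutine, and variable names are invented for the illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Lexical scoping and closures: each counter keeps its own private $n.
sub make_counter {
    my $n = shift;
    return sub { return $n++ };
}
my $count = make_counter(5);
$count->();                      # first call
my $second = $count->();         # second call sees the updated $n

# Arbitrarily nested data: a hash holding an anonymous array.
my %langs = (ancestors => [ 'C', 'sed', 'awk', 'sh' ]);
push @{ $langs{ancestors} }, 'csh';

# Tying: a scalar whose stored value is always upper-cased.
package UpperCase;
sub TIESCALAR { my $class = shift; my $v = ''; return bless \$v, $class }
sub FETCH     { my $self = shift; return $$self }
sub STORE     { my ($self, $new) = @_; $$self = uc $new }

# AUTOLOAD: catch calls to undefined subroutines in this package.
package Greeter;
use vars qw($AUTOLOAD);
sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;           # strip the package qualifier
    return if $name eq 'DESTROY';
    return "no such sub, but hello from $name";
}

package main;
tie my $shout, 'UpperCase';
$shout = 'more than one way';

my $greeting = Greeter::anything();

# Regex enhancements: /x allows whitespace and comments, (?:...)
# groups without capturing, and .*? is a nongreedy quantifier.
my ($first_tag) = '<b>bold</b><i>it</i>' =~ m{
    < (?: b | i ) >      # an opening tag, grouped but not captured
    ( .*? )              # as little text as possible
    </                   # up to the first closing tag
}x;

print "$second $langs{ancestors}[-1] $shout $first_tag\n";
print "$greeting\n";
```

Each piece is documented in its own manpage: perlsub and perlref for closures and references, perltie for tying, and perlre for the regular expression syntax.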

ENVIRONMENT

See perlrun.

AUTHOR

Larry Wall <larry@wall.org>, with the help of oodles of other folks.

If your Perl success stories and testimonials may be of help to others who wish to advocate the use of Perl in

their applications, or if you wish to simply express your gratitude to Larry and the Perl developers, please

write to <perl-thanks@perl.org>.

FILES

"/tmp/perl−e$$" temporary file for −e commands

"@INC" locations of perl libraries

SEE ALSO

a2p awk to perl translator

s2p sed to perl translator

DIAGNOSTICS

The −w switch produces some lovely diagnostics.

See perldiag for explanations of all Perl‘s diagnostics. The use diagnostics pragma automatically

turns Perl‘s normally terse warnings and errors into these longer forms.

Compilation errors will tell you the line number of the error, with an indication of the next token or token

type that was to be examined. (In the case of a script passed to Perl via −e switches, each −e is counted as

one line.)

Setuid scripts have additional constraints that can produce error messages such as "Insecure dependency".

See perlsec.

Did we mention that you should definitely consider using the −w switch?

BUGS

The −w switch is not mandatory.

Perl is at the mercy of your machine‘s definitions of various operations such as type casting, atof(), and

floating−point output with sprintf().

If your stdio requires a seek or eof between reads and writes on a particular stream, so does Perl. (This

doesn‘t apply to sysread() and syswrite().)
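
In practice this means putting a seek between a read and a write on the same filehandle, even when the seek does not move anywhere. A minimal sketch, using a scratch file whose name is arbitrary:

```perl
#!/usr/bin/perl -w
use strict;

my $file = "rw_demo_$$.tmp";    # scratch file, removed at the end

open(FH, "+>$file") or die "Can't create $file: $!\n";
print FH "line one\nline two\n";

# Switching from writing to reading: seek first.
seek(FH, 0, 0) or die "seek failed: $!\n";
my $first = <FH>;               # reads the first line only

# Switching from reading to writing: a seek of zero bytes from the
# current position is enough.
seek(FH, 0, 1) or die "seek failed: $!\n";
print FH "LINE TWO\n";          # overwrites the second line in place
close(FH) or die "close failed: $!\n";

open(FH, "<$file") or die "Can't reopen $file: $!\n";
my @lines = <FH>;
close(FH);
unlink $file;

print @lines;
```

The replacement text is deliberately the same length as the line it overwrites; writing something shorter or longer in place would leave stray bytes or clobber what follows.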

While none of the built−in data types have any arbitrary size limits (apart from memory size), there are still a

few arbitrary limits: a given variable name may not be longer than 255 characters, and no component of

your PATH may be longer than 255 if you use −S. A regular expression may not compile to more than

32767 bytes internally.

You may mail your bug reports (be sure to include full configuration information as output by the myconfig

program in the perl source tree, or by perl −V) to <perlbug@perl.com>. If you‘ve succeeded in compiling

perl, the perlbug script in the utils/ subdirectory can be used to help mail in a bug report.

Perl actually stands for Pathologically Eclectic Rubbish Lister, but don‘t tell anyone I said that.

NOTES

The Perl motto is "There‘s more than one way to do it." Divining how many more is left as an exercise to

the reader.

The three principal virtues of a programmer are Laziness, Impatience, and Hubris. See the Camel Book for

why.

perl5004delta Perl Programmers Reference Guide perl5004delta

NAME

perldelta − what‘s new for perl5.004

DESCRIPTION

This document describes differences between the 5.003 release (as documented in Programming Perl,

second edition—the Camel Book) and this one.

Supported Environments

Perl5.004 builds out of the box on Unix, Plan 9, LynxOS, VMS, OS/2, QNX, AmigaOS, and Windows NT.

Perl runs on Windows 95 as well, but it cannot be built there, for lack of a reasonable command interpreter.

Core Changes

Most importantly, many bugs were fixed, including several security problems. See the Changes file in the

distribution for details.

List assignment to %ENV works

%ENV = () and %ENV = @list now work as expected (except on VMS where it generates a fatal error).
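As a minimal sketch of list assignment to %ENV, the following replaces the whole environment inside a block. The variable values are illustrative assumptions, and local is used only so that the original environment comes back when the block exits.

```perl
use strict;
use warnings;

my @names;
{
    # local restores the original environment when the block exits.
    local %ENV = (PATH => '/bin:/usr/bin', LANG => 'C');
    @names = sort keys %ENV;    # only the two keys assigned above
}
print "@names\n";
```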

"Can‘t locate Foo.pm in @INC" error now lists @INC

Compilation option: Binary compatibility with 5.003

There is a new Configure question that asks if you want to maintain binary compatibility with Perl 5.003. If

you choose binary compatibility, you do not have to recompile your extensions, but you might have symbol

conflicts if you embed Perl in another application, just as in the 5.003 release. By default, binary

compatibility is preserved at the expense of symbol table pollution.

$PERL5OPT environment variable

You may now put Perl options in the $PERL5OPT environment variable. Unless Perl is running with taint

checks, it will interpret this variable as if its contents had appeared on a "#!perl" line at the beginning of your

script, except that hyphens are optional. PERL5OPT may only be used to set the following switches:

−[DIMUdmw].

Limitations on −M, −m, and −T options

The −M and −m options are no longer allowed on the #! line of a script. If a script needs a module, it should

invoke it with the use pragma.

The −T option is also forbidden on the #! line of a script, unless it was present on the Perl command line.

Due to the way #! works, this usually means that −T must be in the first argument. Thus:

#!/usr/bin/perl −T −w

will probably work for an executable script invoked as scriptname, while:

#!/usr/bin/perl −w −T

will probably fail under the same conditions. (Non−Unix systems will probably not follow this rule.) But

perl scriptname is guaranteed to fail, since then there is no chance of −T being found on the command

line before it is found on the #! line.

More precise warnings

If you removed the −w option from your Perl 5.003 scripts because it made Perl too verbose, we recommend

that you try putting it back when you upgrade to Perl 5.004. Each new perl version tends to remove some

undesirable warnings, while adding new warnings that may catch bugs in your scripts.

Deprecated: Inherited AUTOLOAD for non−methods

Before Perl 5.004, AUTOLOAD functions were looked up as methods (using the @ISA hierarchy), even when

the function to be autoloaded was called as a plain function (e.g. Foo::bar()), not a method (e.g.

Foo−>bar() or $obj−>bar()).

Perl 5.005 will use method lookup only for methods’ AUTOLOADs. However, there is a significant base of

existing code that may be using the old behavior. So, as an interim step, Perl 5.004 issues an optional

warning when a non−method uses an inherited AUTOLOAD.

The simple rule is: Inheritance will not work when autoloading non−methods. The simple fix for old code

is: In any module that used to depend on inheriting AUTOLOAD for non−methods from a base class named

BaseClass, execute *AUTOLOAD = \&BaseClass::AUTOLOAD during startup.
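The fix described above might look like the following sketch. The package name Derived and the function greet are hypothetical; only BaseClass and the glob assignment come from the text.

```perl
use strict;
use warnings;

{
    # Base class whose AUTOLOAD handles any undefined function.
    package BaseClass;
    our $AUTOLOAD;
    sub AUTOLOAD {
        my $name = $AUTOLOAD;
        $name =~ s/.*:://;               # strip the package qualifier
        return if $name eq 'DESTROY';    # never autoload destructors
        return "autoloaded $name(@_)";
    }
}

{
    # Hypothetical module that used to inherit AUTOLOAD for plain functions.
    package Derived;
    our @ISA = ('BaseClass');
    no warnings 'once';
    # Alias the base class's AUTOLOAD explicitly instead of relying on @ISA.
    *AUTOLOAD = \&BaseClass::AUTOLOAD;
}

# A plain function call, not a method call.
my $result = Derived::greet('world');
print "$result\n";
```

Because the alias lives directly in the Derived package, the lookup no longer depends on inheritance.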

Previously deprecated %OVERLOAD is no longer usable

Using %OVERLOAD to define overloading was deprecated in 5.003. Overloading is now defined using the

overload pragma. %OVERLOAD is still used internally but should not be used by Perl scripts. See overload

for more details.

Subroutine arguments created only when they‘re modified

In Perl 5.004, nonexistent array and hash elements used as subroutine parameters are brought into existence

only if they are actually assigned to (via @_).

Earlier versions of Perl vary in their handling of such arguments. Perl versions 5.002 and 5.003 always

brought them into existence. Perl versions 5.000 and 5.001 brought them into existence only if they were not

the first argument (which was almost certainly a bug). Earlier versions of Perl never brought them into

existence.

For example, given this code:

undef @a; undef %a;

sub show { print $_[0] };

sub change { $_[0]++ };

show($a[2]);

change($a{b});

After this code executes in Perl 5.004, $a{b} exists but $a[2] does not. In Perl 5.002 and 5.003, both

$a{b} and $a[2] would have existed (but $a[2]‘s value would have been undefined).

Group vector changeable with $)

The $) special variable has always (well, in Perl 5, at least) reflected not only the current effective group,

but also the group list as returned by the getgroups() C function (if there is one). However, until this

release, there has not been a way to call the setgroups() C function from Perl.

In Perl 5.004, assigning to $) is exactly symmetrical with examining it: The first number in its string value

is used as the effective gid; if there are any numbers after the first one, they are passed to the

setgroups() C function (if there is one).

Fixed parsing of $$<digit>, &$<digit>, etc.

Perl versions before 5.004 misinterpreted any type marker followed by "$" and a digit. For example, "$$0"

was incorrectly taken to mean "${$}0" instead of "${$0}". This bug is (mostly) fixed in Perl 5.004.

However, the developers of Perl 5.004 could not fix this bug completely, because at least two widely−used

modules depend on the old meaning of "$$0" in a string. So Perl 5.004 still interprets "$$<digit>" in the

old (broken) way inside strings; but it generates this message as a warning. And in Perl 5.005, this special

treatment will cease.

Fixed localization of $<digit>, $&, etc.

Perl versions before 5.004 did not always properly localize the regex−related special variables. Perl 5.004

does localize them, as the documentation has always said it should. This may result in $1, $2, etc. no

longer being set where existing programs use them.
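A small sketch of what this localization means in practice: a successful match inside a nested block sets its own $1, and the outer value is visible again once the block ends. The strings matched here are illustrative only.

```perl
use strict;
use warnings;

my @seen;
if ("alpha=1" =~ /^(\w+)=/) {
    push @seen, $1;                 # $1 from the outer match
    {
        # A successful match inside a nested block sets its own $1 ...
        "beta=2" =~ /^(\w+)=/ and push @seen, $1;
    }
    # ... and the outer value is visible again after the block.
    push @seen, $1;
}
print "@seen\n";
```

Code that expected the inner $1 to survive past the closing brace should copy it into an ordinary variable inside the block.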

No resetting of $. on implicit close

The documentation for Perl 5.0 has always stated that $. is not reset when an already−open file handle is

reopened with no intervening call to close. Due to a bug, perl versions 5.000 through 5.003 did reset $.

under that circumstance; Perl 5.004 does not.
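The behavior can be observed with a sketch like the one below. It uses File::Temp and three-argument open purely for convenience (both are assumptions that postdate 5.004); the point is only that the handle FH is reopened without an intervening close.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small three-line scratch file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "one\ntwo\nthree\n";
close $tmp or die "close: $!";

open(FH, '<', $path) or die "open: $!";
my $line = <FH>;
$line = <FH>;                        # two lines read so far
my $before = $.;

open(FH, '<', $path) or die "reopen: $!";   # no close() in between
$line = <FH>;                        # first line of the file again
my $after = $.;                      # the counter keeps running
close(FH);

print "before=$before after=$after\n";
```

Calling close(FH) before the second open would reset the counter.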

wantarray may return undef

The wantarray operator returns true if a subroutine is expected to return a list, and false otherwise. In

Perl 5.004, wantarray can also return the undefined value if a subroutine‘s return value will not be used at

all, which allows subroutines to avoid a time−consuming calculation of a return value if it isn‘t going to be

used.
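The three possible results of wantarray can be sketched as follows. The subroutine name report and the @log array are illustrative assumptions.

```perl
use strict;
use warnings;

my @log;

sub report {
    my $want = wantarray;
    # undef means void context, true means list, false means scalar.
    push @log, !defined $want ? 'void' : $want ? 'list' : 'scalar';
    return;
}

my @list   = report();   # list context
my $scalar = report();   # scalar context
report();                # void context: the return value is discarded

print "@log\n";
```

A subroutine can test for the undefined case first and skip building an expensive return value.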

eval EXPR determines value of EXPR in scalar context

Perl (version 5) used to determine the value of EXPR inconsistently, sometimes incorrectly using the

surrounding context for the determination. Now, the value of EXPR (before being parsed by eval) is always

determined in a scalar context. Once parsed, it is executed as before, by providing the context that the scope

surrounding the eval provided. This change makes the behavior Perl4 compatible, besides fixing bugs

resulting from the inconsistent behavior. This program:

@a = qw(time now is time);

print eval @a;

print '|', scalar eval @a;

used to print something like "timenowis881399109|4", but now (and in perl4) prints "4|4".

Changes to tainting checks

A bug in previous versions may have failed to detect some insecure conditions when taint checks are turned

on. (Taint checks are used in setuid or setgid scripts, or when explicitly turned on with the −T invocation

option.) Although it‘s unlikely, this may cause a previously−working script to now fail — which should be

construed as a blessing, since that indicates a potentially−serious security hole was just plugged.

The new restrictions when tainting include:

No glob() or <*>

These operators may spawn the C shell (csh), which cannot be made safe. This restriction will be

lifted in a future version of Perl when globbing is implemented without the use of an external program.

No spawning if tainted $CDPATH, $ENV, $BASH_ENV

These environment variables may alter the behavior of spawned programs (especially shells) in ways

that subvert security. So now they are treated as dangerous, in the manner of $IFS and $PATH.

No spawning if tainted $TERM doesn‘t look like a terminal name

Some termcap libraries do unsafe things with $TERM. However, it would be unnecessarily harsh to

treat all $TERM values as unsafe, since only shell metacharacters can cause trouble in $TERM. So a

tainted $TERM is considered to be safe if it contains only alphanumerics, underscores, dashes, and

colons, and unsafe if it contains other characters (including whitespace).

New Opcode module and revised Safe module

A new Opcode module supports the creation, manipulation and application of opcode masks. The revised

Safe module has a new API and is implemented using the new Opcode module. Please read the new Opcode

and Safe documentation.

Embedding improvements

In older versions of Perl it was not possible to create more than one Perl interpreter instance inside a single

process without leaking like a sieve and/or crashing. The bugs that caused this behavior have all been fixed.

However, you still must take care when embedding Perl in a C program. See the updated perlembed

manpage for tips on how to manage your interpreters.

Internal change: FileHandle class based on IO::* classes

File handles are now stored internally as type IO::Handle. The FileHandle module is still supported for

backwards compatibility, but it is now merely a front end to the IO::* modules — specifically, IO::Handle,

IO::Seekable, and IO::File. We suggest, but do not require, that you use the IO::* modules in new code.

In harmony with this change, *GLOB{FILEHANDLE} is now just a backward−compatible synonym for

*GLOB{IO}.

Internal change: PerlIO abstraction interface

It is now possible to build Perl with AT&T‘s sfio IO package instead of stdio. See perlapio for more

details, and the INSTALL file for how to use it.

New and changed syntax

$coderef−>(PARAMS)

A subroutine reference may now be suffixed with an arrow and a (possibly empty) parameter list. This

syntax denotes a call of the referenced subroutine, with the given parameters (if any).

This new syntax follows the pattern of $hashref−>{FOO} and $aryref−>[$foo]: You may

now write &$subref($foo) as $subref−>($foo). All of these arrow terms may be chained;

thus, &{$table−>{FOO}}($bar) may now be written $table−>{FOO}−>($bar).
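A short sketch comparing the old and new call syntax. The names $double and $table are illustrative.

```perl
use strict;
use warnings;

my $double = sub { return 2 * $_[0] };
my $table  = { DOUBLE => $double, NOW => sub { return 'no args' } };

my $old_style = &$double(21);             # pre-5.004 spelling
my $new_style = $double->(21);            # arrow call
my $chained   = $table->{DOUBLE}->(4);    # chained with a hash lookup
my $empty     = $table->{NOW}->();        # empty parameter list

print "$old_style $new_style $chained $empty\n";
```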

New and changed builtin constants

__PACKAGE__

The current package name at compile time, or the undefined value if there is no current package (due

to a package; directive). Like __FILE__ and __LINE__, __PACKAGE__ does not interpolate

into strings.
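For illustration, a sketch of __PACKAGE__ in use. The package name My::Logger is hypothetical.

```perl
use strict;
use warnings;

{
    package My::Logger;
    # __PACKAGE__ is resolved at compile time to the enclosing package.
    sub tag { return __PACKAGE__ . ': ' . join(' ', @_) }
}

my $tagged  = My::Logger::tag('started');
my $current = __PACKAGE__;                  # we are back in main here
my $literal = "__PACKAGE__ is not interpolated";

print "$tagged\n$current\n$literal\n";
```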

New and changed builtin variables

$^E Extended error message on some platforms. (Also known as $EXTENDED_OS_ERROR if you use

English).

$^H The current set of syntax checks enabled by use strict. See the documentation of strict for

more details. Not actually new, but newly documented. Because it is intended for internal use by Perl

core components, there is no use English long name for this variable.

$^M By default, running out of memory is not trappable. However, if compiled for this, Perl may use the

contents of $^M as an emergency pool after die()ing with this message. Suppose that your Perl

were compiled with −DPERL_EMERGENCY_SBRK and used Perl‘s malloc. Then

$^M = 'a' x (1<<16);

would allocate a 64K buffer for use when in emergency. See the INSTALL file for information on how

to enable this option. As a disincentive to casual use of this advanced feature, there is no use

English long name for this variable.

New and changed builtin functions

delete on slices

This now works. (e.g. delete @ENV{'PATH', 'MANPATH'})
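The same operation on an ordinary hash, as a sketch that leaves the real environment alone. The hash contents are made up for the example.

```perl
use strict;
use warnings;

my %config = (PATH => '/bin', MANPATH => '/usr/man', HOME => '/root');

# delete on a slice removes several keys at once and returns their values.
my @removed = delete @config{'PATH', 'MANPATH'};
my @left    = sort keys %config;

print "removed: @removed\n";
print "left: @left\n";
```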

flock

is now supported on more platforms, prefers fcntl to lockf when emulating, and always flushes before

(un)locking.

printf and sprintf

Perl now implements these functions itself; it doesn‘t use the C library function sprintf() any

more, except for floating−point numbers, and even then only known flags are allowed. As a result, it is

now possible to know which conversions and flags will work, and what they will do.

The new conversions in Perl‘s sprintf() are:

%i a synonym for %d

%p a pointer (the address of the Perl value, in hexadecimal)

%n special: *stores* the number of characters output so far

into the next variable in the parameter list

The new flags that go between the % and the conversion are:

# prefix octal with "0", hex with "0x"

h interpret integer as C type "short" or "unsigned short"

V interpret integer as Perl’s standard integer type

Also, where a number would appear in the flags, an asterisk ("*") may be used instead, in which case

Perl uses the next item in the parameter list as the given number (that is, as the field width or

precision). If a field width obtained through "*" is negative, it has the same effect as the ‘−’ flag:

left−justification.

See sprintf for a complete list of conversions and flags.
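A few of the conversions and flags above, sketched with arbitrary sample values. %p and %n are left out because their results are not portable between builds.

```perl
use strict;
use warnings;

my $synonym = sprintf('%i', 42);             # same as %d
my $octal   = sprintf('%#o', 8);             # leading "0"
my $hex     = sprintf('%#x', 255);           # leading "0x"
my $wide    = sprintf('[%*d]', 5, 42);       # width taken from the list
my $left    = sprintf('[%*d]', -5, 42);      # negative width left-justifies
my $prec    = sprintf('%.*f', 2, 3.14159);   # precision taken from the list

print join("\n", $synonym, $octal, $hex, $wide, $left, $prec), "\n";
```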

keys as an lvalue

As an lvalue, keys allows you to increase the number of hash buckets allocated for the given hash.

This can gain you a measure of efficiency if you know the hash is going to get big. (This is similar to

pre−extending an array by assigning a larger number to $#array.) If you say

keys %hash = 200;

then %hash will have at least 200 buckets allocated for it. These buckets will be retained even if you

do %hash = (); use undef %hash if you want to free the storage while %hash is still in scope.

You can‘t shrink the number of buckets allocated for the hash using keys in this way (but you needn‘t

worry about doing this by accident, as trying has no effect).
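A minimal sketch of pre-sizing a hash before filling it. The bucket count itself is not inspected here, since the ways of observing it differ between Perl versions; the example only shows that the assignment is legal and leaves the contents alone.

```perl
use strict;
use warnings;

my %hash;
keys(%hash) = 200;            # ask for at least 200 buckets up front

$hash{$_} = $_ * $_ for 1 .. 10;
my $count = keys %hash;       # as an rvalue, keys still counts entries

keys(%hash) = 1;              # an attempt to shrink has no effect
my $still = keys %hash;

print "$count $still\n";
```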

my() in Control Structures

You can now use my() (with or without the parentheses) in the control expressions of control

structures such as:

while (defined(my $line = <>)) {

$line = lc $line;

} continue {

print $line;

}

if ((my $answer = <STDIN>) =~ /^y(es)?$/i) {

user_agrees();

} elsif ($answer =~ /^n(o)?$/i) {

user_disagrees();

} else {

chomp $answer;

die "‘$answer’ is neither ‘yes’ nor ‘no’";

}

Also, you can declare a foreach loop control variable as lexical by preceding it with the word "my".

For example, in:

foreach my $i (1, 2, 3) {

some_function();

}

$i is a lexical variable, and the scope of $i extends to the end of the loop, but not beyond it.

Note that you still cannot use my() on global punctuation variables such as $_ and the like.

pack() and unpack()

A new format ‘w’ represents a BER compressed integer (as defined in ASN.1). Its format is a

sequence of one or more bytes, each of which provides seven bits of the total value, with the most

significant first. Bit eight of each byte is set, except for the last byte, in which bit eight is clear.

If ‘p’ or ‘P’ are given undef as values, they now generate a NULL pointer.

Both pack() and unpack() now fail when their templates contain invalid types. (Invalid types

used to be ignored.)
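To make the 'w' encoding concrete, here is a sketch that packs a few sample integers and shows the bytes in hex. It also shows an invalid template letter being rejected; 'y' is chosen on the assumption that it is not a valid type.

```perl
use strict;
use warnings;

# 300 = 2 * 128 + 44, so two bytes: (0x80 | 2) and 44 (0x2c).
my $hex300 = unpack('H*', pack('w', 300));
my $hex127 = unpack('H*', pack('w', 127));   # fits in one byte
my $hex128 = unpack('H*', pack('w', 128));   # needs a second byte
my ($back) = unpack('w', pack('w', 300));    # round trip

# An invalid template type makes pack() die instead of being ignored.
my $ok = eval { my $junk = pack('y', 1); 1 };
my $rejected = $ok ? 0 : 1;

print "$hex300 $hex127 $hex128 $back $rejected\n";
```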

sysseek()

The new sysseek() operator is a variant of seek() that sets and gets the file‘s system read/write

position, using the lseek(2) system call. It is the only reliable way to seek before using sysread()

or syswrite(). Its return value is the new position, or the undefined value on failure.
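A sketch of sysseek() used with syswrite() and sysread() on a scratch file. File::Temp and the Fcntl :seek constants are conveniences assumed here, not part of the description above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Fcntl qw(:seek);

my ($fh, $path) = tempfile(UNLINK => 1);

# Unbuffered write, then reposition with the matching unbuffered seek.
syswrite($fh, "hello world") == 11 or die "syswrite: $!";
my $pos = sysseek($fh, 6, SEEK_SET);
defined $pos or die "sysseek: $!";

my $got = sysread($fh, my $buf, 5);
defined $got or die "sysread: $!";

print "pos=$pos read=$buf\n";
```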

use VERSION

If the first argument to use is a number, it is treated as a version number instead of a module name. If

the version of the Perl interpreter is less than VERSION, then an error message is printed and Perl

exits immediately. Because use occurs at compile time, this check happens immediately during the

compilation process, unlike require VERSION, which waits until runtime for the check. This is

often useful if you need to check the current Perl version before useing library modules which have

changed in incompatible ways from older versions of Perl. (We try not to do this more than we have

to.)

use Module VERSION LIST

If the VERSION argument is present between Module and LIST, then the use will call the VERSION

method in class Module with the given version as an argument. The default VERSION method,

inherited from the UNIVERSAL class, croaks if the given version is larger than the value of the

variable $Module::VERSION. (Note that there is not a comma after VERSION!)

This version−checking mechanism is similar to the one currently used in the Exporter module, but it is

faster and can be used with modules that don‘t use the Exporter. It is the recommended method for

new code.
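Both forms of version checking can be sketched together. Carp is used only because it is always installed; the version numbers 1.0 and 9999 are arbitrary.

```perl
use 5.004;                  # compile-time check of the interpreter version
use strict;
use warnings;
use Carp 1.0 qw(croak);     # module version check: no comma after 1.0

# Requesting an impossibly high version fails at compile time of the eval.
my $ok    = eval "use Carp 9999 (); 1";
my $error = $ok ? '' : $@;

print "version check failed as expected\n" if $error;
```

The string eval is there only so that the failing use can be trapped and inspected.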

prototype(FUNCTION)

Returns the prototype of a function as a string (or undef if the function has no prototype).

FUNCTION is a reference to or the name of the function whose prototype you want to retrieve. (Not

actually new; just never documented before.)
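A brief sketch of prototype() applied to three hypothetical subroutines, passed both by reference and by name.

```perl
use strict;
use warnings;

sub pi ()     { 3.14159 }
sub add2 ($$) { $_[0] + $_[1] }
sub plain     { 1 }

my $p_empty = prototype(\&pi);             # empty string: takes no arguments
my $p_two   = prototype('main::add2');     # looked up by name
my $p_none  = prototype(\&plain);          # undef: no prototype at all

printf "%s|%s|%s\n",
    "'$p_empty'", $p_two, defined $p_none ? $p_none : 'undef';
```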

srand

The default seed for srand, which used to be time, has been changed. Now it‘s a heady mix of

difficult−to−predict system−dependent values, which should be sufficient for most everyday purposes.

Previous to version 5.004, calling rand without first calling srand would yield the same sequence of

random numbers on most or all machines. Now, when perl sees that you‘re calling rand and haven‘t

yet called srand, it calls srand with the default seed. You should still call srand manually if your

code might ever be run on a pre−5.004 system, of course, or if you want a seed other than the default.
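When a repeatable sequence is wanted, for instance in tests, an explicit seed still gives one. A sketch with an arbitrary seed of 42:

```perl
use strict;
use warnings;

# The same explicit seed reproduces the same sequence.
srand(42);
my @first = map { int rand 100 } 1 .. 5;

srand(42);
my @second = map { int rand 100 } 1 .. 5;

my $same = "@first" eq "@second" ? 1 : 0;
print "repeatable: $same\n";
```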

$_ as Default

Functions documented in the Camel to default to $_ now in fact do, and all those that do are so

documented in perlfunc.

m//gc does not reset search position on failure

The m//g match iteration construct has always reset its target string‘s search position (which is visible

through the pos operator) when a match fails; as a result, the next m//g match after a failure starts

again at the beginning of the string. With Perl 5.004, this reset may be disabled by adding the "c" (for

"continue") modifier, i.e. m//gc. This feature, in conjunction with the \G zero−width assertion,

makes it possible to chain matches together. See perlop and perlre.
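Chaining matches with \G and m//gc can be sketched as a tiny tokenizer. The token classes and the input string are illustrative assumptions.

```perl
use strict;
use warnings;

my $input = "x = 42 + y";
my @tokens;

pos($input) = 0;
while (pos($input) < length $input) {
    # A failed /gc match leaves pos() alone, so the next branch
    # tries again at the same offset.
    if    ($input =~ /\G\s+/gc)            { next }
    elsif ($input =~ /\G(\d+)/gc)          { push @tokens, "NUM($1)" }
    elsif ($input =~ /\G([A-Za-z_]\w*)/gc) { push @tokens, "ID($1)" }
    elsif ($input =~ /\G([=+])/gc)         { push @tokens, "OP($1)" }
    else  { die "unexpected character at offset " . pos($input) }
}

print "@tokens\n";
```

Without the "c" modifier, the first failed alternative would reset pos() to the start of the string and the loop would never advance.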

m//x ignores whitespace before ?*+{}

The m//x construct has always been intended to ignore all unescaped whitespace. However, before

Perl 5.004, whitespace had the effect of escaping repeat modifiers like "*" or "?"; for example, /a

*b/x was (mis)interpreted as /a\*b/x. This bug has been fixed in 5.004.
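A quick sketch of the corrected behavior: with /x, the space before "*" no longer turns it into a literal asterisk.

```perl
use strict;
use warnings;

# With /x this pattern is equivalent to /^a*b$/.
my $repeat  = "aaab" =~ /^ a * b $/x ? 1 : 0;   # "*" quantifies "a"
my $literal = "a*b"  =~ /^ a * b $/x ? 1 : 0;   # no literal "*" expected

print "repeat=$repeat literal=$literal\n";
```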

nested sub{} closures work now

Prior to the 5.004 release, nested anonymous functions didn‘t work right. They do now.
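As an illustration of what "nested" means here, a sketch in which an anonymous sub returns another anonymous sub, and the inner one uses lexicals from both enclosing scopes. The name make_counter is hypothetical.

```perl
use strict;
use warnings;

sub make_counter {
    my $count = shift;
    return sub {
        my $step = shift;
        # The innermost closure captures both $count and $step.
        return sub { return $count += $step };
    };
}

my $by_two = make_counter(10)->(2);
my @values = ($by_two->(), $by_two->());

print "@values\n";
```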

formats work right on changing lexicals

Just like anonymous functions that contain lexical variables that change (like a lexical index variable

for a foreach loop), formats now work properly. For example, this silently failed before (printed

only zeros), but is fine now:

my $i;

foreach $i ( 1 .. 10 ) {

write;

}

format =

my i is @#

$i

.

However, it still fails (without a warning) if the foreach is within a subroutine:

my $i;

sub foo {

foreach $i ( 1 .. 10 ) {

write;

}

}

foo;

format =

my i is @#

$i

.

New builtin methods

The UNIVERSAL package automatically contains the following methods that are inherited by all other

classes:

isa(CLASS)

isa returns true if its object is blessed into CLASS or a subclass of CLASS.

isa is also exportable and can be called as a sub with two arguments. This allows the ability to check

what a reference points to. Example:

use UNIVERSAL qw(isa);

if(isa($ref, 'ARRAY')) {

...

}

can(METHOD)

can checks whether its object has a method called METHOD. If it does, a reference to the sub is

returned; if it does not, undef is returned.

VERSION( [NEED] )

VERSION returns the version number of the class (package). If the NEED argument is given then it

will check that the current version (as defined by the $VERSION variable in the given package) is not

less than NEED; it will die if this is not the case. This method is normally called as a class method.

This method is called automatically by the VERSION form of use.

use A 1.2 qw(some imported subs);

# implies:

A−>VERSION(1.2);

NOTE: can directly uses Perl‘s internal code for method lookup, and isa uses a very similar method and

caching strategy. This may cause strange effects if the Perl code dynamically changes @ISA in any package.

You may add other methods to the UNIVERSAL class via Perl or XS code. You do not need to use

UNIVERSAL in order to make these methods available to your program. This is necessary only if you wish

to have isa available as a plain subroutine in the current package.
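The three methods can be tried out with a sketch like this one. The classes Animal and Dog are hypothetical, and isa is called by its fully qualified name rather than imported.

```perl
use strict;
use warnings;

{
    package Animal;
    our $VERSION = '1.20';
    sub new   { my $class = shift; return bless {}, $class }
    sub speak { return 'generic noise' }
}
{
    package Dog;
    our @ISA = ('Animal');
}

my $dog = Dog->new;

my $is_animal = $dog->isa('Animal') ? 1 : 0;          # inherited class
my $method    = $dog->can('speak');                   # code ref or false
my $sound     = $method ? $dog->$method() : undef;
my $can_fly   = $dog->can('fly') ? 1 : 0;
my $is_array  = UNIVERSAL::isa([1, 2], 'ARRAY') ? 1 : 0;   # plain sub call

my $version = Animal->VERSION;
my $ok      = eval { Animal->VERSION(2.0); 1 };       # dies: 1.20 < 2.0
my $error   = $ok ? '' : $@;

print "$is_animal $sound $can_fly $is_array $version\n";
```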

TIEHANDLE now supported

See perltie for other kinds of tie()s.

TIEHANDLE classname, LIST

This is the constructor for the class. That means it is expected to return an object of some sort. The

reference can be used to hold some internal information.

sub TIEHANDLE {

print "<shout>\n";

my $i;

return bless \$i, shift;

}

PRINT this, LIST

This method will be triggered every time the tied handle is printed to. Beyond its self reference it also

expects the list that was passed to the print function.

sub PRINT {

$r = shift;

$$r++;

return print join( $, => map {uc} @_), $\;

}

PRINTF this, LIST

This method will be triggered every time the tied handle is printed to with the printf() function.

Beyond its self reference it also expects the format and list that was passed to the printf function.

sub PRINTF {

shift;

my $fmt = shift;

print sprintf($fmt, @_)."\n";

}

READ this, LIST

This method will be called when the handle is read from via the read or sysread functions.

sub READ {

$r = shift;

my($buf,$len,$offset) = @_;

print "READ called, \$buf=$buf, \$len=$len, \$offset=$offset";

}

READLINE this

This method will be called when the handle is read from. The method should return undef when there

is no more data.

sub READLINE {

$r = shift;

return "PRINT called $$r times\n"

}

GETC this

This method will be called when the getc function is called.

sub GETC { print "Don’t GETC, Get Perl"; return "a"; }

DESTROY this

As with the other types of ties, this method will be called when the tied handle is about to be destroyed.

This is useful for debugging and possibly for cleaning up.

sub DESTROY {

print "</shout>\n";

}
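Putting several of these methods together, the following sketch ties a handle to a hypothetical Capture class that stores uppercased output in memory and hands it back on reads. It is a different class from the fragments above, chosen so the effect is easy to check.

```perl
use strict;
use warnings;

{
    package Capture;
    sub TIEHANDLE { my $class = shift; return bless { lines => [] }, $class }
    sub PRINT {
        my $self = shift;
        push @{ $self->{lines} }, uc join('', @_);
        return 1;
    }
    sub PRINTF {
        my $self = shift;
        my $fmt  = shift;
        return $self->PRINT(sprintf $fmt, @_);
    }
    # Hand back stored lines one at a time; undef when none are left.
    sub READLINE { my $self = shift; return shift @{ $self->{lines} } }
}

tie *LOG, 'Capture';
print  LOG "hello, ", "world";   # calls PRINT
printf LOG "%03d", 7;            # calls PRINTF

my $first  = <LOG>;              # calls READLINE
my $second = <LOG>;
my $done   = <LOG>;              # nothing left
untie *LOG;

print "$first $second\n";
```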

Malloc enhancements

If perl is compiled with the malloc included with the perl distribution (that is, if perl −V:d_mymalloc is

‘define’) then you can print memory statistics at runtime by running Perl thusly:

env PERL_DEBUG_MSTATS=2 perl your_script_here

The value of 2 means to print statistics after compilation and on exit; with a value of 1, the statistics are

printed only on exit. (If you want the statistics at an arbitrary time, you‘ll need to install the optional module

Devel::Peek.)

Three new compilation flags are recognized by malloc.c. (They have no effect if perl is compiled with

system malloc().)

−DPERL_EMERGENCY_SBRK

If this macro is defined, running out of memory need not be a fatal error: a memory pool can be allocated

by assigning to the special variable $^M. See "$^M".

−DPACK_MALLOC

Perl memory allocation is by bucket with sizes close to powers of two. Because of this, malloc

overhead may be big, especially for data of size exactly a power of two. If PACK_MALLOC is defined,

perl uses a slightly different algorithm for small allocations (up to 64 bytes long), which makes it

possible to have overhead down to 1 byte for allocations which are powers of two (and appear quite

often).

Expected memory savings (with 8−byte alignment in alignbytes) is about 20% for typical Perl

usage. Expected slowdown due to additional malloc overhead is in fractions of a percent (hard to

measure, because of the effect of saved memory on speed).

−DTWO_POT_OPTIMIZE

Similarly to PACK_MALLOC, this macro improves allocations of data with size close to a power of

two; but this works for big allocations (starting with 16K by default). Such allocations are typical for

big hashes and special−purpose scripts, especially image processing.

On recent systems, the fact that perl requires 2M from system for 1M allocation will not affect speed

of execution, since the tail of such a chunk is not going to be touched (and thus will not require real

memory). However, it may result in a premature out−of−memory error. So if you will be manipulating

very large blocks with sizes close to powers of two, it would be wise to define this macro.

Expected saving of memory is 0−100% (100% in applications which require most memory in such

2**n chunks); expected slowdown is negligible.

Miscellaneous efficiency enhancements

Functions that have an empty prototype and that do nothing but return a fixed value are now inlined (e.g.

sub PI () { 3.14159 }).

Each unique hash key is only allocated once, no matter how many hashes have an entry with that key. So

even if you have 100 copies of the same hash, the hash keys never have to be reallocated.

Support for More Operating Systems

Support for the following operating systems is new in Perl 5.004.

Win32

Perl 5.004 now includes support for building a "native" perl under Windows NT, using the Microsoft Visual

C++ compiler (versions 2.0 and above) or the Borland C++ compiler (versions 5.02 and above). The

resulting perl can be used under Windows 95 (if it is installed in the same directory locations as it got

installed in Windows NT). This port includes support for perl extension building tools like MakeMaker and

h2xs, so that many extensions available on the Comprehensive Perl Archive Network (CPAN) can now be

readily built under Windows NT. See http://www.perl.com/ for more information on CPAN and

README.win32 in the perl distribution for more details on how to get started with building this port.

There is also support for building perl under the Cygwin32 environment. Cygwin32 is a set of GNU tools

that make it possible to compile and run many UNIX programs under Windows NT by providing a mostly

UNIX−like interface for compilation and execution. See README.cygwin32 in the perl distribution for

more details on this port and how to obtain the Cygwin32 toolkit.

Plan 9

See README.plan9 in the perl distribution.

QNX

See README.qnx in the perl distribution.

AmigaOS

See README.amigaos in the perl distribution.

Pragmata

Six new pragmatic modules exist:

use autouse MODULE => qw(sub1 sub2 sub3)

Defers require MODULE until someone calls one of the specified subroutines (which must be

exported by MODULE). This pragma should be used with caution, and only when necessary.
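A minimal sketch of deferred loading. File::Basename is picked only because its basename function is exported by default; the path is made up.

```perl
use strict;
use warnings;
use autouse 'File::Basename' => qw(basename);

# File::Basename is not required until basename() is first called.
my $name   = basename('/usr/local/lib/perl5/Carp.pm');
my $loaded = exists $INC{'File/Basename.pm'} ? 1 : 0;

print "$name (module loaded: $loaded)\n";
```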

use blib

use blib ‘dir’

Looks for MakeMaker−like ‘blib’ directory structure starting in dir (or current directory) and working

back up to five levels of parent directories.

Intended for use on command line with −M option as a way of testing arbitrary scripts against an

uninstalled version of a package.

use constant NAME => VALUE

Provides a convenient interface for creating compile−time constants. See

Constant Functions in perlsub.
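A sketch showing the pragma next to the hand-written equivalent, a subroutine with an empty prototype. The constant names and values are illustrative.

```perl
use strict;
use warnings;

use constant PI => 3.14159;
sub E () { 2.71828 }            # hand-written constant function

my $area   = PI * 3 ** 2;       # constants are used without sigils
my $sum    = sprintf '%.5f', PI + E;
my $interp = "PI is @{[ PI ]}"; # they do not interpolate directly

print "$area $sum\n$interp\n";
```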

use locale

Tells the compiler to enable (or disable) the use of POSIX locales for builtin operations.

When use locale is in effect, the current LC_CTYPE locale is used for regular expressions and

case mapping; LC_COLLATE for string ordering; and LC_NUMERIC for numeric formatting in printf

and sprintf (but not in print). LC_NUMERIC is always used in write, since lexical scoping of formats

is problematic at best.

Each use locale or no locale affects statements to the end of the enclosing BLOCK or, if not

inside a BLOCK, to the end of the current file. Locales can be switched and queried with

POSIX::setlocale().

See perllocale for more information.

use ops

Disable unsafe opcodes, or any named opcodes, when compiling Perl code.

use vmsish

Enable VMS−specific language features. Currently, there are three VMS−specific features available:

‘status‘, which makes $? and system return genuine VMS status values instead of emulating POSIX;

‘exit‘, which makes exit take a genuine VMS status value instead of assuming that exit 1 is an

error; and ‘time‘, which makes all times relative to the local time zone, in the VMS tradition.

Modules

Required Updates

Though Perl 5.004 is compatible with almost all modules that work with Perl 5.003, there are a few

exceptions:

Module   Required Version for Perl 5.004
−−−−−−   −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
Filter   Filter−1.12
LWP      libwww−perl−5.08
Tk       Tk400.202 (−w makes noise)

Also, the majordomo mailing list program, version 1.94.1, doesn‘t work with Perl 5.004 (nor with perl 4),

because it executes an invalid regular expression. This bug is fixed in majordomo version 1.94.2.

Installation directories

The installperl script now places the Perl source files for extensions in the architecture−specific library

directory, which is where the shared libraries for extensions have always been. This change is intended to

allow administrators to keep the Perl 5.004 library directory unchanged from a previous version, without

running the risk of binary incompatibility between extensions’ Perl source and shared libraries.

Module information summary

Brand new modules, arranged by topic rather than strictly alphabetically:

CGI.pm               Web server interface ("Common Gateway Interface")
CGI/Apache.pm        Support for Apache’s Perl module
CGI/Carp.pm          Log server errors with helpful context
CGI/Fast.pm          Support for FastCGI (persistent server process)
CGI/Push.pm          Support for server push
CGI/Switch.pm        Simple interface for multiple server types

CPAN                 Interface to Comprehensive Perl Archive Network
CPAN::FirstTime      Utility for creating CPAN configuration file
CPAN::Nox            Runs CPAN while avoiding compiled extensions

IO.pm                Top−level interface to IO::* classes
IO/File.pm           IO::File extension Perl module
IO/Handle.pm         IO::Handle extension Perl module
IO/Pipe.pm           IO::Pipe extension Perl module
IO/Seekable.pm       IO::Seekable extension Perl module
IO/Select.pm         IO::Select extension Perl module
IO/Socket.pm         IO::Socket extension Perl module

Opcode.pm            Disable named opcodes when compiling Perl code

ExtUtils/Embed.pm    Utilities for embedding Perl in C programs
ExtUtils/testlib.pm  Fixes up @INC to use just−built extension

FindBin.pm           Find path of currently executing program

Class/Struct.pm      Declare struct−like datatypes as Perl classes

18−Oct−1998 Version 5.005_02 137

perl5004delta Perl Programmers Reference Guide perl5004delta

File/stat.pm By−name interface to Perl’s builtin stat

Net/hostent.pm By−name interface to Perl’s builtin gethost*

Net/netent.pm By−name interface to Perl’s builtin getnet*

Net/protoent.pm By−name interface to Perl’s builtin getproto*

Net/servent.pm By−name interface to Perl’s builtin getserv*

Time/gmtime.pm By−name interface to Perl’s builtin gmtime

Time/localtime.pm By−name interface to Perl’s builtin localtime

Time/tm.pm Internal object for Time::{gm,local}time

User/grent.pm By−name interface to Perl’s builtin getgr*

User/pwent.pm By−name interface to Perl’s builtin getpw*

Tie/RefHash.pm Base class for tied hashes with references as keys

UNIVERSAL.pm Base class for *ALL* classes

Fcntl

New constants in the existing Fcntl modules are now supported, provided that your operating system

happens to support them:

F_GETOWN F_SETOWN

O_ASYNC O_DEFER O_DSYNC O_FSYNC O_SYNC

O_EXLOCK O_SHLOCK

These constants are intended for use with the Perl operators sysopen() and fcntl() and the basic

database modules like SDBM_File. For the exact meaning of these and other Fcntl constants please refer to

your operating system‘s documentation for fcntl() and open().

In addition, the Fcntl module now provides these constants for use with the Perl operator flock():

LOCK_SH LOCK_EX LOCK_NB LOCK_UN

These constants are defined in all environments (because where there is no flock() system call, Perl
emulates it). However, for historical reasons, these constants are not exported unless they are explicitly
requested with the ":flock" tag (e.g. use Fcntl ':flock').
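As a minimal sketch of the ":flock" tag in use, the following takes a non-blocking exclusive lock on a scratch file. It assumes a reasonably modern Perl (lexical filehandles, the warnings pragma) and uses File::Temp purely to obtain a file to lock; neither is part of the Fcntl change described here.

```perl
use strict;
use warnings;
use Fcntl ':flock';          # imports LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN
use File::Temp 'tempfile';   # assumption: only used to get a scratch file

# Create a scratch file; in real code this would be your data file.
my ($fh, $path) = tempfile(UNLINK => 1);

# Ask for an exclusive lock without blocking. flock() returns false
# if another process already holds a conflicting lock.
my $locked = flock($fh, LOCK_EX | LOCK_NB);
if ($locked) {
    print {$fh} "exclusive access\n";
    flock($fh, LOCK_UN) or die "cannot unlock $path: $!";
}
else {
    warn "could not lock $path: $!";
}
```

Combining LOCK_EX with LOCK_NB makes the call fail immediately instead of waiting, which is usually what a script wants when it should skip work that another process is already doing.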

IO

The IO module provides a simple mechanism to load all of the IO modules at one go. Currently this
includes:

IO::Handle

IO::Seekable

IO::File

IO::Pipe

IO::Socket

For more information on any of these modules, please see its respective documentation.

Math::Complex

The Math::Complex module has been totally rewritten, and now supports more operations. These are

overloaded:

+ − * / ** <=> neg ~ abs sqrt exp log sin cos atan2 "" (stringify)

And these functions are now exported:

pi i Re Im arg

log10 logn ln cbrt root

tan

csc sec cot

asin acos atan

acsc asec acot

sinh cosh tanh

csch sech coth

asinh acosh atanh

acsch asech acoth

cplx cplxe
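The overloaded operators and exported functions can be tried out with a short sketch like the one below. It relies only on names listed above (cplx, Re, Im, and the overloaded abs, sqrt, * and stringification); the variable names are illustrative.

```perl
use strict;
use warnings;
use Math::Complex;    # exports cplx, Re, Im, and overrides sqrt, abs, ...

my $z       = cplx(3, 4);     # 3 + 4i in cartesian form
my $modulus = abs($z);        # overloaded abs: sqrt(3**2 + 4**2)
my $square  = $z * $z;        # overloaded *: (3+4i)**2 = -7 + 24i
my $root    = sqrt(-1);       # Math::Complex's sqrt yields i for -1

# Stringification ("") is overloaded too, so values print readably.
print "z = $z, |z| = $modulus\n";
print "z*z = $square, sqrt(-1) = $root\n";
```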

Math::Trig

This new module provides a simpler interface to parts of Math::Complex for those who need trigonometric

functions only for real numbers.

DB_File

There have been quite a few changes made to DB_File. Here are a few of the highlights:

Fixed a handful of bugs.

By public demand, added support for the standard hash function exists().

Made it compatible with Berkeley DB 1.86.

Made negative subscripts work with RECNO interface.

Changed the default flags from O_RDWR to O_CREAT|O_RDWR and the default mode from 0640 to

0666.

Made DB_File automatically import the open() constants (O_RDWR, O_CREAT etc.) from Fcntl, if

available.

Updated documentation.

Refer to the HISTORY section in DB_File.pm for a complete list of changes. Everything after DB_File 1.01

has been added since 5.003.

Net::Ping

Major rewrite − support added for both udp echo and real icmp pings.

Object−oriented overrides for builtin operators

Many of the Perl builtins returning lists now have object−oriented overrides. These are:

File::stat

Net::hostent

Net::netent

Net::protoent

Net::servent

Time::gmtime

Time::localtime

User::grent

User::pwent

For example, you can now say

use File::stat;

use User::pwent;

    $his = (stat($filename)->uid == getpwnam($whoever)->uid);
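A slightly longer sketch of the by-name interface, using File::stat alone, might look like this. File::Temp and the scratch file are assumptions made so the example is self-contained; the size and uid accessors are the ones File::stat documents.

```perl
use strict;
use warnings;
use File::stat;               # replaces the builtin stat with an object-returning one
use File::Temp 'tempfile';    # assumption: scratch file for the demo only

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello";
close($fh) or die "cannot close $path: $!";

my $st = stat($path) or die "cannot stat $path: $!";

# Fields are read by name instead of by position in a 13-element list.
my $size  = $st->size;
my $owner = $st->uid;
printf "%s: %d bytes, owner uid %d\n", $path, $size, $owner;
```

Compared with indexing into the list returned by the builtin, the accessor names make it harder to pick the wrong field by accident.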

Utility Changes

pod2html

Sends converted HTML to standard output

The pod2html utility included with Perl 5.004 is entirely new. By default, it sends the converted
HTML to its standard output, instead of writing it to a file as Perl 5.003's pod2html did. Use the
--outfile=FILENAME option to write to a file.

xsubpp

void XSUBs now default to returning nothing

Due to a documentation/implementation bug in previous versions of Perl, XSUBs with a return type of

void have actually been returning one value. Usually that value was the GV for the XSUB, but

sometimes it was some already freed or reused value, which would sometimes lead to program failure.

In Perl 5.004, if an XSUB is declared as returning void, it actually returns no value, i.e. an empty list

(though there is a backward−compatibility exception; see below). If your XSUB really does return an

SV, you should give it a return type of SV *.

For backward compatibility, xsubpp tries to guess whether a void XSUB is really void or if it wants

to return an SV *. It does so by examining the text of the XSUB: if xsubpp finds what looks like an

assignment to ST(0), it assumes that the XSUB‘s return type is really SV *.

C Language API Changes

gv_fetchmethod and perl_call_sv

The gv_fetchmethod function finds a method for an object, just like in Perl 5.003. The GV it

returns may be a method cache entry. However, in Perl 5.004, method cache entries are not visible to

users; therefore, they can no longer be passed directly to perl_call_sv. Instead, you should use the

GvCV macro on the GV to extract its CV, and pass the CV to perl_call_sv.

The most likely symptom of passing the result of gv_fetchmethod to perl_call_sv is Perl‘s

producing an "Undefined subroutine called" error on the second call to a given method (since there is

no cache on the first call).

perl_eval_pv

A new function handy for eval‘ing strings of Perl code inside C code. This function returns the value

from the eval statement, which can be used instead of fetching globals from the symbol table. See

perlguts, perlembed and perlcall for details and examples.

Extended API for manipulating hashes

Internal handling of hash keys has changed. The old hashtable API is still fully supported, and will

likely remain so. The additions to the API allow passing keys as SV*s, so that tied hashes can be

given real scalars as keys rather than plain strings (nontied hashes still can only use strings as keys).

New extensions must use the new hash access functions and macros if they wish to use SV* keys.

These additions also make it feasible to manipulate HE*s (hash entries), which can be more efficient.

See perlguts for details.

Documentation Changes

Many of the base and library pods were updated. These new pods are included in section 1:

perldelta

This document.

perlfaq

Frequently asked questions.

perllocale

Locale support (internationalization and localization).

perltoot

Tutorial on Perl OO programming.

perlapio

Perl internal IO abstraction interface.

perlmodlib

Perl module library and recommended practice for module creation. Extracted from perlmod (which is

much smaller as a result).

perldebug

Although not new, this has been massively updated.

perlsec

Although not new, this has been massively updated.

New Diagnostics

Several new conditions will trigger warnings that were silent before. Some only affect certain platforms.

The following new warnings and errors outline these. These messages are classified as follows (listed in

increasing order of desperation):

(W) A warning (optional).

(D) A deprecation (optional).

(S) A severe warning (mandatory).

(F) A fatal error (trappable).

(P) An internal error you should never see (trappable).

(X) A very fatal error (nontrappable).

(A) An alien error message (not generated by Perl).

"my" variable %s masks earlier declaration in same scope

(W) A lexical variable has been redeclared in the same scope, effectively eliminating all access to the

previous instance. This is almost always a typographical error. Note that the earlier variable will still

exist until the end of the scope or until all closure referents to it are destroyed.

%s argument is not a HASH element or slice

(F) The argument to delete() must be either a hash element, such as

    $foo{$bar}
    $ref->[12]->{"susie"}

or a hash slice, such as

    @foo{$bar, $baz, $xyzzy}
    @{$ref->[12]}{"susie", "queue"}
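Both accepted forms can be seen side by side in a small sketch; the hash and its keys are made up for illustration.

```perl
use strict;
use warnings;

my %age = (tom => 30, dick => 25, harry => 41, susie => 19);

# A hash element: delete returns the removed value.
my $gone = delete $age{tom};

# A hash slice: removes several keys at once and returns their values.
my @removed = delete @age{qw(dick harry)};

print join(',', sort keys %age), "\n";
```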

Allocation too large: %lx

(X) You can‘t allocate more than 64K on an MS−DOS machine.

Allocation too large

(F) You can‘t allocate more than 2^31+"small amount" bytes.

Applying %s to %s will act on scalar(%s)

(W) The pattern match (//), substitution (s///), and transliteration (tr///) operators work on scalar values.
If you apply one of them to an array or a hash, it will convert the array or hash to a scalar value (the
length of an array, or the population info of a hash) and then work on that scalar value. This is
probably not what you meant to do. See grep and map for alternatives.
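The grep and map alternatives can be sketched as follows. The word list is invented; the point is that the pattern is applied to each element in turn rather than to the array as a whole.

```perl
use strict;
use warnings;

my @words = qw(apple banana cherry avocado);

# Match each element rather than the array as a whole.
my @a_words = grep { /^a/ } @words;    # elements starting with "a"
my $any     = grep { /an/ } @words;    # number of matches in scalar context

# Transform each element with map instead of applying s/// to the array.
# The copy into $w keeps the original @words untouched.
my @upper = map { my $w = $_; $w =~ s/^(\w)/\u$1/; $w } @words;
```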

Attempt to free nonexistent shared string

(P) Perl maintains a reference counted internal table of strings to optimize the storage and access of

hash keys and other strings. This indicates someone tried to decrement the reference count of a string

that can no longer be found in the table.

Attempt to use reference as lvalue in substr

(W) You supplied a reference as the first argument to substr() used as an lvalue, which is pretty

strange. Perhaps you forgot to dereference it first. See substr.

Bareword "%s" refers to nonexistent package

(W) You used a qualified bareword of the form Foo::, but the compiler saw no other uses of that

namespace before that point. Perhaps you need to predeclare a package?

Can‘t redefine active sort subroutine %s

(F) Perl optimizes the internal handling of sort subroutines and keeps pointers into them. You tried to

redefine one such sort subroutine when it was currently active, which is not allowed. If you really

want to do this, you should write sort { &func } @x instead of sort func @x.

Can‘t use bareword ("%s") as %s ref while "strict refs" in use

(F) Only hard references are allowed by "strict refs". Symbolic references are disallowed. See perlref.

Cannot resolve method ‘%s’ overloading ‘%s’ in package ‘%s’

(P) Internal error trying to resolve overloading specified by a method name (as opposed to a subroutine

reference).

Constant subroutine %s redefined

(S) You redefined a subroutine which had previously been eligible for inlining. See

Constant Functions in perlsub for commentary and workarounds.

Constant subroutine %s undefined

(S) You undefined a subroutine which had previously been eligible for inlining. See

Constant Functions in perlsub for commentary and workarounds.

Copy method did not return a reference

(F) The method which overloads "=" is buggy. See Copy Constructor.

Died

(F) You passed die() an empty string (the equivalent of die "") or you called it with no args and

both $@ and $_ were empty.

Exiting pseudo−block via %s

(W) You are exiting a rather special block construct (like a sort block or subroutine) by unconventional

means, such as a goto, or a loop control statement. See sort.

Identifier too long

(F) Perl limits identifiers (names for variables, functions, etc.) to 252 characters for simple names,

somewhat more for compound names (like $A::B). You‘ve exceeded Perl‘s limits. Future versions

of Perl are likely to eliminate these arbitrary limitations.

Illegal character %s (carriage return)

(F) A carriage return character was found in the input. This is an error, and not a warning, because

carriage return characters can break multi−line strings, including here documents (e.g., print

<<EOF;).

Illegal switch in PERL5OPT: %s

(X) The PERL5OPT environment variable may only be used to set the following switches:
-[DIMUdmw].

Integer overflow in hex number

(S) The literal hex number you have specified is too big for your architecture. On a 32−bit architecture

the largest hex literal is 0xFFFFFFFF.

Integer overflow in octal number

(S) The literal octal number you have specified is too big for your architecture. On a 32−bit

architecture the largest octal literal is 037777777777.

internal error: glob failed

(P) Something went wrong with the external program(s) used for glob and <*.c>. This may mean
that your csh (C shell) is broken. If so, you should change all of the csh-related variables in config.sh:
if you have tcsh, make the variables refer to it as if it were csh (e.g.
full_csh='/usr/bin/tcsh'); otherwise, make them all empty (except that d_csh should be
'undef') so that Perl will think csh is missing. In either case, after editing config.sh, run
./Configure -S and rebuild Perl.

Invalid conversion in %s: "%s"

(W) Perl does not understand the given format conversion. See sprintf.

Invalid type in pack: ‘%s’

(F) The given character is not a valid pack type. See pack.

Invalid type in unpack: ‘%s’

(F) The given character is not a valid unpack type. See unpack.

Name "%s::%s" used only once: possible typo

(W) Typographical errors often show up as unique variable names. If you had a good reason for having

a unique name, then just mention it again somehow to suppress the message (the use vars pragma

is provided for just this purpose).

Null picture in formline

(F) The first argument to formline must be a valid format picture specification. It was found to be

empty, which probably means you supplied it an uninitialized value. See perlform.

Offset outside string

(F) You tried to do a read/write/send/recv operation with an offset pointing outside the buffer. This is

difficult to imagine. The sole exception to this is that sysread()ing past the buffer will extend the

buffer and zero pad the new area.

Out of memory!

(X|F) The malloc() function returned 0, indicating there was insufficient remaining memory (or

virtual memory) to satisfy the request.

The request was judged to be small, so the possibility to trap it depends on the way Perl was compiled.

By default it is not trappable. However, if compiled for this, Perl may use the contents of $^M as an

emergency pool after die()ing with this message. In this case the error is trappable once.

Out of memory during request for %s

(F) The malloc() function returned 0, indicating there was insufficient remaining memory (or

virtual memory) to satisfy the request. However, the request was judged large enough (compile−time

default is 64K), so a possibility to shut down by trapping this error is granted.

panic: frexp

(P) The library function frexp() failed, making printf("%f") impossible.

Possible attempt to put comments in qw() list

(W) qw() lists contain items separated by whitespace; as with literal strings, comment characters are

not ignored, but are instead treated as literal data. (You may have used different delimiters than the

parentheses shown here; braces are also frequently used.)

You probably wrote something like this:

    @list = qw(
        a # a comment
        b # another comment
    );

when you should have written this:

    @list = qw(
        a
        b
    );

If you really want comments, build your list the old-fashioned way, with quotes and commas:

    @list = (
        'a',    # a comment
        'b',    # another comment
    );

Possible attempt to separate words with commas

(W) qw() lists contain items separated by whitespace; therefore commas aren‘t needed to separate the

items. (You may have used different delimiters than the parentheses shown here; braces are also

frequently used.)

You probably wrote something like this:

qw! a, b, c !;

which puts literal commas into some of the list items. Write it without commas if you don‘t want them

to appear in your data:

qw! a b c !;
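To see what the stray commas actually do to the data, a sketch like the following prints both lists. The warning is switched off with no warnings 'qw' only so that the demonstration runs quietly; the variable names are illustrative.

```perl
use strict;
use warnings;
no warnings 'qw';    # silence the warning so the demo itself runs quietly

my @with_commas = qw! a, b, c !;    # the commas become part of the items
my @plain       = qw! a b c !;

print "with commas: ", join('|', @with_commas), "\n";
print "plain:       ", join('|', @plain), "\n";
```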

Scalar value @%s{%s} better written as $%s{%s}

(W) You‘ve used a hash slice (indicated by @) to select a single element of a hash. Generally it‘s

better to ask for a scalar value (indicated by $). The difference is that $foo{&bar} always behaves

like a scalar, both when assigning to it and when evaluating its argument, while @foo{&bar}

behaves like a list when you assign to it, and provides a list context to its subscript, which can do weird

things if you‘re expecting only one subscript.

Stub found while resolving method ‘%s’ overloading ‘%s’ in package ‘%s’

(P) Overloading resolution over @ISA tree may be broken by importing stubs. Stubs should never be
implicitly created, but explicit calls to can may break this.

Too late for "−T" option

(X) The #! line (or local equivalent) in a Perl script contains the −T option, but Perl was not invoked

with −T in its argument list. This is an error because, by the time Perl discovers a −T in a script, it‘s

too late to properly taint everything from the environment. So Perl gives up.

untie attempted while %d inner references still exist

(W) A copy of the object returned from tie (or tied) was still valid when untie was called.

Unrecognized character %s

(F) The Perl parser has no idea what to do with the specified character in your Perl script (or eval).

Perhaps you tried to run a compressed script, a binary program, or a directory as a Perl program.

Unsupported function fork

(F) Your version of the Perl executable does not support forking.

Note that under some systems, like OS/2, there may be different flavors of Perl executables, some of

which may support fork, some not. Try changing the name you call Perl by to perl_, perl__, and

so on.

Use of "$$<digit>" to mean "${$}<digit>" is deprecated

(D) Perl versions before 5.004 misinterpreted any type marker followed by "$" and a digit. For
example, "$$0" was incorrectly taken to mean "${$}0" instead of "${$0}". This bug is (mostly)
fixed in Perl 5.004.

However, the developers of Perl 5.004 could not fix this bug completely, because at least two
widely-used modules depend on the old meaning of "$$0" in a string. So Perl 5.004 still interprets
"$$<digit>" in the old (broken) way inside strings, but it generates this message as a warning. And
in Perl 5.005, this special treatment will cease.

Value of %s can be "0"; test with defined()

(W) In a conditional expression, you used <HANDLE>, <*> (glob), each(), or readdir() as a
boolean value. Each of these constructs can return a value of "0"; that would make the conditional
expression false, which is probably not what you intended. When using these constructs in conditional
expressions, test their values with the defined operator.
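The difference between a plain truth test and a defined test can be shown with a file whose only line is "0". This sketch assumes a Perl with in-memory filehandles (5.8 or later) so that no real file is needed.

```perl
use strict;
use warnings;

# An in-memory filehandle whose only line is "0" with no trailing
# newline: the classic troublesome case.
my $data = "0";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my $line = <$fh>;
my $truthy  = $line ? 1 : 0;             # the string "0" is false
my $present = defined($line) ? 1 : 0;    # yet a line was read
close($fh);

print "truthy=$truthy present=$present\n";
```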

Variable "%s" may be unavailable

(W) An inner (nested) anonymous subroutine is inside a named subroutine, and outside that is another

subroutine; and the anonymous (innermost) subroutine is referencing a lexical variable defined in the

outermost subroutine. For example:

sub outermost { my $a; sub middle { sub { $a } } }

If the anonymous subroutine is called or referenced (directly or indirectly) from the outermost

subroutine, it will share the variable as you would expect. But if the anonymous subroutine is called or

referenced when the outermost subroutine is not active, it will see the value of the shared variable as it

was before and during the *first* call to the outermost subroutine, which is probably not what you

want.

In these circumstances, it is usually best to make the middle subroutine anonymous, using the sub {}

syntax. Perl has specific support for shared variables in nested anonymous subroutines; a named

subroutine in between interferes with this feature.

Variable "%s" will not stay shared

(W) An inner (nested) named subroutine is referencing a lexical variable defined in an outer

subroutine.

When the inner subroutine is called, it will probably see the value of the outer subroutine‘s variable as

it was before and during the *first* call to the outer subroutine; in this case, after the first call to the

outer subroutine is complete, the inner and outer subroutines will no longer share a common value for

the variable. In other words, the variable will no longer be shared.

Furthermore, if the outer subroutine is anonymous and references a lexical variable outside itself, then

the outer and inner subroutines will never share the given variable.

This problem can usually be solved by making the inner subroutine anonymous, using the sub {}

syntax. When inner anonymous subs that reference variables in outer subroutines are called or

referenced, they are automatically rebound to the current values of such variables.
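A minimal sketch of the recommended fix: the inner subroutine is anonymous, so every call to the outer subroutine produces a closure bound to that call's own variable. The names make_counter and $count are hypothetical.

```perl
use strict;
use warnings;

sub make_counter {
    my $count = shift;
    # Anonymous inner sub: bound to this call's own $count.
    my $next = sub { return ++$count };
    return $next;
}

my $from_one = make_counter(0);
my $from_ten = make_counter(10);

# Each closure advances its own private counter.
my @seen = ($from_one->(), $from_one->(), $from_ten->());
print "@seen\n";
```

Had the inner routine been a named sub, it would have captured only the $count from the first call to make_counter, which is exactly the situation this warning describes.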

Warning: something‘s wrong

(W) You passed warn() an empty string (the equivalent of warn "") or you called it with no args

and $_ was empty.

Ill−formed logical name |%s| in prime_env_iter

(W) A warning peculiar to VMS. A logical name was encountered when preparing to iterate over

%ENV which violates the syntactic rules governing logical names. Since it cannot be translated

normally, it is skipped, and will not appear in %ENV. This may be a benign occurrence, as some

software packages might directly modify logical name tables and introduce nonstandard names, or it

may indicate that a logical name table has been corrupted.

Got an error from DosAllocMem

(P) An error peculiar to OS/2. Most probably you‘re using an obsolete version of Perl, and this should

not happen anyway.

Malformed PERLLIB_PREFIX

(F) An error peculiar to OS/2. PERLLIB_PREFIX should be of the form

    prefix1;prefix2

or

    prefix1 prefix2

with nonempty prefix1 and prefix2. If prefix1 is indeed a prefix of a builtin library search path,
prefix2 is substituted. The error may appear if components are not found, or are too long. See
"PERLLIB_PREFIX" in README.os2.

PERL_SH_DIR too long

(F) An error peculiar to OS/2. PERL_SH_DIR is the directory to find the sh−shell in. See

"PERL_SH_DIR" in README.os2.

Process terminated by SIG%s

(W) This is a standard message issued by OS/2 applications, while *nix applications die in silence. It
is considered a feature of the OS/2 port. You can easily disable it by installing appropriate signal
handlers; see Signals in perlipc. See also "Process terminated by SIGTERM/SIGINT" in README.os2.

BUGS

If you find what you think is a bug, you might check the headers of recently posted articles in the

comp.lang.perl.misc newsgroup. There may also be information at http://www.perl.com/perl/, the Perl Home

Page.

If you believe you have an unreported bug, please run the perlbug program included with your release.

Make sure you trim your bug down to a tiny but sufficient test case. Your bug report, along with the output

of perl -V, will be sent off to <perlbug@perl.com> to be analysed by the Perl porting team.

SEE ALSO

The Changes file for exhaustive details on what changed.

The INSTALL file for how to build Perl. This file has been significantly updated for 5.004, so even veteran

users should look through it.

The README file for general stuff.

The Copying file for copyright information.

HISTORY

Constructed by Tom Christiansen, grabbing material with permission from innumerable contributors, with

kibitzing by more than a few Perl porters.

Last update: Wed May 14 11:14:09 EDT 1997

perldata Perl Programmers Reference Guide perldata

NAME

perldata − Perl data types

DESCRIPTION

Variable names

Perl has three data structures: scalars, arrays of scalars, and associative arrays of scalars, known as "hashes".

Normal arrays are indexed by number, starting with 0. (Negative subscripts count from the end.) Hash

arrays are indexed by string.

Values are usually referred to by name (or through a named reference). The first character of the name tells

you to what sort of data structure it refers. The rest of the name tells you the particular value to which it

refers. Most often, it consists of a single identifier, that is, a string beginning with a letter or underscore, and

containing letters, underscores, and digits. In some cases, it may be a chain of identifiers, separated by ::

(or by ’, but that‘s deprecated); all but the last are interpreted as names of packages, to locate the namespace

in which to look up the final identifier (see Packages for details). It‘s possible to substitute for a simple

identifier an expression that produces a reference to the value at runtime; this is described in more detail

below, and in perlref.

There are also special variables whose names don‘t follow these rules, so that they don‘t accidentally collide

with one of your normal variables. Strings that match parenthesized parts of a regular expression are saved

under names containing only digits after the $ (see perlop and perlre). In addition, several special variables

that provide windows into the inner working of Perl have names containing punctuation characters (see

perlvar).

Scalar values are always named with '$', even when referring to a scalar that is part of an array. It works
like the English word "the". Thus we have:

    $days               # the simple scalar value "days"
    $days[28]           # the 29th element of array @days
    $days{'Feb'}        # the 'Feb' value from hash %days
    $#days              # the last index of array @days

but entire arrays or array slices are denoted by '@', which works much like the word "these" or "those":

    @days               # ($days[0], $days[1],... $days[n])
    @days[3,4,5]        # same as @days[3..5]
    @days{'a','c'}      # same as ($days{'a'},$days{'c'})

and entire hashes are denoted by '%':

    %days               # (key1, val1, key2, val2 ...)

In addition, subroutines are named with an initial '&', though this is optional when it's otherwise
unambiguous (just as "do" is often redundant in English). Symbol table entries can be named with an initial
'*', but you don't really care about that yet.

Every variable type has its own namespace. You can, without fear of conflict, use the same name for a scalar

variable, an array, or a hash (or, for that matter, a filehandle, a subroutine name, or a label). This means that

$foo and @foo are two different variables. It also means that $foo[1] is a part of @foo, not a part of

$foo. This may seem a bit weird, but that‘s okay, because it is weird.
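The separate namespaces can be checked with a short sketch in which a scalar, an array, and a hash all share the name foo. The values are arbitrary.

```perl
use strict;
use warnings;

my $foo = "scalar";
my @foo = ("zero", "one");
my %foo = (key => "value");

my $element = $foo[1];      # part of @foo, not of $foo
my $value   = $foo{key};    # part of %foo
my $last    = $#foo;        # last index of @foo

print "$foo / $element / $value / $last\n";
```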

Because variable and array references always start with '$', '@', or '%', the "reserved" words aren't in fact
reserved with respect to variable names. (They ARE reserved with respect to labels and filehandles,
however, which don't have an initial special character. You can't have a filehandle named "log", for
instance. Hint: you could say open(LOG,'logfile') rather than open(log,'logfile'). Using
uppercase filehandles also improves readability and protects you from conflict with future reserved words.)
Case IS significant: "FOO", "Foo", and "foo" are all different names. Names that start with a letter or
underscore may also contain digits and underscores.

It is possible to replace such an alphanumeric name with an expression that returns a reference to an object of

that type. For a description of this, see perlref.

Names that start with a digit may contain only more digits. Names that do not start with a letter, underscore,

or digit are limited to one character, e.g., $% or $$. (Most of these one character names have a predefined

significance to Perl. For instance, $$ is the current process id.)

Context

The interpretation of operations and values in Perl sometimes depends on the requirements of the context

around the operation or value. There are two major contexts: scalar and list. Certain operations return list

values in contexts wanting a list, and scalar values otherwise. (If this is true of an operation it will be

mentioned in the documentation for that operation.) In other words, Perl overloads certain operations based

on whether the expected return value is singular or plural. (Some words in English work this way, like "fish"

and "sheep".)

In a reciprocal fashion, an operation provides either a scalar or a list context to each of its arguments. For

example, if you say

int( <STDIN> )

the integer operation provides a scalar context for the <STDIN> operator, which responds by reading one

line from STDIN and passing it back to the integer operation, which will then find the integer value of that

line and return that. If, on the other hand, you say

sort( <STDIN> )

then the sort operation provides a list context for <STDIN>, which will proceed to read every line available

up to the end of file, and pass that list of lines back to the sort routine, which will then sort those lines and

return them as a list to whatever the context of the sort was.

Assignment is a little bit special in that it uses its left argument to determine the context for the right

argument. Assignment to a scalar evaluates the righthand side in a scalar context, while assignment to an

array or array slice evaluates the righthand side in a list context. Assignment to a list also evaluates the

righthand side in a list context.

User defined subroutines may choose to care whether they are being called in a scalar or list context, but

most subroutines do not need to care, because scalars are automatically interpolated into lists. See

wantarray.
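A subroutine that does care about its calling context might be sketched like this. The helper name words is hypothetical; wantarray is the builtin referred to above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words in list context,
# or how many there are in scalar context.
sub words {
    my @w = split ' ', shift;
    return wantarray ? @w : scalar @w;
}

my @list  = words("the quick brown fox");   # assignment to an array: list context
my $count = words("the quick brown fox");   # assignment to a scalar: scalar context

print "$count words: @list\n";
```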

Scalar values

All data in Perl is a scalar or an array of scalars or a hash of scalars. Scalar variables may contain various

kinds of singular data, such as numbers, strings, and references. In general, conversion from one form to

another is transparent. (A scalar may not contain multiple values, but may contain a reference to an array or

hash containing multiple values.) Because scalars are converted automatically, operations and functions
that return scalars don't need to care (and, in fact, can't care) whether the context is looking for a string or a
number.

Scalars aren‘t necessarily one thing or another. There‘s no place to declare a scalar variable to be of type

"string", or of type "number", or type "filehandle", or anything else. Perl is a contextually polymorphic

language whose scalars can be strings, numbers, or references (which includes objects). While strings and

numbers are considered pretty much the same thing for nearly all purposes, references are strongly−typed

uncastable pointers with builtin reference−counting and destructor invocation.

A scalar value is interpreted as TRUE in the Boolean sense if it is not the null string or the number 0 (or its

string equivalent, "0"). The Boolean context is just a special kind of scalar context.
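The rule is narrower than it may first appear: only the null string, the number 0, and the exact string "0" are false. A quick sketch that tabulates a few borderline values (chosen here for illustration):

```perl
use strict;
use warnings;

my %is_true;
for my $v ("", 0, "0", "0.0", "00", " ", "a") {
    # The number 0 and the string "0" share the hash key "0".
    $is_true{$v} = $v ? 1 : 0;
}

for my $v (sort keys %is_true) {
    printf "%-5s is %s\n", "'$v'", $is_true{$v} ? "TRUE" : "FALSE";
}
```

Strings such as "0.0" and "00" are numerically zero but are still TRUE, because they are neither the null string nor the exact string "0".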

There are actually two varieties of null scalars: defined and undefined. Undefined null scalars are returned

when there is no real value for something, such as when there was an error, or at end of file, or when you

refer to an uninitialized variable or element of an array. An undefined null scalar may become defined the

first time you use it as if it were defined, but prior to that you can use the defined() operator to determine

whether the value is defined or not.

To find out whether a given string is a valid nonzero number, it‘s usually enough to test it against both

numeric 0 and also lexical "0" (although this will cause −w noises). That‘s because strings that aren‘t

numbers count as 0, just as they do in awk:

    if ($str == 0 && $str ne "0") {
        warn "That doesn't look like a number";
    }

That‘s usually preferable because otherwise you won‘t treat IEEE notations like NaN or Infinity

properly. At other times you might prefer to use the POSIX::strtod function or a regular expression to check

whether data is numeric. See perlre for details on regular expressions.

    warn "has nondigits"        if     /\D/;
    warn "not a natural number" unless /^\d+$/;             # rejects -3
    warn "not an integer"       unless /^-?\d+$/;           # rejects +3
    warn "not an integer"       unless /^[+-]?\d+$/;
    warn "not a decimal number" unless /^-?\d+\.?\d*$/;     # rejects .2
    warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
    warn "not a C float"
        unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
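
The POSIX::strtod alternative mentioned above might look like the sketch below. The helper name getnum is hypothetical, and the code relies on strtod returning the parsed number and the count of unparsed characters when called in list context.

```perl
use strict;
use POSIX ();

# Hypothetical helper: returns the numeric value if the entire string
# parses as a number, otherwise undef.
sub getnum {
    my $str = shift;
    $str =~ s/^\s+//;               # drop leading whitespace
    $str =~ s/\s+$//;               # trailing whitespace would count as unparsed
    return undef if $str eq "";
    my ($num, $unparsed) = POSIX::strtod($str);
    return $unparsed == 0 ? $num : undef;
}

for my $candidate ("42", "1e3", "12abc") {
    my $n = getnum($candidate);
    print "$candidate: ", (defined $n ? "number" : "not a number"), "\n";
}
```

Unlike the regular expressions, this accepts whatever the C library accepts, which may or may not be what you want.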

The length of an array is a scalar value. You may find the length of array @days by evaluating $#days, as
in csh. (Actually, it's not the length of the array, it's the subscript of the last element, because there is
(ordinarily) a 0th element.) Assigning to $#days changes the length of the array. Shortening an array by
this method destroys intervening values. Lengthening an array that was previously shortened NO LONGER
recovers the values that were in those elements. (It used to in Perl 4, but we had to break this to make sure
destructors were called when expected.) You can also gain some minuscule measure of efficiency by
pre-extending an array that is going to get big. (You can also extend an array by assigning to an element
that is off the end of the array.) You can truncate an array down to nothing by assigning the null list () to it.
The following are equivalent:

    @whatever = ();
    $#whatever = -1;

If you evaluate a named array in a scalar context, it returns the length of the array. (Note that this is not true
of lists, which return the last value, like the C comma operator, nor of built-in functions, which return
whatever they feel like returning.) The following is always true:

    scalar(@whatever) == $#whatever - $[ + 1;

Version 5 of Perl changed the semantics of $[: files that don't set the value of $[ no longer need to worry
about whether another file changed its value. (In other words, use of $[ is deprecated.) So in general you
can assume that

    scalar(@whatever) == $#whatever + 1;

Some programmers choose to use an explicit conversion so nothing's left to doubt:

$element_count = scalar(@whatever);
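
A short sketch pulling these array-length rules together, assuming the default $[ of 0. The array contents and variable names are illustrative.

```perl
use strict;

my @days = qw(Mon Tue Wed Thu Fri);

my $last_index = $#days;            # 4: subscript of the last element
my $count      = scalar(@days);     # 5: number of elements

$#days = 1;                         # truncate to two elements
my @short = @days;                  # ("Mon", "Tue")

$#days = 4;                         # lengthen again; old values do not return
my $recovered = defined($days[4]) ? "yes" : "no";   # "no"

@days = ();                         # truncate to nothing
my $empty_last = $#days;            # -1
```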

If you evaluate a hash in a scalar context, it returns a value that is true if and only if the hash contains any

key/value pairs. (If there are any key/value pairs, the value returned is a string consisting of the number of

used buckets and the number of allocated buckets, separated by a slash. This is pretty much useful only to

find out whether Perl's (compiled in) hashing algorithm is performing poorly on your data set. For example,
you stick 10,000 things in a hash, but evaluating %HASH in scalar context reveals "1/16", which means only
one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn't

supposed to happen.)
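
In practice the scalar value of a hash is mostly used as a truth test. The sketch below assumes nothing about the exact string returned, since its format may differ between Perl releases, and uses keys() when a count is wanted. The hash names are illustrative.

```perl
use strict;

my %empty;
my %users = (alice => 1, bob => 2);

my $empty_is_true = scalar(%empty) ? 1 : 0;   # 0: no key/value pairs
my $users_is_true = scalar(%users) ? 1 : 0;   # 1: at least one pair

# keys() in scalar context gives the number of entries.
my $user_count = keys %users;                 # 2

# Preallocating buckets does not change the contents.
keys(%users) = 100;
my $count_after = keys %users;                # still 2
```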

You can preallocate space for a hash by assigning to the keys() function. This rounds up the allocated
buckets to the next power of two:

    keys(%users) = 1000;    # allocate 1024 buckets

Scalar value constructors

Numeric literals are specified in any of the customary floating point or integer formats:

    12345
    12345.67
    .23E-10
    0xffff              # hex
    0377                # octal
    4_294_967_296       # underline for legibility

String literals are usually delimited by either single or double quotes. They work much like shell quotes:
double-quoted string literals are subject to backslash and variable substitution; single-quoted strings are not
(except for "\'" and "\\"). The usual Unix backslash rules apply for making characters such as newline,
tab, etc., as well as some more exotic forms. See Quote and Quotelike Operators for a list.

Octal or hex representations in string literals (e.g. '0xffff') are not automatically converted to their integer

representation. The hex() and oct() functions make these conversions for you. See hex and oct for more

details.
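
A brief sketch of the difference between automatic numeric conversion and the explicit hex() and oct() conversions. The variable names are illustrative.

```perl
use strict;

my $from_hex   = hex('0xffff');     # 65535
my $bare_hex   = hex('ffff');       # 65535: the 0x prefix is optional
my $from_oct   = oct('0377');       # 255
my $oct_of_hex = oct('0x1f');       # 31: oct() also understands a 0x prefix

# Ordinary numeric conversion stops at the first non-digit.
my $naive;
{
    local $^W = 0;                  # silence the -w "isn't numeric" noise
    $naive = '0xffff' + 0;          # 0
}
```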

You can also embed newlines directly in your strings, i.e., they can end on a different line than they begin.

This is nice, but if you forget your trailing quote, the error will not be reported until Perl finds another line

containing the quote character, which may be much further on in the script. Variable substitution inside

strings is limited to scalar variables, arrays, and array slices. (In other words, names beginning with $ or @,

followed by an optional bracketed expression as a subscript.) The following code segment prints out "The

price is $100."

    $Price = '$100';    # not interpreted
    print "The price is $Price.\n";     # interpreted

As in some shells, you can put curly brackets around the name to delimit it from following alphanumerics.

In fact, an identifier within such curlies is forced to be a string, as is any single identifier within a hash

subscript. Our earlier example,

    $days{'Feb'}

can be written as

$days{Feb}

and the quotes will be assumed automatically. But anything more complicated in the subscript will be

interpreted as an expression.
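
As a small illustration of both uses of curly brackets, consider the sketch below. The variables $fruit and %days are invented for the example.

```perl
use strict;

my $fruit = "apple";
my %days  = (Feb => 28);

# Without the braces Perl would look for a variable named $fruits.
my $plural = "${fruit}s";                       # "apples"

my $feb      = $days{Feb};                      # same as $days{'Feb'}
my $sentence = "Feb has $days{Feb} days";       # subscripts interpolate too
```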

Note that a single-quoted string must be separated from a preceding word by a space, because single quote is

a valid (though deprecated) character in a variable name (see Packages).

Three special literals are __FILE__, __LINE__, and __PACKAGE__, which represent the current filename,

line number, and package name at that point in your program. They may be used only as separate tokens;

they will not be interpolated into strings. If there is no current package (due to an empty package;

directive), __PACKAGE__ is the undefined value.
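
The special literals can be tried out with a sketch like this one. The package name Demo::Tokens is hypothetical.

```perl
use strict;

my ($inner_pkg, $line_a, $line_b);
{
    package Demo::Tokens;           # hypothetical package name
    $inner_pkg = __PACKAGE__;       # "Demo::Tokens"
    $line_a    = __LINE__;
    $line_b    = __LINE__;          # exactly one more than $line_a
}

# The tokens are not interpolated inside strings.
my $not_interpolated = "__PACKAGE__";

print "$inner_pkg, ", $line_b - $line_a, ", $not_interpolated\n";
# prints "Demo::Tokens, 1, __PACKAGE__"
```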

The tokens __END__ and __DATA__ may be used to indicate the logical end of the script before the actual

end of file. Any following text is ignored, but may be read via a DATA filehandle: main::DATA for

__END__, or PACKNAME::DATA (where PACKNAME is the current package) for __DATA__. The two

control characters ^D and ^Z are synonyms for __END__ (or __DATA__ in a module). See SelfLoader for

more description of __DATA__, and an example of its use. Note that you cannot read from the DATA

filehandle in a BEGIN block: the BEGIN block is executed as soon as it is seen (during compilation), at

which point the corresponding __DATA__ (or __END__) token has not yet been seen.

A word that has no other interpretation in the grammar will be treated as if it were a quoted string. These are
known as "barewords". As with filehandles and labels, a bareword that consists entirely of lowercase letters
risks conflict with future reserved words, and if you use the -w switch, Perl will warn you about any such
words. Some people may wish to outlaw barewords entirely. If you say

    use strict 'subs';

then any bareword that would NOT be interpreted as a subroutine call produces a compile-time error
instead. The restriction lasts to the end of the enclosing block. An inner block may countermand this by
saying no strict 'subs'.

Array variables are interpolated into double-quoted strings by joining all the elements of the array with the

delimiter specified in the $" variable ($LIST_SEPARATOR in English), space by default. The following

are equivalent:

$temp = join($",@ARGV);

system "echo $temp";

system "echo @ARGV";
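
Changing the separator is usually done with local so that the default returns at the end of the block. A minimal sketch, with an invented @parts array:

```perl
use strict;

my @parts = ('usr', 'local', 'bin');

my $default = "@parts";             # "usr local bin"

my $path;
{
    local $" = '/';                 # separator changed only inside this block
    $path = "/@parts";              # "/usr/local/bin"
}

my $restored = "@parts";            # "usr local bin" again
```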

Within search patterns (which also undergo double-quotish substitution) there is a bad ambiguity: Is
/$foo[bar]/ to be interpreted as /${foo}[bar]/ (where [bar] is a character class for the regular
expression) or as /${foo[bar]}/ (where [bar] is the subscript to array @foo)? If @foo doesn't
otherwise exist, then it's obviously a character class. If @foo exists, Perl takes a good guess about [bar],
and is almost always right. If it does guess wrong, or if you're just plain paranoid, you can force the correct
interpretation with curly brackets as above.

A line-oriented form of quoting is based on the shell "here-doc" syntax. Following a << you specify a

string to terminate the quoted material, and all lines following the current line down to the terminating string

are the value of the item. The terminating string may be either an identifier (a word), or some quoted text. If

quoted, the type of quotes you use determines the treatment of the text, just as in regular quoting. An

unquoted identifier works like double quotes. There must be no space between the << and the identifier. (If

you put a space it will be treated as a null identifier, which is valid, and matches the first empty line.) The

terminating string must appear by itself (unquoted and with no surrounding whitespace) on the terminating

line.

    print <<EOF;
The price is $Price.
EOF

    print <<"EOF";      # same as above
The price is $Price.
EOF

    print <<`EOC`;      # execute commands
echo hi there
echo lo there
EOC

    print <<"foo", <<"bar";     # you can stack them
I said foo.
foo
I said bar.
bar

    myfunc(<<"THIS", 23, <<'THAT');
Here's a line
or two.
THIS
and here's another.
THAT

Just don't forget that you have to put a semicolon on the end to finish the statement, as Perl doesn't know
you're not going to try to do this:

print <<ABC

179231

ABC

+ 20;

List value constructors

List values are denoted by separating individual values by commas (and enclosing the list in parentheses

where precedence requires it):

(LIST)

In a context not requiring a list value, the value of the list literal is the value of the final element, as with the

C comma operator. For example,

    @foo = ('cc', '-E', $bar);

assigns the entire list value to array foo, but

    $foo = ('cc', '-E', $bar);

assigns the value of variable bar to variable foo. Note that the value of an actual array in a scalar context is
the length of the array; the following assigns the value 3 to $foo:

    @foo = ('cc', '-E', $bar);
    $foo = @foo;                # $foo gets 3

You may have an optional comma before the closing parenthesis of a list literal, so that you can say:

    @foo = (
        1,
        2,
        3,
    );

LISTs do automatic interpolation of sublists. That is, when a LIST is evaluated, each element of the list is

evaluated in a list context, and the resulting list value is interpolated into LIST just as if each individual

element were a member of LIST. Thus arrays and hashes lose their identity in a LIST—the list

(@foo,@bar,&SomeSub,%glarch)

contains all the elements of @foo followed by all the elements of @bar, followed by all the elements

returned by the subroutine named SomeSub called in a list context, followed by the key/value pairs of

%glarch. To make a list reference that does NOT interpolate, see perlref.
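
The flattening behaviour, and the use of references to avoid it, can be sketched as follows. The names mirror the example above, except that the subroutine is called some_sub here and its return values are invented.

```perl
use strict;

my @foo    = (1, 2);
my @bar    = (3);
my %glarch = (k => 'v');
sub some_sub { return ('a', 'b') }

# Everything is interpolated into one flat list of seven values.
my @flat = (@foo, @bar, some_sub(), %glarch);

# References keep each array's identity: this list has two elements.
my @nested = (\@foo, \@bar);
my $second_of_foo = $nested[0][1];      # 2
```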

The null list is represented by (). Interpolating it in a list has no effect. Thus ((),(),()) is equivalent to

(). Similarly, interpolating an array with no elements is the same as if no array had been interpolated at that

point.

A list value may also be subscripted like a normal array. You must put the list in parentheses to avoid

ambiguity. For example:

    # Stat returns list value.
    $time = (stat($file))[8];

    # SYNTAX ERROR HERE.
    $time = stat($file)[8];  # OOPS, FORGOT PARENTHESES

    # Find a hex digit.
    $hexdigit = ('a','b','c','d','e','f')[$digit-10];

    # A "reverse comma operator".
    return (pop(@foo),pop(@foo))[0];

You may assign to undef in a list. This is useful for throwing away some of the return values of a function:

($dev, $ino, undef, undef, $uid, $gid) = stat($file);

Lists may be assigned to if and only if each element of the list is legal to assign to:

    ($a, $b, $c) = (1, 2, 3);
    ($map{'red'}, $map{'blue'}, $map{'green'}) = (0x00f, 0x0f0, 0xf00);

Array assignment in a scalar context returns the number of elements produced by the expression on the right

side of the assignment:

    $x = (($foo,$bar) = (3,2,1));       # set $x to 3, not 2
    $x = (($foo,$bar) = f());           # set $x to f()'s return count

This is very handy when you want to do a list assignment in a Boolean context, because most list functions

return a null list when finished, which when assigned produces a 0, which is interpreted as FALSE.
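
Two common uses of this rule are sketched below: counting matches by assigning to an empty list, and letting a loop end when the right side runs dry. The sample string and hash are invented for the example.

```perl
use strict;

my ($foo, $bar);
my $x = (($foo, $bar) = (3, 2, 1));     # 3: size of the right-hand side

# Assigning to the empty list still counts the right-hand side.
my $text  = "a-b-c-d";
my $count = () = $text =~ /-/g;         # 3 matches

# each() returns a null list when finished, so the assignment yields 0.
my %h = (one => 1, two => 2);
my $pairs = 0;
while (my ($k, $v) = each %h) {
    $pairs++;
}
```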

The final element may be an array or a hash:

($a, $b, @rest) = split;

my($a, $b, %rest) = @_;

You can actually put an array or hash anywhere in the list, but the first one in the list will soak up all the

values, and anything after it will get a null value. This may be useful in a local() or my().

A hash literal contains pairs of values to be interpreted as a key and a value:

    # same as map assignment above
    %map = ('red',0x00f,'blue',0x0f0,'green',0xf00);

While literal lists and named arrays are usually interchangeable, that's not the case for hashes. Just because
you can subscript a list value like a normal array does not mean that you can subscript a list value as a hash.
Likewise, hashes included as parts of other lists (including parameter lists and return lists from functions)
always flatten out into key/value pairs. That's why it's good to use references sometimes.

It is often more readable to use the => operator between key/value pairs. The => operator is mostly just a
more visually distinctive synonym for a comma, but it also arranges for its left-hand operand to be
interpreted as a string, if it's a bareword that would be a legal identifier. This makes it nice for initializing
hashes:

%map = (

red => 0x00f,

blue => 0x0f0,

green => 0xf00,

);

or for initializing hash references to be used as records:

    $rec = {
        witch => 'Mable the Merciless',
        cat   => 'Fluffy the Ferocious',
        date  => '10/31/1776',
    };

or for using call-by-named-parameter to complicated functions:

    $field = $query->radio_group(
        name      => 'group_name',
        values    => ['eenie','meenie','minie'],
        default   => 'meenie',
        linebreak => 'true',
        labels    => \%labels
    );

Note that just because a hash is initialized in that order doesn't mean that it comes out in that order. See sort

for examples of how to arrange for an output ordering.
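
For instance, a predictable ordering can be imposed by sorting the keys, either by name or by value. This sketch reuses the %map hash from above.

```perl
use strict;

my %map = (red => 0x00f, blue => 0x0f0, green => 0xf00);

# Alphabetical by key.
my @by_name = sort keys %map;                               # blue green red

# Ascending by the stored value.
my @by_value = sort { $map{$a} <=> $map{$b} } keys %map;    # red blue green

printf "%-5s 0x%03x\n", $_, $map{$_} for @by_name;
```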

Typeglobs and Filehandles

Perl uses an internal type called a typeglob to hold an entire symbol table entry. The type prefix of a

typeglob is a *, because it represents all types. This used to be the preferred way to pass arrays and hashes

by reference into a function, but now that we have real references, this is seldom needed.

The main use of typeglobs in modern Perl is to create symbol table aliases. This assignment:

*this = *that;

makes $this an alias for $that, @this an alias for @that, %this an alias for %that, &this an alias for

&that, etc. Much safer is to use a reference. This:

local *Here::blue = \$There::green;

temporarily makes $Here::blue an alias for $There::green, but doesn't make @Here::blue an alias

for @There::green, or %Here::blue an alias for %There::green, etc. See Symbol Tables in perlmod for more

examples of this. Strange though this may seem, this is the basis for the whole module import/export

system.
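
The single-slot aliasing described above can be observed with a sketch like this one. It uses package variables on purpose, since typeglobs do not apply to my variables, and the names $green and $blue are merely illustrative.

```perl
use strict;
use vars qw($green $blue @green @blue);    # package variables, for strict

$green = "emerald";
@green = (1, 2, 3);

my $blue_len;
{
    local *blue = \$green;      # alias only the scalar slot
    $blue = "jade";             # writes through to $green
    $blue_len = @blue;          # 0: @blue is NOT an alias for @green
}

print "$green\n";               # prints "jade"
```

Once the block exits, the local is undone and $blue no longer refers to $green.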

Another use for typeglobs is to pass filehandles into a function or to create new filehandles. If you need to

use a typeglob to save away a filehandle, do it this way:

$fh = *STDOUT;

or perhaps as a real reference, like this:

$fh = \*STDOUT;

See perlsub for examples of using these as indirect filehandles in functions.

Typeglobs are also a way to create a local filehandle using the local() operator. These last until their

block is exited, but may be passed back. For example:

    sub newopen {
        my $path = shift;
        local *FH;  # not my!
        open (FH, $path) or return undef;
        return *FH;
    }
    $fh = newopen('/etc/passwd');

Now that we have the *foo{THING} notation, typeglobs aren't used as much for filehandle manipulations,
although they're still needed to pass brand new file and directory handles into or out of functions. That's
because *HANDLE{IO} only works if HANDLE has already been used as a handle. In other words, *FH
can be used to create new symbol table entries, but *foo{THING} cannot.

Another way to create anonymous filehandles is with the IO::Handle module and its ilk. These modules
have the advantage of not hiding different types of the same name during the local(). See the bottom of
open() for an example.

See perlref, perlsub, and Symbol Tables in perlmod for more discussion on typeglobs and the *foo{THING}

syntax.

perlsyn Perl Programmers Reference Guide perlsyn

NAME

perlsyn − Perl syntax

DESCRIPTION

A Perl script consists of a sequence of declarations and statements. The only things that need to be declared
in Perl are report formats and subroutines. See the sections below for more information on those
declarations. All uninitialized user-created objects are assumed to start with a null or 0 value until they
are defined by some explicit operation such as assignment. (Though you can get warnings about the use of
undefined values if you like.) The sequence of statements is executed just once, unlike in sed and awk
scripts, where the sequence of statements is executed for each input line. While this means that you must
explicitly loop over the lines of your input file (or files), it also means you have much more control over
which files and which lines you look at. (Actually, I'm lying: it is possible to do an implicit loop with
either the -n or -p switch. It's just not the mandatory default like it is in sed and awk.)

Declarations

Perl is, for the most part, a free-form language. (The only exception to this is format declarations, for
obvious reasons.) Comments are indicated by the "#" character, and extend to the end of the line. If you
attempt to use /* */ C-style comments, it will be interpreted either as division or pattern matching,
depending on the context, and C++ // comments just look like a null regular expression, so don't do that.

A declaration can be put anywhere a statement can, but has no effect on the execution of the primary
sequence of statements: declarations all take effect at compile time. Typically all the declarations are put at
the beginning or the end of the script. However, if you're using lexically-scoped private variables created
with my(), you'll have to make sure your format or subroutine definition is within the same block scope as
the my if you expect to be able to access those private variables.

Declaring a subroutine allows a subroutine name to be used as if it were a list operator from that point

forward in the program. You can declare a subroutine without defining it by saying sub name, thus:

    sub myname;
    $me = myname $0             or die "can't get myname";

Note that it functions as a list operator, not as a unary operator; so be careful to use or instead of || in this

case. However, if you were to declare the subroutine as sub myname ($), then myname would function

as a unary operator, so either or or || would work.
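
The difference in parsing can be made visible with a sketch such as the following. The subroutine names as_list and as_unary are hypothetical; both simply report the first argument they received.

```perl
use strict;

sub as_list;                # no prototype: parsed as a list operator
sub as_unary ($);           # ($) prototype: parsed as a unary operator

# The list operator swallows the whole expression: as_list(0 || 5).
my $from_list  = as_list  0 || 5;

# The unary operator binds tighter than ||: as_unary(0) || 5.
my $from_unary = as_unary 0 || 5;

sub as_list      { return "got:$_[0]" }
sub as_unary ($) { return "got:$_[0]" }

print "$from_list $from_unary\n";       # prints "got:5 got:0"
```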

Subroutine declarations can also be loaded up with the require statement or both loaded and imported

into your namespace with a use statement. See perlmod for details on this.

A statement sequence may contain declarations of lexically-scoped variables, but apart from declaring a
variable name, the declaration acts like an ordinary statement, and is elaborated within the sequence of
statements as if it were an ordinary statement. That means it actually has both compile-time and run-time
effects.

Simple statements

The only kind of simple statement is an expression evaluated for its side effects. Every simple statement

must be terminated with a semicolon, unless it is the final statement in a block, in which case the semicolon

is optional. (A semicolon is still encouraged there if the block takes up more than one line, because you may

eventually add another line.) Note that there are some operators like eval {} and do {} that look like

compound statements, but aren't (they're just TERMs in an expression), and thus need an explicit

termination if used as the last item in a statement.

Any simple statement may optionally be followed by a SINGLE modifier, just before the terminating

semicolon (or block ending). The possible modifiers are:

    if EXPR
    unless EXPR
    while EXPR
    until EXPR
    foreach EXPR

The if and unless modifiers have the expected semantics, presuming you're a speaker of English. The
foreach modifier is an iterator: For each value in EXPR, it aliases $_ to the value and executes the
statement. The while and until modifiers have the usual "while loop" semantics (conditional
evaluated first), except when applied to a do-BLOCK (or to the now-deprecated do-SUBROUTINE
statement), in which case the block executes once before the conditional is evaluated. This is so that you can
write loops like:

do {

$line = <STDIN>;

...

} until $line eq ".\n";

See do. Note also that the loop control statements described later will NOT work in this construct, because

modifiers don't take loop labels. Sorry. You can always put another block inside of it (for next) or around

it (for last) to do that sort of thing. For next, just double the braces:

do {{

next if $x == $y;

# do something here

}} until $x++ > $z;

For last, you have to be more elaborate:

    LOOP: {
        do {
            last if $x == $y**2;
            # do something here
        } while $x++ <= $z;
    }

Compound statements

In Perl, a sequence of statements that defines a scope is called a block. Sometimes a block is delimited by the

file containing it (in the case of a required file, or the program as a whole), and sometimes a block is

delimited by the extent of a string (in the case of an eval).

But generally, a block is delimited by curly brackets, also known as braces. We will call this syntactic

construct a BLOCK.

The following compound statements may be used to control flow:

if (EXPR) BLOCK

if (EXPR) BLOCK else BLOCK

if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK

LABEL while (EXPR) BLOCK

LABEL while (EXPR) BLOCK continue BLOCK

LABEL for (EXPR; EXPR; EXPR) BLOCK

LABEL foreach VAR (LIST) BLOCK

LABEL BLOCK continue BLOCK

Note that, unlike C and Pascal, these are defined in terms of BLOCKs, not statements. This means that the

curly brackets are required—no dangling statements allowed. If you want to write conditionals without

curly brackets there are several other ways to do it. The following all do the same thing:

    if (!open(FOO)) { die "Can't open $FOO: $!"; }
    die "Can't open $FOO: $!" unless open(FOO);
    open(FOO) or die "Can't open $FOO: $!";     # FOO or bust!
    open(FOO) ? 'hi mom' : die "Can't open $FOO: $!";
                        # a bit exotic, that last one

The if statement is straightforward. Because BLOCKs are always bounded by curly brackets, there is never

any ambiguity about which if an else goes with. If you use unless in place of if, the sense of the test

is reversed.

The while statement executes the block as long as the expression is true (does not evaluate to the null string
("") or 0 or "0"). The LABEL is optional, and if present, consists of an identifier followed by a colon.
The LABEL identifies the loop for the loop control statements next, last, and redo. If the LABEL is
omitted, the loop control statement refers to the innermost enclosing loop. This may include dynamically
looking back your call-stack at run time to find the LABEL. Such desperate behavior triggers a warning if
you use the -w flag.

If there is a continue BLOCK, it is always executed just before the conditional is about to be evaluated

again, just like the third part of a for loop in C. Thus it can be used to increment a loop variable, even

when the loop has been continued via the next statement (which is similar to the C continue statement).
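
A minimal sketch of a continue block doing the work of a loop increment. The counter and the @odd array are illustrative.

```perl
use strict;

my $i = 0;
my @odd;
while ($i < 6) {
    next if $i % 2 == 0;        # skip even numbers...
    push @odd, $i;
} continue {
    $i++;                       # ...but the increment still runs
}

print "@odd\n";                 # prints "1 3 5"
```

Had the increment been written at the bottom of the main block instead, the first next would skip it and the loop would never terminate.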

Loop Control

The next command is like the continue statement in C; it starts the next iteration of the loop:

LINE: while (<STDIN>) {

next LINE if /^#/; # discard comments

...

}

The last command is like the break statement in C (as used in loops); it immediately exits the loop in

question. The continue block, if any, is not executed:

LINE: while (<STDIN>) {

last LINE if /^$/; # exit when done with header

...

}

The redo command restarts the loop block without evaluating the conditional again. The continue

block, if any, is not executed. This command is normally used by programs that want to lie to themselves

about what was just input.

For example, when processing a file like /etc/termcap, your input lines might end in backslashes to
indicate continuation; in that case you want to skip ahead and get the next record.

while (<>) {

chomp;

if (s/\\$//) {

$_ .= <>;

redo unless eof();

}

# now process $_

}

which is Perl short-hand for the more explicitly written version:

LINE: while (defined($line = <ARGV>)) {

chomp($line);

if ($line =~ s/\\$//) {

$line .= <ARGV>;

redo LINE unless eof(); # not eof(ARGV)!

}

# now process $line

}

Note that if there were a continue block on the above code, it would get executed even on discarded lines.

This is often used to reset line counters or ?pat? one-time matches.

# inspired by :1,$g/fred/s//WILMA/

while (<>) {

?(fred)? && s//WILMA $1 WILMA/;

?(barney)? && s//BETTY $1 BETTY/;

?(homer)? && s//MARGE $1 MARGE/;

} continue {

print "$ARGV $.: $_";

close ARGV if eof(); # reset $.

reset if eof(); # reset ?pat?

}

If the word while is replaced by the word until, the sense of the test is reversed, but the conditional is

still tested before the first iteration.

The loop control statements don't work in an if or unless, since they aren't loops. You can double the

braces to make them such, though.

    if (/pattern/) {{
        next if /fred/;
        next if /barney/;
        # do something here
    }}

The form while/if BLOCK BLOCK, available in Perl 4, is no longer available. Replace any occurrence

of if BLOCK by if (do BLOCK).

For Loops

Perl's C-style for loop works exactly like the corresponding while loop; that means that this:

for ($i = 1; $i < 10; $i++) {

...

}

is the same as this:

$i = 1;

while ($i < 10) {

...

} continue {

$i++;

}

(There is one minor difference: The first form implies a lexical scope for variables declared with my in the

initialization expression.)
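
The scoping difference can be seen in a sketch like the following, where an outer variable of the same name is left untouched by the loop. The names are illustrative.

```perl
use strict;

my $i = "outer";
my @seen;

# The my in the initialization expression creates a new $i that is
# visible only inside the loop.
for (my $i = 1; $i < 4; $i++) {
    push @seen, $i;
}

print "@seen / $i\n";           # prints "1 2 3 / outer"
```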

Besides the normal array index looping, for can lend itself to many other interesting applications. Here's
one that avoids the problem you get into if you explicitly test for end-of-file on an interactive file descriptor
causing your program to appear to hang.

    $on_a_tty = -t STDIN && -t STDOUT;
    sub prompt { print "yes? " if $on_a_tty }
    for ( prompt(); <STDIN>; prompt() ) {
        # do something
    }

Foreach Loops

The foreach loop iterates over a normal list value and sets the variable VAR to be each element of the list
in turn. If the variable is preceded with the keyword my, then it is lexically scoped, and is therefore visible
only within the loop. Otherwise, the variable is implicitly local to the loop and regains its former value upon
exiting the loop. If the variable was previously declared with my, it uses that variable instead of the global
one, but it's still localized to the loop. (Note that a lexically scoped variable can cause problems if you have
subroutine or format declarations within the loop which refer to it.)

The foreach keyword is actually a synonym for the for keyword, so you can use foreach for

readability or for for brevity. (Or because the Bourne shell is more familiar to you than csh, so writing

for comes more naturally.) If VAR is omitted, $_ is set to each value. If any element of LIST is an lvalue,

you can modify it by modifying VAR inside the loop. That's because the foreach loop index variable is
an implicit alias for each item in the list that you're looping over.

If any part of LIST is an array, foreach will get very confused if you add or remove elements within the
loop body, for example with splice. So don't do that.

foreach probably won't do what you expect if VAR is a tied or other special variable. Don't do that
either.

Examples:

    for (@ary) { s/foo/bar/ }

    foreach my $elem (@elements) {
        $elem *= 2;
    }

    for $count (10,9,8,7,6,5,4,3,2,1,'BOOM') {
        print $count, "\n"; sleep(1);
    }

    for (1..15) { print "Merry Christmas\n"; }

    foreach $item (split(/:[\\\n:]*/, $ENV{TERMCAP})) {
        print "Item: $item\n";
    }

Here's how a C programmer might code up a particular algorithm in Perl:

    for (my $i = 0; $i < @ary1; $i++) {
        for (my $j = 0; $j < @ary2; $j++) {
            if ($ary1[$i] > $ary2[$j]) {
                last; # can't go to outer :-(
            }
            $ary1[$i] += $ary2[$j];
        }
        # this is where that last takes me
    }

Whereas here's how a Perl programmer more comfortable with the idiom might do it:

    OUTER: foreach my $wid (@ary1) {
        INNER: foreach my $jet (@ary2) {
            next OUTER if $wid > $jet;
            $wid += $jet;
        }
    }

See how much easier this is? It's cleaner, safer, and faster. It's cleaner because it's less noisy. It's safer
because if code gets added between the inner and outer loops later on, the new code won't be accidentally
executed. The next explicitly iterates the other loop rather than merely terminating the inner one. And it's
faster because Perl executes a foreach statement more rapidly than it would the equivalent for loop.

Basic BLOCKs and Switch Statements

A BLOCK by itself (labeled or not) is semantically equivalent to a loop that executes once. Thus you can

use any of the loop control statements in it to leave or restart the block. (Note that this is NOT true in

eval{}, sub{}, or contrary to popular belief do{} blocks, which do NOT count as loops.) The

continue block is optional.

The BLOCK construct is particularly nice for doing case structures.

SWITCH: {

if (/^abc/) { $abc = 1; last SWITCH; }

if (/^def/) { $def = 1; last SWITCH; }

if (/^xyz/) { $xyz = 1; last SWITCH; }

$nothing = 1;

}

There is no official switch statement in Perl, because there are already several ways to write the

equivalent. In addition to the above, you could write

SWITCH: {

$abc = 1, last SWITCH if /^abc/;

$def = 1, last SWITCH if /^def/;

$xyz = 1, last SWITCH if /^xyz/;

$nothing = 1;

}

(That's actually not as strange as it looks once you realize that you can use loop control "operators" within an
expression. That's just the normal C comma operator.)

or

SWITCH: {

/^abc/ && do { $abc = 1; last SWITCH; };

/^def/ && do { $def = 1; last SWITCH; };

/^xyz/ && do { $xyz = 1; last SWITCH; };

$nothing = 1;

}

or formatted so it stands out more as a "proper" switch statement:

SWITCH: {

/^abc/ && do {

$abc = 1;

last SWITCH;

};

/^def/ && do {

$def = 1;

last SWITCH;

};

/^xyz/ && do {

$xyz = 1;

last SWITCH;

};

$nothing = 1;

}

or

    SWITCH: {
        /^abc/ and $abc = 1, last SWITCH;
        /^def/ and $def = 1, last SWITCH;
        /^xyz/ and $xyz = 1, last SWITCH;
        $nothing = 1;
    }

or even, horrors,

if (/^abc/)

{ $abc = 1 }

elsif (/^def/)

{ $def = 1 }

elsif (/^xyz/)

{ $xyz = 1 }

else

{ $nothing = 1 }

A common idiom for a switch statement is to use foreach's aliasing to make a temporary assignment to

$_ for convenient matching:

    SWITCH: for ($where) {
        /In Card Names/     && do { push @flags, '-e'; last; };
        /Anywhere/          && do { push @flags, '-h'; last; };
        /In Rulings/        && do {                    last; };
        die "unknown value for form variable where: `$where'";
    }

Another interesting approach to a switch statement is to arrange for a do block to return the proper value:

    $amode = do {
        if     ($flag & O_RDONLY) { "r" }       # XXX: isn't this 0?
        elsif  ($flag & O_WRONLY) { ($flag & O_APPEND) ? "a" : "w" }
        elsif  ($flag & O_RDWR)   {
            if ($flag & O_CREAT)  { "w+" }
            else                  { ($flag & O_APPEND) ? "a+" : "r+" }
        }
    };

    print do {
        ($flags & O_WRONLY) ? "write-only"      :
        ($flags & O_RDWR)   ? "read-write"      :
                              "read-only";
    };

Or if you are certain that all the && clauses are true, you can use something like this, which "switches" on

the value of the HTTP_USER_AGENT environment variable.

#!/usr/bin/perl

# pick out jargon file page based on browser

$dir = 'http://www.wins.uva.nl/~mes/jargon';

for ($ENV{HTTP_USER_AGENT}) {

$page = /Mac/ && 'm/Macintrash.html'

|| /Win(dows )?NT/ && 'e/evilandrude.html'

|| /Win|MSIE|WebTV/ && 'm/MicroslothWindows.html'

|| /Linux/ && 'l/Linux.html'

|| /HP-UX/ && 'h/HP-SUX.html'

|| /SunOS/ && 's/ScumOS.html'

|| 'a/AppendixB.html';

}

print "Location: $dir/$page\015\012\015\012";

That kind of switch statement only works when you know the && clauses will be true. If you don't, the

previous ?: example should be used.

You might also consider writing a hash instead of synthesizing a switch statement.
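
As a rough sketch of the hash approach (not one of the original examples), the abc/def/xyz switch could be driven by a hash of code references. The names %seen, %action, and dispatch() are hypothetical, and the lookup assumes the interesting prefix is always three characters long:

```perl
use strict;

# Hypothetical counters standing in for $abc, $def, $xyz and $nothing.
my %seen    = (abc => 0, def => 0, xyz => 0);
my $nothing = 0;

# Dispatch table: each key is a prefix, each value a code reference.
my %action = (
    abc => sub { $seen{abc}++ },
    def => sub { $seen{def}++ },
    xyz => sub { $seen{xyz}++ },
);

sub dispatch {
    my ($string) = @_;
    my $key = substr($string, 0, 3);    # first three characters pick the case
    if (exists $action{$key}) {
        $action{$key}->();
    }
    else {
        $nothing++;                     # the default case
    }
}

dispatch($_) for ("abcdef", "xyzzy", "plugh");
print "abc=$seen{abc} def=$seen{def} xyz=$seen{xyz} nothing=$nothing\n";
```

Unlike the pattern-based switches, a hash lookup only handles exact keys, so it fits best when the cases are fixed strings.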

Goto

Although not for the faint of heart, Perl does support a goto statement. A loop's LABEL is not actually a

valid target for a goto; it's just the name of the loop. There are three forms: goto-LABEL, goto-EXPR,

and goto-&NAME.

The goto−LABEL form finds the statement labeled with LABEL and resumes execution there. It may not

be used to go into any construct that requires initialization, such as a subroutine or a foreach loop. It also

can't be used to go into a construct that is optimized away. It can be used to go almost anywhere else within

the dynamic scope, including out of subroutines, but it's usually better to use some other construct such as

last or die. The author of Perl has never felt the need to use this form of goto (in Perl, that is—C is

another matter).

The goto−EXPR form expects a label name, whose scope will be resolved dynamically. This allows for

computed gotos per FORTRAN, but isn't necessarily recommended if you're optimizing for

maintainability:

goto ("FOO", "BAR", "GLARCH")[$i];

The goto−&NAME form is highly magical, and substitutes a call to the named subroutine for the currently

running subroutine. This is used by AUTOLOAD() subroutines that wish to load another subroutine and then

pretend that the other subroutine had been called in the first place (except that any modifications to @_ in the

current subroutine are propagated to the other subroutine.) After the goto, not even caller() will be

able to tell that this routine was called first.
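
The AUTOLOAD use of goto-&NAME might be sketched as follows. The get_* accessors and the %config hash are hypothetical; the point is only that AUTOLOAD installs the real subroutine and then jumps to it with the current @_:

```perl
use strict;
use vars qw($AUTOLOAD);

# Hypothetical data served by accessors that are created on first use.
my %config = (name => "camel", legs => 4);

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                        # strip the package qualifier
    return if $name eq 'DESTROY';
    die "No such subroutine: $name\n" unless $name =~ /^get_(\w+)$/;
    my $field = $1;
    no strict 'refs';
    *{$AUTOLOAD} = sub { $config{$field} };   # install the real subroutine
    goto &{$AUTOLOAD};                        # and call it in place of AUTOLOAD
}

print get_name(), " has ", get_legs(), " legs\n";
```

After the first call, get_name() and get_legs() exist as ordinary subroutines, so AUTOLOAD is not entered again for them.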

In almost all cases like this, it's usually a far, far better idea to use the structured control flow mechanisms of

next, last, or redo instead of resorting to a goto. For certain applications, the catch and throw pair of

eval{} and die() for exception processing can also be a prudent approach.
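
A minimal sketch of the eval{} and die() pairing, assuming a hypothetical parse_positive() helper that "throws" on bad input:

```perl
use strict;

# Hypothetical helper: dies unless given a positive integer.
sub parse_positive {
    my ($text) = @_;
    die "not a positive integer: '$text'\n"
        unless $text =~ /^\d+$/ && $text > 0;
    return $text + 0;
}

my @results;
for my $input ("42", "-7", "camel") {
    my $value = eval { parse_positive($input) };   # the "catch"
    if ($@) {
        chomp(my $error = $@);                     # $@ holds the die message
        push @results, "error ($error)";
    }
    else {
        push @results, "ok ($value)";
    }
}
print "$_\n" for @results;
```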

PODs: Embedded Documentation

Perl has a mechanism for intermixing documentation with source code. While it's expecting the beginning of

a new statement, if the compiler encounters a line that begins with an equal sign and a word, like this

=head1 Here There Be Pods!

Then that text and all remaining text up through and including a line beginning with =cut will be ignored.

The format of the intervening text is described in perlpod.

This allows you to intermix your source code and your documentation text freely, as in

=item snazzle($)

The snazzle() function will behave in the most spectacular

form that you can possibly imagine, not even excepting

cybernetic pyrotechnics.

=cut back to the compiler, nuff of this pod stuff!

sub snazzle($) {

my $thingie = shift;

.........

}

Note that pod translators should look at only paragraphs beginning with a pod directive (it makes parsing

easier), whereas the compiler actually knows to look for pod escapes even in the middle of a paragraph. This

means that the following secret stuff will be ignored by both the compiler and the translators.

$a=3;

=secret stuff

warn "Neither POD nor CODE!?"

=cut back

print "got $a\n";

You probably shouldn't rely upon the warn() being podded out forever. Not all pod translators are

well-behaved in this regard, and perhaps the compiler will become pickier.

One may also use pod directives to quickly comment out a section of code.
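
For instance, a block of statements can be disabled by wrapping it in pod directives. This is only a sketch; the "comment" format name and the @log array are arbitrary choices:

```perl
use strict;

my @log = ("start");

=begin comment

push @log, "this statement is never compiled";
push @log, "neither is this one";

=end comment

=cut

push @log, "end";
print join(", ", @log), "\n";    # prints: start, end
```

The compiler skips everything from the =begin line through the =cut line; the =end line is there for the benefit of pod translators.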

Plain Old Comments (Not!)

Much like the C preprocessor, Perl can process line directives. Using this, one can control Perl's idea of

filenames and line numbers in error or warning messages (especially for strings that are processed with

eval()). The syntax for this mechanism is the same as for most C preprocessors: it matches the regular

expression /^#\s*line\s+(\d+)\s*(?:\s"([^"]*)")?/ with $1 being the line number for the

next line, and $2 being the optional filename (specified within quotes).

Here are some examples that you should be able to type into your command shell:

% perl

# line 200 "bzzzt"

# the '#' on the previous line must be the first char on line

die 'foo';

__END__

foo at bzzzt line 201.

% perl

# line 200 "bzzzt"

eval qq[\n#line 2001 ""\ndie 'foo']; print $@;

__END__

foo at - line 2001.

% perl

eval qq[\n#line 200 "foo bar"\ndie 'foo']; print $@;

__END__

foo at foo bar line 200.

% perl

# line 345 "goop"

eval "\n#line " . __LINE__ . ' "' . __FILE__ ."\"\ndie 'foo'";

print $@;

__END__

foo at goop line 345.

perlop Perl Programmers Reference Guide perlop

NAME

perlop − Perl operators and precedence

SYNOPSIS

Perl operators have the following associativity and precedence, listed from highest precedence to lowest.

Note that all operators borrowed from C keep the same precedence relationship with each other, even where

C's precedence is slightly screwy. (This makes learning Perl easier for C folks.) With very few exceptions,

these all operate on scalar values only, not array values.

left        terms and list operators (leftward)

left        ->

nonassoc    ++ --

right       **

right       ! ~ \ and unary + and -

left        =~ !~

left        * / % x

left        + - .

left        << >>

nonassoc    named unary operators

nonassoc    < > <= >= lt gt le ge

nonassoc    == != <=> eq ne cmp

left        &

left        | ^

left        &&

left        ||

nonassoc    .. ...

right       ?:

right       = += -= *= etc.

left        , =>

nonassoc    list operators (rightward)

right       not

left        and

left        or xor

In the following sections, these operators are covered in precedence order.

Many operators can be overloaded for objects. See overload.

DESCRIPTION

Terms and List Operators (Leftward)

A TERM has the highest precedence in Perl. Terms include variables, quote and quote-like operators, any

expression in parentheses, and any function whose arguments are parenthesized. Actually, there aren't really

functions in this sense, just list operators and unary operators behaving as functions because you put

parentheses around the arguments. These are all documented in perlfunc.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.) is followed by a left

parenthesis as the next token, the operator and arguments within parentheses are taken to be of highest

precedence, just like a normal function call.

In the absence of parentheses, the precedence of list operators such as print, sort, or chmod is either

very high or very low depending on whether you are looking at the left side or the right side of the operator.

For example, in

@ary = (1, 3, sort 4, 2);

print @ary; # prints 1324

the commas on the right of the sort are evaluated before the sort, but the commas on the left are evaluated

after. In other words, list operators tend to gobble up all the arguments that follow them, and then act like a

simple TERM with regard to the preceding expression. Note that you have to be careful with parentheses:

# These evaluate exit before doing the print:

print($foo, exit); # Obviously not what you want.

print $foo, exit; # Nor is this.

# These do the print before evaluating exit:

(print $foo), exit; # This is what you want.

print($foo), exit; # Or this.

print ($foo), exit; # Or even this.

Also note that

print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. See Named Unary Operators for more discussion of

this.

Also parsed as terms are the do {} and eval {} constructs, as well as subroutine and method calls, and

the anonymous constructors [] and {}.

See also "Quote and Quote-like Operators" toward the end of this section, as well as "I/O Operators".

The Arrow Operator

Just as in C and C++, "−>" is an infix dereference operator. If the right side is either a [...] or {...}

subscript, then the left side must be either a hard or symbolic reference to an array or hash (or a location

capable of holding a hard reference, if it's an lvalue (assignable)). See perlref.

Otherwise, the right side is a method name or a simple scalar variable containing the method name, and the

left side must either be an object (a blessed reference) or a class name (that is, a package name). See perlobj.
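
The two uses of the arrow can be sketched side by side. The Counter class and all variable names here are hypothetical:

```perl
use strict;

my $aref = [10, 20, 30];
my $href = { color => "blue", sizes => $aref };

my $second = $aref->[1];             # array element through a reference
my $color  = $href->{color};         # hash element through a reference
my $last   = $href->{sizes}->[2];    # chained dereference

# Method calls: the left side is a class name or a blessed reference.
package Counter;
sub new  { my $class = shift; return bless { n => 0 }, $class }
sub bump { my $self  = shift; return ++$self->{n} }

package main;
my $counter = Counter->new;          # class name on the left
my $method  = "bump";                # method name held in a scalar
$counter->bump;                      # object on the left
my $count = $counter->$method();

print "$second $color $last $count\n";    # prints: 20 blue 30 2
```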

Auto−increment and Auto−decrement

"++" and "--" work as in C. That is, if placed before a variable, they increment or decrement the variable

before returning the value, and if placed after, increment or decrement the variable after returning the value.

The auto−increment operator has a little extra builtin magic to it. If you increment a variable that is numeric,

or that has ever been used in a numeric context, you get a normal increment. If, however, the variable has

been used in only string contexts since it was set, and has a value that is not the empty string and matches the

pattern /^[a-zA-Z]*[0-9]*$/, the increment is done as a string, preserving each character within its

range, with carry:

print ++($foo = '99'); # prints '100'

print ++($foo = 'a0'); # prints 'a1'

print ++($foo = 'Az'); # prints 'Ba'

print ++($foo = 'zz'); # prints 'aaa'

The auto−decrement operator is not magical.

Exponentiation

Binary "**" is the exponentiation operator. Note that it binds even more tightly than unary minus, so -2**4

is -(2**4), not (-2)**4. (This is implemented using C's pow(3) function, which actually works on doubles

internally.)

Symbolic Unary Operators

Unary "!" performs logical negation, i.e., "not". See also not for a lower precedence version of this.

Unary "−" performs arithmetic negation if the operand is numeric. If the operand is an identifier, a string

consisting of a minus sign concatenated with the identifier is returned. Otherwise, if the string starts with a

plus or minus, a string starting with the opposite sign is returned. One effect of these rules is that

−bareword is equivalent to "−bareword".
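
A short illustration of these rules, together with the precedence of "**" over unary minus noted earlier (the variable names are arbitrary):

```perl
use strict;

my $power   = -2**4;       # parsed as -(2**4)
my $grouped = (-2)**4;     # parentheses force the other reading
my $negated = -"foo";      # identifier-like string: a minus sign is prepended
my $flipped = -"-bar";     # string starting with a minus: the sign is swapped

print "$power $grouped $negated $flipped\n";    # prints: -16 16 -foo +bar
```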

Unary "~" performs bitwise negation, i.e., 1's complement. For example, 0666 &~ 027 is 0640. (See

also Integer Arithmetic and Bitwise String Operators.)

Unary "+" has no effect whatsoever, even on strings. It is useful syntactically for separating a function name

from a parenthesized expression that would otherwise be interpreted as the complete list of function

arguments. (See examples above under Terms and List Operators (Leftward).)

Unary "\" creates a reference to whatever follows it. See perlref. Do not confuse this behavior with the

behavior of backslash within a string, although both forms do convey the notion of protecting the next thing

from interpretation.

Binding Operators

Binary "=~" binds a scalar expression to a pattern match. Certain operations search or modify the string $_

by default. This operator makes that kind of operation work on some other string. The right argument is a

search pattern, substitution, or transliteration. The left argument is what is supposed to be searched,

substituted, or transliterated instead of the default $_. The return value indicates the success of the

operation. (If the right argument is an expression rather than a search pattern, substitution, or transliteration,

it is interpreted as a search pattern at run time. This can be less efficient than an explicit search, because

the pattern must be compiled every time the expression is evaluated.)

Binary "!~" is just like "=~" except the return value is negated in the logical sense.
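
The following sketch shows both binding operators applied to a variable other than $_. The strings and variable names are made up for illustration:

```perl
use strict;

my $text = "Hello, World";

my $found   = ($text =~ /World/) ? "yes" : "no";   # match against $text, not $_
my $no_nums = ($text !~ /\d/)    ? "yes" : "no";   # logically negated match

(my $copy  = $text) =~ s/World/Perl/;              # substitute in a copy
(my $upper = $text) =~ tr/a-z/A-Z/;                # transliterate a copy

my $pattern = "W.rld";                             # an expression, not a literal pattern
my $dynamic = ($text =~ $pattern) ? "yes" : "no";  # interpreted as a pattern at run time

print "$found $no_nums $copy $upper $dynamic\n";
```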

Multiplicative Operators

Binary "*" multiplies two numbers.

Binary "/" divides two numbers.

Binary "%" computes the modulus of two numbers. Given integer operands $a and $b: If $b is positive,

then $a % $b is $a minus the largest multiple of $b that is not greater than $a. If $b is negative, then

$a % $b is $a minus the smallest multiple of $b that is not less than $a (i.e. the result will be less than or

equal to zero). Note that when use integer is in scope, "%" gives you direct access to the modulus

operator as implemented by your C compiler. This operator is not as well defined for negative operands, but

it will execute faster.
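
Working through the four sign combinations makes the rule concrete. This sketch assumes use integer is not in scope:

```perl
use strict;

my @results;
for my $pair ([7, 3], [-7, 3], [7, -3], [-7, -3]) {
    my ($left, $right) = @$pair;
    push @results, "$left % $right = " . ($left % $right);
}
print "$_\n" for @results;
# prints:
#   7 % 3 = 1
#   -7 % 3 = 2      (-7 minus -9, the largest multiple of 3 not greater than -7)
#   7 % -3 = -2     (7 minus 9, the smallest multiple of -3 not less than 7)
#   -7 % -3 = -1    (-7 minus -6)
```

The result always takes the sign of the right operand.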

Binary "x" is the repetition operator. In scalar context, it returns a string consisting of the left operand

repeated the number of times specified by the right operand. In list context, if the left operand is a list in

parentheses, it repeats the list.

print '-' x 80; # print row of dashes

print "\t" x ($tab/8), ' ' x ($tab%8); # tab over

@ones = (1) x 80; # a list of 80 1's

@ones = (5) x @ones; # set all elements to 5

Additive Operators

Binary "+" returns the sum of two numbers.

Binary "-" returns the difference of two numbers.

Binary "." concatenates two strings.

Shift Operators

Binary "<<" returns the value of its left argument shifted left by the number of bits specified by the right

argument. Arguments should be integers. (See also Integer Arithmetic.)

Binary ">>" returns the value of its left argument shifted right by the number of bits specified by the right

argument. Arguments should be integers. (See also Integer Arithmetic.)

Named Unary Operators

The various named unary operators are treated as functions with one argument, with optional parentheses.

These include the filetest operators, like -f, -M, etc. See perlfunc.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.) is followed by a left

parenthesis as the next token, the operator and arguments within parentheses are taken to be of highest

precedence, just like a normal function call. Examples:

chdir $foo || die; # (chdir $foo) || die

chdir($foo) || die; # (chdir $foo) || die

chdir ($foo) || die; # (chdir $foo) || die

chdir +($foo) || die; # (chdir $foo) || die

but, because * is higher precedence than ||:

chdir $foo * 20; # chdir ($foo * 20)

chdir($foo) * 20; # (chdir $foo) * 20

chdir ($foo) * 20; # (chdir $foo) * 20

chdir +($foo) * 20; # chdir ($foo * 20)

rand 10 * 20; # rand (10 * 20)

rand(10) * 20; # (rand 10) * 20

rand (10) * 20; # (rand 10) * 20

rand +(10) * 20; # rand (10 * 20)

See also "Terms and List Operators (Leftward)".

Relational Operators

Binary "<" returns true if the left argument is numerically less than the right argument.

Binary ">" returns true if the left argument is numerically greater than the right argument.

Binary "<=" returns true if the left argument is numerically less than or equal to the right argument.

Binary ">=" returns true if the left argument is numerically greater than or equal to the right argument.

Binary "lt" returns true if the left argument is stringwise less than the right argument.

Binary "gt" returns true if the left argument is stringwise greater than the right argument.

Binary "le" returns true if the left argument is stringwise less than or equal to the right argument.

Binary "ge" returns true if the left argument is stringwise greater than or equal to the right argument.

Equality Operators

Binary "==" returns true if the left argument is numerically equal to the right argument.

Binary "!=" returns true if the left argument is numerically not equal to the right argument.

Binary "<=>" returns -1, 0, or 1 depending on whether the left argument is numerically less than, equal to,

or greater than the right argument.

Binary "eq" returns true if the left argument is stringwise equal to the right argument.

Binary "ne" returns true if the left argument is stringwise not equal to the right argument.

Binary "cmp" returns -1, 0, or 1 depending on whether the left argument is stringwise less than, equal to, or

greater than the right argument.

"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified by the current locale if use locale

is in effect. See perllocale.
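
The difference between the numeric and stringwise operators shows up clearly when sorting. A small sketch, assuming use locale is not in effect:

```perl
use strict;

my @values = (10, 9, 100, 1);

my @numeric    = sort { $a <=> $b } @values;    # 1 9 10 100
my @stringwise = sort { $a cmp $b } @values;    # 1 10 100 9

my $num_equal = ("1.0" == 1)   ? "equal" : "different";    # equal
my $str_equal = ("1.0" eq "1") ? "equal" : "different";    # different

print "@numeric | @stringwise | $num_equal | $str_equal\n";
```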

Bitwise And

Binary "&" returns its operands ANDed together bit by bit. (See also Integer Arithmetic and

Bitwise String Operators.)

Bitwise Or and Exclusive Or

Binary "|" returns its operands ORed together bit by bit. (See also Integer Arithmetic and

Bitwise String Operators.)

Binary "^" returns its operands XORed together bit by bit. (See also Integer Arithmetic and

Bitwise String Operators.)
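
A typical use of the bitwise operators is maintaining a set of flags in one integer. The flag names below are hypothetical:

```perl
use strict;

# Hypothetical permission flags, one bit each.
my ($READ, $WRITE, $EXEC) = (1 << 0, 1 << 1, 1 << 2);

my $perm      = $READ | $WRITE;     # set two bits
my $can_write = $perm & $WRITE;     # test a bit that is set
my $can_exec  = $perm & $EXEC;      # test a bit that is clear
my $toggled   = $perm ^ $EXEC;      # flip a bit
my $read_only = $perm & ~$WRITE;    # clear a bit

print "$perm $can_write $can_exec $toggled $read_only\n";    # prints: 3 2 0 7 1
```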

C−style Logical And

Binary "&&" performs a short−circuit logical AND operation. That is, if the left operand is false, the right

operand is not even evaluated. Scalar or list context propagates down to the right operand if it is evaluated.

C−style Logical Or

Binary "||" performs a short−circuit logical OR operation. That is, if the left operand is true, the right

operand is not even evaluated. Scalar or list context propagates down to the right operand if it is evaluated.

The || and && operators differ from C's in that, rather than returning 0 or 1, they return the last value

evaluated. Thus, a reasonably portable way to find out the home directory (assuming it's not "0") might be:

$home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||

(getpwuid($<))[7] || die "You're homeless!\n";

In particular, this means that you shouldn't use this for selecting between two aggregates for assignment:

@a = @b || @c; # this is wrong

@a = scalar(@b) || @c; # really meant this

@a = @b ? @b : @c; # this works fine, though

As more readable alternatives to && and || when used for control flow, Perl provides and and or operators

(see below). The short−circuit behavior is identical. The precedence of "and" and "or" is much lower,

however, so that you can safely use them after a list operator without the need for parentheses:

unlink "alpha", "beta", "gamma"

or gripe(), next LINE;

With the C−style operators that would have been written like this:

unlink("alpha", "beta", "gamma")

|| (gripe(), next LINE);

Using "or" for assignment is unlikely to do what you want; see below.

Range Operators

Binary ".." is the range operator, which is really two different operators depending on the context. In list

context, it returns an array of values counting (by ones) from the left value to the right value. This is useful

for writing foreach (1..10) loops and for doing slice operations on arrays. In the current

implementation, no temporary array is created when the range operator is used as the expression in

foreach loops, but older versions of Perl might burn a lot of memory when you write something like this:

for (1 .. 1_000_000) {

# code

}

In scalar context, ".." returns a boolean value. The operator is bistable, like a flip−flop, and emulates the

line−range (comma) operator of sed, awk, and various editors. Each ".." operator maintains its own boolean

state. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true

until the right operand is true, AFTER which the range operator becomes false again. (It doesn't become

false till the next time the range operator is evaluated. It can test the right operand and become false on the

same evaluation it became true (as in awk), but it still returns true once. If you don't want it to test the right

operand till the next evaluation (as in sed), use three dots ("...") instead of two.) The right operand is not

evaluated while the operator is in the "false" state, and the left operand is not evaluated while the operator is

in the "true" state. The precedence is a little lower than || and &&. The value returned is either the empty

string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each

range encountered. The final sequence number in a range has the string "E0" appended to it, which doesn't

affect its numeric value, but gives you something to search for if you want to exclude the endpoint. You can

exclude the beginning point by waiting for the sequence number to be greater than 1. If either operand of

scalar ".." is a constant expression, that operand is implicitly compared to the $. variable, the current line

number. Examples:

As a scalar operator:

if (101 .. 200) { print; } # print 2nd hundred lines

next line if (1 .. /^$/); # skip header lines

s/^/> / if (/^$/ .. eof()); # quote body

# parse mail messages

while (<>) {

$in_header = 1 .. /^$/;

$in_body = /^$/ .. eof();

# do something based on those

} continue {

close ARGV if eof; # reset $. each file

}

As a list operator:

for (101 .. 200) { print; } # print $_ 100 times

@foo = @foo[0 .. $#foo]; # an expensive no-op

@foo = @foo[$#foo-4 .. $#foo]; # slice last 5 items

The range operator (in list context) makes use of the magical auto−increment algorithm if the operands are

strings. You can say

@alphabet = ('A' .. 'Z');

to get all the letters of the alphabet, or

$hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

to get a hexadecimal digit, or

@z2 = ('01' .. '31'); print $z2[$mday];

to get dates with leading zeros. If the final value specified is not in the sequence that the magical increment

would produce, the sequence goes until the next value would be longer than the final value specified.
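
The sequence numbers returned by scalar ".." can be used to exclude the endpoints of a range, as described earlier in this section. The BEGIN/END markers and variable names in this sketch are hypothetical:

```perl
use strict;

my @lines = ("preamble", "BEGIN", "payload", "END", "trailer");
my (@sequence, @inner);

for (@lines) {
    my $n = /^BEGIN$/ .. /^END$/;    # scalar context, so this is the flip-flop
    next unless $n;                  # empty string outside the range
    push @sequence, $n;
    # Skip the start (sequence number 1) and the end (marked with "E0").
    push @inner, $_ unless $n == 1 || $n =~ /E0$/;
}

print "sequence: @sequence\n";    # prints: sequence: 1 2 3E0
print "inner: @inner\n";          # prints: inner: payload
```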

Conditional Operator

Ternary "?:" is the conditional operator, just as in C. It works much like an if−then−else. If the argument

before the ? is true, the argument before the : is returned, otherwise the argument after the : is returned. For

example:

printf "I have %d dog%s.\n", $n,

($n == 1) ? '' : "s";

Scalar or list context propagates downward into the 2nd or 3rd argument, whichever is selected.

$a = $ok ? $b : $c; # get a scalar

@a = $ok ? @b : @c; # get an array

$a = $ok ? @b : @c; # oops, that’s just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are legal lvalues (meaning that you can

assign to them):

($a_or_b ? $a : $b) = $c;

This is not necessarily guaranteed to contribute to the readability of your program.

Because this operator produces an assignable result, using assignments without parentheses will get you in

trouble. For example, this:

$a % 2 ? $a += 10 : $a += 2

Really means this:

(($a % 2) ? ($a += 10) : $a) += 2

Rather than this:

($a % 2) ? ($a += 10) : ($a += 2)

Assignment Operators

"=" is the ordinary assignment operator.

Assignment operators work as in C. That is,

$a += 2;

is equivalent to

$a = $a + 2;

although without duplicating any side effects that dereferencing the lvalue might trigger, such as from

tie(). Other assignment operators work similarly. The following are recognized:

**=    +=    *=    &=    <<=    &&=

       -=    /=    |=    >>=    ||=

       .=    %=    ^=

Note that while these are grouped by family, they all have the precedence of assignment.

Unlike in C, the assignment operator produces a valid lvalue. Modifying an assignment is equivalent to

doing the assignment and then modifying the variable that was assigned to. This is useful for modifying a

copy of something, like this:

($tmp = $global) =~ tr [A-Z] [a-z];

Likewise,

($a += 2) *= 3;

is equivalent to

$a += 2;

$a *= 3;

Comma Operator

Binary "," is the comma operator. In scalar context it evaluates its left argument, throws that value away,

then evaluates its right argument and returns that value. This is just like C's comma operator.

In list context, it's just the list argument separator, and inserts both its arguments into the list.

The => digraph is mostly just a synonym for the comma operator. It's useful for documenting arguments

that come in pairs. As of release 5.001, it also forces any word to the left of it to be interpreted as a string.
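
For example (the hash and list contents are made up for illustration):

```perl
use strict;

# The word to the left of => is treated as a string, so no quotes are needed.
my %rgb = (red => 0xff0000, green => 0x00ff00, blue => 0x0000ff);

# Otherwise => behaves like a comma, so these two lists are identical.
my @with_arrow = (name => "camel", legs => 4);
my @with_comma = ('name', "camel", 'legs', 4);

print join(",", @with_arrow), "\n";          # prints: name,camel,legs,4
print sprintf("%06x", $rgb{green}), "\n";    # prints: 00ff00
```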

List Operators (Rightward)

On its right side, a list operator has very low precedence, such that it controls all comma-separated

expressions found there. The only operators with lower precedence are the logical operators "and", "or", and

"not", which may be used to evaluate calls to list operators without the need for extra parentheses:

open HANDLE, "filename"

or die "Can't open: $!\n";

See also discussion of list operators in Terms and List Operators (Leftward).

Logical Not

Unary "not" returns the logical negation of the expression to its right. It's the equivalent of "!" except for the

very low precedence.

Logical And

Binary "and" returns the logical conjunction of the two surrounding expressions. It's equivalent to &&

except for the very low precedence. This means that it short−circuits: i.e., the right expression is evaluated

only if the left expression is true.

Logical or and Exclusive Or

Binary "or" returns the logical disjunction of the two surrounding expressions. It's equivalent to || except for

the very low precedence. This makes it useful for control flow

print FH $data or die "Can’t write to FH: $!";

This means that it short−circuits: i.e., the right expression is evaluated only if the left expression is false.

Due to its precedence, you should probably avoid using this for assignment, only for control flow.

$a = $b or $c; # bug: this is wrong

($a = $b) or $c; # really means this

$a = $b || $c; # better written this way

However, when it's a list context assignment and you're trying to use "||" for control flow, you probably need

"or" so that the assignment takes higher precedence.

@info = stat($file) || die; # oops, scalar sense of stat!

@info = stat($file) or die; # better, now @info gets its due

Then again, you could always use parentheses.

Binary "xor" returns the exclusive−OR of the two surrounding expressions. It cannot short circuit, of course.

C Operators Missing From Perl

Here is what C has that Perl doesn't:

unary & Address−of operator. (But see the "\" operator for taking a reference.)

unary * Dereference-address operator. (Perl's prefix dereferencing operators are typed: $, @, %, and

&.)

(TYPE) Type casting operator.

Quote and Quote−like Operators

While we usually think of quotes as literal values, in Perl they function as operators, providing various kinds

of interpolating and pattern matching capabilities. Perl provides customary quote characters for these

behaviors, but also provides a way for you to choose your quote character for any of them. In the following

table, a {} represents any pair of delimiters you choose. Non−bracketing delimiters use the same character

fore and aft, but the 4 sorts of brackets (round, angle, square, curly) will all nest.

Customary  Generic     Meaning          Interpolates

''         q{}         Literal          no

""         qq{}        Literal          yes

``         qx{}        Command          yes (unless '' is delimiter)

           qw{}        Word list        no

//         m{}         Pattern match    yes

           qr{}        Pattern          yes

           s{}{}       Substitution     yes

           tr{}{}      Transliteration  no (but see below)

Note that there can be whitespace between the operator and the quoting characters, except when # is being

used as the quoting character. q#foo# is parsed as being the string foo, while q #foo# is the operator q

followed by a comment. Its argument will be taken from the next line. This allows you to write:

s {foo} # Replace foo

{bar} # with bar.

For constructs that do interpolation, variables beginning with "$" or "@" are interpolated, as are the

following sequences. Within a transliteration, the first ten of these sequences may be used.

\t tab (HT, TAB)

\n newline (NL)

\r return (CR)

\f form feed (FF)

\b backspace (BS)

\a alarm (bell) (BEL)

\e escape (ESC)

\033 octal char

\x1b hex char

\c[ control char

\l lowercase next char

\u uppercase next char

\L lowercase till \E

\U uppercase till \E

\E end case modification

\Q quote non−word characters till \E

If use locale is in effect, the case map used by \l, \L, \u and \U is taken from the current locale. See

perllocale.

All systems use the virtual "\n" to represent a line terminator, called a "newline". There is no such thing as

an unvarying, physical newline character. It is an illusion that the operating system, device drivers, C

libraries, and Perl all conspire to preserve. Not all systems read "\r" as ASCII CR and "\n" as ASCII LF.

For example, on a Mac, these are reversed, and on systems without a line terminator, printing "\n" may emit

no actual data. In general, use "\n" when you mean a "newline" for your system, but use the literal ASCII

when you need an exact character. For example, most networking protocols expect and prefer a CR+LF

("\015\012" or "\cM\cJ") for line terminators, and although they often accept just "\012", they

seldom tolerate just "\015". If you get in the habit of using "\n" for networking, you may be burned

some day.
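
A minimal sketch of spelling out the terminator explicitly. The request text and host name are placeholders, not part of any real exchange:

```perl
use strict;

my $CRLF = "\015\012";    # CR (octal 015) followed by LF (octal 012)

# A hypothetical HTTP/1.0 request; the host name is a placeholder.
my $request = join($CRLF,
    "GET /index.html HTTP/1.0",
    "Host: www.example.com",
    "", "");              # the two empty strings yield the terminating blank line

my @bytes = unpack("C*", $CRLF);
print "terminator bytes: @bytes\n";    # prints: terminator bytes: 13 10
```

Because the octal escapes name exact characters, the bytes sent are the same whatever "\n" happens to mean on the local system.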

You cannot include a literal $ or @ within a \Q sequence. An unescaped $ or @ interpolates the

corresponding variable, while escaping will cause the literal string \$ to be inserted. You'll need to write

something like m/\Quser\E\@\Qhost/.

Patterns are subject to an additional level of interpretation as a regular expression. This is done as a second

pass, after variables are interpolated, so that regular expressions may be incorporated into the pattern from

the variables. If this is not what you want, use \Q to interpolate a variable literally.
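
The effect of \Q on an interpolated variable can be sketched like this (the strings are arbitrary):

```perl
use strict;

my $literal = "a.b*c";    # contains regular expression metacharacters
my $text    = "aXbbc";

my $as_regex   = ($text =~ /^$literal$/)         ? "match" : "no match";
my $as_literal = ($text =~ /^\Q$literal\E$/)     ? "match" : "no match";
my $exact      = ("a.b*c" =~ /^\Q$literal\E$/)   ? "match" : "no match";

print "$as_regex, $as_literal, $exact\n";    # prints: match, no match, match
print quotemeta($literal), "\n";             # prints: a\.b\*c
```

Without \Q the "." and "*" in the variable act as metacharacters; with it they match only themselves.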

Apart from the above, there are no multiple levels of interpolation. In particular, contrary to the expectations

of shell programmers, back−quotes do NOT interpolate within double quotes, nor do single quotes impede

evaluation of variables when used within double quotes.

Regexp Quote−Like Operators

Here are the quote−like operators that apply to pattern matching and related activities.

Most of this section is related to use of regular expressions from Perl. Such a use may be considered from

two points of view: Perl hands a string and a "pattern" to the RE (regular expression) engine to match, the RE

engine finds (or does not find) the match, and Perl uses the findings of the RE engine for its operation, possibly

asking the engine for other matches.

The RE engine has no idea what Perl is going to do with what it finds; similarly, the rest of Perl has no idea what

a particular regular expression means to the RE engine. This creates a clean separation, and in this section we

discuss matching from Perl's point of view only. The other point of view may be found in perlre.

?PATTERN?

This is just like the /pattern/ search, except that it matches only once between calls to the

reset() operator. This is a useful optimization when you want to see only the first occurrence

of something in each file of a set of files, for instance. Only ?? patterns local to the current

package are reset.

while (<>) {

if (?^$?) {

# blank line between header and body

}

} continue {

reset if eof; # clear ?? status for next file

}

This usage is vaguely deprecated, and may be removed in some future version of Perl.

m/PATTERN/cgimosx

/PATTERN/cgimosx

Searches a string for a pattern match, and in scalar context returns true (1) or false (''). If no

string is specified via the =~ or !~ operator, the $_ string is searched. (The string specified with

=~ need not be an lvalue—it may be the result of an expression evaluation, but remember the =~

binds rather tightly.) See also perlre. See perllocale for discussion of additional considerations

that apply when use locale is in effect.

Options are:

c Do not reset search position on a failed match when /g is in effect.

g Match globally, i.e., find all occurrences.

i Do case−insensitive pattern matching.

m Treat string as multiple lines.

o Compile pattern only once.

s Treat string as single line.

x Use extended regular expressions.

If "/" is the delimiter then the initial m is optional. With the m you can use any pair of

non-alphanumeric, non-whitespace characters as delimiters. This is particularly useful for

matching Unix path names that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is

the delimiter, then the match-only-once rule of ?PATTERN? applies. If single quotes are the

delimiter, no variable interpolation is done on the PATTERN.
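As an illustration of the delimiter rule, here is a minimal sketch that tests one string three ways. The path and the variable names are made up for the example.

```perl
# Hypothetical path, for illustration only.
my $path = "/usr/local/bin/perl";

# With the default delimiter every slash must be escaped:
my $slashes = ($path =~ /^\/usr\/local\/bin\//) ? 1 : 0;

# With m and another delimiter the pattern reads naturally:
my $braces = ($path =~ m{^/usr/local/bin/}) ? 1 : 0;
my $hashes = ($path =~ m#^/usr/local/bin/#) ? 1 : 0;

print "$slashes $braces $hashes\n";    # all three tests agree
```

Bracketing delimiters such as {} are often the most readable choice when the pattern itself contains both "/" and "#".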

PATTERN may contain variables, which will be interpolated (and the pattern recompiled) every

time the pattern search is evaluated. (Note that $) and $| might not be interpolated because

they look like end−of−string tests.) If you want such a pattern to be compiled only once, add a

/o after the trailing delimiter. This avoids expensive run−time recompilations, and is useful

when the value you are interpolating won‘t change over the life of the script. However,

mentioning /o constitutes a promise that you won‘t change the variables in the pattern. If you

change them, Perl won‘t even notice.
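The promise made by /o can be seen in a small sketch. The word list and the target string here are assumptions chosen only to make the effect visible.

```perl
# Sketch only: the words and the target string are made up.
my @results;
foreach my $word ("foo", "bar") {
    # /o freezes the pattern as /^foo/ on the first pass, so the
    # change of $word to "bar" on the second pass goes unnoticed.
    push @results, ("foobar" =~ /^$word/o ? "match" : "no match");
}
print join(", ", @results), "\n";    # match, match
```

Without the /o the second test would be made against /^bar/ and would fail.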

If the PATTERN evaluates to the empty string, the last successfully matched regular expression

is used instead.

If the /g option is not used, m// in a list context returns a list consisting of the subexpressions

matched by the parentheses in the pattern, i.e., ($1, $2, $3...). (Note that here $1 etc. are

also set, and that this differs from Perl 4‘s behavior.) When there are no parentheses in the

pattern, the return value is the list (1) for success. With or without parentheses, an empty list is

returned upon failure.

Examples:

open(TTY, '/dev/tty');

<TTY> =~ /^y/i && foo(); # do foo if desired

if (/Version: *([0-9.]*)/) { $version = $1; }

next if m#^/usr/spool/uucp#;

# poor man's grep

$arg = shift;

while (<>) {

    print if /$arg/o; # compile only once

}

if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

This last example splits $foo into the first two words and the remainder of the line, and assigns

those three fields to $F1, $F2, and $Etc. The conditional is true if any variables were

assigned, i.e., if the pattern matched.

The /g modifier specifies global pattern matching—that is, matching as many times as possible

within the string. How it behaves depends on the context. In list context, it returns a list of all

the substrings matched by all the parentheses in the regular expression. If there are no

parentheses, it returns a list of all the matched strings, as if there were parentheses around the

whole pattern.

In scalar context, each execution of m//g finds the next match, returning TRUE if it matches,

and FALSE if there is no further match. The position after the last match can be read or set using

the pos() function; see pos. A failed match normally resets the search position to the

beginning of the string, but you can avoid that by adding the /c modifier (e.g. m//gc).

Modifying the target string also resets the search position.

You can intermix m//g matches with m/\G.../g, where \G is a zero−width assertion that

matches the exact position where the previous m//g, if any, left off. The \G assertion is not

supported without the /g modifier; currently, without /g, \G behaves just like \A, but that‘s

accidental and may change in the future.
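The context rules for /g, and the effect of /c on pos(), can be sketched as follows. The strings and variable names are illustrative assumptions.

```perl
# List context: every parenthesized group from every match.
my @pairs = ("a=1 b=2 c=3" =~ /(\w+)=(\d+)/g);

# Scalar context: one match per execution; pos() tracks the offset.
my $text = "aXbXc";
my @offsets;
while ($text =~ /X/g) {
    push @offsets, pos($text);
}
# The failed match that ended the loop reset the search position.
my $after_failure = defined(pos($text)) ? pos($text) : "undef";

# With /c a failed match leaves pos() alone.
pos($text) = 4;
my $found = ($text =~ /X/gc) ? 1 : 0;
my $kept  = pos($text);

print "@pairs | @offsets | $after_failure | $found $kept\n";
# a 1 b 2 c 3 | 2 4 | undef | 0 4
```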

Examples:

# list context

($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

# scalar context

$/ = ""; $* = 1; # $* deprecated in modern perls

while (defined($paragraph = <>)) {

    while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {

        $sentences++;

    }

}

print "$sentences\n";

# using m//gc with \G

$_ = "ppooqppqq";

while ($i++ < 2) {

    print "1: '";

    print $1 while /(o)/gc; print "', pos=", pos, "\n";

    print "2: '";

    print $1 if /\G(q)/gc; print "', pos=", pos, "\n";

    print "3: '";

    print $1 while /(p)/gc; print "', pos=", pos, "\n";

}

The last example should print:

1: 'oo', pos=4

2: 'q', pos=5

3: 'pp', pos=7

1: '', pos=7

2: 'q', pos=8

3: '', pos=8

A useful idiom for lex−like scanners is /\G.../gc. You can combine several regexps like

this to process a string part−by−part, doing different actions depending on which regexp

matched. Each regexp tries to match where the previous one leaves off.

$_ = <<'EOL';

$url = new URI::URL "http://www/"; die if $url eq "xXx";

EOL

LOOP:

{

    print(" digits"), redo LOOP if /\G\d+\b[,.;]?\s*/gc;

    print(" lowercase"), redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;

    print(" UPPERCASE"), redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;

    print(" Capitalized"), redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;

    print(" MiXeD"), redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;

    print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;

    print(" line-noise"), redo LOOP if /\G[^A-Za-z0-9]+/gc;

    print ". That's all!\n";

}

Here is the output (split into several lines):

line-noise lowercase line-noise lowercase UPPERCASE line-noise

UPPERCASE line-noise lowercase line-noise lowercase line-noise

lowercase lowercase line-noise lowercase lowercase line-noise

MiXeD line-noise. That's all!

q/STRING/

'STRING'

A single-quoted, literal string. A backslash represents a backslash unless followed by the

delimiter or another backslash, in which case the delimiter or backslash is interpolated.

$foo = q!I said, "You said, 'She said it.'"!;

$bar = q('This is it.');

$baz = '\n'; # a two-character string

qq/STRING/

"STRING"

A double−quoted, interpolated string.

$_ .= qq

(*** The previous line contains the naughty word "$1".\n)

if /(tcl|rexx|python)/; # :−)

$baz = "\n"; # a one−character string

qr/STRING/imosx

A string which is (possibly) interpolated and then compiled as a regular expression. The result

may be used as a pattern in a match:

$re = qr/$pattern/;

$string =~ /foo${re}bar/; # can be interpolated in other patterns

$string =~ $re; # or used standalone

Options are:

i Do case−insensitive pattern matching.

m Treat string as multiple lines.

o Compile pattern only once.

s Treat string as single line.

x Use extended regular expressions.

The benefit from this is that the pattern is precompiled into an internal representation, and does

not need to be recompiled every time a match is attempted. This makes it very efficient to do

something like:

foreach $pattern (@pattern_list) {

    my $re = qr/$pattern/;

    foreach $line (@lines) {

        if ($line =~ /$re/) {

            do_something($line);

        }

    }

}

See perlre for additional information on valid syntax for STRING, and for a detailed look at the

semantics of regular expressions.
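Options given to qr// travel with the compiled pattern when it is interpolated into a larger one. The following is a minimal sketch; the sample lines are assumptions made for the example.

```perl
my $word  = qr/perl/i;      # the /i option is stored with the pattern
my @lines = ("I like Perl.", "Python too.", "PERL 5.005");

# The outer match has no /i of its own, yet both spellings are found.
my @hits = grep { /\b$word\b/ } @lines;

print scalar(@hits), " of ", scalar(@lines), " lines matched\n";    # 2 of 3 lines matched
```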

qx/STRING/

`STRING`

A string which is (possibly) interpolated and then executed as a system command with

/bin/sh or its equivalent. Shell wildcards, pipes, and redirections will be honored. The

collected standard output of the command is returned; standard error is unaffected. In scalar

context, it comes back as a single (potentially multi−line) string. In list context, returns a list of

lines (however you‘ve defined lines with $/ or $INPUT_RECORD_SEPARATOR).

Because backticks do not affect standard error, use shell file descriptor syntax (assuming the

shell supports this) if you care to address this. To capture a command‘s STDERR and STDOUT

together:

$output = `cmd 2>&1`;

To capture a command's STDOUT but discard its STDERR:

$output = `cmd 2>/dev/null`;

To capture a command's STDERR but discard its STDOUT (ordering is important here):

$output = `cmd 2>&1 1>/dev/null`;

To exchange a command's STDOUT and STDERR in order to capture the STDERR but leave its

STDOUT to come out the old STDERR:

$output = `cmd 3>&1 1>&2 2>&3 3>&-`;

To read both a command‘s STDOUT and its STDERR separately, it‘s easiest and safest to

redirect them separately to files, and then read from those files when the program is done:

system("program args 1>/tmp/program.stdout 2>/tmp/program.stderr");

Using single−quote as a delimiter protects the command from Perl‘s double−quote interpolation,

passing it on to the shell instead:

$perl_info = qx(ps $$); # that's Perl's $$

$shell_info = qx'ps $$'; # that's the new shell's $$

Note that how the string gets evaluated is entirely subject to the command interpreter on your

system. On most platforms, you will have to protect shell metacharacters if you want them

treated literally. This is in practice difficult to do, as it‘s unclear how to escape which characters.

See perlsec for a clean and safe example of a manual fork() and exec() to emulate

backticks safely.

On some platforms (notably DOS−like ones), the shell may not be capable of dealing with

multiline commands, so putting newlines in the string may not get you what you want. You may

be able to evaluate multiple commands in a single line by separating them with the command

separator character, if your shell supports that (e.g. ; on many Unix shells; & on the Windows

NT cmd shell).

Beware that some command shells may place restrictions on the length of the command line.

You must ensure your strings don‘t exceed this limit after any necessary interpolations. See the

platform−specific release notes for more details about your particular environment.

Using this operator can lead to programs that are difficult to port, because the shell commands

called vary between systems, and may in fact not be present at all. As one example, the type

command under the POSIX shell is very different from the type command under DOS. That

doesn‘t mean you should go out of your way to avoid backticks when they‘re the right way to get

something done. Perl was made to be a glue language, and one of the things it glues together is

commands. Just understand what you‘re getting yourself into.

See "I/O Operators" for more discussion.

qw/STRING/

Returns a list of the words extracted out of STRING, using embedded whitespace as the word

delimiters. It is exactly equivalent to

split(' ', q/STRING/);

This equivalency means that if used in scalar context, you‘ll get split‘s (unfortunate) scalar

context behavior, complete with mysterious warnings.

Some frequently seen examples:

use POSIX qw( setlocale localeconv )

@EXPORT = qw( foo bar baz );

A common mistake is to try to separate the words with commas or to put comments into a

multi-line qw-string. For this reason the -w switch produces warnings if the STRING contains

the "," or the "#" character.
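The equivalence with split, and the effect of the comma mistake, can be checked with a short sketch. The word lists are arbitrary examples, and the sketch is meant to be run without -w.

```perl
my @from_qw    = qw( foo bar baz );
my @from_split = split(' ', q( foo bar baz ));
print "same\n" if "@from_qw" eq "@from_split";    # prints "same"

# The comma mistake: the commas become part of the words.
my @oops = qw(red, green, blue);
print scalar(@oops), " words, first is '$oops[0]'\n";    # 3 words, first is 'red,'
```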

s/PATTERN/REPLACEMENT/egimosx

Searches a string for a pattern, and if found, replaces that pattern with the replacement text and

returns the number of substitutions made. Otherwise it returns false (specifically, the empty

string).

If no string is specified via the =~ or !~ operator, the $_ variable is searched and modified.

(The string specified with =~ must be a scalar variable, an array element, a hash element, or an

assignment to one of those, i.e., an lvalue.)

If the delimiter chosen is single quote, no variable interpolation is done on either the PATTERN

or the REPLACEMENT. Otherwise, if the PATTERN contains a $ that looks like a variable

rather than an end−of−string test, the variable will be interpolated into the pattern at run−time. If

you want the pattern compiled only once the first time the variable is interpolated, use the /o

option. If the pattern evaluates to the empty string, the last successfully executed regular

expression is used instead. See perlre for further explanation on these. See perllocale for

discussion of additional considerations that apply when use locale is in effect.

Options are:

e Evaluate the right side as an expression.

g Replace globally, i.e., all occurrences.

i Do case−insensitive pattern matching.

m Treat string as multiple lines.

o Compile pattern only once.

s Treat string as single line.

x Use extended regular expressions.

Any non−alphanumeric, non−whitespace delimiter may replace the slashes. If single quotes are

used, no interpretation is done on the replacement string (the /e modifier overrides this,

however). Unlike Perl 4, Perl 5 treats backticks as normal delimiters; the replacement text is not

evaluated as a command. If the PATTERN is delimited by bracketing quotes, the

REPLACEMENT has its own pair of quotes, which may or may not be bracketing quotes, e.g.,

s(foo)(bar) or s<foo>/bar/. A /e will cause the replacement portion to be interpreted

as a full−fledged Perl expression and eval()ed right then and there. It is, however, syntax

checked at compile−time.

Examples:

s/\bgreen\b/mauve/g; # don't change wintergreen

$path =~ s|/usr/bin|/usr/local/bin|;

s/Login: $foo/Login: $bar/; # run-time pattern

($foo = $bar) =~ s/this/that/; # copy first, then change

$count = ($paragraph =~ s/Mister\b/Mr./g); # get change-count

$_ = 'abc123xyz';

s/\d+/$&*2/e; # yields 'abc246xyz'

s/\d+/sprintf("%5d",$&)/e; # yields 'abc  246xyz'

s/\w/$& x 2/eg; # yields 'aabbcc  224466xxyyzz'

s/%(.)/$percent{$1}/g; # change percent escapes; no /e

s/%(.)/$percent{$1} || $&/ge; # expr now, so /e

s/^=(\w+)/&pod($1)/ge; # use function call

# expand variables in $_, but dynamics only, using

# symbolic dereferencing

s/\$(\w+)/${$1}/g;

# /e's can even nest; this will expand

# any embedded scalar variable (including lexicals) in $_

s/(\$\w+)/$1/eeg;

# Delete (most) C comments.

$program =~ s {

/\* # Match the opening delimiter.

.*? # Match a minimal number of characters.

\*/ # Match the closing delimiter.

} []gsx;

s/^\s*(.*?)\s*$/$1/; # trim white space in $_, expensively

for ($variable) { # trim white space in $variable, cheap

    s/^\s+//;

    s/\s+$//;

}

s/([^ ]*) *([^ ]*)/$2 $1/; # reverse 1st two fields

Note the use of $ instead of \ in the last example. Unlike sed, we use the \<digit> form in only

the left hand side. Anywhere else it's $<digit>.

Occasionally, you can‘t use just a /g to get all the changes to occur. Here are two common

cases:

# put commas in the right places in an integer

1 while s/(.*\d)(\d\d\d)/$1,$2/g; # perl4

1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g; # perl5

# expand tabs to 8-column spacing

1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
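Wrapped in a subroutine, the perl5 comma idiom above might be used as follows. The name commify is an assumption of this sketch, not something perl provides.

```perl
# commify is a hypothetical helper name.
sub commify {
    my $n = shift;
    # Each pass inserts one comma, working from the right;
    # the loop stops when s///g makes no more changes.
    1 while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;
    return $n;
}

print commify(1234567), "\n";    # 1,234,567
print commify(999), "\n";        # 999
```

The repetition is needed because a single /g pass cannot revisit text it has already scanned.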

tr/SEARCHLIST/REPLACEMENTLIST/cds

y/SEARCHLIST/REPLACEMENTLIST/cds

Transliterates all occurrences of the characters found in the search list with the corresponding

character in the replacement list. It returns the number of characters replaced or deleted. If no

string is specified via the =~ or !~ operator, the $_ string is transliterated. (The string specified

with =~ must be a scalar variable, an array element, a hash element, or an assignment to one of

those, i.e., an lvalue.) A character range may be specified with a hyphen, so tr/A-J/0-9/

does the same replacement as tr/ACEGIBDFHJ/0246813579/. For sed devotees, y is

provided as a synonym for tr. If the SEARCHLIST is delimited by bracketing quotes, the

REPLACEMENTLIST has its own pair of quotes, which may or may not be bracketing quotes,

e.g., tr[A-Z][a-z] or tr(+\-*/)/ABCD/.

Options:

c Complement the SEARCHLIST.

d Delete found but unreplaced characters.

s Squash duplicate replaced characters.

If the /c modifier is specified, the SEARCHLIST character set is complemented. If the /d

modifier is specified, any characters specified by SEARCHLIST not found in

REPLACEMENTLIST are deleted. (Note that this is slightly more flexible than the behavior of

some tr programs, which delete anything they find in the SEARCHLIST, period.) If the /s

modifier is specified, sequences of characters that were transliterated to the same character are

squashed down to a single instance of the character.

If the /d modifier is used, the REPLACEMENTLIST is always interpreted exactly as specified.

Otherwise, if the REPLACEMENTLIST is shorter than the SEARCHLIST, the final character is

replicated till it is long enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is

replicated. This latter is useful for counting characters in a class or for squashing character

sequences in a class.
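The rules for the modifiers and for short or empty replacement lists can be tried out with a sketch like this one. All strings and variable names are illustrative assumptions.

```perl
my ($shift, $count, $strip, $squash, $pad, $mixed);

($shift = "hello world") =~ tr/a-y/b-z/;        # ifmmp xpsme
$count = ($strip = "hello world") =~ tr/lo//;   # 5; empty REPLACEMENTLIST, string unchanged
$strip =~ tr/a-z//cd;                           # helloworld; /c complements, /d deletes
($squash = "aabbccdd") =~ tr/a-c//s;            # abcdd; runs of a, b and c are squashed
($pad = "hello") =~ tr/a-z/*/;                  # *****; final character of the short list is replicated
($mixed = "hello world") =~ tr/lo/0/d;          # he00 wr0d; l becomes 0, o is deleted

print join("|", $shift, $count, $strip, $squash, $pad, $mixed), "\n";
```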

Examples:

$ARGV[1] =~ tr/A-Z/a-z/; # canonicalize to lower case

$cnt = tr/*/*/; # count the stars in $_

$cnt = $sky =~ tr/*/*/; # count the stars in $sky

$cnt = tr/0-9//; # count the digits in $_

tr/a-zA-Z//s; # bookkeeper -> bokeper

($HOST = $host) =~ tr/a-z/A-Z/;

tr/a-zA-Z/ /cs; # change non-alphas to single space

tr [\200-\377]

   [\000-\177]; # delete 8th bit

If multiple transliterations are given for a character, only the first one is used:

tr/AAA/XYZ/

will transliterate any A to X.

Note that because the transliteration table is built at compile time, neither the SEARCHLIST nor

the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you

want to use variables, you must use an eval():

eval "tr/$oldlist/$newlist/";

die $@ if $@;

eval "tr/$oldlist/$newlist/, 1" or die $@;
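When the target is a named variable rather than $_, the same eval technique might look like the sketch below. The lists and the sample text are assumptions; note the backslash that keeps $text itself from being interpolated into the eval string.

```perl
my ($oldlist, $newlist) = ("a-c", "A-C");    # hypothetical run-time lists
my $text = "abcabc xyz";

# Only the two lists are interpolated; \$text reaches eval as a variable.
my $changed = eval "\$text =~ tr/$oldlist/$newlist/";
die $@ if $@;

print "$changed: $text\n";    # 6: ABCABC xyz
```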

Gory details of parsing quoted constructs

When presented with something which may have several different interpretations, Perl uses the principle

DWIM (expanded to Do What I Mean, not what I wrote) to pick the most probable interpretation of the

source. This strategy is so successful that Perl users usually do not suspect the ambiguity of what they write.

However, from time to time Perl's ideas differ from what the author meant.

The aim of this section is to clarify Perl's way of interpreting quoted constructs. The most frequent

reason one may want to know the details discussed in this section is hairy regular expressions.

However, the first steps of parsing are the same for all Perl quoting operators, so here they are discussed

together.

Some of the passes discussed below are performed concurrently, but as long as the results are the same, we

consider them one by one. For different quoting constructs Perl performs a different number of passes, from

one to five, but they are always performed in the same order.

Finding the end

The first pass is finding the end of the quoted construct, be it the multichar ender "\nEOF\n" of a <<EOF

construct, the / which terminates a qq/ construct, the ] which terminates a qq[ construct, or the > which terminates

a fileglob started with <.

When searching for a multichar construct no skipping is performed. When searching for a one-char

non-matching delimiter, such as /, the combinations \\ and \/ are skipped. When searching for a

one-char matching delimiter, such as ], the combinations \\, \] and \[ are skipped, and nested [, ] are

skipped as well.

For 3-part constructs, s/// etc., the search is repeated once more.

During this search no attention is paid to the semantics of the construct, thus

"$hash{"$foo/$bar"}"

or

m/

bar # This is not a comment, this slash / terminated m//!

/x

do not form legal quoted expressions. Note that since the slash which terminated m// was followed

by a SPACE, this is not m//x, thus # was interpreted as a literal #.

Removal of backslashes before delimiters

During the second pass the text between the starting delimiter and the ending delimiter is copied to a

safe location, and the \ is removed from combinations consisting of \ and delimiter(s) (both starting

and ending delimiter if they differ).

The removal does not happen for multi−char delimiters.

Note that the combination \\ is left as it was!

Starting from this step no information about the delimiter(s) is used in the parsing.
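The first two passes can be observed from ordinary code. The following sketch uses made-up strings to show which delimiters are skipped and which backslashes are removed.

```perl
my $nested  = q[a [nested] pair];     # balanced inner [ ] are skipped over
my $escaped = q[a lone \] bracket];   # \] is skipped, then loses its backslash
my $slash   = qq/one\/two/;           # \/ does not end the construct
my $double  = q(a\\b);                # \\ survives pass two; q() then reduces it to one \

print "$nested|$escaped|$slash|", length($double), "\n";
# a [nested] pair|a lone ] bracket|one/two|3
```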

Interpolation

The next step is interpolation in the obtained delimiter-independent text. There are four different cases.

<<'EOF', m'', s''', tr///, y///

No interpolation is performed.

'', q//

The only interpolation is removal of \ from pairs \\.

"", ``, qq//, qx//, <file*glob>

\Q, \U, \u, \L, \l (possibly paired with \E) are converted to corresponding Perl constructs,

thus "$foo\Qbaz$bar" is converted to

$foo . (quotemeta("baz" . $bar));

Other combinations of \ with following chars are substituted with appropriate expansions.

Interpolated scalars and arrays are converted to join and . Perl constructs, thus "'@arr'"

becomes

"'" . (join $", @arr) . "'";

Since all three above steps are performed simultaneously left-to-right, there is no way to insert a

literal $ or @ inside \Q\E pair: it cannot be protected by \, since any \ (except in \E) is

interpreted as a literal inside \Q\E, and any $ is interpreted as starting an interpolated scalar.

Note also that the interpolating code needs to decide where the interpolated scalar ends,

say, whether "a $b -> {c}" means

"a " . $b . " -> {c}";

or

"a " . $b -> {c};

Most of the time the decision is to take the longest possible text which does not include spaces

between components and contains matching braces/brackets.

?RE?, /RE/, m/RE/, s/RE/foo/,

Processing of \Q, \U, \u, \L, \l and interpolation happens (almost) as with qq// constructs,

but the substitution of \ followed by other chars is not performed! Moreover, inside

(?{BLOCK}) no processing is performed at all.

Interpolation has several quirks: $|, $( and $) are not interpolated, and constructs

$var[SOMETHING] are voted (by several different estimators) to be an array element or

$var followed by a RE alternative. This is the place where the notation ${arr[$bar]}

comes in handy: /${arr[0-9]}/ is interpreted as an array element -9, not as a regular

expression from variable $arr followed by a digit, which is the interpretation of

/$arr[0-9]/.

Note that absence of processing of \\ creates specific restrictions on the post−processed text: if

the delimiter is /, one cannot get the combination \/ into the result of this step: / will finish the

regular expression, \/ will be stripped to / on the previous step, and \\/ will be left as is.

Since / is equivalent to \/ inside a regular expression, this does not matter unless the delimiter

is a special character for the RE engine, as in s*foo*bar*, m[foo], or ?foo?.

This step is the last one for all the constructs except regular expressions, which are processed further.
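The conversions described for double-quoted strings can be compared with their hand-written equivalents. This is a sketch with made-up values; $ref is used in place of $b only to keep clear of sort's special variables.

```perl
my ($foo, $bar) = ("x", "a.b");
my $quoted = "$foo\Qbaz$bar";
my $manual = $foo . quotemeta("baz" . $bar);    # xbaza\.b

my @arr    = (1, 2, 3);
my $list   = "'@arr'";
my $joined = "'" . (join $", @arr) . "'";       # '1 2 3'

my $ref   = { c => "C" };
my $tight = "a $ref->{c}";      # no spaces: a hash element, giving "a C"
my $loose = "a $ref -> {c}";    # spaces: $ref alone is interpolated

print "equal\n" if $quoted eq $manual and $list eq $joined;    # prints "equal"
print "$tight\n";                                              # a C
```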

Interpolation of regular expressions

All the previous steps were performed during the compilation of Perl code; this one happens at run

time (though it may be optimized to be calculated at compile time if appropriate). After all the

preprocessing performed above (and possibly after evaluation if catenation, joining, up/down-casing

and quotemeta()ing are involved) the resulting string is passed to the RE engine for compilation.

Whatever happens in the RE engine is better discussed in perlre, but for the sake of continuity let us

do it here.

This is the first step where presence of the //x switch is relevant. The RE engine scans the string

left−to−right, and converts it to a finite automaton.

Backslashed chars are either substituted by corresponding literal strings, or generate special nodes of

the finite automaton. Characters which are special to the RE engine generate corresponding nodes.

(?#...) comments are ignored. All the rest is either converted to literal strings to match, or is

ignored (as is whitespace and #−style comments if //x is present).

Note that the parsing of the construct [...] is performed using absolutely different rules than the

rest of the regular expression. Similarly, the (?{...}) is only checked for matching braces.
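How the RE engine treats whitespace and comments can be seen in a small sketch. The test strings are arbitrary.

```perl
my $plain   = ("ab"  =~ /a b/)           ? 1 : 0;    # 0: the space must match literally
my $spaced  = ("ab"  =~ /a b/x)          ? 1 : 0;    # 1: //x ignores the space
my $inclass = ("a b" =~ /a[ ]b/x)        ? 1 : 0;    # 1: inside [...] a space stays literal
my $comment = ("ab"  =~ /a(?#ignored)b/) ? 1 : 0;    # 1: (?#...) is dropped even without //x

print "$plain $spaced $inclass $comment\n";    # 0 1 1 1
```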

Optimization of regular expressions

This step is listed for completeness only. Since it does not change semantics, details of this step are

not documented and are subject to change.

I/O Operators

There are several I/O operators you should know about. A string enclosed by backticks (grave accents) first

undergoes variable substitution just like a double quoted string. It is then interpreted as a command, and the

output of that command is the value of the pseudo−literal, like in a shell. In scalar context, a single string

consisting of all the output is returned. In list context, a list of values is returned, one for each line of output.

(You can set $/ to use a different line terminator.) The command is executed each time the pseudo−literal

is evaluated. The status value of the command is returned in $? (see perlvar for the interpretation of $?).

Unlike in csh, no translation is done on the return data—newlines remain newlines. Unlike in any of the

shells, single quotes do not hide variable names in the command from interpretation. To pass a $ through to

the shell you need to hide it with a backslash. The generalized form of backticks is qx//. (Because

backticks always undergo shell expansion as well, see perlsec for security concerns.)
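The context rules and the backslash rule for backticks might be exercised as below. This sketch assumes a Unix-like /bin/sh with an echo command; on other systems the commands would need to change.

```perl
# Assumes a Unix-like shell; echo and the variable v are only examples.
my $all   = `echo one; echo two`;    # scalar context: one string
my @lines = `echo one; echo two`;    # list context: one element per line
my $exit  = $? >> 8;                 # exit status of the last command
my $shell = `v=shell; echo \$v`;     # the backslash hands $v to the shell

print scalar(@lines), " lines, exit $exit, shell said: $shell";
# 2 lines, exit 0, shell said: shell
```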

Evaluating a filehandle in angle brackets yields the next line from that file (newline, if any, included), or

undef at end of file. Ordinarily you must assign that value to a variable, but there is one situation where an

automatic assignment happens. If and ONLY if the input symbol is the only thing inside the conditional of a

while or for(;;) loop, the value is automatically assigned to the variable $_. In these loop constructs,

the assigned value (whether assignment is automatic or explicit) is then tested to see if it is defined. The

defined test avoids problems where a line has a string value that would be treated as false by perl, e.g. "" or "0"

with no trailing newline. (This may seem like an odd thing to you, but you‘ll use the construct in almost

every Perl script you write.) Anyway, the following lines are equivalent to each other:

while (defined($_ = <STDIN>)) { print; }

while ($_ = <STDIN>) { print; }

while (<STDIN>) { print; }

for (;<STDIN>;) { print; }

print while defined($_ = <STDIN>);

print while ($_ = <STDIN>);

print while <STDIN>;

and this also behaves similarly, but avoids the use of $_ :

while (my $line = <STDIN>) { print $line }

If you really mean such values to terminate the loop they should be tested for explicitly:

while (($_ = <STDIN>) ne '0') { ... }

while (<STDIN>) { last unless $_; ... }

In other boolean contexts, <filehandle> without explicit defined test or comparison will solicit a

warning if -w is in effect.

The filehandles STDIN, STDOUT, and STDERR are predefined. (The filehandles stdin, stdout, and

stderr will also work except in packages, where they would be interpreted as local identifiers rather than

global.) Additional filehandles may be created with the open() function. See open() for details on this.

If a <FILEHANDLE> is used in a context that is looking for a list, a list consisting of all the input lines is

returned, one line per list element. It‘s easy to make a LARGE data space this way, so use with care.
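The implicit defined test and the list-context read can both be checked against a file whose last line is "0" with no newline. The scratch file name below is an assumption of this sketch.

```perl
my $file = "perlop_demo_" . $$ . ".tmp";    # hypothetical scratch file
open(OUT, ">$file") or die "can't write $file: $!";
print OUT "first\n0";                       # last line is "0", no newline
close(OUT);

open(IN, $file) or die "can't read $file: $!";
my $count = 0;
while (<IN>) { $count++ }                   # implicit defined($_ = <IN>)
close(IN);

open(IN, $file) or die "can't read $file: $!";
my @all = <IN>;                             # list context: every line at once
close(IN);
unlink($file);

print "$count lines in the loop, ", scalar(@all), " in the list\n";
# 2 lines in the loop, 2 in the list
```

Had the loop tested the value for truth rather than definedness, the final "0" line would have ended it one line early.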

The null filehandle <> is special and can be used to emulate the behavior of sed and awk. Input from <>

comes either from standard input, or from each file listed on the command line. Here‘s how it works: the

first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to "−", which

when opened gives you standard input. The @ARGV array is then processed as a list of filenames. The

loop

while (<>) {

... # code for each line

}

is equivalent to the following Perl−like pseudo code:

unshift(@ARGV, '-') unless @ARGV;

while ($ARGV = shift) {

    open(ARGV, $ARGV);

    while (<ARGV>) {

        ... # code for each line

    }

}

except that it isn‘t so cumbersome to say, and will actually work. It really does shift array @ARGV and put

the current filename into variable $ARGV. It also uses filehandle ARGV internally—<> is just a synonym

for <ARGV>, which is magical. (The pseudo code above doesn‘t work because it treats <ARGV> as

non−magical.)

You can modify @ARGV before the first <> as long as the array ends up containing the list of filenames you

really want. Line numbers ($.) continue as if the input were one big happy file. (But see example under

eof for how to reset line numbers on each file.)
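A sketch of <> at work, with @ARGV set by the program itself, follows. The scratch file names are assumptions; the point is that $ARGV names the current file while $. keeps counting across files.

```perl
# Two hypothetical scratch files, one line each.
my @names = map { "perlop_argv_" . $_ . "_" . $$ . ".tmp" } (1, 2);
foreach my $name (@names) {
    open(OUT, ">$name") or die "can't write $name: $!";
    print OUT "line from file\n";
    close(OUT);
}

@ARGV = @names;                  # set before the first <>
my @seen;
while (<>) {
    chomp;
    push @seen, join(":", $ARGV, $., $_);    # file, running line number, text
}
unlink(@names);

print "$_\n" foreach @seen;      # the second entry carries line number 2
```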

If you want to set @ARGV to your own list of files, go right ahead. This sets @ARGV to all plain text files

if no @ARGV was given:

@ARGV = grep { -f && -T } glob('*') unless @ARGV;

You can even set them to pipe commands. For example, this automatically filters compressed arguments

through gzip:

@ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;

If you want to pass switches into your script, you can use one of the Getopts modules or put a loop on the

front like this:

while ($_ = $ARGV[0], /^-/) {

    shift;

    last if /^--$/;

    if (/^-D(.*)/) { $debug = $1 }

    if (/^-v/) { $verbose++ }

    # ... # other switches

}

while (<>) {

# ... # code for each line

}

The <> symbol will return undef for end−of−file only once. If you call it again after this it will assume

you are processing another @ARGV list, and if you haven‘t set @ARGV, will input from STDIN.

If the string inside the angle brackets is a reference to a scalar variable (e.g., <$foo>), then that variable

contains the name of the filehandle to input from, or its typeglob, or a reference to the same. For example:

$fh = \*STDIN;

$line = <$fh>;

If what‘s within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle

name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of

filenames or the next filename in the list is returned, depending on context. This distinction is determined

on syntactic grounds alone. That means <$x> is always a readline from an indirect handle, but

<$hash{key}> is always a glob. That‘s because $x is a simple scalar variable, but $hash{key} is

not—it‘s a hash element.

One level of double−quote interpretation is done first, but you can‘t say <$foo> because that‘s an indirect

filehandle as explained in the previous paragraph. (In older versions of Perl, programmers would insert curly

brackets to force interpretation as a filename glob: <${foo}>. These days, it‘s considered cleaner to call

the internal function directly as glob($foo), which is probably the right way to have done it in the first

place.) Example:

while (<*.c>) {

chmod 0644, $_;

}

is equivalent to

open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");

while (<FOO>) {

chop;

chmod 0644, $_;

}

In fact, it‘s currently implemented that way. (Which means it will not work on filenames with spaces in

them unless you have csh(1) on your machine.) Of course, the shortest way to do the above is:

chmod 0644, <*.c>;

Because globbing invokes a shell, it‘s often faster to call readdir() yourself and do your own grep()

on the filenames. Furthermore, due to its current implementation of using a shell, the glob() routine may

get "Arg list too long" errors (unless you‘ve installed tcsh(1L) as /bin/csh).

A glob evaluates its (embedded) argument only when it is starting a new list. All values must be read before

it will start over. In a list context this isn‘t important, because you automatically get them all anyway. In

scalar context, however, the operator returns the next value each time it is called, or an undef value if you've

just run out. As with filehandles, an automatic defined test is generated when the glob occurs in the test part of a

while or for, because legal glob returns (e.g. a file called 0) would otherwise terminate the loop. Again,

undef is returned only once. So if you‘re expecting a single value from a glob, it is much better to say

($file) = <blurch*>;

than

$file = <blurch*>;

because the latter will alternate between returning a filename and returning FALSE.

If you're trying to do variable interpolation, it's definitely better to use the glob() function, because the

older notation can cause people to become confused with the indirect filehandle notation.

@files = glob("$dir/*.[ch]");

@files = glob($files[$i]);

Constant Folding

Like C, Perl does a certain amount of expression evaluation at compile time, whenever it determines that all

arguments to an operator are static and have no side effects. In particular, string concatenation happens at

compile time between literals that don‘t do variable substitution. Backslash interpretation also happens at

compile time. You can say

'Now is the time for all' . "\n" .

'good men to come to.'

and this all reduces to one string internally. Likewise, if you say

foreach $file (@filenames) {

if (-s $file > 5 + 100 * 2**16) { }

}

the compiler will precompute the number that expression represents so that the interpreter won‘t have to.

Bitwise String Operators

Bitstrings of any size may be manipulated by the bitwise operators (~ | & ^).

If the operands to a binary bitwise op are strings of different sizes, the or (|) and xor (^) ops will act as if the shorter

operand had additional zero bits on the right, while the and (&) op will act as if the longer operand were

truncated to the length of the shorter.

# ASCII−based examples

print "j p \n" ^ " a h"; # prints "JAPH\n"

print "JA" | "  ph\n"; # prints "japh\n"

print "japh\nJunk" & '_____'; # prints "JAPH\n";

print 'p N$' ^ " E<H\n"; # prints "Perl\n";

If you are intending to manipulate bitstrings, you should be certain that you‘re supplying bitstrings: If an

operand is a number, that will imply a numeric bitwise operation. You may explicitly show which type of

operation you intend by using "" or 0+, as in the examples below.

$foo = 150 | 105 ; # yields 255 (0x96 | 0x69 is 0xFF)

$foo = '150' | 105 ; # yields 255

$foo = 150 | '105'; # yields 255

$foo = '150' | '105'; # yields string '155' (under ASCII)

$baz = 0+$foo & 0+$bar; # both ops explicitly numeric

$biz = "$foo" ^ "$bar"; # both ops explicitly stringy

Integer Arithmetic

By default Perl assumes that it must do most of its arithmetic in floating point. But by saying

use integer;

you may tell the compiler that it‘s okay to use integer operations from here to the end of the enclosing

BLOCK. An inner BLOCK may countermand this by saying

no integer;

which lasts until the end of that BLOCK.

The bitwise operators ("&", "|", "^", "~", "<<", and ">>") always produce integral results. (But see also

Bitwise String Operators.) However, use integer still has meaning for them. By default, their results

are interpreted as unsigned integers. However, if use integer is in effect, their results are interpreted as

signed integers. For example, ~0 usually evaluates to a large integral value. However, use integer;

~0 is −1 on twos−complement machines.
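
The difference can be seen in a short sketch, assuming a twos-complement machine as described above. The variable names are chosen for this illustration only.

```perl
use strict;

my $unsigned = ~0;        # default: the result is treated as an unsigned integer

my $signed;
{
    use integer;          # in effect until the end of this BLOCK
    $signed = ~0;         # the same bits, now treated as a signed integer
}

print "without use integer: $unsigned\n";
print "with use integer:    $signed\n";
```

The first value depends on the integer width of your perl, while the second is -1 on twos-complement machines.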

Floating−point Arithmetic

While use integer provides integer-only arithmetic, there is no similar way to provide rounding or

truncation at a certain number of decimal places. For rounding to a certain number of digits, sprintf() or

printf() is usually the easiest route.
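
For instance, a minimal sketch of rounding with sprintf() and printf() might look like this (the variable names are illustrative only):

```perl
use strict;

my $pi = 3.14159265;

my $two_places = sprintf("%.2f", $pi);   # the string "3.14"
my $no_places  = sprintf("%.0f", $pi);   # the string "3"

print "$two_places $no_places\n";
printf "%.3f\n", $pi;                    # prints 3.142
```

Keep in mind that sprintf() returns a string; it converts back to a number when used in numeric context.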

Floating−point numbers are only approximations to what a mathematician would call real numbers. There

are infinitely more reals than floats, so some corners must be cut. For example:

printf "%.20g\n", 123456789123456789;

# produces 123456789123456784

Testing for exact floating-point equality or inequality is not a good idea. Here's a (relatively

expensive) work-around to compare whether two floating-point numbers are equal to a particular number of

significant digits (the %g format used below counts significant digits, not decimal places). See Knuth, volume II, for a more robust treatment of this topic.

sub fp_equal {

my ($X, $Y, $POINTS) = @_;

my ($tX, $tY);

$tX = sprintf("%.${POINTS}g", $X);

$tY = sprintf("%.${POINTS}g", $Y);

return $tX eq $tY;

}
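
A possible use of fp_equal() is sketched below. The subroutine is repeated so that the snippet runs on its own, and the test values are chosen purely for illustration.

```perl
use strict;

sub fp_equal {
    my ($X, $Y, $POINTS) = @_;
    my $tX = sprintf("%.${POINTS}g", $X);
    my $tY = sprintf("%.${POINTS}g", $Y);
    return $tX eq $tY;
}

my $sum = 0.1 + 0.2;

# With ordinary double-precision floats the exact comparison
# typically fails, because neither side is stored exactly.
my $exact   = ($sum == 0.3)             ? "equal" : "not equal";

# Comparing the values formatted to 10 significant digits succeeds.
my $rounded = fp_equal($sum, 0.3, 10)   ? "equal" : "not equal";

print "exact: $exact; to 10 digits: $rounded\n";
```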

The POSIX module (part of the standard perl distribution) implements ceil(), floor(), and a number of

other mathematical and trigonometric functions. The Math::Complex module (part of the standard perl

distribution) defines a number of mathematical functions that can also work on real numbers.

Math::Complex is not as efficient as POSIX, but POSIX can't work with complex numbers.

Rounding in financial applications can have serious implications, and the rounding method used should be

specified precisely. In these cases, it probably pays not to trust whichever system rounding is being used by

Perl, but to instead implement the rounding function you need yourself.
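
As one possible sketch of such a hand-written function, the following rounds halves away from zero. The name round_half_away and the choice of rounding rule are assumptions made for this example; your application may well require a different rule.

```perl
use strict;

# Hypothetical helper: round $value to $places decimal places,
# with halves rounded away from zero.
sub round_half_away {
    my ($value, $places) = @_;
    my $factor = 10 ** $places;
    my $scaled = $value * $factor;
    my $rounded = $scaled < 0 ? -int(-$scaled + 0.5)
                              :  int( $scaled + 0.5);
    return $rounded / $factor;
}

print round_half_away( 2.5,   0), "\n";   # prints 3
print round_half_away(-2.5,   0), "\n";   # prints -3
print round_half_away( 0.125, 2), "\n";   # prints 0.13
```

This still works on binary floating-point values, so an input that cannot be represented exactly may already lie slightly above or below the half-way point before the function sees it. Keeping monetary amounts as integers (cents, say) avoids that problem.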

Bigger Numbers

The standard Math::BigInt and Math::BigFloat modules provide variable precision arithmetic and

overloaded operators. At the cost of some space and considerable speed, they avoid the normal pitfalls

associated with limited−precision representations.

use Math::BigInt;

$x = Math::BigInt->new('123456789123456789');

print $x * $x;

# prints +15241578780673678515622620750190521

NAME

perlre − Perl regular expressions

DESCRIPTION

This page describes the syntax of regular expressions in Perl. For a description of how to use regular

expressions in matching operations, plus various examples of the same, see discussion of m//, s///, qr//

and ?? in Regexp Quote−Like Operators in perlop.

The matching operations can have various modifiers. The modifiers that relate to the interpretation of the

regular expression inside are listed below. For the modifiers that alter the way a regular expression is used

by Perl, see Regexp Quote−Like Operators in perlop and

Gory details of parsing quoted constructs in perlop.

i Do case−insensitive pattern matching.

If use locale is in effect, the case map is taken from the current locale. See perllocale.

m Treat string as multiple lines. That is, change "^" and "$" from matching at only the very start or end

of the string to the start or end of any line anywhere within the string.

s Treat string as single line. That is, change "." to match any character whatsoever, even a newline,

which it normally would not match.

The /s and /m modifiers both override the $* setting. That is, no matter what $* contains, /s

without /m will force "^" to match only at the beginning of the string and "$" to match only at the end

(or just before a newline at the end) of the string. Together, as /ms, they let the "." match any character

whatsoever, while yet allowing "^" and "$" to match, respectively, just after and just before newlines

within the string.

x Extend your pattern‘s legibility by permitting whitespace and comments.

These are usually written as "the /x modifier", even though the delimiter in question might not actually be a

slash. In fact, any of these modifiers may also be embedded within the regular expression itself using the

new (?...) construct. See below.
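
The effect of the /m and /s modifiers described above can be sketched as follows. The sample string and variable names are made up for the illustration.

```perl
use strict;

my $text = "first line\nsecond line\n";

# Without /m, "^" matches only at the very start of the string.
my @default = ($text =~ /^(\w+)/g);

# With /m, "^" also matches just after each embedded newline.
my @multi   = ($text =~ /^(\w+)/mg);

# Without /s, "." refuses to match a newline; with /s it will.
my $dot_default = ($text =~ /first.*second/)  ? 1 : 0;
my $dot_single  = ($text =~ /first.*second/s) ? 1 : 0;

print "default: @default\n";                              # default: first
print "/m:      @multi\n";                                # /m:      first second
print "dot: $dot_default, dot with /s: $dot_single\n";    # dot: 0, dot with /s: 1
```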

The /x modifier itself needs a little more explanation. It tells the regular expression parser to ignore

whitespace that is neither backslashed nor within a character class. You can use this to break up your regular

expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing

a comment, just as in ordinary Perl code. This also means that if you want real whitespace or # characters in

the pattern (outside of a character class, where they are unaffected by /x), that you‘ll either have to escape

them or encode them using octal or hex escapes. Taken together, these features go a long way towards

making Perl‘s regular expressions more readable. Note that you have to be careful not to include the pattern

delimiter in the comment—perl has no way of knowing you did not intend to close the pattern early. See the

C−comment deletion code in perlop.
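
A small sketch of a pattern laid out with /x follows. The date format and the variable name are assumptions made for the example; note that none of the comments contains the pattern delimiter.

```perl
use strict;

my $date_re = qr/
    (\d{4})     # year
    -
    (\d{2})     # month
    -
    (\d{2})     # day
/x;

if ("1998-10-18" =~ $date_re) {
    print "year=$1 month=$2 day=$3\n";    # year=1998 month=10 day=18
}
```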

Regular Expressions

The patterns used in pattern matching are regular expressions such as those supplied in the Version 8 regex

routines. (In fact, the routines are derived (distantly) from Henry Spencer‘s freely redistributable

reimplementation of the V8 routines.) See Version 8 Regular Expressions for details.

In particular the following metacharacters have their standard egrep−ish meanings:

\ Quote the next metacharacter

^ Match the beginning of the line

. Match any character (except newline)

$ Match the end of the line (or before newline at the end)

| Alternation

() Grouping

[] Character class

By default, the "^" character is guaranteed to match at only the beginning of the string, the "$" character at

only the end (or before the newline at the end) and Perl does certain optimizations with the assumption that

the string contains only one line. Embedded newlines will not be matched by "^" or "$". You may,

however, wish to treat a string as a multi−line buffer, such that the "^" will match after any newline within

the string, and "$" will match before any newline. At the cost of a little more overhead, you can do this by

using the /m modifier on the pattern match operator. (Older programs did this by setting $*, but this

practice is now deprecated.)

To facilitate multi−line substitutions, the "." character never matches a newline unless you use the /s

modifier, which in effect tells Perl to pretend the string is a single line—even if it isn‘t. The /s modifier

also overrides the setting of $*, in case you have some (badly behaved) older code that sets it in another

module.

The following standard quantifiers are recognized:

* Match 0 or more times

+ Match 1 or more times

? Match 1 or 0 times

{n} Match exactly n times

{n,} Match at least n times

{n,m} Match at least n but not more than m times

(If a curly bracket occurs in any other context, it is treated as a regular character.) The "*" modifier is

equivalent to {0,}, the "+" modifier to {1,}, and the "?" modifier to {0,1}. n and m are limited to

integral values less than 65536.

By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a

particular starting location) while still allowing the rest of the pattern to match. If you want it to match the

minimum number of times possible, follow the quantifier with a "?". Note that the meanings don‘t change,

just the "greediness":

*? Match 0 or more times

+? Match 1 or more times

?? Match 0 or 1 time

{n}? Match exactly n times

{n,}? Match at least n times

{n,m}? Match at least n but not more than m times
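
The difference between the greedy and the minimal forms can be seen in a short sketch (the string and variable names are illustrative only):

```perl
use strict;

my $s = "aaaa";

my ($greedy)       = $s =~ /(a+)/;         # as many as possible:   "aaaa"
my ($minimal)      = $s =~ /(a+?)/;        # as few as possible:    "a"
my ($bounded)      = $s =~ /(a{2,3})/;     # greedy, at most three: "aaa"
my ($lazy_bounded) = $s =~ /(a{2,3}?)/;    # minimal, at least two: "aa"

print "$greedy $minimal $bounded $lazy_bounded\n";
```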

Because patterns are processed as double quoted strings, the following also work:

\t tab (HT, TAB)

\n newline (LF, NL)

\r return (CR)

\f form feed (FF)

\a alarm (bell) (BEL)

\e escape (think troff) (ESC)

\033 octal char (think of a PDP−11)

\x1B hex char

\c[ control char

\l lowercase next char (think vi)

\u uppercase next char (think vi)

\L lowercase till \E (think vi)

\U uppercase till \E (think vi)

\E end case modification (think vi)

\Q quote (disable) pattern metacharacters till \E

If use locale is in effect, the case map used by \l, \L, \u and \U is taken from the current locale. See

perllocale.

You cannot include a literal $ or @ within a \Q sequence. An unescaped $ or @ interpolates the

corresponding variable, while escaping will cause the literal string \$ to be matched. You‘ll need to write

something like m/\Quser\E\@\Qhost/.

In addition, Perl defines the following:

\w Match a "word" character (alphanumeric plus "_")

\W Match a non−word character

\s Match a whitespace character

\S Match a non−whitespace character

\d Match a digit character

\D Match a non−digit character

A \w matches a single alphanumeric character, not a whole word. To match a word you‘d need to say \w+.

If use locale is in effect, the list of alphabetic characters generated by \w is taken from the current

locale. See perllocale. You may use \w, \W, \s, \S, \d, and \D within character classes (though not as

either end of a range).

Perl defines the following zero−width assertions:

\b Match a word boundary

\B Match a non−(word boundary)

\A Match only at beginning of string

\Z Match only at end of string, or before newline at the end

\z Match only at end of string

\G Match only where previous m//g left off (works only with /g)

A word boundary (\b) is defined as a spot between two characters that has a \w on one side of it and a \W

on the other side of it (in either order), counting the imaginary characters off the beginning and end of the

string as matching a \W. (Within character classes \b represents backspace rather than a word boundary.)

The \A and \Z are just like "^" and "$", except that they won‘t match multiple times when the /m modifier

is used, while "^" and "$" will match at every internal line boundary. To match the actual end of the string,

not ignoring newline, you can use \z. The \G assertion can be used to chain global matches (using m//g),

as described in Regexp Quote−Like Operators in perlop.

It is also useful when writing lex−like scanners, when you have several patterns that you want to match

against consequent substrings of your string, see the previous reference. The actual location where \G will

match can also be influenced by using pos() as an lvalue. See pos.
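
A lex-like scanner of the kind mentioned above might be sketched as follows. The token names and the tiny input language are invented for the example; the /c modifier is used so that a failed match does not reset the position that \G refers to.

```perl
use strict;

my $input = "count = 42";
my @tokens;

# Each pattern is anchored with \G at the position where the previous
# successful match ended; /c keeps that position when a match fails.
while (1) {
    if    ($input =~ /\G\s+/gc)             { next }
    elsif ($input =~ /\G([A-Za-z_]\w*)/gc)  { push @tokens, "IDENT($1)" }
    elsif ($input =~ /\G(\d+)/gc)           { push @tokens, "NUMBER($1)" }
    elsif ($input =~ /\G(=)/gc)             { push @tokens, "OP($1)" }
    else                                    { last }
}

print join(" ", @tokens), "\n";    # IDENT(count) OP(=) NUMBER(42)
```

The loop stops as soon as no pattern matches at the current position, which here happens at the end of the string.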

When the bracketing construct ( ... ) is used, \<digit> matches the digit‘th substring. Outside of the

pattern, always use "$" instead of "\" in front of the digit. (While the \<digit> notation can on rare occasion

work outside the current pattern, this should not be relied upon. See the WARNING below.) The scope of

$<digit> (and $`, $&, and $') extends to the end of the enclosing BLOCK or eval string, or to the next

successful pattern match, whichever comes first. If you want to use parentheses to delimit a subpattern (e.g.,

a set of alternatives) without saving it as a subpattern, follow the ( with a ?:.

You may have as many parentheses as you wish. If you have more than 9 substrings, the variables $10,

$11, ... refer to the corresponding substring. Within the pattern, \10, \11, etc. refer back to substrings if

there have been at least that many left parentheses before the backreference. Otherwise (for backward

compatibility) \10 is the same as \010, a backspace, and \11 the same as \011, a tab. And so on. (\1 through

\9 are always backreferences.)
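
As a brief sketch of a backreference inside a pattern, the following looks for an accidentally doubled word. The sample sentence is made up for the illustration.

```perl
use strict;

my $text = "Paris in the the spring";

# \1 must match exactly the text that the first pair of
# parentheses matched earlier in the same pattern.
if ($text =~ /\b(\w+)\s+\1\b/) {
    print "doubled word: $1\n";    # doubled word: the
}
```

Inside the pattern the doubled word is referred to as \1, but once the match is over it is read as $1.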

$+ returns whatever the last bracket match matched. $& returns the entire matched string. ($0 used to

return the same thing, but not any more.) $` returns everything before the matched string. $' returns

everything after the matched string. Examples:

s/^([^ ]*) *([^ ]*)/$2 $1/; # swap first two words

if (/Time: (..):(..):(..)/) {

$hours = $1;

$minutes = $2;

$seconds = $3;

}

Once perl sees that you need one of $&, $` or $' anywhere in the program, it has to provide them on each

and every pattern match. This can slow your program down. The same mechanism that handles these

provides for the use of $1, $2, etc., so you pay the same price for each pattern that contains capturing

parentheses. But if you never use $&, etc., in your script, then patterns without capturing parentheses won‘t

be penalized. So avoid $&, $`, and $' if you can, but if you can't (and some algorithms really appreciate

them), once you‘ve used them once, use them at will, because you‘ve already paid the price. As of 5.005,

$& is not so costly as the other two.

Backslashed metacharacters in Perl are alphanumeric, such as \b, \w, \n. Unlike some other regular

expression languages, there are no backslashed symbols that aren‘t alphanumeric. So anything that looks

like \\, \(, \), \<, \>, \{, or \} is always interpreted as a literal character, not a metacharacter. This was once

used in a common idiom to disable or quote the special meanings of regular expression metacharacters in a

string that you want to use for a pattern. Simply quote all non−alphanumeric characters:

$pattern =~ s/(\W)/\\$1/g;

Now it is much more common to see either the quotemeta() function or the \Q escape sequence used to

disable all metacharacters’ special meanings like this:

/$unquoted\Q$quoted\E$unquoted/
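
The following sketch shows both forms at work on a string that contains metacharacters. The sample strings and variable names are assumptions made for the example.

```perl
use strict;

my $literal = 'a.b*c';

my $quoted = quotemeta($literal);
print "quoted: $quoted\n";                               # quoted: a\.b\*c

# Unquoted, "." and "*" keep their special meanings ...
my $loose  = ('aXbbc'     =~ /$literal/)     ? 1 : 0;    # 1
# ... but inside \Q...\E only the literal text matches.
my $strict = ('aXbbc'     =~ /\Q$literal\E/) ? 1 : 0;    # 0
my $exact  = ('xxa.b*cxx' =~ /\Q$literal\E/) ? 1 : 0;    # 1

print "$loose $strict $exact\n";
```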

Perl defines a consistent extension syntax for regular expressions. The syntax is a pair of parentheses with a

question mark as the first thing within the parentheses (this was a syntax error in older versions of Perl). The

character after the question mark gives the function of the extension. Several extensions are already

supported:

(?#text) A comment. The text is ignored. If the /x switch is used to enable whitespace formatting, a

simple # will suffice. Note that perl closes the comment as soon as it sees a ), so there is no

way to put a literal ) in the comment.

(?:pattern)

(?imsx-imsx:pattern)

This is for clustering, not capturing; it groups subexpressions like "()", but doesn‘t make

backreferences as "()" does. So

@fields = split(/\b(?:a|b|c)\b/)

is like

@fields = split(/\b(a|b|c)\b/)

but doesn‘t spit out extra fields.
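
The difference is easy to see with a small sketch (the record format is made up for the example):

```perl
use strict;

my $record = "1 a 2 b 3";

# Capturing parentheses make split() return the separators as well.
my @capturing  = split(/\s*(a|b)\s*/,   $record);
# Clustering with (?:...) returns only the fields.
my @clustering = split(/\s*(?:a|b)\s*/, $record);

print "capturing:  @capturing\n";     # capturing:  1 a 2 b 3
print "clustering: @clustering\n";    # clustering: 1 2 3
```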

The letters between ? and : act as flag modifiers; see (?imsx-imsx). In particular,

/(?s-i:more.*than).*million/i

is equivalent to the more verbose

/(?:(?s-i)more.*than).*million/i

(?=pattern)

A zero−width positive lookahead assertion. For example, /\w+(?=\t)/ matches a word

followed by a tab, without including the tab in $&.

(?!pattern)

A zero−width negative lookahead assertion. For example /foo(?!bar)/ matches any

occurrence of "foo" that isn‘t followed by "bar". Note however that lookahead and

lookbehind are NOT the same thing. You cannot use this for lookbehind.

If you are looking for a "bar" that isn‘t preceded by a "foo", /(?!foo)bar/ will not do

what you want. That‘s because the (?!foo) is just saying that the next thing cannot be

"foo"—and it‘s not, it‘s a "bar", so "foobar" will match. You would have to do something

like /(?!foo)...bar/ for that. We say "like" because there‘s the case of your "bar" not

having three characters before it. You could cover that this way:

/(?:(?!foo)...|^.{0,2})bar/. Sometimes it‘s still easier just to say:

if (/bar/ && $` !~ /foo$/)

For lookbehind see below.

(?<=pattern)

A zero−width positive lookbehind assertion. For example, /(?<=\t)\w+/ matches a word

following a tab, without including the tab in $&. Works only for fixed−width lookbehind.

(?<!pattern)

A zero−width negative lookbehind assertion. For example /(?<!bar)foo/ matches any

occurrence of "foo" that isn‘t following "bar". Works only for fixed−width lookbehind.
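
The difference between lookahead and lookbehind when searching for a "bar" that isn't preceded by a "foo" can be sketched like this (the test strings are chosen for the illustration):

```perl
use strict;

my @results;
for my $s ("foobar", "bazbar") {
    # (?!foo) looks forward from where "bar" starts, so it never sees the "foo".
    my $ahead  = ($s =~ /(?!foo)bar/)  ? "match" : "no match";
    # (?<!foo) looks at the three characters just before "bar".
    my $behind = ($s =~ /(?<!foo)bar/) ? "match" : "no match";
    push @results, "$s: lookahead $ahead, lookbehind $behind";
}

print "$_\n" for @results;
# foobar: lookahead match, lookbehind no match
# bazbar: lookahead match, lookbehind match
```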

(?{ code })

Experimental "evaluate any Perl code" zero−width assertion. Always succeeds. code is not

interpolated. Currently the rules to determine where the code ends are somewhat

convoluted.

The code is properly scoped in the following sense: if the assertion is backtracked (compare

"Backtracking"), all the changes introduced after localisation are undone, so

$_ = 'a' x 8;

m<

(?{ $cnt = 0 }) # Initialize $cnt.

(

a

(?{

local $cnt = $cnt + 1; # Update $cnt, backtracking-safe.

})

)*

aaaa

(?{ $res = $cnt }) # On success copy to non-localized

# location.

>x;

will set $res = 4. Note that after the match $cnt returns to the globally introduced value

0, since the scopes which restrict local statements are unwound.

This assertion may be used as a (?(condition)yes-pattern|no-pattern) switch.

If not used in this way, the result of evaluation of code is put into variable $^R. This

happens immediately, so $^R can be used from other (?{ code }) assertions inside the

same regular expression.

The above assignment to $^R is properly localized, thus the old value of $^R is restored if

the assertion is backtracked (compare "Backtracking").

Due to security concerns, this construction is not allowed if the regular expression involves

run-time interpolation of variables, unless the use re 'eval' pragma is used (see re), or the

variables contain results of the qr// operator (see qr/STRING/imosx in perlop).

This restriction is due to the wide−spread (questionable) practice of using the construct

$re = <>;

chomp $re;

$string =~ /$re/;

without tainting. While this code is frowned upon from a security point of view, when (?{})

was introduced, it was considered bad to add new security holes to existing scripts.

NOTE: Use of the above insecure snippet without also enabling taint mode is to be severely

frowned upon. use re 'eval' does not disable tainting checks, thus to allow $re in the

above snippet to contain (?{}) with tainting enabled, one needs both to use re 'eval'

and to untaint $re.

(?>pattern)

An "independent" subexpression. Matches the substring that a standalone pattern would

match if anchored at the given position, and only this substring.

Say, ^(?>a*)ab will never match, since (?>a*) (anchored at the beginning of string, as

above) will match all characters a at the beginning of string, leaving no a for ab to match. In

contrast, a*ab will match the same as a+b, since the match of the subgroup a* is influenced

by the following group ab (see "Backtracking"). In particular, a* inside a*ab will match

fewer characters than a standalone a*, since this makes the tail match.
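
The two cases just described can be checked with a short sketch (the test string is chosen for the illustration):

```perl
use strict;

my $s = "aaab";

# (?>a*) grabs every "a" and never gives one back, so "ab" cannot follow.
my $independent = ($s =~ /^(?>a*)ab/) ? 1 : 0;    # 0
# An ordinary a* backtracks by one "a" to let "ab" match.
my $ordinary    = ($s =~ /^a*ab/)     ? 1 : 0;    # 1

print "independent: $independent, ordinary: $ordinary\n";
```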

An effect similar to (?>pattern) may be achieved by

(?=(pattern))\1

since the lookahead is in "logical" context, thus matches the same substring as a standalone

a+. The following \1 eats the matched string, thus making a zero−length assertion into an

analogue of (?>...). (The difference between these two constructs is that the second one

uses a catching group, thus shifting ordinals of backreferences in the rest of a regular

expression.)

This construct is useful for optimizations of "eternal" matches, because it will not backtrack

(see "Backtracking").

m{ \(

(

[^()]+

|

\( [^()]* \)

)+

\)

}x

That will efficiently match a nonempty group with matching two−or−less−level−deep

parentheses. However, if there is no such group, it will take virtually forever on a long string.

That‘s because there are so many different ways to split a long string into several substrings.

This is what (.+)+ is doing, and (.+)+ is similar to a subpattern of the above pattern.

Consider that the above pattern detects no−match on ((()aaaaaaaaaaaaaaaaaa in

several seconds, but that each extra letter doubles this time. This exponential performance

will make it appear that your program has hung.

However, a tiny modification of this pattern

m{ \(

(

(?> [^()]+ )

|

\( [^()]* \)

)+

\)

}x

which uses (?>...) matches exactly when the one above does (verifying this yourself

would be a productive exercise), but finishes in a fourth the time when used on a similar

string with 1000000 a's. Be aware, however, that this pattern currently triggers a warning

message under -w saying it "matches the null string many times".

On simple groups, such as the pattern (?> [^()]+ ), a comparable effect may be achieved by

negative lookahead, as in [^()]+ (?! [^()] ). This was only 4 times slower on a

string with 1000000 a's.

(?(condition)yes-pattern|no-pattern)

(?(condition)yes-pattern)

Conditional expression. (condition) should be either an integer in parentheses (which is

valid if the corresponding pair of parentheses matched), or lookahead/lookbehind/evaluate

zero−width assertion.

Say,

m{ ( \( )?

[^()]+

(?(1) \) )

}x

matches a chunk of non−parentheses, possibly included in parentheses themselves.
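
A sketch of this conditional in use follows. The \A and \z anchors and the test strings are additions made for the example, so that an unbalanced parenthesis shows up as a failed match.

```perl
use strict;

my $chunk = qr{ \A ( \( )? [^()]+ (?(1) \) ) \z }x;

for my $s ("(abc)", "abc", "(abc", "abc)") {
    printf "%-6s %s\n", $s, ($s =~ $chunk ? "match" : "no match");
}
# (abc)  match
# abc    match
# (abc   no match
# abc)   no match
```

The closing parenthesis is required only when group 1 (the opening parenthesis) took part in the match.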

(?imsx-imsx)

One or more embedded pattern−match modifiers. This is particularly useful for patterns that

are specified in a table somewhere, some of which want to be case sensitive, and some of

which don‘t. The case insensitive ones need to include merely (?i) at the front of the

pattern. For example:

$pattern = "foobar";

if ( /$pattern/i ) { }

# more flexible:

$pattern = "(?i)foobar";

if ( /$pattern/ ) { }

Letters after - switch modifiers off.

These modifiers are localized inside an enclosing group (if any). Say,

( (?i) blah ) \s+ \1

(assuming x modifier, and no i modifier outside of this group) will match a repeated

(including the case!) word blah in any case.
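
The localization of (?i) to its enclosing group can be sketched as follows (the test strings are made up for the illustration):

```perl
use strict;

# (?i) applies only inside the group; the \1 outside it is case sensitive.
my $re = qr/ ( (?i) blah ) \s+ \1 /x;

my $same_case  = ("Blah Blah" =~ $re) ? 1 : 0;    # 1: exact repetition
my $other_case = ("blah BLAH" =~ $re) ? 1 : 0;    # 0: the repetition differs in case

print "$same_case $other_case\n";
```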

A question mark was chosen for this and for the new minimal−matching construct because 1) question mark

is pretty rare in older regular expressions, and 2) whenever you see one, you should stop and "question"

exactly what is going on. That‘s psychology...

Backtracking

A fundamental feature of regular expression matching involves the notion called backtracking, which is

currently used (when needed) by all regular expression quantifiers, namely *, *?, +, +?, {n,m}, and

{n,m}?.

For a regular expression to match, the entire regular expression must match, not just part of it. So if the

beginning of a pattern containing a quantifier succeeds in a way that causes later parts in the pattern to fail,

the matching engine backs up and recalculates the beginning part—that‘s why it‘s called backtracking.

Here is an example of backtracking: Let‘s say you want to find the word following "foo" in the string "Food

is on the foo table.":

$_ = "Food is on the foo table.";

if ( /\b(foo)\s+(\w+)/i ) {

print "$2 follows $1.\n";

}

When the match runs, the first part of the regular expression (\b(foo)) finds a possible match right at the

beginning of the string, and loads up $1 with "Foo". However, as soon as the matching engine sees that

there‘s no whitespace following the "Foo" that it had saved in $1, it realizes its mistake and starts over

again one character after where it had the tentative match. This time it goes all the way until the next

occurrence of "foo". The complete regular expression matches this time, and you get the expected output of

"table follows foo."

Sometimes minimal matching can help a lot. Imagine you‘d like to match everything between "foo" and

"bar". Initially, you write something like this:

$_ = "The food is under the bar in the barn.";

if ( /foo(.*)bar/ ) {

print "got <$1>\n";

}

Which perhaps unexpectedly yields:

got <d is under the bar in the >

That‘s because .* was greedy, so you get everything between the first "foo" and the last "bar". In this case,

it‘s more effective to use minimal matching to make sure you get the text between a "foo" and the first "bar"

thereafter.

if ( /foo(.*?)bar/ ) { print "got <$1>\n" }

got <d is under the >

Here‘s another example: let‘s say you‘d like to match a number at the end of a string, and you also want to

keep the preceding part of the match. So you write this:

$_ = "I have 2 numbers: 53147";

if ( /(.*)(\d*)/ ) { # Wrong!

print "Beginning is <$1>, number is <$2>.\n";

}

That won‘t work at all, because .* was greedy and gobbled up the whole string. As \d* can match on an

empty string the complete regular expression matched successfully.

Beginning is <I have 2 numbers: 53147>, number is <>.

Here are some variants, most of which don‘t work:

$_ = "I have 2 numbers: 53147";

@pats = qw{

(.*)(\d*)

(.*)(\d+)

(.*?)(\d*)

(.*?)(\d+)

(.*)(\d+)$

(.*?)(\d+)$

(.*)\b(\d+)$

(.*\D)(\d+)$

};

for $pat (@pats) {

printf "%-12s ", $pat;

if ( /$pat/ ) {

print "<$1> <$2>\n";

} else {

print "FAIL\n";

}

}

That will print out:

(.*)(\d*) <I have 2 numbers: 53147> <>

(.*)(\d+) <I have 2 numbers: 5314> <7>

(.*?)(\d*) <> <>

(.*?)(\d+) <I have > <2>

(.*)(\d+)$ <I have 2 numbers: 5314> <7>

(.*?)(\d+)$ <I have 2 numbers: > <53147>

(.*)\b(\d+)$ <I have 2 numbers: > <53147>

(.*\D)(\d+)$ <I have 2 numbers: > <53147>

As you see, this can be a bit tricky. It‘s important to realize that a regular expression is merely a set of

assertions that gives a definition of success. There may be 0, 1, or several different ways that the definition

might succeed against a particular string. And if there are multiple ways it might succeed, you need to

understand backtracking to know which variety of success you will achieve.

When using lookahead assertions and negations, this can all get even trickier. Imagine you'd like to find a

sequence of non−digits not followed by "123". You might try to write that as

$_ = "ABC123";

if ( /^\D*(?!123)/ ) { # Wrong!

print "Yup, no 123 in $_\n";

}

But that isn‘t going to match; at least, not the way you‘re hoping. It claims that there is no 123 in the string.

Here's a clearer picture of why that pattern matches, contrary to popular expectations:

$x = 'ABC123' ;

$y = 'ABC445' ;

print "1: got $1\n" if $x =~ /^(ABC)(?!123)/ ;

print "2: got $1\n" if $y =~ /^(ABC)(?!123)/ ;

print "3: got $1\n" if $x =~ /^(\D*)(?!123)/ ;

print "4: got $1\n" if $y =~ /^(\D*)(?!123)/ ;

This prints

2: got ABC

3: got AB

4: got ABC

You might have expected test 3 to fail because it seems to be a more general purpose version of test 1. The

important difference between them is that test 3 contains a quantifier (\D*) and so can use backtracking,

whereas test 1 will not. What‘s happening is that you‘ve asked "Is it true that at the start of $x, following 0

or more non-digits, you have something that's not 123?" If the pattern matcher had let \D* expand to

"ABC", this would have caused the whole pattern to fail. The search engine will initially match \D* with

"ABC". Then it will try to match (?!123) with "123", which of course fails. But because a quantifier

(\D*) has been used in the regular expression, the search engine can backtrack and retry the match

differently in the hope of matching the complete regular expression.

The pattern really, really wants to succeed, so it uses the standard pattern back−off−and−retry and lets \D*

expand to just "AB" this time. Now there‘s indeed something following "AB" that is not "123". It‘s in fact

"C123", which suffices.

We can deal with this by using both an assertion and a negation. We‘ll say that the first part in $1 must be

followed by a digit, and in fact, it must also be followed by something that‘s not "123". Remember that the

lookaheads are zero−width expressions—they only look, but don‘t consume any of the string in their match.

So rewriting this way produces what you‘d expect; that is, case 5 will fail, but case 6 succeeds:

print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/ ;

print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/ ;

6: got ABC

In other words, the two zero−width assertions next to each other work as though they‘re ANDed together,

just as you‘d use any builtin assertions: /^$/ matches only if you‘re at the beginning of the line AND the

end of the line simultaneously. The deeper underlying truth is that juxtaposition in regular expressions

always means AND, except when you write an explicit OR using the vertical bar. /ab/ means match "a"

AND (then) match "b", although the attempted matches are made at different positions because "a" is not a

zero−width assertion, but a one−width assertion.

One warning: particularly complicated regular expressions can take exponential time to solve due to the

immense number of possible ways they can use backtracking to try to match. For example, a pattern like this

can take a very long time to run when the rest of the match forces the engine to try every combination:

/((a{0,5}){0,5}){0,5}/

And if you used *‘s instead of limiting it to 0 through 5 matches, then it would take literally forever—or

until you ran out of stack space.

A powerful tool for optimizing such beasts is "independent" groups, which do not backtrack (see

(?>pattern)). Note also that zero-length lookahead/lookbehind assertions will not backtrack to make

the tail match, since they are in "logical" context: only whether they match or not is considered

relevant. For an example where side-effects of a lookahead might have influenced the following match, see

(?>pattern).
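A small comparison may make the effect of an independent group concrete. This is a sketch with an arbitrary test string, not a benchmark; the variable names are illustrative only.

```perl
use strict;

my $text = 'aaab';

# Ordinary quantifier: a* first takes "aaa", then gives one "a" back
# so that the trailing "ab" can match.
my $ordinary = $text =~ /a*ab/ ? 1 : 0;

# Independent group: (?>a*) keeps every "a" it took and never gives
# one back, so the trailing "ab" cannot find its "a".
my $independent = $text =~ /(?>a*)ab/ ? 1 : 0;

print "a*ab     matched: $ordinary\n";
print "(?>a*)ab matched: $independent\n";
```

Because the group refuses to back off, the engine has far fewer combinations to try, which is where the speed-up comes from; the price is that some strings no longer match.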

Version 8 Regular Expressions

In case you‘re not familiar with the "regular" Version 8 regex routines, here are the pattern−matching rules

not described above.

Any single character matches itself, unless it is a metacharacter with a special meaning described here or

above. You can cause characters that normally function as metacharacters to be interpreted literally by

prefixing them with a "\" (e.g., "\." matches a ".", not any character; "\\" matches a "\"). A series of

characters matches that series of characters in the target string, so the pattern blurfl would match "blurfl"

in the target string.

You can specify a character class, by enclosing a list of characters in [], which will match any one character

from the list. If the first character after the "[" is "^", the class matches any character not in the list. Within a

list, the "−" character is used to specify a range, so that a−z represents all characters between "a" and "z",

inclusive. If you want "−" itself to be a member of a class, put it at the start or end of the list, or escape it

with a backslash. (The following all specify the same class of three characters: [−az], [az−], and

[a\−z]. All are different from [a−z], which specifies a class containing twenty−six characters.)
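The four classes mentioned above can be compared directly. The following is a minimal sketch; the probe characters are chosen arbitrarily.

```perl
use strict;

# Each of these three classes holds just "-", "a" and "z".
my @three = (qr/^[-az]$/, qr/^[az-]$/, qr/^[a\-z]$/);

# This one is a range covering the twenty-six lowercase letters.
my $range = qr/^[a-z]$/;

for my $char ('-', 'a', 'm', 'z') {
    my $hits     = grep { $char =~ $_ } @three;
    my $in_three = $hits == @three ? 'yes' : 'no';
    my $in_range = $char =~ $range ? 'yes' : 'no';
    print "'$char'  three-character classes: $in_three  [a-z]: $in_range\n";
}
```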

Characters may be specified using a metacharacter syntax much like that used in C: "\n" matches a newline,

"\t" a tab, "\r" a carriage return, "\f" a form feed, etc. More generally, \nnn, where nnn is a string of octal

digits, matches the character whose ASCII value is nnn. Similarly, \xnn, where nn are hexadecimal digits,

matches the character whose ASCII value is nn. The expression \cx matches the ASCII character control−x.

Finally, the "." metacharacter matches any character except "\n" (unless you use /s).

You can specify a series of alternatives for a pattern using "|" to separate them, so that fee|fie|foe will

match any of "fee", "fie", or "foe" in the target string (as would f(e|i|o)e). The first alternative includes

everything from the last pattern delimiter ("(", "[", or the beginning of the pattern) up to the first "|", and the

last alternative contains everything from the last "|" to the next pattern delimiter. For this reason, it‘s

common practice to include alternatives in parentheses, to minimize confusion about where they start and

end.

Alternatives are tried from left to right, so the first alternative found for which the entire expression matches,

is the one that is chosen. This means that alternatives are not necessarily greedy. For example: when matching

foo|foot against "barefoot", only the "foo" part will match, as that is the first alternative tried, and it

successfully matches the target string. (This might not seem important, but it is important when you are

capturing matched text using parentheses.)
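The effect on captured text is easy to see by swapping the order of the alternatives. A minimal sketch:

```perl
use strict;

my $word = 'barefoot';

# "foo" is listed first and lets the whole pattern succeed, so it wins.
my ($short_first) = $word =~ /(foo|foot)/;

# Listing the longer alternative first changes what is captured.
my ($long_first) = $word =~ /(foot|foo)/;

print "foo|foot captured: $short_first\n";   # foo
print "foot|foo captured: $long_first\n";    # foot
```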

Also remember that "|" is interpreted as a literal within square brackets, so if you write [fee|fie|foe]

you‘re really only matching [feio|].

Within a pattern, you may designate subpatterns for later reference by enclosing them in parentheses, and

you may refer back to the nth subpattern later in the pattern using the metacharacter \n. Subpatterns are

numbered based on the left to right order of their opening parenthesis. A backreference matches whatever

actually matched the subpattern in the string being examined, not the rules for that subpattern. Therefore,

(0|0x)\d*\s\1\d* will match "0x1234 0x4321", but not "0x1234 01234", because subpattern 1 actually

matched "0x", even though the rule 0|0x could potentially match the leading 0 in the second number.
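The backreference example can be tried as written. This sketch simply runs the pattern from the paragraph above against both strings.

```perl
use strict;

my $pair = qr/(0|0x)\d*\s\1\d*/;

for my $text ('0x1234 0x4321', '0x1234 01234') {
    # \1 must repeat the text that group 1 actually matched ("0x" here).
    my $verdict = $text =~ $pair ? "matches, group 1 is '$1'" : 'does not match';
    print "'$text' $verdict\n";
}
```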

WARNING on \1 vs $1

Some people get too used to writing things like:

$pattern =~ s/(\W)/\\\1/g;

This is grandfathered for the RHS of a substitute to avoid shocking the sed addicts, but it‘s a dirty habit to

get into. That‘s because in PerlThink, the righthand side of a s/// is a double−quoted string. \1 in the

usual double−quoted string means a control−A. The customary Unix meaning of \1 is kludged in for s///.

However, if you get into the habit of doing that, you get yourself into trouble if you then add an /e

modifier.

s/(\d+)/ \1 + 1 /eg; # causes warning under −w

Or if you try to do

s/(\d+)/\1000/;

You can‘t disambiguate that by saying \{1}000, whereas you can fix it with ${1}000. Basically, the

operation of interpolation should not be confused with the operation of matching a backreference. Certainly

they mean two different things on the left side of the s///.
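For comparison, here is how the two troublesome cases look when written with $1. The input strings are made up for illustration.

```perl
use strict;

# ${1}000 keeps the capture number apart from the literal digits after it.
(my $price = 'cost: 42') =~ s/(\d+)/${1}000/;

# With /e the right-hand side is Perl code, so the capture is written as $1.
(my $bumped = 'a1 b22') =~ s/(\d+)/$1 + 1/eg;

print "$price\n";    # cost: 42000
print "$bumped\n";   # a2 b23
```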

Repeated patterns matching zero−length substring

WARNING: Difficult material (and prose) ahead. This section needs a rewrite.

Regular expressions provide a terse and powerful programming language. As with most other power tools,

power comes together with the ability to wreak havoc.

A common abuse of this power stems from the ability to make infinite loops using regular expressions, with

something as innocuous as:

    'foo' =~ m{ ( o? )* }x;

The o? can match at the beginning of ‘foo’, and since the position in the string is not moved by the match,

o? would match again and again due to the * modifier. Another common way to create a similar cycle is

with the looping modifier //g:

    @matches = ( 'foo' =~ m{ o? }xg );

    print "match: <$&>\n" while 'foo' =~ m{ o? }xg;

or the loop implied by split().

However, long experience has shown that many programming tasks may be significantly simplified by using

repeated subexpressions which may match zero−length substrings, with a simple example being:

@chars = split //, $string; # // is not magic in split

($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /

Thus Perl allows the /()/ construct, which forcefully breaks the infinite loop. The rules for this are

different for lower−level loops given by the greedy modifiers *+{}, and for higher−level ones like the /g

modifier or split() operator.

The lower−level loops are interrupted when it is detected that a repeated expression did match a zero−length

substring, thus

m{ (?: NON_ZERO_LENGTH | ZERO_LENGTH )* }x;

is made equivalent to

m{ (?: NON_ZERO_LENGTH )*

(?: ZERO_LENGTH )?

}x;

The higher−level loops preserve an additional state between iterations: whether the last match was

zero−length. To break the loop, the following match after a zero−length match is prohibited to have a

length of zero. This prohibition interacts with backtracking (see "Backtracking"), and so the second best

match is chosen if the best match is of zero length.

Say,

    $_ = 'bar';

    s/\w??/<$&>/g;

results in "<><b><><a><><r><>". At each position of the string the best match given by non-greedy ?? is the

zero−length match, and the second best match is what is matched by \w. Thus zero−length matches

alternate with one−character−long matches.
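The three higher-level loops discussed in this section can be exercised together. This is a sketch; it only shows that each loop terminates and what the substitution and split produce.

```perl
use strict;

# After an empty match, the next match may not be empty at the same
# position, so empty and one-character matches alternate.
(my $marked = 'bar') =~ s/\w??/<$&>/g;

# split with an empty pattern returns the individual characters.
my @chars = split //, 'bar';

# A pattern that can match the empty string under //g still moves forward.
my @matches = ('foo' =~ m{ o? }xg);

print "$marked\n";                  # <><b><><a><><r><>
print join('|', @chars), "\n";      # b|a|r
print scalar(@matches), " matches from m{ o? }xg\n";
```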

Similarly, for repeated m/()/g the second−best match is the match at the position one notch further in the

string.

The additional state of being matched with zero−length is associated with the matched string, and is reset by

each assignment to pos().

Creating custom RE engines

Overloaded constants (see overload) provide a simple way to extend the functionality of the RE engine.

Suppose that we want to enable a new RE escape−sequence \Y| which matches at a boundary between

white−space characters and non−whitespace characters. Note that (?=\S)(?<!\S)|(?!\S)(?<=\S)

matches exactly at these positions, so we want to have each \Y| in the place of the more complicated

version. We can create a module customre to do this:

    package customre;
    use overload;

    sub import {
        shift;
        die "No argument to customre::import allowed" if @_;
        overload::constant 'qr' => \&convert;
    }

    sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'" }

    # An escaped backslash must stay escaped in the converted pattern.
    my %rules = ( '\\' => '\\\\',
                  'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/ );

    sub convert {
        my $re = shift;
        $re =~ s{
                  \\ ( \\ | Y . )
                }
                { $rules{$1} or invalid($re,$1) }sgex;
        return $re;
    }

Now use customre enables the new escape in constant regular expressions, i.e., those without any

runtime variable interpolations. As documented in overload, this conversion will work only over literal parts

of regular expressions. For \Y|$re\Y| the variable part of this regular expression needs to be converted

explicitly (but only if the special meaning of \Y| should be enabled inside $re):

use customre;

$re = <>;

chomp $re;

$re = customre::convert $re;

/\Y|$re\Y|/;

SEE ALSO

Regexp Quote−Like Operators in perlop.

Gory details of parsing quoted constructs in perlop.

pos.

perllocale.

Mastering Regular Expressions (see perlbook) by Jeffrey Friedl.

NAME

perlrun − how to execute the Perl interpreter

SYNOPSIS

    perl [ -sTuU ]
         [ -hv ] [ -V[:configvar] ]
         [ -cw ] [ -d[:debugger] ] [ -D[number/list] ]
         [ -pna ] [ -Fpattern ] [ -l[octal] ] [ -0[octal] ]
         [ -Idir ] [ -m[-]module ] [ -M[-]'module...' ]
         [ -P ]
         [ -S ]
         [ -x[dir] ]
         [ -i[extension] ]
         [ -e 'command' ] [ -- ] [ programfile ] [ argument ]...

DESCRIPTION

Upon startup, Perl looks for your script in one of the following places:

1. Specified line by line via −e switches on the command line.

2. Contained in the file specified by the first filename on the command line. (Note that systems

supporting the #! notation invoke interpreters this way. See Location of Perl.)

3. Passed in implicitly via standard input. This works only if there are no filename arguments—to pass

arguments to a STDIN script you must explicitly specify a "−" for the script name.

With methods 2 and 3, Perl starts parsing the input file from the beginning, unless you‘ve specified a −x

switch, in which case it scans for the first line starting with #! and containing the word "perl", and starts there

instead. This is useful for running a script embedded in a larger message. (In this case you would indicate

the end of the script using the __END__ token.)

The #! line is always examined for switches as the line is being parsed. Thus, if you‘re on a machine that

allows only one argument with the #! line, or worse, doesn‘t even recognize the #! line, you still can get

consistent switch behavior regardless of how Perl was invoked, even if −x was used to find the beginning of

the script.

Because many operating systems silently chop off kernel interpretation of the #! line after 32 characters,

some switches may be passed in on the command line, and some may not; you could even get a "−" without

its letter, if you‘re not careful. You probably want to make sure that all your switches fall either before or

after that 32 character boundary. Most switches don‘t actually care if they‘re processed redundantly, but

getting a − instead of a complete switch could cause Perl to try to execute standard input instead of your

script. And a partial −I switch could also cause odd results.

Some switches do care if they are processed twice, for instance combinations of −l and −0. Either put all the

switches after the 32 character boundary (if applicable), or replace the use of −0digits by BEGIN{ $/ =

"\0digits"; }.

Parsing of the #! switches starts wherever "perl" is mentioned in the line. The sequences "−*" and "− " are

specifically ignored so that you could, if you were so inclined, say

#!/bin/sh −− # −*− perl −*− −p

eval ’exec /usr/bin/perl −wS $0 ${1+"$@"}’

if $running_under_some_shell;

to let Perl see the −p switch.

If the #! line does not contain the word "perl", the program named after the #! is executed instead of the Perl

interpreter. This is slightly bizarre, but it helps people on machines that don‘t do #!, because they can tell a

program that their SHELL is /usr/bin/perl, and Perl will then dispatch the program to the correct interpreter

for them.

After locating your script, Perl compiles the entire script to an internal form. If there are any compilation

errors, execution of the script is not attempted. (This is unlike the typical shell script, which might run

part−way through before finding a syntax error.)

If the script is syntactically correct, it is executed. If the script runs off the end without hitting an exit()

or die() operator, an implicit exit(0) is provided to indicate successful completion.

#! and quoting on non−Unix systems

Unix‘s #! technique can be simulated on other systems:

OS/2

Put

extproc perl −S −your_switches

as the first line of the *.cmd file (-S is needed because of a bug in cmd.exe's 'extproc' handling).

MS−DOS

Create a batch file to run your script, and codify it in ALTERNATIVE_SHEBANG (see the dosish.h file

in the source distribution for more information).

Win95/NT

The Win95/NT installation, when using the Activeware port of Perl, will modify the Registry to

associate the .pl extension with the perl interpreter. If you install another port of Perl, including the

one in the Win32 directory of the Perl distribution, then you‘ll have to modify the Registry yourself.

Note that this means you can no longer tell the difference between an executable Perl program and a

Perl library file.

Macintosh

Macintosh perl scripts will have the appropriate Creator and Type, so that double−clicking them will

invoke the perl application.

Command−interpreters on non−Unix systems have rather different ideas on quoting than Unix shells. You‘ll

need to learn the special characters in your command−interpreter (*, \ and " are common) and how to

protect whitespace and these characters to run one−liners (see −e below).

On some systems, you may have to change single−quotes to double ones, which you must NOT do on Unix

or Plan9 systems. You might also have to change a single % to a %%.

For example:

    # Unix
    perl -e 'print "Hello world\n"'

    # MS-DOS, etc.
    perl -e "print \"Hello world\n\""

    # Macintosh
    print "Hello world\n"
     (then Run "Myscript" or Shift-Command-R)

    # VMS
    perl -e "print ""Hello world\n"""

The problem is that none of this is reliable: it depends on the command and it is entirely possible neither

works. If 4DOS was the command shell, this would probably work better:

perl −e "print <Ctrl−x>"Hello world\n<Ctrl−x>""

CMD.EXE in Windows NT slipped a lot of standard Unix functionality in when nobody was looking, but

just try to find documentation for its quoting rules.

Under the Macintosh, it depends which environment you are using. The MacPerl shell, or MPW, is much

like Unix shells in its support for several quoting variants, except that it makes free use of the Macintosh‘s

non−ASCII characters as control characters.

There is no general solution to all of this. It‘s just a mess.

Location of Perl

It may seem obvious to say, but Perl is useful only when users can easily find it. When possible, it‘s good for

both /usr/bin/perl and /usr/local/bin/perl to be symlinks to the actual binary. If that can‘t be done, system

administrators are strongly encouraged to put (symlinks to) perl and its accompanying utilities, such as

perldoc, into a directory typically found along a user‘s PATH, or in another obvious and convenient place.

In this documentation, #!/usr/bin/perl on the first line of the script will stand in for whatever method

works on your system.

Switches

A single−character switch may be combined with the following switch, if any.

#!/usr/bin/perl −spi.bak # same as −s −p −i.bak

Switches include:

-0[digits]

specifies the input record separator ($/) as an octal number. If there are no digits, the null character

is the separator. Other switches may precede or follow the digits. For example, if you have a version

of find which can print filenames terminated by the null character, you can say this:

    find . -name '*.bak' -print0 | perl -n0e unlink

The special value 00 will cause Perl to slurp files in paragraph mode. The value 0777 will cause Perl

to slurp files whole because there is no legal character with that value.
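Inside a script the same three reading modes are selected by setting $/ directly. The sketch below assumes Perl 5.8 or later, because it reads from an in-memory filehandle; the helper name records and the sample text are made up for illustration.

```perl
use strict;

my $text = "one\ntwo\n\nthree\n";

# Hypothetical helper: read every record from $text with a given separator.
sub records {
    my ($separator) = @_;
    local $/ = $separator;
    open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @lines      = records("\n");    # the default: one line per record
my @paragraphs = records("");      # what -00 selects: paragraph mode
my @whole      = records(undef);   # what -0777 selects: the whole input at once

print scalar(@lines), " lines, ", scalar(@paragraphs), " paragraphs, ",
      scalar(@whole), " whole-file record\n";
```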

−a turns on autosplit mode when used with a −n or −p. An implicit split command to the @F array is

done as the first thing inside the implicit while loop produced by the −n or −p.

    perl -ane 'print pop(@F), "\n";'

is equivalent to

    while (<>) {
        @F = split(' ');
        print pop(@F), "\n";
    }

An alternate delimiter may be specified using −F.

−c causes Perl to check the syntax of the script and then exit without executing it. Actually, it will

execute BEGIN, END, and use blocks, because these are considered as occurring outside the

execution of your program.

−d runs the script under the Perl debugger. See perldebug.

-d:foo

runs the script under the control of a debugging or tracing module installed as Devel::foo. E.g.,

−d:DProf executes the script using the Devel::DProf profiler. See perldebug.

-Dletters

-Dnumber

sets debugging flags. To watch how it executes your script, use −Dtls. (This works only if

debugging is compiled into your Perl.) Another nice value is −Dx, which lists your compiled syntax

tree. And −Dr displays compiled regular expressions. As an alternative, specify a number instead of

a list of letters (e.g., -D14 is equivalent to -Dtls):

        1  p  Tokenizing and parsing
        2  s  Stack snapshots
        4  l  Context (loop) stack processing
        8  t  Trace execution
       16  o  Method and overloading resolution
       32  c  String/numeric conversions
       64  P  Print preprocessor command for -P
      128  m  Memory allocation
      256  f  Format processing
      512  r  Regular expression parsing and execution
     1024  x  Syntax tree dump
     2048  u  Tainting checks
     4096  L  Memory leaks (needs -DLEAKTEST when compiling Perl)
     8192  H  Hash dump -- usurps values()
    16384  X  Scratchpad allocation
    32768  D  Cleaning up
    65536  S  Thread synchronization

All these flags require −DDEBUGGING when you compile the Perl executable. This flag is

automatically set if you include the -g option when Configure asks you about optimizer/debugger

flags.
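The number form is simply the sum of the values of the letters. The following sketch does that arithmetic; the flag values are copied from the table above, and the helper debug_number is a made-up name, not part of Perl.

```perl
use strict;

# Flag values copied from the table above.
my %debug_bit = (
    'p' => 1,    's' => 2,    'l' => 4,     't' => 8,     'o' => 16,
    'c' => 32,   'P' => 64,   'm' => 128,   'f' => 256,   'r' => 512,
    'x' => 1024, 'u' => 2048, 'L' => 4096,  'H' => 8192,  'X' => 16384,
    'D' => 32768, 'S' => 65536,
);

# Hypothetical helper: turn a letter list such as "tls" into its number.
sub debug_number {
    my ($letters) = @_;
    my $number = 0;
    for my $letter (split //, $letters) {
        die "unknown -D letter '$letter'\n" unless exists $debug_bit{$letter};
        $number |= $debug_bit{$letter};
    }
    return $number;
}

print "-Dtls is -D", debug_number('tls'), "\n";   # -Dtls is -D14
```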

-e commandline

may be used to enter one line of script. If −e is given, Perl will not look for a script filename in the

argument list. Multiple −e commands may be given to build up a multi−line script. Make sure to use

semicolons where you would in a normal program.

-Fpattern

specifies the pattern to split on if -a is also in effect. The pattern may be surrounded by //, "", or

'', otherwise it will be put in single quotes.

−h prints a summary of the options.

-i[extension]

specifies that files processed by the <> construct are to be edited in−place. It does this by renaming

the input file, opening the output file by the original name, and selecting that output file as the default

for print() statements. The extension, if supplied, is used to modify the name of the old file to

make a backup copy, following these rules:

If no extension is supplied, no backup is made and the current file is overwritten.

If the extension doesn‘t contain a * then it is appended to the end of the current filename as a suffix.

If the extension does contain one or more * characters, then each * is replaced with the current

filename. In perl terms you could think of this as:

($backup = $extension) =~ s/\*/$file_name/g;

This allows you to add a prefix to the backup file, instead of (or in addition to) a suffix:

    $ perl -pi'bak_*' -e 's/bar/baz/' fileA        # backup to 'bak_fileA'

Or even to place backup copies of the original files into another directory (provided the directory

already exists):

    $ perl -pi'old/*.bak' -e 's/bar/baz/' fileA    # backup to 'old/fileA.bak'

These sets of one−liners are equivalent:

    $ perl -pi -e 's/bar/baz/' fileA               # overwrite current file
    $ perl -pi'*' -e 's/bar/baz/' fileA            # overwrite current file

    $ perl -pi'.bak' -e 's/bar/baz/' fileA         # backup to 'fileA.bak'
    $ perl -pi'*.bak' -e 's/bar/baz/' fileA        # backup to 'fileA.bak'
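The naming rules can be collected into one small function. This is a sketch of the rules as described above, not the code Perl itself uses; backup_name is a hypothetical helper.

```perl
use strict;

# Hypothetical helper: the backup name -i would use, or undef for no backup.
sub backup_name {
    my ($file_name, $extension) = @_;

    # No extension: no backup is made.
    return undef unless defined($extension) && length($extension);

    # No "*": the extension is a plain suffix.
    return $file_name . $extension if $extension !~ /\*/;

    # Otherwise every "*" stands for the current filename.
    (my $backup = $extension) =~ s/\*/$file_name/g;
    return $backup;
}

for my $extension ('.bak', '*.bak', 'bak_*', 'old/*.bak') {
    print "-i'$extension' backs fileA up to ",
          backup_name('fileA', $extension), "\n";
}
```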

From the shell, saying

$ perl −p −i.bak −e "s/foo/bar/; ... "

is the same as using the script:

#!/usr/bin/perl −pi.bak

s/foo/bar/;

which is equivalent to

    #!/usr/bin/perl
    $extension = '.bak';
    while (<>) {
        if ($ARGV ne $oldargv) {
            if ($extension !~ /\*/) {
                $backup = $ARGV . $extension;
            }
            else {
                ($backup = $extension) =~ s/\*/$ARGV/g;
            }
            rename($ARGV, $backup);
            open(ARGVOUT, ">$ARGV");
            select(ARGVOUT);
            $oldargv = $ARGV;
        }
        s/foo/bar/;
    }
    continue {
        print;  # this prints to original filename
    }
    select(STDOUT);

except that the −i form doesn‘t need to compare $ARGV to $oldargv to know when the filename

has changed. It does, however, use ARGVOUT for the selected filehandle. Note that STDOUT is

restored as the default output filehandle after the loop.

As shown above, Perl creates the backup file whether or not any output is actually changed. So this

is just a fancy way to copy files:

    $ perl -p -i'/some/file/path/*' -e 1 file1 file2 file3...

    $ perl -p -i'.bak' -e 1 file1 file2 file3...

You can use eof without parentheses to locate the end of each input file, in case you want to append

to each file, or reset line numbering (see example in eof).

If, for a given file, Perl is unable to create the backup file as specified in the extension then it will

skip that file and continue on with the next one (if it exists).

For a discussion of issues surrounding file permissions and −i, see

Why does Perl let me delete read−only files? Why does −i clobber protected files? Isn‘t this a bug in Perl?.

You cannot use −i to create directories or to strip extensions from files.

Perl does not expand ~, so don‘t do that.

Finally, note that the −i switch does not impede execution when no files are given on the command

line. In this case, no backup is made (the original file cannot, of course, be determined) and

processing proceeds from STDIN to STDOUT as might be expected.

−I

SEE ALSO

See perlref for more about references and closures. See perlxs if you‘d like to learn about calling C

subroutines from perl. See perlmod to learn about bundling up your functions in separate files.

NAME

perlmod − Perl modules (packages and symbol tables)

DESCRIPTION

Packages

Perl provides a mechanism for alternative namespaces to protect packages from stomping on each other‘s

variables. In fact, there‘s really no such thing as a global variable in Perl (although some identifiers default

to the main package instead of the current one). The package statement declares the compilation unit as

being in the given namespace. The scope of the package declaration is from the declaration itself through the

end of the enclosing block, eval, sub, or end of file, whichever comes first (the same scope as the my()

and local() operators). All further unqualified dynamic identifiers will be in this namespace. A package

statement only affects dynamic variables—including those you‘ve used local() on—but not lexical

variables created with my(). Typically it would be the first declaration in a file to be included by the

require or use operator. You can switch into a package in more than one place; it merely influences

which symbol table is used by the compiler for the rest of that block. You can refer to variables and

filehandles in other packages by prefixing the identifier with the package name and a double colon:

$Package::Variable. If the package name is null, the main package is assumed. That is, $::sail

is equivalent to $main::sail.

The old package delimiter was a single quote, but double colon is now the preferred delimiter, in part

because it‘s more readable to humans, and in part because it‘s more readable to emacs macros. It also makes

C++ programmers feel like they know what‘s going on—as opposed to using the single quote as separator,

which was there to make Ada programmers feel like they knew what‘s going on. Because the old−fashioned

syntax is still supported for backwards compatibility, if you try to use a string like "This is $owner's

house", you'll be accessing $owner::s; that is, the $s variable in package owner, which is probably

not what you meant. Use braces to disambiguate, as in "This is ${owner}'s house".
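A short script shows the safe spelling next to the variable that the unbraced form would reach for. The names and values are made up for illustration.

```perl
use strict;

my $owner = 'Fred';

# This is the variable the old-style quote delimiter would name:
# $s in package "owner", not $owner followed by "'s".
$owner::s = 'somebody else entirely';

# Braces end the variable name before the apostrophe.
my $sign = "This is ${owner}'s house";

print "$sign\n";    # This is Fred's house
```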

Packages may be nested inside other packages: $OUTER::INNER::var. This implies nothing about the

order of name lookups, however. All symbols are either local to the current package, or must be fully

qualified from the outer package name down. For instance, there is nowhere within package OUTER that

$INNER::var refers to $OUTER::INNER::var. It would treat package INNER as a totally separate

global package.

Only identifiers starting with letters (or underscore) are stored in a package‘s symbol table. All other

symbols are kept in package main, including all of the punctuation variables like $_. In addition, when

unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, and SIG are

forced to be in package main, even when used for other purposes than their builtin one. Note also that, if

you have a package called m, s, or y, then you can‘t use the qualified form of an identifier because it will be

interpreted instead as a pattern match, a substitution, or a transliteration.

(Variables beginning with underscore used to be forced into package main, but we decided it was more

useful for package writers to be able to use leading underscore to indicate private variables and method

names. $_ is still global though.)

Eval()ed strings are compiled in the package in which the eval() was compiled. (Assignments to

$SIG{}, however, assume the signal handler specified is in the main package. Qualify the signal handler

name if you wish to have a signal handler in a package.) For an example, examine perldb.pl in the Perl

library. It initially switches to the DB package so that the debugger doesn‘t interfere with variables in the

script you are trying to debug. At various points, however, it temporarily switches back to the main

package to evaluate various expressions in the context of the main package (or wherever you came from).

See perldebug.

The special symbol __PACKAGE__ contains the current package, but cannot (easily) be used to construct

variables.

See perlsub for other scoping issues related to my() and local(), and perlref regarding closures.

Symbol Tables

The symbol table for a package happens to be stored in the hash of that name with two colons appended.

The main symbol table's name is thus %main::, or %:: for short. Likewise, the symbol table for the nested

package mentioned earlier is named %OUTER::INNER::.

The value in each entry of the hash is what you are referring to when you use the *name typeglob notation.

In fact, the following have the same effect, though the first is more efficient because it does the symbol table

lookups at compile time:

local *main::foo = *main::bar;

local $main::{foo} = $main::{bar};

You can use this to print out all the variables in a package, for instance. The standard dumpvar.pl library

and the CPAN module Devel::Symdump make use of this.

Assignment to a typeglob performs an aliasing operation, i.e.,

*dick = *richard;

causes variables, subroutines, formats, and file and directory handles accessible via the identifier richard

also to be accessible via the identifier dick. If you want to alias only a particular variable or subroutine,

you can assign a reference instead:

*dick = \$richard;

This makes $richard and $dick the same variable, but leaves @richard and @dick as separate arrays.

Tricky, eh?
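The difference between the aliased scalar and the untouched arrays can be observed directly. A minimal sketch using the same names as above:

```perl
use strict;
use vars qw($richard $dick @richard @dick);

$richard = 'original scalar';
@richard = ('original array');

# Alias only the scalar slot: $dick and $richard become one variable.
*dick = \$richard;

$dick = 'changed through the alias';   # also changes $richard
push @dick, 'a separate array';        # does not touch @richard

print "\$richard is now: $richard\n";
print "\@richard still has ", scalar(@richard), " element\n";
```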

This mechanism may be used to pass and return cheap references into or from subroutines if you don't want

to copy the whole thing. It only works when assigning to dynamic variables, not lexicals.

    %some_hash = ();                    # can't be my()
    *some_hash = fn( \%another_hash );

    sub fn {
        local *hashsym = shift;
        # now use %hashsym normally, and you
        # will affect the caller's %another_hash
        my %nhash = ();                 # do what you want
        return \%nhash;
    }

On return, the reference will overwrite the hash slot in the symbol table specified by the *some_hash

typeglob. This is a somewhat tricky way of passing around references cheaply when you don't want to have

to remember to dereference variables explicitly.

Another use of symbol tables is for making "constant" scalars.

*PI = \3.14159265358979;

Now you cannot alter $PI, which is probably a good thing all in all. This isn‘t the same as a constant

subroutine, which is subject to optimization at compile−time. This isn‘t. A constant subroutine is one

prototyped to take no arguments and to return a constant expression. See perlsub for details on these. The

use constant pragma is a convenient shorthand for these.
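What "cannot alter" means in practice is that an assignment dies at run time. The sketch below traps that error with eval so that it can be inspected; the variable names other than $PI are illustrative.

```perl
use strict;
use vars qw($PI);

*PI = \3.14159265358979;

# Reading works as usual.
my $area = $PI * 2 ** 2;

# Writing dies at run time, so trap the error in order to look at it.
eval { $PI = 3 };
my $error = $@;

print "area of a circle with radius 2: $area\n";
print "assignment failed: $error" if $error;
```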

You can say *foo{PACKAGE} and *foo{NAME} to find out what name and package the *foo symbol

table entry comes from. This may be useful in a subroutine that gets passed typeglobs as arguments:

    sub identify_typeglob {
        my $glob = shift;
        print 'You gave me ', *{$glob}{PACKAGE}, '::', *{$glob}{NAME}, "\n";
    }

    identify_typeglob *foo;
    identify_typeglob *bar::baz;

This prints

You gave me main::foo

You gave me bar::baz

The *foo{THING} notation can also be used to obtain references to the individual elements of *foo; see

perlref.

Package Constructors and Destructors

There are two special subroutine definitions that function as package constructors and destructors. These are

the BEGIN and END routines. The sub is optional for these routines.

A BEGIN subroutine is executed as soon as possible, that is, the moment it is completely defined, even

before the rest of the containing file is parsed. You may have multiple BEGIN blocks within a file—they

will execute in order of definition. Because a BEGIN block executes immediately, it can pull in definitions

of subroutines and such from other files in time to be visible to the rest of the file. Once a BEGIN has run, it

is immediately undefined and any code it used is returned to Perl‘s memory pool. This means you can‘t ever

explicitly call a BEGIN.

An END subroutine is executed as late as possible, that is, when the interpreter is being exited, even if it is

exiting as a result of a die() function. (But not if it‘s polymorphing into another program via exec, or

being blown out of the water by a signal—you have to trap that yourself (if you can).) You may have

multiple END blocks within a file—they will execute in reverse order of definition; that is: last in, first out

(LIFO).
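The ordering rules can be watched in a short script. This is a sketch; the array @order and the messages are made up for illustration.

```perl
use strict;
use vars qw(@order);

# Ordinary code runs only after the whole file has been compiled.
push @order, 'main code';

# BEGIN blocks run during compilation, in order of definition.
BEGIN { push @order, 'first BEGIN' }
BEGIN { push @order, 'second BEGIN' }

# END blocks run at exit, last defined first.
END { print "first END block, runs last\n" }
END { print "second END block, runs first\n" }

print join(', ', @order), "\n";   # first BEGIN, second BEGIN, main code
```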

Inside an END subroutine, $? contains the value that the script is going to pass to exit(). You can modify

$? to change the exit value of the script. Beware of changing $? by accident (e.g. by running something via

system).

Note that when you use the -n and -p switches to Perl, BEGIN and END work just as they do in awk, as a

degenerate case. As currently implemented (and subject to change, since it's inconvenient at best), both

BEGIN and END blocks are run when you use the -c switch for a compile-only syntax check, although your

main code is not.
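The ordering rules above can be seen in a small sketch. The @order array is a hypothetical log invented for this illustration; it is not part of Perl.

```perl
use strict;
use vars qw(@order);    # hypothetical log of which block ran when

BEGIN { push @order, 'BEGIN 1' }
END   { push @order, 'END 1'; print join(', ', @order), "\n" }
BEGIN { push @order, 'BEGIN 2' }
END   { push @order, 'END 2' }

push @order, 'main';
# At exit this prints: BEGIN 1, BEGIN 2, main, END 2, END 1
```

Both BEGIN blocks run during compilation in the order they are defined, the main code runs next, and the END blocks run last in first out when the interpreter exits.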

Perl Classes

There is no special class syntax in Perl, but a package may function as a class if it provides subroutines to act

as methods. Such a package may also derive some of its methods from another class (package) by listing the

other package name in its global @ISA array (which must be a package global, not a lexical).

For more on this, see perltoot and perlobj.
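As a minimal sketch of a package acting as a class, the following uses two hypothetical packages, Animal and Doggie, invented for this illustration. Doggie defines no speak() of its own and finds it through @ISA.

```perl
use strict;

package Animal;                 # hypothetical base class
sub new   { my ($class, %args) = @_; return bless { %args }, $class }
sub speak { my $self = shift; return ref($self) . ' makes a sound' }

package Doggie;                 # hypothetical derived class
use vars qw(@ISA);              # @ISA must be a package global
@ISA = ('Animal');
sub fetch { return 'fetching' }

package main;
my $dog = Doggie->new(Tail => 'short');
print $dog->speak, "\n";        # Doggie makes a sound
```

Note that new() is also inherited: Doggie->new is looked up in Animal, but the object is blessed into Doggie because that is the class name the method receives.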

Perl Modules

A module is just a package that is defined in a library file of the same name, and is designed to be reusable.

It may do this by providing a mechanism for exporting some of its symbols into the symbol table of any

package using it. Or it may function as a class definition and make its semantics available implicitly through

method calls on the class and its objects, without explicit exportation of any symbols. Or it can do a little of

both.

For example, to start a normal module called Some::Module, create a file called Some/Module.pm and start

with this template:

package Some::Module; # assumes Some/Module.pm

use strict;

BEGIN {

use Exporter ();

use vars qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);

# set the version for version checking

$VERSION = 1.00;

# if using RCS/CVS, this may be preferred

$VERSION = do { my @r = (q$Revision: 2.21 $ =~ /\d+/g); sprintf "%d."."%02d"

@ISA = qw(Exporter);

@EXPORT = qw(&func1 &func2);

%EXPORT_TAGS = ( ); # eg: TAG => [ qw!name1 name2! ],

# your exported package globals go here,

# as well as any optionally exported functions

@EXPORT_OK = qw($Var1 %Hashit &func3);

}

use vars @EXPORT_OK;

# non−exported package globals go here

use vars qw(@more $stuff);

# initialize package globals, first exported ones

$Var1 = '';

%Hashit = ();

# then the others (which are still accessible as $Some::Module::stuff)

$stuff = '';

@more = ();

# all file−scoped lexicals must be created before

# the functions below that use them.

# file−private lexicals go here

my $priv_var = '';

my %secret_hash = ();

# here's a file-private function as a closure,

# callable as &$priv_func; it cannot be prototyped.

my $priv_func = sub {

# stuff goes here.

};

# make all your functions, whether exported or not;

# remember to put something interesting in the {} stubs

sub func1 {} # no prototype

sub func2() {} # proto'd void

sub func3($$) {} # proto'd to 2 scalars

# this one isn't exported, but could be called!

sub func4(\%) {} # proto'd to 1 hash ref

END { } # module clean−up code here (global destructor)

Then go on to declare and use your variables in functions without any qualifications. See Exporter and the

perlmodlib for details on mechanics and style issues in module creation.

Perl modules are included into your program by saying

use Module;

use Module LIST;

This is exactly equivalent to

BEGIN { require Module; import Module; }

BEGIN { require Module; import Module LIST; }

As a special case

use Module ();

is exactly equivalent to

BEGIN { require Module; }
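A concrete instance of that equivalence, sketched with the standard File::Basename module. The choice of module and the path are only illustrative, and the sketch calls import with the arrow form, which is the same method call as the indirect object form shown above.

```perl
use strict;

# Roughly what "use File::Basename qw(basename);" does at compile time.
BEGIN {
    require File::Basename;
    File::Basename->import('basename');
}

# basename() is already known here because the BEGIN block has run.
my $name = basename('/usr/local/lib/perl5/Cwd.pm');
print "$name\n";                # Cwd.pm
```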

All Perl module files have the extension .pm. use assumes this so that you don‘t have to spell out

"Module.pm" in quotes. This also helps to differentiate new modules from old .pl and .ph files. Module

names are also capitalized unless they're functioning as pragmas. Pragmas are in effect compiler

directives, and are sometimes called "pragmatic modules" (or even "pragmata" if you're a classicist).

The two statements:

require SomeModule;

require "SomeModule.pm";

differ from each other in two ways. In the first case, any double colons in the module name, such as

Some::Module, are translated into your system‘s directory separator, usually "/". The second case does

not, and would have to be specified literally. The other difference is that seeing the first require clues in

the compiler that uses of indirect object notation involving "SomeModule", as in $ob = purge

SomeModule, are method calls, not function calls. (Yes, this really can make a difference.)
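The first difference can be checked through %INC, which records loaded files by their path-style name. This is a small sketch using the standard File::Basename module purely as an example.

```perl
use strict;

require File::Basename;          # bareword: "::" becomes the directory separator
require "File/Basename.pm";      # string: the path is spelled out literally

# Both forms record the same key in %INC, so the file is loaded only once.
print exists $INC{'File/Basename.pm'} ? "loaded\n" : "not loaded\n";
```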

Because the use statement implies a BEGIN block, the importation of semantics happens at the moment the

use statement is compiled, before the rest of the file is compiled. This is how it is able to function as a

pragma mechanism, and also how modules are able to declare subroutines that are then visible as list

operators for the rest of the current file. This will not work if you use require instead of use. With

require you can get into this problem:

require Cwd; # make Cwd:: accessible

$here = Cwd::getcwd();

use Cwd; # import names from Cwd::

$here = getcwd();

require Cwd; # make Cwd:: accessible

$here = getcwd(); # oops! no main::getcwd()

In general, use Module () is recommended over require Module, because it determines module

availability at compile time, not in the middle of your program‘s execution. An exception would be if two

modules each tried to use each other, and each also called a function from that other module. In that case,

it‘s easy to use requires instead.

Perl packages may be nested inside other package names, so we can have package names containing ::.

But if we used that package name directly as a filename it would make for unwieldy or impossible

filenames on some systems. Therefore, if a module‘s name is, say, Text::Soundex, then its definition is

actually found in the library file Text/Soundex.pm.

Perl modules always have a .pm file, but there may also be dynamically linked executables or autoloaded

subroutine definitions associated with the module. If so, these will be entirely transparent to the user of the

module. It is the responsibility of the .pm file to load (or arrange to autoload) any additional functionality.

The POSIX module happens to do both dynamic loading and autoloading, but the user can say just use

POSIX to get it all.

For more information on writing extension modules, see perlxstut and perlguts.

SEE ALSO

See perlmodlib for general style issues related to building Perl modules and classes as well as descriptions of

the standard library and CPAN, Exporter for how Perl‘s standard import/export mechanism works, perltoot

for an in−depth tutorial on creating classes, perlobj for a hard−core reference document on objects, and

perlsub for an explanation of functions and scoping.

perlref Perl Programmers Reference Guide perlref

NAME

perlref − Perl references and nested data structures

DESCRIPTION

Before release 5 of Perl it was difficult to represent complex data structures, because all references had to be

symbolic—and even then it was difficult to refer to a variable instead of a symbol table entry. Perl now not

only makes it easier to use symbolic references to variables, but also lets you have "hard" references to any

piece of data or code. Any scalar may hold a hard reference. Because arrays and hashes contain scalars, you

can now easily build arrays of arrays, arrays of hashes, hashes of arrays, arrays of hashes of functions, and so

on.

Hard references are smart—they keep track of reference counts for you, automatically freeing the thing

referred to when its reference count goes to zero. (Note: the reference counts for values in self−referential or

cyclic data structures may not go to zero without a little help; see

Two−Phased Garbage Collection in perlobj for a detailed explanation.) If that thing happens to be an object,

the object is destructed. See perlobj for more about objects. (In a sense, everything in Perl is an object, but

we usually reserve the word for references to objects that have been officially "blessed" into a class

package.)

Symbolic references are names of variables or other objects, just as a symbolic link in a Unix filesystem

contains merely the name of a file. The *glob notation is a kind of symbolic reference. (Symbolic

references are sometimes called "soft references", but please don‘t call them that; references are confusing

enough without useless synonyms.)

In contrast, hard references are more like hard links in a Unix file system: They are used to access an

underlying object without concern for what its (other) name is. When the word "reference" is used without

an adjective, as in the following paragraph, it is usually talking about a hard reference.

References are easy to use in Perl. There is just one overriding principle: Perl does no implicit referencing or

dereferencing. When a scalar is holding a reference, it always behaves as a simple scalar. It doesn‘t

magically start being an array or hash or subroutine; you have to tell it explicitly to do so, by dereferencing

it.

Making References

References can be created in several ways.

1. By using the backslash operator on a variable, subroutine, or value. (This works much like the &

(address−of) operator in C.) Note that this typically creates ANOTHER reference to a variable,

because there‘s already a reference to the variable in the symbol table. But the symbol table reference

might go away, and you‘ll still have the reference that the backslash returned. Here are some

examples:

$scalarref = \$foo;

$arrayref = \@ARGV;

$hashref = \%ENV;

$coderef = \&handler;

$globref = \*foo;

It isn‘t possible to create a true reference to an IO handle (filehandle or dirhandle) using the backslash

operator. The most you can get is a reference to a typeglob, which is actually a complete symbol table

entry. But see the explanation of the *foo{THING} syntax below. However, you can still use type

globs and globrefs as though they were IO handles.

2. A reference to an anonymous array can be created using square brackets:

$arrayref = [1, 2, ['a', 'b', 'c']];

Here we‘ve created a reference to an anonymous array of three elements whose final element is itself a

reference to another anonymous array of three elements. (The multidimensional syntax described later

can be used to access this. For example, after the above, $arrayref->[2][1] would have the

value "b".)

Note that taking a reference to an enumerated list is not the same as using square brackets—instead it‘s

the same as creating a list of references!

@list = (\$a, \@b, \%c);

@list = \($a, @b, %c); # same thing!

As a special case, \(@foo) returns a list of references to the contents of @foo, not a reference to

@foo itself. Likewise for %foo.

3. A reference to an anonymous hash can be created using curly brackets:

$hashref = {

'Adam' => 'Eve',

'Clyde' => 'Bonnie',

};

Anonymous hash and array composers like these can be intermixed freely to produce as complicated a

structure as you want. The multidimensional syntax described below works for these too. The values

above are literals, but variables and expressions would work just as well, because assignment operators

in Perl (even within local() or my()) are executable statements, not compile−time declarations.

Because curly brackets (braces) are used for several other things including BLOCKs, you may

occasionally have to disambiguate braces at the beginning of a statement by putting a + or a return

in front so that Perl realizes the opening brace isn‘t starting a BLOCK. The economy and mnemonic

value of using curlies is deemed worth this occasional extra hassle.

For example, if you wanted a function to make a new hash and return a reference to it, you have these

options:

sub hashem { { @_ } } # silently wrong

sub hashem { +{ @_ } } # ok

sub hashem { return { @_ } } # ok

On the other hand, if you want the other meaning, you can do this:

sub showem { { @_ } } # ambiguous (currently ok, but may change)

sub showem { {; @_ } } # ok

sub showem { { return @_ } } # ok

Note how the leading +{ and {; always serve to disambiguate the expression to mean either the

HASH reference, or the BLOCK.

4. A reference to an anonymous subroutine can be created by using sub without a subname:

$coderef = sub { print "Boink!\n" };

Note the presence of the semicolon. Except for the fact that the code inside isn‘t executed

immediately, a sub {} is not so much a declaration as it is an operator, like do{} or eval{}.

(However, no matter how many times you execute that particular line (unless you‘re in an

eval("...")), $coderef will still have a reference to the SAME anonymous subroutine.)

Anonymous subroutines act as closures with respect to my() variables, that is, variables visible

lexically within the current scope. Closure is a notion out of the Lisp world that says if you define an

anonymous function in a particular lexical context, it pretends to run in that context even when it‘s

called outside of the context.

In human terms, it‘s a funny way of passing arguments to a subroutine when you define it as well as

when you call it. It‘s useful for setting up little bits of code to run later, such as callbacks. You can

even do object−oriented stuff with it, though Perl already provides a different mechanism to do

that—see perlobj.

You can also think of closure as a way to write a subroutine template without using eval. (In fact, in

version 5.000, eval was the only way to get closures. You may wish to use "require 5.001" if you use

closures.)

Here's a small example of how closures work:

sub newprint {

my $x = shift;

return sub { my $y = shift; print "$x, $y!\n"; };

}

$h = newprint("Howdy");

$g = newprint("Greetings");

# Time passes...

&$h("world");

&$g("earthlings");

This prints

Howdy, world!

Greetings, earthlings!

Note particularly that $x continues to refer to the value passed into newprint() despite the fact that

the "my $x" has seemingly gone out of scope by the time the anonymous subroutine runs. That‘s

what closure is all about.

This applies only to lexical variables, by the way. Dynamic variables continue to work as they have

always worked. Closure is not something that most Perl programmers need trouble themselves about

to begin with.

5. References are often returned by special subroutines called constructors. Perl objects are just

references to a special kind of object that happens to know which package it‘s associated with.

Constructors are just special subroutines that know how to create that association. They do so by

starting with an ordinary reference, and it remains an ordinary reference even while it‘s also being an

object. Constructors are often named new() and called indirectly:

$objref = new Doggie (Tail => 'short', Ears => 'long');

But they don't have to be:

$objref = Doggie->new(Tail => 'short', Ears => 'long');

use Term::Cap;

$terminal = Term::Cap->Tgetent( { OSPEED => 9600 });

use Tk;

$main = MainWindow->new();

$menubar = $main->Frame(-relief => "raised",

-borderwidth => 2);

6. References of the appropriate type can spring into existence if you dereference them in a context that

assumes they exist. Because we haven‘t talked about dereferencing yet, we can‘t show you any

examples yet.

7. A reference can be created by using a special syntax, lovingly known as the *foo{THING} syntax.

*foo{THING} returns a reference to the THING slot in *foo (which is the symbol table entry which

holds everything known as foo).

$scalarref = *foo{SCALAR};

$arrayref = *ARGV{ARRAY};

$hashref = *ENV{HASH};

$coderef = *handler{CODE};

$ioref = *STDIN{IO};

$globref = *foo{GLOB};

All of these are self−explanatory except for *foo{IO}. It returns the IO handle, used for file handles

(open), sockets (socket and socketpair), and directory handles (opendir). For compatibility with

previous versions of Perl, *foo{FILEHANDLE} is a synonym for *foo{IO}.

*foo{THING} returns undef if that particular THING hasn‘t been used yet, except in the case of

scalars. *foo{SCALAR} returns a reference to an anonymous scalar if $foo hasn‘t been used yet.

This might change in a future release.

*foo{IO} is an alternative to the \*HANDLE mechanism given in

Typeglobs and Filehandles in perldata for passing filehandles into or out of subroutines, or storing into

larger data structures. Its disadvantage is that it won‘t create a new filehandle for you. Its advantage is

that you have no risk of clobbering more than you want to with a typeglob assignment, although if you

assign to a scalar instead of a typeglob, you‘re ok.

splutter(*STDOUT);

splutter(*STDOUT{IO});

sub splutter {

my $fh = shift;

print $fh "her um well a hmmm\n";

}

$rec = get_rec(*STDIN);

$rec = get_rec(*STDIN{IO});

sub get_rec {

my $fh = shift;

return scalar <$fh>;

}
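A few of the points from the list above, gathered into one short sketch. The variable names are illustrative only; hashem() is the function from method 3.

```perl
use strict;

my @foo = (1, 2, 3);

my $aref  = \@foo;        # one reference to the whole array
my @erefs = \(@foo);      # three references, one per element

${ $erefs[0] } = 10;      # changes $foo[0] through the element reference

sub hashem { +{ @_ } }    # the leading + forces an anonymous hash
my $href = hashem(Adam => 'Eve');

print scalar(@erefs), " $foo[0] ", ref($aref), " ", ref($href), "\n";
# prints: 3 10 ARRAY HASH
```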

Using References

That‘s it for creating references. By now you‘re probably dying to know how to use references to get back to

your long−lost data. There are several basic methods.

1. Anywhere you‘d put an identifier (or chain of identifiers) as part of a variable or subroutine name, you

can replace the identifier with a simple scalar variable containing a reference of the correct type:

$bar = $$scalarref;

push(@$arrayref, $filename);

$$arrayref[0] = "January";

$$hashref{"KEY"} = "VALUE";

&$coderef(1,2,3);

print $globref "output\n";

It‘s important to understand that we are specifically NOT dereferencing $arrayref[0] or

$hashref{"KEY"} there. The dereference of the scalar variable happens BEFORE it does any key

lookups. Anything more complicated than a simple scalar variable must use methods 2 or 3 below.

However, a "simple scalar" includes an identifier that itself uses method 1 recursively. Therefore, the

following prints "howdy".

$refrefref = \\\"howdy";

print $$$$refrefref;

2. Anywhere you‘d put an identifier (or chain of identifiers) as part of a variable or subroutine name, you

can replace the identifier with a BLOCK returning a reference of the correct type. In other words, the

previous examples could be written like this:

$bar = ${$scalarref};

push(@{$arrayref}, $filename);

${$arrayref}[0] = "January";

${$hashref}{"KEY"} = "VALUE";

&{$coderef}(1,2,3);

$globref->print("output\n"); # iff IO::Handle is loaded

Admittedly, it‘s a little silly to use the curlies in this case, but the BLOCK can contain any arbitrary

expression, in particular, subscripted expressions:

&{ $dispatch{$index} }(1,2,3); # call correct routine

Because of being able to omit the curlies for the simple case of $$x, people often make the mistake of

viewing the dereferencing symbols as proper operators, and wonder about their precedence. If they

were, though, you could use parentheses instead of braces. That‘s not the case. Consider the difference

below; case 0 is a short−hand version of case 1, NOT case 2:

$$hashref{"KEY"} = "VALUE"; # CASE 0

${$hashref}{"KEY"} = "VALUE"; # CASE 1

${$hashref{"KEY"}} = "VALUE"; # CASE 2

${$hashref->{"KEY"}} = "VALUE"; # CASE 3

Case 2 is also deceptive in that you‘re accessing a variable called %hashref, not dereferencing through

$hashref to the hash it‘s presumably referencing. That would be case 3.

3. Subroutine calls and lookups of individual array elements arise often enough that it gets cumbersome

to use method 2. As a form of syntactic sugar, the examples for method 2 may be written:

$arrayref->[0] = "January"; # Array element

$hashref->{"KEY"} = "VALUE"; # Hash element

$coderef->(1,2,3); # Subroutine call

The left side of the arrow can be any expression returning a reference, including a previous

dereference. Note that $array[$x] is NOT the same thing as $array->[$x] here:

$array[$x]->{"foo"}->[0] = "January";

This is one of the cases we mentioned earlier in which references could spring into existence when in

an lvalue context. Before this statement, $array[$x] may have been undefined. If so, it‘s

automatically defined with a hash reference so that we can look up {"foo"} in it. Likewise

$array[$x]->{"foo"} will automatically get defined with an array reference so that we can look

up [0] in it. This process is called autovivification.

One more thing here. The arrow is optional BETWEEN bracket subscripts, so you can shrink the

above down to

$array[$x]{"foo"}[0] = "January";

Which, in the degenerate case of using only ordinary arrays, gives you multidimensional arrays just

like C‘s:

$score[$x][$y][$z] += 42;

Well, okay, not entirely like C‘s arrays, actually. C doesn‘t know how to grow its arrays on demand.

Perl does.

4. If a reference happens to be a reference to an object, then there are probably methods to access the

things referred to, and you should probably stick to those methods unless you‘re in the class package

that defines the object‘s methods. In other words, be nice, and don‘t violate the object‘s encapsulation

without a very good reason. Perl does not enforce encapsulation. We are not totalitarians here. We do

expect some basic civility though.
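To tie methods 1 through 3 together, here is a brief sketch showing that the three spellings reach the same element, and that autovivification fills in the intermediate structures. The variable names are only illustrative.

```perl
use strict;

my $arrayref = [ 'Jan', 'Feb' ];

# Three spellings of the same element access.
my @same = ( $$arrayref[0], ${$arrayref}[0], $arrayref->[0] );

# Autovivification: @array starts out empty, yet the nested
# assignment creates the intermediate hash and array on demand.
my @array;
$array[2]{foo}[0] = 'January';

print "@same\n";                                        # Jan Jan Jan
print ref($array[2]), ' ', ref($array[2]{foo}), "\n";   # HASH ARRAY
```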

The ref() operator may be used to determine what type of thing the reference is pointing to. See perlfunc.

The bless() operator may be used to associate the object a reference points to with a package functioning

as an object class. See perlobj.

A typeglob may be dereferenced the same way a reference can, because the dereference syntax always

indicates the kind of reference desired. So ${*foo} and ${\$foo} both indicate the same scalar variable.

Here‘s a trick for interpolating a subroutine call into a string:

print "My sub returned @{[mysub(1,2,3)]} that time.\n";

The way it works is that when the @{...} is seen in the double−quoted string, it‘s evaluated as a block.

The block creates a reference to an anonymous array containing the results of the call to mysub(1,2,3).

So the whole block returns a reference to an array, which is then dereferenced by @{...} and stuck into the

double−quoted string. This chicanery is also useful for arbitrary expressions:

print "That yields @{[$n + 5]} widgets\n";

Symbolic references

We said that references spring into existence as necessary if they are undefined, but we didn‘t say what

happens if a value used as a reference is already defined, but ISN‘T a hard reference. If you use it as a

reference in this case, it‘ll be treated as a symbolic reference. That is, the value of the scalar is taken to be

the NAME of a variable, rather than a direct link to a (possibly) anonymous value.

People frequently expect it to work like this. So it does.

$name = "foo";

$$name = 1; # Sets $foo

${$name} = 2; # Sets $foo

${$name x 2} = 3; # Sets $foofoo

$name->[0] = 4; # Sets $foo[0]

@$name = (); # Clears @foo

&$name(); # Calls &foo() (as in Perl 4)

$pack = "THAT";

${"${pack}::$name"} = 5; # Sets $THAT::foo without eval

This is very powerful, and slightly dangerous, in that it‘s possible to intend (with the utmost sincerity) to use

a hard reference, and accidentally use a symbolic reference instead. To protect against that, you can say

use strict 'refs';

and then only hard references will be allowed for the rest of the enclosing block. An inner block may

countermand that with

no strict 'refs';

Only package variables (globals, even if localized) are visible to symbolic references. Lexical variables

(declared with my()) aren‘t in a symbol table, and thus are invisible to this mechanism. For example:

local $value = 10;

$ref = \$value;

{

my $value = 20;

print $$ref;

}

This will still print 10, not 20. Remember that local() affects package variables, which are all "global" to

the package.
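How the two pragma settings interact can be sketched as follows. The variable names are illustrative, and eval is used here only to trap the run-time error so the script can report it.

```perl
use strict;
use vars qw($foo);

my $name = 'foo';

{
    no strict 'refs';       # symbolic references allowed in this block only
    $$name = 1;             # sets the package variable $main::foo
}

# Outside that block strict refs is back in force, so the same
# statement dies at run time; eval traps the error.
my $ok = eval { $$name = 2; 1 };

print "foo=$foo ok=", ($ok ? 1 : 0), "\n";   # foo=1 ok=0
```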

Not−so−symbolic references

A new feature contributing to readability in perl version 5.001 is that the brackets around a symbolic

reference behave more like quotes, just as they always have within a string. That is,

$push = "pop on ";

print "${push}over";

has always meant to print "pop on over", despite the fact that push is a reserved word. This has been

generalized to work the same outside of quotes, so that

print ${push} . "over";

and even

print ${ push } . "over";

will have the same effect. (This would have been a syntax error in Perl 5.000, though Perl 4 allowed it in the

spaceless form.) Note that this construct is not considered to be a symbolic reference when you‘re using

strict refs:

use strict 'refs';

${ bareword }; # Okay, means $bareword.

${ "bareword" }; # Error, symbolic reference.

Similarly, because of all the subscripting that is done using single words, we‘ve applied the same rule to any

bareword that is used for subscripting a hash. So now, instead of writing

$array{ "aaa" }{ "bbb" }{ "ccc" }

you can write just

$array{ aaa }{ bbb }{ ccc }

and not worry about whether the subscripts are reserved words. In the rare event that you do wish to do

something like

$array{ shift }

you can force interpretation as a reserved word by adding anything that makes it more than a bareword:

$array{ shift() }

$array{ +shift }

$array{ shift @_ }

The −w switch will warn you if it interprets a reserved word as a string. But it will no longer warn you about

using lowercase words, because the string is effectively quoted.

Pseudo−hashes: Using an array as a hash

WARNING: This section describes an experimental feature. Details may change without notice in future

versions.

Beginning with release 5.005 of Perl you can use an array reference in some contexts that would normally

require a hash reference. This allows you to access array elements using symbolic names, as if they were

fields in a structure.

For this to work, the array must contain extra information. The first element of the array has to be a hash

reference that maps field names to array indices. Here is an example:

$struct = [{foo => 1, bar => 2}, "FOO", "BAR"];

$struct->{foo}; # same as $struct->[1], i.e. "FOO"

$struct->{bar}; # same as $struct->[2], i.e. "BAR"

keys %$struct; # will return ("foo", "bar") in some order

values %$struct; # will return ("FOO", "BAR") in the same order as keys

while (my($k,$v) = each %$struct) {

print "$k => $v\n";

}

Perl will raise an exception if you try to delete keys from a pseudo−hash or try to access nonexistent fields.

For better performance, Perl can also do the translation from field names to array indices at compile time for

typed object references. See fields.

Function Templates

As explained above, a closure is an anonymous function with access to the lexical variables visible when that

function was compiled. It retains access to those variables even though it doesn‘t get run until later, such as

in a signal handler or a Tk callback.

Using a closure as a function template allows us to generate many functions that act similarly. Suppose

you wanted functions named after the colors that generated HTML font changes for the various colors:

print "Be ", red("careful"), " with that ", green("light");

The red() and green() functions would be very similar. To create these, we‘ll assign a closure to a

typeglob of the name of the function we‘re trying to build.

@colors = qw(red blue green yellow orange purple violet);

for my $name (@colors) {

no strict 'refs'; # allow symbol table manipulation

*$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" };

}

Now all those different functions appear to exist independently. You can call red(), RED(), blue(),

BLUE(), green(), etc. This technique saves on both compile time and memory use, and is less

error−prone as well, since syntax checks happen at compile time. It‘s critical that any variables in the

anonymous subroutine be lexicals in order to create a proper closure. That's the reason for the my on the

loop iteration variable.

This is one of the only places where giving a prototype to a closure makes much sense. If you wanted to

impose scalar context on the arguments of these functions (probably not a wise idea for this particular

example), you could have written it this way instead:

*$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };

However, since prototype checking happens at compile time, the assignment above happens too late to be of

much use. You could address this by putting the whole loop of assignments within a BEGIN block, forcing

it to occur during compilation.
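A sketch of that BEGIN arrangement, under the assumption that only three colors are wanted. Because the assignments run during compilation, the ($) prototype is already in place when the later calls are compiled.

```perl
use strict;

BEGIN {
    for my $name (qw(red green blue)) {
        no strict 'refs';   # needed to assign through a symbolic glob name
        *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };
    }
}

my @words = ('stop', 'go');

# The prototype puts @words in scalar context, so red() sees the count 2.
print red(@words), "\n";        # <FONT COLOR='red'>2</FONT>
```

This also shows why the text calls scalar context a dubious choice for this example: passing an array yields its length rather than its contents.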

Access to lexicals that change over time (like those in the for loop above) only works with closures, not

general subroutines. In the general case, then, named subroutines do not nest properly, although anonymous

ones do. If you are accustomed to using nested subroutines in other programming languages with their own

private variables, you‘ll have to work at it a bit in Perl. The intuitive coding of this kind of thing incurs

mysterious warnings about "will not stay shared". For example, this won't work:

sub outer {

my $x = $_[0] + 35;

sub inner { return $x * 19 } # WRONG

return $x + inner();

}

A work−around is the following:

sub outer {

my $x = $_[0] + 35;

local *inner = sub { return $x * 19 };

return $x + inner();

}

Now inner() can only be called from within outer(), because of the temporary assignments of the

closure (anonymous subroutine). But when it does, it has normal access to the lexical variable $x from the

scope of outer().

This has the interesting effect of creating a function local to another function, something not normally

supported in Perl.
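Calling the work-around version shows both halves of that claim. This is a sketch that repeats outer() from above and adds two illustrative print statements.

```perl
use strict;

sub outer {
    my $x = $_[0] + 35;
    local *inner = sub { return $x * 19 };   # visible only while outer() runs
    return $x + inner();
}

print outer(1), "\n";                        # 36 + 36*19 = 720
print defined(&inner) ? "still defined\n" : "inner() is gone\n";
```

Once outer() returns, local restores the previous (empty) contents of the inner typeglob, so the second line reports that inner() is gone.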

WARNING

You may not (usefully) use a reference as the key to a hash. It will be converted into a string:

$x{ \$a } = $a;

If you try to dereference the key, it won‘t do a hard dereference, and you won‘t accomplish what you‘re

attempting. You might want to do something more like

$r = \@a;

$x{ $r } = $r;

And then at least you can use the values(), which will be real refs, instead of the keys(), which won‘t.

The standard Tie::RefHash module provides a convenient workaround to this.
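Here is a minimal sketch of the difference, with illustrative variable names. It compares the key that comes back from an ordinary hash with the key that comes back from a hash tied to Tie::RefHash.

```perl
use strict;
use Tie::RefHash;

my @a = (1, 2, 3);
my $r = \@a;

my %plain;
$plain{$r} = 'x';
my ($plain_key) = keys %plain;      # a string such as "ARRAY(0x...)"

tie my %refs, 'Tie::RefHash';
$refs{$r} = 'x';
my ($ref_key) = keys %refs;         # still a real array reference

my $plain_kind = ref($plain_key) ? 'reference' : 'string';
my $tied_kind  = ref($ref_key)   ? 'reference' : 'string';
print "$plain_kind $tied_kind\n";   # string reference
```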

SEE ALSO

Besides the obvious documents, source code can be instructive. Some rather pathological examples of the

use of references can be found in the t/op/ref.t regression test in the Perl source directory.

See also perldsc and perllol for how to use references to create complex data structures, and perltoot,

perlobj, and perlbot for how to use them to create objects.

perldsc Perl Programmers Reference Guide perldsc

NAME

perldsc − Perl Data Structures Cookbook

DESCRIPTION

The single feature most sorely lacking in the Perl programming language prior to its 5.0 release was complex

data structures. Even without direct language support, some valiant programmers did manage to emulate

them, but it was hard work and not for the faint of heart. You could occasionally get away with the

$m{$LoL,$b} notation borrowed from awk in which the keys are actually more like a single concatenated

string "$LoL$b", but traversal and sorting were difficult. More desperate programmers even hacked

Perl‘s internal symbol table directly, a strategy that proved hard to develop and maintain—to put it mildly.

The 5.0 release of Perl let us have complex data structures. You may now write something like this and, all

of a sudden, you'd have an array with three dimensions!

    for $x (1 .. 10) {
        for $y (1 .. 10) {
            for $z (1 .. 10) {
                $LoL[$x][$y][$z] =
                    $x ** $y + $z;
            }
        }
    }

Alas, however simple this may appear, underneath it‘s a much more elaborate construct than meets the eye!

How do you print it out? Why can‘t you say just print @LoL? How do you sort it? How can you pass it

to a function or get one of these back from a function? Is it an object? Can you save it to disk to read back

later? How do you access whole rows or columns of that matrix? Do all the values have to be numeric?

As you see, it‘s quite easy to become confused. While some small portion of the blame for this can be

attributed to the reference−based implementation, it‘s really more due to a lack of existing documentation

with examples designed for the beginner.

This document is meant to be a detailed but understandable treatment of the many different sorts of data

structures you might want to develop. It should also serve as a cookbook of examples. That way, when you

need to create one of these complex data structures, you can just pinch, pilfer, or purloin a drop−in example

from here.

Let‘s look at each of these possible constructs in detail. There are separate sections on each of the following:

arrays of arrays

hashes of arrays

arrays of hashes

hashes of hashes

more elaborate constructs

But for now, let‘s look at general issues common to all these types of data structures.

REFERENCES

The most important thing to understand about all data structures in Perl — including multidimensional

arrays—is that even though they might appear otherwise, Perl @ARRAYs and %HASHes are all internally

one−dimensional. They can hold only scalar values (meaning a string, number, or a reference). They cannot

directly contain other arrays or hashes, but instead contain references to other arrays or hashes.

You can't use a reference to an array or hash in quite the same way that you would a real array or hash. For C

or C++ programmers unused to distinguishing between arrays and pointers to the same, this can be

confusing. If so, just think of it as the difference between a structure and a pointer to a structure.

You can (and should) read more about references in the perlref(1) man page. Briefly, references are rather

like pointers that know what they point to. (Objects are also a kind of reference, but we won‘t be needing

them right away—if ever.) This means that when you have something which looks to you like an access to a

two−or−more−dimensional array and/or hash, what‘s really going on is that the base type is merely a

one−dimensional entity that contains references to the next level. It‘s just that you can use it as though it

were a two−dimensional one. This is actually the way almost all C multidimensional arrays work as well.

$list[7][12] # array of arrays

$list[7]{string} # array of hashes

$hash{string}[7] # hash of arrays

$hash{string}{’another string’} # hash of hashes

Now, because the top level contains only references, if you try to print out your array with a simple

print() function, you‘ll get something that doesn‘t look very nice, like this:

    @LoL = ( [2, 3], [4, 5, 7], [0] );
    print $LoL[1][2];
      7
    print @LoL;
      ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)

That‘s because Perl doesn‘t (ever) implicitly dereference your variables. If you want to get at the thing a

reference is referring to, then you have to do this yourself using either prefix typing indicators, like

${$blah}, @{$blah}, @{$blah[$i]}, or else postfix pointer arrows, like $a−>[3],

$h−>{fred}, or even $ob−>method()−>[3].
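Here is a small sketch of those notations side by side. It reuses the @LoL data from above; the names $row, @copy, $third, $same and $direct are invented for the example.

```perl
use strict;

my @LoL = ( [2, 3], [4, 5, 7], [0] );
my $row = $LoL[1];                 # a reference to the second row

my @copy   = @{$row};              # prefix form: the whole array
my $third  = ${$row}[2];           # prefix form: one element
my $same   = $row->[2];            # arrow form: the same element
my $direct = $LoL[1][2];           # arrow implied between subscripts

print "row: @copy\n";
print "third element three ways: $third $same $direct\n";
```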

COMMON MISTAKES

The two most common mistakes made in constructing something like an array of arrays are either accidentally

counting the number of elements or else taking a reference to the same memory location repeatedly. Here‘s

the case where you just get the count instead of a nested array:

for $i (1..10) {

@list = somefunc($i);

$LoL[$i] = @list; # WRONG!

}

That‘s just the simple case of assigning a list to a scalar and getting its element count. If that‘s what you

really and truly want, then you might do well to consider being a tad more explicit about it, like this:

for $i (1..10) {

@list = somefunc($i);

$counts[$i] = scalar @list;

}

Here‘s the case of taking a reference to the same memory location again and again:

for $i (1..10) {

@list = somefunc($i);

$LoL[$i] = \@list; # WRONG!

}

So, what‘s the big problem with that? It looks right, doesn‘t it? After all, I just told you that you need an

array of references, so by golly, you‘ve made me one!

Unfortunately, while this is true, it‘s still broken. All the references in @LoL refer to the very same place,

and they will therefore all hold whatever was last in @list! It‘s similar to the problem demonstrated in the

following C program:

    #include <stdio.h>
    #include <pwd.h>

    int main(void) {
        struct passwd *rp, *dp;

        rp = getpwnam("root");
        dp = getpwnam("daemon");   /* may reuse the static buffer rp points to */

        printf("daemon name is %s\nroot name is %s\n",
               dp->pw_name, rp->pw_name);
        return 0;
    }

Which will print

daemon name is daemon

root name is daemon

The problem is that both rp and dp are pointers to the same location in memory! In C, you‘d have to

remember to malloc() yourself some new memory. In Perl, you‘ll want to use the array constructor [] or

the hash constructor {} instead. Here‘s the right way to do the preceding broken code fragments:

for $i (1..10) {

@list = somefunc($i);

$LoL[$i] = [ @list ];

}

The square brackets make a reference to a new array with a copy of what‘s in @list at the time of the

assignment. This is what you want.
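To see the difference for yourself, here is a minimal sketch that fills two arrays in the same loop, one with \@list and one with [ @list ]. The names @shared and @copied are made up for the illustration, and a trivial list stands in for somefunc().

```perl
use strict;

my (@list, @shared, @copied);
for my $i (1 .. 3) {
    @list = ($i, $i * 10);
    $shared[$i] = \@list;       # every slot refers to the one and only @list
    $copied[$i] = [ @list ];    # each slot gets its own fresh copy
}

print "shared: @{ $shared[1] } / @{ $shared[3] }\n";
print "copied: @{ $copied[1] } / @{ $copied[3] }\n";
```

Every slot of @shared ends up showing whatever was last put in @list, while @copied keeps each pass's values.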

Note that this will produce something similar, but it‘s much harder to read:

for $i (1..10) {

@list = 0 .. $i;

@{$LoL[$i]} = @list;

}

Is it the same? Well, maybe so—and maybe not. The subtle difference is that when you assign something in

square brackets, you know for sure it‘s always a brand new reference with a new copy of the data. Something

else could be going on in this new case with the @{$LoL[$i]} dereference on the left-hand side of the

assignment. It all depends on whether $LoL[$i] had been undefined to start with, or whether it already

contained a reference. If you had already populated @LoL with references, as in

$LoL[3] = \@another_list;

Then the assignment with the indirection on the left−hand−side would use the existing reference that was

already there:

@{$LoL[3]} = @list;

Of course, this would have the "interesting" effect of clobbering @another_list. (Have you ever noticed how

when a programmer says something is "interesting", that rather than meaning "intriguing", they‘re

disturbingly more apt to mean that it‘s "annoying", "difficult", or both? :−)

So just remember always to use the array or hash constructors with [] or {}, and you‘ll be fine, although

it‘s not always optimally efficient.

Surprisingly, the following dangerous−looking construct will actually work out fine:

for $i (1..10) {

my @list = somefunc($i);

$LoL[$i] = \@list;

}

That‘s because my() is more of a run−time statement than it is a compile−time declaration per se. This

means that the my() variable is remade afresh each time through the loop. So even though it looks as

though you stored the same variable reference each time, you actually did not! This is a subtle distinction

that can produce more efficient code at the risk of misleading all but the most experienced of programmers.

So I usually advise against teaching it to beginners. In fact, except for passing arguments to functions, I

seldom like to see the gimme−a−reference operator (backslash) used much at all in code. Instead, I advise

beginners that they (and most of the rest of us) should try to use the much more easily understood

constructors [] and {} instead of relying upon lexical (or dynamic) scoping and hidden reference−counting

to do the right thing behind the scenes.
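If you want to convince yourself that a my() variable really is remade on each pass, a sketch along these lines should do it. A simple repeated list stands in for somefunc(), which is not defined here.

```perl
use strict;

my @LoL;
for my $i (1 .. 3) {
    my @list = ($i) x $i;      # a new @list on each pass through the loop
    $LoL[$i] = \@list;
}

print "row 1: @{ $LoL[1] }\n";
print "row 3: @{ $LoL[3] }\n";
print "distinct arrays\n" if $LoL[1] != $LoL[3];
```

Comparing two references numerically compares their addresses, so the last line prints only when the rows are separate arrays.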

In summary:

$LoL[$i] = [ @list ]; # usually best

$LoL[$i] = \@list; # perilous; just how my() was that list?

@{ $LoL[$i] } = @list; # way too tricky for most programmers

CAVEAT ON PRECEDENCE

Speaking of things like @{$LoL[$i]}, the following are actually the same thing:

$listref−>[2][2] # clear

$$listref[2][2] # confusing

That‘s because Perl‘s precedence rules on its five prefix dereferencers (which look like someone swearing: $

@ * % &) make them bind more tightly than the postfix subscripting brackets or braces! This will no

doubt come as a great shock to the C or C++ programmer, who is quite accustomed to using *a[i] to mean

what‘s pointed to by the i‘th element of a. That is, they first take the subscript, and only then dereference

the thing at that subscript. That‘s fine in C, but this isn‘t C.

The seemingly equivalent construct in Perl, $$listref[$i], first dereferences $listref, treating it as a

reference to an array, and then gives you the i'th value of the array pointed to by $listref. If you wanted the

C notion, where @LoL itself holds the references, you'd have to write ${$LoL[$i]} to force the

$LoL[$i] to get evaluated first before the leading $ dereferencer.
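A short sketch may make the two readings concrete. It assumes a small @LoL and a $listref pointing at it; both are invented for this example.

```perl
use strict;

my @LoL     = ( [ "a", "b" ], [ "c", "d" ] );
my $listref = \@LoL;

# $$listref[1] parses as ${$listref}[1]: dereference first, then subscript.
my $row = $$listref[1];
print "row: @$row\n";

# ${ $LoL[1] }[0] subscripts @LoL first, then dereferences that element.
my $elt = ${ $LoL[1] }[0];
print "element: $elt\n";
```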

WHY YOU SHOULD ALWAYS use strict

If this is starting to sound scarier than it‘s worth, relax. Perl has some features to help you avoid its most

common pitfalls. The best way to avoid getting confused is to start every program like this:

#!/usr/bin/perl -w

use strict;

This way, you'll be forced to declare all your variables with my(), and accidental "symbolic

dereferencing" will be disallowed. Therefore if you'd done this:

my $listref = [

[ "fred", "barney", "pebbles", "bambam", "dino", ],

[ "homer", "bart", "marge", "maggie", ],

[ "george", "jane", "elroy", "judy", ],

];

print $listref[2][2];

The compiler would immediately flag that as an error at compile time, because you were accidentally

accessing @listref, an undeclared variable, and it would thereby remind you to write instead:

print $listref−>[2][2]
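One way to watch this happen without killing your own program is to compile the offending code with a string eval and look at $@. This is only a sketch; the heredoc tag END_CODE and the variable $oops are made up for the example.

```perl
use strict;

my $code = <<'END_CODE';
    use strict;
    my $listref = [ [ "fred", "barney" ], [ "homer", "bart" ] ];
    my $oops = $listref[1][1];     # meant $listref->[1][1]
END_CODE

eval $code;                        # compile errors land in $@
print "compile failed: $@" if $@;
```

The message names @listref, the undeclared array you touched by accident.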

DEBUGGING

Before version 5.002, the standard Perl debugger didn‘t do a very nice job of printing out complex data

structures. With 5.002 or above, the debugger includes several new features, including command line editing

as well as the x command to dump out complex data structures. For example, given the assignment to $LoL

above, here‘s the debugger output:

DB<1> x $LoL

$LoL = ARRAY(0x13b5a0)

0 ARRAY(0x1f0a24)

0 ’fred’

1 ’barney’

2 ’pebbles’

3 ’bambam’

4 ’dino’

1 ARRAY(0x13b558)

0 ’homer’

1 ’bart’

2 ’marge’

3 ’maggie’

2 ARRAY(0x13b540)

0 ’george’

1 ’jane’

2 ’elroy’

3 ’judy’
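Outside the debugger, you can get a comparable dump from ordinary code. The sketch below uses the Data::Dumper module, which this document doesn't otherwise cover, so take it as one possible approach rather than the recommended one. The data is a shortened version of the $LoL above.

```perl
use strict;
use Data::Dumper;

my $LoL = [
    [ "fred", "barney" ],
    [ "homer", "bart", "marge" ],
];

$Data::Dumper::Indent = 1;       # fixed indentation per nesting level
my $dump = Dumper($LoL);
print $dump;
```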

CODE EXAMPLES

Presented with little comment (these will get their own manpages someday) here are short code examples

illustrating access of various types of data structures.

LISTS OF LISTS

Declaration of a LIST OF LISTS

@LoL = (

[ "fred", "barney" ],

[ "george", "jane", "elroy" ],

[ "homer", "marge", "bart" ],

);

Generation of a LIST OF LISTS

# reading from file

while ( <> ) {

push @LoL, [ split ];

}

# calling a function

for $i ( 1 .. 10 ) {

$LoL[$i] = [ somefunc($i) ];

}

# using temp vars

for $i ( 1 .. 10 ) {

@tmp = somefunc($i);

$LoL[$i] = [ @tmp ];

}

# add to an existing row

push @{ $LoL[0] }, "wilma", "betty";

Access and Printing of a LIST OF LISTS

# one element

$LoL[0][0] = "Fred";

# another element

$LoL[1][1] =~ s/(\w)/\u$1/;

# print the whole thing with refs

for $aref ( @LoL ) {

print "\t [ @$aref ],\n";

}

# print the whole thing with indices

for $i ( 0 .. $#LoL ) {

print "\t [ @{$LoL[$i]} ],\n";

}

# print the whole thing one at a time

    for $i ( 0 .. $#LoL ) {
        for $j ( 0 .. $#{ $LoL[$i] } ) {
            print "elt $i $j is $LoL[$i][$j]\n";
        }
    }

HASHES OF LISTS

Declaration of a HASH OF LISTS

%HoL = (

flintstones => [ "fred", "barney" ],

jetsons => [ "george", "jane", "elroy" ],

simpsons => [ "homer", "marge", "bart" ],

);

Generation of a HASH OF LISTS

# reading from file

# flintstones: fred barney wilma dino

while ( <> ) {

next unless s/^(.*?):\s*//;

$HoL{$1} = [ split ];

}

# reading from file; more temps

# flintstones: fred barney wilma dino

while ( $line = <> ) {

($who, $rest) = split /:\s*/, $line, 2;

@fields = split ’ ’, $rest;

$HoL{$who} = [ @fields ];

}

# calling a function that returns a list

for $group ( "simpsons", "jetsons", "flintstones" ) {

$HoL{$group} = [ get_family($group) ];

}

# likewise, but using temps

for $group ( "simpsons", "jetsons", "flintstones" ) {

@members = get_family($group);

$HoL{$group} = [ @members ];

}

# append new members to an existing family

push @{ $HoL{"flintstones"} }, "wilma", "betty";

Access and Printing of a HASH OF LISTS

# one element

$HoL{flintstones}[0] = "Fred";

# another element

$HoL{simpsons}[1] =~ s/(\w)/\u$1/;

# print the whole thing

foreach $family ( keys %HoL ) {

print "$family: @{ $HoL{$family} }\n"

}

# print the whole thing with indices

foreach $family ( keys %HoL ) {

print "$family: ";

foreach $i ( 0 .. $#{ $HoL{$family} } ) {

print " $i = $HoL{$family}[$i]";

}

print "\n";

}

# print the whole thing sorted by number of members

foreach $family ( sort { @{$HoL{$b}} <=> @{$HoL{$a}} } keys %HoL ) {

print "$family: @{ $HoL{$family} }\n"

}

# print the whole thing sorted by number of members and name

    foreach $family ( sort {
                          @{$HoL{$b}} <=> @{$HoL{$a}}
                                      ||
                                  $a cmp $b
                      } keys %HoL )
    {
        print "$family: ", join(", ", sort @{ $HoL{$family} }), "\n";
    }

LISTS OF HASHES

Declaration of a LIST OF HASHES

    @LoH = (
        {
            Lead   => "fred",
            Friend => "barney",
        },
        {
            Lead => "george",
            Wife => "jane",
            Son  => "elroy",
        },
        {
            Lead => "homer",
            Wife => "marge",
            Son  => "bart",
        }
    );

Generation of a LIST OF HASHES

# reading from file

# format: LEAD=fred FRIEND=barney

while ( <> ) {

$rec = {};

for $field ( split ) {

($key, $value) = split /=/, $field;

$rec−>{$key} = $value;

}

push @LoH, $rec;

}

# reading from file

# format: LEAD=fred FRIEND=barney

# no temp

while ( <> ) {

push @LoH, { split /[\s=]+/ };

}

# calling a function that returns a key,value list, like

# "lead","fred","daughter","pebbles"

while ( %fields = getnextpairset() ) {

push @LoH, { %fields };

}

# likewise, but using no temp vars

while (<>) {

push @LoH, { parsepairs($_) };

}

# add key/value to an element

$LoH[0]{pet} = "dino";

$LoH[2]{pet} = "santa’s little helper";

Access and Printing of a LIST OF HASHES

# one element

$LoH[0]{lead} = "fred";

# another element

$LoH[1]{lead} =~ s/(\w)/\u$1/;

# print the whole thing with refs

for $href ( @LoH ) {

print "{ ";

for $role ( keys %$href ) {

print "$role=$href−>{$role} ";

}

print "}\n";

}

# print the whole thing with indices

for $i ( 0 .. $#LoH ) {

print "$i is { ";

for $role ( keys %{ $LoH[$i] } ) {

print "$role=$LoH[$i]{$role} ";

}

print "}\n";

}

# print the whole thing one at a time

    for $i ( 0 .. $#LoH ) {
        for $role ( keys %{ $LoH[$i] } ) {
            print "elt $i $role is $LoH[$i]{$role}\n";
        }
    }

HASHES OF HASHES

Declaration of a HASH OF HASHES

    %HoH = (
        flintstones => {
            lead => "fred",
            pal  => "barney",
        },
        jetsons => {
            lead      => "george",
            wife      => "jane",
            "his boy" => "elroy",
        },
        simpsons => {
            lead => "homer",
            wife => "marge",
            kid  => "bart",
        },
    );

Generation of a HASH OF HASHES

# reading from file

# flintstones: lead=fred pal=barney wife=wilma pet=dino

    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $who = $1;
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $HoH{$who}{$key} = $value;
        }
    }

# reading from file; more temps

    while ( <> ) {
        next unless s/^(.*?):\s*//;
        $who = $1;
        $rec = {};
        $HoH{$who} = $rec;
        for $field ( split ) {
            ($key, $value) = split /=/, $field;
            $rec->{$key} = $value;
        }
    }

# calling a function that returns a key,value hash

for $group ( "simpsons", "jetsons", "flintstones" ) {

$HoH{$group} = { get_family($group) };

}

# likewise, but using temps

for $group ( "simpsons", "jetsons", "flintstones" ) {

%members = get_family($group);

$HoH{$group} = { %members };

}

# append new members to an existing family

%new_folks = (

wife => "wilma",

pet => "dino",

);

for $what (keys %new_folks) {

$HoH{flintstones}{$what} = $new_folks{$what};

}

Access and Printing of a HASH OF HASHES

# one element

$HoH{flintstones}{wife} = "wilma";

# another element

$HoH{simpsons}{lead} =~ s/(\w)/\u$1/;

# print the whole thing

foreach $family ( keys %HoH ) {

print "$family: { ";

for $role ( keys %{ $HoH{$family} } ) {

print "$role=$HoH{$family}{$role} ";

}

print "}\n";

}

# print the whole thing somewhat sorted

foreach $family ( sort keys %HoH ) {

print "$family: { ";

for $role ( sort keys %{ $HoH{$family} } ) {

print "$role=$HoH{$family}{$role} ";

}

print "}\n";

}

# print the whole thing sorted by number of members

foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) {

print "$family: { ";

for $role ( sort keys %{ $HoH{$family} } ) {

print "$role=$HoH{$family}{$role} ";

}

print "}\n";

}

# establish a sort order (rank) for each role

$i = 0;

for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }

# now print the whole thing sorted by number of members

foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) {

print "$family: { ";

# and print these according to rank order

for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) {

print "$role=$HoH{$family}{$role} ";

}

print "}\n";

}

MORE ELABORATE RECORDS

Declaration of MORE ELABORATE RECORDS

Here‘s a sample showing how to create and use a record whose fields are of many different sorts:

$rec = {

TEXT => $string,

SEQUENCE => [ @old_values ],

LOOKUP => { %some_table },

THATCODE => \&some_function,

THISCODE => sub { $_[0] ** $_[1] },

HANDLE => \*STDOUT,

};

print $rec−>{TEXT};

print $rec->{SEQUENCE}[0];

$last = pop @ { $rec−>{SEQUENCE} };

print $rec−>{LOOKUP}{"key"};

($first_k, $first_v) = each %{ $rec−>{LOOKUP} };

$answer = $rec−>{THATCODE}−>($arg);

$answer = $rec−>{THISCODE}−>($arg1, $arg2);

# careful of extra block braces on fh ref

print { $rec−>{HANDLE} } "a string\n";

use FileHandle;

$rec−>{HANDLE}−>autoflush(1);

$rec−>{HANDLE}−>print(" a string\n");

Declaration of a HASH OF COMPLEX RECORDS

    %TV = (
        flintstones => {
            series  => "flintstones",
            nights  => [ qw(monday thursday friday) ],
            members => [
                { name => "fred",    role => "lead", age => 36, },
                { name => "wilma",   role => "wife", age => 31, },
                { name => "pebbles", role => "kid",  age => 4,  },
            ],
        },
        jetsons => {
            series  => "jetsons",
            nights  => [ qw(wednesday saturday) ],
            members => [
                { name => "george", role => "lead", age => 41, },
                { name => "jane",   role => "wife", age => 39, },
                { name => "elroy",  role => "kid",  age => 9,  },
            ],
        },
        simpsons => {
            series  => "simpsons",
            nights  => [ qw(monday) ],
            members => [
                { name => "homer", role => "lead", age => 34, },
                { name => "marge", role => "wife", age => 37, },
                { name => "bart",  role => "kid",  age => 11, },
            ],
        },
    );

Generation of a HASH OF COMPLEX RECORDS

# reading from file

# this is most easily done by having the file itself be

# in the raw data format as shown above. perl is happy

# to parse complex data structures if declared as data, so

# sometimes it’s easiest to do that

# here’s a piece by piece build up

$rec = {};

$rec−>{series} = "flintstones";

$rec−>{nights} = [ find_days() ];

@members = ();

# assume this file in field=value syntax

while (<>) {

%fields = split /[\s=]+/;

push @members, { %fields };

}

$rec−>{members} = [ @members ];

# now remember the whole thing

$TV{ $rec−>{series} } = $rec;

###########################################################

# now, you might want to make interesting extra fields that
# include pointers back into the same data structure so if
# you change one piece, it changes everywhere, like for example
# if you wanted a {kids} field that was an array reference
# to a list of the kids' records without having duplicate
# records and thus update problems.

###########################################################

    foreach $family (keys %TV) {
        $rec = $TV{$family};   # temp pointer
        @kids = ();
        for $person ( @{ $rec->{members} } ) {
            if ($person->{role} =~ /kid|son|daughter/) {
                push @kids, $person;
            }
        }
        # REMEMBER: $rec and $TV{$family} point to same data!!
        $rec->{kids} = [ @kids ];
    }

# you copied the list, but the list itself contains pointers

# to uncopied objects. this means that if you make bart get

# older via

$TV{simpsons}{kids}[0]{age}++;

# then this would also change in

print $TV{simpsons}{members}[2]{age};

# because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]

# both point to the same underlying anonymous hash table

# print the whole thing

    foreach $family ( keys %TV ) {
        print "the $family";
        print " is on during @{ $TV{$family}{nights} }\n";
        print "its members are:\n";
        for $who ( @{ $TV{$family}{members} } ) {
            print " $who->{name} ($who->{role}), age $who->{age}\n";
        }
        # the records have no {lead} field, so find the member whose role is "lead"
        ($lead) = grep { $_->{role} eq "lead" } @{ $TV{$family}{members} };
        print "it turns out that $lead->{name} has ";
        print scalar ( @{ $TV{$family}{kids} } ), " kids named ";
        print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } );
        print "\n";
    }

Database Ties

You cannot easily tie a multilevel data structure (such as a hash of hashes) to a dbm file. The first problem is

that all but GDBM and Berkeley DB have size limitations, but beyond that, you also have problems with

how references are to be represented on disk. One experimental module that does partially attempt to

address this need is the MLDBM module. Check your nearest CPAN site as described in perlmodlib for

source code to MLDBM.

SEE ALSO

perlref(1), perllol(1), perldata(1), perlobj(1)

AUTHOR

Tom Christiansen <tchrist@perl.com>

Last update: Wed Oct 23 04:57:50 MET DST 1996

perllol Perl Programmers Reference Guide perllol

NAME

perlLoL − Manipulating Lists of Lists in Perl

DESCRIPTION

Declaration and Access of Lists of Lists

The simplest thing to build is a list of lists (sometimes called an array of arrays). It‘s reasonably easy to

understand, and almost everything that applies here will also be applicable later on with the fancier data

structures.

A list of lists, or an array of an array if you would, is just a regular old array @LoL that you can get at with

two subscripts, like $LoL[3][2]. Here‘s a declaration of the array:

# assign to our array a list of list references

@LoL = (

[ "fred", "barney" ],

[ "george", "jane", "elroy" ],

[ "homer", "marge", "bart" ],

);

print $LoL[2][2];

bart

Now you should be very careful that the outer bracket type is a round one, that is, a parenthesis. That‘s

because you‘re assigning to an @list, so you need parentheses. If you wanted there not to be an @LoL, but

rather just a reference to it, you could do something more like this:

# assign a reference to list of list references

$ref_to_LoL = [

[ "fred", "barney", "pebbles", "bambam", "dino", ],

[ "homer", "bart", "marge", "maggie", ],

[ "george", "jane", "elroy", "judy", ],

];

print $ref_to_LoL−>[2][2];

Notice that the outer bracket type has changed, and so our access syntax has also changed. That‘s because

unlike C, in perl you can‘t freely interchange arrays and references thereto. $ref_to_LoL is a reference to

an array, whereas @LoL is an array proper. Likewise, $LoL[2] is not an array, but an array ref. So how

come you can write these:

$LoL[2][2]

$ref_to_LoL−>[2][2]

instead of having to write these:

$LoL[2]−>[2]

$ref_to_LoL−>[2]−>[2]

Well, that‘s because the rule is that on adjacent brackets only (whether square or curly), you are free to omit

the pointer dereferencing arrow. But you cannot do so for the very first one if it‘s a scalar containing a

reference, which means that $ref_to_LoL always needs it.
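The rule is easy to check with a quick sketch. All four expressions below reach the same element; the @same array is made up just to collect them.

```perl
use strict;

my @LoL = ( [ "fred", "barney" ], [ "george", "jane", "elroy" ] );
my $ref_to_LoL = \@LoL;

my @same = (
    $LoL[1][2],               # arrow omitted between subscripts
    $LoL[1]->[2],             # arrow written out
    $ref_to_LoL->[1][2],      # first arrow is required for a scalar ref
    $ref_to_LoL->[1]->[2],    # every arrow written out
);
print "@same\n";
```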

Growing Your Own

That‘s all well and good for declaration of a fixed data structure, but what if you wanted to add new elements

on the fly, or build it up entirely from scratch?

First, let‘s look at reading it in from a file. This is something like adding a row at a time. We‘ll assume that

there‘s a flat file in which each line is a row and each word an element. If you‘re trying to develop an @LoL

list containing all these, here‘s the right way to do that:

while (<>) {

@tmp = split;

push @LoL, [ @tmp ];

}

You might also have loaded that from a function:

for $i ( 1 .. 10 ) {

$LoL[$i] = [ somefunc($i) ];

}

Or you might have had a temporary variable sitting around with the list in it.

for $i ( 1 .. 10 ) {

@tmp = somefunc($i);

$LoL[$i] = [ @tmp ];

}

It‘s very important that you make sure to use the [] list reference constructor. That‘s because this will be

very wrong:

$LoL[$i] = @tmp;

You see, assigning a named list like that to a scalar just counts the number of elements in @tmp, which

probably isn‘t what you want.

If you are running under use strict, you‘ll have to add some declarations to make it happy:

use strict;

my(@LoL, @tmp);

while (<>) {

@tmp = split;

push @LoL, [ @tmp ];

}

Of course, you don‘t need the temporary array to have a name at all:

while (<>) {

push @LoL, [ split ];

}

You also don‘t have to use push(). You could just make a direct assignment if you knew where you

wanted to put it:

my (@LoL, $i, $line);

for $i ( 0 .. 10 ) {

$line = <>;

$LoL[$i] = [ split ’ ’, $line ];

}

or even just

my (@LoL, $i);

for $i ( 0 .. 10 ) {

$LoL[$i] = [ split ’ ’, <> ];

}

You should in general be leery of using potential list functions in a scalar context without explicitly stating

such. This would be clearer to the casual reader:

my (@LoL, $i);

for $i ( 0 .. 10 ) {

$LoL[$i] = [ split ’ ’, scalar(<>) ];

}

If you wanted to have a $ref_to_LoL variable as a reference to an array, you‘d have to do something like

this:

while (<>) {

push @$ref_to_LoL, [ split ];

}

Now you can add new rows. What about adding new columns? If you‘re dealing with just matrices, it‘s

often easiest to use simple assignment:

    for $x (1 .. 10) {
        for $y (1 .. 10) {
            $LoL[$x][$y] = func($x, $y);
        }
    }

for $x ( 3, 7, 9 ) {

$LoL[$x][20] += func2($x);

}

It doesn‘t matter whether those elements are already there or not: it‘ll gladly create them for you, setting

intervening elements to undef as need be.
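A tiny sketch shows what gets created. It assigns a single element into an empty @LoL and then inspects the result; the value "x" is arbitrary.

```perl
use strict;

my @LoL;
$LoL[2][3] = "x";             # neither row 2 nor column 3 existed before

print "rows: ", scalar(@LoL), "\n";
print "row 2 has ", scalar(@{ $LoL[2] }), " elements\n";
print "row 0 is ", (defined $LoL[0] ? "defined" : "undef"), "\n";
print "element [2][0] is ", (defined $LoL[2][0] ? "defined" : "undef"), "\n";
```

Rows 0 and 1 stay undef, and so do the first three elements of row 2.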

If you wanted just to append to a row, you‘d have to do something a bit funnier looking:

# add new columns to an existing row

push @{ $LoL[0] }, "wilma", "betty";

Notice that I couldn‘t say just:

push $LoL[0], "wilma", "betty"; # WRONG!

In fact, that wouldn‘t even compile. How come? Because the argument to push() must be a real array, not

just a reference to such.

Access and Printing

Now it‘s time to print your data structure out. How are you going to do that? Well, if you want only one of

the elements, it‘s trivial:

print $LoL[0][0];

If you want to print the whole thing, though, you can‘t say

print @LoL; # WRONG

because you‘ll get just references listed, and perl will never automatically dereference things for you.

Instead, you have to roll yourself a loop or two. This prints the whole structure, using the shell−style for()

construct to loop across the outer set of subscripts.

for $aref ( @LoL ) {

print "\t [ @$aref ],\n";

}

If you wanted to keep track of subscripts, you might do this:

for $i ( 0 .. $#LoL ) {

print "\t elt $i is [ @{$LoL[$i]} ],\n";

}

or maybe even this. Notice the inner loop.

    for $i ( 0 .. $#LoL ) {
        for $j ( 0 .. $#{$LoL[$i]} ) {
            print "elt $i $j is $LoL[$i][$j]\n";
        }
    }

As you can see, it's getting a bit complicated. That's why sometimes it's easier to take a temporary on your

way through:

    for $i ( 0 .. $#LoL ) {
        $aref = $LoL[$i];
        for $j ( 0 .. $#{$aref} ) {
            print "elt $i $j is $LoL[$i][$j]\n";
        }
    }

Hmm... that‘s still a bit ugly. How about this:

    for $i ( 0 .. $#LoL ) {
        $aref = $LoL[$i];
        $n = @$aref - 1;
        for $j ( 0 .. $n ) {
            print "elt $i $j is $LoL[$i][$j]\n";
        }
    }

Slices

If you want to get at a slice (part of a row) in a multidimensional array, you‘re going to have to do some

fancy subscripting. That‘s because while we have a nice synonym for single elements via the pointer arrow

for dereferencing, no such convenience exists for slices. (Remember, of course, that you can always write a

loop to do a slice operation.)

Here‘s how to do one operation using a loop. We‘ll assume an @LoL variable as before.

@part = ();

$x = 4;

for ($y = 7; $y < 13; $y++) {

push @part, $LoL[$x][$y];

}

That same loop could be replaced with a slice operation:

@part = @{ $LoL[4] } [ 7..12 ];

but as you might well imagine, this is pretty rough on the reader.
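If the notation looks opaque, it may help to try it on data small enough to check by eye. This sketch uses a made-up three-row @LoL of consecutive numbers.

```perl
use strict;

my @LoL = (
    [ 0 .. 9 ],
    [ 10 .. 19 ],
    [ 20 .. 29 ],
);

my @part  = @{ $LoL[1] }[ 2 .. 4 ];    # elements 2..4 of row 1
my @picks = @{ $LoL[2] }[ 0, 9 ];      # first and last of row 2

print "part:  @part\n";
print "picks: @picks\n";
```

The @{ ... } yields the whole row, and the bracketed list after it picks the slice out of that row.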

Ah, but what if you wanted a two−dimensional slice, such as having $x run from 4..8 and $y run from 7 to

12? Hmm... here‘s the simple way:

@newLoL = ();

    for ($startx = $x = 4; $x <= 8; $x++) {
        for ($starty = $y = 7; $y <= 12; $y++) {
            $newLoL[$x - $startx][$y - $starty] = $LoL[$x][$y];
        }
    }

We can reduce some of the looping through slices

for ($x = 4; $x <= 8; $x++) {

push @newLoL, [ @{ $LoL[$x] } [ 7..12 ] ];

}

If you were into Schwartzian Transforms, you would probably have selected map for that

@newLoL = map { [ @{ $LoL[$_] } [ 7..12 ] ] } 4 .. 8;

Although if your manager accused you of seeking job security (or rapid insecurity) through inscrutable code, it

would be hard to argue. :-) If I were you, I'd put that in a function:

@newLoL = splice_2D( \@LoL, 4 => 8, 7 => 12 );

sub splice_2D {

my $lrr = shift; # ref to list of list refs!

my ($x_lo, $x_hi,

$y_lo, $y_hi) = @_;

return map {

[ @{ $lrr−>[$_] } [ $y_lo .. $y_hi ] ]

} $x_lo .. $x_hi;

}
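Here is the function again in a self-contained sketch, run against a hypothetical 4x4 grid whose element [$x][$y] is the string "$x$y", so you can see at a glance which cells came back.

```perl
use strict;

sub splice_2D {
    my $lrr = shift;                      # ref to list of list refs
    my ($x_lo, $x_hi, $y_lo, $y_hi) = @_;
    return map {
        [ @{ $lrr->[$_] }[ $y_lo .. $y_hi ] ]
    } $x_lo .. $x_hi;
}

# Build the made-up grid: element [$x][$y] holds "$x$y".
my @LoL;
for my $x (0 .. 3) {
    for my $y (0 .. 3) {
        $LoL[$x][$y] = "$x$y";
    }
}

my @newLoL = splice_2D( \@LoL, 1 => 2, 2 => 3 );
print "@$_\n" for @newLoL;
```

Despite its name, the function copies the selected cells and leaves the original @LoL untouched.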

SEE ALSO

perldata(1), perlref(1), perldsc(1)

AUTHOR

Tom Christiansen <tchrist@perl.com>

Last update: Thu Jun 4 16:16:23 MDT 1998

perlobj Perl Programmers Reference Guide perlobj

NAME

perlobj − Perl objects

DESCRIPTION

First of all, you need to understand what references are in Perl. See perlref for that. Second, if you still find

the following reference work too complicated, a tutorial on object−oriented programming in Perl can be

found in perltoot.

If you‘re still with us, then here are three very simple definitions that you should find reassuring.

1. An object is simply a reference that happens to know which class it belongs to.

2. A class is simply a package that happens to provide methods to deal with object references.

3. A method is simply a subroutine that expects an object reference (or a package name, for class

methods) as the first argument.
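The three definitions can be seen together in one small sketch. The Critter class matches the examples that follow, but the speak() method and the sound field are invented for this illustration.

```perl
use strict;

package Critter;

# Constructor: a class method, so the package name arrives first.
sub new {
    my $class = shift;
    my $self  = { sound => "growl" };
    return bless $self, $class;
}

# Object method: the object reference arrives first.
sub speak {
    my $self = shift;
    return "The critter says $self->{sound}";
}

package main;

my $pet = Critter->new();
print ref($pet), "\n";
print $pet->speak(), "\n";
```

The object is just a hash reference that ref() reports as a Critter, the class is just the package, and the methods are ordinary subroutines.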

We‘ll cover these points now in more depth.

An Object is Simply a Reference

Unlike say C++, Perl doesn‘t provide any special syntax for constructors. A constructor is merely a

subroutine that returns a reference to something "blessed" into a class, generally the class that the subroutine

is defined in. Here is a typical constructor:

package Critter;

sub new { bless {} }

That word new isn't special. You could have written a constructor this way, too:

package Critter;

sub spawn { bless {} }

In fact, this might even be preferable, because the C++ programmers won‘t be tricked into thinking that new

works in Perl as it does in C++. It doesn‘t. We recommend that you name your constructors whatever makes

sense in the context of the problem you‘re solving. For example, constructors in the Tk extension to Perl are

named after the widgets they create.

One thing that's different about Perl constructors compared with those in C++ is that in Perl, they have to

allocate their own memory. (The other thing is that they don't automatically call overridden base-class

constructors.) The {} allocates an anonymous hash containing no key/value pairs, and returns it. The

bless() takes that reference and tells the object it references that it‘s now a Critter, and returns the

reference. This is for convenience, because the referenced object itself knows that it has been blessed, and

the reference to it could have been returned directly, like this:

sub new {

my $self = {};

bless $self;

return $self;

}

In fact, you often see such a thing in more complicated constructors that wish to call methods in the class as

part of the construction:

    sub new {
        my $self = {};
        bless $self;
        $self->initialize();
        return $self;
    }

If you care about inheritance (and you should; see Modules: Creation, Use, and Abuse in perlmod), then you

want to use the two−arg form of bless so that your constructors may be inherited:

    sub new {
        my $class = shift;
        my $self = {};
        bless $self, $class;
        $self->initialize();
        return $self;
    }

Or if you expect people to call not just CLASS->new() but also $obj->new(), then use something like

this. The initialize() method used will be of whatever $class we blessed the object into:

    sub new {
        my $this = shift;
        my $class = ref($this) || $this;
        my $self = {};
        bless $self, $class;
        $self->initialize();
        return $self;
    }

Within the class package, the methods will typically deal with the reference as an ordinary reference.

Outside the class package, the reference is generally treated as an opaque value that may be accessed only

through the class‘s methods.

A constructor may re−bless a referenced object currently belonging to another class, but then the new class is

responsible for all cleanup later. The previous blessing is forgotten, as an object may belong to only one

class at a time. (Although of course it‘s free to inherit methods from many classes.) If you find yourself

having to do this, the parent class is probably misbehaving, though.

A clarification: Perl objects are blessed. References are not. Objects know which package they belong to.

References do not. The bless() function uses the reference to find the object. Consider the following

example:

$a = {};

$b = $a;

bless $a, BLAH;

print "\$b is a ", ref($b), "\n";

This reports $b as being a BLAH, so obviously bless() operated on the object and not on the reference.

A Class is Simply a Package

Unlike say C++, Perl doesn‘t provide any special syntax for class definitions. You use a package as a class

by putting method definitions into the class.

There is a special array within each package called @ISA, which says where else to look for a method if you

can‘t find it in the current package. This is how Perl implements inheritance. Each element of the @ISA

array is just the name of another package that happens to be a class package. The classes are searched (depth

first) for missing methods in the order that they occur in @ISA. The classes accessible through @ISA are

known as base classes of the current class.

All classes implicitly inherit from class UNIVERSAL as their last base class. Several commonly used

methods are automatically supplied in the UNIVERSAL class; see "Default UNIVERSAL methods" for more

details.

If a missing method is found in one of the base classes, it is cached in the current class for efficiency.

Changing @ISA or defining new subroutines invalidates the cache and causes Perl to do the lookup again.

If neither the current class, its named base classes, nor the UNIVERSAL class contains the requested

method, these three places are searched all over again, this time looking for a method named AUTOLOAD().

If an AUTOLOAD is found, this method is called on behalf of the missing method, setting the package

global $AUTOLOAD to be the fully qualified name of the method that was intended to be called.

If none of that works, Perl finally gives up and complains.
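
The search order just described can be watched in a short sketch. The Animal and Dog classes and their methods are hypothetical names chosen for illustration; only @ISA, AUTOLOAD, and $AUTOLOAD are Perl's own.

```perl
use strict;

{
    package Animal;                       # hypothetical base class
    sub new   { my $class = shift; return bless {}, $class }
    sub speak { return "generic noise" }

    # Reached only after the ordinary search has failed everywhere.
    sub AUTOLOAD {
        my $self = shift;
        my $name = $Animal::AUTOLOAD;     # fully qualified, e.g. "Dog::fetch"
        return if $name =~ /::DESTROY$/;  # ignore implicit destructor calls
        return "AUTOLOAD caught $name";
    }
}
{
    package Dog;                          # hypothetical derived class
    @Dog::ISA = ('Animal');               # search Animal after Dog
    sub speak { return "Woof" }
}

my $dog      = Dog->new;                  # new() is inherited through @ISA
my $spoken   = $dog->speak;               # found directly in Dog
my $fallback = $dog->fetch;               # defined nowhere, so AUTOLOAD runs
print "$spoken\n$fallback\n";
```

Note that the AUTOLOAD found through @ISA lives in Animal, so it is the package global $Animal::AUTOLOAD that receives the name of the missing method.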

Perl classes do method inheritance only. Data inheritance is left up to the class itself. By and large, this is

not a problem in Perl, because most classes model the attributes of their object using an anonymous hash,

which serves as its own little namespace to be carved up by the various classes that might want to do

something with the object. The only problem with this is that you can't be sure that you aren't using a piece of

the hash that is already used. A reasonable workaround is to prepend your field name in the hash with the

package name.

    sub bump {
        my $self = shift;
        $self->{ __PACKAGE__ . ".count"}++;
    }
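
As a rough illustration of why the prefix helps, the following sketch lets a base class and a subclass each keep a field called "count" in the same hash. The Counter and LoudCounter classes are invented for the example.

```perl
use strict;

{
    package Counter;                      # hypothetical base class
    sub new  { my $class = shift; return bless {}, $class }
    sub bump { my $self = shift; return ++$self->{ __PACKAGE__ . ".count" } }
}
{
    package LoudCounter;                  # hypothetical subclass
    @LoudCounter::ISA = ('Counter');
    sub shout {
        my $self = shift;
        # Same short field name, different package prefix: no collision.
        return ++$self->{ __PACKAGE__ . ".count" };
    }
}

my $c = LoudCounter->new;
$c->bump for 1 .. 3;
$c->shout;
print join(", ", map { "$_ => $c->{$_}" } sort keys %$c), "\n";
```

Because __PACKAGE__ is resolved where each method is compiled, bump always uses the key "Counter.count" even when it is called on a LoudCounter object.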

A Method is Simply a Subroutine

Unlike say C++, Perl doesn‘t provide any special syntax for method definition. (It does provide a little

syntax for method invocation though. More on that later.) A method expects its first argument to be the

object (reference) or package (string) it is being invoked on. There are just two types of methods, which

we‘ll call class and instance. (Sometimes you‘ll hear these called static and virtual, in honor of the two C++

method types they most closely resemble.)

A class method expects a class name as the first argument. It provides functionality for the class as a whole,

not for any individual object belonging to the class. Constructors are typically class methods. Many class

methods simply ignore their first argument, because they already know what package they‘re in, and don‘t

care what package they were invoked via. (These aren‘t necessarily the same, because class methods follow

the inheritance tree just like ordinary instance methods.) Another typical use for class methods is to look up

an object by name:

sub find {

my ($class, $name) = @_;

$objtable{$name};

}

An instance method expects an object reference as its first argument. Typically it shifts the first argument

into a "self" or "this" variable, and then uses that as an ordinary reference.

    sub display {
        my $self = shift;
        my @keys = @_ ? @_ : sort keys %$self;
        foreach $key (@keys) {
            print "\t$key => $self->{$key}\n";
        }
    }

Method Invocation

There are two ways to invoke a method, one of which you‘re already familiar with, and the other of which

will look familiar. Perl 4 already had an "indirect object" syntax that you use when you say

print STDERR "help!!!\n";

This same syntax can be used to call either class or instance methods. We'll use the two methods defined

above, the class method to look up an object reference and the instance method to print out its attributes.

    $fred = find Critter "Fred";
    display $fred 'Height', 'Weight';

These could be combined into one statement by using a BLOCK in the indirect object slot:

    display {find Critter "Fred"} 'Height', 'Weight';

For C++ fans, there's also a syntax using -> notation that does exactly the same thing. The parentheses are

required if there are any arguments.

    $fred = Critter->find("Fred");
    $fred->display('Height', 'Weight');

or in one statement,

    Critter->find("Fred")->display('Height', 'Weight');

There are times when one syntax is more readable, and times when the other syntax is more readable. The

indirect object syntax is less cluttered, but it has the same ambiguity as ordinary list operators. Indirect object

method calls are parsed using the same rule as list operators: "If it looks like a function, it is a function".

(Presuming for the moment that you think two words in a row can look like a function name. C++

programmers seem to think so with some regularity, especially when the first word is "new".) Thus, the

parentheses of

    new Critter ('Barney', 1.5, 70)

are assumed to surround ALL the arguments of the method call, regardless of what comes after. Saying

    new Critter ('Bam' x 2), 1.4, 45

would be equivalent to

    Critter->new('Bam' x 2), 1.4, 45

which is unlikely to do what you want.

There are times when you wish to specify which class‘s method to use. In this case, you can call your

method as an ordinary subroutine call, being sure to pass the requisite first argument explicitly:

    $fred = MyCritter::find("Critter", "Fred");
    MyCritter::display($fred, 'Height', 'Weight');

Note however, that this does not do any inheritance. If you wish merely to specify that Perl should START

looking for a method in a particular package, use an ordinary method call, but qualify the method name with

the package like this:

    $fred = Critter->MyCritter::find("Fred");
    $fred->MyCritter::display('Height', 'Weight');

If you‘re trying to control where the method search begins and you‘re executing in the class itself, then you

may use the SUPER pseudo class, which says to start looking in your base class‘s @ISA list without having

to name it explicitly:

    $self->SUPER::display('Height', 'Weight');

Please note that the SUPER:: construct is meaningful only within the class.
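
A minimal sketch of both forms follows. The describe method and the Name field are hypothetical; the point is only where the method search starts.

```perl
use strict;

{
    package Critter;
    sub new { my $class = shift; return bless { @_ }, $class }
    sub describe {
        my $self = shift;
        return "Critter named $self->{Name}";
    }
}
{
    package MyCritter;                    # hypothetical subclass
    @MyCritter::ISA = ('Critter');
    sub describe {
        my $self = shift;
        # Start the search in MyCritter's @ISA, without naming Critter.
        my $base = $self->SUPER::describe();
        return "$base (seen by MyCritter)";
    }
}

my $fred     = MyCritter->new( Name => 'Fred' );
my $full     = $fred->describe;           # MyCritter's version, which calls up
my $explicit = $fred->Critter::describe;  # start the search in Critter instead
print "$full\n$explicit\n";
```

The SUPER:: form keeps working if MyCritter is later given a different base class, whereas the qualified form names Critter explicitly.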

Sometimes you want to call a method when you don‘t know the method name ahead of time. You can use

the arrow form, replacing the method name with a simple scalar variable containing the method name:

    $method = $fast ? "findfirst" : "findbest";
    $fred->$method(@args);

Default UNIVERSAL methods

The UNIVERSAL package automatically contains the following methods that are inherited by all other

classes:

isa(CLASS)

isa returns true if its object is blessed into CLASS or a subclass of CLASS.

isa is also exportable and can be called as a sub with two arguments. This makes it possible to check

what a reference points to. Example:

use UNIVERSAL qw(isa);

if(isa($ref, 'ARRAY')) {

#...

}

can(METHOD)

can checks to see if its object has a method called METHOD. If it does, then a reference to the sub is

returned; if it does not, then undef is returned.

VERSION( [NEED] )

VERSION returns the version number of the class (package). If the NEED argument is given then it

will check that the current version (as defined by the $VERSION variable in the given package) is not

less than NEED; it will die if this is not the case. This method is normally called as a class method.

This method is called automatically by the VERSION form of use.

use A 1.2 qw(some imported subs);

# implies:

A->VERSION(1.2);

NOTE: can directly uses Perl's internal code for method lookup, and isa uses a very similar method and

caching strategy. This may cause strange effects if the Perl code dynamically changes @ISA in any

package.

You may add other methods to the UNIVERSAL class via Perl or XS code. You do not need to use

UNIVERSAL in order to make these methods available to your program. This is necessary only if you wish

to have isa available as a plain subroutine in the current package.
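
The three methods can be tried as ordinary method calls, as in this sketch. The Critter and Kitten classes, the growl method, and the version number are assumptions made up for the example.

```perl
use strict;

{
    package Critter;
    $Critter::VERSION = '1.20';           # hypothetical version number
    sub new   { my $class = shift; return bless {}, $class }
    sub growl { return "Grr" }
}
{
    package Kitten;                       # hypothetical subclass
    @Kitten::ISA = ('Critter');
}

my $pet        = Kitten->new;
my $is_critter = $pet->isa('Critter') ? 1 : 0;   # true through @ISA
my $growl      = $pet->can('growl');      # code reference, or undef
my $purr       = $pet->can('purr');       # no such method anywhere
my $version    = Critter->VERSION;        # value of $Critter::VERSION

# Asking for a newer version than the class provides raises an exception.
my $new_enough = eval { Critter->VERSION(2.0); 1 };

print "isa: $is_critter, growl says: ", $growl->($pet), "\n";
print "version $version, purr is ", (defined $purr ? "known" : "unknown"), "\n";
```

Since can returns a code reference, the result may be called directly, as the first print statement does.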

Destructors

When the last reference to an object goes away, the object is automatically destroyed. (This may even be

after you exit, if you‘ve stored references in global variables.) If you want to capture control just before the

object is freed, you may define a DESTROY method in your class. It will automatically be called at the

appropriate moment, and you can do any extra cleanup you need to do. Perl passes a reference to the object

under destruction as the first (and only) argument. Beware that the reference is a read−only value, and

cannot be modified by manipulating $_[0] within the destructor. The object itself (i.e. the thingy the

reference points to, namely ${$_[0]}, @{$_[0]}, %{$_[0]} etc.) is not similarly constrained.

If you arrange to re−bless the reference before the destructor returns, perl will again call the DESTROY

method for the re−blessed object after the current one returns. This can be used for clean delegation of

object destruction, or for ensuring that destructors in the base classes of your choosing get called. Explicitly

calling DESTROY is also possible, but is rarely needed.

Do not confuse the foregoing with how objects CONTAINED in the current one are destroyed. Such objects

will be freed and destroyed automatically when the current object is freed, provided no other references to

them exist elsewhere.
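
When the destructor runs can be observed with a short sketch. The Tracked class and the @log array are hypothetical; they exist only to record the order of events.

```perl
use strict;

my @log;                                  # records the order of events

{
    package Tracked;                      # hypothetical class
    sub new {
        my ($class, $name) = @_;
        return bless { name => $name }, $class;
    }
    sub DESTROY {
        my $self = shift;
        push @log, "destroyed $self->{name}";
    }
}

{
    my $first  = Tracked->new('first');
    my $second = $first;                  # second reference, same object
    undef $first;                         # one reference is still alive
    push @log, "one reference left";
}                                         # $second goes out of scope here
push @log, "after block";

print "$_\n" for @log;
```

The destructor entry lands between the other two, which shows that DESTROY runs when the last reference disappears and not when the first one does.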

WARNING

While indirect object syntax may well be appealing to English speakers and to C++ programmers, be not

seduced! It suffers from two grave problems.

The first problem is that an indirect object is limited to a name, a scalar variable, or a block, because it would

have to do too much lookahead otherwise, just like any other postfix dereference in the language. (These are

the same quirky rules as are used for the filehandle slot in functions like print and printf.) This can

lead to horribly confusing precedence problems, as in these next two lines:

    move $obj->{FIELD}; # probably wrong!
    move $ary[$i]; # probably wrong!

Those actually parse as the very surprising:

    $obj->move->{FIELD}; # Well, lookee here
    $ary->move->[$i]; # Didn't expect this one, eh?

Rather than what you might have expected:

    $obj->{FIELD}->move(); # You should be so lucky.
    $ary[$i]->move; # Yeah, sure.

The left side of "->" is not so limited, because it's an infix operator, not a postfix operator.

As if that weren't bad enough, think about this: Perl must guess at compile time whether name and move

above are functions or methods. Usually Perl gets it right, but when it doesn't, you get a function call

compiled as a method, or vice versa. This can introduce subtle bugs that are hard to unravel. For example,

calling a method new in indirect notation (as C++ programmers are so wont to do) can be miscompiled

into a subroutine call if there's already a new function in scope. You'd end up calling the current package's

new as a subroutine, rather than the desired class's method. The compiler tries to cheat by remembering

bareword requires, but the grief if it messes up just isn't worth the years of debugging it would likely take

you to track such subtle bugs down.

The infix arrow notation using "->" doesn't suffer from either of these disturbing ambiguities, so we

recommend you use it exclusively.

Summary

That‘s about all there is to it. Now you need just to go off and buy a book about object−oriented design

methodology, and bang your forehead with it for the next six months or so.

Two−Phased Garbage Collection

For most purposes, Perl uses a fast and simple reference−based garbage collection system. For this reason,

there‘s an extra dereference going on at some level, so if you haven‘t built your Perl executable using your C

compiler‘s −O flag, performance will suffer. If you have built Perl with cc −O, then this probably won‘t

matter.

A more serious concern is that unreachable memory with a non−zero reference count will not normally get

freed. Therefore, this is a bad idea:

{

my $a;

$a = \$a;

}

Even though $a should go away, it can't. When building recursive data structures, you'll have to break the

self-reference yourself explicitly if you don't care to leak. For example, here's a self-referential node such

as one might use in a sophisticated tree structure:

    sub new_node {
        my $self = shift;
        my $class = ref($self) || $self;
        my $node = {};
        $node->{LEFT} = $node->{RIGHT} = $node;
        $node->{DATA} = [ @_ ];
        return bless $node => $class;
    }

If you create nodes like that, they (currently) won‘t go away unless you break their self reference yourself.

(In other words, this is not to be construed as a feature, and you shouldn‘t depend on it.)

Almost.

When an interpreter thread finally shuts down (usually when your program exits), then a rather costly but

complete mark−and−sweep style of garbage collection is performed, and everything allocated by that thread

gets destroyed. This is essential to support Perl as an embedded or a multithreadable language. For

example, this program demonstrates Perl‘s two−phased garbage collection:

    #!/usr/bin/perl
    package Subtle;

    sub new {
        my $test;
        $test = \$test;
        warn "CREATING " . \$test;
        return bless \$test;
    }

    sub DESTROY {
        my $self = shift;
        warn "DESTROYING $self";
    }

    package main;

    warn "starting program";
    {
        my $a = Subtle->new;
        my $b = Subtle->new;
        $$a = 0;  # break selfref
        warn "leaving block";
    }

    warn "just exited block";
    warn "time to die...";
    exit;

When run as /tmp/test, the following output is produced:

starting program at /tmp/test line 18.

CREATING SCALAR(0x8e5b8) at /tmp/test line 7.

CREATING SCALAR(0x8e57c) at /tmp/test line 7.

leaving block at /tmp/test line 23.

DESTROYING Subtle=SCALAR(0x8e5b8) at /tmp/test line 13.

just exited block at /tmp/test line 26.

time to die... at /tmp/test line 27.

DESTROYING Subtle=SCALAR(0x8e57c) during global destruction.

Notice that "global destruction" bit there? That‘s the thread garbage collector reaching the unreachable.

Objects are always destructed, even when regular refs aren't, and in fact are destructed in a separate pass

before ordinary refs just to try to prevent object destructors from using refs that have been themselves

destructed. Plain refs are only garbage−collected if the destruct level is greater than 0. You can test the

higher levels of global destruction by setting the PERL_DESTRUCT_LEVEL environment variable,

presuming −DDEBUGGING was enabled during perl build time.

A more complete garbage collection strategy will be implemented at a future date.

In the meantime, the best solution is to create a non−recursive container class that holds a pointer to the

self−referential data structure. Define a DESTROY method for the containing object‘s class that manually

breaks the circularities in the self−referential structure.
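
One way such a container might look is sketched below. The Node and Tree classes and the $destroyed counter are hypothetical; the node is built the same way as in new_node above.

```perl
use strict;

my $destroyed = 0;                        # counts Node destructor calls

{
    package Node;                         # self-referential, as in new_node
    sub new {
        my $class = shift;
        my $node  = {};
        $node->{LEFT} = $node->{RIGHT} = $node;   # circular references
        $node->{DATA} = [ @_ ];
        return bless $node, $class;
    }
    sub DESTROY { $destroyed++ }
}
{
    package Tree;                         # hypothetical non-recursive container
    sub new {
        my $class = shift;
        return bless { ROOT => Node->new(@_) }, $class;
    }
    sub DESTROY {
        my $self = shift;
        my $root = $self->{ROOT} or return;
        # Break the circularity so the node's reference count can reach zero.
        delete @{$root}{qw(LEFT RIGHT)};
    }
}

{
    my $tree = Tree->new('some', 'data');
}                                         # Tree::DESTROY runs here
print "nodes destroyed after block: $destroyed\n";
```

The container itself holds no reference back to itself, so it is freed in the ordinary way when $tree goes out of scope, and its destructor then releases the node.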

SEE ALSO

A kinder, gentler tutorial on object−oriented programming in Perl can be found in perltoot. You should also

check out perlbot for other object tricks, traps, and tips, as well as perlmodlib for some style guides on

constructing both modules and classes.

perltie Perl Programmers Reference Guide perltie

NAME

perltie − how to hide an object class in a simple variable

SYNOPSIS

tie VARIABLE, CLASSNAME, LIST

$object = tied VARIABLE

untie VARIABLE

DESCRIPTION

Prior to release 5.0 of Perl, a programmer could use dbmopen() to connect an on−disk database in the

standard Unix dbm(3x) format magically to a %HASH in their program. However, their Perl was either built

with one particular dbm library or another, but not both, and you couldn‘t extend this mechanism to other

packages or types of variables.

Now you can.

The tie() function binds a variable to a class (package) that will provide the implementation for access

methods for that variable. Once this magic has been performed, accessing a tied variable automatically

triggers method calls in the proper class. The complexity of the class is hidden behind magic method calls.

The method names are in ALL CAPS, which is a convention that Perl uses to indicate that they‘re called

implicitly rather than explicitly—just like the BEGIN() and END() functions.

In the tie() call, VARIABLE is the name of the variable to be enchanted. CLASSNAME is the name of a

class implementing objects of the correct type. Any additional arguments in the LIST are passed to the

appropriate constructor method for that class—meaning TIESCALAR(), TIEARRAY(), TIEHASH(), or

TIEHANDLE(). (Typically these are arguments such as might be passed to the dbminit() function of

C.) The object returned by the "new" method is also returned by the tie() function, which would be useful

if you wanted to access other methods in CLASSNAME. (You don‘t actually have to return a reference to a

right "type" (e.g., HASH or CLASSNAME) so long as it‘s a properly blessed object.) You can also retrieve a

reference to the underlying object using the tied() function.

Unlike dbmopen(), the tie() function will not use or require a module for you—you need to do that

explicitly yourself.

Tying Scalars

A class implementing a tied scalar should define the following methods: TIESCALAR, FETCH, STORE,

and possibly DESTROY.

Let‘s look at each in turn, using as an example a tie class for scalars that allows the user to do something

like:

    tie $his_speed, 'Nice', getppid();
    tie $my_speed, 'Nice', $$;

And now whenever either of those variables is accessed, its current system priority is retrieved and returned.

If those variables are set, then the process‘s priority is changed!

We'll use Jarkko Hietaniemi <jhi@iki.fi>'s BSD::Resource class (not included) to access the

PRIO_PROCESS, PRIO_MIN, and PRIO_MAX constants from your system, as well as the

getpriority() and setpriority() system calls. Here's the preamble of the class.

package Nice;

use Carp;

use BSD::Resource;

use strict;

$Nice::DEBUG = 0 unless defined $Nice::DEBUG;

TIESCALAR classname, LIST

This is the constructor for the class. That means it is expected to return a blessed reference to a new

scalar (probably anonymous) that it‘s creating. For example:

    sub TIESCALAR {
        my $class = shift;
        my $pid = shift || $$; # 0 means me

        if ($pid !~ /^\d+$/) {
            carp "Nice::Tie::Scalar got non-numeric pid $pid" if $^W;
            return undef;
        }

        unless (kill 0, $pid) { # EPERM or ESRCH, no doubt
            carp "Nice::Tie::Scalar got bad pid $pid: $!" if $^W;
            return undef;
        }

        return bless \$pid, $class;
    }

This tie class has chosen to return an error rather than raising an exception if its constructor should fail.

While this is how dbmopen() works, other classes may well not wish to be so forgiving. It checks

the global variable $^W to see whether to emit a bit of noise anyway.

FETCH this

This method will be triggered every time the tied variable is accessed (read). It takes no arguments

beyond its self reference, which is the object representing the scalar we‘re dealing with. Because in

this case we‘re using just a SCALAR ref for the tied scalar object, a simple $$self allows the

method to get at the real value stored there. In our example below, that real value is the process ID to

which we‘ve tied our variable.

sub FETCH {

my $self = shift;

confess "wrong type" unless ref $self;

croak "usage error" if @_;

my $nicety;

local($!) = 0;

$nicety = getpriority(PRIO_PROCESS, $$self);

if ($!) { croak "getpriority failed: $!" }

return $nicety;

}

This time we've decided to blow up (raise an exception) if the getpriority call fails: there's no place for us to

return an error otherwise, and it's probably the right thing to do.

STORE this, value

This method will be triggered every time the tied variable is set (assigned). Beyond its self reference,

it also expects one (and only one) argument—the new value the user is trying to assign.

sub STORE {

my $self = shift;

confess "wrong type" unless ref $self;

my $new_nicety = shift;

croak "usage error" if @_;

if ($new_nicety < PRIO_MIN) {

carp sprintf

"WARNING: priority %d less than minimum system priority %d",

$new_nicety, PRIO_MIN if $^W;

$new_nicety = PRIO_MIN;

}

if ($new_nicety > PRIO_MAX) {

carp sprintf

"WARNING: priority %d greater than maximum system priority %d",

$new_nicety, PRIO_MAX if $^W;

$new_nicety = PRIO_MAX;

}

unless (defined setpriority(PRIO_PROCESS, $$self, $new_nicety)) {

confess "setpriority failed: $!";

}

return $new_nicety;

}

DESTROY this

This method will be triggered when the tied variable needs to be destructed. As with other object

classes, such a method is seldom necessary, because Perl deallocates its moribund object‘s memory for

you automatically—this isn‘t C++, you know. We‘ll use a DESTROY method here for debugging

purposes only.

sub DESTROY {

my $self = shift;

confess "wrong type" unless ref $self;

carp "[ Nice::DESTROY pid $$self ]" if $Nice::DEBUG;

}

That‘s about all there is to it. Actually, it‘s more than all there is to it, because we‘ve done a few nice things

here for the sake of completeness, robustness, and general aesthetics. Simpler TIESCALAR classes are

certainly possible.
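
For comparison, here is roughly the smallest tied scalar class one could write. The Ticket class and its counting behavior are hypothetical and have nothing to do with process priorities; the sketch only shows which method each kind of access triggers.

```perl
use strict;

{
    package Ticket;                       # hypothetical tie class
    sub TIESCALAR {
        my ($class, $start) = @_;
        my $next = $start || 0;
        return bless \$next, $class;      # the object is a SCALAR ref
    }
    sub FETCH { my $self = shift; return $$self++ }         # hand out, then bump
    sub STORE { my ($self, $value) = @_; $$self = $value }  # reset the counter
}

my $ticket;
my $object = tie $ticket, 'Ticket', 100;
my $first  = $ticket;                     # FETCH
my $second = $ticket;                     # FETCH again
$ticket    = 5;                           # STORE
my $third  = $ticket;                     # FETCH
my $same   = (tied($ticket) == $object) ? 1 : 0;
undef $object;                            # drop our reference before untie
untie $ticket;
print "$first $second $third $same\n";    # prints "100 101 5 1"
```

The object returned by tie() and the one returned by tied() are the same reference, which is why the last flag is true.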

Tying Arrays

A class implementing a tied ordinary array should define the following methods: TIEARRAY, FETCH,

STORE, FETCHSIZE, STORESIZE and perhaps DESTROY.

FETCHSIZE and STORESIZE are used to provide $#array and equivalent scalar(@array) access.

The methods POP, PUSH, SHIFT, UNSHIFT, SPLICE are required if the perl operator with the

corresponding (but lowercase) name is to operate on the tied array. The Tie::Array class can be used as a

base class to implement these in terms of the basic five methods above.

In addition EXTEND will be called when perl would have pre−extended allocation in a real array.

This means that tied arrays are now complete. The example below needs upgrading to illustrate this. (The

documentation in Tie::Array is more complete.)
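
As a hedged sketch of the base-class approach, the following class inherits from Tie::StdArray, the plain implementation that ships in the same file as Tie::Array, and overrides only STORE. The Even_Only class and its rule are invented for illustration.

```perl
use strict;
require Tie::Array;                       # also defines Tie::StdArray

{
    package Even_Only;                    # hypothetical class
    @Even_Only::ISA = ('Tie::StdArray');

    # Override just STORE; TIEARRAY, FETCH, FETCHSIZE and the rest are inherited.
    sub STORE {
        my ($self, $idx, $value) = @_;
        die "odd value $value rejected\n" if $value % 2;
        return $self->SUPER::STORE($idx, $value);
    }
}

tie my @evens, 'Even_Only';
@evens = (2, 4);                          # list assignment stores each element
my $ok    = eval { $evens[2] = 7; 1 };    # STORE raises an exception
my $error = $@;
print "contents: @evens\n";
print "error: $error";
```

Only element stores are checked here. Inherited operations such as PUSH have their own implementations in the base class, so a real class would need to override those as well if the rule must hold everywhere.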

For this discussion, we‘ll implement an array whose indices are fixed at its creation. If you try to access

anything beyond those bounds, you‘ll take an exception. For example:

require Bounded_Array;

tie @ary, 'Bounded_Array', 2;

$| = 1;

for $i (0 .. 10) {

print "setting index $i: ";

$ary[$i] = 10 * $i;

print "value of elt $i now $ary[$i]\n";

}

The preamble code for the class is as follows:

package Bounded_Array;

use Carp;

use strict;

TIEARRAY classname, LIST

This is the constructor for the class. That means it is expected to return a blessed reference through

which the new array (probably an anonymous ARRAY ref) will be accessed.

In our example, just to show you that you don‘t really have to return an ARRAY reference, we‘ll

choose a HASH reference to represent our object. A HASH works out well as a generic record type:

the {BOUND} field will store the maximum bound allowed, and the {ARRAY} field will hold the true

ARRAY ref. If someone outside the class tries to dereference the object returned (doubtless thinking it

an ARRAY ref), they‘ll blow up. This just goes to show you that you should respect an object‘s

privacy.

sub TIEARRAY {

my $class = shift;

my $bound = shift;

confess "usage: tie(\@ary, 'Bounded_Array', max_subscript)"

if @_ || $bound =~ /\D/;

return bless {

BOUND => $bound,

ARRAY => [],

}, $class;

}

FETCH this, index

This method will be triggered every time an individual element of the tied array is accessed (read). It

takes one argument beyond its self reference: the index whose value we‘re trying to fetch.

    sub FETCH {
        my($self,$idx) = @_;
        if ($idx > $self->{BOUND}) {
            confess "Array OOB: $idx > $self->{BOUND}";
        }
        return $self->{ARRAY}[$idx];
    }

As you may have noticed, the name of the FETCH method (et al.) is the same for all accesses, even

though the constructors differ in names (TIESCALAR vs TIEARRAY). While in theory you could

have the same class servicing several tied types, in practice this becomes cumbersome, and it‘s easiest

to keep them at simply one tie type per class.

STORE this, index, value

This method will be triggered every time an element in the tied array is set (written). It takes two

arguments beyond its self reference: the index at which we‘re trying to store something and the value

we‘re trying to put there. For example:

    sub STORE {
        my($self, $idx, $value) = @_;
        # _debug() is not shown in this example; it is assumed to be
        # defined elsewhere in the class.
        print "[STORE $value at $idx]\n" if _debug();
        if ($idx > $self->{BOUND} ) {
            confess "Array OOB: $idx > $self->{BOUND}";
        }
        return $self->{ARRAY}[$idx] = $value;
    }

DESTROY this

This method will be triggered when the tied variable needs to be destructed. As with the scalar tie

class, this is almost never needed in a language that does its own garbage collection, so this time we‘ll

just leave it out.

The code we presented at the top of the tied array class accesses many elements of the array, far more than

we've set the bounds to. Therefore, it will blow up once it tries to access anything beyond index 2 of @ary,

as the following output demonstrates:

setting index 0: value of elt 0 now 0

setting index 1: value of elt 1 now 10

setting index 2: value of elt 2 now 20

setting index 3: Array OOB: 3 > 2 at Bounded_Array.pm line 39

Bounded_Array::FETCH called at testba line 12

Tying Hashes

As the first Perl data type to be tied (see dbmopen()), hashes have the most complete and useful tie()

implementation. A class implementing a tied hash should define the following methods: TIEHASH is the

constructor. FETCH and STORE access the key and value pairs. EXISTS reports whether a key is present in

the hash, and DELETE deletes one. CLEAR empties the hash by deleting all the key and value pairs.

FIRSTKEY and NEXTKEY implement the keys() and each() functions to iterate over all the keys. And

DESTROY is called when the tied variable is garbage collected.

If this seems like a lot, then feel free to inherit merely from the standard Tie::Hash module for most of your

methods, redefining only the interesting ones. See Tie::Hash for details.
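
A rough sketch of that approach follows, using Tie::StdHash, the plain hash implementation defined in the same file as Tie::Hash. The Lowercase_Keys class is hypothetical; it redefines only the methods that take a key.

```perl
use strict;
require Tie::Hash;                        # also defines Tie::StdHash

{
    package Lowercase_Keys;               # hypothetical class
    @Lowercase_Keys::ISA = ('Tie::StdHash');

    # Fold keys to lower case; everything else is inherited.
    sub STORE  { my ($self, $key, $value) = @_; $self->{ lc $key } = $value }
    sub FETCH  { my ($self, $key) = @_; return $self->{ lc $key } }
    sub EXISTS { my ($self, $key) = @_; return exists $self->{ lc $key } }
    sub DELETE { my ($self, $key) = @_; return delete $self->{ lc $key } }
}

tie my %config, 'Lowercase_Keys';
$config{ManPath} = '/usr/man';            # stored under "manpath"
my $value = $config{MANPATH};             # found despite the different case
my $found = exists $config{manPATH} ? 1 : 0;
my @keys  = keys %config;                 # inherited FIRSTKEY and NEXTKEY
print "value: $value, keys: @keys\n";
```

TIEHASH, CLEAR, FIRSTKEY, and NEXTKEY all come from the base class, so the subclass stays at four one-line methods.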

Remember that Perl distinguishes between a key not existing in the hash, and the key existing in the hash but

having a corresponding value of undef. The two possibilities can be tested with the exists() and

defined() functions.
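
The distinction is easy to see on an ordinary hash. In this sketch the %dot hash and its contents are made up for illustration.

```perl
use strict;

my %dot = ( profile => "PATH=/bin\n", login => undef );
my %state;

for my $name (qw(profile login cshrc)) {
    $state{$name} = !exists($dot{$name})  ? "no such key"
                  : !defined($dot{$name}) ? "key exists, value is undef"
                  :                         "key exists, value is defined";
    print "$name: $state{$name}\n";
}
```

A tied hash class has to preserve the same distinction in its EXISTS and FETCH methods if callers are to rely on it.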

Here‘s an example of a somewhat interesting tied hash class: it gives you a hash representing a particular

user‘s dot files. You index into the hash with the name of the file (minus the dot) and you get back that dot

file‘s contents. For example:

use DotFiles;

tie %dot, 'DotFiles';

if ( $dot{profile} =~ /MANPATH/ ||

$dot{login} =~ /MANPATH/ ||

$dot{cshrc} =~ /MANPATH/ )

{

print "you seem to set your MANPATH\n";

}

Or here‘s another sample of using our tied class:

tie %him, 'DotFiles', 'daemon';

foreach $f ( keys %him ) {

printf "daemon dot file %s is size %d\n",

$f, length $him{$f};

}

In our tied hash DotFiles example, we use a regular hash for the object containing several important fields, of

which only the {LIST} field will be what the user thinks of as the real hash.

USER

whose dot files this object represents

HOME

where those dot files live

CLOBBER

whether we should try to change or remove those dot files

LIST

the hash of dot file names and content mappings

Here's the start of DotFiles.pm:

package DotFiles;

use Carp;

sub whowasi { (caller(1))[3] . '()' }

my $DEBUG = 0;

sub debug { $DEBUG = @_ ? shift : 1 }

For our example, we want to be able to emit debugging info to help in tracing during development. We also

keep one convenience function around internally to help print out warnings; whowasi() returns the

function name that calls it.

Here are the methods for the DotFiles tied hash.

TIEHASH classname, LIST

This is the constructor for the class. That means it is expected to return a blessed reference through

which the new object (probably but not necessarily an anonymous hash) will be accessed.

Here's the constructor:

    sub TIEHASH {
        my $self = shift;
        my $user = shift || $>;
        my $dotdir = shift || '';
        croak "usage: @{[&whowasi]} [USER [DOTDIR]]" if @_;
        $user = getpwuid($user) if $user =~ /^\d+$/;
        my $dir = (getpwnam($user))[7]
                || croak "@{[&whowasi]}: no user $user";
        $dir .= "/$dotdir" if $dotdir;

        my $node = {
            USER    => $user,
            HOME    => $dir,
            LIST    => {},
            CLOBBER => 0,
        };

        opendir(DIR, $dir)
                || croak "@{[&whowasi]}: can't opendir $dir: $!";
        foreach $dot ( grep /^\./ && -f "$dir/$_", readdir(DIR)) {
            $dot =~ s/^\.//;
            $node->{LIST}{$dot} = undef;
        }
        closedir DIR;
        return bless $node, $self;
    }

It's probably worth mentioning that if you're going to filetest the return values out of a readdir, you'd better prepend the directory in question. Otherwise, because we didn't chdir() there, it would have been testing the wrong file.
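To make the readdir point concrete, here is a small sketch. The function name plain_files is hypothetical, and the lexical directory handle assumes Perl 5.6 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted names of the plain files in $dir.
sub plain_files {
    my $dir = shift;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!";
    # readdir() returns bare names, so prepend $dir before the -f test;
    # a bare -f $_ would look in the current directory instead.
    my @files = grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);
    my @sorted = sort @files;
    return @sorted;
}
```

Calling plain_files('/etc'), for instance, tests /etc/passwd rather than ./passwd, no matter what the current directory happens to be.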

FETCH this, key

This method will be triggered every time an element in the tied hash is accessed (read). It takes one argument beyond its self reference: the key whose value we're trying to fetch.

Here's the fetch for our DotFiles example.

    sub FETCH {
        carp &whowasi if $DEBUG;
        my $self = shift;
        my $dot = shift;
        my $dir = $self->{HOME};
        my $file = "$dir/.$dot";

        unless (exists $self->{LIST}->{$dot} || -f $file) {
            carp "@{[&whowasi]}: no $dot file" if $DEBUG;
            return undef;
        }

        if (defined $self->{LIST}->{$dot}) {
            return $self->{LIST}->{$dot};
        } else {
            return $self->{LIST}->{$dot} = `cat $dir/.$dot`;
        }
    }

It was easy to write by having it call the Unix cat(1) command, but it would probably be more portable (and somewhat more efficient) to open the file manually. Of course, because dot files are a Unixy concept, we're not that concerned.
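For the curious, opening the file manually might look something like the sketch below. The helper name slurp_file is hypothetical and is not part of DotFiles.pm as shown. It uses a lexical filehandle and three-argument open, which assume Perl 5.6 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole contents of $file as one string,
# or undef if the file cannot be opened.
sub slurp_file {
    my $file = shift;
    open(my $fh, '<', $file) or return undef;
    local $/;                 # no record separator: read everything at once
    my $contents = <$fh>;
    close($fh);
    return $contents;
}
```

With such a helper, the backtick expression in FETCH could be replaced by a call like slurp_file($file), which avoids starting a separate process for every uncached fetch.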

STORE this, key, value

This method will be triggered every time an element in the tied hash is set (written). It takes two arguments beyond its self reference: the index at which we're trying to store something, and the value we're trying to put there.

Here in our DotFiles example, we'll be careful not to let them try to overwrite the file unless they've called the clobber() method on the original object reference returned by tie().

    sub STORE {
        carp &whowasi if $DEBUG;
        my $self = shift;
        my $dot = shift;
        my $value = shift;
        my $file = $self->{HOME} . "/.$dot";
        my $user = $self->{USER};

        croak "@{[&whowasi]}: $file not clobberable"
            unless $self->{CLOBBER};

        open(F, "> $file") || croak "can't open $file: $!";
        print F $value;
        close(F);
    }

If they wanted to clobber something, they might say:

    $ob = tie %daemon_dots, 'DotFiles', 'daemon';
    $ob->clobber(1);
    $daemon_dots{signature} = "A true daemon\n";

Another way to lay hands on a reference to the underlying object is to use the tied() function, so they might alternately have set clobber using:

    tie %daemon_dots, 'DotFiles', 'daemon';
    tied(%daemon_dots)->clobber(1);

The clobber method is simply:

    sub clobber {
        my $self = shift;
        $self->{CLOBBER} = @_ ? shift : 1;
    }

DELETE this, key

This method is triggered when we remove an element from the hash, typically by using the delete() function. Again, we'll be careful to check whether they really want to clobber files.

    sub DELETE {
        carp &whowasi if $DEBUG;

        my $self = shift;
        my $dot = shift;
        my $file = $self->{HOME} . "/.$dot";
        croak "@{[&whowasi]}: won't remove file $file"
            unless $self->{CLOBBER};
        delete $self->{LIST}->{$dot};
        my $success = unlink($file);
        carp "@{[&whowasi]}: can't unlink $file: $!" unless $success;
        $success;
    }

The value returned by DELETE becomes the return value of the call to delete(). If you want to

emulate the normal behavior of delete(), you should return whatever FETCH would have returned

for this key. In this example, we have chosen instead to return a value which tells the caller whether

the file was successfully deleted.
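For comparison, a DELETE that emulates the built-in delete() could be sketched as follows. The class RememberDeleted is made up for this illustration and keeps its data in an ordinary in-memory hash rather than in files.

```perl
use strict;
use warnings;

package RememberDeleted;      # hypothetical class, for illustration only

sub TIEHASH { my $class = shift; return bless { DATA => {} }, $class }
sub STORE   { my ($self, $key, $value) = @_; $self->{DATA}{$key} = $value }
sub FETCH   { my ($self, $key) = @_; return $self->{DATA}{$key} }

# Return what FETCH would have returned, as the built-in delete() does.
sub DELETE {
    my ($self, $key) = @_;
    my $old = $self->FETCH($key);
    delete $self->{DATA}{$key};
    return $old;
}

package main;

tie my %h, 'RememberDeleted';
$h{color} = 'blue';
my $gone = delete $h{color};
print "deleted value: $gone\n";
```

Which convention to follow is a design choice: returning the old value keeps the tied hash a drop-in replacement for a real one, while a success flag (as in DotFiles) suits a hash whose deletions can fail.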

CLEAR this

This method is triggered when the whole hash is to be cleared, usually by assigning the empty list to it.

In our example, that would remove all the user's dot files! It's such a dangerous thing that they'll have to set CLOBBER to something higher than 1 to make it happen.

    sub CLEAR {
        carp &whowasi if $DEBUG;
        my $self = shift;
        croak "@{[&whowasi]}: won't remove all dot files for $self->{USER}"
            unless $self->{CLOBBER} > 1;
        my $dot;
        foreach $dot ( keys %{$self->{LIST}}) {
            $self->DELETE($dot);
        }
    }

EXISTS this, key

This method is triggered when the user uses the exists() function on a particular hash. In our example, we'll look at the {LIST} hash element for this:

    sub EXISTS {
        carp &whowasi if $DEBUG;
        my $self = shift;
        my $dot = shift;
        return exists $self->{LIST}->{$dot};
    }

FIRSTKEY this

This method will be triggered when the user is going to iterate through the hash, such as via a keys()

or each() call.

    sub FIRSTKEY {
        carp &whowasi if $DEBUG;
        my $self = shift;
        my $a = keys %{$self->{LIST}};      # reset each() iterator
        each %{$self->{LIST}}
    }

NEXTKEY this, lastkey

This method gets triggered during a keys() or each() iteration. It has a second argument which is the last key that had been accessed. This is useful if you're caring about ordering or calling the iterator from more than one sequence, or not really storing things in a hash anywhere.

For our example, we're using a real hash so we'll do just the simple thing, but we'll have to go through the LIST field indirectly.

    sub NEXTKEY {
        carp &whowasi if $DEBUG;
        my $self = shift;
        return each %{ $self->{LIST} }
    }
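To show where the lastkey argument earns its keep, here is a sketch of a tied hash that hands out its keys in sorted order without keeping any iterator state of its own. The class SortedKeys is hypothetical, and re-sorting on every step is chosen for brevity, not speed.

```perl
use strict;
use warnings;

package SortedKeys;           # hypothetical class, for illustration only

sub TIEHASH { my $class = shift; return bless { DATA => {} }, $class }
sub STORE   { my ($self, $k, $v) = @_; $self->{DATA}{$k} = $v }
sub FETCH   { my ($self, $k) = @_; return $self->{DATA}{$k} }

sub FIRSTKEY {
    my $self = shift;
    my @keys = sort keys %{ $self->{DATA} };
    return $keys[0];          # undef if the hash is empty
}

# Use $lastkey to find the next key in sorted order.
sub NEXTKEY {
    my ($self, $lastkey) = @_;
    foreach my $k (sort keys %{ $self->{DATA} }) {
        return $k if $k gt $lastkey;
    }
    return undef;             # no more keys: iteration ends
}

package main;

tie my %h, 'SortedKeys';
@h{qw(pear apple fig)} = (1, 2, 3);
my @order = keys %h;
print "@order\n";
```

Because NEXTKEY works only from the key it is given, two interleaved iterations over the same object cannot disturb each other.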

DESTROY this

This method is triggered when a tied hash is about to go out of scope. You don't really need it unless you're trying to add debugging or have auxiliary state to clean up. Here's a very simple function:

sub DESTROY {

carp &whowasi if $DEBUG;

}

Note that functions such as keys() and values() may return huge lists when used on large objects, like

DBM files. You may prefer to use the each() function to iterate over such. Example:

    # print out history file offsets
    use NDBM_File;
    tie(%HIST, 'NDBM_File', '/usr/lib/news/history', 1, 0);
    while (($key,$val) = each %HIST) {
        print $key, ' = ', unpack('L',$val), "\n";
    }
    untie(%HIST);

Tying FileHandles

This is partially implemented now.

A class implementing a tied filehandle should define the following methods: TIEHANDLE, at least one of

PRINT, PRINTF, WRITE, READLINE, GETC, READ, and possibly CLOSE and DESTROY.

It is especially useful when perl is embedded in some other program, where output to STDOUT and

STDERR may have to be redirected in some special way. See nvi and the Apache module for examples.

In our example we're going to create a shouting handle.

package Shout;

TIEHANDLE classname, LIST

This is the constructor for the class. That means it is expected to return a blessed reference of some

sort. The reference can be used to hold some internal information.

sub TIEHANDLE { print "<shout>\n"; my $i; bless \$i, shift }

WRITE this, LIST

This method will be called when the handle is written to via the syswrite function.

sub WRITE {

$r = shift;

my($buf,$len,$offset) = @_;

print "WRITE called, \$buf=$buf, \$len=$len, \$offset=$offset";

}

PRINT this, LIST

This method will be triggered every time the tied handle is printed to with the print() function.

Beyond its self reference it also expects the list that was passed to the print function.

sub PRINT { $r = shift; $$r++; print join($,,map(uc($_),@_)),$\ }

PRINTF this, LIST

This method will be triggered every time the tied handle is printed to with the printf() function.

Beyond its self reference it also expects the format and list that was passed to the printf function.

sub PRINTF {

shift;

my $fmt = shift;

print sprintf($fmt, @_)."\n";

}

READ this, LIST

This method will be called when the handle is read from via the read or sysread functions.

sub READ {

$r = shift;

my($buf,$len,$offset) = @_;

print "READ called, \$buf=$buf, \$len=$len, \$offset=$offset";

}

READLINE this

This method will be called when the handle is read from via <HANDLE>. The method should return undef when there is no more data.

sub READLINE { $r = shift; "PRINT called $$r times\n"; }

GETC this

This method will be called when the getc function is called.

sub GETC { print "Don't GETC, Get Perl"; return "a"; }

CLOSE this

This method will be called when the handle is closed via the close function.

sub CLOSE { print "CLOSE called.\n" }

DESTROY this

As with the other types of ties, this method will be called when the tied handle is about to be destroyed.

This is useful for debugging and possibly cleaning up.

sub DESTROY { print "</shout>\n" }

Here's how to use our little example:

    tie(*FOO,'Shout');
    print FOO "hello\n";
    $a = 4; $b = 6;
    print FOO $a, " plus ", $b, " equals ", $a + $b, "\n";
    print <FOO>;

The untie Gotcha

If you intend to make use of the object returned from either tie() or tied(), and if the tie's target class defines a destructor, there is a subtle gotcha you must guard against.

As setup, consider this (admittedly rather contrived) example of a tie; all it does is use a file to keep a log of

the values assigned to a scalar.

    package Remember;

    use strict;
    use IO::File;

    sub TIESCALAR {
        my $class = shift;
        my $filename = shift;
        my $handle = new IO::File "> $filename"
                         or die "Cannot open $filename: $!\n";

        print $handle "The Start\n";
        bless {FH => $handle, Value => 0}, $class;
    }

    sub FETCH {
        my $self = shift;
        return $self->{Value};
    }

    sub STORE {
        my $self = shift;
        my $value = shift;
        my $handle = $self->{FH};
        print $handle "$value\n";
        $self->{Value} = $value;
    }

    sub DESTROY {
        my $self = shift;
        my $handle = $self->{FH};
        print $handle "The End\n";
        close $handle;
    }

Here is an example that makes use of this tie:

    use strict;
    use Remember;

    my $fred;
    tie $fred, 'Remember', 'myfile.txt';
    $fred = 1;
    $fred = 4;
    $fred = 5;
    untie $fred;
    system "cat myfile.txt";

This is the output when it is executed:

    The Start
    1
    4
    5
    The End

So far so good. Those of you who have been paying attention will have spotted that the tied object hasn't been used so far. So let's add an extra method to the Remember class to allow comments to be included in the file, say, something like this:

    sub comment {
        my $self = shift;
        my $text = shift;
        my $handle = $self->{FH};
        print $handle $text, "\n";
    }

And here is the previous example modified to use the comment method (which requires the tied object):

    use strict;
    use Remember;

    my ($fred, $x);
    $x = tie $fred, 'Remember', 'myfile.txt';
    $fred = 1;
    $fred = 4;
    comment $x "changing...";
    $fred = 5;
    untie $fred;
    system "cat myfile.txt";

When this code is executed there is no output. Here's why:

When a variable is tied, it is associated with the object which is the return value of the TIESCALAR, TIEARRAY, or TIEHASH function. This object normally has only one reference, namely, the implicit reference from the tied variable. When untie() is called, that reference is destroyed. Then, as in the first example above, the object's destructor (DESTROY) is called, which is normal for objects that have no more valid references; and thus the file is closed.

In the second example, however, we have stored another reference to the tied object in $x. That means that when untie() gets called there will still be a valid reference to the object in existence, so the destructor is not called at that time, and thus the file is not closed. There is no output because the file buffers have not been flushed to disk.

Now that you know what the problem is, what can you do to avoid it? Well, the good old -w flag will spot any instances where you call untie() and there are still valid references to the tied object. If the second script above is run with the -w flag, Perl prints this warning message:

untie attempted while 1 inner references still exist

To get the script to work properly and silence the warning, make sure there are no valid references to the tied object before untie() is called:

undef $x;

untie $fred;
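The timing of the destructor can be watched directly with a sketch like the one below. The class Noisy and its @log array are made up for this illustration, and the lexical no warnings 'untie' assumes a Perl new enough to have the warnings pragma (5.6 or later).

```perl
use strict;
use warnings;

package Noisy;                # hypothetical class, for illustration only

our @log;                     # records the order of events

sub TIESCALAR { my $class = shift; my $v; return bless \$v, $class }
sub FETCH     { my $self = shift; return $$self }
sub STORE     { my ($self, $value) = @_; $$self = $value }
sub DESTROY   { push @log, 'DESTROY' }

package main;

my $var;
my $obj = tie $var, 'Noisy';  # $obj is a second reference to the tied object
$var = 1;

{
    no warnings 'untie';      # silence the "inner references" warning here
    untie $var;               # object survives: $obj still refers to it
}
push @Noisy::log, 'after untie';

undef $obj;                   # last reference gone: DESTROY runs now
push @Noisy::log, 'after undef';

print join(', ', @Noisy::log), "\n";
```

The log shows DESTROY landing between the two markers, that is, at the undef rather than at the untie. Swapping the order (undef $obj first, then untie) moves DESTROY to the untie.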

SEE ALSO

See DB_File or Config for some interesting tie() implementations.

BUGS

Tied arrays are incomplete. They are also distinctly lacking something for the $#ARRAY access (which is hard, as it's an lvalue), as well as the other obvious array functions, like push(), pop(), shift(), unshift(), and splice().

You cannot easily tie a multilevel data structure (such as a hash of hashes) to a dbm file. The first problem is

that all but GDBM and Berkeley DB have size limitations, but beyond that, you also have problems with

how references are to be represented on disk. One experimental module that does attempt to address this

need partially is the MLDBM module. Check your nearest CPAN site as described in perlmodlib for source

code to MLDBM.

AUTHOR

Tom Christiansen

TIEHANDLE by Sven Verdoolaege <skimo@dns.ufsia.ac.be> and Doug MacEachern <dougm@osf.org>

perlbot Perl Programmers Reference Guide perlbot

NAME

perlbot - Bag'o Object Tricks (the BOT)

DESCRIPTION

The following collection of tricks and hints is intended to whet curious appetites about such things as the use of instance variables and the mechanics of object and class relationships. The reader is encouraged to consult relevant textbooks for discussion of Object Oriented definitions and methodology. This is not intended as a tutorial for object-oriented programming or as a comprehensive guide to Perl's object oriented features, nor should it be construed as a style guide.

The Perl motto still holds: There's more than one way to do it.

OO SCALING TIPS

1 Do not attempt to verify the type of $self. That'll break if the class is inherited, when the type of $self is valid but its package isn't what you expect. See rule 5.

2 If an object-oriented (OO) or indirect-object (IO) syntax was used, then the object is probably the correct type and there's no need to become paranoid about it. Perl isn't a paranoid language anyway. If people subvert the OO or IO syntax then they probably know what they're doing and you should let them do it. See rule 1.

3 Use the two-argument form of bless(). Let a subclass use your constructor. See INHERITING A CONSTRUCTOR.

4 The subclass is allowed to know things about its immediate superclass, the superclass is allowed to know nothing about a subclass.

5 Don't be trigger happy with inheritance. A "using", "containing", or "delegation" relationship (some sort of aggregation, at least) is often more appropriate. See OBJECT RELATIONSHIPS, USING RELATIONSHIP WITH SDBM, and "DELEGATION".

6 The object is the namespace. Make package globals accessible via the object. This will remove the guess work about the symbol's home package. See CLASS CONTEXT AND THE OBJECT.

7 IO syntax is certainly less noisy, but it is also prone to ambiguities that can cause difficult-to-find bugs. Allow people to use the sure-thing OO syntax, even if you don't like it.

8 Do not use function-call syntax on a method. You're going to be bitten someday. Someone might move that method into a superclass and your code will be broken. On top of that you're feeding the paranoia in rule 2.

9 Don't assume you know the home package of a method. You're making it difficult for someone to override that method. See THINKING OF CODE REUSE.
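Rule 8 is easy to demonstrate. In the sketch below (the classes Animal and Dog are hypothetical), the method lives in the superclass, so a method call finds it through @ISA while a function call does not:

```perl
use strict;
use warnings;

package Animal;               # hypothetical classes, for illustration only
sub new   { my $class = shift; return bless {}, $class }
sub speak { return 'generic noise' }

package Dog;
our @ISA = ('Animal');
# Dog does not define speak(); it relies on inheritance.

package main;

my $dog = Dog->new;

# Method-call syntax searches @ISA and finds Animal::speak.
my $by_method = $dog->speak;

# Function-call syntax looks only in package Dog, so it dies at run time.
my $by_function = eval { Dog::speak($dog) };
my $error = $@;

print "method call:   $by_method\n";
print "function call: ",
      (defined $by_function ? "$by_function\n" : "failed: $error");
```

Had speak() started out in Dog and later been moved up into Animal, every Dog::speak($dog) call in existing code would have broken in just this way.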

INSTANCE VARIABLES

An anonymous array or anonymous hash can be used to hold instance variables. Named parameters are also

demonstrated.

    package Foo;

    sub new {
        my $type = shift;
        my %params = @_;
        my $self = {};
        $self->{'High'} = $params{'High'};
        $self->{'Low'}  = $params{'Low'};
        bless $self, $type;
    }

    package Bar;

    sub new {
        my $type = shift;
        my %params = @_;
        my $self = [];
        $self->[0] = $params{'Left'};
        $self->[1] = $params{'Right'};
        bless $self, $type;
    }

    package main;

    $a = Foo->new( 'High' => 42, 'Low' => 11 );
    print "High=$a->{'High'}\n";
    print "Low=$a->{'Low'}\n";

    $b = Bar->new( 'Left' => 78, 'Right' => 40 );
    print "Left=$b->[0]\n";
    print "Right=$b->[1]\n";

SCALAR INSTANCE VARIABLES

An anonymous scalar can be used when only one instance variable is needed.

    package Foo;

    sub new {
        my $type = shift;
        my $self;
        $self = shift;
        bless \$self, $type;
    }

    package main;

    $a = Foo->new( 42 );
    print "a=$$a\n";

INSTANCE VARIABLE INHERITANCE

This example demonstrates how one might inherit instance variables from a superclass for inclusion in the new class. This requires calling the superclass's constructor and adding one's own instance variables to the new object.

    package Bar;

    sub new {
        my $type = shift;
        my $self = {};
        $self->{'buz'} = 42;
        bless $self, $type;
    }

    package Foo;
    @ISA = qw( Bar );

    sub new {
        my $type = shift;
        my $self = Bar->new;
        $self->{'biz'} = 11;
        bless $self, $type;
    }

    package main;

    $a = Foo->new;
    print "buz = ", $a->{'buz'}, "\n";
    print "biz = ", $a->{'biz'}, "\n";

OBJECT RELATIONSHIPS

The following demonstrates how one might implement "containing" and "using" relationships between

objects.

    package Bar;

    sub new {
        my $type = shift;
        my $self = {};
        $self->{'buz'} = 42;
        bless $self, $type;
    }

    package Foo;

    sub new {
        my $type = shift;
        my $self = {};
        $self->{'Bar'} = Bar->new;
        $self->{'biz'} = 11;
        bless $self, $type;
    }

    package main;

    $a = Foo->new;
    print "buz = ", $a->{'Bar'}->{'buz'}, "\n";
    print "biz = ", $a->{'biz'}, "\n";

OVERRIDING SUPERCLASS METHODS

The following example demonstrates how to override a superclass method and then call the overridden method. The SUPER pseudo-class allows the programmer to call an overridden superclass method without actually knowing where that method is defined.

    package Buz;
    sub goo { print "here's the goo\n" }

    package Bar; @ISA = qw( Buz );
    sub google { print "google here\n" }

    package Baz;
    sub mumble { print "mumbling\n" }

    package Foo;
    @ISA = qw( Bar Baz );

    sub new {
        my $type = shift;
        bless [], $type;
    }
    sub grr { print "grumble\n" }
    sub goo {
        my $self = shift;
        $self->SUPER::goo();
    }
    sub mumble {
        my $self = shift;
        $self->SUPER::mumble();
    }
    sub google {
        my $self = shift;
        $self->SUPER::google();
    }

    package main;

    $foo = Foo->new;
    $foo->mumble;
    $foo->grr;
    $foo->goo;
    $foo->google;

USING RELATIONSHIP WITH SDBM

This example demonstrates an interface for the SDBM class. This creates a "using" relationship between the

SDBM class and the new class Mydbm.

    package Mydbm;

    require SDBM_File;
    require Tie::Hash;
    @ISA = qw( Tie::Hash );

    sub TIEHASH {
        my $type = shift;
        my $ref  = SDBM_File->new(@_);
        bless {'dbm' => $ref}, $type;
    }
    sub FETCH {
        my $self = shift;
        my $ref  = $self->{'dbm'};
        $ref->FETCH(@_);
    }
    sub STORE {
        my $self = shift;
        if (defined $_[0]){
            my $ref = $self->{'dbm'};
            $ref->STORE(@_);
        } else {
            die "Cannot STORE an undefined key in Mydbm\n";
        }
    }

    package main;
    use Fcntl qw( O_RDWR O_CREAT );

    tie %foo, "Mydbm", "Sdbm", O_RDWR|O_CREAT, 0640;
    $foo{'bar'} = 123;
    print "foo-bar = $foo{'bar'}\n";

    tie %bar, "Mydbm", "Sdbm2", O_RDWR|O_CREAT, 0640;
    $bar{'Cathy'} = 456;
    print "bar-Cathy = $bar{'Cathy'}\n";

THINKING OF CODE REUSE

One strength of Object-Oriented languages is the ease with which old code can use new code. The following examples will demonstrate first how one can hinder code reuse and then how one can promote code reuse.

This first example illustrates a class which uses a fully-qualified method call to access the "private" method BAZ(). The second example will show that it is impossible to override the BAZ() method.

    package FOO;

    sub new {
        my $type = shift;
        bless {}, $type;
    }
    sub bar {
        my $self = shift;
        $self->FOO::private::BAZ;
    }

    package FOO::private;

    sub BAZ {
        print "in BAZ\n";
    }

    package main;

    $a = FOO->new;
    $a->bar;

Now we try to override the BAZ() method. We would like FOO::bar() to call GOOP::BAZ(), but this

cannot happen because FOO::bar() explicitly calls FOO::private::BAZ().

    package FOO;

    sub new {
        my $type = shift;
        bless {}, $type;
    }
    sub bar {
        my $self = shift;
        $self->FOO::private::BAZ;
    }

    package FOO::private;

    sub BAZ {
        print "in BAZ\n";
    }

    package GOOP;
    @ISA = qw( FOO );
    sub new {
        my $type = shift;
        bless {}, $type;
    }

    sub BAZ {
        print "in GOOP::BAZ\n";
    }

    package main;

    $a = GOOP->new;
    $a->bar;

To create reusable code we must modify class FOO, flattening class FOO::private. The next example shows

a reusable class FOO which allows the method GOOP::BAZ() to be used in place of FOO::BAZ().

    package FOO;

    sub new {
        my $type = shift;
        bless {}, $type;
    }
    sub bar {
        my $self = shift;
        $self->BAZ;
    }

    sub BAZ {
        print "in BAZ\n";
    }

    package GOOP;
    @ISA = qw( FOO );

    sub new {
        my $type = shift;
        bless {}, $type;
    }
    sub BAZ {
        print "in GOOP::BAZ\n";
    }

    package main;

    $a = GOOP->new;
    $a->bar;

CLASS CONTEXT AND THE OBJECT

Use the object to solve package and class context problems. Everything a method needs should be available

via the object or should be passed as a parameter to the method.

A class will sometimes have static or global data to be used by the methods. A subclass may want to

override that data and replace it with new data. When this happens the superclass may not know how to find

the new copy of the data.

This problem can be solved by using the object to define the context of the method. Let the method look in

the object for a reference to the data. The alternative is to force the method to go hunting for the data ("Is it

in my class, or in a subclass? Which subclass?"), and this can be inconvenient and will lead to hackery. It is

better just to let the object tell the method where that data is located.

    package Bar;

    %fizzle = ( 'Password' => 'XYZZY' );

    sub new {
        my $type = shift;
        my $self = {};
        $self->{'fizzle'} = \%fizzle;
        bless $self, $type;
    }

    sub enter {
        my $self = shift;

        # Don't try to guess if we should use %Bar::fizzle
        # or %Foo::fizzle.  The object already knows which
        # we should use, so just ask it.
        my $fizzle = $self->{'fizzle'};

        print "The word is ", $fizzle->{'Password'}, "\n";
    }

    package Foo;
    @ISA = qw( Bar );

    %fizzle = ( 'Password' => 'Rumple' );

    sub new {
        my $type = shift;
        my $self = Bar->new;
        $self->{'fizzle'} = \%fizzle;
        bless $self, $type;
    }

    package main;

    $a = Bar->new;
    $b = Foo->new;
    $a->enter;
    $b->enter;

INHERITING A CONSTRUCTOR

An inheritable constructor should use the two-argument form of bless(), which allows blessing directly into a specified class. Notice in this example that the object will be a BAR, not a FOO, even though the constructor is in class FOO.

    package FOO;

    sub new {
        my $type = shift;
        my $self = {};
        bless $self, $type;
    }

    sub baz {
        print "in FOO::baz()\n";
    }

    package BAR;
    @ISA = qw(FOO);

    sub baz {
        print "in BAR::baz()\n";
    }

    package main;

    $a = BAR->new;
    $a->baz;

DELEGATION

Some classes, such as SDBM_File, cannot be effectively subclassed because they create foreign objects.

Such a class can be extended with some sort of aggregation technique such as the "using" relationship

mentioned earlier or by delegation.

The following example demonstrates delegation using an AUTOLOAD() function to perform message-forwarding. This will allow the Mydbm object to behave exactly like an SDBM_File object. The Mydbm class could now extend the behavior by adding custom FETCH() and STORE() methods, if this is desired.

    package Mydbm;

    require SDBM_File;
    require Tie::Hash;
    @ISA = qw(Tie::Hash);

    sub TIEHASH {
        my $type = shift;
        my $ref = SDBM_File->new(@_);
        bless {'delegate' => $ref}, $type;
    }

    sub AUTOLOAD {
        my $self = shift;

        # The Perl interpreter places the name of the
        # message in a variable called $AUTOLOAD.

        # DESTROY messages should never be propagated.
        return if $AUTOLOAD =~ /::DESTROY$/;

        # Remove the package name.
        $AUTOLOAD =~ s/^Mydbm:://;

        # Pass the message to the delegate.
        $self->{'delegate'}->$AUTOLOAD(@_);
    }

    package main;
    use Fcntl qw( O_RDWR O_CREAT );

    tie %foo, "Mydbm", "adbm", O_RDWR|O_CREAT, 0640;
    $foo{'bar'} = 123;
    print "foo-bar = $foo{'bar'}\n";

perldebug Perl Programmers Reference Guide perldebug

NAME

perldebug - Perl debugging

DESCRIPTION

First of all, have you tried using the -w switch?

The Perl Debugger

"As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs."

-- Maurice Wilkes, 1949

If you invoke Perl with the -d switch, your script runs under the Perl source debugger. This works like an interactive Perl environment, prompting for debugger commands that let you examine source code, set breakpoints, get stack backtraces, change the values of variables, etc. This is so convenient that you often fire up the debugger all by itself just to test out Perl constructs interactively to see what they do. For example:

    perl -d -e 42

In Perl, the debugger is not a separate program as it usually is in the typical compiled environment. Instead, the -d flag tells the compiler to insert source information into the parse trees it's about to hand off to the interpreter. That means your code must first compile correctly for the debugger to work on it. Then when the interpreter starts up, it preloads a Perl library file containing the debugger itself.

The program will halt right before the first run-time executable statement (but see below regarding compile-time statements) and ask you to enter a debugger command. Contrary to popular expectations, whenever the debugger halts and shows you a line of code, it always displays the line it's about to execute, rather than the one it has just executed.

Any command not recognized by the debugger is directly executed (eval'd) as Perl code in the current package. (The debugger uses the DB package for its own state information.)

Leading white space before a command makes the debugger treat the line as Perl code rather than as a debugger command, so be careful not to do that.

Debugger Commands

The debugger understands the following commands:

h [command] Prints out a help message.

If you supply another debugger command as an argument to the h command, it prints out

the description for just that command. The special argument of h h produces a more

compact help listing, designed to fit together on one screen.

If the output of the h command (or any command, for that matter) scrolls past your screen, precede the command with a leading pipe symbol so it's run through your pager, as in

    DB> |h

You may change the pager which is used via the O pager=... command.

p expr Same as print {$DB::OUT} expr in the current package. In particular, because this is just Perl's own print function, this means that nested data structures and objects are not dumped, unlike with the x command.

The DB::OUT filehandle is opened to /dev/tty, regardless of where STDOUT may be

redirected to.

x expr Evaluates its expression in list context and dumps out the result in a pretty-printed fashion. Nested data structures are printed out recursively, unlike the print function.

The details of printout are governed by multiple Options.

V [pkg [vars]] Display all (or some) variables in package (defaulting to the main package) using a data pretty-printer (hashes show their keys and values so you see what's what, control characters are made printable, etc.). Make sure you don't put the type specifier (like $) there, just the symbol names, like this:

    V DB filename line

Use ~pattern and !pattern for positive and negative regexps.

Nested data structures are printed out in a legible fashion, unlike the print function.

The details of printout are governed by multiple Options.

X [vars] Same as V currentpackage [vars].

T Produce a stack backtrace. See below for details on its output.

s [expr] Single step. Executes until it reaches the beginning of another statement, descending into subroutine calls. If an expression is supplied that includes function calls, it too will be single-stepped.

n [expr] Next. Executes over subroutine calls, until it reaches the beginning of the next statement.

If an expression is supplied that includes function calls, those functions will be executed

with stops before each statement.

<CR> Repeat last n or s command.

c [line|sub] Continue, optionally inserting a one-time-only breakpoint at the specified line or subroutine.

l List next window of lines.

l min+incr List incr+1 lines starting at min.

l min-max List lines min through max. l - is synonymous to -.

l line List a single line.

l subname List first window of lines from subroutine.

- List previous window of lines.

w [line] List window (a few lines) around the current line.

. Return debugger pointer to the last-executed line and print it out.

f filename Switch to viewing a different file or eval statement. If filename is not a full filename as

found in values of %INC, it is considered as a regexp.

/pattern/ Search forwards for pattern; final / is optional.

?pattern? Search backwards for pattern; final ? is optional.

L List all breakpoints and actions.

S [[!]pattern] List subroutine names [not] matching pattern.

t Toggle trace mode (see also AutoTrace Option).

t expr Trace through execution of expr. For example:

    $ perl -de 42
    Stack dump during die enabled outside of evals.

    Loading DB routines from perl5db.pl patch level 0.94
    Emacs support available.

    Enter h or `h h' for help.

    main::(-e:1):   0
      DB<1> sub foo { 14 }

      DB<2> sub bar { 3 }

      DB<3> t print foo() * bar()
    main::((eval 172):3):   print foo() * bar();
    main::foo((eval 168):2):
    main::bar((eval 170):2):

or, with the Option frame=2 set,

      DB<4> O f=2
                   frame = '2'
      DB<5> t print foo() * bar()
    3:      foo() * bar()
    entering main::foo
     2:     sub foo { 14 };
    exited main::foo
    entering main::bar
     2:     sub bar { 3 };
    exited main::bar

b [line] [condition]

Set a breakpoint. If line is omitted, sets a breakpoint on the line that is about to be executed. If a condition is specified, it's evaluated each time the statement is reached and a breakpoint is taken only if the condition is true. Breakpoints may be set only on lines that begin an executable statement. Conditions don't use if:

    b 237 $x > 30
    b 237 ++$count237 < 11
    b 33 /pattern/i

b subname [condition]

Set a breakpoint at the first line of the named subroutine.

b postpone subname [condition]

Set breakpoint at first line of subroutine after it is compiled.

b load filename

Set breakpoint at the first executed line of the file. Filename should be a full name as

found in values of %INC.

b compile subname

Sets breakpoint at the first statement executed after the subroutine is compiled.

d [line] Delete a breakpoint at the specified line. If line is omitted, deletes the breakpoint on the

line that is about to be executed.

D Delete all installed breakpoints.

a [line] command

Set an action to be done before the line is executed. The sequence of steps taken by the

debugger is

1. check for a breakpoint at this line

2. print the line if necessary (tracing)

3. do any actions associated with that line

4. prompt user if at a breakpoint or in single−step

5. evaluate line

For example, this will print out $foo every time line 53 is passed:

a 53 print "DB FOUND $foo\n"

A Delete all installed actions.

W [expr] Add a global watch−expression.

W Delete all watch−expressions.

O [opt[=val]] [opt"val"] [opt?]...

Set or query values of options. val defaults to 1. opt can be abbreviated. Several options

can be listed.

recallCommand, ShellBang

The characters used to recall command or spawn shell. By default, these

are both set to !.

pager Program to use for output of pager−piped commands (those beginning

with a | character.) By default, $ENV{PAGER} will be used.

tkRunning Run Tk while prompting (with ReadLine).

signalLevel, warnLevel, dieLevel

Level of verbosity. By default the debugger is in a sane verbose mode,

thus it will print backtraces on all the warnings and die−messages which

are going to be printed out, and will print a message when interesting

uncaught signals arrive.

To disable this behaviour, set these values to 0. If dieLevel is 2, then

the messages which will be caught by surrounding eval are also

printed.

AutoTrace Trace mode (similar to t command, but can be put into

PERLDB_OPTS).

LineInfo File or pipe to print line number info to. If it is a pipe (say,

|visual_perl_db), then a short, "emacs like" message is used.

inhibit_exit

If 0, allows stepping off the end of the script.

PrintRet affects printing of return value after r command.

ornaments affects screen appearance of the command line (see Term::ReadLine).

frame affects printing messages on entry and exit from subroutines. If frame

& 2 is false, messages are printed on entry only. (Printing on exit may

be useful if inter(di)spersed with other messages.)

If frame & 4, arguments to functions are printed as well as the context

and caller info. If frame & 8, overloaded stringify and tied

FETCH are enabled on the printed arguments. If frame & 16, the

return value from the subroutine is printed as well.

The length at which the argument list is truncated is governed by the

next option:

maxTraceLen length at which the argument list is truncated when frame option‘s bit 4

is set.

The following options affect what happens with V, X, and x commands:

arrayDepth, hashDepth

Print only first N elements (‘’ for all).

compactDump, veryCompact

Change style of array and hash dump. If compactDump, short array

may be printed on one line.

globPrint Whether to print contents of globs.

DumpDBFiles Dump arrays holding debugged files.

DumpPackages

Dump symbol tables of packages.

DumpReused Dump contents of "reused" addresses.

quote, HighBit, undefPrint

Change style of string dump. Default value of quote is auto, one can

enable either double−quotish dump, or single−quotish by setting it to "

or ’. By default, characters with high bit set are printed as is.

UsageOnly Very rudimentary per-package memory usage dump. Calculates total size of strings in variables in the package.

During startup options are initialized from $ENV{PERLDB_OPTS}. You can put

additional initialization options TTY, noTTY, ReadLine, and NonStop there.

Example rc file:

&parse_options("NonStop=1 LineInfo=db.out AutoTrace");

The script will run without human intervention, putting trace information into the file db.out. (If you interrupt it, you had better reset LineInfo to something "interactive"!)

TTY The TTY to use for debugging I/O.

noTTY If set, goes in NonStop mode, and would not connect to a TTY. If interrupted (or if control goes to the debugger via explicit setting of $DB::signal or $DB::single from the Perl script), connects to a TTY specified by the TTY option at startup, or to a TTY found at runtime using the Term::Rendezvous module of your choice.

This module should implement a method new which returns an object

with two methods: IN and OUT, returning two filehandles to use for

debugging input and output correspondingly. Method new may inspect

an argument which is a value of $ENV{PERLDB_NOTTY} at startup, or

is "/tmp/perldbtty$$" otherwise.

ReadLine If false, readline support in debugger is disabled, so you can debug

ReadLine applications.

NonStop If set, debugger goes into noninteractive mode until interrupted, or

programmatically by setting $DB::signal or $DB::single.

Here‘s an example of using the $ENV{PERLDB_OPTS} variable:

$ PERLDB_OPTS="N f=2" perl −d myprogram

will run the script myprogram without human intervention, printing out the call tree with

entry and exit points. Note that N f=2 is equivalent to NonStop=1 frame=2. Note

also that at the moment when this documentation was written all the options to the

debugger could be uniquely abbreviated by the first letter (with exception of Dump*

options).

Other examples may include

$ PERLDB_OPTS="N f A L=listing" perl −d myprogram

- runs script noninteractively, printing info on each entry into a subroutine and each executed line into the file listing. (If you interrupt it, you had better reset LineInfo to something "interactive"!)

$ env "PERLDB_OPTS=R=0 TTY=/dev/ttyc" perl −d myprogram

may be useful for debugging a program which uses Term::ReadLine itself. Do not forget to detach the shell from the TTY in the window which corresponds to /dev/ttyc, say, by issuing a command like

$ sleep 1000000

See "Debugger Internals" below for more details.

< [ command ] Set an action (Perl command) to happen before every debugger prompt. A multi−line

command may be entered by backslashing the newlines. If command is missing, resets

the list of actions.

<< command Add an action (Perl command) to happen before every debugger prompt. A multi−line

command may be entered by backslashing the newlines.

> command Set an action (Perl command) to happen after the prompt when you‘ve just given a

command to return to executing the script. A multi−line command may be entered by

backslashing the newlines. If command is missing, resets the list of actions.

>> command Adds an action (Perl command) to happen after the prompt when you‘ve just given a

command to return to executing the script. A multi−line command may be entered by

backslashing the newlines.

{ [ command ] Set an action (debugger command) to happen before every debugger prompt. A multi−line

command may be entered by backslashing the newlines. If command is missing, resets

the list of actions.

{{ command Add an action (debugger command) to happen before every debugger prompt. A

multi−line command may be entered by backslashing the newlines.

! number Redo a previous command (default previous command).

! −number Redo number‘th−to−last command.

! pattern Redo last command that started with pattern. See O recallCommand, too.

!! cmd Run cmd in a subprocess (reads from DB::IN, writes to DB::OUT) See O shellBang

too.

H -number Display the last number commands. Only commands longer than one character are listed. If number is omitted, lists them all.

q or ^D Quit. ("quit" doesn‘t work for this.) This is the only supported way to exit the debugger,

though typing exit twice may do it too.

Set the Option inhibit_exit to 0 if you want to be able to step off the end of the script.

You may also need to set $finished to 0 at some moment if you want to step through

global destruction.

R Restart the debugger by execing a new session. It tries to maintain your history across this,

but internal settings and command line options may be lost.

Currently the following settings are preserved: history, breakpoints, actions, debugger Options, and the following command line options: -w, -I, and -e.

|dbcmd Run debugger command, piping DB::OUT to current pager.

||dbcmd Same as |dbcmd but DB::OUT is temporarily selected as well. Often used with

commands that would otherwise produce long output, such as

|V main

= [alias value] Define a command alias, like

= quit q

or list current aliases.

command Execute command as a Perl statement. A missing semicolon will be supplied.

m expr The expression is evaluated, and the methods which may be applied to the result are listed.

m package The methods which may be applied to objects in the package are listed.

Debugger input/output

Prompt The debugger prompt is something like

DB<8>

or even

DB<<17>>

where that number is the command number, which you can use with the built-in csh-like history mechanism, e.g., !17 would repeat command number 17. The number of angle brackets indicates the depth of the debugger. You could get more than one set of brackets, for example, if you were already at a breakpoint and then printed out the result of a function call that itself also has a breakpoint, or if you step into an expression via the s/n/t expression command.

Multiline commands

If you want to enter a multi−line command, such as a subroutine definition with several

statements, or a format, you may escape the newline that would normally end the debugger

command with a backslash. Here‘s an example:

DB<1> for (1..4) { \

cont: print "ok\n"; \

cont: }

Note that this business of escaping a newline is specific to interactive commands typed into the

debugger.

Stack backtrace

Here‘s an example of what a stack backtrace via T command might look like:

$ = main::infested called from file ‘Ambulation.pm’ line 10

@ = Ambulation::legs(1, 2, 3, 4) called from file ‘camel_flea’ line 7

$ = main::pests(’bactrian’, 4) called from file ‘camel_flea’ line 4

The left−hand character up there tells whether the function was called in a scalar or list context

(we bet you can tell which is which). What that says is that you were in the function

main::infested when you ran the stack dump, and that it was called in a scalar context

from line 10 of the file Ambulation.pm, but without any arguments at all, meaning it was called

as &infested. The next stack frame shows that the function Ambulation::legs was

called in a list context from the camel_flea file with four arguments. The last stack frame shows

that main::pests was called in a scalar context, also from camel_flea, but from line 4.

Note that if you execute the T command from inside an active use statement, the backtrace will contain both a require frame and an eval frame.

Listing Listing given via different flavors of l command looks like this:

DB<<13>> l

101: @i{@i} = ();

102:b @isa{@i,$pack} = ()

103 if(exists $i{$prevpack} || exists $isa{$pack});

104 }

105

106 next

107==> if(exists $isa{$pack});

108

109:a if ($extra−− > 0) {

110: %isa = ($pack,1);

Note that the breakable lines are marked with :, lines with breakpoints are marked by b, with

actions by a, and the next executed line is marked by ==>.

Frame listing

When the frame option is set, the debugger prints entered (and optionally exited) subroutines in different styles.

What follows is the start of the listing of

env "PERLDB_OPTS=f=n N" perl −d −V

for different values of n:

n = 1:

entering main::BEGIN

entering Config::BEGIN

Package lib/Exporter.pm.

Package lib/Carp.pm.

Package lib/Config.pm.

entering Config::TIEHASH

entering Exporter::import

entering Exporter::export

entering Config::myconfig

entering Config::FETCH

n = 2:

entering main::BEGIN

entering Config::BEGIN

Package lib/Exporter.pm.

Package lib/Carp.pm.

exited Config::BEGIN

Package lib/Config.pm.

entering Config::TIEHASH

exited Config::TIEHASH

entering Exporter::import

entering Exporter::export

exited Exporter::export

exited Exporter::import

exited main::BEGIN

entering Config::myconfig

entering Config::FETCH

exited Config::FETCH

entering Config::FETCH

exited Config::FETCH

entering Config::FETCH

n = 4:

in $=main::BEGIN() from /dev/nul:0

in $=Config::BEGIN() from lib/Config.pm:2

Package lib/Exporter.pm.

Package lib/Carp.pm.

Package lib/Config.pm.

in $=Config::TIEHASH(’Config’) from lib/Config.pm:644

in $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/

in $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f

in @=Config::myconfig() from /dev/nul:0

in $=Config::FETCH(ref(Config), ’package’) from lib/Config.pm:574

in $=Config::FETCH(ref(Config), ’baserev’) from lib/Config.pm:574

in $=Config::FETCH(ref(Config), ’PATCHLEVEL’) from lib/Config.pm:574

in $=Config::FETCH(ref(Config), ’SUBVERSION’) from lib/Config.pm:574

in $=Config::FETCH(ref(Config), ’osname’) from lib/Config.pm:574

in $=Config::FETCH(ref(Config), ’osvers’) from lib/Config.pm:574

n = 6:

in $=main::BEGIN() from /dev/nul:0

in $=Config::BEGIN() from lib/Config.pm:2

Package lib/Exporter.pm.

Package lib/Carp.pm.

out $=Config::BEGIN() from lib/Config.pm:0

Package lib/Config.pm.

in $=Config::TIEHASH(’Config’) from lib/Config.pm:644

out $=Config::TIEHASH(’Config’) from lib/Config.pm:644

in $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/

in $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f

out $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f

out $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/

out $=main::BEGIN() from /dev/nul:0

in @=Config::myconfig() from /dev/nul:0

in $=Config::FETCH(ref(Config), ’package’) from lib/Config.pm:574

out $=Config::FETCH(ref(Config), ’package’) from lib/Config.pm:574

in $=Config::FETCH(ref(Config), ’baserev’) from lib/Config.pm:574

out $=Config::FETCH(ref(Config), ’baserev’) from lib/Config.pm:574

in $=Config::FETCH(ref(Config), ’PATCHLEVEL’) from lib/Config.pm:574

out $=Config::FETCH(ref(Config), ’PATCHLEVEL’) from lib/Config.pm:574

in $=Config::FETCH(ref(Config), ’SUBVERSION’) from lib/Config.pm:574

n = 14:

in $=main::BEGIN() from /dev/nul:0

in $=Config::BEGIN() from lib/Config.pm:2

Package lib/Exporter.pm.

Package lib/Carp.pm.

out $=Config::BEGIN() from lib/Config.pm:0

Package lib/Config.pm.

in $=Config::TIEHASH(’Config’) from lib/Config.pm:644

out $=Config::TIEHASH(’Config’) from lib/Config.pm:644

in $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/

in $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f

out $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f

out $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/

out $=main::BEGIN() from /dev/nul:0

in @=Config::myconfig() from /dev/nul:0

in $=Config::FETCH(’Config=HASH(0x1aa444)’, ’package’) from lib/Confi

out $=Config::FETCH(’Config=HASH(0x1aa444)’, ’package’) from lib/Confi

in $=Config::FETCH(’Config=HASH(0x1aa444)’, ’baserev’) from lib/Confi

out $=Config::FETCH(’Config=HASH(0x1aa444)’, ’baserev’) from lib/Confi

n = 30:

in $=CODE(0x15eca4)() from /dev/null:0

in $=CODE(0x182528)() from lib/Config.pm:2

Package lib/Exporter.pm.

out $=CODE(0x182528)() from lib/Config.pm:0

scalar context return from CODE(0x182528): undef

Package lib/Config.pm.

in $=Config::TIEHASH(’Config’) from lib/Config.pm:628

out $=Config::TIEHASH(’Config’) from lib/Config.pm:628

scalar context return from Config::TIEHASH: empty hash

in $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/

in $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f

out $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f

scalar context return from Exporter::export: ’’

out $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/

scalar context return from Exporter::import: ’’

In all cases, indentation of lines shows the call tree. If bit 2 of frame is set, a line is also printed on exit from a subroutine. If bit 4 is set, the arguments are printed as well as the caller info. If bit 8 is set, the arguments are printed even if they are tied or references. If bit 16 is set, the return value is printed as well.

When a package is compiled, a line like this

Package lib/Carp.pm.

is printed with proper indentation.
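
The bit tests described for the frame option can be collected into a small helper. What follows is only an illustrative sketch: the name describe_frame and the wording of its labels are assumptions made here and are not part of perl5db.pl. It also assumes that any non-zero value prints entry messages, as the listings above suggest.

```perl
use strict;

# Hypothetical helper: list what a given value of the frame option
# enables, following the bit meanings described in the text.
sub describe_frame {
    my ($frame) = @_;
    return () unless $frame;     # 0: no entry or exit messages at all
    my @features = ('messages on subroutine entry');
    push @features, 'messages on subroutine exit'         if $frame & 2;
    push @features, 'arguments, context and caller info'  if $frame & 4;
    push @features, 'stringify/FETCH on printed arguments' if $frame & 8;
    push @features, 'return values'                        if $frame & 16;
    return @features;
}

foreach my $n (1, 2, 6, 30) {
    print "frame=$n: ", join('; ', describe_frame($n)), "\n";
}
```

Values such as 6, 14 and 30 are simply sums of the individual bits, which is why each of them adds one more kind of output to the previous one.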

Debugging compile−time statements

If you have any compile−time executable statements (code within a BEGIN block or a use statement), these

will NOT be stopped by debugger, although requires will (and compile−time statements can be traced

with AutoTrace option set in PERLDB_OPTS). From your own Perl code, however, you can transfer

control back to the debugger using the following statement, which is harmless if the debugger is not running:

$DB::single = 1;

If you set $DB::single to the value 2, it‘s equivalent to having just typed the n command, whereas a

value of 1 means the s command. The $DB::trace variable should be set to 1 to simulate having typed

the t command.
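
As a minimal sketch of this technique, the following script hands control to the debugger only when a suspicious value shows up. The subroutine name double_it and the condition are made up for illustration; without the debugger the assignment has no visible effect.

```perl
use strict;

# Hypothetical subroutine: ask the debugger (if one is running) to stop
# before the next statement, but only for negative input.
sub double_it {
    my ($n) = @_;
    $DB::single = 1 if $n < 0;   # 1 acts like the s command, 2 like n
    return $n * 2;
}

print double_it($_), "\n" foreach 3, -4;
```

Run under perl -d and continued with c, a script like this should stop inside double_it on the second call only.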

Another way to debug compile−time code is to start debugger, set a breakpoint on load of some module

thusly

DB<7> b load f:/perllib/lib/Carp.pm

Will stop on load of ‘f:/perllib/lib/Carp.pm’.

and restart debugger by R command (if possible). One can use b compile subname for the same

purpose.

Debugger Customization

Most probably you do not want to modify the debugger; it contains enough hooks to satisfy most needs. You may change the behaviour of the debugger from the debugger itself, using Options, from the command line via the PERLDB_OPTS environment variable, and from customization files.

You can do some customization by setting up a .perldb file which contains initialization code. For instance,

you could make aliases like these (the last one is one people expect to be there):

$DB::alias{'len'} = 's/^len(.*)/p length($1)/';

$DB::alias{'stop'} = 's/^stop (at|in)/b/';

$DB::alias{'ps'} = 's/^ps\b/p scalar /';

$DB::alias{'quit'} = 's/^quit(\s*)/exit/';

Options can be changed from the .perldb file via calls like this one:

parse_options("NonStop=1 LineInfo=db.out AutoTrace=1 frame=2");

(the code is executed in the package DB). Note that .perldb is processed before processing PERLDB_OPTS.

If .perldb defines the subroutine afterinit, it is called after all the debugger initialization ends. .perldb

may be contained in the current directory, or in the LOGDIR/HOME directory.
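
To see what such alias strings do to a typed command, here is a small stand-alone sketch. It assumes, as the s/// form of the strings suggests, that an alias is applied as a substitution to the command line whose first word matches the alias name. The helper expand_alias and the private %alias hash are inventions of this example, not debugger internals.

```perl
use strict;

# The same kind of alias strings as above; each is the text of an s///.
my %alias = (
    'len'  => 's/^len(.*)/p length($1)/',
    'stop' => 's/^stop (at|in)/b/',
    'ps'   => 's/^ps\b/p scalar /',
);

# Hypothetical helper: rewrite one command line the way an alias would.
sub expand_alias {
    my ($cmd) = @_;
    my ($word) = $cmd =~ /^(\w+)/;
    if (defined $word and exists $alias{$word}) {
        eval "\$cmd =~ $alias{$word}";   # apply the stored substitution
        die $@ if $@;
    }
    return $cmd;
}

print expand_alias($_), "\n" foreach 'len $name', 'stop in foo', 'n';
```

Commands whose first word is not an alias, such as n here, pass through unchanged.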

If you want to modify the debugger, copy perl5db.pl from the Perl library to another name and modify it as

necessary. You‘ll also want to set your PERL5DB environment variable to say something like this:

BEGIN { require "myperl5db.pl" }

As a last resort, one can use PERL5DB to customize the debugger by directly setting internal variables or calling debugger functions.

Readline Support

As shipped, the only command line history supplied is a simplistic one that checks for leading exclamation

points. However, if you install the Term::ReadKey and Term::ReadLine modules from CPAN, you will

have full editing capabilities much like GNU readline(3) provides. Look for these in the

modules/by−module/Term directory on CPAN.

A rudimentary command line completion is also available. Unfortunately, the names of lexical variables are

not available for completion.

Editor Support for Debugging

If you have GNU emacs installed on your system, it can interact with the Perl debugger to provide an

integrated software development environment reminiscent of its interactions with C debuggers.

Perl is also delivered with a start file for making emacs act like a syntax−directed editor that understands

(some of) Perl‘s syntax. Look in the emacs directory of the Perl source distribution.

(Historically, a similar setup for interacting with vi and the X11 window system had also been available, but at the time of this writing, no debugger support for vi exists.)

The Perl Profiler

If you wish to supply an alternative debugger for Perl to run, just invoke your script with a colon and a

package argument given to the −d flag. One of the most popular alternative debuggers for Perl is DProf, the

Perl profiler. As of this writing, DProf is not included with the standard Perl distribution, but it is expected

to be included soon, for certain values of "soon".

Meanwhile, you can fetch the Devel::DProf module from CPAN. Assuming it's properly installed on your system, to profile your Perl program in the file mycode.pl, just type:

perl −d:DProf mycode.pl

When the script terminates the profiler will dump the profile information to a file called tmon.out. A tool

like dprofpp (also supplied with the Devel::DProf package) can be used to interpret the information which is

in that profile.

Debugger support in perl

When you call the caller function (see caller) from the package DB, Perl sets the array @DB::args to contain

the arguments the corresponding stack frame was called with.
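
A minimal sketch of this behaviour follows. The helper name args_of_caller is an assumption of this example; the only feature relied upon is that caller, when called with an argument from code compiled in package DB, fills @DB::args.

```perl
use strict;

# Hypothetical helper: return the arguments of the subroutine that
# called us.
sub args_of_caller {
    package DB;              # caller() fills @DB::args only from package DB
    my @frame = caller(1);   # frame 1 is the subroutine that called us
    return @frame ? @DB::args : ();
}

sub greet {
    my @seen = args_of_caller();
    return "greet was called with: @seen";
}

print greet('hello', 'world'), "\n";
```

Because @DB::args reflects the current state of @_ in that frame, a subroutine that has already shifted its arguments will not show the original list.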

If perl is run with the -d option, the following additional features are enabled (cf. $^P in perlvar):

Perl inserts the contents of $ENV{PERL5DB} (or BEGIN {require 'perl5db.pl'} if not present) before the first line of the application.

The array @{"_<$filename"} is the line−by−line contents of $filename for all the compiled

files. Same for evaled strings which contain subroutines, or which are currently executed. The

$filename for evaled strings looks like (eval 34).

The hash %{"_<$filename"} contains breakpoints and actions (it is keyed by line number), and individual entries are settable (as opposed to the whole hash). Only true/false is important to Perl, though the values used by perl5db.pl have the form "$break_condition\0$action". Values are magical in numeric context: they are zeros if the line is not breakable.

Same for evaluated strings which contain subroutines, or which are currently executed. The

$filename for evaled strings looks like (eval 34).

The scalar ${"_<$filename"} contains "_<$filename". Same for evaluated strings which

contain subroutines, or which are currently executed. The $filename for evaled strings looks like

(eval 34).

After each required file is compiled, but before it is executed,

DB::postponed(*{"_<$filename"}) is called (if subroutine DB::postponed exists).

Here the $filename is the expanded name of the required file (as found in values of %INC).

After each subroutine subname is compiled existence of $DB::postponed{subname} is

checked. If this key exists, DB::postponed(subname) is called (if subroutine

DB::postponed exists).

A hash %DB::sub is maintained, with keys being subroutine names, values having the form

filename:startline−endline. filename has the form (eval 31) for subroutines

defined inside evals.

When execution of the application reaches a place that can have a breakpoint, a call to DB::DB() is

performed if any one of variables $DB::trace, $DB::single, or $DB::signal is true. (Note

that these variables are not localizable.) This feature is disabled when the control is inside

DB::DB() or functions called from it (unless $^D & (1<<30)).

When execution of the application reaches a subroutine call, a call to &DB::sub(args) is performed instead, with $DB::sub being the name of the called subroutine. (Unless the subroutine is compiled in the package DB.)
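
The values kept in %DB::sub, described above as filename:startline-endline, can be taken apart with a single pattern. This is a sketch only: parse_sub_location is a hypothetical helper, and the sample values are made up to match the documented form.

```perl
use strict;

# Hypothetical helper: split a value of the form
# "filename:startline-endline" into its three parts.
sub parse_sub_location {
    my ($value) = @_;
    # The filename may itself contain colons (as in f:/perllib/lib/Carp.pm),
    # so the line numbers are anchored at the end of the string.
    my ($file, $start, $end) = $value =~ /^(.*):(\d+)-(\d+)$/
        or return;
    return { file => $file, start => $start, end => $end };
}

foreach my $value ('lib/Config.pm:570-576', '(eval 31):2-5',
                   'f:/perllib/lib/Carp.pm:10-20') {
    my $loc = parse_sub_location($value) or next;
    print "$loc->{file} lines $loc->{start} to $loc->{end}\n";
}
```

Under the debugger the same helper could be applied to $DB::sub{'main::foo'} for a subroutine foo of your own.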

Note that if &DB::sub needs some external data to be set up for it to work, no subroutine call is possible until this is done. For the standard debugger, $DB::deep (how many levels of recursion deep into the debugger you can go before a mandatory break) gives an example of such a dependency.

The minimal working debugger consists of one line

sub DB::DB {}

which is quite handy as contents of PERL5DB environment variable:

env "PERL5DB=sub DB::DB {}" perl −d your−script

Another (a little bit more useful) minimal debugger can be created with the only line being

sub DB::DB {print ++$i; scalar <STDIN>}

This debugger would print the sequential number of encountered statement, and would wait for your CR to

continue.

The following debugger is quite functional:

{

package DB;

sub DB {}

sub sub {print ++$i, " $sub\n"; &$sub}

}

It prints the sequential number of subroutine call and the name of the called subroutine. Note that

&DB::sub should be compiled into the package DB.
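
A slightly extended variant, shown here only as a sketch, indents each name by call depth and counts calls per subroutine. The variables $DB::depth and %DB::calls are this example's own bookkeeping and are not used by the standard debugger; $DB::sub is the variable perl sets as described above.

```perl
{
    package DB;

    sub DB {}    # per-statement hook; nothing to do in this sketch

    # Plain package variables, in the style of the example above:
    #   $sub    ($DB::sub) is set by perl to the called subroutine
    #   $depth, %calls     are private to this sketch
    sub sub {
        $calls{$sub}++;                      # count calls per subroutine
        local $depth = ($depth || 0) + 1;    # nesting level, restored on return
        print STDERR '  ' x ($depth - 1), "-> $sub\n";
        &$sub;                               # call the real sub with the same @_
    }
}
```

To try it, the block can be saved in a file of your choosing that ends with a true value, and PERL5DB pointed at that file in the way shown under "Debugger Customization".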

Debugger Internals

At the start, the debugger reads your rc file (./.perldb or ~/.perldb under Unix), which can set important

options. This file may define a subroutine &afterinit to be executed after the debugger is initialized.

After the rc file is read, the debugger reads the environment variable PERLDB_OPTS and parses it as the rest of an O ... line typed at the debugger prompt.
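
The way abbreviated option words such as N f=2 expand can be sketched as follows. This is not the code of parse_options: the helper expand_options, the short list of names, and the case-insensitive prefix match are assumptions of this example, and the opt"val" and opt? forms are not handled.

```perl
use strict;

# A few option names from the documented list; the debugger knows more.
my @known = qw(NonStop frame AutoTrace LineInfo ReadLine TTY);

# Hypothetical helper: expand abbreviated "opt" and "opt=val" words.
sub expand_options {
    my ($string) = @_;
    my %set;
    foreach my $word (split ' ', $string) {
        my ($abbr, $val) = $word =~ /^(\w+)(?:=(.*))?$/
            or die "cannot parse '$word'\n";
        my @match = grep { /^\Q$abbr\E/i } @known;
        die "unknown or ambiguous option '$abbr'\n" unless @match == 1;
        $set{ $match[0] } = defined $val ? $val : 1;   # val defaults to 1
    }
    return \%set;
}

my $opts = expand_options('N f=2');
print "$_=$opts->{$_}\n" foreach sort keys %$opts;
```

With the names listed here, N f A L=listing and R=0 TTY=/dev/ttyc expand in the same way.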

It also maintains magical internal variables, such as @DB::dbline and %DB::dbline, which are aliases for @{"::_<current_file"} and %{"::_<current_file"}. Here current_file is the currently selected (with the debugger's f command, or by flow of execution) file.

Some functions are provided to simplify customization. See "Debugger Customization" for description of

DB::parse_options(string). The function DB::dump_trace(skip[, count]) skips the

specified number of frames, and returns a list containing info about the caller frames (all if count is

missing). Each entry is a hash with keys context ($ or @), sub (subroutine name, or info about eval),

args (undef or a reference to an array), file, and line.

The function DB::print_trace(FH, skip[, count[, short]]) prints formatted info about

caller frames. The last two functions may be convenient as arguments to <, << commands.
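
The shape of the list described for DB::dump_trace can be imitated outside the debugger with caller. The function simple_trace below is a stand-in written for this example, not the perl5db.pl implementation; in particular the '.' used for void context is an assumption, and frame 0 is the call to simple_trace itself unless skipped.

```perl
use strict;

# Hypothetical stand-in for a dump_trace-like function built on caller().
sub simple_trace {
    package DB;                  # so that caller() fills @DB::args
    my ($skip) = @_;
    my @frames;
    my $i = $skip || 0;
    while (my ($pkg, $file, $line, $sub, $hasargs, $wantarray) = caller($i++)) {
        push @frames, {
            'context' => defined $wantarray ? ($wantarray ? '@' : '$') : '.',
            'sub'     => $sub,
            'args'    => $hasargs ? [@DB::args] : undef,
            'file'    => $file,
            'line'    => $line,
        };
    }
    return @frames;
}

sub legs  { return simple_trace(1) }          # skip simple_trace's own frame
sub pests { my @f = legs(1, 2, 3, 4); return @f }

foreach my $f (pests('bactrian', 4)) {
    my $args = $f->{'args'} ? join(', ', @{ $f->{'args'} }) : '';
    print "$f->{'context'} = $f->{'sub'}($args) ",
          "called from file '$f->{'file'}' line $f->{'line'}\n";
}
```

The printed lines resemble the T command's backtrace, one line per frame, innermost first.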

Other resources

You did try the −w switch, didn‘t you?

BUGS

You cannot get the stack frame information or otherwise debug functions that were not compiled by Perl,

such as C or C++ extensions.

If you alter your @_ arguments in a subroutine (such as with shift or pop), the stack backtrace will not show the original values.

Debugging Perl memory usage

Perl is very frivolous with memory. There is a saying that to estimate memory usage of Perl, assume a reasonable algorithm of allocation, and multiply your estimates by 10. This is not absolutely true, but may give you a good grasp of what happens.

Say, an integer cannot take less than 20 bytes of memory, a float cannot take less than 24 bytes, a string cannot take less than 32 bytes (all these examples assume 32-bit architectures; the results are much worse on 64-bit architectures). If a variable is accessed in two of three different ways (which require an integer, a float, or a string), the memory footprint may increase by another 20 bytes. A sloppy malloc() implementation will make these numbers even larger.

On the opposite end of the scale, a declaration like

sub foo;

may take (on some versions of perl) up to 500 bytes of memory.

Off-the-cuff anecdotal estimates of code bloat give a factor around 8. This means that the compiled form of reasonable (commented, indented, etc.) code will take approximately 8 times more memory than the disk space the code takes.

There are two Perl-specific ways to analyze the memory usage: $ENV{PERL_DEBUG_MSTATS} and the -DL switch. The first one is available only if perl is compiled with Perl's malloc(), the second one only if perl is compiled with -DDEBUGGING (as when giving the -Doptimize=-g option to Configure).

Using $ENV{PERL_DEBUG_MSTATS}

If your perl is using Perl's malloc(), and compiled with correct switches (this is the default), then it will print memory usage statistics after compiling your code (if $ENV{PERL_DEBUG_MSTATS} > 1), and before termination of the script (if $ENV{PERL_DEBUG_MSTATS} >= 1). The report format is similar to the one in the following example:

env PERL_DEBUG_MSTATS=2 perl −e "require Carp"

Memory allocation statistics after compilation: (buckets 4(4)..8188(8192)

14216 free: 130 117 28 7 9 0 2 2 1 0 0

437 61 36 0 5

60924 used: 125 137 161 55 7 8 6 16 2 0 1

74 109 304 84 20

Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048.

Memory allocation statistics after execution: (buckets 4(4)..8188(8192)

30888 free: 245 78 85 13 6 2 1 3 2 0 1

315 162 39 42 11

175816 used: 265 176 1112 111 26 22 11 27 2 1 1

196 178 1066 798 39

Total sbrk(): 215040/47:145. Odd ends: pad+heads+chain+tail: 0+2192+0+6144.

It is possible to ask for such statistics at an arbitrary moment by using Devel::Peek::mstats() (module Devel::Peek is available on CPAN).

Here is the explanation of different parts of the format:

buckets SMALLEST(APPROX)..GREATEST(APPROX)

Perl's malloc() uses bucketed allocations. Every request is rounded up to the closest bucket size available, and a bucket of that size is taken from the pool of buckets of that size.

The above line describes limits of buckets currently in use. Each bucket has two sizes: memory footprint, and the maximal size of user data which may be put into this bucket. Say, in the above example the smallest bucket has both sizes equal to 4. The biggest bucket has usable size 8188, and memory footprint 8192.

With debugging Perl some buckets may have negative usable size. This means that these buckets

cannot (and will not) be used. For greater buckets the memory footprint may be one page greater than

a power of 2. In such a case the corresponding power of two is printed instead in the APPROX field

above.

Free/Used

The following 1 or 2 rows of numbers correspond to the number of buckets of each size between

SMALLEST and GREATEST. In the first row the sizes (memory footprints) of buckets are powers of

two (or possibly one page greater). In the second row (if present) the memory footprints of the buckets

are between memory footprints of two buckets "above".

Say, with the above example the memory footprints are (with the current algorithm)

free: 8 16 32 64 128 256 512 1024 2048 4096 8192

4 12 24 48 80

With non−DEBUGGING perl the buckets starting from 128−long ones have 4−byte overhead, thus

8192−long bucket may take up to 8188−byte−long allocations.

Total sbrk(): SBRKed/SBRKs:CONTINUOUS

The first two fields give the total amount of memory perl sbrk()ed, and number of sbrk()s used.

The third number is what perl thinks about continuity of returned chunks. As long as this number is positive, malloc() will assume that it is probable that sbrk() will provide continuous memory.

The amount sbrk()ed by external libraries is not counted.

pad: 0

The amount of sbrk()ed memory needed to keep buckets aligned.

heads: 2192

While memory overhead of bigger buckets is kept inside the bucket, for smaller buckets it is kept in

separate areas. This field gives the total size of these areas.

chain: 0

malloc() may want to subdivide a bigger bucket into smaller buckets. If only a part of the

deceased−bucket is left non−subdivided, the rest is kept as an element of a linked list. This field gives

the total size of these chunks.

tail: 6144

To minimize the number of sbrk()s, malloc() asks for more memory. This field gives the size of the yet-unused part, which is sbrk()ed, but never touched.
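
In the sample report shown earlier, the free and used totals plus the four "odd ends" add up exactly to the amount sbrk()ed (14216 + 60924 + 0 + 636 + 0 + 2048 = 77824). The sketch below parses such a report and repeats that check. The names parse_mstats and accounted_bytes are hypothetical, and treating the sum as a general invariant is an observation from the sample output rather than something the documentation states.

```perl
use strict;

# Hypothetical helper: pull the summary numbers out of one report.
sub parse_mstats {
    my ($report) = @_;
    my %s;
    ($s{free}) = $report =~ /(\d+)\s+free:/;
    ($s{used}) = $report =~ /(\d+)\s+used:/;
    @s{qw(sbrked sbrks continuous)} =
        $report =~ m{Total sbrk\(\): (\d+)/(\d+):(\d+)};
    @s{qw(pad heads chain tail)} =
        $report =~ /pad\+heads\+chain\+tail: (\d+)\+(\d+)\+(\d+)\+(\d+)/;
    return \%s;
}

# Sum of everything the report accounts for.
sub accounted_bytes {
    my ($s) = @_;
    return $s->{free} + $s->{used}
         + $s->{pad} + $s->{heads} + $s->{chain} + $s->{tail};
}

my $report = <<'END';
14216 free: 130 117 28 7 9 0 2 2 1 0 0
437 61 36 0 5
60924 used: 125 137 161 55 7 8 6 16 2 0 1
74 109 304 84 20
Total sbrk(): 77824/21:119. Odd ends: pad+heads+chain+tail: 0+636+0+2048.
END

my $s = parse_mstats($report);
print "accounted for ", accounted_bytes($s), " of $s->{sbrked} bytes\n";
```

The same check holds for the "after execution" part of the sample report.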

Example of using −DL switch

Below we show how to analyse memory usage by

do 'lib/auto/POSIX/autosplit.ix';

The file in question contains a header and 146 lines similar to

sub getcwd ;

Note: the discussion below assumes a 32-bit architecture. In newer versions of perl the memory usage of

the constructs discussed here is much improved, but the story told below is a real-life one. It

is very terse, and assumes more than a cursory knowledge of Perl internals.

Here is the itemized list of Perl allocations performed during parsing of this file:

!!! "after" at test.pl line 3.
   Id  subtot   4   8  12  16  20  24  28  32  36  40  48  56  64  72  80 80+
 0 02   13752   .   .   .   . 294   .   .   .   .   .   .   .   .   .   .   4
 0 54    5545   .   .   8 124  16   .   .   .   1   1   .   .   .   .   .   3
 5 05      32   .   .   .   .   .   .   .   1   .   .   .   .   .   .   .   .
 6 02    7152   .   .   .   .   .   .   .   .   .   . 149   .   .   .   .   .
 7 02    3600   .   .   .   .   . 150   .   .   .   .   .   .   .   .   .   .
 7 03      64   .  -1   .   1   .   .   2   .   .   .   .   .   .   .   .   .
 7 04    7056   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   7
 7 17   38404   .   .   .   .   .   .   .   1   .   . 442 149   .   . 147   .
 9 03    2078  17 249  32   .   .   .   .   2   .   .   .   .   .   .   .   .

To see this list, insert two warn('!...') statements around the call:

warn('!');

do 'lib/auto/POSIX/autosplit.ix';

warn('!!! "after"');

and run it with the -DL option. The first warn() will print memory allocation info before the parsing of the

file, and will memorize the statistics at this point (we ignore what it prints). The second warn() will print

the increments relative to these memorized statistics. This is the printout shown above.

Different Ids on the left correspond to different subsystems of the perl interpreter; they are just the first argument

given to the perl memory allocation API New(). To find what 9 03 means, grep the perl source for 903.

You will see that it is util.c, function savepvn(). This function is used to store a copy of an existing chunk

of memory. Using a C debugger, one can see that it is called either directly from gv_init(), or via

sv_magic(), and gv_init() is called from gv_fetchpv(), which is called from newSUB().

Note: to reach this place in the debugger and skip all the calls to savepvn during the compilation of the main

script, set a C breakpoint in Perl_warn(), continue until this point is reached, then set a breakpoint in

Perl_savepvn(). Note that you may need to skip a handful of Perl_savepvn() calls which do not

correspond to mass production of CVs (there are more 903 allocations than 146 similar lines of

lib/auto/POSIX/autosplit.ix). Note also that Perl_ prefixes are added by macroization code in perl header

files to avoid conflicts with external libraries.

Anyway, we see that 903 ids correspond to the creation of globs, twice per glob: once for the glob name, and once for the glob

stringification magic.

Here are explanations for other Ids above:

717

is for creation of bigger XPV* structures. In the above case it creates 3 AV per subroutine, one for a

list of lexical variable names, one for a scratchpad (which contains lexical variables and targets),

and one for the array of scratchpads needed for recursion.

It also creates a GV and a CV per subroutine (all called from start_subparse()).

002 Creates a C array corresponding to the AV of scratchpads, and the scratchpad itself (the first fake entry of

this scratchpad is created even though the subroutine itself is not defined yet).

It also creates C arrays to keep data for the stash (this is one HV, but it grows, thus there are 4 big

allocations: the big chunks are not freed, but are kept as additional arenas for SV allocations).

054 creates a HEK for the name of the glob for the subroutine (this name is a key in a stash).

Big allocations with this Id correspond to allocations of new arenas to keep HE.

602 creates a GP for the glob for the subroutine.

702 creates the MAGIC for the glob for the subroutine.

704 creates arenas which keep SVs.

-DL details

If Perl is run with the -DL option, then warn()s which start with '!' behave specially. They print a list of

categories of memory allocations, and statistics of allocations of different sizes for these categories.

If the warn() string starts with

!!! print changed categories only, print the differences in counts of allocations;

!! print grown categories only; print the absolute values of counts, and totals;

! print nonempty categories, print the absolute values of counts and totals.

Limitations of -DL statistics

If an extension or an external library does not use Perl API to allocate memory, these allocations are not

counted.

Debugging regular expressions

There are two ways to enable debugging output for regular expressions.

If your perl is compiled with -DDEBUGGING, you may use the -Dr flag on the command line.

Otherwise, one can put use re 'debug' in the program, which has effects both at compile time and at run time (and is not

lexically scoped).
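A minimal sketch of the second approach, using the pattern and target string from the examples in this section. The debugging output is written to STDERR; the variable name $matched is only illustrative. The text above says the pragma is not lexically scoped in this release; later perls document it as lexically scoped, so the enclosing braces should be harmless either way.

```perl
use strict;
use warnings;

my $matched;
{
    # Turn on regex debugging output for the pattern compiled and
    # executed inside this block.
    use re 'debug';
    $matched = ("abcdefg__gh__" =~ /[bc]d(ef*g)+h[ij]k$/) ? 1 : 0;
}

# The target string has no "h[ij]k" after the "efg" group, so the match fails.
print "matched: $matched\n";
```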

Compile−time output

The debugging output for the compile time looks like this:

compiling RE ‘[bc]d(ef*g)+h[ij]k$’

size 43 first at 1

1: ANYOF(11)

11: EXACT <d>(13)

13: CURLYX {1,32767}(27)

15: OPEN1(17)

17: EXACT <e>(19)

19: STAR(22)

20: EXACT <f>(0)

22: EXACT <g>(24)

24: CLOSE1(26)

26: WHILEM(0)

27: NOTHING(28)

28: EXACT <h>(30)

30: ANYOF(40)

40: EXACT <k>(42)

42: EOL(43)

43: END(0)

anchored ‘de’ at 1 floating ‘gh’ at 3..2147483647 (checking floating)

stclass ‘ANYOF’ minlen 7

The first line shows the pre−compiled form of the regexp, and the second shows the size of the compiled

form (in arbitrary units, usually 4−byte words) and the label id of the first node which does a match.

The last line (split into two lines in the above) contains the optimizer info. In the example shown, the

optimizer found that the match should contain a substring de at the offset 1, and substring gh at some offset

between 3 and infinity. Moreover, when checking for these substrings (to abandon impossible matches

quickly) it will check for the substring gh before checking for the substring de. The optimizer may also use

the knowledge that the match starts (at the first id) with a character class, and the match cannot be shorter

than 7 chars.

The fields of interest which may appear in the last line are

anchored STRING at POS

floating STRING at POS1..POS2

see above;

matching floating/anchored

which substring to check first;

minlen

the minimal length of the match;

stclass TYPE

The type of the first matching node.

noscan

which advises to not scan for the found substrings;

isall

which says that the optimizer info is in fact all that the regular expression contains (thus one does not

need to enter the RE engine at all);

GPOS

if the pattern contains \G;

plus

if the pattern starts with a repeated char (as in x+y);

implicit

if the pattern starts with .*;

with eval

if the pattern contains eval-groups (see (?{ code }));

anchored(TYPE)

if the pattern may match only at a handful of places (with TYPE being BOL, MBOL, or GPOS, see the

table below).

If a substring is known to match at end-of-line only, it may be followed by $, as in floating ‘k’$.

The optimizer-specific info is used to avoid entering the (slow) RE engine on strings which will definitely not

match. If the isall flag is set, a call to the RE engine may be avoided even when the optimizer has found an

appropriate place for the match.

The rest of the output contains the list of nodes of the compiled form of the RE. Each line has the format

id: TYPE OPTIONAL−INFO (next−id)

Types of nodes

Here is the list of possible types with short descriptions:

# TYPE arg−description [num−args] [longjump−len] DESCRIPTION

# Exit points

END no End of program.

SUCCEED no Return from a subroutine, basically.

# Anchors:

BOL no Match "" at beginning of line.

MBOL no Same, assuming multiline.

SBOL no Same, assuming singleline.

EOS no Match "" at end of string.

EOL no Match "" at end of line.

MEOL no Same, assuming multiline.

SEOL no Same, assuming singleline.

BOUND no Match "" at any word boundary

BOUNDL no Match "" at any word boundary

NBOUND no Match "" at any word non−boundary

NBOUNDL no Match "" at any word non−boundary

GPOS no Matches where last m//g left off.

# [Special] alternatives

ANY no Match any one character (except newline).

SANY no Match any one character.

ANYOF sv Match character in (or not in) this class.

ALNUM no Match any alphanumeric character

ALNUML no Match any alphanumeric char in locale

NALNUM no Match any non−alphanumeric character

NALNUML no Match any non−alphanumeric char in locale

SPACE no Match any whitespace character

SPACEL no Match any whitespace char in locale

NSPACE no Match any non−whitespace character

NSPACEL no Match any non−whitespace char in locale

DIGIT no Match any numeric character

NDIGIT no Match any non−numeric character

# BRANCH The set of branches constituting a single choice are hooked

# together with their "next" pointers, since precedence prevents

# anything being concatenated to any individual branch. The

# "next" pointer of the last BRANCH in a choice points to the

# thing following the whole choice. This is also where the

# final "next" pointer of each individual branch points; each

# branch starts with the operand node of a BRANCH node.

BRANCH node Match this alternative, or the next...

# BACK Normal "next" pointers all implicitly point forward; BACK

# exists to make loop structures possible.

# not used

BACK no Match "", "next" ptr points backward.

# Literals

EXACT sv Match this string (preceded by length).

EXACTF sv Match this string, folded (prec. by length).

EXACTFL sv Match this string, folded in locale (w/len).

# Do nothing

NOTHING no Match empty string.

# A variant of above which delimits a group, thus stops optimizations

TAIL no Match empty string. Can jump here from outside.

# STAR,PLUS ’?’, and complex ’*’ and ’+’, are implemented as circular

# BRANCH structures using BACK. Simple cases (one character

# per match) are implemented with STAR and PLUS for speed

# and to minimize recursive plunges.

STAR node Match this (simple) thing 0 or more times.

PLUS node Match this (simple) thing 1 or more times.

CURLY sv 2 Match this simple thing {n,m} times.

CURLYN no 2 Match next-after-this simple thing {n,m} times, set parens.

CURLYM no 2 Match this medium−complex thing {n,m} times.

CURLYX sv 2 Match this complex thing {n,m} times.

# This terminator creates a loop structure for CURLYX

WHILEM no Do curly processing and see if rest matches.

# OPEN,CLOSE,GROUPP ...are numbered at compile time.

OPEN num 1 Mark this point in input as start of #n.

CLOSE num 1 Analogous to OPEN.

REF num 1 Match some already matched string

REFF num 1 Match already matched string, folded

REFFL num 1 Match already matched string, folded in loc.

# grouping assertions

IFMATCH off 1 2 Succeeds if the following matches.

UNLESSM off 1 2 Fails if the following matches.

SUSPEND off 1 1 "Independent" sub−RE.

IFTHEN off 1 1 Switch, should be preceded by switcher.

GROUPP num 1 Whether the group matched.

# Support for long RE

LONGJMP off 1 1 Jump far away.

BRANCHJ off 1 1 BRANCH with long offset.

# The heavy worker

EVAL evl 1 Execute some Perl code.

# Modifiers

MINMOD no Next operator is not greedy.

LOGICAL no Next opcode should set the flag only.

# This is not used yet

RENUM off 1 1 Group with independently numbered parens.

# This is not really a node, but an optimized away piece of a "long" node.

# To simplify debugging output, we mark it as if it were a node

OPTIMIZED off Placeholder for dump.

Run−time output

First of all, when doing a match, one may get no run-time output even if debugging is enabled. This means

that the RE engine was never entered, and that the whole job was done by the optimizer.

If the RE engine was entered, the output may look like this:

Matching ‘[bc]d(ef*g)+h[ij]k$’ against ‘abcdefg__gh__’

Setting an EVAL scope, savestack=3

2 <ab> <cdefg__gh_> | 1: ANYOF

3 <abc> <defg__gh_> | 11: EXACT <d>

4 <abcd> <efg__gh_> | 13: CURLYX {1,32767}

4 <abcd> <efg__gh_> | 26: WHILEM

0 out of 1..32767 cc=effff31c

4 <abcd> <efg__gh_> | 15: OPEN1

4 <abcd> <efg__gh_> | 17: EXACT <e>

5 <abcde> <fg__gh_> | 19: STAR

EXACT <f> can match 1 times out of 32767...

Setting an EVAL scope, savestack=3

6 <bcdef> <g__gh__> | 22: EXACT <g>

7 <bcdefg> <__gh__> | 24: CLOSE1

7 <bcdefg> <__gh__> | 26: WHILEM

1 out of 1..32767 cc=effff31c

Setting an EVAL scope, savestack=12

7 <bcdefg> <__gh__> | 15: OPEN1

7 <bcdefg> <__gh__> | 17: EXACT <e>

restoring \1 to 4(4)..7

failed, try continuation...

7 <bcdefg> <__gh__> | 27: NOTHING

7 <bcdefg> <__gh__> | 28: EXACT <h>

failed...

The most significant information in the output is about the particular node of the compiled RE which is

currently being tested against the target string. The format of these lines is

STRING-OFFSET <PRE-STRING> <POST-STRING> |ID: TYPE

The TYPE info is indented with respect to the backtracking level. Other incidental information appears

interspersed within.
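If you want to post-process such a trace, the line format described above can be taken apart with a pattern. This is only a sketch: parse_trace_line() is a hypothetical helper, and it assumes that neither string fragment contains a ">" character. Incidental lines (such as "failed...") simply do not match.

```perl
use strict;
use warnings;

# Split one run-time trace line into its fields. Returns a hash reference,
# or undef for lines that are not in the node-trace format.
sub parse_trace_line {
    my ($line) = @_;
    return undef
        unless $line =~ /^\s*(\d+)\s+<([^>]*)>\s+<([^>]*)>\s*\|\s*(\d+):\s*(.*?)\s*$/;
    return {
        offset => $1,   # STRING-OFFSET
        pre    => $2,   # PRE-STRING
        post   => $3,   # POST-STRING
        id     => $4,   # ID of the node being tried
        type   => $5,   # TYPE, plus any optional info
    };
}

my $fields = parse_trace_line('   4 <abcd> <efg__gh_> |  13: CURLYX {1,32767}');
print "node $fields->{id} ($fields->{type}) tried at offset $fields->{offset}\n";
```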

perldiag Perl Programmers Reference Guide perldiag

NAME

perldiag − various Perl diagnostics

DESCRIPTION

These messages are classified as follows (listed in increasing order of desperation):

(W) A warning (optional).

(D) A deprecation (optional).

(S) A severe warning (mandatory).

(F) A fatal error (trappable).

(P) An internal error you should never see (trappable).

(X) A very fatal error (nontrappable).

(A) An alien error message (not generated by Perl).

Optional warnings are enabled by using the -w switch. Warnings may be captured by setting

$SIG{__WARN__} to a reference to a routine that will be called on each warning instead of printing it. See

perlvar. Trappable errors may be trapped using the eval operator. See eval.
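The two trapping mechanisms just mentioned can be sketched as follows. The messages and variable names are made up for illustration; only $SIG{__WARN__}, warn, die, eval and $@ come from Perl itself.

```perl
use strict;
use warnings;

my @captured;
{
    # Route warnings to our own routine instead of printing them,
    # but only for the duration of this block.
    local $SIG{__WARN__} = sub { push @captured, $_[0] };
    warn "disk almost full\n";
}

# A trappable error: die inside eval sets $@ instead of ending the program.
my $ok    = eval { die "something fatal\n"; 1 };
my $error = $@;

print "captured warning: $captured[0]";
print "trapped error:    $error";
```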

Some of these messages are generic. Spots that vary are denoted with a %s, just as in a printf format. Note

that some messages start with a %s! The symbols "%(−?@ sort before the letters, while [ and \ sort after.

"my" variable %s can't be in a package

(F) Lexically scoped variables aren't in a package, so it doesn't make sense to try to declare one with a

package qualifier on the front. Use local() if you want to localize a package variable.

"my" variable %s masks earlier declaration in same scope

(W) A lexical variable has been redeclared in the same scope, effectively eliminating all access to the

previous instance. This is almost always a typographical error. Note that the earlier variable will still

exist until the end of the scope or until all closure referents to it are destroyed.

"no" not allowed in expression

(F) The "no" keyword is recognized and executed at compile time, and returns no useful value. See

perlmod.

"use" not allowed in expression

(F) The "use" keyword is recognized and executed at compile time, and returns no useful value. See

perlmod.

% may only be used in unpack

(F) You can't pack a string by supplying a checksum, because the checksumming process loses

information, and you can't go the other way. See unpack.

%s (...) interpreted as function

(W) You've run afoul of the rule that says that any list operator followed by parentheses turns into a

function, with all the list operator's arguments found inside the parentheses. See

Terms and List Operators (Leftward).

%s argument is not a HASH element

(F) The argument to exists() must be a hash element, such as

$foo{$bar}

$ref->[12]->{"susie"}

%s argument is not a HASH element or slice

(F) The argument to delete() must be either a hash element, such as

$foo{$bar}

$ref->[12]->{"susie"}

or a hash slice, such as

@foo{$bar, $baz, $xyzzy}

@{$ref->[12]}{"susie", "queue"}

%s did not return a true value

(F) A required (or used) file must return a true value to indicate that it compiled correctly and ran its

initialization code correctly. It's traditional to end such a file with a "1;", though any true value would

do. See require.
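A small sketch that provokes this message and then avoids it. The file names and the write_file() helper are invented for the example, and File::Temp is used only to get a scratch directory; the point is that the value of the last statement in the required file decides the outcome.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Hypothetical helper: write a small file that we can require.
sub write_file {
    my ($name, $body) = @_;
    my $path = "$dir/$name";
    open my $fh, '>', $path or die "cannot write $path: $!";
    print {$fh} $body;
    close $fh or die "cannot close $path: $!";
    return $path;
}

my $bad  = write_file('bad.pl',  "my \$ready = 0;\n");       # ends in a false value
my $good = write_file('good.pl', "my \$ready = 0;\n1;\n");   # ends with the usual 1;

my $bad_ok    = eval { require $bad; 1 };    # dies inside the eval
my $bad_error = $@;
my $good_ok   = eval { require $good; 1 };   # loads normally

print "bad.pl:  $bad_error";
print "good.pl: loaded\n" if $good_ok;
```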

%s found where operator expected

(S) The Perl lexer knows whether to expect a term or an operator. If it sees what it knows to be a term

when it was expecting to see an operator, it gives you this warning. Usually it indicates that an

operator or delimiter was omitted, such as a semicolon.

%s had compilation errors

(F) The final summary message when a perl -c fails.

%s has too many errors

(F) The parser has given up trying to parse the program after 10 errors. Further error messages would

likely be uninformative.

%s matches null string many times

(W) The pattern you've specified would be an infinite loop if the regular expression engine didn't

specifically check for that. See perlre.

%s never introduced

(S) The symbol in question was declared but somehow went out of scope before it could possibly have

been used.

%s syntax OK

(F) The final summary message when a perl -c succeeds.

%s: Command not found

(A) You've accidentally run your script through csh instead of Perl. Check the #! line, or manually

feed your script into Perl yourself.

%s: Expression syntax

(A) You've accidentally run your script through csh instead of Perl. Check the #! line, or manually

feed your script into Perl yourself.

%s: Undefined variable

(A) You've accidentally run your script through csh instead of Perl. Check the #! line, or manually

feed your script into Perl yourself.

%s: not found

(A) You've accidentally run your script through the Bourne shell instead of Perl. Check the #! line, or

manually feed your script into Perl yourself.

(Missing semicolon on previous line?)

(S) This is an educated guess made in conjunction with the message "%s found where operator

expected". Don't automatically put a semicolon on the previous line just because you saw this

message.

-P not allowed for setuid/setgid script

(F) The script would have to be opened by the C preprocessor by name, which provides a race

condition that breaks security.

-T and -B not implemented on filehandles

(F) Perl can't peek at the stdio buffer of filehandles when it doesn't know about your kind of stdio.

You'll have to use a filename instead.

-p destination: %s

(F) An error occurred during the implicit output invoked by the -p command-line switch. (This

output goes to STDOUT unless you've redirected it with select().)

500 Server error

See Server error.

?+* follows nothing in regexp

(F) You started a regular expression with a quantifier. Backslash it if you meant it literally. See

perlre.
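This error often shows up when text from elsewhere is interpolated into a pattern. The sketch below, with made-up variable names, traps the failure and then shows the literal match obtained by quoting the text with \Q...\E. The exact wording of the message may differ between perl versions, so the example does not depend on it.

```perl
use strict;
use warnings;

my $input = '+1';   # text that happens to start with a quantifier character

# Interpolated as a pattern, the leading "+" has nothing to quantify.
my $bad   = eval { qr/$input/ };
my $error = $@;

# \Q...\E backslashes the metacharacters so that they match literally.
my $safe  = qr/\Q$input\E/;
my $found = ('score +1 today' =~ $safe) ? 1 : 0;

print "unquoted pattern failed to compile\n" unless defined $bad;
print "quoted pattern found the text\n" if $found;
```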

@ outside of string

(F) You had a pack template that specified an absolute position outside the string being unpacked. See

pack.

accept() on closed fd

(W) You tried to do an accept on a closed socket. Did you forget to check the return value of your

socket() call? See accept.

Allocation too large: %lx

(X) You can't allocate more than 64K on an MS-DOS machine.

Applying %s to %s will act on scalar(%s)

(W) The pattern match (//), substitution (s///), and transliteration (tr///) operators work on scalar values.

If you apply one of them to an array or a hash, it will convert the array or hash to a scalar value — the

length of an array, or the population info of a hash — and then work on that scalar value. This is

probably not what you meant to do. See grep and map for alternatives.
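What was usually intended is to apply the operator to each element. A brief sketch of the grep and map alternatives, with illustrative data and variable names:

```perl
use strict;
use warnings;

my @names = qw(alpha beta gamma);

# Test each element against the pattern, rather than the array as a whole.
my @with_t = grep { /t/ } @names;     # the elements containing "t"
my $count  = grep { /^a/ } @names;    # how many elements start with "a"

# Transliterate a copy of each element, leaving @names untouched.
my @upper = map { (my $copy = $_) =~ tr/a-z/A-Z/; $copy } @names;

print "with t: @with_t\n";
print "starting with a: $count\n";
print "upper: @upper\n";
```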

Arg too short for msgsnd

(F) msgsnd() requires a string at least as long as sizeof(long).

Ambiguous use of %s resolved as %s

(W)(S) You said something that may not be interpreted the way you thought. Normally it's pretty easy

to disambiguate it by supplying a missing quote, operator, parenthesis pair or declaration.

Ambiguous call resolved as CORE::%s(), qualify as such or use &

(W) A subroutine you have declared has the same name as a Perl keyword, and you have used the

name without qualification for calling one or the other. Perl decided to call the builtin because the

subroutine is not imported.

To force interpretation as a subroutine call, either put an ampersand before the subroutine name, or

qualify the name with its package. Alternatively, you can import the subroutine (or pretend that it's

imported with the use subs pragma).

To silently interpret it as the Perl operator, use the CORE:: prefix on the operator (e.g.

CORE::log($x)) or declare the subroutine to be an object method (see attrs).
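The unambiguous spellings can be sketched like this. The user-defined log() below is hypothetical and merely records its arguments; each call is written so that perl never has to guess.

```perl
use strict;
use warnings;

my @messages;

# A user subroutine that shares its name with the builtin log().
sub log { push @messages, "@_"; return scalar @messages }

&log("starting up");            # ampersand: the user subroutine
main::log("still running");     # package-qualified: the user subroutine
my $natural = CORE::log(1);     # CORE:: prefix: the builtin natural logarithm

print "recorded: $_\n" for @messages;
print "CORE::log(1) = $natural\n";
```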

Args must match #! line

(F) The setuid emulator requires that the arguments Perl was invoked with match the arguments

specified on the #! line. Since some systems impose a one-argument limit on the #! line, try

combining switches; for example, turn -w -U into -wU.

Argument "%s" isn't numeric%s

(W) The indicated string was fed as an argument to an operator that expected a numeric value instead.

If you're fortunate the message will identify which operator was so unfortunate.

Array @%s missing the @ in argument %d of %s()

(D) Really old Perl let you omit the @ on array names in some spots. This is now heavily deprecated.

assertion botched: %s

(P) The malloc package that comes with Perl had an internal failure.

Assertion failed: file "%s"

(P) A general assertion failed. The file in question must be examined.

Assignment to both a list and a scalar

(F) If you assign to a conditional operator, the 2nd and 3rd arguments must either both be scalars or

both be lists. Otherwise Perl won't know which context to supply to the right side.

Attempt to free non-arena SV: 0x%lx

(P) All SV objects are supposed to be allocated from arenas that will be garbage collected on exit. An

SV was discovered to be outside any of those arenas.

Attempt to free nonexistent shared string

(P) Perl maintains a reference counted internal table of strings to optimize the storage and access of

hash keys and other strings. This indicates someone tried to decrement the reference count of a string

that can no longer be found in the table.

Attempt to free temp prematurely

(W) Mortalized values are supposed to be freed by the free_tmps() routine. This indicates that

something else is freeing the SV before the free_tmps() routine gets a chance, which means that

the free_tmps() routine will be freeing an unreferenced scalar when it does try to free it.

Attempt to free unreferenced glob pointers

(P) The reference counts got screwed up on symbol aliases.

Attempt to free unreferenced scalar

(W) Perl went to decrement the reference count of a scalar to see if it would go to 0, and discovered

that it had already gone to 0 earlier, and should have been freed, and in fact, probably was freed. This

could indicate that SvREFCNT_dec() was called too many times, or that SvREFCNT_inc() was

called too few times, or that the SV was mortalized when it shouldn't have been, or that memory has

been corrupted.

Attempt to pack pointer to temporary value

(W) You tried to pass a temporary value (like the result of a function, or a computed expression) to the

"p" pack() template. This means the result contains a pointer to a location that could become invalid

anytime, even before the end of the current statement. Use literals or global values as arguments to the

"p" pack() template to avoid this warning.

Attempt to use reference as lvalue in substr

(W) You supplied a reference as the first argument to substr() used as an lvalue, which is pretty

strange. Perhaps you forgot to dereference it first. See substr.

Bad arg length for %s, is %d, should be %d

(F) You passed a buffer of the wrong size to one of msgctl(), semctl() or shmctl(). In C

parlance, the correct sizes are, respectively, sizeof(struct msqid_ds *), sizeof(struct semid_ds *), and

sizeof(struct shmid_ds *).

Bad filehandle: %s

(F) A symbol was passed to something wanting a filehandle, but the symbol has no filehandle

associated with it. Perhaps you didn't do an open(), or did it in another package.

Bad free() ignored

(S) An internal routine called free() on something that had never been malloc()ed in the first

place. Mandatory, but can be disabled by setting environment variable PERL_BADFREE to 1.

This message is seen quite often with DB_File on systems with "hard" dynamic linking, like AIX

and OS/2. It is a bug in Berkeley DB which goes unnoticed if DB uses a forgiving system

malloc().

Bad hash

(P) One of the internal hash routines was passed a null HV pointer.

Bad index while coercing array into hash

(F) The index looked up in the hash found as the 0'th element of a pseudo-hash is not legal. Index

values must be 1 or greater. See perlref.

Bad name after %s::

(F) You started to name a symbol by using a package prefix, and then didn't finish the symbol. In

particular, you can't interpolate outside of quotes, so

$var = 'myvar';

$sym = mypack::$var;

is not the same as

$var = 'myvar';

$sym = "mypack::$var";

Bad symbol for array

(P) An internal request asked to add an array entry to something that wasn't a symbol table entry.

Bad symbol for filehandle

(P) An internal request asked to add a filehandle entry to something that wasn't a symbol table entry.

Bad symbol for hash

(P) An internal request asked to add a hash entry to something that wasn't a symbol table entry.

Badly placed ()'s

(A) You've accidentally run your script through csh instead of Perl. Check the #! line, or manually

feed your script into Perl yourself.

Bareword "%s" not allowed while "strict subs" in use

(F) With "strict subs" in use, a bareword is only allowed as a subroutine identifier, in curly braces or to

the left of the "=" symbol. Perhaps you need to predeclare a subroutine?

Bareword "%s" refers to nonexistent package

(W) You used a qualified bareword of the form Foo::, but the compiler saw no other uses of that

namespace before that point. Perhaps you need to predeclare a package?

BEGIN failed--compilation aborted

(F) An untrapped exception was raised while executing a BEGIN subroutine. Compilation stops

immediately and the interpreter is exited.

BEGIN not safe after errors--compilation aborted

(F) Perl found a BEGIN {} subroutine (or a use directive, which implies a BEGIN {}) after one or

more compilation errors had already occurred. Since the intended environment for the BEGIN {}

could not be guaranteed (due to the errors), and since subsequent code likely depends on its correct

operation, Perl just gave up.

bind() on closed fd

(W) You tried to do a bind on a closed socket. Did you forget to check the return value of your

socket() call? See bind.

Bizarre copy of %s in %s

(P) Perl detected an attempt to copy an internal value that is not copiable.

Callback called exit

(F) A subroutine invoked from an external package via perl_call_sv() exited by calling exit.

Can't "goto" outside a block

(F) A "goto" statement was executed to jump out of what might look like a block, except that it isn't a

proper block. This usually occurs if you tried to jump out of a sort() block or subroutine, which is a

no-no. See goto.

Can't "goto" into the middle of a foreach loop

(F) A "goto" statement was executed to jump into the middle of a foreach loop. You can't get there

from here. See goto.

Can't "last" outside a block

(F) A "last" statement was executed to break out of the current block, except that there's this itty bitty

problem called there isn't a current block. Note that an "if" or "else" block doesn't count as a

"loopish" block, and neither does a block given to sort(). You can usually double the curlies to get the

same effect though, because the inner curlies will be considered a block that loops once. See last.
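The doubled curlies look like this in practice. The variable names are illustrative; the inner pair of braces forms a bare block, which counts as a loop that runs once, so last has something to leave.

```perl
use strict;
use warnings;

my @steps;
my $value = 5;

if ($value > 0) {{
    push @steps, "entered";
    last if $value > 3;          # leaves the inner bare block only
    push @steps, "skipped when value > 3";
}}
push @steps, "after";

print join(", ", @steps), "\n";
```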

Can't "next" outside a block

(F) A "next" statement was executed to reiterate the current block, but there isn't a current block. Note

that an "if" or "else" block doesn't count as a "loopish" block, and neither does a block given to sort().

You can usually double the curlies to get the same effect though, because the inner curlies will be

considered a block that loops once. See next.

Can't "redo" outside a block

(F) A "redo" statement was executed to restart the current block, but there isn't a current block. Note

that an "if" or "else" block doesn't count as a "loopish" block, and neither does a block given to sort().

You can usually double the curlies to get the same effect though, because the inner curlies will be

considered a block that loops once. See redo.

Can't bless non-reference value

(F) Only hard references may be blessed. This is how Perl "enforces" encapsulation of objects. See

perlobj.

Can't break at that line

(S) A warning intended to only be printed while running within the debugger, indicating the line

number specified wasn't the location of a statement that could be stopped at.

Can't call method "%s" in empty package "%s"

(F) You called a method correctly, and it correctly indicated a package functioning as a class, but that

package doesn't have ANYTHING defined in it, let alone methods. See perlobj.

Can't call method "%s" on unblessed reference

(F) A method call must know in what package it's supposed to run. It ordinarily finds this out from the

object reference you supply, but you didn't supply an object reference in this case. A reference isn't an

object reference until it has been blessed. See perlobj.
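A short sketch of the difference between a plain reference and a blessed one. The Counter package and its methods are invented for the example.

```perl
use strict;
use warnings;

package Counter;
sub new  { my ($class) = @_; return bless { count => 0 }, $class }
sub bump { my ($self)  = @_; return ++$self->{count} }

package main;

my $plain = { count => 0 };              # a reference, but not an object
my $ok    = eval { $plain->bump; 1 };    # dies: no package to look in
my $error = $@;

my $object = Counter->new;               # blessed into package Counter
my $count  = $object->bump;              # method lookup now works

print "plain reference: $error";
print "object count:    $count\n";
```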

Can't call method "%s" without a package or object reference

(F) You used the syntax of a method call, but the slot filled by the object reference or package name

contains an expression that returns a defined value which is neither an object reference nor a package

name. Something like this will reproduce the error:

$BADREF = 42;

process $BADREF 1,2,3;

$BADREF->process(1,2,3);

Can't call method "%s" on an undefined value

(F) You used the syntax of a method call, but the slot filled by the object reference or package name

contains an undefined value. Something like this will reproduce the error:

$BADREF = undef;

process $BADREF 1,2,3;

$BADREF->process(1,2,3);

Can‘t chdir to %s

(F) You called perl −x/foo/bar, but /foo/bar is not a directory that you can chdir to, possibly

because it doesn‘t exist.

Can‘t coerce %s to integer in %s

(F) Certain types of SVs, in particular real symbol table entries (typeglobs), can‘t be forced to stop

being what they are. So you can‘t say things like:

*foo += 1;

You CAN say

$foo = *foo;

$foo += 1;

but then $foo no longer contains a glob.

Can‘t coerce %s to number in %s

(F) Certain types of SVs, in particular real symbol table entries (typeglobs), can‘t be forced to stop

being what they are.

Can‘t coerce %s to string in %s

(F) Certain types of SVs, in particular real symbol table entries (typeglobs), can‘t be forced to stop

being what they are.

Can‘t coerce array into hash

(F) You used an array where a hash was expected, but the array has no information on how to map

from keys to array indices. You can do that only with arrays that have a hash reference at index 0.

Can‘t create pipe mailbox

(P) An error peculiar to VMS. The process is suffering from exhausted quotas or other plumbing

problems.

Can‘t declare %s in my

(F) Only scalar, array, and hash variables may be declared as lexical variables. They must have

ordinary identifiers as names.

Can‘t do inplace edit on %s: %s

(S) The creation of the new file failed for the indicated reason.

Can‘t do inplace edit without backup

(F) You're on a system such as MS-DOS that gets confused if you try reading from a deleted (but still

opened) file. You have to say -i.bak, or some such.

Can‘t do inplace edit: %s > 14 characters

(S) There isn‘t enough room in the filename to make a backup name for the file.

Can‘t do inplace edit: %s is not a regular file

(S) You tried to use the −i switch on a special file, such as a file in /dev, or a FIFO. The file was

ignored.

Can‘t do setegid!

(P) The setegid() call failed for some reason in the setuid emulator of suidperl.

Can‘t do seteuid!

(P) The setuid emulator of suidperl failed for some reason.

Can‘t do setuid

(F) This typically means that ordinary perl tried to exec suidperl to do setuid emulation, but couldn't

exec it. It looks for a file with a name of the form sperl5.000 in the directory where the perl executable

resides under the name perl5.000, typically /usr/local/bin on Unix machines. If the file is there, check the

execute permissions. If it isn't, ask your sysadmin why he and/or she removed it.

Can‘t do waitpid with flags

(F) This machine doesn‘t have either waitpid() or wait4(), so only waitpid() without flags

is emulated.

Can‘t do {n,m} with n > m

(F) Minima must be less than or equal to maxima. If you really want your regexp to match something

0 times, just put {0}. See perlre.

Can‘t emulate −%s on #! line

(F) The #! line specifies a switch that doesn‘t make sense at this point. For example, it‘d be kind of

silly to put a −x on the #! line.

Can‘t exec "%s": %s

(W) A system(), exec(), or piped open call could not execute the named program for the

indicated reason. Typical reasons include: the permissions were wrong on the file, the file wasn‘t

found in $ENV{PATH}, the executable in question was compiled for another architecture, or the #!

line in a script points to an interpreter that can‘t be run for similar reasons. (Or maybe your system

doesn‘t support #! at all.)

Can‘t exec %s

(F) Perl was trying to execute the indicated program for you because that‘s what the #! line said. If

that‘s not what you wanted, you may need to mention "perl" on the #! line somewhere.

Can‘t execute %s

(F) You used the −S switch, but the copies of the script to execute found in the PATH did not have

correct permissions.

Can‘t find %s on PATH, ’.’ not in PATH

(F) You used the −S switch, but the script to execute could not be found in the PATH, or at least not

with the correct permissions. The script exists in the current directory, but PATH prohibits running it.

Can‘t find %s on PATH

(F) You used the −S switch, but the script to execute could not be found in the PATH.

Can‘t find label %s

(F) You said to goto a label that isn‘t mentioned anywhere that it‘s possible for us to go to. See goto.

Can‘t find string terminator %s anywhere before EOF

(F) Perl strings can stretch over multiple lines. This message means that the closing delimiter was

omitted. Because bracketed quotes count nesting levels, the following is missing its final parenthesis:

print q(The character '(' starts a side comment.);

If you‘re getting this error from a here−document, you may have included unseen whitespace before

or after your closing tag. A good programmer‘s editor will have a way to help you find these

characters.

Can‘t fork

(F) A fatal error occurred while trying to fork while opening a pipeline.

Can‘t get filespec − stale stat buffer?

(S) A warning peculiar to VMS. This arises because of the difference between access checks under

VMS and under the Unix model Perl assumes. Under VMS, access checks are done by filename,

rather than by bits in the stat buffer, so that ACLs and other protections can be taken into account.

Unfortunately, Perl assumes that the stat buffer contains all the necessary information, and passes it,

instead of the filespec, to the access checking routine. It will try to retrieve the filespec using the

device name and FID present in the stat buffer, but this works only if you haven‘t made a subsequent

call to the CRTL stat() routine, because the device name is overwritten with each call. If this

warning appears, the name lookup failed, and the access checking routine gave up and returned

FALSE, just to be conservative. (Note: The access checking routine knows about the Perl stat

operator and file tests, so you shouldn‘t ever see this warning in response to a Perl command; it arises

only if some internal code takes stat buffers lightly.)

Can‘t get pipe mailbox device name

(P) An error peculiar to VMS. After creating a mailbox to act as a pipe, Perl can‘t retrieve its name for

later use.

Can‘t get SYSGEN parameter value for MAXBUF

(P) An error peculiar to VMS. Perl asked $GETSYI how big you want your mailbox buffers to be,

and didn‘t get an answer.

Can‘t goto subroutine outside a subroutine

(F) The deeply magical "goto subroutine" call can only replace one subroutine call for another. It can‘t

manufacture one out of whole cloth. In general you should be calling it out of only an AUTOLOAD

routine anyway. See goto.
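
The AUTOLOAD case mentioned above might look like the following sketch. The package name Lazy, the get_ prefix, and the counter variable are hypothetical, chosen only to show goto replacing the AUTOLOAD call with the newly installed routine.

```perl
use strict;

package Lazy;
use vars qw($AUTOLOAD $autoloads);

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                  # strip the package qualifier
    return if $name eq 'DESTROY';
    die "No such function: $name\n" unless $name =~ /^get_(\w+)$/;
    my $field = $1;
    $autoloads++;                       # count how often we had to autoload

    no strict 'refs';
    *{$AUTOLOAD} = sub { return "value of $field" };   # install the real routine
    goto &$AUTOLOAD;                    # replace this call with a call to it
}

package main;

my $first  = Lazy::get_color();         # goes through AUTOLOAD
my $second = Lazy::get_color();         # calls the installed routine directly
```

Because the routine is installed in the symbol table on first use, AUTOLOAD runs only once for each name.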

Can‘t goto subroutine from an eval−string

(F) The "goto subroutine" call can‘t be used to jump out of an eval "string". (You can use it to jump

out of an eval {BLOCK}, but you probably don‘t want to.)

Can‘t localize through a reference

(F) You said something like local $$ref, which Perl can‘t currently handle, because when it goes

to restore the old value of whatever $ref pointed to after the scope of the local() is finished, it

can‘t be sure that $ref will still be a reference.

Can‘t localize lexical variable %s

(F) You used local on a variable name that was previously declared as a lexical variable using "my".

This is not allowed. If you want to localize a package variable of the same name, qualify it with the

package name.

Can‘t localize pseudo−hash element

(F) You said something like local $ar->{'key'}, where $ar is a reference to a pseudo-hash.

That hasn't been implemented yet, but you can get a similar effect by localizing the corresponding

array element directly: local $ar->[$ar->[0]{'key'}].

Can‘t locate auto/%s.al in @INC

(F) A function (or method) was called in a package which allows autoload, but there is no function to

autoload. Most probable causes are a misprint in a function/method name or a failure to AutoSplit

the file, say, by doing make install.

Can‘t locate %s in @INC

(F) You said to do (or require, or use) a file that couldn‘t be found in any of the libraries mentioned in

@INC. Perhaps you need to set the PERL5LIB or PERL5OPT environment variable to say where the

extra library is, or maybe the script needs to add the library name to @INC. Or maybe you just

misspelled the name of the file. See require.
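
A script that can work without an optional library may trap this error rather than die. The sketch below assumes a deliberately nonexistent module name and an illustrative private library directory; neither comes from the Perl distribution.

```perl
use strict;

# Illustrative directory only: "use lib" adds it to the front of @INC.
use lib '/home/user/perllib';

# Hypothetical module name, chosen so that it is not installed anywhere.
my $module = 'No::Such::Module::Hypothetical';
(my $file = "$module.pm") =~ s{::}{/}g;     # require wants a path-style name

my $loaded = eval { require $file; 1 } ? 1 : 0;
my $error  = $loaded ? '' : $@;
```

Here $loaded is false and $error carries the "Can't locate ..." text, so the script can fall back to other behavior.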

Can‘t locate object method "%s" via package "%s"

(F) You called a method correctly, and it correctly indicated a package functioning as a class, but that

package doesn't define that particular method, nor do any of its base classes. See perlobj.
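
One way to avoid this error is to ask the object whether it supports a method before calling it, using the can() method that every class inherits from UNIVERSAL. The Counter class below is hypothetical and exists only for this sketch.

```perl
use strict;

package Counter;
sub new { my $class = shift; return bless { n => 0 }, $class }
sub inc { my $self = shift; $self->{n}++; return $self->{n} }

package main;

my $c = Counter->new;

# can() returns a code reference if the method exists, false otherwise.
my $can_inc = $c->can('inc') ? 1 : 0;
my $can_dec = $c->can('dec') ? 1 : 0;

# Calling the missing method anyway raises the fatal error.
my $ok    = eval { $c->dec; 1 };
my $error = $ok ? '' : $@;
```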

Can‘t locate package %s for @%s::ISA

(W) The @ISA array contained the name of another package that doesn‘t seem to exist.

Can‘t make list assignment to \%ENV on this system

(F) List assignment to %ENV is not supported on some systems, notably VMS.

Can‘t modify %s in %s

(F) You aren‘t allowed to assign to the item indicated, or otherwise try to change it, such as with an

auto−increment.

Can‘t modify nonexistent substring

(P) The internal routine that does assignment to a substr() was handed a NULL.

Can‘t msgrcv to read−only var

(F) The target of a msgrcv must be modifiable to be used as a receive buffer.

Can‘t open %s: %s

(S) The implicit opening of a file through use of the <> filehandle, either implicitly under the −n or −p

command−line switches, or explicitly, failed for the indicated reason. Usually this is because you

don‘t have read permission for a file which you named on the command line.

Can‘t open bidirectional pipe

(W) You tried to say open(CMD, "|cmd|"), which is not supported. You can try any of several

modules in the Perl library to do this, such as IPC::Open2. Alternately, direct the pipe‘s output to a file

using ">", and then read it in under a different file handle.
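
A minimal IPC::Open2 sketch follows. It assumes a system where fork is available and uses the running perl binary ($^X) as the child command so that no other program is required; the one-line child script is illustrative.

```perl
use strict;
use IPC::Open2;

my ($from_child, $to_child);

# Start a child perl that upper-cases every line it reads.
my $pid = open2($from_child, $to_child, $^X, '-ne', 'print uc');

print $to_child "hello\n";
close $to_child;                # send end-of-file so the child can finish

my $reply = <$from_child>;
close $from_child;
waitpid $pid, 0;                # reap the child
```

Closing the write side before reading matters: a child that buffers its output may not answer until it sees end-of-file, and reading first could then deadlock.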

Can‘t open error file %s as stderr

(F) An error peculiar to VMS. Perl does its own command line redirection, and couldn‘t open the file

specified after ‘2>’ or ‘2>>’ on the command line for writing.

Can‘t open input file %s as stdin

(F) An error peculiar to VMS. Perl does its own command line redirection, and couldn‘t open the file

specified after ‘<’ on the command line for reading.

Can‘t open output file %s as stdout

(F) An error peculiar to VMS. Perl does its own command line redirection, and couldn‘t open the file

specified after ‘>’ or ‘>>’ on the command line for writing.

Can‘t open output pipe (name: %s)

(P) An error peculiar to VMS. Perl does its own command line redirection, and couldn‘t open the pipe

into which to send data destined for stdout.

Can‘t open perl script "%s": %s

(F) The script you specified can‘t be opened for the indicated reason.

Can‘t redefine active sort subroutine %s

(F) Perl optimizes the internal handling of sort subroutines and keeps pointers into them. You tried to

redefine one such sort subroutine when it was currently active, which is not allowed. If you really

want to do this, you should write sort { &func } @x instead of sort func @x.

Can‘t rename %s to %s: %s, skipping file

(S) The rename done by the −i switch failed for some reason, probably because you don‘t have write

permission to the directory.

Can‘t reopen input pipe (name: %s) in binary mode

(P) An error peculiar to VMS. Perl thought stdin was a pipe, and tried to reopen it to accept binary

data. Alas, it failed.

Can‘t reswap uid and euid

(P) The setreuid() call failed for some reason in the setuid emulator of suidperl.

Can‘t return outside a subroutine

(F) The return statement was executed in mainline code, that is, where there was no subroutine call to

return out of. See perlsub.

Can‘t stat script "%s"

(P) For some reason you can‘t fstat() the script even though you have it open already. Bizarre.

Can‘t swap uid and euid

(P) The setreuid() call failed for some reason in the setuid emulator of suidperl.

Can‘t take log of %g

(F) For ordinary real numbers, you can‘t take the logarithm of a negative number or zero. There‘s a

Math::Complex package that comes standard with Perl, though, if you really want to do that for the

negative numbers.

Can‘t take sqrt of %g

(F) For ordinary real numbers, you can‘t take the square root of a negative number. There‘s a

Math::Complex package that comes standard with Perl, though, if you really want to do that.
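
The difference between the built-in and the Math::Complex version can be sketched as follows. The message wording shown in the comment may vary slightly between Perl versions.

```perl
use strict;
use Math::Complex;      # exports complex-aware sqrt and log

my $n = -4;

# The built-in sqrt still refuses a negative argument
# ("Can't take sqrt of -4").
my $ok    = eval { my $r = CORE::sqrt($n); 1 };
my $error = $ok ? '' : $@;

# The Math::Complex sqrt returns a complex number instead.
my $z = sqrt($n);
my ($re, $im) = (Re($z), Im($z));   # real and imaginary parts
```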

Can‘t undef active subroutine

(F) You can‘t undefine a routine that‘s currently running. You can, however, redefine it while it‘s

running, and you can even undef the redefined subroutine while the old routine is running. Go figure.

Can‘t unshift

(F) You tried to unshift an "unreal" array that can‘t be unshifted, such as the main Perl stack.

Can‘t upgrade that kind of scalar

(P) The internal sv_upgrade routine adds "members" to an SV, making it into a more specialized kind

of SV. The top several SV types are so specialized, however, that they cannot be interconverted. This

message indicates that such a conversion was attempted.

Can‘t upgrade to undef

(P) The undefined SV is the bottom of the totem pole, in the scheme of upgradability. Upgrading to

undef indicates an error in the code calling sv_upgrade.

Can‘t use %%! because Errno.pm is not available

(F) The first time the %! hash is used, perl automatically loads the Errno.pm module. The Errno

module is expected to tie the %! hash to provide symbolic names for $! errno values.

Can‘t use "my %s" in sort comparison

(F) The global variables $a and $b are reserved for sort comparisons. You mentioned $a or $b in the

same line as the <=> or cmp operator, and the variable had earlier been declared as a lexical variable.

Either qualify the sort variable with the package name, or rename the lexical variable.

Can‘t use %s for loop variable

(F) Only a simple scalar variable may be used as a loop variable on a foreach.

Can‘t use %s ref as %s ref

(F) You‘ve mixed up your reference types. You have to dereference a reference of the type needed.

You can use the ref() function to test the type of the reference, if need be.
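
As a sketch of the ref() test, the hypothetical helper below dereferences its argument only after checking what kind of reference it holds.

```perl
use strict;

# Hypothetical helper: count the elements behind an array or hash reference.
sub count_items {
    my $ref  = shift;
    my $type = ref $ref;                    # '' for a non-reference
    return scalar(@$ref)      if $type eq 'ARRAY';
    return scalar(keys %$ref) if $type eq 'HASH';
    return -1;                              # not something we can count
}

my $from_array = count_items([10, 20, 30]);
my $from_hash  = count_items({ a => 1, b => 2 });
my $from_plain = count_items('not a reference');
```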

Can‘t use \1 to mean $1 in expression

(W) In an ordinary expression, backslash is a unary operator that creates a reference to its argument.

The use of backslash to indicate a backreference to a matched substring is valid only as part of a

regular expression pattern. Trying to do this in ordinary Perl code produces a value that prints out

looking like SCALAR(0xdecaf). Use the $1 form instead.
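
A short sketch of where each form belongs (the sample strings are illustrative):

```perl
use strict;

my $text    = 'bookkeeper';
my $doubled = '';
if ($text =~ /(\w)\1/) {    # \1 is valid here: it is inside the pattern
    $doubled = $1;          # outside the pattern, use $1
}

# In the replacement part of a substitution, also use $1 and $2.
my $name = 'Wall Larry';
(my $swapped = $name) =~ s/(\w+) (\w+)/$2 $1/;
```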

Can't use bareword ("%s") as %s ref while "strict refs" in use

(F) Only hard references are allowed by "strict refs". Symbolic references are disallowed. See perlref.

Can‘t use string ("%s") as %s ref while "strict refs" in use

(F) Only hard references are allowed by "strict refs". Symbolic references are disallowed. See perlref.
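
The sketch below triggers the error with a symbolic reference and then shows a common alternative, a hash keyed by name. The variable names are illustrative.

```perl
use strict;

my $name = 'counter';

# Using the string as if it were a scalar reference is fatal under strict refs.
my $ok    = eval { my $v = $$name; 1 };
my $error = $ok ? '' : $@;

# A hash keyed by the name gives the same flexibility without symbolic refs.
my %value_for = (counter => 0);
$value_for{$name}++;
```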

Can‘t use an undefined value as %s reference

(F) A value used as either a hard reference or a symbolic reference must be a defined value. This helps

to delurk some insidious errors.

Can‘t use global %s in "my"

(F) You tried to declare a magical variable as a lexical variable. This is not allowed, because the

magic can be tied to only one location (namely the global variable) and it would be incredibly

confusing to have variables in your program that looked like magical variables but weren‘t.

Can‘t use subscript on %s

(F) The compiler tried to interpret a bracketed expression as a subscript. But to the left of the brackets

was an expression that didn‘t look like an array reference, or anything else subscriptable.

Can‘t x= to read−only value

(F) You tried to repeat a constant value (often the undefined value) with an assignment operator, which

implies modifying the value itself. Perhaps you need to copy the value to a temporary, and repeat that.

Cannot find an opnumber for "%s"

(F) A string of the form CORE::word was given to prototype(), but there is no builtin with the

name word.

Cannot resolve method ‘%s’ overloading ‘%s’ in package ‘%s’

(F|P) Error resolving overloading specified by a method name (as opposed to a subroutine reference):

no such method callable via the package. If the method name is ???, this is an internal error.

Character class syntax [. .] is reserved for future extensions

(W) Within regular expression character classes ([]) the syntax beginning with "[." and ending with ".]"

is reserved for future extensions. If you need to represent those character sequences inside a regular

expression character class, just quote the square brackets with the backslash: "\[." and ".\]".

Character class syntax [: :] is reserved for future extensions

(W) Within regular expression character classes ([]) the syntax beginning with "[:" and ending with

":]" is reserved for future extensions. If you need to represent those character sequences inside a

regular expression character class, just quote the square brackets with the backslash: "\[:" and ":\]".

Character class syntax [= =] is reserved for future extensions

(W) Within regular expression character classes ([]) the syntax beginning with "[=" and ending with

"=]" is reserved for future extensions. If you need to represent those character sequences inside a

regular expression character class, just quote the square brackets with the backslash: "\[=" and "=\]".

chmod: mode argument is missing initial 0

(W) A novice will sometimes say

chmod 777, $filename

not realizing that 777 will be interpreted as a decimal number, equivalent to 01411. Octal constants

are introduced with a leading 0 in Perl, as in C.
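
The arithmetic can be checked directly. This sketch only computes values and does not touch any file; the mode 0755 is an illustrative choice.

```perl
use strict;

my $octal_literal = 0755;               # what "chmod 0755, $filename" passes
my $from_string   = oct('755');         # converting a mode read as text
my $novice        = sprintf '%o', 777;  # the octal mode that decimal 777 means
```

Here $octal_literal and $from_string are both 493 in decimal, and $novice is the string "1411".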

Close on unopened file <%s>

(W) You tried to close a filehandle that was never opened.

Compilation failed in require

(F) Perl could not compile a file specified in a require statement. Perl uses this generic message

when none of the errors that it encountered were severe enough to halt compilation immediately.

Complex regular subexpression recursion limit (%d) exceeded

(W) The regular expression engine uses recursion in complex situations where back−tracking is

required. Recursion depth is limited to 32766, or perhaps less in architectures where the stack cannot

grow arbitrarily. ("Simple" and "medium" situations are handled without recursion and are not subject

to a limit.) Try shortening the string under examination; looping in Perl code (e.g. with while) rather

than in the regular expression engine; or rewriting the regular expression so that it is simpler or

backtracks less. (See perlbook for information on Mastering Regular Expressions.)

connect() on closed fd

(W) You tried to do a connect on a closed socket. Did you forget to check the return value of your

socket() call? See connect.

Constant subroutine %s redefined

(S) You redefined a subroutine which had previously been eligible for inlining. See

Constant Functions in perlsub for commentary and workarounds.

Constant subroutine %s undefined

(S) You undefined a subroutine which had previously been eligible for inlining. See

Constant Functions in perlsub for commentary and workarounds.

Copy method did not return a reference

(F) The method which overloads "=" is buggy. See Copy Constructor.

Corrupt malloc ptr 0x%lx at 0x%lx

(P) The malloc package that comes with Perl had an internal failure.

corrupted regexp pointers

(P) The regular expression engine got confused by what the regular expression compiler gave it.

corrupted regexp program

(P) The regular expression engine got passed a regexp program without a valid magic number.

Deep recursion on subroutine "%s"

(W) This subroutine has called itself (directly or indirectly) 100 times more than it has returned. This

probably indicates an infinite recursion, unless you‘re writing strange benchmark programs, in which

case it indicates something else.

Delimiter for here document is too long

(F) In a here document construct like <<FOO, the label FOO is too long for Perl to handle. You have to

be seriously twisted to write code that triggers this error.

Did you mean &%s instead?

(W) You probably referred to an imported subroutine &FOO as $FOO or some such.

Did you mean $ or @ instead of %?

(W) You probably said %hash{$key} when you meant $hash{$key} or @hash{@keys}. On the

other hand, maybe you just meant %hash and got carried away.

Died

(F) You passed die() an empty string (the equivalent of die "") or you called it with no args and

both $@ and $_ were empty.

Do you need to predeclare %s?

(S) This is an educated guess made in conjunction with the message "%s found where operator

expected". It often means a subroutine or module name is being referenced that hasn‘t been declared

yet. This may be because of ordering problems in your file, or because of a missing "sub", "package",

"require", or "use" statement. If you‘re referencing something that isn‘t defined yet, you don‘t actually

have to define the subroutine or package before the current location. You can use an empty "sub foo;"

or "package FOO;" to enter a "forward" declaration.
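
A forward declaration might be used like this; the subroutine name greet is hypothetical.

```perl
use strict;

sub greet;                          # forward declaration, body comes later

my $message = greet 'world';        # parsed as a call, no parentheses needed

sub greet {
    my $who = shift;
    return "hello, $who";
}
```

Without the first line, the bareword call on the third line would not be recognized as a subroutine call at the point where it is compiled.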

Don‘t know how to handle magic of type ‘%s’

(P) The internal handling of magical variables has been cursed.

do_study: out of memory

(P) This should have been caught by safemalloc() instead.

Duplicate free() ignored

(S) An internal routine called free() on something that had already been freed.

elseif should be elsif

(S) There is no keyword "elseif" in Perl because Larry thinks it‘s ugly. Your code will be interpreted

as an attempt to call a method named "elseif" for the class returned by the following block. This is

unlikely to be what you want.

END failed--cleanup aborted

(F) An untrapped exception was raised while executing an END subroutine. The interpreter is

immediately exited.

Error converting file specification %s

(F) An error peculiar to VMS. Because Perl may have to deal with file specifications in either VMS or

Unix syntax, it converts them to a single form when it must operate on them directly. Either you‘ve

passed an invalid file specification to Perl, or you‘ve found a case the conversion routines don‘t

handle. Drat.

%s: Eval−group in insecure regular expression

(F) Perl detected tainted data when trying to compile a regular expression that contains the (?{ ...

}) zero−width assertion, which is unsafe. See (?{ code }), and perlsec.

%s: Eval−group not allowed, use re ‘eval’

(F) A regular expression contained the (?{ ... }) zero−width assertion, but that construct is only

allowed when the use re 'eval' pragma is in effect. See (?{ code }).

%s: Eval−group not allowed at run time

(F) Perl tried to compile a regular expression containing the (?{ ... }) zero−width assertion at run

time, as it would when the pattern contains interpolated values. Since that is a security risk, it is not

allowed. If you insist, you may still do this by explicitly building the pattern from an interpolated

string at run time and using that in an eval(). See (?{ code }).

Excessively long <> operator

(F) The contents of a <> operator may not exceed the maximum size of a Perl identifier. If you're just

trying to glob a long list of filenames, try using the glob() operator, or put the filenames into a

variable and glob that.

Execution of %s aborted due to compilation errors

(F) The final summary message when a Perl compilation fails.

Exiting eval via %s

(W) You are exiting an eval by unconventional means, such as a goto, or a loop control statement.

Exiting pseudo−block via %s

(W) You are exiting a rather special block construct (like a sort block or subroutine) by unconventional

means, such as a goto, or a loop control statement. See sort.

Exiting subroutine via %s

(W) You are exiting a subroutine by unconventional means, such as a goto, or a loop control statement.

Exiting substitution via %s

(W) You are exiting a substitution by unconventional means, such as a return, a goto, or a loop control

statement.

Explicit blessing to ‘’ (assuming package main)

(W) You are blessing a reference to a zero length string. This has the effect of blessing the reference

into the package main. This is usually not what you want. Consider providing a default target

package, e.g. bless($ref, $p || 'MyPackage');
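
A constructor with a default target package might be sketched as follows. Note the use of ||, which binds more tightly than the comma between the arguments; the low-precedence "or" would apply to the whole argument list instead. The class name is illustrative.

```perl
use strict;

package MyPackage;

sub new {
    my $class = shift;
    my $self  = {};
    # Fall back to a sensible class if the caller passed an empty name.
    return bless($self, $class || 'MyPackage');
}

package main;

my $named     = MyPackage->new;         # class name supplied by the method call
my $defaulted = MyPackage::new('');     # empty class name, default applies
```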

Fatal VMS error at %s, line %d

(P) An error peculiar to VMS. Something untoward happened in a VMS system service or RTL

routine; Perl‘s exit status should provide more details. The filename in "at %s" and the line number in

"line %d" tell you which section of the Perl source code is distressed.

fcntl is not implemented

(F) Your machine apparently doesn‘t implement fcntl(). What is this, a PDP−11 or something?

Filehandle %s never opened

(W) An I/O operation was attempted on a filehandle that was never initialized. You need to do an

open() or a socket() call, or call a constructor from the FileHandle package.

Filehandle %s opened for only input

(W) You tried to write on a read−only filehandle. If you intended it to be a read−write filehandle, you

needed to open it with "+<" or "+>" or "+>>" instead of with "<" or nothing. If you intended only to

write the file, use ">" or ">>". See open.

Filehandle opened for only input

(W) You tried to write on a read−only filehandle. If you intended it to be a read−write filehandle, you

needed to open it with "+<" or "+>" or "+>>" instead of with "<" or nothing. If you intended only to

write the file, use ">" or ">>". See open.

Final $ should be \$ or $name

(F) You must now decide whether the final $ in a string was meant to be a literal dollar sign, or was

meant to introduce a variable name that happens to be missing. So you have to put either the backslash

or the name.

Final @ should be \@ or @name

(F) You must now decide whether the final @ in a string was meant to be a literal "at" sign, or was

meant to introduce a variable name that happens to be missing. So you have to put either the backslash

or the name.

Format %s redefined

(W) You redefined a format. To suppress this warning, say

{

local $^W = 0;

eval "format NAME =...";

}

Format not terminated

(F) A format must be terminated by a line with a solitary dot. Perl got to the end of your file without

finding such a line.

Found = in conditional, should be ==

(W) You said

if ($foo = 123)

when you meant

if ($foo == 123)

(or something like that).

gdbm store returned %d, errno %d, key "%s"

(S) A warning from the GDBM_File extension that a store failed.

gethostent not implemented

(F) Your C library apparently doesn‘t implement gethostent(), probably because if it did, it‘d feel

morally obligated to return every hostname on the Internet.

get{sock,peer}name() on closed fd

(W) You tried to get a socket or peer socket name on a closed socket. Did you forget to check the

return value of your socket() call?

getpwnam returned invalid UIC %#o for user "%s"

(S) A warning peculiar to VMS. The call to sys$getuai underlying the getpwnam operator

returned an invalid UIC.

Glob not terminated

(F) The lexer saw a left angle bracket in a place where it was expecting a term, so it‘s looking for the

corresponding right angle bracket, and not finding it. Chances are you left some needed parentheses

out earlier in the line, and you really meant a "less than".

Global symbol "%s" requires explicit package name

(F) You‘ve said "use strict vars", which indicates that all variables must either be lexically scoped

(using "my"), or explicitly qualified to say which package the global variable is in (using "::").
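
The accepted forms can be sketched as below, together with an eval that shows the failure. The variable names are illustrative, and newer Perl versions may append a hint to the message.

```perl
use strict;
use vars qw($total);                # predeclares the package variable $main::total

my $count = 3;                      # lexically scoped with "my"
$total = $count * 2;                # allowed: declared with "use vars"
$main::grand_total = $total + 1;    # allowed: explicitly qualified with "::"

# An undeclared, unqualified variable fails to compile under strict vars.
my $ok    = eval q{ $undeclared = 1; 1 };
my $error = $ok ? '' : $@;
```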

goto must have label

(F) Unlike with "next" or "last", you‘re not allowed to goto an unspecified destination. See goto.

Had to create %s unexpectedly

(S) A routine asked for a symbol from a symbol table that ought to have existed already, but for some

reason it didn‘t, and had to be created on an emergency basis to prevent a core dump.

Hash %%s missing the % in argument %d of %s()

(D) Really old Perl let you omit the % on hash names in some spots. This is now heavily deprecated.

Identifier too long

(F) Perl limits identifiers (names for variables, functions, etc.) to about 250 characters for simple

names, and somewhat more for compound names (like $A::B). You‘ve exceeded Perl‘s limits.

Future versions of Perl are likely to eliminate these arbitrary limitations.

Ill−formed logical name |%s| in prime_env_iter

(W) A warning peculiar to VMS. A logical name was encountered when preparing to iterate over

%ENV which violates the syntactic rules governing logical names. Because it cannot be translated

normally, it is skipped, and will not appear in %ENV. This may be a benign occurrence, as some

software packages might directly modify logical name tables and introduce nonstandard names, or it

may indicate that a logical name table has been corrupted.

Illegal character %s (carriage return)

(F) A carriage return character was found in the input. This is an error, and not a warning, because

carriage return characters can break multi−line strings, including here documents (e.g., print

<<EOF;).

Under Unix, this error is usually caused by executing Perl code — either the main program, a module,

or an eval‘d string — that was transferred over a network connection from a non−Unix system without

properly converting the text file format.

Under systems that use something other than ‘\n’ to delimit lines of text, this error can also be caused

by reading Perl code from a file handle that is in binary mode (as set by the binmode operator).

In either case, the Perl code in question will probably need to be converted with something like

s/\x0D\x0A?/\n/g before it can be executed.
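
Applied to a string holding the source text, the substitution behaves as in this sketch. The sample text mixes CR LF and bare CR line endings for illustration.

```perl
use strict;

# Source text with one CR LF ending, one bare CR ending, and one plain newline.
my $source = "print 1;\x0D\x0Aprint 2;\x0Dprint 3;\n";

# Copy the text, then turn every CR or CR LF into a newline.
(my $converted = $source) =~ s/\x0D\x0A?/\n/g;
```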

Illegal division by zero

(F) You tried to divide a number by 0. Either something was wrong in your logic, or you need to put a

conditional in to guard against meaningless input.
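
Such a guard might look like the sketch below; the helper name safe_ratio is hypothetical. The eval at the end shows the error being trapped when no guard is present.

```perl
use strict;

# Hypothetical helper: return undef instead of dying on a zero divisor.
sub safe_ratio {
    my ($num, $den) = @_;
    return undef if $den == 0;
    return $num / $den;
}

my $ratio = safe_ratio(10, 4);
my $none  = safe_ratio(10, 0);

# Without the guard, the division is fatal but can be trapped.
my $zero  = 0;
my $ok    = eval { my $r = 10 / $zero; 1 };
my $error = $ok ? '' : $@;
```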

Illegal modulus zero

(F) You tried to divide a number by 0 to get the remainder. Most numbers don‘t take to this kindly.

Illegal octal digit

(F) You used an 8 or 9 in an octal number.

Illegal octal digit ignored

(W) You may have tried to use an 8 or 9 in an octal number. Interpretation of the octal number stopped

before the 8 or 9.

Illegal hex digit ignored

(W) You may have tried to use a character other than 0 − 9 or A − F in a hexadecimal number.

Interpretation of the hexadecimal number stopped before the illegal character.

Illegal switch in PERL5OPT: %s

(X) The PERL5OPT environment variable may only be used to set the following switches:

-[DIMUdmw].

In string, @%s now must be written as \@%s

(F) It used to be that Perl would try to guess whether you wanted an array interpolated or a literal @. It

did this when the string was first used at runtime. Now strings are parsed at compile time, and

ambiguous instances of @ must be disambiguated, either by prepending a backslash to indicate a

literal, or by declaring (or using) the array within the program before the string (lexically). (Someday

it will simply assume that an unbackslashed @ interpolates an array.)

Insecure dependency in %s

(F) You tried to do something that the tainting mechanism didn‘t like. The tainting mechanism is

turned on when you‘re running setuid or setgid, or when you specify −T to turn it on explicitly. The

tainting mechanism labels all data that‘s derived directly or indirectly from the user, who is considered

to be unworthy of your trust. If any such data is used in a "dangerous" operation, you get this error.

See perlsec for more information.
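
Data is normally untainted by extracting the acceptable part with a regular expression capture, as perlsec describes. The sketch below shows only the validation pattern; the helper name and the set of permitted characters are assumptions, and the code behaves the same whether or not taint mode is on.

```perl
use strict;

# Hypothetical helper: accept only plain file names made of word characters,
# dots and hyphens. Under -T, the captured $1 is what counts as untainted.
sub untaint_filename {
    my $input = shift;
    return undef unless defined $input;
    return $1 if $input =~ /\A([\w.-]+)\z/;
    return undef;                       # anything else is rejected
}

my $good = untaint_filename('report-1998.txt');
my $bad  = untaint_filename('report.txt; rm -rf /');
```

The pattern should describe what is allowed rather than what is forbidden, so that unexpected input is rejected by default.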

Insecure directory in %s

(F) You can‘t use system(), exec(), or a piped open in a setuid or setgid script if $ENV{PATH}

contains a directory that is writable by the world. See perlsec.

Insecure $ENV{%s} while running %s

(F) You can‘t use system(), exec(), or a piped open in a setuid or setgid script if any of

$ENV{PATH}, $ENV{IFS}, $ENV{CDPATH}, $ENV{ENV} or $ENV{BASH_ENV} are derived

from data supplied (or potentially supplied) by the user. The script must set the path to a known value,

using trustworthy data. See perlsec.
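
A setuid script would typically begin by resetting these variables, along the lines of the following sketch. The PATH value is an assumption; choose directories appropriate to your system.

```perl
use strict;

# Set the search path to a known, trusted value.
$ENV{PATH} = '/bin:/usr/bin';

# Remove the other variables that can influence the shell.
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
```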

Integer overflow in hex number

(S) The literal hex number you have specified is too big for your architecture. On a 32−bit architecture

the largest hex literal is 0xFFFFFFFF.

Integer overflow in octal number

(S) The literal octal number you have specified is too big for your architecture. On a 32−bit

architecture the largest octal literal is 037777777777.

Internal inconsistency in tracking vforks

(S) A warning peculiar to VMS. Perl keeps track of the number of times you‘ve called fork and

exec, to determine whether the current call to exec should affect the current script or a subprocess

(see exec). Somehow, this count has become scrambled, so Perl is making a guess and treating this

exec as a request to terminate the Perl script and execute the specified command.

internal disaster in regexp

(P) Something went badly wrong in the regular expression parser.

internal error: glob failed

(P) Something went wrong with the external program(s) used for glob and <*.c>. This may mean

that your csh (C shell) is broken. If so, you should change all of the csh−related variables in config.sh:

If you have tcsh, make the variables refer to it as if it were csh (e.g.

full_csh=‘/usr/bin/tcsh’); otherwise, make them all empty (except that d_csh should be

‘undef’) so that Perl will think csh is missing. In either case, after editing config.sh, run

./Configure −S and rebuild Perl.

internal urp in regexp at /%s/

(P) Something went badly awry in the regular expression parser.

invalid [] range in regexp

(F) The range specified in a character class had a minimum character greater than the maximum

character. See perlre.
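
Because the error is fatal, a pattern built from outside data can be checked inside an eval before it is used. The sketch below assumes a helper named class_ok(), which is not part of Perl; it reports whether a character class body compiles.

```perl
use strict;

# Hypothetical helper: true if "[$class_body]" compiles as a regular
# expression, false if Perl rejects it (for example, a reversed range).
sub class_ok {
    my ($class_body) = @_;
    my $pattern = "[$class_body]";
    return eval { '' =~ /$pattern/; 1 } ? 1 : 0;
}

print class_ok('a-z') ? "a-z accepted\n" : "a-z rejected\n";
print class_ok('z-a') ? "z-a accepted\n" : "z-a rejected\n";
```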

Invalid conversion in %s: "%s"

(W) Perl does not understand the given format conversion. See sprintf.

Invalid type in pack: ‘%s’

(F) The given character is not a valid pack type. See pack. (W) The given character is not a valid pack

type but used to be silently ignored.

Invalid type in unpack: ‘%s’

(F) The given character is not a valid unpack type. See unpack. (W) The given character is not a valid

unpack type but used to be silently ignored.

ioctl is not implemented

(F) Your machine apparently doesn‘t implement ioctl(), which is pretty strange for a machine that

supports C.

junk on end of regexp

(P) The regular expression parser is confused.

Label not found for "last %s"

(F) You named a loop to break out of, but you‘re not currently in a loop of that name, not even if you

count where you were called from. See last.

Label not found for "next %s"

(F) You named a loop to continue, but you‘re not currently in a loop of that name, not even if you

count where you were called from. See next.

Label not found for "redo %s"

(F) You named a loop to restart, but you‘re not currently in a loop of that name, not even if you count

where you were called from. See redo.
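
For reference, here is a small sketch of loop labels used correctly. The label OUTER names the enclosing loop, so both next and last can find it; the data is arbitrary.

```perl
use strict;

my @pairs;
OUTER: foreach my $i (1 .. 3) {
    foreach my $j (1 .. 3) {
        next OUTER if $j > $i;      # continue with the next $i
        last OUTER if $i == 3;      # leave both loops at once
        push @pairs, "$i$j";
    }
}
print "@pairs\n";
```

Had the label been misspelled in the next or last statement, one of the three messages above would have resulted.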

listen() on closed fd

(W) You tried to do a listen on a closed socket. Did you forget to check the return value of your

socket() call? See listen.

Method for operation %s not found in package %s during blessing

(F) An attempt was made to specify an entry in an overloading table that doesn‘t resolve to a valid

subroutine. See overload.

Might be a runaway multi−line %s string starting on line %d

(S) An advisory indicating that the previous error may have been caused by a missing delimiter on a

string or pattern, because it eventually ended earlier on the current line.

Misplaced _ in number

(W) An underline in a decimal constant wasn‘t on a 3−digit boundary.

Missing $ on loop variable

(F) Apparently you‘ve been programming in csh too much. Variables are always mentioned with the $

in Perl, unlike in the shells, where it can vary from one line to the next.

Missing comma after first argument to %s function

(F) While certain functions allow you to specify a filehandle or an "indirect object" before the

argument list, this ain‘t one of them.

Missing operator before %s?

(S) This is an educated guess made in conjunction with the message "%s found where operator

expected". Often the missing operator is a comma.

Missing right bracket

(F) The lexer counted more opening curly brackets (braces) than closing ones. As a general rule, you‘ll

find it‘s missing near the place you were last editing.

Modification of a read−only value attempted

(F) You tried, directly or indirectly, to change the value of a constant. You didn‘t, of course, try "2 =

1", because the compiler catches that. But an easy way to do the same thing is:

sub mod { $_[0] = 1 }

mod(2);

Another way is to assign to a substr() that‘s off the end of the string.

Modification of non−creatable array value attempted, subscript %d

(F) You tried to make an array value spring into existence, and the subscript was probably negative,

even counting from end of the array backwards.

Modification of non−creatable hash value attempted, subscript "%s"

(P) You tried to make a hash value spring into existence, and it couldn‘t be created for some peculiar

reason.

Module name must be constant

(F) Only a bare module name is allowed as the first argument to a "use".

msg%s not implemented

(F) You don‘t have System V message IPC on your system.

Multidimensional syntax %s not supported

(W) Multidimensional arrays aren‘t written like $foo[1,2,3]. They‘re written like

$foo[1][2][3], as in C.
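
A brief sketch of the supported syntax follows. Each pair of brackets indexes one level, and the intermediate arrays are created as needed; the array name is taken from the message above.

```perl
use strict;

my @foo;
$foo[1][2][3] = 'x';                    # intermediate arrays spring into existence
print scalar(@{ $foo[1][2] }), "\n";    # number of elements in the innermost array
```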

Name "%s::%s" used only once: possible typo

(W) Typographical errors often show up as unique variable names. If you had a good reason for having

a unique name, then just mention it again somehow to suppress the message. The use vars pragma

is provided for just this purpose.

Negative length

(F) You tried to do a read/write/send/recv operation with a buffer length that is less than 0. This is

difficult to imagine.

nested *?+ in regexp

(F) You can‘t quantify a quantifier without intervening parentheses. So things like ** or +* or ?* are

illegal.

Note, however, that the minimal matching quantifiers, *?, +?, and ?? appear to be nested quantifiers,

but aren‘t. See perlre.
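
The difference between an ordinary quantifier and its minimal form can be seen in a short sketch; the variable names are illustrative only.

```perl
use strict;

my $text = 'aaa';
my ($greedy)  = $text =~ /(a+)/;    # + takes as many characters as it can
my ($minimal) = $text =~ /(a+?)/;   # +? takes as few as it can
print "greedy=$greedy minimal=$minimal\n";
```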

No #! line

(F) The setuid emulator requires that scripts have a well−formed #! line even on machines that don‘t

support the #! construct.

No %s allowed while running setuid

(F) Certain operations are deemed to be too insecure for a setuid or setgid script to even be allowed to

attempt. Generally speaking there will be another way to do what you want that is, if not secure, at

least securable. See perlsec.

No −e allowed in setuid scripts

(F) A setuid script can‘t be specified by the user.

No comma allowed after %s

(F) A list operator that has a filehandle or "indirect object" is not allowed to have a comma between

that and the following arguments. Otherwise it‘d be just another one of the arguments.

One possible cause is that you expected a constant to have been imported into your name space with
use or import, but no such import took place; for example, your operating system may not support
that particular constant. Ideally you used an explicit import list for the constants you expect to
see; see use and import. An explicit import list would probably have caught this error earlier, but
it does not remedy the fact that your operating system still does not support that constant. Another
possibility is a typo, either in the import list given to use or import, or in the constant name on
the line where this error was triggered.

No command into which to pipe on command line

(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a ‘|’ at the

end of the command line, so it doesn‘t know where you want to pipe the output from this command.

No DB::DB routine defined

(F) The currently executing code was compiled with the −d switch, but for some reason the perl5db.pl

file (or some facsimile thereof) didn‘t define a routine to be called at the beginning of each statement.

Which is odd, because the file should have been required automatically, and should have blown up the

require if it didn‘t parse right.

No dbm on this machine

(P) This is counted as an internal error, because every machine should supply dbm nowadays, because

Perl comes with SDBM. See SDBM_File.

No DBsub routine

(F) The currently executing code was compiled with the −d switch, but for some reason the perl5db.pl

file (or some facsimile thereof) didn‘t define a DB::sub routine to be called at the beginning of each

ordinary subroutine call.

No error file after 2> or 2>> on command line

(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a ‘2>’ or a

‘2>>’ on the command line, but can‘t find the name of the file to which to write data destined for

stderr.

No input file after < on command line

(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a ‘<’ on the

command line, but can‘t find the name of the file from which to read data for stdin.

No output file after > on command line

(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a lone ‘>’ at

the end of the command line, so it doesn‘t know where you wanted to redirect stdout.

No output file after > or >> on command line

(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a ‘>’ or a

‘>>’ on the command line, but can‘t find the name of the file to which to write data destined for stdout.

No Perl script found in input

(F) You called perl −x, but no line was found in the file beginning with #! and containing the word

"perl".

No setregid available

(F) Configure didn‘t find anything resembling the setregid() call for your system.

No setreuid available

(F) Configure didn‘t find anything resembling the setreuid() call for your system.

No space allowed after −I

(F) The argument to −I must follow the −I immediately with no intervening space.

No such array field

(F) You tried to access an array as a hash, but the field name used is not defined. The hash at index 0

should map all valid field names to array indices for that to work.

No such field "%s" in variable %s of type %s

(F) You tried to access a field of a typed variable where the type does not know about the field name.

The field names are looked up in the %FIELDS hash in the type package at compile time. The

%FIELDS hash is usually set up with the ‘fields’ pragma.

No such pipe open

(P) An error peculiar to VMS. The internal routine my_pclose() tried to close a pipe which hadn‘t

been opened. This should have been caught earlier as an attempt to close an unopened filehandle.

No such signal: SIG%s

(W) You specified a signal name as a subscript to %SIG that was not recognized. Say kill −l in

your shell to see the valid signal names on your system.
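
The signal names known to your perl can also be checked from within a program. This sketch relies on the sig_name entry of the Config module; is_signal_name() is a hypothetical helper written for this example.

```perl
use strict;
use Config;

# Build a lookup table from the space-separated list of signal names
# recorded when perl was configured.
my %known_signal = map { $_ => 1 } split ' ', $Config{sig_name};

# Hypothetical helper: true if $name may be used as a key of %SIG.
sub is_signal_name {
    my ($name) = @_;
    return exists $known_signal{$name} ? 1 : 0;
}

print is_signal_name('INT')   ? "INT is valid\n"   : "INT is unknown\n";
print is_signal_name('BOGUS') ? "BOGUS is valid\n" : "BOGUS is unknown\n";
```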

Not a CODE reference

(F) Perl was trying to evaluate a reference to a code value (that is, a subroutine), but found a reference

to something else instead. You can use the ref() function to find out what kind of ref it really was.
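
One way to avoid this error is to test the reference with ref() before calling it. The sketch below uses a hypothetical wrapper, call_if_code(), that calls its first argument only when it really is a code reference.

```perl
use strict;

# Hypothetical helper: call $thing with @args if it is a CODE
# reference; otherwise return undef instead of dying.
sub call_if_code {
    my ($thing, @args) = @_;
    return ref($thing) eq 'CODE' ? $thing->(@args) : undef;
}

my $doubled = call_if_code(sub { $_[0] * 2 }, 21);
my $nothing = call_if_code([1, 2, 3]);      # an ARRAY reference, not called
print "doubled=$doubled\n";
```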

SEE ALSO

There‘s a lot more to networking than this, but this should get you started.

For intrepid programmers, the indispensable textbook is Unix Network Programming by W. Richard Stevens

(published by Addison−Wesley). Note that most books on networking address networking from the

perspective of a C programmer; translation to Perl is left as an exercise for the reader.

The IO::Socket(3) manpage describes the object library, and the Socket(3) manpage describes the low−level

interface to sockets. Besides the obvious functions in perlfunc, you should also check out the modules file at

your nearest CPAN site. (See perlmodlib or best yet, the Perl FAQ for a description of what CPAN is and

where to get it.)

Section 5 of the modules file is devoted to "Networking, Device Control (modems), and Interprocess

Communication", and contains numerous unbundled modules: networking modules, Chat and

Expect operations, CGI programming, DCE, FTP, IPC, NNTP, Proxy, Ptty, RPC, SNMP, SMTP, Telnet,

Threads, and ToolTalk—just to name a few.

perlsec Perl Programmers Reference Guide perlsec

NAME

perlsec − Perl security

DESCRIPTION

Perl is designed to make it easy to program securely even when running with extra privileges, like setuid or

setgid programs. Unlike most command line shells, which are based on multiple substitution passes on each

line of the script, Perl uses a more conventional evaluation scheme with fewer hidden snags. Additionally,

because the language has more builtin functionality, it can rely less upon external (and possibly

untrustworthy) programs to accomplish its purposes.

Perl automatically enables a set of special security checks, called taint mode, when it detects its program

running with differing real and effective user or group IDs. The setuid bit in Unix permissions is mode

04000, the setgid bit mode 02000; either or both may be set. You can also enable taint mode explicitly by

using the −T command line flag. This flag is strongly suggested for server programs and any program run on

behalf of someone else, such as a CGI script. Once taint mode is on, it‘s on for the remainder of your script.

While in this mode, Perl takes special precautions called taint checks to prevent both obvious and subtle

traps. Some of these checks are reasonably simple, such as verifying that path directories aren‘t writable by

others; careful programmers have always used checks like these. Other checks, however, are best supported

by the language itself, and it is these checks especially that contribute to making a set−id Perl program more

secure than the corresponding C program.

You may not use data derived from outside your program to affect something else outside your program—at

least, not by accident. All command line arguments, environment variables, locale information (see

perllocale), results of certain system calls (readdir, readlink, the gecos field of getpw* calls), and all file

input are marked as "tainted". Tainted data may not be used directly or indirectly in any command that

invokes a sub−shell, nor in any command that modifies files, directories, or processes. (Important

exception: If you pass a list of arguments to either system or exec, the elements of that list are NOT

checked for taintedness.) Any variable set to a value derived from tainted data will itself be tainted, even if it

is logically impossible for the tainted data to alter the variable. Because taintedness is associated with each

scalar value, some elements of an array can be tainted and others not.

For example:

    $arg = shift;               # $arg is tainted
    $hid = $arg . 'bar';        # $hid is also tainted
    $line = <>;                 # Tainted
    $line = <STDIN>;            # Also tainted
    open FOO, "/home/me/bar" or die $!;
    $line = <FOO>;              # Still tainted
    $path = $ENV{'PATH'};       # Tainted, but see below
    $data = 'abc';              # Not tainted

    system "echo $arg";         # Insecure
    system "/bin/echo", $arg;   # Secure (doesn't use sh)
    system "echo $hid";         # Insecure
    system "echo $data";        # Insecure until PATH set

    $path = $ENV{'PATH'};       # $path now tainted

    $ENV{'PATH'} = '/bin:/usr/bin';
    delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};

    $path = $ENV{'PATH'};       # $path now NOT tainted
    system "echo $data";        # Is secure now!

    open(FOO, "< $arg");        # OK - read-only file
    open(FOO, "> $arg");        # Not OK - trying to write

    open(FOO,"echo $arg|");     # Not OK, but...
    open(FOO,"-|")
        or exec 'echo', $arg;   # OK

    $shout = `echo $arg`;       # Insecure, $shout now tainted

    unlink $data, $arg;         # Insecure
    umask $arg;                 # Insecure

    exec "echo $arg";           # Insecure
    exec "echo", $arg;          # Secure (doesn't use the shell)
    exec "sh", '-c', $arg;      # Considered secure, alas!

    @files = <*.c>;             # Always insecure (uses csh)
    @files = glob('*.c');       # Always insecure (uses csh)

If you try to do something insecure, you will get a fatal error saying something like "Insecure dependency"

or "Insecure $ENV{PATH}". Note that you can still write an insecure system or exec, but only by

explicitly doing something like the "considered secure" example above.

Laundering and Detecting Tainted Data

To test whether a variable contains tainted data, and whose use would thus trigger an "Insecure dependency"

message, check your nearby CPAN mirror for the Taint.pm module, which should become available around

November 1997. Or you may be able to use the following is_tainted() function.

    sub is_tainted {
        return ! eval {
            join('',@_), kill 0;
            1;
        };
    }

This function makes use of the fact that the presence of tainted data anywhere within an expression renders

the entire expression tainted. It would be inefficient for every operator to test every argument for

taintedness. Instead, the slightly more efficient and conservative approach is used that if any tainted value

has been accessed within the same expression, the whole expression is considered tainted.

But testing for taintedness gets you only so far. Sometimes you just have to clear your data's taintedness.

The only way to bypass the tainting mechanism is by referencing subpatterns from a regular expression

match. Perl presumes that if you reference a substring using $1, $2, etc., that you knew what you were

doing when you wrote the pattern. That means using a bit of thought—don‘t just blindly untaint anything, or

you defeat the entire mechanism. It‘s better to verify that the variable has only good characters (for certain

values of "good") rather than checking whether it has any bad characters. That‘s because it‘s far too easy to

miss bad characters that you never thought of.

Here‘s a test to make sure that the data contains nothing but "word" characters (alphabetics, numerics, and

underscores), a hyphen, an at sign, or a dot.

    if ($data =~ /^([-\@\w.]+)$/) {
        $data = $1;                     # $data now untainted
    } else {
        die "Bad data in $data";        # log this somewhere
    }

This is fairly secure because /\w+/ doesn‘t normally match shell metacharacters, nor are dot, dash, or at

going to mean something special to the shell. Use of /.+/ would have been insecure in theory because it

lets everything through, but Perl doesn‘t check for that. The lesson is that when untainting, you must be

exceedingly careful with your patterns. Laundering data using regular expression is the ONLY mechanism for

untainting dirty data, unless you use the strategy detailed below to fork a child of lesser privilege.
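
The laundering test above can be wrapped in a small function so that every caller applies the same pattern. This is only a sketch: launder_name() is a hypothetical name, and the set of allowed characters is the one from the example, which may be too generous or too strict for your data.

```perl
use strict;

# Hypothetical helper: return the laundered copy of $value if it
# contains only word characters, hyphens, at signs and dots;
# return undef otherwise so that the caller can refuse the input.
sub launder_name {
    my ($value) = @_;
    return undef unless defined $value;
    return $value =~ /^([-\@\w.]+)$/ ? $1 : undef;
}

my $good = launder_name('larry@example.com');
my $bad  = launder_name('x; rm -rf /');
print defined $good ? "accepted: $good\n" : "rejected\n";
print defined $bad  ? "accepted: $bad\n"  : "rejected\n";
```

Returning the captured $1 rather than the original argument is what matters here: under taint mode only the captured substring is considered clean.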

The example does not untaint $data if use locale is in effect, because the characters matched by \w

are determined by the locale. Perl considers that locale definitions are untrustworthy because they contain

data from outside the program. If you are writing a locale−aware program, and want to launder data with a

regular expression containing \w, put no locale ahead of the expression in the same block. See

the SECURITY section of perllocale for further discussion and examples.

Switches On the "#!" Line

When you make a script executable, in order to make it usable as a command, the system will pass switches

to perl from the script‘s #! line. Perl checks that any command line switches given to a setuid (or setgid)

script actually match the ones set on the #! line. Some Unix and Unix−like environments impose a

one−switch limit on the #! line, so you may need to use something like −wU instead of −w −U under such

systems. (This issue should arise only in Unix or Unix−like environments that support #! and setuid or

setgid scripts.)

Cleaning Up Your Path

For "Insecure $ENV{PATH}" messages, you need to set $ENV{'PATH'} to a known value, and each

directory in the path must be non−writable by others than its owner and group. You may be surprised to get

this message even if the pathname to your executable is fully qualified. This is not generated because you

didn‘t supply a full path to the program; instead, it‘s generated because you never set your PATH

environment variable, or you didn‘t set it to something that was safe. Because Perl can‘t guarantee that the

executable in question isn‘t itself going to turn around and execute some other program that is dependent on

your PATH, it makes sure you set the PATH.

The PATH isn‘t the only environment variable which can cause problems. Because some shells may use the

variables IFS, CDPATH, ENV, and BASH_ENV, Perl checks that those are either empty or untainted when

starting subprocesses. You may wish to add something like this to your setid and taint−checking scripts.

delete @ENV{qw(IFS CDPATH ENV BASH_ENV)}; # Make %ENV safer

It‘s also possible to get into trouble with other operations that don‘t care whether they use tainted values.

Make judicious use of the file tests in dealing with any user−supplied filenames. When possible, do opens

and such after properly dropping any special user (or group!) privileges. Perl doesn‘t prevent you from

opening tainted filenames for reading, so be careful what you print out. The tainting mechanism is intended

to prevent stupid mistakes, not to remove the need for thought.

Perl does not call the shell to expand wild cards when you pass system and exec explicit parameter lists

instead of strings with possible shell wildcards in them. Unfortunately, the open, glob, and backtick

functions provide no such alternate calling convention, so more subterfuge will be required.

Perl provides a reasonably safe way to open a file or pipe from a setuid or setgid program: just create a child

process with reduced privilege who does the dirty work for you. First, fork a child using the special open

syntax that connects the parent and child by a pipe. Now the child resets its ID set and any other per−process

attributes, like environment variables, umasks, current working directories, back to the originals or known

safe values. Then the child process, which no longer has any special permissions, does the open or other

system call. Finally, the child passes the data it managed to access back to the parent. Because the file or

pipe was opened in the child while running under less privilege than the parent, it‘s not apt to be tricked into

doing something it shouldn‘t.

Here‘s a way to do backticks reasonably safely. Notice how the exec is not called with a string that the shell

could expand. This is by far the best way to call something that might be subjected to shell escapes: just

never call the shell at all.

    use English;
    die "Can't fork: $!" unless defined($pid = open(KID, "-|"));
    if ($pid) {                 # parent
        while (<KID>) {
            # do something
        }
        close KID;
    } else {
        my @temp = ($EUID, $EGID);
        $EUID = $UID;
        $EGID = $GID;           # initgroups() also called!
        # Make sure privs are really gone
        ($EUID, $EGID) = @temp;
        die "Can't drop privileges"
            unless $UID == $EUID && $GID eq $EGID;
        $ENV{PATH} = "/bin:/usr/bin";
        exec 'myprog', 'arg1', 'arg2'
            or die "can't exec myprog: $!";
    }

A similar strategy would work for wildcard expansion via glob, although you can use readdir instead.
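
As a sketch of the readdir alternative, the following function does the work of a simple *.c wildcard without starting any external program. The name files_with_suffix() is hypothetical, and the function handles only a fixed suffix, not general wildcard patterns.

```perl
use strict;

# Hypothetical helper: return the sorted names of the plain files in
# $dir whose names end in $suffix.  No shell is involved.
sub files_with_suffix {
    my ($dir, $suffix) = @_;
    local *DIR;
    opendir(DIR, $dir) or die "Can't opendir $dir: $!";
    my @names = sort grep { /\Q$suffix\E$/ && -f "$dir/$_" } readdir(DIR);
    closedir(DIR);
    return @names;
}

print join("\n", files_with_suffix('.', '.c')), "\n";
```

Quoting the suffix with \Q...\E keeps the dot from being treated as a regular expression metacharacter.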

Taint checking is most useful when, although you trust yourself not to have written a program to give away

the farm, you don‘t necessarily trust those who end up using it not to try to trick it into doing something bad.

This is the kind of security checking that‘s useful for set−id programs and programs launched on someone

else‘s behalf, like CGI programs.

This is quite different, however, from not even trusting the writer of the code not to try to do something evil.

That‘s the kind of trust needed when someone hands you a program you‘ve never seen before and says,

"Here, run this." For that kind of safety, check out the Safe module, included standard in the Perl

distribution. This module allows the programmer to set up special compartments in which all system

operations are trapped and namespace access is carefully controlled.
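
A minimal sketch of a compartment follows, assuming the default operator mask that Safe applies when no other mask is requested. The code strings are arbitrary examples; reval() evaluates a string inside the compartment and leaves any error in $@.

```perl
use strict;
use Safe;

my $compartment = Safe->new;

# Ordinary arithmetic is permitted inside the compartment.
my $sum = $compartment->reval('2 + 3');

# Starting another program is expected to be refused by the default
# mask, so reval() returns undef and sets $@.
my $blocked = $compartment->reval('system("echo hello")');
my $why     = $@;

print "sum=$sum\n";
print "refused: $why" unless defined $blocked;
```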

Security Bugs

Beyond the obvious problems that stem from giving special privileges to systems as flexible as scripts, on

many versions of Unix, set−id scripts are inherently insecure right from the start. The problem is a race

condition in the kernel. Between the time the kernel opens the file to see which interpreter to run and when

the (now−set−id) interpreter turns around and reopens the file to interpret it, the file in question may have

changed, especially if you have symbolic links on your system.

Fortunately, sometimes this kernel "feature" can be disabled. Unfortunately, there are two ways to disable it.

The system can simply outlaw scripts with any set−id bit set, which doesn‘t help much. Alternately, it can

simply ignore the set−id bits on scripts. If the latter is true, Perl can emulate the setuid and setgid

mechanism when it notices the otherwise useless setuid/gid bits on Perl scripts. It does this via a special

executable called suidperl that is automatically invoked for you if it‘s needed.

However, if the kernel set−id script feature isn‘t disabled, Perl will complain loudly that your set−id script is

insecure. You‘ll need to either disable the kernel set−id script feature, or put a C wrapper around the script.

A C wrapper is just a compiled program that does nothing except call your Perl program. Compiled

programs are not subject to the kernel bug that plagues set−id scripts. Here‘s a simple wrapper, written in C:

    #include <unistd.h>

    #define REAL_PATH "/path/to/script"

    int main(int ac, char **av)
    {
        execv(REAL_PATH, av);
        return 1;       /* reached only if execv() fails */
    }

Compile this wrapper into a binary executable and then make it rather than your script setuid or setgid.

See the program wrapsuid in the eg directory of your Perl distribution for a convenient way to do this

automatically for all your setuid Perl programs. It moves setuid scripts into files with the same name plus a

leading dot, and then compiles a wrapper like the one above for each of them.

In recent years, vendors have begun to supply systems free of this inherent security bug. On such systems,

when the kernel passes the name of the set−id script to open to the interpreter, rather than using a pathname

subject to meddling, it instead passes /dev/fd/3. This is a special file already opened on the script, so that

there can be no race condition for evil scripts to exploit. On these systems, Perl should be compiled with

−DSETUID_SCRIPTS_ARE_SECURE_NOW. The Configure program that builds Perl tries to figure this

out for itself, so you should never have to specify this yourself. Most modern releases of SysVr4 and BSD

4.4 use this approach to avoid the kernel race condition.

Prior to release 5.003 of Perl, a bug in the code of suidperl could introduce a security hole in systems

compiled with strict POSIX compliance.

Protecting Your Programs

There are a number of ways to hide the source to your Perl programs, with varying levels of "security".

First of all, however, you can‘t take away read permission, because the source code has to be readable in

order to be compiled and interpreted. (That doesn‘t mean that a CGI script‘s source is readable by people on

the web, though.) So you have to leave the permissions at the socially friendly 0755 level. This lets only
people on your local system see your source.

Some people mistakenly regard this as a security problem. If your program does insecure things, and relies

on people not knowing how to exploit those insecurities, it is not secure. It is often possible for someone to

determine the insecure things and exploit them without viewing the source. Security through obscurity, the

name for hiding your bugs instead of fixing them, is little security indeed.

You can try using encryption via source filters (Filter::* from CPAN). But crackers might be able to decrypt

it. You can try using the byte code compiler and interpreter described below, but crackers might be able to

de−compile it. You can try using the native−code compiler described below, but crackers might be able to

disassemble it. These pose varying degrees of difficulty to people wanting to get at your code, but none can

definitively conceal it (this is true of every language, not just Perl).

If you‘re concerned about people profiting from your code, then the bottom line is that nothing but a

restrictive licence will give you legal security. License your software and pepper it with threatening

statements like "This is unpublished proprietary software of XYZ Corp. Your access to it does not give you

permission to use it blah blah blah." You should see a lawyer to be sure your licence‘s wording will stand up

in court.

SEE ALSO

perlrun for its description of cleaning up environment variables.

perltrap Perl Programmers Reference Guide perltrap

NAME

perltrap − Perl traps for the unwary

DESCRIPTION

The biggest trap of all is forgetting to use the −w switch; see perlrun. The second biggest trap is not making

your entire program runnable under use strict. The third biggest trap is not reading the list of changes

in this version of Perl; see perldelta.

Awk Traps

Accustomed awk users should take special note of the following:

The English module, loaded via

use English;

allows you to refer to special variables (like $/) with names (like $RS), as though they were in awk;

see perlvar for details.

Semicolons are required after all simple statements in Perl (except at the end of a block). Newline is

not a statement delimiter.

Curly brackets are required on ifs and whiles.

Variables begin with "$", "@" or "%" in Perl.

Arrays index from 0. Likewise string positions in substr() and index().

You have to decide whether your array has numeric or string indices.

Hash values do not spring into existence upon mere reference.

You have to decide whether you want to use string or numeric comparisons.

Reading an input line does not split it for you. You get to split it to an array yourself. And the

split() operator has different arguments than awk‘s.

The current input line is normally in $_, not $0. It generally does not have the newline stripped.

($0 is the name of the program executed.) See perlvar.

$<digit> does not refer to fields; it refers to substrings matched by the last match pattern.

The print() statement does not add field and record separators unless you set $, and $\. You can

set $OFS and $ORS if you‘re using the English module.

You must open your files before you print to them.

The range operator is "..", not comma. The comma operator works as in C.

The match operator is "=~", not "~". ("~" is the one‘s complement operator, as in C.)

The exponentiation operator is "**", not "^". "^" is the XOR operator, as in C. (You know, one could

get the feeling that awk is basically incompatible with C.)

The concatenation operator is ".", not the null string. (Using the null string would render /pat/

/pat/ unparsable, because the third slash would be interpreted as a division operator—the tokenizer

is in fact slightly context sensitive for operators like "/", "?", and ">". And in fact, "." itself can be the

beginning of a number.)

The next, exit, and continue keywords work differently.

The following variables work differently:

    Awk         Perl
    ARGC        $#ARGV or scalar @ARGV
    ARGV[0]     $0
    FILENAME    $ARGV
    FNR         $. - something
    FS          (whatever you like)
    NF          $#Fld, or some such
    NR          $.
    OFMT        $#
    OFS         $,
    ORS         $\
    RLENGTH     length($&)
    RS          $/
    RSTART      length($`)
    SUBSEP      $;

You cannot set $RS to a pattern, only a string.

When in doubt, run the awk construct through a2p and see what it gives you.
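
Several of the points above come together in even a tiny program. The sketch below is roughly what the awk program { total += $2 } END { print total } turns into; sum_second_field() is a hypothetical name, and the input lines are passed as a list to keep the example self-contained.

```perl
use strict;

# Hypothetical helper: add up the second whitespace-separated field
# of each line.
sub sum_second_field {
    my @lines = @_;
    my $total = 0;
    foreach my $line (@lines) {
        chomp $line;                    # the newline is not stripped for you
        my @field = split ' ', $line;   # the line is not split for you
        next unless @field > 1;         # skip lines with no second field
        $total += $field[1];            # arrays index from 0: this is awk's $2
    }
    return $total;
}

print sum_second_field("a 1\n", "b 2\n", "c 3\n"), "\n";
```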

C Traps

Cerebral C programmers should take note of the following:

Curly brackets are required on if's and while's.

You must use elsif rather than else if.

The break and continue keywords from C become in Perl last and next, respectively. Unlike

in C, these do NOT work within a do { } while construct.

There‘s no switch statement. (But it‘s easy to build one on the fly.)

Variables begin with "$", "@" or "%" in Perl.

printf() does not implement the "*" format for interpolating field widths, but it‘s trivial to use

interpolation of double−quoted strings to achieve the same effect.

Comments begin with "#", not "/*".

You can‘t take the address of anything, although a similar operator in Perl is the backslash, which

creates a reference.

ARGV must be capitalized. $ARGV[0] is C‘s argv[1], and argv[0] ends up in $0.

System calls such as link(), unlink(), rename(), etc. return nonzero for success, not 0.

Signal handlers deal with signal names, not numbers. Use kill −l to find their names on your

system.
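A few of the C traps above, shown as a short sketch. The subroutine and variable names are hypothetical, and the "switch" shown is just one of several common idioms, not the only way to build one.

```perl
use strict;

# A "switch" built on the fly from a labeled bare block and "last".
sub classify {
    my ($n) = @_;
    my $kind;
    SWITCH: {
        if ($n < 0)  { $kind = "negative"; last SWITCH; }
        if ($n == 0) { $kind = "zero";     last SWITCH; }
        $kind = "positive";                 # the default case
    }
    return $kind;
}

# elsif, not "else if"; braces are required even for one statement.
sub sign {
    my ($n) = @_;
    if    ($n < 0)  { return -1; }
    elsif ($n == 0) { return 0;  }
    else            { return 1;  }
}

# A field width interpolated into the format string.
my $width  = 6;
my $padded = sprintf("%${width}d", 42);     # 42 right-justified in 6 columns

# Backslash creates a reference; $$ref dereferences it.
my $count = 1;
my $ref   = \$count;
$$ref++;                                    # $count is now 2
```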

Sed Traps

Seasoned sed programmers should take note of the following:

Backreferences in substitutions use "$" rather than "\".

The pattern matching metacharacters "(", ")", and "|" do not have backslashes in front.

The range operator is ..., rather than comma.
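The three sed differences can be sketched as follows. The sample strings are made up for illustration.

```perl
use strict;

my $name = "Wall, Larry";

# sed: s/\(.*\), \(.*\)/\2 \1/   Perl: plain parens, $1 and $2 on the right.
(my $swapped = $name) =~ s/(\w+), (\w+)/$2 $1/;    # "Larry Wall"

# "|" is alternation without a backslash.
my $is_pet = ("dog" =~ /^(cat|dog)$/) ? 1 : 0;     # 1

# sed's /BEGIN/,/END/ address range becomes the "..." operator.
my @picked;
foreach my $line ("a", "BEGIN", "b", "END", "c") {
    push @picked, $line if $line =~ /BEGIN/ ... $line =~ /END/;
}
# @picked now holds BEGIN, b, END
```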

Shell Traps

Sharp shell programmers should take note of the following:

The backtick operator does variable interpolation without regard to the presence of single quotes in the

command.

The backtick operator does no translation of the return value, unlike csh.

Shells (especially csh) do several levels of substitution on each command line. Perl does substitution

in only certain constructs such as double quotes, backticks, angle brackets, and search patterns.

Shells interpret scripts a little bit at a time. Perl compiles the entire program before executing it

(except for BEGIN blocks, which execute at compile time).

The arguments are available via @ARGV, not $1, $2, etc.

The environment is not automatically made available as separate scalar variables.
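The last two points, together with the quoting rules, might look like this in practice. It is a sketch only; the fallback strings are invented for the example.

```perl
use strict;

# Command-line arguments live in @ARGV; $ARGV[0] is the shell's $1.
my $first = @ARGV ? $ARGV[0] : "(no arguments)";

# Environment variables are read through the %ENV hash, not as $HOME etc.
my $home = exists $ENV{HOME} ? $ENV{HOME} : "(HOME not set)";

# Single quotes do not interpolate; double quotes do.
my $user    = "larry";
my $literal = 'Hello, $user';      # the text Hello, $user
my $filled  = "Hello, $user";      # Hello, larry

print "first argument: $first\n";
print "home directory: $home\n";
```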

Perl Traps

Practicing Perl Programmers should take note of the following:

Remember that many operations behave differently in a list context than they do in a scalar one. See

perldata for details.

Avoid barewords if you can, especially all lowercase ones. You can‘t tell by just looking at it whether

a bareword is a function or a string. By using quotes on strings and parentheses on function calls, you

won‘t ever get them confused.

You cannot discern from mere inspection which builtins are unary operators (like chop() and

chdir()) and which are list operators (like print() and unlink()). (User−defined subroutines

can be only list operators, never unary ones.) See perlop.

People have a hard time remembering that some functions default to $_, or @ARGV, or whatever,

but that others which you might expect to do not.

The <FH> construct is not the name of the filehandle, it is a readline operation on that handle. The

data read is assigned to $_ only if the file read is the sole condition in a while loop:

while (<FH>) { }

while (defined($_ = <FH>)) { }

<FH>; # data discarded!

Remember not to use "=" when you need "=~"; these two constructs are quite different:

$x = /foo/;

$x =~ /foo/;

The do {} construct isn‘t a real loop that you can use loop control on.

Use my() for local variables whenever you can get away with it (but see perlform for where you

can‘t). Using local() actually gives a local value to a global variable, which leaves you open to

unforeseen side−effects of dynamic scoping.

If you localize an exported variable in a module, its exported value will not change. The local name

becomes an alias to a new value but the external name is still an alias for the original.
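The difference between my() and local() described above can be sketched as follows. The variable $setting and the subroutine names are hypothetical.

```perl
use strict;
use vars qw($setting);              # a package (global) variable

$setting = "global";

sub show { return $setting }        # always reads the package variable

sub with_local {
    local $setting = "localized";   # temporary value, visible to callees
    return show();
}

sub with_my {
    my $setting = "lexical";        # new variable, invisible to callees
    return show();
}

my $seen_local = with_local();      # "localized"
my $seen_my    = with_my();         # "global"
my $afterwards = show();            # "global" again
```

The call through with_local() is the dynamic-scoping side effect to watch for: show() never mentions local(), yet it sees the changed value.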

Perl4 to Perl5 Traps

Practicing Perl4 Programmers should take note of the following Perl4−to−Perl5 specific traps.

They‘re crudely ordered according to the following list:

Discontinuance, Deprecation, and BugFix traps

Anything that‘s been fixed as a perl4 bug, removed as a perl4 feature or deprecated as a perl4 feature

with the intent to encourage usage of some other perl5 feature.

Parsing Traps

Traps that appear to stem from the new parser.

Numerical Traps

Traps having to do with numerical or mathematical operators.

General data type traps

Traps involving perl standard data types.

Context Traps − scalar, list contexts

Traps related to context within lists, scalar statements/declarations.

Precedence Traps

Traps related to the precedence of parsing, evaluation, and execution of code.

General Regular Expression Traps using s///, etc.

Traps related to the use of pattern matching.

Subroutine, Signal, Sorting Traps

Traps related to the use of signals and signal handlers, general subroutines, and sorting, along with

sorting subroutines.

OS Traps

OS−specific traps.

DBM Traps

Traps specific to the use of dbmopen(), and specific dbm implementations.

Unclassified Traps

Everything else.

If you find an example of a conversion trap that is not listed here, please submit it to Bill Middleton

<wjm@best.com> for inclusion. Also note that at least some of these can be caught with -w.

Discontinuance, Deprecation, and BugFix traps

Anything that has been discontinued, deprecated, or fixed as a bug from perl4.

Discontinuance

Symbols starting with "_" are no longer forced into package main, except for $_ itself (and @_, etc.).

package test;

$_legacy = 1;

package main;

print "\$_legacy is ",$_legacy,"\n";

# perl4 prints: $_legacy is 1

# perl5 prints: $_legacy is

Deprecation

Double−colon is now a valid package separator in a variable name. Thus these behave differently in

perl4 vs. perl5, because the packages don‘t exist.

$a=1;$b=2;$c=3;$var=4;

print "$a::$b::$c ";

print "$var::abc::xyz\n";

# perl4 prints: 1::2::3 4::abc::xyz

# perl5 prints: 3

Given that :: is now the preferred package delimiter, it is debatable whether this should be classed as

a bug or not. (The older package delimiter, ', is used here.)

$x = 10 ;

print "x=${’x}\n" ;

# perl4 prints: x=10

# perl5 prints: Can’t find string terminator "’" anywhere before EOF

You can avoid this problem, and remain compatible with perl4, if you always explicitly include the

package name:

$x = 10 ;

print "x=${main’x}\n" ;

Also see precedence traps, for parsing $:.

BugFix

The second and third arguments of splice() are now evaluated in scalar context (as the Camel says)

rather than list context.

sub sub1{return(0,2) } # return a 2−element list

sub sub2{ return(1,2,3)} # return a 3−element list

@a1 = ("a","b","c","d","e");

@a2 = splice(@a1,&sub1,&sub2);

print join(’ ’,@a2),"\n";

# perl4 prints: a b

# perl5 prints: c d e

Discontinuance

You can‘t do a goto into a block that is optimized away. Darn.

goto marker1;

for(1){

marker1:

print "Here I is!\n";

}

# perl4 prints: Here I is!

# perl5 dumps core (SEGV)

Discontinuance

It is no longer syntactically legal to use whitespace as the name of a variable, or as a delimiter for any

kind of quote construct. Double darn.

$a = ("foo bar");

$b = q baz ;

print "a is $a, b is $b\n";

# perl4 prints: a is foo bar, b is baz

# perl5 errors: Bareword found where operator expected

Discontinuance

The archaic while/if BLOCK BLOCK syntax is no longer supported.

if { 1 } {

print "True!";

}

else {

print "False!";

}

# perl4 prints: True!

# perl5 errors: syntax error at test.pl line 1, near "if {"

BugFix

The ** operator now binds more tightly than unary minus. It was documented to work this way before,

but didn‘t.

print −4**2,"\n";

# perl4 prints: 16

# perl5 prints: −16
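When the perl4 result is the one wanted, parentheses make the intent explicit. A minimal sketch:

```perl
use strict;

my $negated_square = -4 ** 2;      # parsed as -(4 ** 2), which is -16
my $squared_neg    = (-4) ** 2;    # parentheses force 16
```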

Discontinuance

The meaning of foreach{} has changed slightly when it is iterating over a list which is not an array.

This used to assign the list to a temporary array, but no longer does so (for efficiency). This means

that you‘ll now be iterating over the actual values, not over copies of the values. Modifications to the

loop variable can change the original values.

@list = (’ab’,’abc’,’bcd’,’def’);

foreach $var (grep(/ab/,@list)){

$var = 1;

}

print (join(’:’,@list));

# perl4 prints: ab:abc:bcd:def

# perl5 prints: 1:1:bcd:def

To retain Perl4 semantics you need to assign your list explicitly to a temporary array and then iterate

over that. For example, you might need to change

foreach $var (grep(/ab/,@list)){

to

foreach $var (@tmp = grep(/ab/,@list)){

Otherwise changing $var will clobber the values of @list. (This most often happens when you use

$_ for the loop variable, and call subroutines in the loop that don‘t properly localize $_.)
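The two forms can be compared side by side. This sketch uses my() variables and a separate @tmp array, which are choices made for the illustration rather than part of the original example.

```perl
use strict;

my @list = ('ab', 'abc', 'bcd', 'def');

# Iterating over a copy leaves @list untouched.
my @tmp = grep(/ab/, @list);
foreach my $var (@tmp) {
    $var = 1;
}
my $after_copy = join(':', @list);     # "ab:abc:bcd:def"

# Iterating over the grep result directly aliases elements of @list.
foreach my $var (grep(/ab/, @list)) {
    $var = 1;
}
my $after_alias = join(':', @list);    # "1:1:bcd:def"
```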

Discontinuance

split with no arguments now behaves like split ' ' (which doesn't return an initial null field if

$_ starts with whitespace); it used to behave like split /\s+/ (which does).

$_ = ’ hi mom’;

print join(’:’, split);

# perl4 prints: :hi:mom

# perl5 prints: hi:mom
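If the old behavior is needed, the pattern can be given explicitly. A short sketch under Perl 5:

```perl
use strict;

$_ = ' hi mom';

my $default  = join(':', split);           # "hi:mom"
my $explicit = join(':', split(/\s+/));    # ":hi:mom", leading null field kept
```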

BugFix

Perl 4 would ignore any text which was attached to an −e switch, always taking the code snippet from

the following arg. Additionally, it would silently accept an −e switch without a following arg. Both of

these behaviors have been fixed.

perl −e’print "attached to −e"’ ’print "separate arg"’

# perl4 prints: separate arg

# perl5 prints: attached to −e

perl −e

# perl4 prints:

# perl5 dies: No code specified for −e.

Discontinuance

In Perl 4 the return value of push was undocumented, but it was actually the last value being pushed

onto the target list. In Perl 5 the return value of push is documented, but has changed: it is now the

number of elements in the resulting list.

@x = (’existing’);

print push(@x, ’first new’, ’second new’);

# perl4 prints: second new

# perl5 prints: 3
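Code that relied on the old return value can fetch the last element directly instead. A minimal sketch, with invented variable names:

```perl
use strict;

my @x = ('existing');
my $count = push(@x, 'first new', 'second new');   # 3, the new length
my $last  = $x[-1];                                # 'second new'
```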

Discontinuance

In Perl 4 (and versions of Perl 5 before 5.004), ‘\r’ characters in Perl code were silently allowed,

although they could cause (mysterious!) failures in certain constructs, particularly here documents.

Now, ‘\r’ characters cause an immediate fatal error. (Note: In this example, the notation \015

represents the incorrect line ending. Depending upon your text viewer, it will look different.)

print "foo";\015

print "bar";

# perl4 prints: foobar

# perl5.003 prints: foobar

# perl5.004 dies: Illegal character \015 (carriage return)

See perldiag for full details.

Deprecation

Some error messages will be different.

Discontinuance

Some bugs may have been inadvertently removed. :−)

Parsing Traps

Perl4-to-Perl5 traps having to do with parsing.

Parsing

Note the space between . and =

$string . = "more string";

print $string;

# perl4 prints: more string

# perl5 prints: syntax error at − line 1, near ". ="

Parsing

Better parsing in perl 5

sub foo {}

&foo

print("hello, world\n");

# perl4 prints: hello, world

# perl5 prints: syntax error

Parsing

"if it looks like a function, it is a function" rule.

print

($foo == 1) ? "is one\n" : "is zero\n";

# perl4 prints: is zero

# perl5 warns: "Useless use of a constant in void context" if using −w

Parsing

String interpolation of the $#array construct differs when braces are used around the name.

@a = (1..3);

print "${#a}";

# perl4 prints: 2

# perl5 fails with syntax error

@a = (1..3);

print "$#{a}";

# perl4 prints: {a}

# perl5 prints: 2

Numerical Traps

Perl4−to−Perl5 traps having to do with numerical operators, operands, or output from same.

Numerical

Formatted output and significant digits

print 7.373504 − 0, "\n";

printf "%20.18f\n", 7.373504 − 0;

# Perl4 prints:

7.375039999999996141

7.37503999999999614

# Perl5 prints:

7.373504

7.37503999999999614

Numerical

This specific item has been deleted. It demonstrated how the auto−increment operator would not

catch when a number went over the signed int limit. Fixed in version 5.003_04. But always be wary

when using large integers. If in doubt:

use Math::BigInt;

Numerical

Assignment of return values from numeric equality tests does not work in perl5 when the test

evaluates to false (0). Logical tests now return a null (empty) string instead of 0.

$p = ($test == 1);

print $p,"\n";

# perl4 prints: 0

# perl5 prints:

Also see "General Regular Expression Traps using s///, etc." for another example of this new feature.

General data type traps

Perl4−to−Perl5 traps involving most data−types, and their usage within certain expressions and/or context.

(Arrays)

Negative array subscripts now count from the end of the array.

@a = (1, 2, 3, 4, 5);

print "The fourth element of the array is $a[3] also expressed as $a[-2] \n";

# perl4 prints: The fourth element of the array is 4 also expressed as

# perl5 prints: The fourth element of the array is 4 also expressed as 4
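Counting from the end under Perl 5 can be sketched like this (the variable names are illustrative):

```perl
use strict;

my @a = (1, 2, 3, 4, 5);

my $last        = $a[-1];    # 5, same as $a[$#a]
my $second_last = $a[-2];    # 4, same as $a[3]
```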

(Arrays)

Setting $#array lower now discards array elements, and makes them impossible to recover.

@a = (a,b,c,d,e);

print "Before: ",join(’’,@a);

$#a =1;

print ", After: ",join(’’,@a);

$#a =3;

print ", Recovered: ",join(’’,@a),"\n";

# perl4 prints: Before: abcde, After: ab, Recovered: abcd

# perl5 prints: Before: abcde, After: ab, Recovered: ab

(Hashes)

Hashes get defined before use

local($s,@a,%h);

die "scalar \$s defined" if defined($s);

die "array \@a defined" if defined(@a);

die "hash \%h defined" if defined(%h);

# perl4 prints:

# perl5 dies: hash %h defined

(Globs)

glob assignment from variable to variable will fail if the assigned variable is localized subsequent to

the assignment

@a = ("This is Perl 4");

*b = *a;

local(@a);

print @b,"\n";

# perl4 prints: This is Perl 4

# perl5 prints:

(Globs)

Assigning undef to a glob has no effect in Perl 5. In Perl 4 it undefines the associated scalar (but

may have other side effects including SEGVs).

(Scalar String)

Changes in unary negation (of strings). This change affects both the return value and what it does to

auto(magic)increment.

$x = "aaa";

print ++$x," : ";

print −$x," : ";

print ++$x,"\n";

# perl4 prints: aab : −0 : 1

# perl5 prints: aab : −aab : aac

(Constants)

perl 4 lets you modify constants:

$foo = "x";

&mod($foo);

for ($x = 0; $x < 3; $x++) {

&mod("a");

}

sub mod {

print "before: $_[0]";

$_[0] = "m";

print " after: $_[0]\n";

}

# perl4:

# before: x after: m

# before: a after: m

# before: m after: m

# Perl5:

# before: x after: m

# Modification of a read−only value attempted at foo.pl line 12.

# before: a

(Scalars)

The behavior is slightly different for:

print "$x", defined $x

# perl 4: 1

# perl 5: <no output, $x is not called into existence>

(Variable Suicide)

Variable suicide behavior is more consistent under Perl 5. Perl5 exhibits the same behavior for hashes

and scalars, that perl4 exhibits for only scalars.

$aGlobal{ "aKey" } = "global value";

print "MAIN:", $aGlobal{"aKey"}, "\n";

$GlobalLevel = 0;

&test( *aGlobal );

sub test {

local( *theArgument ) = @_;

local( %aNewLocal ); # perl 4 != 5.001l,m

$aNewLocal{"aKey"} = "this should never appear";

print "SUB: ", $theArgument{"aKey"}, "\n";

$aNewLocal{"aKey"} = "level $GlobalLevel"; # what should print

$GlobalLevel++;

if( $GlobalLevel<4 ) {

&test( *aNewLocal );

}

}

# Perl4:

# MAIN:global value

# SUB: global value

# SUB: level 0

# SUB: level 1

# SUB: level 2

# Perl5:

# MAIN:global value

# SUB: global value

# SUB: this should never appear

# SUB: this should never appear

Context Traps − scalar, list contexts

(list context)

The elements of argument lists for formats are now evaluated in list context. This means you can

interpolate list values now.

@fmt = ("foo","bar","baz");

format STDOUT=

@<<<<< @||||| @>>>>>

@fmt;

write;

# perl4 errors: Please use commas to separate fields in file

# perl5 prints: foo bar baz

(scalar context)

The caller() function now returns a false value in a scalar context if there is no caller. This lets

library files determine if they‘re being required.

caller() ? (print "You rang?\n") : (print "Got a 0\n");

# perl4 errors: There is no caller

# perl5 prints: Got a 0

(scalar context)

The comma operator in a scalar context is now guaranteed to give a scalar context to its arguments.

@y= (’a’,’b’,’c’);

$x = (1, 2, @y);

print "x = $x\n";

# Perl4 prints: x = c # Thinks list context interpolates list

# Perl5 prints: x = 3 # Knows scalar uses length of list

(list, builtin)

sprintf() funkiness (array argument converted to scalar array count). This test could be added to

t/op/sprintf.t.

@z = (’%s%s’, ’foo’, ’bar’);

$x = sprintf(@z);

if ($x eq ’foobar’) {print "ok 2\n";} else {print "not ok 2 ’$x’\n";}

# perl4 prints: ok 2

# perl5 prints: not ok 2

printf() works fine, though:

printf STDOUT (@z);

print "\n";

# perl4 prints: foobar

# perl5 prints: foobar

Probably a bug.
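One way to sidestep the problem is to separate the format from its arguments before calling sprintf(). This is a sketch of a workaround, not a fix to sprintf() itself; $format and @args are names chosen for the example.

```perl
use strict;

my @z = ('%s%s', 'foo', 'bar');

# The array is taken in scalar context, so the format becomes its element count.
my $wrong = sprintf(@z);

# Pulling the format out first gives sprintf() what it expects.
my ($format, @args) = @z;
my $right = sprintf($format, @args);    # "foobar"
```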

Precedence Traps

Perl4−to−Perl5 traps involving precedence order.

Perl 4 has almost the same precedence rules as Perl 5 for the operators that they both have. Perl 4, however,

seems to have had some inconsistencies that made the behavior differ from what was documented.

Precedence

LHS vs. RHS of any assignment operator. LHS is evaluated first in perl4, second in perl5; this can

affect the relationship between side−effects in sub−expressions.

@arr = ( ’left’, ’right’ );

$a{shift @arr} = shift @arr;

print join( ’ ’, keys %a );

# perl4 prints: left

# perl5 prints: right

Precedence

These are now semantic errors because of precedence:

@list = (1,2,3,4,5);

%map = ("a",1,"b",2,"c",3,"d",4);

$n = shift @list + 2; # first item in list plus 2

print "n is $n, ";

$m = keys %map + 2; # number of items in hash plus 2

print "m is $m\n";

# perl4 prints: n is 3, m is 6

# perl5 errors and fails to compile

Precedence

The precedence of assignment operators is now the same as the precedence of assignment. Perl 4

mistakenly gave them the precedence of the associated operator. So you now must parenthesize them

in expressions like

/foo/ ? ($a += 2) : ($a −= 2);

Otherwise

/foo/ ? $a += 2 : $a −= 2

would be erroneously parsed as

(/foo/ ? $a += 2 : $a) −= 2;

On the other hand,

$a += /foo/ ? 1 : 2;

now works as a C programmer would expect.
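Both forms can be checked with a short sketch. The variables $n and $m stand in for $a above, only to keep the example clean under use strict.

```perl
use strict;

$_ = "foo";

my $n = 10;
/foo/ ? ($n += 2) : ($n -= 2);     # the match succeeds, so $n becomes 12

my $m = 10;
$m += /foo/ ? 1 : 2;               # adds 1 on a match, so $m becomes 11
```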

Precedence

open FOO || die;

is now incorrect. You need parentheses around the filehandle. Otherwise, perl5 leaves the statement

as its default precedence:

open(FOO || die);

# perl4 opens or dies

# perl5 errors: Precedence problem: open FOO should be open(FOO)

Precedence

perl4 gives the special variable, $: precedence, where perl5 treats $:: as main package

$a = "x"; print "$::a";

# perl 4 prints: −:a

# perl 5 prints: x

Precedence

perl4 had buggy precedence for the file test operators vis−a−vis the assignment operators. Thus,

although the precedence table for perl4 leads one to believe −e $foo .= "q" should parse as

((−e $foo) .= "q"), it actually parses as (−e ($foo .= "q")). In perl5, the precedence

is as documented.

−e $foo .= "q"

# perl4 prints: no output

# perl5 prints: Can’t modify −e in concatenation

Precedence

In perl4, keys(), each() and values() were special high−precedence operators that operated

on a single hash, but in perl5, they are regular named unary operators. As documented, named unary

operators have lower precedence than the arithmetic and concatenation operators + − ., but the

perl4 variants of these operators actually bind tighter than + − .. Thus, for:

%foo = 1..10;

print keys %foo − 1

# perl4 prints: 4

# perl5 prints: Type of arg 1 to keys must be hash (not subtraction)

The perl4 behavior was probably more useful, if less consistent.
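Under perl5 the perl4 result can be had by parenthesizing the argument. A minimal sketch, with invented variable names:

```perl
use strict;

my %foo = (1 .. 10);                    # five key/value pairs

my $one_less = keys(%foo) - 1;          # 5 keys, minus 1, is 4
my $same     = scalar(keys %foo) - 1;   # also 4
```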

General Regular Expression Traps using s///, etc.

All types of RE traps.

Regular Expression

s'$lhs'$rhs' now does no interpolation on either side. It used to interpolate $lhs but not

$rhs. (And it still does not match a literal '$' in the string.)

$a=1;$b=2;

$string = ’1 2 $a $b’;

$string =~ s’$a’$b’;

print $string,"\n";

# perl4 prints: $b 2 $a $b

# perl5 prints: 1 2 $a $b

Regular Expression

m//g now attaches its state to the searched string rather than the regular expression. (Once the scope

of a block is left for the sub, the state of the searched string is lost)

$_ = "ababab";

while(m/ab/g){

&doit("blah");

}

sub doit{local($_) = shift; print "Got $_ "}

# perl4 prints: blah blah blah

# perl5 prints: infinite loop blah...

Regular Expression

Currently, if you use the m//o qualifier on a regular expression within an anonymous sub, all

closures generated from that anonymous sub will use the regular expression as it was compiled when

it was used the very first time in any such closure. For instance, if you say

sub build_match {

my($left,$right) = @_;

return sub { $_[0] =~ /$left stuff $right/o; };

}

build_match() will always return a sub which matches the contents of $left and $right as

they were the first time that build_match() was called, not as they are in the current call.

This is probably a bug, and may change in future versions of Perl.

Regular Expression

If no parentheses are used in a match, Perl4 sets $+ to the whole match, just like $&. Perl5 does not.

"abcdef" =~ /b.*e/;

print "\$+ = $+\n";

# perl4 prints: bcde

# perl5 prints:

Regular Expression

substitution now returns the null string if it fails

$string = "test";

$value = ($string =~ s/foo//);

print $value, "\n";

# perl4 prints: 0

# perl5 prints:

Also see Numerical Traps for another example of this new feature.

Regular Expression

s`lhs`rhs` (using backticks) is now a normal substitution, with no backtick expansion.

$string = "";

$string =~ s`^`hostname`;

print $string, "\n";

# perl4 prints: <the local hostname>

# perl5 prints: hostname

Regular Expression

Stricter parsing of variables used in regular expressions

s/^([^$grpc]*$grpc[$opt$plus$rep]?)//o;

# perl4: compiles w/o error

# perl5: with Scalar found where operator expected ..., near "$opt$plus"

An added component of this example, apparently from the same script, is the actual value of the s'd

string after the substitution. [$opt] is a character class in perl4 and an array subscript in perl5.

$grpc = ’a’;

$opt = ’r’;

$_ = ’bar’;

s/^([^$grpc]*$grpc[$opt]?)/foo/;

print ;

# perl4 prints: foo

# perl5 prints: foobar

Regular Expression

Under perl5, m?x? matches only once, like ?x?. Under perl4, it matched repeatedly, like /x/ or

m!x!.

$test = "once";

sub match { $test =~ m?once?; }

&match();

if( &match() ) {

# m?x? matches more than once

print "perl4\n";

} else {

# m?x? matches only once

print "perl5\n";

}

# perl4 prints: perl4

# perl5 prints: perl5

Subroutine, Signal, Sorting Traps

The general group of Perl4−to−Perl5 traps having to do with Signals, Sorting, and their related subroutines,

as well as general subroutine traps. Includes some OS−Specific traps.

(Signals)

Barewords that used to look like strings to Perl will now look like subroutine calls if a subroutine by

that name is defined before the compiler sees them.

sub SeeYa { warn"Hasta la vista, baby!" }

$SIG{’TERM’} = SeeYa;

print "SIGTERM is now $SIG{’TERM’}\n";

# perl4 prints: SIGTERM is now main'SeeYa

# perl5 prints: SIGTERM is now main::1

Use −w to catch this one

(Sort Subroutine)

reverse is no longer allowed as the name of a sort subroutine.

sub reverse{ print "yup "; $a <=> $b }

print sort reverse a,b,c;

# perl4 prints: yup yup yup yup abc

# perl5 prints: abc

warn() won‘t let you specify a filehandle.

Although it _always_ printed to STDERR, warn() would let you specify a filehandle in perl4.

With perl5 it does not.

warn STDERR "Foo!";

# perl4 prints: Foo!

# perl5 prints: String found where operator expected

OS Traps

(SysV)

Under HPUX, and some other SysV OSes, one had to reset any signal handler, within the signal

handler function, each time a signal was handled with perl4. With perl5, the reset is now done

correctly. Any code relying on the handler _not_ being reset will have to be reworked.

Since version 5.002, Perl uses sigaction() under SysV.

sub gotit {

print "Got @_... ";

}

$SIG{’INT’} = ’gotit’;

$| = 1;

$pid = fork;

if ($pid) {

kill(’INT’, $pid);

sleep(1);

kill(’INT’, $pid);

} else {

while (1) {sleep(10);}

}

# perl4 (HPUX) prints: Got INT...

# perl5 (HPUX) prints: Got INT... Got INT...

(SysV)

Under SysV OSes, seek() on a file opened to append >> now does the right thing w.r.t. the

fopen() manpage; that is, when a file is opened for append, it is impossible to overwrite

information already in the file.

open(TEST,">>seek.test");

$start = tell TEST ;

foreach(1 .. 9){

print TEST "$_ ";

}

$end = tell TEST ;

seek(TEST,$start,0);

print TEST "18 characters here";

# perl4 (solaris) seek.test has: 18 characters here

# perl5 (solaris) seek.test has: 1 2 3 4 5 6 7 8 9 18 characters here

Interpolation Traps

Perl4−to−Perl5 traps having to do with how things get interpolated within certain expressions, statements,

contexts, or whatever.

Interpolation

@ now always interpolates an array in double−quotish strings.

print "To: someone@somewhere.com\n";

# perl4 prints: To: someone@somewhere.com

# perl5 errors : In string, @somewhere now must be written as \@somewhere
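Either of the following spellings keeps the address intact under perl5. A short sketch:

```perl
use strict;

# A backslash keeps the @ literal inside double quotes.
my $escaped = "To: someone\@somewhere.com\n";

# Single quotes never interpolate, so no escape is needed.
my $quoted  = 'To: someone@somewhere.com' . "\n";
```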

Interpolation

Double−quoted strings may no longer end with an unescaped $ or @.

$foo = "foo$";

$bar = "bar@";

print "foo is $foo, bar is $bar\n";

# perl4 prints: foo is foo$, bar is bar@

# perl5 errors: Final $ should be \$ or $name

Note: perl5 DOES NOT error on the terminating @ in $bar

Interpolation

Perl now sometimes evaluates arbitrary expressions inside braces that occur within double quotes

(usually when the opening brace is preceded by $ or @).

@www = "buz";

$foo = "foo";

$bar = "bar";

sub foo { return "bar" };

print "|@{w.w.w}|${main’foo}|";

# perl4 prints: |@{w.w.w}|foo|

# perl5 prints: |buz|bar|

Note that you can use strict; to ward off such trappiness under perl5.

Interpolation

The construct "this is $$x" used to interpolate the pid at that point, but now apparently tries to

dereference $x. $$ by itself still works fine, however.

print "this is $$x\n";

# perl4 prints: this is XXXx (XXX is the current pid)

# perl5 prints: this is

Interpolation

Creation of hashes on the fly with eval "EXPR" now requires either both $‘s to be protected in

the specification of the hash name, or both curlies to be protected. If both curlies are protected, the

result will be compatible with perl4 and perl5. This is a very common practice, and should be

changed to use the block form of eval{} if possible.

$hashname = "foobar";

$key = "baz";

$value = 1234;

eval "\$$hashname{’$key’} = q|$value|";

(defined($foobar{’baz’})) ? (print "Yup") : (print "Nope");

# perl4 prints: Yup

# perl5 prints: Nope

Changing

eval "\$$hashname{'$key'} = q|$value|";

to

eval "\$\$hashname{'$key'} = q|$value|";

causes the following result:

# perl4 prints: Nope

# perl5 prints: Yup

or, changing to

eval "\$$hashname\{’$key’\} = q|$value|";

causes the following result:

# perl4 prints: Yup

# perl5 prints: Yup

# and is compatible for both versions

Interpolation

perl4 programs which unconsciously rely on the bugs in earlier perl versions.

perl −e ’$bar=q/not/; print "This is $foo{$bar} perl5"’

# perl4 prints: This is not perl5

# perl5 prints: This is perl5

Interpolation

You also have to be careful about array references.

print "$foo{"

# perl4 prints: {

# perl5 prints: syntax error

Interpolation

Similarly, watch out for:

$foo = "array";

print "\$$foo{bar}\n";

# perl4 prints: $array{bar}

# perl5 prints: $

Perl 5 is looking for $foo{bar}, which doesn't exist, but perl 4 is happy just to expand $foo to

"array" by itself. Watch out for this especially in eval‘s.

Interpolation

qq() string passed to eval

eval qq(

foreach \$y (keys %\$x\) {

\$count++;

}

);

# perl4 runs this ok

# perl5 prints: Can’t find string terminator ")"

DBM Traps

General DBM traps.

DBM

Existing dbm databases created under perl4 (or any other dbm/ndbm tool) may cause the same script,

run under perl5, to fail. The build of perl5 must have been linked with the same dbm/ndbm as the

default for dbmopen() to function properly without tie‘ing to an extension dbm implementation.

dbmopen (%dbm, "file", undef);

print "ok\n";

# perl4 prints: ok

# perl5 prints: ok (IFF linked with −ldbm or −lndbm)

DBM

Existing dbm databases created under perl4 (or any other dbm/ndbm tool) may cause the same script,

run under perl5, to fail. The error generated when exceeding the limit on the key/value size will

cause perl5 to exit immediately.

dbmopen(DB, "testdb",0600) || die "couldn’t open db! $!";

$DB{’trap’} = "x" x 1024; # value too large for most dbm/ndbm

print "YUP\n";

# perl4 prints:

dbm store returned −1, errno 28, key "trap" at − line 3.

YUP

# perl5 prints:

dbm store returned −1, errno 28, key "trap" at − line 3.

Unclassified Traps

Everything else.

require/do trap using returned value

If the file doit.pl has:

sub foo {

$rc = do "./do.pl";

return 8;

}

print &foo, "\n";

And the do.pl file has the following single line:

return 3;

Running doit.pl gives the following:

# perl 4 prints: 3 (aborts the subroutine early)

# perl 5 prints: 8

Same behavior if you replace do with require.

split on empty string with LIMIT specified

$string = ’’;

@list = split(/foo/, $string, 2)

Perl4 returns a one element list containing the empty string but Perl5 returns an empty list.
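Code that depends on getting one field back can supply it explicitly. This is a sketch of one possible guard, with invented variable names:

```perl
use strict;

my @list  = split(/foo/, '', 2);
my $count = scalar(@list);           # 0 under Perl 5

# If one (empty) field is wanted, supply it explicitly.
@list = ('') unless @list;
```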

As always, if any of these are ever officially declared as bugs, they‘ll be fixed and removed.

perlstyle Perl Programmers Reference Guide perlstyle

NAME

perlstyle − Perl style guide

DESCRIPTION

Each programmer will, of course, have his or her own preferences in regards to formatting, but there are

some general guidelines that will make your programs easier to read, understand, and maintain.

The most important thing is to run your programs under the −w flag at all times. You may turn it off

explicitly for particular portions of code via the $^W variable if you must. You should also always run under

use strict or know the reason why not. The use sigtrap and even use diagnostics pragmas

may also prove useful.
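Turning warnings off for a small region might look like the sketch below. It assumes the script is normally run under -w; the hash %seen and the key "missing" are made up for the example.

```perl
use strict;

my %seen;
my $total;
{
    local $^W = 0;                  # silence run-time warnings in this block only
    $total = $seen{missing} + 1;    # undef is treated as 0 without complaint
}
# The previous value of $^W is restored here, at the end of the block.
```

Confining the change with local() inside a bare block keeps warnings active everywhere else.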

Regarding aesthetics of code layout, about the only thing Larry cares strongly about is that the closing curly

brace of a multi−line BLOCK should line up with the keyword that started the construct. Beyond that, he has

other preferences that aren‘t so strong:

4−column indent.

Opening curly on same line as keyword, if possible, otherwise line up.

Space before the opening curly of a multi−line BLOCK.

One−line BLOCK may be put on one line, including curlies.

No space before the semicolon.

Semicolon omitted in "short" one−line BLOCK.

Space around most operators.

Space around a "complex" subscript (inside brackets).

Blank lines between chunks that do different things.

Uncuddled elses.

No space between function name and its opening parenthesis.

Space after each comma.

Long lines broken after an operator (except "and" and "or").

Space after last parenthesis matching on current line.

Line up corresponding items vertically.

Omit redundant punctuation as long as clarity doesn‘t suffer.

Larry has his reasons for each of these things, but he doesn‘t claim that everyone else‘s mind works the same

as his does.
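A short fragment laid out along these lines might look as follows. It is only an illustration of the preferences listed above; the subroutine count_words and its data are hypothetical.

```perl
use strict;

sub count_words {
    my ($text) = @_;
    my %count;

    foreach my $word (split ' ', $text) {
        $count{lc $word}++;
    }

    return %count;
}

my %tally = count_words("the quick brown the lazy the");

foreach my $word (sort keys %tally) {
    if ($tally{$word} > 1) {
        print "$word appears $tally{$word} times\n";
    }
    else {
        print "$word appears once\n";
    }
}
```

Note the 4-column indent, the opening curly on the same line as its keyword, the uncuddled else, and the closing curly lined up with the keyword that started the construct.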

Here are some other more substantive style issues to think about:

Just because you CAN do something a particular way doesn't mean that you SHOULD do it that way.
Perl is designed to give you several ways to do anything, so consider picking the most readable one.
For instance

    open(FOO,$foo) || die "Can't open $foo: $!";

is better than

    die "Can't open $foo: $!" unless open(FOO,$foo);

because the second way hides the main point of the statement in a modifier. On the other hand

    print "Starting analysis\n" if $verbose;

is better than

    $verbose && print "Starting analysis\n";

because the main point isn't whether the user typed -v or not.

Similarly, just because an operator lets you assume default arguments doesn't mean that you have to
make use of the defaults. The defaults are there for lazy systems programmers writing one-shot
programs. If you want your program to be readable, consider supplying the argument.

Along the same lines, just because you CAN omit parentheses in many places doesn't mean that you
ought to:

    return print reverse sort num values %array;
    return print(reverse(sort num (values(%array))));

When in doubt, parenthesize. At the very least it will let some poor schmuck bounce on the % key in
vi.

Even if you aren't in doubt, consider the mental welfare of the person who has to maintain the code
after you, and who will probably put parentheses in the wrong place.

Don't go through silly contortions to exit a loop at the top or the bottom, when Perl provides the last
operator so you can exit in the middle. Just "outdent" it a little to make it more visible:

    LINE:
        for (;;) {
            statements;
          last LINE if $foo;
            next LINE if /^#/;
            statements;
        }

Don't be afraid to use loop labels; they're there to enhance readability as well as to allow multilevel
loop breaks. See the previous example.
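
A runnable sketch of a multilevel loop break follows. The data and the ROW label are made up for illustration: a negative cell abandons the rest of its row from inside the inner loop.

```perl
use strict;

my @rows = ([1, 2, 3], [4, -1, 6], [7, 8, 9]);
my @kept;

ROW:
    for my $row (@rows) {
        for my $cell (@$row) {
          next ROW if $cell < 0;    # abandon the rest of this row
            push @kept, $cell;
        }
    }

print "@kept\n";
```

Without the label, next would only skip one cell of the inner loop, and 6 would still be kept.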

Avoid using grep() (or map()) or `backticks` in a void context, that is, when you just throw away
their return values. Those functions all have return values, so use them. Otherwise use a foreach()
loop or the system() function instead.
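
The distinction can be sketched as follows, with an illustrative word list: a foreach loop where only the side effect matters, and grep() and map() where the returned list is actually used.

```perl
use strict;

my @words = qw(apple banana cherry);

# Poor style: map() builds a list that is thrown away.
#   map { print "$_\n" } @words;

# Better: a foreach loop states that only the side effect matters.
my @log;
foreach my $word (@words) {
    push @log, uc $word;
}

# grep() and map() earn their keep when the result is used.
my @long  = grep { length($_) > 5 } @words;
my @sizes = map  { length($_) } @words;

print "@log\n@long\n@sizes\n";
```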

For portability, when using features that may not be implemented on every machine, test the construct
in an eval to see if it fails. If you know in what version or patchlevel a particular feature was
implemented, you can test $] ($PERL_VERSION in English) to see if it will be there. The
Config module will also let you interrogate values determined by the Configure program when Perl
was installed.
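
The three checks might look like the sketch below. The symlink() probe is only one example of a feature that may be unimplemented, and the variable names are illustrative.

```perl
use strict;
use Config;

# Probe for symlink() support without dying where it is unimplemented.
my $has_symlink = eval { symlink("", ""); 1 } ? 1 : 0;

# $] holds the version of the running interpreter as a number.
my $modern = ($] >= 5.005) ? 1 : 0;

# %Config records what Configure determined when Perl was built.
my $osname = $Config{osname};

print "symlink: $has_symlink, perl >= 5.005: $modern, osname: $osname\n";
```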

Choose mnemonic identifiers. If you can't remember what mnemonic means, you've got a problem.

While short identifiers like $gotit are probably ok, use underscores to separate words. It is
generally easier to read $var_names_like_this than $VarNamesLikeThis, especially for
non-native speakers of English. It's also a simple rule that works consistently with
VAR_NAMES_LIKE_THIS.

Package names are sometimes an exception to this rule. Perl informally reserves lowercase module
names for "pragma" modules like integer and strict. Other modules should begin with a capital
letter and use mixed case, but probably without underscores due to limitations in primitive file
systems' representations of module names as files that must fit into a few sparse bytes.

You may find it helpful to use letter case to indicate the scope or nature of a variable. For example:

    $ALL_CAPS_HERE   constants only (beware clashes with perl vars!)
    $Some_Caps_Here  package-wide global/static
    $no_caps_here    function scope my() or local() variables

Function and method names seem to work best as all lowercase. E.g., $obj->as_string().

You can use a leading underscore to indicate that a variable or function should not be used outside the

package that defined it.

If you have a really hairy regular expression, use the /x modifier and put in some whitespace to make
it look a little less like line noise. Don't use slash as a delimiter when your regexp has slashes or
backslashes.
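
A short sketch of both suggestions, using a made-up path: braces replace the slash delimiter, and /x leaves room for whitespace and comments.

```perl
use strict;

my $path = "/usr/local/bin/perl";

# m{...}x avoids backslashing every slash and leaves room for comments.
my ($dir, $file) = $path =~ m{
    ^ (.*) /        # directory: everything before the last slash
    ( [^/]+ ) $     # file name: the final path component
}x;

print "$dir $file\n";
```

The same pattern written with slash delimiters would need each literal slash escaped.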

Use the new "and" and "or" operators to avoid having to parenthesize list operators so much, and to

reduce the incidence of punctuation operators like && and ||. Call your subroutines as if they were

functions or list operators to avoid excessive ampersands and parentheses.

Use here documents instead of repeated print() statements.
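
For instance, a usage message that would otherwise take several print() statements fits in one here document. The program name below is hypothetical.

```perl
use strict;

my $prog = "mytool";    # hypothetical program name

# One here document replaces a series of separate print() statements.
my $usage = <<"END_USAGE";
Usage: $prog [-v] file ...
    -v    print progress messages
END_USAGE

print $usage;
```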

Line up corresponding things vertically, especially if it'd be too long to fit on one line anyway.

    $IDX = $ST_MTIME;
    $IDX = $ST_ATIME       if $opt_u;
    $IDX = $ST_CTIME       if $opt_c;
    $IDX = $ST_SIZE        if $opt_s;

    mkdir $tmpdir, 0700 or die "can't mkdir $tmpdir: $!";
    chdir($tmpdir)      or die "can't chdir $tmpdir: $!";
    mkdir 'tmp',   0777 or die "can't mkdir $tmpdir/tmp: $!";

Always check the return codes of system calls. Good error messages should go to STDERR, include
which program caused the problem, what the failed system call and arguments were, and (VERY
IMPORTANT) should contain the standard system error message for what went wrong. Here's a
simple but sufficient example:

    opendir(D, $dir) or die "can't opendir $dir: $!";

Line up your transliterations when it makes sense:

    tr [abc]
       [xyz];

Think about reusability. Why waste brainpower on a one-shot when you might want to do something
like it again? Consider generalizing your code. Consider writing a module or object class. Consider
making your code run cleanly with use strict and -w in effect. Consider giving away your code.
Consider changing your whole world view. Consider... oh, never mind.

Be consistent.

Be nice.

perlxs Perl Programmers Reference Guide perlxs

NAME

perlxs − XS language reference manual

DESCRIPTION

Introduction

XS is a language used to create an extension interface between Perl and some C library which one wishes to

use with Perl. The XS interface is combined with the library to create a new library which can be linked to

Perl. An XSUB is a function in the XS language and is the core component of the Perl application interface.

The XS compiler is called xsubpp. This compiler will embed the constructs necessary to let an XSUB,
which is really a C function in disguise, manipulate Perl values and will create the glue necessary to let Perl
access the XSUB. The compiler uses typemaps to determine how to map C function parameters and
variables to Perl values. The default typemap handles many common C types. A supplementary typemap must
be created to handle special structures and types for the library being linked.

See perlxstut for a tutorial on the whole extension creation process.

Note: For many extensions, Dave Beazley's SWIG system provides a significantly more convenient

mechanism for creating the XS glue code. See http://www.cs.utah.edu/~beazley/SWIG for more information.

On The Road

Many of the examples which follow will concentrate on creating an interface between Perl and the ONC+

RPC bind library functions. The rpcb_gettime() function is used to demonstrate many features of the

XS language. This function has two parameters; the first is an input parameter and the second is an output

parameter. The function also returns a status value.

bool_t rpcb_gettime(const char *host, time_t *timep);

From C this function will be called with the following statements.

#include <rpc/rpc.h>

bool_t status;

time_t timep;

status = rpcb_gettime( "localhost", &timep );

If an XSUB is created to offer a direct translation between this function and Perl, then this XSUB will be

used from Perl with the following code. The $status and $timep variables will contain the output of the

function.

use RPC;

$status = rpcb_gettime( "localhost", $timep );

The following XS file shows an XS subroutine, or XSUB, which demonstrates one possible interface to the

rpcb_gettime() function. This XSUB represents a direct translation between C and Perl and so

preserves the interface even from Perl. This XSUB will be invoked from Perl with the usage shown above.

Note that the first three #include statements, for EXTERN.h, perl.h, and XSUB.h, will always be present

at the beginning of an XS file. This approach and others will be expanded later in this document.

    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"
    #include <rpc/rpc.h>

    MODULE = RPC  PACKAGE = RPC

    bool_t
    rpcb_gettime(host,timep)
        char *host
        time_t &timep
        OUTPUT:
        timep

Any extension to Perl, including those containing XSUBs, should have a Perl module to serve as the
bootstrap which pulls the extension into Perl. This module will export the extension's functions and
variables to the Perl program and will cause the extension's XSUBs to be linked into Perl. The following
module will be used for most of the examples in this document and should be used from Perl with the use
command as shown earlier. Perl modules are explained in more detail later in this document.

package RPC;

require Exporter;

require DynaLoader;

@ISA = qw(Exporter DynaLoader);

@EXPORT = qw( rpcb_gettime );

bootstrap RPC;

Throughout this document a variety of interfaces to the rpcb_gettime() XSUB will be explored. The

XSUBs will take their parameters in different orders or will take different numbers of parameters. In each

case the XSUB is an abstraction between Perl and the real C rpcb_gettime() function, and the XSUB

must always ensure that the real rpcb_gettime() function is called with the correct parameters. This

abstraction will allow the programmer to create a more Perl−like interface to the C function.

The Anatomy of an XSUB

The following XSUB allows a Perl program to access a C library function called sin(). The XSUB will

imitate the C function which takes a single argument and returns a single value.

    double
    sin(x)
        double x

When using C pointers the indirection operator * should be considered part of the type and the address

operator & should be considered part of the variable, as is demonstrated in the rpcb_gettime() function

above. See the section on typemaps for more about handling qualifiers and unary operators in C types.

The function name and the return type must be placed on separate lines.

    INCORRECT                        CORRECT

    double sin(x)                    double
        double x                     sin(x)
                                         double x

The function body may be indented or left-adjusted. The following example shows a function with its body
left-adjusted. Most examples in this document will indent the body.

    CORRECT

    double
    sin(x)
    double x

The Argument Stack

The argument stack is used to store the values which are sent as parameters to the XSUB and to store the
XSUB's return value. In reality all Perl functions keep their values on this stack at the same time, each
limited to its own range of positions on the stack. In this document the first position on that stack which
belongs to the active function will be referred to as position 0 for that function.

XSUBs refer to their stack arguments with the macro ST(x), where x refers to a position in this XSUB's part
of the stack. Position 0 for that function would be known to the XSUB as ST(0). The XSUB's incoming
parameters and outgoing return values always begin at ST(0). For many simple cases the xsubpp compiler
will generate the code necessary to handle the argument stack by embedding code fragments found in the
typemaps. In more complex cases the programmer must supply the code.

The RETVAL Variable

The RETVAL variable is a magic variable which always matches the return type of the C library function.

The xsubpp compiler will supply this variable in each XSUB and by default will use it to hold the return

value of the C library function being called. In simple cases the value of RETVAL will be placed in ST(0)

of the argument stack where it can be received by Perl as the return value of the XSUB.

If the XSUB has a return type of void then the compiler will not supply a RETVAL variable for that
function. When using the PPCODE: directive the RETVAL variable is not needed, unless used explicitly.

If the PPCODE: directive is not used, a void return type should be used only for subroutines which do not
return a value, even if a CODE: directive is used which sets ST(0) explicitly.

Older versions of this document recommended using a void return value in such cases. It was discovered that
this could lead to segfaults in cases when the XSUB was truly void. This practice is now deprecated, and may
not be supported in some future version. Use the return value SV * in such cases. (Currently xsubpp
contains some heuristic code which tries to disambiguate between "truly-void" and
"old-practice-declared-as-void" functions. Hence your code is at the mercy of this heuristic unless you use
SV * as the return value.)

The MODULE Keyword

The MODULE keyword is used to start the XS code and to specify the package of the functions which are

being defined. All text preceding the first MODULE keyword is considered C code and is passed through to

the output untouched. Every XS module will have a bootstrap function which is used to hook the XSUBs

into Perl. The package name of this bootstrap function will match the value of the last MODULE statement

in the XS source files. The value of MODULE should always remain constant within the same XS file,

though this is not required.

The following example will start the XS code and will place all functions in a package named RPC.

MODULE = RPC

The PACKAGE Keyword

When functions within an XS source file must be separated into packages the PACKAGE keyword should be

used. This keyword is used with the MODULE keyword and must follow immediately after it when used.

MODULE = RPC PACKAGE = RPC

[ XS code in package RPC ]

MODULE = RPC PACKAGE = RPCB

[ XS code in package RPCB ]

MODULE = RPC PACKAGE = RPC

[ XS code in package RPC ]

Although this keyword is optional and in some cases provides redundant information it should always be

used. This keyword will ensure that the XSUBs appear in the desired package.

The PREFIX Keyword

The PREFIX keyword designates prefixes which should be removed from the Perl function names. If the C

function is rpcb_gettime() and the PREFIX value is rpcb_ then Perl will see this function as

gettime().

This keyword should follow the PACKAGE keyword when used. If PACKAGE is not used then PREFIX

should follow the MODULE keyword.

MODULE = RPC PREFIX = rpc_

MODULE = RPC PACKAGE = RPCB PREFIX = rpcb_

The OUTPUT: Keyword

The OUTPUT: keyword indicates that certain function parameters should be updated (new values made

visible to Perl) when the XSUB terminates or that certain values should be returned to the calling Perl

function. For simple functions, such as the sin() function above, the RETVAL variable is automatically

designated as an output value. In more complex functions the xsubpp compiler will need help to determine

which variables are output variables.

This keyword will normally be used to complement the CODE: keyword. The RETVAL variable is not

recognized as an output variable when the CODE: keyword is present. The OUTPUT: keyword is used in

this situation to tell the compiler that RETVAL really is an output variable.

The OUTPUT: keyword can also be used to indicate that function parameters are output variables. This may

be necessary when a parameter has been modified within the function and the programmer would like the

update to be seen by Perl.

    bool_t
    rpcb_gettime(host,timep)
        char *host
        time_t &timep
        OUTPUT:
        timep

The OUTPUT: keyword will also allow an output parameter to be mapped to a matching piece of code rather
than to a typemap.

    bool_t
    rpcb_gettime(host,timep)
        char *host
        time_t &timep
        OUTPUT:
        timep sv_setnv(ST(1), (double)timep);

xsubpp emits an automatic SvSETMAGIC() for all parameters in the OUTPUT section of the XSUB,
except RETVAL. This is usually the desired behavior, as it takes care of properly invoking 'set' magic on
output parameters (needed for hash or array element parameters that must be created if they didn't exist). If
for some reason this behavior is not desired, the OUTPUT section may contain a SETMAGIC: DISABLE
line to disable it for the remainder of the parameters in the OUTPUT section. Likewise, SETMAGIC:
ENABLE can be used to reenable it for the remainder of the OUTPUT section. See perlguts for more details
about 'set' magic.

The CODE: Keyword

This keyword is used in more complicated XSUBs which require special handling for the C function. The

RETVAL variable is available but will not be returned unless it is specified under the OUTPUT: keyword.

The following XSUB is for a C function which requires special handling of its parameters. The Perl usage is

given first.

$status = rpcb_gettime( "localhost", $timep );

The XSUB follows.

    bool_t
    rpcb_gettime(host,timep)
        char *host
        time_t timep
        CODE:
            RETVAL = rpcb_gettime( host, &timep );
        OUTPUT:
        timep
        RETVAL

The INIT: Keyword

The INIT: keyword allows initialization to be inserted into the XSUB before the compiler generates the call

to the C function. Unlike the CODE: keyword above, this keyword does not affect the way the compiler

handles RETVAL.

    bool_t
    rpcb_gettime(host,timep)
        char *host
        time_t &timep
        INIT:
        printf("# Host is %s\n", host );
        OUTPUT:
        timep

The NO_INIT Keyword

The NO_INIT keyword is used to indicate that a function parameter is being used only as an output value.

The xsubpp compiler will normally generate code to read the values of all function parameters from the

argument stack and assign them to C variables upon entry to the function. NO_INIT will tell the compiler

that some parameters will be used for output rather than for input and that they will be handled before the

function terminates.

The following example shows a variation of the rpcb_gettime() function. This function uses the timep

variable only as an output variable and does not care about its initial contents.

    bool_t
    rpcb_gettime(host,timep)
        char *host
        time_t &timep = NO_INIT
        OUTPUT:
        timep

Initializing Function Parameters

Function parameters are normally initialized with their values from the argument stack. The typemaps

contain the code segments which are used to transfer the Perl values to the C parameters. The programmer,

however, is allowed to override the typemaps and supply alternate (or additional) initialization code.

The following code demonstrates how to supply initialization code for function parameters. The
initialization code is eval'd within double quotes by the compiler before it is added to the output so anything
which should be interpreted literally [mainly $, @, or \\] must be protected with backslashes. The variables
$var, $arg, and $type can be used as in typemaps.

    bool_t
    rpcb_gettime(host,timep)
        char *host = (char *)SvPV($arg,PL_na);
        time_t &timep = 0;
        OUTPUT:
        timep

This should not be used to supply default values for parameters. One would normally use this when a

function parameter must be processed by another library function before it can be used. Default parameters

are covered in the next section.

If the initialization begins with =, then it is output on the same line where the input variable is declared. If
the initialization begins with ; or +, then it is output after all of the input variables have been declared. The
= and ; cases replace the initialization normally supplied from the typemap. For the + case, the initialization
from the typemap will precede the initialization code included after the +. A global variable, %v, is available
for the truly rare case where information from one initialization is needed in another initialization.

    bool_t
    rpcb_gettime(host,timep)
        time_t &timep ; /*\$v{time}=@{[$v{time}=$arg]}*/
        char *host + SvOK($v{time}) ? SvPV($arg,PL_na) : NULL;
        OUTPUT:
        timep

Default Parameter Values

Default values can be specified for function parameters by placing an assignment statement in the parameter

list. The default value may be a number or a string. Defaults should always be used on the right−most

parameters only.

To allow the XSUB for rpcb_gettime() to have a default host value the parameters to the XSUB could

be rearranged. The XSUB will then call the real rpcb_gettime() function with the parameters in the

correct order. Perl will call this XSUB with either of the following statements.

$status = rpcb_gettime( $timep, $host );

$status = rpcb_gettime( $timep );

The XSUB will look like the code which follows. A CODE: block is used to call the real

rpcb_gettime() function with the parameters in the correct order for that function.

    bool_t
    rpcb_gettime(timep,host="localhost")
        char *host
        time_t timep = NO_INIT
        CODE:
            RETVAL = rpcb_gettime( host, &timep );
        OUTPUT:
        timep
        RETVAL

The PREINIT: Keyword

The PREINIT: keyword allows extra variables to be declared before the typemaps are expanded. If a

variable is declared in a CODE: block then that variable will follow any typemap code. This may result in a

C syntax error. To force the variable to be declared before the typemap code, place it into a PREINIT: block.

The PREINIT: keyword may be used one or more times within an XSUB.

The following examples are equivalent, but if the code is using complex typemaps then the first example is

safer.

    bool_t
    rpcb_gettime(timep)
        time_t timep = NO_INIT
        PREINIT:
        char *host = "localhost";
        CODE:
        RETVAL = rpcb_gettime( host, &timep );
        OUTPUT:
        timep
        RETVAL

A correct, but error-prone example.

    bool_t
    rpcb_gettime(timep)
        time_t timep = NO_INIT
        CODE:
        char *host = "localhost";
        RETVAL = rpcb_gettime( host, &timep );
        OUTPUT:
        timep
        RETVAL

The SCOPE: Keyword

The SCOPE: keyword allows scoping to be enabled for a particular XSUB. If enabled, the XSUB will

invoke ENTER and LEAVE automatically.

To support potentially complex type mappings, if a typemap entry used by this XSUB contains a comment

like /*scope*/ then scoping will automatically be enabled for that XSUB.

To enable scoping:

SCOPE: ENABLE

To disable scoping:

SCOPE: DISABLE

The INPUT: Keyword

The XSUB's parameters are usually evaluated immediately after entering the XSUB. The INPUT: keyword

can be used to force those parameters to be evaluated a little later. The INPUT: keyword can be used

multiple times within an XSUB and can be used to list one or more input variables. This keyword is used

with the PREINIT: keyword.

The following example shows how the input parameter timep can be evaluated late, after a PREINIT.

    bool_t
    rpcb_gettime(host,timep)
        char *host
        PREINIT:
        time_t tt;
        INPUT:
        time_t timep
        CODE:
            RETVAL = rpcb_gettime( host, &tt );
            timep = tt;
        OUTPUT:
        timep
        RETVAL

The next example shows each input parameter evaluated late.

    bool_t
    rpcb_gettime(host,timep)
        PREINIT:
        time_t tt;
        INPUT:
        char *host
        PREINIT:
        char *h;
        INPUT:
        time_t timep
        CODE:
            h = host;
            RETVAL = rpcb_gettime( h, &tt );
            timep = tt;
        OUTPUT:
        timep
        RETVAL

Variable−length Parameter Lists

XSUBs can have variable−length parameter lists by specifying an ellipsis (...) in the parameter list. This

use of the ellipsis is similar to that found in ANSI C. The programmer is able to determine the number of

arguments passed to the XSUB by examining the items variable which the xsubpp compiler supplies for

all XSUBs. By using this mechanism one can create an XSUB which accepts a list of parameters of

unknown length.

The host parameter for the rpcb_gettime() XSUB can be optional so the ellipsis can be used to indicate

that the XSUB will take a variable number of parameters. Perl should be able to call this XSUB with either

of the following statements.

$status = rpcb_gettime( $timep, $host );

$status = rpcb_gettime( $timep );

The XS code, with ellipsis, follows.

    bool_t
    rpcb_gettime(timep, ...)
        time_t timep = NO_INIT
        PREINIT:
        char *host = "localhost";
        CODE:
            if( items > 1 )
                host = (char *)SvPV(ST(1), PL_na);
            RETVAL = rpcb_gettime( host, &timep );
        OUTPUT:
        timep
        RETVAL
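
Seen from Perl, this calling convention resembles an ordinary sub with an optional trailing argument. The following pure-Perl stand-in is only a sketch of that convention, not XS: gettime_sketch is a hypothetical name, scalar(@_) plays the role of items, time() stands in for the real RPC call, and the sub returns the host and argument count purely for illustration.

```perl
use strict;

# Hypothetical pure-Perl stand-in for the XSUB above.
sub gettime_sketch {
    my $items = scalar @_;                          # what XS calls "items"
    my $host  = $items > 1 ? $_[1] : "localhost";   # optional second argument
    $_[0] = time();                                 # write through the alias, like OUTPUT: timep
    return ($host, $items);
}

my $timep;
my ($host, $count) = gettime_sketch($timep);
print "$host $count\n";

($host, $count) = gettime_sketch($timep, "example.org");
print "$host $count\n";
```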

The C_ARGS: Keyword

The C_ARGS: keyword allows creating XSUBs which have a different calling sequence from Perl than
from C, without the need to write a CODE: or PPCODE: section. The contents of the C_ARGS: paragraph are
passed as the arguments to the called C function without any change.

For example, suppose that a C function is declared as

    symbolic nth_derivative(int n, symbolic function, int flags);

and that the default flags are kept in a global C variable default_flags. Suppose that you want to create
an interface which is called as

    $second_deriv = $function->nth_derivative(2);

To do this, declare the XSUB as

    symbolic
    nth_derivative(function, n)
        symbolic function
        int n
        C_ARGS:
        n, function, default_flags

The PPCODE: Keyword

The PPCODE: keyword is an alternate form of the CODE: keyword and is used to tell the xsubpp compiler
that the programmer is supplying the code to control the argument stack for the XSUB's return values.
Occasionally one will want an XSUB to return a list of values rather than a single value. In these cases one
must use PPCODE: and then explicitly push the list of values on the stack. The PPCODE: and CODE:
keywords are not used together within the same XSUB.

The following XSUB will call the C rpcb_gettime() function and will return its two output values,
timep and status, to Perl as a single list.

    void
    rpcb_gettime(host)
        char *host
        PREINIT:
        time_t timep;
        bool_t status;
        PPCODE:
        status = rpcb_gettime( host, &timep );
        EXTEND(SP, 2);
        PUSHs(sv_2mortal(newSViv(status)));
        PUSHs(sv_2mortal(newSViv(timep)));

Notice that the programmer must supply the C code necessary to have the real rpcb_gettime() function

called and to have the return values properly placed on the argument stack.

The void return type for this function tells the xsubpp compiler that the RETVAL variable is not needed or

used and that it should not be created. In most scenarios the void return type should be used with the

PPCODE: directive.

The EXTEND() macro is used to make room on the argument stack for 2 return values. The PPCODE:

directive causes the xsubpp compiler to create a stack pointer available as SP, and it is this pointer which is

being used in the EXTEND() macro. The values are then pushed onto the stack with the PUSHs() macro.

Now the rpcb_gettime() function can be used from Perl with the following statement.

($status, $timep) = rpcb_gettime("localhost");

When handling output parameters with a PPCODE section, be sure to handle 'set' magic properly. See
perlguts for details about 'set' magic.

Returning Undef And Empty Lists

Occasionally the programmer will want to return simply undef or an empty list if a function fails rather

than a separate status value. The rpcb_gettime() function offers just this situation. If the function

succeeds we would like to have it return the time and if it fails we would like to have undef returned. In the

following Perl code the value of $timep will either be undef or it will be a valid time.

$timep = rpcb_gettime( "localhost" );

The following XSUB uses the SV * return type as a mnemonic only, and uses a CODE: block to indicate to

the compiler that the programmer has supplied all the necessary code. The sv_newmortal() call will

initialize the return value to undef, making that the default return value.

    SV *
    rpcb_gettime(host)
        char * host
        PREINIT:
        time_t timep;
        bool_t x;
        CODE:
        ST(0) = sv_newmortal();
        if( rpcb_gettime( host, &timep ) )
            sv_setnv( ST(0), (double)timep);

The next example demonstrates how one would place an explicit undef in the return value, should the need

arise.

    SV *
    rpcb_gettime(host)
        char * host
        PREINIT:
        time_t timep;
        bool_t x;
        CODE:
        ST(0) = sv_newmortal();
        if( rpcb_gettime( host, &timep ) ){
            sv_setnv( ST(0), (double)timep);
        }
        else{
            ST(0) = &PL_sv_undef;
        }

To return an empty list one must use a PPCODE: block and then not push return values on the stack.

    void
    rpcb_gettime(host)
        char *host
        PREINIT:
        time_t timep;
        PPCODE:
        if( rpcb_gettime( host, &timep ) )
            PUSHs(sv_2mortal(newSViv(timep)));
        else{
            /* Nothing pushed on stack, so an empty */
            /* list is implicitly returned. */
        }

Some people may be inclined to include an explicit return in the above XSUB, rather than letting control

fall through to the end. In those situations XSRETURN_EMPTY should be used, instead. This will ensure

that the XSUB stack is properly adjusted. Consult API LISTING in perlguts for other XSRETURN macros.
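
The difference between the two failure conventions shows up on the Perl side. The sketch below uses two hypothetical pure-Perl subs as stand-ins for the XSUBs, assuming an XSUB that pushes nothing looks to its caller like a bare return.

```perl
use strict;

sub fail_undef { return undef }    # like an XSUB that leaves undef in ST(0)
sub fail_empty { return }          # like a PPCODE: block that pushes nothing

my @a = fail_undef();    # one element, which is undef
my @b = fail_empty();    # no elements at all
my $s = fail_empty();    # in scalar context an empty list yields undef

print scalar(@a), " ", scalar(@b), " ", defined($s) ? "defined" : "undef", "\n";
```

A caller that tests the result in list context, as in if (my @r = f()), therefore sees the undef-returning form as true and the empty-list form as false.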

The REQUIRE: Keyword

The REQUIRE: keyword is used to indicate the minimum version of the xsubpp compiler needed to compile
the XS module. An XS module which contains the following statement will compile only with xsubpp
version 1.922 or greater:

REQUIRE: 1.922

The CLEANUP: Keyword

This keyword can be used when an XSUB requires special cleanup procedures before it terminates. When

the CLEANUP: keyword is used it must follow any CODE:, PPCODE:, or OUTPUT: blocks which are

present in the XSUB. The code specified for the cleanup block will be added as the last statements in the

XSUB.

The BOOT: Keyword

The BOOT: keyword is used to add code to the extension's bootstrap function. The bootstrap function is
generated by the xsubpp compiler and normally holds the statements necessary to register any XSUBs with
Perl. With the BOOT: keyword the programmer can tell the compiler to add extra statements to the bootstrap
function.

This keyword may be used any time after the first MODULE keyword and should appear on a line by itself.
The first blank line after the keyword will terminate the code block.

    BOOT:
    # The following message will be printed when the
    # bootstrap function executes.
    printf("Hello from the bootstrap!\n");

The VERSIONCHECK: Keyword

The VERSIONCHECK: keyword corresponds to xsubpp's -versioncheck and -noversioncheck

options. This keyword overrides the command line options. Version checking is enabled by default. When

version checking is enabled the XS module will attempt to verify that its version matches the version of the

PM module.

To enable version checking:

VERSIONCHECK: ENABLE

To disable version checking:

VERSIONCHECK: DISABLE

The PROTOTYPES: Keyword

The PROTOTYPES: keyword corresponds to xsubpp's -prototypes and -noprototypes options.

This keyword overrides the command line options. Prototypes are enabled by default. When prototypes are

enabled XSUBs will be given Perl prototypes. This keyword may be used multiple times in an XS module to

enable and disable prototypes for different parts of the module.

To enable prototypes:

PROTOTYPES: ENABLE

To disable prototypes:

PROTOTYPES: DISABLE

The PROTOTYPE: Keyword

This keyword is similar to the PROTOTYPES: keyword above but can be used to force xsubpp to use a

specific prototype for the XSUB. This keyword overrides all other prototype options and keywords but

affects only the current XSUB. Consult "Prototypes" in perlsub for information about Perl prototypes.

    bool_t
    rpcb_gettime(timep, ...)
        time_t timep = NO_INIT
        PROTOTYPE: $;$
        PREINIT:
        char *host = "localhost";
        CODE:
            if( items > 1 )
                host = (char *)SvPV(ST(1), PL_na);
            RETVAL = rpcb_gettime( host, &timep );
        OUTPUT:
        timep
        RETVAL
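
To see what the $;$ prototype means to a caller, here is a hypothetical pure-Perl sub carrying the same prototype. It is only an illustration of one mandatory and one optional scalar argument; gettime_proto is not part of the RPC module.

```perl
use strict;

# Hypothetical pure-Perl sub with the same prototype as the XSUB above.
sub gettime_proto ($;$) {
    my ($timep, $host) = @_;
    return defined $host ? $host : "localhost";
}

print prototype(\&gettime_proto), "\n";        # the prototype string
print gettime_proto(0), "\n";                  # optional argument omitted
print gettime_proto(0, "example.org"), "\n";   # optional argument supplied
```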

The ALIAS: Keyword

The ALIAS: keyword allows an XSUB to have two or more unique Perl names and to know which of those

names was used when it was invoked. The Perl names may be fully−qualified with package names. Each

alias is given an index. The compiler will set up a variable called ix which contains the index of the alias

which was used. When the XSUB is called with its declared name ix will be 0.

The following example will create aliases FOO::gettime() and BAR::getit() for this function.

bool_t

rpcb_gettime(host,timep)

char *host

time_t &timep

ALIAS:

FOO::gettime = 1

BAR::getit = 2

INIT:

printf("# ix = %d\n", ix );

OUTPUT:

timep
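
As a rough illustration of the idea behind ix, the following plain C sketch pairs several names with one index each, so that a single body of code can tell which name was used. This is an analogy only, not the code xsubpp generates; the names alias_table and lookup_alias_index are hypothetical, and the indices simply mirror the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table pairing each Perl-visible name with its alias index.
 * Index 0 stands for the declared name, as described in the text. */
struct alias_entry {
    const char *name;
    int ix;
};

static const struct alias_entry alias_table[] = {
    { "rpcb_gettime", 0 },  /* declared name */
    { "FOO::gettime", 1 },  /* ALIAS: FOO::gettime = 1 */
    { "BAR::getit",   2 },  /* ALIAS: BAR::getit = 2 */
};

/* Return the alias index for a name, or -1 if the name is unknown. */
int lookup_alias_index(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof alias_table / sizeof alias_table[0]; i++) {
        if (strcmp(alias_table[i].name, name) == 0)
            return alias_table[i].ix;
    }
    return -1;
}
```

In a real XSUB no lookup is needed: the index is already available in ix when the body runs.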

The INTERFACE: Keyword

This keyword declares the current XSUB as a keeper of the given calling signature. If some text follows this

keyword, it is considered as a list of functions which have this signature, and should be attached to XSUBs.

For example, if you have four functions multiply(), divide(), add(), and subtract(), all having the signature

symbolic f(symbolic, symbolic);

you can implement them all with a single XSUB:

symbolic

interface_s_ss(arg1, arg2)

symbolic arg1

symbolic arg2

INTERFACE:

multiply divide

add subtract

The advantage of this approach compared to the ALIAS: keyword is that one can attach an extra function

remainder() at runtime by using

CV *mycv = newXSproto("Symbolic::remainder",

XS_Symbolic_interface_s_ss, __FILE__, "$$");

XSINTERFACE_FUNC_SET(mycv, remainder);

(This example supposes that there was no INTERFACE_MACRO: section, otherwise one needs to use

something else instead of XSINTERFACE_FUNC_SET.)

The INTERFACE_MACRO: Keyword

This keyword allows one to define an INTERFACE using a different way to extract a function pointer from

an XSUB. The text which follows this keyword should give the name of macros which would extract/set a

function pointer. The extractor macro is given return type, CV*, and XSANY.any_dptr for this CV*. The

setter macro is given cv, and the function pointer.

The default value is XSINTERFACE_FUNC and XSINTERFACE_FUNC_SET. An INTERFACE keyword

with an empty list of functions can be omitted if INTERFACE_MACRO keyword is used.

Suppose that in the previous example function pointers for multiply(), divide(), add(),

subtract() are kept in a global C array fp[] with offsets being multiply_off, divide_off,

add_off, subtract_off. Then one can use

#define XSINTERFACE_FUNC_BYOFFSET(ret,cv,f) \

((XSINTERFACE_CVT(ret,))fp[CvXSUBANY(cv).any_i32])

#define XSINTERFACE_FUNC_BYOFFSET_set(cv,f) \

CvXSUBANY(cv).any_i32 = CAT2( f, _off )

in C section,

symbolic

interface_s_ss(arg1, arg2)

symbolic arg1

symbolic arg2

INTERFACE_MACRO:

XSINTERFACE_FUNC_BYOFFSET

XSINTERFACE_FUNC_BYOFFSET_set

INTERFACE:

multiply divide

add subtract

in XSUB section.
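
The table-by-offset arrangement described above can be sketched in plain C, without any Perl headers. The sketch assumes that symbolic is a plain int and passes the offset as an ordinary argument; in the XS version the offset lives in the any_i32 slot of the CV instead. It is meant only to show how one shared entry point can reach four functions through fp[].

```c
#include <assert.h>

/* Stand-in for the document's "symbolic" type; assumed to be int here. */
typedef int symbolic;
typedef symbolic (*binary_fn)(symbolic, symbolic);

static symbolic multiply(symbolic a, symbolic b) { return a * b; }
static symbolic divide(symbolic a, symbolic b)   { return a / b; }
static symbolic add(symbolic a, symbolic b)      { return a + b; }
static symbolic subtract(symbolic a, symbolic b) { return a - b; }

/* Offsets into fp[], named as in the text above. */
enum { multiply_off, divide_off, add_off, subtract_off, fp_count };

static const binary_fn fp[fp_count] = { multiply, divide, add, subtract };

/* One shared entry point: the stored offset selects the real function,
 * much as the extractor macro indexes fp[] with any_i32. */
symbolic interface_s_ss(int offset, symbolic arg1, symbolic arg2)
{
    assert(offset >= 0 && offset < fp_count);
    return fp[offset](arg1, arg2);
}
```

Storing a small integer rather than a function pointer keeps the per-function data compact, at the cost of maintaining the global table.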

The INCLUDE: Keyword

This keyword can be used to pull other files into the XS module. The other files may have XS code.

INCLUDE: can also be used to run a command to generate the XS code to be pulled into the module.

The file Rpcb1.xsh contains our rpcb_gettime() function:

bool_t

rpcb_gettime(host,timep)

char *host

time_t &timep

OUTPUT:

timep

The XS module can use INCLUDE: to pull that file into it.

INCLUDE: Rpcb1.xsh

If the parameters to the INCLUDE: keyword are followed by a pipe (|) then the compiler will interpret the

parameters as a command.

INCLUDE: cat Rpcb1.xsh |

The CASE: Keyword

The CASE: keyword allows an XSUB to have multiple distinct parts with each part acting as a virtual

XSUB. CASE: is greedy and if it is used then all other XS keywords must be contained within a CASE:.

This means nothing may precede the first CASE: in the XSUB and anything following the last CASE: is

included in that case.

A CASE: might switch via a parameter of the XSUB, via the ix ALIAS: variable (see

"The ALIAS: Keyword"), or maybe via the items variable (see "Variable−length Parameter Lists"). The

last CASE: becomes the default case if it is not associated with a conditional. The following example shows

CASE switched via ix with a function rpcb_gettime() having an alias x_gettime(). When the

function is called as rpcb_gettime() its parameters are the usual (char *host, time_t

*timep), but when the function is called as x_gettime() its parameters are reversed, (time_t

*timep, char *host).

long

rpcb_gettime(a,b)

CASE: ix == 1

ALIAS:

x_gettime = 1

INPUT:

# 'a' is timep, 'b' is host

char *b

time_t a = NO_INIT

CODE:

RETVAL = rpcb_gettime( b, &a );

OUTPUT:

RETVAL

CASE:

# 'a' is host, 'b' is timep

char *a

time_t &b = NO_INIT

OUTPUT:

RETVAL

That function can be called with either of the following statements. Note the different argument lists.

$status = rpcb_gettime( $host, $timep );

$status = x_gettime( $timep, $host );

The & Unary Operator

The & unary operator is used to tell the compiler that it should pass the address of the object when it calls the C

function. This is used when a CODE: block is not used and the object is not a pointer type (the object is an

int or long but not an int* or long*).

The following XSUB will generate incorrect C code. The xsubpp compiler will turn this into code which

calls rpcb_gettime() with parameters (char *host, time_t timep), but the real

rpcb_gettime() wants the timep parameter to be of type time_t* rather than time_t.

bool_t

rpcb_gettime(host,timep)

char *host

time_t timep

OUTPUT:

timep

That problem is corrected by using the & operator. The xsubpp compiler will now turn this into code which

calls rpcb_gettime() correctly with parameters (char *host, time_t *timep). It does this by

carrying the & through, so the function call looks like rpcb_gettime(host, &timep).

bool_t

rpcb_gettime(host,timep)

char *host

time_t &timep

OUTPUT:

timep
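
The effect of the & operator can be sketched in plain C. The function fake_gettime below is a hypothetical stand-in for rpcb_gettime() that writes a fixed value, and call_with_address only approximates what the generated glue does: it keeps a plain time_t locally and passes its address to the C function.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for rpcb_gettime(): writes a fixed time through
 * the pointer and returns 1 for success. The real function asks a host. */
static int fake_gettime(const char *host, time_t *timep)
{
    assert(host != NULL && timep != NULL);
    *timep = (time_t)12345;
    return 1;
}

/* Roughly what the glue does for "time_t &timep": hold a plain time_t,
 * pass its address to the C function, then hand the value back. */
int call_with_address(const char *host, time_t *result)
{
    time_t timep = 0;
    int status = fake_gettime(host, &timep);  /* note the & */

    *result = timep;
    return status;
}
```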

Inserting Comments and C Preprocessor Directives

C preprocessor directives are allowed within BOOT:, PREINIT:, INIT:, CODE:, PPCODE:, and CLEANUP:

blocks, as well as outside the functions. Comments are allowed anywhere after the MODULE keyword. The

compiler will pass the preprocessor directives through untouched and will remove the commented lines.

Comments can be added to XSUBs by placing a # as the first non−whitespace of a line. Care should be

taken to avoid making the comment look like a C preprocessor directive, lest it be interpreted as such. The

simplest way to prevent this is to put whitespace in front of the #.

If you use preprocessor directives to choose one of two versions of a function, use

#if ... version1

#else /* ... version2 */

#endif

and not

#if ... version1

#endif

#if ... version2

#endif

because otherwise xsubpp will believe that you made a duplicate definition of the function. Also, put a blank

line before the #else/#endif so it will not be seen as part of the function body.
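
A minimal sketch of the recommended layout, written as plain C rather than XS. USE_FAST_PATH is a hypothetical macro and twice() a hypothetical function; the point is the single #if/#else/#endif group, with a blank line before #else and #endif.

```c
#include <assert.h>

/* USE_FAST_PATH is a hypothetical feature macro. Exactly one of the
 * two definitions of twice() survives preprocessing. */
#if defined(USE_FAST_PATH)

int twice(int x)
{
    return x * 2;       /* version 1 */
}

#else /* !USE_FAST_PATH: version 2 */

int twice(int x)
{
    return x + x;
}

#endif
```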

Using XS With C++

If a function is defined as a C++ method then it will assume its first argument is an object pointer. The

object pointer will be stored in a variable called THIS. The object should have been created by C++ with the

new() function and should be blessed by Perl with the sv_setref_pv() macro. The blessing of the

object by Perl can be handled by a typemap. An example typemap is shown at the end of this section.

If the method is defined as static it will call the C++ function using the class::method() syntax. If the

method is not static the function will be called using the THIS->method() syntax.

The next examples will use the following C++ class.

class color {

public:

color();

~color();

int blue();

void set_blue( int );

private:

int c_blue;

};

The XSUBs for the blue() and set_blue() methods are defined with the class name but the parameter

for the object (THIS, or "self") is implicit and is not listed.

int

color::blue()

void

color::set_blue( val )

int val

Both functions will expect an object as the first parameter. The xsubpp compiler will call that object THIS

and will use it to call the specified method. So in the C++ code the blue() and set_blue() methods

will be called in the following manner.

RETVAL = THIS->blue();

THIS->set_blue( val );

If the function's name is DESTROY then the C++ delete function will be called and THIS will be given

as its parameter.

void

color::DESTROY()

The C++ code will call delete.

delete THIS;

If the function's name is new then the C++ new function will be called to create a dynamic C++ object. The

XSUB will expect the class name, which will be kept in a variable called CLASS, to be given as the first

argument.

color *

color::new()

The C++ code will call new.

RETVAL = new color();

The following is an example of a typemap that could be used for this C++ example.

TYPEMAP

color * O_OBJECT

OUTPUT

# The Perl object is blessed into 'CLASS', which should be a

# char* having the name of the package for the blessing.

O_OBJECT

sv_setref_pv( $arg, CLASS, (void*)$var );

INPUT

O_OBJECT

if( sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG) )

$var = ($type)SvIV((SV*)SvRV( $arg ));

else{

warn( \"${Package}::$func_name() −− $var is not a blessed SV referenc

XSRETURN_UNDEF;

}

Interface Strategy

When designing an interface between Perl and a C library a straight translation from C to XS is often

sufficient. The interface will often be very C−like and occasionally nonintuitive, especially when the C

function modifies one of its parameters. In cases where the programmer wishes to create a more Perl−like

interface the following strategy may help to identify the more critical parts of the interface.

Identify the C functions which modify their parameters. The XSUBs for these functions may be able to

return lists to Perl, or may be candidates to return undef or an empty list in case of failure.

Identify which values are used by only the C and XSUB functions themselves. If Perl does not need to

access the contents of the value then it may not be necessary to provide a translation for that value from C to

Perl.

Identify the pointers in the C function parameter lists and return values. Some pointers can be handled in XS

with the & unary operator on the variable name while others will require the use of the * operator on the type

name. In general it is easier to work with the & operator.

Identify the structures used by the C functions. In many cases it may be helpful to use the T_PTROBJ

typemap for these structures so they can be manipulated by Perl as blessed objects.

Perl Objects And C Structures

When dealing with C structures one should select either T_PTROBJ or T_PTRREF for the XS type. Both

types are designed to handle pointers to complex objects. The T_PTRREF type will allow the Perl object to

be unblessed while the T_PTROBJ type requires that the object be blessed. By using T_PTROBJ one can

achieve a form of type−checking because the XSUB will attempt to verify that the Perl object is of the

expected type.

The following XS code shows the getnetconfigent() function which is used with ONC+ TIRPC. The

getnetconfigent() function will return a pointer to a C structure and has the C prototype shown

below. The example will demonstrate how the C pointer will become a Perl reference. Perl will consider

this reference to be a pointer to a blessed object and will attempt to call a destructor for the object. A

destructor will be provided in the XS source to free the memory used by getnetconfigent().

Destructors in XS can be created by specifying an XSUB function whose name ends with the word

DESTROY. XS destructors can be used to free memory which may have been malloc'd by another XSUB.

struct netconfig *getnetconfigent(const char *netid);

A typedef will be created for struct netconfig. The Perl object will be blessed in a class matching

the name of the C type, with the tag Ptr appended, and the name should not have embedded spaces if it will

be a Perl package name. The destructor will be placed in a class corresponding to the class of the object and

the PREFIX keyword will be used to trim the name to the word DESTROY as Perl will expect.

typedef struct netconfig Netconfig;

MODULE = RPC PACKAGE = RPC

Netconfig *

getnetconfigent(netid)

char *netid

MODULE = RPC PACKAGE = NetconfigPtr PREFIX = rpcb_

void

rpcb_DESTROY(netconf)

Netconfig *netconf

CODE:

printf("Now in NetconfigPtr::DESTROY\n");

free( netconf );

This example requires the following typemap entry. Consult the typemap section for more information about

adding new typemaps for an extension.

TYPEMAP

Netconfig * T_PTROBJ

This example will be used with the following Perl statements.

use RPC;

$netconf = getnetconfigent("udp");

When Perl destroys the object referenced by $netconf it will send the object to the supplied XSUB

DESTROY function. Perl cannot determine, and does not care, that this object is a C struct and not a Perl

object. In this sense, there is no difference between the object created by the getnetconfigent()

XSUB and an object created by a normal Perl subroutine.
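
The life cycle described above (create a C struct, hand Perl an opaque handle, free the struct in a destructor) can be sketched in plain C. The Netconfig type here is a hypothetical stand-in with a single field, and the handle is an intptr_t, which is similar in spirit to the way the pointer typemaps keep the address in an integer slot. None of these function names come from a real library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct netconfig. */
typedef struct {
    char netid[16];
} Netconfig;

/* Constructor side: allocate the struct and return its address packed
 * into an integer handle. Returns 0 if allocation fails. */
intptr_t netconfig_new(const char *netid)
{
    Netconfig *nc;

    assert(netid != NULL);
    nc = malloc(sizeof *nc);
    if (nc == NULL)
        return 0;
    strncpy(nc->netid, netid, sizeof nc->netid - 1);
    nc->netid[sizeof nc->netid - 1] = '\0';
    return (intptr_t)nc;
}

/* Accessor side: turn the integer handle back into a typed pointer. */
const char *netconfig_id(intptr_t handle)
{
    assert(handle != 0);
    return ((Netconfig *)handle)->netid;
}

/* Destructor side: the counterpart of the DESTROY XSUB, which frees
 * the memory once the last reference goes away. */
void netconfig_destroy(intptr_t handle)
{
    free((Netconfig *)handle);
}
```

With T_PTROBJ the blessing adds a class check on top of this, so a handle of the wrong type is rejected before it is cast.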

The Typemap

The typemap is a collection of code fragments which are used by the xsubpp compiler to map C function

parameters and values to Perl values. The typemap file may consist of three sections labeled TYPEMAP,

INPUT, and OUTPUT. The INPUT section tells the compiler how to translate Perl values into variables of

certain C types. The OUTPUT section tells the compiler how to translate the values from certain C types

into values Perl can understand. The TYPEMAP section tells the compiler which of the INPUT and

OUTPUT code fragments should be used to map a given C type to a Perl value. Each of the sections of the

typemap must be preceded by one of the TYPEMAP, INPUT, or OUTPUT keywords.

The default typemap in the ext directory of the Perl source contains many useful types which can be used

by Perl extensions. Some extensions define additional typemaps which they keep in their own directory.

These additional typemaps may reference INPUT and OUTPUT maps in the main typemap. The xsubpp

compiler will allow the extension's own typemap to override any mappings which are in the default

typemap.

Most extensions which require a custom typemap will need only the TYPEMAP section of the typemap file.

The custom typemap used in the getnetconfigent() example shown earlier demonstrates what may be

the typical use of extension typemaps. That typemap is used to equate a C structure with the T_PTROBJ

typemap. The typemap used by getnetconfigent() is shown here. Note that the C type is separated

from the XS type with a tab and that the C unary operator * is considered to be a part of the C type name.

TYPEMAP

Netconfig *<tab>T_PTROBJ

Here's a more complicated example: suppose that you wanted struct netconfig to be blessed into the

class Net::Config. One way to do this is to use underscores (_) to separate package names, as follows:

typedef struct netconfig * Net_Config;

And then provide a typemap entry T_PTROBJ_SPECIAL that maps underscores to double−colons (::), and

declare Net_Config to be of that type:

TYPEMAP

Net_Config T_PTROBJ_SPECIAL

INPUT

T_PTROBJ_SPECIAL

if (sv_derived_from($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\")) {

IV tmp = SvIV((SV*)SvRV($arg));

$var = ($type) tmp;

}

else

croak(\"$var is not of type ${(my $ntt=$ntype)=~s/_/::/g;\$nt

OUTPUT

T_PTROBJ_SPECIAL

sv_setref_pv($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\",

(void*)$var);

The INPUT and OUTPUT sections substitute double-colons for underscores on the fly, giving the desired

effect. This example demonstrates some of the power and versatility of the typemap facility.

EXAMPLES

File RPC.xs: Interface to some ONC+ RPC bind library functions.

#include "EXTERN.h"

#include "perl.h"

#include "XSUB.h"

#include <rpc/rpc.h>

typedef struct netconfig Netconfig;

MODULE = RPC PACKAGE = RPC

SV *

rpcb_gettime(host="localhost")

char *host

PREINIT:

time_t timep;

CODE:

ST(0) = sv_newmortal();

if( rpcb_gettime( host, &timep ) )

sv_setnv( ST(0), (double)timep );

Netconfig *

getnetconfigent(netid="udp")

char *netid

MODULE = RPC PACKAGE = NetconfigPtr PREFIX = rpcb_

void

rpcb_DESTROY(netconf)

Netconfig *netconf

CODE:

printf("NetconfigPtr::DESTROY\n");

free( netconf );

File typemap: Custom typemap for RPC.xs.

TYPEMAP

Netconfig * T_PTROBJ

File RPC.pm: Perl module for the RPC extension.

package RPC;

require Exporter;

require DynaLoader;

@ISA = qw(Exporter DynaLoader);

@EXPORT = qw(rpcb_gettime getnetconfigent);

bootstrap RPC;

File rpctest.pl: Perl test program for the RPC extension.

use RPC;

$netconf = getnetconfigent();

$a = rpcb_gettime();

print "time = $a\n";

print "netconf = $netconf\n";

$netconf = getnetconfigent("tcp");

$a = rpcb_gettime("poplar");

print "time = $a\n";

print "netconf = $netconf\n";

XS VERSION

This document covers features supported by xsubpp 1.935.

AUTHOR

Dean Roehrich <roehrich@cray.com>, Jul 8, 1996

perlxstut Perl Programmers Reference Guide perlxstut

NAME

perlXStut − Tutorial for XSUBs

DESCRIPTION

This tutorial will educate the reader on the steps involved in creating a Perl extension. The reader is assumed

to have access to perlguts and perlxs.

This tutorial starts with very simple examples and becomes more complex, with each new example adding

new features. Certain concepts may not be completely explained until later in the tutorial to ease the reader

slowly into building extensions.

VERSION CAVEAT

This tutorial tries hard to keep up with the latest development versions of Perl. This means that it is

sometimes ahead of the latest released version of Perl, and that certain features described here might not

work on earlier versions. This section will keep track of when various features were added to Perl 5.

In versions of Perl 5.002 prior to the gamma version, the test script in Example 1 will not function

properly. You need to change the "use lib" line to read:

use lib './blib';

In versions of Perl 5.002 prior to version beta 3, the line in the .xs file about "PROTOTYPES:

DISABLE" will cause a compiler error. Simply remove that line from the file.

In versions of Perl 5.002 prior to version 5.002b1h, the test.pl file was not automatically created by

h2xs. This means that you cannot say "make test" to run the test script. You will need to add the

following line before the "use extension" statement:

use lib './blib';

In versions 5.000 and 5.001, instead of using the above line, you will need to use the following line:

BEGIN { unshift(@INC, "./blib") }

This document assumes that the executable named "perl" is Perl version 5. Some systems may have

installed Perl version 5 as "perl5".

DYNAMIC VERSUS STATIC

It is commonly thought that if a system does not have the capability to load a library dynamically, you

cannot build XSUBs. This is incorrect. You can build them, but you must link the XSUB's subroutines with

the rest of Perl, creating a new executable. This situation is similar to Perl 4.

This tutorial can still be used on such a system. The XSUB build mechanism will check the system and

build a dynamically−loadable library if possible, or else a static library and then, optionally, a new

statically−linked executable with that static library linked in.

Should you wish to build a statically−linked executable on a system which can dynamically load libraries,

you may, in all the following examples, where the command "make" with no arguments is executed, run the

command "make perl" instead.

If you have generated such a statically−linked executable by choice, then instead of saying "make test", you

should say "make test_static". On systems that cannot build dynamically−loadable libraries at all, simply

saying "make test" is sufficient.

EXAMPLE 1

Our first extension will be very simple. When we call the routine in the extension, it will print out a

well−known message and return.

Run h2xs -A -n Mytest. This creates a directory named Mytest, possibly under ext/ if that directory

exists in the current working directory. Several files will be created in the Mytest dir, including

MANIFEST, Makefile.PL, Mytest.pm, Mytest.xs, test.pl, and Changes.

The MANIFEST file contains the names of all the files created.

The file Makefile.PL should look something like this:

use ExtUtils::MakeMaker;

# See lib/ExtUtils/MakeMaker.pm for details of how to influence

# the contents of the Makefile that is written.

WriteMakefile(

'NAME' => 'Mytest',

'VERSION_FROM' => 'Mytest.pm', # finds $VERSION

'LIBS' => [''], # e.g., '-lm'

'DEFINE' => '', # e.g., '-DHAVE_SOMETHING'

'INC' => '', # e.g., '-I/usr/include/other'

);

The file Mytest.pm should start with something like this:

package Mytest;

require Exporter;

require DynaLoader;

@ISA = qw(Exporter DynaLoader);

# Items to export into callers namespace by default. Note: do not export

# names by default without a very good reason. Use EXPORT_OK instead.

# Do not simply export all your public functions/methods/constants.

@EXPORT = qw(

);

$VERSION = '0.01';

bootstrap Mytest $VERSION;

# Preloaded methods go here.

# Autoload methods go after __END__, and are processed by the autosplit progr

__END__

# Below is the stub of documentation for your module. You better edit it!

And the Mytest.xs file should look something like this:

#ifdef __cplusplus

extern "C" {

#endif

#include "EXTERN.h"

#include "perl.h"

#include "XSUB.h"

#ifdef __cplusplus

}

#endif

PROTOTYPES: DISABLE

MODULE = Mytest PACKAGE = Mytest

Let's edit the .xs file by adding this to the end of the file:

void

hello()

CODE:

printf("Hello, world!\n");

Now we'll run "perl Makefile.PL". This will create a real Makefile, which make needs. Its output looks

something like:

% perl Makefile.PL

Checking if your kit is complete...

Looks good

Writing Makefile for Mytest

Now, running make will produce output that looks something like this (some long lines shortened for

clarity):

% make

umask 0 && cp Mytest.pm ./blib/Mytest.pm

perl xsubpp -typemap typemap Mytest.xs >Mytest.tc && mv Mytest.tc Mytest.c

cc -c Mytest.c

Running Mkbootstrap for Mytest ()

chmod 644 Mytest.bs

LD_RUN_PATH="" ld -o ./blib/PA-RISC1.1/auto/Mytest/Mytest.sl -b Mytest.o

chmod 755 ./blib/PA-RISC1.1/auto/Mytest/Mytest.sl

cp Mytest.bs ./blib/PA-RISC1.1/auto/Mytest/Mytest.bs

chmod 644 ./blib/PA-RISC1.1/auto/Mytest/Mytest.bs

Now, although there is already a test.pl template ready for us, for this example only, we'll create a special

test script. Create a file called hello that looks like this:

#! /opt/perl5/bin/perl

use ExtUtils::testlib;

use Mytest;

Mytest::hello();

Now we run the script and we should see the following output:

% perl hello

Hello, world!

EXAMPLE 2

Now let's add to our extension a subroutine that will take a single argument and return 1 if the argument is

even, 0 if the argument is odd.

Add the following to the end of Mytest.xs:

int

is_even(input)

int input

CODE:

RETVAL = (input % 2 == 0);

OUTPUT:

RETVAL

There does not need to be white space at the start of the "int input" line, but it is useful for improving

readability. The semi−colon at the end of that line is also optional.

Any white space may be between the "int" and "input". It is also okay for the four lines starting at the

"CODE:" line to not be indented. However, for readability purposes, it is suggested that you indent them 8

spaces (or one normal tab stop).
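
For comparison, the C function that this XSUB amounts to can be sketched on its own, leaving out the argument conversion that xsubpp adds around it. The expression is the one from the CODE: section above.

```c
#include <assert.h>

/* Same expression as the CODE: section: 1 for even input, 0 for odd. */
int is_even(int input)
{
    return (input % 2 == 0);
}
```

Because the test compares the remainder with zero, negative odd numbers (whose remainder is -1 in C) are still reported as odd.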

Now rerun make to rebuild our new shared library.

Now perform the same steps as before, generating a Makefile from the Makefile.PL file, and running make.

To test that our extension works, we now need to look at the file test.pl. This file is set up to imitate the

same kind of testing structure that Perl itself has. Within the test script, you perform a number of tests to

confirm the behavior of the extension, printing "ok" when the test is correct, "not ok" when it is not. Change

the print statement in the BEGIN block to print "1..4", and add the following code to the end of the file:

print &Mytest::is_even(0) == 1 ? "ok 2" : "not ok 2", "\n";

print &Mytest::is_even(1) == 0 ? "ok 3" : "not ok 3", "\n";

print &Mytest::is_even(2) == 1 ? "ok 4" : "not ok 4", "\n";

We will be calling the test script through the command "make test". You should see output that looks

something like this:

% make test

PERL_DL_NONLAZY=1 /opt/perl5.002b2/bin/perl (lots of -I arguments) test.pl

1..4

ok 1

ok 2

ok 3

ok 4

WHAT HAS GONE ON?

The program h2xs is the starting point for creating extensions. In later examples we'll see how we can use

h2xs to read header files and generate templates to connect to C routines.

h2xs creates a number of files in the extension directory. The file Makefile.PL is a perl script which will

generate a true Makefile to build the extension. We'll take a closer look at it later.

The files <extension>.pm and <extension>.xs contain the meat of the extension. The .xs file holds the C

routines that make up the extension. The .pm file contains routines that tell Perl how to load your extension.

Generating and invoking the Makefile created a directory blib (which stands for "build library") in the

current working directory. This directory will contain the shared library that we will build. Once we have

tested it, we can install it into its final location.

Invoking the test script via "make test" did something very important. It invoked perl with all those -I

arguments so that it could find the various files that are part of the extension.

While you are still testing extensions, it is very important that you use "make test". If you try to run the

test script all by itself, you will get a fatal error.

Another reason it is important to use "make test" to run your test script is that if you are testing an upgrade to

an already-existing version, using "make test" ensures that you use your new extension, not the

already-existing version.

When Perl sees a use extension;, it searches for a file with the same name as the use'd extension that

has a .pm suffix. If that file cannot be found, Perl dies with a fatal error. The default search path is

contained in the @INC array.

In our case, Mytest.pm tells perl that it will need the Exporter and Dynamic Loader extensions. It then sets

the @ISA and @EXPORT arrays and the $VERSION scalar; finally it tells perl to bootstrap the module.

Perl will call its dynamic loader routine (if there is one) and load the shared library.

The two arrays that are set in the .pm file are very important. The @ISA array contains a list of other

packages in which to search for methods (or subroutines) that do not exist in the current package. The

@EXPORT array tells Perl which of the extension's routines should be placed into the calling package's

namespace.

It's important to select what to export carefully. Do NOT export method names and do NOT export anything

else by default without a good reason.

As a general rule, if the module is trying to be object-oriented then don't export anything. If it's just a

collection of functions then you can export any of the functions via another array, called @EXPORT_OK.

See perlmod for more information.

The $VERSION variable is used to ensure that the .pm file and the shared library are "in sync" with each

other. Any time you make changes to the .pm or .xs files, you should increment the value of this variable.

WRITING GOOD TEST SCRIPTS

The importance of writing good test scripts cannot be overemphasized. You should closely follow the

"ok/not ok" style that Perl itself uses, so that it is very easy and unambiguous to determine the outcome of

each test case. When you find and fix a bug, make sure you add a test case for it.
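
The "ok/not ok" line format itself is simple enough to sketch. The helper below is hypothetical and written in C only to stay consistent with the other sketches in this document; a real test.pl would print these lines from Perl, as the examples in this tutorial do.

```c
#include <assert.h>
#include <stdio.h>

/* Format one result line in the "ok N" / "not ok N" style.
 * Returns the number of characters snprintf would have written. */
int format_test_line(char *buf, size_t size, int test_no, int passed)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%s %d\n", passed ? "ok" : "not ok", test_no);
}
```

Keeping the test number on every line is what lets a harness match results against the "1..N" plan printed at the start.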

By running "make test", you ensure that your test.pl script runs and uses the correct version of your

extension. If you have many test cases, you might want to copy Perl's test style. Create a directory named

"t", and ensure all your test files end with the suffix ".t". The Makefile will properly run all these test files.

EXAMPLE 3

Our third extension will take one argument as its input, round off that value, and set the argument to the

rounded value.

Add the following to the end of Mytest.xs:

void

round(arg)

double arg

CODE:

if (arg > 0.0) {

arg = floor(arg + 0.5);

} else if (arg < 0.0) {

arg = ceil(arg - 0.5);

} else {

arg = 0.0;

}

OUTPUT:

arg

Edit the Makefile.PL file so that the corresponding line looks like this:

'LIBS' => ['-lm'], # e.g., '-lm'

Generate the Makefile and run make. Change the BEGIN block to print out "1..9" and add the following to

test.pl:

$i = -1.5; &Mytest::round($i); print $i == -2.0 ? "ok 5" : "not ok 5", "\n";

$i = -1.1; &Mytest::round($i); print $i == -1.0 ? "ok 6" : "not ok 6", "\n";

$i = 0.0; &Mytest::round($i); print $i == 0.0 ? "ok 7" : "not ok 7", "\n";

$i = 0.5; &Mytest::round($i); print $i == 1.0 ? "ok 8" : "not ok 8", "\n";

$i = 1.2; &Mytest::round($i); print $i == 1.0 ? "ok 9" : "not ok 9", "\n";

Running "make test" should now print out that all nine tests are okay.
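
The rounding rule these five tests exercise can also be checked outside Perl. The sketch below follows the three branches of the XSUB, with one labeled substitution: floor() and ceil() are replaced by a truncating cast so that the sketch does not need the math library. That substitution is an assumption of this sketch and is only valid while the value fits in a long long; the function name is hypothetical.

```c
#include <assert.h>

/* Round half away from zero, following the branches of the XSUB.
 * Assumption: a truncating cast stands in for floor()/ceil(), which
 * gives the same result only for values that fit in a long long. */
double round_half_away(double arg)
{
    if (arg > 0.0)
        return (double)(long long)(arg + 0.5);   /* floor(arg + 0.5) */
    else if (arg < 0.0)
        return (double)(long long)(arg - 0.5);   /* ceil(arg - 0.5) */
    return 0.0;
}
```

The cast works because arg + 0.5 is positive in the first branch and arg - 0.5 is negative in the second, so truncation toward zero matches floor() and ceil() respectively.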

You might be wondering if you can round a constant. To see what happens, add the following line to test.pl

temporarily:

&Mytest::round(3);

Run "make test" and notice that Perl dies with a fatal error. Perl won't let you change the value of constants!

WHAT'S NEW HERE?

Two things are new here. First, we've made some changes to Makefile.PL. In this case, we've specified an

extra library to link in, the math library libm. We'll talk later about how to write XSUBs that can call every

routine in a library.

Second, the value of the function is being passed back not as the function's return value, but through the

same variable that was passed into the function.

INPUT AND OUTPUT PARAMETERS

You specify the parameters that will be passed into the XSUB just after you declare the function return value

and name. Each parameter line starts with optional white space, and may have an optional terminating

semicolon.

The list of output parameters occurs after the OUTPUT: directive. The use of RETVAL tells Perl that you

wish to send this value back as the return value of the XSUB function. In Example 3, the value we wanted

returned was contained in the same variable we passed in, so we listed it (and not RETVAL) in the

OUTPUT: section.

THE XSUBPP COMPILER

The compiler xsubpp takes the XS code in the .xs file and converts it into C code, placing it in a file whose

suffix is .c. The C code created makes heavy use of the C functions within Perl.

THE TYPEMAP FILE

The xsubpp compiler uses rules to convert from Perl's data types (scalar, array, etc.) to C's data types (int,

char *, etc.). These rules are stored in the typemap file ($PERLLIB/ExtUtils/typemap). This file is

split into three parts.

The first part attempts to map various C data types to a coded flag, which has some correspondence with the

various Perl types. The second part contains C code which xsubpp uses for input parameters. The third part

contains C code which xsubpp uses for output parameters. We'll talk more about the C code later.

Let's now take a look at a portion of the .c file created for our extension.

XS(XS_Mytest_round)
{
    dXSARGS;
    if (items != 1)
        croak("Usage: Mytest::round(arg)");
    {
        double arg = (double)SvNV(ST(0)); /* XXXXX */
        if (arg > 0.0) {
            arg = floor(arg + 0.5);
        } else if (arg < 0.0) {
            arg = ceil(arg - 0.5);
        } else {
            arg = 0.0;
        }
        sv_setnv(ST(0), (double)arg); /* XXXXX */
    }
    XSRETURN(1);
}

Notice the two lines marked with "XXXXX". If you check the first section of the typemap file, you'll see that doubles are of type T_DOUBLE. In the INPUT section, an argument that is T_DOUBLE is converted by calling the routine SvNV on the corresponding stack entry, casting the result to double, and assigning it to the variable arg. Similarly, in the OUTPUT section, once arg has its final value, it is passed to the sv_setnv function to be passed back to the calling subroutine. These two functions are explained in perlguts; we'll talk more later about what that "ST(0)" means in the section on the argument stack.
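
Stripped of the Perl specifics, the two marked lines follow a simple pattern: convert the incoming scalar to a C value, compute, and convert the result back into the same scalar. The following is only a toy sketch of that pattern in plain C. The toy_sv structure and the toy_SvNV, toy_sv_setnv and toy_round names are invented for illustration and are not part of Perl. The rounding here uses integer truncation rather than floor and ceil so that the sketch needs no math library, which limits it to values that fit in a long.

```c
/* Toy stand-in for a Perl scalar; the real SV is far richer. */
struct toy_sv {
    double nv;
};

/* Toy analogue of SvNV: read the numeric value out of the scalar. */
static double
toy_SvNV(const struct toy_sv *sv)
{
    return sv->nv;
}

/* Toy analogue of sv_setnv: store a numeric value into the scalar. */
static void
toy_sv_setnv(struct toy_sv *sv, double value)
{
    sv->nv = value;
}

/* Shape of the generated wrapper: convert in, compute, convert out. */
void
toy_round(struct toy_sv *st0)
{
    double arg = toy_SvNV(st0);              /* INPUT conversion */

    if (arg > 0.0) {
        arg = (double)(long)(arg + 0.5);     /* acts like floor(arg + 0.5) */
    } else if (arg < 0.0) {
        arg = (double)(long)(arg - 0.5);     /* acts like ceil(arg - 0.5) */
    } else {
        arg = 0.0;
    }

    toy_sv_setnv(st0, arg);                  /* OUTPUT conversion */
}
```

Calling toy_round on a toy_sv holding 3.5 leaves 4.0 in it, and one holding -3.5 ends up holding -4.0. The caller's own variable is changed, as in Example 3.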

WARNING

In general, it's not a good idea to write extensions that modify their input parameters, as in Example 3. However, to better accommodate calls to pre-existing C routines, which often do modify their input parameters, this behavior is tolerated. The next example will show how to do this.

EXAMPLE 4

In this example, we‘ll now begin to write XSUBs that will interact with predefined C libraries. To begin

with, we will build a small library of our own, then let h2xs write our .pm and .xs files for us.

Create a new directory called Mytest2 at the same level as the directory Mytest. In the Mytest2 directory,

create another directory called mylib, and cd into that directory.

Here we‘ll create some files that will generate a test library. These will include a C source file and a header

file. We‘ll also create a Makefile.PL in this directory. Then we‘ll make sure that running make at the

Mytest2 level will automatically run this Makefile.PL file and the resulting Makefile.

In the mylib directory, create a file mylib.h that looks like this:

#define TESTVAL 4

extern double foo(int, long, const char*);

Also create a file mylib.c that looks like this:

#include <stdlib.h>
#include "./mylib.h"

double
foo(int a, long b, const char *c)
{
    return (a + b + atof(c) + TESTVAL);
}

And finally create a file Makefile.PL that looks like this:

use ExtUtils::MakeMaker;
$Verbose = 1;
WriteMakefile(
    NAME  => 'Mytest2::mylib',
    SKIP  => [qw(all static static_lib dynamic dynamic_lib)],
    clean => {'FILES' => 'libmylib$(LIB_EXT)'},
);

sub MY::top_targets {
'
all :: static

static :: libmylib$(LIB_EXT)

libmylib$(LIB_EXT): $(O_FILES)
	$(AR) cr libmylib$(LIB_EXT) $(O_FILES)
	$(RANLIB) libmylib$(LIB_EXT)

';
}

We will now create the main top−level Mytest2 files. Change to the directory above Mytest2 and run the

following command:

% h2xs -O -n Mytest2 ./Mytest2/mylib/mylib.h

This will print out a warning about overwriting Mytest2, but that‘s okay. Our files are stored in

Mytest2/mylib, and will be untouched.

The normal Makefile.PL that h2xs generates doesn‘t know about the mylib directory. We need to tell it that

there is a subdirectory and that we will be generating a library in it. Let‘s add the following key−value pair

to the WriteMakefile call:

'MYEXTLIB' => 'mylib/libmylib$(LIB_EXT)',

and a new replacement subroutine too:

sub MY::postamble {
'
$(MYEXTLIB): mylib/Makefile
	cd mylib && $(MAKE) $(PASTHRU)
';
}

(Note: Most makes will require that there be a tab character that indents the line cd mylib && $(MAKE)

$(PASTHRU), similarly for the Makefile in the subdirectory.)

Let‘s also fix the MANIFEST file so that it accurately reflects the contents of our extension. The single line

that says "mylib" should be replaced by the following three lines:

mylib/Makefile.PL

mylib/mylib.c

mylib/mylib.h

To keep our namespace nice and unpolluted, edit the .pm file and change the lines setting @EXPORT to

@EXPORT_OK (there are two: one in the line beginning "use vars" and one setting the array itself).

Finally, in the .xs file, edit the #include line to read:

#include "mylib/mylib.h"

And also add the following function definition to the end of the .xs file:

double
foo(a,b,c)
        int             a
        long            b
        const char *    c
    OUTPUT:
        RETVAL

Now we also need to create a typemap file because the default Perl doesn‘t currently support the const char *

type. Create a file called typemap and place the following in it:

const char * T_PV

Now run perl on the top−level Makefile.PL. Notice that it also created a Makefile in the mylib directory.

Run make and see that it does cd into the mylib directory and run make in there as well.

Now edit the test.pl script and change the BEGIN block to print "1..4", and add the following lines to the end

of the script:

print &Mytest2::foo(1, 2, "Hello, world!") == 7 ? "ok 2\n" : "not ok 2\n";

print &Mytest2::foo(1, 2, "0.0") == 7 ? "ok 3\n" : "not ok 3\n";

print abs(&Mytest2::foo(0, 0, "-3.4") - 0.6) <= 0.01 ? "ok 4\n" : "not ok 4\n";

(When comparing floating-point values, it is often better not to test for exact equality but to check that the difference is below a chosen epsilon, 0.01 in this case.)
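
To see why the second test expects 7 for a string that is not a number at all, it may help to run the same arithmetic outside Perl. In the sketch below, foo repeats the body from mylib.c. The standard atof returns 0.0 when its argument does not begin with a number, so only a, b and TESTVAL contribute. nearly_equal is a hypothetical helper, not part of the extension, that spells out the epsilon comparison used in the last test.

```c
#include <stdlib.h>

#define TESTVAL 4

/* Same arithmetic as foo in mylib.c. */
double
foo(int a, long b, const char *c)
{
    return (a + b + atof(c) + TESTVAL);
}

/* Hypothetical helper: true when x and y differ by at most eps. */
int
nearly_equal(double x, double y, double eps)
{
    double diff = x - y;

    if (diff < 0.0)
        diff = -diff;
    return diff <= eps;
}
```

With these definitions, foo(1, 2, "Hello, world!") and foo(1, 2, "0.0") both give exactly 7. foo(0, 0, "-3.4") is only approximately 0.6, so it is the one case that needs nearly_equal.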

Run "make test" and all should be well.

WHAT HAS HAPPENED HERE?

Unlike previous examples, we‘ve now run h2xs on a real include file. This has caused some extra goodies to

appear in both the .pm and .xs files.

In the .xs file, there‘s now a #include declaration with the full path to the mylib.h header file.

There‘s now some new C code that‘s been added to the .xs file. The purpose of the constant

routine is to make the values that are #define‘d in the header file available to the Perl script (in this

case, by calling &main::TESTVAL). There‘s also some XS code to allow calls to the constant

routine.

The .pm file has exported the name TESTVAL in the @EXPORT array. This could lead to name

clashes. A good rule of thumb is that if the #define is going to be used by only the C routines

themselves, and not by the user, they should be removed from the @EXPORT array. Alternately, if

you don‘t mind using the "fully qualified name" of a variable, you could remove most or all of the

items in the @EXPORT array.

If our include file contained #include directives, these would not be processed at all by h2xs. There is

no good solution to this right now.

We've also told Perl about the library that we built in the mylib subdirectory. That required only the addition of the MYEXTLIB variable to the WriteMakefile call and the replacement of the postamble subroutine to cd into the subdirectory and run make. The Makefile.PL for the library is a bit more complicated, but not excessively so. There we replaced the top_targets subroutine to insert our own code. This code specified simply that the library to be created here was a static archive (as opposed to a dynamically loadable library) and provided the commands to build it.

SPECIFYING ARGUMENTS TO XSUBPP

With the completion of Example 4, we now have an easy way to simulate some real−life libraries whose

interfaces may not be the cleanest in the world. We shall now continue with a discussion of the arguments

passed to the xsubpp compiler.

When you specify arguments in the .xs file, you are really passing three pieces of information for each one listed. The first piece is the order of that argument relative to the others (first, second, etc.). The second is the type of argument, and consists of the type declaration of the argument (e.g., int, char*, etc.). The third piece is the exact way in which the argument should be used in the call to the library function from this XSUB: whether or not a "&" is placed before the argument, which means that the library function expects to be passed the address of the specified data type.

There is a difference between the two arguments in this hypothetical function:

int

foo(a,b)

char &a

char * b

The first argument to this function would be treated as a char and assigned to the variable a, and its address

would be passed into the function foo. The second argument would be treated as a string pointer and

assigned to the variable b. The value of b would be passed into the function foo. The actual call to the

function foo that xsubpp generates would look like this:

foo(&a, b);

xsubpp parses the following function argument lists identically:

char &a

char&a

char & a

However, to help ease understanding, it is suggested that you place a "&" next to the variable name (and away from the variable type), and place a "*" near the variable type, but away from the variable name (as in the complete example above). By doing so, it is easy to understand exactly what will be passed to the C function: it will be whatever is in the "last column".

You should take great pains to try to pass the function the type of variable it wants, when possible. It will

save you a lot of trouble in the long run.
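
The difference between "char &a" and "char * b" can be seen in plain C. The sketch below is hypothetical: the body of foo is invented to give the call something to do, and call_foo only approximates what a generated wrapper amounts to once the Perl-specific conversions are left out.

```c
/* Hypothetical library routine: it expects the address of a char, which
 * it overwrites, and an ordinary NUL-terminated string. */
int
foo(char *a, char *b)
{
    *a = b[0];
    return (int)*a;
}

/* Rough equivalent of the wrapper for "char &a" and "char * b":
 * a is held as a plain char and its address is passed on, while b is
 * already a pointer and is passed on unchanged. */
int
call_foo(char a, char *b)
{
    return foo(&a, b);
}
```

In both cases the variable in the wrapper has the type written in the .xs file. Only the expression handed to the library function differs.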

THE ARGUMENT STACK

If you look at the C code generated for any of the examples except Example 1, you will notice a number of references to ST(n), where n is usually 0. The "ST" is actually a macro that points to the n'th argument on the argument stack. ST(0) is thus the first argument passed to the XSUB, ST(1) is the second argument, and so on.

When you list the arguments to the XSUB in the .xs file, that tells xsubpp which argument corresponds to which position on the argument stack (i.e., the first one listed is the first argument, and so on). You invite disaster if you do not list them in the same order as the function expects them.
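
One way to picture ST(n) is as an index into an array, counted from the slot where the current call's arguments begin. The following is a toy model only, assuming a fixed-size array of strings in place of Perl's stack of scalars. TOY_ST, toy_stack, toy_ax, toy_enter_call and toy_argument are invented names.

```c
/* Toy model only: the real macro indexes Perl's own stack of SV pointers. */
static const char *toy_stack[8];
static int toy_ax;                  /* slot holding the first argument */

#define TOY_ST(n) (toy_stack[toy_ax + (n)])

/* Pretend a call with two arguments has just been made, with one
 * unrelated entry already sitting below them on the stack. */
void
toy_enter_call(void)
{
    toy_stack[0] = "left over from the caller";
    toy_stack[1] = "first argument";
    toy_stack[2] = "second argument";
    toy_ax = 1;                     /* this call's arguments start here */
}

/* Fetch the n'th argument of the current call, counting from zero. */
const char *
toy_argument(int n)
{
    return TOY_ST(n);
}
```

After toy_enter_call, toy_argument(0) yields the first argument and toy_argument(1) the second, whatever else happens to sit lower on the stack.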

EXTENDING YOUR EXTENSION

Sometimes you might want to provide some extra methods or subroutines to assist in making the interface

between Perl and your extension simpler or easier to understand. These routines should live in the .pm file.

Whether they are automatically loaded when the extension itself is loaded or loaded only when called

depends on where in the .pm file the subroutine definition is placed.

DOCUMENTING YOUR EXTENSION

There is absolutely no excuse for not documenting your extension. Documentation belongs in the .pm file.

This file will be fed to pod2man, and the embedded documentation will be converted to the manpage format,

then placed in the blib directory. It will be copied to Perl‘s man page directory when the extension is

installed.

You may intersperse documentation and Perl code within the .pm file. In fact, if you want to use method

autoloading, you must do this, as the comment inside the .pm file explains.

See perlpod for more information about the pod format.

INSTALLING YOUR EXTENSION

Once your extension is complete and passes all its tests, installing it is quite simple: you simply run "make

install". You will either need to have write permission into the directories where Perl is installed, or ask your

system administrator to run the make for you.

SEE ALSO

For more information, consult perlguts, perlxs, perlmod, and perlpod.

Author

Jeff Okamoto <okamoto@corp.hp.com>

Reviewed and assisted by Dean Roehrich, Ilya Zakharevich, Andreas Koenig, and Tim Bunce.

Last Changed

1996/7/10

perlguts Perl Programmers Reference Guide perlguts

NAME

perlguts − Perl‘s Internal Functions

DESCRIPTION

This document attempts to describe some of the internal functions of the Perl executable. It is far from

complete and probably contains many errors. Please refer any questions or comments to the author below.

Variables

Datatypes

Perl has three typedefs that handle Perl‘s three main data types:

SV Scalar Value

AV Array Value

HV Hash Value

Each typedef has specific routines that manipulate the various data types.

What is an "IV"?

Perl uses a special typedef IV which is a simple integer type that is guaranteed to be large enough to hold a

pointer (as well as an integer).

Perl also uses two special typedefs, I32 and I16, which will always be at least 32−bits and 16−bits long,

respectively.
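
The practical consequence of an integer type that can hold a pointer is that an address can be stored in an integer slot and recovered later. The sketch below illustrates the idea with standard C's intptr_t, which plays a similar role. Treating it as a stand-in for IV is an assumption of this example, and toy_iv, pointer_to_iv and iv_to_pointer are invented names.

```c
#include <stdint.h>

/* intptr_t is standard C's integer type wide enough for a pointer;
 * it stands in here for IV, which it only resembles. */
typedef intptr_t toy_iv;

/* Store an address in an integer. */
toy_iv
pointer_to_iv(void *p)
{
    return (toy_iv)p;
}

/* Recover the address from the integer. */
void *
iv_to_pointer(toy_iv iv)
{
    return (void *)iv;
}
```

A pointer passed through pointer_to_iv and back through iv_to_pointer compares equal to the original.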

Working with SVs

An SV can be created and loaded with one command. There are four types of values that can be loaded: an integer value (IV), a double (NV), a string (PV), and another scalar (SV).

The six routines are:

SV* newSViv(IV);

SV* newSVnv(double);

SV* newSVpv(char*, int);

SV* newSVpvn(char*, int);

SV* newSVpvf(const char*, ...);

SV* newSVsv(SV*);

To change the value of an *already-existing* SV, there are eight routines:

void sv_setiv(SV*, IV);

void sv_setuv(SV*, UV);

void sv_setnv(SV*, double);

void sv_setpv(SV*, char*);

void sv_setpvn(SV*, char*, int);

void sv_setpvf(SV*, const char*, ...);

void sv_setpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);

void sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be assigned by using sv_setpvn,

newSVpvn, or newSVpv, or you may allow Perl to calculate the length by using sv_setpv or by

specifying 0 as the second argument to newSVpv. Be warned, though, that Perl will determine the string‘s

length by using strlen, which depends on the string terminating with a NUL character.
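
The difference between a measured and a counted length shows up as soon as the data contains a NUL. This small sketch uses only the standard C library. The data array and the two function names are invented for the example.

```c
#include <string.h>

/* Five bytes of data with a NUL in the middle, followed by the
 * terminator that the compiler appends to a string literal. */
static const char data[] = "ab\0cd";

/* Length as strlen sees it: counting stops at the first NUL. */
size_t
measured_length(void)
{
    return strlen(data);
}

/* Length as the caller knows it: every byte except the final terminator. */
size_t
counted_length(void)
{
    return sizeof(data) - 1;
}
```

Here measured_length gives 2 and counted_length gives 5. Letting the length be computed with strlen would silently drop the last three bytes.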

The arguments of sv_setpvf are processed like sprintf, and the formatted output becomes the value.

sv_setpvfn is an analogue of vsprintf, but it allows you to specify either a pointer to a variable

argument list or the address and length of an array of SVs. The last argument points to a boolean; on return,

if that boolean is true, then locale−specific information has been used to format the string, and the string‘s

contents are therefore untrustworthy (see perlsec). This pointer may be NULL if that information is not

important. Note that this function requires you to specify the length of the format.

The sv_set*() functions are not generic enough to operate on values that have "magic". See

Magic Virtual Tables later in this document.

All SVs that contain strings should be terminated with a NUL character. If it is not NUL−terminated there is

a risk of core dumps and corruptions from code which passes the string to C functions or system calls which

expect a NUL−terminated string. Perl‘s own functions typically add a trailing NUL for this reason.

Nevertheless, you should be very careful when you pass a string stored in an SV to a C function or system

call.

To access the actual value that an SV points to, you can use the macros:

SvIV(SV*)

SvNV(SV*)

SvPV(SV*, STRLEN len)

which will automatically coerce the actual scalar type into an IV, double, or string.

In the SvPV macro, the length of the string returned is placed into the variable len (this is a macro, so you

do not use &len). If you do not care what the length of the data is, use the global variable PL_na.

Remember, however, that Perl allows arbitrary strings of data that may both contain NULs and might not be

terminated by a NUL.

If you want to know if the scalar value is TRUE, you can use:

SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force Perl to allocate more memory for

your SV, you can use the macro

SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will call the function sv_grow. Note that SvGROW can only increase, not decrease, the allocated memory of an SV and that it does not automatically add a byte for the trailing NUL (perl's own string functions typically do SvGROW(sv, len + 1)).
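
The two properties just described, growing only and leaving the extra byte to the caller, can be modelled in a few lines of plain C. This is a toy sketch: toy_buf, toy_grow and toy_set are invented names, and realloc stands in for whatever sv_grow does internally, which the sketch does not try to reproduce.

```c
#include <stdlib.h>
#include <string.h>

/* Toy string buffer; not Perl's real layout. */
struct toy_buf {
    char   *ptr;
    size_t  len;    /* number of bytes allocated */
};

/* Toy analogue of SvGROW: make sure at least newlen bytes are allocated.
 * Never shrinks. Returns the buffer pointer, or NULL if allocation fails. */
char *
toy_grow(struct toy_buf *buf, size_t newlen)
{
    if (newlen > buf->len) {
        char *p = realloc(buf->ptr, newlen);

        if (p == NULL)
            return NULL;
        buf->ptr = p;
        buf->len = newlen;
    }
    return buf->ptr;
}

/* Copy a C string in, asking for strlen + 1 bytes so the NUL fits.
 * Returns 1 on success and 0 if memory could not be obtained. */
int
toy_set(struct toy_buf *buf, const char *s)
{
    size_t n = strlen(s);

    if (toy_grow(buf, n + 1) == NULL)
        return 0;
    memcpy(buf->ptr, s, n + 1);
    return 1;
}
```

Storing "hello" into an empty toy_buf leaves six bytes allocated. A later toy_grow request for three bytes changes nothing, because the request is smaller than what is already there.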

If you have an SV and want to know what kind of data Perl thinks is stored in it, you can use the following

macros to check the type of SV you have.

SvIOK(SV*)

SvNOK(SV*)

SvPOK(SV*)

You can get and set the current length of the string stored in an SV with the following macros:

SvCUR(SV*)

SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV with the macro:

SvEND(SV*)

But note that these last three macros are valid only if SvPOK() is true.

If you want to append something to the end of the string stored in an SV*, you can use the following functions:

void sv_catpv(SV*, char*);

void sv_catpvn(SV*, char*, int);

void sv_catpvf(SV*, const char*, ...);

void sv_catpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);

void sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by using strlen. In the second, you

specify the length of the string yourself. The third function processes its arguments like sprintf and

appends the formatted output. The fourth function works like vsprintf. You can specify the address and

length of an array of SVs instead of the va_list argument. The fifth function extends the string stored in the

first SV with the string stored in the second SV. It also forces the second SV to be interpreted as a string.

The sv_cat*() functions are not generic enough to operate on values that have "magic". See

Magic Virtual Tables later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV by using the following:

SV* perl_get_sv("package::varname", FALSE);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually defined, you can call:

SvOK(SV*)

The scalar undef value is stored in an SV instance called PL_sv_undef. Its address can be used

whenever an SV* is needed.

There are also the two values PL_sv_yes and PL_sv_no, which contain Boolean TRUE and FALSE

values, respectively. Like PL_sv_undef, their addresses can be used whenever an SV* is needed.

Do not be fooled into thinking that (SV *) 0 is the same as &PL_sv_undef. Take this code:

SV* sv = (SV*) 0;
if (I-am-to-return-a-real-value) {
    sv = sv_2mortal(newSViv(42));
}
sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should return a real value, or undef

otherwise. Instead it has returned a NULL pointer which, somewhere down the line, will cause a

segmentation violation, bus error, or just weird results. Change the zero to &PL_sv_undef in the first line

and all will be well.

To free an SV that you‘ve created, call SvREFCNT_dec(SV*). Normally this call is not necessary (see

Reference Counts and Mortality).

What‘s Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is to use Sv*OK macros. Because a scalar can be both a number and a string, these macros will usually return TRUE, and calling the Sv*V macros will do the appropriate conversion of string to integer/double or integer/double to string.

If you really need to know if you have an integer, double, or string pointer in an SV, you can use the

following three macros instead:

SvIOKp(SV*)

SvNOKp(SV*)

SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer stored in your SV. The "p" stands

for private.

In general, though, it‘s best to use the Sv*V macros.

Working with AVs

There are two ways to create and load an AV. The first method creates an empty AV:

AV* newAV();

The second method both creates the AV and initially populates it with SVs:

AV* av_make(I32 num, SV **ptr);

The second argument points to an array containing num SV*‘s. Once the AV has been created, the SVs can

be destroyed, if so desired.

Once the AV has been created, the following operations are possible on AVs:

void av_push(AV*, SV*);

SV* av_pop(AV*);

SV* av_shift(AV*);

void av_unshift(AV*, I32 num);

These should be familiar operations, with the exception of av_unshift. This routine adds num elements

at the front of the array with the undef value. You must then use av_store (described below) to assign

values to these new elements.

Here are some other functions:

I32 av_len(AV*);

SV** av_fetch(AV*, I32 key, I32 lval);

SV** av_store(AV*, I32 key, SV* val);

The av_len function returns the highest index value in the array (just like $#array in Perl). If the array is empty, -1 is returned. The av_fetch function returns the value at index key, but if lval is non-zero, then av_fetch will store an undef value at that index. The av_store function stores the value val at index key, and does not increment the reference count of val. Thus the caller is responsible for taking care of that, and if av_store returns NULL, the caller will have to decrement the reference count to avoid a memory leak. Note that av_fetch and av_store both return SV**'s, not SV*'s, as their return value.

void av_clear(AV*);

void av_undef(AV*);

void av_extend(AV*, I32 key);

The av_clear function deletes all the elements in the AV* array, but does not actually delete the array

itself. The av_undef function will delete all the elements in the array plus the array itself. The

av_extend function extends the array so that it contains key elements. If key is less than the current

length of the array, then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV by using the following:

AV* perl_get_av("package::varname", FALSE);

This returns NULL if the variable does not exist.

See Understanding the Magic of Tied Hashes and Arrays for more information on how to use the array

access functions on tied arrays.

Working with HVs

To create an HV, you use the following routine:

HV* newHV();

Once the HV has been created, the following operations are possible on HVs:

SV** hv_store(HV*, char* key, U32 klen, SV* val, U32 hash);

SV** hv_fetch(HV*, char* key, U32 klen, I32 lval);

The klen parameter is the length of the key being passed in (Note that you cannot pass 0 in as a value of

klen to tell Perl to measure the length of the key). The val argument contains the SV pointer to the scalar

being stored, and hash is the precomputed hash value (zero if you want hv_store to calculate it for you).

The lval parameter indicates whether this fetch is actually a part of a store operation, in which case a new

undefined value will be added to the HV with the supplied key and hv_fetch will return as if the value had

already existed.

Remember that hv_store and hv_fetch return SV**‘s and not just SV*. To access the scalar value,

you must first dereference the return value. However, you should check to make sure that the return value is

not NULL before dereferencing it.

These two functions check whether a hash table entry exists and delete an entry, respectively.

bool hv_exists(HV*, char* key, U32 klen);

SV* hv_delete(HV*, char* key, U32 klen, I32 flags);

If flags does not include the G_DISCARD flag then hv_delete will create and return a mortal copy of

the deleted value.

And more miscellaneous functions:

void hv_clear(HV*);

void hv_undef(HV*);

Like their AV counterparts, hv_clear deletes all the entries in the hash table but does not actually delete the hash table. The hv_undef function deletes both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE. These contain the actual key and value pointers (plus extra administrative overhead). The key is a string pointer; the value is an SV*. However, once you have an HE*, to get the actual key and value, use the routines specified below.

I32    hv_iterinit(HV*);
        /* Prepares starting point to traverse hash table */
HE*    hv_iternext(HV*);
        /* Get the next entry, and return a pointer to a
           structure that has both the key and value */
char*  hv_iterkey(HE* entry, I32* retlen);
        /* Get the key from an HE structure and also return
           the length of the key string */
SV*    hv_iterval(HV*, HE* entry);
        /* Return a SV pointer to the value of the HE
           structure */
SV*    hv_iternextsv(HV*, char** key, I32* retlen);
        /* This convenience routine combines hv_iternext,
           hv_iterkey, and hv_iterval.  The key and retlen
           arguments are return values for the key and its
           length.  The value is returned as the SV* return
           value */

If you know the name of a hash variable, you can get a pointer to its HV by using the following:

HV* perl_get_hv("package::varname", FALSE);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the PERL_HASH(hash, key, klen) macro:

i = klen;
hash = 0;
s = key;
while (i--)
    hash = hash * 33 + *s++;
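
Written as an ordinary function, the same loop can be tried outside Perl. This is a sketch under two assumptions that are not part of the macro as shown: uint32_t is used where Perl would use U32, and each byte is read as unsigned char so that the result does not depend on whether plain char is signed on a given platform. The name hash33 is invented.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply-by-33 hash over exactly klen bytes of key. */
uint32_t
hash33(const char *key, size_t klen)
{
    uint32_t hash = 0;
    const char *s = key;
    size_t i = klen;

    while (i--)
        hash = hash * 33 + (unsigned char)*s++;
    return hash;
}
```

For the one-byte key "a" the result is 97, the character's ASCII code. For "ab" it is 97 * 33 + 98 = 3299. Because the length is passed in explicitly, keys containing NUL bytes are hashed in full.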

See Understanding the Magic of Tied Hashes and Arrays for more information on how to use the hash

access functions on tied hashes.

Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash);

HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash);

bool hv_exists_ent (HV* tb, SV* key, U32 hash);

SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

SV* hv_iterkeysv (HE* entry);

Note that these functions take SV* keys, which simplifies writing of extension code that deals with hash

structures. These functions also allow passing of SV* keys to tie functions without forcing you to stringify

the keys (unlike the previous set of functions).

They also return and accept whole hash entries (HE*), making their use more efficient (since the hash

number for a particular string doesn‘t have to be recomputed every time). See API LISTING later in this

document for detailed descriptions.

The following macros must always be used to access the contents of hash entries. Note that the arguments to

these macros must be simple variables, since they may get evaluated more than once. See API LISTING later

in this document for detailed descriptions of these macros.

HePV(HE* he, STRLEN len)

HeVAL(HE* he)

HeHASH(HE* he)

HeSVKEY(HE* he)

HeSVKEY_force(HE* he)

HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when dealing with keys that are not SV*s:

HeKEY(HE* he)

HeKLEN(HE* he)

Note that both hv_store and hv_store_ent do not increment the reference count of the stored val,

which is the caller‘s responsibility. If these functions return a NULL value, the caller will usually have to

decrement the reference count of val to avoid a memory leak.

References

References are a special type of scalar that point to other data types (including references).

To create a reference, use either of the following functions:

SV* newRV_inc((SV*) thing);

SV* newRV_noinc((SV*) thing);

The thing argument can be any of an SV*, AV*, or HV*. The functions are identical except that

newRV_inc increments the reference count of the thing, while newRV_noinc does not. For historical

reasons, newRV is a synonym for newRV_inc.

Once you have a reference, you can use the following macro to dereference the reference:

SvRV(SV*)

then call the appropriate routines, casting the returned SV* to either an AV* or HV*, if required.

To determine if an SV is a reference, you can use the following macro:

SvROK(SV*)

To discover what type of value the reference refers to, use the following macro and then check the return

value.

SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

SVt_IV Scalar

SVt_NV Scalar

SVt_PV Scalar

SVt_RV Scalar

SVt_PVAV Array

SVt_PVHV Hash

SVt_PVCV Code

SVt_PVGV Glob (possibly a file handle)

SVt_PVMG Blessed or Magical Scalar

See the sv.h header file for more details.

Blessed References and Class Objects

References are also used to support object−oriented programming. In the OO lexicon, an object is simply a

reference that has been blessed into a package (or class). Once blessed, the programmer may now use the

reference to access the various methods in the class.

A reference can be blessed into a package with the following function:

SV* sv_bless(SV* sv, HV* stash);

The sv argument must be a reference. The stash argument specifies which class the reference will belong

to. See Stashes and Globs for information on converting class names into stashes.

/* Still under construction */

Upgrades rv to reference if not already one. Creates new SV for rv to point to. If classname is non−null,

the SV is blessed into the specified class. SV is returned.

SV* newSVrv(SV* rv, char* classname);

Copies integer or double into an SV whose reference is rv. SV is blessed if classname is non−null.

SV* sv_setref_iv(SV* rv, char* classname, IV iv);

SV* sv_setref_nv(SV* rv, char* classname, NV iv);

Copies the pointer value (the address, not the string!) into an SV whose reference is rv. SV is blessed if

classname is non−null.

SV* sv_setref_pv(SV* rv, char* classname, PV iv);

Copies string into an SV whose reference is rv. Set length to 0 to let Perl calculate the string length. SV is

blessed if classname is non−null.

SV* sv_setref_pvn(SV* rv, char* classname, PV iv, int length);

Tests whether the SV is blessed into the specified class. It does not check inheritance relationships.

int sv_isa(SV* sv, char* name);

Tests whether the SV is a reference to a blessed object.

int sv_isobject(SV* sv);

Tests whether the SV is derived from the specified class. SV can be either a reference to a blessed object or a

string containing a class name. This is the function implementing the UNIVERSAL::isa functionality.

bool sv_derived_from(SV* sv, char* name);

To check if you‘ve got an object derived from a specific class you have to write:

if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

Creating New Variables

To create a new Perl variable with an undef value which can be accessed from your Perl script, use the

following routines, depending on the variable type.

SV* perl_get_sv("package::varname", TRUE);

AV* perl_get_av("package::varname", TRUE);

HV* perl_get_hv("package::varname", TRUE);

Notice the use of TRUE as the second parameter. The new variable can now be set, using the routines

appropriate to the data type.

There are additional macros whose values may be bitwise OR‘ed with the TRUE argument to enable certain

extra features. Those bits are:

GV_ADDMULTI Marks the variable as multiply defined, thus preventing the

"Name <varname> used only once: possible typo" warning.

GV_ADDWARN Issues the warning "Had to create <varname> unexpectedly" if

the variable did not exist before the function was called.

If you do not specify a package name, the variable is created in the current package.

Reference Counts and Mortality

Perl uses a reference-count-driven garbage collection mechanism. SVs, AVs, or HVs (xV for short in the following) start their life with a reference count of 1. If the reference count of an xV ever drops to 0, then it will be destroyed and its memory made available for reuse.

This normally doesn‘t happen at the Perl level unless a variable is undef‘ed or the last variable holding a

reference to it is changed or overwritten. At the internal level, however, reference counts can be manipulated

with the following macros:

int SvREFCNT(SV* sv);

SV* SvREFCNT_inc(SV* sv);

void SvREFCNT_dec(SV* sv);

However, there is one other function which manipulates the reference count of its argument. The
newRV_inc function, you will recall, creates a reference to the specified argument. As a side effect, it
increments the argument's reference count. If this is not what you want, use newRV_noinc instead.

For example, imagine you want to return a reference from an XSUB function. Inside the XSUB routine, you
create an SV which initially has a reference count of one. Then you call newRV_inc, passing it the
just-created SV. This returns the reference as a new SV, but the reference count of the SV you passed to
newRV_inc has been incremented to two. Now you return the reference from the XSUB routine and forget
about the SV. But Perl hasn't! Whenever the returned reference is destroyed, the reference count of the
original SV is decreased to one and nothing happens. The SV will hang around without any way to access it
until Perl itself terminates. This is a memory leak.

The correct procedure, then, is to use newRV_noinc instead of newRV_inc. Then, if and when the last

reference is destroyed, the reference count of the SV will go to zero and it will be destroyed, stopping any

memory leak.
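
The leak described above can be reproduced in a few lines of plain C. The following is only a toy model
under stated assumptions: Value, value_new, value_dec, ref_new_inc, ref_new_noinc and live_values are
hypothetical names invented for this sketch, not the real SV structure or the real newRV_inc and
newRV_noinc functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: hypothetical stand-ins, not Perl's SV or its API. */
typedef struct Value {
    int refcnt;            /* number of owners */
    struct Value *target;  /* non-NULL if this value is a reference */
} Value;

int live_values = 0;       /* how many values are currently allocated */

Value *value_new(void) {
    Value *v = malloc(sizeof *v);
    assert(v != NULL);
    v->refcnt = 1;         /* every new value starts with one owner */
    v->target = NULL;
    live_values++;
    return v;
}

/* Drop one owner; destroy the value when the last owner is gone. */
void value_dec(Value *v) {
    if (v == NULL || --v->refcnt > 0)
        return;
    value_dec(v->target);  /* a dying reference releases its target */
    free(v);
    live_values--;
}

/* In the spirit of newRV_inc: the reference becomes an additional owner. */
Value *ref_new_inc(Value *target) {
    Value *rv = value_new();
    rv->target = target;
    target->refcnt++;
    return rv;
}

/* In the spirit of newRV_noinc: the reference takes over the caller's ownership. */
Value *ref_new_noinc(Value *target) {
    Value *rv = value_new();
    rv->target = target;
    return rv;
}
```

In this model, creating a value, wrapping it with ref_new_inc, and then dropping only the reference leaves
the value allocated with a count of one and no owner left to release it. The same sequence with
ref_new_noinc frees both.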

There are some convenience functions available that can help with the destruction of xVs. These functions

introduce the concept of "mortality". An xV that is mortal has had its reference count marked to be

decremented, but not actually decremented, until "a short time later". Generally the term "short time later"

means a single Perl statement, such as a call to an XSUB function. The actual determinant for when mortal

xVs have their reference count decremented depends on two macros, SAVETMPS and FREETMPS. See

perlcall and perlxs for more details on these macros.

"Mortalization" then is at its simplest a deferred SvREFCNT_dec. However, if you mortalize a variable

twice, the reference count will later be decremented twice.

You should be careful about creating mortal variables. Strange things can happen if you make the same

value mortal within multiple contexts, or if you make a variable mortal multiple times.

To create a mortal variable, use the functions:

SV* sv_newmortal()

SV* sv_2mortal(SV*)

SV* sv_mortalcopy(SV*)

The first call creates a mortal SV, the second converts an existing SV to a mortal SV (and thus defers a call

to SvREFCNT_dec), and the third creates a mortal copy of an existing SV.

The mortal routines are not just for SVs: AVs and HVs can be made mortal by passing their address
(cast to SV*) to the sv_2mortal or sv_mortalcopy routines.

Stashes and Globs

A "stash" is a hash that contains all of the different objects that are contained within a package. Each key of

the stash is a symbol name (shared by all the different types of objects that have the same name), and each

value in the hash table is a GV (Glob Value). This GV in turn contains references to the various objects of

that name, including (but not limited to) the following:

Scalar Value

Array Value

Hash Value

I/O Handle

Format

Subroutine

There is a single stash called "PL_defstash" that holds the items that exist in the "main" package. To get at
the items in other packages, append the string "::" to the package name. The items in the "Foo" package are
in the stash "Foo::" in PL_defstash. The items in the "Bar::Baz" package are in the stash "Baz::" in the
stash of "Bar::".
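
The naming rule above (one stash key per package component, each with "::" appended) can be sketched in
plain C. This is only an illustration of the rule: stash_key is a hypothetical helper written for this
example and is not part of the Perl API, which does the lookup for you through gv_stashpv.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy helper (hypothetical): copy the stash key for the n-th component
 * (0-based) of a package name into out.
 * Returns 1 on success, 0 if there is no such component or out is too small.
 */
int stash_key(const char *package, int n, char *out, size_t outsz) {
    const char *start = package;

    assert(package != NULL && out != NULL && n >= 0);
    for (;;) {
        const char *sep = strstr(start, "::");
        size_t len = sep ? (size_t)(sep - start) : strlen(start);

        if (len == 0)
            return 0;                    /* empty component: nothing to look up */
        if (n == 0) {
            if (len + 3 > outsz)
                return 0;                /* need room for name, "::" and NUL */
            memcpy(out, start, len);
            memcpy(out + len, "::", 3);  /* copies the terminating NUL too */
            return 1;
        }
        if (sep == NULL)
            return 0;                    /* ran out of components */
        start = sep + 2;
        n--;
    }
}
```

For "Bar::Baz", component 0 yields the key "Bar::" (looked up in PL_defstash) and component 1 yields
"Baz::" (looked up in the stash found in the previous step).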

To get the stash pointer for a particular package, use the function:

HV* gv_stashpv(char* name, I32 create)

HV* gv_stashsv(SV*, I32 create)

The first function takes a literal string, the second uses the string stored in the SV. Remember that a stash is

just a hash table, so you get back an HV*. The create flag will create a new package if it is set.

The name that gv_stash*v wants is the name of the package whose symbol table you want. The default

package is called main. If you have multiply nested packages, pass their names to gv_stash*v,

separated by :: as in the Perl language itself.

Alternately, if you have an SV that is a blessed reference, you can find out the stash pointer by using:

HV* SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

char* HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following function:

SV* sv_bless(SV*, HV* stash)

where the first argument, an SV*, must be a reference, and the second argument is a stash. The returned

SV* can now be used in the same way as any other SV.

For more information on references and blessings, consult perlref.

Double−Typed SVs

Scalar variables normally contain only one type of value, an integer, double, pointer, or reference. Perl will

automatically convert the actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data. For example, the variable $! contains

either the numeric value of errno or its string equivalent from either strerror or sys_errlist[].

To force multiple data values into an SV, you must do two things: use the sv_set*v routines to add the

additional scalar type, then set a flag so that Perl will believe it contains more than one type of data. The

four macros to set the flags are:

SvIOK_on

SvNOK_on

SvPOK_on

SvROK_on

The particular macro you must use depends on which sv_set*v routine you called first. This is because

every sv_set*v routine turns on only the bit for the particular type of data being set, and turns off all the

rest.

For example, to create a new Perl variable called "dberror" that contains both the numeric and descriptive

string error values, you could use the following code:

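    extern int dberror;              /* current error number */
    extern char *dberror_list[];     /* error messages, indexed by error number */

    SV* sv = perl_get_sv("dberror", TRUE);
    sv_setiv(sv, (IV) dberror);            /* sets the IV, turns on only IOK */
    sv_setpv(sv, dberror_list[dberror]);   /* sets the PV, turns on only POK */
    SvIOK_on(sv);                          /* mark the IV as valid again */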

If the order of sv_setiv and sv_setpv had been reversed, then the macro SvPOK_on would need to be

called instead of SvIOK_on.
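
Why the order of the calls decides which flag macro is needed can be seen in a small plain C model. This is
a sketch under stated assumptions: Scalar, F_IOK, F_POK, set_iv, set_pv and iok_on are hypothetical names
that only imitate the behavior described above, and they are not the real SV layout or flag values.

```c
#include <assert.h>
#include <string.h>

enum { F_IOK = 1, F_POK = 2 };   /* hypothetical flag bits */

typedef struct {
    unsigned flags;   /* which slots currently hold a valid value */
    long iv;          /* integer slot */
    char pv[32];      /* string slot */
} Scalar;

/* In the spirit of sv_setiv: store an integer, mark ONLY the integer slot valid. */
void set_iv(Scalar *s, long i) {
    assert(s != NULL);
    s->iv = i;
    s->flags = F_IOK;
}

/* In the spirit of sv_setpv: store a string, mark ONLY the string slot valid. */
void set_pv(Scalar *s, const char *p) {
    assert(s != NULL && p != NULL);
    strncpy(s->pv, p, sizeof s->pv - 1);
    s->pv[sizeof s->pv - 1] = '\0';
    s->flags = F_POK;
}

/* In the spirit of SvIOK_on: declare the integer slot valid, leave the rest alone. */
void iok_on(Scalar *s) {
    assert(s != NULL);
    s->flags |= F_IOK;
}
```

After set_iv followed by set_pv, only F_POK is set even though the integer is still stored, so iok_on is
what makes the scalar double-typed. With the two set calls swapped, the string flag would be the one to
turn back on.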

Magic Variables

[This section still under construction. Ignore everything here. Post no bills. Everything not permitted is

forbidden.]

Any SV may be magical, that is, it has special features that a normal SV does not have. These features are
stored in the SV structure in a linked list of struct magic's, typedef'ed to MAGIC.

    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        SV*         mg_obj;
        char*       mg_ptr;
        I32         mg_len;
    };

Note this is current as of patchlevel 0, and could change at any time.

Assigning Magic

Perl adds magic to an SV using the sv_magic function:

void sv_magic(SV* sv, SV* obj, int how, char* name, I32 namlen);

The sv argument is a pointer to the SV that is to acquire a new magical feature.

If sv is not already magical, Perl uses the SvUPGRADE macro to set the SVt_PVMG flag for the sv. Perl

then continues by adding it to the beginning of the linked list of magical features. Any prior entry of the

same type of magic is deleted. Note that this can be overridden, and multiple instances of the same type of

magic can be associated with an SV.

The name and namlen arguments are used to associate a string with the magic, typically the name of a
variable. namlen is stored in the mg_len field and if name is non-null and namlen >= 0 a malloc'd copy
of the name is stored in the mg_ptr field.

The sv_magic function uses how to determine which, if any, predefined "Magic Virtual Table" should be

assigned to the mg_virtual field. See the "Magic Virtual Table" section below. The how argument is

also stored in the mg_type field.

The obj argument is stored in the mg_obj field of the MAGIC structure. If it is not the same as the sv

argument, the reference count of the obj object is incremented. If it is the same, or if the how argument is

"#", or if it is a NULL pointer, then obj is merely stored, without the reference count being incremented.

There is also a function to add magic to an HV:

void hv_magic(HV *hv, GV *gv, int how);

This simply calls sv_magic and coerces the gv argument into an SV.

To remove the magic from an SV, call the function sv_unmagic:

void sv_unmagic(SV *sv, int type);

The type argument should be equal to the how value when the SV was initially made magical.

Magic Virtual Tables

The mg_virtual field in the MAGIC structure is a pointer to an MGVTBL ("Magic Virtual Table"), a structure
of function pointers that handle the various operations that might be applied to that variable.

The MGVTBL has five pointers to the following routine types:

int (*svt_get)(SV* sv, MAGIC* mg);

int (*svt_set)(SV* sv, MAGIC* mg);

U32 (*svt_len)(SV* sv, MAGIC* mg);

int (*svt_clear)(SV* sv, MAGIC* mg);

int (*svt_free)(SV* sv, MAGIC* mg);

This MGVTBL structure is set at compile-time in perl.h and there are currently 19 types (or 21 with
overloading turned on). These different structures contain pointers to various routines that perform
additional actions depending on which function is being called.

    Function pointer    Action taken
    ----------------    ------------
    svt_get             Do something before the value of the SV is retrieved.
    svt_set             Do something after the SV is assigned a value.
    svt_len             Report on the SV's length.
    svt_clear           Clear something the SV represents.
    svt_free            Free any extra storage associated with the SV.

For instance, the MGVTBL structure called vtbl_sv (which corresponds to an mg_type of '\0') contains:

    { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type '\0', if a get operation is being performed, the
routine magic_get is called. All the various routines for the various magical types begin with magic_.
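
The dispatch through a table of function pointers, with 0 meaning "no action", can be sketched in plain C.
This is a toy model with hypothetical names (Scalar, VTable, counting_vtbl, scalar_read, scalar_write); it
has two slots instead of five and does not reproduce how Perl itself decides when to call magic.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Scalar Scalar;

/* A two-slot table of hooks; a NULL slot means "no extra action". */
typedef struct {
    int (*get)(Scalar *s);   /* cf. svt_get: runs as part of reading the value */
    int (*set)(Scalar *s);   /* cf. svt_set: runs after a value is assigned */
} VTable;

struct Scalar {
    long value;
    const VTable *vtbl;      /* NULL for an ordinary, non-magical scalar */
    int reads, writes;       /* bookkeeping updated by the hooks below */
};

int count_get(Scalar *s) { s->reads++;  return 0; }
int count_set(Scalar *s) { s->writes++; return 0; }

/* Analogous in shape to { magic_get, magic_set, ... }. */
const VTable counting_vtbl = { count_get, count_set };

long scalar_read(Scalar *s) {
    assert(s != NULL);
    if (s->vtbl != NULL && s->vtbl->get != NULL)
        s->vtbl->get(s);
    return s->value;
}

void scalar_write(Scalar *s, long v) {
    assert(s != NULL);
    s->value = v;
    if (s->vtbl != NULL && s->vtbl->set != NULL)
        s->vtbl->set(s);
}
```

A scalar whose vtbl is NULL behaves like a plain variable, while one pointing at counting_vtbl has its hooks
run on every read and write without the caller doing anything different.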

The current kinds of Magic Virtual Tables are:

    mg_type  MGVTBL           Type of magic
    -------  ------           ----------------------------
    \0       vtbl_sv          Special scalar variable
    A        vtbl_amagic      %OVERLOAD hash
    a        vtbl_amagicelem  %OVERLOAD hash element
    c        (none)           Holds overload table (AMT) on stash
    B        vtbl_bm          Boyer-Moore (fast string search)
    E        vtbl_env         %ENV hash
    e        vtbl_envelem     %ENV hash element
    f        vtbl_fm          Formline ('compiled' format)
    g        vtbl_mglob       m//g target / study()ed string
    I        vtbl_isa         @ISA array
    i        vtbl_isaelem     @ISA array element
    k        vtbl_nkeys       scalar(keys()) lvalue
    L        (none)           Debugger %_<filename
    l        vtbl_dbline      Debugger %_<filename element
    o        vtbl_collxfrm    Locale transformation
    P        vtbl_pack        Tied array or hash
    p        vtbl_packelem    Tied array or hash element
    q        vtbl_packelem    Tied scalar or handle
    S        vtbl_sig         %SIG hash
    s        vtbl_sigelem     %SIG hash element
    t        vtbl_taint       Taintedness
    U        vtbl_uvar        Available for use by extensions
    v        vtbl_vec         vec() lvalue
    x        vtbl_substr      substr() lvalue
    y        vtbl_defelem     Shadow "foreach" iterator variable /
                              smart parameter vivification
    *        vtbl_glob        GV (typeglob)
    #        vtbl_arylen      Array length ($#ary)
    .        vtbl_pos         pos() lvalue
    ~        (none)           Available for use by extensions

When an uppercase and lowercase letter both exist in the table, then the uppercase letter is used to represent

some kind of composite type (a list or a hash), and the lowercase letter is used to represent an element of that

composite type.

The '~' and 'U' magic types are defined specifically for use by extensions and will not be used by perl itself.
Extensions can use '~' magic to 'attach' private information to variables (typically objects). This is
especially useful because there is no way for normal perl code to corrupt this private information (unlike
using extra elements of a hash object).

Similarly, 'U' magic can be used much like tie() to call a C function any time a scalar's value is used or
changed. The MAGIC's mg_ptr field points to a ufuncs structure:

    struct ufuncs {
        I32 (*uf_val)(IV, SV*);
        I32 (*uf_set)(IV, SV*);
        IV uf_index;
    };

When the SV is read from or written to, the uf_val or uf_set function will be called with uf_index as

the first arg and a pointer to the SV as the second.

Note that because multiple extensions may be using '~' or 'U' magic, it is important for extensions to take
extra care to avoid conflict. Typically only using the magic on objects blessed into the same class as the
extension is sufficient. For '~' magic, it may also be appropriate to add an I32 'signature' at the top of the
private data area and check that.

Also note that the sv_set*() and sv_cat*() functions described earlier do not invoke 'set' magic on
their targets. This must be done by the user either by calling the SvSETMAGIC() macro after calling these
functions, or by using one of the sv_set*_mg() or sv_cat*_mg() functions. Similarly, generic C
code must call the SvGETMAGIC() macro to invoke any 'get' magic if it uses an SV obtained from
external sources in functions that don't handle magic. The API LISTING section later in this document
identifies such functions. For example, calls to the sv_cat*() functions typically need to be followed by
SvSETMAGIC(), but they don't need a prior SvGETMAGIC() since their implementation handles 'get'
magic.

Finding Magic

MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */

This routine returns a pointer to the MAGIC structure stored in the SV. If the SV does not have that magical

feature, NULL is returned. Also, if the SV is not of type SVt_PVMG, Perl may core dump.
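
The list handling described for sv_magic, mg_find and sv_unmagic (prepend a new entry, drop any prior entry
of the same type, search by type) can be modelled in plain C. This is a sketch with hypothetical names
(Magic, magic_find, magic_remove, magic_add); it ignores virtual tables, reference counts and the SV
upgrade step.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Magic {
    struct Magic *next;   /* plays the role of mg_moremagic */
    char type;            /* plays the role of mg_type */
    void *obj;            /* plays the role of mg_obj */
} Magic;

/* In the spirit of mg_find: first entry of the given type, or NULL. */
Magic *magic_find(Magic *head, char type) {
    for (; head != NULL; head = head->next)
        if (head->type == type)
            return head;
    return NULL;
}

/* In the spirit of sv_unmagic: unlink and free every entry of the given type. */
void magic_remove(Magic **head, char type) {
    assert(head != NULL);
    while (*head != NULL) {
        if ((*head)->type == type) {
            Magic *dead = *head;
            *head = dead->next;
            free(dead);
        } else {
            head = &(*head)->next;
        }
    }
}

/* In the spirit of sv_magic: drop any prior entry of this type, then prepend. */
void magic_add(Magic **head, char type, void *obj) {
    Magic *mg = malloc(sizeof *mg);
    assert(head != NULL && mg != NULL);
    magic_remove(head, type);
    mg->type = type;
    mg->obj = obj;
    mg->next = *head;
    *head = mg;
}
```

Adding '~' magic, then 'U' magic, then '~' magic again leaves a two-entry list with the newest '~' entry at
the front, and magic_find returns NULL for a type that was never attached.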

int mg_copy(SV* sv, SV* nsv, char* key, STRLEN klen);

This routine checks to see what types of magic sv has. If the mg_type field is an uppercase letter, then the

mg_obj is copied to nsv, but the mg_type field is changed to be the lowercase letter.

Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the 'P' magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash access functions requires

understanding a few caveats. Some of these caveats are actually considered bugs in the API, to be fixed in

later releases, and are bracketed with [MAYCHANGE] below. If you find yourself actually applying such

information in this section, be aware that the behavior may change in the future, umm, without warning.

The av_store function, when given a tied array argument, merely copies the magic of the array onto the

value to be "stored", using mg_copy. It may also return NULL, indicating that the value did not actually

need to be stored in the array. [MAYCHANGE] After a call to av_store on a tied array, the caller will

usually need to call mg_set(val) to actually invoke the perl level "STORE" method on the TIEARRAY

object. If av_store did return NULL, a call to SvREFCNT_dec(val) will also be usually necessary to

avoid a memory leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the hv_store and

hv_store_ent functions as well.

av_fetch and the corresponding hash functions hv_fetch and hv_fetch_ent actually return an

undefined mortal value whose magic has been initialized using mg_copy. Note the value so returned does

not need to be deallocated, as it is already mortal. [MAYCHANGE] But you will need to call mg_get()

on the returned value in order to actually invoke the perl level "FETCH" method on the underlying TIE

object. Similarly, you may also call mg_set() on the return value after possibly assigning a suitable value

to it using sv_setsv, which will invoke the "STORE" method on the TIE object. [/MAYCHANGE]

[MAYCHANGE] In other words, the array or hash fetch/store functions don't really fetch and store actual
values in the case of tied arrays and hashes. They merely call mg_copy to attach magic to the values that
were meant to be "stored" or "fetched". Later calls to mg_get and mg_set actually do the job of invoking
the TIE methods on the underlying objects. Thus the magic mechanism currently implements a kind of lazy
access to arrays and hashes.

Currently (as of perl version 5.004), use of the hash and array access functions requires the user to be aware

of whether they are operating on "normal" hashes and arrays, or on their tied variants. The API may be

changed to provide more transparent access to both tied and normal data types in future versions.

[/MAYCHANGE]

You would do well to understand that the TIEARRAY and TIEHASH interfaces are mere sugar to invoke

some perl method calls while using the uniform hash and array syntax. The use of this sugar imposes some

overhead (typically about two to four extra opcodes per FETCH/STORE operation, in addition to the

creation of all the mortal variables required to invoke the methods). This overhead will be comparatively

small if the TIE methods are themselves substantial, but if they are only a few statements long, the overhead

will not be insignificant.

Localizing changes

Perl has a very handy construction

    {
        local $var = 2;
        ...
    }

This construction is approximately equivalent to

    {
        my $oldvar = $var;
        $var = 2;
        ...
        $var = $oldvar;
    }

The biggest difference is that the first construction would reinstate the initial value of $var, irrespective of

how control exits the block: goto, return, die/eval etc. It is a little bit more efficient as well.

There is a way to achieve a similar task from C via the Perl API: create a pseudo-block, and arrange for some
changes to be automatically undone at the end of it, either explicitly or via a non-local exit (via die()). A
block-like construct is created by a pair of ENTER/LEAVE macros (see "Returning a Scalar" in perlcall).
Such a construct may be created specially for some important localized task, or an existing one (like the
boundaries of the enclosing Perl subroutine/block, or an existing pair for freeing TMPs) may be used. (In the
second case the overhead of additional localization must be almost negligible.) Note that any XSUB is
automatically enclosed in an ENTER/LEAVE pair.

Inside such a pseudo-block the following services are available:

SAVEINT(int i)
SAVEIV(IV i)
SAVEI32(I32 i)
SAVELONG(long i)
    These macros arrange things to restore the value of integer variable i at the end of the enclosing
    pseudo-block.

SAVESPTR(s)
SAVEPPTR(p)
    These macros arrange things to restore the value of pointers s and p. s must be a pointer of a type
    which survives conversion to SV* and back, p should be able to survive conversion to char* and
    back.

SAVEFREESV(SV *sv)
    The refcount of sv will be decremented at the end of the pseudo-block. This is similar to
    sv_2mortal, which should (?) be used instead.

SAVEFREEOP(OP *op)
    The OP * is op_free()ed at the end of the pseudo-block.

SAVEFREEPV(p)
    The chunk of memory which is pointed to by p is Safefree()ed at the end of the pseudo-block.

SAVECLEARSV(SV *sv)
    Clears a slot in the current scratchpad which corresponds to sv at the end of the pseudo-block.

SAVEDELETE(HV *hv, char *key, I32 length)
    The key key of hv is deleted at the end of the pseudo-block. The string pointed to by key is
    Safefree()ed. If one has a key in short-lived storage, the corresponding string may be reallocated
    like this:

        SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

SAVEDESTRUCTOR(f,p)
    At the end of the pseudo-block the function f is called with the only argument (of type void*) p.

SAVESTACK_POS()
    The current offset on the Perl internal stack (cf. SP) is restored at the end of the pseudo-block.

The following API list contains functions, thus one needs to provide pointers to the modifiable data
explicitly (either C pointers, or Perlish GV *s). Where the above macros take int, a similar function takes
int *.

SV* save_scalar(GV *gv)
    Equivalent to Perl code local $gv.

AV* save_ary(GV *gv)
HV* save_hash(GV *gv)
    Similar to save_scalar, but localize @gv and %gv.

void save_item(SV *item)
    Duplicates the current value of SV. On exit from the current ENTER/LEAVE pseudo-block, the value
    of SV is restored from the stored copy.

void save_list(SV **sarg, I32 maxsarg)
    A variant of save_item which takes multiple arguments via an array sarg of SV* of length
    maxsarg.

SV* save_svref(SV **sptr)
    Similar to save_scalar, but will reinstate a SV *.

void save_aptr(AV **aptr)
void save_hptr(HV **hptr)
    Similar to save_svref, but localize AV * and HV *.

The Alias module implements localization of the basic types within the caller's scope. People who are
interested in how to localize things in the containing scope should take a look there too.

Subroutines

XSUBs and the Argument Stack

The XSUB mechanism is a simple way for Perl programs to access C subroutines. An XSUB routine will

have a stack that contains the arguments from the Perl program, and a way to map from the Perl data

structures to a C equivalent.

The stack arguments are accessible through the ST(n) macro, which returns the n'th stack argument.
Argument 0 is the first argument passed in the Perl subroutine call. These arguments are SV*, and can be
used anywhere an SV* is used.

Most of the time, output from the C routine can be handled through use of the RETVAL and OUTPUT
directives. However, there are some cases where the argument stack is not already long enough to handle all
the return values. An example is the POSIX tzname() call, which takes no arguments, but returns two, the
local time zone's standard and summer time abbreviations.

To handle this situation, the PPCODE directive is used and the stack is extended using the macro:

EXTEND(SP, num);

where SP is the macro that represents the local copy of the stack pointer, and num is the number of elements

the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the macros to push IVs, doubles,

strings, and SV pointers respectively:

PUSHi(IV)

PUSHn(double)

PUSHp(char*, I32)

PUSHs(SV*)

Now, in the Perl program calling tzname, the two values can be assigned as in:

($standard_abbrev, $summer_abbrev) = POSIX::tzname;

An alternate (and possibly simpler) method to pushing values on the stack is to use the macros:

XPUSHi(IV)

XPUSHn(double)

XPUSHp(char*, I32)

XPUSHs(SV*)

These macros automatically adjust the stack for you, if needed. Thus, you do not need to call EXTEND to

extend the stack.
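
The difference between the PUSH and XPUSH families (reserve room once up front versus check on every push)
can be illustrated with a toy stack in plain C. Everything here is hypothetical and written for this
sketch: Stack, stack_extend, stack_push and stack_xpush are not Perl's stack or its macros.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long *base;   /* storage */
    size_t len;   /* slots in use */
    size_t cap;   /* slots allocated */
} Stack;

/* In the spirit of EXTEND(SP, n): guarantee room for n more values.
 * Returns 0 on success, -1 if memory could not be obtained. */
int stack_extend(Stack *st, size_t n) {
    assert(st != NULL);
    if (st->cap - st->len >= n)
        return 0;
    size_t newcap = st->len + n;
    long *p = realloc(st->base, newcap * sizeof *p);
    if (p == NULL)
        return -1;
    st->base = p;
    st->cap = newcap;
    return 0;
}

/* In the spirit of PUSHi: the caller must already have reserved room. */
void stack_push(Stack *st, long v) {
    assert(st != NULL && st->len < st->cap);
    st->base[st->len++] = v;
}

/* In the spirit of XPUSHi: make room if needed, then push. */
int stack_xpush(Stack *st, long v) {
    if (stack_extend(st, 1) != 0)
        return -1;
    stack_push(st, v);
    return 0;
}
```

When the number of return values is known in advance, as with the two values from tzname, one
stack_extend followed by plain pushes does a single capacity check. The stack_xpush form is more
convenient when the count is not known ahead of time.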

For more information, consult perlxs and perlxstut.

Calling Perl Routines from within C Programs

There are four routines that can be used to call a Perl subroutine from within a C program. These four are:

I32 perl_call_sv(SV*, I32);

I32 perl_call_pv(char*, I32);

I32 perl_call_method(char*, I32);

I32 perl_call_argv(char*, I32, register char**);

The routine most often used is perl_call_sv. The SV* argument contains either the name of the Perl

subroutine to be called, or a reference to the subroutine. The second argument consists of flags that control

the context in which the subroutine is called, whether or not the subroutine is being passed arguments, how

errors should be trapped, and how to treat return values.

All four routines return the number of arguments that the subroutine returned on the Perl stack.

When using any of these routines (except perl_call_argv), the programmer must manipulate the Perl
stack, using the following macros and functions:

dSP

PUSHMARK()

PUTBACK

SPAGAIN

ENTER

SAVETMPS

FREETMPS

LEAVE

XPUSH*()

POP*()

For a detailed description of calling conventions from C to Perl, consult perlcall.

Memory Allocation

It is suggested that you use the version of malloc that is distributed with Perl. It keeps pools of various sizes

of unallocated memory in order to satisfy allocation requests more quickly. However, on some platforms, it

may cause spurious malloc or free errors.

New(x, pointer, number, type);

Newc(x, pointer, number, type, cast);

Newz(x, pointer, number, type);

These three macros are used to initially allocate memory.

The first argument x was a "magic cookie" that was used to keep track of who called the macro, to help
when debugging memory problems. However, the current code makes no use of this feature (most Perl
developers now use run-time memory checkers), so this argument can be any number.

The second argument pointer should be the name of a variable that will point to the newly allocated

memory.

The third and fourth arguments number and type specify how many of the specified type of data structure

should be allocated. The argument type is passed to sizeof. The final argument to Newc, cast, should

be used if the pointer argument is different from the type argument.

Unlike the New and Newc macros, the Newz macro calls memzero to zero out all the newly allocated

memory.

Renew(pointer, number, type);

Renewc(pointer, number, type, cast);

Safefree(pointer)

These three macros are used to change a memory buffer size or to free a piece of memory no longer needed.

The arguments to Renew and Renewc match those of New and Newc with the exception of not needing the

"magic cookie" argument.

Move(source, dest, number, type);

Copy(source, dest, number, type);

Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated memory. The source and

dest arguments point to the source and destination starting points. Perl will move, copy, or zero out

number instances of the size of the type data structure (using the sizeof function).
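
The "count and type" calling style of these macros can be imitated with the C library. The sketch below is
an assumption-laden look-alike, not the definitions from Perl's headers: the TOY_ macros and grow_ints are
hypothetical names, the "magic cookie" argument is omitted, and error handling follows plain C conventions
(a NULL result).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical look-alikes built on the C library. */
#define TOY_NEW(ptr, n, type)       ((ptr) = (type *)malloc((n) * sizeof(type)))
#define TOY_NEWZ(ptr, n, type)      ((ptr) = (type *)calloc((n), sizeof(type)))
#define TOY_RENEW(ptr, n, type)     ((ptr) = (type *)realloc((ptr), (n) * sizeof(type)))
#define TOY_COPY(src, dst, n, type) memcpy((dst), (src), (n) * sizeof(type))
#define TOY_MOVE(src, dst, n, type) memmove((dst), (src), (n) * sizeof(type))
#define TOY_ZERO(dst, n, type)      memset((dst), 0, (n) * sizeof(type))

/* Grow an int array from old_n to new_n elements and zero the new tail.
 * Returns the (possibly moved) array, or NULL on failure, in which case
 * the original array is untouched and still owned by the caller. */
int *grow_ints(int *a, size_t old_n, size_t new_n) {
    int *bigger = a;

    assert(new_n > 0 && new_n >= old_n);
    TOY_RENEW(bigger, new_n, int);
    if (bigger == NULL)
        return NULL;
    TOY_ZERO(bigger + old_n, new_n - old_n, int);
    return bigger;
}
```

In this sketch the move macro is built on memmove, so it is the one to use when source and destination
overlap, for example when shifting array elements up by one slot.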

PerlIO

The most recent development releases of Perl have been experimenting with removing Perl's dependency on
the "normal" standard I/O suite and allowing other stdio implementations to be used. This involves creating
a new abstraction layer that then calls whichever implementation of stdio Perl was compiled with. All
XSUBs should now use the functions in the PerlIO abstraction layer and not make any assumptions about
what kind of stdio is being used.

For a complete description of the PerlIO abstraction, consult perlapio.

Putting a C value on Perl stack

A lot of opcodes (an opcode is an elementary operation in the internal perl stack machine) put an SV* on the
stack. However, as an optimization the corresponding SV is (usually) not recreated each time. The opcodes
reuse specially assigned SVs (targets) which are (as a corollary) not constantly freed/created.

Each of the targets is created only once (but see Scratchpads and recursion below), and when an opcode

needs to put an integer, a double, or a string on stack, it just sets the corresponding parts of its target and puts

the target on stack.

The macro to put this target on stack is PUSHTARG, and it is directly used in some opcodes, as well as

indirectly in zillions of others, which use it via (X)PUSH[pni].

Scratchpads

The question remains of when the SVs which are targets for opcodes are created. The answer is that they are
created when the current unit (a subroutine, or a file for opcodes of statements outside of subroutines)
is compiled. During this time a special anonymous Perl array is created, which is called a scratchpad for the
current unit.

A scratchpad keeps SVs which are lexicals for the current unit and are targets for opcodes. One can deduce
that an SV lives on a scratchpad by looking at its flags: lexicals have SVs_PADMY set, and targets have
SVs_PADTMP set.

The correspondence between OPs and targets is not 1-to-1. Different OPs in the compile tree of the unit can
use the same target, if this would not conflict with the expected life of the temporary.

Scratchpads and recursion

Strictly speaking, a compiled unit does not contain a pointer to the scratchpad AV itself. It contains a
pointer to an AV of (initially) one element, and this element is the scratchpad AV. Why do we need an extra
level of indirection?

The answer is recursion, and maybe (sometime soon) threads. Both of these can create several execution
pointers going into the same subroutine. For the child subroutine not to overwrite the temporaries of the
parent subroutine (whose lifespan covers the call to the child), the parent and the child should have
different scratchpads. (And the lexicals should be separate anyway!)

So each subroutine is born with an array of scratchpads (of length 1). On each entry to the subroutine it is
checked that the current depth of the recursion is not more than the length of this array, and if it is, a new
scratchpad is created and pushed into the array.

The targets on this scratchpad are undefs, but they are already marked with correct flags.

Compiled code

Code tree

Here we describe the internal form your code is converted to by Perl. Start with a simple example:

$a = $b + $c;

This is converted to a tree similar to this one:

         assign-to
        /         \
       +          $a
      / \
    $b   $c

(but slightly more complicated). This tree reflects the way Perl parsed your code, but has nothing to do with
the execution order. There is an additional "thread" going through the nodes of the tree which shows the
order of execution of the nodes. In our simplified example above it looks like:

    $b ---> $c ---> + ---> $a ---> assign-to

But with the actual compile tree for $a = $b + $c it is different: some nodes are optimized away. As a
corollary, though the actual tree contains more nodes than our simplified example, the execution order is the
same as in our example.
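
The idea of a parse tree plus a separate execution-order thread can be sketched in plain C. This is a toy
model only: Node, thread_subtree and link_execution_order are hypothetical names, and the thread is built
here by a simple post-order walk of a finished tree, which is an assumption of this sketch rather than a
description of how Perl links its ops.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    const char *name;
    struct Node *left, *right;  /* parse-tree children */
    struct Node *next;          /* execution-order thread */
} Node;

/* Post-order walk: children are threaded before their parent.
 * *first is the first node to execute, *last the most recent one threaded. */
void thread_subtree(Node *n, Node **first, Node **last) {
    if (n == NULL)
        return;
    thread_subtree(n->left, first, last);
    thread_subtree(n->right, first, last);
    if (*last != NULL)
        (*last)->next = n;
    else
        *first = n;
    n->next = NULL;
    *last = n;
}

/* Thread the whole tree and return the node where execution starts. */
Node *link_execution_order(Node *root) {
    Node *first = NULL, *last = NULL;
    thread_subtree(root, &first, &last);
    return first;
}
```

For the simplified tree of $a = $b + $c, following next from the returned node visits $b, $c, +, $a and
assign-to in that order, while the left and right pointers still describe how the statement was parsed.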

Examining the tree

If you have your perl compiled for debugging (usually done with -D optimize=-g on the Configure
command line), you may examine the compiled tree by specifying -Dx on the Perl command line. The
output takes several lines per node, and for $b+$c it looks like this:

    5  TYPE = add  ===> 6
       TARG = 1
       FLAGS = (SCALAR,KIDS)
       {
           TYPE = null  ===> (4)
             (was rv2sv)
           FLAGS = (SCALAR,KIDS)
           {
    3          TYPE = gvsv  ===> 4
               FLAGS = (SCALAR)
               GV = main::b
           }
       }
       {
           TYPE = null  ===> (5)
             (was rv2sv)
           FLAGS = (SCALAR,KIDS)
           {
    4          TYPE = gvsv  ===> 5
               FLAGS = (SCALAR)
               GV = main::c
           }
       }

This tree has 5 nodes (one per TYPE specifier), of which only 3 are not optimized away (one per number in
the left column). The immediate children of a given node correspond to {} pairs on the same level of
indentation, thus this listing corresponds to the tree:

            add
          /     \
        null    null
         |       |
        gvsv    gvsv

The execution order is indicated by ===> marks, thus it is 3 4 5 6 (node 6 is not included in the above
listing), i.e., gvsv gvsv add whatever.

Compile pass 1: check routines

The tree is created by the pseudo-compiler while yacc code feeds it the constructions it recognizes. Since
yacc works bottom-up, so does the first pass of perl compilation.

What makes this pass interesting for perl developers is that some optimization may be performed on this
pass. This is optimization by so-called check routines. The correspondence between node names and
corresponding check routines is described in opcode.pl (do not forget to run make regen_headers if
you modify this file).

A check routine is called when the node is fully constructed except for the execution-order thread. Since at
this time there are no back-links to the currently constructed node, one can do most any operation to the
top-level node, including freeing it and/or creating new nodes above/below it.

The check routine returns the node which should be inserted into the tree (if the top-level node was not
modified, check routine returns its argument).

By convention, check routines have names ck_*. They are usually called from new*OP subroutines (or
convert) (which in turn are called from perly.y).

Compile pass 1a: constant folding

Immediately after the check routine is called the returned node is checked for being compile−time

executable. If it is (the value is judged to be constant) it is immediately executed, and a constant node with

the "return value" of the corresponding subtree is substituted instead. The subtree is deleted.

If constant folding was not performed, the execution−order thread is created.
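
The idea of folding a node as soon as it is built can be illustrated with a small standalone C sketch. Everything here is an assumption made for illustration: Expr, new_leaf() and new_add() are hypothetical names, and the real compiler decides constness and executes the subtree quite differently. The sketch only shows a constructor that returns either the node it was asked to build or a constant node that replaces the whole subtree.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { N_CONST, N_VAR, N_ADD } Kind;

/* Hypothetical expression node, not Perl's OP structure. */
typedef struct Expr {
    Kind kind;
    int value;                  /* meaningful only for N_CONST */
    struct Expr *left, *right;  /* meaningful only for N_ADD */
} Expr;

static Expr *new_leaf(Kind kind, int value)
{
    Expr *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->kind = kind;
    e->value = value;
    e->left = e->right = NULL;
    return e;
}

/*
 * Build an addition node bottom-up. If both operands are constants the
 * sum is computed now, the operand nodes are freed, and a constant node
 * is returned in place of the subtree.
 */
static Expr *new_add(Expr *left, Expr *right)
{
    assert(left != NULL && right != NULL);
    if (left->kind == N_CONST && right->kind == N_CONST) {
        Expr *folded = new_leaf(N_CONST, left->value + right->value);
        free(left);
        free(right);
        return folded;
    }
    Expr *e = new_leaf(N_ADD, 0);
    e->left = left;
    e->right = right;
    return e;
}

/* Release a whole subtree. */
static void free_expr(Expr *e)
{
    if (e == NULL)
        return;
    free_expr(e->left);
    free_expr(e->right);
    free(e);
}
```

Freeing the operands is safe in this model for the same reason given for pass 1: nothing else points at a node while it is still being constructed.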

Compile pass 2: context propagation

When a context for a part of the compile tree is known, it is propagated down through the tree. At this time the

context can have 5 values (instead of 2 for runtime context): void, boolean, scalar, list, and lvalue. In

contrast with pass 1, this pass is processed from top to bottom: a node‘s context determines the context

for its children.

Additional context−dependent optimizations are performed at this time. Since at this moment the compile

tree contains back−references (via "thread" pointers), nodes cannot be free()d now. To allow

optimized−away nodes at this stage, such nodes are null()ified instead of free()ing (i.e. their type is

changed to OP_NULL).
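
Why a threaded node is turned into a no-op rather than freed can be shown with a toy model in C. The Op type, nullify() and run() below are hypothetical and far simpler than the interpreter's own structures; the sketch assumes only that other nodes hold "next" pointers to the node being optimized away.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_NULL, T_LOAD, T_NEGATE } OpType;

/* Hypothetical op with an execution-order thread pointer. */
typedef struct Op {
    OpType type;
    int arg;            /* used only by T_LOAD */
    struct Op *next;    /* execution-order thread */
} Op;

/* Optimize a node away without freeing it, so that pointers held by
   other nodes stay valid. */
static void nullify(Op *op)
{
    assert(op != NULL);
    op->type = T_NULL;
}

/* Follow the thread; nullified ops are visited but do nothing. */
static int run(const Op *op)
{
    int acc = 0;
    for (; op != NULL; op = op->next) {
        switch (op->type) {
        case T_NULL:   break;
        case T_LOAD:   acc = op->arg; break;
        case T_NEGATE: acc = -acc; break;
        }
    }
    return acc;
}
```

In this model a double negation can be removed by nullifying both T_NEGATE ops: the chain of next pointers is untouched, and the result of run() is unchanged.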

518 Version 5.005_02 18−Oct−1998

Compile pass 3: peephole optimization

After the compile tree for a subroutine (or for an eval or a file) is created, an additional pass over the code

is performed. This pass is neither top−down nor bottom−up, but in the execution order (with additional

complications for conditionals). These optimizations are done in the subroutine peep(). Optimizations

performed at this stage are subject to the same restrictions as in pass 2.

API LISTING

This is a listing of functions, macros, flags, and variables that may be useful to extension writers or that may

be found while reading other extensions.

Note that all Perl API global variables must be referenced with the PL_ prefix. Some macros are provided

for compatibility with the older, unadorned names, but this support will be removed in a future release.

It is strongly recommended that all Perl API functions that don‘t begin with perl be referenced with an

explicit Perl_ prefix.

The sort order of the listing is case insensitive, with any occurrences of ‘_’ ignored for the purpose of

sorting.
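
The ordering rule can be written as a comparison function. The following is a sketch of one comparator that implements the rule as stated; api_name_cmp() is a hypothetical name and is not part of Perl.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Compare two API names case-insensitively, skipping every '_'.
 * Returns a negative, zero, or positive value like strcmp().
 */
static int api_name_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;;) {
        while (*a == '_')
            a++;
        while (*b == '_')
            b++;
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)
            return 0;
        a++;
        b++;
    }
}
```

Under this rule AvFILL sorts before av_len, and sv_cmp before SvCUR, which is the order in which those entries appear below.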

av_clear Clears an array, making it empty. Does not free the memory used by the array itself.

void av_clear (AV* ar)

av_extend

Pre−extend an array. The key is the index to which the array should be extended.

void av_extend (AV* ar, I32 key)

av_fetch Returns the SV at the specified index in the array. The key is the index. If lval is set then the

fetch will be part of a store. Check that the return value is non−null before dereferencing it to an

SV*.

See Understanding the Magic of Tied Hashes and Arrays for more information on how to use

this function on tied arrays.

SV** av_fetch (AV* ar, I32 key, I32 lval)

AvFILL Same as av_len(). Deprecated, use av_len() instead.

av_len Returns the highest index in the array. Returns −1 if the array is empty.

I32 av_len (AV* ar)

av_make Creates a new AV and populates it with a list of SVs. The SVs are copied into the array, so they

may be freed after the call to av_make. The new AV will have a reference count of 1.

AV* av_make (I32 size, SV** svp)

av_pop Pops an SV off the end of the array. Returns &PL_sv_undef if the array is empty.

SV* av_pop (AV* ar)

av_push Pushes an SV onto the end of the array. The array will grow automatically to accommodate the

addition.

void av_push (AV* ar, SV* val)

av_shift Shifts an SV off the beginning of the array.

SV* av_shift (AV* ar)

av_store Stores an SV in an array. The array index is specified as key. The return value will be NULL if

the operation failed or if the value did not need to be actually stored within the array (as in the

case of tied arrays). Otherwise it can be dereferenced to get the original SV*. Note that the

caller is responsible for suitably incrementing the reference count of val before the call, and

decrementing it if the function returned NULL.

See Understanding the Magic of Tied Hashes and Arrays for more information on how to use

this function on tied arrays.

SV** av_store (AV* ar, I32 key, SV* val)

av_undef Undefines the array. Frees the memory used by the array itself.

void av_undef (AV* ar)

av_unshift

Unshift the given number of undef values onto the beginning of the array. The array will grow

automatically to accommodate the addition. You must then use av_store to assign values to

these new elements.

void av_unshift (AV* ar, I32 num)

CLASS Variable which is set up by xsubpp to indicate the class name for a C++ XS constructor. This is

always a char*. See THIS and Using XS With C++ in perlxs.

Copy The XSUB−writer‘s interface to the C memcpy function. The s is the source, d is the

destination, n is the number of items, and t is the type. May fail on overlapping copies. See

also Move.

void Copy( s, d, n, t )

croak This is the XSUB−writer‘s interface to Perl‘s die function. Use this function the same way you

use the C printf function. See warn.

CvSTASH

Returns the stash of the CV.

HV* CvSTASH( SV* sv )

PL_DBsingle

When Perl is run in debugging mode, with the −d switch, this SV is a boolean which indicates

whether subs are being single−stepped. Single−stepping is automatically turned on after every

step. This is the C variable which corresponds to Perl‘s $DB::single variable. See

PL_DBsub.

PL_DBsub

When Perl is run in debugging mode, with the −d switch, this GV contains the SV which holds

the name of the sub being debugged. This is the C variable which corresponds to Perl‘s

$DB::sub variable. See PL_DBsingle. The sub name can be found by

SvPV( GvSV( PL_DBsub ), PL_na )

PL_DBtrace

Trace variable used when Perl is run in debugging mode, with the −d switch. This is the C

variable which corresponds to Perl‘s $DB::trace variable. See PL_DBsingle.

dMARK Declare a stack marker variable, mark, for the XSUB. See MARK and dORIGMARK.

dORIGMARK

Saves the original stack mark for the XSUB. See ORIGMARK.

PL_dowarn

The C variable which corresponds to Perl‘s $^W warning variable.

dSP Declares a local copy of perl‘s stack pointer for the XSUB, available via the SP macro. See SP.

dXSARGS

Sets up stack and mark pointers for an XSUB, calling dSP and dMARK. This is usually handled

automatically by xsubpp. Declares the items variable to indicate the number of items on the

stack.

dXSI32 Sets up the ix variable for an XSUB which has aliases. This is usually handled automatically by

xsubpp.

do_binmode

Switches filehandle to binmode. iotype is what IoTYPE(io) would contain.

do_binmode(fp, iotype, TRUE);

ENTER Opening bracket on a callback. See LEAVE and perlcall.

ENTER;

EXTEND Used to extend the argument stack for an XSUB‘s return values.

EXTEND( sp, int x )

fbm_compile

Analyses the string in order to make fast searches on it using fbm_instr() — the

Boyer−Moore algorithm.

void fbm_compile(SV* sv, U32 flags)

fbm_instr Returns the location of the SV in the string delimited by str and strend. It returns Nullch

if the string can‘t be found. The sv does not have to be fbm_compiled, but the search will not

be as fast then.

char* fbm_instr(char *str, char *strend, SV *sv, U32 flags)

FREETMPS

Closing bracket for temporaries on a callback. See SAVETMPS and perlcall.

FREETMPS;

G_ARRAY

Used to indicate array context. See GIMME_V, GIMME and perlcall.

G_DISCARD

Indicates that arguments returned from a callback should be discarded. See perlcall.

G_EVAL Used to force a Perl eval wrapper around a callback. See perlcall.

GIMME A backward−compatible version of GIMME_V which can only return G_SCALAR or G_ARRAY;

in a void context, it returns G_SCALAR.

GIMME_V

The XSUB−writer‘s equivalent to Perl‘s wantarray. Returns G_VOID, G_SCALAR or

G_ARRAY for void, scalar or array context, respectively.

G_NOARGS

Indicates that no arguments are being sent to a callback. See perlcall.

G_SCALAR

Used to indicate scalar context. See GIMME_V, GIMME, and perlcall.

gv_fetchmeth

Returns the glob with the given name and a defined subroutine or NULL. The glob lives in the

given stash, or in the stashes accessible via @ISA and @UNIVERSAL.

The argument level should be either 0 or −1. If level==0, as a side−effect creates a glob

with the given name in the given stash which in the case of success contains an alias for the

subroutine, and sets up caching info for this glob. Similarly for all the searched stashes.

This function grants "SUPER" token as a postfix of the stash name.

The GV returned from gv_fetchmeth may be a method cache entry, which is not visible to

Perl code. So when calling perl_call_sv, you should not use the GV directly; instead, you

should use the method‘s CV, which can be obtained from the GV with the GvCV macro.

GV* gv_fetchmeth (HV* stash, char* name, STRLEN len, I32 level)

gv_fetchmethod

gv_fetchmethod_autoload

Returns the glob which contains the subroutine to call to invoke the method on the stash. In

fact in the presence of autoloading this may be the glob for "AUTOLOAD". In this case the

corresponding variable $AUTOLOAD is already set up.

The third parameter of gv_fetchmethod_autoload determines whether AUTOLOAD

lookup is performed if the given method is not present: non−zero means yes, look for

AUTOLOAD; zero means no, don‘t look for AUTOLOAD. Calling gv_fetchmethod is

equivalent to calling gv_fetchmethod_autoload with a non−zero autoload parameter.

These functions grant "SUPER" token as a prefix of the method name.

Note that if you want to keep the returned glob for a long time, you need to check for it being

"AUTOLOAD", since at the later time the call may load a different subroutine due to

$AUTOLOAD changing its value. Use the glob created via a side effect to do this.

These functions have the same side−effects as gv_fetchmeth with level==0. name

should be writable if it contains ‘:’ or ‘\‘’. The warning against passing the GV returned by

gv_fetchmeth to perl_call_sv applies equally to these functions.

GV* gv_fetchmethod (HV* stash, char* name)

GV* gv_fetchmethod_autoload (HV* stash, char* name, I32 autoload)

G_VOID Used to indicate void context. See GIMME_V and perlcall.

gv_stashpv

Returns a pointer to the stash for a specified package. If create is set then the package will be

created if it does not already exist. If create is not set and the package does not exist then

NULL is returned.

HV* gv_stashpv (char* name, I32 create)

gv_stashsv

Returns a pointer to the stash for a specified package. See gv_stashpv.

HV* gv_stashsv (SV* sv, I32 create)

GvSV Return the SV from the GV.

HEf_SVKEY

This flag, used in the length slot of hash entries and magic structures, specifies the structure

contains a SV* pointer where a char* pointer is to be expected. (For information only—not to

be used).

HeHASH Returns the computed hash stored in the hash entry.

U32 HeHASH(HE* he)

HeKEY Returns the actual pointer stored in the key slot of the hash entry. The pointer may be either

char* or SV*, depending on the value of HeKLEN(). Can be assigned to. The HePV() or

HeSVKEY() macros are usually preferable for finding the value of a key.

char* HeKEY(HE* he)

HeKLEN If this is negative, and amounts to HEf_SVKEY, it indicates the entry holds an SV* key.

Otherwise, holds the actual length of the key. Can be assigned to. The HePV() macro is usually

preferable for finding key lengths.

int HeKLEN(HE* he)

HePV Returns the key slot of the hash entry as a char* value, doing any necessary dereferencing of

possibly SV* keys. The length of the string is placed in len (this is a macro, so do not use

&len). If you do not care about what the length of the key is, you may use the global variable

PL_na. Remember though, that hash keys in perl are free to contain embedded nulls, so using

strlen() or similar is not a good way to find the length of hash keys. This is very similar to

the SvPV() macro described elsewhere in this document.

char* HePV(HE* he, STRLEN len)
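
The warning about embedded nulls can be demonstrated without any Perl headers. The Key type and key_eq() below are hypothetical and stand for any key that carries an explicit length, as a hash key does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key type: a pointer plus an explicit byte count. */
typedef struct {
    const char *ptr;
    size_t len;
} Key;

/* Compare two keys by length and bytes, so embedded NULs are significant. */
static int key_eq(const Key *a, const Key *b)
{
    assert(a != NULL && b != NULL);
    return a->len == b->len && memcmp(a->ptr, b->ptr, a->len) == 0;
}
```

For two five-byte keys that differ only after an embedded NUL, strlen() reports 2 for both and strcmp() calls them equal, while key_eq() correctly tells them apart. This is why the length returned alongside the key should be used instead of strlen().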

HeSVKEY

Returns the key as an SV*, or Nullsv if the hash entry does not contain an SV* key.

HeSVKEY(HE* he)

HeSVKEY_force

Returns the key as an SV*. Will create and return a temporary mortal SV* if the hash entry

contains only a char* key.

HeSVKEY_force(HE* he)

HeSVKEY_set

Sets the key to a given SV*, taking care to set the appropriate flags to indicate the presence of an

SV* key, and returns the same SV*.

HeSVKEY_set(HE* he, SV* sv)

HeVAL Returns the value slot (type SV*) stored in the hash entry.

HeVAL(HE* he)

hv_clear Clears a hash, making it empty.

void hv_clear (HV* tb)

hv_delayfree_ent

Releases a hash entry, such as while iterating through the hash, but delays actual freeing of key

and value until the end of the current statement (or thereabouts) with sv_2mortal. See

hv_iternext and hv_free_ent.

void hv_delayfree_ent (HV* hv, HE* entry)

hv_delete

Deletes a key/value pair in the hash. The value SV is removed from the hash and returned to the

caller. The klen is the length of the key. The flags value will normally be zero; if set to

G_DISCARD then NULL will be returned.

SV* hv_delete (HV* tb, char* key, U32 klen, I32 flags)

hv_delete_ent

Deletes a key/value pair in the hash. The value SV is removed from the hash and returned to the

caller. The flags value will normally be zero; if set to G_DISCARD then NULL will be

returned. hash can be a valid precomputed hash value, or 0 to ask for it to be computed.

SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash)

hv_exists Returns a boolean indicating whether the specified hash key exists. The klen is the length of

the key.

bool hv_exists (HV* tb, char* key, U32 klen)

hv_exists_ent

Returns a boolean indicating whether the specified hash key exists. hash can be a valid

precomputed hash value, or 0 to ask for it to be computed.

bool hv_exists_ent (HV* tb, SV* key, U32 hash)

hv_fetch Returns the SV which corresponds to the specified key in the hash. The klen is the length of

the key. If lval is set then the fetch will be part of a store. Check that the return value is

non−null before dereferencing it to an SV*.

See Understanding the Magic of Tied Hashes and Arrays for more information on how to use

this function on tied hashes.

SV** hv_fetch (HV* tb, char* key, U32 klen, I32 lval)

hv_fetch_ent

Returns the hash entry which corresponds to the specified key in the hash. hash must be a valid

precomputed hash number for the given key, or 0 if you want the function to compute it. If

lval is set then the fetch will be part of a store. Make sure the return value is non−null before

accessing it. The return value when tb is a tied hash is a pointer to a static location, so be sure

to make a copy of the structure if you need to store it somewhere.

See Understanding the Magic of Tied Hashes and Arrays for more information on how to use

this function on tied hashes.

HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash)

hv_free_ent

Releases a hash entry, such as while iterating through the hash. See hv_iternext and

hv_delayfree_ent.

void hv_free_ent (HV* hv, HE* entry)

hv_iterinit Prepares a starting point to traverse a hash table.

I32 hv_iterinit (HV* tb)

Returns the number of keys in the hash (i.e. the same as HvKEYS(tb)). The return value is

currently only meaningful for hashes without tie magic.

NOTE: Before version 5.004_65, hv_iterinit used to return the number of hash buckets that

happen to be in use. If you still need that esoteric value, you can get it through the macro

HvFILL(tb).

hv_iterkey

Returns the key from the current position of the hash iterator. See hv_iterinit.

char* hv_iterkey (HE* entry, I32* retlen)

hv_iterkeysv

Returns the key as an SV* from the current position of the hash iterator. The return value will

always be a mortal copy of the key. Also see hv_iterinit.

SV* hv_iterkeysv (HE* entry)

hv_iternext

Returns entries from a hash iterator. See hv_iterinit.

HE* hv_iternext (HV* tb)

hv_iternextsv

Performs an hv_iternext, hv_iterkey, and hv_iterval in one operation.

SV* hv_iternextsv (HV* hv, char** key, I32* retlen)

hv_iterval Returns the value from the current position of the hash iterator. See hv_iterkey.

SV* hv_iterval (HV* tb, HE* entry)

hv_magic Adds magic to a hash. See sv_magic.

void hv_magic (HV* hv, GV* gv, int how)

HvNAME Returns the package name of a stash. See SvSTASH, CvSTASH.

char* HvNAME (HV* stash)

hv_store Stores an SV in a hash. The hash key is specified as key and klen is the length of the key.

The hash parameter is the precomputed hash value; if it is zero then Perl will compute it. The

return value will be NULL if the operation failed or if the value did not need to be actually stored

within the hash (as in the case of tied hashes). Otherwise it can be dereferenced to get the

original SV*. Note that the caller is responsible for suitably incrementing the reference count of

val before the call, and decrementing it if the function returned NULL.

See Understanding the Magic of Tied Hashes and Arrays for more information on how to use

this function on tied hashes.

SV** hv_store (HV* tb, char* key, U32 klen, SV* val, U32 hash)

hv_store_ent

Stores val in a hash. The hash key is specified as key. The hash parameter is the

precomputed hash value; if it is zero then Perl will compute it. The return value is the new hash

entry so created. It will be NULL if the operation failed or if the value did not need to be

actually stored within the hash (as in the case of tied hashes). Otherwise the contents of the

return value can be accessed using the He??? macros described here. Note that the caller is

responsible for suitably incrementing the reference count of val before the call, and

decrementing it if the function returned NULL.

See Understanding the Magic of Tied Hashes and Arrays for more information on how to use

this function on tied hashes.

HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash)

hv_undef Undefines the hash.

void hv_undef (HV* tb)

isALNUM Returns a boolean indicating whether the C char is an ASCII alphanumeric character (including underscore) or digit.

int isALNUM (char c)

isALPHA Returns a boolean indicating whether the C char is an ASCII alphabetic character.

int isALPHA (char c)

isDIGIT Returns a boolean indicating whether the C char is an ASCII digit.

int isDIGIT (char c)

isLOWER

Returns a boolean indicating whether the C char is a lowercase character.

int isLOWER (char c)

isSPACE Returns a boolean indicating whether the C char is whitespace.

int isSPACE (char c)

isUPPER Returns a boolean indicating whether the C char is an uppercase character.

int isUPPER (char c)

items Variable which is set up by xsubpp to indicate the number of items on the stack. See

Variable−length Parameter Lists in perlxs.

ix Variable which is set up by xsubpp to indicate which of an XSUB‘s aliases was used to invoke

it. See The ALIAS: Keyword in perlxs.

LEAVE Closing bracket on a callback. See ENTER and perlcall.

LEAVE;

looks_like_number

Test if the content of an SV looks like a number (or is a number).

int looks_like_number(SV*)

MARK Stack marker variable for the XSUB. See dMARK.

mg_clear Clear something magical that the SV represents. See sv_magic.

int mg_clear (SV* sv)

mg_copy Copies the magic from one SV to another. See sv_magic.

int mg_copy (SV *, SV *, char *, STRLEN)

mg_find Finds the magic pointer for type matching the SV. See sv_magic.

MAGIC* mg_find (SV* sv, int type)

mg_free Free any magic storage used by the SV. See sv_magic.

int mg_free (SV* sv)

mg_get Do magic after a value is retrieved from the SV. See sv_magic.

int mg_get (SV* sv)

mg_len Report on the SV‘s length. See sv_magic.

U32 mg_len (SV* sv)

mg_magical

Turns on the magical status of an SV. See sv_magic.

void mg_magical (SV* sv)

mg_set Do magic after a value is assigned to the SV. See sv_magic.

int mg_set (SV* sv)

Move The XSUB−writer‘s interface to the C memmove function. The s is the source, d is the

destination, n is the number of items, and t is the type. Can do overlapping moves. See also

Copy.

void Move( s, d, n, t )
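
The difference between Copy and Move matters only when source and destination overlap. The sketch below models the two macros on top of memcpy and memmove, following the descriptions above; MY_COPY and MY_MOVE are hypothetical stand-ins, not the definitions from Perl's headers. Note the argument order: source first, then destination, which is the reverse of the C library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: s = source, d = destination, n = item count,
   t = item type. */
#define MY_COPY(s, d, n, t) ((void)memcpy((d), (s), (n) * sizeof(t)))
#define MY_MOVE(s, d, n, t) ((void)memmove((d), (s), (n) * sizeof(t)))

/* Shift every element one slot to the right. Source and destination
   overlap, so the memmove-based macro is required. */
static void shift_right_one(int *buf, size_t len)
{
    assert(buf != NULL);
    if (len > 1)
        MY_MOVE(buf, buf + 1, len - 1, int);
}

/* Copy between two separate buffers; no overlap, so MY_COPY is fine. */
static void duplicate(const int *src, int *dst, size_t len)
{
    assert(src != NULL && dst != NULL);
    MY_COPY(src, dst, len, int);
}
```

Shifting {1, 2, 3, 4, 5} with shift_right_one() leaves {1, 1, 2, 3, 4}. Doing the same with the memcpy-based macro would be undefined behaviour in C, which is what "may fail on overlapping copies" refers to.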

PL_na A variable which may be used with SvPV to tell Perl to calculate the string length.

New The XSUB−writer‘s interface to the C malloc function.

void* New( x, void *ptr, int size, type )

newAV Creates a new AV. The reference count is set to 1.

AV* newAV (void)

Newc The XSUB−writer‘s interface to the C malloc function, with cast.

void* Newc( x, void *ptr, int size, type, cast )

newCONSTSUB

Creates a constant sub equivalent to Perl sub FOO () { 123 } which is eligible for inlining

at compile−time.

void newCONSTSUB(HV* stash, char* name, SV* sv)

newHV Creates a new HV. The reference count is set to 1.

HV* newHV (void)

newRV_inc

Creates an RV wrapper for an SV. The reference count for the original SV is incremented.

SV* newRV_inc (SV* ref)

For historical reasons, "newRV" is a synonym for "newRV_inc".

newRV_noinc

Creates an RV wrapper for an SV. The reference count for the original SV is not incremented.

SV* newRV_noinc (SV* ref)

NEWSV Creates a new SV. A non−zero len parameter indicates the number of bytes of preallocated

string space the SV should have. An extra byte for a trailing NUL is also reserved. (SvPOK is

not set for the SV even if string space is allocated.) The reference count for the new SV is set to

1. id is an integer id between 0 and 1299 (used to identify leaks).

SV* NEWSV (int id, STRLEN len)

newSViv Creates a new SV and copies an integer into it. The reference count for the SV is set to 1.

SV* newSViv (IV i)

newSVnv Creates a new SV and copies a double into it. The reference count for the SV is set to 1.

SV* newSVnv (NV i)

newSVpv Creates a new SV and copies a string into it. The reference count for the SV is set to 1. If len

is zero then Perl will compute the length.

SV* newSVpv (char* s, STRLEN len)

newSVpvf

Creates a new SV and initializes it with a string formatted like sprintf.

SV* newSVpvf(const char* pat, ...);

newSVpvn

Creates a new SV and copies a string into it. The reference count for the SV is set to 1. If len

is zero then Perl will create a zero length string.

SV* newSVpvn (char* s, STRLEN len)

newSVrv Creates a new SV for the RV, rv, to point to. If rv is not an RV then it will be upgraded to one.

If classname is non−null then the new SV will be blessed in the specified package. The new

SV is returned and its reference count is 1.

SV* newSVrv (SV* rv, char* classname)

newSVsv Creates a new SV which is an exact duplicate of the original SV.

SV* newSVsv (SV* old)

newXS Used by xsubpp to hook up XSUBs as Perl subs.

newXSproto

Used by xsubpp to hook up XSUBs as Perl subs. Adds Perl prototypes to the subs.

Newz The XSUB−writer‘s interface to the C malloc function. The allocated memory is zeroed with

memzero.

void* Newz( x, void *ptr, int size, type )

Nullav Null AV pointer.

Nullch Null character pointer.

Nullcv Null CV pointer.

Nullhv Null HV pointer.

Nullsv Null SV pointer.

ORIGMARK

The original stack mark for the XSUB. See dORIGMARK.

perl_alloc Allocates a new Perl interpreter. See perlembed.

perl_call_argv

Performs a callback to the specified Perl sub. See perlcall.

I32 perl_call_argv (char* subname, I32 flags, char** argv)

perl_call_method

Performs a callback to the specified Perl method. The blessed object must be on the stack. See

perlcall.

I32 perl_call_method (char* methname, I32 flags)

perl_call_pv

Performs a callback to the specified Perl sub. See perlcall.

I32 perl_call_pv (char* subname, I32 flags)

perl_call_sv

Performs a callback to the Perl sub whose name is in the SV. See perlcall.

I32 perl_call_sv (SV* sv, I32 flags)

perl_construct

Initializes a new Perl interpreter. See perlembed.

perl_destruct

Shuts down a Perl interpreter. See perlembed.

perl_eval_sv

Tells Perl to eval the string in the SV.

I32 perl_eval_sv (SV* sv, I32 flags)

perl_eval_pv

Tells Perl to eval the given string and return an SV* result.

SV* perl_eval_pv (char* p, I32 croak_on_error)

perl_free Releases a Perl interpreter. See perlembed.

perl_get_av

Returns the AV of the specified Perl array. If create is set and the Perl variable does not exist

then it will be created. If create is not set and the variable does not exist then NULL is

returned.

AV* perl_get_av (char* name, I32 create)

perl_get_cv

Returns the CV of the specified Perl sub. If create is set and the Perl variable does not exist

then it will be created. If create is not set and the variable does not exist then NULL is

returned.

CV* perl_get_cv (char* name, I32 create)

perl_get_hv

Returns the HV of the specified Perl hash. If create is set and the Perl variable does not exist

then it will be created. If create is not set and the variable does not exist then NULL is

returned.

HV* perl_get_hv (char* name, I32 create)

perl_get_sv

Returns the SV of the specified Perl scalar. If create is set and the Perl variable does not exist

then it will be created. If create is not set and the variable does not exist then NULL is

returned.

SV* perl_get_sv (char* name, I32 create)

perl_parse

Tells a Perl interpreter to parse a Perl script. See perlembed.

perl_require_pv

Tells Perl to require a module.

void perl_require_pv (char* pv)

perl_run Tells a Perl interpreter to run. See perlembed.

POPi Pops an integer off the stack.

int POPi()

POPl Pops a long off the stack.

long POPl()

POPp Pops a string off the stack.

char* POPp()

POPn Pops a double off the stack.

double POPn()

POPs Pops an SV off the stack.

SV* POPs()

PUSHMARK

Opening bracket for arguments on a callback. See PUTBACK and perlcall.

PUSHMARK(p)

PUSHi Push an integer onto the stack. The stack must have room for this element. Handles ‘set’ magic.

See XPUSHi.

void PUSHi(int d)

PUSHn Push a double onto the stack. The stack must have room for this element. Handles ‘set’ magic.

See XPUSHn.

void PUSHn(double d)

PUSHp Push a string onto the stack. The stack must have room for this element. The len indicates the

length of the string. Handles ‘set’ magic. See XPUSHp.

void PUSHp(char *c, int len )

PUSHs Push an SV onto the stack. The stack must have room for this element. Does not handle ‘set’

magic. See XPUSHs.

void PUSHs(sv)

PUSHu Push an unsigned integer onto the stack. The stack must have room for this element. See

XPUSHu.

void PUSHu(unsigned int d)

PUTBACK

Closing bracket for XSUB arguments. This is usually handled by xsubpp. See PUSHMARK and

perlcall for other uses.

PUTBACK;

Renew The XSUB−writer‘s interface to the C realloc function.

void* Renew( void *ptr, int size, type )

Renewc The XSUB−writer‘s interface to the C realloc function, with cast.

void* Renewc( void *ptr, int size, type, cast )

RETVAL Variable which is set up by xsubpp to hold the return value for an XSUB. This is always the

proper type for the XSUB. See The RETVAL Variable in perlxs.

safefree The XSUB−writer‘s interface to the C free function.

safemalloc

The XSUB−writer‘s interface to the C malloc function.

saferealloc

The XSUB−writer‘s interface to the C realloc function.

savepv Copy a string to a safe spot. This does not use an SV.

char* savepv (char* sv)

savepvn Copy a string to a safe spot. The len indicates number of bytes to copy. This does not use an

SV.

char* savepvn (char* sv, I32 len)

SAVETMPS

Opening bracket for temporaries on a callback. See FREETMPS and perlcall.

SAVETMPS;

SP Stack pointer. This is usually handled by xsubpp. See dSP and SPAGAIN.

SPAGAIN

Refetch the stack pointer. Used after a callback. See perlcall.

SPAGAIN;

ST Used to access elements on the XSUB‘s stack.

SV* ST(int x)

strEQ Test two strings to see if they are equal. Returns true or false.

int strEQ( char *s1, char *s2 )

strGE Test two strings to see if the first, s1, is greater than or equal to the second, s2. Returns true or

false.

int strGE( char *s1, char *s2 )

strGT Test two strings to see if the first, s1, is greater than the second, s2. Returns true or false.

int strGT( char *s1, char *s2 )

strLE Test two strings to see if the first, s1, is less than or equal to the second, s2. Returns true or

false.

int strLE( char *s1, char *s2 )

strLT Test two strings to see if the first, s1, is less than the second, s2. Returns true or false.

int strLT( char *s1, char *s2 )

strNE Test two strings to see if they are different. Returns true or false.

int strNE( char *s1, char *s2 )

strnEQ Test two strings to see if they are equal. The len parameter indicates the number of bytes to

compare. Returns true or false.

int strnEQ( char *s1, char *s2, int len )

strnNE Test two strings to see if they are different. The len parameter indicates the number of bytes to

compare. Returns true or false.

int strnNE( char *s1, char *s2, int len )
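
The string tests above behave like thin wrappers around strcmp and strncmp. The following sketch defines equivalents under the hypothetical MY_ prefix so that it compiles without Perl's headers; it assumes the comparison semantics described in the entries above and is not a copy of Perl's definitions. has_av_prefix() is likewise a made-up helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical equivalents of the str* tests described above. */
#define MY_strEQ(s1, s2)       (strcmp((s1), (s2)) == 0)
#define MY_strNE(s1, s2)       (strcmp((s1), (s2)) != 0)
#define MY_strLT(s1, s2)       (strcmp((s1), (s2)) < 0)
#define MY_strLE(s1, s2)       (strcmp((s1), (s2)) <= 0)
#define MY_strGT(s1, s2)       (strcmp((s1), (s2)) > 0)
#define MY_strGE(s1, s2)       (strcmp((s1), (s2)) >= 0)
#define MY_strnEQ(s1, s2, len) (strncmp((s1), (s2), (len)) == 0)
#define MY_strnNE(s1, s2, len) (strncmp((s1), (s2), (len)) != 0)

/* Example use: does a function name start with "av_"? */
static int has_av_prefix(const char *name)
{
    assert(name != NULL);
    return MY_strnEQ(name, "av_", 3);
}
```

The length-limited forms compare at most len bytes, so "sv_catpv" and "sv_catsv" are equal over their first 6 bytes and differ once the seventh is included.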

sv_2mortal

Marks an SV as mortal. The SV will be destroyed when the current context ends.

SV* sv_2mortal (SV* sv)

sv_bless Blesses an SV into a specified package. The SV must be an RV. The package must be

designated by its stash (see gv_stashpv()). The reference count of the SV is unaffected.

SV* sv_bless (SV* sv, HV* stash)

sv_catpv Concatenates the string onto the end of the string which is in the SV. Handles ‘get’ magic, but

not ‘set’ magic. See sv_catpv_mg.

void sv_catpv (SV* sv, char* ptr)

sv_catpv_mg

Like sv_catpv, but also handles ‘set’ magic.

void sv_catpv_mg (SV* sv, char* ptr)

sv_catpvn

Concatenates the string onto the end of the string which is in the SV. The len indicates number

of bytes to copy. Handles ‘get’ magic, but not ‘set’ magic. See sv_catpvn_mg.

void sv_catpvn (SV* sv, char* ptr, STRLEN len)

sv_catpvn_mg

Like sv_catpvn, but also handles ‘set’ magic.

void sv_catpvn_mg (SV* sv, char* ptr, STRLEN len)

sv_catpvf Processes its arguments like sprintf and appends the formatted output to an SV. Handles

‘get’ magic, but not ‘set’ magic. SvSETMAGIC() must typically be called after calling this

function to handle ‘set’ magic.

void sv_catpvf (SV* sv, const char* pat, ...)

sv_catpvf_mg

Like sv_catpvf, but also handles ‘set’ magic.

void sv_catpvf_mg (SV* sv, const char* pat, ...)

sv_catsv Concatenates the string from SV ssv onto the end of the string in SV dsv. Handles ‘get’

magic, but not ‘set’ magic. See sv_catsv_mg.

void sv_catsv (SV* dsv, SV* ssv)

sv_catsv_mg

Like sv_catsv, but also handles ‘set’ magic.

void sv_catsv_mg (SV* dsv, SV* ssv)

sv_chop Efficient removal of characters from the beginning of the string buffer. SvPOK(sv) must be true

and the ptr must be a pointer to somewhere inside the string buffer. The ptr becomes the first

character of the adjusted string.

void sv_chop(SV* sv, char *ptr)

sv_cmp Compares the strings in two SVs. Returns −1, 0, or 1 indicating whether the string in sv1 is less

than, equal to, or greater than the string in sv2.

I32 sv_cmp (SV* sv1, SV* sv2)

SvCUR Returns the length of the string which is in the SV. See SvLEN.

int SvCUR (SV* sv)

SvCUR_set

Set the length of the string which is in the SV. See SvCUR.

void SvCUR_set (SV* sv, int val )

sv_dec Auto−decrement of the value in the SV.

void sv_dec (SV* sv)

sv_derived_from

Returns a boolean indicating whether the SV is derived from the specified class. This is the

function that implements UNIVERSAL::isa. It works for class names as well as for objects.

bool sv_derived_from (SV* sv, char* name)

SvEND Returns a pointer to the position just past the last character of the string which is in the SV,

which is normally where the trailing NUL is stored. See SvCUR. Access the character there as

*SvEND(sv).

char* SvEND (SV* sv)

sv_eq Returns a boolean indicating whether the strings in the two SVs are identical.

I32 sv_eq (SV* sv1, SV* sv2)

SvGETMAGIC

Invokes mg_get on an SV if it has ‘get’ magic. This macro evaluates its argument more than

once.

void SvGETMAGIC( SV *sv )
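
The warning about multiple evaluation matters whenever the argument has a side effect. The following is a minimal sketch in plain C, not the real macro: demo_magic_sv, demo_mg_get and DEMO_GETMAGIC are hypothetical stand-ins, written only to show what happens when an expression such as p++ is handed to a macro that names its argument twice.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an SV: only the fields this sketch needs (hypothetical). */
typedef struct demo_magic_sv {
    int has_get_magic;  /* nonzero if 'get' magic is attached */
    int get_calls;      /* how many times the magic hook has run */
} demo_magic_sv;

/* Stand-in for mg_get: records that the hook was invoked. */
static void demo_mg_get(demo_magic_sv *sv) { sv->get_calls++; }

/* This macro names its argument twice, so the argument expression is
 * evaluated twice whenever the flag test succeeds. */
#define DEMO_GETMAGIC(x) \
    do { if ((x)->has_get_magic) demo_mg_get(x); } while (0)

/* Unsafe: p++ is evaluated twice, so the pointer advances by two and the
 * hook runs on the second element. Returns how far the pointer moved. */
ptrdiff_t demo_unsafe_advance(demo_magic_sv *arr)
{
    demo_magic_sv *p = arr;
    assert(arr != NULL);
    DEMO_GETMAGIC(p++);
    return p - arr;
}

/* Safe: perform the side effect once, then pass a plain variable. */
ptrdiff_t demo_safe_advance(demo_magic_sv *arr)
{
    demo_magic_sv *p = arr;
    demo_magic_sv *cur;
    assert(arr != NULL);
    cur = p++;
    DEMO_GETMAGIC(cur);
    return p - arr;
}
```

The safe form is the habit worth keeping: give such macros a simple variable, never an expression with side effects.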

SvGROW

Expands the character buffer in the SV so that it has room for the indicated number of bytes

(remember to reserve space for an extra trailing NUL character). Calls sv_grow to perform the

expansion if necessary. Returns a pointer to the character buffer.

char* SvGROW( SV* sv, int len )
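
The relationship between the used length, the allocated length and the extra byte for the trailing NUL can be sketched with a toy buffer. This is an illustration in plain C under stated assumptions: demo_buf, demo_grow and demo_catpvn are hypothetical names, and the code does not reproduce how Perl itself allocates memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy string buffer (hypothetical): cur plays the role of SvCUR,
 * len the role of SvLEN. */
typedef struct demo_buf {
    char  *pv;   /* start of the character buffer, or NULL */
    size_t cur;  /* bytes in use, not counting the trailing NUL */
    size_t len;  /* bytes allocated */
} demo_buf;

/* Ensure at least `want` bytes are allocated; reallocate only if needed.
 * Returns the (possibly moved) buffer pointer. */
char *demo_grow(demo_buf *b, size_t want)
{
    if (b->len < want) {
        char *p = realloc(b->pv, want);
        assert(p != NULL);      /* a sketch: real code must handle failure */
        b->pv = p;
        b->len = want;
    }
    return b->pv;
}

/* Append n bytes, reserving one extra byte for the trailing NUL. */
void demo_catpvn(demo_buf *b, const char *ptr, size_t n)
{
    char *dst = demo_grow(b, b->cur + n + 1);
    memcpy(dst + b->cur, ptr, n);
    b->cur += n;
    dst[b->cur] = '\0';
}
```

Note that the caller always uses the pointer returned by the grow step, because the buffer may have moved.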

sv_grow Expands the character buffer in the SV. This will use sv_unref and will upgrade the SV to

SVt_PV. Returns a pointer to the character buffer. Use SvGROW.

sv_inc Auto−increment of the value in the SV.

void sv_inc (SV* sv)

sv_insert Inserts a string at the specified offset/length within the SV. Similar to the Perl substr()

function.

void sv_insert(SV *sv, STRLEN offset, STRLEN len,

char *str, STRLEN strlen)
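
The offset/length replacement that sv_insert is compared to can be sketched on an ordinary C string. The function demo_insert below is a hypothetical helper that mirrors the behaviour of four-argument substr(); it says nothing about how sv_insert is implemented internally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Replace `len` bytes of `src` starting at `offset` with the `inslen` bytes
 * at `ins`. Returns a newly allocated NUL-terminated string that the
 * caller frees. */
char *demo_insert(const char *src, size_t offset, size_t len,
                  const char *ins, size_t inslen)
{
    size_t srclen = strlen(src);
    size_t taillen;
    char *out;

    assert(offset <= srclen && len <= srclen - offset);

    taillen = srclen - offset - len;
    out = malloc(offset + inslen + taillen + 1);
    assert(out != NULL);        /* a sketch: real code must handle failure */

    memcpy(out, src, offset);                                    /* head */
    memcpy(out + offset, ins, inslen);                           /* replacement */
    memcpy(out + offset + inslen, src + offset + len, taillen);  /* tail */
    out[offset + inslen + taillen] = '\0';
    return out;
}
```

A length of 0 gives a pure insertion, and an empty replacement gives a pure deletion.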

SvIOK Returns a boolean indicating whether the SV contains an integer.

int SvIOK (SV* SV)

SvIOK_off

Unsets the IV status of an SV.

void SvIOK_off (SV* sv)

SvIOK_on

Tells an SV that it is an integer.

void SvIOK_on (SV* sv)

SvIOK_only

Tells an SV that it is an integer and disables all other OK bits.

void SvIOK_only (SV* sv)
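
The difference between the _on, _off and _only forms is easiest to see as bit operations. The sketch below is a toy model in plain C: the DEMO_* bit values and the demo_flag_sv type are hypothetical, and the real flag values in sv.h are different.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch only. */
enum {
    DEMO_IOK = 1 << 0,  /* holds a valid integer */
    DEMO_NOK = 1 << 1,  /* holds a valid double  */
    DEMO_POK = 1 << 2,  /* holds a valid string  */
    DEMO_ALL_OK = DEMO_IOK | DEMO_NOK | DEMO_POK
};

typedef struct demo_flag_sv { unsigned flags; } demo_flag_sv;

int  demo_iok(const demo_flag_sv *sv) { return (sv->flags & DEMO_IOK) != 0; }
void demo_iok_on(demo_flag_sv *sv)    { sv->flags |= DEMO_IOK; }
void demo_iok_off(demo_flag_sv *sv)   { sv->flags &= ~(unsigned)DEMO_IOK; }

/* "_only": clear every OK bit, then set just the integer one. */
void demo_iok_only(demo_flag_sv *sv)
{
    sv->flags = (sv->flags & ~(unsigned)DEMO_ALL_OK) | DEMO_IOK;
    assert((sv->flags & DEMO_ALL_OK) == DEMO_IOK);
}
```

The same pattern applies to the NOK and POK families described later in this listing.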

SvIOKp Returns a boolean indicating whether the SV contains an integer. Checks the private setting.

Use SvIOK.

int SvIOKp (SV* SV)

sv_isa Returns a boolean indicating whether the SV is blessed into the specified class. This does not

check for subtypes; use sv_derived_from to verify an inheritance relationship.

int sv_isa (SV* sv, char* name)

sv_isobject

Returns a boolean indicating whether the SV is an RV pointing to a blessed object. If the SV is

not an RV, or if the object is not blessed, then this will return false.

int sv_isobject (SV* sv)

SvIV Returns the integer which is in the SV.

int SvIV (SV* sv)

SvIVX Returns the integer which is stored in the SV.

int SvIVX (SV* sv)

SvLEN Returns the size of the string buffer in the SV. See SvCUR.

int SvLEN (SV* sv)

sv_len Returns the length of the string in the SV. Use SvCUR.

STRLEN sv_len (SV* sv)

sv_magic Adds magic to an SV.

void sv_magic (SV* sv, SV* obj, int how, char* name, I32 namlen)

sv_mortalcopy

Creates a new SV which is a copy of the original SV. The new SV is marked as mortal.

SV* sv_mortalcopy (SV* oldsv)

sv_newmortal

Creates a new SV which is mortal. The reference count of the SV is set to 1.

SV* sv_newmortal (void)

SvNIOK Returns a boolean indicating whether the SV contains a number, integer or double.

int SvNIOK (SV* SV)

SvNIOK_off

Unsets the NV/IV status of an SV.

void SvNIOK_off (SV* sv)

SvNIOKp Returns a boolean indicating whether the SV contains a number, integer or double. Checks the

private setting. Use SvNIOK.

int SvNIOKp (SV* SV)

PL_sv_no

This is the false SV. See PL_sv_yes. Always refer to this as &PL_sv_no.

SvNOK Returns a boolean indicating whether the SV contains a double.

int SvNOK (SV* SV)

SvNOK_off

Unsets the NV status of an SV.

void SvNOK_off (SV* sv)

SvNOK_on

Tells an SV that it is a double.

void SvNOK_on (SV* sv)

SvNOK_only

Tells an SV that it is a double and disables all other OK bits.

void SvNOK_only (SV* sv)

SvNOKp Returns a boolean indicating whether the SV contains a double. Checks the private setting. Use

SvNOK.

int SvNOKp (SV* SV)

SvNV Returns the double which is stored in the SV.

double SvNV (SV* sv)

SvNVX Returns the double which is stored in the SV.

double SvNVX (SV* sv)

SvOK Returns a boolean indicating whether the SV holds a defined value.

int SvOK (SV* sv)

SvOOK Returns a boolean indicating whether the SvIVX is a valid offset value for the SvPVX. This

hack is used internally to speed up removal of characters from the beginning of a SvPV. When

SvOOK is true, then the start of the allocated string buffer is really (SvPVX − SvIVX).

int SvOOK(SV* sv)
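
The offset trick can be modelled without any Perl headers. In this sketch, which is an illustration only, demo_ook_sv is a hypothetical struct whose pv, cur and len fields stand in for SvPVX, SvCUR and SvLEN, and whose off field stands in for the offset that SvIVX holds while SvOOK is true.

```c
#include <assert.h>
#include <stddef.h>

/* Toy string SV (hypothetical layout, for illustration only). */
typedef struct demo_ook_sv {
    char  *pv;   /* current start of the string */
    size_t cur;  /* bytes in use */
    size_t len;  /* bytes available from pv onwards */
    size_t off;  /* how far pv has moved from the allocation start */
} demo_ook_sv;

/* Drop everything before `ptr` without moving any bytes: advance pv and
 * remember how far it has moved from the start of the allocation. */
void demo_chop(demo_ook_sv *sv, char *ptr)
{
    size_t delta;
    assert(ptr >= sv->pv && ptr <= sv->pv + sv->cur);
    delta = (size_t)(ptr - sv->pv);
    sv->pv  += delta;
    sv->cur -= delta;
    sv->len -= delta;
    sv->off += delta;
}

/* The true start of the allocation, i.e. (SvPVX - SvIVX) in the text. */
char *demo_alloc_start(const demo_ook_sv *sv)
{
    return sv->pv - sv->off;
}
```

Removing a prefix therefore costs a few additions rather than a copy of the remaining bytes, and the original allocation can still be found when it has to be freed or reallocated.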

SvPOK Returns a boolean indicating whether the SV contains a character string.

int SvPOK (SV* SV)

SvPOK_off

Unsets the PV status of an SV.

void SvPOK_off (SV* sv)

SvPOK_on

Tells an SV that it is a string.

void SvPOK_on (SV* sv)

SvPOK_only

Tells an SV that it is a string and disables all other OK bits.

void SvPOK_only (SV* sv)

SvPOKp Returns a boolean indicating whether the SV contains a character string. Checks the private

setting. Use SvPOK.

int SvPOKp (SV* SV)

SvPV Returns a pointer to the string in the SV, or a stringified form of the SV if the SV does not

contain a string. If len is PL_na then Perl will handle the length on its own. Handles ‘get’

magic.

char* SvPV (SV* sv, int len )

SvPV_force

Like SvPV, but forces the SV to become a string (SvPOK). Use the force variant if you are

going to update the SvPVX directly.

char* SvPV_force(SV* sv, int len)

SvPVX Returns a pointer to the string in the SV. The SV must contain a string.

char* SvPVX (SV* sv)

SvREFCNT

Returns the value of the object's reference count.

int SvREFCNT (SV* sv)

SvREFCNT_dec

Decrements the reference count of the given SV.

void SvREFCNT_dec (SV* sv)

SvREFCNT_inc

Increments the reference count of the given SV.

void SvREFCNT_inc (SV* sv)
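
The ownership rule behind these three macros can be sketched with a toy reference-counted value. Everything below is hypothetical (demo_rc, demo_new, demo_inc, demo_dec and demo_live_count are not Perl functions); the sketch only assumes that a value starts with a count of 1 and is freed when the count drops to zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted value (hypothetical, not the real SV layout). */
typedef struct demo_rc {
    int refcnt;
    int payload;
} demo_rc;

static int demo_live = 0;   /* values currently allocated, for the demo */

demo_rc *demo_new(int payload)
{
    demo_rc *p = malloc(sizeof *p);
    assert(p != NULL);      /* a sketch: real code must handle failure */
    p->refcnt = 1;          /* a new value starts with one owner */
    p->payload = payload;
    demo_live++;
    return p;
}

/* Record one more owner. */
demo_rc *demo_inc(demo_rc *p) { p->refcnt++; return p; }

/* Release one reference; free the value when the last owner lets go. */
void demo_dec(demo_rc *p)
{
    assert(p->refcnt > 0);
    if (--p->refcnt == 0) {
        demo_live--;
        free(p);
    }
}

int demo_live_count(void) { return demo_live; }
```

Each increment should be balanced by exactly one decrement; an unbalanced increment leaks the value and an unbalanced decrement frees it while someone still holds it.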

SvROK Tests if the SV is an RV.

int SvROK (SV* sv)

SvROK_off

Unsets the RV status of an SV.

void SvROK_off (SV* sv)

SvROK_on

Tells an SV that it is an RV.

void SvROK_on (SV* sv)

SvRV Dereferences an RV to return the SV.

SV* SvRV (SV* sv)

SvSETMAGIC

Invokes mg_set on an SV if it has ‘set’ magic. This macro evaluates its argument more than

once.

void SvSETMAGIC( SV *sv )

sv_setiv Copies an integer into the given SV. Does not handle ‘set’ magic. See sv_setiv_mg.

void sv_setiv (SV* sv, IV num)

sv_setiv_mg

Like sv_setiv, but also handles ‘set’ magic.

void sv_setiv_mg (SV* sv, IV num)

sv_setnv Copies a double into the given SV. Does not handle ‘set’ magic. See sv_setnv_mg.

void sv_setnv (SV* sv, double num)

sv_setnv_mg

Like sv_setnv, but also handles ‘set’ magic.

void sv_setnv_mg (SV* sv, double num)

sv_setpv Copies a string into an SV. The string must be null−terminated. Does not handle ‘set’ magic.

See sv_setpv_mg.

void sv_setpv (SV* sv, char* ptr)

sv_setpv_mg

Like sv_setpv, but also handles ‘set’ magic.

void sv_setpv_mg (SV* sv, char* ptr)

sv_setpviv

Copies an integer into the given SV, also updating its string value. Does not handle ‘set’ magic.

See sv_setpviv_mg.

void sv_setpviv (SV* sv, IV num)

sv_setpviv_mg

Like sv_setpviv, but also handles ‘set’ magic.

void sv_setpviv_mg (SV* sv, IV num)

sv_setpvn

Copies a string into an SV. The len parameter indicates the number of bytes to be copied.

Does not handle ‘set’ magic. See sv_setpvn_mg.

void sv_setpvn (SV* sv, char* ptr, STRLEN len)

sv_setpvn_mg

Like sv_setpvn, but also handles ‘set’ magic.

void sv_setpvn_mg (SV* sv, char* ptr, STRLEN len)

sv_setpvf Processes its arguments like sprintf and sets an SV to the formatted output. Does not handle

‘set’ magic. See sv_setpvf_mg.

void sv_setpvf (SV* sv, const char* pat, ...)

sv_setpvf_mg

Like sv_setpvf, but also handles ‘set’ magic.

void sv_setpvf_mg (SV* sv, const char* pat, ...)

sv_setref_iv

Copies an integer into a new SV, optionally blessing the SV. The rv argument will be upgraded

to an RV. That RV will be modified to point to the new SV. The classname argument

indicates the package for the blessing. Set classname to Nullch to avoid the blessing. The

new SV will be returned and will have a reference count of 1.

SV* sv_setref_iv (SV *rv, char *classname, IV iv)

sv_setref_nv

Copies a double into a new SV, optionally blessing the SV. The rv argument will be upgraded

to an RV. That RV will be modified to point to the new SV. The classname argument

indicates the package for the blessing. Set classname to Nullch to avoid the blessing. The

new SV will be returned and will have a reference count of 1.

SV* sv_setref_nv (SV *rv, char *classname, double nv)

sv_setref_pv

Copies a pointer into a new SV, optionally blessing the SV. The rv argument will be upgraded

to an RV. That RV will be modified to point to the new SV. If the pv argument is NULL then

PL_sv_undef will be placed into the SV. The classname argument indicates the package

for the blessing. Set classname to Nullch to avoid the blessing. The new SV will be

returned and will have a reference count of 1.

SV* sv_setref_pv (SV *rv, char *classname, void* pv)

Do not use with integral Perl types such as HV, AV, SV, CV, because those objects will become

corrupted by the pointer copy process.

Note that sv_setref_pvn copies the string while this copies the pointer.

sv_setref_pvn

Copies a string into a new SV, optionally blessing the SV. The length of the string must be

specified with n. The rv argument will be upgraded to an RV. That RV will be modified to

point to the new SV. The classname argument indicates the package for the blessing. Set

classname to Nullch to avoid the blessing. The new SV will be returned and will have a

reference count of 1.

SV* sv_setref_pvn (SV *rv, char *classname, char* pv, I32 n)

Note that sv_setref_pv copies the pointer while this copies the string.
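
The practical difference between copying the pointer and copying the string shows up as soon as the source buffer changes. The following plain C sketch uses hypothetical holder types (demo_ptr_holder, demo_str_holder) in place of SVs, purely to illustrate the aliasing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy holders (hypothetical): one keeps the caller's pointer, the other
 * keeps a private copy of n bytes. */
typedef struct demo_ptr_holder { void *ptr; } demo_ptr_holder;
typedef struct demo_str_holder { char *str; size_t n; } demo_str_holder;

/* Pointer copy: nothing is duplicated; the holder aliases the caller's data. */
demo_ptr_holder demo_hold_pointer(void *p)
{
    demo_ptr_holder h;
    h.ptr = p;
    return h;
}

/* String copy: n bytes are duplicated, so later changes to the source
 * are not seen. The caller frees h.str. */
demo_str_holder demo_hold_string(const char *s, size_t n)
{
    demo_str_holder h;
    h.str = malloc(n + 1);
    assert(h.str != NULL);  /* a sketch: real code must handle failure */
    memcpy(h.str, s, n);
    h.str[n] = '\0';
    h.n = n;
    return h;
}
```

With the pointer form the caller stays responsible for keeping the pointed-to data alive for as long as the holder exists.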

SvSetSV Calls sv_setsv if dsv is not the same as ssv. May evaluate arguments more than once.

void SvSetSV (SV* dsv, SV* ssv)

SvSetSV_nosteal

Calls a non−destructive version of sv_setsv if dsv is not the same as ssv. May evaluate

arguments more than once.

void SvSetSV_nosteal (SV* dsv, SV* ssv)

sv_setsv Copies the contents of the source SV ssv into the destination SV dsv. The source SV may be

destroyed if it is mortal. Does not handle ‘set’ magic. See the macro forms SvSetSV,

SvSetSV_nosteal and sv_setsv_mg.

void sv_setsv (SV* dsv, SV* ssv)

sv_setsv_mg

Like sv_setsv, but also handles ‘set’ magic.

void sv_setsv_mg (SV* dsv, SV* ssv)

sv_setuv Copies an unsigned integer into the given SV. Does not handle ‘set’ magic. See

sv_setuv_mg.

void sv_setuv (SV* sv, UV num)

sv_setuv_mg

Like sv_setuv, but also handles ‘set’ magic.

void sv_setuv_mg (SV* sv, UV num)

SvSTASH

Returns the stash of the SV.

HV* SvSTASH (SV* sv)

SvTAINT Taints an SV if tainting is enabled

void SvTAINT (SV* sv)

SvTAINTED

Checks to see if an SV is tainted. Returns TRUE if it is, FALSE if not.

int SvTAINTED (SV* sv)

SvTAINTED_off

Untaints an SV. Be very careful with this routine, as it short-circuits some of Perl's fundamental

security features. XS module authors should not use this function unless they fully understand all

the implications of unconditionally untainting the value. Untainting should be done in the

standard perl fashion, via a carefully crafted regexp, rather than directly untainting variables.

void SvTAINTED_off (SV* sv)

SvTAINTED_on

Marks an SV as tainted.

void SvTAINTED_on (SV* sv)

SVt_IV Integer type flag for scalars. See svtype.

SVt_PV Pointer type flag for scalars. See svtype.

SVt_PVAV

Type flag for arrays. See svtype.

SVt_PVCV

Type flag for code refs. See svtype.

SVt_PVHV

Type flag for hashes. See svtype.

SVt_PVMG

Type flag for blessed scalars. See svtype.

SVt_NV Double type flag for scalars. See svtype.

SvTRUE Returns a boolean indicating whether Perl would evaluate the SV as true or false, defined or

undefined. Does not handle ‘get’ magic.

int SvTRUE (SV* sv)

SvTYPE Returns the type of the SV. See svtype.

svtype SvTYPE (SV* sv)

svtype An enum of flags for Perl types. These are found in the file sv.h in the svtype enum. Test

these flags with the SvTYPE macro.

PL_sv_undef

This is the undef SV. Always refer to this as &PL_sv_undef.

sv_unref Unsets the RV status of the SV, and decrements the reference count of whatever was being

referenced by the RV. This can almost be thought of as a reversal of newSVrv. See

SvROK_off.

void sv_unref (SV* sv)

SvUPGRADE

Used to upgrade an SV to a more complex form. Uses sv_upgrade to perform the upgrade if

necessary. See svtype.

bool SvUPGRADE (SV* sv, svtype mt)

sv_upgrade

Upgrade an SV to a more complex form. Use SvUPGRADE. See svtype.

sv_usepvn

Tells an SV to use ptr to find its string value. Normally the string is stored inside the SV but

sv_usepvn allows the SV to use an outside string. The ptr should point to memory that was

allocated by malloc. The string length, len, must be supplied. This function will realloc the

memory pointed to by ptr, so that pointer should not be freed or used by the programmer after

giving it to sv_usepvn. Does not handle ‘set’ magic. See sv_usepvn_mg.

void sv_usepvn (SV* sv, char* ptr, STRLEN len)

sv_usepvn_mg

Like sv_usepvn, but also handles ‘set’ magic.

void sv_usepvn_mg (SV* sv, char* ptr, STRLEN len)

sv_vcatpvfn(sv, pat, patlen, args, svargs, svmax, used_locale)

Processes its arguments like vsprintf and appends the formatted output to an SV. Uses an

array of SVs if the C style variable argument list is missing (NULL). Indicates if locale

information has been used for formatting.

void sv_vcatpvfn (SV* sv, const char* pat, STRLEN patlen,

va_list *args, SV **svargs, I32 svmax,

bool *used_locale)

sv_vsetpvfn(sv, pat, patlen, args, svargs, svmax, used_locale)

Works like sv_vcatpvfn but copies the text into the SV instead of appending it.

void sv_vsetpvfn (SV* sv, const char* pat, STRLEN patlen,

va_list *args, SV **svargs, I32 svmax,

bool *used_locale)

SvUV Returns the unsigned integer which is in the SV.

UV SvUV(SV* sv)

SvUVX Returns the unsigned integer which is stored in the SV.

UV SvUVX(SV* sv)

PL_sv_yes

This is the true SV. See PL_sv_no. Always refer to this as &PL_sv_yes.

THIS Variable which is set up by xsubpp to designate the object in a C++ XSUB. This is always the

proper type for the C++ object. See CLASS and Using XS With C++ in perlxs.

toLOWER

Converts the specified character to lowercase.

int toLOWER (char c)

toUPPER Converts the specified character to uppercase.

int toUPPER (char c)

warn This is the XSUB-writer's interface to Perl's warn function. Use this function the same way

you use the C printf function. See croak().

XPUSHi Push an integer onto the stack, extending the stack if necessary. Handles ‘set’ magic. See

PUSHi.

XPUSHi(int d)

XPUSHn Push a double onto the stack, extending the stack if necessary. Handles ‘set’ magic. See

PUSHn.

XPUSHn(double d)

XPUSHp Push a string onto the stack, extending the stack if necessary. The len indicates the length of

the string. Handles ‘set’ magic. See PUSHp.

XPUSHp(char *c, int len)

XPUSHs Push an SV onto the stack, extending the stack if necessary. Does not handle ‘set’ magic. See

PUSHs.

XPUSHs(sv)

XPUSHu Push an unsigned integer onto the stack, extending the stack if necessary. See PUSHu.

XS Macro to declare an XSUB and its C parameter list. This is handled by xsubpp.

XSRETURN

Return from XSUB, indicating number of items on the stack. This is usually handled by

xsubpp.

XSRETURN(int x)

XSRETURN_EMPTY

Return an empty list from an XSUB immediately.

XSRETURN_EMPTY;

XSRETURN_IV

Return an integer from an XSUB immediately. Uses XST_mIV.

XSRETURN_IV(IV v)

XSRETURN_NO

Return &PL_sv_no from an XSUB immediately. Uses XST_mNO.

XSRETURN_NO;

XSRETURN_NV

Return a double from an XSUB immediately. Uses XST_mNV.

XSRETURN_NV(NV v)

XSRETURN_PV

Return a copy of a string from an XSUB immediately. Uses XST_mPV.

XSRETURN_PV(char *v)

XSRETURN_UNDEF

Return &PL_sv_undef from an XSUB immediately. Uses XST_mUNDEF.

XSRETURN_UNDEF;

XSRETURN_YES

Return &PL_sv_yes from an XSUB immediately. Uses XST_mYES.

XSRETURN_YES;

XST_mIV Place an integer into the specified position i on the stack. The value is stored in a new mortal

SV.

XST_mIV( int i, IV v )

XST_mNV

Place a double into the specified position i on the stack. The value is stored in a new mortal SV.

XST_mNV( int i, NV v )

XST_mNO

Place &PL_sv_no into the specified position i on the stack.

XST_mNO( int i )

XST_mPV

Place a copy of a string into the specified position i on the stack. The value is stored in a new

mortal SV.

XST_mPV( int i, char *v )

XST_mUNDEF

Place &PL_sv_undef into the specified position i on the stack.

XST_mUNDEF( int i )

XST_mYES

Place &PL_sv_yes into the specified position i on the stack.

XST_mYES( int i )

XS_VERSION

The version identifier for an XS module. This is usually handled automatically by

ExtUtils::MakeMaker. See XS_VERSION_BOOTCHECK.

XS_VERSION_BOOTCHECK

Macro to verify that a PM module's $VERSION variable matches the XS module's

XS_VERSION variable. This is usually handled automatically by xsubpp. See

The VERSIONCHECK: Keyword in perlxs.

Zero The XSUB-writer's interface to the C memzero function. The d is the destination, n is the

number of items, and t is the type.

void Zero( d, n, t )
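
The point to remember about the argument order is that n counts items of type t, not bytes. A minimal sketch follows, assuming a stand-in macro DEMO_ZERO built on memset; it is not the definition Perl uses, and demo_clear_prefix is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for Zero(d, n, t): clear n items of type t at d. */
#define DEMO_ZERO(d, n, t) ((void)memset((d), 0, (size_t)(n) * sizeof(t)))

/* Clear only the first n ints of an array, leaving the rest untouched. */
void demo_clear_prefix(int *a, int n)
{
    assert(n >= 0);
    DEMO_ZERO(a, n, int);
}
```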

AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto <okamoto@corp.hp.com>. It is now

maintained as part of Perl itself.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie, Andreas Koenig, Paul Hudson, Ilya

Zakharevich, Paul Marquess, Neil Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,

Stephen McCamant, and Gurusamy Sarathy.

API Listing originally by Dean Roehrich <roehrich@cray.com>.

perlcall Perl Programmers Reference Guide perlcall

NAME

perlcall − Perl calling conventions from C

DESCRIPTION

The purpose of this document is to show you how to call Perl subroutines directly from C, i.e., how to write

callbacks.

Apart from discussing the C interface provided by Perl for writing callbacks the document uses a series of

examples to show how the interface actually works in practice. In addition some techniques for coding

callbacks are covered.

Examples where callbacks are necessary include

An Error Handler

You have created an XSUB interface to an application's C API.

A fairly common feature in applications is to allow you to define a C function that will be called

whenever something nasty occurs. What we would like is to be able to specify a Perl subroutine that

will be called instead.

An Event Driven Program

The classic example of where callbacks are used is an event-driven program, such as an

X Windows application. In this case you register functions to be called whenever specific events

occur, e.g., a mouse button is pressed, the cursor moves into a window or a menu item is selected.

Although the techniques described here are applicable when embedding Perl in a C program, this is not the

primary goal of this document. There are other details that must be considered and are specific to embedding

Perl. For details on embedding Perl in C refer to perlembed.

Before you launch yourself head first into the rest of this document, it would be a good idea to have read the

following two documents − perlxs and perlguts.

THE PERL_CALL FUNCTIONS

Although this stuff is easier to explain using examples, you first need to be aware of a few important

definitions.

Perl has a number of C functions that allow you to call Perl subroutines. They are

I32 perl_call_sv(SV* sv, I32 flags) ;

I32 perl_call_pv(char *subname, I32 flags) ;

I32 perl_call_method(char *methname, I32 flags) ;

I32 perl_call_argv(char *subname, I32 flags, register char **argv) ;

The key function is perl_call_sv. All the other functions are fairly simple wrappers which make it easier to

call Perl subroutines in special cases. At the end of the day they will all call perl_call_sv to invoke the Perl

subroutine.

All the perl_call_* functions have a flags parameter which is used to pass a bit mask of options to Perl.

This bit mask operates identically for each of the functions. The settings available in the bit mask are

discussed in FLAG VALUES.

Each of the functions will now be discussed in turn.

perl_call_sv

perl_call_sv takes two parameters, the first, sv, is an SV*. This allows you to specify the Perl

subroutine to be called either as a C string (which has first been converted to an SV) or a reference to

a subroutine. The section, Using perl_call_sv, shows how you can make use of perl_call_sv.

perl_call_pv

The function, perl_call_pv, is similar to perl_call_sv except it expects its first parameter to be a C

char* which identifies the Perl subroutine you want to call, e.g., perl_call_pv("fred", 0).

If the subroutine you want to call is in another package, just include the package name in the string,

e.g., "pkg::fred".

perl_call_method

The function perl_call_method is used to call a method from a Perl class. The parameter methname

corresponds to the name of the method to be called. Note that the class that the method belongs to is

passed on the Perl stack rather than in the parameter list. This class can be either the name of the class

(for a static method) or a reference to an object (for a virtual method). See perlobj for more

information on static and virtual methods and Using perl_call_method for an example of using

perl_call_method.

perl_call_argv

perl_call_argv calls the Perl subroutine specified by the C string stored in the subname parameter.

It also takes the usual flags parameter. The final parameter, argv, consists of a NULL terminated

list of C strings to be passed as parameters to the Perl subroutine. See Using perl_call_argv.

All the functions return an integer. This is a count of the number of items returned by the Perl subroutine.

The actual items returned by the subroutine are stored on the Perl stack.

As a general rule you should always check the return value from these functions. Even if you are expecting

only a particular number of values to be returned from the Perl subroutine, there is nothing to stop someone

from doing something unexpected - don't say you haven't been warned.

FLAG VALUES

The flags parameter in all the perl_call_* functions is a bit mask which can consist of any combination of

the symbols defined below, OR'ed together.

G_VOID

Calls the Perl subroutine in a void context.

This flag has 2 effects:

1. It indicates to the subroutine being called that it is executing in a void context (if it executes

wantarray the result will be the undefined value).

2. It ensures that nothing is actually returned from the subroutine.

The value returned by the perl_call_* function indicates how many items have been returned by the Perl

subroutine − in this case it will be 0.

G_SCALAR

Calls the Perl subroutine in a scalar context. This is the default context flag setting for all the perl_call_*

functions.

This flag has 2 effects:

1. It indicates to the subroutine being called that it is executing in a scalar context (if it executes

wantarray the result will be false).

2. It ensures that only a scalar is actually returned from the subroutine. The subroutine can, of course,

ignore the wantarray and return a list anyway. If so, then only the last element of the list will be

returned.

The value returned by the perl_call_* function indicates how many items have been returned by the Perl

subroutine − in this case it will be either 0 or 1.

If 0, then you have specified the G_DISCARD flag.

If 1, then the item actually returned by the Perl subroutine will be stored on the Perl stack − the section

Returning a Scalar shows how to access this value on the stack. Remember that regardless of how many

items the Perl subroutine returns, only the last one will be accessible from the stack − think of the case where

only one value is returned as being a list with only one element. Any other items that were returned will not

exist by the time control returns from the perl_call_* function. The section Returning a list in a scalar

context shows an example of this behavior.

G_ARRAY

Calls the Perl subroutine in a list context.

As with G_SCALAR, this flag has 2 effects:

1. It indicates to the subroutine being called that it is executing in an array context (if it executes

wantarray the result will be true).

2. It ensures that all items returned from the subroutine will be accessible when control returns from the

perl_call_* function.

The value returned by the perl_call_* function indicates how many items have been returned by the Perl

subroutine.

If 0, then you have specified the G_DISCARD flag.

If not 0, then it will be a count of the number of items returned by the subroutine. These items will be stored

on the Perl stack. The section Returning a list of values gives an example of using the G_ARRAY flag and

the mechanics of accessing the returned items from the Perl stack.

G_DISCARD

By default, the perl_call_* functions place the items returned by the Perl subroutine on the stack. If

you are not interested in these items, then setting this flag will make Perl get rid of them automatically for

you. Note that it is still possible to indicate a context to the Perl subroutine by using either G_SCALAR or

G_ARRAY.

If you do not set this flag then it is very important that you dispose of any temporaries (i.e., parameters

passed to the Perl subroutine and values returned from the subroutine) yourself. The section

Returning a Scalar gives details of how to dispose of these temporaries explicitly and the section Using Perl

to dispose of temporaries discusses the specific circumstances where you can ignore the problem and let Perl

deal with it for you.

G_NOARGS

Whenever a Perl subroutine is called using one of the perl_call_* functions, it is assumed by default that

parameters are to be passed to the subroutine. If you are not passing any parameters to the Perl subroutine,

you can save a bit of time by setting this flag. It has the effect of not creating the @_ array for the Perl

subroutine.

Although the functionality provided by this flag may seem straightforward, it should be used only if there is

a good reason to do so. The reason for being cautious is that even if you have specified the G_NOARGS

flag, it is still possible for the Perl subroutine that has been called to think that you have passed it parameters.

In fact, what can happen is that the Perl subroutine you have called can access the @_ array from a previous

Perl subroutine. This will occur when the code that is executing the perl_call_* function has itself been

called from another Perl subroutine. The code below illustrates this

sub fred

{ print "@_\n" }

sub joe

{ &fred }

&joe(1,2,3) ;

This will print

1 2 3

What has happened is that fred accesses the @_ array which belongs to joe.

G_EVAL

It is possible for the Perl subroutine you are calling to terminate abnormally, e.g., by calling die explicitly or

by not actually existing. By default, when either of these events occurs, the process will terminate

immediately. If you want to trap this type of event, specify the G_EVAL flag. It will put an eval { } around

the subroutine call.

Whenever control returns from the perl_call_* function you need to check the $@ variable as you would in a

normal Perl script.

The value returned from the perl_call_* function is dependent on what other flags have been specified and

whether an error has occurred. Here are all the different cases that can occur:

If the perl_call_* function returns normally, then the value returned is as specified in the previous

sections.

If G_DISCARD is specified, the return value will always be 0.

If G_ARRAY is specified and an error has occurred, the return value will always be 0.

If G_SCALAR is specified and an error has occurred, the return value will be 1 and the value on the

top of the stack will be undef. This means that if you have already detected the error by checking $@

and you want the program to continue, you must remember to pop the undef from the stack.

See Using G_EVAL for details on using G_EVAL.
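
The cases listed above can be collected into one small decision function. This is a sketch in plain C that restates the rules in the text; the DEMO_G_* values are placeholders chosen for the sketch, demo_expected_count is a hypothetical helper, and G_VOID is left out for brevity. Real code should use the G_* symbols from Perl's headers.

```c
#include <assert.h>

/* Placeholder flag values for this sketch only. */
enum {
    DEMO_G_SCALAR  = 0,
    DEMO_G_ARRAY   = 1,
    DEMO_G_DISCARD = 2,
    DEMO_G_EVAL    = 4
};

/* Predict the count returned by a perl_call_* function when G_EVAL is set.
 * `normal` is the count that would be returned had no error occurred. */
int demo_expected_count(int flags, int error_occurred, int normal)
{
    assert(normal >= 0);
    if (flags & DEMO_G_DISCARD)
        return 0;                   /* G_DISCARD: always 0 */
    if (!error_occurred)
        return normal;              /* normal return */
    if (flags & DEMO_G_ARRAY)
        return 0;                   /* list context and error: nothing */
    return 1;                       /* scalar context and error: one undef */
}
```

The last case is the one that is easy to forget: in scalar context an undef is left on the stack after an error and has to be popped.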

G_KEEPERR

You may have noticed that using the G_EVAL flag described above will always clear the $@ variable and

set it to a string describing the error iff there was an error in the called code. This unqualified resetting of $@

can be problematic in the reliable identification of errors using the eval {} mechanism, because the

possibility exists that perl will call other code (end of block processing code, for example) between the time

the error causes $@ to be set within eval {}, and the subsequent statement which checks for the value of

$@ gets executed in the user's script.

This scenario will mostly be applicable to code that is meant to be called from within destructors,

asynchronous callbacks, signal handlers, __DIE__ or __WARN__ hooks, and tie functions. In such

situations, you will not want to clear $@ at all, but simply to append any new errors to any existing value of

$@.

The G_KEEPERR flag is meant to be used in conjunction with G_EVAL in perl_call_* functions that are

used to implement such code. This flag has no effect when G_EVAL is not used.

When G_KEEPERR is used, any errors in the called code will be prefixed with the string "\t(in cleanup)",

and appended to the current value of $@.
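
The effect on the error text can be pictured with ordinary C strings. In the sketch below, demo_keep_err is a hypothetical helper, not part of Perl, and the single space after the "(in cleanup)" prefix is an assumption of the sketch rather than something the text above states.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build the new error text: the current value, then the prefix, then the
 * new message. Returns a malloc'ed string that the caller frees. */
char *demo_keep_err(const char *current, const char *new_msg)
{
    static const char prefix[] = "\t(in cleanup) ";  /* trailing space assumed */
    size_t n = strlen(current) + strlen(prefix) + strlen(new_msg) + 1;
    char *out = malloc(n);
    assert(out != NULL);    /* a sketch: real code must handle failure */
    strcpy(out, current);
    strcat(out, prefix);
    strcat(out, new_msg);
    return out;
}
```

The earlier error stays at the front of the string, so the first failure is still the first thing a reader of $@ sees.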

The G_KEEPERR flag was introduced in Perl version 5.002.

See Using G_KEEPERR for an example of a situation that warrants the use of this flag.

Determining the Context

As mentioned above, you can determine the context of the currently executing subroutine in Perl with

wantarray. The equivalent test can be made in C by using the GIMME_V macro, which returns G_ARRAY if

you have been called in an array context, G_SCALAR if in a scalar context, or G_VOID if in a void context

(i.e. the return value will not be used). An older version of this macro is called GIMME; in a void context it

returns G_SCALAR instead of G_VOID. An example of using the GIMME_V macro is shown in section

Using GIMME_V.

KNOWN PROBLEMS

This section outlines all known problems that exist in the perl_call_* functions.

1. If you are intending to make use of both the G_EVAL and G_SCALAR flags in your code, use a

version of Perl greater than 5.000. There is a bug in version 5.000 of Perl which means that the

combination of these two flags will not work as described in the section FLAG VALUES.

Specifically, if the two flags are used when calling a subroutine and that subroutine does not call die,

the value returned by perl_call_* will be wrong.

2. In Perl 5.000 and 5.001 there is a problem with using perl_call_* if the Perl sub you are calling

attempts to trap a die.

The symptom of this problem is that the called Perl sub will continue to completion, but whenever it

attempts to pass control back to the XSUB, the program will immediately terminate.

For example, say you want to call this Perl sub

sub fred

{

eval { die "Fatal Error" ; } ;

print "Trapped error: $@\n"

if $@ ;

}

via this XSUB

void

Call_fred()

CODE:

PUSHMARK(SP) ;

perl_call_pv("fred", G_DISCARD|G_NOARGS) ;

fprintf(stderr, "back in Call_fred\n") ;

When Call_fred is executed it will print

Trapped error: Fatal Error

As control never returns to Call_fred, the "back in Call_fred" string will not get printed.

To work around this problem, you can either upgrade to Perl 5.002 or higher, or use the G_EVAL

flag with perl_call_* as shown below

void

Call_fred()

CODE:

PUSHMARK(SP) ;

perl_call_pv("fred", G_EVAL|G_DISCARD|G_NOARGS) ;

fprintf(stderr, "back in Call_fred\n") ;

EXAMPLES

Enough of the definition talk, let's have a few examples.

Perl provides many macros to assist in accessing the Perl stack. Use these macros wherever possible when

interfacing to Perl internals; doing so should make the code less vulnerable to any changes made to Perl in

the future.

Another point worth noting is that in the first series of examples I have made use of only the perl_call_pv

function. This has been done to keep the code simpler and ease you into the topic. Wherever possible, if the

choice is between using perl_call_pv and perl_call_sv, you should always try to use perl_call_sv. See Using

perl_call_sv for details.

No Parameters, Nothing returned

This first trivial example will call a Perl subroutine, PrintUID, to print out the UID of the process.

sub PrintUID

{

print "UID is $<\n" ;

}

and here is a C function to call it

static void

call_PrintUID()

{

dSP ;

PUSHMARK(SP) ;

perl_call_pv("PrintUID", G_DISCARD|G_NOARGS) ;

}

Simple, eh.

A few points to note about this example.

1. Ignore dSP and PUSHMARK(SP) for now. They will be discussed in the next example.

2. We aren't passing any parameters to PrintUID so G_NOARGS can be specified.

3. We aren't interested in anything returned from PrintUID, so G_DISCARD is specified. Even if

PrintUID was changed to return some value(s), having specified G_DISCARD will mean that they

will be wiped by the time control returns from perl_call_pv.

4. As perl_call_pv is being used, the Perl subroutine is specified as a C string. In this case the

subroutine name has been ‘hard−wired’ into the code.

5. Because we specified G_DISCARD, it is not necessary to check the value returned from

perl_call_pv. It will always be 0.

Passing Parameters

Now let's make a slightly more complex example. This time we want to call a Perl subroutine,

LeftString, which will take 2 parameters − a string ($s) and an integer ($n). The subroutine will

simply print the first $n characters of the string.

So the Perl subroutine would look like this

sub LeftString

{

my($s, $n) = @_ ;

print substr($s, 0, $n), "\n" ;

}

The C function required to call LeftString would look like this.

static void

call_LeftString(a, b)

char * a ;

int b ;

{

dSP ;

ENTER ;

SAVETMPS ;

PUSHMARK(SP) ;

XPUSHs(sv_2mortal(newSVpv(a, 0)));

XPUSHs(sv_2mortal(newSViv(b)));

PUTBACK ;

perl_call_pv("LeftString", G_DISCARD);

FREETMPS ;

LEAVE ;

}

Here are a few notes on the C function call_LeftString.

1. Parameters are passed to the Perl subroutine using the Perl stack. This is the purpose of the code

beginning with the line dSP and ending with the line PUTBACK. The dSP declares a local copy of

the stack pointer. This local copy should always be accessed as SP.

2. If you are going to put something onto the Perl stack, you need to know where to put it. This is the

purpose of the macro dSP − it declares and initializes a local copy of the Perl stack pointer.

All the other macros which will be used in this example require you to have used this macro.

The exception to this rule is if you are calling a Perl subroutine directly from an XSUB function. In

this case it is not necessary to use the dSP macro explicitly − it will be declared for you

automatically.

3. Any parameters to be pushed onto the stack should be bracketed by the PUSHMARK and PUTBACK

macros. The purpose of these two macros, in this context, is to count the number of parameters you

are pushing automatically. Then whenever Perl is creating the @_ array for the subroutine, it knows

how big to make it.

The PUSHMARK macro tells Perl to make a mental note of the current stack pointer. Even if you

aren't passing any parameters (like the example shown in the section No Parameters, Nothing

returned) you must still call the PUSHMARK macro before you can call any of the perl_call_*

functions − Perl still needs to know that there are no parameters.

The PUTBACK macro sets the global copy of the stack pointer to be the same as our local copy. If we

didn't do this perl_call_pv wouldn't know where the two parameters we pushed were − remember

that up to now all the stack pointer manipulation we have done is with our local copy, not the global

copy.

4. The only flag specified this time is G_DISCARD. Because we are passing 2 parameters to the Perl

subroutine this time, we have not specified G_NOARGS.

5. Next, we come to XPUSHs. This is where the parameters actually get pushed onto the stack. In this

case we are pushing a string and an integer.

See XSUBs and the Argument Stack in perlguts for details on how the XPUSH macros work.

6. Because we created temporary values (by means of sv_2mortal() calls) we will have to tidy up

the Perl stack and dispose of mortal SVs.

This is the purpose of

ENTER ;

SAVETMPS ;

at the start of the function, and

FREETMPS ;

LEAVE ;

at the end. The ENTER/SAVETMPS pair creates a boundary for any temporaries we create. This

means that the temporaries we get rid of will be limited to those which were created after these calls.

The FREETMPS/LEAVE pair will get rid of any values returned by the Perl subroutine (see next

example), plus it will also dump the mortal SVs we have created. Having ENTER/SAVETMPS at the

beginning of the code makes sure that no other mortals are destroyed.

Think of these macros as working a bit like using { and } in Perl to limit the scope of local variables.

See the section Using Perl to dispose of temporaries for details of an alternative to using these

macros.

7. Finally, LeftString can now be called via the perl_call_pv function.
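
The interplay of the local and global stack pointers described in notes 1 to 3 can be pictured with a toy model in plain C. This is only a sketch: the array, global_sp, mark, args_visible and push_two are hypothetical names, and the real Perl stack holds SV pointers and differs in detail.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 32

/* Toy model: a value stack, a global stack pointer and one saved mark. */
static int  stack[STACK_MAX];
static int *global_sp = stack;    /* what the callee would see           */
static int *mark      = stack;    /* where the argument list starts      */

/* Number of arguments the callee would find: everything above the mark. */
static ptrdiff_t args_visible(void)
{
    return global_sp - mark;
}

/* Mimics dSP, PUSHMARK, two XPUSHs and, optionally, PUTBACK. */
static ptrdiff_t push_two(int a, int b, int do_putback)
{
    int *sp = global_sp;          /* dSP: local copy of the stack pointer */

    mark = sp;                    /* PUSHMARK(SP): remember current top   */
    assert(sp + 2 <= stack + STACK_MAX);
    *sp++ = a;                    /* XPUSHs: only the local copy moves    */
    *sp++ = b;

    if (do_putback)
        global_sp = sp;           /* PUTBACK: publish the local copy      */

    return args_visible();
}
```

Calling push_two(1, 2, 0) reports no visible arguments because the global pointer never moved; push_two(1, 2, 1) reports two.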

Returning a Scalar

Now for an example of dealing with the items returned from a Perl subroutine.

Here is a Perl subroutine, Adder, that takes 2 integer parameters and simply returns their sum.

sub Adder

{

my($a, $b) = @_ ;

$a + $b ;

}

Because we are now concerned with the return value from Adder, the C function required to call it is now a

bit more complex.

static void

call_Adder(a, b)

int a ;

int b ;

{

dSP ;

int count ;

ENTER ;

SAVETMPS;

PUSHMARK(SP) ;

XPUSHs(sv_2mortal(newSViv(a)));

XPUSHs(sv_2mortal(newSViv(b)));

PUTBACK ;

count = perl_call_pv("Adder", G_SCALAR);

SPAGAIN ;

if (count != 1)

croak("Big trouble\n") ;

printf ("The sum of %d and %d is %d\n", a, b, POPi) ;

PUTBACK ;

FREETMPS ;

LEAVE ;

}

Points to note this time are

1. The only flag specified this time was G_SCALAR. That means the @_ array will be created and that

the value returned by Adder will still exist after the call to perl_call_pv.

2. The purpose of the macro SPAGAIN is to refresh the local copy of the stack pointer. This is

necessary because it is possible that the memory allocated to the Perl stack has been reallocated

whilst in the perl_call_pv call.

If you are making use of the Perl stack pointer in your code you must always refresh the local copy

using SPAGAIN whenever you make use of the perl_call_* functions or any other Perl internal

function.

3. Although only a single value was expected to be returned from Adder, it is still good practice to

check the return code from perl_call_pv anyway.

Expecting a single value is not quite the same as knowing that there will be one. If someone modified

Adder to return a list and we didn't check for that possibility and take appropriate action, the Perl

stack would end up in an inconsistent state. That is something you really don't want to happen.

4. The POPi macro is used here to pop the return value from the stack. In this case we wanted an

integer, so POPi was used.

Here is the complete list of POP macros available, along with the types they return.

POPs SV

POPp pointer

POPn double

POPi integer

POPl long

5. The final PUTBACK is used to leave the Perl stack in a consistent state before exiting the function.

This is necessary because when we popped the return value from the stack with POPi it updated only

our local copy of the stack pointer. Remember, PUTBACK sets the global stack pointer to be the

same as our local copy.
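
Why the local copy must be refreshed after a call (note 2) and published again after popping (note 5) can be sketched in stand-alone C. The model below is an illustration under stated assumptions, not Perl's implementation: stack_base, stack_sp, grow_stack, toy_call and call_and_pop are all hypothetical, and the callee deliberately moves the stack on every call to make the point.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int   *stack_base = NULL;  /* toy stack storage                */
static int   *stack_sp   = NULL;  /* global stack pointer (next free) */
static size_t stack_cap  = 0;

/* Move the stack to a new, larger allocation, as a callee might. */
static void grow_stack(void)
{
    size_t used    = stack_base ? (size_t)(stack_sp - stack_base) : 0;
    size_t new_cap = stack_cap ? stack_cap * 2 : 4;
    int   *fresh   = malloc(new_cap * sizeof *fresh);

    assert(fresh != NULL);
    if (used)
        memcpy(fresh, stack_base, used * sizeof *fresh);
    free(stack_base);
    stack_base = fresh;
    stack_sp   = fresh + used;    /* the global copy is kept up to date */
    stack_cap  = new_cap;
}

/* Stand-in for a perl_call_* call in scalar context: relocates the
   stack, then leaves one return value on it. Returns the item count. */
static int toy_call(int result)
{
    grow_stack();
    *stack_sp++ = result;
    return 1;
}

/* Caller side: take a local copy, call, refresh, pop, publish. */
static int call_and_pop(int result)
{
    int *sp;
    int  count;

    if (stack_base == NULL)
        grow_stack();
    sp = stack_sp;                /* dSP: local copy                    */
    count = toy_call(result);     /* sp now points into freed memory    */
    sp = stack_sp;                /* SPAGAIN: refresh before any use    */
    assert(count == 1);
    result = *--sp;               /* POPi: only the local copy moves    */
    stack_sp = sp;                /* PUTBACK: publish the local copy    */
    return result;
}
```

Leaving out the refresh would dereference a pointer into memory that toy_call has already freed; leaving out the final assignment would leave the popped value on the global stack.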

Returning a list of values

Now, let's extend the previous example to return both the sum of the parameters and the difference.

Here is the Perl subroutine

sub AddSubtract

{

my($a, $b) = @_ ;

($a+$b, $a-$b) ;

}

and this is the C function

static void

call_AddSubtract(a, b)

int a ;

int b ;

{

dSP ;

int count ;

ENTER ;

SAVETMPS;

PUSHMARK(SP) ;

XPUSHs(sv_2mortal(newSViv(a)));

XPUSHs(sv_2mortal(newSViv(b)));

PUTBACK ;

count = perl_call_pv("AddSubtract", G_ARRAY);

SPAGAIN ;

if (count != 2)

croak("Big trouble\n") ;

printf ("%d − %d = %d\n", a, b, POPi) ;

printf ("%d + %d = %d\n", a, b, POPi) ;

PUTBACK ;

FREETMPS ;

LEAVE ;

}

If call_AddSubtract is called like this

call_AddSubtract(7, 4) ;

then here is the output

7 − 4 = 3

7 + 4 = 11

Notes

1. We wanted array context, so G_ARRAY was used.

2. Not surprisingly POPi is used twice this time because we were retrieving 2 values from the stack.

The important thing to note is that when using the POP* macros they come off the stack in reverse

order.
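
The reverse ordering follows directly from the stack discipline. Here is a minimal stand-alone sketch; push, pop and add_subtract are hypothetical helpers over a plain int array, not Perl macros.

```c
#include <assert.h>

static int  stack[8];
static int *sp = stack;           /* points at the next free slot */

static void push(int v) { assert(sp < stack + 8); *sp++ = v; }
static int  pop(void)   { assert(sp > stack);     return *--sp; }

/* Stand-in for AddSubtract: leaves (a+b, a-b) on the stack, in that order. */
static int add_subtract(int a, int b)
{
    push(a + b);
    push(a - b);
    return 2;                     /* number of values returned */
}
```

After add_subtract(7, 4), the first pop() yields the difference and the second yields the sum, matching the order of the two printf calls in call_AddSubtract.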

Returning a list in a scalar context

Say the Perl subroutine in the previous section was called in a scalar context, like this

static void

call_AddSubScalar(a, b)

int a ;

int b ;

{

dSP ;

int count ;

int i ;

ENTER ;

SAVETMPS;

PUSHMARK(SP) ;

XPUSHs(sv_2mortal(newSViv(a)));

XPUSHs(sv_2mortal(newSViv(b)));

PUTBACK ;

count = perl_call_pv("AddSubtract", G_SCALAR);

SPAGAIN ;

printf ("Items Returned = %d\n", count) ;

for (i = 1 ; i <= count ; ++i)

printf ("Value %d = %d\n", i, POPi) ;

PUTBACK ;

FREETMPS ;

LEAVE ;

}

Apart from requesting a scalar context with G_SCALAR, the only other change is that call_AddSubScalar prints the

number of items returned from the Perl subroutine and their values (for simplicity it assumes that they are

integers). So if call_AddSubScalar is called

call_AddSubScalar(7, 4) ;

then the output will be

Items Returned = 1

Value 1 = 3

In this case the main point to note is that only the last item in the list returned from the subroutine

AddSubtract actually made it back to call_AddSubScalar.

Returning Data from Perl via the parameter list

It is also possible to return values directly via the parameter list − whether it is actually desirable to do it is

another matter entirely.

The Perl subroutine, Inc, below takes 2 parameters and increments each directly.

sub Inc

{

++ $_[0] ;

++ $_[1] ;

}

and here is a C function to call it.

static void

call_Inc(a, b)

int a ;

int b ;

{

dSP ;

int count ;

SV * sva ;

SV * svb ;

ENTER ;

SAVETMPS;

sva = sv_2mortal(newSViv(a)) ;

svb = sv_2mortal(newSViv(b)) ;

PUSHMARK(SP) ;

XPUSHs(sva);

XPUSHs(svb);

PUTBACK ;

count = perl_call_pv("Inc", G_DISCARD);

if (count != 0)

croak ("call_Inc: expected 0 values from ’Inc’, got %d\n",

count) ;

printf ("%d + 1 = %d\n", a, SvIV(sva)) ;

printf ("%d + 1 = %d\n", b, SvIV(svb)) ;

FREETMPS ;

LEAVE ;

}

To be able to access the two parameters that were pushed onto the stack after they return from perl_call_pv it

is necessary to make a note of their addresses − thus the two variables sva and svb.

The reason this is necessary is that the area of the Perl stack which held them will very likely have been

overwritten by something else by the time control returns from perl_call_pv.

Using G_EVAL

Now an example using G_EVAL. Below is a Perl subroutine which computes the difference of its 2

parameters. If the result would be negative, the subroutine calls die.

sub Subtract

{

my ($a, $b) = @_ ;

die "death can be fatal\n" if $a < $b ;

$a - $b ;

}

and some C to call it

static void

call_Subtract(a, b)

int a ;

int b ;

{

dSP ;

int count ;

ENTER ;

SAVETMPS;

PUSHMARK(SP) ;

XPUSHs(sv_2mortal(newSViv(a)));

XPUSHs(sv_2mortal(newSViv(b)));

PUTBACK ;

count = perl_call_pv("Subtract", G_EVAL|G_SCALAR);

SPAGAIN ;

/* Check the eval first */

if (SvTRUE(ERRSV))

{

printf ("Uh oh − %s\n", SvPV(ERRSV, PL_na)) ;

POPs ;

}

else

{

if (count != 1)

croak("call_Subtract: wanted 1 value from ’Subtract’, got %d\n",

count) ;

printf ("%d − %d = %d\n", a, b, POPi) ;

}

PUTBACK ;

FREETMPS ;

LEAVE ;

}

If call_Subtract is called thus

call_Subtract(4, 5)

the following will be printed

Uh oh − death can be fatal

Notes

1. We want to be able to catch the die so we have used the G_EVAL flag. Not specifying this flag

would mean that the program would terminate immediately at the die statement in the subroutine

Subtract.

2. The code

if (SvTRUE(ERRSV))

{

printf ("Uh oh − %s\n", SvPV(ERRSV, PL_na)) ;

POPs ;

}

is the direct equivalent of this bit of Perl

print "Uh oh − $@\n" if $@ ;

PL_errgv is a perl global of type GV * that points to the symbol table entry containing the error.

ERRSV therefore refers to the C equivalent of $@.

3. Note that the stack is popped using POPs in the block where SvTRUE(ERRSV) is true. This is

necessary because whenever a perl_call_* function invoked with G_EVAL|G_SCALAR returns an

error, the top of the stack holds the value undef. Because we want the program to continue after

detecting this error, it is essential that the stack is tidied up by removing the undef.
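
The general shape of trapping a die and then testing the error variable can be sketched in stand-alone C with setjmp and longjmp. This is a toy model and makes no claim about how Perl implements G_EVAL: trap, errsv, toy_die, subtract and call_subtract_trapped are hypothetical names, and returning 0 on failure is this sketch's own convention (the real call still leaves undef on the stack, as note 3 explains).

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

static jmp_buf trap;              /* toy "eval" frame                 */
static char    errsv[64] = "";    /* toy stand-in for $@              */

/* Toy die(): record the message and unwind to the enclosing trap. */
static void toy_die(const char *msg)
{
    snprintf(errsv, sizeof errsv, "%s", msg);
    longjmp(trap, 1);
}

/* Stand-in for the Perl sub Subtract. */
static int subtract(int a, int b)
{
    if (a < b)
        toy_die("death can be fatal\n");
    return a - b;
}

/*
 * Stand-in for calling Subtract with G_EVAL|G_SCALAR.
 * Returns 1 and stores the difference in *out on success;
 * returns 0 and leaves the message in errsv if the callee died.
 */
static int call_subtract_trapped(int a, int b, int *out)
{
    assert(out != NULL);
    errsv[0] = '\0';              /* start with an empty error          */
    if (setjmp(trap) != 0)
        return 0;                 /* arrived here via toy_die()         */
    *out = subtract(a, b);
    return 1;
}
```

The caller checks the error first and only then trusts the result, which is the same order of tests used in call_Subtract.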

Using G_KEEPERR

Consider this rather facetious example, where we have used an XS version of the call_Subtract example

above inside a destructor:

package Foo;

sub new { bless {}, $_[0] }

sub Subtract {

my($a,$b) = @_;

die "death can be fatal" if $a < $b ;

$a - $b;

}

sub DESTROY { call_Subtract(5, 4); }

sub foo { die "foo dies"; }

package main;

eval { Foo->new->foo };

print "Saw: $@" if $@; # should be, but isn’t

This example will fail to recognize that an error occurred inside the eval {}. Here's why: the

call_Subtract code got executed while perl was cleaning up temporaries when exiting the eval block, and

because call_Subtract is implemented with perl_call_pv using the G_EVAL flag, it promptly reset $@. This

results in the failure of the outermost test for $@, and thereby the failure of the error trap.

Appending the G_KEEPERR flag, so that the perl_call_pv call in call_Subtract reads:

count = perl_call_pv("Subtract", G_EVAL|G_SCALAR|G_KEEPERR);

will preserve the error and restore reliable error handling.

Using perl_call_sv

In all the previous examples I have ‘hard−wired’ the name of the Perl subroutine to be called from C. Most

of the time though, it is more convenient to be able to specify the name of the Perl subroutine from within

the Perl script.

Consider the Perl code below

sub fred

{

print "Hello there\n" ;

}

CallSubPV("fred") ;

Here is a snippet of XSUB which defines CallSubPV.

void

CallSubPV(name)

char * name

CODE:

PUSHMARK(SP) ;

perl_call_pv(name, G_DISCARD|G_NOARGS) ;

That is fine as far as it goes. The thing is, the Perl subroutine can be specified as only a string. For Perl 4 this

was adequate, but Perl 5 allows references to subroutines and anonymous subroutines. This is where

perl_call_sv is useful.

The code below for CallSubSV is identical to CallSubPV except that the name parameter is now defined as

an SV* and we use perl_call_sv instead of perl_call_pv.

void

CallSubSV(name)

SV * name

CODE:

PUSHMARK(SP) ;

perl_call_sv(name, G_DISCARD|G_NOARGS) ;

Because we are using an SV to call fred the following can all be used

CallSubSV("fred") ;

CallSubSV(\&fred) ;

$ref = \&fred ;

CallSubSV($ref) ;

CallSubSV( sub { print "Hello there\n" } ) ;

As you can see, perl_call_sv gives you much greater flexibility in how you can specify the Perl subroutine.

You should note that if it is necessary to store the SV (name in the example above) which corresponds to the

Perl subroutine so that it can be used later in the program, it is not enough just to store a copy of the pointer to

the SV. Say the code above had been like this

static SV * rememberSub ;

void

SaveSub1(name)

SV * name

CODE:

rememberSub = name ;

void

CallSavedSub1()

CODE:

PUSHMARK(SP) ;

perl_call_sv(rememberSub, G_DISCARD|G_NOARGS) ;

The reason this is wrong is that by the time you come to use the pointer rememberSub in

CallSavedSub1, it may or may not still refer to the Perl subroutine that was recorded in SaveSub1.

This is particularly true for these cases

SaveSub1(\&fred) ;

CallSavedSub1() ;

SaveSub1( sub { print "Hello there\n" } ) ;

CallSavedSub1() ;

By the time each of the SaveSub1 statements above has been executed, the SV*s which corresponded to

the parameters will no longer exist. Expect an error message from Perl of the form

Can’t use an undefined value as a subroutine reference at ...

for each of the CallSavedSub1 lines.

Similarly, with this code

$ref = \&fred ;

SaveSub1($ref) ;

$ref = 47 ;

CallSavedSub1() ;

you can expect one of these messages (which you actually get is dependent on the version of Perl you are

using)

Not a CODE reference at ...

Undefined subroutine &main::47 called ...

The variable $ref may have referred to the subroutine fred whenever the call to SaveSub1 was made

but by the time CallSavedSub1 gets called it now holds the number 47. Because we saved only a pointer

to the original SV in SaveSub1, any changes to $ref will be tracked by the pointer rememberSub. This

means that whenever CallSavedSub1 gets called, it will attempt to execute the code which is referenced

by the SV* rememberSub. In this case though, it now refers to the integer 47, so expect Perl to complain

loudly.

A similar but more subtle problem is illustrated with this code

$ref = \&fred ;

SaveSub1($ref) ;

$ref = \&joe ;

CallSavedSub1() ;

This time whenever CallSavedSub1 gets called it will execute the Perl subroutine joe (assuming it

exists) rather than fred as was originally requested in the call to SaveSub1.

To get around these problems it is necessary to take a full copy of the SV. The code below shows

SaveSub2 modified to do that

static SV * keepSub = (SV*)NULL ;

void

SaveSub2(name)

SV * name

CODE:

/* Take a copy of the callback */

if (keepSub == (SV*)NULL)

/* First time, so create a new SV */

keepSub = newSVsv(name) ;

else

/* Been here before, so overwrite */

SvSetSV(keepSub, name) ;

void

CallSavedSub2()

CODE:

PUSHMARK(SP) ;

perl_call_sv(keepSub, G_DISCARD|G_NOARGS) ;

To avoid creating a new SV every time SaveSub2 is called, the function first checks to see if it has been

called before. If not, then space for a new SV is allocated and the reference to the Perl subroutine, name, is

copied to the variable keepSub in one operation using newSVsv. Thereafter, whenever SaveSub2 is

called the existing SV, keepSub, is overwritten with the new value using SvSetSV.
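
The contrast between SaveSub1 (keep the pointer) and SaveSub2 (keep a copy) is the ordinary aliasing problem, and it can be shown without any Perl headers. In this sketch struct toy_sv, save_pointer, save_copy and the other helpers are hypothetical stand-ins; a real SV is considerably more involved.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for an SV that names a sub. */
struct toy_sv { char name[16]; };

static const struct toy_sv *remembered;      /* SaveSub1 style: alias */
static struct toy_sv        kept;            /* SaveSub2 style: copy  */

static void save_pointer(const struct toy_sv *sv) { remembered = sv; }

static void save_copy(const struct toy_sv *sv)
{
    assert(sv != NULL);
    kept = *sv;                   /* full copy, like newSVsv/SvSetSV */
}

/* Overwrite the value held in sv, as "$ref = ..." does in Perl. */
static void set_name(struct toy_sv *sv, const char *name)
{
    strncpy(sv->name, name, sizeof sv->name - 1);
    sv->name[sizeof sv->name - 1] = '\0';
}

/* Return 1 if the saved pointer currently names `want`, else 0. */
static int pointer_names(const char *want)
{
    return strcmp(remembered->name, want) == 0;
}

/* Return 1 if the saved copy names `want`, else 0. */
static int copy_names(const char *want)
{
    return strcmp(kept.name, want) == 0;
}
```

Saving "fred" both ways and then changing the original to "joe" leaves the pointer naming joe while the copy still names fred.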

Using perl_call_argv

Here is a Perl subroutine which prints whatever parameters are passed to it.

sub PrintList

{

my(@list) = @_ ;

foreach (@list) { print "$_\n" }

}

and here is an example of perl_call_argv which will call PrintList.

static char * words[] = {"alpha", "beta", "gamma", "delta", NULL} ;

static void

call_PrintList()

{

dSP ;

perl_call_argv("PrintList", G_DISCARD, words) ;

}

Note that it is not necessary to call PUSHMARK in this instance. This is because perl_call_argv will do it for

you.
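
Notice that no count is passed alongside words; the trailing NULL is what marks the end of the list. A stand-alone sketch of walking such a list follows (count_args is a hypothetical helper, not part of Perl).

```c
#include <assert.h>
#include <stddef.h>

/* Count the entries of a NULL-terminated list of C strings. */
static size_t count_args(char **argv)
{
    size_t n = 0;

    assert(argv != NULL);
    while (argv[n] != NULL)
        ++n;
    return n;
}
```

Forgetting the terminator would let a loop like this run off the end of the array, so it is worth checking whenever the list is built by hand.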

Using perl_call_method

Consider the following Perl code

{

package Mine ;

sub new

{

my($type) = shift ;

bless [@_]

}

sub Display

{

my ($self, $index) = @_ ;

print "$index: $$self[$index]\n" ;

}

sub PrintID

{

my($class) = @_ ;

print "This is Class $class version 1.0\n" ;

}

}

It implements a very simple class to manage an array. Apart from the constructor, new, it declares two

methods, one static and one virtual. The static method, PrintID, simply prints out the class name and a

version number. The virtual method, Display, prints out a single element of the array. Here is an all Perl

example of using it.

$a = new Mine ('red', 'green', 'blue') ;

$a->Display(1) ;

PrintID Mine;

will print

1: green

This is Class Mine version 1.0

Calling a Perl method from C is fairly straightforward. The following things are required

a reference to the object for a virtual method or the name of the class for a static method.

the name of the method.

any other parameters specific to the method.

Here is a simple XSUB which illustrates the mechanics of calling both the PrintID and Display

methods from C.

void

call_Method(ref, method, index)

SV * ref

char * method

int index

CODE:

PUSHMARK(SP);

XPUSHs(ref);

XPUSHs(sv_2mortal(newSViv(index))) ;

PUTBACK;

perl_call_method(method, G_DISCARD) ;

void

call_PrintID(class, method)

char * class

char * method

CODE:

PUSHMARK(SP);

XPUSHs(sv_2mortal(newSVpv(class, 0))) ;

PUTBACK;

perl_call_method(method, G_DISCARD) ;

So the methods PrintID and Display can be invoked like this

$a = new Mine ('red', 'green', 'blue') ;

call_Method($a, 'Display', 1) ;

call_PrintID('Mine', 'PrintID') ;

The only thing to note is that in both the static and virtual methods, the method name is not passed via the

stack − it is used as the first parameter to perl_call_method.

Using GIMME_V

Here is a trivial XSUB which prints the context in which it is currently executing.

void

PrintContext()

CODE:

I32 gimme = GIMME_V;

if (gimme == G_VOID)

printf ("Context is Void\n") ;

else if (gimme == G_SCALAR)

printf ("Context is Scalar\n") ;

else

printf ("Context is Array\n") ;

and here is some Perl to test it

PrintContext ;

$a = PrintContext ;

@a = PrintContext ;

The output from that will be

Context is Void

Context is Scalar

Context is Array

Using Perl to dispose of temporaries

In the examples given to date, any temporaries created in the callback (i.e., parameters passed on the stack to

the perl_call_* function or values returned via the stack) have been freed by one of these methods

specifying the G_DISCARD flag with perl_call_*.

explicitly disposed of using the ENTER/SAVETMPS − FREETMPS/LEAVE pairing.

There is another method which can be used, namely letting Perl do it for you automatically whenever it

regains control after the callback has terminated. This is done by simply not using the

ENTER ;

SAVETMPS ;

...

FREETMPS ;

LEAVE ;

sequence in the callback (and not, of course, specifying the G_DISCARD flag).

If you are going to use this method you have to be aware of a possible memory leak which can arise under

very specific circumstances. To explain these circumstances you need to know a bit about the flow of

control between Perl and the callback routine.

The examples given at the start of the document (an error handler and an event driven program) are typical of

the two main sorts of flow control that you are likely to encounter with callbacks. There is a very important

distinction between them, so pay attention.

In the first example, an error handler, the flow of control could be as follows. You have created an interface

to an external library. Control can reach the external library like this

perl −−> XSUB −−> external library

Whilst control is in the library, an error condition occurs. You have previously set up a Perl callback to

handle this situation, so it will get executed. Once the callback has finished, control will drop back to Perl

again. Here is what the flow of control will be like in that situation

perl −−> XSUB −−> external library

...

error occurs

...

external library −−> perl_call −−> perl

perl <−− XSUB <−− external library <−− perl_call <−−−−+

After processing of the error using perl_call_* is completed, control reverts back to Perl more or less

immediately.

In the diagram, the further right you go the more deeply nested the scope is. It is only when control is back

with perl on the extreme left of the diagram that you will have dropped back to the enclosing scope and any

temporaries you have left hanging around will be freed.

In the second example, an event driven program, the flow of control will be more like this

perl −−> XSUB −−> event handler

...

event handler −−> perl_call −−> perl

event handler <−− perl_call <−−−−+

...

event handler −−> perl_call −−> perl

event handler <−− perl_call <−−−−+

...

event handler −−> perl_call −−> perl

event handler <−− perl_call <−−−−+

In this case the flow of control can consist of only the repeated sequence

event handler −−> perl_call −−> perl

for practically the complete duration of the program. This means that control may never drop back to the

surrounding scope in Perl at the extreme left.

So what is the big problem? Well, if you are expecting Perl to tidy up those temporaries for you, you might

be in for a long wait. For Perl to dispose of your temporaries, control must drop back to the enclosing scope

at some stage. In the event driven scenario that may never happen. This means that as time goes on, your

program will create more and more temporaries, none of which will ever be freed. As each of these

temporaries consumes some memory your program will eventually consume all the available memory in

your system − kapow!

So here is the bottom line − if you are sure that control will revert back to the enclosing Perl scope fairly

quickly after the end of your callback, then it isn't absolutely necessary to dispose explicitly of any

temporaries you may have created. Mind you, if you are at all uncertain about what to do, it doesn't do any

harm to tidy up anyway.
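
The build-up of temporaries in the event driven case can be pictured with a small counter model in plain C. This is a sketch under stated assumptions: tmps_live, tmps_floor, callback and run_events are hypothetical, and each callback is assumed to create exactly two temporaries.

```c
#include <assert.h>

#define TMPS_MAX 1024

static int tmps_live  = 0;   /* temporaries currently awaiting disposal  */
static int tmps_floor = 0;   /* boundary recorded by the last "SAVETMPS" */

static void make_temp(void)
{
    assert(tmps_live < TMPS_MAX);
    ++tmps_live;
}

/* One callback that creates two temporaries (its two arguments). */
static void callback(int tidy)
{
    int saved_floor = tmps_floor;

    if (tidy)
        tmps_floor = tmps_live;   /* ENTER; SAVETMPS                   */
    make_temp();
    make_temp();
    if (tidy) {
        tmps_live  = tmps_floor;  /* FREETMPS: drop those above floor  */
        tmps_floor = saved_floor; /* LEAVE: restore the outer boundary */
    }
}

/* Run an event loop of n callbacks; return temporaries still alive. */
static int run_events(int n, int tidy)
{
    int i;

    tmps_live  = 0;
    tmps_floor = 0;
    for (i = 0; i < n; ++i)
        callback(tidy);
    return tmps_live;
}
```

Without tidying, run_events(100, 0) finishes with 200 temporaries still alive and the figure grows with every event; with tidying, run_events(100, 1) finishes with none.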

Strategies for storing Callback Context Information

Potentially one of the trickiest problems to overcome when designing a callback interface can be figuring out

how to store the mapping between the C callback function and the Perl equivalent.

To help understand why this can be a real problem first consider how a callback is set up in an all C

environment. Typically a C API will provide a function to register a callback. This will expect a pointer to a

function as one of its parameters. Below is a call to a hypothetical function register_fatal which

registers the C function to get called when a fatal error occurs.

register_fatal(cb1) ;

The single parameter cb1 is a pointer to a function, so you must have defined cb1 in your code, say

something like this

static void

cb1()

{

printf ("Fatal Error\n") ;

exit(1) ;

}

Now change that to call a Perl subroutine instead

static SV * callback = (SV*)NULL;

static void

cb1()

{

dSP ;

PUSHMARK(SP) ;

/* Call the Perl sub to process the callback */

perl_call_sv(callback, G_DISCARD) ;

}

void

register_fatal(fn)

SV * fn

CODE:

/* Remember the Perl sub */

if (callback == (SV*)NULL)

callback = newSVsv(fn) ;

else

SvSetSV(callback, fn) ;

/* register the callback with the external library */

register_fatal(cb1) ;

where the Perl equivalent of register_fatal and the callback it registers, pcb1, might look like this

# Register the sub pcb1

register_fatal(\&pcb1) ;

sub pcb1

{

die "I’m dying...\n" ;

}

The mapping between the C callback and the Perl equivalent is stored in the global variable callback.

This will be adequate if you ever need to have only one callback registered at any time. An example could be

an error handler like the code sketched out above. Remember though, repeated calls to register_fatal

will replace the previously registered callback function with the new one.

Say for example you want to interface to a library which allows asynchronous file i/o. In this case you may

be able to register a callback whenever a read operation has completed. To be of any use we want to be able

to call separate Perl subroutines for each file that is opened. As it stands, the error handler example above

would not be adequate as it allows only a single callback to be defined at any time. What we require is a

means of storing the mapping between the opened file and the Perl subroutine we want to be called for that

file.

Say the i/o library has a function asynch_read which associates a C function ProcessRead with a file

handle fh − this assumes that it has also provided some routine to open the file and so obtain the file handle.

asynch_read(fh, ProcessRead)

This may expect the C ProcessRead function of this form

void

ProcessRead(fh, buffer)

int fh ;

char * buffer ;

{

...

}

To provide a Perl interface to this library we need to be able to map between the fh parameter and the Perl

subroutine we want called. A hash is a convenient mechanism for storing this mapping. The code below

shows a possible implementation

static HV * Mapping = (HV*)NULL ;

void

asynch_read(fh, callback)

int fh

SV * callback

CODE:

/* If the hash doesn’t already exist, create it */

if (Mapping == (HV*)NULL)

Mapping = newHV() ;

/* Save the fh −> callback mapping */

hv_store(Mapping, (char*)&fh, sizeof(fh), newSVsv(callback), 0) ;

/* Register with the C Library */

asynch_read(fh, asynch_read_if) ;

and asynch_read_if could look like this

static void

asynch_read_if(fh, buffer)

int fh ;

char * buffer ;

{

dSP ;

SV ** sv ;

/* Get the callback associated with fh */

sv = hv_fetch(Mapping, (char*)&fh , sizeof(fh), FALSE) ;

if (sv == (SV**)NULL)

croak("Internal error...\n") ;

PUSHMARK(SP) ;

XPUSHs(sv_2mortal(newSViv(fh))) ;

XPUSHs(sv_2mortal(newSVpv(buffer, 0))) ;

PUTBACK ;

/* Call the Perl sub */

perl_call_sv(*sv, G_DISCARD) ;

}

For completeness, here is asynch_close. This shows how to remove the entry from the hash Mapping.

void

asynch_close(fh)

int fh

CODE:

/* Remove the entry from the hash */

(void) hv_delete(Mapping, (char*)&fh, sizeof(fh), G_DISCARD) ;

/* Now call the real asynch_close */

asynch_close(fh) ;

So the Perl interface would look like this

sub callback1

{

my($handle, $buffer) = @_ ;

}

# Register the Perl callback

asynch_read($fh, \&callback1) ;

asynch_close($fh) ;

The mapping between the C callback and Perl is stored in the global hash Mapping this time. Using a hash

has the distinct advantage that it allows an unlimited number of callbacks to be registered.
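As a rough illustration of the idea, independent of Perl, the sketch below keeps the handle-to-callback mapping in a small chained table whose key is the raw bytes of the int handle, much as the XS code above passes (char*)&fh and sizeof(fh) to hv_store. The names Entry, registry_store, registry_fetch, registry_delete and record_handle are hypothetical, and a plain C function pointer stands in for the Perl SV.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*Callback)(int fh, const char *buffer);

/* One entry per registered handle. Entries are chained, so there is
   no fixed upper limit on the number of registrations. */
typedef struct Entry {
    char          key[sizeof(int)];   /* raw bytes of the handle */
    Callback      cb;
    struct Entry *next;
} Entry;

static Entry *registry = NULL;

/* Find the entry whose key bytes equal those of fh, or NULL. */
static Entry *registry_find(int fh)
{
    Entry *e;
    for (e = registry; e != NULL; e = e->next)
        if (memcmp(e->key, &fh, sizeof(fh)) == 0)
            return e;
    return NULL;
}

/* Store (or replace) the callback for fh. Returns 0 on success. */
static int registry_store(int fh, Callback cb)
{
    Entry *e = registry_find(fh);
    assert(cb != NULL);
    if (e == NULL) {
        e = malloc(sizeof(*e));
        if (e == NULL)
            return -1;
        memcpy(e->key, &fh, sizeof(fh));
        e->next = registry;
        registry = e;
    }
    e->cb = cb;
    return 0;
}

/* Return the callback registered for fh, or NULL if there is none. */
static Callback registry_fetch(int fh)
{
    Entry *e = registry_find(fh);
    return e ? e->cb : NULL;
}

/* Remove the entry for fh. Returns 1 if something was removed. */
static int registry_delete(int fh)
{
    Entry **link;
    for (link = &registry; *link != NULL; link = &(*link)->next) {
        if (memcmp((*link)->key, &fh, sizeof(fh)) == 0) {
            Entry *dead = *link;
            *link = dead->next;
            free(dead);
            return 1;
        }
    }
    return 0;
}

/* Example callback (hypothetical): remembers the last handle seen. */
static int last_fh = -1;
static void record_handle(int fh, const char *buffer)
{
    (void)buffer;
    last_fh = fh;
}
```

A caller would register with registry_store(fh, record_handle), look the callback up inside the C library's hook with registry_fetch(fh), and drop the entry with registry_delete(fh) when the handle is closed.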

What if the interface provided by the C callback doesn't contain a parameter which allows the file handle to

be mapped to the Perl subroutine? Say in the asynchronous i/o package, the callback function gets passed only

the buffer parameter, like this

void

ProcessRead(buffer)

char * buffer ;

{

...

}

Without the file handle there is no straightforward way to map from the C callback to the Perl subroutine.

In this case a possible way around this problem is to predefine a series of C functions to act as the interface

to Perl, thus

#define MAX_CB 3

#define NULL_HANDLE −1

typedef void (*FnMap)() ;

struct MapStruct {

FnMap Function ;

SV * PerlSub ;

int Handle ;

} ;

static void fn1() ;

static void fn2() ;

static void fn3() ;

static struct MapStruct Map [MAX_CB] =

{

{ fn1, NULL, NULL_HANDLE },

{ fn2, NULL, NULL_HANDLE },

{ fn3, NULL, NULL_HANDLE }

} ;


static void

Pcb(index, buffer)

int index ;

char * buffer ;

{

dSP ;

PUSHMARK(SP) ;

XPUSHs(sv_2mortal(newSVpv(buffer, 0))) ;

PUTBACK ;

/* Call the Perl sub */

perl_call_sv(Map[index].PerlSub, G_DISCARD) ;

}

static void

fn1(buffer)

char * buffer ;

{

Pcb(0, buffer) ;

}

static void

fn2(buffer)

char * buffer ;

{

Pcb(1, buffer) ;

}

static void

fn3(buffer)

char * buffer ;

{

Pcb(2, buffer) ;

}

void

array_asynch_read(fh, callback)

int fh

SV * callback

CODE:

int index ;

int null_index = MAX_CB ;

/* Find the same handle or an empty entry */

for (index = 0 ; index < MAX_CB ; ++index)

{

if (Map[index].Handle == fh)

break ;

if (Map[index].Handle == NULL_HANDLE)

null_index = index ;

}

if (index == MAX_CB && null_index == MAX_CB)

croak ("Too many callback functions registered\n") ;

if (index == MAX_CB)

index = null_index ;


/* Save the file handle */

Map[index].Handle = fh ;

/* Remember the Perl sub */

if (Map[index].PerlSub == (SV*)NULL)

Map[index].PerlSub = newSVsv(callback) ;

else

SvSetSV(Map[index].PerlSub, callback) ;

asynch_read(fh, Map[index].Function) ;

void

array_asynch_close(fh)

int fh

CODE:

int index ;

/* Find the file handle */

for (index = 0; index < MAX_CB ; ++ index)

if (Map[index].Handle == fh)

break ;

if (index == MAX_CB)

croak ("could not close fh %d\n", fh) ;

Map[index].Handle = NULL_HANDLE ;

SvREFCNT_dec(Map[index].PerlSub) ;

Map[index].PerlSub = (SV*)NULL ;

asynch_close(fh) ;

In this case the functions fn1, fn2, and fn3 are used to remember the Perl subroutine to be called. Each of

the functions holds a separate hard−wired index which is used in the function Pcb to access the Map array

and actually call the Perl subroutine.

There are some obvious disadvantages with this technique.

Firstly, the code is considerably more complex than with the previous example.

Secondly, there is a hard−wired limit (in this case 3) to the number of callbacks that can exist

simultaneously. The only way to increase the limit is by modifying the code to add more functions and then

recompiling. Nonetheless, as long as the number of functions is chosen with some care, it is still a

workable solution and in some cases is the only one available.
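Stripped of the Perl specifics, the trick can be sketched in plain C as below. This is only an illustration under some assumptions: a C function pointer (Target) stands in for the stored Perl sub, the names Slot, dispatch, slot_register, slot_release, count_a and count_b are hypothetical, and unlike the XS code above the sketch takes the first free slot rather than the last one.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CB      3
#define NULL_HANDLE (-1)

typedef void (*Trampoline)(const char *buffer); /* what the library calls    */
typedef void (*Target)(const char *buffer);     /* stand-in for the Perl sub */

struct Slot {
    Trampoline function;
    Target     target;
    int        handle;
};

static void fn1(const char *buffer);
static void fn2(const char *buffer);
static void fn3(const char *buffer);

static struct Slot slots[MAX_CB] = {
    { fn1, NULL, NULL_HANDLE },
    { fn2, NULL, NULL_HANDLE },
    { fn3, NULL, NULL_HANDLE }
};

/* Common dispatcher: the hard-wired index is the only context we have. */
static void dispatch(int index, const char *buffer)
{
    assert(index >= 0 && index < MAX_CB);
    if (slots[index].target != NULL)
        slots[index].target(buffer);
}

static void fn1(const char *buffer) { dispatch(0, buffer); }
static void fn2(const char *buffer) { dispatch(1, buffer); }
static void fn3(const char *buffer) { dispatch(2, buffer); }

/* Reuse the slot already holding handle, else take the first free one.
   Returns the trampoline to give to the library, or NULL if all are taken. */
static Trampoline slot_register(int handle, Target target)
{
    int i, free_slot = MAX_CB;
    for (i = 0; i < MAX_CB; ++i) {
        if (slots[i].handle == handle)
            break;
        if (slots[i].handle == NULL_HANDLE && free_slot == MAX_CB)
            free_slot = i;
    }
    if (i == MAX_CB)
        i = free_slot;
    if (i == MAX_CB)
        return NULL;
    slots[i].handle = handle;
    slots[i].target = target;
    return slots[i].function;
}

/* Free the slot held by handle. Returns 1 if one was found. */
static int slot_release(int handle)
{
    int i;
    for (i = 0; i < MAX_CB; ++i) {
        if (slots[i].handle == handle) {
            slots[i].handle = NULL_HANDLE;
            slots[i].target = NULL;
            return 1;
        }
    }
    return 0;
}

/* Example targets (hypothetical): count how often each one is called. */
static int hits_a = 0, hits_b = 0;
static void count_a(const char *buffer) { (void)buffer; ++hits_a; }
static void count_b(const char *buffer) { (void)buffer; ++hits_b; }
```

The pointer returned by slot_register is what would be handed to the library in place of a single shared callback; a fourth registration fails until some handle is released, which is the hard-wired limit described above.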

To summarize, here are a number of possible methods for you to consider for storing the mapping between C

and the Perl callback

1. Ignore the problem − Allow only 1 callback

For a lot of situations, like interfacing to an error handler, this may be a perfectly adequate solution.

2. Create a sequence of callbacks − hard wired limit

If it is impossible to tell from the parameters passed back from the C callback what the context is,

then you may need to create a sequence of C callback interface functions, and store pointers to each

in an array.

3. Use a parameter to map to the Perl callback

A hash is an ideal mechanism to store the mapping between C and Perl.

Alternate Stack Manipulation

Although I have made use of only the POP* macros to access values returned from Perl subroutines, it is

also possible to bypass these macros and read the stack using the ST macro (See perlxs for a full description

of the ST macro).


Most of the time the POP* macros should be adequate; the main problem with them is that they force you to

process the returned values in sequence. This may not be the most suitable way to process the values in some

cases. What we want is to be able to access the stack in a random order. The ST macro as used when coding

an XSUB is ideal for this purpose.

The code below is the example given in the section Returning a list of values recoded to use ST instead of

POP*.

static void

call_AddSubtract2(a, b)

int a ;

int b ;

{

dSP ;

I32 ax ;

int count ;

ENTER ;

SAVETMPS;

PUSHMARK(SP) ;

XPUSHs(sv_2mortal(newSViv(a)));

XPUSHs(sv_2mortal(newSViv(b)));

PUTBACK ;

count = perl_call_pv("AddSubtract", G_ARRAY);

SPAGAIN ;

SP −= count ;

ax = (SP − PL_stack_base) + 1 ;

if (count != 2)

croak("Big trouble\n") ;

printf ("%d + %d = %d\n", a, b, SvIV(ST(0))) ;

printf ("%d − %d = %d\n", a, b, SvIV(ST(1))) ;

PUTBACK ;

FREETMPS ;

LEAVE ;

}

Notes

1. Notice that it was necessary to define the variable ax. This is because the ST macro expects it to

exist. If we were in an XSUB it would not be necessary to define ax as it is already defined for you.

2. The code

SPAGAIN ;

SP −= count ;

ax = (SP − PL_stack_base) + 1 ;

sets the stack up so that we can use the ST macro.

3. Unlike the original coding of this example, the returned values are not accessed in reverse order. So

ST(0) refers to the first value returned by the Perl subroutine and ST(count−1) refers to the last.
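To see why the arithmetic in note 2 works, here is a toy model of the stack in plain C. It is a sketch under stated assumptions: ints stand in for SV pointers, stack_base and sp play the roles of PL_stack_base and SP, and the helper names push, pop, mark_returned, st and add_subtract are hypothetical.

```c
#include <assert.h>

/* Toy value stack. As with the Perl stack, sp points at the top item,
   so stack_base[0] is never used for data. */
static int  stack_base[16];
static int *sp = stack_base;

static void push(int v)
{
    assert(sp < stack_base + 15);   /* keep the toy stack in bounds */
    *++sp = v;
}

/* POP-style access: values come off in reverse order. */
static int pop(void)
{
    assert(sp > stack_base);
    return *sp--;
}

/* Step back over count returned items and compute the index of the
   first one, mirroring
       SP -= count ;
       ax = (SP - PL_stack_base) + 1 ;                               */
static int mark_returned(int count)
{
    sp -= count;
    return (int)(sp - stack_base) + 1;
}

/* ST-style access: the n-th returned value, counting from 0. */
static int st(int ax, int n)
{
    return stack_base[ax + n];
}

/* Stand-in for the Perl sub AddSubtract: leaves a+b and a-b on the
   stack and reports how many values it returned. */
static int add_subtract(int a, int b)
{
    push(a + b);
    push(a - b);
    return 2;
}
```

After count = add_subtract(7, 3) and ax = mark_returned(count), st(ax, 0) is the sum and st(ax, 1) is the difference, in the order they were returned. Calling pop() twice instead yields the difference first.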

Creating and calling an anonymous subroutine in C

As we‘ve already shown, perl_call_sv can be used to invoke an anonymous subroutine. However, our

example showed a Perl script invoking an XSUB to perform this operation. Let's see how it can be done

inside our C code:


...

SV *cvrv = perl_eval_pv("sub { print ’You will not find me cluttering any namespace!

...

perl_call_sv(cvrv, G_VOID|G_NOARGS);

perl_eval_pv is used to compile the anonymous subroutine, which will be the return value as well (read

more about perl_eval_pv in perlguts). Once this code reference is in hand, it can be mixed in with

all the previous examples we‘ve shown.

SEE ALSO

perlxs, perlguts, perlembed

AUTHOR

Paul Marquess <pmarquess@bfsec.bt.co.uk>

Special thanks to the following people who assisted in the creation of the document.

Jeff Okamoto, Tim Bunce, Nick Gianniotis, Steve Kelem, Gurusamy Sarathy and Larry Wall.

DATE

Version 1.3, 14th Apr 1997


perlembed Perl Programmers Reference Guide perlembed

NAME

perlembed − how to embed perl in your C program

DESCRIPTION

PREAMBLE

Do you want to:

Use C from Perl?

Read perlxstut, perlxs, h2xs, and perlguts.

Use a Unix program from Perl?

Read about back−quotes and about system and exec in perlfunc.

Use Perl from Perl?

Read about do and eval and require and use.

Use C from C?

Rethink your design.

Use Perl from C?

Read on...

ROADMAP

Compiling your C program

Adding a Perl interpreter to your C program

Calling a Perl subroutine from your C program

Evaluating a Perl statement from your C program

Performing Perl pattern matches and substitutions from your C program

Fiddling with the Perl stack from your C program

Maintaining a persistent interpreter

Maintaining multiple interpreter instances

Using Perl modules, which themselves use C libraries, from your C program

Embedding Perl under Win32

Compiling your C program

If you have trouble compiling the scripts in this documentation, you‘re not alone. The cardinal rule:

COMPILE THE PROGRAMS IN EXACTLY THE SAME WAY THAT YOUR PERL WAS COMPILED.

(Sorry for yelling.)

Also, every C program that uses Perl must link in the perl library. What‘s that, you ask? Perl is itself written

in C; the perl library is the collection of compiled C programs that were used to create your perl executable

(/usr/bin/perl or equivalent). (Corollary: you can‘t use Perl from your C program unless Perl has been

compiled on your machine, or installed properly—that‘s why you shouldn‘t blithely copy Perl executables

from machine to machine without also copying the lib directory.)

When you use Perl from C, your C program will—usually—allocate, "run", and deallocate a PerlInterpreter

object, which is defined by the perl library.

If your copy of Perl is recent enough to contain this documentation (version 5.002 or later), then the perl

library (and EXTERN.h and perl.h, which you‘ll also need) will reside in a directory that looks like this:

/usr/local/lib/perl5/your_architecture_here/CORE


or perhaps just

/usr/local/lib/perl5/CORE

or maybe something like

/usr/opt/perl5/CORE

Execute this statement for a hint about where to find CORE:

perl −MConfig −e ’print $Config{archlib}’

Here‘s how you‘d compile the example in the next section, Adding a Perl interpreter to your C program, on

my Linux box:

% gcc −O2 −Dbool=char −DHAS_BOOL −I/usr/local/include

−I/usr/local/lib/perl5/i586−linux/5.003/CORE

−L/usr/local/lib/perl5/i586−linux/5.003/CORE

−o interp interp.c −lperl −lm

(That‘s all one line.) On my DEC Alpha running old 5.003_05, the incantation is a bit different:

% cc −O2 −Olimit 2900 −DSTANDARD_C −I/usr/local/include

−I/usr/local/lib/perl5/alpha−dec_osf/5.00305/CORE

−L/usr/local/lib/perl5/alpha−dec_osf/5.00305/CORE −L/usr/local/lib

−D__LANGUAGE_C__ −D_NO_PROTO −o interp interp.c −lperl −lm

How can you figure out what to add? Assuming your Perl is post−5.001, execute a perl −V command and

pay special attention to the "cc" and "ccflags" information.

You'll have to choose the appropriate compiler (cc, gcc, et al.) for your machine:

perl -MConfig -e 'print $Config{cc}' will tell you what to use.

You‘ll also have to choose the appropriate library directory (/usr/local/lib/...) for your machine. If your

compiler complains that certain functions are undefined, or that it can‘t locate −lperl, then you need to

change the path following the −L. If it complains that it can‘t find EXTERN.h and perl.h, you need to

change the path following the −I.

You may have to add extra libraries as well. Which ones? Perhaps those printed by

perl −MConfig −e ’print $Config{libs}’

Provided your perl binary was properly configured and installed, the ExtUtils::Embed module will

determine all of this information for you:

% cc −o interp interp.c ‘perl −MExtUtils::Embed −e ccopts −e ldopts‘

If the ExtUtils::Embed module isn‘t part of your Perl distribution, you can retrieve it from

http://www.perl.com/perl/CPAN/modules/by−module/ExtUtils::Embed. (If this documentation came from

your Perl distribution, then you‘re running 5.004 or better and you already have it.)

The ExtUtils::Embed kit on CPAN also contains all source code for the examples in this document, tests,

additional examples and other information you may find useful.

Adding a Perl interpreter to your C program

In a sense, perl (the C program) is a good example of embedding Perl (the language), so I‘ll demonstrate

embedding with miniperlmain.c, included in the source distribution. Here‘s a bastardized, nonportable

version of miniperlmain.c containing the essentials of embedding:

#include <EXTERN.h> /* from the Perl distribution */

#include <perl.h> /* from the Perl distribution */

static PerlInterpreter *my_perl; /*** The Perl interpreter ***/

int main(int argc, char **argv, char **env)


{

my_perl = perl_alloc();

perl_construct(my_perl);

perl_parse(my_perl, NULL, argc, argv, (char **)NULL);

perl_run(my_perl);

perl_destruct(my_perl);

perl_free(my_perl);

}

Notice that we don‘t use the env pointer. Normally handed to perl_parse as its final argument, env

here is replaced by NULL, which means that the current environment will be used.

Now compile this program (I‘ll call it interp.c) into an executable:

% cc −o interp interp.c ‘perl −MExtUtils::Embed −e ccopts −e ldopts‘

After a successful compilation, you‘ll be able to use interp just like perl itself:

% interp

print "Pretty Good Perl \n";

print "10890 − 9801 is ", 10890 − 9801;

<CTRL−D>

Pretty Good Perl

10890 − 9801 is 1089

% interp −e ’printf("%x", 3735928559)’

deadbeef

You can also read and execute Perl statements from a file while in the midst of your C program, by placing

the filename in argv[1] before calling perl_run.

Calling a Perl subroutine from your C program

To call individual Perl subroutines, you can use any of the perl_call_* functions documented in perlcall. In

this example we‘ll use perl_call_argv.

That‘s shown below, in a program I‘ll call showtime.c.

#include <EXTERN.h>

#include <perl.h>

static PerlInterpreter *my_perl;

int main(int argc, char **argv, char **env)

{

char *args[] = { NULL };

my_perl = perl_alloc();

perl_construct(my_perl);

perl_parse(my_perl, NULL, argc, argv, NULL);

/*** skipping perl_run() ***/

perl_call_argv("showtime", G_DISCARD | G_NOARGS, args);

perl_destruct(my_perl);

perl_free(my_perl);

}

where showtime is a Perl subroutine that takes no arguments (that‘s the G_NOARGS) and for which I‘ll

ignore the return value (that‘s the G_DISCARD). Those flags, and others, are discussed in perlcall.


I‘ll define the showtime subroutine in a file called showtime.pl:

print "I shan’t be printed.";

sub showtime {

print time;

}

Simple enough. Now compile and run:

% cc −o showtime showtime.c ‘perl −MExtUtils::Embed −e ccopts −e ldopts‘

% showtime showtime.pl

818284590

yielding the number of seconds that elapsed between January 1, 1970 (the beginning of the Unix epoch), and

the moment I began writing this sentence.

In this particular case we don‘t have to call perl_run, but in general it‘s considered good practice to ensure

proper initialization of library code, including execution of all object DESTROY methods and package END

{} blocks.

If you want to pass arguments to the Perl subroutine, you can add strings to the NULL−terminated args list

passed to perl_call_argv. For other data types, or to examine return values, you‘ll need to manipulate the

Perl stack. That‘s demonstrated in the last section of this document:

Fiddling with the Perl stack from your C program.

Evaluating a Perl statement from your C program

Perl provides two API functions to evaluate pieces of Perl code. These are perl_eval_sv and perl_eval_pv.

Arguably, these are the only routines you‘ll ever need to execute snippets of Perl code from within your C

program. Your code can be as long as you wish; it can contain multiple statements; it can employ use,

require, and do to include external Perl files.

perl_eval_pv lets us evaluate individual Perl strings, and then extract variables for coercion into C types.

The following program, string.c, executes three Perl strings, extracting an int from the first, a float from

the second, and a char * from the third.

#include <EXTERN.h>

#include <perl.h>

static PerlInterpreter *my_perl;

main (int argc, char **argv, char **env)

{

char *embedding[] = { "", "−e", "0" };

my_perl = perl_alloc();

perl_construct( my_perl );

perl_parse(my_perl, NULL, 3, embedding, NULL);

perl_run(my_perl);

/** Treat $a as an integer **/

perl_eval_pv("$a = 3; $a **= 2", TRUE);

printf("a = %d\n", SvIV(perl_get_sv("a", FALSE)));

/** Treat $a as a float **/

perl_eval_pv("$a = 3.14; $a **= 2", TRUE);

printf("a = %f\n", SvNV(perl_get_sv("a", FALSE)));

/** Treat $a as a string **/

perl_eval_pv("$a = ’rekcaH lreP rehtonA tsuJ’; $a = reverse($a);", TRUE);

printf("a = %s\n", SvPV(perl_get_sv("a", FALSE), PL_na));


perl_destruct(my_perl);

perl_free(my_perl);

}

All of those strange functions with sv in their names help convert Perl scalars to C types. They‘re described

in perlguts.

If you compile and run string.c, you'll see the results of using SvIV() to create an int, SvNV() to create a

float, and SvPV() to create a string:

a = 9

a = 9.859600

a = Just Another Perl Hacker

In the example above, we've created a global variable to temporarily store the computed value of our eval'd

expression. It is also possible and in most cases a better strategy to fetch the return value from

perl_eval_pv() instead. Example:

...

SV *val = perl_eval_pv("reverse ’rekcaH lreP rehtonA tsuJ’", TRUE);

printf("%s\n", SvPV(val,PL_na));

...

This way, we avoid namespace pollution by not creating global variables and we‘ve simplified our code as

well.

Performing Perl pattern matches and substitutions from your C program

The perl_eval_sv() function lets us evaluate strings of Perl code, so we can define some functions that

use it to "specialize" in matches and substitutions: match(), substitute(), and matches().

I32 match(SV *string, char *pattern);

Given a string and a pattern (e.g., m/clasp/ or /\b\w*\b/, which in your C program might appear as

"/\\b\\w*\\b/"), match() returns 1 if the string matches the pattern and 0 otherwise.

int substitute(SV **string, char *pattern);

Given a pointer to an SV and an =~ operation (e.g., s/bob/robert/g or tr[A−Z][a−z]),

substitute() modifies the string within the SV according to the operation, returning the number of

substitutions made.

int matches(SV *string, char *pattern, AV **matches);

Given an SV, a pattern, and a pointer to an empty AV, matches() evaluates $string =~ $pattern in

an array context, and fills in matches with the array elements, returning the number of matches found.

Here‘s a sample program, match.c, that uses all three (long lines have been wrapped here):

#include <EXTERN.h>

#include <perl.h>

/** my_perl_eval_sv(code, error_check)

** kinda like perl_eval_sv(),

** but we pop the return value off the stack

**/

SV* my_perl_eval_sv(SV *sv, I32 croak_on_error)

{

dSP;

SV* retval;

PUSHMARK(SP);

perl_eval_sv(sv, G_SCALAR);


SPAGAIN;

retval = POPs;

PUTBACK;

if (croak_on_error && SvTRUE(ERRSV))

croak(SvPVx(ERRSV, PL_na));

return retval;

}

/** match(string, pattern)

** Used for matches in a scalar context.

** Returns 1 if the match was successful; 0 otherwise.

**/

I32 match(SV *string, char *pattern)

{

SV *command = NEWSV(1099, 0), *retval;

sv_setpvf(command, "my $string = ’%s’; $string =~ %s",

SvPV(string,PL_na), pattern);

retval = my_perl_eval_sv(command, TRUE);

SvREFCNT_dec(command);

return SvIV(retval);

}

/** substitute(string, pattern)

** Used for =~ operations that modify their left−hand side (s/// and tr///)

** Returns the number of successful matches, and

** modifies the input string if there were any.

**/

I32 substitute(SV **string, char *pattern)

{

SV *command = NEWSV(1099, 0), *retval;

sv_setpvf(command, "$string = ’%s’; ($string =~ %s)",

SvPV(*string,PL_na), pattern);

retval = my_perl_eval_sv(command, TRUE);

SvREFCNT_dec(command);

*string = perl_get_sv("string", FALSE);

return SvIV(retval);

}

/** matches(string, pattern, matches)

** Used for matches in an array context.

** Returns the number of matches,

** and fills in **matches with the matching substrings

**/

I32 matches(SV *string, char *pattern, AV **match_list)


{

SV *command = NEWSV(1099, 0);

I32 num_matches;

sv_setpvf(command, "my $string = ’%s’; @array = ($string =~ %s)",

SvPV(string,PL_na), pattern);

my_perl_eval_sv(command, TRUE);

SvREFCNT_dec(command);

*match_list = perl_get_av("array", FALSE);

num_matches = av_len(*match_list) + 1; /** assume $[ is 0 **/

return num_matches;

}

main (int argc, char **argv, char **env)

{

PerlInterpreter *my_perl = perl_alloc();

char *embedding[] = { "", "−e", "0" };

AV *match_list;

I32 num_matches, i;

SV *text = NEWSV(1099,0);

perl_construct(my_perl);

perl_parse(my_perl, NULL, 3, embedding, NULL);

sv_setpv(text, "When he is at a convenience store and the bill comes to some amo

if (match(text, "m/quarter/")) /** Does text contain ’quarter’? **/

printf("match: Text contains the word ’quarter’.\n\n");

else

printf("match: Text doesn’t contain the word ’quarter’.\n\n");

if (match(text, "m/eighth/")) /** Does text contain ’eighth’? **/

printf("match: Text contains the word ’eighth’.\n\n");

else

printf("match: Text doesn’t contain the word ’eighth’.\n\n");

/** Match all occurrences of /wi../ **/

num_matches = matches(text, "m/(wi..)/g", &match_list);

printf("matches: m/(wi..)/g found %d matches...\n", num_matches);

for (i = 0; i < num_matches; i++)

printf("match: %s\n", SvPV(*av_fetch(match_list, i, FALSE),PL_na));

printf("\n");

/** Remove all vowels from text **/

num_matches = substitute(&text, "s/[aeiou]//gi");

if (num_matches) {

printf("substitute: s/[aeiou]//gi...%d substitutions made.\n",

num_matches);

printf("Now text is: %s\n\n", SvPV(text,PL_na));

}

/** Attempt a substitution **/

if (!substitute(&text, "s/Perl/C/")) {

printf("substitute: s/Perl/C...No substitution made.\n\n");

}

SvREFCNT_dec(text);


PL_perl_destruct_level = 1;

perl_destruct(my_perl);

perl_free(my_perl);

}

which produces the output (again, long lines have been wrapped here)

match: Text contains the word ’quarter’.

match: Text doesn’t contain the word ’eighth’.

matches: m/(wi..)/g found 2 matches...

match: will

match: with

substitute: s/[aeiou]//gi...139 substitutions made.

Now text is: Whn h s t cnvnnc str nd th bll cms t sm mnt lk 76 cnts,

Mynrd s wr tht thr s smthng h *shld* d, smthng tht wll nbl hm t gt bck

qrtr, bt h hs n d *wht*. H fmbls thrgh hs rd sqzy chngprs nd gvs th by

thr xtr pnns wth hs dllr, hpng tht h mght lck nt th crrct mnt. Th by gvs

hm bck tw f hs wn pnns nd thn th bg shny qrtr tht s hs prz. −RCHH

substitute: s/Perl/C...No substitution made.
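It can help to look at the Perl statement that match() actually hands to the evaluator. The plain C sketch below formats the same string with snprintf, using ASCII quotes, so it can be inspected without an interpreter. The name build_match_command is hypothetical. The check that rejects quotes and backslashes is my own addition: because the text is pasted between single quotes, a quote or a trailing backslash in it would change the Perl literal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the statement that match() builds with sv_setpvf:
       my $string = '<text>'; $string =~ <pattern>
   Returns the number of characters written, or -1 if out is too small
   or text contains a character that is unsafe inside '...'.          */
static int build_match_command(char *out, size_t outlen,
                               const char *text, const char *pattern)
{
    int n;

    assert(out != NULL && text != NULL && pattern != NULL);

    /* Conservative: refuse anything Perl treats specially in '...'. */
    if (strchr(text, '\'') != NULL || strchr(text, '\\') != NULL)
        return -1;

    n = snprintf(out, outlen, "my $string = '%s'; $string =~ %s",
                 text, pattern);
    if (n < 0 || (size_t)n >= outlen)
        return -1;                      /* truncated or failed */
    return n;
}
```

With text "hello" and pattern "m/ell/" the buffer ends up holding my $string = 'hello'; $string =~ m/ell/, which is the code Perl would then compile and run.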

Fiddling with the Perl stack from your C program

When trying to explain stacks, most computer science textbooks mumble something about spring−loaded

columns of cafeteria plates: the last thing you pushed on the stack is the first thing you pop off. That‘ll do

for our purposes: your C program will push some arguments onto "the Perl stack", shut its eyes while some

magic happens, and then pop the results—the return value of your Perl subroutine—off the stack.

First you‘ll need to know how to convert between C types and Perl types, with newSViv() and

sv_setnv() and newAV() and all their friends. They‘re described in perlguts.

Then you‘ll need to know how to manipulate the Perl stack. That‘s described in perlcall.

Once you‘ve understood those, embedding Perl in C is easy.

Because C has no builtin function for integer exponentiation, let's make Perl's ** operator available to it

(this is less useful than it sounds, because Perl implements ** with C's pow() function). First I'll create a

stub exponentiation function in power.pl:

sub expo {

my ($a, $b) = @_;

return $a ** $b;

}

Now I'll create a C program, power.c, with a function PerlPower() that contains all the perlguts

necessary to push the two arguments into expo() and to pop the return value out. Take a deep breath...

#include <EXTERN.h>

#include <perl.h>

static PerlInterpreter *my_perl;

static void

PerlPower(int a, int b)

{

dSP; /* initialize stack pointer */

ENTER; /* everything created after here */

SAVETMPS; /* ...is a temporary variable. */

PUSHMARK(SP); /* remember the stack pointer */

XPUSHs(sv_2mortal(newSViv(a))); /* push the base onto the stack */


XPUSHs(sv_2mortal(newSViv(b))); /* push the exponent onto stack */

PUTBACK; /* make local stack pointer global */

perl_call_pv("expo", G_SCALAR); /* call the function */

SPAGAIN; /* refresh stack pointer */

/* pop the return value from stack */

printf ("%d to the %dth power is %d.\n", a, b, POPi);

PUTBACK;

FREETMPS; /* free that return value */

LEAVE; /* ...and the XPUSHed "mortal" args.*/

}

int main (int argc, char **argv, char **env)

{

char *my_argv[] = { "", "power.pl" };

my_perl = perl_alloc();

perl_construct( my_perl );

perl_parse(my_perl, NULL, 2, my_argv, (char **)NULL);

perl_run(my_perl);

PerlPower(3, 4); /*** Compute 3 ** 4 ***/

perl_destruct(my_perl);

perl_free(my_perl);

}

Compile and run:

% cc −o power power.c ‘perl −MExtUtils::Embed −e ccopts −e ldopts‘

% power

3 to the 4th power is 81.

Maintaining a persistent interpreter

When developing interactive and/or potentially long−running applications, it‘s a good idea to maintain a

persistent interpreter rather than allocating and constructing a new interpreter multiple times. The major

reason is speed, since Perl will only be loaded into memory once.

However, you have to be more cautious with namespace and variable scoping when using a persistent

interpreter. In previous examples we‘ve been using global variables in the default package main. We knew

exactly what code would be run, and assumed we could avoid variable collisions and outrageous symbol

table growth.

Let‘s say your application is a server that will occasionally run Perl code from some arbitrary file. Your

server has no way of knowing what code it‘s going to run. Very dangerous.

If the file is pulled in by perl_parse(), compiled into a newly constructed interpreter, and subsequently

cleaned out with perl_destruct() afterwards, you‘re shielded from most namespace troubles.

One way to avoid namespace collisions in this scenario is to translate the filename into a guaranteed−unique

package name, and then compile the code into that package using eval. In the example below, each file will

only be compiled once. Or, the application might choose to clean out the symbol table associated with the

file after it's no longer needed. Using perl_call_argv, we'll call the subroutine

Embed::Persistent::eval_file which lives in the file persistent.pl and pass the filename

and boolean cleanup/cache flag as arguments.

Note that the process will continue to grow for each file that it uses. In addition, there might be

AUTOLOADed subroutines and other conditions that cause Perl‘s symbol table to grow. You might want to

add some logic that keeps track of the process size, or restarts itself after a certain number of requests, to

ensure that memory consumption is minimized. You‘ll also want to scope your variables with my whenever


possible.

package Embed::Persistent;

#persistent.pl

use strict;

use vars ’%Cache’;

use Symbol qw(delete_package);

sub valid_package_name {

my($string) = @_;

$string =~ s/([^A−Za−z0−9\/])/sprintf("_%2x",unpack("C",$1))/eg;

# second pass only for words starting with a digit

$string =~ s|/(\d)|sprintf("/_%2x",unpack("C",$1))|eg;

# Dress it up as a real package name

$string =~ s|/|::|g;

return "Embed" . $string;

}

sub eval_file {

my($filename, $delete) = @_;

my $package = valid_package_name($filename);

my $mtime = −M $filename;

if(defined $Cache{$package}{mtime}

&&

$Cache{$package}{mtime} <= $mtime)

{

# we have compiled this subroutine already,

# it has not been updated on disk, nothing left to do

print STDERR "already compiled $package−>handler\n";

}

else {

local *FH;

open FH, $filename or die "open ’$filename’ $!";

local($/) = undef;

my $sub = <FH>;

close FH;

#wrap the code into a subroutine inside our unique package

my $eval = qq{package $package; sub handler { $sub; }};

{

# hide our variables within this block

my($filename,$mtime,$package,$sub);

eval $eval;

}

die $@ if $@;

#cache it unless we’re cleaning out each time

$Cache{$package}{mtime} = $mtime unless $delete;

}

eval {$package−>handler;};

die $@ if $@;

delete_package($package) if $delete;

#take a look if you want

#print Devel::Symdump−>rnew($package)−>as_string, $/;


}

__END__

/* persistent.c */

#include <EXTERN.h>

#include <perl.h>

/* 1 = clean out filename’s symbol table after each request, 0 = don’t */

#ifndef DO_CLEAN

#define DO_CLEAN 0

#endif

static PerlInterpreter *perl = NULL;

int

main(int argc, char **argv, char **env)

{

char *embedding[] = { "", "persistent.pl" };

char *args[] = { "", DO_CLEAN, NULL };

char filename [1024];

int exitstatus = 0;

if((perl = perl_alloc()) == NULL) {

fprintf(stderr, "no memory!");

exit(1);

}

perl_construct(perl);

exitstatus = perl_parse(perl, NULL, 2, embedding, NULL);

if(!exitstatus) {

exitstatus = perl_run(perl);

while(printf("Enter file name: ") && gets(filename)) {

/* call the subroutine, passing it the filename as an argument */

args[0] = filename;

perl_call_argv("Embed::Persistent::eval_file",

G_DISCARD | G_EVAL, args);

/* check $@ */

if(SvTRUE(ERRSV))

fprintf(stderr, "eval error: %s\n", SvPV(ERRSV,PL_na));

}

}

PL_perl_destruct_level = 0;

perl_destruct(perl);

perl_free(perl);

exit(exitstatus);

}

Now compile:

% cc −o persistent persistent.c ‘perl −MExtUtils::Embed −e ccopts −e ldopts‘

Here's an example script file:

#test.pl

my $string = "hello";


foo($string);

sub foo {

print "foo says: @_\n";

}

Now run:

% persistent

Enter file name: test.pl

foo says: hello

Enter file name: test.pl

already compiled Embed::test_2epl−>handler

foo says: hello

Enter file name: ^C
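If the embedding program ever needs to predict, on the C side, which package a given file will be compiled into, the three substitutions in valid_package_name can be mirrored in a single pass. The sketch below is a hypothetical port, not part of the ExtUtils::Embed kit. It assumes the default C locale and byte-oriented file names, and the name package_name_for and the sample path /tmp/test.pl are mine.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Translate a file name into a package-like name, following the three
   substitutions of valid_package_name in persistent.pl.
   Returns 0 on success, -1 if out is too small.                       */
static int package_name_for(const char *filename, char *out, size_t outlen)
{
    const unsigned char *start = (const unsigned char *)filename;
    const unsigned char *p;
    size_t used;

    assert(filename != NULL && out != NULL);

    if (outlen < sizeof("Embed"))
        return -1;
    strcpy(out, "Embed");
    used = strlen(out);

    for (p = start; *p != '\0'; ++p) {
        char   piece[8];
        size_t len;

        if (*p == '/') {
            /* third pass: '/' becomes the package separator */
            strcpy(piece, "::");
        } else if (!isalnum(*p) ||
                   (isdigit(*p) && p > start && p[-1] == '/')) {
            /* first pass: anything outside [A-Za-z0-9/];
               second pass: a digit directly after '/'.
               "%2x" pads with a space below 0x10, as the Perl does. */
            sprintf(piece, "_%2x", (unsigned)*p);
        } else {
            piece[0] = (char)*p;
            piece[1] = '\0';
        }

        len = strlen(piece);
        if (used + len + 1 > outlen)
            return -1;
        memcpy(out + used, piece, len + 1);
        used += len;
    }
    return 0;
}
```

Under these assumptions the path /tmp/test.pl maps to Embed::tmp::test_2epl: each slash turns into ::, and the dot is replaced by an underscore followed by its hex code 2e.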

Maintaining multiple interpreter instances

Some rare applications will need to create more than one interpreter during a session. Such an application

might sporadically decide to release any resources associated with the interpreter.

The program must take care to ensure that this takes place before the next interpreter is constructed. By

default, the global variable PL_perl_destruct_level is set to 0, since extra cleaning isn't needed

when a program has only one interpreter.

Setting PL_perl_destruct_level to 1 makes everything squeaky clean:

PL_perl_destruct_level = 1;

while(1) {

...

/* reset global variables here with PL_perl_destruct_level = 1 */

perl_construct(my_perl);

...

/* clean and reset _everything_ during perl_destruct */

perl_destruct(my_perl);

perl_free(my_perl);

...

/* let’s go do it again! */

}

When perl_destruct() is called, the interpreter's syntax parse tree and symbol tables are cleaned up,

and global variables are reset.

Now suppose we have more than one interpreter instance running at the same time. This is feasible, but only

if you used the −DMULTIPLICITY flag when building Perl. By default, that sets

PL_perl_destruct_level to 1.

Let‘s give it a try:

#include <EXTERN.h>

#include <perl.h>

/* we’re going to embed two interpreters */

#define SAY_HELLO "−e", "print qq(Hi, I’m $^X\n)"

int main(int argc, char **argv, char **env)

{

PerlInterpreter

*one_perl = perl_alloc(),

*two_perl = perl_alloc();


char *one_args[] = { "one_perl", SAY_HELLO };

char *two_args[] = { "two_perl", SAY_HELLO };

perl_construct(one_perl);

perl_construct(two_perl);

perl_parse(one_perl, NULL, 3, one_args, (char **)NULL);

perl_parse(two_perl, NULL, 3, two_args, (char **)NULL);

perl_run(one_perl);

perl_run(two_perl);

perl_destruct(one_perl);

perl_destruct(two_perl);

perl_free(one_perl);

perl_free(two_perl);

}

Compile as usual:

% cc -o multiplicity multiplicity.c `perl -MExtUtils::Embed -e ccopts -e ldopts`

Run it, Run it:

% multiplicity

Hi, I’m one_perl

Hi, I’m two_perl

Using Perl modules, which themselves use C libraries, from your C program

If you've played with the examples above and tried to embed a script that use()s a Perl module (such as

Socket) which itself uses a C or C++ library, this probably happened:

Can’t load module Socket, dynamic loading not available in this perl.

(You may need to build a new perl executable which either supports

dynamic loading or has the Socket module statically linked into it.)

What‘s wrong?

Your interpreter doesn‘t know how to communicate with these extensions on its own. A little glue will help.

Up until now you've been calling perl_parse(), handing it NULL for the second argument:

perl_parse(my_perl, NULL, argc, my_argv, NULL);

That‘s where the glue code can be inserted to create the initial contact between Perl and linked C/C++

routines. Let's take a look at some pieces of perlmain.c to see how Perl does this:

#ifdef __cplusplus

# define EXTERN_C extern "C"

#else

# define EXTERN_C extern

#endif

static void xs_init _((void));

EXTERN_C void boot_DynaLoader _((CV* cv));

EXTERN_C void boot_Socket _((CV* cv));

EXTERN_C void

xs_init()

{

char *file = __FILE__;

/* DynaLoader is a special case */

newXS("DynaLoader::boot_DynaLoader", boot_DynaLoader, file);

newXS("Socket::bootstrap", boot_Socket, file);

}

Simply put: for each extension linked with your Perl executable (determined during its initial configuration

on your computer or when adding a new extension), a Perl subroutine is created to incorporate the

extension's routines. Normally, that subroutine is named Module::bootstrap() and is invoked when

you say use Module. In turn, this hooks into an XSUB, boot_Module, which creates a Perl counterpart for

each of the extension's XSUBs. Don't worry about this part; leave that to the xsubpp and extension authors.

If your extension is dynamically loaded, DynaLoader creates Module::bootstrap() for you on the fly.

In fact, if you have a working DynaLoader then there is rarely any need to link in any other extensions

statically.

Once you have this code, slap it into the second argument of perl_parse():

perl_parse(my_perl, xs_init, argc, my_argv, NULL);

Then compile:

% cc -o interp interp.c `perl -MExtUtils::Embed -e ccopts -e ldopts`

% interp

use Socket;

use SomeDynamicallyLoadedModule;

print "Now I can use extensions!\n"

ExtUtils::Embed can also automate writing the xs_init glue code.

% perl -MExtUtils::Embed -e xsinit -- -o perlxsi.c

% cc -c perlxsi.c `perl -MExtUtils::Embed -e ccopts`

% cc -c interp.c `perl -MExtUtils::Embed -e ccopts`

% cc -o interp perlxsi.o interp.o `perl -MExtUtils::Embed -e ldopts`

Consult perlxs and perlguts for more details.

Embedding Perl under Win32

At the time of this writing (5.004), there are two versions of Perl which run under Win32. (The two versions

are merging in 5.005.) Interfacing to ActiveState‘s Perl library is quite different from the examples in this

documentation, as significant changes were made to the internal Perl API. However, it is possible to embed

ActiveState‘s Perl runtime. For details, see the Perl for Win32 FAQ at

http://www.perl.com/perl/faq/win32/Perl_for_Win32_FAQ.html.

With the "official" Perl version 5.004 or higher, all the examples within this documentation will compile and

run untouched, although the build process is slightly different between Unix and Win32.

For starters, backticks don‘t work under the Win32 native command shell. The ExtUtils::Embed kit on

CPAN ships with a script called genmake, which generates a simple makefile to build a program from a

single C source file. It can be used like this:

C:\ExtUtils−Embed\eg> perl genmake interp.c

C:\ExtUtils−Embed\eg> nmake

C:\ExtUtils−Embed\eg> interp −e "print qq{I’m embedded in Win32!\n}"

You may wish to use a more robust environment such as the Microsoft Developer Studio. In this case, run

this to generate perlxsi.c:

perl -MExtUtils::Embed -e xsinit

Create a new project and Insert − Files into Project: perlxsi.c, perl.lib, and your own source files, e.g.

interp.c. Typically you'll find perl.lib in C:\perl\lib\CORE; if not, you should see the CORE directory

relative to perl −V:archlib. The studio will also need this path so it knows where to find Perl include

files. This path can be added via the Tools − Options − Directories menu. Finally, select Build − Build

interp.exe and you‘re ready to go.

MORAL

You can sometimes write faster code in C, but you can always write code faster in Perl. Because you can

use each from the other, combine them as you wish.

AUTHOR

Jon Orwant <orwant@tpj.com> and Doug MacEachern <dougm@osf.org>, with small contributions from Tim

Bunce, Tom Christiansen, Guy Decoux, Hallvard Furuseth, Dov Grobgeld, and Ilya Zakharevich.

Doug MacEachern has an article on embedding in Volume 1, Issue 4 of The Perl Journal (http://tpj.com).

Doug is also the developer of the most widely−used Perl embedding: the mod_perl system (perl.apache.org),

which embeds Perl in the Apache web server. Oracle, Binary Evolution, ActiveState, and Ben Sugars‘s

nsapi_perl have used this model for Oracle, Netscape and Internet Information Server Perl plugins.

July 22, 1998

Permission is granted to make and distribute verbatim copies of this documentation provided the copyright

notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this documentation under the conditions

for verbatim copying, provided also that they are marked clearly as modified versions, that the authors’

names and title are unchanged (though subtitles and additional authors’ names may be added), and that the

entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this documentation into another language, under

the above conditions for modified versions.

perlpod Perl Programmers Reference Guide perlpod

NAME

perlpod − plain old documentation

DESCRIPTION

A pod−to−whatever translator reads a pod file paragraph by paragraph, and translates it to the appropriate

output format. There are three kinds of paragraphs: verbatim, command, and ordinary text.
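The three-way split can be sketched in a few lines of C. This is only an illustration of the rule as this page states it (indented means verbatim, a leading "=" plus an identifier means command, anything else is ordinary text); the names pod_kind and pod_classify are made up for the example and do not come from any real translator.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names: not part of any pod translator. */
enum pod_kind { POD_VERBATIM, POD_COMMAND, POD_ORDINARY };

/* Classify one paragraph by looking only at its first characters. */
enum pod_kind pod_classify(const char *para)
{
    assert(para != NULL);
    if (para[0] == ' ' || para[0] == '\t')
        return POD_VERBATIM;    /* indented: reproduce exactly */
    if (para[0] == '=' && isalpha((unsigned char)para[1]))
        return POD_COMMAND;     /* "=" followed by an identifier */
    return POD_ORDINARY;        /* everything else is filled text */
}
```

A translator built this way would call pod_classify() once per paragraph, after splitting the input on empty lines.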

Verbatim Paragraph

A verbatim paragraph is distinguished by being indented (that is, it starts with space or tab). It should be

reproduced exactly, with tabs assumed to be on 8−column boundaries. There are no special formatting

escapes, so you can‘t italicize or anything like that. A \ means \, and nothing else.

Command Paragraph

All command paragraphs start with "=", followed by an identifier, followed by arbitrary text that the

command can use however it pleases. Currently recognized commands are

=head1 heading

=head2 heading

=item text

=over N

=back

=cut

=pod

=for X

=begin X

=end X

=pod

=cut

The "=pod" directive does nothing beyond telling the compiler to lay off parsing code through the next

"=cut". It‘s useful for adding another paragraph to the doc if you‘re mixing up code and pod a lot.

=head1

=head2

Head1 and head2 produce first and second level headings, with the text in the same paragraph as the

"=headn" directive forming the heading description.

=over

=back

=item

Item, over, and back require a little more explanation: "=over" starts a section specifically for the

generation of a list using "=item" commands. At the end of your list, use "=back" to end it. You will

probably want to give "4" as the number to "=over", as some formatters will use this for indentation.

This should probably be a default. Note also that there are some basic rules to using =item: don‘t use

them outside of an =over/=back block, use at least one inside an =over/=back block, you don‘t _have_

to include the =back if the list just runs off the document, and perhaps most importantly, keep the items

consistent: either use "=item *" for all of them, to produce bullets, or use "=item 1.", "=item 2.", etc., to

produce numbered lists, or use "=item foo", "=item bar", etc., i.e., things that look nothing like bullets

or numbers. If you start with bullets or numbers, stick with them, as many formatters use the first

"=item" type to decide how to format the list.

=for

=begin

=end

For, begin, and end let you include sections that are not interpreted as pod text, but passed directly to

particular formatters. A formatter that can utilize that format will use the section, otherwise it will be

completely ignored. The directive "=for" specifies that the entire next paragraph is in the format

indicated by the first word after "=for", like this:

=for html <br>

<p> This is a raw HTML paragraph </p>

The paired commands "=begin" and "=end" work very similarly to "=for", but instead of only

accepting a single paragraph, all text from "=begin" to a paragraph with a matching "=end" is treated

as a particular format.

Here are some examples of how to use these:

=begin html

<br>Figure 1.<IMG SRC="figure1.png"><br>

=end html

=begin text

−−−−−−−−−−−−−−−

| foo |

| bar |

−−−−−−−−−−−−−−−

^^^^ Figure 1. ^^^^

=end text

Some format names that formatters currently are known to accept include "roff", "man", "latex", "tex",

"text", and "html". (Some formatters will treat some of these as synonyms.)

And don‘t forget, when using any command, that the command lasts up until the end of the

paragraph, not the line. Hence in the examples below, you can see the empty lines after each

command to end its paragraph.

Some examples of lists include:

=over 4

=item *

First item

=item *

Second item

=back

=over 4

=item Foo()

Description of Foo function

=item Bar()

Description of Bar function

=back
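The advice to keep "=item" types consistent can be checked mechanically. What follows is a rough sketch in C, assuming a formatter tells the three item styles apart by the text after "=item"; real formatters may decide differently, and the names item_kind, pod_item_kind and pod_items_consistent are invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names: not taken from any real formatter. */
enum item_kind { ITEM_BULLET, ITEM_NUMBER, ITEM_TEXT };

/* text is whatever follows "=item " on the command line. */
enum item_kind pod_item_kind(const char *text)
{
    size_t i = 0;
    assert(text != NULL);
    if (text[0] == '*' && (text[1] == '\0' || isspace((unsigned char)text[1])))
        return ITEM_BULLET;             /* "=item *" */
    while (isdigit((unsigned char)text[i]))
        i++;
    if (i > 0 && (text[i] == '.' || text[i] == '\0'))
        return ITEM_NUMBER;             /* "=item 1.", "=item 2." */
    return ITEM_TEXT;                   /* "=item foo" */
}

/* Returns 1 if every item has the same kind as the first one. */
int pod_items_consistent(const char *const items[], size_t n)
{
    size_t i;
    for (i = 1; i < n; i++)
        if (pod_item_kind(items[i]) != pod_item_kind(items[0]))
            return 0;
    return 1;
}
```

Both lists in the examples above would pass this check: the first is all bullets, the second all text items.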

Ordinary Block of Text

It will be filled, and maybe even justified. Certain interior sequences are recognized both here and in

commands:

    I<text>     italicize text, used for emphasis or variables
    B<text>     embolden text, used for switches and programs
    S<text>     text contains non-breaking spaces
    C<code>     literal code
    L<name>     A link (cross reference) to name
                    L<name>             manual page
                    L<name/ident>       item in manual page
                    L<name/"sec">       section in other manual page
                    L<"sec">            section in this manual page
                                        (the quotes are optional)
                    L</"sec">           ditto
                same as above but only 'text' is used for output.
                (Text can not contain the characters '|' or '>')
                    L<text|name>
                    L<text|name/ident>
                    L<text|name/"sec">
                    L<text|"sec">
                    L<text|/"sec">
    F<file>     Used for filenames
    X<index>    An index entry
    Z<>         A zero-width character
    E<escape>   A named character (very similar to HTML escapes)
                    E<lt>               A literal <
                    E<gt>               A literal >
                    (these are optional except in other interior
                     sequences and when preceded by a capital letter)
                    E<n>                Character number n (probably in ASCII)
                    E<html>             Some non-numeric HTML entity, such
                                        as E<Agrave>
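To make the E<> forms concrete, here is a small C sketch of how a translator might turn the text inside E<...> into a character code. It handles only the lt, gt and numeric forms listed above; the 0 to 255 range and the function name pod_escape_value are assumptions made for the example, and HTML entity names such as Agrave would need a lookup table that is not shown.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the character code for the text inside
 * E<...>, or -1 if this sketch does not handle the name. */
int pod_escape_value(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "lt") == 0)
        return '<';
    if (strcmp(name, "gt") == 0)
        return '>';
    if (isdigit((unsigned char)name[0])) {
        char *end;
        long n = strtol(name, &end, 10);
        /* accept only a pure decimal number in an assumed 0..255 range */
        if (*end == '\0' && n >= 0 && n <= 255)
            return (int)n;
    }
    return -1;  /* e.g. "Agrave": would need an entity table */
}
```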

The Intent

That‘s it. The intent is simplicity, not power. I wanted paragraphs to look like paragraphs (block format), so

that they stand out visually, and so that I could run them through fmt easily to reformat them (that‘s F7 in my

version of vi). I wanted the translator (and not me) to worry about whether " or ’ is a left quote or a right

quote within filled text, and I wanted it to leave the quotes alone, dammit, in verbatim mode, so I could slurp

in a working program, shift it over 4 spaces, and have it print out, er, verbatim. And presumably in a

constant width font.

In particular, you can leave things like this verbatim in your text:

Perl

FILEHANDLE

$variable

function()

manpage(3r)

Doubtless a few other commands or sequences will need to be added along the way, but I‘ve gotten along

surprisingly well with just these.

Note that I‘m not at all claiming this to be sufficient for producing a book. I‘m just trying to make an

idiot−proof common source for nroff, TeX, and other markup languages, as used for online documentation.

Translators exist for pod2man (that‘s for nroff(1) and troff(1)), pod2text, pod2html, pod2latex, and

pod2fm.

Embedding Pods in Perl Modules

You can embed pod documentation in your Perl scripts. Start your documentation with a "=head1"

command at the beginning, and end it with a "=cut" command. Perl will ignore the pod text. See any of the

supplied library modules for examples. If you‘re going to put your pods at the end of the file, and you‘re

using an __END__ or __DATA__ cut mark, make sure to put an empty line there before the first pod

directive.

__END__

=head1 NAME

modern − I am a modern module

If you had not had that empty line there, then the translators wouldn‘t have seen it.

Common Pod Pitfalls

Pod translators usually will require paragraphs to be separated by completely empty lines. If you have

an apparently empty line with some spaces on it, this can cause odd formatting.

Translators will mostly add wording around a L<> link, so that L<foo(1)> becomes "the foo(1)

manpage", for example (see pod2man for details). Thus, you shouldn‘t write things like the

L<foo> manpage, if you want the translated document to read sensibly.

If you need or want total control of the text used for a link in the output, use the form L<show this

text|foo> instead.

The script pod/checkpods.PL in the Perl source distribution provides only skeletal checking, for lines that

look empty but aren't, and is there as a placeholder until someone writes Pod::Checker. The best

way to check your pod is to pass it through one or more translators and proofread the result, or print

out the result and proofread that. Some of the problems found may be bugs in the translators, which

you may or may not wish to work around.
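The "looks empty but isn't" pitfall is easy to test for. The following C function is a minimal sketch of such a check, not the logic of checkpods.PL itself; the function name is made up, and it assumes each line is passed in separately, with or without its trailing newline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the line holds at least one space
 * or tab and nothing else (apart from an optional trailing newline),
 * 0 otherwise. A truly empty line returns 0. */
int looks_empty_but_is_not(const char *line)
{
    size_t i = 0;
    assert(line != NULL);
    while (line[i] == ' ' || line[i] == '\t')
        i++;
    if (line[i] == '\n')
        i++, line += 0;             /* step over the optional newline */
    else if (line[i] != '\0')
        return 0;                   /* some visible character follows */
    /* i counts blanks plus the newline, so compare against the start */
    return line[0] == ' ' || line[0] == '\t';
}
```

Running this over every line of a pod file and reporting the line numbers that return 1 gives a quick list of paragraph breaks a translator might miss.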

SEE ALSO

pod2man and PODs: Embedded Documentation in perlsyn

AUTHOR

Larry Wall

perlbook Perl Programmers Reference Guide perlbook

NAME

perlbook − Perl book information

DESCRIPTION

The Camel Book, officially known as Programming Perl, Second Edition, by Larry Wall et al, is the

definitive reference work covering nearly all of Perl. You can order it and other Perl books from O‘Reilly &

Associates, 1−800−998−9938. Local/overseas is +1 707 829 0515. If you can locate an O‘Reilly order

form, you can also fax to +1 707 829 0104. If you‘re web−connected, you can even mosey on over to

http://www.ora.com/ for an online order form.

Other Perl books from various publishers and authors can be found listed in perlfaq3.

perlapio Perl Programmers Reference Guide perlapio

NAME

perlapio − perl‘s IO abstraction interface.

SYNOPSIS

PerlIO *PerlIO_stdin(void);

PerlIO *PerlIO_stdout(void);

PerlIO *PerlIO_stderr(void);

PerlIO *PerlIO_open(const char *,const char *);

int PerlIO_close(PerlIO *);

int PerlIO_stdoutf(const char *,...);

int PerlIO_puts(PerlIO *,const char *);

int PerlIO_putc(PerlIO *,int);

int PerlIO_write(PerlIO *,const void *,size_t);

int PerlIO_printf(PerlIO *, const char *,...);

int PerlIO_vprintf(PerlIO *, const char *, va_list);

int PerlIO_flush(PerlIO *);

int PerlIO_eof(PerlIO *);

int PerlIO_error(PerlIO *);

void PerlIO_clearerr(PerlIO *);

int PerlIO_getc(PerlIO *);

int PerlIO_ungetc(PerlIO *,int);

int PerlIO_read(PerlIO *,void *,size_t);

int PerlIO_fileno(PerlIO *);

PerlIO *PerlIO_fdopen(int, const char *);

PerlIO *PerlIO_importFILE(FILE *, int flags);

FILE *PerlIO_exportFILE(PerlIO *, int flags);

FILE *PerlIO_findFILE(PerlIO *);

void PerlIO_releaseFILE(PerlIO *,FILE *);

void PerlIO_setlinebuf(PerlIO *);

long PerlIO_tell(PerlIO *);

int PerlIO_seek(PerlIO *,off_t,int);

int PerlIO_getpos(PerlIO *,Fpos_t *);

int PerlIO_setpos(PerlIO *,Fpos_t *);

void PerlIO_rewind(PerlIO *);

int PerlIO_has_base(PerlIO *);

int PerlIO_has_cntptr(PerlIO *);

int PerlIO_fast_gets(PerlIO *);

int PerlIO_canset_cnt(PerlIO *);

char *PerlIO_get_ptr(PerlIO *);

int PerlIO_get_cnt(PerlIO *);

void PerlIO_set_cnt(PerlIO *,int);

void PerlIO_set_ptrcnt(PerlIO *,char *,int);

char *PerlIO_get_base(PerlIO *);

int PerlIO_get_bufsiz(PerlIO *);

DESCRIPTION

Perl‘s source code should use the above functions instead of those defined in ANSI C‘s stdio.h. The perl

headers will #define them to the I/O mechanism selected at Configure time.

The functions are modeled on those in stdio.h, but parameter order has been "tidied up a little".

PerlIO *

This takes the place of FILE *. Like FILE * it should be treated as opaque (it is probably safe to

assume it is a pointer to something).

PerlIO_stdin(), PerlIO_stdout(), PerlIO_stderr()

Use these rather than stdin, stdout, stderr. They are written to look like "function calls" rather

than variables because this makes it easier to make them function calls if platform cannot export data to

loaded modules, or if (say) different "threads" might have different values.

PerlIO_open(path, mode), PerlIO_fdopen(fd,mode)

These correspond to fopen()/fdopen(); the arguments are the same.

PerlIO_printf(f,fmt,...), PerlIO_vprintf(f,fmt,a)

These are fprintf()/vfprintf() equivalents.

PerlIO_stdoutf(fmt,...)

This is printf() equivalent. printf is #defined to this function, so it is (currently) legal to use

printf(fmt,...) in perl sources.

PerlIO_read(f,buf,count), PerlIO_write(f,buf,count)

These correspond to fread() and fwrite(). Note that the arguments are different: there is only one

"count", and the order has "file" first.

PerlIO_close(f)

PerlIO_puts(f,s), PerlIO_putc(f,c)

These correspond to fputs() and fputc(). Note that arguments have been revised to have "file"

first.

PerlIO_ungetc(f,c)

This corresponds to ungetc(). Note that arguments have been revised to have "file" first.
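The "file first" reordering described for the read, write and ungetc calls can be pictured as thin wrappers over stdio. The sketch below is not perl's implementation: the my_ names are hypothetical, FILE * stands in for PerlIO *, and the int return type simply mirrors the prototypes in the SYNOPSIS.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrappers; FILE * stands in for PerlIO * here. */

/* Like fwrite(), but stream first and a single byte count. */
int my_write(FILE *f, const void *buf, size_t count)
{
    assert(f != NULL && buf != NULL);
    return (int)fwrite(buf, 1, count, f);
}

/* Like fread(), but stream first and a single byte count. */
int my_read(FILE *f, void *buf, size_t count)
{
    assert(f != NULL && buf != NULL);
    return (int)fread(buf, 1, count, f);
}

/* Like ungetc(), but stream first. */
int my_ungetc(FILE *f, int c)
{
    assert(f != NULL);
    return ungetc(c, f);
}
```

Fixing the element size at 1 is what lets the two stdio size arguments collapse into the single "count".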

PerlIO_getc(f)

This corresponds to getc().

PerlIO_eof(f)

This corresponds to feof().

PerlIO_error(f)

This corresponds to ferror().

PerlIO_fileno(f)

This corresponds to fileno(). Note that on some platforms the meaning of "fileno" may not match

Unix.

PerlIO_clearerr(f)

This corresponds to clearerr(), i.e., clears ‘eof’ and ‘error’ flags for the "stream".

PerlIO_flush(f)

This corresponds to fflush().

PerlIO_tell(f)

This corresponds to ftell().

PerlIO_seek(f,o,w)

This corresponds to fseek().

PerlIO_getpos(f,p), PerlIO_setpos(f,p)

These correspond to fgetpos() and fsetpos(). If platform does not have the stdio calls then

they are implemented in terms of PerlIO_tell() and PerlIO_seek().
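As an illustration of that fallback, here is how getpos and setpos might be built from tell and seek, written against plain stdio. This is a sketch under assumptions, not the perl source: the my_ names are hypothetical, and my_pos_t is a stand-in for Fpos_t that assumes a position fits in a long.

```c
#include <assert.h>
#include <stdio.h>

typedef long my_pos_t;      /* hypothetical stand-in for Fpos_t */

/* Record the current position; returns 0 on success, -1 on failure. */
int my_getpos(FILE *f, my_pos_t *pos)
{
    long here;
    assert(f != NULL && pos != NULL);
    here = ftell(f);
    if (here < 0)
        return -1;
    *pos = here;
    return 0;
}

/* Return to a position saved by my_getpos(); 0 on success. */
int my_setpos(FILE *f, const my_pos_t *pos)
{
    assert(f != NULL && pos != NULL);
    return fseek(f, *pos, SEEK_SET);
}
```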

PerlIO_rewind(f)

This corresponds to rewind(). Note that it may be redefined in terms of PerlIO_seek() at some point.

PerlIO_tmpfile()

This corresponds to tmpfile(), i.e., returns an anonymous PerlIO which will automatically be

deleted when closed.

Co−existence with stdio

There is outline support for co−existence of PerlIO with stdio. Obviously if PerlIO is implemented in terms

of stdio there is no problem. However if perlio is implemented on top of (say) sfio then mechanisms must

exist to create a FILE * which can be passed to library code which is going to use stdio calls.

PerlIO_importFILE(f,flags)

Used to get a PerlIO * from a FILE *. May need additional arguments, interface under review.

PerlIO_exportFILE(f,flags)

Given a PerlIO *, return a 'native' FILE * suitable for passing to code expecting to be compiled and

linked with ANSI C stdio.h.

The fact that such a FILE * has been ‘exported’ is recorded, and may affect future PerlIO operations

on the original PerlIO *.

PerlIO_findFILE(f)

Returns previously ‘exported’ FILE * (if any). Place holder until interface is fully defined.

PerlIO_releaseFILE(p,f)

Calling PerlIO_releaseFILE informs PerlIO that all use of FILE * is complete. It is removed from list

of ‘exported’ FILE *s, and associated PerlIO * should revert to original behaviour.

PerlIO_setlinebuf(f)

This corresponds to setlinebuf(). Use is deprecated pending further discussion. (Perl core uses it

only when "dumping"; it has nothing to do with $| auto−flush.)

In addition to user API above there is an "implementation" interface which allows perl to get at internals of

PerlIO. The following calls correspond to the various FILE_xxx macros determined by Configure. This

section is really of interest to only those concerned with detailed perl−core behaviour or implementing a

PerlIO mapping.

PerlIO_has_cntptr(f)

Implementation can return pointer to current position in the "buffer" and a count of bytes available in

the buffer.

PerlIO_get_ptr(f)

Return pointer to next readable byte in buffer.

PerlIO_get_cnt(f)

Return count of readable bytes in the buffer.

PerlIO_canset_cnt(f)

Implementation can adjust its idea of number of bytes in the buffer.

PerlIO_fast_gets(f)

Implementation has all the interfaces required to allow perl's fast code to handle the <FILE> mechanism.

PerlIO_fast_gets(f) = PerlIO_has_cntptr(f) && \

PerlIO_canset_cnt(f) && \

‘Can set pointer into buffer’

PerlIO_set_ptrcnt(f,p,c)

Set pointer into buffer, and a count of bytes still in the buffer. Should be used only to set pointer to

within range implied by previous calls to PerlIO_get_ptr and PerlIO_get_cnt.

PerlIO_set_cnt(f,c)

Obscure − set count of bytes in the buffer. Deprecated. Currently used in only doio.c to force count <

−1 to −1. Perhaps should be PerlIO_set_empty or similar. This call may actually do nothing if "count"

is deduced from pointer and a "limit".

PerlIO_has_base(f)

Implementation has a buffer, and can return pointer to whole buffer and its size. Used by perl for −T /

−B tests. Other uses would be very obscure...

PerlIO_get_base(f)

Return start of buffer.

PerlIO_get_bufsiz(f)

Return total size of buffer.

perldelta Perl Programmers Reference Guide perldelta

NAME

perldelta − what‘s new for perl5.005

DESCRIPTION

This document describes differences between the 5.004 release and this one.

About the new versioning system

Perl is now developed on two tracks: a maintenance track that makes small, safe updates to released

production versions with emphasis on compatibility; and a development track that pursues more aggressive

evolution. Maintenance releases (which should be considered production quality) have subversion numbers

that run from 1 to 49, and development releases (which should be considered "alpha" quality) run from 50

to 99.
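The numbering rule is simple enough to state as code. This C sketch maps a subversion number to its track using the ranges given above; the enum and function names are made up, and treating subversion 0 as the base release of a version is an assumption, since the text only covers 1 to 99.

```c
/* Hypothetical names, for illustration only. */
enum perl_track {
    TRACK_RELEASE,      /* assumption: subversion 0, e.g. plain 5.005 */
    TRACK_MAINTENANCE,  /* 1..49: production quality */
    TRACK_DEVELOPMENT,  /* 50..99: "alpha" quality */
    TRACK_INVALID
};

enum perl_track classify_subversion(int subversion)
{
    if (subversion == 0)
        return TRACK_RELEASE;
    if (subversion >= 1 && subversion <= 49)
        return TRACK_MAINTENANCE;
    if (subversion >= 50 && subversion <= 99)
        return TRACK_DEVELOPMENT;
    return TRACK_INVALID;
}
```

By this rule 5.005_02 is a maintenance release, and the 5.004_50 mentioned below was the first development release after 5.004.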

Perl 5.005 is the combined product of the new dual−track development scheme.

Incompatible Changes

WARNING: This version is not binary compatible with Perl 5.004.

Starting with Perl 5.004_50 there were many deep and far−reaching changes to the language internals. If

you have dynamically loaded extensions that you built under perl 5.003 or 5.004, you can continue to use

them with 5.004, but you will need to rebuild and reinstall those extensions to use them with 5.005. See INSTALL

for detailed instructions on how to upgrade.

Default installation structure has changed

The new Configure defaults are designed to allow a smooth upgrade from 5.004 to 5.005, but you should

read INSTALL for a detailed discussion of the changes in order to adapt them to your system.

Perl Source Compatibility

When none of the experimental features are enabled, there should be very few user−visible Perl source

compatibility issues.

If threads are enabled, then some caveats apply. @_ and $_ become lexical variables. The effect of this

should be largely transparent to the user, but there are some boundary conditions under which user will need

to be aware of the issues. For example, local(@_) results in a "Can‘t localize lexical variable @_ ..."

message. This may be enabled in a future version.

Some new keywords have been introduced. These are generally expected to have very little impact on

compatibility. See New INIT keyword, New lock keyword, and New qr// operator.

Certain barewords are now reserved. Use of these will provoke a warning if you have asked for them with

the -w switch. See "our is now a reserved word".

C Source Compatibility

There have been a large number of changes in the internals to support the new features in this release.

Core sources now require ANSI C compiler

An ANSI C compiler is now required to build perl. See INSTALL.

All Perl global variables must now be referenced with an explicit prefix

All Perl global variables that are visible for use by extensions now have a PL_ prefix. New extensions

should not refer to perl globals by their unqualified names. To preserve sanity, we provide limited

backward compatibility for globals that are being widely used like sv_undef and na (which should

now be written as PL_sv_undef, PL_na etc.)

If you find that your XS extension does not compile anymore because a perl global is not visible, try

adding a PL_ prefix to the global and rebuild.

It is strongly recommended that all functions in the Perl API that don‘t begin with perl be referenced

with a Perl_ prefix. The bare function names without the Perl_ prefix are supported with macros,

but this support may cease in a future release.

See API LISTING.

Enabling threads has source compatibility issues

Perl built with threading enabled requires extensions to use the new dTHR macro to initialize the

handle to access per−thread data. If you see a compiler error that talks about the variable thr not

being declared (when building a module that has XS code), you need to add dTHR; at the beginning

of the block that elicited the error.

The API function perl_get_sv("@",FALSE) should be used instead of directly accessing perl

globals as GvSV(errgv). The API call is backward compatible with existing perls and provides

source compatibility when threading is enabled.

See API Changes for more information.

Binary Compatibility

This version is NOT binary compatible with older versions. All extensions will need to be recompiled.

Further, binaries built with threads enabled are incompatible with binaries built without. This should largely

be transparent to the user, as all binary incompatible configurations have their own unique architecture name,

and extension binaries get installed at unique locations. This allows coexistence of several configurations in

the same directory hierarchy. See INSTALL.

Security fixes may affect compatibility

A few taint leaks and taint omissions have been corrected. This may lead to "failure" of scripts that used to

work with older versions. Compiling with −DINCOMPLETE_TAINTS provides a perl with minimal

amounts of changes to the tainting behavior. But note that the resulting perl will have known insecurities.

Oneliners with the −e switch do not create temporary files anymore.

Relaxed new mandatory warnings introduced in 5.004

Many new warnings that were introduced in 5.004 have been made optional. Some of these warnings are

still present, but perl‘s new features make them less often a problem. See New Diagnostics.

Licensing

Perl has a new Social Contract for contributors. See Porting/Contract.

The license included in much of the Perl documentation has changed. Most of the Perl documentation was

previously under the implicit GNU General Public License or the Artistic License (at the user‘s choice). Now

much of the documentation unambiguously states the terms under which it may be distributed. Those terms

are in general much less restrictive than the GNU GPL. See perl and the individual perl man pages listed

therein.

Core Changes

Threads

WARNING: Threading is considered an experimental feature. Details of the implementation may change

without notice. There are known limitations and some bugs. These are expected to be fixed in future

versions.

See README.threads.

Compiler

WARNING: The Compiler and related tools are considered experimental. Features may change without

notice, and there are known limitations and bugs. Since the compiler is fully external to perl, the default

configuration will build and install it.

The Compiler produces three different types of transformations of a perl program. The C backend generates

C code that captures perl‘s state just before execution begins. It eliminates the compile−time overheads of

the regular perl interpreter, but the run−time performance remains comparatively the same. The CC backend

generates optimized C code equivalent to the code path at run−time. The CC backend has greater potential

for big optimizations, but only a few optimizations are implemented currently. The Bytecode backend

generates a platform independent bytecode representation of the interpreter‘s state just before execution.

Thus, the Bytecode back end also eliminates much of the compilation overhead of the interpreter.

The compiler comes with several valuable utilities.

B::Lint is an experimental module to detect and warn about suspicious code, especially the cases that the

−w switch does not detect.

B::Deparse can be used to demystify perl code, and understand how perl optimizes certain constructs.

B::Xref generates cross reference reports of all definition and use of variables, subroutines and formats in

a program.

B::Showlex shows the lexical variables used by a subroutine or file at a glance.

perlcc is a simple frontend for compiling perl.

See ext/B/README, B, and the respective compiler modules.

Regular Expressions

Perl‘s regular expression engine has been seriously overhauled, and many new constructs are supported.

Several bugs have been fixed.

Here is an itemized summary:

Many new and improved optimizations

Changes in the RE engine:

Unneeded nodes removed;

Substrings merged together;

New types of nodes to process (SUBEXPR)* and similar expressions

quickly, used if the SUBEXPR has no side effects and matches

strings of the same length;

Better optimizations by lookup for constant substrings;

Better search for constant substrings anchored by $ ;

Changes in Perl code using RE engine:

More optimizations to s/longer/short/;

study() was not working;

/blah/ may be optimized to an analogue of index() if $& $` $' not seen;

Unneeded copying of matched-against string removed;

Only matched part of the string is copied if $` $' were not seen;

Many bug fixes

Note that only the major bug fixes are listed here. See Changes for others.

Backtracking might not restore start of $3.

No feedback if max count for * or + on "complex" subexpression

was reached, similarly (but at compile time) for {3,34567}

Primitive restrictions on max count introduced to decrease a

possibility of a segfault;

(ZERO−LENGTH)* could segfault;

(ZERO−LENGTH)* was prohibited;

Long REs were not allowed;

/RE/g could skip matches at the same position after a

zero−length match;

New regular expression constructs

The following new syntax elements are supported:

(?<=RE)

(?<!RE)

(?{ CODE })

(?i−x)

(?i:RE)

(?(COND)YES_RE|NO_RE)

(?>RE)
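Several of the constructs listed above can be exercised as follows. This is only a minimal sketch; the sample strings and variable names are invented for illustration.

```perl
use strict;

# (?<=RE): positive lookbehind. Capture digits only when preceded by "$".
my ($price) = 'cost: $42' =~ /(?<=\$)(\d+)/;

# (?<!RE): negative lookbehind. Upcase "bar" unless preceded by "foo".
(my $s = 'foobar bazbar') =~ s/(?<!foo)bar/BAR/g;

# (?i:RE): case-insensitivity applies only inside the group.
my $ci = 'HELLO world' =~ /^(?i:hello) world$/ ? 1 : 0;

# (?>RE): the group never gives back what it matched, so this fails.
my $atomic = 'aaab' =~ /^(?>a+)ab/ ? 1 : 0;

# (?(COND)YES_RE|NO_RE): demand ")" only if "(" was captured as group 1.
my @balanced = grep { /^(\()?\w+(?(1)\))$/ } ('(x)', 'x', '(x', 'x)');
```

Here @balanced keeps only '(x)' and 'x', because the closing parenthesis is required exactly when the opening one matched.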

New operator for precompiled regular expressions

See New qr// operator.

Other improvements

Better debugging output (possibly with colors),

even from non−debugging Perl;

RE engine code now looks like C, not like assembler;

Behaviour of RE modifiable by ‘use re’ directive;

Improved documentation;

Test suite significantly extended;

Syntax [:^upper:] etc., reserved inside character classes;

Incompatible changes

(?i) localized inside enclosing group;

$( is not interpolated into RE any more;

/RE/g may match at the same position (with non−zero length)

after a zero−length match (bug fix).

See perlre and perlop.

Improved malloc()

See banner at the beginning of malloc.c for details.

Quicksort is internally implemented

Perl now contains its own highly optimized qsort() routine. The new qsort() is resistant to

inconsistent comparison functions, so Perl's sort() will not provoke coredumps any more when given

poorly written sort subroutines. (Some C library qsort()s that were being used before used to have this

problem.) In our testing, the new qsort() required the minimal number of pair−wise compares on

average, among all known qsort() implementations.

See perlfunc/sort.

Reliable signals

Perl's signal handling is susceptible to random crashes, because signals arrive asynchronously, and the Perl

runtime is not reentrant at arbitrary times.

However, one experimental implementation of reliable signals is available when threads are enabled. See

Thread::Signal. Also see INSTALL for how to build a Perl capable of threads.

Reliable stack pointers

The internals now reallocate the perl stack only at predictable times. In particular, magic calls never trigger

reallocations of the stack, because all reentrancy of the runtime is handled using a "stack of stacks". This

should improve reliability of cached stack pointers in the internals and in XSUBs.

More generous treatment of carriage returns

Perl used to complain if it encountered literal carriage returns in scripts. Now they are mostly treated like

whitespace within program text. Inside string literals and here documents, literal carriage returns are ignored

if they occur paired with newlines, or get interpreted as newlines if they stand alone. This behavior means

that literal carriage returns in files should be avoided. You can get the older, more compatible (but less

generous) behavior by defining the preprocessor symbol PERL_STRICT_CR when building perl. Of

course, all this has nothing whatever to do with how escapes like \r are handled within strings.

Note that this doesn't somehow magically allow you to keep all text files in DOS format. The generous

treatment only applies to files that perl itself parses. If your C compiler doesn't allow carriage returns in

files, you may still be unable to build modules that need a C compiler.

Memory leaks

substr, pos and vec don't leak memory anymore when used in lvalue context. Many small leaks that

impacted applications that embed multiple interpreters have been fixed.

Better support for multiple interpreters

The build−time option −DMULTIPLICITY has had many of the details reworked. Some previously global

variables that should have been per−interpreter now are. With care, this allows interpreters to call each

other. See the PerlInterp extension on CPAN.

Behavior of local() on array and hash elements is now well−defined

See "Temporary Values via local()" in perlsub.

%! is transparently tied to the Errno module

See perlvar, and Errno.
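As a rough illustration, the tied %! hash lets code test for a particular error by name instead of comparing numbers. The path below is a made-up one, assumed not to exist on the system running the sketch.

```perl
use strict;

# Hypothetical path, chosen so that open() fails with "No such file".
my $missing = '/nonexistent/path/for/errno/demo';

my $enoent = 0;
if (open(FH, "< $missing")) {
    close(FH);                      # unexpected: the path exists after all
} else {
    # $!{ENOENT} is true only while $! holds the ENOENT error value.
    $enoent = $!{ENOENT} ? 1 : 0;
}
```

The first use of %! loads Errno.pm automatically, so no explicit use Errno is needed here.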

Pseudo−hashes are supported

See perlref.

EXPR foreach EXPR is supported

See perlsyn.
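A short sketch of the statement-modifier form, with $_ aliased to each element in turn (the array name is illustrative):

```perl
use strict;

my @squares;
# The expression on the left runs once per element on the right.
push @squares, $_ * $_ foreach 1 .. 5;
```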

Keywords can be globally overridden

See perlsub.

$^E is meaningful on Win32

See perlvar.

foreach (1..1000000) optimized

foreach (1..1000000) is now optimized into a counting loop. It does not try to allocate a

1000000−size list anymore.

Foo:: can be used as implicitly quoted package name

Barewords caused unintuitive behavior when a subroutine with the same name as a package happened to be

defined. Thus, new Foo @args would use the result of the call to Foo() instead of treating Foo as a

literal. The recommended way to write barewords in the indirect object slot is new Foo:: @args. Note

that the method new() is called with a first argument of Foo, not Foo:: when you do that.
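The situation described above can be sketched like this. The class Foo, its constructor, and the clashing sub are all invented for the example.

```perl
use strict;

package Foo;
sub new { my $class = shift; return bless { args => [@_] }, $class }

package main;

# A sub that happens to share the package's name. Without the trailing
# "::", "new Foo @args" could use this sub's return value as the class.
sub Foo { return 'Bar' }

my @args  = (1, 2, 3);
my $obj   = new Foo:: @args;    # Foo:: is always taken as the package name
my $class = ref $obj;           # new() received "Foo", not "Foo::"
```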

exists $Foo::{Bar::} tests existence of a package

It was impossible to test for the existence of a package without actually creating it before. Now exists

$Foo::{Bar::} can be used to test if the Foo::Bar namespace has been created.
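A minimal sketch of the test, using made-up package names. The keys are written as quoted strings here so that the example does not depend on how a bareword ending in :: is parsed inside the subscript.

```perl
use strict;

package Foo::Bar;
sub hello { 'hi' }

package main;

# The symbol table %Foo:: gains a "Bar::" entry once Foo::Bar exists.
my $has_bar = exists $Foo::{'Bar::'} ? 1 : 0;
my $has_baz = exists $Foo::{'Baz::'} ? 1 : 0;   # Foo::Baz was never declared
```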

Better locale support

See perllocale.

Experimental support for 64−bit platforms

Perl5 has always had 64−bit support on systems with 64−bit longs. Starting with 5.005, the beginnings of

experimental support for systems with 32-bit long and 64-bit 'long long' integers have been added. If you

add −DUSE_LONG_LONG to your ccflags in config.sh (or manually define it in perl.h) then perl will be

built with 'long long' support. There will be many compiler warnings, and the resultant perl may not work

on all systems. There are many other issues related to third−party extensions and libraries. This option

exists to allow people to work on those issues.

prototype() returns useful results on builtins

See prototype.
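For instance, a builtin's prototype can be queried by prefixing its name with CORE::. The sketch below picks two builtins whose results are easy to interpret; the variable names are illustrative.

```perl
use strict;

# rename takes exactly two scalar arguments.
my $rename_proto = prototype('CORE::rename');

# system's calling conventions cannot be expressed as a prototype,
# so prototype() returns undef for it.
my $system_proto = prototype('CORE::system');
```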

Extended support for exception handling

die() now accepts a reference value, and $@ gets set to that value in exception traps. This makes it

possible to propagate exception objects. This is an undocumented experimental feature.
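A sketch of how an exception object might be thrown and caught. The MyError class and its fields are hypothetical and exist only for this illustration.

```perl
use strict;

# Hypothetical exception class, used only for this illustration.
package MyError;
sub new     { my ($class, %args) = @_; return bless {%args}, $class }
sub message { $_[0]{message} }

package main;

my $caught;
eval {
    die MyError->new(message => 'disk full', code => 28);
};
if (ref $@ && $@->isa('MyError')) {
    $caught = $@;       # $@ holds the object itself, not a string
}
```

Because $@ carries the reference unchanged, the handler can inspect fields such as $caught->{code} instead of parsing an error string.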

Re−blessing in DESTROY() supported for chaining DESTROY() methods

See Destructors.

All printf format conversions are handled internally

See printf.

New INIT keyword

INIT subs are like BEGIN and END, but they get run just before the perl runtime begins execution. For example, the

Perl Compiler makes use of INIT blocks to initialize and resolve pointers to XSUBs.
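The relative ordering can be sketched as follows, assuming the code is the main program file (INIT blocks are not run for code compiled after the runtime has started). The @order array is just a recorder introduced for the example.

```perl
use strict;
use vars qw(@order);

INIT  { push @order, 'INIT'  }   # runs after compilation, before run time
BEGIN { push @order, 'BEGIN' }   # runs as soon as it has been compiled

push @order, 'main';             # ordinary run-time code
```

Although the INIT block appears first in the source, the BEGIN block executes first, then INIT, then the main program.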

New lock keyword

The lock keyword is the fundamental synchronization primitive in threaded perl. When threads are not

enabled, it is currently a noop.

To minimize impact on source compatibility this keyword is "weak", i.e., any user−defined subroutine of the

same name overrides it, unless a use Thread has been seen.

New qr// operator

The qr// operator, which is syntactically similar to the other quote−like operators, is used to create

precompiled regular expressions. This compiled form can now be explicitly passed around in variables, and

interpolated in other regular expressions. See perlop.
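A brief sketch of storing compiled patterns in variables and building a larger pattern from them (pattern and variable names are illustrative):

```perl
use strict;

my $word = qr/[a-z]+/i;             # compiled once; keeps its /i flag
my $pair = qr/($word)=($word)/;     # interpolated into a larger pattern

my ($key, $value) = 'Color=Red' =~ /^$pair$/;
```

The /i flag travels with $word, so the capital letters match even though the outer patterns are case-sensitive.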

our is now a reserved word

Calling a subroutine with the name our will now provoke a warning when using the −w switch.

Tied arrays are now fully supported

See Tie::Array.

Tied handles support is better

Several missing hooks have been added. There is also a new base class for TIEHANDLE implementations.

See Tie::Handle.

4th argument to substr

substr() can now both return and replace in one operation. The optional 4th argument is the replacement

string. See substr.
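A minimal sketch of the four-argument form (the string is a made-up example):

```perl
use strict;

my $text = 'Hello, world';
# Replace 5 characters starting at offset 7; the old text is returned.
my $old = substr($text, 7, 5, 'Perl');
```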

Negative LENGTH argument to splice

splice() with a negative LENGTH argument now works much as a negative LENGTH does for

substr(). Previously a negative LENGTH was treated as 0. See splice.
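As a sketch, a negative LENGTH leaves that many elements in place at the end of the array (the list contents are illustrative):

```perl
use strict;

my @list = (0 .. 9);
# Remove everything from offset 2 onward except the last 3 elements.
my @removed = splice(@list, 2, -3);
```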

Magic lvalues are now more magical

When you say something like substr($x, 5) = "hi", the scalar returned by substr() is special, in

that any modifications to it affect $x. (This is called a ‘magic lvalue’ because an ‘lvalue’ is something on

the left side of an assignment.) Normally, this is exactly what you would expect to happen, but Perl uses the

same magic if you use substr(), pos(), or vec() in a context where they might be modified, like

taking a reference with \ or as an argument to a sub that modifies @_. In previous versions, this ‘magic’ only

went one way, but now changes to the scalar the magic refers to ($x in the above example) affect the magic

lvalue too. For instance, this code now acts differently:

$x = "hello";

sub printit {
    $x = "g'bye";          # changes the string the magic lvalue refers to
    print $_[0], "\n";     # $_[0] aliases substr($x, 0, 5)
}

printit(substr($x, 0, 5));

In previous versions, this would print "hello", but it now prints "g'bye".

<> now reads in records

If $/ is a reference to an integer, or a scalar that holds an integer, <> will read in records instead of lines.

For more info, see
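Record-oriented reading might look like the sketch below. It assumes an anonymous temporary file from IO::File and a record size of 4 bytes, both chosen purely for illustration.

```perl
use strict;
use IO::File;

# Write 10 bytes to an anonymous temporary file, then rewind it.
my $fh = IO::File->new_tmpfile or die "cannot create temp file: $!";
print $fh 'AAAABBBBCC';
seek($fh, 0, 0);

my @records;
{
    local $/ = \4;          # read fixed-size records of 4 bytes
    @records = <$fh>;       # the final record may be shorter
}
close $fh;
```

Localizing $/ inside a block keeps the record-reading mode from leaking into other reads in the program.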

Supported Platforms

Configure has many incremental improvements. Site−wide policy for building perl can now be made

persistent, via Policy.sh. Configure also records the command−line arguments used in config.sh.

New Platforms

BeOS is now supported. See README.beos.

DOS is now supported under the DJGPP tools. See README.dos.

MPE/iX is now supported. See README.mpeix.

MVS (OS390) is now supported. See README.os390.

Changes in existing support

Win32 support has been vastly enhanced. Perl Object, a C++ encapsulation of Perl, is now supported. GCC and

EGCS are now supported on Win32. See README.win32, aka perlwin32.

VMS configuration system has been rewritten. See README.vms.

The hints files for most Unix platforms have seen incremental improvements.

Modules and Pragmata

New Modules

B

Perl compiler and tools. See B.

Data::Dumper

A module to pretty print Perl data. See Data::Dumper.

Errno

A module to look up errors more conveniently. See Errno.

File::Spec

A portable API for file operations.

ExtUtils::Installed

Query and manage installed modules.

ExtUtils::Packlist

Manipulate .packlist files.

Fatal

Make functions/builtins succeed or die.

IPC::SysV

Constants and other support infrastructure for System V IPC operations in perl.

Test

A framework for writing test suites.

Tie::Array

Base class for tied arrays.

Tie::Handle

Base class for tied handles.

Thread

Perl thread creation, manipulation, and support.

attrs

Set subroutine attributes.

fields

Compile−time class fields.

re

Various pragmata to control behavior of regular expressions.

Changes in existing modules

CGI

CGI has been updated to version 2.42.

POSIX

POSIX now has its own platform−specific hints files.

DB_File

DB_File supports version 2.x of Berkeley DB. See ext/DB_File/Changes.

MakeMaker

MakeMaker now supports writing empty makefiles and provides a way to specify that site umask()

policy should be honored. There is also better support for manipulation of .packlist files, and getting

information about installed modules.

Extensions that have both architecture−dependent and architecture−independent files are now always

installed completely in the architecture−dependent locations. Previously, the shareable parts were

shared both across architectures and across perl versions and were therefore liable to be overwritten

with newer versions that might have subtle incompatibilities.

CPAN

See perlmodinstall and CPAN.

Cwd

Cwd::cwd is faster on most platforms.

Benchmark

Keeps better time.

Utility Changes

h2ph and related utilities have been vastly overhauled.

perlcc, a new experimental front end for the compiler, is available.

The crude GNU configure emulator is now called configure.gnu to avoid trampling on

Configure under case−insensitive filesystems.

perldoc used to be rather slow. The slower features are now optional. In particular, case−insensitive

searches need the −i switch, and recursive searches need −r. You can set these switches in the PERLDOC

environment variable to get the old behavior.

Documentation Changes

Config.pm now has a glossary of variables.

Porting/patching.pod has detailed instructions on how to create and submit patches for perl.

perlport specifies guidelines on how to write portably.

perlmodinstall describes how to fetch and install modules from CPAN sites.

Some more Perl traps are documented now. See perltrap.

New Diagnostics

Ambiguous call resolved as CORE::%s(), qualify as such or use &

(W) A subroutine you have declared has the same name as a Perl keyword, and you have used the

name without qualification for calling one or the other. Perl decided to call the builtin because the

subroutine is not imported.

To force interpretation as a subroutine call, either put an ampersand before the subroutine name, or

qualify the name with its package. Alternatively, you can import the subroutine (or pretend that it's

imported with the use subs pragma).

To silently interpret it as the Perl operator, use the CORE:: prefix on the operator (e.g.

CORE::log($x)) or declare the subroutine to be an object method (see attrs).

Bad index while coercing array into hash

(F) The index looked up in the hash found as the 0'th element of a pseudo-hash is not legal. Index

values must be 1 or greater. See perlref.

Bareword "%s" refers to nonexistent package

(W) You used a qualified bareword of the form Foo::, but the compiler saw no other uses of that

namespace before that point. Perhaps you need to predeclare a package?

Can't call method "%s" on an undefined value

(F) You used the syntax of a method call, but the slot filled by the object reference or package name

contains an undefined value. Something like this will reproduce the error:

$BADREF = undef;

process $BADREF 1,2,3;

$BADREF->process(1,2,3);

Can't coerce array into hash

(F) You used an array where a hash was expected, but the array has no information on how to map

from keys to array indices. You can do that only with arrays that have a hash reference at index 0.

Can't goto subroutine from an eval-string

(F) The "goto subroutine" call can't be used to jump out of an eval "string". (You can use it to jump

out of an eval {BLOCK}, but you probably don't want to.)

Can't localize pseudo-hash element

(F) You said something like local $ar->{'key'}, where $ar is a reference to a pseudo-hash.

That hasn't been implemented yet, but you can get a similar effect by localizing the corresponding

array element directly: local $ar->[$ar->[0]{'key'}].

Can't use %%! because Errno.pm is not available

(F) The first time the %! hash is used, perl automatically loads the Errno.pm module. The Errno

module is expected to tie the %! hash to provide symbolic names for $! errno values.

Cannot find an opnumber for "%s"

(F) A string of a form CORE::word was given to prototype(), but there is no builtin with the

name word.
