Perl Programmers Reference Guide
Version 5.005_02
18−Oct−1998
"There’s more than one way to do it."
−− Larry Wall, Author of the Perl Programming Language
Author: Perl5−Porters
INSTALL
Perl Programmers Reference Guide
INSTALL
NAME
Install − Build and Installation guide for perl5.
SYNOPSIS
The basic steps to build and install perl5 on a Unix system are:
rm -f config.sh Policy.sh
sh Configure
make
make test
make install
# You may also wish to add these:
(cd /usr/include && h2ph *.h sys/*.h)
(installhtml --help)
(cd pod && make tex && <process the latex files>)
Each of these is explained in further detail below.
For information on non-Unix systems, see the section on "Porting information" below.
For information on what's new in this release, see the pod/perldelta.pod file. For more detailed information
about specific changes, see the Changes file.
DESCRIPTION
This document is written in pod format as an easy way to indicate its structure. The pod format is described
in pod/perlpod.pod, but you can read it as is with any pager or editor. Headings and items are marked by
lines beginning with ‘=’. The other mark−up used is
B<text>    embolden text, used for switches, programs or commands
C<code>    literal code
L<name>    A link (cross reference) to name
You should probably at least skim through this entire document before proceeding.
If you're building Perl on a non-Unix system, you should also read the README file specific to your
operating system, since this may provide additional or different instructions for building Perl.
If there is a hint file for your system (in the hints/ directory) you should also read that hint file for specific
information for your system. (Unixware users should use the svr4.sh hint file.)
WARNING: This version is not binary compatible with Perl 5.004.
Starting with Perl 5.004_50 there were many deep and far-reaching changes to the language internals. If
you have dynamically loaded extensions that you built under perl 5.003 or 5.004, you can continue to use
them with 5.004, but you will need to rebuild and reinstall those extensions to use them with 5.005. See the
discussions below on "Coexistence with earlier versions of perl5" and "Upgrading from 5.004 to 5.005" for
more details.
The standard extensions supplied with Perl will be handled automatically.
In a related issue, old extensions may be affected by the changes in the Perl language in the current
release. Please see pod/perldelta.pod for a description of what's changed.
Space Requirements
The complete perl5 source tree takes up about 10 MB of disk space. The complete tree after completing
make takes roughly 20 MB, though the actual total is likely to be quite system−dependent. The installation
directories need something on the order of 10 MB, though again that value is system−dependent.
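As a rough aid, the figures above can be checked before starting a build. The following is a minimal sketch, not part of the perl distribution: have_space is a hypothetical helper, and the 20 MB threshold is simply the estimate quoted above.

```sh
#!/bin/sh
# Hypothetical helper: succeed if directory $1 has at least $2 MB free.
have_space() {
    dir=$1
    need_mb=$2
    # df -P prints one line per filesystem in the POSIX format;
    # -k reports sizes in 1024-byte blocks. Field 4 is the free space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_mb * 1024)) ]
}

# About 20 MB for the built tree, according to the estimate above.
if have_space . 20; then
    echo "build directory: enough room"
else
    echo "build directory: less than 20 MB free"
fi
```

The same helper can be pointed at the installation directory with a 10 MB threshold.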
Start with a Fresh Distribution
If you have built perl before, you should clean out the build directory with the command
make distclean
or
make realclean
The only difference between the two is that make distclean also removes your old config.sh and Policy.sh
files.
The results of a Configure run are stored in the config.sh and Policy.sh files. If you are upgrading from a
previous version of perl, or if you change systems or compilers or make other significant changes, or if you
are experiencing difficulties building perl, you should probably not re−use your old config.sh. Simply
remove it or rename it, e.g.
mv config.sh config.sh.old
If you wish to use your old config.sh, be especially attentive to the version and architecture−specific
questions and answers. For example, the default directory for architecture−dependent library modules
includes the version name. By default, Configure will reuse your old name (e.g.
/opt/perl/lib/i86pc-solaris/5.003) even if you're running Configure for a different version, e.g. 5.004. Yes,
Configure should probably check and correct for this, but it doesn't, presently. Similarly, if you used a
shared libperl.so (see below) with version numbers, you will probably want to adjust them as well.
Also, be careful to check your architecture name. Some Linux systems (such as Debian) use i386, while
others may use i486, i586, or i686. If you pick up a precompiled binary, it might not use the same name.
In short, if you wish to use your old config.sh, I recommend running Configure interactively rather than
blindly accepting the defaults.
If your reason to reuse your old config.sh is to save your particular installation choices, then you can
probably achieve the same effect by using the new Policy.sh file. See the section on
"Site−wide Policy settings" below.
Run Configure
Configure will figure out various things about your system. Some things Configure will figure out for itself,
other things it will ask you about. To accept the default, just press RETURN. The default is almost always
okay. At any Configure prompt, you can type &−d and Configure will use the defaults from then on.
After it runs, Configure will perform variable substitution on all the *.SH files and offer to run make depend.
Configure supports a number of useful options. Run Configure -h to get a listing. See the Porting/Glossary
file for a complete list of Configure variables you can set and their definitions.
To compile with gcc, for example, you should run
sh Configure -Dcc=gcc
This is the preferred way to specify gcc (or another alternative compiler) so that the hints files can set
appropriate defaults.
If you want to use your old config.sh but override some of the items with command line options, you need to
use Configure -O.
By default, for most systems, perl will be installed in /usr/local/{bin, lib, man}. You can specify a different
'prefix' for the default installation directory, either when Configure prompts you or by using the Configure
command line option -Dprefix='/some/directory', e.g.
sh Configure -Dprefix=/opt/perl
If your prefix contains the string "perl", then the directories are simplified. For example, if you use
prefix=/opt/perl, then Configure will suggest /opt/perl/lib instead of /opt/perl/lib/perl5/.
NOTE: You must not specify an installation directory that is below your perl source directory. If you do,
installperl will attempt infinite recursion.
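The restriction in the NOTE above can be checked before running Configure. This is only a sketch under stated assumptions: is_below is a hypothetical helper, both paths are assumed to be absolute and already normalized (no "..", no symlinks), and the script is assumed to run from the top of the perl source tree.

```sh
#!/bin/sh
# Hypothetical helper: succeed if path $1 is the same as, or below, directory $2.
is_below() {
    case "$1/" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

src=$(pwd)            # assumed: the top of the perl source tree
prefix=/opt/perl      # the value you plan to pass as -Dprefix

if is_below "$prefix" "$src"; then
    echo "prefix $prefix is inside the source tree $src; choose another" >&2
else
    echo "prefix $prefix is outside the source tree"
fi
```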
It may seem obvious to say, but Perl is useful only when users can easily find it. It's often a good idea to
have both /usr/bin/perl and /usr/local/bin/perl be symlinks to the actual binary. Be especially careful,
however, of overwriting a version of perl supplied by your vendor. In any case, system administrators are
strongly encouraged to put (symlinks to) perl and its accompanying utilities, such as perldoc, into a directory
typically found along a user's PATH, or in another obvious and convenient place.
By default, Configure will compile perl to use dynamic loading if your system supports it. If you want to
force perl to be compiled statically, you can either choose this when Configure prompts you or you can use
the Configure command line option -Uusedl.
If you are willing to accept all the defaults, and you want terse output, you can run
sh Configure -des
For my Solaris system, I usually use
sh Configure -Dprefix=/opt/perl -Doptimize='-xpentium -xO4' -des
GNU-style configure
If you prefer the GNU-style configure command line interface, you can use the supplied configure.gnu
command, e.g.
CC=gcc ./configure.gnu
The configure.gnu script emulates a few of the more common configure options. Try
./configure.gnu --help
for a listing.
Cross compiling is not supported.
(The file is called configure.gnu to avoid problems on systems that would not distinguish the files
"Configure" and "configure".)
Extensions
By default, Configure will offer to build every extension which appears to be supported. For example,
Configure will offer to build GDBM_File only if it is able to find the gdbm library. (See examples below.)
B, DynaLoader, Fcntl, IO, and attrs are always built by default. Configure does not contain code to test for
POSIX compliance, so POSIX is always built by default as well. If you wish to skip POSIX, you can set the
Configure variable useposix=false either in a hint file or from the Configure command line. Similarly, the
Opcode extension is always built by default, but you can skip it by setting the Configure variable
useopcode=false either in a hint file or from the command line.
You can learn more about each of these extensions by consulting the documentation in the individual .pm
modules, located under the ext/ subdirectory.
Even if you do not have dynamic loading, you must still build the DynaLoader extension; you should just
build the stub dl_none.xs version. (Configure will suggest this as the default.)
In summary, here are the Configure command−line variables you can set to turn off each extension:
B              (Always included by default)
DB_File        i_db
DynaLoader     (Must always be included as a static extension)
Fcntl          (Always included by default)
GDBM_File      i_gdbm
IO             (Always included by default)
NDBM_File      i_ndbm
ODBM_File      i_dbm
POSIX          useposix
SDBM_File      (Always included by default)
Opcode         useopcode
Socket         d_socket
Threads        usethreads
attrs          (Always included by default)
Thus to skip the NDBM_File extension, you can use
sh Configure -Ui_ndbm
Again, this is taken care of automatically if you don't have the ndbm library.
Of course, you may always run Configure interactively and select only the extensions you want.
Note: The DB_File module will only work with version 1.x of Berkeley DB or newer releases of version 2.
Configure will automatically detect this for you and refuse to try to build DB_File with older releases of version 2.
If you re−use your old config.sh but change your system (e.g. by adding libgdbm) Configure will still offer
your old choices of extensions for the default answer, but it will also point out the discrepancy to you.
Finally, if you have dynamic loading (most modern Unix systems do) remember that these extensions do not
increase the size of your perl executable, nor do they impact start-up time, so you might as well
build all the ones that will work on your system.
Including locally−installed libraries
Perl5 comes with interfaces to a number of database extensions, including dbm, ndbm, gdbm, and Berkeley
db. For each extension, if Configure can find the appropriate header files and libraries, it will automatically
include that extension. The gdbm and db libraries are not included with perl. See the library documentation
for how to obtain the libraries.
Note: If your database header (.h) files are not in a directory normally searched by your C compiler, then
you will need to include the appropriate -I/your/directory option when prompted by Configure. If your
database library (.a) files are not in a directory normally searched by your C compiler and linker, then you
will need to include the appropriate -L/your/directory option when prompted by Configure. See the
examples below.
Examples
gdbm in /usr/local
Suppose you have gdbm and want Configure to find it and build the GDBM_File extension. This
example assumes you have gdbm.h installed in /usr/local/include/gdbm.h and libgdbm.a installed in
/usr/local/lib/libgdbm.a. Configure should figure all the necessary steps out automatically.
Specifically, when Configure prompts you for flags for your C compiler, you should include
-I/usr/local/include.
When Configure prompts you for linker flags, you should include -L/usr/local/lib.
If you are using dynamic loading, then when Configure prompts you for linker flags for dynamic
loading, you should again include -L/usr/local/lib.
Again, this should all happen automatically. If you want to accept the defaults for all the questions and
have Configure print out only terse messages, then you can just run
sh Configure -des
and Configure should include the GDBM_File extension automatically.
This should actually work if you have gdbm installed in any of (/usr/local, /opt/local, /usr/gnu,
/opt/gnu, /usr/GNU, or /opt/GNU).
gdbm in /usr/you
Suppose you have gdbm installed in some place other than /usr/local/, but you still want Configure to
find it. To be specific, assume you have /usr/you/include/gdbm.h and /usr/you/lib/libgdbm.a. You still
have to add -I/usr/you/include to cc flags, but you have to take an extra step to help Configure find
libgdbm.a. Specifically, when Configure prompts you for library directories, you have to add
/usr/you/lib to the list.
It is possible to specify this from the command line too (all on one line):
sh Configure -des \
-Dlocincpth="/usr/you/include" \
-Dloclibpth="/usr/you/lib"
locincpth is a space-separated list of include directories to search. Configure will automatically add
the appropriate -I directives.
loclibpth is a space-separated list of library directories to search. Configure will automatically add the
appropriate -L directives. If you have some libraries under /usr/local/ and others under /usr/you, then
you have to include both, namely
sh Configure -des \
-Dlocincpth="/usr/you/include /usr/local/include" \
-Dloclibpth="/usr/you/lib /usr/local/lib"
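To make the relationship between the space-separated lists and the resulting compiler flags concrete, here is a small illustration. It is not Configure's own code: add_flag is a hypothetical helper that only shows how each directory in a list could be turned into a -I or -L directive.

```sh
#!/bin/sh
# Hypothetical illustration: prepend flag $1 to every directory in the
# space-separated list $2 ("-I" for locincpth, "-L" for loclibpth).
add_flag() {
    flag=$1
    out=
    for dir in $2; do          # unquoted on purpose: split on whitespace
        out="$out${out:+ }$flag$dir"
    done
    printf '%s\n' "$out"
}

locincpth="/usr/you/include /usr/local/include"
loclibpth="/usr/you/lib /usr/local/lib"

add_flag -I "$locincpth"   # prints: -I/usr/you/include -I/usr/local/include
add_flag -L "$loclibpth"   # prints: -L/usr/you/lib -L/usr/local/lib
```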
Installation Directories
The installation directories can all be changed by answering the appropriate questions in Configure. For
convenience, all the installation questions are near the beginning of Configure.
I highly recommend running Configure interactively to be sure it puts everything where you want it. At any
point during the Configure process, you can answer a question with &−d and Configure will use the
defaults from then on.
By default, Configure will use the following directories for library files for 5.005 (archname is a string like
sun4−sunos, determined by Configure).
Configure variable   Default value
$archlib             /usr/local/lib/perl5/5.005/archname
$privlib             /usr/local/lib/perl5/5.005
$sitearch            /usr/local/lib/perl5/site_perl/5.005/archname
$sitelib             /usr/local/lib/perl5/site_perl/5.005
Some users prefer to append a "/share" to $privlib and $sitelib to emphasize that those directories
can be shared among different architectures.
By default, Configure will use the following directories for manual pages:
Configure variable   Default value
$man1dir             /usr/local/man/man1
$man3dir             /usr/local/lib/perl5/man/man3
(Actually, Configure recognizes the SVR3−style /usr/local/man/l_man/man1 directories, if present, and uses
those instead.)
The module man pages are stuck in that strange spot so that they don't collide with other man pages stored in
/usr/local/man/man3, and so that Perl's man pages don't hide system man pages. On some systems, man
less would end up calling up Perl's less.pm module man page, rather than the less program. (This default
location will likely change to /usr/local/man/man3 in a future release of perl.)
Note: Many users prefer to store the module man pages in /usr/local/man/man3. You can do this from the
command line with
sh Configure -Dman3dir=/usr/local/man/man3
Some users also prefer to use a .3pm suffix. You can do that with
sh Configure -Dman3ext=3pm
If you specify a prefix that contains the string "perl", then the directory structure is simplified. For example,
if you Configure with -Dprefix=/opt/perl, then the defaults for 5.005 are
Configure variable   Default value
$archlib             /opt/perl/lib/5.005/archname
$privlib             /opt/perl/lib/5.005
$sitearch            /opt/perl/lib/site_perl/5.005/archname
$sitelib             /opt/perl/lib/site_perl/5.005
$man1dir             /opt/perl/man/man1
$man3dir             /opt/perl/man/man3
The perl executable will search the libraries in the order given above.
The directories under site_perl are empty, but are intended to be used for installing local or site−wide
extensions. Perl will automatically look in these directories.
In order to support using things like #!/usr/local/bin/perl5.005 after a later version is released,
architecture-dependent libraries are stored in a version-specific directory, such as
/usr/local/lib/perl5/5.005/archname/.
Further details about the installation directories, maintenance and development subversions, and about
supporting multiple versions are discussed in "Coexistence with earlier versions of perl5" below.
Again, these are just the defaults, and can be changed as you run Configure.
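The default library layout described above can be summarized in a few lines of shell. This is a sketch of the rule as stated in this section, not an excerpt from Configure; show_defaults is a hypothetical function and the archname value is just an example.

```sh
#!/bin/sh
# Hypothetical sketch of the default layout; not Configure's actual code.
# $1 = prefix, $2 = version, $3 = archname
show_defaults() {
    prefix=$1 version=$2 archname=$3
    case "$prefix" in
        *perl*) lib="$prefix/lib" ;;         # prefix contains "perl": shorter paths
        *)      lib="$prefix/lib/perl5" ;;
    esac
    echo "archlib=$lib/$version/$archname"
    echo "privlib=$lib/$version"
    echo "sitearch=$lib/site_perl/$version/$archname"
    echo "sitelib=$lib/site_perl/$version"
}

show_defaults /usr/local 5.005 sun4-sunos
show_defaults /opt/perl  5.005 sun4-sunos
```

The two calls print the same values as the two tables of library directories above, with sun4-sunos substituted for archname.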
Changing the installation directory
Configure distinguishes between the directory in which perl (and its associated files) should be installed and
the directory in which it will eventually reside. For most sites, these two are the same; for sites that use AFS,
this distinction is handled automatically. However, sites that use software such as depot to manage software
packages may also wish to install perl into a different directory and use that management software to move
perl to its final destination. This section describes how to do this. Someday, Configure may support an
option -Dinstallprefix=/foo to simplify this.
Suppose you want to install perl under the /tmp/perl5 directory. You can edit config.sh and change all the
install* variables to point to /tmp/perl5 instead of /usr/local/wherever. Or, you can automate this process by
placing the following lines in a file config.over before you run Configure (replace /tmp/perl5 by a directory
of your choice):
installprefix=/tmp/perl5
test -d $installprefix || mkdir $installprefix
test -d $installprefix/bin || mkdir $installprefix/bin
installarchlib=`echo $installarchlib | sed "s!$prefix!$installprefix!"`
installbin=`echo $installbin | sed "s!$prefix!$installprefix!"`
installman1dir=`echo $installman1dir | sed "s!$prefix!$installprefix!"`
installman3dir=`echo $installman3dir | sed "s!$prefix!$installprefix!"`
installprivlib=`echo $installprivlib | sed "s!$prefix!$installprefix!"`
installscript=`echo $installscript | sed "s!$prefix!$installprefix!"`
installsitelib=`echo $installsitelib | sed "s!$prefix!$installprefix!"`
installsitearch=`echo $installsitearch | sed "s!$prefix!$installprefix!"`
Then, you can Configure and install in the usual way:
sh Configure -des
make
make test
make install
Beware, though, that if you then try to install new add-on extensions, they too will get installed under
'/tmp/perl5' if you follow this example. The next section shows one way of dealing with that problem.
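To see what the sed substitution in config.over does, it can be tried on its own outside Configure. The values below are sample assumptions standing in for what Configure would have computed; only the substitution itself is taken from the lines above.

```sh
#!/bin/sh
# Sample values standing in for what Configure would have computed.
prefix=/usr/local
installprefix=/tmp/perl5
installbin=/usr/local/bin
installprivlib=/usr/local/lib/perl5/5.005

# Same substitution as in config.over: replace the first occurrence of
# $prefix with $installprefix. "!" is used as the sed delimiter so that
# the slashes in the paths need no escaping.
installbin=`echo $installbin | sed "s!$prefix!$installprefix!"`
installprivlib=`echo $installprivlib | sed "s!$prefix!$installprefix!"`

echo "installbin=$installbin"           # installbin=/tmp/perl5/bin
echo "installprivlib=$installprivlib"   # installprivlib=/tmp/perl5/lib/perl5/5.005
```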
Creating an installable tar archive
If you need to install perl on many identical systems, it is convenient to compile it once and create an archive
that can be installed on multiple systems. Here's one way to do that:
# Set up config.over to install perl into a different directory,
# e.g. /tmp/perl5 (see previous part).
sh Configure -des
make
make test
make install
cd /tmp/perl5
# Edit $archlib/Config.pm to change all the
# install* variables back to reflect where everything will
# really be installed.
# Edit any of the scripts in $scriptdir to have the correct
# #!/wherever/perl line.
tar cvf ../perl5-archive.tar .
# Then, on each machine where you want to install perl,
cd /usr/local # Or wherever you specified as $prefix
tar xvf perl5-archive.tar
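The archive step works because the tree is archived with relative paths, so it can be unpacked under any directory. The following dry run shows this with throwaway directories. Everything in it is a stand-in: the temporary directories play the roles of /tmp/perl5 and /usr/local, and the "perl" file is an empty placeholder rather than a real binary.

```sh
#!/bin/sh
# Stand-ins for the real directories; all are temporary and hypothetical.
stage=$(mktemp -d)     # plays the role of /tmp/perl5
target=$(mktemp -d)    # plays the role of /usr/local on another machine
archive=$(mktemp -d)/perl5-archive.tar

mkdir -p "$stage/bin" "$stage/lib"
echo '#!/bin/sh' > "$stage/bin/perl"       # placeholder file, not a real perl

# Archive the staging tree with relative paths ("." rather than an
# absolute name) so it can be unpacked under any directory.
(cd "$stage" && tar cf "$archive" .)

# Unpack under the final location.
(cd "$target" && tar xf "$archive")

ls "$target/bin"
```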
Site−wide Policy settings
After Configure runs, it stores a number of common site−wide "policy" answers (such as installation
directories and the local perl contact person) in the Policy.sh file. If you want to build perl on another
system using the same policy defaults, simply copy the Policy.sh file to the new system and Configure will
use it along with the appropriate hint file for your system.
Alternatively, if you wish to change some or all of those policy answers, you should
rm -f Policy.sh
to ensure that Configure doesn't re-use them.
Further information is in the Policy_sh.SH file itself.
Configure−time Options
There are several different ways to Configure and build perl for your system. For most users, the defaults are
sensible and will work. Some users, however, may wish to further customize perl. Here are some of the
main things you can change.
Threads
On some platforms, perl5.005 can be compiled to use threads. To enable this, read the file
README.threads, and then try
sh Configure -Dusethreads
Currently, you need to specify -Dusethreads on the Configure command line so that the hint files can make
appropriate adjustments.
The default is to compile without thread support.
Selecting File IO mechanisms
Previous versions of perl used the standard IO mechanisms as defined in stdio.h. Versions 5.003_02 and
later of perl allow alternate IO mechanisms via a "PerlIO" abstraction, but the stdio mechanism is still the
default and is the only supported mechanism.
This PerlIO abstraction can be enabled either on the Configure command line with
sh Configure -Duseperlio
or interactively at the appropriate Configure prompt.
If you choose to use the PerlIO abstraction layer, there are two (experimental) possibilities for the underlying
IO calls. These have been tested to some extent on some platforms, but are not guaranteed to work
everywhere.
1. AT&T's "sfio". This has superior performance to stdio.h in many cases, and is extensible by the use
of "discipline" modules. Sfio currently only builds on a subset of the UNIX platforms perl supports.
Because the data structures are completely different from stdio, perl extension modules or external
libraries may not work. This configuration exists to allow these issues to be worked on.
This option requires the 'sfio' package to have been built and installed. A (fairly old) version of sfio is
in CPAN.
You select this option by
sh Configure -Duseperlio -Dusesfio
If you have already selected -Duseperlio, and if Configure detects that you have sfio, then sfio will be
the default suggested by Configure.
Note: On some systems, sfio's iffe configuration script fails to detect that you have an atexit function
(or equivalent). Apparently, this is a problem at least for some versions of Linux and SunOS 4.
You can test if you have this problem by trying the following shell script. (You may have to add some
extra cflags and libraries. A portable version of this may eventually make its way into Configure.)
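#!/bin/sh
# Build a tiny program against sfio and check that its buffered
# output is flushed when the program exits.
cat > try.c <<'EOCP'
#include <stdio.h>
int main() { printf("42\n"); return 0; }
EOCP
cc -o try try.c -lsfio
val=`./try`
if test X$val = X42; then
    echo "Your sfio looks ok"
else
    echo "Your sfio has the exit problem."
fi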
If you have this problem, the fix is to go back to your sfio sources and correct iffe's guess about atexit.
There also might be a more recent release of Sfio that fixes your problem.
2. Normal stdio IO, but with all IO going through calls to the PerlIO abstraction layer. This configuration
can be used to check that perl and extension modules have been correctly converted to use the PerlIO
abstraction.
This configuration should work on all platforms (but might not).
You select this option via:
sh Configure -Duseperlio -Uusesfio
If you have already selected -Duseperlio, and if Configure does not detect sfio, then this will be the
default suggested by Configure.
Building a shared libperl.so Perl library
Currently, for most systems, the main perl executable is built by linking the "perl library" libperl.a with
perlmain.o, your static extensions (usually just DynaLoader.a) and various extra libraries, such as -lm.
On some systems that support dynamic loading, it may be possible to replace libperl.a with a shared
libperl.so. If you anticipate building several different perl binaries (e.g. by embedding libperl into different
programs, or by using the optional compiler extension), then you might wish to build a shared libperl.so so
that all your binaries can share the same library.
The disadvantages are that there may be a significant performance penalty associated with the shared
libperl.so, and that the overall mechanism is still rather fragile with respect to different versions and
upgrades.
In terms of performance, on my test system (Solaris 2.5_x86) the perl test suite took roughly 15% longer to
run with the shared libperl.so. Your system and typical applications may well give quite different results.
The default name for the shared library is typically something like libperl.so.3.2 (for Perl 5.003_02) or
libperl.so.302 or simply libperl.so. Configure tries to guess a sensible naming convention based on your C
library name. Since the library gets installed in a version−specific architecture−dependent directory, the
exact name isn't very important anyway, as long as your linker is happy.
For some systems (mostly SVR4), building a shared libperl is required for dynamic loading to work, and
hence is already the default.
You can elect to build a shared libperl by
sh Configure -Duseshrplib
To actually build perl, you must add the current working directory to your LD_LIBRARY_PATH
environment variable before running make. You can do this with
LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH
for Bourne-style shells, or
setenv LD_LIBRARY_PATH `pwd`
for Csh-style shells. You *MUST* do this before running make. Folks running NeXT OPENSTEP must
substitute DYLD_LIBRARY_PATH for LD_LIBRARY_PATH above.
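A small variation on the Bourne-style command may be worth considering. When LD_LIBRARY_PATH is unset, the form shown above leaves a trailing colon, that is, an empty path entry. The sketch below is a suggestion rather than something the perl build requires; it uses the standard ${VAR:+...} parameter expansion to add the separator only when there is an existing value.

```sh
#!/bin/sh
# Prepend the current directory to LD_LIBRARY_PATH. The ${VAR:+...} form
# expands to ":$LD_LIBRARY_PATH" only when the variable is set and
# non-empty, so no stray ":" is left behind when it was unset.
LD_LIBRARY_PATH=`pwd`${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"
```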
There is also a potential problem with the shared perl library if you want to have more than one "flavor" of
the same version of perl (e.g. with and without -DDEBUGGING). For example, suppose you build and
install a standard Perl 5.004 with a shared library. Then, suppose you try to build Perl 5.004 with
-DDEBUGGING enabled, but everything else the same, including all the installation directories. How can
you ensure that your newly built perl will link with your newly built libperl.so.4 rather than with the installed
libperl.so.4? The answer is that you might not be able to. The installation directory is encoded in the perl
binary with the LD_RUN_PATH environment variable (or equivalent ld command-line option). On Solaris,
you can override that with LD_LIBRARY_PATH; on Linux you can't. On Digital Unix, you can override
LD_LIBRARY_PATH by setting the _RLD_ROOT environment variable to point to the perl build directory.
The only reliable answer is that you should specify a different directory for the architecture-dependent
library for your -DDEBUGGING version of perl. You can do this by changing all the *archlib* variables in
config.sh, namely archlib, archlib_exp, and installarchlib, to point to your new architecture-dependent
library.
Malloc Issues
Perl relies heavily on malloc(3) to grow data structures as needed, so perl's performance can be noticeably
affected by the performance of the malloc function on your system.
The perl source is shipped with a version of malloc that is very fast but somewhat wasteful of space. On the
other hand, your system's malloc function may be a bit slower but also a bit more frugal. However, as of
5.004_68, perl's malloc has been optimized for the typical requests from perl, so there's a chance that it may
be both faster and use less memory.
For many uses, speed is probably the most important consideration, so the default behavior (for most
systems) is to use the malloc supplied with perl. However, if you will be running very large applications
(e.g. Tk or PDL) or if your system already has an excellent malloc, or if you are experiencing difficulties
with extensions that use third-party libraries that call malloc, then you might wish to use your system's
malloc. (Or, you might wish to explore the malloc flags discussed below.)
To build without perl's malloc, you can use the Configure command
sh Configure -Uusemymalloc
or you can answer 'n' at the appropriate interactive Configure prompt.
Malloc Performance Flags
If you are using Perl's malloc, you may add one or more of the following items to your ccflags config.sh
variable to change its behavior. You can find out more about these and other flags by reading the
commentary near the top of the malloc.c source. The defaults should be fine for nearly everyone.
-DNO_FANCY_MALLOC
Undefined by default. Defining it returns malloc to the version used in Perl 5.004.
-DPLAIN_MALLOC
Undefined by default. Defining it in addition to NO_FANCY_MALLOC returns malloc to the version
used in Perl version 5.000.
Building a debugging perl
You can run perl scripts under the perl debugger at any time with perl -d your_script. If, however, you
want to debug perl itself, you probably want to do
sh Configure -Doptimize='-g'
This will do two independent things: First, it will force compilation to use cc -g so that you can use your
system's debugger on the executable. (Note: Your system may actually require something like cc -g2.
Check your man pages for cc(1) and also any hint file for your system.) Second, it will add
-DDEBUGGING to your ccflags variable in config.sh so that you can use perl -D to access perl's internal
state. (Note: Configure will only add -DDEBUGGING by default if you are not reusing your old config.sh.
If you want to reuse your old config.sh, then you can just edit it and change the optimize and ccflags
variables by hand and then propagate your changes as shown in "Propagating your changes to config.sh"
below.)
You can actually specify -g and -DDEBUGGING independently, but usually it's convenient to have both.
If you are using a shared libperl, see the warnings about multiple versions of perl under
Building a shared libperl.so Perl library.
Other Compiler Flags
For most users, all of the Configure defaults are fine. However, you can change a number of factors in the
way perl is built by adding appropriate -D directives to your ccflags variable in config.sh.
For example, you can replace the rand() and srand() functions in the perl source by any other random
number generator by a trick such as the following (this should all be on one line):
sh Configure -Dccflags='-Dmy_rand=random -Dmy_srand=srandom' \
-Drandbits=31
or you can use the drand48 family of functions with
sh Configure -Dccflags='-Dmy_rand=lrand48 -Dmy_srand=srand48' \
-Drandbits=31
or by adding the -D flags to your ccflags at the appropriate Configure prompt. (Read pp.c to see how this
works.)
You should also run Configure interactively to verify that a hint file doesn't inadvertently override your
ccflags setting. (Hints files shouldn't do that, but some might.)
What if it doesn't work?
Running Configure Interactively
If Configure runs into trouble, remember that you can always run Configure interactively so that you
can check (and correct) its guesses.
All the installation questions have been moved to the top, so you don't have to wait for them. Once
you've handled them (and your C compiler and flags) you can type &-d at the next Configure prompt
and Configure will use the defaults from then on.
If you find yourself trying obscure command line incantations and config.over tricks, I recommend you
run Configure interactively instead. You'll probably save yourself time in the long run.
Hint files
The perl distribution includes a number of system−specific hints files in the hints/ directory. If one of
them matches your system, Configure will offer to use that hint file.
Several of the hint files contain additional important information. If you have any problems, it is a
good idea to read the relevant hint file for further information. See hints/solaris_2.sh for an extensive
example. More information about writing good hints is in the hints/README.hints file.
*** WHOA THERE!!! ***
Occasionally, Configure makes a wrong guess. For example, on SunOS 4.1.3, Configure incorrectly
concludes that tzname[] is in the standard C library. The hint file is set up to correct for this. You will
see a message:
*** WHOA THERE!!! ***
The recommended value for $d_tzname on this machine was "undef"!
Keep the recommended value? [y]
You should always keep the recommended value unless, after reading the relevant section of the hint
file, you are sure you want to try overriding it.
If you are re−using an old config.sh, the word "previous" will be used instead of "recommended".
Again, you will almost always want to keep the previous value, unless you have changed something on
your system.
For example, suppose you have added libgdbm.a to your system and you decide to reconfigure perl to
use GDBM_File. When you run Configure again, you will need to add −lgdbm to the list of libraries.
Now, Configure will find your gdbm include file and library and will issue a message:
*** WHOA THERE!!! ***
The previous value for $i_gdbm on this machine was "undef"!
Keep the previous value? [y]
In this case, you do not want to keep the previous value, so you should answer ‘n’. (You‘ll also have
to manually add GDBM_File to the list of dynamic extensions to build.)
Changing Compilers
If you change compilers or make other significant changes, you should probably not re−use your old
config.sh. Simply remove it or rename it, e.g. mv config.sh config.sh.old. Then rerun Configure with
the options you want to use.
This is a common source of problems. If you change from cc to gcc, you should almost always
remove your old config.sh.
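The rename step can be wrapped so that it is harmless to repeat. This is only a sketch; set_aside_config is a hypothetical helper name, and the Configure invocation in the trailing comment is one example of "the options you want to use".

```sh
#!/bin/sh
# set_aside_config [DIR]: rename DIR/config.sh to DIR/config.sh.old so that
# the next Configure run starts from scratch. DIR defaults to the current
# directory. (Hypothetical helper, not shipped with perl.)
set_aside_config() {
    dir=${1:-.}
    if [ -f "$dir/config.sh" ]; then
        mv "$dir/config.sh" "$dir/config.sh.old"
        echo "moved $dir/config.sh to $dir/config.sh.old"
    else
        echo "no config.sh in $dir; nothing to do"
    fi
}

# Typical use from the top of the perl source tree when switching to gcc:
#   set_aside_config && sh Configure -Dcc=gcc
```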
Propagating your changes to config.sh
If you make any changes to config.sh, you should propagate them to all the .SH files by running
sh Configure −S
You will then have to rebuild by running
make depend
make
config.over
You can also supply a shell script config.over to over−ride Configure‘s guesses. It will get loaded up
at the very end, just before config.sh is created. You have to be careful with this, however, as
Configure does no checking that your changes make sense. See the section on
"Changing the installation directory" for an example.
config.h
Many of the system dependencies are contained in config.h. Configure builds config.h by running the
config_h.SH script. The values for the variables are taken from config.sh.
If there are any problems, you can edit config.h directly. Beware, though, that the next time you run
Configure, your changes will be lost.
cflags
If you have any additional changes to make to the C compiler command line, they can be made in
cflags.SH. For instance, to turn off the optimizer on toke.c, find the line in the switch structure for
toke.c and put the command optimize=‘−g’ before the ;; . You can also edit cflags directly, but beware
that your changes will be lost the next time you run Configure.
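To make the shape of such a per-file override concrete, here is a stand-alone sketch of a case switch in the style described above. It is not a copy of cflags.SH: the wrapper function flags_for and the default value -O are assumptions made so that the fragment can be run on its own.

```sh
#!/bin/sh
# flags_for FILE: print the optimizer setting that would apply to FILE.
# Hypothetical wrapper; in cflags.SH the switch is inline, not a function.
flags_for() {
    file=$1
    optimize='-O'            # assumed global default for this sketch
    case "$file" in
    toke) optimize='-g' ;;   # per-file override, placed before the ;;
    *) ;;                    # every other file keeps the default
    esac
    echo "$file: optimize=$optimize"
}

flags_for toke
flags_for pp
```

Only toke gets the override here; any file without its own assignment falls through with the default.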
To explore various ways of changing ccflags from within a hint file, see the file hints/README.hints.
To change the C flags for all the files, edit config.sh and change either $ccflags or $optimize,
and then re−run
sh Configure −S
make depend
No sh
If you don‘t have sh, you‘ll have to copy the sample file Porting/config_H to config.h and edit the
config.h to reflect your system‘s peculiarities. You‘ll probably also have to extensively modify the
extension building mechanism.
Porting information
Specific information for the OS/2, Plan9, VMS and Win32 ports is in the corresponding README
files and subdirectories. Additional information, including a glossary of all those config.sh variables,
is in the Porting subdirectory.
Ports for other systems may also be available. You should check out http://www.perl.com/CPAN/ports
for current information on ports to various other operating systems.
make depend
This will look for all the includes. The output is stored in makefile. The only difference between Makefile
and makefile is the dependencies at the bottom of makefile. If you have to make any changes, you should
edit makefile, not Makefile since the Unix make command reads makefile first. (On non−Unix systems, the
output may be stored in a different file. Check the value of $firstmakefile in your config.sh if in
doubt.)
Configure will offer to do this step for you, so it isn‘t listed explicitly above.
make
This will attempt to make perl in the current directory.
If you can‘t compile successfully, try some of the following ideas. If none of them help, and careful reading
of the error message and the relevant manual pages on your system doesn‘t help, you can send a message to
either the comp.lang.perl.misc newsgroup or to perlbug@perl.com with an accurate description of your
problem. See "Reporting Problems" below.
hints
If you used a hint file, try reading the comments in the hint file for further tips and information.
extensions
If you can successfully build miniperl, but the process crashes during the building of extensions, you
should run
make minitest
to test your version of miniperl.
locale
If you have any locale−related environment variables set, try unsetting them. I have some reports that
some versions of IRIX hang while running ./miniperl configpm with locales other than the C locale.
See the discussion under "make test" below about locales and the whole "Locale problems" section in
the file pod/perllocale.pod. The latter is especially useful if you see something like this
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LC_ALL = "En_US",
LANG = (unset)
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
at Perl startup.
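One way to try a build step with the locale variables unset, without disturbing your login environment, is to clear them in a subshell. A minimal sketch; run_without_locale is a hypothetical helper name, and the list of variables is the usual POSIX set, which may not be complete for your system.

```sh
#!/bin/sh
# run_without_locale CMD [ARGS...]: run CMD with the common locale variables
# removed from its environment. The parentheses create a subshell, so the
# calling shell keeps its own settings. (Hypothetical helper.)
run_without_locale() {
    (
        unset LANG LC_ALL LC_CTYPE LC_COLLATE LC_MESSAGES \
              LC_MONETARY LC_NUMERIC LC_TIME
        "$@"
    )
}

# Example:
#   run_without_locale make
```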
malloc duplicates
If you get duplicates upon linking for malloc et al, add −DEMBEDMYMALLOC to your ccflags
variable in config.sh.
varargs
If you get varargs problems with gcc, be sure that gcc is installed correctly and that you are not passing
−I/usr/include to gcc. When using gcc, you should probably have i_stdarg=‘define’ and
i_varargs=‘undef’ in config.sh. The problem is usually solved by running fixincludes correctly. If you
do change config.sh, don't forget to propagate your changes (see
"Propagating your changes to config.sh" above). See also the "vsprintf" item below.
util.c
If you get error messages such as the following (the exact line numbers and function name may vary in
different versions of perl):
util.c: In function ‘Perl_form’:
util.c:1107: number of arguments doesn’t match prototype
proto.h:125: prototype declaration
it might well be a symptom of the gcc "varargs problem". See the previous "varargs" item.
Solaris and SunOS dynamic loading
If you have problems with dynamic loading using gcc on SunOS or Solaris, and you are using GNU as
and GNU ld, you may need to add −B/bin/ (for SunOS) or −B/usr/ccs/bin/ (for Solaris) to your
$ccflags, $ldflags, and $lddlflags so that the system‘s versions of as and ld are used.
Note that the trailing ‘/’ is required. Alternatively, you can use the GCC_EXEC_PREFIX environment
variable to ensure that Sun‘s as and ld are used. Consult your gcc documentation for further
information on the −B option and the GCC_EXEC_PREFIX variable.
One convenient way to ensure you are not using GNU as and ld is to invoke Configure with
sh Configure −Dcc=’gcc −B/usr/ccs/bin/’
for Solaris systems. For a SunOS system, you must use −B/bin/ instead.
Alternatively, recent versions of GNU ld reportedly work if you include −Wl,−export−dynamic
in the ccdlflags variable in config.sh.
ld.so.1: ./perl: fatal: relocation error:
If you get this message on SunOS or Solaris, and you‘re using gcc, it‘s probably the GNU as or GNU
ld problem in the previous item "Solaris and SunOS dynamic loading".
LD_LIBRARY_PATH
If you run into dynamic loading problems, check your setting of the LD_LIBRARY_PATH
environment variable. If you‘re creating a static Perl library (libperl.a rather than libperl.so) it should
build fine with LD_LIBRARY_PATH unset, though that may depend on details of your local set−up.
dlopen: stub interception failed
The primary cause of the ‘dlopen: stub interception failed’ message is that the LD_LIBRARY_PATH
environment variable includes a directory which is a symlink to /usr/lib (such as /lib).
The reason this causes a problem is quite subtle. The file libdl.so.1.0 actually *only* contains
functions which generate 'stub interception failed' errors! The runtime linker intercepts links to
"/usr/lib/libdl.so.1.0" and links in an internal implementation of those functions instead. [Thanks to Tim
Bunce for this explanation.]
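A quick way to look for the situation described above is to walk the entries of LD_LIBRARY_PATH and report any that are symbolic links. This is a sketch only: find_symlinked_dirs is a hypothetical helper, and it reports every symlinked entry rather than checking specifically whether the link resolves to /usr/lib.

```sh
#!/bin/sh
# find_symlinked_dirs [PATHLIST]: print each entry of a colon-separated
# directory list (default: $LD_LIBRARY_PATH) that is itself a symlink.
find_symlinked_dirs() {
    old_ifs=$IFS
    IFS=:
    # Unquoted on purpose so that the list is split on colons.
    for dir in ${1-${LD_LIBRARY_PATH-}}; do
        [ -n "$dir" ] || continue
        if [ -L "$dir" ]; then
            printf '%s is a symlink\n' "$dir"
        fi
    done
    IFS=$old_ifs
}

# Check the current environment:
find_symlinked_dirs
```

Any entry it reports is worth inspecting with ls -ld to see where it points.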
nm extraction
If Configure seems to be having trouble finding library functions, try not using nm extraction. You
can do this from the command line with
sh Configure −Uusenm
or by answering the nm extraction question interactively. If you have previously run Configure, you
should not reuse your old config.sh.
umask not found
If the build process encounters errors relating to umask(), the problem is probably that Configure
couldn‘t find your umask() system call. Check your config.sh. You should have d_umask=‘define’.
If you don‘t, this is probably the "nm extraction" problem discussed above. Also, try reading the hints
file for your system for further information.
vsprintf
If you run into problems with vsprintf in compiling util.c, the problem is probably that Configure failed
to detect your system‘s version of vsprintf(). Check whether your system has vprintf().
(Virtually all modern Unix systems do.) Then, check the variable d_vprintf in config.sh. If your
system has vprintf, it should be:
d_vprintf=’define’
If Configure guessed wrong, it is likely that Configure guessed wrong on a number of other common
functions too. This is probably the "nm extraction" problem discussed above.
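Checks like the ones for d_umask and d_vprintf can be done in one pass. The sketch below assumes config.sh stores each setting as name='value' on its own line; report_undefined is a hypothetical helper name.

```sh
#!/bin/sh
# report_undefined CONFIG VAR...: for each VAR, print a note unless CONFIG
# contains the line VAR='define'. (Hypothetical helper.)
report_undefined() {
    cfg=$1
    shift
    for var in "$@"; do
        if ! grep -q "^$var='define'" "$cfg"; then
            echo "$var is not 'define' in $cfg"
        fi
    done
}

# Example use from the top of the source tree:
if [ -f ./config.sh ]; then
    report_undefined ./config.sh d_umask d_vprintf
fi
```

If several common functions are reported at once, that points toward the "nm extraction" problem rather than toward a genuinely missing function.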
do_aspawn
If you run into problems relating to do_aspawn or do_spawn, the problem is probably that Configure
failed to detect your system's fork() function. Follow the procedure in the item on
"nm extraction" above.
__inet_* errors
If you receive unresolved symbol errors during Perl build and/or test referring to __inet_* symbols,
check to see whether BIND 8.1 is installed. It installs a /usr/local/include/arpa/inet.h that refers to
these symbols. Versions of BIND later than 8.1 do not install inet.h in that location and avoid the
errors. You should probably update to a newer version of BIND. If you can‘t, you can either link with
the updated resolver library provided with BIND 8.1 or rename /usr/local/include/arpa/inet.h during the
Perl build and test process to avoid the problem.
Optimizer
If you can‘t compile successfully, try turning off your compiler‘s optimizer. Edit config.sh and change
the line
optimize=’−O’
to
optimize=’ ’
then propagate your changes with sh Configure −S and rebuild with make depend; make.
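If you would rather not edit config.sh by hand, the change can be scripted. A minimal sketch, assuming the optimize setting sits on one line beginning with optimize=; set_optimize is a hypothetical helper, and the value you pass must not contain the characters | or &, which are special to the sed expression used here.

```sh
#!/bin/sh
# set_optimize VALUE [CONFIG]: rewrite the optimize= line of CONFIG
# (default: ./config.sh), keeping the previous file as CONFIG.bak.
set_optimize() {
    value=$1
    cfg=${2:-./config.sh}
    cp "$cfg" "$cfg.bak" &&
    sed "s|^optimize=.*|optimize='$value'|" "$cfg.bak" > "$cfg"
}

# Example, from the top of the source tree:
#   set_optimize ' ' && sh Configure -S && make depend && make
```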
CRIPPLED_CC
If you still can‘t compile successfully, try adding a −DCRIPPLED_CC flag. (Just because you get no
errors doesn‘t mean it compiled right!) This simplifies some complicated expressions for compilers
that get indigestion easily.
Missing functions
If you have missing routines, you probably need to add some library or other, or you need to undefine
some feature that Configure thought was there but is defective or incomplete. Look through config.h
for likely suspects. If Configure guessed wrong on a number of functions, you might have the
"nm extraction" problem discussed above.
toke.c
Some compilers will not compile or optimize the larger files (such as toke.c) without some extra
switches to use larger jump offsets or allocate larger internal tables. You can customize the switches
for each file in cflags. It‘s okay to insert rules for specific files into makefile since a default rule only
takes effect in the absence of a specific rule.
Missing dbmclose
SCO prior to 3.2.4 may be missing dbmclose(). An upgrade to 3.2.4 that includes libdbm.nfs
(which includes dbmclose()) may be available.
Note (probably harmless): No library found for −lsomething
If you see such a message during the building of an extension, but the extension passes its tests anyway
(see "make test" below), then don‘t worry about the warning message. The extension Makefile.PL
goes looking for various libraries needed on various systems; few systems will need all the possible
libraries listed. For example, a system may have −lcposix or −lposix, but it‘s unlikely to have both, so
most users will see warnings for the one they don‘t have. The phrase ‘probably harmless’ is intended
to reassure you that nothing unusual is happening, and the build process is continuing.
On the other hand, if you are building GDBM_File and you get the message
Note (probably harmless): No library found for −lgdbm
then it‘s likely you‘re going to run into trouble somewhere along the line, since it‘s hard to see how
you can use the GDBM_File extension without the −lgdbm library.
It is true that, in principle, Configure could have figured all of this out, but Configure and the extension
building process are not quite that tightly coordinated.
sh: ar: not found
This is a message from your shell telling you that the command ‘ar’ was not found. You need to check
your PATH environment variable to make sure that it includes the directory with the ‘ar’ command.
This is a common problem on Solaris, where ‘ar’ is in the /usr/ccs/bin directory.
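The PATH check can be scripted so that the directory is added only when the command is really missing. A sketch; add_dir_if_missing is a hypothetical helper name, and /usr/ccs/bin is the Solaris location mentioned above.

```sh
#!/bin/sh
# add_dir_if_missing CMD DIR: if CMD cannot be found on PATH, append DIR to
# PATH and export it. Does nothing when CMD is already available.
add_dir_if_missing() {
    if ! command -v "$1" >/dev/null 2>&1; then
        PATH=$PATH:$2
        export PATH
    fi
}

# Example for Solaris; harmless elsewhere, since a nonexistent PATH entry
# is simply skipped during command lookup.
add_dir_if_missing ar /usr/ccs/bin
```

Run this in the same shell session that will run make, since the PATH change does not carry over to other sessions.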
db−recno failure on tests 51, 53 and 55
Old versions of the DB library (including the DB library which comes with FreeBSD 2.1) had broken
handling of recno databases with modified bval settings. Upgrade your DB library or OS.
Bad arg length for semctl, is XX, should be ZZZ
If you get this error message from the lib/ipc_sysv test, your System V IPC may be broken. The XX
typically is 20, and that is what ZZZ also should be. Consider upgrading your OS, or reconfiguring
your OS to include the System V semaphores.
lib/ipc_sysv........semget: No space left on device
Either your account or the whole system has run out of semaphores. Or both. Either list the
semaphores with "ipcs" and remove the unneeded ones (which ones these are depends on your system
and applications) with "ipcrm -s SEMAPHORE_ID_HERE" or configure more semaphores into your
system.
Miscellaneous
Some additional things that have been reported for either perl4 or perl5:
Genix may need to use libc rather than libc_s, or #undef VARARGS.
NCR Tower 32 (OS 2.01.01) may need −W2,−Sl,2000 and #undef MKDIR.
UTS may need one or more of −DCRIPPLED_CC, −K or −g, and undef LSTAT.
FreeBSD can fail the lib/ipc_sysv.t test if SysV IPC has not been configured into the kernel. Perl tries to
detect this, though, and you will get a message telling you what to do.
If you get syntax errors on '(', try -DCRIPPLED_CC.
Machines with half-implemented dbm routines will need to #undef I_ODBM.
make test
This will run the regression tests on the perl you just made (you should run plain ‘make’ before ‘make test’
otherwise you won‘t have a complete build). If ‘make test’ doesn‘t say "All tests successful" then something
went wrong. See the file t/README in the t subdirectory.
Note that you can‘t run the tests in background if this disables opening of /dev/tty. You can use ‘make
test−notty’ in that case but a few tty tests will be skipped.
What if make test doesn‘t work?
If make test bombs out, just cd to the t directory and run ./TEST by hand to see if it makes any difference. If
individual tests bomb, you can run them by hand, e.g.,
./perl op/groups.t
Another way to get more detailed information about failed tests and individual subtests is to cd to the t
directory and run
./perl harness
(this assumes that most basic tests succeed, since harness uses complicated constructs).
You should also read the individual tests to see if there are any helpful comments that apply to your system.
locale
Note: One possible reason for errors is that some external programs may be broken due to the
combination of your environment and the way make test exercises them. For example, this may
happen if you have one or more of these environment variables set: LC_ALL LC_CTYPE
LC_COLLATE LANG. In some versions of UNIX, the non−English locales are known to cause
programs to exhibit mysterious errors.
If you have any of the above environment variables set, please try
setenv LC_ALL C
(for C shell) or
LC_ALL=C;export LC_ALL
(for Bourne or Korn shell) from the command line and then retry make test. If the tests then succeed,
you may have a broken program that is confusing the testing. Please run the troublesome test by hand
as shown above and see whether you can locate the program. Look for things like: exec, `backquoted
command`, system, open("|...") or open("...|"). All these mean that Perl is trying to run some external
program.
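A rough first pass over a test script can be made with grep. The sketch below is deliberately crude: it matches the words exec and system anywhere on a line, any backquote, and an open( call followed later on the line by a pipe character, so it will produce false positives and may miss unusual spellings. find_external_calls is a hypothetical helper name.

```sh
#!/bin/sh
# find_external_calls FILE: print, with line numbers, the lines of FILE that
# look as though they run an external program.
find_external_calls() {
    grep -n -e 'exec' -e 'system' -e '`' -e 'open *(.*|' "$1"
}

# Example, from the t directory:
if [ -f op/groups.t ]; then
    find_external_calls op/groups.t || true
fi
```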
Out of memory
On some systems, particularly those with smaller amounts of RAM, some of the tests in t/op/pat.t may
fail with an "Out of memory" message. Specifically, in perl5.004_64, tests 74 and 78 have been
reported to fail on some systems. On my SparcStation IPC with 8 MB of RAM, test 78 will fail if the
system is running any other significant tasks at the same time.
Try stopping other jobs on the system and then running the test by itself:
cd t; ./perl op/pat.t
to see if you have any better luck. If your perl still fails this test, it does not necessarily mean you have
a broken perl. This test tries to exercise the regular expression subsystem quite thoroughly, and may
well be far more demanding than your normal usage.
make install
This will put perl into the public directory you specified to Configure; by default this is /usr/local/bin. It will
also try to put the man pages in a reasonable place. It will not nroff the man pages, however. You may need
to be root to run make install. If you are not root, you must own the directories in question and you should
ignore any messages about chown not working.
Installing perl under different names
If you want to install perl under a name other than "perl" (for example, when installing perl with special
features enabled, such as debugging), indicate the alternate name on the "make install" line, such as:
make install PERLNAME=myperl
Installed files
If you want to see exactly what will happen without installing anything, you can run
./perl installperl −n
./perl installman −n
make install will install the following:
perl,
    perl5.nnn       where nnn is the current release number. This
                    will be a link to perl.
suidperl,
    sperl5.nnn      If you requested setuid emulation.
a2p                 awk-to-perl translator
cppstdin            This is used by perl -P, if your cc -E can't
                    read from stdin.
c2ph, pstruct       Scripts for handling C structures in header files.
s2p                 sed-to-perl translator
find2perl           find-to-perl translator
h2ph                Extract constants and simple macros from C headers
h2xs                Converts C .h header files to Perl extensions.
perlbug             Tool to report bugs in Perl.
perldoc             Tool to read perl's pod documentation.
pl2pm               Convert Perl 4 .pl files to Perl 5 .pm modules
pod2html,           Converters from perl's pod documentation format
    pod2latex,      to other useful formats.
    pod2man, and
    pod2text
splain              Describe Perl warnings and errors
library files       in $privlib and $archlib specified to
                    Configure, usually under /usr/local/lib/perl5/.
man pages           in the location specified to Configure, usually
                    something like /usr/local/man/man1.
module              in the location specified to Configure, usually
man pages           under /usr/local/lib/perl5/man/man3.
pod/*.pod           in $privlib/pod/.
Installperl will also create the library directories $sitelib and $sitearch listed in config.sh. Usually,
these are something like
/usr/local/lib/perl5/site_perl/5.005
/usr/local/lib/perl5/site_perl/5.005/archname
where archname is something like sun4−sunos. These directories will be used for installing extensions.
Perl‘s *.h header files and the libperl.a library are also installed under $archlib so that any user may later
build new extensions, run the optional Perl compiler, or embed the perl interpreter into another program even
if the Perl source is no longer available.
Coexistence with earlier versions of perl5
WARNING: The upgrade from 5.004_0x to 5.005 is going to be a bit tricky. See
"Upgrading from 5.004 to 5.005" below.
In general, you can usually safely upgrade from one version of Perl (e.g. 5.004_04) to another similar version
(e.g. 5.004_05) without re−compiling all of your add−on extensions. You can also safely leave the old
version around in case the new version causes you problems for some reason. For example, if you want to be
sure that your script continues to run with 5.004_04, simply replace the ‘#!/usr/local/bin/perl’ line at the top
of the script with the particular version you want to run, e.g. #!/usr/local/bin/perl5.00404.
Most extensions will probably not need to be recompiled to use with a newer version of perl. Here is how it
is supposed to work. (These examples assume you accept all the Configure defaults.)
The directories searched by version 5.005 will be
Configure variable    Default value
$archlib              /usr/local/lib/perl5/5.005/archname
$privlib              /usr/local/lib/perl5/5.005
$sitearch             /usr/local/lib/perl5/site_perl/5.005/archname
$sitelib              /usr/local/lib/perl5/site_perl/5.005
while the directories searched by version 5.005_01 will be
$archlib              /usr/local/lib/perl5/5.00501/archname
$privlib              /usr/local/lib/perl5/5.00501
$sitearch             /usr/local/lib/perl5/site_perl/5.005/archname
$sitelib              /usr/local/lib/perl5/site_perl/5.005
When you install an add−on extension, it gets installed into $sitelib (or $sitearch if it is
architecture−specific). This directory deliberately does NOT include the sub−version number (01) so that
both 5.005 and 5.005_01 can use the extension. Only when a perl version changes to break backwards
compatibility will the default suggestions for the $sitearch and $sitelib version numbers be
increased.
However, if you do run into problems, and you want to continue to use the old version of perl along with
your extension, move those extension files to the appropriate version directory, such as $privlib (or
$archlib). (The extension‘s .packlist file lists the files installed with that extension. For the Tk
extension, for example, the list of files installed is in $sitearch/auto/Tk/.packlist.) Then use
your newer version of perl to rebuild and re−install the extension into $sitelib. This way, Perl 5.005
will find your files in the 5.005 directory, and newer versions of perl will find your newer extension in the
$sitelib directory. (This is also why perl searches the site−specific libraries last.)
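Before moving an extension's files, it helps to see which of the paths recorded in its .packlist still exist. A minimal sketch, assuming the .packlist holds one path per line with nothing else on the line; packlist_files is a hypothetical helper name.

```sh
#!/bin/sh
# packlist_files PACKLIST: print each path listed in PACKLIST that still
# exists on disk, one per line.
packlist_files() {
    while IFS= read -r path; do
        [ -e "$path" ] && printf '%s\n' "$path"
    done < "$1"
    return 0
}

# Example, with $sitearch taken from config.sh:
#   packlist_files "$sitearch/auto/Tk/.packlist"
```

The output can then be reviewed by hand before any files are moved into the version-specific directory.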
Alternatively, if you are willing to reinstall all your extensions every time you upgrade perl, then you can
include the subversion number in $sitearch and $sitelib when you run Configure.
Maintaining completely separate versions
Many users prefer to keep all versions of perl in completely separate directories. One convenient way to do
this is by using a separate prefix for each version, such as
sh Configure −Dprefix=/opt/perl5.004
and adding /opt/perl5.004/bin to the shell PATH variable. Such users may also wish to add a symbolic link
/usr/local/bin/perl so that scripts can still start with #!/usr/local/bin/perl.
Others might share a common directory for maintenance sub−versions (e.g. 5.004 for all 5.004_0x versions),
but change directory with each major version.
If you are installing a development subversion, you probably ought to seriously consider using a separate
directory, since development subversions may not have all the compatibility wrinkles ironed out yet.
Upgrading from 5.004 to 5.005
Extensions built and installed with versions of perl prior to 5.004_50 will need to be recompiled to be used
with 5.004_50 and later. You will, however, be able to continue using 5.004 even after you install 5.005.
The 5.004 binary will still be able to find the extensions built under 5.004; the 5.005 binary will look in the
new $sitearch and $sitelib directories, and will not find them.
Coexistence with perl4
You can safely install perl5 even if you want to keep perl4 around.
By default, the perl5 libraries go into /usr/local/lib/perl5/, so they don‘t override the perl4 libraries in
/usr/local/lib/perl/.
In your /usr/local/bin directory, you should have a binary named perl4.036. That will not be touched by the
perl5 installation process. Most perl4 scripts should run just fine under perl5. However, if you have any
scripts that require perl4, you can replace the #! line at the top of them by #!/usr/local/bin/perl4.036 (or
whatever the appropriate pathname is). See pod/perltrap.pod for possible problems running perl4 scripts
under perl5.
cd /usr/include; h2ph *.h sys/*.h
Some perl scripts need to be able to obtain information from the system header files. This command will
convert the most commonly used header files in /usr/include into files that can be easily interpreted by perl.
These files will be placed in the architecture−dependent library ($archlib) directory you specified to
Configure.
Note: Due to differences in the C and perl languages, the conversion of the header files is not perfect. You
will probably have to hand−edit some of the converted files to get them to parse correctly. For example,
h2ph breaks spectacularly on type casting and certain structures.
installhtml --help
Some sites may wish to make perl documentation available in HTML format. The installhtml utility can be
used to convert pod documentation into linked HTML files and install them.
The following command−line is an example of one used to convert perl documentation:
./installhtml \
    --podroot=. \
    --podpath=lib:ext:pod:vms \
    --recurse \
    --htmldir=/perl/nmanual \
    --htmlroot=/perl/nmanual \
    --splithead=pod/perlipc \
    --splititem=pod/perlfunc \
    --libpods=perlfunc:perlguts:perlvar:perlrun:perlop \
    --verbose
See the documentation in installhtml for more details. It can take many minutes to execute a large
installation and you should expect to see warnings like "no title", "unexpected directive" and "cannot
resolve" as the files are processed. We are aware of these problems (and would welcome patches for them).
You may find it helpful to run installhtml twice. That should reduce the number of "cannot resolve"
warnings.
cd pod && make tex && (process the latex files)
Some sites may also wish to make the documentation in the pod/ directory available in TeX format. Type
(cd pod && make tex && <process the latex files>)
Reporting Problems
If you have difficulty building perl, and none of the advice in this file helps, and careful reading of the error
message and the relevant manual pages on your system doesn‘t help either, then you should send a message
to either the comp.lang.perl.misc newsgroup or to perlbug@perl.com with an accurate description of your
problem.
Please include the output of the ./myconfig shell script that comes with the distribution. Alternatively, you
can use the perlbug program that comes with the perl distribution, but you need to have perl compiled before
you can use it. (If you have not installed it yet, you need to run ./perl −Ilib utils/perlbug
instead of a plain perlbug.)
You might also find helpful information in the Porting directory of the perl distribution.
DOCUMENTATION
Read the manual entries before running perl. The main documentation is in the pod/ subdirectory and should
have been installed during the build process. Type man perl to get started. Alternatively, you can type
perldoc perl to use the supplied perldoc script. This is sometimes useful for finding things in the library
modules.
Under UNIX, you can produce a documentation book in postscript form, along with its table of contents, by
going to the pod/ subdirectory and running (either):
./roffitall -groff     # If you have GNU groff installed
./roffitall -psroff    # If you have psroff
This will leave you with two postscript files ready to be printed. (You may need to fix the roffitall command
to use your local troff set−up.)
Note that you must have performed the installation already before running the above, since the script collects
the installed files to generate the documentation.
AUTHOR
Original author: Andy Dougherty (doughera@lafayette.edu), borrowing very heavily from the original
README by Larry Wall, with lots of helpful feedback and additions from the perl5−porters@perl.org folks.
If you have problems, corrections, or questions, please see "Reporting Problems" above.
REDISTRIBUTION
This document is part of the Perl package and may be distributed under the same terms as perl itself.
If you are distributing a modified version of perl (perhaps as part of a larger package) please do modify these
installation instructions and the contact information to match your distribution.
LAST MODIFIED
$Id: INSTALL,v 1.42 1998/07/15 18:04:44 doughera Released $
perlfaq
Perl Programmers Reference Guide
perlfaq
NAME
perlfaq − frequently asked questions about Perl ($Date: 1998/08/05 12:09:32 $)
DESCRIPTION
This document is structured into the following sections:
perlfaq: Structural overview of the FAQ.
This document.
perlfaq1: General Questions About Perl
Very general, high−level information about Perl.
perlfaq2: Obtaining and Learning about Perl
Where to find source and documentation to Perl, support, and related matters.
perlfaq3: Programming Tools
Programmer tools and programming support.
perlfaq4: Data Manipulation
Manipulating numbers, dates, strings, arrays, hashes, and miscellaneous data issues.
perlfaq5: Files and Formats
I/O and the "f" issues: filehandles, flushing, formats and footers.
perlfaq6: Regexps
Pattern matching and regular expressions.
perlfaq7: General Perl Language Issues
General Perl language issues that don‘t clearly fit into any of the other sections.
perlfaq8: System Interaction
Interprocess communication (IPC), control over the user−interface (keyboard, screen and pointing
devices).
perlfaq9: Networking
Networking, the Internet, and a few on the web.
Where to get this document
This document is posted regularly to comp.lang.perl.announce and several other related newsgroups. It is
available in a variety of formats from CPAN in the /CPAN/doc/FAQs/FAQ/ directory, or on the web at
http://www.perl.com/perl/faq/ .
How to contribute to this document
You may mail corrections, additions, and suggestions to perlfaq−suggestions@perl.com . This alias should
not be used to ask FAQs. It‘s for fixing the current FAQ.
What will happen if you mail your Perl programming problems to the authors
Your questions will probably go unread, unless they‘re suggestions of new questions to add to the FAQ, in
which case they should have gone to perlfaq-suggestions@perl.com instead.
You should have read section 2 of this faq. There you would have learned that comp.lang.perl.misc is the
appropriate place to go for free advice. If your question is really important and you require a prompt and
correct answer, you should hire a consultant.
Credits
When I first began the Perl FAQ in the late 80s, I never realized it would have grown to over a hundred
pages, nor that Perl would ever become so popular and widespread. This document could not have been
written without the tremendous help provided by Larry Wall and the rest of the Perl Porters.
Author and Copyright Information
Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. All rights reserved.
Bundled Distributions
When included as part of the Standard Version of Perl, or as part of its complete documentation whether
printed or otherwise, this work may be distributed only under the terms of Perl‘s Artistic License. Any
distribution of this file or derivatives thereof outside of that package requires that special arrangements be
made with the copyright holder.
Irrespective of its distribution, all code examples in these files are hereby placed into the public domain.
You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit.
A simple comment in the code giving credit would be courteous but is not required.
Disclaimer
This information is offered in good faith and in the hope that it may be of use, but is not guaranteed to be
correct, up to date, or suitable for any particular purpose whatsoever. The authors accept no liability in
respect of this information or its use.
Changes
22/June/98
Significant changes throughout in preparation for the 5.005 release.
24/April/97
Style and whitespace changes from Chip, new question on reading one character at a time from a
terminal using POSIX from Tom.
23/April/97
Added http://www.oasis.leo.org/perl/ to perlfaq2. Style fix to perlfaq3. Added floating point
precision, fixed complex number arithmetic, cross−references, caveat for Text::Wrap, alternative
answer for initial capitalizing, fixed incorrect regexp, added example of Tie::IxHash to perlfaq4.
Added example of passing and storing filehandles, added commify to perlfaq5. Restored variable
suicide, and added mass commenting to perlfaq7. Added Net::Telnet, fixed backticks, added
reader/writer pair to telnet question, added FindBin, grouped module questions together in perlfaq8.
Expanded caveats for the simple URL extractor, gave LWP example, added CGI security question,
expanded on the mail address answer in perlfaq9.
25/March/97
Added more info to the binary distribution section of perlfaq2. Added Net::Telnet to perlfaq6. Fixed
typos in perlfaq8. Added mail sending example to perlfaq9. Added Merlyn‘s columns to perlfaq2.
18/March/97
Added the DATE to the NAME section, indicating which sections have changed.
Mentioned SIGPIPE and perlipc in the forking open answer in perlfaq8.
Fixed description of a regular expression in perlfaq4.
17/March/97 Version
Various typos fixed throughout.
Added new question on Perl BNF on perlfaq7.
Initial Release: 11/March/97
This is the initial release of version 3 of the FAQ; consequently there have been no changes since its
initial release.
NAME
perlfaq1 − General Questions About Perl ($Revision: 1.15 $, $Date: 1998/08/05 11:52:24 $)
DESCRIPTION
This section of the FAQ answers very general, high−level questions about Perl.
What is Perl?
Perl is a high−level programming language with an eclectic heritage written by Larry Wall and a cast of
thousands. It derives from the ubiquitous C programming language and to a lesser extent from sed, awk, the
Unix shell, and at least a dozen other tools and languages. Perl‘s process, file, and text manipulation facilities
make it particularly well−suited for tasks involving quick prototyping, system utilities, software tools,
system management tasks, database access, graphical programming, networking, and world wide web
programming. These strengths make it especially popular with system administrators and CGI script authors,
but mathematicians, geneticists, journalists, and even managers also use Perl. Maybe you should, too.
Who supports Perl? Who develops it? Why is it free?
The original culture of the pre−populist Internet and the deeply−held beliefs of Perl‘s author, Larry Wall,
gave rise to the free and open distribution policy of perl. Perl is supported by its users. The core, the
standard Perl library, the optional modules, and the documentation you‘re reading now were all written by
volunteers. See the personal note at the end of the README file in the perl source distribution for more
details. See perlhist (new as of 5.005) for Perl‘s milestone releases.
In particular, the core development team (known as the Perl Porters) are a rag−tag band of highly altruistic
individuals committed to producing better software for free than you could hope to purchase for money.
You may snoop on pending developments via news://genetics.upenn.edu/perl.porters-gw/ and
http://www.frii.com/~gnat/perl/porters/summary.html.
While the GNU project includes Perl in its distributions, there‘s no such thing as "GNU Perl". Perl is not
produced nor maintained by the Free Software Foundation. Perl‘s licensing terms are also more open than
GNU software‘s tend to be.
You can get commercial support of Perl if you wish, although for most users the informal support will more
than suffice. See the answer to "Where can I buy a commercial version of perl?" for more information.
Which version of Perl should I use?
You should definitely use version 5. Version 4 is old, limited, and no longer maintained; its last patch
(4.036) was in 1992. The most recent production release is 5.005_01. Further references to the Perl
language in this document refer to this production release unless otherwise specified. There may be one or
more official bug fixes for 5.005_01 by the time you read this, and also perhaps some experimental versions
on the way to the next release.
What are perl4 and perl5?
Perl4 and perl5 are informal names for different versions of the Perl programming language. It‘s easier to
say "perl5" than it is to say "the 5(.004) release of Perl", but some people have interpreted this to mean
there‘s a language called "perl5", which isn‘t the case. Perl5 is merely the popular name for the fifth major
release (October 1994), while perl4 was the fourth major release (March 1991). There was also a perl1 (in
January 1988), a perl2 (June 1988), and a perl3 (October 1989).
The 5.0 release is, essentially, a complete rewrite of the perl source code from the ground up. It has been
modularized, object−oriented, tweaked, trimmed, and optimized until it almost doesn‘t look like the old
code. However, the interface is mostly the same, and compatibility with previous releases is very high.
To avoid the "what language is perl5?" confusion, some people prefer to simply use "perl" to refer to the
latest version of perl and avoid using "perl5" altogether. It‘s not really that big a deal, though.
See perlhist for a history of Perl revisions.
How stable is Perl?
Production releases, which incorporate bug fixes and new functionality, are widely tested before release.
Since the 5.000 release, we have averaged only about one production release per year.
Larry and the Perl development team occasionally make changes to the internal core of the language, but all
possible efforts are made toward backward compatibility. While not quite all perl4 scripts run flawlessly
under perl5, an update to perl should nearly never invalidate a program written for an earlier version of perl
(barring accidental bug fixes and the rare new keyword).
Is Perl difficult to learn?
No, Perl is easy to start learning, and easy to keep learning. It looks like most programming languages
you're likely to have experience with, so if you've ever written a C program, an awk script, a shell script, or
even a BASIC program, you're already part way there.
Most tasks only require a small subset of the Perl language. One of the guiding mottos for Perl development
is "there‘s more than one way to do it" (TMTOWTDI, sometimes pronounced "tim toady"). Perl‘s learning
curve is therefore shallow (easy to learn) and long (there‘s a whole lot you can do if you really want).
Finally, Perl is (frequently) an interpreted language. This means that you can write your programs and test
them without an intermediate compilation step, allowing you to experiment and test/debug quickly and
easily. This ease of experimentation flattens the learning curve even more.
Things that make Perl easier to learn: Unix experience, almost any kind of programming experience, an
understanding of regular expressions, and the ability to understand other people‘s code. If there‘s something
you need to do, then it‘s probably already been done, and a working example is usually available for free.
Don‘t forget the new perl modules, either. They‘re discussed in Part 3 of this FAQ, along with the CPAN,
which is discussed in Part 2.
How does Perl compare with other languages like Java, Python, REXX, Scheme, or Tcl?
Favorably in some areas, unfavorably in others. Precisely which areas are good and bad is often a personal
choice, so asking this question on Usenet runs a strong risk of starting an unproductive Holy War.
Probably the best thing to do is try to write equivalent code to do a set of tasks. These languages have their
own newsgroups in which you can learn about (but hopefully not argue about) them.
Can I do [task] in Perl?
Perl is flexible and extensible enough for you to use on almost any task, from one−line file−processing tasks
to complex systems. For many people, Perl serves as a great replacement for shell scripting. For others, it
serves as a convenient, high−level replacement for most of what they‘d program in low−level languages like
C or C++. It‘s ultimately up to you (and possibly your management ...) which tasks you‘ll use Perl for and
which you won‘t.
If you have a library that provides an API, you can make any component of it available as just another Perl
function or variable using a Perl extension written in C or C++ and dynamically linked into your main perl
interpreter. You can also go the other direction, and write your main program in C or C++, and then link in
some Perl code on the fly, to create a powerful application.
That said, there will always be small, focused, special−purpose languages dedicated to a specific problem
domain that are simply more convenient for certain kinds of problems. Perl tries to be all things to all
people, but nothing special to anyone. Examples of specialized languages that come to mind include prolog
and matlab.
When shouldn‘t I program in Perl?
When your manager forbids it — but do consider replacing them :−).
Actually, one good reason is when you already have an existing application written in another language
that‘s all done (and done well), or you have an application language specifically designed for a certain task
(e.g. prolog, make).
For various reasons, Perl is probably not well−suited for real−time embedded systems, low−level operating
systems development work like device drivers or context−switching code, complex multithreaded
shared−memory applications, or extremely large applications. You‘ll notice that perl is not itself written in
Perl.
The new native−code compiler for Perl may reduce the limitations given in the previous statement to some
degree, but understand that Perl remains fundamentally a dynamically typed language, and not a statically
typed one. You certainly won't be chastised if you don't trust nuclear-plant or brain-surgery monitoring
code to it. And Larry will sleep easier, too, Wall Street programs notwithstanding. :-)
What‘s the difference between "perl" and "Perl"?
One bit. Oh, you weren‘t talking ASCII? :−) Larry now uses "Perl" to signify the language proper and "perl"
the implementation of it, i.e. the current interpreter. Hence Tom‘s quip that "Nothing but perl can parse
Perl." You may or may not choose to follow this usage. For example, parallelism means "awk and perl" and
"Python and Perl" look ok, while "awk and Perl" and "Python and perl" do not.
Is it a Perl program or a Perl script?
It doesn‘t matter.
In "standard terminology" a program has been compiled to physical machine code once, and can then be
run multiple times, whereas a script must be translated by a program each time it's used. Perl programs,
however, are usually neither strictly compiled nor strictly interpreted. They can be compiled to a byte code
form (something of a Perl virtual machine) or to completely different languages, like C or assembly
language. You can‘t tell just by looking whether the source is destined for a pure interpreter, a parse−tree
interpreter, a byte code interpreter, or a native−code compiler, so it‘s hard to give a definitive answer here.
What is a JAPH?
These are the "just another perl hacker" signatures that some people sign their postings with. About 100 of
the earlier ones are available from http://www.perl.com/CPAN/misc/japh .
Where can I get a list of Larry Wall witticisms?
Over a hundred quips by Larry, from postings of his or source code, can be found at
http://www.perl.com/CPAN/misc/lwall-quotes .
How can I convince my sysadmin/supervisor/employees to use version (5/5.005/Perl instead of
some other language)?
If your manager or employees are wary of unsupported software, or software which doesn't officially ship
with your operating system, you might try to appeal to their self-interest. If programmers can be more
productive using Perl's constructs, functionality, simplicity, and power, then the typical
manager/supervisor/employee may be persuaded. Regarding using Perl in general, it's also sometimes
helpful to point out that delivery times may be reduced using Perl, as compared to other languages.
If you have a project which has a bottleneck, especially in terms of translation or testing, Perl almost
certainly will provide a viable and quick solution. In conjunction with any persuasion effort, you should not
fail to point out that Perl is used, quite extensively, and with extremely reliable and valuable results, at many
large computer software and/or hardware companies throughout the world. In fact, many Unix vendors now
ship Perl by default, and support is usually just a news−posting away, if you can‘t find the answer in the
comprehensive documentation, including this FAQ.
If you face reluctance to upgrading from an older version of perl, then point out that version 4 is utterly
unmaintained and unsupported by the Perl Development Team. Another big sell for Perl5 is the large
number of modules and extensions which greatly reduce development time for any given task. Also mention
that the difference between version 4 and version 5 of Perl is like the difference between awk and C++.
(Well, ok, maybe not quite that distinct, but you get the idea.) If you want support and a reasonable
guarantee that what you‘re developing will continue to work in the future, then you have to run the supported
version. That probably means running the 5.005 release, although 5.004 isn‘t that bad (it‘s just one year and
one release behind). Several important bugs were fixed from the 5.000 through 5.003 versions, though, so
try upgrading past them if possible.
Of particular note is the massive bughunt for buffer overflow problems that went into the 5.004 release. All
releases prior to that, including perl4, are considered insecure and should be upgraded as soon as possible.
AUTHOR AND COPYRIGHT
Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. All rights reserved.
When included as an integrated part of the Standard Distribution of Perl or of its documentation (printed or
otherwise), this work is covered under Perl's Artistic License. For separate distributions of all or part of
this FAQ outside of that, see perlfaq.
Irrespective of its distribution, all code examples here are public domain. You are permitted and encouraged
to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A
simple comment in the code giving credit to the FAQ would be courteous but is not required.
NAME
perlfaq2 − Obtaining and Learning about Perl ($Revision: 1.25 $, $Date: 1998/08/05 11:47:25 $)
DESCRIPTION
This section of the FAQ answers questions about where to find source and documentation for Perl, support,
and related matters.
What machines support Perl? Where do I get it?
The standard release of Perl (the one maintained by the perl development team) is distributed only in source
code form. You can find this at http://www.perl.com/CPAN/src/latest.tar.gz, which is in standard Internet
format (a gzipped archive in POSIX tar format).
Perl builds and runs on a bewildering number of platforms. Virtually all known and current Unix derivatives
are supported (Perl‘s native platform), as are proprietary systems like VMS, DOS, OS/2, Windows, QNX,
BeOS, and the Amiga. There are also the beginnings of support for MPE/iX.
Binary distributions for some proprietary platforms, including Apple systems, can be found in the
http://www.perl.com/CPAN/ports/ directory. Because these are not part of the standard distribution, they
may and in fact do differ from the base Perl port in a variety of ways. You‘ll have to check their respective
release notes to see just what the differences are. These differences can be either positive (e.g. extensions for
the features of the particular platform that are not supported in the source release of perl) or negative (e.g.
might be based upon a less current source release of perl).
A useful FAQ for Win32 Perl users is
http://www.endcontsw.com/people/evangelo/Perl_for_Win32_FAQ.html
How can I get a binary version of Perl?
If you don‘t have a C compiler because for whatever reasons your vendor did not include one with your
system, the best thing to do is grab a binary version of gcc from the net and use that to compile perl with.
CPAN only has binaries for systems that are terribly hard to get free compilers for, not for Unix systems.
Your first stop should be http://www.perl.com/CPAN/ports to see what information is already available. A
simple installation guide for MS−DOS is available at http://www.cs.ruu.nl/~piet/perl5dos.html , and
similarly for Windows 3.1 at http://www.cs.ruu.nl/~piet/perlwin3.html .
I don‘t have a C compiler on my system. How can I compile perl?
Since you don‘t have a C compiler, you‘re doomed and your vendor should be sacrificed to the Sun gods.
But that doesn‘t help you.
What you need to do is get a binary version of gcc for your system first. Consult the Usenet FAQs for your
operating system for information on where to get such a binary version.
I copied the Perl binary from one machine to another, but scripts don‘t work.
That‘s probably because you forgot libraries, or library paths differ. You really should build the whole
distribution on the machine it will eventually live on, and then type make install. Most other
approaches are doomed to failure.
One simple way to check that things are in the right place is to print out the hard−coded @INC which perl is
looking for.
    perl -e 'print join("\n",@INC)'
If this command lists any paths which don‘t exist on your system, then you may need to move the
appropriate libraries to these locations, or create symlinks, aliases, or shortcuts appropriately.
You might also want to check out How do I keep my own module/library directory? in perlfaq8.
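
The check described above can also be scripted. The following is only a sketch: the helper name check_dirs is made up for this illustration and is not part of the perl distribution. It sorts the entries of @INC into directories that exist on this machine and ones that do not.

```perl
#!/usr/bin/perl -w
use strict;

# check_dirs is a hypothetical helper for this example only.
# It splits a list of paths into existing directories and everything else.
sub check_dirs {
    my @dirs = @_;
    my (@found, @missing);
    for my $dir (@dirs) {
        if (-d $dir) { push @found,   $dir }
        else         { push @missing, $dir }
    }
    return (\@found, \@missing);
}

# Report on every directory compiled into (or added to) @INC.
my ($found, $missing) = check_dirs(@INC);
print "ok       $_\n" for @$found;
print "MISSING  $_\n" for @$missing;
```

Any line marked MISSING is a candidate for the library move or symlink mentioned above. An entry that perl never needs can usually be left alone.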
I grabbed the sources and tried to compile but gdbm/dynamic loading/malloc/linking/... failed.
How do I make it work?
Read the INSTALL file, which is part of the source distribution. It describes in detail how to cope with most
idiosyncrasies that the Configure script can't work around for any given system or architecture.
What modules and extensions are available for Perl? What is CPAN? What does CPAN/src/...
mean?
CPAN stands for Comprehensive Perl Archive Network, a huge archive replicated on dozens of machines all
over the world. CPAN contains source code, non−native ports, documentation, scripts, and many
third−party modules and extensions, designed for everything from commercial database interfaces to
keyboard/screen control to web walking and CGI scripts. The master machine for CPAN is
ftp://ftp.funet.fi/pub/languages/perl/CPAN/, but you can use the address
http://www.perl.com/CPAN/CPAN.html to fetch a copy from a "site near you". See
http://www.perl.com/CPAN (without a slash at the end) for how this process works.
CPAN/path/... is a naming convention for files available on CPAN sites. CPAN indicates the base directory
of a CPAN mirror, and the rest of the path is the path from that directory to the file. For instance, if you‘re
using ftp://ftp.funet.fi/pub/languages/perl/CPAN as your CPAN site, the file CPAN/misc/japh is
downloadable as ftp://ftp.funet.fi/pub/languages/perl/CPAN/misc/japh .
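
As a rough illustration of this naming convention, the mapping from a CPAN/... name to a URL on a particular mirror might be written as below. The function name cpan_url is an assumption made for this sketch; it is not part of any CPAN tool.

```perl
#!/usr/bin/perl -w
use strict;

# cpan_url is a hypothetical helper for this example only.
# It maps a "CPAN/..." name onto the base directory of a chosen mirror.
sub cpan_url {
    my ($mirror, $name) = @_;
    $mirror =~ s{/+$}{};       # drop any trailing slash on the mirror base
    $name   =~ s{^CPAN/}{};    # "CPAN" stands for the mirror's base directory
    return "$mirror/$name";
}

print cpan_url("ftp://ftp.funet.fi/pub/languages/perl/CPAN", "CPAN/misc/japh"), "\n";
# prints ftp://ftp.funet.fi/pub/languages/perl/CPAN/misc/japh
```

Substituting a different mirror base gives the location of the same file on that mirror.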
Considering that there are hundreds of existing modules in the archive, one probably exists to do nearly
anything you can think of. Current categories under CPAN/modules/by−category/ include perl core modules;
development support; operating system interfaces; networking, devices, and interprocess communication;
data type utilities; database interfaces; user interfaces; interfaces to other languages; filenames, file systems,
and file locking; internationalization and locale; world wide web support; server and daemon utilities;
archiving and compression; image manipulation; mail and news; control flow utilities; filehandle and I/O;
Microsoft Windows modules; and miscellaneous modules.
Is there an ISO or ANSI certified version of Perl?
Certainly not. Larry expects that he‘ll be certified before Perl is.
Where can I get information on Perl?
The complete Perl documentation is available with the perl distribution. If you have perl installed locally,
you probably have the documentation installed as well: type man perl if you‘re on a system resembling
Unix. This will lead you to other important man pages, including how to set your $MANPATH. If you‘re not
on a Unix system, access to the documentation will be different; for example, it might be only in HTML
format. But all proper perl installations have fully−accessible documentation.
You might also try perldoc perl in case your system doesn‘t have a proper man command, or it‘s been
misinstalled. If that doesn‘t work, try looking in /usr/local/lib/perl5/pod for documentation.
If all else fails, consult the CPAN/doc directory, which contains the complete documentation in various
formats, including native pod, troff, html, and plain text. There‘s also a web page at
http://www.perl.com/perl/info/documentation.html that might help.
Many good books have been written about Perl — see the section below for more details.
What are the Perl newsgroups on USENET? Where do I post questions?
The now defunct comp.lang.perl newsgroup has been superseded by the following groups:
comp.lang.perl.announce     Moderated announcement group
comp.lang.perl.misc         Very busy group about Perl in general
comp.lang.perl.moderated    Moderated discussion group
comp.lang.perl.modules      Use and development of Perl modules
comp.lang.perl.tk           Using Tk (and X) from Perl
comp.infosystems.www.authoring.cgi
                            Writing CGI scripts for the Web.
Actually, the moderated group hasn‘t passed yet, but we‘re keeping our fingers crossed.
There is also a USENET gateway to the mailing list used by the crack Perl development team (perl5-porters)
at news://news.perl.com/perl.porters-gw/ .
Where should I post source code?
You should post source code to whichever group is most appropriate, but feel free to cross−post to
comp.lang.perl.misc. If you want to cross−post to alt.sources, please make sure it follows their posting
standards, including setting the Followup−To header line to NOT include alt.sources; see their FAQ for
details.
If you‘re just looking for software, first use Alta Vista, Deja News, and search CPAN. This is faster and
more productive than just posting a request.
Perl Books
A number of books on Perl and/or CGI programming are available. A few of these are good, some are ok,
but many aren‘t worth your money. Tom Christiansen maintains a list of these books, some with extensive
reviews, at http://www.perl.com/perl/critiques/index.html.
The incontestably definitive reference book on Perl, written by the creator of Perl, is now in its second
edition:
Programming Perl (the "Camel Book"):
Authors: Larry Wall, Tom Christiansen, and Randal Schwartz
ISBN 1-56592-149-6 (English)
ISBN 4-89052-384-7 (Japanese)
URL: http://www.oreilly.com/catalog/pperl2/
(French, German, Italian, and Hungarian translations also available)
The companion volume to the Camel containing thousands of real−world examples, mini−tutorials, and
complete programs (first premiering at the 1998 Perl Conference), is:
The Perl Cookbook (the "Ram Book"):
Authors: Tom Christiansen and Nathan Torkington,
with Foreword by Larry Wall
ISBN: 1−56592−243−3
URL: http://perl.oreilly.com/cookbook/
If you‘re already a hard−core systems programmer, then the Camel Book might suffice for you to learn Perl
from. But if you‘re not, check out:
Learning Perl (the "Llama Book"):
Authors: Randal Schwartz and Tom Christiansen
with Foreword by Larry Wall
ISBN: 1−56592−284−0
URL: http://www.oreilly.com/catalog/lperl2/
Despite the picture at the URL above, the second edition of the "Llama Book" really has a blue cover, and is
updated for the 5.004 release of Perl. Various foreign language editions are available, including Learning
Perl on Win32 Systems (the Gecko Book).
If you‘re not an accidental programmer, but a more serious and possibly even degreed computer scientist
who doesn‘t need as much hand−holding as we try to provide in the Llama or its defurred cousin the Gecko,
please check out the delightful book, Perl: The Programmer‘s Companion, written by Nigel Chapman.
You can order O‘Reilly books directly from O‘Reilly & Associates, 1−800−998−9938. Local/overseas is
1−707−829−0515. If you can locate an O‘Reilly order form, you can also fax to 1−707−829−0104. See
http://www.ora.com/ on the Web.
What follows is a list of the books that the FAQ authors found personally useful. Your mileage may (but, we
hope, probably won‘t) vary.
Recommended books on (or muchly on) Perl follow; those marked with a star may be ordered from
O‘Reilly.
References
    *Programming Perl
        by Larry Wall, Tom Christiansen, and Randal L. Schwartz
    *Perl 5 Desktop Reference
        by Johan Vromans
Tutorials
    *Learning Perl [2nd edition]
        by Randal L. Schwartz and Tom Christiansen
        with foreword by Larry Wall
    *Learning Perl on Win32 Systems
        by Randal L. Schwartz, Erik Olson, and Tom Christiansen,
        with foreword by Larry Wall
    Perl: The Programmer's Companion
        by Nigel Chapman
    Cross-Platform Perl
        by Eric F. Johnson
    MacPerl: Power and Ease
        by Vicki Brown and Chris Nandor, foreword by Matthias Neeracher
Task-Oriented
    *The Perl Cookbook
        by Tom Christiansen and Nathan Torkington
        with foreword by Larry Wall
    Perl5 Interactive Course [2nd edition]
        by Jon Orwant
    *Advanced Perl Programming
        by Sriram Srinivasan
    Effective Perl Programming
        by Joseph Hall
Special Topics
    *Mastering Regular Expressions
        by Jeffrey Friedl
    How to Set up and Maintain a World Wide Web Site [2nd edition]
        by Lincoln Stein
Perl in Magazines
The first and only periodical devoted to All Things Perl, The Perl Journal contains tutorials, demonstrations,
case studies, announcements, contests, and much more. TPJ has columns on web development, databases,
Win32 Perl, graphical programming, regular expressions, and networking, and sponsors the Obfuscated Perl
Contest. It is published quarterly under the gentle hand of its editor, Jon Orwant. See http://www.tpj.com/
or send mail to subscriptions@tpj.com.
Beyond this, magazines that frequently carry high-quality articles on Perl are Web Techniques (see
http://www.webtechniques.com/), Performance Computing (http://www.performance-computing.com/), and
Usenix's newsletter/magazine to its members, login:, at http://www.usenix.org/. Randal's Web Techniques
columns are available on the web at http://www.stonehenge.com/merlyn/WebTechniques/.
Perl on the Net: FTP and WWW Access
To get the best (and possibly cheapest) performance, pick a site from the list below and use it to grab the
complete list of mirror sites. From there you can find the quickest site for you. Remember, the following list
is not the complete list of CPAN mirrors.
http://www.perl.com/CPAN (redirects to another mirror)
http://www.perl.org/CPAN
ftp://ftp.funet.fi/pub/languages/perl/CPAN/
http://www.cs.ruu.nl/pub/PERL/CPAN/
ftp://ftp.cs.colorado.edu/pub/perl/CPAN/
What mailing lists are there for perl?
Most of the major modules (tk, CGI, libwww-perl) have their own mailing lists. Consult the documentation
that came with the module for subscription information. The following is a list of mailing lists related to
perl itself.
If you subscribe to a mailing list, it behooves you to know how to unsubscribe from it. Strident pleas to the
list itself to get you off will not be favorably received.
MacPerl
There is a mailing list for discussing Macintosh Perl. Contact "mac-perl-request@iis.ee.ethz.ch".
Also see the webpage of Matthias Neeracher (the creator and maintainer of MacPerl) at
http://www.iis.ee.ethz.ch/~neeri/macintosh/perl.html for many links to interesting MacPerl sites, and
the applications/MPW tools, precompiled.
Perl5−Porters
The core development team have a mailing list for discussing fixes and changes to the language. Send
mail to "perl5-porters-request@perl.org" with help in the body of the message for information on
subscribing.
NTPerl
This list is used to discuss issues involving Win32 Perl 5 (Windows NT and Win95). Subscribe by
mailing ListManager@ActiveWare.com with the message body:
    subscribe Perl-Win32-Users
The list software, also written in perl, will determine your address and subscribe you
automatically. To unsubscribe, mail the following in the message body to the same address:
    unsubscribe Perl-Win32-Users
You can also check http://www.activeware.com/ and select "Mailing Lists" to join or leave this list.
Perl−Packrats
Discussion related to archiving of perl materials, particularly the Comprehensive Perl Archive
Network (CPAN). Subscribe by emailing majordomo@cis.ufl.edu:
    subscribe perl-packrats
The list software, also written in perl, will determine your address and subscribe you
automatically. To unsubscribe, simply prefix the same command with "un", and mail it to the same
address like so:
    unsubscribe perl-packrats
Archives of comp.lang.perl.misc
Have you tried Deja News or Alta Vista?
ftp.cis.ufl.edu:/pub/perl/comp.lang.perl.*/monthly has an almost complete collection dating back to 12/89
(missing 08/91 through 12/93). They are kept as one large file for each month.
You'll probably want a more sophisticated query and retrieval mechanism than a file listing, preferably one
that allows you to retrieve articles using fast-access indices, keyed on at least author, date, subject, thread
(as in "trn") and probably keywords. The best solution the FAQ authors know of is the MH pick command,
but it is very slow to select on 18000 articles.
If you have the missing sections, or know where they can be found, please let
perlfaq-suggestions@perl.com know.
Where can I buy a commercial version of Perl?
In a sense, Perl already is commercial software: It has a licence that you can grab and carefully read to your
manager. It is distributed in releases and comes in well−defined packages. There is a very large user
community and an extensive literature. The comp.lang.perl.* newsgroups and several of the mailing lists
provide free answers to your questions in near real−time. Perl has traditionally been supported by Larry,
dozens of software designers and developers, and thousands of programmers, all working for free to create a
useful thing to make life better for everyone.
However, these answers may not suffice for managers who require a purchase order from a company whom
they can sue should anything go wrong. Or maybe they need very serious hand−holding and contractual
obligations. Shrink−wrapped CDs with perl on them are available from several sources if that will help.
Or you can purchase a real support contract. Although Cygnus historically provided this service, they no
longer sell support contracts for Perl. Instead, the Paul Ingram Group will be taking up the slack through The
Perl Clinic. The following is a commercial from them:
"Do you need professional support for Perl and/or Oraperl? Do you need a support contract with defined
levels of service? Do you want to pay only for what you need?
"The Paul Ingram Group has provided quality software development and support services to some of the
world‘s largest corporations for ten years. We are now offering the same quality support services for Perl at
The Perl Clinic. This service is led by Tim Bunce, an active perl porter since 1994 and well known as the
author and maintainer of the DBI, DBD::Oracle, and Oraperl modules and author/co−maintainer of The Perl
5 Module List. We also offer Oracle users support for Perl5 Oraperl and related modules (which Oracle is
planning to ship as part of Oracle Web Server 3). 20% of the profit from our Perl support work will be
donated to The Perl Institute."
For more information, contact The Perl Clinic:
    Tel:    +44 1483 424424
    Fax:    +44 1483 419419
    Web:    http://www.perl.co.uk/
    Email:  perl-support-info@perl.co.uk or Tim.Bunce@ig.co.uk
See also www.perl.com for updates on training and support.
Where do I send bug reports?
If you are reporting a bug in the perl interpreter or the modules shipped with perl, use the perlbug program in
the perl distribution or mail your report to perlbug@perl.com.
If you are posting a bug with a non−standard port (see the answer to "What platforms is Perl available for?"),
a binary distribution, or a non−standard module (such as Tk, CGI, etc), then please see the documentation
that came with it to determine the correct place to post bugs.
Read the perlbug(1) man page (perl5.004 or later) for more information.
What is perl.com? perl.org? The Perl Institute?
The perl.com domain is managed by Tom Christiansen, who created it as a public service long before
perl.org came about. Despite the name, it‘s a pretty non−commercial site meant to be a clearinghouse for
information about all things Perlian, accepting no paid advertisements, bouncy happy gifs, or silly java
applets on its pages. The Perl Home Page at http://www.perl.com/ is currently hosted on a T3 line courtesy
of Songline Systems, a software−oriented subsidiary of O‘Reilly and Associates.
perl.org is the official vehicle for The Perl Institute. The motto of TPI is "helping people help Perl help
people" (or something like that). It‘s a non−profit organization supporting development, documentation, and
dissemination of perl.
How do I learn about object−oriented Perl programming?
perltoot (distributed with 5.004 or later) is a good place to start. Also, perlobj, perlref, and perlmod are
useful references, while perlbot has some excellent tips and tricks.
AUTHOR AND COPYRIGHT
Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. All rights reserved.
When included as an integrated part of the Standard Distribution of Perl or of its documentation (printed or
otherwise), this work is covered under Perl's Artistic Licence. For separate distributions of all or part of
this FAQ outside of that, see perlfaq.
Irrespective of its distribution, all code examples here are public domain. You are permitted and encouraged
to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A
simple comment in the code giving credit to the FAQ would be courteous but is not required.
perlfaq3
NAME
perlfaq3 − Programming Tools ($Revision: 1.29 $, $Date: 1998/08/05 11:57:04 $)
DESCRIPTION
This section of the FAQ answers questions related to programmer tools and programming support.
How do I do (anything)?
Have you looked at CPAN (see perlfaq2)? The chances are that someone has already written a module that
can solve your problem. Have you read the appropriate man pages? Here‘s a brief index:
    Basics           perldata, perlvar, perlsyn, perlop, perlsub
    Execution        perlrun, perldebug
    Functions        perlfunc
    Objects          perlref, perlmod, perlobj, perltie
    Data Structures  perlref, perllol, perldsc
    Modules          perlmod, perlmodlib, perlsub
    Regexps          perlre, perlfunc, perlop, perllocale
    Moving to perl5  perltrap, perl
    Linking w/C      perlxstut, perlxs, perlcall, perlguts, perlembed
    Various          http://www.perl.com/CPAN/doc/FMTEYEWTK/index.html
                     (not a man-page but still useful)
perltoc provides a crude table of contents for the perl man page set.
How can I use Perl interactively?
The typical approach uses the Perl debugger, described in the perldebug(1) man page, on an "empty"
program, like this:
    perl -de 42
Now just type in any legal Perl code, and it will be immediately evaluated. You can also examine the
symbol table, get stack backtraces, check variable values, set breakpoints, and perform other operations
typically found in symbolic debuggers.
Is there a Perl shell?
In general, no. The Shell.pm module (distributed with perl) makes perl try commands which aren‘t part of
the Perl language as shell commands. perlsh from the source distribution is simplistic and uninteresting, but
may still be what you want.
How do I debug my Perl programs?
Have you used −w? It enables warnings for dubious practices.
Have you tried use strict? It prevents you from using symbolic references, makes you predeclare any
subroutines that you call as bare words, and (probably most importantly) forces you to predeclare your
variables with my or use vars.
Did you check the returns of each and every system call? The operating system (and thus Perl) tells you
whether they worked or not, and if not why.
    open(FH, "> /etc/cantwrite")
        or die "Couldn't write to /etc/cantwrite: $!\n";
Did you read perltrap? It‘s full of gotchas for old and new Perl programmers, and even has sections for
those of you who are upgrading from languages like awk and C.
Have you tried the Perl debugger, described in perldebug? You can step through your program and see what
it‘s doing and thus work out why what it‘s doing isn‘t what it should be doing.
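The first three checks above fit together in a few lines. The following is only a sketch: the count_lines() helper and the file name are hypothetical, invented for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# count_lines() is a hypothetical helper, not part of any module.
# Every system call is checked, and $! says why a call failed.
sub count_lines {
    my ($path) = @_;
    local *FH;                      # keep the filehandle private to this call
    open(FH, "< $path")
        or die "Couldn't open $path: $!\n";
    my $lines = 0;
    $lines++ while <FH>;
    close(FH)
        or die "Couldn't close $path: $!\n";
    return $lines;
}

# Trap the failure instead of letting the program die, then report it.
my $count = eval { count_lines("/no/such/file") };
print defined $count ? "$count lines\n" : "failed: $@";
```

Because the open() fails here, $count stays undefined and the message left in $@ names both the file and the reason.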
How do I profile my Perl programs?
You should get the Devel::DProf module from CPAN, and also use Benchmark.pm from the standard
distribution. Benchmark lets you time specific portions of your code, while Devel::DProf gives detailed
breakdowns of where your code spends its time.
Here‘s a sample use of Benchmark:
    use Benchmark;
    @junk = `cat /etc/motd`;
    $count = 10_000;
    timethese($count, {
        'map' => sub { my @a = @junk;
                       map { s/a/b/ } @a;
                       return @a
                     },
        'for' => sub { my @a = @junk;
                       local $_;
                       for (@a) { s/a/b/ };
                       return @a },
    });
This is what it prints (on one machine—your results will be dependent on your hardware, operating system,
and the load on your machine):
Benchmark: timing 10000 iterations of for, map...
for: 4 secs ( 3.97 usr 0.01 sys = 3.98 cpu)
map: 6 secs ( 4.97 usr 0.00 sys = 4.97 cpu)
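When only one piece of code needs timing, Benchmark's timeit() and timestr() functions may be enough. This is a minimal sketch; the workload being timed is arbitrary and was chosen only for illustration.

```perl
use Benchmark;

# Arbitrary workload for illustration: sort a reversed list of 100 integers.
my @data = reverse 1 .. 100;

# timeit() runs the code the given number of times and returns a Benchmark object.
my $t = timeit(1_000, sub { my @sorted = sort { $a <=> $b } @data });

# timestr() turns that object into a human-readable timing string.
print "sort: ", timestr($t), "\n";
```

As with timethese(), the figures depend on your hardware and on the load on your machine.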
How do I cross−reference my Perl programs?
The B::Xref module, shipped with the new, alpha−release Perl compiler (not the general distribution prior to
the 5.005 release), can be used to generate cross−reference reports for Perl programs.
    perl -MO=Xref[,OPTIONS] scriptname.plx
Is there a pretty−printer (formatter) for Perl?
There is no program that will reformat Perl as much as indent(1) does for C. The complex feedback between
the scanner and the parser (this feedback is what confuses the vgrind and emacs programs) makes it
challenging at best to write a stand−alone Perl parser.
Of course, if you simply follow the guidelines in perlstyle, you shouldn‘t need to reformat. The habit of
formatting your code as you write it will help prevent bugs. Your editor can and should help you with this.
The perl−mode for emacs can provide a remarkable amount of help with most (but not all) code, and even
less programmable editors can provide significant assistance.
If you are used to using the vgrind program for printing out nice code to a laser printer, you can take a stab at
this using http://www.perl.com/CPAN/doc/misc/tips/working.vgrind.entry, but the results are not particularly
satisfying for sophisticated code.
Is there a ctags for Perl?
There‘s a simple one at http://www.perl.com/CPAN/authors/id/TOMC/scripts/ptags.gz which may do the
trick.
Where can I get Perl macros for vi?
For a complete version of Tom Christiansen‘s vi configuration file, see
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/toms.exrc, the standard benchmark file for vi
emulators. This runs best with nvi, the current version of vi out of Berkeley, which incidentally can be built
with an embedded Perl interpreter — see http://www.perl.com/CPAN/src/misc.
Where can I get perl−mode for emacs?
Since Emacs version 19 patchlevel 22 or so, there have been both a perl−mode.el and support for the perl
debugger built in. These should come with the standard Emacs 19 distribution.
In the perl source directory, you‘ll find a directory called "emacs", which contains a cperl−mode that
color−codes keywords, provides context−sensitive help, and other nifty things.
Note that the perl-mode of emacs will have fits with "main'foo" (single quote), and mess up the
indentation and highlighting. You should be using "main::foo" in new Perl code anyway, so this shouldn't
be an issue.
How can I use curses with Perl?
The Curses module from CPAN provides a dynamically loadable object module interface to a curses library.
A small demo can be found at the directory
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/rep; this program repeats a command and
updates the screen as needed, rendering rep ps axu similar to top.
How can I use X or Tk with Perl?
Tk is a completely Perl−based, object−oriented interface to the Tk toolkit that doesn‘t force you to use Tcl
just to get at Tk. Sx is an interface to the Athena Widget set. Both are available from CPAN. See the
directory http://www.perl.com/CPAN/modules/by−category/08_User_Interfaces/
Invaluable for Perl/Tk programming are: the Perl/Tk FAQ at
http://w4.lns.cornell.edu/~pvhp/ptk/ptkTOC.html , the Perl/Tk Reference Guide available at
http://www.perl.com/CPAN−local/authors/Stephen_O_Lidie/ , and the online manpages at
http://www−users.cs.umn.edu/~amundson/perl/perltk/toc.html .
How can I generate simple menus without using CGI or Tk?
The http://www.perl.com/CPAN/authors/id/SKUNZ/perlmenu.v4.0.tar.gz module, which is curses−based,
can help with this.
What is undump?
See the next question.
How can I make my Perl program run faster?
The best way to do this is to come up with a better algorithm. This can often make a dramatic difference.
Chapter 8 in the Camel has some efficiency tips in it you might want to look at. Jon Bentley's book
"Programming Pearls" (that's not a misspelling!) has some good tips on optimization, too. Advice on
benchmarking boils down to: benchmark and profile to make sure you‘re optimizing the right part, look for
better algorithms instead of microtuning your code, and when all else fails consider just buying faster
hardware.
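As one illustration of what a better algorithm can mean, testing membership with a hash usually beats scanning a list once per query. The data below is made up for the example, and whether the change matters in your program is something only a benchmark can tell you.

```perl
use strict;

# Hypothetical data, for illustration only.
my @allowed = map { "user$_" } 1 .. 1000;
my @queries = qw(user7 user999 nobody);

# Slow: the inner grep scans the whole list for every query.
my @slow = grep { my $q = $_; scalar(grep { $_ eq $q } @allowed) } @queries;

# Faster: build a hash once, then each test is a single lookup.
my %is_allowed;
@is_allowed{@allowed} = (1) x @allowed;
my @fast = grep { $is_allowed{$_} } @queries;

print "@fast\n";    # same answer as @slow
```

Both versions select the same queries; only the amount of work per query differs.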
A different approach is to autoload seldom−used Perl code. See the AutoSplit and AutoLoader modules in
the standard distribution for that. Or you could locate the bottleneck and think about writing just that part in
C, the way we used to take bottlenecks in C code and write them in assembler. Similar to rewriting in C is
the use of modules that have critical sections written in C (for instance, the PDL module from CPAN).
In some cases, it may be worth it to use the backend compiler to produce byte code (saving compilation
time) or compile into C, which will certainly save compilation time and sometimes a small amount (but not
much) of execution time. See the question about compiling your Perl programs for more on the compiler;
the wins aren't as obvious as you'd hope.
If you‘re currently linking your perl executable to a shared libc.so, you can often gain a 10−25%
performance benefit by rebuilding it to link with a static libc.a instead. This will make a bigger perl
executable, but your Perl programs (and programmers) may thank you for it. See the INSTALL file in the
source distribution for more information.
Unsubstantiated reports allege that Perl interpreters that use sfio outperform those that don‘t (for IO intensive
applications). To try this, see the INSTALL file in the source distribution, especially the ‘‘Selecting File IO
mechanisms‘’ section.
The undump program was an old attempt to speed up your Perl program by storing the already−compiled
form to disk. This is no longer a viable option, as it only worked on a few architectures, and wasn‘t a good
solution anyway.
How can I make my Perl program take less memory?
When it comes to time-space tradeoffs, Perl nearly always prefers to throw memory at a problem. Scalars in
Perl use more memory than strings in C, arrays take more than that, and hashes use even more. While there's
still a lot to be done, recent releases have been addressing these issues. For example, as of 5.004, duplicate
hash keys are shared amongst all hashes using them, so require no reallocation.
In some cases, using substr() or vec() to simulate arrays can be highly beneficial. For example, an
array of a thousand booleans will take at least 20,000 bytes of space, but it can be turned into one 125−byte
bit vector for a considerable memory savings. The standard Tie::SubstrHash module can also help for
certain types of data structure. If you‘re working with specialist data structures (matrices, for instance)
modules that implement these in C may use less memory than equivalent Perl modules.
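The bit-vector idea can be sketched with vec() as follows. The flag numbers are arbitrary and chosen only for illustration.

```perl
# 1000 boolean flags packed into one 125-byte string (8 flags per byte).
my $flags = "\0" x 125;             # all flags start out false

vec($flags, 42, 1)  = 1;            # set flag number 42
vec($flags, 999, 1) = 1;            # set the last flag

print "flag 42 is ", vec($flags, 42, 1), "\n";          # 1
print "flag 43 is ", vec($flags, 43, 1), "\n";          # 0
print "bytes used: ", length($flags), "\n";             # 125
print "flags set:  ", unpack("%32b*", $flags), "\n";    # 2
```

The price is convenience: each flag has to be read and written through vec() rather than as an ordinary array element.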
Another thing to try is learning whether your Perl was compiled with the system malloc or with Perl‘s builtin
malloc. Whichever one it is, try using the other one and see whether this makes a difference. Information
about malloc is in the INSTALL file in the source distribution. You can find out whether you are using
perl‘s malloc by typing perl −V:usemymalloc.
Is it unsafe to return a pointer to local data?
No, Perl‘s garbage collection system takes care of this.
    sub makeone {
        my @a = ( 1 .. 10 );
        return \@a;
    }
    for $i ( 1 .. 10 ) {
        push @many, makeone();
    }
    print $many[4][5], "\n";
    print "@many\n";
How can I free an array or hash so my program shrinks?
You can‘t. On most operating systems, memory allocated to a program can never be returned to the system.
That‘s why long−running programs sometimes re−exec themselves. Some operating systems (notably,
FreeBSD) allegedly reclaim large chunks of memory that are no longer used, but it doesn't appear to happen
with Perl (yet). The Mac appears to be the only platform that will reliably (albeit, slowly) return memory to
the OS.
However, judicious use of my() on your variables will help make sure that they go out of scope so that Perl
can free up their storage for use in other parts of your program. A global variable, of course, never goes out
of scope, so you can‘t get its space automatically reclaimed, although undef()ing and/or delete()ing it
will achieve the same effect. In general, memory allocation and de−allocation isn‘t something you can or
should be worrying about much in Perl, but even this capability (preallocation of data types) is in the works.
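A small sketch of both cases, a lexical that goes out of scope and a global that has to be released by hand. The names summarize() and @global_cache are hypothetical.

```perl
use strict;
use vars qw(@global_cache);

sub summarize {
    # @scratch is lexical: its storage becomes reusable when the sub returns.
    my @scratch = map { $_ * 2 } 1 .. 10_000;
    return scalar @scratch;
}

my $n = summarize();

# A global never goes out of scope, so release it by hand when done.
@global_cache = (1 .. 10_000);
undef @global_cache;

print "$n elements processed, ", scalar(@global_cache), " still cached\n";
```

In both cases the memory goes back to Perl for reuse within the program, not to the operating system.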
How can I make my CGI script more efficient?
Beyond the normal measures described to make general Perl programs faster or smaller, a CGI program has
additional issues. It may be run several times per second. Given that each time it runs it will need to be
re−compiled and will often allocate a megabyte or more of system memory, this can be a killer. Compiling
into C isn‘t going to help you because the process start−up overhead is where the bottleneck is.
There are two popular ways to avoid this overhead. One solution involves running the Apache HTTP server
(available from http://www.apache.org/) with either of the mod_perl or mod_fastcgi plugin modules.
With mod_perl and the Apache::Registry module (distributed with mod_perl), httpd will run with an
embedded Perl interpreter which pre−compiles your script and then executes it within the same address
space without forking. The Apache extension also gives Perl access to the internal server API, so modules
written in Perl can do just about anything a module written in C can. For more on mod_perl, see
http://perl.apache.org/
With the FCGI module (from CPAN), a Perl executable compiled with sfio (see the INSTALL file in the
distribution) and the mod_fastcgi module (available from http://www.fastcgi.com/) each of your perl scripts
becomes a permanent CGI daemon process.
Both of these solutions can have far−reaching effects on your system and on the way you write your CGI
scripts, so investigate them with care.
See http://www.perl.com/CPAN/modules/by−category/15_World_Wide_Web_HTML_HTTP_CGI/ .
A non-free, commercial product, "The Velocity Engine for Perl" (http://www.binevolve.com/ or
http://www.binevolve.com/bine/vep), might also be worth looking at. It will allow you to increase the
performance of your perl scripts, up to 25 times faster than normal CGI perl by running in persistent perl
mode, or 4 to 5 times faster without any modification to your existing CGI scripts. Fully functional
evaluation copies are available from the web site.
How can I hide the source for my Perl program?
Delete it. :−) Seriously, there are a number of (mostly unsatisfactory) solutions with varying levels of
‘‘security‘’.
First of all, however, you can't take away read permission, because the source code has to be readable in
order to be compiled and interpreted. (That doesn't mean that a CGI script's source is readable by people on
the web, though, only by people with access to the filesystem.) So you have to leave the permissions at the
socially friendly 0755 level.
Some people regard this as a security problem. If your program does insecure things, and relies on people
not knowing how to exploit those insecurities, it is not secure. It is often possible for someone to determine
the insecure things and exploit them without viewing the source. Security through obscurity, the name for
hiding your bugs instead of fixing them, is little security indeed.
You can try using encryption via source filters (Filter::* from CPAN), but crackers might be able to decrypt
it. You can try using the byte code compiler and interpreter described below, but crackers might be able to
de−compile it. You can try using the native−code compiler described below, but crackers might be able to
disassemble it. These pose varying degrees of difficulty to people wanting to get at your code, but none can
definitively conceal it (this is true of every language, not just Perl).
If you‘re concerned about people profiting from your code, then the bottom line is that nothing but a
restrictive licence will give you legal security. License your software and pepper it with threatening
statements like ‘‘This is unpublished proprietary software of XYZ Corp. Your access to it does not give you
permission to use it blah blah blah.‘’ We are not lawyers, of course, so you should see a lawyer if you want
to be sure your licence‘s wording will stand up in court.
How can I compile my Perl program into byte code or C?
Malcolm Beattie has written a multifunction backend compiler, available from CPAN, that can do both these
things. It is included in the perl5.005 release, but is still considered experimental. This means it‘s fun to play
with if you‘re a programmer but not really for people looking for turn−key solutions.
Merely compiling into C does not in and of itself guarantee that your code will run very much faster. That‘s
because except for lucky cases where a lot of native type inferencing is possible, the normal Perl run time
system is still present and so your program will take just as long to run and be just as big. Most programs
save little more than compilation time, leaving execution no more than 10−30% faster. A few rare programs
actually benefit significantly (like several times faster), but this takes some tweaking of your code.
You‘ll probably be astonished to learn that the current version of the compiler generates a compiled form of
your script whose executable is just as big as the original perl executable, and then some. That‘s because as
currently written, all programs are prepared for a full eval() statement. You can tremendously reduce this
cost by building a shared libperl.so library and linking against that. See the INSTALL podfile in the perl
source distribution for details. If you link your main perl binary with this, it will make it minuscule. For
example, on one author‘s system, /usr/bin/perl is only 11k in size!
In general, the compiler will do nothing to make a Perl program smaller, faster, more portable, or more
secure. In fact, it will usually hurt all of those. The executable will be bigger, your VM system may take
longer to load the whole thing, the binary is fragile and hard to fix, and compilation never stopped software
piracy in the form of crackers, viruses, or bootleggers. The real advantage of the compiler is merely
packaging, and once you see the size of what it makes (well, unless you use a shared libperl.so), you‘ll
probably want a complete Perl install anyway.
How can I get #!perl to work on [MS−DOS,NT,...]?
For OS/2 just use
    extproc perl -S -your_switches
as the first line in a *.cmd file (-S due to a bug in cmd.exe's 'extproc' handling). For DOS one should first
invent a corresponding batch file, and codify it in ALTERNATIVE_SHEBANG (see the INSTALL file in the
source distribution for more information).
The Win95/NT installation, when using the ActiveState port of Perl, will modify the Registry to associate the
.pl extension with the perl interpreter. If you install another port (Gurusamy Sarathy's is the
recommended Win95/NT port), or (eventually) build your own Win95/NT Perl using WinGCC, then you'll
have to modify the Registry yourself.
Macintosh perl scripts will have the appropriate Creator and Type, so that double-clicking them will
invoke the perl application.
IMPORTANT!: Whatever you do, PLEASE don‘t get frustrated, and just throw the perl interpreter into your
cgi−bin directory, in order to get your scripts working for a web server. This is an EXTREMELY big
security risk. Take the time to figure out how to do it correctly.
Can I write useful perl programs on the command line?
Yes. Read perlrun for more information. Some examples follow. (These assume standard Unix shell
quoting rules.)
    # sum first and last fields
    perl -lane 'print $F[0] + $F[-1]' *
    # identify text files
    perl -le 'for(@ARGV) {print if -f && -T _}' *
    # remove (most) comments from C program
    perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
    # make file a month younger than today, defeating reaper daemons
    perl -e '$X=24*60*60; utime(time(),time() + 30 * $X,@ARGV)' *
    # find first unused uid
    perl -le '$i++ while getpwuid($i); print $i'
    # display reasonable manpath
    echo $PATH | perl -nl -072 -e '
        s![^/+]*$!man!&&-d&&!$s{$_}++&&push@m,$_;END{print"@m"}'
Ok, the last one was actually an obfuscated perl entry. :−)
Why don‘t perl one−liners work on my DOS/Mac/VMS system?
The problem is usually that the command interpreters on those systems have rather different ideas about
quoting than the Unix shells under which the one−liners were created. On some systems, you may have to
change single−quotes to double ones, which you must NOT do on Unix or Plan9 systems. You might also
have to change a single % to a %%.
For example:
    # Unix
    perl -e 'print "Hello world\n"'
    # DOS, etc.
    perl -e "print \"Hello world\n\""
    # Mac
    print "Hello world\n"
     (then Run "Myscript" or Shift-Command-R)
    # VMS
    perl -e "print ""Hello world\n"""
The problem is that none of this is reliable: it depends on the command interpreter. Under Unix, the first two
often work. Under DOS, it‘s entirely possible neither works. If 4DOS was the command shell, you‘d
probably have better luck like this:
    perl -e "print "Hello world\n""
Under the Mac, it depends which environment you are using. The MacPerl shell, or MPW, is much like
Unix shells in its support for several quoting variants, except that it makes free use of the Mac‘s non−ASCII
characters as control characters.
There is no general solution to all of this. It is a mess, pure and simple. Sucks to be away from Unix, huh?
:−)
[Some of this answer was contributed by Kenneth Albanowski.]
Where can I learn about CGI or Web programming in Perl?
For modules, get the CGI or LWP modules from CPAN. For textbooks, see the two especially dedicated to
web stuff in the question on books. For problems and questions related to the web, like "Why do I get 500
Errors" or "Why doesn't it run from the browser right when it runs fine on the command line", see these
sources:
WWW Security FAQ
    http://www.w3.org/Security/Faq/
Web FAQ
    http://www.boutell.com/faq/
CGI FAQ
    http://www.webthing.com/page.cgi/cgifaq
HTTP Spec
    http://www.w3.org/pub/WWW/Protocols/HTTP/
HTML Spec
    http://www.w3.org/TR/REC-html40/
    http://www.w3.org/pub/WWW/MarkUp/
CGI Spec
    http://www.w3.org/CGI/
CGI Security FAQ
    http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt
Where can I learn about object−oriented Perl programming?
perltoot is a good place to start, and you can use perlobj and perlbot for reference. Perltoot didn‘t come out
until the 5.004 release, but you can get a copy (in pod, html, or postscript) from
http://www.perl.com/CPAN/doc/FMTEYEWTK/ .
Where can I learn about linking C with Perl? [h2xs, xsubpp]
If you want to call C from Perl, start with perlxstut, moving on to perlxs, xsubpp, and perlguts. If you want
to call Perl from C, then read perlembed, perlcall, and perlguts. Don‘t forget that you can learn a lot from
looking at how the authors of existing extension modules wrote their code and solved their problems.
I've read perlembed, perlguts, etc., but I can't embed perl in my C program, what am I doing wrong?
Download the ExtUtils::Embed kit from CPAN and run 'make test'. If the tests pass, read the pods again
and again and again. If they fail, see perlbug and send a bugreport with the output of make test
TEST_VERBOSE=1 along with perl -V.
When I tried to run my script, I got this message. What does it mean?
perldiag has a complete list of perl‘s error messages and warnings, with explanatory text. You can also use
the splain program (distributed with perl) to explain the error messages:
    perl program 2>diag.out
    splain [-v] [-p] diag.out
or change your program to explain the messages for you:
    use diagnostics;
or
    use diagnostics -verbose;
What‘s MakeMaker?
This module (part of the standard perl distribution) is designed to write a Makefile for an extension module
from a Makefile.PL. For more information, see ExtUtils::MakeMaker.
AUTHOR AND COPYRIGHT
Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. All rights reserved.
When included as an integrated part of the Standard Distribution of Perl or of its documentation (printed or
otherwise), this work is covered under Perl's Artistic Licence. For separate distributions of all or part of
this FAQ outside of that, see perlfaq.
Irrespective of its distribution, all code examples here are public domain. You are permitted and encouraged
to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A
simple comment in the code giving credit to the FAQ would be courteous but is not required.
perlfaq4
NAME
perlfaq4 − Data Manipulation ($Revision: 1.26 $, $Date: 1998/08/05 12:04:00 $)
DESCRIPTION
This section of the FAQ answers questions related to the manipulation of data as numbers, dates, strings,
arrays, hashes, and miscellaneous data issues.
Data: Numbers
Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?
The infinite set that a mathematician thinks of as the real numbers can only be approximated on a computer,
since the computer only has a finite number of bits to store an infinite number of, um, numbers.
Internally, your computer represents floating−point numbers in binary. Floating−point numbers read in from
a file or appearing as literals in your program are converted from their decimal floating−point representation
(eg, 19.95) to the internal binary representation.
However, 19.95 can‘t be precisely represented as a binary floating−point number, just like 1/3 can‘t be
exactly represented as a decimal floating−point number. The computer‘s binary representation of 19.95,
therefore, isn‘t exactly 19.95.
When a floating-point number gets printed, the binary floating-point representation is converted back to
decimal. These decimal numbers are displayed in either the format you specify with printf(), or the
current output format for numbers (see $# in perlvar) if you use print. $# has a different default value
in Perl5 than it did in Perl4. Changing $# yourself is deprecated.
This affects all computer languages that represent decimal floating−point numbers in binary, not just Perl.
Perl provides arbitrary−precision decimal numbers with the Math::BigFloat module (part of the standard Perl
distribution), but mathematical operations are consequently slower.
To get rid of the superfluous digits, just use a format (eg, printf("%.2f", 19.95)) to get the required
precision. See Floating−point Arithmetic in perlop.
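A short sketch makes the approximation visible. The variable names are illustrative, and the exact trailing digits of the first line of output depend on your platform's floating-point format.

```perl
my $price = 19.95;

# Ask for more digits than the default output format shows;
# the trailing digits reveal the nearest binary approximation.
printf("%.20f\n", $price);

# Round for display only; $price itself is unchanged.
my $shown = sprintf("%.2f", $price);
print "$shown\n";                   # 19.95
```

Note that sprintf() produces a string for output; the stored value is still the binary approximation.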
Why isn‘t my octal data interpreted correctly?
Perl only understands octal and hex numbers as such when they occur as literals in your program. If they are
read in from somewhere and assigned, no automatic conversion takes place. You must explicitly use oct()
or hex() if you want the values converted. oct() interprets both hex ("0x350") numbers and octal ones
("0350" or even without the leading "0", like "377"), while hex() only converts hexadecimal ones, with or
without a leading "0x", like "0x255", "3A", "ff", or "deadbeef".
This problem shows up most often when people try using chmod(), mkdir(), umask(), or
sysopen(), which all want permissions in octal.
    chmod(644, $file);   # WRONG -- perl -w catches this
    chmod(0644, $file);  # right
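When the permission bits arrive as a string, for instance from a configuration file, the conversion has to be done by hand. A brief sketch, with made-up input values:

```perl
# Strings such as these might come from a config file or user input;
# the values are made up for illustration.
my $mode_string = "0644";
my $mode = oct($mode_string);       # 420 decimal, i.e. 0644 octal

printf("%s -> %d decimal, %o octal\n", $mode_string, $mode, $mode);

print oct("0x350"), "\n";           # 848: oct() also understands a 0x prefix
print oct("377"),   "\n";           # 255
print hex("ff"),    "\n";           # 255
```

The resulting $mode is an ordinary number and can be passed straight to chmod(), mkdir(), umask(), or sysopen().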
Does perl have a round function? What about ceil() and floor()? Trig functions?
Remember that int() merely truncates toward 0. For rounding to a certain number of digits, sprintf()
or printf() is usually the easiest route.
printf("%.3f", 3.1415926535);
# prints 3.142
The POSIX module (part of the standard perl distribution) implements ceil(), floor(), and a number of
other mathematical and trigonometric functions.
    use POSIX;
    $ceil  = ceil(3.5);   # 4
    $floor = floor(3.5);  # 3
In 5.000 to 5.003 Perls, trigonometry was done in the Math::Complex module. With 5.004, the Math::Trig
module (part of the standard perl distribution) implements the trigonometric functions. Internally it uses the
Math::Complex module and some functions can break out from the real axis into the complex plane, for
example the inverse sine of 2.
Rounding in financial applications can have serious implications, and the rounding method used should be
specified precisely. In these cases, it probably pays not to trust whichever system rounding is being used by
Perl, but to instead implement the rounding function you need yourself.
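If you do write your own rounding function, something along these lines may serve as a starting point. It is only a sketch of one possible rule (round half away from zero, on whole units); the function name is made up, and the rule your application needs may well be different.

```perl
use strict;

# Round half away from zero to the nearest integer.
# This is only one of several possible rounding rules.
sub round_half_away {
    my $x = shift;
    return $x >= 0 ? int($x + 0.5) : -int(-$x + 0.5);
}

# Keeping money in integer cents sidesteps most binary
# floating-point surprises.
my $cents     = 1995;                         # represents 19.95
my $half_unit = round_half_away($cents / 2);  # 997.5 rounds to 998

print "$half_unit\n";
```

Working in the smallest currency unit means the only fractions you ever round are ones you created yourself, such as the division above.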
How do I convert bits into ints?
To turn a string of 1s and 0s like 10110110 into a scalar containing its binary value, use the pack()
function (documented in pack in perlfunc). pack() returns the packed byte itself, so apply ord() to get
its numeric value:
$decimal = ord(pack('B8', '10110110'));   # 182
Here's an example of going the other way:
$binary_string = join('', unpack('B*', "\x29"));
How do I multiply matrices?
Use the Math::Matrix or Math::MatrixReal modules (available from CPAN) or the PDL extension (also
available from CPAN).
How do I perform an operation on a series of integers?
To call a function on each element in an array, and collect the results, use:
@results = map { my_func($_) } @array;
For example:
@triple = map { 3 * $_ } @single;
To call a function on each element of an array, but ignore the results:
foreach $iterator (@array) {
&my_func($iterator);
}
To call a function on each integer in a (small) range, you can use:
@results = map { &my_func($_) } (5 .. 25);
but you should be aware that the .. operator creates an array of all integers in the range. This can take a lot
of memory for large ranges. Instead use:
@results = ();
for ($i=5; $i < 500_005; $i++) {
push(@results, &my_func($i));
}
How can I output Roman numerals?
Get the http://www.perl.com/CPAN/modules/by−module/Roman module.
Why aren‘t my random numbers random?
The short explanation is that you‘re getting pseudorandom numbers, not random ones, because computers
are good at being predictable and bad at being random (despite appearances caused by bugs in your programs
:−). A longer explanation is available on http://www.perl.com/CPAN/doc/FMTEYEWTK/random, courtesy
of Tom Phoenix. John von Neumann said, ‘‘Anyone who attempts to generate random numbers by
deterministic means is, of course, living in a state of sin.‘’
You should also check out the Math::TrulyRandom module from CPAN. It uses the imperfections in your
system‘s timer to generate random numbers, but this takes quite a while. If you want a better pseudorandom
generator than comes with your operating system, look at ‘‘Numerical Recipes in C‘’ at
http://nr.harvard.edu/nr/bookc.html .
Data: Dates
How do I find the week−of−the−year/day−of−the−year?
The day of the year is in the array returned by localtime() (see localtime in perlfunc):
$day_of_year = (localtime(time()))[7];
or more legibly (in 5.004 or higher):
use Time::localtime;
$day_of_year = localtime(time())->yday;
You can find the week of the year by dividing this by 7:
$week_of_year = int($day_of_year / 7);
Of course, this assumes that weeks are numbered from zero. The Date::Calc module from CPAN has a lot of date
calculation functions, including day of the year, week of the year, and so on. Note that not all businesses
consider "week 1" to be the same; for example, American businesses often consider the first week with a
Monday in it to be Work Week #1, despite ISO 8601, which considers WW1 to be the first week with a
Thursday in it.
How can I compare two dates and find the difference?
If you‘re storing your dates as epoch seconds then simply subtract one from the other. If you‘ve got a
structured date (distinct year, day, month, hour, minute, seconds values) then use one of the Date::Manip and
Date::Calc modules from CPAN.
How can I take a string and turn it into epoch seconds?
If it‘s a regular enough string that it always has the same format, you can split it up and pass the parts to
timelocal in the standard Time::Local module. Otherwise, you should look into the Date::Calc and
Date::Manip modules from CPAN.
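For a fixed-format string, the split-and-convert approach might look like the sketch below. The "YYYY-MM-DD hh:mm:ss" layout and the function name string_to_epoch are assumptions made for this example. It uses timegm() from Time::Local rather than timelocal() so that the result does not depend on the local time zone.

```perl
use strict;
use Time::Local;    # exports timelocal() and timegm()

# Convert "YYYY-MM-DD hh:mm:ss" (assumed to be UTC) to epoch seconds.
# Returns undef if the string does not match the assumed layout.
sub string_to_epoch {
    my $stamp = shift;
    my ($year, $mon, $mday, $hour, $min, $sec) =
        $stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/
        or return undef;
    # timegm() wants a zero-based month.
    return timegm($sec, $min, $hour, $mday, $mon - 1, $year);
}

my $start = string_to_epoch("2001-11-13 01:00:00");
my $end   = string_to_epoch("2001-11-14 01:00:30");
my $diff  = $end - $start;     # difference in seconds

print "$start $diff\n";
```

Once both dates are epoch seconds, comparing them or finding the difference is plain arithmetic.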
How can I find the Julian Day?
Neither Date::Manip nor Date::Calc deal with Julian days. Instead, there is an example of Julian date
calculation that should help you in
http://www.perl.com/CPAN/authors/David_Muir_Sharnoff/modules/Time/JulianDay.pm.gz .
Does Perl have a year 2000 problem? Is Perl Y2K compliant?
Short answer: No, Perl does not have a Year 2000 problem. Yes, Perl is Y2K compliant. The programmers
you've hired to use it, however, probably are not.
Long answer: Perl is just as Y2K compliant as your pencil—no more, and no less. The date and time
functions supplied with perl (gmtime and localtime) supply adequate information to determine the year well
beyond 2000 (2038 is when trouble strikes for 32−bit machines). The year returned by these functions when
used in an array context is the year minus 1900. For years between 1910 and 1999 this happens to be a
2−digit decimal number. To avoid the year 2000 problem simply do not treat the year as a 2−digit number. It
isn‘t.
When gmtime() and localtime() are used in scalar context they return a timestamp string that
contains a fully−expanded year. For example, $timestamp = gmtime(1005613200) sets
$timestamp to "Tue Nov 13 01:00:00 2001". There‘s no year 2000 problem here.
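The two contexts can be compared side by side in a short sketch; the variable names are illustrative only.

```perl
use strict;

my $when = 1005613200;

# List context: element 5 is the year minus 1900, so add 1900 back.
my $offset_year = (gmtime($when))[5];     # 101, not "01"
my $full_year   = $offset_year + 1900;    # 2001

# Scalar context: a ready-made string with the full year.
my $timestamp = gmtime($when);

print "$offset_year $full_year $timestamp\n";
```

The typical Y2K bug is printing "19$offset_year" or treating the value as two digits; adding 1900 is all that is needed.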
That doesn‘t mean that Perl can‘t be used to create non−Y2K compliant programs. It can. But so can your
pencil. It‘s the fault of the user, not the language. At the risk of inflaming the NRA: ‘‘Perl doesn‘t break
Y2K, people do.‘’ See http://language.perl.com/news/y2k.html for a longer exposition.
Data: Strings
How do I validate input?
The answer to this question is usually a regular expression, perhaps with auxiliary logic. See the more
specific questions (numbers, mail addresses, etc.) for details.
How do I unescape a string?
It depends just what you mean by ‘‘escape‘’. URL escapes are dealt with in perlfaq9. Shell escapes with the
backslash (\) character are removed with:
s/\\(.)/$1/g;
This won‘t expand "\n" or "\t" or any other special escapes.
How do I remove consecutive pairs of characters?
To turn "abbcccd" into "abccd":
s/(.)\1/$1/g;
How do I expand function calls in a string?
This is documented in perlref. In general, this is fraught with quoting and readability problems, but it is
possible. To interpolate a subroutine call (in list context) into a string:
print "My sub returned @{[mysub(1,2,3)]} that time.\n";
If you prefer scalar context, similar chicanery is also useful for arbitrary expressions:
print "That yields ${\($n + 5)} widgets\n";
Version 5.004 of Perl had a bug that gave list context to the expression in ${...}, but this is fixed in
version 5.005.
See also ‘‘How can I expand variables in text strings?‘’ in this section of the FAQ.
How do I find matching/nesting anything?
This isn‘t something that can be done in one regular expression, no matter how complicated. To find
something between two single characters, a pattern like /x([^x]*)x/ will get the intervening bits in $1.
For multiple ones, then something more like /alpha(.*?)omega/ would be needed. But none of these
deals with nested patterns, nor can they. For that you‘ll have to write a parser.
If you are serious about writing a parser, there are a number of modules or oddities that will make your life a
lot easier. There is the CPAN module Parse::RecDescent, the standard module Text::Balanced, the byacc
program, and Mark−Jason Dominus‘s excellent py tool at http://www.plover.com/~mjd/perl/py/ .
One simple destructive, inside−out approach that you might try is to pull out the smallest nesting parts one at
a time:
while (s/BEGIN((?:(?!BEGIN)(?!END).)*)END//gs) {
    # do something with $1
}
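Here is a small, self-contained sketch of the inside-out idea on a made-up string. Note one deliberate difference, which is an assumption of this example: it drops the /g modifier so that each pass removes exactly one innermost pair and $1 is collected before the next pass.

```perl
use strict;

my $text = "BEGIN outer BEGIN inner END tail END";
my @pieces;

# Without /g, each pass removes one innermost BEGIN...END pair and
# leaves its contents in $1.
while ($text =~ s/BEGIN((?:(?!BEGIN)(?!END).)*)END//s) {
    push @pieces, $1;
}

print map { "[$_]\n" } @pieces;
```

The innermost piece comes out first, and the outer piece is seen only after its nested part has been deleted from it.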
How do I reverse a string?
Use reverse() in scalar context, as documented in reverse.
$reversed = reverse $string;
How do I expand tabs in a string?
You can do it yourself:
1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
Or you can just use the Text::Tabs module (part of the standard perl distribution).
use Text::Tabs;
@expanded_lines = expand(@lines_with_tabs);
How do I reformat a paragraph?
Use Text::Wrap (part of the standard perl distribution):
use Text::Wrap;
print wrap("\t", '  ', @paragraphs);
The paragraphs you give to Text::Wrap should not contain embedded newlines. Text::Wrap doesn‘t justify
the lines (flush−right).
How can I access/change the first N letters of a string?
There are many ways. If you just want to grab a copy, use substr():
$first_byte = substr($a, 0, 1);
If you want to modify part of a string, the simplest way is often to use substr() as an lvalue:
substr($a, 0, 3) = "Tom";
Although those with a pattern matching kind of thought process will likely prefer:
$a =~ s/^.../Tom/;
How do I change the Nth occurrence of something?
You have to keep track of N yourself. For example, let‘s say you want to change the fifth occurrence of
"whoever" or "whomever" into "whosoever" or "whomsoever", case insensitively.
$count = 0;
s{((whom?)ever)}{
    ++$count == 5           # is it the 5th?
        ? "${2}soever"      # yes, swap
        : $1                # renege and leave it there
}igex;
In the more general case, you can use the /g modifier in a while loop, keeping count of matches.
$WANT = 3;
$count = 0;
while (/(\w+)\s+fish\b/gi) {
if (++$count == $WANT) {
print "The third fish is a $1 one.\n";
# Warning: don't 'last' out of this loop
}
}
That prints out: "The third fish is a red one." You can also use a repetition count and
repeated pattern like this:
/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
How can I count the number of occurrences of a substring within a string?
There are a number of ways, with varying efficiency: If you want a count of a certain single character (X)
within a string, you can use the tr/// function like so:
$string = "ThisXlineXhasXsomeXx'sXinXit";
$count = ($string =~ tr/X//);
print "There are $count X characters in the string";
This is fine if you are just looking for a single character. However, if you are trying to count multiple
character substrings within a larger string, tr/// won‘t work. What you can do is wrap a while() loop
around a global pattern match. For example, let‘s count negative integers:
$string = "-9 55 48 -2 23 -76 4 14 -44";
while ($string =~ /-\d+/g) { $count++ }
print "There are $count negative numbers in the string";
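A more compact way to get the same count, shown here only as a sketch, is to force the global match into list context and count the resulting list. This is a common Perl idiom rather than something the text above prescribes.

```perl
use strict;

my $string = "-9 55 48 -2 23 -76 4 14 -44";

# The empty list forces list context on the match; assigning that
# list assignment to a scalar yields the number of matches.
my $count = () = $string =~ /-\d+/g;

print "There are $count negative numbers in the string\n";
```

Unlike the while loop, this builds the whole list of matches before counting, so the loop may be kinder to memory on very long strings.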
How do I capitalize all the words on one line?
To make the first letter of each word upper case:
$line =~ s/\b(\w)/\U$1/g;
This has the strange effect of turning "don‘t do it" into "Don‘T Do It". Sometimes you might want
this, instead (Suggested by Brian Foy):
$string =~ s/ (
             (^\w)    #at the beginning of the line
               |      # or
             (\s\w)   #preceded by whitespace
               )
            /\U$1/xg;
$string =~ s/([\w']+)/\u\L$1/g;
To make the whole line upper case:
$line = uc($line);
To force each word to be lower case, with the first letter upper case:
$line =~ s/(\w+)/\u\L$1/g;
You can (and probably should) enable locale awareness of those characters by placing a use locale
pragma in your program. See perllocale for endless details on locales.
How can I split a [character] delimited string except when inside
[character]? (Comma−separated files)
Take the example case of trying to split a string that is comma−separated into its different fields. (We‘ll
pretend you said comma−separated, not comma−delimited, which is different and almost never what you
mean.) You can‘t use split(/,/) because you shouldn‘t split if the comma is inside quotes. For
example, take a data line like this:
SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
Due to the restriction of the quotes, this is a fairly complex problem. Thankfully, we have Jeffrey Friedl,
author of a highly recommended book on regular expressions, to handle these for us. He suggests (assuming
your string is contained in $text):
@new = ();
push(@new, $+) while $text =~ m{
"([^\"\\]*(?:\\.[^\"\\]*)*)",? # groups the phrase inside the quotes
| ([^,]+),?
| ,
}gx;
push(@new, undef) if substr($text,-1,1) eq ',';
If you want to represent quotation marks inside a quotation-mark-delimited field, escape them with
backslashes (eg, "like \"this\""). Unescaping them is a task addressed earlier in this section.
Alternatively, the Text::ParseWords module (part of the standard perl distribution) lets you say:
use Text::ParseWords;
@new = quotewords(",", 0, $text);
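Applied to the sample data line from above, the Text::ParseWords route might look like this sketch:

```perl
use strict;
use Text::ParseWords;

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# A second argument of 0 asks for the quotes to be stripped from
# the returned fields.
my @new = quotewords(",", 0, $text);

print scalar(@new), " fields; third is [$new[2]]\n";
```

The commas inside "Cimetrix, Inc" and "Error, Core Dumped" stay inside their fields, which is exactly what a plain split(/,/) gets wrong.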
How do I strip blank space from the beginning/end of a string?
Although the simplest approach would seem to be:
$string =~ s/^\s*(.*?)\s*$/$1/;
This is unnecessarily slow, destructive, and fails with embedded newlines. It is much faster to do this
in two steps:
$string =~ s/^\s+//;
$string =~ s/\s+$//;
Or more nicely written as:
for ($string) {
s/^\s+//;
s/\s+$//;
}
This idiom takes advantage of the foreach loop's aliasing behavior to factor out common code. You can
do this on several strings at once, or arrays, or even the values of a hash if you use a slice:
# trim whitespace in the scalar, the array,
# and all the values in the hash
foreach ($scalar, @array, @hash{keys %hash}) {
s/^\s+//;
s/\s+$//;
}
How do I extract selected columns from a string?
Use substr() or unpack(), both documented in perlfunc. If you prefer thinking in terms of columns
instead of widths, you can use this kind of thing:
# determine the unpack format needed to split Linux ps output
# arguments are cut columns
my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);
sub cut2fmt {
    my(@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    $template .= "A*";
    return $template;
}
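To see what cut2fmt() produces and how the template is then used, here is a sketch on a short made-up record. The sample line and column positions are assumptions for the example.

```perl
use strict;

sub cut2fmt {
    my(@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    $template .= "A*";
    return $template;
}

# Made-up record: columns 1-3, 4-7, and 8 to the end of the line.
my $line   = "abcdefghij";
my $fmt    = cut2fmt(4, 8);          # "A3 A4 A*"
my @fields = unpack($fmt, $line);    # ("abc", "defg", "hij")

print "$fmt => @fields\n";
```

Each argument is the column where the next field starts, so the widths fall out as differences between neighbouring positions.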
How do I find the soundex value of a string?
Use the standard Text::Soundex module distributed with perl.
How can I expand variables in text strings?
Let‘s assume that you have a string like:
$text = 'this has a $foo in it and a $bar';
If those were both global variables, then this would suffice:
$text =~ s/\$(\w+)/${$1}/g;
But since they are probably lexicals, or at least, they could be, you‘d have to do this:
$text =~ s/(\$\w+)/$1/eeg;
die if $@;                     # needed on /ee, not /e
It‘s probably better in the general case to treat those variables as entries in some special hash. For example:
%user_defs = (
foo => 23,
bar => 19,
);
$text =~ s/\$(\w+)/$user_defs{$1}/g;
See also ‘‘How do I expand function calls in a string?‘’ in this section of the FAQ.
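A slightly more defensive variant of the hash approach, offered as a sketch, leaves unknown names untouched instead of replacing them with an empty string. The extra $baz placeholder is invented for the demonstration.

```perl
use strict;

my %user_defs = (foo => 23, bar => 19);
my $text = 'this has a $foo in it and a $bar, but no $baz';

# Replace $name only when %user_defs knows it; otherwise keep the text.
$text =~ s/\$(\w+)/exists $user_defs{$1} ? $user_defs{$1} : "\$$1"/ge;

print "$text\n";
```

Because the lookup goes through one hash, there is no way for the input text to reach arbitrary program variables.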
What‘s wrong with always quoting "$vars"?
The problem is that those double−quotes force stringification, coercing numbers and references into strings,
even when you don‘t want them to be.
If you get used to writing odd things like these:
print "$var";       # BAD
$new = "$old";      # BAD
somefunc("$var");   # BAD
You‘ll be in trouble. Those should (in 99.8% of the cases) be the simpler and more direct:
print $var;
$new = $old;
somefunc($var);
Otherwise, besides slowing you down, you‘re going to break code when the thing in the scalar is actually
neither a string nor a number, but a reference:
func(\@array);
sub func {
    my $aref = shift;
    my $oref = "$aref";  # WRONG
}
You can also get into subtle problems on those few operations in Perl that actually do care about the
difference between a string and a number, such as the magical ++ autoincrement operator or the
syscall() function.
Stringification also destroys arrays.
@lines = `command`;
print "@lines";     # WRONG - extra blanks
print @lines;       # right
Why don‘t my <op_ppaddr)() ) ;
@@@
TAINT_NOT;
@@@
return 0;
@@@ }
MAIN_INTERPRETER_LOOP
Or with a fixed amount of leading white space, with remaining indentation correctly preserved:
$poem = fix< 1 ? \@intersection : \@difference }, $element;
}
How do I find the first array element for which a condition is true?
You can use this if you care about the index:
for ($i=0; $i < @array; $i++) {
if ($array[$i] eq "Waldo") {
$found_index = $i;
last;
}
}
Now $found_index has what you want.
How do I handle linked lists?
In general, you usually don't need a linked list in Perl, since with regular arrays, you can push and pop or
shift and unshift at either end, or you can use splice to add and/or remove an arbitrary number of elements at
arbitrary points. Both pop and shift are O(1) operations on perl's dynamic arrays. In the absence of
shifts and pops, push in general needs to reallocate on the order of every log(N) times, and unshift will need to
copy pointers each time.
If you really, really wanted, you could use structures as described in perldsc or perltoot and do just what the
algorithm book tells you to do.
How do I handle circular lists?
Circular lists could be handled in the traditional fashion with linked lists, or you could just do something like
this with an array:
unshift(@array, pop(@array));  # the last shall be first
push(@array, shift(@array));   # and vice versa
How do I shuffle an array randomly?
Use this:
# fisher_yates_shuffle( \@array ) :
# generate a random permutation of @array in place
sub fisher_yates_shuffle {
    my $array = shift;
    my $i;
    for ($i = @$array; --$i; ) {
        my $j = int rand ($i+1);
        next if $i == $j;
        @$array[$i,$j] = @$array[$j,$i];
    }
}
fisher_yates_shuffle( \@array );    # permutes @array in place
You've probably seen shuffling algorithms that work using splice, randomly picking another element to
swap the current element with:
srand;
@new = ();
@old = 1 .. 10; # just a demo
while (@old) {
push(@new, splice(@old, rand @old, 1));
}
This is bad because splice is already O(N), and since you do it N times, you just invented a quadratic
algorithm; that is, O(N**2). This does not scale, although Perl is so efficient that you probably won‘t notice
this until you have rather largish arrays.
How do I process/modify each element of an array?
Use for/foreach:
for (@lines) {
    s/foo/bar/;     # change that word
    y/XZ/ZX/;       # swap those letters
}
Here‘s another; let‘s compute spherical volumes:
for (@volumes = @radii) {       # @volumes has changed parts
    $_ **= 3;
    $_ *= (4/3) * 3.14159;      # this will be constant folded
}
If you want to do the same thing to modify the values of the hash, you may not use the values function,
oddly enough. You need a slice:
for $orbit ( @orbits{keys %orbits} ) {
($orbit **= 3) *= (4/3) * 3.14159;
}
How do I select a random element from an array?
Use the rand() function (see rand):
# at the top of the program:
srand;                  # not needed for 5.004 and later
# then later on
$index   = rand @array;
$element = $array[$index];
Make sure you only call srand once per program, if then. If you are calling it more than once (such as before
each call to rand), you‘re almost certainly doing something wrong.
How do I permute N elements of a list?
Here‘s a little program that generates all permutations of all the words on each line of input. The algorithm
embodied in the permute() function should work on any list:
#!/usr/bin/perl -n
# tsc-permute: permute each word of input
permute([split], []);
sub permute {
my @items = @{ $_[0] };
my @perms = @{ $_[1] };
unless (@items) {
print "@perms\n";
} else {
my(@newitems,@newperms,$i);
foreach $i (0 .. $#items) {
@newitems = @items;
@newperms = @perms;
unshift(@newperms, splice(@newitems, $i, 1));
permute([@newitems], [@newperms]);
}
}
}
How do I sort an array by (anything)?
Supply a comparison function to sort() (described in sort):
@list = sort { $a <=> $b } @list;
The default sort function is cmp, string comparison, which would sort (1, 2, 10) into (1, 10, 2).
<=>, used above, is the numerical comparison operator.
If you have a complicated function needed to pull out the part you want to sort on, then don‘t do it inside the
sort function. Pull it out first, because the sort BLOCK can be called many times for the same element.
Here‘s an example of how to pull out the first word after the first number on each item, and then sort those
words case−insensitively.
@idx = ();
for (@data) {
($item) = /\d+\s*(\S+)/;
push @idx, uc($item);
}
@sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
Which could also be written this way, using a trick that‘s come to be known as the Schwartzian Transform:
@sorted = map  { $_->[0] }
          sort { $a->[1] cmp $b->[1] }
          map  { [ $_, uc((/\d+\s*(\S+)/)[0]) ] } @data;
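Read from the bottom up, the transform pairs each item with its precomputed key, sorts on the key, then throws the key away. A runnable sketch on made-up data:

```perl
use strict;

my @data = ("3 pear", "1 Apple", "2 banana");

my @sorted =
    map  { $_->[0] }                          # 3. recover the original item
    sort { $a->[1] cmp $b->[1] }              # 2. compare the cached keys
    map  { [ $_, uc((/\d+\s*(\S+)/)[0]) ] }   # 1. pair item with its key
    @data;

print "@sorted\n";
```

The expensive key extraction runs once per element instead of once per comparison, which is the whole point of the trick.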
If you need to sort on several fields, the following paradigm is useful.
@sorted = sort { field1($a) <=> field1($b) ||
                 field2($a) cmp field2($b) ||
                 field3($a) cmp field3($b)
               } @data;
This can be conveniently combined with precalculation of keys as given above.
See http://www.perl.com/CPAN/doc/FMTEYEWTK/sort.html for more about this approach.
See also the question below on sorting hashes.
How do I manipulate arrays of bits?
Use pack() and unpack(), or else vec() and the bitwise operations.
For example, this sets bit N of $vec for every integer N listed in @ints:
$vec = '';
foreach(@ints) { vec($vec,$_,1) = 1 }
And here‘s how, given a vector in $vec, you can get those bits into your @ints array:
sub bitvec_to_list {
my $vec = shift;
my @ints;
# Find null-byte density then select best algorithm
if ($vec =~ tr/\0// / length $vec > 0.95) {
use integer;
my $i;
# This method is faster with mostly null-bytes
while($vec =~ /[^\0]/g ) {
$i = -9 + 8 * pos $vec;
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
push @ints, $i if vec($vec, ++$i, 1);
}
} else {
# This method is a fast general algorithm
use integer;
my $bits = unpack "b*", $vec;
push @ints, 0 if $bits =~ s/^(\d)// && $1;
push @ints, pos $bits while($bits =~ /1/g);
}
return \@ints;
}
This method gets faster the more sparse the bit vector is. (Courtesy of Tim Bunce and Winfried Koenig.)
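For small vectors, a much simpler (and slower) round trip is often enough. The following sketch sets a few bits and then recovers them by testing every bit position in turn; it is not the optimized routine shown above.

```perl
use strict;

my @ints = (1, 5, 9);

# Set one bit per listed integer.
my $vec = '';
foreach my $n (@ints) { vec($vec, $n, 1) = 1 }

# Straightforward (not the fastest) way back: test every bit in turn.
my @recovered = grep { vec($vec, $_, 1) } 0 .. 8 * length($vec) - 1;

print "@recovered\n";
```

Setting bit 9 silently grows $vec to two bytes, which is why the upper bound is computed from length($vec).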
Why does defined() return true on empty arrays and hashes?
See defined in the 5.004 release or later of Perl.
Data: Hashes (Associative Arrays)
How do I process an entire hash?
Use the each() function (see each) if you don‘t care whether it‘s sorted:
while ( ($key, $value) = each %hash) {
print "$key = $value\n";
}
If you want it sorted, you‘ll have to use foreach() on the result of sorting the keys as shown in an earlier
question.
What happens if I add or remove keys from a hash while iterating over it?
Don‘t do that.
How do I look up a hash element by value?
Create a reverse hash:
%by_value = reverse %by_key;
$key = $by_value{$value};
That‘s not particularly efficient. It would be more space−efficient to use:
while (($key, $value) = each %by_key) {
$by_value{$value} = $key;
}
If your hash could have repeated values, the methods above will only find one of the associated keys. This
may or may not worry you.
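If repeated values do worry you, one option is to keep a list of keys for each value. This is a sketch with invented data and names, not the only way to do it.

```perl
use strict;

my %by_key = (apple => "red", cherry => "red", banana => "yellow");

# Map each value to a list of every key that carries it.
my %keys_for;
while (my ($key, $value) = each %by_key) {
    push @{ $keys_for{$value} }, $key;
}

my @red = sort @{ $keys_for{"red"} };      # ("apple", "cherry")

print "@red\n";
```

Each entry of %keys_for is an array reference, so a lookup by value now returns all matching keys instead of an arbitrary one.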
How can I know how many entries are in a hash?
If you mean how many keys, then all you have to do is take the scalar sense of the keys() function:
$num_keys = scalar keys %hash;
In void context it just resets the iterator, which is faster for tied hashes.
How do I sort a hash (optionally by value instead of key)?
Internally, hashes are stored in a way that prevents you from imposing an order on key−value pairs. Instead,
you have to sort a list of the keys or values:
@keys = sort keys %hash;    # sorted by key
@keys = sort {
    $hash{$a} cmp $hash{$b}
} keys %hash;               # and by value
Here we'll do a reverse numeric sort by value, and if two values are identical, sort by length of key, and if that
fails, by straight ASCII comparison of the keys (well, possibly modified by your locale; see perllocale).
@keys = sort {
$hash{$b} <=> $hash{$a}
||
length($b) <=> length($a)
||
$a cmp $b
} keys %hash;
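To watch the three tie-breakers at work, here is the same comparison run on a small invented hash:

```perl
use strict;

my %hash = (kiwi => 5, apple => 3, fig => 3, pear => 3, plum => 3);

my @keys = sort {
    $hash{$b} <=> $hash{$a}          # larger values first
        ||
    length($b) <=> length($a)        # then longer keys first
        ||
    $a cmp $b                        # then plain string order
} keys %hash;

print "@keys\n";
```

kiwi wins on value; apple beats the other 3s on length; pear and plum tie on both and fall through to the string comparison.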
How can I always keep my hash sorted?
You can look into using the DB_File module and tie() using the $DB_BTREE hash bindings as
documented in In Memory Databases in DB_File. The Tie::IxHash module from CPAN might also be
instructive.
What‘s the difference between "delete" and "undef" with hashes?
Hashes are pairs of scalars: the first is the key, the second is the value. The key will be coerced to a string,
although the value can be any kind of scalar: string, number, or reference. If a key $key is present in the
hash %array, exists($array{$key}) will return true. The value for a given key can be undef, in which case
$array{$key} will be undef while exists($array{$key}) will return true. This corresponds to ($key,
undef) being in the hash.
Pictures help... here‘s the %ary table:
  keys  values
+------+------+
|  a   |  3   |
|  x   |  7   |
|  d   |  0   |
|  e   |  2   |
+------+------+
And these conditions hold
$ary{'a'}                       is true
$ary{'d'}                       is false
defined $ary{'d'}               is true
defined $ary{'a'}               is true
exists $ary{'a'}                is true (perl5 only)
grep ($_ eq 'a', keys %ary)     is true
If you now say
undef $ary{'a'}
your table now reads:
  keys  values
+------+------+
|  a   | undef|
|  x   |  7   |
|  d   |  0   |
|  e   |  2   |
+------+------+
and these conditions now hold; changes in caps:
$ary{'a'}                       is FALSE
$ary{'d'}                       is false
defined $ary{'d'}               is true
defined $ary{'a'}               is FALSE
exists $ary{'a'}                is true (perl5 only)
grep ($_ eq 'a', keys %ary)     is true
Notice the last two: you have an undef value, but a defined key!
Now, consider this:
delete $ary{'a'}
your table now reads:
  keys  values
+------+------+
|  x   |  7   |
|  d   |  0   |
|  e   |  2   |
+------+------+
and these conditions now hold; changes in caps:
$ary{'a'}                       is false
$ary{'d'}                       is false
defined $ary{'d'}               is true
defined $ary{'a'}               is false
exists $ary{'a'}                is FALSE (perl5 only)
grep ($_ eq 'a', keys %ary)     is FALSE
See, the whole entry is gone!
Why don‘t my tied hashes make the defined/exists distinction?
They may or may not implement the EXISTS() and DEFINED() methods differently. For example, there
isn‘t the concept of undef with hashes that are tied to DBM* files. This means the true/false tables above will
give different results when used on such a hash. It also means that exists and defined do the same thing with
a DBM* file, and what they end up doing is not what they do with ordinary hashes.
How do I reset an each() operation part−way through?
Using keys %hash in scalar context returns the number of keys in the hash and resets the iterator
associated with the hash. You may need to do this if you use last to exit a loop early so that when you
re−enter it, the hash iterator has been reset.
How can I get the unique keys from two hashes?
First you extract the keys from the hashes into arrays, and then solve the problem of uniquifying the array,
described above. For example:
%seen = ();
for $element (keys(%foo), keys(%bar)) {
$seen{$element}++;
}
@uniq = keys %seen;
Or more succinctly:
@uniq = keys %{{%foo,%bar}};
Or if you really want to save space:
%seen = ();
while (defined ($key = each %foo)) {
$seen{$key}++;
}
while (defined ($key = each %bar)) {
$seen{$key}++;
}
@uniq = keys %seen;
How can I store a multidimensional array in a DBM file?
Either stringify the structure yourself (no fun), or else get the MLDBM (which uses Data::Dumper) module
from CPAN and layer it on top of either DB_File or GDBM_File.
How can I make my hash remember the order I put elements into it?
Use the Tie::IxHash module from CPAN.
use Tie::IxHash;
tie(%myhash, Tie::IxHash);
for ($i=0; $i<20; $i++) {
    $myhash{$i} = 2*$i;
}
@keys = keys %myhash;
# @keys = (0,1,2,3,...)
Why does passing a subroutine an undefined element in a hash create it?
If you say something like:
somefunc($hash{"nonesuch key here"});
Then that element "autovivifies"; that is, it springs into existence whether you store something there or not.
That‘s because functions get scalars passed in by reference. If somefunc() modifies $_[0], it has to be
ready to write it back into the caller‘s version.
This has been fixed as of perl5.004.
Normally, merely accessing a key‘s value for a nonexistent key does not cause that key to be forever there.
This is different than awk‘s behavior.
How can I make the Perl equivalent of a C structure/C++ class/hash or array of hashes or arrays?
Use references (documented in perlref). Examples of complex data structures are given in perldsc and
perllol. Examples of structures and object−oriented classes are in perltoot.
How can I use a reference as a hash key?
You can't do this directly, but you could use the standard Tie::RefHash module distributed with perl.
Data: Misc
How do I handle binary data correctly?
Perl is binary clean, so this shouldn‘t be a problem. For example, this works fine (assuming the files are
found):
if (`cat /vmunix` =~ /gzip/) {
    print "Your kernel is GNU-zip enabled!\n";
}
On some systems, however, you have to play tedious games with "text" versus "binary" files. See
binmode in perlfunc.
If you‘re concerned about 8−bit ASCII data, then see perllocale.
If you want to deal with multibyte characters, however, there are some gotchas. See the section on Regular
Expressions.
How do I determine whether a scalar is a number/whole/integer/float?
Assuming that you don‘t care about IEEE notations like "NaN" or "Infinity", you probably just want to use a
regular expression.
warn "has nondigits"        if     /\D/;
warn "not a natural number" unless /^\d+$/;             # rejects -3
warn "not an integer"       unless /^-?\d+$/;           # rejects +3
warn "not an integer"       unless /^[+-]?\d+$/;
warn "not a decimal number" unless /^-?\d+\.?\d*$/;     # rejects .2
warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
warn "not a C float"
    unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
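These patterns can be strung together into a small classifier, tried from the strictest to the most permissive. This is only a sketch; the function name and the labels it returns are invented for the illustration.

```perl
use strict;

# Classify a string using the patterns above; returns a short label.
# The function name and labels are made up for this illustration.
sub classify_number {
    my $str = shift;
    return "natural" if $str =~ /^\d+$/;
    return "integer" if $str =~ /^[+-]?\d+$/;
    return "decimal" if $str =~ /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
    return "C float"
        if $str =~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
    return "not a number";
}

for my $candidate ("42", "-3", ".2", "1e10", "abc") {
    print "$candidate: ", classify_number($candidate), "\n";
}
```

The order of the tests matters: every natural number is also an integer, a decimal, and a C float, so the narrowest pattern has to be tried first.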
If you're on a POSIX system, Perl supports the POSIX::strtod function. Its semantics are somewhat
cumbersome, so here's a getnum wrapper function for more convenient access. This function takes a string
and returns the number it found, or undef for input that isn't a C float. The is_numeric function is a
front end to getnum if you just want to say, "Is this a float?"
sub getnum {
    use POSIX qw(strtod);
    my $str = shift;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    $! = 0;
    my($num, $unparsed) = strtod($str);
    if (($str eq '') || ($unparsed != 0) || $!) {
        return undef;
    } else {
        return $num;
    }
}
sub is_numeric { defined getnum($_[0]) }
Or you could check out http://www.perl.com/CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz
instead. The POSIX module (part of the standard Perl distribution) provides the strtod and strtol functions for
converting strings to doubles and longs, respectively.
How do I keep persistent data across program calls?
For some specific applications, you can use one of the DBM modules. See AnyDBM_File. More generically,
you should consult the FreezeThaw, Storable, or Class::Eroot modules from CPAN.
How do I print out or copy a recursive data structure?
The Data::Dumper module on CPAN is nice for printing out data structures, and FreezeThaw for copying
them. For example:
use FreezeThaw qw(freeze thaw);
$new = thaw freeze $old;
Where $old can be (a reference to) any kind of data structure you‘d like. It will be deeply copied.
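If FreezeThaw isn't at hand, the Storable module mentioned in the previous answer offers a comparable deep copy through its dclone() function. The sketch below swaps in Storable for FreezeThaw and assumes it is installed; the sample structure is made up.

```perl
use strict;
use Storable qw(dclone);

my $old = { name => "config", list => [ 1, 2, [ 3, 4 ] ] };
my $new = dclone($old);

# Changing the copy leaves the original untouched.
$new->{list}[2][0] = 99;

print "$old->{list}[2][0] $new->{list}[2][0]\n";
```

A plain assignment such as $new = $old would copy only the reference, so both names would still point at the same nested data.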
How do I define methods for every class/object?
Use the UNIVERSAL class (see UNIVERSAL).
How do I verify a credit card checksum?
Get the Business::CreditCard module from CPAN.
AUTHOR AND COPYRIGHT
Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. All rights reserved.
When included as part of the Standard Version of Perl, or as part of its complete documentation whether
printed or otherwise, this work may be distributed only under the terms of Perl‘s Artistic License. Any
distribution of this file or derivatives thereof outside of that package requires that special arrangements be
made with the copyright holder.
Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You
are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A
simple comment in the code giving credit would be courteous but is not required.
perlfaq5
NAME
perlfaq5 − Files and Formats ($Revision: 1.24 $, $Date: 1998/07/05 15:07:20 $)
DESCRIPTION
This section deals with I/O and the "f" issues: filehandles, flushing, formats, and footers.
How do I flush/unbuffer an output filehandle? Why must I do this?
The C standard I/O library (stdio) normally buffers characters sent to devices. This is done for efficiency
reasons, so that there isn‘t a system call for each byte. Any time you use print() or write() in Perl,
you go through this buffering. syswrite() circumvents stdio and buffering.
In most stdio implementations, the type of output buffering and the size of the buffer vary according to the
type of device. Disk files are block buffered, often with a buffer size of more than 2k. Pipes and sockets are
often buffered with a buffer size between 1/2 and 2k. Serial devices (e.g. modems, terminals) are normally
line−buffered, and stdio sends the entire line when it gets the newline.
Perl does not support truly unbuffered output (except insofar as you can syswrite(OUT, $char, 1)).
What it does instead support is "command buffering", in which a physical write is performed after every
output command. This isn‘t as hard on your system as unbuffering, but does get the output where you want
it when you want it.
If you expect characters to get to your device when you print them there, you‘ll want to autoflush its handle.
Use select() and the $| variable to control autoflushing (see $| and select):
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);
Or using the traditional idiom:
select((select(OUTPUT_HANDLE), $| = 1)[0]);
Or, if you don't mind slowly loading several thousand lines of module code just because you're afraid of the $|
variable:
use FileHandle;
open(DEV, "+</dev/tty");      # ceci n'est pas une pipe
DEV->autoflush(1);
or the newer IO::* modules:
use IO::Handle;
open(DEV, ">/dev/printer");   # but is this?
DEV->autoflush(1);
or even this:
use IO::Socket;               # this one is kinda a pipe?
$sock = IO::Socket::INET->new(PeerAddr => 'www.perl.com',
                              PeerPort => 'http(80)',
                              Proto    => 'tcp');
die "$!" unless $sock;
$sock->autoflush();
print $sock "GET / HTTP/1.0" . "\015\012" x 2;
$document = join('', <$sock>);
print "DOC IS: $document\n";
Note the bizarrely hardcoded carriage return and newline in their octal equivalents. This is the ONLY way
(currently) to assure a proper flush on all platforms, including Macintosh. That's the way things work in
network programming: you really should specify the exact bit pattern of the network line terminator. In
practice, "\n\n" often works, but this is not portable.
See perlfaq9 for other examples of fetching URLs over the web.
How do I change one line in a file/delete a line in a file/insert a line in the middle of a file/append to
the beginning of a file?
Although humans have an easy time thinking of a text file as being a sequence of lines that operates much
like a stack of playing cards — or punch cards — computers usually see the text file as a sequence of bytes.
In general, there‘s no direct way for Perl to seek to a particular line of a file, insert text into a file, or remove
text from a file.
(There are exceptions in special circumstances. You can add or remove at the very end of the file. Another
is replacing a sequence of bytes with another sequence of the same length. Another is using the
$DB_RECNO array bindings as documented in DB_File. Yet another is manipulating files with all lines the
same length.)
The general solution is to create a temporary copy of the text file with the changes you want, then copy that
over the original. This assumes no locking.
$old = $file;
$new = "$file.tmp.$$";
$bak = "$file.bak";
open(OLD, "< $old")         or die "can't open $old: $!";
open(NEW, "> $new")         or die "can't open $new: $!";
# Correct typos, preserving case
while (<OLD>) {
    s/\b(p)earl\b/${1}erl/i;
    (print NEW $_)          or die "can't write to $new: $!";
}
close(OLD)                  or die "can't close $old: $!";
close(NEW)                  or die "can't close $new: $!";
rename($old, $bak)          or die "can't rename $old to $bak: $!";
rename($new, $old)          or die "can't rename $new to $old: $!";
Perl can do this sort of thing for you automatically with the −i command−line switch or the closely−related
$^I variable (see perlrun for more details). Note that −i may require a suffix on some non−Unix systems;
see the platform−specific documentation that came with your port.
# Renumber a series of tests from the command line
perl -pi -e 's/(^\s+test\s+)\d+/ $1 . ++$count /e' t/op/taint.t
# from a script
local($^I, @ARGV) = ('.bak', glob("*.c"));
while (<>) {
    if ($. == 1) {
        print "This line should appear at the top of each file\n";
    }
    s/\b(p)earl\b/${1}erl/i;        # Correct typos, preserving case
    print;
    close ARGV if eof;              # Reset $.
}
If you need to seek to an arbitrary line of a file that changes infrequently, you could build up an index of byte
positions of where the line ends are in the file. If the file is large, an index of every tenth or hundredth line
end would allow you to seek and read fairly efficiently. If the file is sorted, try the look.pl library (part of the
standard perl distribution).
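The indexing idea above can be sketched roughly as follows. This is only an illustration, not part of the standard distribution: the subroutine names build_line_index and fetch_line and the $step parameter are made up for this example. The index records the byte offset at which every $step-th line starts, so a lookup needs one seek plus at most $step reads.

```perl
use strict;
use IO::File;

# Record the byte offset at which lines 1, $step+1, 2*$step+1, ... start.
# (Hypothetical helper; the name and interface are assumptions.)
sub build_line_index {
    my ($fh, $step) = @_;
    my @index = (0);                    # line 1 starts at offset 0
    seek($fh, 0, 0) or die "can't rewind: $!";
    my $n = 0;
    local $_;
    while (<$fh>) {
        $n++;
        # after reading line $n, tell() is where line $n+1 begins
        push @index, tell($fh) if $n % $step == 0;
    }
    return \@index;
}

# Return line number $lineno (1-based), or undef if the file is too short.
sub fetch_line {
    my ($fh, $index, $step, $lineno) = @_;
    my $slot = int(($lineno - 1) / $step);
    return undef if $slot > $#$index;
    seek($fh, $index->[$slot], 0) or die "can't seek: $!";
    my $line;
    for (1 .. ($lineno - 1) % $step + 1) {
        $line = <$fh>;
        return undef unless defined $line;
    }
    return $line;
}

# Usage: a 25-line scratch file indexed at every tenth line.
my $fh = IO::File->new_tmpfile() or die "no temp file: $!";
print $fh "line $_\n" for 1 .. 25;
my $idx = build_line_index($fh, 10);
print fetch_line($fh, $idx, 10, 17);    # prints "line 17"
```

A smaller $step trades memory for fewer reads per lookup; the index must be rebuilt whenever the file changes.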
In the unique case of deleting lines at the end of a file, you can use tell() and truncate(). The
following code snippet deletes the last line of a file without making a copy or reading the whole file into
memory:
open (FH, "+< $file");
while ( <FH> ) { $addr = tell(FH) unless eof(FH) }
truncate(FH, $addr);
Error checking is left as an exercise for the reader.
How do I count the number of lines in a file?
One fairly efficient way is to count newlines in the file. The following program uses a feature of tr///, as
documented in perlop. If your text file doesn‘t end with a newline, then it‘s not really a proper text file, so
this may report one fewer line than you expect.
$lines = 0;
open(FILE, $filename) or die "Can't open '$filename': $!";
while (sysread FILE, $buffer, 4096) {
$lines += ($buffer =~ tr/\n//);
}
close FILE;
This assumes no funny games with newline translations.
How do I make a temporary file name?
Use the new_tmpfile class method from the IO::File module to get a filehandle opened for reading and
writing. Use this if you don‘t need to know the file‘s name.
use IO::File;
$fh = IO::File->new_tmpfile()
    or die "Unable to make new temporary file: $!";
Or you can use the tmpnam function from the POSIX module to get a filename that you then open yourself.
Use this if you do need to know the file‘s name.
use Fcntl;
use POSIX qw(tmpnam);
# try new temporary filenames until we get one that didn't already
# exist; the check should be unnecessary, but you can't be too careful
do { $name = tmpnam() }
    until sysopen(FH, $name, O_RDWR|O_CREAT|O_EXCL);
# install atexit-style handler so that when we exit or die,
# we automatically delete this temporary file
END { unlink($name) or die "Couldn't unlink $name : $!" }
# now go on to use the file ...
If you‘re committed to doing this by hand, use the process ID and/or the current time−value. If you need to
have many temporary files in one process, use a counter:
BEGIN {
    use Fcntl;
    my $temp_dir = -d '/tmp' ? '/tmp' : $ENV{TMP} || $ENV{TEMP};
    my $base_name = sprintf("%s/%d-%d-0000", $temp_dir, $$, time());
    sub temp_file {
        local *FH;
        my $count = 0;
        until (defined(fileno(FH)) || $count++ > 100) {
            $base_name =~ s/-(\d+)$/"-" . (1 + $1)/e;
            sysopen(FH, $base_name, O_WRONLY|O_EXCL|O_CREAT);
        }
        if (defined(fileno(FH))) {
            return (*FH, $base_name);
        } else {
            return ();
        }
    }
}
How can I manipulate fixed−record−length files?
The most efficient way is using pack() and unpack(). This is faster than using substr() when you take
many, many strings. It is slower for just a few.
Here is a sample chunk of code to break up and put back together again some fixed−format input lines, in
this case from the output of a normal, Berkeley−style ps:
# sample input line:
#   15158 p5  T      0:00 perl /home/tchrist/scripts/now-what
$PS_T = 'A6 A4 A7 A5 A*';
open(PS, "ps|");
print scalar <PS>;
while (<PS>) {
    ($pid, $tt, $stat, $time, $command) = unpack($PS_T, $_);
    for $var (qw!pid tt stat time command!) {
        print "$var: <$$var>\n";
    }
    print 'line=', pack($PS_T, $pid, $tt, $stat, $time, $command),
        "\n";
}
We've used $$var in a way that is forbidden by use strict 'refs'. That is, we've promoted a string to
a scalar variable reference using symbolic references. This is ok in small programs, but doesn't scale well.
It also only works on global variables, not lexicals.
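One way to sidestep the symbolic references is to unpack into a hash slice instead of separate scalars. The following is a minimal sketch under that assumption; the %rec and @fields names are invented for the example, and the input line is built with pack() so that its column layout is guaranteed to match the template.

```perl
use strict;

my $PS_T   = 'A6 A4 A7 A5 A*';
my @fields = qw(pid tt stat time command);

# Build a sample record with the same template so the columns line up.
my $line = pack($PS_T, 15158, 'p5', 'T', '0:00',
                'perl /home/tchrist/scripts/now-what');

# Unpack straight into a hash slice: no symbolic references needed,
# and it works with lexical variables under "use strict".
my %rec;
@rec{@fields} = unpack($PS_T, $line);

for my $name (@fields) {
    print "$name: <$rec{$name}>\n";
}
print 'line=', pack($PS_T, @rec{@fields}), "\n";
```

Because the field names live in one list, adding a column means changing the template and @fields only.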
How can I make a filehandle local to a subroutine? How do I pass filehandles between
subroutines? How do I make an array of filehandles?
The fastest, simplest, and most direct way is to localize the typeglob of the filehandle in question:
local *TmpHandle;
Typeglobs are fast (especially compared with the alternatives) and reasonably easy to use, but they also have
one subtle drawback. If you had, for example, a function named TmpHandle(), or a variable named
%TmpHandle, you just hid it from yourself.
sub findme {
    local *HostFile;
    open(HostFile, "</etc/hosts") or die "no /etc/hosts: $!";
    local $_;               # <- VERY IMPORTANT
    while (<HostFile>) {
        print if /\b127\.(0\.0\.)?1\b/;
    }
    # *HostFile automatically closes/disappears here
}
Here‘s how to use this in a loop to open and store a bunch of filehandles. We‘ll use as values of the hash an
ordered pair to make it easy to sort the hash in insertion order.
@names = qw(motd termcap passwd hosts);
my $i = 0;
foreach $filename (@names) {
local *FH;
open(FH, "/etc/$filename") || die "$filename: $!";
$file{$filename} = [ $i++, *FH ];
}
# Using the filehandles in the array
foreach $name (sort { $file{$a}[0] <=> $file{$b}[0] } keys %file) {
my $fh = $file{$name}[1];
my $line = <$fh>;
print "$name $. $line";
}
For passing filehandles to functions, the easiest way is to prefix them with a star, as in func(*STDIN). See
Passing Filehandles in perlfaq7 for details.
If you want to create many, anonymous handles, you should check out the Symbol, FileHandle, or
IO::Handle (etc.) modules. Here‘s the equivalent code with Symbol::gensym, which is reasonably
light−weight:
foreach $filename (@names) {
use Symbol;
my $fh = gensym();
open($fh, "/etc/$filename") || die "open /etc/$filename: $!";
$file{$filename} = [ $i++, $fh ];
}
Or here using the semi−object−oriented FileHandle, which certainly isn‘t light−weight:
use FileHandle;
foreach $filename (@names) {
my $fh = FileHandle−>new("/etc/$filename") or die "$filename: $!";
$file{$filename} = [ $i++, $fh ];
}
Please understand that whether the filehandle happens to be a (probably localized) typeglob or an anonymous
handle from one of the modules, in no way affects the bizarre rules for managing indirect handles. See the
next question.
How can I use a filehandle indirectly?
An indirect filehandle is using something other than a symbol in a place that a filehandle is expected. Here
are ways to get those:
$fh =   SOME_FH;       # bareword is strict-subs hostile
$fh =  "SOME_FH";      # strict-refs hostile; same package only
$fh =  *SOME_FH;       # typeglob
$fh = \*SOME_FH;       # ref to typeglob (bless-able)
$fh =  *SOME_FH{IO};   # blessed IO::Handle from *SOME_FH typeglob
Or, you can use the new method from the FileHandle or IO modules to create an anonymous filehandle, store that
in a scalar variable, and use it as though it were a normal filehandle.
use FileHandle;
$fh = FileHandle->new();
use IO::Handle;                     # 5.004 or higher
$fh = IO::Handle->new();
Then use any of those as you would a normal filehandle. Anywhere that Perl is expecting a filehandle, an
indirect filehandle may be used instead. An indirect filehandle is just a scalar variable that contains a
filehandle. Functions like print, open, and seek, as well as the diamond operator, will accept
either a real filehandle or a scalar variable containing one:
($ifh, $ofh, $efh) = (*STDIN, *STDOUT, *STDERR);
print $ofh "Type it: ";
$got = <$ifh>;
print $efh "What was that: $got";
If you're passing a filehandle to a function, you can write the function in two ways:
sub accept_fh {
my $fh = shift;
print $fh "Sending to indirect filehandle\n";
}
Or it can localize a typeglob and use the filehandle directly:
sub accept_fh {
local *FH = shift;
print FH "Sending to localized filehandle\n";
}
Both styles work with either objects or typeglobs of real filehandles. (They might also work with strings
under some circumstances, but this is risky.)
accept_fh(*STDOUT);
accept_fh($handle);
In the examples above, we assigned the filehandle to a scalar variable before using it. That is because only
simple scalar variables, not expressions or subscripts into hashes or arrays, can be used with built−ins like
print, printf, or the diamond operator. These are illegal and won‘t even compile:
@fd = (*STDIN, *STDOUT, *STDERR);
print $fd[1] "Type it: ";                           # WRONG
$got = <$fd[0]>                                     # WRONG
print $fd[2] "What was that: $got";                 # WRONG
With print and printf, you get around this by using a block and an expression where you would place
the filehandle:
print { $fd[1] } "funny stuff\n";
printf { $fd[1] } "Pity the poor %x.\n", 3_735_928_559;
# Pity the poor deadbeef.
That block is a proper block like any other, so you can put more complicated code there. This sends the
message out to one of two places:
$ok = -x "/bin/cat";
print { $ok ? $fd[1] : $fd[2] } "cat stat $ok\n";
print { $fd[ 1+ ($ok || 0) ] } "cat stat $ok\n";
This approach of treating print and printf like object method calls doesn't work for the diamond
operator. That's because it's a real operator, not just a function with a comma-less argument. Assuming
you've been storing typeglobs in your structure as we did above, you can use the built-in function named
readline to read a record just as <> does. Given the initialization shown above for @fd, this would
work, but only because readline() requires a typeglob. It doesn't work with objects or strings, which
might be a bug we haven't fixed yet.
$got = readline($fd[0]);
Let it be noted that the flakiness of indirect filehandles is not related to whether they‘re strings, typeglobs,
objects, or anything else. It‘s the syntax of the fundamental operators. Playing the object game doesn‘t help
you at all here.
How can I set up a footer format to be used with write()?
There‘s no builtin way to do this, but perlform has a couple of techniques to make it possible for the intrepid
hacker.
How can I write() into a string?
See perlform for an swrite() function.
How can I output my numbers with commas added?
This one will do it for you:
sub commify {
local $_ = shift;
    1 while s/^(-?\d+)(\d{3})/$1,$2/;
return $_;
}
$n = 23659019423.2331;
print "GOT: ", commify($n), "\n";
GOT: 23,659,019,423.2331
You can‘t just:
s/^(-?\d+)(\d{3})/$1,$2/g;
because you have to put the comma in and then recalculate your position.
Alternatively, this commifies all numbers in a line regardless of whether they have decimal portions, are
preceded by + or −, or whatever:
# from Andrew Johnson
sub commify {
my $input = shift;
$input = reverse $input;
$input =~ s<(\d\d\d)(?=\d)(?!\d*\.)><$1,>g;
    return scalar reverse $input;   # scalar: reverse the string, even when called in list context
}
How can I translate tildes (~) in a filename?
Use the <> (glob()) operator, documented in perlfunc. This requires that you have a shell installed that
groks tildes, meaning csh or tcsh or (some versions of) ksh, and thus may have portability problems. The
Glob::KGlob module (available from CPAN) gives more portable glob functionality.
Within Perl, you may use this directly:
$filename =~ s{
    ^ ~             # find a leading tilde
    (               # save this in $1
        [^/]        # a non-slash character
            *       # repeated 0 or more times (0 means me)
    )
}{
    $1
        ? (getpwnam($1))[7]
        : ( $ENV{HOME} || $ENV{LOGDIR} )
}ex;
How come when I open a file read−write it wipes it out?
Because you‘re using something like this, which truncates the file and then gives you read−write access:
open(FH, "+> /path/name");      # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file doesn't exist:
open(FH, "+< /path/name");      # open for update
Using ">" always clobbers or creates. Using "<" never does either. The "+" doesn't change this.
Here are examples of many kinds of file opens. Those using sysopen() all assume
use Fcntl;
To open file for reading:
open(FH, "< $path")                                     || die $!;
sysopen(FH, $path, O_RDONLY)                            || die $!;
To open file for writing, create new file if needed or else truncate old file:
open(FH, "> $path")                                     || die $!;
sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT)            || die $!;
sysopen(FH, $path, O_WRONLY|O_TRUNC|O_CREAT, 0666)      || die $!;
To open file for writing, create new file, file must not exist:
sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT)             || die $!;
sysopen(FH, $path, O_WRONLY|O_EXCL|O_CREAT, 0666)       || die $!;
To open file for appending, create if necessary:
open(FH, ">> $path")                                    || die $!;
sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT)           || die $!;
sysopen(FH, $path, O_WRONLY|O_APPEND|O_CREAT, 0666)     || die $!;
To open file for appending, file must exist:
sysopen(FH, $path, O_WRONLY|O_APPEND)                   || die $!;
To open file for update, file must exist:
open(FH, "+< $path")                                    || die $!;
sysopen(FH, $path, O_RDWR)                              || die $!;
To open file for update, create file if necessary:
sysopen(FH, $path, O_RDWR|O_CREAT)                      || die $!;
sysopen(FH, $path, O_RDWR|O_CREAT, 0666)                || die $!;
To open file for update, file must not exist:
sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT)               || die $!;
sysopen(FH, $path, O_RDWR|O_EXCL|O_CREAT, 0666)         || die $!;
To open a file without blocking, creating if necessary:
sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT)
    or die "can't open /tmp/somefile: $!";
Be warned that neither creation nor deletion of files is guaranteed to be an atomic operation over NFS. That
is, two processes might both successfully create or unlink the same file! Therefore O_EXCL isn't so exclusive
as you might wish.
Why do I sometimes get an "Argument list too long" when I use <*>?
The <> operator performs a globbing operation (see above). By default glob() forks csh(1) to do the actual
glob expansion, but csh can‘t handle more than 127 items and so gives the error message Argument list
too long. People who installed tcsh as csh won‘t have this problem, but their users may be surprised by
it.
To get around this, either do the glob yourself with Dirhandles and patterns, or use a module like
Glob::KGlob, one that doesn‘t use the shell to do globbing.
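Doing the glob yourself with a dirhandle and a pattern might look something like the sketch below. The match_files name is hypothetical, the pattern is an ordinary regular expression rather than a shell wildcard, and qr// assumes perl 5.005 or later. No shell is involved, so the csh argument limit never comes into play.

```perl
use strict;

# Return the entries of $dir whose names match $pattern, skipping
# dotfiles the way a shell glob would. (Illustrative helper only.)
sub match_files {
    my ($dir, $pattern) = @_;
    local *DIR;
    opendir(DIR, $dir) or die "can't opendir $dir: $!";
    my @names = sort grep { !/^\./ && /$pattern/ } readdir(DIR);
    closedir(DIR);
    return map { "$dir/$_" } @names;
}

# Rough equivalent of <*.c> in the current directory
my @c_files = match_files(".", qr/\.c$/);
print "$_\n" for @c_files;
```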
Is there a leak/bug in glob()?
Due to the current implementation on some operating systems, when you use the glob() function or its
angle−bracket alias in a scalar context, you may cause a leak and/or unpredictable behavior. It‘s best
therefore to use glob() only in list context.
How can I open a file with a leading ">" or trailing blanks?
Normally perl ignores trailing blanks in filenames, and interprets certain leading characters (or a trailing "|")
to mean something special. To avoid this, you might want to use a routine like this. It makes incomplete
pathnames into explicit relative ones, and tacks a trailing null byte on the name to make perl leave it alone:
sub safe_filename {
    local $_ = shift;
    return m#^/#
        ? "$_\0"
        : "./$_\0";
}
$fn = safe_filename($fn);
open(FH, "< $fn") or die "couldn't open $fn: $!";
You could also use the sysopen() function (see sysopen).
How can I reliably rename a file?
Well, usually you just use Perl‘s rename() function. But that may not work everywhere, in particular,
renaming files across file systems. If your operating system supports a mv(1) program or its moral
equivalent, this works:
rename($old, $new) or system("mv", $old, $new);
It may be more compelling to use the File::Copy module instead. You just copy the file to the new
name (checking return values), then delete the old one. This isn't really the same semantics as a real
rename(), though, which preserves metainformation like permissions, timestamps, inode info, etc.
Newer versions of File::Copy export a move() function.
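As a rough sketch of the File::Copy route, assuming a File::Copy recent enough to provide move(): the relocate wrapper name below is made up for illustration. move() renames the file when it can and otherwise copies it and removes the original, so the caveat about lost metainformation applies in the fallback case.

```perl
use strict;
use File::Copy qw(move);

# Move $old to $new, dying with the system error on failure.
# (Hypothetical wrapper; only move() comes from File::Copy.)
sub relocate {
    my ($old, $new) = @_;
    move($old, $new) or die "can't move $old to $new: $!";
    return $new;
}
```

A call such as relocate("report.tmp", "/var/spool/report") would then work whether or not both paths are on the same filesystem; both path names here are placeholders.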
How can I lock a file?
Perl‘s builtin flock() function (see perlfunc for details) will call flock(2) if that exists, fcntl(2) if it doesn‘t
(on perl version 5.004 and later), and lockf(3) if neither of the two previous system calls exists. On some
systems, it may even use a different form of native locking. Here are some gotchas with Perl‘s flock():
1. Produces a fatal error if none of the three system calls (or their close equivalent) exists.
2. lockf(3) does not provide shared locking, and requires that the filehandle be open for writing (or
appending, or read/writing).
3. Some versions of flock() can't lock files over a network (e.g. on NFS file systems), so you'd need
to force the use of fcntl(2) when you build Perl. See the flock entry of perlfunc, and the INSTALL file
in the source distribution for information on building Perl to do this.
Why can't I just open(FH, ">file.lock")?
A common bit of code NOT TO USE is this:
sleep(3) while -e "file.lock";      # PLEASE DO NOT USE
open(LCK, "> file.lock");           # THIS BROKEN CODE
This is a classic race condition: you take two steps to do something which must be done in one. That‘s why
computer hardware provides an atomic test−and−set instruction. In theory, this "ought" to work:
sysopen(FH, "file.lock", O_WRONLY|O_EXCL|O_CREAT)
    or die "can't open file.lock: $!";
except that lamentably, file creation (and deletion) is not atomic over NFS, so this won't work (at least, not
every time) over the net. Various schemes involving link() have been suggested, but these tend
to involve busy-wait, which is also subdesirable.
I still don‘t get locking. I just want to increment the number in the file. How can I do this?
Didn‘t anyone ever tell you web−page hit counters were useless? They don‘t count number of hits, they‘re a
waste of time, and they serve only to stroke the writer‘s vanity. Better to pick a random number. It‘s more
realistic.
Anyway, this is what you can do if you can‘t help yourself.
use Fcntl;
sysopen(FH, "numfile", O_RDWR|O_CREAT)      or die "can't open numfile: $!";
flock(FH, 2)                                or die "can't flock numfile: $!";
$num = <FH> || 0;
seek(FH, 0, 0)                              or die "can't rewind numfile: $!";
truncate(FH, 0)                             or die "can't truncate numfile: $!";
(print FH $num+1, "\n")                     or die "can't write numfile: $!";
# DO NOT UNLOCK THIS UNTIL YOU CLOSE
close FH                                    or die "can't close numfile: $!";
Here‘s a much better web−page hit counter:
$hits = int( (time() - 850_000_000) / rand(1_000) );
If the count doesn‘t impress your friends, then the code might. :−)
How do I randomly update a binary file?
If you‘re just trying to patch a binary, in many cases something as simple as this works:
perl -i -pe 's{window manager}{window mangler}g' /usr/bin/emacs
However, if you have fixed sized records, then you might do something more like this:
$RECSIZE = 220; # size of record, in bytes
$recno   = 37;  # which record to update
open(FH, "+mtime);
print "file $file updated at $date_string\n";
Error checking is left as an exercise for the reader.
How do I set a file‘s timestamp in perl?
You use the utime() function documented in utime. By way of example, here‘s a little program that copies
the read and write times from its first argument to all the rest of them.
if (@ARGV < 2) {
die "usage: cptimes timestamp_file other_files ...\n";
}
$timestamp = shift;
($atime, $mtime) = (stat($timestamp))[8,9];
utime $atime, $mtime, @ARGV;
Error checking is left as an exercise for the reader.
Note that utime() currently doesn‘t work correctly with Win95/NT ports. A bug has been reported.
Check it carefully before using it on those platforms.
How do I print to more than one file at once?
If you only have to do this once, you can do this:
for $fh (FH1, FH2, FH3) { print $fh "whatever\n" }
To connect one filehandle to several output filehandles, it's easiest to use the tee(1) program if you
have it, and let it take care of the multiplexing:
open (FH, "| tee file1 file2 file3");
Or even:
# make STDOUT go to three files, plus original STDOUT
open (STDOUT, "| tee file1 file2 file3") or die "Teeing off: $!\n";
print "whatever\n"                       or die "Writing: $!\n";
close(STDOUT)                            or die "Closing: $!\n";
Otherwise you‘ll have to write your own multiplexing print function — or your own tee program — or use
Tom Christiansen‘s, at http://www.perl.com/CPAN/authors/id/TOMC/scripts/tct.gz, which is written in Perl
and offers much greater functionality than the stock version.
How can I read in a file by paragraphs?
Use the $/ variable (see perlvar for details). You can either set it to "" to eliminate empty paragraphs
("abc\n\n\n\ndef", for instance, gets treated as two paragraphs and not three), or "\n\n" to accept
empty paragraphs.
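The difference between the two settings can be seen with a small sketch like this one. The read_records name is invented for the example, and the scratch file from IO::File stands in for a real input file.

```perl
use strict;
use IO::File;

# Read every record from $fh using $sep as the input record separator.
# (Illustrative helper; the name is an assumption.)
sub read_records {
    my ($fh, $sep) = @_;
    local $/ = $sep;        # restored automatically when the sub returns
    my @records = <$fh>;
    return @records;
}

my $fh = IO::File->new_tmpfile() or die "no temp file: $!";
print $fh "abc\n\n\n\ndef\n";

seek($fh, 0, 0) or die "can't rewind: $!";
my @collapsed = read_records($fh, "");        # paragraph mode

seek($fh, 0, 0) or die "can't rewind: $!";
my @literal = read_records($fh, "\n\n");      # empty paragraphs kept

# two paragraphs in paragraph mode, three with a literal "\n\n"
printf "%d with \"\", %d with \"\\n\\n\"\n",
    scalar(@collapsed), scalar(@literal);
```

Using local keeps the change to $/ from leaking into the rest of the program.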
How can I read a single character from a file? From the keyboard?
You can use the builtin getc() function for most filehandles, but it won‘t (easily) work on a terminal
device. For STDIN, either use the Term::ReadKey module from CPAN, or use the sample code in getc.
If your system supports POSIX, you can use the following code, which you‘ll note turns off echo processing
as well.
#!/usr/bin/perl -w
use strict;
$| = 1;
for (1..4) {
    my $got;
    print "gimme: ";
    $got = getone();
    print "--> $got\n";
}
exit;
BEGIN {
    use POSIX qw(:termios_h);
    my ($term, $oterm, $echo, $noecho, $fd_stdin);
    $fd_stdin = fileno(STDIN);
    $term     = POSIX::Termios->new();
    $term->getattr($fd_stdin);
    $oterm    = $term->getlflag();
    $echo     = ECHO | ECHOK | ICANON;
    $noecho   = $oterm & ~$echo;
    sub cbreak {
        $term->setlflag($noecho);
        $term->setcc(VTIME, 1);
        $term->setattr($fd_stdin, TCSANOW);
    }
    sub cooked {
        $term->setlflag($oterm);
        $term->setcc(VTIME, 0);
        $term->setattr($fd_stdin, TCSANOW);
    }
    sub getone {
        my $key = '';
        cbreak();
        sysread(STDIN, $key, 1);
        cooked();
        return $key;
    }
}
END { cooked() }
The Term::ReadKey module from CPAN may be easier to use:
use Term::ReadKey;
open(TTY, " fionread.c
#include <sys/ioctl.h>
main() {
    printf("%#08x\n", FIONREAD);
}
^D
% cc -o fionread fionread.c
% ./fionread
0x4004667f
And then hard-code it, leaving porting as an exercise to your successor.
$FIONREAD = 0x4004667f;         # XXX: opsys dependent
$size = pack("L", 0);
ioctl(FH, $FIONREAD, $size)     or die "Couldn't call ioctl: $!\n";
$size = unpack("L", $size);
FIONREAD requires a filehandle connected to a stream, meaning sockets, pipes, and tty devices work, but
not files.
How do I do a tail −f in perl?
First try
seek(GWFILE, 0, 1);
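The statement seek(GWFILE, 0, 1) doesn't change the current position, but it does clear the
end-of-file condition on the handle, so that the next <GWFILE> makes Perl try again to read something.
If that doesn't work (it relies on features of your stdio implementation), then you need something more
like this:
for (;;) {
    for ($curpos = tell(GWFILE); <GWFILE>; $curpos = tell(GWFILE)) {
        # search for some stuff and put it into files
    }
    # sleep for a while
    seek(GWFILE, $curpos, 0);  # seek to where we had been
}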
If this still doesn‘t work, look into the POSIX module. POSIX defines the clearerr() method, which
can remove the end of file condition on a filehandle. The method: read until end of file, clearerr(), read
some more. Lather, rinse, repeat.
How do I dup() a filehandle in Perl?
If you check open, you‘ll see that several of the ways to call open() should do the trick. For example:
open(LOG, ">>/tmp/logfile");
open(STDERR, ">&LOG");
Or even with a literal numeric descriptor:
$fd = $ENV{MHCONTEXTFD};
open(MHCONTEXT, "<&=$fd");     # like fdopen(3S)
Note that "<&STDIN" makes a copy, but "<&=STDIN" makes an alias. That means if you close an aliased
handle, all aliases become inaccessible. This is not true with a copied one.
Error checking, as always, has been left as an exercise for the reader.
How do I close a file descriptor by number?
This should rarely be necessary, as the Perl close() function is to be used for things that Perl opened
itself, even if it was a dup of a numeric descriptor, as with MHCONTEXT above. But if you really have to,
you may be able to do this:
require 'sys/syscall.ph';
$rc = syscall(&SYS_close, $fd + 0);  # must force numeric
die "can't sysclose $fd: $!" if $rc == -1;   # syscall returns -1 on failure
Why can't I use "C:\temp\foo" in DOS paths? Why doesn't `C:\temp\foo.exe` work?
Whoops! You just put a tab and a formfeed into that filename! Remember that within double quoted strings
("like\this"), the backslash is an escape character. The full list of these is in
Quote and Quote−like Operators. Unsurprisingly, you don‘t have a file called "c:(tab)emp(formfeed)oo" or
"c:(tab)emp(formfeed)oo.exe" on your DOS filesystem.
Either single−quote your strings, or (preferably) use forward slashes. Since all DOS and Windows versions
since something like MS−DOS 2.0 or so have treated / and \ the same in a path, you might as well use the
one that doesn‘t clash with Perl — or the POSIX shell, ANSI C and C++, awk, Tcl, Java, or Python, just to
mention a few.
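A quick way to see the difference is to compare string lengths. In this small sketch the variable names are arbitrary; the double-quoted version is two characters shorter because \t and \f each collapse into a single control character.

```perl
use strict;

my $wrong   = "C:\temp\foo.exe";    # \t becomes a tab, \f a formfeed
my $quoted  = 'C:\temp\foo.exe';    # single quotes keep the backslashes
my $slashed = "C:/temp/foo.exe";    # forward slashes need no escaping

printf "%d %d %d\n", length($wrong), length($quoted), length($slashed);
# prints "13 15 15"
```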
Why doesn‘t glob("*.*") get all the files?
Because even on non−Unix ports, Perl‘s glob function follows standard Unix globbing semantics. You‘ll
need glob("*") to get all (non−hidden) files. This makes glob() portable.
Why does Perl let me delete read−only files? Why does −i clobber protected files? Isn‘t this a
bug in Perl?
This is elaborately and painstakingly described in the "Far More Than You Ever Wanted To Know" in
http://www.perl.com/CPAN/doc/FMTEYEWTK/file−dir−perms .
The executive summary: learn how your filesystem works. The permissions on a file say what can happen to
the data in that file. The permissions on a directory say what can happen to the list of files in that directory.
If you delete a file, you‘re removing its name from the directory (so the operation depends on the
permissions of the directory, not of the file). If you try to write to the file, the permissions of the file govern
whether you‘re allowed to.
How do I select a random line from a file?
Here‘s an algorithm from the Camel Book:
srand;
rand($.) < 1 && ($line = $_) while <>;
This has a significant advantage in space over reading the whole file in. A simple proof by induction is
available upon request if you doubt its correctness.
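The induction runs roughly like this: after one line, that line is held with probability 1. If every one of the first N-1 lines is held with probability 1/(N-1), then line N replaces it with probability 1/N, and any earlier line survives with probability (1/(N-1)) * ((N-1)/N) = 1/N. Below is the same algorithm wrapped in a subroutine, as a sketch only: the random_line name is made up, and it keeps its own counter instead of $. so that it still works on a handle that has already been read and rewound.

```perl
use strict;

# Return one line chosen uniformly at random from $fh, or undef if
# there is nothing to read. Only one line is held in memory at a time.
# (Illustrative wrapper; the name is an assumption.)
sub random_line {
    my ($fh) = @_;
    my ($line, $count) = (undef, 0);
    local $_;
    while (<$fh>) {
        # keep line number $count with probability 1/$count
        $line = $_ if rand(++$count) < 1;
    }
    return $line;
}
```

Called as random_line(\*STDIN), for example, it picks a line from standard input.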
AUTHOR AND COPYRIGHT
Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. All rights reserved.
When included as an integrated part of the Standard Distribution of Perl or of its documentation (printed or
otherwise), this work is covered under Perl's Artistic Licence. For separate distributions of all or part of
this FAQ outside of that, see perlfaq.
Irrespective of its distribution, all code examples here are public domain. You are permitted and encouraged
to use this code and any derivatives thereof in your own programs for fun or for profit as you see fit. A
simple comment in the code giving credit to the FAQ would be courteous but is not required.
perlfaq6
NAME
perlfaq6 − Regexps ($Revision: 1.22 $, $Date: 1998/07/16 14:01:07 $)
DESCRIPTION
This section is surprisingly small because the rest of the FAQ is littered with answers involving regular
expressions. For example, decoding a URL and checking whether something is a number are handled with
regular expressions, but those answers are found elsewhere in this document (in the sections on Data and
on Networking, to be precise).
How can I hope to use regular expressions without creating illegible and unmaintainable code?
Three techniques can make regular expressions maintainable and understandable.
Comments Outside the Regexp
Describe what you‘re doing and how you‘re doing it, using normal Perl comments.
# turn the line into the first word, a colon, and the
# number of characters on the rest of the line
s/^(\w+)(.*)/ lc($1) . ":" . length($2) /meg;
Comments Inside the Regexp
The /x modifier causes whitespace to be ignored in a regexp pattern (except in a character class), and
also allows you to use normal comments there, too. As you can imagine, whitespace and comments
help a lot.
/x lets you turn this:
s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;
into this:
s{ <                    # opening angle bracket
    (?:                 # Non-backreffing grouping paren
        [^>'"] *        # 0 or more things that are neither > nor ' nor "
      |                 #    or else
        ".*?"           # a section between double quotes (stingy match)
      |                 #    or else
        '.*?'           # a section between single quotes (stingy match)
    ) +                 #   all occurring one or more times
  >                     # closing angle bracket
}{}gsx;                 # replace with nothing, i.e. delete
It‘s still not quite so clear as prose, but it is very useful for describing the meaning of each part of the
pattern.
Different Delimiters
While we normally think of patterns as being delimited with / characters, they can be delimited by
almost any character. perlre describes this. For example, the s/// above uses braces as delimiters.
Selecting another delimiter can avoid quoting the delimiter within the pattern:
s/\/usr\/local/\/usr\/share/g;    # bad delimiter choice
s#/usr/local#/usr/share#g;        # better
I‘m having trouble matching over more than one line. What‘s wrong?
Either you don‘t have more than one line in the string you‘re looking at (probably), or else you aren‘t using
the correct modifier(s) on your pattern (possibly).
There are many ways to get multiline data into a string. If you want it to happen automatically while reading
input, you'll want to set $/ (probably to '' for paragraphs or undef for the whole file) to allow you to read
more than one line at a time.
Read perlre to help you decide which of /s and /m (or both) you might want to use: /s allows dot to
include newline, and /m allows caret and dollar to match next to a newline, not just at the end of the string.
You do need to make sure that you‘ve actually got a multiline string in there.
For example, this program detects duplicate words, even when they span line breaks (but not paragraph
ones). For this example, we don‘t need /s because we aren‘t using dot in a regular expression that we want
to cross line boundaries. Neither do we need /m because we aren‘t wanting caret or dollar to match at any
point inside the record next to newlines. But it‘s imperative that $/ be set to something other than the
default, or else we won‘t actually ever have a multiline record read in.
$/ = '';    # read in whole paragraph, not just one line
while ( <> ) {
    while ( /\b([\w'-]+)(\s+\1)+\b/gi ) {    # word starts alpha
        print "Duplicate $1 at paragraph $.\n";
    }
}
Here‘s code that finds sentences that begin with "From " (which would be mangled by many mailers):
$/ = '';    # read in whole paragraph, not just one line
while ( <> ) {
    while ( /^From /gm ) {    # /m makes ^ match next to \n
        print "leading from in paragraph $.\n";
    }
}
Here‘s code that finds everything between START and END in a paragraph:
undef $/;    # read in whole file, not just one line or paragraph
while ( <> ) {
    while ( /START(.*?)END/sgm ) {    # /s makes . cross line boundaries;
                                      # /g moves on to the next match
        print "$1\n";
    }
}
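To see what /s and /m each change, it can help to try them on a small in-memory string rather than on real input. This is just a sketch; the sample text and the variable names are invented for the illustration.

```perl
use strict;

my $text = "START one\ntwo END\nFrom here\n";

# Without /s the dot stops at the newline, so START...END never matches.
my @without_s = $text =~ /START(.*?)END/g;
# With /s the dot may cross the line break.
my @with_s    = $text =~ /START(.*?)END/gs;

# Without /m, ^ matches only at the very start of the string.
my @without_m = $text =~ /^From /g;
# With /m, ^ also matches just after each embedded newline.
my @with_m    = $text =~ /^From /gm;

print scalar(@without_s), " ", scalar(@with_s), " ",
      scalar(@without_m), " ", scalar(@with_m), "\n";
```

Neither modifier helps unless the string really contains more than one line, which is why setting $/ comes first.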
How can I pull out lines between two patterns that are themselves on different lines?
You can use Perl‘s somewhat exotic .. operator (documented in perlop):
perl -ne 'print if /START/ .. /END/' file1 file2 ...
If you wanted text and not lines, you would use
perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...
But if you want nested occurrences of START through END, you‘ll run up against the problem described in
the question in this section on matching balanced text.
Here‘s another example of using ..:
while (<>) {
    $in_header =   1  .. /^$/;
    $in_body   = /^$/ .. eof();
    # now choose between them
} continue {
    close ARGV if eof;    # reset $. for each file
}
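The same operator works on any sequence of lines, not just on <>. Here is a rough sketch over an in-memory list; lines_between() is a hypothetical helper name, not something from the standard distribution.

```perl
use strict;

# Return the elements from the first one matching /START/ through the
# next one matching /END/, inclusive.
sub lines_between {
    my @out;
    foreach (@_) {
        push @out, $_ if /START/ .. /END/;
    }
    return @out;
}

my @kept = lines_between("intro\n", "START\n", "body\n", "END\n", "outro\n");
print @kept;
```

The operator remembers its state between evaluations, so an unterminated START in one call could leak into the next; the sketch assumes each range is closed.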
I put a regular expression into $/ but it didn‘t work. What‘s wrong?
$/ must be a string, not a regular expression. Awk has to be better for something. :−)
Actually, you could do this if you don‘t mind reading the whole file into memory:
undef $/;
@records = split /your_pattern/, <FH>;
The Net::Telnet module (available from CPAN) has the capability to wait for a pattern in the input stream, or
timeout if it doesn‘t appear within a certain time.
## Create a file with three lines.
open FH, ">file";
print FH "The first line\nThe second line\nThe third line\n";
close FH;
## Get a read/write filehandle to it.
$fh = new FileHandle "+<file";
## Attach it to a "stream" object.
use Net::Telnet;
$file = new Net::Telnet (-fhopen => $fh);
## Search for the second line and print out the third.
$file->waitfor('/second line\n/');
print $file->getline;
How do I substitute case insensitively on the LHS, but preserving case on the RHS?
It depends on what you mean by "preserving case". The following script makes the substitution have the
same case, letter by letter, as the original. If the substitution has more characters than the string being
substituted, the case of the last character is used for the rest of the substitution.
# Original by Nathan Torkington, massaged by Jeffrey Friedl
#
sub preserve_case($$)
{
my ($old, $new) = @_;
my ($state) = 0; # 0 = no change; 1 = lc; 2 = uc
my ($i, $oldlen, $newlen, $c) = (0, length($old), length($new));
my ($len) = $oldlen < $newlen ? $oldlen : $newlen;
for ($i = 0; $i < $len; $i++) {
if ($c = substr($old, $i, 1), $c =~ /[\W\d_]/) {
$state = 0;
} elsif (lc $c eq $c) {
substr($new, $i, 1) = lc(substr($new, $i, 1));
$state = 1;
} else {
substr($new, $i, 1) = uc(substr($new, $i, 1));
$state = 2;
}
}
# finish up with any remaining new (for when new is longer than old)
if ($newlen > $oldlen) {
if ($state == 1) {
substr($new, $oldlen) = lc(substr($new, $oldlen));
} elsif ($state == 2) {
substr($new, $oldlen) = uc(substr($new, $oldlen));
}
}
return $new;
}
$a = "this is a TEsT case";
$a =~ s/(test)/preserve_case($1, "success")/gie;
print "$a\n";
This prints:
this is a SUcCESS case
How can I make \w match national character sets?
See perllocale.
How can I match a locale−smart version of /[a−zA−Z]/?
One alphabetic character would be /[^\W\d_]/, no matter what locale you‘re in. Non−alphabetics would
be /[\W\d_]/ (assuming you don‘t consider an underscore a letter).
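As a quick check of that character class, here is a small sketch; is_alpha() is a made-up name. It accepts a string only if every character is a \w character that is neither a digit nor an underscore. What counts as \w depends on your locale settings, so the sample strings stick to plain ASCII.

```perl
use strict;

# True if the string consists solely of alphabetic characters.
sub is_alpha {
    my $string = shift;
    return $string =~ /^[^\W\d_]+$/ ? 1 : 0;
}

print is_alpha("Perl"), is_alpha("Perl5"), is_alpha("snake_case"), "\n";
```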
How can I quote a variable to use in a regexp?
The Perl parser will expand $variable and @variable references in regular expressions unless the
delimiter is a single quote. Remember, too, that the right−hand side of a s/// substitution is considered a
double−quoted string (see perlop for more details). Remember also that any regexp special characters will
be acted on unless you precede the substitution with \Q. Here‘s an example:
$string = "to die?";
$lhs = "die?";
$rhs = "sleep no more";
$string =~ s/\Q$lhs/$rhs/;
# $string is now "to sleep no more"
Without the \Q, the regexp would also spuriously match "di".
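The spurious match is easy to reproduce. In this sketch the text "to di for" is an invented sample; the variable names are likewise only for illustration. The built-in quotemeta() function applies the same escaping as \Q.

```perl
use strict;

my $lhs  = "die?";
my $text = "to di for";

# Unquoted, the ? makes the "e" optional, so "di" matches.
my $unquoted = ($text =~ /$lhs/)   ? 1 : 0;
# With \Q the ? is taken literally and nothing matches.
my $quoted   = ($text =~ /\Q$lhs/) ? 1 : 0;
# quotemeta() returns the escaped string itself.
my $escaped  = quotemeta($lhs);

print "unquoted=$unquoted quoted=$quoted escaped=$escaped\n";
```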
What is /o really for?
Using a variable in a regular expression match forces a re−evaluation (and perhaps recompilation) each time
through. The /o modifier locks in the regexp the first time it‘s used. This always happens in a constant
regular expression, and in fact, the pattern was compiled into the internal format at the same time your entire
program was.
Use of /o is irrelevant unless variable interpolation is used in the pattern, and if so, the regexp engine will
neither know nor care whether the variables change after the pattern is evaluated the very first time.
/o is often used to gain an extra measure of efficiency by not performing subsequent evaluations when you
know it won‘t matter (because you know the variables won‘t change), or more rarely, when you don‘t want
the regexp to notice if they do.
For example, here‘s a "paragrep" program:
$/ = '';    # paragraph mode
$pat = shift;
while (<>) {
print if /$pat/o;
}
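The "neither know nor care" part is worth seeing once. In this sketch (sample data and variable names are invented) the same two patterns are tried with and without /o; the /o match keeps using whatever $pat held the first time through.

```perl
use strict;

my (@with_o, @without_o);
foreach my $pat ("x", "a") {
    # /o compiles the pattern on the first pass and never looks at $pat again.
    push @with_o,    ("abc" =~ /$pat/o ? 1 : 0);
    # Without /o the current value of $pat is used each time.
    push @without_o, ("abc" =~ /$pat/  ? 1 : 0);
}

print "with /o: @with_o\n";
print "without: @without_o\n";
```

So /o is only safe when the variable really is constant for the life of the program, as $pat is in the paragrep example.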
How do I use a regular expression to strip C style comments from a file?
While this actually can be done, it‘s much harder than you‘d think. For example, this one−liner
perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c
will work in many but not all cases. You see, it‘s too simple−minded for certain kinds of C programs, in
particular, those with what appear to be comments in quoted strings. For that, you‘d need something like
this, created by Jeffrey Friedl:
$/ = undef;
$_ = <>;
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|’(\\.|[^’\\])*’|\n+|.[^/"’\\]*)#$
print;
This could, of course, be more legibly written with the /x modifier, adding whitespace and comments.
Can I use Perl regular expressions to match balanced text?
Although Perl regular expressions are more powerful than "mathematical" regular expressions, because they
feature conveniences like backreferences (\1 and its ilk), they still aren‘t powerful enough. You still need to
use non−regexp techniques to parse balanced text, such as the text enclosed between matching parentheses or
braces, for example.
An elaborate subroutine (for 7−bit ASCII only) to pull out balanced and possibly nested single chars, like ‘
and ’, { and }, or ( and ) can be found in
http://www.perl.com/CPAN/authors/id/TOMC/scripts/pull_quotes.gz .
The C::Scan module from CPAN contains such subs for internal usage, but they are undocumented.
What does it mean that regexps are greedy? How can I get around it?
Most people mean that greedy regexps match as much as they can. Technically speaking, it‘s actually the
quantifiers (?, *, +, {}) that are greedy rather than the whole pattern; Perl prefers local greed and immediate
gratification to overall greed. To get non−greedy versions of the same quantifiers, use (??, *?, +?, {}?).
An example:
$s1 = $s2 = "I am very very cold";
$s1 =~ s/ve.*y //;      # I am cold
$s2 =~ s/ve.*?y //;     # I am very cold
Notice how the second substitution stopped matching as soon as it encountered "y ". The *? quantifier
effectively tells the regular expression engine to find a match as quickly as possible and pass control on to
whatever is next in line, like you would if you were playing hot potato.
How do I process each word on each line?
Use the split function:
while (<>) {
foreach $word ( split ) {
# do something with $word here
}
}
Note that this isn‘t really a word in the English sense; it‘s just chunks of consecutive non−whitespace
characters.
To work with only alphanumeric sequences, you might consider
while (<>) {
foreach $word (m/(\w+)/g) {
# do something with $word here
}
}
How can I print out a word−frequency or line−frequency summary?
To do this, you have to parse out each word in the input stream. We‘ll pretend that by word you mean chunk
of alphabetics, hyphens, or apostrophes, rather than the non−whitespace chunk idea of a word given in the
previous question:
while (<>) {
    while ( /(\b[^\W_\d][\w'-]+\b)/g ) {    # misses "`sheep'"
        $seen{$1}++;
    }
}
while ( ($word, $count) = each %seen ) {
print "$count $word\n";
}
If you wanted to do the same thing for lines, you wouldn‘t need a regular expression:
while (<>) {
$seen{$_}++;
}
while ( ($line, $count) = each %seen ) {
print "$count $line";
}
If you want these output in a sorted order, see the section on Hashes.
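For a rough idea of the sorted version, here is a sketch that counts words in a string and reports them by descending count, with ties broken alphabetically. word_counts() and the sample sentence are invented for the illustration; the pattern is the one used above, so one-letter words are still missed.

```perl
use strict;

# Count "words" (alphabetics, hyphens, apostrophes) in a string.
sub word_counts {
    my $text = shift;
    my %seen;
    while ( $text =~ /(\b[^\W_\d][\w'-]+\b)/g ) {
        $seen{$1}++;
    }
    return %seen;
}

my %count  = word_counts("the cat saw the other cat; the end");
my @report = map  { "$count{$_} $_" }
             sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

print "$_\n" foreach @report;
```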
How can I do approximate matching?
See the module String::Approx available from CPAN.
How do I efficiently match many regular expressions at once?
The following is super−inefficient:
while (<FH>) {
foreach $pat (@patterns) {
if ( /$pat/ ) {
# do something
}
}
}
Instead, you either need to use one of the experimental Regexp extension modules from CPAN (which might
well be overkill for your purposes), or else put together something like this, inspired from a routine in Jeffrey
Friedl‘s book:
sub _bm_build {
my $condition = shift;
my @regexp = @_; # this MUST not be local(); need my()
my $expr = join $condition => map { "m/\$regexp[$_]/o" } (0..$#regexp);
my $match_func = eval "sub { $expr }";
die if $@; # propagate $@; this shouldn’t happen!
return $match_func;
}
sub bm_and { _bm_build('&&', @_) }
sub bm_or  { _bm_build('||', @_) }
$f1 = bm_and qw{
xterm
(?i)window
};
$f2 = bm_or qw{
\b[Ff]ree\b
\bBSD\B
(?i)sys(tem)?\s*[V5]\b
};
# feed me /etc/termcap, prolly
while ( <> ) {
print "1: $_" if &$f1;
print "2: $_" if &$f2;
}
Why don‘t word−boundary searches with \b work for me?
Two common misconceptions are that \b is a synonym for \s+, and that it‘s the edge between whitespace
characters and non−whitespace characters. Neither is correct. \b is the place between a \w character and a
\W character (that is, \b is the edge of a "word"). It‘s a zero−width assertion, just like ^, $, and all the
other anchors, so it doesn‘t consume any characters. perlre describes the behaviour of all the regexp
metacharacters.
Here are examples of the incorrect application of \b, with fixes:
"two words" =~ /(\w+)\b(\w+)/;             # WRONG
"two words" =~ /(\w+)\s+(\w+)/;            # right
" =matchless= text" =~ /\b=(\w+)=\b/;      # WRONG
" =matchless= text" =~ /=(\w+)=/;          # right
Although they may not do what you thought they did, \b and \B can still be quite useful. For an example of
the correct use of \b, see the example of matching duplicate words over multiple lines.
An example of using \B is the pattern \Bis\B. This will find occurrences of "is" on the insides of words
only, as in "thistle", but not "this" or "island".
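A short sketch of both assertions at work, using the three words mentioned above plus an invented sentence for the \b case:

```perl
use strict;

my @words = qw(thistle this island);

# \Bis\B: "is" with word characters on both sides.
my @inside = grep { /\Bis\B/ } @words;

# \bis\b: "is" standing alone as a word.
my @whole = grep { /\bis\b/ } ("this is it", "thistle");

print "inside: @inside\n";
print "whole:  @whole\n";
```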
Why does using $&, $`, or $' slow my program down?
Because once Perl sees that you need one of these variables anywhere in the program, it has to provide them
on each and every pattern match. The same mechanism that handles these provides for the use of $1, $2,
etc., so you pay the same price for each regexp that contains capturing parentheses. But if you never use $&,
etc., in your script, then regexps without capturing parentheses won't be penalized. So avoid $&, $`, and
$' if you can, but if you can't (and some algorithms really appreciate them), once you've used them once,
use them at will, because you‘ve already paid the price.
What good is \G in a regular expression?
The notation \G is used in a match or substitution in conjunction with the /g modifier (and ignored if there's no
/g) to anchor the regular expression to the point just past where the last match occurred, i.e. the pos()
point.
For example, suppose you had a line of text quoted in standard mail and Usenet notation (that is, with
leading > characters), and you want to change each leading > into a corresponding :. You could do so in this
way:
s/^(>+)/':' x length($1)/gem;
Or, using \G, the much simpler (and faster):
s/\G>/:/g;
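Both forms can be compared on a single quoted line. This is a sketch with an invented sample string; note that the > in the middle of the line is left alone, because \G only allows a match where the previous one ended.

```perl
use strict;

my $line = ">>> deeper > not leading";

# Each match must begin exactly where the previous one stopped.
(my $with_G = $line) =~ s/\G>/:/g;

# The longer form: grab the whole run of leading > at once.
(my $with_caret = $line) =~ s/^(>+)/':' x length($1)/gem;

print "$with_G\n$with_caret\n";
```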
A more sophisticated use might involve a tokenizer. The following lex−like example is courtesy of Jeffrey
Friedl. It did not work in 5.003 due to bugs in that release, but does work in 5.004 or better. (Note the use of
/c, which prevents a failed match with /g from resetting the search position back to the beginning of the
string.)
while (<>) {
    chomp;
    PARSER: {
        m/ \G( \d+\b    )/gcx    && do { print "number: $1\n";  redo; };
        m/ \G( \w+      )/gcx    && do { print "word:   $1\n";  redo; };
        m/ \G( \s+      )/gcx    && do { print "space:  $1\n";  redo; };
        m/ \G( [^\w\d]+ )/gcx    && do { print "other:  $1\n";  redo; };
    }
}
Of course, that could have been written as
while (<>) {
    chomp;
    PARSER: {
        if ( /\G( \d+\b    )/gcx ) {
            print "number: $1\n";
            redo PARSER;
        }
        if ( /\G( \w+      )/gcx ) {
            print "word: $1\n";
            redo PARSER;
        }
        if ( /\G( \s+      )/gcx ) {
            print "space: $1\n";
            redo PARSER;
        }
        if ( /\G( [^\w\d]+ )/gcx ) {
            print "other: $1\n";
            redo PARSER;
        }
    }
}
But then you lose the vertical alignment of the regular expressions.
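If you would rather collect the tokens than print them, the same \G and /gc technique can be wrapped in a subroutine. This is only a sketch under a few assumptions: tokenize() is a made-up name, it works on a string argument instead of <>, and it uses an if/elsif chain in place of the redo loop.

```perl
use strict;

# Split a string into labelled tokens, scanning left to right.
# /c keeps pos() in place when an alternative fails to match.
sub tokenize {
    my $text = shift;
    my @tokens;
    pos($text) = 0;
    while ( length($text) > pos($text) ) {
        if    ( $text =~ /\G(\d+\b)/gc    ) { push @tokens, "number:$1" }
        elsif ( $text =~ /\G(\w+)/gc      ) { push @tokens, "word:$1"   }
        elsif ( $text =~ /\G(\s+)/gc      ) { push @tokens, "space:$1"  }
        elsif ( $text =~ /\G([^\w\d]+)/gc ) { push @tokens, "other:$1"  }
        else                                { last }
    }
    return @tokens;
}

print join("|", tokenize("abc 42!")), "\n";
```

The order of the alternatives matters just as it does above: the number test has to come before the word test, since \w also matches digits.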
Are Perl regexps DFAs or NFAs? Are they POSIX compliant?
While it‘s true that Perl‘s regular expressions resemble the DFAs (deterministic finite automata) of the
egrep(1) program, they are in fact implemented as NFAs (non−deterministic finite automata) to allow
backtracking and backreferencing. And they aren‘t POSIX−style either, because those guarantee worst−case
behavior for all cases. (It seems that some people prefer guarantees of consistency, even when what‘s
guaranteed is slowness.) See the book "Mastering Regular Expressions" (from O‘Reilly) by Jeffrey Friedl
for all the details you could ever hope to know on these matters (a full citation appears in perlfaq2).
What‘s wrong with using grep or map in a void context?
Both grep and map build a return list, regardless of their context. This means you‘re making Perl go to the
trouble of building up a return list that you then just ignore. That‘s no way to treat a programming language,
you insensitive scoundrel!
How can I match strings with multibyte characters?
This is hard, and there‘s no good way. Perl does not directly support wide characters. It pretends that a byte
and a character are synonymous. The following set of approaches was offered by Jeffrey Friedl, whose
article in issue #5 of The Perl Journal talks about this very matter.
Let‘s suppose you have some weird Martian encoding where pairs of ASCII uppercase letters encode single
Martian letters (i.e. the two bytes "CV" make a single Martian letter, as do the two bytes "SG", "VS", "XX",
etc.). Other bytes represent single characters, just like ASCII.
So, the string of Martian "I am CVSGXX!" uses 12 bytes to encode the nine characters 'I', ' ', 'a', 'm', ' ',
'CV', 'SG', 'XX', '!'.
Now, say you want to search for the single character /GX/. Perl doesn‘t know about Martian, so it‘ll find the
two bytes "GX" in the "I am CVSGXX!" string, even though that character isn‘t there: it just looks like it is
because "SG" is next to "XX", but there‘s no real "GX". This is a big problem.
Here are a few ways, all painful, to deal with it:
$martian =~ s/([A-Z][A-Z])/ $1 /g;    # Make sure adjacent "martian" bytes
                                      # are no longer adjacent.
print "found GX!\n" if $martian =~ /GX/;
Or like this:
@chars = $martian =~ m/([A-Z][A-Z]|[^A-Z])/g;
# above is conceptually similar to:    @chars = $text =~ m/(.)/g;
#
foreach $char (@chars) {
    print "found GX!\n", last if $char eq 'GX';
}
Or like this:
while ($martian =~ m/\G([A-Z][A-Z]|.)/gs) {    # \G probably unneeded
    print "found GX!\n", last if $1 eq 'GX';
}
Or like this:
die "sorry, Perl doesn't (yet) have Martian support )-:\n";
In addition, a sample program which converts half-width to full-width katakana (in Shift-JIS or EUC
encoding) is available from CPAN.
There are many double− (and multi−) byte encodings commonly used these days. Some versions of these
have 1−, 2−, 3−, and 4−byte characters, all mixed.
AUTHOR AND COPYRIGHT
Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. All rights reserved.
When included as part of the Standard Version of Perl, or as part of its complete documentation whether
printed or otherwise, this work may be distributed only under the terms of Perl‘s Artistic License. Any
distribution of this file or derivatives thereof outside of that package require that special arrangements be
made with copyright holder.
Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You
are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A
simple comment in the code giving credit would be courteous but is not required.
perlfaq7
Perl Programmers Reference Guide
perlfaq7
NAME
perlfaq7 − Perl Language Issues ($Revision: 1.21 $, $Date: 1998/06/22 15:20:07 $)
DESCRIPTION
This section deals with general Perl language issues that don‘t clearly fit into any of the other sections.
Can I get a BNF/yacc/RE for the Perl language?
There is no BNF, but you can paw your way through the yacc grammar in perly.y in the source distribution if
you‘re particularly brave. The grammar relies on very smart tokenizing code, so be prepared to venture into
toke.c as well.
In the words of Chaim Frenkel: "Perl‘s grammar can not be reduced to BNF. The work of parsing perl is
distributed between yacc, the lexer, smoke and mirrors."
What are all these $@%* punctuation signs, and how do I know when to use them?
They are type specifiers, as detailed in perldata:
$    for scalar values (number, string or reference)
@    for arrays
%    for hashes (associative arrays)
*    for all types of that symbol name. In version 4 you used them like
     pointers, but in modern perls you can just use references.
While there are a few places where you don‘t actually need these type specifiers, you should always use
them.
A couple of others that you‘re likely to encounter that aren‘t really type specifiers are:
<> are used for inputting a record from a filehandle.
\ takes a reference to something.
Note that <FILE> is neither the type specifier for files nor the name of the handle. It is the <> operator
applied to the handle FILE. It reads one line (well, record − see $/) from the handle FILE in scalar context,
or all lines in list context. When performing open, close, or any other operation besides <> on files, or even
talking about the handle, do not use the brackets. These are correct: eof(FH), seek(FH, 0, 2) and
"copying from STDIN to FILE".
Do I always/never have to quote my strings or use semicolons and commas?
Normally, a bareword doesn‘t need to be quoted, but in most cases probably should be (and must be under
use strict). But a hash key consisting of a simple word (that isn‘t the name of a defined subroutine)
and the left−hand operand to the => operator both count as though they were quoted:
This                    is like this
------------            ---------------
$foo{line}              $foo{"line"}
bar => stuff            "bar" => stuff
The final semicolon in a block is optional, as is the final comma in a list. Good style (see perlstyle) says to
put them in except for one−liners:
if ($whoops) { exit 1 }
@nums = (1, 2, 3);
if ($whoops) {
exit 1;
}
@lines = (
"There Beren came from mountains cold",
"And lost he wandered under leaves",
);
How do I skip some return values?
One way is to treat the return values as a list and index into it:
$dir = (getpwnam($user))[7];
Another way is to use undef as an element on the left−hand−side:
($dev, $ino, undef, undef, $uid, $gid) = stat($file);
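Both techniques work with any list-returning function, not just the built-ins. Here is a sketch using a made-up triple() function so that the values are predictable:

```perl
use strict;

# A stand-in for any function that returns a list.
sub triple { return (10, 20, 30) }

# Index into the returned list to keep one value...
my $second = (triple())[1];

# ...or use undef as a placeholder for values you don't want.
my ($first, undef, $third) = triple();

print "$first $second $third\n";
```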
How do I temporarily block warnings?
The $^W variable (documented in perlvar) controls runtime warnings for a block:
{
    local $^W = 0;        # temporarily turn off warnings
    $a = $b + $c;         # I know these might be undef
}
Note that like all the punctuation variables, you cannot currently use my() on $^W, only local().
A new use warnings pragma is in the works to provide finer control over all this. The curious should
check the perl5−porters mailing list archives for details.
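The point of local() here is that the old setting comes back by itself when the block ends. A small sketch that records the value of $^W before, inside, and after the block (the variable names are invented for the illustration):

```perl
use strict;

$^W = 1;                  # warnings on, as if run with -w
my $before = $^W;
my $inside;
{
    local $^W = 0;        # off for this block only
    $inside = $^W;
    my ($x, $y);
    my $sum = $x + $y;    # would warn about undef values under -w
}
my $after = $^W;          # restored automatically

print "before=$before inside=$inside after=$after\n";
```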
What‘s an extension?
A way of calling compiled C code from Perl. Reading perlxstut is a good place to learn more about
extensions.
Why do Perl operators have different precedence than C operators?
Actually, they don‘t. All C operators that Perl copies have the same precedence in Perl as they do in C. The
problem is with operators that C doesn‘t have, especially functions that give a list context to everything on
their right, eg print, chmod, exec, and so on. Such functions are called "list operators" and appear as such in
the precedence table in perlop.
A common mistake is to write:
unlink $file || die "snafu";
This gets interpreted as:
unlink ($file || die "snafu");
To avoid this problem, either put in extra parentheses or use the super low precedence or operator:
(unlink $file) || die "snafu";
unlink $file or die "snafu";
The "English" operators (and, or, xor, and not) deliberately have precedence lower than that of list
operators for just such situations as the one above.
Another operator with surprising precedence is exponentiation. It binds more tightly even than unary minus,
making -2**2 produce a negative, not a positive, four. It is also right-associating, meaning that 2**3**2 is
two raised to the ninth power, not eight squared.
Although it has the same precedence as in C, Perl‘s ?: operator produces an lvalue. This assigns $x to
either $a or $b, depending on the trueness of $maybe:
($maybe ? $a : $b) = $x;
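These three claims are cheap to verify. The sketch below uses invented variable names ($left and $right rather than $a and $b) and a false $maybe, so the assignment lands on the second operand.

```perl
use strict;

my $neg   = -2**2;      # parsed as -(2**2)
my $tower = 2**3**2;    # parsed as 2**(3**2)

my ($left, $right) = ("L", "R");
my $maybe = 0;
($maybe ? $left : $right) = "assigned";    # ?: yields an lvalue

print "$neg $tower $left $right\n";
```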
How do I declare/create a structure?
In general, you don‘t "declare" a structure. Just use a (probably anonymous) hash reference. See perlref and
perldsc for details. Here‘s an example:
$person = {};                   # new anonymous hash
$person->{AGE}  = 24;           # set field AGE to 24
$person->{NAME} = "Nat";        # set field NAME to "Nat"
If you‘re looking for something a bit more rigorous, try perltoot.
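A record can also be built in one go, and its fields can hold references to other structures. This is a sketch only; the PEERS field and its contents are invented for the illustration.

```perl
use strict;

# Build the whole record at once with an anonymous hash.
my $person = { NAME => "Nat", AGE => 24 };

# A field may itself hold a reference, here to an anonymous array.
$person->{PEERS} = [ "Tom" ];
push @{ $person->{PEERS} }, "Jon";

my @fields = sort keys %$person;
print "fields: @fields\n";
print "peers:  @{ $person->{PEERS} }\n";
```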
How do I create a module?
A module is a package that lives in a file of the same name. For example, the Hello::There module would
live in Hello/There.pm. For details, read perlmod. You‘ll also find Exporter helpful. If you‘re writing a C
or mixed−language module with both C and Perl, then you should study perlxstut.
Here's a convenient template you might wish to use when starting your own module. Make sure to change
the names appropriately.
package Some::Module;  # assumes Some/Module.pm

use strict;

BEGIN {
    use Exporter   ();
    use vars       qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);

    ## set the version for version checking; uncomment to use
    ## $VERSION     = 1.00;

    # if using RCS/CVS, this next line may be preferred,
    # but beware two-digit versions.
    $VERSION = do{my@r=q$Revision: 1.21 $=~/\d+/g;sprintf '%d.'.'%02d'x$#r,@r};

    @ISA         = qw(Exporter);
    @EXPORT      = qw(&func1 &func2 &func3);
    %EXPORT_TAGS = ( );     # eg: TAG => [ qw!name1 name2! ],

    # your exported package globals go here,
    # as well as any optionally exported functions
    @EXPORT_OK   = qw($Var1 %Hashit);
}
use vars      @EXPORT_OK;

# non-exported package globals go here
use vars      qw( @more $stuff );

# initialize package globals, first exported ones
$Var1   = '';
%Hashit = ();

# then the others (which are still accessible as $Some::Module::stuff)
$stuff  = '';
@more   = ();

# all file-scoped lexicals must be created before
# the functions below that use them.

# file-private lexicals go here
my $priv_var    = '';
my %secret_hash = ();

# here's a file-private function as a closure,
# callable as &$priv_func; it cannot be prototyped.
my $priv_func = sub {
    # stuff goes here.
};

# make all your functions, whether exported or not;
# remember to put something interesting in the {} stubs
sub func1      {}    # no prototype
sub func2()    {}    # proto'd void
sub func3($$)  {}    # proto'd to 2 scalars

# this one isn't exported, but could be called!
sub func4(\%)  {}    # proto'd to 1 hash ref

END { }       # module clean-up code here (global destructor)

1;            # modules must return true
How do I create a class?
See perltoot for an introduction to classes and objects, as well as perlobj and perlbot.
How can I tell if a variable is tainted?
See Laundering and Detecting Tainted Data in perlsec. Here‘s an example (which doesn‘t use any system
calls, because the kill() is given no processes to signal):
sub is_tainted {
    return ! eval { join('',@_), kill 0; 1; };
}
This is not −w clean, however. There is no −w clean way to detect taintedness − take this as a hint that you
should untaint all possibly−tainted data.
What‘s a closure?
Closures are documented in perlref.
Closure is a computer science term with a precise but hard−to−explain meaning. Closures are implemented
in Perl as anonymous subroutines with lasting references to lexical variables outside their own scopes. These
lexicals magically refer to the variables that were around when the subroutine was defined (deep binding).
Closures make sense in any programming language where you can have the return value of a function be
itself a function, as you can in Perl. Note that some languages provide anonymous functions but are not
capable of providing proper closures; the Python language, for example. For more information on closures,
check out any textbook on functional programming. Scheme is a language that not only supports but
encourages closures.
Here‘s a classic function−generating function:
sub add_function_generator {
return sub { shift + shift };
}
$add_sub = add_function_generator();
$sum = $add_sub−>(4,5);
# $sum is 9 now.
The closure works as a function template with some customization slots left out to be filled later. The
anonymous subroutine returned by add_function_generator() isn‘t technically a closure because it
refers to no lexicals outside its own scope.
Contrast this with the following make_adder() function, in which the returned anonymous function
contains a reference to a lexical variable outside the scope of that function itself. Such a reference requires
that Perl return a proper closure, thus locking in for all time the value that the lexical had when the function
was created.
sub make_adder {
my $addpiece = shift;
return sub { shift + $addpiece };
}
$f1 = make_adder(20);
$f2 = make_adder(555);
Now &$f1($n) is always 20 plus whatever $n you pass in, whereas &$f2($n) is always 555 plus
whatever $n you pass in. The $addpiece in the closure sticks around.
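Each call to the generator creates a fresh lexical, and the closure can change it as well as read it. Here is a sketch of a counter generator; make_counter() is a made-up name in the spirit of make_adder().

```perl
use strict;

# Each call creates a new $n that only the returned sub can see.
sub make_counter {
    my $n = shift;
    return sub { return $n++ };
}

my $c1 = make_counter(5);
my $c2 = make_counter(100);

# The two counters advance independently of each other.
my @seq = ( $c1->(), $c1->(), $c2->(), $c1->() );
print "@seq\n";
```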
Closures are often used for less esoteric purposes. For example, when you want to pass in a bit of code into
a function:
my $line;
timeout( 30, sub { $line = <STDIN> } );
If the code to execute had been passed in as a string, '$line = <STDIN>', there would have been no
way for the hypothetical timeout() function to access the lexical variable $line back in its caller‘s
scope.
What is variable suicide and how can I prevent it?
Variable suicide is when you (temporarily or permanently) lose the value of a variable. It is caused by
scoping through my() and local() interacting with either closures or aliased foreach() iterator
variables and subroutine arguments. It used to be easy to inadvertently lose a variable‘s value this way, but
now it‘s much harder. Take this code:
my $f = "foo";
sub T {
while ($i++ < 3) { my $f = $f; $f .= "bar"; print $f, "\n" }
}
T;
print "Finally $f\n";
The $f that has "bar" added to it three times should be a new $f (my $f should create a new local variable
each time through the loop). It isn‘t, however. This is a bug, and will be fixed.
How can I pass/return a {Function, FileHandle, Array, Hash, Method, Regexp}?
With the exception of regexps, you need to pass references to these objects. See
Pass by Reference in perlsub for this particular question, and perlref for information on references.
Passing Variables and Functions
Regular variables and functions are quite easy: just pass in a reference to an existing or anonymous
variable or function:
func( \$some_scalar );

func( \@some_array  );
func( [ 1 .. 10 ]   );

func( \%some_hash   );
func( { this => 10, that => 20 }   );

func( \&some_func   );
func( sub { $_[0] ** $_[1] }   );
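On the receiving end, the subroutine dereferences whatever it was handed. A sketch, with apply_to_all() as a hypothetical function that takes a code reference and an array reference:

```perl
use strict;

# Call the given code on every element of the referenced array.
sub apply_to_all {
    my ($code, $list) = @_;
    return map { $code->($_) } @$list;
}

my @squares = apply_to_all( sub { $_[0] ** 2 }, [ 1 .. 4 ] );
print "@squares\n";
```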
Passing Filehandles
To pass filehandles to subroutines, use the *FH or \*FH notations. These are "typeglobs" − see
Typeglobs and Filehandles in perldata and especially Pass by Reference in perlsub for more
information.
Here‘s an excerpt:
If you‘re passing around filehandles, you could usually just use the bare typeglob, like *STDOUT, but
typeglob references would be better because they'll still work properly under use strict
'refs'. For example:
splutter(\*STDOUT);
sub splutter {
my $fh = shift;
print $fh "her um well a hmmm\n";
}
$rec = get_rec(\*STDIN);
sub get_rec {
my $fh = shift;
return scalar <$fh>;
}
If you‘re planning on generating new filehandles, you could do this:
    sub openit {
        my $path = shift;
        local *FH;
        return open (FH, $path) ? *FH : undef;
    }
    $fh = openit('< /etc/motd');
    print <$fh>;
Passing Regexps
To pass regexps around, you‘ll need to either use one of the highly experimental regular expression
modules from CPAN (Nick Ing−Simmons‘s Regexp or Ilya Zakharevich‘s Devel::Regexp), pass
around strings and use an exception-trapping eval, or else be very, very clever. Here's an example
of how to pass in a string to be regexp compared:
    sub compare($$) {
        my ($val, $regexp) = @_;
        # eval traps the fatal error raised by a malformed pattern
        my $retval = eval { $val =~ /$regexp/ };
        die if $@;
        return $retval;
    }

    $match = compare("old McDonald", q/d.*D/);
Make sure you never say something like this:
    return eval "\$val =~ /$regexp/";   # WRONG
or someone can sneak shell escapes into the regexp due to the double interpolation of the eval and the
double−quoted string. For example:
$pattern_of_evil = ’danger ${ system("rm −rf * &") } danger’;
eval "\$string =~ /$pattern_of_evil/";
Those preferring to be very, very clever might see the O‘Reilly book, Mastering Regular Expressions,
by Jeffrey Friedl. Page 273‘s Build_MatchMany_Function() is particularly interesting. A
complete citation of this book is given in perlfaq2.
Passing Methods
To pass an object method into a subroutine, you can do this:
    call_a_lot(10, $some_obj, "methname");
    sub call_a_lot {
        my ($count, $widget, $trick) = @_;
        for (my $i = 0; $i < $count; $i++) {
            $widget->$trick();
        }
    }
Or you can use a closure to bundle up the object and its method call and arguments:
my $whatnot = sub { $some_obj−>obfuscate(@args) };
func($whatnot);
sub func {
my $code = shift;
&$code();
}
You could also investigate the can() method in the UNIVERSAL class (part of the standard perl
distribution).
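As a rough illustration of can(), the sketch below uses a hypothetical Widget class (invented for this example). can() returns a code reference when the object's class provides the named method, and a false value otherwise, so it can be used to check for a method before calling it.

```perl
use strict;

package Widget;                      # hypothetical class for illustration
sub new  { return bless {}, shift }
sub spin { return "spinning" }

package main;

my $widget = Widget->new;

# can() returns a code reference if the method exists, else false.
if (my $code = $widget->can("spin")) {
    print $widget->$code(), "\n";    # calls Widget::spin
}
print "no such method\n" unless $widget->can("explode");
```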
How do I create a static variable?
As with most things in Perl, TMTOWTDI. What is a "static variable" in other languages could be either a
function−private variable (visible only within a single function, retaining its value between calls to that
function), or a file−private variable (visible only to functions within the file it was declared in) in Perl.
Here‘s code to implement a function−private variable:
    BEGIN {
        my $counter = 42;
        sub prev_counter { return --$counter }
        sub next_counter { return $counter++ }
    }
Now prev_counter() and next_counter() share a private variable $counter that was initialized
at compile time.
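A short usage sketch may make the shared state easier to see. It repeats the BEGIN block so that it runs on its own; the comments trace the value of $counter after each call.

```perl
use strict;

BEGIN {
    my $counter = 42;
    sub prev_counter { return --$counter }
    sub next_counter { return $counter++ }
}

print next_counter(), "\n";   # returns 42, then $counter becomes 43
print next_counter(), "\n";   # returns 43, then $counter becomes 44
print prev_counter(), "\n";   # decrements first, so returns 43
```

Nothing outside the BEGIN block can name $counter, so the two subroutines are the only way to read or change it.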
To declare a file−private variable, you‘ll still use a my(), putting it at the outer scope level at the top of the
file. Assume this is in file Pax.pm:
package Pax;
my $started = scalar(localtime(time()));
sub begun { return $started }
When use Pax or require Pax loads this module, the variable will be initialized. It won‘t get
garbage−collected the way most variables going out of scope do, because the begun() function cares about
it, but no one else can get it. It is not called $Pax::started because its scope is unrelated to the package.
It‘s scoped to the file. You could conceivably have several packages in that same file all accessing the same
private variable, but another file with the same package couldn‘t get to it.
See Persistent Private Variables in perlsub for details.
What‘s the difference between dynamic and lexical (static) scoping? Between local() and
my()?
local($x) saves away the old value of the global variable $x, and assigns a new value for the duration
of the subroutine, which is visible in other functions called from that subroutine. This is done at run−time,
so is called dynamic scoping. local() always affects global variables, also called package variables or
dynamic variables.
my($x) creates a new variable that is only visible in the current subroutine. This is done at compile−time,
so is called lexical or static scoping. my() always affects private variables, also called lexical variables or
(improperly) static(ly scoped) variables.
For instance:
    sub visible {
        print "var has value $var\n";
    }

    sub dynamic {
        local $var = 'local';   # new temporary value for the still-global
        visible();              #   variable called $var
    }

    sub lexical {
        my $var = 'private';    # new private variable, $var
        visible();              # (invisible outside of sub scope)
    }

    $var = 'global';

    visible();                  # prints global
    dynamic();                  # prints local
    lexical();                  # prints global
Notice how at no point does the value "private" get printed. That's because $var only has that value within
the block of the lexical() function, and it is hidden from the called subroutine.
In summary, local() doesn‘t make what you think of as private, local variables. It gives a global variable
a temporary value. my() is what you‘re looking for if you want private variables.
See "Private Variables via my()" and "Temporary Values via local()" for excruciating details.
How can I access a dynamic variable while a similarly named lexical is in scope?
You can do this via symbolic references, provided you haven‘t set use strict "refs". So instead of
$var, use ${'var'}.
    local $var = "global";
    my    $var = "lexical";

    print "lexical is $var\n";

    no strict 'refs';
    print "global is ${'var'}\n";
If you know your package, you can just mention it explicitly, as in $Some_Pack::var. Note that the
notation $::var is not the dynamic $var in the current package, but rather the one in the main package,
as though you had written $main::var. Specifying the package directly makes you hard−code its name,
but it executes faster and avoids running afoul of use strict "refs".
What‘s the difference between deep and shallow binding?
In deep binding, lexical variables mentioned in anonymous subroutines are the same ones that were in scope
when the subroutine was created. In shallow binding, they are whichever variables with the same names
happen to be in scope when the subroutine is called. Perl always uses deep binding of lexical variables (i.e.,
those created with my()). However, dynamic variables (aka global, local, or package variables) are
effectively shallowly bound. Consider this just one more reason not to use them. See the answer to
"What‘s a closure?".
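The difference can be seen in a small sketch. The names make_reporter() and call_with_local() are hypothetical, chosen only for this illustration: the closure keeps the lexical it was created with, while the dynamic variable it mentions takes whatever value is current at call time.

```perl
use strict;
use vars qw($dyn);

sub make_reporter {
    my $lex = shift;                 # a fresh lexical on every call
    return sub { return "lexical=$lex dynamic=$dyn" };
}

$dyn = "outer";
my $report = make_reporter("captured");

sub call_with_local {
    local $dyn = "inner";            # temporary value, seen by callees
    my $lex = "ignored";             # a different lexical; the closure never sees it
    return $report->();
}

print call_with_local(), "\n";       # lexical=captured dynamic=inner
print $report->(), "\n";             # lexical=captured dynamic=outer
```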
Why doesn't "my($foo) = <FILE>;" work right?
my() and local() give list context to the right hand side of =. The <FH> read operation, like so many of
Perl‘s functions and operators, can tell which context it was called in and behaves appropriately. In general,
the scalar() function can help. This function does nothing to the data itself (contrary to popular myth) but
rather tells its argument to behave in whatever its scalar fashion is. If that function doesn‘t have a defined
scalar behavior, this of course doesn‘t help you (such as with sort()).
To enforce scalar context in this particular case, however, you need merely omit the parentheses:
    local($foo) = <FILE>;           # WRONG
    local($foo) = scalar(<FILE>);   # ok
    local $foo  = <FILE>;           # right
You should probably be using lexical variables anyway, although the issue is the same here:
    my($foo) = <FILE>;      # WRONG
    my $foo  = <FILE>;      # right
How do I redefine a builtin function, operator, or method?
Why do you want to do that? :−)
If you want to override a predefined function, such as open(), then you‘ll have to import the new definition
from a different module. See Overriding Builtin Functions in perlsub. There‘s also an example in
Class::Template in perltoot.
If you want to overload a Perl operator, such as + or **, then you‘ll want to use the use overload
pragma, documented in overload.
If you‘re talking about obscuring method calls in parent classes, see Overridden Methods in perltoot.
What‘s the difference between calling a function as &foo and foo()?
When you call a function as &foo, you allow that function access to your current @_ values, and you
by−pass prototypes. That means that the function doesn‘t get an empty @_, it gets yours! While not strictly
speaking a bug (it‘s documented that way in perlsub), it would be hard to consider this a feature in most
cases.
When you call your function as &foo(), then you do get a new @_, but prototyping is still circumvented.
Normally, you want to call a function using foo(). You may only omit the parentheses if the function is
already known to the compiler because it already saw the definition (use but not require), or via a
forward reference or use subs declaration. Even in this case, you get a clean @_ without any of the old
values leaking through where they don‘t belong.
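The three call styles can be compared side by side. This is a minimal sketch; show_args() and caller_sub() are hypothetical names used only to report how many arguments each style delivers.

```perl
use strict;

sub show_args { return "got " . scalar(@_) . " argument(s)" }

sub caller_sub {
    my @results;
    push @results, &show_args;     # no parens: shares the caller's @_
    push @results, &show_args();   # parens: fresh, empty @_
    push @results, show_args();    # normal call: fresh, empty @_
    return @results;
}

print join("\n", caller_sub("a", "b", "c")), "\n";
```

Only the first form sees the three arguments that were passed to caller_sub(); the other two receive an empty @_.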
How do I create a switch or case statement?
This is explained in more depth in perlsyn. Briefly, there's no official case statement, because of the
variety of tests possible in Perl (numeric comparison, string comparison, glob comparison, regexp matching,
overloaded comparisons, ...). Larry couldn‘t decide how best to do this, so he left it out, even though it‘s
been on the wish list since perl1.
The general answer is to write a construct like this:
    for ($variable_to_test) {
        if    (/pat1/)  { }     # do something
        elsif (/pat2/)  { }     # do something else
        elsif (/pat3/)  { }     # do something else
        else            { }     # default
    }
Here‘s a simple example of a switch based on pattern matching, this time lined up in a way to make it look
more like a switch statement. We‘ll do a multi−way conditional based on the type of reference stored in
$whatchamacallit:
    SWITCH: for (ref $whatchamacallit) {

        /^$/        && die "not a reference";

        /SCALAR/    && do {
                        print_scalar($$whatchamacallit);
                        last SWITCH;
                    };

        /ARRAY/     && do {
                        print_array(@$whatchamacallit);
                        last SWITCH;
                    };

        /HASH/      && do {
                        print_hash(%$whatchamacallit);
                        last SWITCH;
                    };

        /CODE/      && do {
                        warn "can't print function ref";
                        last SWITCH;
                    };

        # DEFAULT

        warn "User defined type skipped";

    }
See perlsyn/"Basic BLOCKs and Switch Statements" for many other examples in this style.
Sometimes you should change the positions of the constant and the variable. For example, let‘s say you
wanted to test which of many answers you were given, but in a case−insensitive way that also allows
abbreviations. You can use the following technique if the strings all start with different characters, or if you
want to arrange the matches so that one takes precedence over another, as "SEND" has precedence over
"STOP" here:
    chomp($answer = <>);
    if    ("SEND"  =~ /^\Q$answer/i) { print "Action is send\n"  }
    elsif ("STOP"  =~ /^\Q$answer/i) { print "Action is stop\n"  }
    elsif ("ABORT" =~ /^\Q$answer/i) { print "Action is abort\n" }
    elsif ("LIST"  =~ /^\Q$answer/i) { print "Action is list\n"  }
    elsif ("EDIT"  =~ /^\Q$answer/i) { print "Action is edit\n"  }
A totally different approach is to create a hash of function references.
    my %commands = (
        "happy" => \&joy,
        "sad"   => \&sullen,
        "done"  => sub { die "See ya!" },
        "mad"   => \&angry,
    );

    print "How are you? ";
    chomp($string = <STDIN>);
    if ($commands{$string}) {
        $commands{$string}->();
    } else {
        print "No such command: $string\n";
    }
How can I catch accesses to undefined variables/functions/methods?
The AUTOLOAD method, discussed in Autoloading in perlsub and
AUTOLOAD: Proxy Methods in perltoot, lets you capture calls to undefined functions and methods.
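As a rough sketch of the idea, the hypothetical Lazy class below defines an AUTOLOAD that reports which method was requested instead of letting the call die. The class name and the message text are illustrative only.

```perl
use strict;

package Lazy;                        # hypothetical class for illustration
use vars qw($AUTOLOAD);

sub new { return bless {}, shift }

sub AUTOLOAD {
    my $self = shift;
    my $name = $AUTOLOAD;            # fully qualified, e.g. Lazy::frobnicate
    $name =~ s/.*:://;               # strip the package qualifier
    return if $name eq 'DESTROY';    # don't intercept object destruction
    return "no method '$name', called with (@_)";
}

package main;

my $obj = Lazy->new;
print $obj->frobnicate(1, 2), "\n";
```

The DESTROY check matters because Perl looks for that method whenever an object is freed, and an AUTOLOAD that ignores it will be invoked for every object that goes away.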
When it comes to undefined variables that would trigger a warning under −w, you can use a handler to trap
the pseudo−signal __WARN__ like this:
    $SIG{__WARN__} = sub {

        for ( $_[0] ) {         # voici un switch statement

            /Use of uninitialized value/  && do {
                # promote warning to a fatal
                die $_;
            };

            # other warning cases to catch could go here;

            warn $_;
        }

    };
Why can‘t a method included in this same file be found?
Some possible reasons: your inheritance is getting confused, you‘ve misspelled the method name, or the
object is of the wrong type. Check out perltoot for details on these. You may also use print
ref($object) to find out the class $object was blessed into.
Another possible reason for problems is because you‘ve used the indirect object syntax (eg, find Guru
"Samy") on a class name before Perl has seen that such a package exists. It‘s wisest to make sure your
packages are all defined before you start using them, which will be taken care of if you use the use
statement instead of require. If not, make sure to use arrow notation (eg, Guru−>find("Samy"))
instead. Object notation is explained in perlobj.
Make sure to read about creating modules in perlmod and the perils of indirect objects in
WARNING in perlobj.
How can I find out my current package?
If you‘re just a random program, you can do this to find out what the currently compiled package is:
my $packname = __PACKAGE__;
But if you‘re a method and you want to print an error message that includes the kind of object you were
called on (which is not necessarily the same as the one in which you were compiled):
sub amethod {
my $self = shift;
my $class = ref($self) || $self;
warn "called me from a $class object";
}
How can I comment out a large block of perl code?
Use embedded POD to discard it:
    # program is here

    =for nobody
    This paragraph is commented out

    # program continues

    =begin comment text

    all of this stuff

    here will be ignored
    by everyone

    =end comment text

    =cut
This can‘t go just anywhere. You have to put a pod directive where the parser is expecting a new statement,
not just in the middle of an expression or some other arbitrary yacc grammar production.
AUTHOR AND COPYRIGHT
Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. All rights reserved.
When included as part of the Standard Version of Perl, or as part of its complete documentation whether
printed or otherwise, this work may be distributed only under the terms of Perl‘s Artistic License. Any
distribution of this file or derivatives thereof outside of that package require that special arrangements be
made with copyright holder.
Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You
are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A
simple comment in the code giving credit would be courteous but is not required.
perlfaq8
Perl Programmers Reference Guide
perlfaq8
NAME
perlfaq8 − System Interaction ($Revision: 1.26 $, $Date: 1998/08/05 12:20:28 $)
DESCRIPTION
This section of the Perl FAQ covers questions involving operating system interaction. This involves
interprocess communication (IPC), control over the user−interface (keyboard, screen and pointing devices),
and most anything else not related to data manipulation.
Read the FAQs and documentation specific to the port of perl to your operating system (eg, perlvms,
perlplan9, ...). These should contain more detailed information on the vagaries of your perl.
How do I find out which operating system I‘m running under?
The $^O variable ($OSNAME if you use English) contains the operating system that your perl binary was
built for.
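A small sketch of how $^O is typically used: the choice of null device below is an illustrative assumption, not something this FAQ prescribes.

```perl
use strict;

# $^O names the platform perl was built for, e.g. "linux" or "MSWin32".
my $null_device = ($^O eq 'MSWin32') ? 'NUL' : '/dev/null';
print "Built for $^O; null device assumed to be $null_device\n";
```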
How come exec() doesn‘t return?
Because that‘s what it does: it replaces your currently running program with a different one. If you want to
keep going (as is probably the case if you‘re asking this question) use system() instead.
How do I do fancy stuff with the keyboard/screen/mouse?
How you access/control keyboards, screens, and pointing devices ("mice") is system−dependent. Try the
following modules:
Keyboard
    Term::Cap               Standard perl distribution
    Term::ReadKey           CPAN
    Term::ReadLine::Gnu     CPAN
    Term::ReadLine::Perl    CPAN
    Term::Screen            CPAN
Screen
    Term::Cap               Standard perl distribution
    Curses                  CPAN
    Term::ANSIColor         CPAN
Mouse
    Tk                      CPAN
Some of these specific cases are shown below.
How do I print something out in color?
In general, you don‘t, because you don‘t know whether the recipient has a color−aware display device. If
you know that they have an ANSI terminal that understands color, you can use the Term::ANSIColor module
from CPAN:
use Term::ANSIColor;
print color("red"), "Stop!\n", color("reset");
print color("green"), "Go!\n", color("reset");
Or like this:
use Term::ANSIColor qw(:constants);
print RED, "Stop!\n", RESET;
print GREEN, "Go!\n", RESET;
How do I read just one key without waiting for a return key?
Controlling input buffering is a remarkably system-dependent matter. On most systems, you can just use the
stty command as shown in getc, but as you see, that's already getting you into portability snags.
    open(TTY, "+</dev/tty") or die "no tty: $!";
    system "stty  cbreak </dev/tty >/dev/tty 2>&1";
    $key = getc(TTY);           # perhaps this works
    # OR ELSE
    sysread(TTY, $key, 1);      # probably this does
    system "stty -cbreak </dev/tty >/dev/tty 2>&1";
The Term::ReadKey module from CPAN offers an easy−to−use interface that should be more efficient than
shelling out to stty for each key. It even includes limited support for Windows.
    use Term::ReadKey;
    ReadMode('cbreak');
    $key = ReadKey(0);
    ReadMode('normal');
However, that requires that you have a working C compiler and can use it to build and install a CPAN
module. Here‘s a solution using the standard POSIX module, which is already on your systems (assuming
your system supports POSIX).
use HotKey;
$key = readkey();
And here‘s the HotKey module, which hides the somewhat mystifying calls to manipulate the POSIX
termios structures.
    # HotKey.pm
    package HotKey;

    @ISA = qw(Exporter);
    @EXPORT = qw(cbreak cooked readkey);

    use strict;
    use POSIX qw(:termios_h);
    my ($term, $oterm, $echo, $noecho, $fd_stdin);

    $fd_stdin = fileno(STDIN);
    $term     = POSIX::Termios->new();
    $term->getattr($fd_stdin);
    $oterm    = $term->getlflag();

    $echo     = ECHO | ECHOK | ICANON;
    $noecho   = $oterm & ~$echo;

    sub cbreak {
        $term->setlflag($noecho);  # ok, so i don't want echo either
        $term->setcc(VTIME, 1);
        $term->setattr($fd_stdin, TCSANOW);
    }

    sub cooked {
        $term->setlflag($oterm);
        $term->setcc(VTIME, 0);
        $term->setattr($fd_stdin, TCSANOW);
    }

    sub readkey {
        my $key = '';
        cbreak();
        sysread(STDIN, $key, 1);
        cooked();
        return $key;
    }

    END { cooked() }

    1;
How do I check whether input is ready on the keyboard?
The easiest way to do this is to read a key in nonblocking mode with the Term::ReadKey module from
CPAN, passing it an argument of −1 to indicate not to block:
    use Term::ReadKey;

    ReadMode('cbreak');

    if (defined ($char = ReadKey(-1)) ) {
        # input was waiting and it was $char
    } else {
        # no input was waiting
    }

    ReadMode('normal');                  # restore normal tty settings
How do I clear the screen?
If you only have to do so infrequently, use system:
system("clear");
If you have to do this a lot, save the clear string so you can print it 100 times without calling a program 100
times:
    $clear_string = `clear`;
print $clear_string;
If you're planning on doing other screen manipulations, like cursor positions, etc, you might wish to use
the Term::Cap module:
    use Term::Cap;
    $terminal = Term::Cap->Tgetent( {OSPEED => 9600} );
    $clear_string = $terminal->Tputs('cl');
How do I get the screen size?
If you have the Term::ReadKey module installed from CPAN, you can use it to fetch the width and height in
characters and in pixels:
use Term::ReadKey;
($wchar, $hchar, $wpixels, $hpixels) = GetTerminalSize();
This is more portable than the raw ioctl, but not as illustrative:
require ’sys/ioctl.ph’;
die "no TIOCGWINSZ " unless defined &TIOCGWINSZ;
open(TTY, "+autoflush(1);
As mentioned in the previous item, this still doesn‘t work when using socket I/O between Unix and
Macintosh. You‘ll need to hardcode your line terminators, in that case.
non−blocking input
If you are doing a blocking read() or sysread(), you‘ll have to arrange for an alarm handler to
provide a timeout (see alarm). If you have a non−blocking open, you‘ll likely have a non−blocking
read, which means you may have to use a 4−arg select() to determine whether I/O is ready on that
device (see select in perlfunc).
While trying to read from his caller−id box, the notorious Jamie Zawinski &1");
# starting cu hoses /dev/tty’s stty settings, even when it has
# been opened on a pipe...
system("/bin/stty $stty");
$_ = ;
chop;
if ( !m/^Connected/ ) {
print STDERR "$0: cu printed ‘$_’ instead of ‘Connected’\n";
}
}
How do I decode encrypted password files?
You spend lots and lots of money on dedicated hardware, but this is bound to get you talked about.
Seriously, you can‘t if they are Unix password files − the Unix password system employs one−way
encryption. It‘s more like hashing than encryption. The best you can check is whether something else
hashes to the same string. You can‘t turn a hash back into the original string. Programs like Crack can
forcibly (and intelligently) try to guess passwords, but don‘t (can‘t) guarantee quick success.
If you‘re worried about users selecting bad passwords, you should proactively check when they try to change
their password (by modifying passwd(1), for example).
How do I start a process in the background?
You could use
system("cmd &")
or you could use fork as documented in fork in perlfunc, with further examples in perlipc. Some things to be
aware of, if you‘re on a Unix−like system:
STDIN, STDOUT, and STDERR are shared
Both the main process and the backgrounded one (the "child" process) share the same STDIN,
STDOUT and STDERR filehandles. If both try to access them at once, strange things can happen.
You may want to close or reopen these for the child. You can get around this by opening a pipe
(see open in perlfunc) but on some systems this means that the child process cannot outlive the parent.
Signals
You‘ll have to catch the SIGCHLD signal, and possibly SIGPIPE too. SIGCHLD is sent when the
backgrounded process finishes. SIGPIPE is sent when you write to a filehandle whose child process
has closed (an untrapped SIGPIPE can cause your program to silently die). This is not an issue with
system("cmd&").
Zombies
You have to be prepared to "reap" the child process when it finishes:
    $SIG{CHLD} = sub { wait };
See Signals in perlipc for other examples of code to do this. Zombies are not an issue with
system("prog &").
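One way to reap children without doing any work inside a signal handler is to poll with a non-blocking waitpid(). The following is a sketch under the assumption of a Unix-like system with fork(). reap() and %finished are hypothetical names, and the child exits through POSIX::_exit() so that it skips the parent's END blocks.

```perl
use strict;
use POSIX qw(:sys_wait_h);

my %finished;                        # pid => exit value, filled in by reap()

# Collect every child that has already exited, without blocking.
sub reap {
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $finished{$kid} = $? >> 8;
    }
}

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {                     # child
    POSIX::_exit(7);                 # leave at once, skipping END blocks
}

# Parent: poll until the child has been reaped.
until (exists $finished{$pid}) {
    reap();
    select(undef, undef, undef, 0.05);   # short nap between polls
}
print "child $pid exited with $finished{$pid}\n";
```

A real program would call reap() from its main loop rather than spin on it; the loop here only keeps the example self-contained.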
How do I trap control characters/signals?
You don‘t actually "trap" a control character. Instead, that character generates a signal which is sent to your
terminal‘s currently foregrounded process group, which you then trap in your process. Signals are
documented in Signals in perlipc and chapter 6 of the Camel.
Be warned that very few C libraries are re−entrant. Therefore, if you attempt to print() in a handler that
got invoked during another stdio operation your internal structures will likely be in an inconsistent state, and
your program will dump core. You can sometimes avoid this by using syswrite() instead of print().
Unless you‘re exceedingly careful, the only safe things to do inside a signal handler are: set a variable and
exit. And in the first case, you should only set a variable in such a way that malloc() is not called (eg, by
setting a variable that already has a value).
For example:
    $Interrupted = 0;       # to ensure it has a value
    $SIG{INT} = sub {
        $Interrupted++;
        syswrite(STDERR, "ouch\n", 5);
    };
However, because syscalls restart by default, you'll find that if you're in a "slow" call, such as <FH>,
read(), connect(), or wait(), that the only way to terminate them is by "longjumping" out; that is, by
raising an exception. See the time−out handler for a blocking flock() in Signals in perlipc or chapter 6 of
the Camel.
How do I modify the shadow password file on a Unix system?
If perl was installed correctly, and your shadow library was written properly, the getpw*() functions
described in perlfunc should in theory provide (read−only) access to entries in the shadow password file. To
change the file, make a new shadow password file (the format varies from system to system − see passwd(5)
for specifics) and use pwd_mkdb(8) to install it (see pwd_mkdb(5) for more details).
How do I set the time and date?
Assuming you‘re running under sufficient permissions, you should be able to set the system−wide date and
time by running the date(1) program. (There is no way to set the time and date on a per−process basis.) This
mechanism will work for Unix, MS−DOS, Windows, and NT; the VMS equivalent is set time.
However, if all you want to do is change your timezone, you can probably get away with setting an
environment variable:
    $ENV{TZ} = "MST7MDT";                       # unixish
    $ENV{'SYS$TIMEZONE_DIFFERENTIAL'} = "-5";   # vms
    system "trn comp.lang.perl.misc";
How can I sleep() or alarm() for under a second?
If you want finer granularity than the 1 second that the sleep() function provides, the easiest way is to use
the select() function as documented in select in perlfunc. If your system has itimers and syscall()
support, you can check out the old example in
http://www.perl.com/CPAN/doc/misc/ancient/tutorial/eg/itimers.pl .
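The select() approach looks like this in practice. nap() is a hypothetical wrapper name; the three undef arguments mean that no filehandles are being watched, so only the timeout has any effect.

```perl
use strict;

# nap() is a hypothetical wrapper around the 4-argument select().
sub nap {
    my $seconds = shift;             # may be fractional, e.g. 0.25
    select(undef, undef, undef, $seconds);
    return;
}

nap(0.25);                           # pause for about a quarter second
print "awake again\n";
```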
How can I measure time under a second?
In general, you may not be able to. The Time::HiRes module (available from CPAN) provides this
functionality for some systems.
However, if your system supports both the syscall() function in Perl and a system call like
gettimeofday(2), then you may be able to do something like this:
    require 'sys/syscall.ph';

    $TIMEVAL_T = "LL";

    $done = $start = pack($TIMEVAL_T, ());

    syscall(&SYS_gettimeofday, $start, 0) != -1
        or die "gettimeofday: $!";

    ##########################
    # DO YOUR OPERATION HERE #
    ##########################

    syscall(&SYS_gettimeofday, $done, 0) != -1
        or die "gettimeofday: $!";

    @start = unpack($TIMEVAL_T, $start);
    @done  = unpack($TIMEVAL_T, $done);

    # fix microseconds
    for ($done[1], $start[1]) { $_ /= 1_000_000 }

    $delta_time = sprintf "%.4f", ($done[0]  + $done[1]  )
                                            -
                                  ($start[0] + $start[1] );
How can I do an atexit() or setjmp()/longjmp()? (Exception handling)
Release 5 of Perl added the END block, which can be used to simulate atexit(). Each package‘s END
block is called when the program or thread ends (see perlmod manpage for more details).
For example, you can use this to make sure your filter program managed to finish its output without filling
up the disk:
END {
close(STDOUT) || die "stdout close failed: $!";
}
The END block isn‘t called when untrapped signals kill the program, though, so if you use END blocks you
should also use
    use sigtrap qw(die normal-signals);
Perl‘s exception−handling mechanism is its eval() operator. You can use eval() as setjmp and die()
as longjmp. For details of this, see the section on signals, especially the time−out handler for a blocking
flock() in Signals in perlipc and chapter 6 of the Camel.
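The correspondence between eval()/die() and try/throw can be sketched as follows. risky() is a hypothetical function used only for illustration: die() unwinds to the nearest enclosing eval block, which then leaves the error message in $@.

```perl
use strict;

sub risky {
    my $n = shift;
    die "negative input\n" if $n < 0;   # "throw": unwinds to the nearest eval
    return sqrt($n);
}

my $result = eval { risky(-1) };        # "try"
if ($@) {                               # "catch"
    chomp(my $err = $@);
    print "caught: $err\n";
} else {
    print "result: $result\n";
}
```

Ending the die() message with a newline keeps Perl from appending the file name and line number to it.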
If exception handling is all you‘re interested in, try the exceptions.pl library (part of the standard perl
distribution).
If you want the atexit() syntax (and an rmexit() as well), try the AtExit module available from
CPAN.
Why doesn‘t my sockets program work under System V (Solaris)? What does the error message
"Protocol not supported" mean?
Some Sys−V based systems, notably Solaris 2.X, redefined some of the standard socket constants. Since
these were constant across all architectures, they were often hardwired into perl code. The proper way to
deal with this is to "use Socket" to get the correct values.
Note that even though SunOS and Solaris are binary compatible, these values are different. Go figure.
How can I call my system‘s unique C functions from Perl?
In most cases, you write an external module to do it − see the answer to "Where can I learn about linking C
with Perl? [h2xs, xsubpp]". However, if the function is a system call, and your system supports
syscall(), you can use the syscall function (documented in perlfunc).
Remember to check the modules that came with your distribution, and CPAN as well − someone may
already have written a module to do it.
Where do I get the include files to do ioctl() or syscall()?
Historically, these would be generated by the h2ph tool, part of the standard perl distribution. This program
converts cpp(1) directives in C header files to files containing subroutine definitions, like
&SYS_getitimer, which you can use as arguments to your functions. It doesn‘t work perfectly, but it
usually gets most of the job done. Simple files like errno.h, syscall.h, and socket.h were fine, but the hard
ones like ioctl.h nearly always need to be hand-edited. Here's how to install the *.ph files:
    1.  become super-user
    2.  cd /usr/include
    3.  h2ph *.h */*.h
If your system supports dynamic loading, for reasons of portability and sanity you probably ought to use
h2xs (also part of the standard perl distribution). This tool converts C header files to Perl extensions. See
perlxstut for how to get started with h2xs.
If your system doesn‘t support dynamic loading, you still probably ought to use h2xs. See perlxstut and
ExtUtils::MakeMaker for more information (in brief, just use make perl instead of a plain make to rebuild
perl with a new static extension).
Why do setuid perl scripts complain about kernel problems?
Some operating systems have bugs in the kernel that make setuid scripts inherently insecure. Perl gives you
a number of options (described in perlsec) to work around such systems.
How can I open a pipe both to and from a command?
The IPC::Open2 module (part of the standard perl distribution) is an easy−to−use approach that internally
uses pipe(), fork(), and exec() to do the job. Make sure you read the deadlock warnings in its
documentation, though (see IPC::Open2). See
Bidirectional Communication with Another Process in perlipc and
Bidirectional Communication with Yourself in perlipc.
You may also use the IPC::Open3 module (part of the standard perl distribution), but be warned that it has a
different order of arguments from IPC::Open2 (see IPC::Open3).
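A minimal open2() sketch follows. To avoid depending on any particular external program, the child is this same perl ($^X) running a one-liner that uppercases its input; that choice is an assumption made for the example. Note the argument order (reader first, then writer) and that the writer is closed before reading, which sidesteps the deadlock the documentation warns about.

```perl
use strict;
use IPC::Open2;

# The child is this same perl, uppercasing each line it reads.
my $pid = open2(\*FROM_CHILD, \*TO_CHILD, $^X, "-ne", "print uc");

print TO_CHILD "hello\n";
close TO_CHILD;                  # send EOF so the child can finish

my $reply = <FROM_CHILD>;        # read only after writing is complete
close FROM_CHILD;
waitpid($pid, 0);                # reap the child

print $reply;
```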
Why can‘t I get the output of a command with system()?
You're confusing the purpose of system() and backticks (``). system() runs a command and returns
exit status information (as a 16 bit value: the low 7 bits are the signal the process died from, if any, and the
high 8 bits are the actual exit value). Backticks (``) run a command and return what it sent to STDOUT.
    $exit_status   = system("mail-users");
    $output_string = `ls`;
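The bit layout described above can be unpacked with a shift and a mask. In this sketch the child is this same perl ($^X), an assumption made so that the example does not rely on any particular external command. The backtick line also assumes that the path in $^X contains no spaces.

```perl
use strict;

# Run this same perl as the child and have it exit with status 3.
my $status = system($^X, "-e", "exit 3");

my $exit_value = $status >> 8;      # high 8 bits: the exit value
my $signal     = $status & 127;     # low 7 bits: signal number, if any

print "exit value $exit_value, signal $signal\n";

my $output = `$^X -e "print 42"`;   # backticks capture STDOUT instead
print "captured: $output\n";
```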
How can I capture STDERR from an external command?
There are three basic ways of running external commands:
    system $cmd;                # using system()
    $output = `$cmd`;           # using backticks (``)
    open (PIPE, "cmd |");       # using open()
With system(), both STDOUT and STDERR will go to the same place as the script's versions of these,
unless the command redirects them. Backticks and open() read only the STDOUT of your command.
With any of these, you can change file descriptors before the call:
open(STDOUT, ">logfile");
system("ls");
or you can use Bourne shell file−descriptor redirection:
    $output = `$cmd 2>some_file`;
open (PIPE, "cmd 2>some_file |");
You can also use file−descriptor redirection to make STDERR a duplicate of STDOUT:
    $output = `$cmd 2>&1`;
open (PIPE, "cmd 2>&1 |");
Note that you cannot simply open STDERR to be a dup of STDOUT in your Perl program and avoid calling
the shell to do the redirection. This doesn‘t work:
open(STDERR, ">&STDOUT");
    $alloutput = `cmd args`;  # stderr still escapes
This fails because the open() makes STDERR go to where STDOUT was going at the time of the
open(). The backticks then make STDOUT go to a string, but don‘t change STDERR (which still goes to
the old STDOUT).
Note that you must use Bourne shell (sh(1)) redirection syntax in backticks, not csh(1)! Details on why
Perl‘s system() and backtick and pipe opens all use the Bourne shell are in
http://www.perl.com/CPAN/doc/FMTEYEWTK/versus/csh.whynot . To capture a command‘s STDERR and
STDOUT together:
    $output = `cmd 2>&1`;                  # either with backticks
    $pid = open(PH, "cmd 2>&1 |");         # or with an open pipe
    while (<PH>) { }                       #    plus a read
To capture a command‘s STDOUT but discard its STDERR:
    $output = `cmd 2>/dev/null`;           # either with backticks
    $pid = open(PH, "cmd 2>/dev/null |");  # or with an open pipe
    while (<PH>) { }                       #    plus a read
To capture a command‘s STDERR but discard its STDOUT:
    $output = `cmd 2>&1 1>/dev/null`;           # either with backticks
    $pid = open(PH, "cmd 2>&1 1>/dev/null |");  # or with an open pipe
    while (<PH>) { }                            #    plus a read
To exchange a command‘s STDOUT and STDERR in order to capture the STDERR but leave its STDOUT
to come out our old STDERR:
    $output = `cmd 3>&1 1>&2 2>&3 3>&-`;           # either with backticks
    $pid = open(PH, "cmd 3>&1 1>&2 2>&3 3>&-|");   # or with an open pipe
    while (<PH>) { }                               #    plus a read
To read both a command‘s STDOUT and its STDERR separately, it‘s easiest and safest to redirect them
separately to files, and then read from those files when the program is done:
system("program args 1>/tmp/program.stdout 2>/tmp/program.stderr");
Ordering is important in all these examples. That‘s because the shell processes file descriptor redirections in
strictly left to right order.
system("prog args 1>tmpfile 2>&1");
system("prog args 2>&1 1>tmpfile");
The first command sends both standard out and standard error to the temporary file. The second command
sends only the old standard output there, and the old standard error shows up on the old standard out.
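The approach of redirecting STDOUT and STDERR to separate files and reading them back afterwards might look like the sketch below. It assumes a Bourne-compatible shell and the File::Temp module; the stand-in command (the running perl, via $^X) and the slurp helper are illustrations only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in command: the running perl writes one line to each stream.
my $cmd = qq{"$^X" -e "print STDOUT qq(out\\n); print STDERR qq(err\\n)"};

my ($out_fh, $out_file) = tempfile(UNLINK => 1);
my ($err_fh, $err_file) = tempfile(UNLINK => 1);

# The shell performs the two redirections before running the command.
system(qq{$cmd 1>"$out_file" 2>"$err_file"}) == 0
    or warn "command exited with status $?\n";

# Hypothetical helper: read a whole file into one string.
sub slurp {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "can't read $file: $!";
    local $/;                       # no record separator: read everything
    my $text = <$fh>;
    close $fh;
    return $text;
}

my $stdout_text = slurp($out_file);
my $stderr_text = slurp($err_file);
```

Letting File::Temp pick the file names avoids the fixed /tmp names used in the one-line example, which two concurrent runs would overwrite.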
Why doesn‘t open() return an error when a pipe open fails?
It does, but probably not how you expect it to. On systems that follow the standard fork()/exec()
paradigm (such as Unix), it works like this: open() causes a fork(). In the parent, open() returns with
the process ID of the child. The child exec()s the command to be piped to/from. The parent can‘t know
whether the exec() was successful or not − all it can return is whether the fork() succeeded or not. To
find out if the command succeeded, you have to catch SIGCHLD and wait() to get the exit status. You
should also catch SIGPIPE if you‘re writing to the child — you may not have found out the exec() failed
by the time you write. This is documented in perlipc.
On systems that follow the spawn() paradigm, open() might do what you expect − unless perl uses a
shell to start your command. In this case the fork()/exec() description still applies.
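For a pipe you read from, a simpler way to learn how the command ended is to check close(), which waits for the child and leaves its wait status in $?. The following is a minimal sketch under that assumption; the child command (the running perl, via $^X) is a stand-in, and the list form of pipe open needs Perl 5.8 or later.

```perl
use strict;
use warnings;

# Stand-in child: prints one line, then exits with value 3.
my @cmd = ($^X, '-e', 'print "hello\n"; exit 3');

# open() reports whether the child could be started at all.
my $pid = open(my $ph, '-|', @cmd)
    or die "can't start @cmd: $!";

my @lines = <$ph>;

# close() on a pipe waits for the child; it returns false if the
# child exited with a non-zero status, which is then found in $?.
my $closed_ok  = close($ph);
my $exit_value = $? >> 8;
warn "child exited with value $exit_value\n" unless $closed_ok;
```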
What‘s wrong with using backticks in a void context?
Strictly speaking, nothing. Stylistically speaking, it's not a good way to write maintainable code because
backticks have a (potentially humungous) return value, and you're ignoring it. It may also not be very
efficient, because you have to read in all the lines of output, allocate memory for them, and then throw them
away. Too often people are lulled into writing:
    `cp file file.bak`;
And now they think "Hey, I'll just always use backticks to run programs." Bad idea: backticks are for
capturing a program's output; the system() function is for running programs.
Consider this line:
    `cat /etc/termcap`;
You haven't assigned the output anywhere, so it just wastes memory (for a little while). Plus you forgot to
check $? to see whether the program even ran correctly. Even if you wrote
    print `cat /etc/termcap`;
in most cases, this could and probably should be written as
    system("cat /etc/termcap") == 0
        or die "cat program failed!";
which will get the output quickly (as it's generated, instead of only at the end) and also check the return
value.
system() also provides direct control over whether shell wildcard processing may take place, whereas
backticks do not.
How can I call backticks without shell processing?
This is a bit tricky. Instead of writing
    @ok = `grep @opts '$search_string' @filenames`;
You have to do this:
    my @ok = ();
    my $pid = open(GREP, "-|");             # implicit fork
    die "can't fork: $!" unless defined $pid;
    if ($pid) {                             # parent: read the child's output
        while (<GREP>) {
            chomp;
            push(@ok, $_);
        }
        close GREP;
    } else {                                # child: exec a list, so no shell
        exec 'grep', @opts, $search_string, @filenames
            or die "can't exec grep: $!";
    }
Just as with system(), no shell escapes happen when you exec() a list.
There are more examples of this in "Safe Pipe Opens" in perlipc.
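The same technique can be wrapped in a reusable function. This is only a sketch: backtick_list is a hypothetical name, and the command run here is the current perl (via $^X) echoing its arguments, chosen so the effect is visible without relying on grep.

```perl
use strict;
use warnings;

# Hypothetical helper: like backticks, but the command is given as a list,
# so the arguments reach the program without any shell interpretation.
sub backtick_list {
    my @cmd = @_;
    my $pid = open(my $kid, '-|');      # implicit fork; child's STDOUT is the pipe
    die "can't fork: $!" unless defined $pid;
    if ($pid == 0) {                    # child
        exec { $cmd[0] } @cmd
            or die "can't exec $cmd[0]: $!";
    }
    my @lines = <$kid>;                 # parent
    close $kid;
    return @lines;
}

# The '*' and ';' below arrive in the child's @ARGV untouched.
my @out = backtick_list($^X, '-e', 'print "$_\n" for @ARGV', '*.c', 'a; b');
```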
Why can‘t my script read from STDIN after I gave it EOF (^D on Unix, ^Z on MS−DOS)?
Because some stdio‘s set error and eof flags that need clearing. The POSIX module defines clearerr()
that you can use. That is the technically correct way to do it. Here are some less reliable workarounds:
1   Try keeping around the seekpointer and go there, like this:
        $where = tell(LOG);
        seek(LOG, $where, 0);
2   If that doesn't work, try seeking to a different part of the file and then back.
3   If that doesn't work, try seeking to a different part of the file, reading something, and then seeking
    back.
4   If that doesn't work, give up on your stdio package and use sysread.
How can I convert my shell script to perl?
Learn Perl and rewrite it. Seriously, there‘s no simple converter. Things that are awkward to do in the shell
are easy to do in Perl, and this very awkwardness is what would make a shell−perl converter nigh−on
impossible to write. By rewriting it, you‘ll think about what you‘re really trying to do, and hopefully will
escape the shell‘s pipeline datastream paradigm, which while convenient for some matters, causes many
inefficiencies.
Can I use perl to run a telnet or ftp session?
Try the Net::FTP, TCP::Client, and Net::Telnet modules (available from CPAN).
http://www.perl.com/CPAN/scripts/netstuff/telnet.emul.shar will also help for emulating the telnet protocol,
but Net::Telnet is quite probably easier to use.
If all you want to do is pretend to be telnet but don‘t need the initial telnet handshaking, then the standard
dual−process approach will suffice:
    use IO::Socket;             # new in 5.004
    $handle = IO::Socket::INET->new('www.perl.com:80')
            || die "can't connect to port 80 on www.perl.com: $!";
    $handle->autoflush(1);
    if (fork()) {               # XXX: undef means failure
        select($handle);
        print while <STDIN>;    # everything from stdin to socket
    } else {
        print while <$handle>;  # everything from socket to stdout
    }
    close $handle;
    exit;
How can I write expect in Perl?
Once upon a time, there was a library called chat2.pl (part of the standard perl distribution), which never
really got finished. If you find it somewhere, don‘t use it. These days, your best bet is to look at the Expect
module available from CPAN, which also requires two other modules from CPAN, IO::Pty and IO::Stty.
Is there a way to hide perl‘s command line from programs such as "ps"?
First of all note that if you‘re doing this for security reasons (to avoid people seeing passwords, for example)
then you should rewrite your program so that critical information is never given as an argument. Hiding the
arguments won‘t make your program completely secure.
To actually alter the visible command line, you can assign to the variable $0 as documented in perlvar. This
won‘t work on all operating systems, though. Daemon programs like sendmail place their state there, as in:
$0 = "orcus [accepting connections]";
I {changed directory, modified my environment} in a perl script. How come the change
disappeared when I exited the script? How do I get my changes to be visible?
Unix
In the strictest sense, it can‘t be done — the script executes as a different process from the shell it was
started from. Changes to a process are not reflected in its parent, only in its own children created after
the change. There is shell magic that may allow you to fake it by eval()ing the script‘s output in
your shell; check out the comp.unix.questions FAQ for details.
How do I close a process‘s filehandle without waiting for it to complete?
Assuming your system supports such things, just send an appropriate signal to the process (see
kill in perlfunc). It's common to first send a TERM signal, wait a little bit, and then send a KILL signal to
finish it off.
How do I fork a daemon process?
If by daemon process you mean one that‘s detached (disassociated from its tty), then the following process is
reported to work on most Unixish systems. Non−Unix users should check their Your_OS::Process module
for other solutions.
Open /dev/tty and use the TIOCNOTTY ioctl on it. See tty(4) for details. Or better yet, you can
just use the POSIX::setsid() function, so you don‘t have to worry about process groups.
Change directory to /
Reopen STDIN, STDOUT, and STDERR so they‘re not connected to the old tty.
Background yourself like this:
fork && exit;
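Put together, the steps above might look like the sketch below. The function name daemonize is an illustration, and the ordering is one common choice rather than the only one: it forks before calling setsid(), because setsid() fails for a process that is already a process group leader.

```perl
use strict;
use warnings;
use POSIX qw(setsid);

# Hypothetical helper following the steps above; error handling is minimal.
sub daemonize {
    chdir '/'                      or die "can't chdir to /: $!";
    open(STDIN,  '<', '/dev/null') or die "can't read /dev/null: $!";
    open(STDOUT, '>', '/dev/null') or die "can't write /dev/null: $!";
    defined(my $pid = fork)        or die "can't fork: $!";
    exit 0 if $pid;                # parent exits; the child carries on
    my $sid = setsid();            # detach from the controlling tty
    die "can't start a new session: $!" if !defined $sid || $sid == -1;
    open(STDERR, '>&', \*STDOUT)   or die "can't dup STDOUT: $!";
    return $$;                     # pid of the daemon process
}
```

A program would call daemonize() once, early on, and then do its real work in the process that returns from it.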
How do I make my program run with sh and csh?
See the eg/nih script (part of the perl source distribution).
How do I find out if I‘m running interactively or not?
Good question. Sometimes −t STDIN and −t STDOUT can give clues, sometimes not.
if (−t STDIN && −t STDOUT) {
print "Now what? ";
}
On POSIX systems, you can test whether your own process group matches the current process group of your
controlling terminal as follows:
use POSIX qw/getpgrp tcgetpgrp/;
open(TTY, "/dev/tty") or die $!;
$tpgrp = tcgetpgrp(TTY);
$pgrp = getpgrp();
if ($tpgrp == $pgrp) {
print "foreground\n";
} else {
print "background\n";
}
How do I timeout a slow event?
Use the alarm() function, probably in conjunction with a signal handler, as documented in "Signals" in perlipc
and chapter 6 of the Camel. You may instead use the more flexible Sys::AlarmCall module available from
CPAN.
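A minimal sketch of the alarm() approach follows. The helper name with_timeout is hypothetical, and the slow operation is simulated with a four-argument select() rather than real work.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $seconds.
# Returns the code's scalar result, or undef if the alarm fired first.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };   # "\n" keeps the message exact
        alarm $seconds;
        my $value = $code->();
        alarm 0;                                      # cancel the pending alarm
        $value;
    };
    alarm 0;                                          # in case $code died early
    if (!defined $result && $@) {
        return undef if $@ eq "timeout\n";
        die $@;                                       # some other failure: rethrow
    }
    return $result;
}

my $fast = with_timeout(5, sub { 2 + 2 });
my $slow = with_timeout(1, sub { select(undef, undef, undef, 10); 'finished' });
```

The first call completes normally; the second is interrupted after about one second and yields undef.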
How do I set CPU limits?
Use the BSD::Resource module from CPAN.
How do I avoid zombies on a Unix system?
Use the reaper code from "Signals" in perlipc to call wait() when a SIGCHLD is received, or else use the
double-fork technique described under fork in perlfunc.
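A reaper along those lines can be sketched as follows. This is an illustration rather than the exact code from perlipc; the %exit_status hash and the polling loop exist only to make the effect observable.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

my %exit_status;    # pid => wait status of each reaped child

# Reap every child that has exited; one signal may stand for several children.
$SIG{CHLD} = sub {
    local ($!, $?);                  # don't disturb the interrupted code
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $exit_status{$kid} = $?;
    }
};

defined(my $pid = fork) or die "can't fork: $!";
POSIX::_exit(7) if $pid == 0;        # child: finish immediately

# Parent: carry on with other work; here we just poll until the handler has run.
for (1 .. 50) {
    last if exists $exit_status{$pid};
    select(undef, undef, undef, 0.1);
}
```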
How do I use an SQL database?
There are a number of excellent interfaces to SQL databases. See the DBD::* modules available from
http://www.perl.com/CPAN/modules/dbperl/DBD . A lot of information on this can be found at
http://www.hermetica.com/technologia/perl/DBI/index.html .
How do I make a system() exit on control−C?
You can‘t. You need to imitate the system() call (see perlipc for sample code) and then have a signal
handler for the INT signal that passes the signal on to the subprocess. Or you can check for it:
$rc = system($cmd);
if ($rc & 127) { die "signal death" }
How do I open a file without blocking?
If you‘re lucky enough to be using a system that supports non−blocking reads (most Unixish systems do),
you need only to use the O_NDELAY or O_NONBLOCK flag from the Fcntl module in conjunction with
sysopen():
use Fcntl;
    sysopen(FH, "/tmp/somefile", O_WRONLY|O_NDELAY|O_CREAT, 0644)
        or die "can't open /tmp/somefile: $!";
How do I install a CPAN module?
The easiest way is to have the CPAN module do it for you. This module comes with perl version 5.004 and
later. To manually install the CPAN module, or any well−behaved CPAN module for that matter, follow
these steps:
1   Unpack the source into a temporary area.
2   perl Makefile.PL
3   make
4   make test
5   make install
If your version of perl is compiled without dynamic loading, then you just need to replace step 3 (make)
with make perl and you will get a new perl binary with your extension linked in.
See ExtUtils::MakeMaker for more details on building extensions. See also the next question.
What‘s the difference between require and use?
Perl offers several different ways to include code from one file into another. Here are the deltas between the
various inclusion constructs:
1)  do $file is like eval `cat $file`, except the former:
    1.1: searches @INC and updates %INC.
    1.2: bequeaths an *unrelated* lexical scope on the eval'ed code.
2)  require $file is like do $file, except the former:
    2.1: checks for redundant loading, skipping already loaded files.
    2.2: raises an exception on failure to find, compile, or execute $file.
3)  require Module is like require "Module.pm", except the former:
    3.1: translates each "::" into your system's directory separator.
    3.2: primes the parser to disambiguate class Module as an indirect object.
4)  use Module is like require Module, except the former:
    4.1: loads the module at compile time, not run-time.
    4.2: imports symbols and semantics from that package to the current one.
In general, you usually want use and a proper Perl module.
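The last two deltas can be seen in a small sketch. It uses the standard POSIX module purely as a convenient example; the BEGIN block spells out what use does, as described in perlfunc.

```perl
use strict;
use warnings;

# Run time: load the module, then call its function by its full name.
require POSIX;
my $down = POSIX::floor(3.7);

# Compile time: this BEGIN block is what "use POSIX qw(ceil);" amounts to.
BEGIN {
    require POSIX;
    POSIX->import(qw(ceil));
}
my $up = ceil(3.2);    # the imported name is already known to the parser
```

With require alone nothing is imported, so the function has to be called as POSIX::floor; after the import at compile time, ceil can be called by its short name.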
How do I keep my own module/library directory?
When you build modules, use the PREFIX option when generating Makefiles:
perl Makefile.PL PREFIX=/u/mydir/perl
then either set the PERL5LIB environment variable before you run scripts that use the modules/libraries (see
perlrun) or say
use lib ’/u/mydir/perl’;
See Perl‘s lib for more information.
How do I add the directory my program lives in to the module/library search path?
use FindBin;
use lib "$FindBin::Bin";
use your_own_modules;
How do I add a directory to my include path at runtime?
Here are the suggested ways of modifying your include path:
    the PERLLIB environment variable
    the PERL5LIB environment variable
    the perl -Idir command line flag
    the use lib pragma, as in
        use lib "$ENV{HOME}/myown_perllib";
The latter is particularly useful because it knows about machine dependent architectures. The lib.pm
pragmatic module was first included with the 5.002 release of Perl.
AUTHOR AND COPYRIGHT
Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. All rights reserved.
When included as part of the Standard Version of Perl, or as part of its complete documentation whether
printed or otherwise, this work may be distributed only under the terms of Perl‘s Artistic License. Any
distribution of this file or derivatives thereof outside of that package require that special arrangements be
made with copyright holder.
Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You
are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A
simple comment in the code giving credit would be courteous but is not required.
NAME
perlfaq9 − Networking ($Revision: 1.20 $, $Date: 1998/06/22 18:31:09 $)
DESCRIPTION
This section deals with questions related to networking, the internet, and a few on the web.
My CGI script runs from the command line but not the browser. (500 Server Error)
If you can demonstrate that you‘ve read the following FAQs and that your problem isn‘t something simple
that can be easily answered, you‘ll probably receive a courteous and useful reply to your question if you post
it on comp.infosystems.www.authoring.cgi (if it‘s something to do with HTTP, HTML, or the CGI
protocols). Questions that appear to be Perl questions but are really CGI ones that are posted to
comp.lang.perl.misc may not be so well received.
The useful FAQs and related documents are:
CGI FAQ
http://www.webthing.com/page.cgi/cgifaq
Web FAQ
http://www.boutell.com/faq/
WWW Security FAQ
http://www.w3.org/Security/Faq/
HTTP Spec
http://www.w3.org/pub/WWW/Protocols/HTTP/
HTML Spec
http://www.w3.org/TR/REC−html40/
http://www.w3.org/pub/WWW/MarkUp/
CGI Spec
http://www.w3.org/CGI/
CGI Security FAQ
http://www.go2net.com/people/paulp/cgi−security/safe−cgi.txt
How can I get better error messages from a CGI program?
Use the CGI::Carp module. It replaces warn and die, plus the normal Carp module's carp, croak, and
confess functions with more verbose and safer versions. It still sends them to the normal server error log.
use CGI::Carp;
warn "This is a complaint";
die "But this one is serious";
The following use of CGI::Carp also redirects errors to a file of your choice, placed in a BEGIN block to
catch compile−time warnings as well:
BEGIN {
use CGI::Carp qw(carpout);
open(LOG, ">>/var/local/cgi−logs/mycgi−log")
or die "Unable to append to mycgi−log: $!\n";
carpout(*LOG);
}
You can even arrange for fatal errors to go back to the client browser, which is nice for your own debugging,
but might confuse the end user.
use CGI::Carp qw(fatalsToBrowser);
die "Bad error here";
Even if the error happens before you get the HTTP header out, the module will try to take care of this to
avoid the dreaded server 500 errors. Normal warnings still go out to the server error log (or wherever you‘ve
sent them with carpout) with the application name and date stamp prepended.
How do I remove HTML from a string?
The most correct way (albeit not the fastest) is to use HTML::Parse from CPAN (part of the libwww−perl
distribution, which is a must−have module for all web hackers).
Many folks attempt a simple-minded regular expression approach, like s/<.*?>//g, but that fails in many
cases because the tags may continue over line breaks, they may contain quoted angle-brackets, or HTML
comments may be present. Plus folks forget to convert entities, like &lt; for example.
Here‘s one "simple−minded" approach, that works for most files:
    #!/usr/bin/perl -p0777
    s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
If you want a more complete solution, see the 3−stage striphtml program in
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz .
Here are some tricky cases that you should think about when picking a solution:
−−>
<# Just data #>
>>>>>>>>>>> ]]>
If HTML comments include other tags, those solutions would also break on text like this:
You can’t see me!
−−>
How do I extract URLs?
A quick but imperfect approach is
    #!/usr/bin/perl -n00
    # qxurl - tchrist@perl.com
    print "$2\n" while m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix;
This version does not adjust relative URLs, understand alternate bases, deal with HTML comments, deal
with HREF and NAME attributes in the same tag, or accept URLs themselves as arguments. It also runs
about 100x faster than a more "complete" solution using the LWP suite of modules, such as the
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program.
How do I download a file from the user‘s machine? How do I open a file on another machine?
In the context of an HTML form, you can use what‘s known as multipart/form−data encoding. The
CGI.pm module (available from CPAN) supports this in the start_multipart_form() method, which
isn‘t the same as the startform() method.
How do I make a pop−up menu in HTML?
Use the <SELECT> and <OPTION> tags. The CGI.pm module (available from CPAN) supports this
widget, as well as many others, including some that it cleverly synthesizes on its own.
How do I fetch an HTML file?
One approach, if you have the lynx text−based HTML browser installed on your system, is this:
    $html_code = `lynx -source $url`;
    $text_data = `lynx -dump $url`;
The libwww−perl (LWP) modules from CPAN provide a more powerful way to do this. They work through
proxies, and don‘t require lynx:
# simplest version
use LWP::Simple;
$content = get($URL);
# or print HTML from a URL
use LWP::Simple;
getprint "http://www.sn.no/libwww−perl/";
# or print ASCII from HTML from a URL
use LWP::Simple;
use HTML::Parse;
use HTML::FormatText;
my ($html, $ascii);
$html = get("http://www.perl.com/");
defined $html
or die "Can’t fetch HTML from http://www.perl.com/";
$ascii = HTML::FormatText−>new−>format(parse_html($html));
print $ascii;
How do I automate an HTML form submission?
If you‘re submitting values using the GET method, create a URL and encode the form using the
query_form method:
    use LWP::Simple;
    use URI::URL;
    my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
    $url->query_form(module => 'DB_File', readme => 1);
    $content = get($url);
If you‘re using the POST method, create your own user agent and encode the content appropriately.
    use HTTP::Request::Common qw(POST);
    use LWP::UserAgent;
    $ua = LWP::UserAgent->new();
    my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod',
                   [ module => 'DB_File', readme => 1 ];
    $content = $ua->request($req)->as_string;
How do I decode or create those %−encodings on the web?
Here‘s an example of decoding:
$string = "http://altavista.digital.com/cgi−bin/query?pg=q&what=news&fmt=.&q=%2Bc
$string =~ s/%([a−fA−F0−9]{2})/chr(hex($1))/ge;
Encoding is a bit harder, because you can't just blindly change all the non-alphanumunder characters (\W)
into their hex escapes. It‘s important that characters with special meaning like / and ? not be translated.
Probably the easiest way to get this right is to avoid reinventing the wheel and just use the URI::Escape
module, which is part of the libwww−perl package (LWP) available from CPAN.
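To make the two directions concrete, here is a hand-rolled sketch for a single URL component, such as one query value. The helper names are hypothetical, it assumes single-byte characters, and it is no substitute for URI::Escape.

```perl
use strict;
use warnings;

# Hypothetical helpers for one URL component (a query value, say),
# not for a whole URL, where '/' and '?' must be left alone.
sub percent_decode {
    my ($text) = @_;
    $text =~ s/%([a-fA-F0-9]{2})/chr(hex($1))/ge;
    return $text;
}

sub percent_encode {
    my ($text) = @_;
    # Escape everything except letters, digits, and - _ . ~
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

my $encoded = percent_encode('+cgi perl');
my $decoded = percent_decode($encoded);
```

Encoding each component separately, before the URL is assembled, is what keeps the separators between components intact.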
How do I redirect to another page?
Instead of sending back a Content−Type as the headers of your reply, send back a Location: header.
Officially this should be a URI: header, so the CGI.pm module (available from CPAN) sends back both:
Location: http://www.domain.com/newpage
URI: http://www.domain.com/newpage
Note that relative URLs in these headers can cause strange effects because of "optimizations" that servers do.
$url = "http://www.perl.com/CPAN/";
print "Location: $url\n\n";
exit;
To be correct to the spec, each of those "\n" should really each be "\015\012", but unless you‘re stuck
on MacOS, you probably won‘t notice.
How do I put a password on my web pages?
That depends. You‘ll need to read the documentation for your web server, or perhaps check some of the
other FAQs referenced above.
How do I edit my .htpasswd and .htgroup files with Perl?
The HTTPD::UserAdmin and HTTPD::GroupAdmin modules provide a consistent OO interface to these
files, regardless of how they're stored. Databases may be text, dbm, Berkeley DB or any database with a DBI
compatible driver. HTTPD::UserAdmin supports files used by the ‘Basic’ and ‘Digest’ authentication
schemes. Here‘s an example:
use HTTPD::UserAdmin ();
HTTPD::UserAdmin
−>new(DB => "/foo/.htpasswd")
−>add($username => $password);
How do I make sure users can‘t enter values into a form that cause my CGI script to do bad
things?
Read the CGI security FAQ, at http://www−genome.wi.mit.edu/WWW/faqs/www−security−faq.html, and
the Perl/CGI FAQ at http://www.perl.com/CPAN/doc/FAQs/cgi/perl−cgi−faq.html.
In brief: use tainting (see perlsec), which makes sure that data from outside your script (eg, CGI parameters)
are never used in eval or system calls. In addition to tainting, never use the single−argument form of
system() or exec(). Instead, supply the command and arguments as a list, which prevents shell
globbing.
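The effect of the list form can be sketched like this. The "form value" is made up, and the program being run is the current perl (via $^X), which simply checks that it received the value as one unmodified argument.

```perl
use strict;
use warnings;

# Pretend this string arrived from a form field.
my $user_input = 'report.txt; rm -rf *';

# Stand-in program: exits 0 only if it received exactly one argument,
# identical to the value passed alongside in the environment.
$ENV{EXPECTED} = $user_input;
my @cmd = ($^X, '-e',
           'exit(@ARGV == 1 && $ARGV[0] eq $ENV{EXPECTED} ? 0 : 1)',
           $user_input);

# List form: no shell is started, so ';' and '*' are just characters.
my $status = system(@cmd);
```

Had the same text been interpolated into a single command string, the shell would have treated the semicolon as a command separator and expanded the wildcard.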
How do I parse a mail header?
For a quick−and−dirty solution, try this solution derived from page 222 of the 2nd edition of "Programming
Perl":
    $/ = '';
    $header = <MSG>;             # read the header paragraph from the MSG filehandle
    $header =~ s/\n\s+/ /g;      # merge continuation lines
    %head = ( UNIX_FROM_LINE, split /^([-\w]+):\s*/m, $header );
That solution doesn‘t do well if, for example, you‘re trying to maintain all the Received lines. A more
complete approach is to use the Mail::Header module from CPAN (part of the MailTools package).
How do I decode a CGI form?
You use a standard module, probably CGI.pm. Under no circumstances should you attempt to do so by
hand!
You‘ll see a lot of CGI programs that blindly read from STDIN the number of bytes equal to
CONTENT_LENGTH for POSTs, or grab QUERY_STRING for decoding GETs. These programs are very
poorly written. They only work sometimes. They typically forget to check the return value of the read()
system call, which is a cardinal sin. They don‘t handle HEAD requests. They don‘t handle multipart forms
used for file uploads. They don‘t deal with GET/POST combinations where query fields are in more than
one place. They don‘t deal with keywords in the query string.
In short, they‘re bad hacks. Resist them at all costs. Please do not be tempted to reinvent the wheel.
Instead, use the CGI.pm or CGI_Lite.pm (available from CPAN), or if you‘re trapped in the module−free
land of perl1 .. perl4, you might look into cgi−lib.pl (available from
http://www.bio.cam.ac.uk/web/form.html).
Make sure you know whether to use a GET or a POST in your form. GETs should only be used for
something that doesn‘t update the server. Otherwise you can get mangled databases and repeated feedback
mail messages. The fancy word for this is ‘‘idempotency‘’. This simply means that there should be no
difference between making a GET request for a particular URL once or multiple times. This is because the
HTTP protocol definition says that a GET request may be cached by the browser, or server, or an intervening
proxy. POST requests cannot be cached, because each request is independent and matters. Typically, POST
requests change or depend on state on the server (query or update a database, send mail, or purchase a
computer).
How do I check a valid mail address?
You can‘t, at least, not in real time. Bummer, eh?
Without sending mail to the address and seeing whether there's a human on the other end to answer you,
you cannot determine whether a mail address is valid. Even if you apply the mail header standard, you can
have problems, because there are deliverable addresses that aren‘t RFC−822 (the mail header standard)
compliant, and addresses that aren‘t deliverable which are compliant.
Many are tempted to try to eliminate many frequently-invalid mail addresses with a simple regexp, such as
/^[\w.-]+\@([\w.-]\.)+\w+$/. It's a very bad idea: it also throws out many valid addresses
and says nothing about potential deliverability, so it is not suggested. Instead, see
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz , which actually checks against the
full RFC spec (except for nested comments), looks for addresses you may not wish to accept mail to (say,
Bill Clinton or your postmaster), and then makes sure that the hostname given can be looked up in the DNS
MX records. It‘s not fast, but it works for what it tries to do.
Our best advice for verifying a person‘s mail address is to have them enter their address twice, just as you
normally do to change a password. This usually weeds out typos. If both versions match, send mail to that
address with a personal message that looks somewhat like:
Dear someuser@host.com,
Please confirm the mail address you gave us Wed May 6 09:38:41
MDT 1998 by replying to this message. Include the string
"Rumpelstiltskin" in that reply, but spelled in reverse; that is,
start with "Nik...". Once this is done, your confirmed address will
be entered into our records.
If you get the message back and they‘ve followed your directions, you can be reasonably assured that it‘s
real.
A related strategy that‘s less open to forgery is to give them a PIN (personal ID number). Record the address
and PIN (best that it be a random one) for later processing. In the mail you send, ask them to include the PIN
in their reply. But if it bounces, or the message is included via a ‘‘vacation‘’ script, it‘ll be there anyway. So
it‘s best to ask them to mail back a slight alteration of the PIN, such as with the characters reversed, one
added or subtracted to each digit, etc.
How do I decode a MIME/BASE64 string?
The MIME−tools package (available from CPAN) handles this and a lot more. Decoding BASE64 becomes
as simple as:
    use MIME::Base64;
    $decoded = decode_base64($encoded);
A more direct approach is to use the unpack() function‘s "u" format after minor transliterations:
    tr#A-Za-z0-9+/##cd;                   # remove non-base64 chars
    tr#A-Za-z0-9+/# -_#;                  # convert to uuencoded format
    $len = pack("c", 32 + 0.75*length);   # compute length byte
    print unpack("u", $len . $_);         # uudecode and print
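As a quick check that the two approaches agree, the sketch below decodes the same short string both ways. The wrapper name decode_via_unpack is hypothetical, and it is meant for short strings only, since a uuencoded line carries a single length byte.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical wrapper around the unpack("u") steps shown above.
sub decode_via_unpack {
    my ($text) = @_;
    $text =~ tr#A-Za-z0-9+/##cd;                      # remove non-base64 chars
    $text =~ tr#A-Za-z0-9+/# -_#;                     # convert to uuencoded format
    my $len = pack("c", 32 + 0.75 * length($text));   # compute length byte
    return unpack("u", $len . $text);                 # uudecode
}

my $encoded   = encode_base64('Hello, world', '');    # '' means no trailing newline
my $by_module = decode_base64($encoded);
my $by_unpack = decode_via_unpack($encoded);
```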
How do I return the user‘s mail address?
On systems that support getpwuid, the $< variable and the Sys::Hostname module (which is part of the
standard perl distribution), you can probably try using something like this:
use Sys::Hostname;
    $address = sprintf('%s@%s', scalar getpwuid($<), hostname);
Company policies on mail addresses can mean that this generates addresses that the company's mail system
will not accept, so you should ask for users’ mail addresses when this matters. Furthermore, not all systems
on which Perl runs are so forthcoming with this information as is Unix.
The Mail::Util module from CPAN (part of the MailTools package) provides a mailaddress() function
that tries to guess the mail address of the user. It makes a more intelligent guess than the code above, using
information given when the module was installed, but it could still be incorrect. Again, the best way is often
just to ask the user.
How do I send mail?
Use the sendmail program directly:
    open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq")
        or die "Can't fork for sendmail: $!\n";
    print SENDMAIL <<"EOF";
    From: User Originating Mail
    To: Final Destination
    Subject: A relevant subject line

    Body of the message goes here, in as many lines as you like.
    EOF
    close(SENDMAIL)
        or warn "sendmail didn't close nicely";
The −oi option prevents sendmail from interpreting a line consisting of a single dot as "end of message".
The −t option says to use the headers to decide who to send the message to, and −odq says to put the
message into the queue. This last option means your message won‘t be immediately delivered, so leave it
out if you want immediate delivery.
Or use the CPAN module Mail::Mailer:
    use Mail::Mailer;
    $mailer = Mail::Mailer->new();
    $mailer->open({ From    => $from_address,
                    To      => $to_address,
                    Subject => $subject,
                  })
        or die "Can't open: $!\n";
    print $mailer $body;
    $mailer->close();
The Mail::Internet module uses Net::SMTP which is less Unix−centric than Mail::Mailer, but less reliable.
Avoid raw SMTP commands. There are many reasons to use a mail transport agent like sendmail. These
include queueing, MX records, and security.
How do I read mail?
Use the Mail::Folder module from CPAN (part of the MailFolder package) or the Mail::Internet module
from CPAN (also part of the MailTools package).
# sending mail
use Mail::Internet;
use Mail::Header;
# say which mail host to use
$ENV{SMTPHOSTS} = ’mail.frii.com’;
# create headers
$header = new Mail::Header;
$header−>add(’From’, ’gnat@frii.com’);
$header−>add(’Subject’, ’Testing’);
$header−>add(’To’, ’gnat@frii.com’);
# create body
$body = ’This is a test, ignore’;
# create mail object
$mail = new Mail::Internet(undef, Header => $header, Body => \[$body]);
# send it
$mail−>smtpsend or die;
Often a module is overkill, though. Here‘s a mail sorter.
#!/usr/bin/perl
# bysub1 − simple sort by subject
my(@msgs, @sub);
my $msgno = −1;
$/ = ’’;
# paragraph reads
while (<>) {
if (/^From/m) {
/^Subject:\s*(?:Re:\s*)*(.*)/mi;
$sub[++$msgno] = lc($1) || ’’;
}
$msgs[$msgno] .= $_;
}
for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) {
print $msgs[$i];
}
Or more succinctly,
#!/usr/bin/perl −n00
# bysub2 − awkish sort−by−subject
BEGIN { $msgno = −1 }
$sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m;
$msg[$msgno] .= $_;
END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }
How do I find out my hostname/domainname/IP address?
The normal way to find your own hostname is to call the ‘hostname‘ program. While sometimes
expedient, this has some problems, such as not knowing whether you‘ve got the canonical name or not. It‘s
one of those tradeoffs of convenience versus portability.
The Sys::Hostname module (part of the standard perl distribution) will give you the hostname after which
you can find out the IP address (assuming you have working DNS) with a gethostbyname() call.
    use Socket;
    use Sys::Hostname;
    my $host = hostname();
    my $addr = inet_ntoa(scalar(gethostbyname($host || 'localhost')));
Probably the simplest way to learn your DNS domain name is to grok it out of /etc/resolv.conf, at least under
Unix. Of course, this assumes several things about your resolv.conf configuration, including that it exists.
(We still need a good DNS domain name−learning method for non−Unix systems.)
How do I fetch a news article or the active newsgroups?
Use the Net::NNTP or News::NNTPClient modules, both available from CPAN. This can make tasks like
fetching the newsgroup list as simple as:
    perl -MNews::NNTPClient \
        -e 'print News::NNTPClient->new->list("newsgroups")'
How do I fetch/put an FTP file?
LWP::Simple (available from CPAN) can fetch but not put. Net::FTP (also available from CPAN) is more
complex but can put as well as fetch.
How can I do RPC in Perl?
A DCE::RPC module is being developed (but is not yet available), and will be released as part of the
DCE−Perl package (available from CPAN). No ONC::RPC module is known.
AUTHOR AND COPYRIGHT
Copyright (c) 1997, 1998 Tom Christiansen and Nathan Torkington. All rights reserved.
When included as part of the Standard Version of Perl, or as part of its complete documentation whether
printed or otherwise, this work may be distributed only under the terms of Perl‘s Artistic License. Any
distribution of this file or derivatives thereof outside of that package require that special arrangements be
made with copyright holder.
Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You
are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A
simple comment in the code giving credit would be courteous but is not required.
perl
Perl Programmers Reference Guide
perl
NAME
perl − Practical Extraction and Report Language
SYNOPSIS
    perl    [ -sTuU ]
            [ -hv ] [ -V[:configvar] ]
            [ -cw ] [ -d[:debugger] ] [ -D[number/list] ]
            [ -pna ] [ -Fpattern ] [ -l[octal] ] [ -0[octal] ]
            [ -Idir ] [ -m[-]module ] [ -M[-]'module...' ]
            [ -P ]
            [ -S ]
            [ -x[dir] ]
            [ -i[extension] ]
            [ -e 'command' ] [ -- ] [ programfile ] [ argument ]...
For ease of access, the Perl manual has been split up into a number of sections:
    perl            Perl overview (this section)
    perldelta       Perl changes since previous version
    perlfaq         Perl frequently asked questions
    perltoc         Perl documentation table of contents

    perldata        Perl data structures
    perlsyn         Perl syntax
    perlop          Perl operators and precedence
    perlre          Perl regular expressions
    perlrun         Perl execution and options
    perlfunc        Perl builtin functions
    perlvar         Perl predefined variables
    perlsub         Perl subroutines
    perlmod         Perl modules: how they work
    perlmodlib      Perl modules: how to write and use
    perlmodinstall  Perl modules: how to install from CPAN
    perlform        Perl formats
    perllocale      Perl locale support

    perlref         Perl references
    perldsc         Perl data structures intro
    perllol         Perl data structures: lists of lists
    perltoot        Perl OO tutorial
    perlobj         Perl objects
    perltie         Perl objects hidden behind simple variables
    perlbot         Perl OO tricks and examples
    perlipc         Perl interprocess communication

    perldebug       Perl debugging
    perldiag        Perl diagnostic messages
    perlsec         Perl security
    perltrap        Perl traps for the unwary
    perlport        Perl portability guide
    perlstyle       Perl style guide

    perlpod         Perl plain old documentation
    perlbook        Perl book information

    perlembed       Perl ways to embed perl in your C or C++ application
    perlapio        Perl internal IO abstraction interface
    perlxs          Perl XS application programming interface
    perlxstut       Perl XS tutorial
    perlguts        Perl internal functions for those doing extensions
    perlcall        Perl calling conventions from C

    perlhist        Perl history records
(If you‘re intending to read these straight through for the first time, the suggested order will tend to reduce
the number of forward references.)
By default, all of the above manpages are installed in the /usr/local/man/ directory.
Extensive additional documentation for Perl modules is available. The default configuration for perl will
place this additional documentation in the /usr/local/lib/perl5/man directory (or else in the man subdirectory
of the Perl library directory). Some of this additional documentation is distributed standard with Perl, but
you‘ll also find documentation for third−party modules there.
You should be able to view Perl‘s documentation with your man(1) program by including the proper
directories in the appropriate start−up files, or in the MANPATH environment variable. To find out where
the configuration has installed the manpages, type:
perl −V:man.dir
If the directories have a common stem, such as /usr/local/man/man1 and /usr/local/man/man3, you need
only to add that stem (/usr/local/man) to your man(1) configuration files or your MANPATH environment
variable. If they do not share a stem, you‘ll have to add both stems.
If that doesn‘t work for some reason, you can still use the supplied perldoc script to view module
information. You might also look into getting a replacement man program.
If something strange has gone wrong with your program and you‘re not sure where you should look for help,
try the −w switch first. It will often point out exactly where the trouble is.
DESCRIPTION
Perl is a language optimized for scanning arbitrary text files, extracting information from those text files, and
printing reports based on that information. It‘s also a good language for many system management tasks.
The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant,
minimal).
Perl combines (in the author‘s opinion, anyway) some of the best features of C, sed, awk, and sh, so people
familiar with those languages should have little difficulty with it. (Language historians will also note some
vestiges of csh, Pascal, and even BASIC−PLUS.) Expression syntax corresponds quite closely to C
expression syntax. Unlike most Unix utilities, Perl does not arbitrarily limit the size of your data—if you‘ve
got the memory, Perl can slurp in your whole file as a single string. Recursion is of unlimited depth. And
the tables used by hashes (previously called "associative arrays") grow as necessary to prevent degraded
performance. Perl uses sophisticated pattern matching techniques to scan large amounts of data very
quickly. Although optimized for scanning text, Perl can also deal with binary data, and can make dbm files
look like hashes. Setuid Perl scripts are safer than C programs through a dataflow tracing mechanism which
prevents many stupid security holes.
If you have a problem that would ordinarily use sed or awk or sh, but it exceeds their capabilities or must
run a little faster, and you don‘t want to write the silly thing in C, then Perl may be for you. There are also
translators to turn your sed and awk scripts into Perl scripts.
But wait, there‘s more...
Perl version 5 is nearly a complete rewrite, and provides the following additional benefits:
Many usability enhancements
It is now possible to write much more readable Perl code (even within regular expressions).
Formerly cryptic variable names can be replaced by mnemonic identifiers. Error messages are more
informative, and the optional warnings will catch many of the mistakes a novice might make. This
cannot be stressed enough. Whenever you get mysterious behavior, try the −w switch!!! Whenever
you don‘t get mysterious behavior, try using −w anyway.
Simplified grammar
The new yacc grammar is one half the size of the old one. Many of the arbitrary grammar rules have
been regularized. The number of reserved words has been cut by 2/3. Despite this, nearly all old Perl
scripts will continue to work unchanged.
Lexical scoping
Perl variables may now be declared within a lexical scope, like "auto" variables in C. Not only is this
more efficient, but it contributes to better privacy for "programming in the large". Anonymous
subroutines exhibit deep binding of lexical variables (closures).
Arbitrarily nested data structures
Any scalar value, including any array element, may now contain a reference to any other variable or
subroutine. You can easily create anonymous variables and subroutines. Perl manages your
reference counts for you.
Modularity and reusability
The Perl library is now defined in terms of modules which can be easily shared among various
packages. A package may choose to import all or a portion of a module‘s published interface.
Pragmas (that is, compiler directives) are defined and used by the same mechanism.
Object−oriented programming
A package can function as a class. Dynamic multiple inheritance and virtual methods are supported
in a straightforward manner and with very little new syntax. Filehandles may now be treated as
objects.
Embeddable and Extensible
Perl may now be embedded easily in your C or C++ application, and can either call or be called by
your routines through a documented interface. The XS preprocessor is provided to make it easy to
glue your C or C++ routines into Perl. Dynamic loading of modules is supported, and Perl itself can
be made into a dynamic library.
POSIX compliant
A major new module is the POSIX module, which provides access to all available POSIX routines
and definitions, via object classes where appropriate.
Package constructors and destructors
The new BEGIN and END blocks provide means to capture control as a package is being compiled,
and after the program exits. As a degenerate case they work just like awk‘s BEGIN and END when
you use the −p or −n switches.
Multiple simultaneous DBM implementations
A Perl program may now access DBM, NDBM, SDBM, GDBM, and Berkeley DB files from the
same script simultaneously. In fact, the old dbmopen interface has been generalized to allow any
variable to be tied to an object class which defines its access methods.
Subroutine definitions may now be autoloaded
In fact, the AUTOLOAD mechanism also allows you to define any arbitrary semantics for undefined
subroutine calls. It‘s not for just autoloading.
Regular expression enhancements
You can now specify nongreedy quantifiers. You can now do grouping without creating a
backreference. You can now write regular expressions with embedded whitespace and comments for
readability. A consistent extensibility mechanism has been added that is upwardly compatible with
all old regular expressions.
Innumerable Unbundled Modules
The Comprehensive Perl Archive Network described in perlmodlib contains hundreds of
plug−and−play modules full of reusable code. See http://www.perl.com/CPAN for a site near you.
Compilability
While not yet in full production mode, a working perl−to−C compiler does exist. It can generate
portable byte code, simple C, or optimized C code.
Okay, that‘s definitely enough hype.
ENVIRONMENT
See perlrun.
AUTHOR
Larry Wall
[...] bar() or $obj−>bar()).
perl5004delta
Perl Programmers Reference Guide
perl5004delta
Perl 5.005 will use method lookup only for methods’ AUTOLOADs. However, there is a significant base of
existing code that may be using the old behavior. So, as an interim step, Perl 5.004 issues an optional
warning when a non−method uses an inherited AUTOLOAD.
The simple rule is: Inheritance will not work when autoloading non−methods. The simple fix for old code
is: In any module that used to depend on inheriting AUTOLOAD for non−methods from a base class named
BaseClass, execute *AUTOLOAD = \&BaseClass::AUTOLOAD during startup.
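The fix described above can be sketched as follows. This is only an illustration: the package name Derived and the function name frobnicate are made up for the example, and the AUTOLOAD body is a stand-in for whatever the real base class does.

```perl
package BaseClass;
use vars '$AUTOLOAD';

# Catch-all for undefined subroutines; here it just reports what was called.
sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                  # strip the package qualifier
    return if $name eq 'DESTROY';       # never autoload destructors
    return "BaseClass handled $name(@_)";
}

package Derived;
@Derived::ISA = ('BaseClass');

# The suggested fix: install the base class's AUTOLOAD directly in this
# package, so plain function calls no longer depend on inheritance.
*AUTOLOAD = \&BaseClass::AUTOLOAD;

package main;

# A non-method call to a subroutine that Derived never defined.
my $result = Derived::frobnicate(1, 2);
print "$result\n";
```

Without the glob assignment, the call to Derived::frobnicate would have to find AUTOLOAD through @ISA, which is exactly the behavior being phased out.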
Previously deprecated %OVERLOAD is no longer usable
Using %OVERLOAD to define overloading was deprecated in 5.003. Overloading is now defined using the
overload pragma. %OVERLOAD is still used internally but should not be used by Perl scripts. See overload
for more details.
Subroutine arguments created only when they‘re modified
In Perl 5.004, nonexistent array and hash elements used as subroutine parameters are brought into existence
only if they are actually assigned to (via @_).
Earlier versions of Perl vary in their handling of such arguments. Perl versions 5.002 and 5.003 always
brought them into existence. Perl versions 5.000 and 5.001 brought them into existence only if they were not
the first argument (which was almost certainly a bug). Earlier versions of Perl never brought them into
existence.
For example, given this code:
undef @a; undef %a;
sub show { print $_[0] };
sub change { $_[0]++ };
show($a[2]);
change($a{b});
After this code executes in Perl 5.004, $a{b} exists but $a[2] does not. In Perl 5.002 and 5.003, both
$a{b} and $a[2] would have existed (but $a[2]‘s value would have been undefined).
Group vector changeable with $)
The $) special variable has always (well, in Perl 5, at least) reflected not only the current effective group,
but also the group list as returned by the getgroups() C function (if there is one). However, until this
release, there has not been a way to call the setgroups() C function from Perl.
In Perl 5.004, assigning to $) is exactly symmetrical with examining it: The first number in its string value
is used as the effective gid; if there are any numbers after the first one, they are passed to the
setgroups() C function (if there is one).
Fixed parsing of $$ [...] {FOO} and $aryref−>[$foo]: You may
now write &$subref($foo) as $subref−>($foo). All of these arrow terms may be chained;
thus, &{$table−>{FOO}}($bar) may now be written $table−>{FOO}−>($bar).
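A small sketch of the two call styles side by side may help. The variable names and the contents of the dispatch table are invented for the example.

```perl
my $subref = sub { return "got $_[0]" };
my $table  = { FOO => sub { return "FOO($_[0])" } };

# Old and new ways of calling through a code reference.
my $old_call = &$subref("x");
my $new_call = $subref->("x");

# Old and new ways of calling a code reference stored in a hash.
my $old_chain = &{ $table->{FOO} }("y");
my $new_chain = $table->{FOO}->("y");

print "$old_call / $new_call\n";
print "$old_chain / $new_chain\n";
```

Both lines print the same value twice, since each pair of calls is equivalent.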
New and changed builtin constants
__PACKAGE__
The current package name at compile time, or the undefined value if there is no current package (due
to a package; directive). Like __FILE__ and __LINE__, __PACKAGE__ does not interpolate
into strings.
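As a quick illustration (the package name My::Module is hypothetical), __PACKAGE__ yields the name of the package being compiled, but only when used as a bare token:

```perl
package My::Module;
sub whoami { return __PACKAGE__ }     # compiled inside My::Module

package main;

my $inner   = My::Module::whoami();   # name of the package the sub was compiled in
my $outer   = __PACKAGE__;            # name of the current package
my $literal = "__PACKAGE__";          # no interpolation inside a string

print "$inner $outer $literal\n";
```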
New and changed builtin variables
$^E Extended error message on some platforms. (Also known as $EXTENDED_OS_ERROR if you use
English).
$^H The current set of syntax checks enabled by use strict. See the documentation of strict for
more details. Not actually new, but newly documented. Because it is intended for internal use by Perl
core components, there is no use English long name for this variable.
$^M By default, running out of memory is not trappable. However, if compiled for this, Perl may use the
contents of $^M as an emergency pool after die()ing with this message. Suppose that your Perl
were compiled with −DPERL_EMERGENCY_SBRK and used Perl‘s malloc. Then
$^M = ’a’ x (1<<16);
would allocate a 64K buffer for use when in emergency. See the INSTALL file for information on how
to enable this option. As a disincentive to casual use of this advanced feature, there is no use
English long name for this variable.
New and changed builtin functions
delete on slices
This now works. (e.g. delete @ENV{'PATH', 'MANPATH'})
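A minimal sketch of deleting a hash slice, using a made-up %color hash rather than %ENV so that the environment is left alone:

```perl
my %color = (red => 1, green => 2, blue => 3, black => 4);

# Delete two keys at once; delete returns the removed values in order.
my @removed = delete @color{'red', 'blue'};
my @left    = sort keys %color;

print "removed: @removed\n";
print "left: @left\n";
```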
flock
is now supported on more platforms, prefers fcntl to lockf when emulating, and always flushes before
(un)locking.
printf and sprintf
Perl now implements these functions itself; it doesn‘t use the C library function sprintf() any
more, except for floating−point numbers, and even then only known flags are allowed. As a result, it is
now possible to know which conversions and flags will work, and what they will do.
The new conversions in Perl‘s sprintf() are:
    %i   a synonym for %d
    %p   a pointer (the address of the Perl value, in hexadecimal)
    %n   special: *stores* the number of characters output so far
         into the next variable in the parameter list
The new flags that go between the % and the conversion are:
    #    prefix octal with "0", hex with "0x"
    h    interpret integer as C type "short" or "unsigned short"
    V    interpret integer as Perl's standard integer type
Also, where a number would appear in the flags, an asterisk ("*") may be used instead, in which case
Perl uses the next item in the parameter list as the given number (that is, as the field width or
precision). If a field width obtained through "*" is negative, it has the same effect as the ‘−’ flag:
left−justification.
See sprintf for a complete list of conversions and flags.
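The following sketch exercises a few of the conversions and flags listed above. The values are arbitrary, and %p and %n are left out because their results depend on the running interpreter.

```perl
my @lines = (
    sprintf("%i",    42),         # %i behaves like %d
    sprintf("%#o",   8),          # '#' prefixes octal with "0"
    sprintf("%#x",   255),        # '#' prefixes hex with "0x"
    sprintf("[%*d]", 6, 42),      # field width taken from the argument list
    sprintf("[%*d]", -6, 42),     # negative width means left-justification
    sprintf("%.*f",  2, 3.14159), # precision taken from the argument list
);
print "$_\n" for @lines;
```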
keys as an lvalue
As an lvalue, keys allows you to increase the number of hash buckets allocated for the given hash.
This can gain you a measure of efficiency if you know the hash is going to get big. (This is similar to
pre−extending an array by assigning a larger number to $#array.) If you say
keys %hash = 200;
then %hash will have at least 200 buckets allocated for it. These buckets will be retained even if you
do %hash = (); use undef %hash if you want to free the storage while %hash is still in scope.
You can‘t shrink the number of buckets allocated for the hash using keys in this way (but you needn‘t
worry about doing this by accident, as trying has no effect).
my() in Control Structures
You can now use my() (with or without the parentheses) in the control expressions of control
structures such as:
    while (defined(my $line = <>)) {
        $line = lc $line;
    } continue {
        print $line;
    }
    if ((my $answer = <STDIN>) =~ /^y(es)?$/i) {
        user_agrees();
    } elsif ($answer =~ /^n(o)?$/i) {
        user_disagrees();
    } else {
        chomp $answer;
        die "'$answer' is neither 'yes' nor 'no'";
    }
Also, you can declare a foreach loop control variable as lexical by preceding it with the word "my".
For example, in:
foreach my $i (1, 2, 3) {
some_function();
}
$i is a lexical variable, and the scope of $i extends to the end of the loop, but not beyond it.
Note that you still cannot use my() on global punctuation variables such as $_ and the like.
pack() and unpack()
A new format ‘w’ represents a BER compressed integer (as defined in ASN.1). Its format is a
sequence of one or more bytes, each of which provides seven bits of the total value, with the most
significant first. Bit eight of each byte is set, except for the last byte, in which bit eight is clear.
If ‘p’ or ‘P’ are given undef as values, they now generate a NULL pointer.
Both pack() and unpack() now fail when their templates contain invalid types. (Invalid types
used to be ignored.)
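To make the byte layout of the 'w' format concrete, here is a small sketch that packs a few arbitrarily chosen integers and shows the result in hexadecimal:

```perl
# 300 is binary 10 0101100: two seven-bit groups, so two bytes.
# The first byte has bit eight set (0x80 | 2), the last has it clear (0x2c).
my $packed = pack("w", 300);
my $hex    = unpack("H*", $packed);
my ($n)    = unpack("w", $packed);

# Several values can be packed back to back and recovered again.
my @nums = unpack("w*", pack("w*", 1, 127, 128, 16384));

print "300 packs to $hex and unpacks to $n\n";
print "round trip: @nums\n";
```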
sysseek()
The new sysseek() operator is a variant of seek() that sets and gets the file‘s system read/write
position, using the lseek(2) system call. It is the only reliable way to seek before using sysread()
or syswrite(). Its return value is the new position, or the undefined value on failure.
use VERSION
If the first argument to use is a number, it is treated as a version number instead of a module name. If
the version of the Perl interpreter is less than VERSION, then an error message is printed and Perl
exits immediately. Because use occurs at compile time, this check happens immediately during the
compilation process, unlike require VERSION, which waits until runtime for the check. This is
often useful if you need to check the current Perl version before useing library modules which have
changed in incompatible ways from older versions of Perl. (We try not to do this more than we have
to.)
use Module VERSION LIST
If the VERSION argument is present between Module and LIST, then the use will call the VERSION
method in class Module with the given version as an argument. The default VERSION method,
inherited from the UNIVERSAL class, croaks if the given version is larger than the value of the
variable $Module::VERSION. (Note that there is not a comma after VERSION!)
This version−checking mechanism is similar to the one currently used in the Exporter module, but it is
faster and can be used with modules that don‘t use the Exporter. It is the recommended method for
new code.
prototype(FUNCTION)
Returns the prototype of a function as a string (or undef if the function has no prototype).
FUNCTION is a reference to or the name of the function whose prototype you want to retrieve. (Not
actually new; just never documented before.)
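A short sketch, using two throwaway subroutines (mymax and plain are names invented here), shows the three possible outcomes:

```perl
sub mymax ($$) { return $_[0] > $_[1] ? $_[0] : $_[1] }
sub plain      { return 1 }

my $by_ref  = prototype(\&mymax);         # looked up through a reference
my $by_name = prototype('main::mymax');   # looked up by name
my $none    = prototype(\&plain);         # no prototype: undef

print "by reference: $by_ref\n";
print "by name:      $by_name\n";
print "plain has ", (defined $none ? "a" : "no"), " prototype\n";
```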
srand
The default seed for srand, which used to be time, has been changed. Now it‘s a heady mix of
difficult−to−predict system−dependent values, which should be sufficient for most everyday purposes.
Previous to version 5.004, calling rand without first calling srand would yield the same sequence of
random numbers on most or all machines. Now, when perl sees that you‘re calling rand and haven‘t
yet called srand, it calls srand with the default seed. You should still call srand manually if your
code might ever be run on a pre−5.004 system, of course, or if you want a seed other than the default.
$_ as Default
Functions documented in the Camel to default to $_ now in fact do, and all those that do are so
documented in perlfunc.
m//gc does not reset search position on failure
The m//g match iteration construct has always reset its target string‘s search position (which is visible
through the pos operator) when a match fails; as a result, the next m//g match after a failure starts
again at the beginning of the string. With Perl 5.004, this reset may be disabled by adding the "c" (for
"continue") modifier, i.e. m//gc. This feature, in conjunction with the \G zero−width assertion,
makes it possible to chain matches together. See perlop and perlre.
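One way to use this is a simple tokenizer in which each alternative picks up where the previous match stopped. The sketch below is only an illustration; the token classes and the input string are made up.

```perl
my $text = "abc 123 def";
my @tokens;

for ($text) {                       # alias $_ to $text
    while (1) {
        # Each pattern is anchored with \G at the current position.
        # Thanks to /c, a failed attempt leaves that position alone,
        # so the next alternative can be tried at the same place.
        if    (/\G\s+/gc)      { next }
        elsif (/\G([a-z]+)/gc) { push @tokens, "WORD:$1" }
        elsif (/\G(\d+)/gc)    { push @tokens, "NUM:$1" }
        else                   { last }
    }
}

print join(" ", @tokens), "\n";
```

Without the "c" modifier, the first failed whitespace match would reset the position to the start of the string and the loop would never advance past the first word.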
m//x ignores whitespace before ?*+{}
The m//x construct has always been intended to ignore all unescaped whitespace. However, before
Perl 5.004, whitespace had the effect of escaping repeat modifiers like "*" or "?"; for example, /a
*b/x was (mis)interpreted as /a\*b/x. This bug has been fixed in 5.004.
nested sub{} closures work now
Prior to the 5.004 release, nested anonymous functions didn‘t work right. They do now.
formats work right on changing lexicals
Just like anonymous functions that contain lexical variables that change (like a lexical index variable
for a foreach loop), formats now work properly. For example, this silently failed before (printed
only zeros), but is fine now:
my $i;
foreach $i ( 1 .. 10 ) {
write;
}
format =
my i is @#
$i
.
However, it still fails (without a warning) if the foreach is within a subroutine:
my $i;
sub foo {
foreach $i ( 1 .. 10 ) {
write;
}
}
foo;
format =
my i is @#
$i
.
New builtin methods
The UNIVERSAL package automatically contains the following methods that are inherited by all other
classes:
isa(CLASS)
isa returns true if its object is blessed into a subclass of CLASS.
isa is also exportable and can be called as a sub with two arguments. This makes it possible to check
what a reference points to. Example:
    use UNIVERSAL qw(isa);
    if (isa($ref, 'ARRAY')) {
        ...
    }
can(METHOD)
can checks to see if its object has a method called METHOD. If it does, a reference to the sub is
returned; if it does not, undef is returned.
VERSION( [NEED] )
VERSION returns the version number of the class (package). If the NEED argument is given then it
will check that the current version (as defined by the $VERSION variable in the given package) is not
less than NEED; it will die if this is not the case. This method is normally called as a class method.
This method is called automatically by the VERSION form of use.
    use A 1.2 qw(some imported subs);
    # implies:
    A->VERSION(1.2);
NOTE: can directly uses Perl‘s internal code for method lookup, and isa uses a very similar method and
caching strategy. This may cause strange effects if the Perl code dynamically changes @ISA in any package.
You may add other methods to the UNIVERSAL class via Perl or XS code. You do not need to use
UNIVERSAL in order to make these methods available to your program. This is necessary only if you wish
to have isa available as a plain subroutine in the current package.
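The three methods can be tried out with a pair of toy classes. Animal and Dog are hypothetical names used only for this sketch, and the methods are called as ordinary methods, so nothing needs to be imported.

```perl
package Animal;
sub new   { my $class = shift; return bless {}, $class }
sub speak { return "generic noise" }

package Dog;
@Dog::ISA     = ('Animal');
$Dog::VERSION = '1.02';

package main;

my $dog = Dog->new;

my $is_animal = $dog->isa('Animal') ? 1 : 0;    # inheritance check
my $speak     = $dog->can('speak');             # code reference, or false
my $said      = $speak ? $speak->($dog) : '';
my $can_fly   = $dog->can('fly') ? 1 : 0;       # no such method
my $version   = Dog->VERSION;                   # reads $Dog::VERSION
my $new_enough = eval { Dog->VERSION(2.0); 1 } ? 1 : 0;   # dies: 1.02 < 2.0

print "isa Animal: $is_animal, says: $said, can fly: $can_fly\n";
print "version $version, at least 2.0: $new_enough\n";
```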
TIEHANDLE now supported
See perltie for other kinds of tie()s.
TIEHANDLE classname, LIST
This is the constructor for the class. That means it is expected to return an object of some sort. The
reference can be used to hold some internal information.
sub TIEHANDLE {
print "\n";
my $i;
return bless \$i, shift;
}
PRINT this, LIST
This method will be triggered every time the tied handle is printed to. Beyond its self reference it also
expects the list that was passed to the print function.
sub PRINT {
$r = shift;
$$r++;
return print join( $, => map {uc} @_), $\;
}
PRINTF this, LIST
This method will be triggered every time the tied handle is printed to with the printf() function.
Beyond its self reference it also expects the format and list that was passed to the printf function.
sub PRINTF {
shift;
my $fmt = shift;
print sprintf($fmt, @_)."\n";
}
READ this, LIST
This method will be called when the handle is read from via the read or sysread functions.
sub READ {
$r = shift;
my($buf,$len,$offset) = @_;
print "READ called, \$buf=$buf, \$len=$len, \$offset=$offset";
}
READLINE this
This method will be called when the handle is read from. The method should return undef when there
is no more data.
sub READLINE {
$r = shift;
return "PRINT called $$r times\n"
}
GETC this
This method will be called when the getc function is called.
sub GETC { print "Don't GETC, Get Perl"; return "a"; }
DESTROY this
As with the other types of ties, this method will be called when the tied handle is about to be destroyed.
This is useful for debugging and possibly for cleaning up.
sub DESTROY {
print " \n";
}
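Putting a few of these methods together, a tied handle might be used as in the sketch below. The class name Shout, the handle name FH and the $Shout::last variable are assumptions made for the example, and only TIEHANDLE, PRINT and READLINE are implemented.

```perl
package Shout;

# Constructor: the object is a reference to a counter of PRINT calls.
sub TIEHANDLE {
    my $class = shift;
    my $count = 0;
    return bless \$count, $class;
}

# Called for every print to the tied handle; remembers the text upcased.
sub PRINT {
    my $self = shift;
    $$self++;
    $Shout::last = join('', map { uc } @_);
    return 1;
}

# Called when the handle is read with <FH>.
sub READLINE {
    my $self = shift;
    return "PRINT called $$self times\n";
}

package main;

tie *FH, 'Shout';
print FH "hello";            # dispatched to Shout::PRINT
my $line = <FH>;             # dispatched to Shout::READLINE

print "last text: $Shout::last\n";
print $line;
```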
Malloc enhancements
If perl is compiled with the malloc included with the perl distribution (that is, if perl −V:d_mymalloc is
‘define’) then you can print memory statistics at runtime by running Perl thusly:
env PERL_DEBUG_MSTATS=2 perl your_script_here
The value of 2 means to print statistics after compilation and on exit; with a value of 1, the statistics are
printed only on exit. (If you want the statistics at an arbitrary time, you‘ll need to install the optional module
Devel::Peek.)
Three new compilation flags are recognized by malloc.c. (They have no effect if perl is compiled with
system malloc().)
−DPERL_EMERGENCY_SBRK
If this macro is defined, running out of memory need not be a fatal error: a memory pool can be allocated
by assigning to the special variable $^M. See "$^M".
−DPACK_MALLOC
Perl memory allocation is by bucket with sizes close to powers of two. Because of this, malloc
overhead may be big, especially for data of size exactly a power of two. If PACK_MALLOC is defined,
perl uses a slightly different algorithm for small allocations (up to 64 bytes long), which makes it
possible to have overhead down to 1 byte for allocations which are powers of two (and appear quite
often).
Expected memory savings (with 8−byte alignment in alignbytes) is about 20% for typical Perl
usage. Expected slowdown due to additional malloc overhead is in fractions of a percent (hard to
measure, because of the effect of saved memory on speed).
−DTWO_POT_OPTIMIZE
Similarly to PACK_MALLOC, this macro improves allocations of data with size close to a power of
two; but this works for big allocations (starting with 16K by default). Such allocations are typical for
big hashes and special−purpose scripts, especially image processing.
On recent systems, the fact that perl requires 2M from system for 1M allocation will not affect speed
of execution, since the tail of such a chunk is not going to be touched (and thus will not require real
memory). However, it may result in a premature out−of−memory error. So if you will be manipulating
very large blocks with sizes close to powers of two, it would be wise to define this macro.
Expected saving of memory is 0−100% (100% in applications which require most memory in such
2**n chunks); expected slowdown is negligible.
Miscellaneous efficiency enhancements
Functions that have an empty prototype and that do nothing but return a fixed value are now inlined (e.g.
sub PI () { 3.14159 }).
Each unique hash key is only allocated once, no matter how many hashes have an entry with that key. So
even if you have 100 copies of the same hash, the hash keys never have to be reallocated.
Support for More Operating Systems
Support for the following operating systems is new in Perl 5.004.
Win32
Perl 5.004 now includes support for building a "native" perl under Windows NT, using the Microsoft Visual
C++ compiler (versions 2.0 and above) or the Borland C++ compiler (versions 5.02 and above). The
resulting perl can be used under Windows 95 (if it is installed in the same directory locations as it got
installed in Windows NT). This port includes support for perl extension building tools like MakeMaker and
h2xs, so that many extensions available on the Comprehensive Perl Archive Network (CPAN) can now be
readily built under Windows NT. See http://www.perl.com/ for more information on CPAN and
README.win32 in the perl distribution for more details on how to get started with building this port.
There is also support for building perl under the Cygwin32 environment. Cygwin32 is a set of GNU tools
that make it possible to compile and run many UNIX programs under Windows NT by providing a mostly
UNIX−like interface for compilation and execution. See README.cygwin32 in the perl distribution for
more details on this port and how to obtain the Cygwin32 toolkit.
Plan 9
See README.plan9 in the perl distribution.
QNX
See README.qnx in the perl distribution.
AmigaOS
See README.amigaos in the perl distribution.
Pragmata
Six new pragmatic modules exist:
use autouse MODULE => qw(sub1 sub2 sub3)
Defers require MODULE until someone calls one of the specified subroutines (which must be
exported by MODULE). This pragma should be used with caution, and only when necessary.
use blib
use blib 'dir'
Looks for MakeMaker−like ‘blib’ directory structure starting in dir (or current directory) and working
back up to five levels of parent directories.
Intended for use on command line with −M option as a way of testing arbitrary scripts against an
uninstalled version of a package.
use constant NAME => VALUE
Provides a convenient interface for creating compile−time constants. See
Constant Functions in perlsub.
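A brief sketch of the pragma in use follows. The constant names and values are chosen only for illustration.

```perl
use constant PI    => 3.14159;
use constant DEBUG => 0;

my $radius = 2;
my $area   = PI * $radius ** 2;       # PI is used like a bareword function

printf "area = %.2f\n", $area;
print "debugging is on\n" if DEBUG;   # never printed while DEBUG is 0
```

Because such constants are subroutines with an empty prototype, they cannot be interpolated directly into a double-quoted string.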
use locale
Tells the compiler to enable (or disable) the use of POSIX locales for builtin operations.
When use locale is in effect, the current LC_CTYPE locale is used for regular expressions and
case mapping; LC_COLLATE for string ordering; and LC_NUMERIC for numeric formatting in printf
and sprintf (but not in print). LC_NUMERIC is always used in write, since lexical scoping of formats
is problematic at best.
Each use locale or no locale affects statements to the end of the enclosing BLOCK or, if not
inside a BLOCK, to the end of the current file. Locales can be switched and queried with
POSIX::setlocale().
See perllocale for more information.
use ops
Disable unsafe opcodes, or any named opcodes, when compiling Perl code.
use vmsish
Enable VMS−specific language features. Currently, there are three VMS−specific features available:
‘status‘, which makes $? and system return genuine VMS status values instead of emulating POSIX;
‘exit‘, which makes exit take a genuine VMS status value instead of assuming that exit 1 is an
error; and ‘time‘, which makes all times relative to the local time zone, in the VMS tradition.
Modules
Required Updates
Though Perl 5.004 is compatible with almost all modules that work with Perl 5.003, there are a few
exceptions:
    Module   Required Version for Perl 5.004
    ------   -------------------------------
    Filter   Filter-1.12
    LWP      libwww-perl-5.08
    Tk       Tk400.202 (-w makes noise)
Also, the majordomo mailing list program, version 1.94.1, doesn‘t work with Perl 5.004 (nor with perl 4),
because it executes an invalid regular expression. This bug is fixed in majordomo version 1.94.2.
Installation directories
The installperl script now places the Perl source files for extensions in the architecture−specific library
directory, which is where the shared libraries for extensions have always been. This change is intended to
allow administrators to keep the Perl 5.004 library directory unchanged from a previous version, without
running the risk of binary incompatibility between extensions’ Perl source and shared libraries.
Module information summary
Brand new modules, arranged by topic rather than strictly alphabetically:
CGI.pm               Web server interface ("Common Gateway Interface")
CGI/Apache.pm        Support for Apache's Perl module
CGI/Carp.pm          Log server errors with helpful context
CGI/Fast.pm          Support for FastCGI (persistent server process)
CGI/Push.pm          Support for server push
CGI/Switch.pm        Simple interface for multiple server types

CPAN                 Interface to Comprehensive Perl Archive Network
CPAN::FirstTime      Utility for creating CPAN configuration file
CPAN::Nox            Runs CPAN while avoiding compiled extensions

IO.pm                Top-level interface to IO::* classes
IO/File.pm           IO::File extension Perl module
IO/Handle.pm         IO::Handle extension Perl module
IO/Pipe.pm           IO::Pipe extension Perl module
IO/Seekable.pm       IO::Seekable extension Perl module
IO/Select.pm         IO::Select extension Perl module
IO/Socket.pm         IO::Socket extension Perl module

Opcode.pm            Disable named opcodes when compiling Perl code

ExtUtils/Embed.pm    Utilities for embedding Perl in C programs
ExtUtils/testlib.pm  Fixes up @INC to use just-built extension

FindBin.pm           Find path of currently executing program

Class/Struct.pm      Declare struct-like datatypes as Perl classes
File/stat.pm         By-name interface to Perl's builtin stat
Net/hostent.pm       By-name interface to Perl's builtin gethost*
Net/netent.pm        By-name interface to Perl's builtin getnet*
Net/protoent.pm      By-name interface to Perl's builtin getproto*
Net/servent.pm       By-name interface to Perl's builtin getserv*
Time/gmtime.pm       By-name interface to Perl's builtin gmtime
Time/localtime.pm    By-name interface to Perl's builtin localtime
Time/tm.pm           Internal object for Time::{gm,local}time
User/grent.pm        By-name interface to Perl's builtin getgr*
User/pwent.pm        By-name interface to Perl's builtin getpw*

Tie/RefHash.pm       Base class for tied hashes with references as keys

UNIVERSAL.pm         Base class for *ALL* classes
Fcntl
New constants in the existing Fcntl modules are now supported, provided that your operating system
happens to support them:
F_GETOWN F_SETOWN
O_ASYNC O_DEFER O_DSYNC O_FSYNC O_SYNC
O_EXLOCK O_SHLOCK
These constants are intended for use with the Perl operators sysopen() and fcntl() and the basic
database modules like SDBM_File. For the exact meaning of these and other Fcntl constants please refer to
your operating system‘s documentation for fcntl() and open().
In addition, the Fcntl module now provides these constants for use with the Perl operator flock():
LOCK_SH LOCK_EX LOCK_NB LOCK_UN
These constants are defined in all environments (because where there is no flock() system call, Perl
emulates it). However, for historical reasons, these constants are not exported unless they are explicitly
requested with the ":flock" tag (e.g. use Fcntl ':flock').
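The flock() constants above might be used as in the following sketch. It assumes a current Perl where File::Temp is available (that module is not part of 5.004 and is used here only to obtain a scratch file); the variable names are illustrative.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);          # LOCK_SH LOCK_EX LOCK_NB LOCK_UN
use File::Temp qw(tempfile);   # assumption: used only to get a scratch file

my ($fh, $path) = tempfile(UNLINK => 1);

# Ask for an exclusive lock without blocking; flock returns false on failure.
my $locked = flock($fh, LOCK_EX | LOCK_NB);
print {$fh} "exclusive access\n" if $locked;

# Release the lock explicitly before closing the handle.
my $unlocked = flock($fh, LOCK_UN);
close $fh;
```

Combining LOCK_NB with LOCK_EX or LOCK_SH makes the call return immediately instead of waiting for another process to release the file.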
IO
The IO module provides a simple mechanism to load all of the IO modules at one go. Currently this
includes:
IO::Handle
IO::Seekable
IO::File
IO::Pipe
IO::Socket
For more information on any of these modules, please see its respective documentation.
Math::Complex
The Math::Complex module has been totally rewritten, and now supports more operations. These are
overloaded:
+ − * / ** <=> neg ~ abs sqrt exp log sin cos atan2 "" (stringify)
And these functions are now exported:
pi i Re Im arg
log10 logn ln cbrt root
tan
csc sec cot
asin acos atan
acsc asec acot
sinh cosh tanh
csch sech coth
asinh acosh atanh
acsch asech acoth
cplx cplxe
Math::Trig
This new module provides a simpler interface to parts of Math::Complex for those who need trigonometric
functions only for real numbers.
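A short sketch of the real-number interface follows. It assumes the default export list of Math::Trig as shipped with current Perls (pi, tan, atan, deg2rad, rad2deg); the variable names are illustrative.

```perl
use strict;
use warnings;
use Math::Trig;   # exports pi, tan, atan, deg2rad, rad2deg, ... by default

my $angle   = deg2rad(45);              # degrees to radians
my $tangent = tan($angle);              # approximately 1
my $back    = rad2deg(atan($tangent));  # approximately 45 again

printf "tan(45 degrees) = %.4f\n", $tangent;
```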
DB_File
There have been quite a few changes made to DB_File. Here are a few of the highlights:
Fixed a handful of bugs.
By public demand, added support for the standard hash function exists().
Made it compatible with Berkeley DB 1.86.
Made negative subscripts work with RECNO interface.
Changed the default flags from O_RDWR to O_CREAT|O_RDWR and the default mode from 0640 to
0666.
Made DB_File automatically import the open() constants (O_RDWR, O_CREAT etc.) from Fcntl, if
available.
Updated documentation.
Refer to the HISTORY section in DB_File.pm for a complete list of changes. Everything after DB_File 1.01
has been added since 5.003.
Net::Ping
Major rewrite − support added for both udp echo and real icmp pings.
Object−oriented overrides for builtin operators
Many of the Perl builtins returning lists now have object−oriented overrides. These are:
File::stat
Net::hostent
Net::netent
Net::protoent
Net::servent
Time::gmtime
Time::localtime
User::grent
User::pwent
For example, you can now say
use File::stat;
use User::pwent;
$his = (stat($filename)->uid == getpw($whoever)->uid);
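The same by-name style works for the other modules in the list. The sketch below uses File::stat and Time::localtime; the scratch file comes from File::Temp, which is an assumption of this example (it is a later addition to Perl, not part of 5.004).

```perl
use strict;
use warnings;
use File::stat;                # stat() now returns an object
use Time::localtime;           # localtime() now returns an object
use File::Temp qw(tempfile);   # assumption: only used to create a demo file

my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "12345";
close $tmp;

my $st   = stat($path) or die "stat $path: $!";
my $size = $st->size;                  # instead of (stat $path)[7]
my $year = localtime()->year + 1900;   # instead of (localtime)[5] + 1900

print "$path is $size bytes; the year is $year\n";
```

Named accessors avoid having to remember the position of each field in the list that the builtin returns.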
Utility Changes
pod2html
Sends converted HTML to standard output
The pod2html utility included with Perl 5.004 is entirely new. By default, it sends the converted
HTML to its standard output, instead of writing it to a file like Perl 5.003‘s pod2html did. Use the
--outfile=FILENAME option to write to a file.
xsubpp
void XSUBs now default to returning nothing
Due to a documentation/implementation bug in previous versions of Perl, XSUBs with a return type of
void have actually been returning one value. Usually that value was the GV for the XSUB, but
sometimes it was some already freed or reused value, which would sometimes lead to program failure.
In Perl 5.004, if an XSUB is declared as returning void, it actually returns no value, i.e. an empty list
(though there is a backward−compatibility exception; see below). If your XSUB really does return an
SV, you should give it a return type of SV *.
For backward compatibility, xsubpp tries to guess whether a void XSUB is really void or if it wants
to return an SV *. It does so by examining the text of the XSUB: if xsubpp finds what looks like an
assignment to ST(0), it assumes that the XSUB‘s return type is really SV *.
C Language API Changes
gv_fetchmethod and perl_call_sv
The gv_fetchmethod function finds a method for an object, just like in Perl 5.003. The GV it
returns may be a method cache entry. However, in Perl 5.004, method cache entries are not visible to
users; therefore, they can no longer be passed directly to perl_call_sv. Instead, you should use the
GvCV macro on the GV to extract its CV, and pass the CV to perl_call_sv.
The most likely symptom of passing the result of gv_fetchmethod to perl_call_sv is Perl‘s
producing an "Undefined subroutine called" error on the second call to a given method (since there is
no cache on the first call).
perl_eval_pv
A new function handy for eval‘ing strings of Perl code inside C code. This function returns the value
from the eval statement, which can be used instead of fetching globals from the symbol table. See
perlguts, perlembed and perlcall for details and examples.
Extended API for manipulating hashes
Internal handling of hash keys has changed. The old hashtable API is still fully supported, and will
likely remain so. The additions to the API allow passing keys as SV*s, so that tied hashes can be
given real scalars as keys rather than plain strings (nontied hashes still can only use strings as keys).
New extensions must use the new hash access functions and macros if they wish to use SV* keys.
These additions also make it feasible to manipulate HE*s (hash entries), which can be more efficient.
See perlguts for details.
Documentation Changes
Many of the base and library pods were updated. These new pods are included in section 1:
perldelta
This document.
perlfaq
Frequently asked questions.
perllocale
Locale support (internationalization and localization).
perltoot
Tutorial on Perl OO programming.
perlapio
Perl internal IO abstraction interface.
perlmodlib
Perl module library and recommended practice for module creation. Extracted from perlmod (which is
much smaller as a result).
perldebug
Although not new, this has been massively updated.
perlsec
Although not new, this has been massively updated.
New Diagnostics
Several new conditions will trigger warnings that were silent before. Some only affect certain platforms.
The following new warnings and errors outline these. These messages are classified as follows (listed in
increasing order of desperation):
(W) A warning (optional).
(D) A deprecation (optional).
(S) A severe warning (mandatory).
(F) A fatal error (trappable).
(P) An internal error you should never see (trappable).
(X) A very fatal error (nontrappable).
(A) An alien error message (not generated by Perl).
"my" variable %s masks earlier declaration in same scope
(W) A lexical variable has been redeclared in the same scope, effectively eliminating all access to the
previous instance. This is almost always a typographical error. Note that the earlier variable will still
exist until the end of the scope or until all closure referents to it are destroyed.
%s argument is not a HASH element or slice
(F) The argument to delete() must be either a hash element, such as
$foo{$bar}
$ref->[12]->{"susie"}
or a hash slice, such as
@foo{$bar, $baz, $xyzzy}
@{$ref->[12]}{"susie", "queue"}
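Both accepted forms can be seen in this small sketch; the hash contents and variable names are made up for illustration.

```perl
use strict;
use warnings;

my %foo = (a => 1, b => 2, c => 3, d => 4);
my $ref = [ { susie => 'x', queue => 'y', other => 'z' } ];

my $gone = delete $foo{a};                 # hash element: returns the value 1
my @gone = delete @foo{qw(b c)};           # hash slice: returns (2, 3)
delete @{ $ref->[0] }{"susie", "queue"};   # hash slice through a reference

print "left in %foo: ", join(',', sort keys %foo), "\n";
```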
Allocation too large: %lx
(X) You can‘t allocate more than 64K on an MS−DOS machine.
Allocation too large
(F) You can‘t allocate more than 2^31+"small amount" bytes.
Applying %s to %s will act on scalar(%s)
(W) The pattern match (//), substitution (s///), and transliteration (tr///) operators work on scalar values.
If you apply one of them to an array or a hash, it will convert the array or hash to a scalar value — the
length of an array, or the population info of a hash — and then work on that scalar value. This is
probably not what you meant to do. See grep and map for alternatives.
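As a sketch of the alternatives mentioned above, grep and map apply a pattern or operation to each element rather than to the array's length (the word list is illustrative):

```perl
use strict;
use warnings;

my @words = qw(apple banana cherry);

# Matching against the array itself would test scalar(@words), i.e. "3".
# Test each element instead:
my @hits  = grep { /an/ } @words;     # elements containing "an"
my @upper = map  { uc $_ } @words;    # a transformed copy of every element

print "matched: @hits\n";
```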
Attempt to free nonexistent shared string
(P) Perl maintains a reference counted internal table of strings to optimize the storage and access of
hash keys and other strings. This indicates someone tried to decrement the reference count of a string
that can no longer be found in the table.
Attempt to use reference as lvalue in substr
(W) You supplied a reference as the first argument to substr() used as an lvalue, which is pretty
strange. Perhaps you forgot to dereference it first. See substr.
Bareword "%s" refers to nonexistent package
(W) You used a qualified bareword of the form Foo::, but the compiler saw no other uses of that
namespace before that point. Perhaps you need to predeclare a package?
Can‘t redefine active sort subroutine %s
(F) Perl optimizes the internal handling of sort subroutines and keeps pointers into them. You tried to
redefine one such sort subroutine when it was currently active, which is not allowed. If you really
want to do this, you should write sort { &func } @x instead of sort func @x.
Can‘t use bareword ("%s") as %s ref while "strict refs" in use
(F) Only hard references are allowed by "strict refs". Symbolic references are disallowed. See perlref.
Cannot resolve method ‘%s’ overloading ‘%s’ in package ‘%s’
(P) Internal error trying to resolve overloading specified by a method name (as opposed to a subroutine
reference).
Constant subroutine %s redefined
(S) You redefined a subroutine which had previously been eligible for inlining. See
Constant Functions in perlsub for commentary and workarounds.
Constant subroutine %s undefined
(S) You undefined a subroutine which had previously been eligible for inlining. See
Constant Functions in perlsub for commentary and workarounds.
Copy method did not return a reference
(F) The method which overloads "=" is buggy. See Copy Constructor.
Died
(F) You passed die() an empty string (the equivalent of die "") or you called it with no args and
both $@ and $_ were empty.
Exiting pseudo−block via %s
(W) You are exiting a rather special block construct (like a sort block or subroutine) by unconventional
means, such as a goto, or a loop control statement. See sort.
Identifier too long
(F) Perl limits identifiers (names for variables, functions, etc.) to 252 characters for simple names,
somewhat more for compound names (like $A::B). You‘ve exceeded Perl‘s limits. Future versions
of Perl are likely to eliminate these arbitrary limitations.
Illegal character %s (carriage return)
(F) A carriage return character was found in the input. This is an error, and not a warning, because
carriage return characters can break multi-line strings, including here documents (e.g., print
<<EOF;).
[...] This may mean
that your csh (C shell) is broken. If so, you should change all of the csh−related variables in config.sh:
If you have tcsh, make the variables refer to it as if it were csh (e.g.
full_csh=‘/usr/bin/tcsh’); otherwise, make them all empty (except that d_csh should be
‘undef’) so that Perl will think csh is missing. In either case, after editing config.sh, run
./Configure −S and rebuild Perl.
Invalid conversion in %s: "%s"
(W) Perl does not understand the given format conversion. See sprintf.
Invalid type in pack: ‘%s’
(F) The given character is not a valid pack type. See pack.
Invalid type in unpack: ‘%s’
(F) The given character is not a valid unpack type. See unpack.
Name "%s::%s" used only once: possible typo
(W) Typographical errors often show up as unique variable names. If you had a good reason for having
a unique name, then just mention it again somehow to suppress the message (the use vars pragma
is provided for just this purpose).
Null picture in formline
(F) The first argument to formline must be a valid format picture specification. It was found to be
empty, which probably means you supplied it an uninitialized value. See perlform.
Offset outside string
(F) You tried to do a read/write/send/recv operation with an offset pointing outside the buffer. This is
difficult to imagine. The sole exception to this is that sysread()ing past the buffer will extend the
buffer and zero pad the new area.
Out of memory!
(X|F) The malloc() function returned 0, indicating there was insufficient remaining memory (or
virtual memory) to satisfy the request.
The request was judged to be small, so the possibility to trap it depends on the way Perl was compiled.
By default it is not trappable. However, if compiled for this, Perl may use the contents of $^M as an
emergency pool after die()ing with this message. In this case the error is trappable once.
Out of memory during request for %s
(F) The malloc() function returned 0, indicating there was insufficient remaining memory (or
virtual memory) to satisfy the request. However, the request was judged large enough (compile−time
default is 64K), so a possibility to shut down by trapping this error is granted.
panic: frexp
(P) The library function frexp() failed, making printf("%f") impossible.
Possible attempt to put comments in qw() list
(W) qw() lists contain items separated by whitespace; as with literal strings, comment characters are
not ignored, but are instead treated as literal data. (You may have used different delimiters than the
parentheses shown here; braces are also frequently used.)
You probably wrote something like this:
@list = qw(
a # a comment
b # another comment
);
when you should have written this:
@list = qw(
a
b
);
If you really want comments, build your list the old−fashioned way, with quotes and commas:
@list = (
    'a',    # a comment
    'b',    # another comment
);
Possible attempt to separate words with commas
(W) qw() lists contain items separated by whitespace; therefore commas aren‘t needed to separate the
items. (You may have used different delimiters than the parentheses shown here; braces are also
frequently used.)
You probably wrote something like this:
qw! a, b, c !;
which puts literal commas into some of the list items. Write it without commas if you don‘t want them
to appear in your data:
qw! a b c !;
Scalar value @%s{%s} better written as $%s{%s}
(W) You‘ve used a hash slice (indicated by @) to select a single element of a hash. Generally it‘s
better to ask for a scalar value (indicated by $). The difference is that $foo{&bar} always behaves
like a scalar, both when assigning to it and when evaluating its argument, while @foo{&bar}
behaves like a list when you assign to it, and provides a list context to its subscript, which can do weird
things if you‘re expecting only one subscript.
Stub found while resolving method ‘%s’ overloading ‘%s’ in package ‘%s’
(P) Overloading resolution over @ISA tree may be broken by importing stubs. Stubs should never be
implicitly created, but explicit calls to can may break this.
Too late for "−T" option
(X) The #! line (or local equivalent) in a Perl script contains the −T option, but Perl was not invoked
with −T in its argument list. This is an error because, by the time Perl discovers a −T in a script, it‘s
too late to properly taint everything from the environment. So Perl gives up.
untie attempted while %d inner references still exist
(W) A copy of the object returned from tie (or tied) was still valid when untie was called.
Unrecognized character %s
(F) The Perl parser has no idea what to do with the specified character in your Perl script (or eval).
Perhaps you tried to run a compressed script, a binary program, or a directory as a Perl program.
Unsupported function fork
(F) Your version of executable does not support forking.
Note that under some systems, like OS/2, there may be different flavors of Perl executables, some of
which may support fork, some not. Try changing the name you call Perl by to perl_, perl__, and
so on.
Use of "$$ [...]
int( <STDIN> )
the integer operation provides a scalar context for the <STDIN> operator, which responds by reading one
line from STDIN and passing it back to the integer operation, which will then find the integer value of that
line and return that. If, on the other hand, you say
sort( <STDIN> )
then the sort operation provides a list context for <STDIN>, which will proceed to read every line available
up to the end of file, and pass that list of lines back to the sort routine, which will then sort those lines and
return them as a list to whatever the context of the sort was.
Assignment is a little bit special in that it uses its left argument to determine the context for the right
argument. Assignment to a scalar evaluates the righthand side in a scalar context, while assignment to an
array or array slice evaluates the righthand side in a list context. Assignment to a list also evaluates the
righthand side in a list context.
User defined subroutines may choose to care whether they are being called in a scalar or list context, but
most subroutines do not need to care, because scalars are automatically interpolated into lists. See
wantarray.
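A subroutine that does care about its context might look like the following sketch; the subroutine name context is hypothetical.

```perl
use strict;
use warnings;

# Report the context in which the subroutine was called.
sub context {
    return wantarray ? "list" : defined(wantarray) ? "scalar" : "void";
}

my $s   = context();   # assignment to a scalar: scalar context
my @l   = context();   # assignment to an array: list context
my ($x) = context();   # assignment to a list: list context as well

print "$s, $l[0], $x\n";
```

Note that the parentheses around `$x` are what turn the last assignment into a list assignment.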
Scalar values
All data in Perl is a scalar or an array of scalars or a hash of scalars. Scalar variables may contain various
kinds of singular data, such as numbers, strings, and references. In general, conversion from one form to
another is transparent. (A scalar may not contain multiple values, but may contain a reference to an array or
hash containing multiple values.) Because of the automatic conversion of scalars, operations and functions
that return scalars don‘t need to care (and, in fact, can‘t care) whether the context is looking for a string or a
number.
Scalars aren‘t necessarily one thing or another. There‘s no place to declare a scalar variable to be of type
"string", or of type "number", or type "filehandle", or anything else. Perl is a contextually polymorphic
language whose scalars can be strings, numbers, or references (which includes objects). While strings and
numbers are considered pretty much the same thing for nearly all purposes, references are strongly−typed
uncastable pointers with builtin reference−counting and destructor invocation.
A scalar value is interpreted as TRUE in the Boolean sense if it is not the null string or the number 0 (or its
string equivalent, "0"). The Boolean context is just a special kind of scalar context.
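The rule can be checked with a few values. In this sketch the two lists are illustrative; note that only the exact string "0" is false, so strings such as "0.0" or "00" count as true.

```perl
use strict;
use warnings;

my @false = ("", 0, "0", undef);            # the null string, 0, "0", and undef
my @true  = ("0.0", "00", " ", "0E0", -1);  # anything else, even if numerically zero

# grep in scalar context returns the number of elements that passed the test.
my $false_count = grep { !$_ } @false;
my $true_count  = grep {  $_ } @true;

print "$false_count false values, $true_count true values\n";
```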
There are actually two varieties of null scalars: defined and undefined. Undefined null scalars are returned
when there is no real value for something, such as when there was an error, or at end of file, or when you
refer to an uninitialized variable or element of an array. An undefined null scalar may become defined the
first time you use it as if it were defined, but prior to that you can use the defined() operator to determine
whether the value is defined or not.
To find out whether a given string is a valid nonzero number, it‘s usually enough to test it against both
numeric 0 and also lexical "0" (although this will cause −w noises). That‘s because strings that aren‘t
numbers count as 0, just as they do in awk:
if ($str == 0 && $str ne "0") {
    warn "That doesn't look like a number";
}
That‘s usually preferable because otherwise you won‘t treat IEEE notations like NaN or Infinity
properly. At other times you might prefer to use the POSIX::strtod function or a regular expression to check
whether data is numeric. See perlre for details on regular expressions.
warn "has nondigits"        if     /\D/;
warn "not a natural number" unless /^\d+$/;             # rejects -3
warn "not an integer"       unless /^-?\d+$/;           # rejects +3
warn "not an integer"       unless /^[+-]?\d+$/;
warn "not a decimal number" unless /^-?\d+\.?\d*$/;     # rejects .2
warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
warn "not a C float"
    unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
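For the POSIX::strtod approach mentioned above, a sketch might look like this. The helper name is_number is hypothetical; it relies on strtod returning, in list context, the parsed value and the number of characters left unparsed.

```perl
use strict;
use warnings;
use POSIX qw(strtod);

# Hypothetical helper: true if the whole string parses as a number.
sub is_number {
    my ($str) = @_;
    return 0 unless defined $str && length $str;
    my ($num, $unparsed) = strtod($str);   # (value, count of unparsed characters)
    return $unparsed == 0 ? 1 : 0;
}

print "3.14 is numeric\n"        if is_number("3.14");
print "3.14abc is not numeric\n" unless is_number("3.14abc");
```

Unlike the regular expressions above, this accepts whatever the C library accepts, which may include forms such as exponents and leading whitespace.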
The length of an array is a scalar value. You may find the length of array @days by evaluating $#days, as
in csh. (Actually, it‘s not the length of the array, it‘s the subscript of the last element, because there is
(ordinarily) a 0th element.) Assigning to $#days changes the length of the array. Shortening an array by
this method destroys intervening values. Lengthening an array that was previously shortened NO LONGER
recovers the values that were in those elements. (It used to in Perl 4, but we had to break this to make sure
destructors were called when expected.) You can also gain some minuscule measure of efficiency by
pre−extending an array that is going to get big. (You can also extend an array by assigning to an element
that is off the end of the array.) You can truncate an array down to nothing by assigning the null list () to it.
The following are equivalent:
@whatever = ();
$#whatever = -1;
If you evaluate a named array in a scalar context, it returns the length of the array. (Note that this is not true
of lists, which return the last value, like the C comma operator, nor of built−in functions, which return
whatever they feel like returning.) The following is always true:
scalar(@whatever) == $#whatever - $[ + 1;
Version 5 of Perl changed the semantics of $[: files that don‘t set the value of $[ no longer need to worry
about whether another file changed its value. (In other words, use of $[ is deprecated.) So in general you
can assume that
scalar(@whatever) == $#whatever + 1;
Some programmers choose to use an explicit conversion so nothing‘s left to doubt:
$element_count = scalar(@whatever);
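The relationships described above between an array, its last subscript, and its length can be sketched as follows (the array contents are illustrative):

```perl
use strict;
use warnings;

my @days = qw(Mon Tue Wed Thu Fri Sat Sun);

my $last_index = $#days;            # subscript of the last element
my $count      = @days;             # scalar context: number of elements

$#days = 4;                         # shorten: Sat and Sun are destroyed
my $after_truncate = scalar(@days);

$#days = 6;                         # lengthen again: the old values do not come back
my $recovered = defined $days[6] ? 1 : 0;

@days = ();                         # truncate to nothing
my $emptied = $#days;               # -1 for an empty array

print "$last_index $count $after_truncate $recovered $emptied\n";
```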
If you evaluate a hash in a scalar context, it returns a value that is true if and only if the hash contains any
key/value pairs. (If there are any key/value pairs, the value returned is a string consisting of the number of
used buckets and the number of allocated buckets, separated by a slash. This is pretty much useful only to
find out whether Perl‘s (compiled in) hashing algorithm is performing poorly on your data set. For example,
you stick 10,000 things in a hash, but evaluating %HASH in scalar context reveals "1/16", which means only
one out of sixteen buckets has been touched, and presumably contains all 10,000 of your items. This isn‘t
supposed to happen.)
You can preallocate space for a hash by assigning to the keys() function. This rounds up the allocated
buckets to the next power of two:
keys(%users) = 1000;    # allocate 1024 buckets
Scalar value constructors
Numeric literals are specified in any of the customary floating point or integer formats:
12345
12345.67
.23E-10
0xffff              # hex
0377                # octal
4_294_967_296       # underline for legibility
String literals are usually delimited by either single or double quotes. They work much like shell quotes:
double−quoted string literals are subject to backslash and variable substitution; single−quoted strings are not
(except for "\’" and "\\"). The usual Unix backslash rules apply for making characters such as newline,
tab, etc., as well as some more exotic forms. See Quote and Quotelike Operators for a list.
Octal or hex representations in string literals (e.g. ‘0xffff’) are not automatically converted to their integer
representation. The hex() and oct() functions make these conversions for you. See hex and oct for more
details.
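A brief sketch of the difference between a numeric literal in source code and the same text held in a string. The `no warnings` pragma used here is an assumption about the Perl in use: it belongs to later Perls and only silences the "isn't numeric" warning.

```perl
use strict;
use warnings;

my $from_hex = hex('0xffff');   # 65535
my $from_oct = oct('0377');     # 255
my $via_oct  = oct('0xffff');   # oct() also understands a leading 0x: 65535

my $naive;
{
    no warnings 'numeric';      # assumption: pragma from later Perls
    $naive = '0xffff' + 0;      # 0: only the leading "0" is read as a number
}

print "$from_hex $from_oct $via_oct $naive\n";
```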
You can also embed newlines directly in your strings, i.e., they can end on a different line than they begin.
This is nice, but if you forget your trailing quote, the error will not be reported until Perl finds another line
containing the quote character, which may be much further on in the script. Variable substitution inside
strings is limited to scalar variables, arrays, and array slices. (In other words, names beginning with $ or @,
followed by an optional bracketed expression as a subscript.) The following code segment prints out "The
price is $100."
$Price = '$100';                    # not interpreted
print "The price is $Price.\n";     # interpreted
As in some shells, you can put curly brackets around the name to delimit it from following alphanumerics.
In fact, an identifier within such curlies is forced to be a string, as is any single identifier within a hash
subscript. Our earlier example,
$days{'Feb'}
can be written as
$days{Feb}
and the quotes will be assumed automatically. But anything more complicated in the subscript will be
interpreted as an expression.
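Both uses of curly brackets can be seen in this sketch; the variable names and values are illustrative.

```perl
use strict;
use warnings;

my $fruit = 'apple';
my %days  = (Feb => 28);

# Brackets delimit the name; "$fruits" would refer to a different variable.
my $plural = "${fruit}s";

my $quoted = $days{'Feb'};                 # explicit quotes
my $bare   = $days{Feb};                   # single identifier: quotes assumed
my $text   = "Feb has $days{Feb} days";    # also works inside a string

print "$plural; $text\n";
```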
Note that a single−quoted string must be separated from a preceding word by a space, because single quote is
a valid (though deprecated) character in a variable name (see Packages).
Three special literals are __FILE__, __LINE__, and __PACKAGE__, which represent the current filename,
line number, and package name at that point in your program. They may be used only as separate tokens;
they will not be interpolated into strings. If there is no current package (due to an empty package;
directive), __PACKAGE__ is the undefined value.
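The following sketch shows the special literals used as separate tokens and, for contrast, inside a string. The package name Demo::Where and the subroutine report are hypothetical.

```perl
use strict;
use warnings;

{
    package Demo::Where;   # hypothetical package name
    sub report {
        # Each literal is a separate token, joined here with colons.
        return join ':', __PACKAGE__, __FILE__, __LINE__;
    }
}

my $where   = Demo::Where::report();
my $literal = "__PACKAGE__";   # inside quotes this is just text

print "$where\n$literal\n";
```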
The tokens __END__ and __DATA__ may be used to indicate the logical end of the script before the actual
end of file. Any following text is ignored, but may be read via a DATA filehandle: main::DATA for
__END__, or PACKNAME::DATA (where PACKNAME is the current package) for __DATA__. The two
control characters ^D and ^Z are synonyms for __END__ (or __DATA__ in a module). See SelfLoader for
more description of __DATA__, and an example of its use. Note that you cannot read from the DATA
filehandle in a BEGIN block: the BEGIN block is executed as soon as it is seen (during compilation), at
which point the corresponding __DATA__ (or __END__) token has not yet been seen.
A word that has no other interpretation in the grammar will be treated as if it were a quoted string. These are
known as "barewords". As with filehandles and labels, a bareword that consists entirely of lowercase letters
risks conflict with future reserved words, and if you use the −w switch, Perl will warn you about any such
words. Some people may wish to outlaw barewords entirely. If you say
use strict 'subs';
then any bareword that would NOT be interpreted as a subroutine call produces a compile−time error
instead. The restriction lasts to the end of the enclosing block. An inner block may countermand this by
saying no strict ‘subs’.
Array variables are interpolated into double−quoted strings by joining all the elements of the array with the
delimiter specified in the $" variable ($LIST_SEPARATOR in English), space by default. The following
are equivalent:
$temp = join($",@ARGV);
system "echo $temp";
system "echo @ARGV";
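The effect of the list separator can be sketched without running an external command; the array contents are illustrative.

```perl
use strict;
use warnings;

my @parts = qw(usr local bin);

my $default = "@parts";          # joined with a space by default
my $same    = join($", @parts);  # what the interpolation does for you

my $path;
{
    local $" = '/';              # change the separator for this block only
    $path = "/@parts";
}

print "$default\n$path\n";
```

Using local keeps the change from leaking into other code that interpolates arrays.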
Within search patterns (which also undergo double−quotish substitution) there is a bad ambiguity: Is
/$foo[bar]/ to be interpreted as /${foo}[bar]/ (where [bar] is a character class for the regular
expression) or as /${foo[bar]}/ (where [bar] is the subscript to array @foo)? If @foo doesn‘t
otherwise exist, then it‘s obviously a character class. If @foo exists, Perl takes a good guess about [bar],
and is almost always right. If it does guess wrong, or if you‘re just plain paranoid, you can force the correct
interpretation with curly brackets as above.
A line−oriented form of quoting is based on the shell "here−doc" syntax. Following a << you specify a
string to terminate the quoted material, and all lines following the current line down to the terminating string
are the value of the item. The terminating string may be either an identifier (a word), or some quoted text. If
quoted, the type of quotes you use determines the treatment of the text, just as in regular quoting. An
unquoted identifier works like double quotes. There must be no space between the << and the identifier. (If
you put a space it will be treated as a null identifier, which is valid, and matches the first empty line.) The
terminating string must appear by itself (unquoted and with no surrounding whitespace) on the terminating
line.
print <<[...]
[...] the => operator between key/value pairs. The => operator is mostly just a
more visually distinctive synonym for a comma, but it also arranges for its left−hand operand to be
interpreted as a string—if it‘s a bareword that would be a legal identifier. This makes it nice for initializing
hashes:
%map = (
    red   => 0x00f,
    blue  => 0x0f0,
    green => 0xf00,
);
or for initializing hash references to be used as records:
$rec = {
    witch => 'Mable the Merciless',
    cat   => 'Fluffy the Ferocious',
    date  => '10/31/1776',
};
or for using call−by−named−parameter to complicated functions:
$field = $query->radio_group(
    name      => 'group_name',
    values    => ['eenie','meenie','minie'],
    default   => 'meenie',
    linebreak => 'true',
    labels    => \%labels
);
Note that just because a hash is initialized in that order doesn‘t mean that it comes out in that order. See sort
for examples of how to arrange for an output ordering.
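One way to impose an output order is to sort the keys, as in this sketch that reuses the colour hash from above:

```perl
use strict;
use warnings;

my %map = (red => 0x00f, blue => 0x0f0, green => 0xf00);

my @by_name  = sort keys %map;                            # alphabetical by key
my @by_value = sort { $map{$a} <=> $map{$b} } keys %map;  # numeric by value

printf "%-5s => 0x%03x\n", $_, $map{$_} for @by_value;
```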
Typeglobs and Filehandles
Perl uses an internal type called a typeglob to hold an entire symbol table entry. The type prefix of a
typeglob is a *, because it represents all types. This used to be the preferred way to pass arrays and hashes
by reference into a function, but now that we have real references, this is seldom needed.
The main use of typeglobs in modern Perl is to create symbol table aliases. This assignment:
*this = *that;
makes $this an alias for $that, @this an alias for @that, %this an alias for %that, &this an alias for
&that, etc. Much safer is to use a reference. This:
local *Here::blue = \$There::green;
temporarily makes $Here::blue an alias for $There::green, but doesn‘t make @Here::blue an alias
for @There::green, or %Here::blue an alias for %There::green, etc. See Symbol Tables in perlmod for more
examples of this. Strange though this may seem, this is the basis for the whole module import/export
system.
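The difference between aliasing one slot and aliasing the whole typeglob can be sketched as follows. The package names Here and There come from the text above; the name Here::all and the values are assumptions for illustration only.

```perl
# Package (global) variables; typeglobs never refer to my() variables.
$There::green = 'emerald';
@There::green = (1, 2, 3);

# Alias only the scalar slot: $Here::blue now IS $There::green.
*Here::blue = \$There::green;
$Here::blue = 'jade';               # writes through the alias

print "$There::green\n";            # the scalar changed through the alias
print scalar(@Here::blue), "\n";    # the array slot was not aliased

# Alias every slot at once.
*Here::all = *There::green;
push @Here::all, 4;
print "@There::green\n";            # the push is visible under the old name
```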
Another use for typeglobs is to pass filehandles into a function or to create new filehandles. If you need to
use a typeglob to save away a filehandle, do it this way:
$fh = *STDOUT;
or perhaps as a real reference, like this:
$fh = \*STDOUT;
See perlsub for examples of using these as indirect filehandles in functions.
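A small sketch of a function that accepts either form; say_to is a hypothetical helper name, not something defined elsewhere in this manual.

```perl
# say_to is a hypothetical helper that prints to whatever handle it is given.
sub say_to {
    my ($fh, @text) = @_;
    return print $fh @text, "\n";   # $fh is used as an indirect filehandle
}

say_to(*STDOUT,  "via a typeglob");
say_to(\*STDOUT, "via a reference to a typeglob");
```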
Typeglobs are also a way to create a local filehandle using the local() operator. These last until their
block is exited, but may be passed back. For example:
sub newopen {
    my $path = shift;
    local *FH;  # not my!
    open (FH, $path) or return undef;
    return *FH;
}
$fh = newopen('/etc/passwd');
Now that we have the *foo{THING} notation, typeglobs aren‘t used as much for filehandle manipulations,
although they‘re still needed to pass brand new file and directory handles into or out of functions. That‘s
because *HANDLE{IO} only works if HANDLE has already been used as a handle. In other words, *FH
can be used to create new symbol table entries, but *foo{THING} cannot.
Another way to create anonymous filehandles is with the IO::Handle module and its ilk. These modules
have the advantage of not hiding different types of the same name during the local(). See the bottom of
open() for an example.
See perlref, perlsub, and Symbol Tables in perlmod for more discussion on typeglobs and the *foo{THING}
syntax.
NAME
perlsyn − Perl syntax
DESCRIPTION
A Perl script consists of a sequence of declarations and statements. The only things that need to be declared
in Perl are report formats and subroutines. See the sections below for more information on those
declarations. All uninitialized user-created objects are assumed to start with a null or 0 value until they
are defined by some explicit operation such as assignment. (Though you can get warnings about the use of
undefined values if you like.) The sequence of statements is executed just once, unlike in sed and awk
scripts, where the sequence of statements is executed for each input line. While this means that you must
explicitly loop over the lines of your input file (or files), it also means you have much more control over
which files and which lines you look at. (Actually, I'm lying: it is possible to do an implicit loop with
either the -n or -p switch. It's just not the mandatory default like it is in sed and awk.)
Declarations
Perl is, for the most part, a free−form language. (The only exception to this is format declarations, for
obvious reasons.) Comments are indicated by the "#" character, and extend to the end of the line. If you
attempt to use /* */ C−style comments, it will be interpreted either as division or pattern matching,
depending on the context, and C++ // comments just look like a null regular expression, so don‘t do that.
A declaration can be put anywhere a statement can, but has no effect on the execution of the primary
sequence of statements—declarations all take effect at compile time. Typically all the declarations are put at
the beginning or the end of the script. However, if you‘re using lexically−scoped private variables created
with my(), you‘ll have to make sure your format or subroutine definition is within the same block scope as
the my if you expect to be able to access those private variables.
Declaring a subroutine allows a subroutine name to be used as if it were a list operator from that point
forward in the program. You can declare a subroutine without defining it by saying sub name, thus:
sub myname;
$me = myname $0 or die "can't get myname";
Note that it functions as a list operator, not as a unary operator; so be careful to use or instead of || in this
case. However, if you were to declare the subroutine as sub myname ($), then myname would function
as a unary operator, so either or or || would work.
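The difference between the two kinds of declaration can be sketched like this; first_arg and one_arg are hypothetical names, and each simply reports how many arguments it received.

```perl
sub first_arg;          # forward declaration: behaves as a list operator
sub one_arg ($);        # ($) prototype: behaves as a unary operator

my @r1 = (first_arg 1, 2);   # parsed as first_arg(1, 2)
my @r2 = (one_arg 1, 2);     # parsed as (one_arg(1), 2)

# Both subroutines return the number of arguments they were handed.
sub first_arg   { return scalar @_ }
sub one_arg ($) { return scalar @_ }

print "@r1\n";
print "@r2\n";
```

The list operator swallows both values, whereas the unary operator takes only the first and leaves the second as a separate list element.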
Subroutine declarations can also be loaded up with the require statement or both loaded and imported
into your namespace with a use statement. See perlmod for details on this.
A statement sequence may contain declarations of lexically−scoped variables, but apart from declaring a
variable name, the declaration acts like an ordinary statement, and is elaborated within the sequence of
statements as if it were an ordinary statement. That means it actually has both compile−time and run−time
effects.
Simple statements
The only kind of simple statement is an expression evaluated for its side effects. Every simple statement
must be terminated with a semicolon, unless it is the final statement in a block, in which case the semicolon
is optional. (A semicolon is still encouraged there if the block takes up more than one line, because you may
eventually add another line.) Note that there are some operators like eval {} and do {} that look like
compound statements, but aren‘t (they‘re just TERMs in an expression), and thus need an explicit
termination if used as the last item in a statement.
Any simple statement may optionally be followed by a SINGLE modifier, just before the terminating
semicolon (or block ending). The possible modifiers are:
if EXPR
unless EXPR
while EXPR
until EXPR
foreach EXPR
The if and unless modifiers have the expected semantics, presuming you‘re a speaker of English. The
foreach modifier is an iterator: For each value in EXPR, it aliases $_ to the value and executes the
statement. The while and until modifiers have the usual "while loop" semantics (conditional
evaluated first), except when applied to a do−BLOCK (or to the now−deprecated do−SUBROUTINE
statement), in which case the block executes once before the conditional is evaluated. This is so that you can
write loops like:
do {
$line = <STDIN>;
...
} until $line eq ".\n";
See do. Note also that the loop control statements described later will NOT work in this construct, because
modifiers don‘t take loop labels. Sorry. You can always put another block inside of it (for next) or around
it (for last) to do that sort of thing. For next, just double the braces:
do {{
next if $x == $y;
# do something here
}} until $x++ > $z;
For last, you have to be more elaborate:
LOOP: {
do {
last if $x == $y**2;
# do something here
} while $x++ <= $z;
}
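The modifier semantics described in this section can be sketched in a few lines. The variable names are assumptions chosen for the illustration.

```perl
my @seen;
push @seen, $_ * 2 foreach 1 .. 3;      # foreach modifier: $_ aliases each value

my $n = 10;
do { $n++ } until $n > 5;               # do-BLOCK: body runs once before the test

my $m = 10;
$m++ until $m > 5;                      # plain statement: test comes first

print "@seen $n $m\n";
```

The do-BLOCK increments $n once even though the condition is already true, while the plain statement never touches $m.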
Compound statements
In Perl, a sequence of statements that defines a scope is called a block. Sometimes a block is delimited by the
file containing it (in the case of a required file, or the program as a whole), and sometimes a block is
delimited by the extent of a string (in the case of an eval).
But generally, a block is delimited by curly brackets, also known as braces. We will call this syntactic
construct a BLOCK.
The following compound statements may be used to control flow:
if (EXPR) BLOCK
if (EXPR) BLOCK else BLOCK
if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
LABEL while (EXPR) BLOCK
LABEL while (EXPR) BLOCK continue BLOCK
LABEL for (EXPR; EXPR; EXPR) BLOCK
LABEL foreach VAR (LIST) BLOCK
LABEL BLOCK continue BLOCK
Note that, unlike C and Pascal, these are defined in terms of BLOCKs, not statements. This means that the
curly brackets are required—no dangling statements allowed. If you want to write conditionals without
curly brackets there are several other ways to do it. The following all do the same thing:
if (!open(FOO)) { die "Can't open $FOO: $!"; }
die "Can't open $FOO: $!" unless open(FOO);
open(FOO) or die "Can't open $FOO: $!";     # FOO or bust!
open(FOO) ? 'hi mom' : die "Can't open $FOO: $!";
                    # a bit exotic, that last one
The if statement is straightforward. Because BLOCKs are always bounded by curly brackets, there is never
any ambiguity about which if an else goes with. If you use unless in place of if, the sense of the test
is reversed.
The while statement executes the block as long as the expression is true (does not evaluate to the null string
("") or 0 or "0"). The LABEL is optional, and if present, consists of an identifier followed by a colon.
The LABEL identifies the loop for the loop control statements next, last, and redo. If the LABEL is
omitted, the loop control statement refers to the innermost enclosing loop. This may include dynamically
looking back your call-stack at run time to find the LABEL. Such desperate behavior triggers a warning if
you use the -w flag.
If there is a continue BLOCK, it is always executed just before the conditional is about to be evaluated
again, just like the third part of a for loop in C. Thus it can be used to increment a loop variable, even
when the loop has been continued via the next statement (which is similar to the C continue statement).
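A minimal sketch of a continue block doing the increment, with assumed variable names:

```perl
my ($i, @odd) = (0);
while ($i < 6) {
    next unless $i % 2;     # skip even numbers ...
    push @odd, $i;
} continue {
    $i++;                   # ... but the increment still runs after next
}
print "@odd\n";
```

Without the continue block, the next on an even value would skip the increment and the loop would never terminate.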
Loop Control
The next command is like the continue statement in C; it starts the next iteration of the loop:
LINE: while (<STDIN>) {
    next LINE if /^#/;      # discard comments
    ...
}
The last command is like the break statement in C (as used in loops); it immediately exits the loop in
question. The continue block, if any, is not executed:
LINE: while (<STDIN>) {
    last LINE if /^$/;      # exit when done with header
    ...
}
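That last bypasses the continue block can be checked with a small sketch; the list contents and the names $count and @log are assumptions for illustration.

```perl
my ($count, @log) = (0);
ITEM: foreach my $item (qw(a b STOP c)) {
    last ITEM if $item eq 'STOP';   # leaves the loop; continue is skipped
    push @log, $item;
} continue {
    $count++;                       # runs only for completed iterations
}
print "@log ($count)\n";
```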
The redo command restarts the loop block without evaluating the conditional again. The continue
block, if any, is not executed. This command is normally used by programs that want to lie to themselves
about what was just input.
For example, when processing a file like /etc/termcap, if your input lines might end in backslashes to
indicate continuation, you want to skip ahead and get the next record.
while (<>) {
chomp;
if (s/\\$//) {
$_ .= <>;
redo unless eof();
}
# now process $_
}
which is Perl shorthand for the more explicitly written version:
LINE: while (defined($line = <ARGV>)) {
    chomp($line);
    if ($line =~ s/\\$//) {
        $line .= <ARGV>;
        redo LINE unless eof(); # not eof(ARGV)!
    }
    # now process $line
}
Note that if there were a continue block on the above code, it would get executed even on discarded lines.
This is often used to reset line counters or ?pat? one−time matches.
# inspired by :1,$g/fred/s//WILMA/
while (<>) {
    ?(fred)?    && s//WILMA $1 WILMA/;
    ?(barney)?  && s//BETTY $1 BETTY/;
    ?(homer)?   && s//MARGE $1 MARGE/;
} continue {
    print "$ARGV $.: $_";
    close ARGV  if eof();       # reset $.
    reset       if eof();       # reset ?pat?
}
If the word while is replaced by the word until, the sense of the test is reversed, but the conditional is
still tested before the first iteration.
The loop control statements don‘t work in an if or unless, since they aren‘t loops. You can double the
braces to make them such, though.
if (/pattern/) {{
next if /fred/;
next if /barney/;
# do something here
}}
The form while/if BLOCK BLOCK, available in Perl 4, is no longer available. Replace any occurrence
of if BLOCK by if (do BLOCK).
For Loops
Perl‘s C−style for loop works exactly like the corresponding while loop; that means that this:
for ($i = 1; $i < 10; $i++) {
...
}
is the same as this:
$i = 1;
while ($i < 10) {
...
} continue {
$i++;
}
(There is one minor difference: The first form implies a lexical scope for variables declared with my in the
initialization expression.)
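The scoping difference can be sketched as follows; the outer $i and the @seen array are assumptions for the illustration.

```perl
my $i = 'outer';
my @seen;
for (my $i = 1; $i < 4; $i++) {     # this $i is private to the loop
    push @seen, $i;
}
print "@seen, then $i\n";           # the outer $i was never touched
```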
Besides the normal array index looping, for can lend itself to many other interesting applications. Here‘s
one that avoids the problem you get into if you explicitly test for end−of−file on an interactive file descriptor
causing your program to appear to hang.
$on_a_tty = -t STDIN && -t STDOUT;
sub prompt { print "yes? " if $on_a_tty }
for ( prompt(); ; prompt() ) {
# do something
}
Foreach Loops
The foreach loop iterates over a normal list value and sets the variable VAR to be each element of the list
in turn. If the variable is preceded with the keyword my, then it is lexically scoped, and is therefore visible
only within the loop. Otherwise, the variable is implicitly local to the loop and regains its former value upon
exiting the loop. If the variable was previously declared with my, it uses that variable instead of the global
one, but it‘s still localized to the loop. (Note that a lexically scoped variable can cause problems if you have
subroutine or format declarations within the loop which refer to it.)
The foreach keyword is actually a synonym for the for keyword, so you can use foreach for
readability or for for brevity. (Or because the Bourne shell is more familiar to you than csh, so writing
for comes more naturally.) If VAR is omitted, $_ is set to each value. If any element of LIST is an lvalue,
you can modify it by modifying VAR inside the loop. That‘s because the foreach loop index variable is
an implicit alias for each item in the list that you‘re looping over.
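Both effects, aliasing inside the loop and restoration afterwards, show up in a short sketch with assumed variable names:

```perl
my $x = 'before';
my @nums = (1, 2, 3);
foreach $x (@nums) {    # $x is an alias for each element, not a copy
    $x *= 10;           # so this modifies @nums itself
}
print "@nums / $x\n";   # the array changed; $x has its old value back
```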
If any part of LIST is an array, foreach will get very confused if you add or remove elements within the
loop body, for example with splice. So don‘t do that.
foreach probably won't do what you expect if VAR is a tied or other special variable. Don't do that
either.
Examples:
for (@ary) { s/foo/bar/ }
foreach my $elem (@elements) {
$elem *= 2;
}
for $count (10,9,8,7,6,5,4,3,2,1,'BOOM') {
print $count, "\n"; sleep(1);
}
for (1..15) { print "Merry Christmas\n"; }
foreach $item (split(/:[\\\n:]*/, $ENV{TERMCAP})) {
print "Item: $item\n";
}
Here‘s how a C programmer might code up a particular algorithm in Perl:
for (my $i = 0; $i < @ary1; $i++) {
for (my $j = 0; $j < @ary2; $j++) {
if ($ary1[$i] > $ary2[$j]) {
last; # can’t go to outer :−(
}
$ary1[$i] += $ary2[$j];
}
# this is where that last takes me
}
Whereas here‘s how a Perl programmer more comfortable with the idiom might do it:
OUTER: foreach my $wid (@ary1) {
    INNER: foreach my $jet (@ary2) {
        next OUTER if $wid > $jet;
        $wid += $jet;
    }
}
See how much easier this is? It‘s cleaner, safer, and faster. It‘s cleaner because it‘s less noisy. It‘s safer
because if code gets added between the inner and outer loops later on, the new code won‘t be accidentally
executed. The next explicitly iterates the other loop rather than merely terminating the inner one. And it‘s
faster because Perl executes a foreach statement more rapidly than it would the equivalent for loop.
Basic BLOCKs and Switch Statements
A BLOCK by itself (labeled or not) is semantically equivalent to a loop that executes once. Thus you can
use any of the loop control statements in it to leave or restart the block. (Note that this is NOT true in
eval{}, sub{}, or contrary to popular belief do{} blocks, which do NOT count as loops.) The
continue block is optional.
The BLOCK construct is particularly nice for doing case structures.
SWITCH: {
if (/^abc/) { $abc = 1; last SWITCH; }
if (/^def/) { $def = 1; last SWITCH; }
if (/^xyz/) { $xyz = 1; last SWITCH; }
$nothing = 1;
}
There is no official switch statement in Perl, because there are already several ways to write the
equivalent. In addition to the above, you could write
SWITCH: {
    $abc = 1, last SWITCH  if /^abc/;
    $def = 1, last SWITCH  if /^def/;
    $xyz = 1, last SWITCH  if /^xyz/;
    $nothing = 1;
}
(That's actually not as strange as it looks once you realize that you can use loop control "operators" within an
expression. That's just the normal C comma operator.)
or
SWITCH: {
/^abc/ && do { $abc = 1; last SWITCH; };
/^def/ && do { $def = 1; last SWITCH; };
/^xyz/ && do { $xyz = 1; last SWITCH; };
$nothing = 1;
}
or formatted so it stands out more as a "proper" switch statement:
SWITCH: {
    /^abc/ && do {
        $abc = 1;
        last SWITCH;
    };
    /^def/ && do {
        $def = 1;
        last SWITCH;
    };
    /^xyz/ && do {
        $xyz = 1;
        last SWITCH;
    };
    $nothing = 1;
}
or
SWITCH: {
    /^abc/ and $abc = 1, last SWITCH;
    /^def/ and $def = 1, last SWITCH;
    /^xyz/ and $xyz = 1, last SWITCH;
    $nothing = 1;
}
or even, horrors,
if (/^abc/)
    { $abc = 1 }
elsif (/^def/)
    { $def = 1 }
elsif (/^xyz/)
    { $xyz = 1 }
else
    { $nothing = 1 }
A common idiom for a switch statement is to use foreach‘s aliasing to make a temporary assignment to
$_ for convenient matching:
SWITCH: for ($where) {
    /In Card Names/ && do { push @flags, '-e'; last; };
    /Anywhere/      && do { push @flags, '-h'; last; };
    /In Rulings/    && do {                    last; };
    die "unknown value for form variable where: '$where'";
}
Another interesting approach to a switch statement is to arrange for a do block to return the proper value:
$amode = do {
    if     ($flag & O_RDONLY) { "r" }       # XXX: isn't this 0?
    elsif  ($flag & O_WRONLY) { ($flag & O_APPEND) ? "a" : "w" }
    elsif  ($flag & O_RDWR)   {
        if ($flag & O_CREAT)  { "w+" }
        else                  { ($flag & O_APPEND) ? "a+" : "r+" }
    }
};
Or
print do {
    ($flags & O_WRONLY) ? "write-only" :
    ($flags & O_RDWR)   ? "read-write" :
                          "read-only";
};
Or if you are certain that all the && clauses are true, you can use something like this, which "switches" on
the value of the HTTP_USER_AGENT envariable.
#!/usr/bin/perl
# pick out jargon file page based on browser
$dir = 'http://www.wins.uva.nl/~mes/jargon';
for ($ENV{HTTP_USER_AGENT}) {
    $page  =    /Mac/            && 'm/Macintrash.html'
             || /Win(dows )?NT/  && 'e/evilandrude.html'
             || /Win|MSIE|WebTV/ && 'm/MicroslothWindows.html'
             || /Linux/          && 'l/Linux.html'
             || /HP-UX/          && 'h/HP-SUX.html'
             || /SunOS/          && 's/ScumOS.html'
             ||                     'a/AppendixB.html';
}
print "Location: $dir/$page\015\012\015\012";
That kind of switch statement only works when you know the && clauses will be true. If you don‘t, the
previous ?: example should be used.
You might also consider writing a hash instead of synthesizing a switch statement.
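One way the hash approach might look is a dispatch table of code references. This is a sketch only: the %action table, the dispatch function, and the strings it returns are all hypothetical.

```perl
# Hypothetical dispatch table: each key maps to a code reference.
my %action = (
    abc => sub { "saw abc" },
    def => sub { "saw def" },
);

sub dispatch {
    my $word = shift;
    my $code = $action{$word} || sub { "nothing" };   # default case
    return $code->();
}

print dispatch('abc'), "\n";
print dispatch('zzz'), "\n";
```

Note that a hash matches keys exactly, so this replaces a switch on fixed strings rather than one built from pattern matches.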
Goto
Although not for the faint of heart, Perl does support a goto statement. A loop‘s LABEL is not actually a
valid target for a goto; it‘s just the name of the loop. There are three forms: goto−LABEL, goto−EXPR,
and goto−&NAME.
The goto−LABEL form finds the statement labeled with LABEL and resumes execution there. It may not
be used to go into any construct that requires initialization, such as a subroutine or a foreach loop. It also
can‘t be used to go into a construct that is optimized away. It can be used to go almost anywhere else within
the dynamic scope, including out of subroutines, but it‘s usually better to use some other construct such as
last or die. The author of Perl has never felt the need to use this form of goto (in Perl, that is—C is
another matter).
The goto−EXPR form expects a label name, whose scope will be resolved dynamically. This allows for
computed gotos per FORTRAN, but isn‘t necessarily recommended if you‘re optimizing for
maintainability:
goto ("FOO", "BAR", "GLARCH")[$i];
The goto−&NAME form is highly magical, and substitutes a call to the named subroutine for the currently
running subroutine. This is used by AUTOLOAD() subroutines that wish to load another subroutine and then
pretend that the other subroutine had been called in the first place (except that any modifications to @_ in the
current subroutine are propagated to the other subroutine.) After the goto, not even caller() will be
able to tell that this routine was called first.
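A minimal sketch of the goto-&NAME form outside of AUTOLOAD; real_work and wrapper are hypothetical names.

```perl
sub real_work { return join ',', @_ }

sub wrapper {
    unshift @_, 'extra';    # changes to @_ are passed along
    goto &real_work;        # replaces wrapper's call frame; never returns here
}

my $result = wrapper(1, 2);
print "$result\n";
```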
In almost all cases like this, it‘s usually a far, far better idea to use the structured control flow mechanisms of
next, last, or redo instead of resorting to a goto. For certain applications, the catch and throw pair of
eval{} and die() for exception processing can also be a prudent approach.
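The eval{} and die() pairing can be sketched as follows; checked_sqrt is a hypothetical function used only to have something that can fail.

```perl
sub checked_sqrt {
    my $n = shift;
    die "negative input\n" if $n < 0;   # the "throw"
    return sqrt $n;
}

my $root  = eval { checked_sqrt(-4) };  # the "catch": eval yields undef on die
my $error = $@;                         # holds the die message
my $ok    = eval { checked_sqrt(16) };  # no exception, so the value comes back

print "error: $error";
print "ok: $ok\n";
```

Copying $@ into a variable straight away is a cautious habit, since a later eval resets it.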
PODs: Embedded Documentation
Perl has a mechanism for intermixing documentation with source code. While it‘s expecting the beginning of
a new statement, if the compiler encounters a line that begins with an equal sign and a word, like this
=head1 Here There Be Pods!
Then that text and all remaining text up through and including a line beginning with =cut will be ignored.
The format of the intervening text is described in perlpod.
This allows you to intermix your source code and your documentation text freely, as in
=item snazzle($)
The snazzle() function will behave in the most spectacular
form that you can possibly imagine, not even excepting
cybernetic pyrotechnics.
=cut back to the compiler, nuff of this pod stuff!
sub snazzle($) {
my $thingie = shift;
.........
}
Note that pod translators should look at only paragraphs beginning with a pod directive (it makes parsing
easier), whereas the compiler actually knows to look for pod escapes even in the middle of a paragraph. This
means that the following secret stuff will be ignored by both the compiler and the translators.
$a=3;
=secret stuff
warn "Neither POD nor CODE!?"
=cut back
print "got $a\n";
You probably shouldn‘t rely upon the warn() being podded out forever. Not all pod translators are
well−behaved in this regard, and perhaps the compiler will become pickier.
One may also use pod directives to quickly comment out a section of code.
Plain Old Comments (Not!)
Much like the C preprocessor, Perl can process line directives. Using this, one can control Perl‘s idea of
filenames and line numbers in error or warning messages (especially for strings that are processed with
eval()). The syntax for this mechanism is the same as for most C preprocessors: it matches the regular
expression /^#\s*line\s+(\d+)\s*(?:\s"([^"]*)")?/ with $1 being the line number for the
next line, and $2 being the optional filename (specified within quotes).
Here are some examples that you should be able to type into your command shell:
% perl
# line 200 "bzzzt"
# the '#' on the previous line must be the first char on line
die 'foo';
__END__
foo at bzzzt line 201.
% perl
# line 200 "bzzzt"
eval qq[\n#line 2001 ""\ndie 'foo']; print $@;
__END__
foo at - line 2001.
% perl
eval qq[\n#line 200 "foo bar"\ndie 'foo']; print $@;
__END__
foo at foo bar line 200.
% perl
# line 345 "goop"
eval "\n#line " . __LINE__ . ' "' . __FILE__ ."\"\ndie 'foo'";
print $@;
__END__
foo at goop line 345.
NAME
perlop − Perl operators and precedence
SYNOPSIS
Perl operators have the following associativity and precedence, listed from highest precedence to lowest.
Note that all operators borrowed from C keep the same precedence relationship with each other, even where
C‘s precedence is slightly screwy. (This makes learning Perl easier for C folks.) With very few exceptions,
these all operate on scalar values only, not array values.
left        terms and list operators (leftward)
left        ->
nonassoc    ++ --
right       **
right       ! ~ \ and unary + and -
left        =~ !~
left        * / % x
left        + - .
left        << >>
nonassoc    named unary operators
nonassoc    < > <= >= lt gt le ge
nonassoc    == != <=> eq ne cmp
left        &
left        | ^
left        &&
left        ||
nonassoc    .. ...
right       ?:
right       = += -= *= etc.
left        , =>
nonassoc    list operators (rightward)
right       not
left        and
left        or xor
In the following sections, these operators are covered in precedence order.
Many operators can be overloaded for objects. See overload.
DESCRIPTION
Terms and List Operators (Leftward)
A TERM has the highest precedence in Perl. Terms include variables, quote and quote-like operators, any
expression in parentheses, and any function whose arguments are parenthesized. Actually, there aren‘t really
functions in this sense, just list operators and unary operators behaving as functions because you put
parentheses around the arguments. These are all documented in perlfunc.
If any list operator (print(), etc.) or any unary operator (chdir(), etc.) is followed by a left
parenthesis as the next token, the operator and arguments within parentheses are taken to be of highest
precedence, just like a normal function call.
In the absence of parentheses, the precedence of list operators such as print, sort, or chmod is either
very high or very low depending on whether you are looking at the left side or the right side of the operator.
For example, in
@ary = (1, 3, sort 4, 2);
print @ary;         # prints 1324
the commas on the right of the sort are evaluated before the sort, but the commas on the left are evaluated
after. In other words, list operators tend to gobble up all the arguments that follow them, and then act like a
simple TERM with regard to the preceding expression. Note that you have to be careful with parentheses:
# These evaluate exit before doing the print:
print($foo, exit); # Obviously not what you want.
print $foo, exit;   # Nor is this.
# These do the print before evaluating exit:
(print $foo), exit; # This is what you want.
print($foo), exit; # Or this.
print ($foo), exit; # Or even this.
Also note that
print ($foo & 255) + 1, "\n";
probably doesn‘t do what you expect at first glance. See Named Unary Operators for more discussion of
this.
Also parsed as terms are the do {} and eval {} constructs, as well as subroutine and method calls, and
the anonymous constructors [] and {}.
See also Quote and Quote-like Operators toward the end of this section, as well as "I/O Operators".
The Arrow Operator
Just as in C and C++, "−>" is an infix dereference operator. If the right side is either a [...] or {...}
subscript, then the left side must be either a hard or symbolic reference to an array or hash (or a location
capable of holding a hard reference, if it‘s an lvalue (assignable)). See perlref.
Otherwise, the right side is a method name or a simple scalar variable containing the method name, and the
left side must either be an object (a blessed reference) or a class name (that is, a package name). See perlobj.
Auto−increment and Auto−decrement
"++" and "--" work as in C. That is, if placed before a variable, they increment or decrement the variable
before returning the value, and if placed after, increment or decrement the variable after returning the value.
The auto−increment operator has a little extra builtin magic to it. If you increment a variable that is numeric,
or that has ever been used in a numeric context, you get a normal increment. If, however, the variable has
been used in only string contexts since it was set, and has a value that is not the empty string and matches the
pattern /^[a-zA-Z]*[0-9]*$/, the increment is done as a string, preserving each character within its
range, with carry:
print ++($foo = '99');      # prints '100'
print ++($foo = 'a0');      # prints 'a1'
print ++($foo = 'Az');      # prints 'Ba'
print ++($foo = 'zz');      # prints 'aaa'
The auto−decrement operator is not magical.
Exponentiation
Binary "**" is the exponentiation operator. Note that it binds even more tightly than unary minus, so -2**4
is -(2**4), not (-2)**4. (This is implemented using C's pow(3) function, which actually works on doubles
internally.)
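A quick sketch of both precedence and associativity, the latter taken from the "right" entry for ** in the SYNOPSIS table; the variable names are assumptions.

```perl
my $p = -2 ** 4;        # unary minus binds less tightly: -(2 ** 4)
my $q = (-2) ** 4;      # parentheses force the negation first
my $r = 2 ** 3 ** 2;    # right associative: 2 ** (3 ** 2)

print "$p $q $r\n";
```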
Symbolic Unary Operators
Unary "!" performs logical negation, i.e., "not". See also not for a lower precedence version of this.
Unary "-" performs arithmetic negation if the operand is numeric. If the operand is an identifier, a string
consisting of a minus sign concatenated with the identifier is returned. Otherwise, if the string starts with a
plus or minus, a string starting with the opposite sign is returned. One effect of these rules is that
-bareword is equivalent to "-bareword".
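The string cases can be sketched as below. The variable names and the sample strings are assumptions, and the bareword line relies on strict not being in effect for anything else in the file.

```perl
my $flag = -verbose;    # bareword identifier
my $neg  = -"foo";      # string that looks like an identifier
my $flip = -"-bar";     # string with a leading minus
my $num  = -"10";       # numeric string: ordinary arithmetic negation

print "$flag $neg $flip $num\n";
```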
Unary "~" performs bitwise negation, i.e., 1‘s complement. For example, 0666 &~ 027 is 0640. (See
also Integer Arithmetic and Bitwise String Operators.)
Unary "+" has no effect whatsoever, even on strings. It is useful syntactically for separating a function name
from a parenthesized expression that would otherwise be interpreted as the complete list of function
arguments. (See examples above under Terms and List Operators (Leftward).)
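As a sketch of that syntactic use with a user-defined subroutine: double is a hypothetical helper, defined before use so that Perl already knows it as a list operator.

```perl
sub double { return 2 * $_[0] }     # hypothetical helper

my $x = double (3) + 1;     # parentheses taken as the argument list: double(3) + 1
my $y = double +(3) + 1;    # unary plus breaks that up: double((3) + 1)

print "$x $y\n";
```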
Unary "\" creates a reference to whatever follows it. See perlref. Do not confuse this behavior with the
behavior of backslash within a string, although both forms do convey the notion of protecting the next thing
from interpretation.
Binding Operators
Binary "=~" binds a scalar expression to a pattern match. Certain operations search or modify the string $_
by default. This operator makes that kind of operation work on some other string. The right argument is a
search pattern, substitution, or transliteration. The left argument is what is supposed to be searched,
substituted, or transliterated instead of the default $_. The return value indicates the success of the
operation. (If the right argument is an expression rather than a search pattern, substitution, or transliteration,
it is interpreted as a search pattern at run time. This can be less efficient than an explicit search, because
the pattern must be compiled every time the expression is evaluated.)
Binary "!~" is just like "=~" except the return value is negated in the logical sense.
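The binding forms described above can be sketched together; the sample text and the variable names are assumptions.

```perl
my $text = "Hello, World";

my $found   = $text =~ /World/;     # match against $text instead of $_
my $missing = $text !~ /Perl/;      # negated match

(my $copy = $text) =~ s/World/Perl/;    # bind a substitution to a copy

my $pattern = 'W\w+';               # an ordinary string ...
my $dynamic = $text =~ $pattern;    # ... interpreted as a pattern at run time

print "$copy\n";
```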
Multiplicative Operators
Binary "*" multiplies two numbers.
Binary "/" divides two numbers.
Binary "%" computes the modulus of two numbers. Given integer operands $a and $b: If $b is positive,
then $a % $b is $a minus the largest multiple of $b that is not greater than $a. If $b is negative, then
$a % $b is $a minus the smallest multiple of $b that is not less than $a (i.e. the result will be less than or
equal to zero). Note that when use integer is in scope, "%" gives you direct access to the modulus
operator as implemented by your C compiler. This operator is not as well defined for negative operands, but
it will execute faster.
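Working the two rules through for small operands, without use integer in scope, gives a sketch like this; the @r array is an assumption for illustration.

```perl
my @r = (
     7 %  3,    # 7 minus 6 (largest multiple of 3 not greater than 7)
    -7 %  3,    # -7 minus -9
     7 % -3,    # 7 minus 9 (smallest multiple of -3 not less than 7)
    -7 % -3,    # -7 minus -6
);
print "@r\n";
```

The sign of the result follows the right operand in each case.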
Binary "x" is the repetition operator. In scalar context, it returns a string consisting of the left operand
repeated the number of times specified by the right operand. In list context, if the left operand is a list in
parentheses, it repeats the list.
print '-' x 80;                         # print row of dashes
print "\t" x ($tab/8), ' ' x ($tab%8);  # tab over
@ones = (1) x 80;                       # a list of 80 1's
@ones = (5) x @ones;                    # set all elements to 5
Additive Operators
Binary "+" returns the sum of two numbers.
Binary "-" returns the difference of two numbers.
Binary "." concatenates two strings.
Shift Operators
Binary "<<" returns the value of its left argument shifted left by the number of bits specified by the right
argument. Arguments should be integers. (See also Integer Arithmetic.)
Binary ">>" returns the value of its left argument shifted right by the number of bits specified by the right
argument. Arguments should be integers. (See also Integer Arithmetic.)
Named Unary Operators
The various named unary operators are treated as functions with one argument, with optional parentheses.
These include the filetest operators, like -f, -M, etc. See perlfunc.
If any list operator (print(), etc.) or any unary operator (chdir(), etc.) is followed by a left
parenthesis as the next token, the operator and arguments within parentheses are taken to be of highest
precedence, just like a normal function call. Examples:
chdir $foo    || die;       # (chdir $foo) || die
chdir($foo)   || die;       # (chdir $foo) || die
chdir ($foo)  || die;       # (chdir $foo) || die
chdir +($foo) || die;       # (chdir $foo) || die
but, because * is higher precedence than ||:
chdir $foo * 20;            # chdir ($foo * 20)
chdir($foo) * 20;           # (chdir $foo) * 20
chdir ($foo) * 20;          # (chdir $foo) * 20
chdir +($foo) * 20;         # chdir ($foo * 20)
rand 10 * 20;               # rand (10 * 20)
rand(10) * 20;              # (rand 10) * 20
rand (10) * 20;             # (rand 10) * 20
rand +(10) * 20;            # rand (10 * 20)
See also "Terms and List Operators (Leftward)".
Relational Operators
Binary "<" returns true if the left argument is numerically less than the right argument.
Binary ">" returns true if the left argument is numerically greater than the right argument.
Binary "<=" returns true if the left argument is numerically less than or equal to the right argument.
Binary ">=" returns true if the left argument is numerically greater than or equal to the right argument.
Binary "lt" returns true if the left argument is stringwise less than the right argument.
Binary "gt" returns true if the left argument is stringwise greater than the right argument.
Binary "le" returns true if the left argument is stringwise less than or equal to the right argument.
Binary "ge" returns true if the left argument is stringwise greater than or equal to the right argument.
Equality Operators
Binary "==" returns true if the left argument is numerically equal to the right argument.
Binary "!=" returns true if the left argument is numerically not equal to the right argument.
Binary "<=>" returns −1, 0, or 1 depending on whether the left argument is numerically less than, equal to,
or greater than the right argument.
Binary "eq" returns true if the left argument is stringwise equal to the right argument.
Binary "ne" returns true if the left argument is stringwise not equal to the right argument.
Binary "cmp" returns −1, 0, or 1 depending on whether the left argument is stringwise less than, equal to, or
greater than the right argument.
"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified by the current locale if use locale
is in effect. See perllocale.
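The difference between the numeric and stringwise families shows up most clearly when sorting. A small sketch, assuming use locale is not in effect (the data is made up for illustration):

```perl
use strict;

my @values = (10, 9, 100, 1);

my @numeric = sort { $a <=> $b } @values;   # 1 9 10 100
my @string  = sort { $a cmp $b } @values;   # 1 10 100 9

print "@numeric\n";
print "@string\n";

# The same pair of values can compare differently:
my $num_less = (10 <  9) ? "true" : "false";   # false
my $str_less = (10 lt 9) ? "true" : "false";   # true, since "1" sorts before "9"
print "10 <  9 is $num_less\n";
print "10 lt 9 is $str_less\n";
```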
Bitwise And
Binary "&" returns its operands ANDed together bit by bit. (See also Integer Arithmetic and
Bitwise String Operators.)
Bitwise Or and Exclusive Or
Binary "|" returns its operands ORed together bit by bit. (See also Integer Arithmetic and
Bitwise String Operators.)
Binary "^" returns its operands XORed together bit by bit. (See also Integer Arithmetic and
Bitwise String Operators.)
C−style Logical And
Binary "&&" performs a short−circuit logical AND operation. That is, if the left operand is false, the right
operand is not even evaluated. Scalar or list context propagates down to the right operand if it is evaluated.
C−style Logical Or
Binary "||" performs a short−circuit logical OR operation. That is, if the left operand is true, the right
operand is not even evaluated. Scalar or list context propagates down to the right operand if it is evaluated.
The || and && operators differ from C‘s in that, rather than returning 0 or 1, they return the last value
evaluated. Thus, a reasonably portable way to find out the home directory (assuming it‘s not "0") might be:
$home = $ENV{’HOME’} || $ENV{’LOGDIR’} ||
(getpwuid($<))[7] || die "You’re homeless!\n";
In particular, this means that you shouldn‘t use this for selecting between two aggregates for assignment:
    @a = @b || @c;              # this is wrong
    @a = scalar(@b) || @c;      # really meant this
    @a = @b ? @b : @c;          # this works fine, though
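Both points can be seen in a short sketch: || hands back one of its operands rather than 0 or 1, and it puts its left operand in scalar context even inside a list assignment. The hash keys and values below are purely illustrative:

```perl
use strict;

my %opt = (name => "", depth => 0);

# || returns the first true operand itself, or the last operand if none is true.
my $name  = $opt{name}  || "anonymous";   # "" is false, so "anonymous"
my $depth = $opt{depth} || 5;             # 0 is false too, so 5 (a legitimate 0 is lost)

my @b = (4, 5, 6);
my @c = (7, 8, 9);
my @wrong = @b || @c;          # @b is taken in scalar context: (3)
my @right = @b ? @b : @c;      # (4, 5, 6)

print "$name $depth\n";        # prints "anonymous 5"
print "@wrong / @right\n";     # prints "3 / 4 5 6"
```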
As more readable alternatives to && and || when used for control flow, Perl provides and and or operators
(see below). The short−circuit behavior is identical. The precedence of "and" and "or" is much lower,
however, so that you can safely use them after a list operator without the need for parentheses:
unlink "alpha", "beta", "gamma"
or gripe(), next LINE;
With the C−style operators that would have been written like this:
unlink("alpha", "beta", "gamma")
|| (gripe(), next LINE);
Using "or" for assignment is unlikely to do what you want; see below.
Range Operators
Binary ".." is the range operator, which is really two different operators depending on the context. In list
context, it returns an array of values counting (by ones) from the left value to the right value. This is useful
for writing foreach (1..10) loops and for doing slice operations on arrays. In the current
implementation, no temporary array is created when the range operator is used as the expression in
foreach loops, but older versions of Perl might burn a lot of memory when you write something like this:
for (1 .. 1_000_000) {
# code
}
In scalar context, ".." returns a boolean value. The operator is bistable, like a flip−flop, and emulates the
line−range (comma) operator of sed, awk, and various editors. Each ".." operator maintains its own boolean
state. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true
until the right operand is true, AFTER which the range operator becomes false again. (It doesn‘t become
false till the next time the range operator is evaluated. It can test the right operand and become false on the
same evaluation it became true (as in awk), but it still returns true once. If you don‘t want it to test the right
operand till the next evaluation (as in sed), use three dots ("...") instead of two.) The right operand is not
evaluated while the operator is in the "false" state, and the left operand is not evaluated while the operator is
in the "true" state. The precedence is a little lower than || and &&. The value returned is either the empty
string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each
range encountered. The final sequence number in a range has the string "E0" appended to it, which doesn‘t
affect its numeric value, but gives you something to search for if you want to exclude the endpoint. You can
exclude the beginning point by waiting for the sequence number to be greater than 1. If either operand of
scalar ".." is a constant expression, that operand is implicitly compared to the $. variable, the current line
number. Examples:
As a scalar operator:
    if (101 .. 200) { print; }  # print 2nd hundred lines
    next line if (1 .. /^$/);   # skip header lines
    s/^/> / if (/^$/ .. eof()); # quote body

    # parse mail messages
    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof();
        # do something based on those
    } continue {
        close ARGV if eof;      # reset $. each file
    }
As a list operator:
    for (101 .. 200) { print; } # print $_ 100 times
    @foo = @foo[0 .. $#foo];    # an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];      # slice last 5 items
The range operator (in list context) makes use of the magical auto−increment algorithm if the operands are
strings. You can say
@alphabet = (’A’ .. ’Z’);
to get all the letters of the alphabet, or
$hexdigit = (0 .. 9, ’a’ .. ’f’)[$num & 15];
to get a hexadecimal digit, or
@z2 = (’01’ .. ’31’);
print $z2[$mday];
to get dates with leading zeros. If the final value specified is not in the sequence that the magical increment
would produce, the sequence goes until the next value would be longer than the final value specified.
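To make the scalar (flip-flop) behavior concrete, here is a small sketch that records the value returned for each element of an in-memory list. The marker strings BEGIN and END and the variable names are assumptions made for the illustration; because neither operand is a constant, nothing is compared against $. here.

```perl
use strict;

my @lines = ("intro", "BEGIN", "one", "two", "END", "outro");
my @state;

for (@lines) {
    # Scalar context: false until the left pattern matches,
    # then true through the line on which the right pattern matches.
    my $in_block = /^BEGIN$/ .. /^END$/;
    push @state, "$_=" . ($in_block eq "" ? "(false)" : $in_block);
}
print join(" ", @state), "\n";
# prints "intro=(false) BEGIN=1 one=2 two=3 END=4E0 outro=(false)"

# List context with string operands uses the magical increment.
my @ids = ('08' .. '11');       # ("08", "09", "10", "11")
print "@ids\n";
```

Testing the returned value against /E0$/ is one way to recognize the final line of a range.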
Conditional Operator
Ternary "?:" is the conditional operator, just as in C. It works much like an if−then−else. If the argument
before the ? is true, the argument before the : is returned, otherwise the argument after the : is returned. For
example:
printf "I have %d dog%s.\n", $n,
($n == 1) ? ’’ : "s";
Scalar or list context propagates downward into the 2nd or 3rd argument, whichever is selected.
    $a = $ok ? $b : $c;  # get a scalar
    @a = $ok ? @b : @c;  # get an array
    $a = $ok ? @b : @c;  # oops, that's just a count!
The operator may be assigned to if both the 2nd and 3rd arguments are legal lvalues (meaning that you can
assign to them):
($a_or_b ? $a : $b) = $c;
This is not necessarily guaranteed to contribute to the readability of your program.
Because this operator produces an assignable result, using assignments without parentheses will get you in
trouble. For example, this:
$a % 2 ? $a += 10 : $a += 2
Really means this:
(($a % 2) ? ($a += 10) : $a) += 2
Rather than this:
($a % 2) ? ($a += 10) : ($a += 2)
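A quick way to convince yourself of this parse is to run both forms on an odd starting value. This sketch uses lexical variables $x and $y (names chosen for the illustration) in place of $a:

```perl
use strict;

my $x = 1;
$x % 2 ? $x += 10 : $x += 2;        # parsed as (($x % 2) ? ($x += 10) : $x) += 2
my $unparenthesized = $x;           # 13: both additions were applied

my $y = 1;
$y % 2 ? ($y += 10) : ($y += 2);    # only the selected branch runs
my $parenthesized = $y;             # 11

print "$unparenthesized $parenthesized\n";   # prints "13 11"
```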
Assignment Operators
"=" is the ordinary assignment operator.
Assignment operators work as in C. That is,
$a += 2;
is equivalent to
$a = $a + 2;
although without duplicating any side effects that dereferencing the lvalue might trigger, such as from
tie(). Other assignment operators work similarly. The following are recognized:
    **=    +=    *=    &=    <<=    &&=
           -=    /=    |=    >>=    ||=
           .=    %=    ^=
                 x=
Note that while these are grouped by family, they all have the precedence of assignment.
Unlike in C, the assignment operator produces a valid lvalue. Modifying an assignment is equivalent to
doing the assignment and then modifying the variable that was assigned to. This is useful for modifying a
copy of something, like this:
($tmp = $global) =~ tr [A−Z] [a−z];
Likewise,
($a += 2) *= 3;
is equivalent to
$a += 2;
$a *= 3;
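As a runnable sketch of both idioms (the string and variable names are illustrative):

```perl
use strict;

my $title = "Perl Programmers Reference Guide";

# The assignment yields $lower as an lvalue, so tr/// changes only the copy.
(my $lower = $title) =~ tr/A-Z/a-z/;

my $n = 1;
($n += 2) *= 3;                 # $n becomes 3, then 9

print "$title\n";               # original is untouched
print "$lower\n";               # prints "perl programmers reference guide"
print "$n\n";                   # prints "9"
```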
Comma Operator
Binary "," is the comma operator. In scalar context it evaluates its left argument, throws that value away,
then evaluates its right argument and returns that value. This is just like C‘s comma operator.
In list context, it‘s just the list argument separator, and inserts both its arguments into the list.
The => digraph is mostly just a synonym for the comma operator. It‘s useful for documenting arguments
that come in pairs. As of release 5.001, it also forces any word to the left of it to be interpreted as a string.
List Operators (Rightward)
The right side of a list operator has very low precedence, such that it controls all comma-separated
expressions found there. The only operators with lower precedence are the logical operators "and", "or", and
"not", which may be used to evaluate calls to list operators without the need for extra parentheses:
open HANDLE, "filename"
or die "Can’t open: $!\n";
See also discussion of list operators in Terms and List Operators (Leftward).
Logical Not
Unary "not" returns the logical negation of the expression to its right. It‘s the equivalent of "!" except for the
very low precedence.
Logical And
Binary "and" returns the logical conjunction of the two surrounding expressions. It‘s equivalent to &&
except for the very low precedence. This means that it short−circuits: i.e., the right expression is evaluated
only if the left expression is true.
Logical or and Exclusive Or
Binary "or" returns the logical disjunction of the two surrounding expressions. It‘s equivalent to || except for
the very low precedence. This makes it useful for control flow:
print FH $data
or die "Can’t write to FH: $!";
This means that it short−circuits: i.e., the right expression is evaluated only if the left expression is false.
Due to its precedence, you should probably avoid using this for assignment, only for control flow.
    $a = $b or $c;              # bug: this is wrong
    ($a = $b) or $c;            # really means this
    $a = $b || $c;              # better written this way
However, when it‘s a list context assignment and you‘re trying to use "||" for control flow, you probably need
"or" so that the assignment takes higher precedence.
    @info = stat($file) || die;     # oops, scalar sense of stat!
    @info = stat($file) or die;     # better, now @info gets its due
Then again, you could always use parentheses.
Binary "xor" returns the exclusive−OR of the two surrounding expressions. It cannot short circuit, of course.
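The list-assignment case can be reproduced without touching the file system. In this sketch, pair() is a hypothetical helper standing in for stat(); it returns a two-element list literal, whose scalar-context value is its last element:

```perl
use strict;

sub pair { return (3, 4) }      # hypothetical stand-in for a list-returning call

my @with_pipes = pair() || die "no data";   # || puts pair() in scalar context
my @with_or    = pair() or die "no data";   # assignment binds first, in list context

print scalar(@with_pipes), ": @with_pipes\n";   # prints "1: 4"
print scalar(@with_or),    ": @with_or\n";      # prints "2: 3 4"
```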
C Operators Missing From Perl
Here is what C has that Perl doesn‘t:
unary &
Address−of operator. (But see the "\" operator for taking a reference.)
unary *
Dereference−address operator. (Perl‘s prefix dereferencing operators are typed: $, @, %, and
&.)
(TYPE)
Type casting operator.
Quote and Quote−like Operators
While we usually think of quotes as literal values, in Perl they function as operators, providing various kinds
of interpolating and pattern matching capabilities. Perl provides customary quote characters for these
behaviors, but also provides a way for you to choose your quote character for any of them. In the following
table, a {} represents any pair of delimiters you choose. Non−bracketing delimiters use the same character
fore and aft, but the 4 sorts of brackets (round, angle, square, curly) will all nest.
    Customary  Generic     Meaning           Interpolates
        ''       q{}       Literal           no
        ""      qq{}       Literal           yes
        ``      qx{}       Command           yes (unless '' is delimiter)
                qw{}       Word list         no
        //       m{}       Pattern match     yes
                qr{}       Pattern           yes
                 s{}{}     Substitution      yes
                tr{}{}     Transliteration   no (but see below)
Note that there can be whitespace between the operator and the quoting characters, except when # is being
used as the quoting character. q#foo# is parsed as being the string foo, while q #foo# is the operator q
followed by a comment. Its argument will be taken from the next line. This allows you to write:
    s {foo}  # Replace foo
      {bar}  # with bar.
For constructs that do interpolation, variables beginning with "$" or "@" are interpolated, as are the
following sequences. Within a transliteration, the first ten of these sequences may be used.
    \t          tab             (HT, TAB)
    \n          newline         (NL)
    \r          return          (CR)
    \f          form feed       (FF)
    \b          backspace       (BS)
    \a          alarm (bell)    (BEL)
    \e          escape          (ESC)
    \033        octal char
    \x1b        hex char
    \c[         control char

    \l          lowercase next char
    \u          uppercase next char
    \L          lowercase till \E
    \U          uppercase till \E
    \E          end case modification
    \Q          quote non-word characters till \E
If use locale is in effect, the case map used by \l, \L, \u and \U is taken from the current locale. See
perllocale.
All systems use the virtual "\n" to represent a line terminator, called a "newline". There is no such thing as
an unvarying, physical newline character. It is an illusion that the operating system, device drivers, C
libraries, and Perl all conspire to preserve. Not all systems read "\r" as ASCII CR and "\n" as ASCII LF.
For example, on a Mac, these are reversed, and on systems without line terminator, printing "\n" may emit
no actual data. In general, use "\n" when you mean a "newline" for your system, but use the literal ASCII
when you need an exact character. For example, most networking protocols expect and prefer a CR+LF
("\015\012" or "\cM\cJ") for line terminators, and although they often accept just "\012", they
seldom tolerate just "\015". If you get in the habit of using "\n" for networking, you may be burned
some day.
You cannot include a literal $ or @ within a \Q sequence. An unescaped $ or @ interpolates the
corresponding variable, while escaping will cause the literal string \$ to be inserted. You‘ll need to write
something like m/\Quser\E\@\Qhost/.
Patterns are subject to an additional level of interpretation as a regular expression. This is done as a second
pass, after variables are interpolated, so that regular expressions may be incorporated into the pattern from
the variables. If this is not what you want, use \Q to interpolate a variable literally.
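The effect of the second pass, and of \Q, can be seen in a short sketch (the strings are made up for illustration):

```perl
use strict;

my $needle = "a.c";

# Without \Q the interpolated "." is a regexp metacharacter.
my $loose  = ("abc" =~ /^$needle$/)     ? 1 : 0;   # 1: "." matches any character
# With \Q the interpolated text is matched literally.
my $quoted = ("abc" =~ /^\Q$needle\E$/) ? 1 : 0;   # 0: a literal dot is required
my $exact  = ("a.c" =~ /^\Q$needle\E$/) ? 1 : 0;   # 1

print "$loose $quoted $exact\n";                   # prints "1 0 1"
```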
Apart from the above, there are no multiple levels of interpolation. In particular, contrary to the expectations
of shell programmers, back−quotes do NOT interpolate within double quotes, nor do single quotes impede
evaluation of variables when used within double quotes.
Regexp Quote−Like Operators
Here are the quote−like operators that apply to pattern matching and related activities.
Most of this section is related to the use of regular expressions from Perl. Such use may be considered from
two points of view: Perl hands a string and a "pattern" to the RE (regular expression) engine to match, the RE
engine finds (or does not find) the match, and Perl uses the findings of the RE engine for its operation, possibly
asking the engine for other matches.
The RE engine has no idea what Perl is going to do with what it finds; similarly, the rest of Perl has no idea what
a particular regular expression means to the RE engine. This creates a clean separation, and in this section we
discuss matching from Perl's point of view only. The other point of view may be found in perlre.
?PATTERN?
This is just like the /pattern/ search, except that it matches only once between calls to the
reset() operator. This is a useful optimization when you want to see only the first occurrence
of something in each file of a set of files, for instance. Only ?? patterns local to the current
package are reset.
    while (<>) {
        if (?^$?) {
                            # blank line between header and body
        }
    } continue {
        reset if eof;       # clear ?? status for next file
    }
This usage is vaguely deprecated, and may be removed in some future version of Perl.
m/PATTERN/cgimosx
/PATTERN/cgimosx
Searches a string for a pattern match, and in scalar context returns true (1) or false (‘’). If no
string is specified via the =~ or !~ operator, the $_ string is searched. (The string specified with
=~ need not be an lvalue—it may be the result of an expression evaluation, but remember the =~
binds rather tightly.) See also perlre. See perllocale for discussion of additional considerations
that apply when use locale is in effect.
Options are:
    c   Do not reset search position on a failed match when /g is in effect.
    g   Match globally, i.e., find all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.
If "/" is the delimiter then the initial m is optional. With the m you can use any pair of
non-alphanumeric, non-whitespace characters as delimiters. This is particularly useful for
matching Unix path names that contain "/", to avoid LTS (leaning toothpick syndrome). If single
quotes are used as the delimiter, no interpolation is done on the pattern. If "?" is
the delimiter, then the match-only-once rule of ?PATTERN? applies.
PATTERN may contain variables, which will be interpolated (and the pattern recompiled) every
time the pattern search is evaluated. (Note that $) and $| might not be interpolated because
they look like end−of−string tests.) If you want such a pattern to be compiled only once, add a
/o after the trailing delimiter. This avoids expensive run−time recompilations, and is useful
when the value you are interpolating won‘t change over the life of the script. However,
mentioning /o constitutes a promise that you won‘t change the variables in the pattern. If you
change them, Perl won‘t even notice.
If the PATTERN evaluates to the empty string, the last successfully matched regular expression
is used instead.
If the /g option is not used, m// in a list context returns a list consisting of the subexpressions
matched by the parentheses in the pattern, i.e., ($1, $2, $3...). (Note that here $1 etc. are
also set, and that this differs from Perl 4‘s behavior.) When there are no parentheses in the
pattern, the return value is the list (1) for success. With or without parentheses, an empty list is
returned upon failure.
Examples:
    open(TTY, '/dev/tty');
    <TTY> =~ /^y/i && foo();    # do foo if desired

    if (/Version: *([0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
        print if /$arg/o;       # compile only once
    }
if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))
This last example splits $foo into the first two words and the remainder of the line, and assigns
those three fields to $F1, $F2, and $Etc. The conditional is true if any variables were
assigned, i.e., if the pattern matched.
The /g modifier specifies global pattern matching—that is, matching as many times as possible
within the string. How it behaves depends on the context. In list context, it returns a list of all
the substrings matched by all the parentheses in the regular expression. If there are no
parentheses, it returns a list of all the matched strings, as if there were parentheses around the
whole pattern.
In scalar context, each execution of m//g finds the next match, returning TRUE if it matches,
and FALSE if there is no further match. The position after the last match can be read or set using
the pos() function; see pos. A failed match normally resets the search position to the
beginning of the string, but you can avoid that by adding the /c modifier (e.g. m//gc).
Modifying the target string also resets the search position.
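The following sketch walks through scalar-context m//g, pos(), and the /c modifier on a small string (the string and variable names are illustrative):

```perl
use strict;

my $text = "aXbbXc";
my @ends;

# Scalar context: each evaluation finds the next match.
while ($text =~ /X/g) {
    push @ends, pos($text);     # offset just past each match
}
my $after_loop = defined pos($text) ? pos($text) : "undef";   # failed match reset it

# List context: all matches at once.
my @all = $text =~ /([a-c]+)/g;

# With /c, a failed match leaves the position alone.
my $found = $text =~ /b+/g;     # matches "bb"; position is now 4
my $miss  = $text =~ /z/gc;     # fails, but position is kept
my $kept  = pos($text);

print "@ends | $after_loop | @all | $kept\n";   # prints "2 5 | undef | a bb c | 4"
```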
You can intermix m//g matches with m/\G.../g, where \G is a zero−width assertion that
matches the exact position where the previous m//g, if any, left off. The \G assertion is not
supported without the /g modifier; currently, without /g, \G behaves just like \A, but that‘s
accidental and may change in the future.
Examples:
# list context
($one,$five,$fifteen) = (‘uptime‘ =~ /(\d+\.\d+)/g);
# scalar context
$/ = ""; $* = 1; # $* deprecated in modern perls
while (defined($paragraph = <>)) {
while ($paragraph =~ /[a−z][’")]*[.!?]+[’")]*\s/g) {
$sentences++;
}
}
print "$sentences\n";
# using m//gc with \G
$_ = "ppooqppqq";
while ($i++ < 2) {
print "1: ’";
print $1 while /(o)/gc; print "’, pos=", pos, "\n";
print "2: ’";
print $1 if /\G(q)/gc; print "’, pos=", pos, "\n";
print "3: ’";
print $1 while /(p)/gc; print "’, pos=", pos, "\n";
}
The last example should print:
    1: 'oo', pos=4
    2: 'q', pos=5
    3: 'pp', pos=7
    1: '', pos=7
    2: 'q', pos=8
    3: '', pos=8
A useful idiom for lex−like scanners is /\G.../gc. You can combine several regexps like
this to process a string part−by−part, doing different actions depending on which regexp
matched. Each regexp tries to match where the previous one leaves off.
    $_ = <<'EOL';
          $url = new URI::URL "http://www/";   die if $url eq "xXx";
    EOL
    LOOP:
        {
          print(" digits"),         redo LOOP if /\G\d+\b[,.;]?\s*/gc;
          print(" lowercase"),      redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
          print(" UPPERCASE"),      redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
          print(" Capitalized"),    redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
          print(" MiXeD"),          redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
          print(" alphanumeric"),   redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
          print(" line-noise"),     redo LOOP if /\G[^A-Za-z0-9]+/gc;
          print ". That's all!\n";
        }
Here is the output (split into several lines):
line−noise lowercase line−noise lowercase UPPERCASE line−noise
UPPERCASE line−noise lowercase line−noise lowercase line−noise
lowercase lowercase line−noise lowercase lowercase line−noise
MiXeD line−noise. That’s all!
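The same /\G.../gc idiom can collect tokens instead of printing labels. This is only a sketch: the token classes (NUM, ID, OP) and the input string are invented for the illustration and are not part of Perl.

```perl
use strict;

my $input = "x = 42 + y1";
my @tokens;

for ($input) {                  # alias $_ to $input so pos() tracks one string
    while (1) {
        if    (/\G\s+/gc)            { next }                    # skip whitespace
        elsif (/\G(\d+)/gc)          { push @tokens, "NUM:$1" }
        elsif (/\G([A-Za-z]\w*)/gc)  { push @tokens, "ID:$1" }
        elsif (/\G(.)/gc)            { push @tokens, "OP:$1" }
        else                         { last }                    # nothing left to match
    }
}

print "@tokens\n";              # prints "ID:x OP:= NUM:42 OP:+ ID:y1"
```

Each alternative is anchored with \G, and /c keeps the position when an alternative fails, so the next one is tried at the same place.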
q/STRING/
'STRING'
A single−quoted, literal string. A backslash represents a backslash unless followed by the
delimiter or another backslash, in which case the delimiter or backslash is interpolated.
    $foo = q!I said, "You said, 'She said it.'"!;
    $bar = q('This is it.');
    $baz = '\n';                # a two-character string
qq/STRING/
"STRING"
A double−quoted, interpolated string.
    $_ .= qq
     (*** The previous line contains the naughty word "$1".\n)
                if /(tcl|rexx|python)/;      # :-)
    $baz = "\n";                # a one-character string
qr/STRING/imosx
A string which is (possibly) interpolated and then compiled as a regular expression. The result
may be used as a pattern in a match
    $re = qr/$pattern/;
    $string =~ /foo${re}bar/;   # can be interpolated in other patterns
    $string =~ $re;             # or used standalone
Options are:
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.
The benefit from this is that the pattern is precompiled into an internal representation, and does
not need to be recompiled every time a match is attempted. This makes it very efficient to do
something like:
foreach $pattern (@pattern_list) {
my $re = qr/$pattern/;
foreach $line (@lines) {
if($line =~ /$re/) {
do_something($line);
}
}
}
See perlre for additional information on valid syntax for STRING, and for a detailed look at the
semantics of regular expressions.
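A compact sketch of the precompile-then-reuse pattern follows. The pattern strings and input lines are invented for the example:

```perl
use strict;

my @patterns = ('^\d+$', 'perl', '^#');         # illustrative pattern strings
my @compiled = map { qr/$_/i } @patterns;       # compile each one once

my @lines = ("12345", "Perl rocks", "# comment", "nothing");
my %hits;
for my $line (@lines) {
    for my $i (0 .. $#compiled) {
        $hits{$patterns[$i]}++ if $line =~ $compiled[$i];
    }
}

# A qr// object keeps its own modifiers when interpolated into a larger pattern.
my $word  = qr/perl/i;
my $liked = ("I like Perl" =~ /like\s+$word/) ? 1 : 0;

print join(",", map { $hits{$_} || 0 } @patterns), " $liked\n";   # prints "1,1,1 1"
```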
qx/STRING/
`STRING`
A string which is (possibly) interpolated and then executed as a system command with
/bin/sh or its equivalent. Shell wildcards, pipes, and redirections will be honored. The
collected standard output of the command is returned; standard error is unaffected. In scalar
context, it comes back as a single (potentially multi−line) string. In list context, returns a list of
lines (however you‘ve defined lines with $/ or $INPUT_RECORD_SEPARATOR).
Because backticks do not affect standard error, use shell file descriptor syntax (assuming the
shell supports this) if you care to address this. To capture a command‘s STDERR and STDOUT
together:
$output = ‘cmd 2>&1‘;
To capture a command‘s STDOUT but discard its STDERR:
$output = ‘cmd 2>/dev/null‘;
To capture a command‘s STDERR but discard its STDOUT (ordering is important here):
$output = ‘cmd 2>&1 1>/dev/null‘;
To exchange a command‘s STDOUT and STDERR in order to capture the STDERR but leave its
STDOUT to come out the old STDERR:
$output = ‘cmd 3>&1 1>&2 2>&3 3>&−‘;
To read both a command‘s STDOUT and its STDERR separately, it‘s easiest and safest to
redirect them separately to files, and then read from those files when the program is done:
system("program args 1>/tmp/program.stdout 2>/tmp/program.stderr");
Using single−quote as a delimiter protects the command from Perl‘s double−quote interpolation,
passing it on to the shell instead:
    $perl_info  = qx(ps $$);            # that's Perl's $$
    $shell_info = qx'ps $$';            # that's the new shell's $$
Note that how the string gets evaluated is entirely subject to the command interpreter on your
system. On most platforms, you will have to protect shell metacharacters if you want them
treated literally. This is in practice difficult to do, as it‘s unclear how to escape which characters.
See perlsec for a clean and safe example of a manual fork() and exec() to emulate
backticks safely.
On some platforms (notably DOS−like ones), the shell may not be capable of dealing with
multiline commands, so putting newlines in the string may not get you what you want. You may
be able to evaluate multiple commands in a single line by separating them with the command
separator character, if your shell supports that (e.g. ; on many Unix shells; & on the Windows
NT cmd shell).
Beware that some command shells may place restrictions on the length of the command line.
You must ensure your strings don‘t exceed this limit after any necessary interpolations. See the
platform−specific release notes for more details about your particular environment.
Using this operator can lead to programs that are difficult to port, because the shell commands
called vary between systems, and may in fact not be present at all. As one example, the type
command under the POSIX shell is very different from the type command under DOS. That
doesn‘t mean you should go out of your way to avoid backticks when they‘re the right way to get
something done. Perl was made to be a glue language, and one of the things it glues together is
commands. Just understand what you‘re getting yourself into.
See "I/O Operators" for more discussion.
qw/STRING/
Returns a list of the words extracted out of STRING, using embedded whitespace as the word
delimiters. It is exactly equivalent to
split(’ ’, q/STRING/);
This equivalency means that if used in scalar context, you‘ll get split‘s (unfortunate) scalar
context behavior, complete with mysterious warnings.
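The equivalence is easy to confirm in list context. A minimal sketch (the words are arbitrary):

```perl
use strict;

my @from_qw    = qw( foo bar baz );
my @from_split = split(' ', q/ foo bar baz /);

# Commas are not separators inside qw(); they become part of the words.
my @oops = qw(a, b);            # two elements: "a," and "b"

print "@from_qw | @from_split | $oops[0]\n";   # prints "foo bar baz | foo bar baz | a,"
```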
Some frequently seen examples:
use POSIX qw( setlocale localeconv );
@EXPORT = qw( foo bar baz );
A common mistake is to try to separate the words with commas or to put comments into a
multi-line qw-string. For this reason the -w switch produces warnings if the STRING contains
the "," or the "#" character.
s/PATTERN/REPLACEMENT/egimosx
Searches a string for a pattern, and if found, replaces that pattern with the replacement text and
returns the number of substitutions made. Otherwise it returns false (specifically, the empty
string).
If no string is specified via the =~ or !~ operator, the $_ variable is searched and modified.
(The string specified with =~ must be a scalar variable, an array element, a hash element, or an
assignment to one of those, i.e., an lvalue.)
If the delimiter chosen is single quote, no variable interpolation is done on either the PATTERN
or the REPLACEMENT. Otherwise, if the PATTERN contains a $ that looks like a variable
rather than an end−of−string test, the variable will be interpolated into the pattern at run−time. If
you want the pattern compiled only once the first time the variable is interpolated, use the /o
option. If the pattern evaluates to the empty string, the last successfully executed regular
expression is used instead. See perlre for further explanation on these. See perllocale for
discussion of additional considerations that apply when use locale is in effect.
Options are:
    e   Evaluate the right side as an expression.
    g   Replace globally, i.e., all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Compile pattern only once.
    s   Treat string as single line.
    x   Use extended regular expressions.
Any non−alphanumeric, non−whitespace delimiter may replace the slashes. If single quotes are
used, no interpretation is done on the replacement string (the /e modifier overrides this,
however). Unlike Perl 4, Perl 5 treats backticks as normal delimiters; the replacement text is not
evaluated as a command. If the PATTERN is delimited by bracketing quotes, the
REPLACEMENT has its own pair of quotes, which may or may not be bracketing quotes, e.g.,
s(foo)(bar) or s<foo>/bar/. A /e will cause the replacement portion to be interpreted
as a full−fledged Perl expression and eval()ed right then and there. It is, however, syntax
checked at compile−time.
Examples:
    s/\bgreen\b/mauve/g;                # don't change wintergreen

    $path =~ s|/usr/bin|/usr/local/bin|;

    s/Login: $foo/Login: $bar/;         # run-time pattern

    ($foo = $bar) =~ s/this/that/;      # copy first, then change

    $count = ($paragraph =~ s/Mister\b/Mr./g);  # get change-count

    $_ = 'abc123xyz';
    s/\d+/$&*2/e;               # yields 'abc246xyz'
    s/\d+/sprintf("%5d",$&)/e;  # yields 'abc  246xyz'
    s/\w/$& x 2/eg;             # yields 'aabbcc  224466xxyyzz'

    s/%(.)/$percent{$1}/g;      # change percent escapes; no /e
    s/%(.)/$percent{$1} || $&/ge;       # expr now, so /e
    s/^=(\w+)/&pod($1)/ge;      # use function call

    # expand variables in $_, but dynamics only, using
    # symbolic dereferencing
    s/\$(\w+)/${$1}/g;

    # /e's can even nest; this will expand
    # any embedded scalar variable (including lexicals) in $_
    s/(\$\w+)/$1/eeg;

    # Delete (most) C comments.
    $program =~ s {
        /\*     # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        \*/     # Match the closing delimiter.
    } []gsx;

    s/^\s*(.*?)\s*$/$1/;        # trim white space in $_, expensively

    for ($variable) {           # trim white space in $variable, cheap
        s/^\s+//;
        s/\s+$//;
    }

    s/([^ ]*) *([^ ]*)/$2 $1/;  # reverse 1st two fields
Note the use of $ instead of \ in the last example. Unlike sed, we use the \<digit> form in only
the left hand side. Anywhere else it's $<digit>.
Occasionally, you can‘t use just a /g to get all the changes to occur. Here are two common
cases:
    # put commas in the right places in an integer
    1 while s/(.*\d)(\d\d\d)/$1,$2/g;      # perl4
    1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;  # perl5

    # expand tabs to 8-column spacing
    1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
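To see why a single /g pass is not enough for the comma case, the sketch below counts how many passes actually change the string. The variable names and the sample number are chosen for the illustration:

```perl
use strict;

my $n = "1234567";
my $passes = 0;

# Each pass can only place the rightmost missing comma, because the
# lookahead (?!\d) requires the three digits to end a digit run.
$passes++ while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/g;

print "$n after $passes passes\n";      # prints "1,234,567 after 2 passes"
```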
tr/SEARCHLIST/REPLACEMENTLIST/cds
y/SEARCHLIST/REPLACEMENTLIST/cds
Transliterates all occurrences of the characters found in the search list with the corresponding
character in the replacement list. It returns the number of characters replaced or deleted. If no
string is specified via the =~ or !~ operator, the $_ string is transliterated. (The string specified
with =~ must be a scalar variable, an array element, a hash element, or an assignment to one of
those, i.e., an lvalue.) A character range may be specified with a hyphen, so tr/A−J/0−9/
does the same replacement as tr/ACEGIBDFHJ/0246813579/. For sed devotees, y is
provided as a synonym for tr. If the SEARCHLIST is delimited by bracketing quotes, the
REPLACEMENTLIST has its own pair of quotes, which may or may not be bracketing quotes,
e.g., tr[A−Z][a−z] or tr(+\−*/)/ABCD/.
Options:
    c   Complement the SEARCHLIST.
    d   Delete found but unreplaced characters.
    s   Squash duplicate replaced characters.
If the /c modifier is specified, the SEARCHLIST character set is complemented. If the /d
modifier is specified, any characters specified by SEARCHLIST not found in
REPLACEMENTLIST are deleted. (Note that this is slightly more flexible than the behavior of
some tr programs, which delete anything they find in the SEARCHLIST, period.) If the /s
modifier is specified, sequences of characters that were transliterated to the same character are
squashed down to a single instance of the character.
If the /d modifier is used, the REPLACEMENTLIST is always interpreted exactly as specified.
Otherwise, if the REPLACEMENTLIST is shorter than the SEARCHLIST, the final character is
replicated till it is long enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is
replicated. This latter is useful for counting characters in a class or for squashing character
sequences in a class.
Examples:
    $ARGV[1] =~ tr/A-Z/a-z/;    # canonicalize to lower case
    $cnt = tr/*/*/;             # count the stars in $_
    $cnt = $sky =~ tr/*/*/;     # count the stars in $sky
    $cnt = tr/0-9//;            # count the digits in $_
    tr/a-zA-Z//s;               # bookkeeper -> bokeper
    ($HOST = $host) =~ tr/a-z/A-Z/;
    tr/a-zA-Z/ /cs;             # change non-alphas to single space
    tr [\200-\377]
       [\000-\177];             # delete 8th bit
If multiple transliterations are given for a character, only the first one is used:
tr/AAA/XYZ/
will transliterate any A to X.
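The modifier and list-length rules described above can be tried out with a small sketch like the one below. The sample strings and variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;

(my $letters  = "hello, world!") =~ tr/a-z//cd;   # /c with /d: delete everything except a-z
(my $squashed = "aabbccdd")      =~ tr/a-c//s;    # /s: squash runs of a, b or c
(my $padded   = "abcd")          =~ tr/a-d/xy/;   # short list: the final "y" is replicated
(my $deleted  = "abcd")          =~ tr/a-d/xy/d;  # /d: c and d have no replacement and are deleted
(my $first    = "AAA")           =~ tr/AAA/XYZ/;  # only the first mapping (A to X) is used
my $count = ($letters =~ tr/a-z//);               # empty replacement: count, change nothing

print "$letters $squashed $padded $deleted $first $count\n";
# helloworld abcdd xyyy xy XXX 10
```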
Note that because the transliteration table is built at compile time, neither the SEARCHLIST nor
the REPLACEMENTLIST is subjected to double quote interpolation. That means that if you
want to use variables, you must use an eval():
eval "tr/$oldlist/$newlist/";
die $@ if $@;
eval "tr/$oldlist/$newlist/, 1" or die $@;
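A minimal sketch of the eval() approach applied to a named variable rather than $_ follows. The names $text, $oldlist and $newlist and their values are assumptions for illustration. Because the lists become Perl source code, this is only reasonable when they come from a trusted place.

```perl
use strict;
use warnings;

# Assumed sample lists; in real code they would be computed at run time.
my ($oldlist, $newlist) = ('a-c', 'A-C');
my $text = "abcabc xyz";

# The lists are interpolated into the string first, and the resulting
# tr/// is compiled when the eval runs. The trailing ", 1" makes the
# eval return true even if no characters were changed.
eval "\$text =~ tr/$oldlist/$newlist/, 1" or die $@;

print "$text\n";   # ABCABC xyz
```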
Gory details of parsing quoted constructs
When presented with something which may have several different interpretations, Perl uses the principle
DWIM (expanded to Do What I Mean - not what I wrote) to pick up the most probable interpretation of the
source. This strategy is so successful that Perl users usually do not suspect ambivalence of what they write.
However, from time to time Perl's ideas differ from what the author meant.
This section aims to clarify how Perl interprets quoted constructs. The most frequent
reason to want the details discussed here is hairy regular expressions.
However, the first steps of parsing are the same for all Perl quoting operators, so they are discussed
together here.
Some of the passes discussed below are performed concurrently, but as long as the results are the same, we
consider them one by one. For different quoting constructs Perl performs a different number of passes, from
one to five, but they are always performed in the same order.
Finding the end
The first pass is finding the end of the quoted construct, be it the multichar ender "\nEOF\n" of a <<EOF
construct, / which terminates a qq/ construct, ] which terminates a qq[ construct, or > which terminates
a fileglob started with <.
When searching for a multichar construct no skipping is performed. When searching for a one-char
non-matching delimiter, such as /, combinations \\ and \/ are skipped. When searching for a
one-char matching delimiter, such as ], combinations \\, \] and \[ are skipped, and nested [, ] are
skipped as well.
For 3-part constructs, s/// etc., the search is repeated once more.
During this search no attention is paid to the semantic of the construct, thus
"$hash{"$foo/$bar"}"
or
    m/
      bar       #  This is not a comment, this slash / terminated m//!
     /x
do not form legal quoted expressions. Note that since the slash which terminated m// was followed
by a SPACE, this is not m//x, thus # was interpreted as a literal #.
Removal of backslashes before delimiters
During the second pass the text between the starting delimiter and the ending delimiter is copied to a
safe location, and the \ is removed from combinations consisting of \ and delimiter(s) (both starting
and ending delimiter if they differ).
The removal does not happen for multi−char delimiters.
Note that the combination \\ is left as it was!
Starting from this step no information about the delimiter(s) is used in the parsing.
Interpolation
The next step is interpolation in the obtained delimiter-independent text. There are four different cases.
<<'EOF', m'', s''', tr///, y///
No interpolation is performed.
'', q//
The only interpolation is removal of \ from pairs \\.
"", ``, qq//, qx//, <file*glob>
\Q, \U, \u, \L, \l (possibly paired with \E) are converted to corresponding Perl constructs,
thus "$foo\Qbaz$bar" is converted to
    $foo . (quotemeta("baz" . $bar));
Other combinations of \ with following chars are substituted with appropriate expansions.
Interpolated scalars and arrays are converted to join and . Perl constructs, thus "'@arr'"
becomes
    "'" . (join $", @arr) . "'";
Since all three above steps are performed simultaneously left-to-right, there is no way to insert a
literal $ or @ inside a \Q\E pair: it cannot be protected by \, since any \ (except in \E) is
interpreted as a literal inside \Q\E, and any $ is interpreted as starting an interpolated scalar.
Note also that the interpolating code needs to decide where the interpolated scalar ends,
say, whether "a $b -> {c}" means
    "a " . $b . " -> {c}";
or
    "a " . $b -> {c};
Most of the time the decision is to take the longest possible text which does not include spaces
between components and contains matching braces/brackets.
?RE?, /RE/, m/RE/, s/RE/foo/,
Processing of \Q, \U, \u, \L, \l and interpolation happens (almost) as with qq// constructs,
but the substitution of \ followed by other chars is not performed! Moreover, inside
(?{BLOCK}) no processing is performed at all.
Interpolation has several quirks: $|, $( and $) are not interpolated, and constructs
$var[SOMETHING] are voted (by several different estimators) to be an array element or
$var followed by a RE alternative. This is the place where the notation ${arr[$bar]}
comes in handy: /${arr[0-9]}/ is interpreted as an array element -9, not as a regular
expression from variable $arr followed by a digit, which is the interpretation of
/$arr[0-9]/.
Note that absence of processing of \\ creates specific restrictions on the post-processed text: if
the delimiter is /, one cannot get the combination \/ into the result of this step: / will finish the
regular expression, \/ will be stripped to / on the previous step, and \\/ will be left as is.
Since / is equivalent to \/ inside a regular expression, this does not matter unless the delimiter
is a special character for the RE engine, as in s*foo*bar*, m[foo], or ?foo?.
This step is the last one for all the constructs except regular expressions, which are processed further.
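The conversions described for double-quoted strings can be checked with a short sketch that compares each interpolated string with its spelled-out form. The values given to $foo, $bar and @arr are assumptions for the example.

```perl
use strict;
use warnings;

# Assumed sample values.
my ($foo, $bar) = ('x', 'a.b');
my @arr = (1, 2, 3);

my $quoted_a = "$foo\Qbaz$bar";                    # \Q runs to the end of the string
my $quoted_b = $foo . quotemeta("baz" . $bar);     # the spelled-out equivalent

my $joined_a = "'@arr'";                           # array interpolation
my $joined_b = "'" . (join $", @arr) . "'";        # joined with $" (a space by default)

print "$quoted_a\n";   # xbaza\.b
print "$joined_a\n";   # '1 2 3'
```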
Interpolation of regular expressions
All the previous steps were performed during the compilation of Perl code; this one happens at run
time (though it may be optimized to be calculated at compile time if appropriate). After all the
preprocessing performed above (and possibly after evaluation if catenation, joining, up/down-casing
and quotemeta()ing are involved) the resulting string is passed to the RE engine for compilation.
Whatever happens in the RE engine is better discussed in perlre, but for the sake of continuity let us
do it here.
This is the first step where presence of the //x switch is relevant. The RE engine scans the string
left−to−right, and converts it to a finite automaton.
Backslashed chars are either substituted by corresponding literal strings, or generate special nodes of
the finite automaton. Characters which are special to the RE engine generate corresponding nodes.
(?#...) comments are ignored. All the rest is either converted to literal strings to match, or is
ignored (as is whitespace and #−style comments if //x is present).
Note that the parsing of the construct [...] is performed using absolutely different rules than the
rest of the regular expression. Similarly, the (?{...}) is only checked for matching braces.
Optimization of regular expressions
This step is listed for completeness only. Since it does not change semantics, details of this step are
not documented and are subject to change.
I/O Operators
There are several I/O operators you should know about. A string enclosed by backticks (grave accents) first
undergoes variable substitution just like a double quoted string. It is then interpreted as a command, and the
output of that command is the value of the pseudo−literal, like in a shell. In scalar context, a single string
consisting of all the output is returned. In list context, a list of values is returned, one for each line of output.
(You can set $/ to use a different line terminator.) The command is executed each time the pseudo−literal
is evaluated. The status value of the command is returned in $? (see perlvar for the interpretation of $?).
Unlike in csh, no translation is done on the return data—newlines remain newlines. Unlike in any of the
shells, single quotes do not hide variable names in the command from interpretation. To pass a $ through to
the shell you need to hide it with a backslash. The generalized form of backticks is qx//. (Because
backticks always undergo shell expansion as well, see perlsec for security concerns.)
Evaluating a filehandle in angle brackets yields the next line from that file (newline, if any, included), or
undef at end of file. Ordinarily you must assign that value to a variable, but there is one situation where an
automatic assignment happens. If and ONLY if the input symbol is the only thing inside the conditional of a
while or for(;;) loop, the value is automatically assigned to the variable $_. In these loop constructs,
the assigned value (whether assignment is automatic or explicit) is then tested to see if it is defined. The
defined test avoids problems where the line has a string value that perl would treat as false, e.g. "" or "0"
with no trailing newline. (This may seem like an odd thing to you, but you'll use the construct in almost
every Perl script you write.) Anyway, the following lines are equivalent to each other:
    while (defined($_ = <STDIN>)) { print; }
    while ($_ = <STDIN>) { print; }
    while (<STDIN>) { print; }
    for (;<STDIN>;) { print; }
    print while defined($_ = <STDIN>);
    print while ($_ = <STDIN>);
    print while <STDIN>;
and this also behaves similarly, but avoids the use of $_ :
    while (my $line = <STDIN>) { print $line }
If you really mean such values to terminate the loop they should be tested for explicitly:
    while (($_ = <STDIN>) ne '0') { ... }
    while (<STDIN>) { last unless $_; ... }
In other boolean contexts, <filehandle> without an explicit defined test or comparison will solicit a
warning if -w is in effect.
The filehandles STDIN, STDOUT, and STDERR are predefined. (The filehandles stdin, stdout, and
stderr will also work except in packages, where they would be interpreted as local identifiers rather than
global.) Additional filehandles may be created with the open() function. See open() for details on this.
If a <FILEHANDLE> is used in a context that is looking for a list, a list consisting of all the input lines is
returned, one line per list element. It's easy to make a LARGE data space this way, so use with care.
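The sketch below shows the implicit defined test in a while loop and a read in list context. It assumes a perl new enough to open an in-memory filehandle on a scalar reference, which stands in here for a real file. The data and variable names are made up for the example.

```perl
use strict;
use warnings;

# Assumption: an in-memory filehandle replaces a real file.
my $data = "first\nsecond\n0";     # the last line is "0" with no newline
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my @seen;
while (my $line = <$fh>) {         # implicit defined() test, so "0" is kept
    chomp $line;
    push @seen, $line;
}
close $fh;

open($fh, '<', \$data) or die "cannot reopen: $!";
my @all = <$fh>;                   # list context: every line at once
close $fh;

print scalar(@seen), " lines via while, ", scalar(@all), " via list context\n";
# 3 lines via while, 3 via list context
```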
The null filehandle <> is special and can be used to emulate the behavior of sed and awk. Input from <>
comes either from standard input, or from each file listed on the command line. Here‘s how it works: the
first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to "−", which
when opened gives you standard input. The @ARGV array is then processed as a list of filenames. The
loop
    while (<>) {
        ...                     # code for each line
    }
is equivalent to the following Perl-like pseudo code:
    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...         # code for each line
        }
    }
except that it isn't so cumbersome to say, and will actually work. It really does shift array @ARGV and put
the current filename into variable $ARGV. It also uses filehandle ARGV internally; <> is just a synonym
for <ARGV>, which is magical. (The pseudo code above doesn't work because it treats <ARGV> as
non-magical.)
You can modify @ARGV before the first <> as long as the array ends up containing the list of filenames you
really want. Line numbers ($.) continue as if the input were one big happy file. (But see example under
eof for how to reset line numbers on each file.)
If you want to set @ARGV to your own list of files, go right ahead. This sets @ARGV to all plain text files
if no @ARGV was given:
    @ARGV = grep { -f && -T } glob('*') unless @ARGV;
You can even set them to pipe commands. For example, this automatically filters compressed arguments
through gzip:
    @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;
If you want to pass switches into your script, you can use one of the Getopts modules or put a loop on the
front like this:
    while ($_ = $ARGV[0], /^-/) {
        shift;
        last if /^--$/;
        if (/^-D(.*)/) { $debug = $1 }
        if (/^-v/)     { $verbose++  }
        # ...           # other switches
    }

    while (<>) {
        # ...           # code for each line
    }
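To see the switch loop in action without a real command line, the sketch below fills @ARGV by hand. The argument list and the variable names are assumptions for illustration. It also adds a defined() guard so the loop stops cleanly if @ARGV runs out, which the shorter form above does not do.

```perl
use strict;
use warnings;

# Assumed command line, set by hand for the example.
@ARGV = ('-v', '-Dtrace', '--', 'file.txt');
my ($debug, $verbose) = ('', 0);

while (defined($_ = $ARGV[0]) && /^-/) {
    shift @ARGV;
    last if /^--$/;                 # "--" ends switch processing
    if (/^-D(.*)/) { $debug = $1 }
    if (/^-v/)     { $verbose++  }
}

print "debug=$debug verbose=$verbose rest=@ARGV\n";
# debug=trace verbose=1 rest=file.txt
```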
The <> symbol will return undef for end−of−file only once. If you call it again after this it will assume
you are processing another @ARGV list, and if you haven‘t set @ARGV, will input from STDIN.
If the string inside the angle brackets is a reference to a scalar variable (e.g., <$foo>), then that variable
contains the name of the filehandle to input from, or its typeglob, or a reference to the same. For example:
$fh = \*STDIN;
$line = <$fh>;
If what‘s within the angle brackets is neither a filehandle nor a simple scalar variable containing a filehandle
name, typeglob, or typeglob reference, it is interpreted as a filename pattern to be globbed, and either a list of
filenames or the next filename in the list is returned, depending on context. This distinction is determined
on syntactic grounds alone. That means <$x> is always a readline from an indirect handle, but
<$hash{key}> is always a glob. That‘s because $x is a simple scalar variable, but $hash{key} is
not—it‘s a hash element.
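The syntactic distinction can be seen in a small sketch. It assumes an in-memory filehandle (available in later perls) and a glob pattern with no wildcard characters, which the glob operator is expected to hand back unchanged. All names and values are hypothetical.

```perl
use strict;
use warnings;

# Assumptions: in-memory filehandle, and a pattern without wildcards.
my $data = "from handle\n";
open(my $x, '<', \$data) or die "cannot open in-memory file: $!";
my %hash = (key => 'plain_name');

my $line  = <$x>;            # simple scalar: readline on the handle held in $x
my @names = <$hash{key}>;    # hash element: treated as glob("plain_name")

print $line;                 # from handle
print "@names\n";            # plain_name
```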
One level of double−quote interpretation is done first, but you can‘t say <$foo> because that‘s an indirect
filehandle as explained in the previous paragraph. (In older versions of Perl, programmers would insert curly
brackets to force interpretation as a filename glob: <${foo}>. These days, it‘s considered cleaner to call
the internal function directly as glob($foo), which is probably the right way to have done it in the first
place.) Example:
    while (<*.c>) {
        chmod 0644, $_;
    }
is equivalent to
    open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
    while (<FOO>) {
        chop;
        chmod 0644, $_;
    }
In fact, it's currently implemented that way. (Which means it will not work on filenames with spaces in
them unless you have csh(1) on your machine.) Of course, the shortest way to do the above is:
    chmod 0644, <*.c>;
Because globbing invokes a shell, it‘s often faster to call readdir() yourself and do your own grep()
on the filenames. Furthermore, due to its current implementation of using a shell, the glob() routine may
get "Arg list too long" errors (unless you‘ve installed tcsh(1L) as /bin/csh).
A glob evaluates its (embedded) argument only when it is starting a new list. All values must be read before
it will start over. In a list context this isn‘t important, because you automatically get them all anyway. In
scalar context, however, the operator returns the next value each time it is called, or an undef value if you've
just run out. As for filehandles, an automatic defined is generated when the glob occurs in the test part of a
while or for, because legal glob returns (e.g. a file called 0) would otherwise terminate the loop. Again,
undef is returned only once. So if you're expecting a single value from a glob, it is much better to say
    ($file) = <blurch*>;
than
    $file = <blurch*>;
because the latter will alternate between returning a filename and returning FALSE.
If you're trying to do variable interpolation, it's definitely better to use the glob() function, because the
older notation can cause people to become confused with the indirect filehandle notation.
    @files = glob("$dir/*.[ch]");
    @files = glob($files[$i]);
Constant Folding
Like C, Perl does a certain amount of expression evaluation at compile time, whenever it determines that all
arguments to an operator are static and have no side effects. In particular, string concatenation happens at
compile time between literals that don‘t do variable substitution. Backslash interpretation also happens at
compile time. You can say
    'Now is the time for all' . "\n" .
        'good men to come to.'
and this all reduces to one string internally. Likewise, if you say
    foreach $file (@filenames) {
        if (-s $file > 5 + 100 * 2**16) {  }
    }
the compiler will precompute the number that expression represents so that the interpreter won‘t have to.
Bitwise String Operators
Bitstrings of any size may be manipulated by the bitwise operators (~ | & ^).
If the operands to a binary bitwise op are strings of different sizes, or and xor ops will act as if the shorter
operand had additional zero bits on the right, while the and op will act as if the longer operand were
truncated to the length of the shorter.
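The padding and truncation rules can be checked on two ASCII strings of different lengths. This is a sketch with made-up operands; the results in the comments assume ASCII and the default string bitwise operators.

```perl
use strict;
use warnings;

my $or  = "ab" | "ABCD";   # shorter operand padded with zero bits: "abCD"
my $xor = "ab" ^ "ABCD";   # same padding, case bits flipped to spaces: "  CD"
my $and = "ab" & "ABCD";   # longer operand truncated: "AB"

print "[$or] [$xor] [$and]\n";
# [abCD] [  CD] [AB]
```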
    # ASCII-based examples
    print "j p \n" ^ " a h";
    print "JA" | "  ph\n";
    print "japh\nJunk" & '_____';
    print 'p N$' ^ " E<H\n";
    ...
    $x = Math::BigInt->new('123456789123456789');
    print $x * $x;
    # prints +15241578780673678515622620750190521
NAME
perlre − Perl regular expressions
DESCRIPTION
This page describes the syntax of regular expressions in Perl. For a description of how to use regular
expressions in matching operations, plus various examples of the same, see discussion of m//, s///, qr//
and ?? in Regexp Quote−Like Operators in perlop.
The matching operations can have various modifiers. The modifiers that relate to the interpretation of the
regular expression inside are listed below. For the modifiers that alter the way a regular expression is used
by Perl, see Regexp Quote−Like Operators in perlop and
Gory details of parsing quoted constructs in perlop.
i
Do case−insensitive pattern matching.
If use locale is in effect, the case map is taken from the current locale. See perllocale.
m
Treat string as multiple lines. That is, change "^" and "$" from matching at only the very start or end
of the string to the start or end of any line anywhere within the string.
s
Treat string as single line. That is, change "." to match any character whatsoever, even a newline,
which it normally would not match.
The /s and /m modifiers both override the $* setting. That is, no matter what $* contains, /s
without /m will force "^" to match only at the beginning of the string and "$" to match only at the end
(or just before a newline at the end) of the string. Together, as /ms, they let the "." match any character
whatsoever, while yet allowing "^" and "$" to match, respectively, just after and just before newlines
within the string.
x
Extend your pattern‘s legibility by permitting whitespace and comments.
These are usually written as "the /x modifier", even though the delimiter in question might not actually be a
slash. In fact, any of these modifiers may also be embedded within the regular expression itself using the
new (?...) construct. See below.
The /x modifier itself needs a little more explanation. It tells the regular expression parser to ignore
whitespace that is neither backslashed nor within a character class. You can use this to break up your regular
expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing
a comment, just as in ordinary Perl code. This also means that if you want real whitespace or # characters in
the pattern (outside of a character class, where they are unaffected by /x), that you‘ll either have to escape
them or encode them using octal or hex escapes. Taken together, these features go a long way towards
making Perl‘s regular expressions more readable. Note that you have to be careful not to include the pattern
delimiter in the comment—perl has no way of knowing you did not intend to close the pattern early. See the
C−comment deletion code in perlop.
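The effect of the /m, /s and /x modifiers can be seen side by side in a sketch like this one. The sample text and variable names are assumptions for the example.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree";

my @first_only = $text =~ /^(\w+)/g;    # without /m, ^ matches only at the start
my @each_line  = $text =~ /^(\w+)/mg;   # with /m, ^ also matches after each newline
my $dot_plain  = ($text =~ /one.two/)  ? 1 : 0;   # "." refuses the newline
my $dot_single = ($text =~ /one.two/s) ? 1 : 0;   # /s lets "." match it

# /x: whitespace and # comments inside the pattern are ignored.
my ($year, $month, $day) = "1998-10-18" =~ /
    (\d{4}) -     # year
    (\d\d)  -     # month
    (\d\d)        # day
/x;

print "@first_only | @each_line | $dot_plain$dot_single | $year $month $day\n";
# one | one two three | 01 | 1998 10 18
```

Note that none of the comments inside the /x pattern contains the pattern delimiter.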
Regular Expressions
The patterns used in pattern matching are regular expressions such as those supplied in the Version 8 regex
routines. (In fact, the routines are derived (distantly) from Henry Spencer‘s freely redistributable
reimplementation of the V8 routines.) See Version 8 Regular Expressions for details.
In particular the following metacharacters have their standard egrep-ish meanings:
    \   Quote the next metacharacter
    ^   Match the beginning of the line
    .   Match any character (except newline)
    $   Match the end of the line (or before newline at the end)
    |   Alternation
    ()  Grouping
    []  Character class
By default, the "^" character is guaranteed to match at only the beginning of the string, the "$" character at
only the end (or before the newline at the end) and Perl does certain optimizations with the assumption that
the string contains only one line. Embedded newlines will not be matched by "^" or "$". You may,
however, wish to treat a string as a multi−line buffer, such that the "^" will match after any newline within
the string, and "$" will match before any newline. At the cost of a little more overhead, you can do this by
using the /m modifier on the pattern match operator. (Older programs did this by setting $*, but this
practice is now deprecated.)
To facilitate multi−line substitutions, the "." character never matches a newline unless you use the /s
modifier, which in effect tells Perl to pretend the string is a single line—even if it isn‘t. The /s modifier
also overrides the setting of $*, in case you have some (badly behaved) older code that sets it in another
module.
The following standard quantifiers are recognized:
    *      Match 0 or more times
    +      Match 1 or more times
    ?      Match 1 or 0 times
    {n}    Match exactly n times
    {n,}   Match at least n times
    {n,m}  Match at least n but not more than m times
(If a curly bracket occurs in any other context, it is treated as a regular character.) The "*" modifier is
equivalent to {0,}, the "+" modifier to {1,}, and the "?" modifier to {0,1}. n and m are limited to
integral values less than 65536.
By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a
particular starting location) while still allowing the rest of the pattern to match. If you want it to match the
minimum number of times possible, follow the quantifier with a "?". Note that the meanings don‘t change,
just the "greediness":
    *?     Match 0 or more times
    +?     Match 1 or more times
    ??     Match 0 or 1 time
    {n}?   Match exactly n times
    {n,}?  Match at least n times
    {n,m}? Match at least n but not more than m times
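A short sketch of the difference between a greedy and a minimal quantifier follows. The sample string and variable names are assumptions for the example.

```perl
use strict;
use warnings;

my $markup = "<b>bold</b> and <i>italic</i>";

my ($greedy)  = $markup =~ /<(.+)>/;    # runs to the last ">" in the string
my ($minimal) = $markup =~ /<(.+?)>/;   # stops at the first ">"

print "$greedy\n";    # b>bold</b> and <i>italic</i
print "$minimal\n";   # b
```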
Because patterns are processed as double quoted strings, the following also work:
    \t      tab                   (HT, TAB)
    \n      newline               (LF, NL)
    \r      return                (CR)
    \f      form feed             (FF)
    \a      alarm (bell)          (BEL)
    \e      escape (think troff)  (ESC)
    \033    octal char (think of a PDP-11)
    \x1B    hex char
    \c[     control char
    \l      lowercase next char (think vi)
    \u      uppercase next char (think vi)
    \L      lowercase till \E (think vi)
    \U      uppercase till \E (think vi)
    \E      end case modification (think vi)
    \Q      quote (disable) pattern metacharacters till \E
If use locale is in effect, the case map used by \l, \L, \u and \U is taken from the current locale. See
perllocale.
You cannot include a literal $ or @ within a \Q sequence. An unescaped $ or @ interpolates the
corresponding variable, while escaping will cause the literal string \$ to be matched. You‘ll need to write
something like m/\Quser\E\@\Qhost/.
In addition, Perl defines the following:
    \w  Match a "word" character (alphanumeric plus "_")
    \W  Match a non-word character
    \s  Match a whitespace character
    \S  Match a non-whitespace character
    \d  Match a digit character
    \D  Match a non-digit character
A \w matches a single alphanumeric character, not a whole word. To match a word you‘d need to say \w+.
If use locale is in effect, the list of alphabetic characters generated by \w is taken from the current
locale. See perllocale. You may use \w, \W, \s, \S, \d, and \D within character classes (though not as
either end of a range).
Perl defines the following zero-width assertions:
    \b  Match a word boundary
    \B  Match a non-(word boundary)
    \A  Match only at beginning of string
    \Z  Match only at end of string, or before newline at the end
    \z  Match only at end of string
    \G  Match only where previous m//g left off (works only with /g)
A word boundary (\b) is defined as a spot between two characters that has a \w on one side of it and a \W
on the other side of it (in either order), counting the imaginary characters off the beginning and end of the
string as matching a \W. (Within character classes \b represents backspace rather than a word boundary.)
The \A and \Z are just like "^" and "$", except that they won‘t match multiple times when the /m modifier
is used, while "^" and "$" will match at every internal line boundary. To match the actual end of the string,
not ignoring newline, you can use \z. The \G assertion can be used to chain global matches (using m//g),
as described in Regexp Quote−Like Operators in perlop.
It is also useful when writing lex-like scanners, when you have several patterns that you want to match
against consecutive substrings of your string; see the previous reference. The actual location where \G will
match can also be influenced by using pos() as an lvalue. See pos.
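The following sketch exercises \Z, \z, \b and a very small \G scanner. The input strings and the token labels NUM and OP are assumptions invented for the example. The scanner relies on the /c modifier of m//g to keep the match position when one alternative fails.

```perl
use strict;
use warnings;

my $s = "foo bar\n";
my $Z = ($s =~ /bar\Z/) ? 1 : 0;   # \Z also matches before a final newline
my $z = ($s =~ /bar\z/) ? 1 : 0;   # \z insists on the very end of the string
my @cats = "cat concat cat" =~ /\bcat\b/g;   # whole words only

# A tiny lex-like scanner: \G anchors each attempt where the last one ended.
my $input = "12+34";
my @tokens;
while (1) {
    if    ($input =~ /\G(\d+)/gc)  { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G([-+])/gc) { push @tokens, "OP:$1"  }
    else                           { last }
}

print "$Z$z ", scalar(@cats), " @tokens\n";
# 10 2 NUM:12 OP:+ NUM:34
```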
When the bracketing construct ( ... ) is used, \<digit> matches the digit'th substring. Outside of the
pattern, always use "$" instead of "\" in front of the digit. (While the \<digit> notation can on rare occasion
work outside the current pattern, this should not be relied upon. See the WARNING below.) The scope of
$<digit> (and $`, $&, and $') extends to the end of the enclosing BLOCK or eval string, or to the next
successful pattern match, whichever comes first. If you want to use parentheses to delimit a subpattern (e.g.,
a set of alternatives) without saving it as a subpattern, follow the ( with a ?:.
You may have as many parentheses as you wish. If you have more than 9 substrings, the variables $10,
$11, ... refer to the corresponding substring. Within the pattern, \10, \11, etc. refer back to substrings if
there have been at least that many left parentheses before the backreference. Otherwise (for backward
compatibility) \10 is the same as \010, a backspace, and \11 the same as \011, a tab. And so on. (\1 through
\9 are always backreferences.)
$+ returns whatever the last bracket match matched. $& returns the entire matched string. ($0 used to
return the same thing, but not any more.) $` returns everything before the matched string. $' returns
everything after the matched string. Examples:
    s/^([^ ]*) *([^ ]*)/$2 $1/;     # swap first two words

    if (/Time: (..):(..):(..)/) {
        $hours = $1;
        $minutes = $2;
        $seconds = $3;
    }
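Captures, backreferences and non-capturing groups can be tried together in a sketch such as the following. The sample strings and variable names are assumptions for the example.

```perl
use strict;
use warnings;

my $stamp = "Time: 12:34:56";
my ($hours, $minutes, $seconds) = $stamp =~ /Time: (..):(..):(..)/;

# \1 inside the pattern refers back to the first group; $1 is used afterwards.
my $doubled = "";
if ("this is is a test" =~ /\b(\w+) \1\b/) {
    $doubled = $1;
}

# (?:...) groups without capturing, so the year is the first capture.
my ($year) = "Oct 1998" =~ /(?:Jan|Oct|Dec) (\d{4})/;

print "$hours $minutes $seconds $doubled $year\n";
# 12 34 56 is 1998
```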
Once perl sees that you need one of $&, $` or $' anywhere in the program, it has to provide them on each
and every pattern match. This can slow your program down. The same mechanism that handles these
provides for the use of $1, $2, etc., so you pay the same price for each pattern that contains capturing
parentheses. But if you never use $&, etc., in your script, then patterns without capturing parentheses won't
be penalized. So avoid $&, $`, and $' if you can, but if you can't (and some algorithms really appreciate
them), once you've used them once, use them at will, because you've already paid the price. As of 5.005,
$& is not so costly as the other two.
Backslashed metacharacters in Perl are alphanumeric, such as \b, \w, \n. Unlike some other regular
expression languages, there are no backslashed symbols that aren‘t alphanumeric. So anything that looks
like \\, \(, \), \<, \>, \{, or \} is always interpreted as a literal character, not a metacharacter. This was once
used in a common idiom to disable or quote the special meanings of regular expression metacharacters in a
string that you want to use for a pattern. Simply quote all non−alphanumeric characters:
$pattern =~ s/(\W)/\\$1/g;
Now it is much more common to see either the quotemeta() function or the \Q escape sequence used to
disable all metacharacters’ special meanings like this:
/$unquoted\Q$quoted\E$unquoted/
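The three quoting approaches mentioned above can be compared in a small sketch. The sample string and variable names are assumptions for the example, and \z is used as the end anchor to keep the patterns easy to read.

```perl
use strict;
use warnings;

my $literal = "a.b*c";

(my $by_hand = $literal) =~ s/(\W)/\\$1/g;   # the older idiom
my $by_func = quotemeta($literal);           # the function

my $loose  = ("aXbbc" =~ /^$literal\z/)     ? 1 : 0;   # "." and "*" act as metacharacters
my $strict = ("aXbbc" =~ /^\Q$literal\E\z/) ? 1 : 0;   # quoted: only the literal text matches
my $exact  = ("a.b*c" =~ /^\Q$literal\E\z/) ? 1 : 0;

print "$by_hand $by_func $loose$strict$exact\n";
# a\.b\*c a\.b\*c 101
```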
Perl defines a consistent extension syntax for regular expressions. The syntax is a pair of parentheses with a
question mark as the first thing within the parentheses (this was a syntax error in older versions of Perl). The
character after the question mark gives the function of the extension. Several extensions are already
supported:
(?#text)
A comment. The text is ignored. If the /x switch is used to enable whitespace formatting, a
simple # will suffice. Note that perl closes the comment as soon as it sees a ), so there is no
way to put a literal ) in the comment.
(?:pattern)
(?imsx−imsx:pattern)
This is for clustering, not capturing; it groups subexpressions like "()", but doesn‘t make
backreferences as "()" does. So
@fields = split(/\b(?:a|b|c)\b/)
is like
@fields = split(/\b(a|b|c)\b/)
but doesn‘t spit out extra fields.
The letters between ? and : act as flags modifiers, see (?imsx−imsx). In particular,
/(?s−i:more.*than).*million/i
is equivalent to more verbose
/(?:(?s−i)more.*than).*million/i
(?=pattern)
A zero−width positive lookahead assertion. For example, /\w+(?=\t)/ matches a word
followed by a tab, without including the tab in $&.
(?!pattern)
A zero−width negative lookahead assertion. For example /foo(?!bar)/ matches any
occurrence of "foo" that isn‘t followed by "bar". Note however that lookahead and
lookbehind are NOT the same thing. You cannot use this for lookbehind.
If you are looking for a "bar" that isn't preceded by a "foo", /(?!foo)bar/ will not do
what you want. That's because the (?!foo) is just saying that the next thing cannot be
"foo", and it's not, it's a "bar", so "foobar" will match. You would have to do something
like /(?!foo)...bar/ for that. We say "like" because there's the case of your "bar" not
having three characters before it. You could cover that this way:
/(?:(?!foo)...|^.{0,2})bar/. Sometimes it's still easier just to say:
    if (/bar/ && $` !~ /foo$/)
For lookbehind see below.
(?<=pattern)
A zero−width positive lookbehind assertion. For example, /(?<=\t)\w+/ matches a word
following a tab, without including the tab in $&. Works only for fixed−width lookbehind.
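The lookahead and lookbehind assertions described so far can be exercised with a sketch like this. The sample strings and variable names are assumptions for the example. Capturing parentheses are used instead of $& to show what was matched.

```perl
use strict;
use warnings;

my ($before_tab) = "foo\tbar" =~ /(\w+)(?=\t)/;     # word followed by a tab
my ($after_tab)  = "foo\tbar" =~ /((?<=\t)\w+)/;    # word following a tab

my $not_bar = ("foobaz" =~ /foo(?!bar)/) ? 1 : 0;   # "foo" not followed by "bar"
my $is_bar  = ("foobar" =~ /foo(?!bar)/) ? 1 : 0;
my $pitfall = ("foobar" =~ /(?!foo)bar/) ? 1 : 0;   # lookahead is not lookbehind

print "$before_tab $after_tab $not_bar$is_bar$pitfall\n";
# foo bar 101
```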
(?<!pattern)
...
(?{ code })
...
will set $res = 4. Note that after the match $cnt returns to the globally introduced value
0, since the scopes which restrict local statements are unwound.
This assertion may be used as (?(condition)yes−pattern|no−pattern) switch.
If not used in this way, the result of evaluation of code is put into variable $^R. This
happens immediately, so $^R can be used from other (?{ code }) assertions inside the
same regular expression.
The above assignment to $^R is properly localized, thus the old value of $^R is restored if
the assertion is backtracked (compare "Backtracking").
Due to security concerns, this construction is not allowed if the regular expression involves
run-time interpolation of variables, unless the use re 'eval' pragma is used (see re), or the
variables contain results of the qr() operator (see qr/STRING/imosx in perlop).
This restriction is due to the widespread (questionable) practice of using the construct
    $re = <>;
    chomp $re;
    $string =~ /$re/;
without tainting. While this code is frowned upon from a security point of view, when (?{})
was introduced, it was considered bad to add new security holes to existing scripts.
NOTE: Use of the above insecure snippet without also enabling taint mode is to be severely
frowned upon. use re 'eval' does not disable tainting checks, thus to allow $re in the
above snippet to contain (?{}) with tainting enabled, one needs both use re 'eval'
and untaint the $re.
(?>pattern)
An "independent" subexpression. Matches the substring that a standalone pattern would
match if anchored at the given position, and only this substring.
Say, ^(?>a*)ab will never match, since (?>a*) (anchored at the beginning of string, as
above) will match all characters a at the beginning of string, leaving no a for ab to match. In
contrast, a*ab will match the same as a+b, since the match of the subgroup a* is influenced
by the following group ab (see "Backtracking"). In particular, a* inside a*ab will match
fewer characters than a standalone a*, since this makes the tail match.
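The contrast between the two patterns can be seen with a short test. This is a sketch for illustration only; the test strings are arbitrary.

```perl
use strict;
use warnings;

for my $s ( "aaab", "ab" ) {
    # Ordinary group: a* gives back one "a" so that "ab" can match.
    my $plain = $s =~ /^a*ab/ ? "matches" : "fails";
    # Independent group: a* keeps every "a", so "ab" cannot match.
    my $independent = $s =~ /^(?>a*)ab/ ? "matches" : "fails";
    print "$s: ^a*ab $plain, ^(?>a*)ab $independent\n";
}
```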
An effect similar to (?>pattern) may be achieved by
(?=(pattern))\1
since the lookahead is in "logical" context, thus matches the same substring as a standalone
a+. The following \1 eats the matched string, thus making a zero−length assertion into an
analogue of (?>...). (The difference between these two constructs is that the second one
uses a catching group, thus shifting ordinals of backreferences in the rest of a regular
expression.)
This construct is useful for optimizations of "eternal" matches, because it will not backtrack
(see "Backtracking").
m{ \(
(
[^()]+
|
\( [^()]* \)
)+
\)
}x
That will efficiently match a nonempty group with matching two−or−less−level−deep
parentheses. However, if there is no such group, it will take virtually forever on a long string.
That‘s because there are so many different ways to split a long string into several substrings.
This is what (.+)+ is doing, and (.+)+ is similar to a subpattern of the above pattern.
Consider that the above pattern detects no−match on ((()aaaaaaaaaaaaaaaaaa in
several seconds, but that each extra letter doubles this time. This exponential performance
will make it appear that your program has hung.
However, a tiny modification of this pattern
m{ \(
(
(?> [^()]+ )
|
\( [^()]* \)
)+
\)
}x
which uses (?>...) matches exactly when the one above does (verifying this yourself
would be a productive exercise), but finishes in a fourth the time when used on a similar
string with 1000000 a's. Be aware, however, that this pattern currently triggers a warning
message under -w saying it "matches the null string many times".
On simple groups, such as the pattern (?> [^()]+ ), a comparable effect may be achieved by
negative lookahead, as in [^()]+ (?! [^()] ). This was only 4 times slower on a
string with 1000000 a's.
(?(condition)yes−pattern|no−pattern)
(?(condition)yes−pattern)
Conditional expression. (condition) should be either an integer in parentheses (which is
valid if the corresponding pair of parentheses matched), or lookahead/lookbehind/evaluate
zero−width assertion.
Say,
m{ ( \( )?
[^()]+
(?(1) \) )
}x
matches a chunk of non−parentheses, possibly included in parentheses themselves.
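To see the condition at work, the pattern can be wrapped in a small helper. This is a sketch; the name chunk_of and the sample strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by the conditional
# pattern, or undef if nothing matches.
sub chunk_of {
    my ($s) = @_;
    return $s =~ m{ ( \( )?      # optional opening parenthesis, group 1
                    [^()]+       # a run of non-parentheses
                    (?(1) \) )   # closing parenthesis only if group 1 matched
                  }x ? $& : undef;
}

for my $s ( "(abc)", "abc", "(abc" ) {
    my $chunk = chunk_of($s);
    print "$s => <$chunk>\n" if defined $chunk;
}
```

With an unbalanced "(abc" the engine gives up on the opening parenthesis and matches the bare "abc" starting one character later.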
(?imsx−imsx)
One or more embedded pattern−match modifiers. This is particularly useful for patterns that
are specified in a table somewhere, some of which want to be case sensitive, and some of
which don‘t. The case insensitive ones need to include merely (?i) at the front of the
pattern. For example:
$pattern = "foobar";
if ( /$pattern/i ) { }
# more flexible:
$pattern = "(?i)foobar";
if ( /$pattern/ ) { }
Letters after − switch modifiers off.
These modifiers are localized inside an enclosing group (if any). Say,
( (?i) blah ) \s+ \1
(assuming x modifier, and no i modifier outside of this group) will match a repeated
(including the case!) word blah in any case.
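Both points can be checked with a few lines. The following is a sketch under the stated assumptions: the pattern table and the helper name repeated_blah are invented for the example.

```perl
use strict;
use warnings;

# Patterns as they might be stored in a table; only the second one
# asks for case-insensitive matching.
my @table = ( "foobar", "(?i)foobar" );

for my $pattern (@table) {
    my $verdict = "FooBar" =~ /$pattern/ ? "matches" : "does not match";
    printf "%-11s %s FooBar\n", $pattern, $verdict;
}

# (?i) is local to the group, so the \1 outside it must repeat the
# word with exactly the same case.
sub repeated_blah {
    my ($s) = @_;
    return $s =~ / ( (?i) blah ) \s+ \1 /x ? 1 : 0;
}

print "Blah Blah: ", repeated_blah("Blah Blah"), "\n";
print "Blah blah: ", repeated_blah("Blah blah"), "\n";
```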
A question mark was chosen for this and for the new minimal−matching construct because 1) question mark
is pretty rare in older regular expressions, and 2) whenever you see one, you should stop and "question"
exactly what is going on. That‘s psychology...
Backtracking
A fundamental feature of regular expression matching involves the notion called backtracking, which is
currently used (when needed) by all regular expression quantifiers, namely *, *?, +, +?, {n,m}, and
{n,m}?.
For a regular expression to match, the entire regular expression must match, not just part of it. So if the
beginning of a pattern containing a quantifier succeeds in a way that causes later parts in the pattern to fail,
the matching engine backs up and recalculates the beginning part—that‘s why it‘s called backtracking.
Here is an example of backtracking: Let‘s say you want to find the word following "foo" in the string "Food
is on the foo table.":
$_ = "Food is on the foo table.";
if ( /\b(foo)\s+(\w+)/i ) {
print "$2 follows $1.\n";
}
When the match runs, the first part of the regular expression (\b(foo)) finds a possible match right at the
beginning of the string, and loads up $1 with "Foo". However, as soon as the matching engine sees that
there‘s no whitespace following the "Foo" that it had saved in $1, it realizes its mistake and starts over
again one character after where it had the tentative match. This time it goes all the way until the next
occurrence of "foo". The complete regular expression matches this time, and you get the expected output of
"table follows foo."
Sometimes minimal matching can help a lot. Imagine you‘d like to match everything between "foo" and
"bar". Initially, you write something like this:
$_ = "The food is under the bar in the barn.";
if ( /foo(.*)bar/ ) {
print "got <$1>\n";
}
Which perhaps unexpectedly yields:
got <d is under the bar in the >
That‘s because .* was greedy, so you get everything between the first "foo" and the last "bar". In this case,
it‘s more effective to use minimal matching to make sure you get the text between a "foo" and the first "bar"
thereafter.
if ( /foo(.*?)bar/ ) { print "got <$1>\n" }
got <d is under the >
Here's another example: let's say you'd like to match a number at the end of a string, and you also want to
keep the preceding part of the match. So you write this:
$_ = "I have 2 numbers: 53147";
if ( /(.*)(\d*)/ ) { # Wrong!
    print "Beginning is <$1>, number is <$2>.\n";
}
That won't work at all, because .* was greedy and gobbled up the whole string. As \d* can match on an
empty string the complete regular expression matched successfully.
Beginning is <I have 2 numbers: 53147>, number is <>.
Here are some variants, most of which don‘t work:
$_ = "I have 2 numbers: 53147";
@pats = qw{
(.*)(\d*)
(.*)(\d+)
(.*?)(\d*)
(.*?)(\d+)
(.*)(\d+)$
(.*?)(\d+)$
(.*)\b(\d+)$
(.*\D)(\d+)$
};
for $pat (@pats) {
    printf "%-12s ", $pat;
    if ( /$pat/ ) {
        print "<$1> <$2>\n";
    } else {
        print "FAIL\n";
    }
}
That will print out:
(.*)(\d*)    <I have 2 numbers: 53147> <>
(.*)(\d+)    <I have 2 numbers: 5314> <7>
(.*?)(\d*)   <> <>
(.*?)(\d+)   <I have > <2>
(.*)(\d+)$   <I have 2 numbers: 5314> <7>
(.*?)(\d+)$  <I have 2 numbers: > <53147>
(.*)\b(\d+)$ <I have 2 numbers: > <53147>
(.*\D)(\d+)$ <I have 2 numbers: > <53147>
As you see, this can be a bit tricky. It‘s important to realize that a regular expression is merely a set of
assertions that gives a definition of success. There may be 0, 1, or several different ways that the definition
might succeed against a particular string. And if there are multiple ways it might succeed, you need to
understand backtracking to know which variety of success you will achieve.
When using lookahead assertions and negations, this can all get even trickier. Imagine you'd like to find a
sequence of non-digits not followed by "123". You might try to write that as
$_ = "ABC123";
if ( /^\D*(?!123)/ ) { # Wrong!
    print "Yup, no 123 in $_\n";
}
But that isn't going to match; at least, not the way you're hoping. It claims that there is no 123 in the string.
Here's a clearer picture of why that pattern matches, contrary to popular expectations:
$x = 'ABC123' ;
$y = 'ABC445' ;
print "1: got $1\n" if $x =~ /^(ABC)(?!123)/ ;
print "2: got $1\n" if $y =~ /^(ABC)(?!123)/ ;
print "3: got $1\n" if $x =~ /^(\D*)(?!123)/ ;
print "4: got $1\n" if $y =~ /^(\D*)(?!123)/ ;
This prints
2: got ABC
3: got AB
4: got ABC
You might have expected test 3 to fail because it seems to be a more general purpose version of test 1. The
important difference between them is that test 3 contains a quantifier (\D*) and so can use backtracking,
whereas test 1 will not. What's happening is that you've asked "Is it true that at the start of $x, following 0
or more non-digits, you have something that's not 123?" If the pattern matcher had let \D* expand to
"ABC", this would have caused the whole pattern to fail. The search engine will initially match \D* with
"ABC". Then it will try to match (?!123) with "123", which of course fails. But because a quantifier
(\D*) has been used in the regular expression, the search engine can backtrack and retry the match
differently in the hope of matching the complete regular expression.
The pattern really, really wants to succeed, so it uses the standard pattern back−off−and−retry and lets \D*
expand to just "AB" this time. Now there‘s indeed something following "AB" that is not "123". It‘s in fact
"C123", which suffices.
We can deal with this by using both an assertion and a negation. We‘ll say that the first part in $1 must be
followed by a digit, and in fact, it must also be followed by something that‘s not "123". Remember that the
lookaheads are zero−width expressions—they only look, but don‘t consume any of the string in their match.
So rewriting this way produces what you‘d expect; that is, case 5 will fail, but case 6 succeeds:
print "5: got $1\n" if $x =~ /^(\D*)(?=\d)(?!123)/ ;
print "6: got $1\n" if $y =~ /^(\D*)(?=\d)(?!123)/ ;
6: got ABC
In other words, the two zero−width assertions next to each other work as though they‘re ANDed together,
just as you‘d use any builtin assertions: /^$/ matches only if you‘re at the beginning of the line AND the
end of the line simultaneously. The deeper underlying truth is that juxtaposition in regular expressions
always means AND, except when you write an explicit OR using the vertical bar. /ab/ means match "a"
AND (then) match "b", although the attempted matches are made at different positions because "a" is not a
zero−width assertion, but a one−width assertion.
One warning: particularly complicated regular expressions can take exponential time to solve due to the
immense number of possible ways they can use backtracking to try to match. For example, this will take a very
long time to run:
/((a{0,5}){0,5}){0,5}/
And if you used *‘s instead of limiting it to 0 through 5 matches, then it would take literally forever—or
until you ran out of stack space.
A powerful tool for optimizing such beasts is "independent" groups, which do not backtrack (see
(?>pattern)). Note also that zero-length lookahead/lookbehind assertions will not backtrack to make
the tail match, since they are in "logical" context: only whether they match or not is considered
relevant. For an example where side-effects of a lookahead might have influenced the following match, see
(?>pattern).
Version 8 Regular Expressions
In case you‘re not familiar with the "regular" Version 8 regex routines, here are the pattern−matching rules
not described above.
Any single character matches itself, unless it is a metacharacter with a special meaning described here or
above. You can cause characters that normally function as metacharacters to be interpreted literally by
prefixing them with a "\" (e.g., "\." matches a ".", not any character; "\\" matches a "\"). A series of
characters matches that series of characters in the target string, so the pattern blurfl would match "blurfl"
in the target string.
You can specify a character class, by enclosing a list of characters in [], which will match any one character
from the list. If the first character after the "[" is "^", the class matches any character not in the list. Within a
list, the "−" character is used to specify a range, so that a−z represents all characters between "a" and "z",
inclusive. If you want "−" itself to be a member of a class, put it at the start or end of the list, or escape it
with a backslash. (The following all specify the same class of three characters: [−az], [az−], and
[a\−z]. All are different from [a−z], which specifies a class containing twenty−six characters.)
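The three spellings of the class can be compared against the range with a short script. This is a sketch; the helper names only_dash_a_z and only_lowercase are made up for the example.

```perl
use strict;
use warnings;

# True if every character of $s belongs to the three-character class,
# written in each of the three equivalent ways.
sub only_dash_a_z {
    my ($s) = @_;
    return $s =~ /^[-az]+$/ && $s =~ /^[az-]+$/ && $s =~ /^[a\-z]+$/ ? 1 : 0;
}

# True if every character of $s lies in the range "a" through "z".
sub only_lowercase {
    my ($s) = @_;
    return $s =~ /^[a-z]+$/ ? 1 : 0;
}

for my $s ( "a-z", "m", "-" ) {
    printf "%-3s three-char class: %d, range: %d\n",
        $s, only_dash_a_z($s), only_lowercase($s);
}
```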
Characters may be specified using a metacharacter syntax much like that used in C: "\n" matches a newline,
"\t" a tab, "\r" a carriage return, "\f" a form feed, etc. More generally, \nnn, where nnn is a string of octal
digits, matches the character whose ASCII value is nnn. Similarly, \xnn, where nn are hexadecimal digits,
matches the character whose ASCII value is nn. The expression \cx matches the ASCII character control−x.
Finally, the "." metacharacter matches any character except "\n" (unless you use /s).
You can specify a series of alternatives for a pattern using "|" to separate them, so that fee|fie|foe will
match any of "fee", "fie", or "foe" in the target string (as would f(e|i|o)e). The first alternative includes
everything from the last pattern delimiter ("(", "[", or the beginning of the pattern) up to the first "|", and the
last alternative contains everything from the last "|" to the next pattern delimiter. For this reason, it‘s
common practice to include alternatives in parentheses, to minimize confusion about where they start and
end.
Alternatives are tried from left to right, so the first alternative found for which the entire expression matches
is the one that is chosen. This means that alternatives are not necessarily greedy. For example: when matching
foo|foot against "barefoot", only the "foo" part will match, as that is the first alternative tried, and it
successfully matches the target string. (This might not seem important, but it is important when you are
capturing matched text using parentheses.)
Also remember that "|" is interpreted as a literal within square brackets, so if you write [fee|fie|foe]
you‘re really only matching [feio|].
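Both observations are easy to confirm. The following sketch uses a made-up helper name (first_alternative) and is meant only as an illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by foo|foot.
sub first_alternative {
    my ($s) = @_;
    return $s =~ /foo|foot/ ? $& : undef;
}

# The leftmost alternative that lets the whole pattern match wins,
# even though "foot" would have matched more text.
print "matched <", first_alternative("barefoot"), ">\n";

# Inside square brackets "|" is just another character.
print "class matches a lone |\n"    if "|"   =~ /^[fee|fie|foe]$/;
print "class does not match fee\n"  if "fee" !~ /^[fee|fie|foe]$/;
```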
Within a pattern, you may designate subpatterns for later reference by enclosing them in parentheses, and
you may refer back to the nth subpattern later in the pattern using the metacharacter \n. Subpatterns are
numbered based on the left to right order of their opening parenthesis. A backreference matches whatever
actually matched the subpattern in the string being examined, not the rules for that subpattern. Therefore,
(0|0x)\d*\s\1\d* will match "0x1234 0x4321", but not "0x1234 01234", because subpattern 1 actually
matched "0x", even though the rule 0|0x could potentially match the leading 0 in the second number.
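A short check of the two strings mentioned above, as a sketch; the helper name same_prefix is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: true if two numbers separated by whitespace
# start with the same prefix ("0" or "0x").
sub same_prefix {
    my ($s) = @_;
    return $s =~ /(0|0x)\d*\s\1\d*/ ? 1 : 0;
}

for my $s ( "0x1234 0x4321", "0x1234 01234" ) {
    printf "%-14s %s\n", $s, same_prefix($s) ? "matches" : "does not match";
}
```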
WARNING on \1 vs $1
Some people get too used to writing things like:
$pattern =~ s/(\W)/\\\1/g;
This is grandfathered for the RHS of a substitute to avoid shocking the sed addicts, but it‘s a dirty habit to
get into. That‘s because in PerlThink, the righthand side of a s/// is a double−quoted string. \1 in the
usual double−quoted string means a control−A. The customary Unix meaning of \1 is kludged in for s///.
However, if you get into the habit of doing that, you get yourself into trouble if you then add an /e
modifier.
s/(\d+)/ \1 + 1 /eg; # causes warning under -w
Or if you try to do
s/(\d+)/\1000/;
You can‘t disambiguate that by saying \{1}000, whereas you can fix it with ${1}000. Basically, the
operation of interpolation should not be confused with the operation of matching a backreference. Certainly
they mean two different things on the left side of the s///.
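For comparison, here is how the two substitutions look when written with $1. This is a sketch; the helper names append_zeros and increment_numbers are invented for the example.

```perl
use strict;
use warnings;

# ${1}000 is the first captured group followed by three literal zeros.
sub append_zeros {
    my ($s) = @_;
    $s =~ s/(\d+)/${1}000/;
    return $s;
}

# With /e the right side is Perl code, so $1 is an ordinary variable.
sub increment_numbers {
    my ($s) = @_;
    $s =~ s/(\d+)/ $1 + 1 /eg;
    return $s;
}

print append_zeros("7 apples"), "\n";
print increment_numbers("a1b22"), "\n";
```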
Repeated patterns matching zero−length substring
WARNING: Difficult material (and prose) ahead. This section needs a rewrite.
Regular expressions provide a terse and powerful programming language. As with most other power tools,
power comes together with the ability to wreak havoc.
A common abuse of this power stems from the ability to make infinite loops using regular expressions, with
something as innocuous as:
’foo’ =~ m{ ( o? )* }x;
The o? can match at the beginning of ‘foo’, and since the position in the string is not moved by the match,
o? would match again and again due to the * modifier. Another common way to create a similar cycle is
with the looping modifier //g:
@matches = ( ’foo’ =~ m{ o? }xg );
or
print "match: <$&>\n" while ’foo’ =~ m{ o? }xg;
or the loop implied by split().
However, long experience has shown that many programming tasks may be significantly simplified by using
repeated subexpressions which may match zero−length substrings, with a simple example being:
@chars = split //, $string; # // is not magic in split
($whitewashed = $string) =~ s/()/ /g; # parens avoid magic s// /
Thus Perl allows the /()/ construct, which forcefully breaks the infinite loop. The rules for this are
different for lower−level loops given by the greedy modifiers *+{}, and for higher−level ones like the /g
modifier or split() operator.
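That these loops really do terminate can be checked directly. The following is a sketch; the helper name zero_length_demo is made up, and the pattern is the one used in the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper: run the two looping constructs on 'foo' and
# return what they produced.
sub zero_length_demo {
    my @chars   = split //, 'foo';            # one element per character
    my @matches = ( 'foo' =~ m{ o? }xg );     # includes zero-length matches
    return ( \@chars, \@matches );
}

my ( $chars, $matches ) = zero_length_demo();
print scalar(@$chars), " characters, ", scalar(@$matches), " matches\n";
print "match: <$_>\n" for @$matches;
```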
The lower−level loops are interrupted when it is detected that a repeated expression did match a zero−length
substring, thus
m{ (?: NON_ZERO_LENGTH | ZERO_LENGTH )* }x;
is made equivalent to
m{
(?: NON_ZERO_LENGTH )*
|
(?: ZERO_LENGTH )?
}x;
The higher-level loops preserve an additional state between iterations: whether the last match was
zero-length. To break the loop, the match following a zero-length match is prohibited from having a
length of zero. This prohibition interacts with backtracking (see "Backtracking"), and so the second best
match is chosen if the best match is of zero length.
Say,
$_ = ’bar’;
s/\w??/<$&>/g;
results in "<><b><><a><><r><>".
\&convert;
}
sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'"}
my %rules = ( '\\' => '\\',
'Y|' => qr/(?=\S)(?;
chomp $re;
$re = customre::convert $re;
/\Y|$re\Y|/;
SEE ALSO
Regexp Quote−Like Operators in perlop.
Gory details of parsing quoted constructs in perlop.
pos.
perllocale.
Mastering Regular Expressions (see perlbook) by Jeffrey Friedl.
NAME
perlrun − how to execute the Perl interpreter
SYNOPSIS
perl [ -sTuU ]
     [ -hv ] [ -V[:configvar] ]
     [ -cw ] [ -d[:debugger] ] [ -D[number/list] ]
     [ -pna ] [ -Fpattern ] [ -l[octal] ] [ -0[octal] ]
     [ -Idir ] [ -m[-]module ] [ -M[-]'module...' ]
     [ -P ]
     [ -S ]
     [ -x[dir] ]
     [ -i[extension] ]
     [ -e 'command' ] [ -- ] [ programfile ] [ argument ]...
DESCRIPTION
Upon startup, Perl looks for your script in one of the following places:
1. Specified line by line via -e switches on the command line.
2. Contained in the file specified by the first filename on the command line. (Note that systems
supporting the #! notation invoke interpreters this way. See Location of Perl.)
3. Passed in implicitly via standard input. This works only if there are no filename arguments; to pass
arguments to a STDIN script you must explicitly specify a "-" for the script name.
With methods 2 and 3, Perl starts parsing the input file from the beginning, unless you‘ve specified a −x
switch, in which case it scans for the first line starting with #! and containing the word "perl", and starts there
instead. This is useful for running a script embedded in a larger message. (In this case you would indicate
the end of the script using the __END__ token.)
The #! line is always examined for switches as the line is being parsed. Thus, if you‘re on a machine that
allows only one argument with the #! line, or worse, doesn‘t even recognize the #! line, you still can get
consistent switch behavior regardless of how Perl was invoked, even if −x was used to find the beginning of
the script.
Because many operating systems silently chop off kernel interpretation of the #! line after 32 characters,
some switches may be passed in on the command line, and some may not; you could even get a "−" without
its letter, if you‘re not careful. You probably want to make sure that all your switches fall either before or
after that 32 character boundary. Most switches don‘t actually care if they‘re processed redundantly, but
getting a − instead of a complete switch could cause Perl to try to execute standard input instead of your
script. And a partial −I switch could also cause odd results.
Some switches do care if they are processed twice, for instance combinations of −l and −0. Either put all the
switches after the 32 character boundary (if applicable), or replace the use of −0digits by BEGIN{ $/ =
"\0digits"; }.
Parsing of the #! switches starts wherever "perl" is mentioned in the line. The sequences "−*" and "− " are
specifically ignored so that you could, if you were so inclined, say
#!/bin/sh -- # -*- perl -*- -p
eval 'exec /usr/bin/perl -wS $0 ${1+"$@"}'
    if $running_under_some_shell;
to let Perl see the −p switch.
If the #! line does not contain the word "perl", the program named after the #! is executed instead of the Perl
interpreter. This is slightly bizarre, but it helps people on machines that don‘t do #!, because they can tell a
program that their SHELL is /usr/bin/perl, and Perl will then dispatch the program to the correct interpreter
for them.
After locating your script, Perl compiles the entire script to an internal form. If there are any compilation
errors, execution of the script is not attempted. (This is unlike the typical shell script, which might run
part−way through before finding a syntax error.)
If the script is syntactically correct, it is executed. If the script runs off the end without hitting an exit()
or die() operator, an implicit exit(0) is provided to indicate successful completion.
#! and quoting on non−Unix systems
Unix‘s #! technique can be simulated on other systems:
OS/2
Put
extproc perl -S -your_switches
as the first line of the *.cmd file (-S due to a bug in cmd.exe's 'extproc' handling).
MS−DOS
Create a batch file to run your script, and codify it in ALTERNATIVE_SHEBANG (see the dosish.h file
in the source distribution for more information).
Win95/NT
The Win95/NT installation, when using the Activeware port of Perl, will modify the Registry to
associate the .pl extension with the perl interpreter. If you install another port of Perl, including the
one in the Win32 directory of the Perl distribution, then you‘ll have to modify the Registry yourself.
Note that this means you can no longer tell the difference between an executable Perl program and a
Perl library file.
Macintosh
Macintosh perl scripts will have the appropriate Creator and Type, so that double−clicking them will
invoke the perl application.
Command−interpreters on non−Unix systems have rather different ideas on quoting than Unix shells. You‘ll
need to learn the special characters in your command−interpreter (*, \ and " are common) and how to
protect whitespace and these characters to run one−liners (see −e below).
On some systems, you may have to change single−quotes to double ones, which you must NOT do on Unix
or Plan9 systems. You might also have to change a single % to a %%.
For example:
# Unix
perl -e 'print "Hello world\n"'
# MS-DOS, etc.
perl -e "print \"Hello world\n\""
# Macintosh
print "Hello world\n"
(then Run "Myscript" or Shift-Command-R)
# VMS
perl -e "print ""Hello world\n"""
The problem is that none of this is reliable: it depends on the command and it is entirely possible neither
works. If 4DOS was the command shell, this would probably work better:
perl −e "print "Hello world\n""
CMD.EXE in Windows NT slipped a lot of standard Unix functionality in when nobody was looking, but
just try to find documentation for its quoting rules.
Under the Macintosh, it depends which environment you are using. The MacPerl shell, or MPW, is much
like Unix shells in its support for several quoting variants, except that it makes free use of the Macintosh‘s
non−ASCII characters as control characters.
There is no general solution to all of this. It‘s just a mess.
Location of Perl
It may seem obvious to say, but Perl is useful only when users can easily find it. When possible, it‘s good for
both /usr/bin/perl and /usr/local/bin/perl to be symlinks to the actual binary. If that can‘t be done, system
administrators are strongly encouraged to put (symlinks to) perl and its accompanying utilities, such as
perldoc, into a directory typically found along a user‘s PATH, or in another obvious and convenient place.
In this documentation, #!/usr/bin/perl on the first line of the script will stand in for whatever method
works on your system.
Switches
A single−character switch may be combined with the following switch, if any.
#!/usr/bin/perl -spi.bak # same as -s -p -i.bak
Switches include:
−0[digits]
specifies the input record separator ($/) as an octal number. If there are no digits, the null character
is the separator. Other switches may precede or follow the digits. For example, if you have a version
of find which can print filenames terminated by the null character, you can say this:
find . -name '*.bak' -print0 | perl -n0e unlink
The special value 00 will cause Perl to slurp files in paragraph mode. The value 0777 will cause Perl
to slurp files whole because there is no legal character with that value.
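Inside a script, the same three behaviours correspond to different values of $/. The sketch below is an illustration only: the helper name records is made up, it reads from an in-memory filehandle (which assumes Perl 5.8 or later), and it treats -0777 as equivalent to an undefined $/.

```perl
use strict;
use warnings;

# Hypothetical helper: read all records from a string using the
# given input record separator.
sub records {
    my ( $text, $separator ) = @_;
    local $/ = $separator;
    open( my $fh, '<', \$text ) or die "cannot open in-memory file: $!";
    my @records = <$fh>;
    close($fh);
    return @records;
}

my @by_null    = records( "a.bak\0b.bak\0", "\0" );    # like -0
my @paragraphs = records( "a\nb\n\nc\n",    "" );      # like -00
my @whole      = records( "a\nb\n\nc\n",    undef );   # like -0777
printf "%d null-terminated, %d paragraphs, %d whole file\n",
    scalar @by_null, scalar @paragraphs, scalar @whole;
```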
−a
turns on autosplit mode when used with a −n or −p. An implicit split command to the @F array is
done as the first thing inside the implicit while loop produced by the −n or −p.
perl -ane 'print pop(@F), "\n";'
is equivalent to
while (<>) {
    @F = split(' ');
    print pop(@F), "\n";
}
An alternate delimiter may be specified using −F.
−c
causes Perl to check the syntax of the script and then exit without executing it. Actually, it will
execute BEGIN, END, and use blocks, because these are considered as occurring outside the
execution of your program.
−d
runs the script under the Perl debugger. See perldebug.
−d:foo
runs the script under the control of a debugging or tracing module installed as Devel::foo. E.g.,
−d:DProf executes the script using the Devel::DProf profiler. See perldebug.
−Dletters
−Dnumber
sets debugging flags. To watch how it executes your script, use −Dtls. (This works only if
debugging is compiled into your Perl.) Another nice value is −Dx, which lists your compiled syntax
tree. And −Dr displays compiled regular expressions. As an alternative, specify a number instead of
list of letters (e.g., −D14 is equivalent to −Dtls):
    1  p  Tokenizing and parsing
    2  s  Stack snapshots
    4  l  Context (loop) stack processing
    8  t  Trace execution
   16  o  Method and overloading resolution
   32  c  String/numeric conversions
   64  P  Print preprocessor command for -P
  128  m  Memory allocation
  256  f  Format processing
  512  r  Regular expression parsing and execution
 1024  x  Syntax tree dump
 2048  u  Tainting checks
 4096  L  Memory leaks (needs -DLEAKTEST when compiling Perl)
 8192  H  Hash dump -- usurps values()
16384  X  Scratchpad allocation
32768  D  Cleaning up
65536  S  Thread synchronization
All these flags require −DDEBUGGING when you compile the Perl executable. This flag is
automatically set if you include −g option when Configure asks you about optimizer/debugger
flags.
−e commandline
may be used to enter one line of script. If −e is given, Perl will not look for a script filename in the
argument list. Multiple −e commands may be given to build up a multi−line script. Make sure to use
semicolons where you would in a normal program.
−Fpattern
specifies the pattern to split on if -a is also in effect. The pattern may be surrounded by //, "", or
'', otherwise it will be put in single quotes.
−h
prints a summary of the options.
−i[extension]
specifies that files processed by the <> construct are to be edited in−place. It does this by renaming
the input file, opening the output file by the original name, and selecting that output file as the default
for print() statements. The extension, if supplied, is used to modify the name of the old file to
make a backup copy, following these rules:
If no extension is supplied, no backup is made and the current file is overwritten.
If the extension doesn‘t contain a * then it is appended to the end of the current filename as a suffix.
If the extension does contain one or more * characters, then each * is replaced with the current
filename. In perl terms you could think of this as:
($backup = $extension) =~ s/\*/$file_name/g;
This allows you to add a prefix to the backup file, instead of (or in addition to) a suffix:
$ perl -pi'bak_*' -e 's/bar/baz/' fileA # backup to 'bak_fileA'
Or even to place backup copies of the original files into another directory (provided the directory
already exists):
$ perl -pi'old/*.bak' -e 's/bar/baz/' fileA # backup to 'old/fileA.bak'
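The naming rules above can be written as one small function. This is a sketch of the rules as described here, not Perl's actual implementation; the name backup_name is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the backup file name that -i would use
# for $file_name, or return nothing if no backup is made.
sub backup_name {
    my ( $file_name, $extension ) = @_;
    return if !defined $extension || $extension eq '';    # no backup at all
    return $file_name . $extension if $extension !~ /\*/; # plain suffix
    ( my $backup = $extension ) =~ s/\*/$file_name/g;     # each * is the name
    return $backup;
}

for my $ext ( '.bak', 'bak_*', 'old/*.bak' ) {
    printf "%-10s -> %s\n", $ext, backup_name( 'fileA', $ext );
}
```

An extension of '*' yields the original name itself, which is why -pi'*' simply overwrites the current file.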
These sets of one−liners are equivalent:
$ perl -pi -e 's/bar/baz/' fileA         # overwrite current file
$ perl -pi'*' -e 's/bar/baz/' fileA      # overwrite current file

$ perl -pi'.bak' -e 's/bar/baz/' fileA   # backup to 'fileA.bak'
$ perl -pi'*.bak' -e 's/bar/baz/' fileA  # backup to 'fileA.bak'
From the shell, saying
$ perl -p -i.bak -e "s/foo/bar/; ... "
is the same as using the script:
#!/usr/bin/perl -pi.bak
s/foo/bar/;
which is equivalent to
#!/usr/bin/perl
$extension = '.bak';
while (<>) {
    if ($ARGV ne $oldargv) {
        if ($extension !~ /\*/) {
            $backup = $ARGV . $extension;
        }
        else {
            ($backup = $extension) =~ s/\*/$ARGV/g;
        }
        rename($ARGV, $backup);
        open(ARGVOUT, ">$ARGV");
        select(ARGVOUT);
        $oldargv = $ARGV;
    }
    s/foo/bar/;
}
continue {
    print; # this prints to original filename
}
select(STDOUT);
except that the −i form doesn‘t need to compare $ARGV to $oldargv to know when the filename
has changed. It does, however, use ARGVOUT for the selected filehandle. Note that STDOUT is
restored as the default output filehandle after the loop.
As shown above, Perl creates the backup file whether or not any output is actually changed. So this
is just a fancy way to copy files:
$ perl -p -i'/some/file/path/*' -e 1 file1 file2 file3...
or
$ perl -p -i'.bak' -e 1 file1 file2 file3...
You can use eof without parentheses to locate the end of each input file, in case you want to append
to each file, or reset line numbering (see example in eof).
If, for a given file, Perl is unable to create the backup file as specified in the extension then it will
skip that file and continue on with the next one (if it exists).
For a discussion of issues surrounding file permissions and −i, see
Why does Perl let me delete read−only files? Why does −i clobber protected files? Isn‘t this a bug in Perl?.
You cannot use −i to create directories or to strip extensions from files.
Perl does not expand ~, so don‘t do that.
Finally, note that the −i switch does not impede execution when no files are given on the command
line. In this case, no backup is made (the original file cannot, of course, be determined) and
processing proceeds from STDIN to STDOUT as might be expected.
−Idirectory
Directories specified by -I are prepended to the search path for modules (@INC), and also tell the C
preprocessor where to search for include files. The C preprocessor is invoked with -P; by default it
searches /usr/include and /usr/lib/perl.
−l[octnum]
enables automatic line−ending processing. It has two effects: first, it automatically chomps "$/"
(the input record separator) when used with −n or −p, and second, it assigns "$\" (the output record
separator) to have the value of octnum so that any print statements will have that separator added
back on. If octnum is omitted, sets "$\" to the current value of "$/". For instance, to trim lines to
80 columns:
perl −lpe ’substr($_, 80) = ""’
Note that the assignment $\ = $/ is done when the switch is processed, so the input record
separator can be different than the output record separator if the −l switch is followed by a −0 switch:
gnufind / −print0 | perl −ln0e ’print "found $_" if −p’
This sets $\ to newline and then sets $/ to the null character.
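The order of effects can be spelled out in plain Perl. What follows is only a sketch of what the two switches do when processed left to right, not the interpreter's own code; the names $ors and $irs are made up for the illustration.

```perl
use strict;

my ($ors, $irs);
{
    local $/ = "\n";    # the usual default input record separator
    local $\;           # output record separator starts out undefined

    $\ = $/;            # effect of -l: copy the current $/ into $\
    $/ = "\0";          # effect of a later -0: $/ becomes the null character

    ($ors, $irs) = ($\, $/);
}
print "output separator is a newline\n" if $ors eq "\n";
print "input separator is a null\n"     if $irs eq "\0";
```

Reversing the two assignments would leave both separators set to the null character, which is why the position of −l relative to −0 matters.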
−m[−]module
−M[−]module
−M[−]‘module ...’
−[mM][−]module=arg[,arg]...
−mmodule executes use module (); before executing your script.
−Mmodule executes use module ; before executing your script. You can use quotes to add extra
code after the module name, e.g., −M‘module qw(foo bar)’.
If the first character after the −M or −m is a dash (−) then the ‘use’ is replaced with ‘no’.
A little builtin syntactic sugar means you can also say −mmodule=foo,bar or
−Mmodule=foo,bar as a shortcut for −M‘module qw(foo bar)’. This avoids the need to
use quotes when importing symbols. The actual code generated by −Mmodule=foo,bar is use
module split(/,/,q{foo,bar}). Note that the = form removes the distinction between −m
and −M.
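As a rough illustration of the = form, the generated use statement can be written out by hand. The sketch below assumes the standard POSIX module is installed and picks floor and ceil purely as example import names.

```perl
use strict;

# Hand-written equivalent of what -MPOSIX=floor,ceil is described as generating.
use POSIX split(/,/, q{floor,ceil});

# The text after "=" is simply split on commas to form the import list.
my @imports = split(/,/, q{floor,ceil});
print "imported: @imports\n";

my $rounded = floor(3.7);    # floor() is available because it was imported
print "floor(3.7) is $rounded\n";
```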
−n
causes Perl to assume the following loop around your script, which makes it iterate over filename
arguments somewhat like sed −n or awk:
while (<>) {
    ...             # your script goes here
}
Note that the lines are not printed by default. See −p to have lines printed. If a file named by an
argument cannot be opened for some reason, Perl warns you about it, and moves on to the next file.
Here is an efficient way to delete all files older than a week:
find . −mtime +7 −print | perl −nle ’unlink;’
This is faster than using the −exec switch of find because you don‘t have to start a process on every
filename found.
BEGIN and END blocks may be used to capture control before or after the implicit loop, just as in
awk.
−p
causes Perl to assume the following loop around your script, which makes it iterate over filename
arguments somewhat like sed:
while (<>) {
    ...             # your script goes here
} continue {
print or die "−p destination: $!\n";
}
If a file named by an argument cannot be opened for some reason, Perl warns you about it, and moves
on to the next file. Note that the lines are printed automatically. An error occurring during printing is
treated as fatal. To suppress printing use the −n switch. A −p overrides a −n switch.
BEGIN and END blocks may be used to capture control before or after the implicit loop, just as in
awk.
−P
causes your script to be run through the C preprocessor before compilation by Perl. (Because both
comments and cpp directives begin with the # character, you should avoid starting comments with
any words recognized by the C preprocessor such as "if", "else", or "define".)
−s
enables some rudimentary switch parsing for switches on the command line after the script name but
before any filename arguments (or before a −−). Any switch found there is removed from @ARGV
and sets the corresponding variable in the Perl script. The following script prints "true" if and only if
the script is invoked with a −xyz switch.
#!/usr/bin/perl −s
if ($xyz) { print "true\n"; }
−S
makes Perl use the PATH environment variable to search for the script (unless the name of the script
contains directory separators). On some platforms, this also makes Perl append suffixes to the
filename while searching for it. For example, on Win32 platforms, the ".bat" and ".cmd" suffixes are
appended if a lookup for the original name fails, and if the name does not already end in one of those
suffixes. If your Perl was compiled with DEBUGGING turned on, using the −Dp switch to Perl
shows how the search progresses.
If the filename supplied contains directory separators (i.e. it is an absolute or relative pathname), and
if the file is not found, platforms that append file extensions will do so and try to look for the file with
those extensions added, one by one.
On DOS−like platforms, if the script does not contain directory separators, it will first be searched for
in the current directory before being searched for on the PATH. On Unix platforms, the script will be
searched for strictly on the PATH.
Typically this is used to emulate #! startup on platforms that don‘t support #!. This example works
on many platforms that have a shell compatible with Bourne shell:
#!/usr/bin/perl
eval ’exec /usr/bin/perl −wS $0 ${1+"$@"}’
if $running_under_some_shell;
The system ignores the first line and feeds the script to /bin/sh, which proceeds to try to execute the
Perl script as a shell script. The shell executes the second line as a normal shell command, and thus
starts up the Perl interpreter. On some systems $0 doesn‘t always contain the full pathname, so the
−S tells Perl to search for the script if necessary. After Perl locates the script, it parses the lines and
ignores them because the variable $running_under_some_shell is never true. If the script
will be interpreted by csh, you will need to replace ${1+"$@"} with $*, even though that doesn‘t
understand embedded spaces (and such) in the argument list. To start up sh rather than csh, some
systems may have to replace the #! line with a line containing just a colon, which will be politely
ignored by Perl. Other systems can‘t control that, and need a totally devious construct that will work
under any of csh, sh, or Perl, such as the following:
eval ’(exit $?0)’ && eval ’exec /usr/bin/perl −wS $0 ${1+"$@"}’
& eval ’exec /usr/bin/perl −wS $0 $argv:q’
if $running_under_some_shell;
−T
forces "taint" checks to be turned on so you can test them. Ordinarily these checks are done only
when running setuid or setgid. It‘s a good idea to turn them on explicitly for programs run on
another‘s behalf, such as CGI programs. See perlsec. Note that (for security reasons) this option
must be seen by Perl quite early; usually this means it must appear early on the command line or in
the #! line (for systems which support that).
−u
causes Perl to dump core after compiling your script. You can then in theory take this core dump and
turn it into an executable file by using the undump program (not supplied). This speeds startup at
the expense of some disk space (which you can minimize by stripping the executable). (Still, a "hello
world" executable comes out to about 200K on my machine.) If you want to execute a portion of
your script before dumping, use the dump() operator instead. Note: availability of undump is
platform specific and may not be available for a specific port of Perl. It has been superseded by the
new perl−to−C compiler, which is more portable, even though it‘s still only considered beta.
−U
allows Perl to do unsafe operations. Currently the only "unsafe" operations are the unlinking of
directories while running as superuser, and running setuid programs with fatal taint checks turned
into warnings. Note that the −w switch (or the $^W variable) must be used along with this option to
actually generate the taint−check warnings.
−v
prints the version and patchlevel of your Perl executable.
−V
prints summary of the major perl configuration values and the current value of @INC.
−V:name
Prints to STDOUT the value of the named configuration variable.
−w
prints warnings about variable names that are mentioned only once, and scalar variables that are used
before being set. Also warns about redefined subroutines, and references to undefined filehandles or
filehandles opened read−only that you are attempting to write on. Also warns you if you use values
as numbers that don't look like numbers, if you use an array as though it were a scalar, if your
subroutines recurse more than 100 deep, and innumerable other things.
You can disable specific warnings using __WARN__ hooks, as described in perlvar and warn. See
also perldiag and perltrap.
−x directory
tells Perl that the script is embedded in a message. Leading garbage will be discarded until the first
line that starts with #! and contains the string "perl". Any meaningful switches on that line will be
applied. If a directory name is specified, Perl will switch to that directory before running the script.
The −x switch controls only the disposal of leading garbage. The script must be terminated with
__END__ if there is trailing garbage to be ignored (the script can process any or all of the trailing
garbage via the DATA filehandle if desired).
ENVIRONMENT
HOME
Used if chdir has no argument.
LOGDIR
Used if chdir has no argument and HOME is not set.
PATH
Used in executing subprocesses, and in finding the script if −S is used.
PERL5LIB
A colon−separated list of directories in which to look for Perl library files before looking
in the standard library and the current directory. If PERL5LIB is not defined, PERLLIB is
used. When running taint checks (because the script was running setuid or setgid, or the
−T switch was used), neither variable is used. The script should instead say
use lib "/my/directory";
PERL5OPT
Command−line options (switches). Switches in this variable are taken as if they were on
every Perl command line. Only the −[DIMUdmw] switches are allowed. When running
taint checks (because the script was running setuid or setgid, or the −T switch was used),
this variable is ignored.
PERLLIB
A colon−separated list of directories in which to look for Perl library files before looking
in the standard library and the current directory. If PERL5LIB is defined, PERLLIB is not
used.
PERL5DB
The command used to load the debugger code. The default is:
BEGIN { require ’perl5db.pl’ }
PERL5SHELL (specific to WIN32 port)
May be set to an alternative shell that perl must use internally for executing "backtick"
commands or system(). Default is cmd.exe /x/c on WindowsNT and
command.com /c on Windows95. The value is considered to be space delimited.
Precede any character that needs to be protected (like a space or backslash) with a
backslash.
Note that Perl doesn‘t use COMSPEC for this purpose because COMSPEC has a high
degree of variability among users, leading to portability concerns. Besides, perl can use a
shell that may not be fit for interactive use, and setting COMSPEC to such a shell may
interfere with the proper functioning of other programs (which usually look in COMSPEC
to find a shell fit for interactive use).
PERL_DEBUG_MSTATS
Relevant only if perl is compiled with the malloc included with the perl distribution (that
is, if perl −V:d_mymalloc is ‘define’). If set, this causes memory statistics to be
dumped after execution. If set to an integer greater than one, also causes memory statistics
to be dumped after compilation.
PERL_DESTRUCT_LEVEL
Relevant only if your perl executable was built with −DDEBUGGING, this controls the
behavior of global destruction of objects and other references.
Perl also has environment variables that control how Perl handles data specific to particular natural
languages. See perllocale.
Apart from these, Perl uses no other environment variables, except to make them available to the script being
executed, and to child processes. However, scripts running setuid would do well to execute the following
lines before doing anything else, just to keep people honest:
$ENV{PATH} = '/bin:/usr/bin';    # or whatever you need
$ENV{SHELL} = ’/bin/sh’ if exists $ENV{SHELL};
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
NAME
perlfunc − Perl builtin functions
DESCRIPTION
The functions in this section can serve as terms in an expression. They fall into two major categories: list
operators and named unary operators. These differ in their precedence relationship with a following comma.
(See the precedence table in perlop.) List operators take more than one argument, while unary operators can
never take more than one argument. Thus, a comma terminates the argument of a unary operator, but merely
separates the arguments of a list operator. A unary operator generally provides a scalar context to its
argument, while a list operator may provide either scalar or list contexts for its arguments. If it does both,
the scalar arguments will be first, and the list argument will follow. (Note that there can only ever be one
such list argument.) For instance, splice() has three scalar arguments followed by a list.
In the syntax descriptions that follow, list operators that expect a list (and provide list context for the
elements of the list) are shown with LIST as an argument. Such a list may consist of any combination of
scalar arguments or list values; the list values will be included in the list as if each individual element were
interpolated at that point in the list, forming a longer single−dimensional list value. Elements of the LIST
should be separated by commas.
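The flattening of a LIST can be seen in a small sketch like the following, where the variable names are invented for the example:

```perl
use strict;

my @inner = (2, 3);
my @outer = (1, @inner, (4, 5), 6);   # arrays and nested lists are interpolated in place
my $count = @outer;                   # the result is one flat, single-dimensional list

print "@outer\n";
print "$count elements\n";
```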
Any function in the list below may be used either with or without parentheses around its arguments. (The
syntax descriptions omit the parentheses.) If you use the parentheses, the simple (but occasionally
surprising) rule is this: It LOOKS like a function, therefore it IS a function, and precedence doesn‘t matter.
Otherwise it‘s a list operator or unary operator, and precedence does matter. And whitespace between the
function and left parenthesis doesn‘t count—so you need to be careful sometimes:
print 1+2+4;        # Prints 7.
print(1+2) + 4;     # Prints 3.
print (1+2)+4;      # Also prints 3!
print +(1+2)+4;     # Prints 7.
print ((1+2)+4);    # Prints 7.
If you run Perl with the −w switch it can warn you about this. For example, the third line above produces:
print (...) interpreted as function at − line 1.
Useless use of integer addition in void context at − line 1.
For functions that can be used in either a scalar or list context, nonabortive failure is generally indicated in a
scalar context by returning the undefined value, and in a list context by returning the null list.
Remember the following important rule: There is no rule that relates the behavior of an expression in list
context to its behavior in scalar context, or vice versa. It might do two totally different things. Each operator
and function decides which sort of value it would be most appropriate to return in a scalar context. Some
operators return the length of the list that would have been returned in list context. Some operators return the
first value in the list. Some operators return the last value in the list. Some operators return a count of
successful operations. In general, they do what you want, unless you want consistency.
A named array in scalar context is quite different from what would at first glance appear to be a list in
scalar context. You can‘t get a list like (1,2,3) into being in scalar context, because the compiler knows
the context at compile time. It would generate the scalar comma operator there, not the list construction
version of the comma. That means it was never a list to start with.
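A short sketch may make the difference concrete. The variable names are illustrative only; under −w the second assignment also draws a warning about a useless constant in void context.

```perl
use strict;

my @array = (4, 5, 6);
my $from_array = @array;        # named array in scalar context: the element count
my $from_comma = (4, 5, 6);     # scalar comma operator: the last value

print "array gives $from_array, comma gives $from_comma\n";
```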
In general, functions in Perl that serve as wrappers for system calls of the same name (like chown(2), fork(2),
closedir(2), etc.) all return true when they succeed and undef otherwise, as is usually mentioned in the
descriptions below. This is different from the C interfaces, which return −1 on failure. Exceptions to this
rule are wait(), waitpid(), and syscall(). System calls also set the special $! variable on failure.
Other functions do not, except accidentally.
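The usual idiom that follows from this is to test the return value and only then consult $!. Here is a minimal sketch; the path is a made-up one that is assumed not to exist on your system.

```perl
use strict;

my $missing = "/nonexistent/directory/for/illustration";   # hypothetical path

my $ok      = chdir $missing;   # false on failure, rather than C's -1
my $errno   = $! + 0;           # $! in numeric context: the errno value
my $message = "$!";             # $! in string context: the error text

print "chdir failed: $message\n" unless $ok;
```

Note that $! is meaningful only immediately after a failure, so it is copied before anything else is called.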
Perl Functions by Category
Here are Perl‘s functions (including things that look like functions, like some keywords and named
operators) arranged by category. Some functions appear in more than one place.
Functions for SCALARs or strings
chomp, chop, chr, crypt, hex, index, lc, lcfirst, length, oct, ord, pack,
q/STRING/, qq/STRING/, reverse, rindex, sprintf, substr, tr///, uc, ucfirst,
y///
Regular expressions and pattern matching
m//, pos, quotemeta, s///, split, study, qr//
Numeric functions
abs, atan2, cos, exp, hex, int, log, oct, rand, sin, sqrt, srand
Functions for real @ARRAYs
pop, push, shift, splice, unshift
Functions for list data
grep, join, map, qw/STRING/, reverse, sort, unpack
Functions for real %HASHes
delete, each, exists, keys, values
Input and output functions
binmode, close, closedir, dbmclose, dbmopen, die, eof, fileno, flock, format,
getc, print, printf, read, readdir, rewinddir, seek, seekdir, select, syscall,
sysread, sysseek, syswrite, tell, telldir, truncate, warn, write
Functions for fixed length data or records
pack, read, syscall, sysread, syswrite, unpack, vec
Functions for filehandles, files, or directories
−X, chdir, chmod, chown, chroot, fcntl, glob, ioctl, link, lstat, mkdir, open,
opendir, readlink, rename, rmdir, stat, symlink, umask, unlink, utime
Keywords related to the control flow of your perl program
caller, continue, die, do, dump, eval, exit, goto, last, next, redo, return, sub,
wantarray
Keywords related to scoping
caller, import, local, my, package, use
Miscellaneous functions
defined, dump, eval, formline, local, my, reset, scalar, undef, wantarray
Functions for processes and process groups
alarm, exec, fork, getpgrp, getppid, getpriority, kill, pipe, qx/STRING/,
setpgrp, setpriority, sleep, system, times, wait, waitpid
Keywords related to perl modules
do, import, no, package, require, use
Keywords related to classes and object−orientedness
bless, dbmclose, dbmopen, package, ref, tie, tied, untie, use
Low−level socket functions
accept, bind, connect, getpeername, getsockname, getsockopt, listen, recv,
send, setsockopt, shutdown, socket, socketpair
System V interprocess communication functions
msgctl, msgget, msgrcv, msgsnd, semctl, semget, semop, shmctl, shmget, shmread,
shmwrite
Fetching user and group info
endgrent, endhostent, endnetent, endpwent, getgrent, getgrgid, getgrnam,
getlogin, getpwent, getpwnam, getpwuid, setgrent, setpwent
Fetching network info
endprotoent, endservent, gethostbyaddr, gethostbyname, gethostent,
getnetbyaddr, getnetbyname, getnetent, getprotobyname, getprotobynumber,
getprotoent, getservbyname, getservbyport, getservent, sethostent,
setnetent, setprotoent, setservent
Time−related functions
gmtime, localtime, time, times
Functions new in perl5
abs, bless, chomp, chr, exists, formline, glob, import, lc, lcfirst, map, my, no,
prototype, qx, qw, readline, readpipe, ref, sub*, sysopen, tie, tied, uc, ucfirst,
untie, use
* − sub was a keyword in perl4, but in perl5 it is an operator, which can be used in expressions.
Functions obsoleted in perl5
dbmclose, dbmopen
Alphabetical Listing of Perl Functions
−X FILEHANDLE
−X EXPR
−X
A file test, where X is one of the letters listed below. This unary operator takes one argument,
either a filename or a filehandle, and tests the associated file to see if something is true about it.
If the argument is omitted, tests $_, except for −t, which tests STDIN. Unless otherwise
documented, it returns 1 for TRUE and ‘’ for FALSE, or the undefined value if the file doesn‘t
exist. Despite the funny names, precedence is the same as any other named unary operator, and
the argument may be parenthesized like any other unary operator. The operator may be any of:
−r  File is readable by effective uid/gid.
−w  File is writable by effective uid/gid.
−x  File is executable by effective uid/gid.
−o  File is owned by effective uid.

−R  File is readable by real uid/gid.
−W  File is writable by real uid/gid.
−X  File is executable by real uid/gid.
−O  File is owned by real uid.

−e  File exists.
−z  File has zero size.
−s  File has nonzero size (returns size).

−f  File is a plain file.
−d  File is a directory.
−l  File is a symbolic link.
−p  File is a named pipe (FIFO), or Filehandle is a pipe.
−S  File is a socket.
−b  File is a block special file.
−c  File is a character special file.
−t  Filehandle is opened to a tty.

−u  File has setuid bit set.
−g  File has setgid bit set.
−k  File has sticky bit set.

−T  File is a text file.
−B  File is a binary file (opposite of −T).

−M  Age of file in days when script started.
−A  Same for access time.
−C  Same for inode change time.
The interpretation of the file permission operators −r, −R, −w, −W, −x, and −X is based solely on
the mode of the file and the uids and gids of the user. There may be other reasons you can‘t
actually read, write, or execute the file, such as AFS access control lists. Also note that, for the
superuser, −r, −R, −w, and −W always return 1, and −x and −X return 1 if any execute bit is set
in the mode. Scripts run by the superuser may thus need to do a stat() to determine the actual
mode of the file, or temporarily set the uid to something else.
Example:
while (<>) {
    chop;
    next unless -f $_;      # ignore specials
    #...
}
Note that −s/a/b/ does not do a negated substitution. Saying −exp($foo) still works as
expected, however—only single letters following a minus are interpreted as file tests.
The −T and −B switches work as follows. The first block or so of the file is examined for odd
characters such as strange control codes or characters with the high bit set. If too many strange
characters (>30%) are found, it‘s a −B file, otherwise it‘s a −T file. Also, any file containing
null in the first block is considered a binary file. If −T or −B is used on a filehandle, the current
stdio buffer is examined rather than the first block. Both −T and −B return TRUE on a null file,
or a file at EOF when testing a filehandle. Because you have to read a file to do the −T test, on
most occasions you want to use a −f against the file first, as in next unless −f $file
&& −T $file.
If any of the file tests (or either the stat() or lstat() operators) are given the special
filehandle consisting of a solitary underline, then the stat structure of the previous file test (or stat
operator) is used, saving a system call. (This doesn‘t work with −t, and you need to remember
that lstat() and −l will leave values in the stat structure for the symbolic link, not the real
file.) Example:
print "Can do.\n" if −r $a || −w _ || −x _;
stat($filename);
print "Readable\n" if −r _;
print "Writable\n" if −w _;
print "Executable\n" if −x _;
print "Setuid\n" if −u _;
print "Setgid\n" if −g _;
print "Sticky\n" if −k _;
print "Text\n" if −T _;
print "Binary\n" if −B _;
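For a version that can be run as it stands, here is a small sketch that tests the current directory, which is assumed to exist and to be accessible. Only the first test performs a stat; the others reuse its result through the underline filehandle.

```perl
use strict;

my $path = ".";               # the current directory, assumed to exist

my $is_dir   = -d $path;      # performs the stat
my $is_plain = -f _;          # reuses the stat structure from the previous test
my $exists   = -e _;          # no further system call needed

print "$path is a directory\n"    if $is_dir;
print "$path is a plain file\n"   if $is_plain;
```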
abs VALUE
abs
Returns the absolute value of its argument. If VALUE is omitted, uses $_.
accept NEWSOCKET,GENERICSOCKET
Accepts an incoming socket connect, just as the accept(2) system call does. Returns the packed
address if it succeeded, FALSE otherwise. See example in
Sockets: Client/Server Communication in perlipc.
alarm SECONDS
alarm
Arranges to have a SIGALRM delivered to this process after the specified number of seconds
have elapsed. If SECONDS is not specified, the value stored in $_ is used. (On some machines,
unfortunately, the elapsed time may be up to one second less than you specified because of how
seconds are counted.) Only one timer may be counting at once. Each call disables the previous
timer, and an argument of 0 may be supplied to cancel the previous timer without starting a new
one. The returned value is the amount of time remaining on the previous timer.
For delays of finer granularity than one second, you may use Perl's syscall() interface to
access setitimer(2) if your system supports it, or else see select(). It is usually a mistake to
intermix alarm() and sleep() calls.
If you want to use alarm() to time out a system call you need to use an eval()/die() pair.
You can‘t rely on the alarm causing the system call to fail with $! set to EINTR because Perl
sets up signal handlers to restart system calls on some systems. Using eval()/die() always
works, modulo the caveats given in Signals in perlipc.
eval {
local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required
alarm $timeout;
$nread = sysread SOCKET, $buffer, $size;
alarm 0;
};
if ($@) {
    die unless $@ eq "alarm\n";     # propagate unexpected errors
    # timed out
}
else {
    # didn't
}
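The fragment above depends on a socket and a few variables set elsewhere. A self-contained variant might look like the sketch below, which assumes a platform where alarm() is implemented and uses a four-argument select() as a stand-in for the slow system call (select() rather than sleep(), for the reason given earlier). The $timed_out flag is invented for the example.

```perl
use strict;

my $timed_out = 0;
eval {
    local $SIG{ALRM} = sub { die "alarm\n" };   # NB: \n required
    alarm 1;                                    # one-second timeout
    select(undef, undef, undef, 5);             # stand-in for a slow system call
    alarm 0;                                    # cancel the timer if we get here
};
if ($@) {
    die unless $@ eq "alarm\n";                 # propagate unexpected errors
    $timed_out = 1;
}
print $timed_out ? "timed out\n" : "finished in time\n";
```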
atan2 Y,X
Returns the arctangent of Y/X in the range −PI to PI.
For the tangent operation, you may use the POSIX::tan() function, or use the familiar
relation:
sub tan { sin($_[0]) / cos($_[0]) }
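As a quick check of both functions, the following sketch derives PI from atan2() and feeds it back through tan(). The tolerances involved are those of ordinary floating point, so the results are close to, not exactly, the values in the comments.

```perl
use strict;

sub tan { sin($_[0]) / cos($_[0]) }

my $pi      = 4 * atan2(1, 1);     # atan2(1, 1) is PI/4
my $tangent = tan($pi / 4);        # very close to 1
my $angle   = atan2(-1, -1);       # third quadrant: close to -3*PI/4

printf "pi is about %.6f\n", $pi;
```

Because atan2() takes Y and X separately, it can tell the third quadrant from the first, which a plain arctangent of Y/X could not.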
bind SOCKET,NAME
Binds a network address to a socket, just as the bind system call does. Returns TRUE if it
succeeded, FALSE otherwise. NAME should be a packed address of the appropriate type for the
socket. See the examples in Sockets: Client/Server Communication in perlipc.
binmode FILEHANDLE
Arranges for the file to be read or written in "binary" mode in operating systems that distinguish
between binary and text files. Files that are not in binary mode have CR LF sequences translated
to LF on input and LF translated to CR LF on output. Binmode has no effect under Unix; in
MS−DOS and similarly archaic systems, it may be imperative—otherwise your
MS−DOS−damaged C library may mangle your file. The key distinction between systems that
need binmode() and those that don‘t is their text file formats. Systems like Unix, MacOS, and
Plan9 that delimit lines with a single character, and that encode that character in C as "\n", do
not need binmode(). The rest need it. If FILEHANDLE is an expression, the value is taken
as the name of the filehandle.
bless REF,CLASSNAME
bless REF
This function tells the thingy referenced by REF that it is now an object in the CLASSNAME
package—or the current package if no CLASSNAME is specified, which is often the case. It
returns the reference for convenience, because a bless() is often the last thing in a
constructor. Always use the two−argument version if the function doing the blessing might be
inherited by a derived class. See perltoot and perlobj for more about the blessing (and blessings)
of objects.
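Here is a minimal sketch of why the two-argument form matters. The class names Animal and Dog and the name field are hypothetical, chosen only for the illustration.

```perl
use strict;

package Animal;
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} };
    return bless $self, $class;     # two-argument form: honors the invoking class
}
sub name { $_[0]->{name} }

package Dog;
@Dog::ISA = ('Animal');             # Dog inherits new() from Animal

package main;
my $pet = Dog->new(name => 'Rex');
print ref($pet), "\n";              # the class the object was blessed into
```

Had new() used the one-argument form, the object would have been blessed into Animal, the package in which new() was compiled, even when called as Dog->new.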
caller EXPR
caller
Returns the context of the current subroutine call. In scalar context, returns the caller‘s package
name if there is a caller, that is, if we‘re in a subroutine or eval() or require(), and the
undefined value otherwise. In list context, returns
($package, $filename, $line) = caller;
With EXPR, it returns some extra information that the debugger uses to print a stack trace. The
value of EXPR indicates how many call frames to go back before the current one.
($package, $filename, $line, $subroutine,
$hasargs, $wantarray, $evaltext, $is_require) = caller($i);
Here $subroutine may be "(eval)" if the frame is not a subroutine call, but an eval().
In such a case additional elements $evaltext and $is_require are set: $is_require is
true if the frame is created by a require or use statement, $evaltext contains the text of
the eval EXPR statement. In particular, for an eval BLOCK statement, $filename is
"(eval)", but $evaltext is undefined. (Note also that each use statement creates a
require frame inside an eval EXPR frame.)
Furthermore, when called from within the DB package, caller returns more detailed information:
it sets the list variable @DB::args to be the arguments with which the subroutine was invoked.
Be aware that the optimizer might have optimized call frames away before caller() had a
chance to get the information. That means that caller(N) might not return information about
the call frame you expect it to, for N > 1. In particular, @DB::args might have information
from the previous time caller() was called.
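A small sketch of both forms follows. The subroutine name whoami is made up, and the code assumes it is run from package main.

```perl
use strict;

sub whoami {
    my ($package, $filename, $line) = caller;   # who called whoami()
    my $subroutine = (caller(0))[3];            # name of the current frame's sub
    return ($package, $subroutine);
}

my ($pkg, $sub) = whoami();
print "called from package $pkg, inside $sub\n";
```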
chdir EXPR
Changes the working directory to EXPR, if possible. If EXPR is omitted, changes to home
directory. Returns TRUE upon success, FALSE otherwise. See example under die().
chmod LIST
Changes the permissions of a list of files. The first element of the list must be the numerical
mode, which should probably be an octal number, and which definitely should not be a string of
octal digits: 0644 is okay, '0644' is not. Returns the number of files successfully changed.
See also oct, if all you have is a string.
$cnt = chmod 0755, 'foo', 'bar';
chmod 0755, @executables;
$mode = '0644'; chmod $mode, 'foo';      # !!! sets mode to
                                         # --w----r-T
$mode = '0644'; chmod oct($mode), 'foo'; # this is better
$mode = 0644;   chmod $mode, 'foo';      # this is best
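The reason a string of octal digits goes wrong can be seen without touching any files. In this sketch (variable names are illustrative) the string is converted once with oct() and once by ordinary numeric conversion:

```perl
use strict;

my $string   = '0644';
my $as_octal = oct($string);             # 420 decimal, the same value as 0644
my $wrong    = sprintf("%o", $string);   # the string numifies to decimal 644

print "oct() gives mode ", sprintf("%o", $as_octal), "\n";
print "plain numeric conversion gives mode $wrong\n";
```

Decimal 644 is octal 1204, which is where the surprising sticky bit and odd permissions come from.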
chomp VARIABLE
chomp LIST
chomp
This is a slightly safer version of chop. It removes any line ending that corresponds to the
current value of $/ (also known as $INPUT_RECORD_SEPARATOR in the English module).
It returns the total number of characters removed from all its arguments. It‘s often used to
remove the newline from the end of an input record when you‘re worried that the final record
may be missing its newline. When in paragraph mode ($/ = ""), it removes all trailing
newlines from the string. If VARIABLE is omitted, it chomps $_. Example:
while (<>) {
chomp; # avoid \n on last field
@array = split(/:/);
# ...
}
You can actually chomp anything that‘s an lvalue, including an assignment:
chomp($cwd = `pwd`);
chomp($answer = <STDIN>);
If you chomp a list, each element is chomped, and the total number of characters removed is
returned.
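The return values are easy to check with a sketch along these lines. The data is invented for the example.

```perl
use strict;

my $line    = "field1:field2\n";
my $removed = chomp($line);          # 1: one newline removed
my $again   = chomp($line);          # 0: nothing left to remove

my @records = ("a\n", "b", "c\n");
my $total   = chomp(@records);       # 2: total removed across the list

my $para = "text\n\n\n";
{
    local $/ = "";                   # paragraph mode
    chomp($para);                    # strips all trailing newlines
}
print "line=$line total=$total para=$para\n";
```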
chop VARIABLE
chop LIST
chop
Chops off the last character of a string and returns the character chopped. It‘s used primarily to
remove the newline from the end of an input record, but is much more efficient than s/\n//
because it neither scans nor copies the string. If VARIABLE is omitted, chops $_. Example:
while (<>) {
    chop;       # avoid \n on last field
    @array = split(/:/);
    #...
}
You can actually chop anything that‘s an lvalue, including an assignment:
chop($cwd = `pwd`);
chop($answer = <STDIN>);
If you chop a list, each element is chopped. Only the value of the last chop() is returned.
Note that chop() returns the last character. To return all but the last character, use
substr($string, 0, −1).
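The difference between the two can be shown in a few lines. This is only a sketch with invented data:

```perl
use strict;

my $string = "hello!";
my $rest   = substr($string, 0, -1);   # "hello"; $string itself is unchanged
my $last   = chop($string);            # "!"; $string is now "hello"

my @words  = ("ab", "cd");
my $final  = chop(@words);             # "d": the value of the last chop only

print "rest=$rest last=$last words=@words final=$final\n";
```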
chown LIST
Changes the owner (and group) of a list of files. The first two elements of the list must be the
NUMERICAL uid and gid, in that order. Returns the number of files successfully changed.
$cnt = chown $uid, $gid, ’foo’, ’bar’;
chown $uid, $gid, @filenames;
Here‘s an example that looks up nonnumeric uids in the passwd file:
print "User: ";
chop($user = <STDIN>);
print "Files: ";
chop($pattern = <STDIN>);
($login,$pass,$uid,$gid) = getpwnam($user)
    or die "$user not in passwd file";
@ary = glob($pattern);      # expand filenames
chown $uid, $gid, @ary;
On most systems, you are not allowed to change the ownership of the file unless you‘re the
superuser, although you should be able to change the group to any of your secondary groups. On
insecure systems, these restrictions may be relaxed, but this is not a portable assumption.
chr NUMBER
chr
Returns the character represented by that NUMBER in the character set. For example, chr(65)
is "A" in ASCII. For the reverse, use ord.
If NUMBER is omitted, uses $_.
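A brief round trip, assuming an ASCII-based platform:

```perl
use strict;

my $char = chr(65);          # "A" in ASCII
my $code = ord($char);       # back to 65

my $word = "";
for (72, 105) {
    $word .= chr;            # no argument, so chr uses $_
}
print "$char $code $word\n";
```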
chroot FILENAME
chroot
This function works like the system call by the same name: it makes the named directory the new
root directory for all further pathnames that begin with a "/" by your process and all its
children. (It doesn't change your current working directory.) For security
reasons, this call is restricted to the superuser. If FILENAME is omitted, does a chroot() to
$_.
close FILEHANDLE
close
Closes the file or pipe associated with the file handle, returning TRUE only if stdio successfully
flushes buffers and closes the system file descriptor. Closes the currently selected filehandle if
the argument is omitted.
You don‘t have to close FILEHANDLE if you are immediately going to do another open() on
it, because open() will close it for you. (See open().) However, an explicit close() on an
input file resets the line counter ($.), while the implicit close done by open() does not.
If the file handle came from a piped open, close() will additionally return FALSE if one of the
other system calls involved fails or if the program exits with non-zero status. (If the only
problem was that the program exited non-zero, $! will be set to 0.) Also, closing a pipe waits
for the process executing on the pipe to complete, in case you want to look at the output of the
pipe afterwards. Closing a pipe explicitly also puts the exit status value of the command into
$?.
Example:
open(OUTPUT, '|sort >foo')  # pipe to sort
    or die "Can't start sort: $!";
#...                        # print stuff to output
close OUTPUT                # wait for sort to finish
    or warn $! ? "Error closing sort pipe: $!"
               : "Exit status $? from sort";
open(INPUT, 'foo')          # get sort's results
    or die "Can't open 'foo' for input: $!";
FILEHANDLE may be an expression whose value can be used as an indirect filehandle, usually
the real filehandle name.
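The way close() reports a child's exit status on a piped open can be sketched as follows. This is only an illustration: it assumes a Unix-like system and a Perl recent enough to support the list form of a piped open and lexical filehandles, and it uses the running interpreter ($^X) as a stand-in child program.

```perl
use strict;
use warnings;

# Run a child perl that prints one line and exits with status 2.
# $^X is the path of the running perl interpreter.
open(my $pipe, '-|', $^X, '-e', 'print "hello\n"; exit 2')
    or die "Can't start child: $!";
my $line = <$pipe>;

my $exit_status;
unless (close $pipe) {
    # $! is non-zero for a system-level failure; otherwise the
    # child's exit status is in the high byte of $?.
    die "Error closing pipe: $!" if $!;
    $exit_status = $? >> 8;
}

print "child said: $line";
print "child exited with status $exit_status\n" if defined $exit_status;
```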
closedir DIRHANDLE
Closes a directory opened by opendir() and returns the success of that system call.
DIRHANDLE may be an expression whose value can be used as an indirect dirhandle, usually
the real dirhandle name.
connect SOCKET,NAME
Attempts to connect to a remote socket, just as the connect system call does. Returns TRUE if it
succeeded, FALSE otherwise. NAME should be a packed address of the appropriate type for the
socket. See the examples in Sockets: Client/Server Communication in perlipc.
continue BLOCK
Actually a flow control statement rather than a function. If there is a continue BLOCK
attached to a BLOCK (typically in a while or foreach), it is always executed just before the
conditional is about to be evaluated again, just like the third part of a for loop in C. Thus it can
be used to increment a loop variable, even when the loop has been continued via the next
statement (which is similar to the C continue statement).
last, next, or redo may appear within a continue block. last and redo will behave as
if they had been executed within the main block. So will next, but since it will execute a
continue block, it may be more entertaining.
while (EXPR) {
    ### redo always comes here
    do_something;
} continue {
    ### next always comes here
    do_something_else;
    # then back to the top to re-check EXPR
}
### last always comes here
Omitting the continue section is semantically equivalent to using an empty one, logically
enough. In that case, next goes directly back to check the condition at the top of the loop.
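A runnable sketch of the same control flow, with made-up variable names: the loop counter is incremented in the continue block, so it still advances when next skips the rest of the body.

```perl
use strict;
use warnings;

my $i = 0;
my @odd;
while ($i < 6) {
    next if $i % 2 == 0;   # skip even numbers; the continue block still runs
    push @odd, $i;
} continue {
    $i++;                  # runs after every iteration, even after next
}

print "@odd\n";
```

Had the increment been placed at the end of the main block instead, the first next would have skipped it and the loop would never terminate.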
cos EXPR
Returns the cosine of EXPR (expressed in radians). If EXPR is omitted, takes cosine of $_.
For the inverse cosine operation, you may use the POSIX::acos() function, or use this
relation:
sub acos { atan2( sqrt(1 - $_[0] * $_[0]), $_[0] ) }
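As a quick check of that relation, the following sketch recovers pi from acos(-1) and confirms that acos() undoes cos() for an angle of 1 radian. It is illustrative only and uses the atan2-based definition rather than POSIX::acos().

```perl
use strict;
use warnings;

# Inverse cosine built from atan2(), as in the relation above.
sub acos { atan2( sqrt(1 - $_[0] * $_[0]), $_[0] ) }

my $pi         = acos(-1);       # atan2(0, -1)
my $round_trip = acos(cos(1));   # should come back as 1 radian

printf "%.6f %.6f\n", $pi, $round_trip;
```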
crypt PLAINTEXT,SALT
Encrypts a string exactly like the crypt(3) function in the C library (assuming that you actually
have a version there that has not been extirpated as a potential munition). This can prove useful
for checking the password file for lousy passwords, amongst other things. Only the guys
wearing white hats should do this.
Note that crypt() is intended to be a one-way function, much like breaking eggs to make an
omelette. There is no (known) corresponding decrypt function. As a result, this function isn't all
that useful for cryptography. (For that, see your nearby CPAN mirror.)
Here's an example that makes sure that whoever runs this program knows their own password:
$pwd = (getpwuid($<))[1];
$salt = substr($pwd, 0, 2);
system "stty -echo";
print "Password: ";
chop($word = <STDIN>);
print "\n";
system "stty echo";
if (crypt($word, $salt) ne $pwd) {
    die "Sorry...\n";
} else {
    print "ok\n";
}
Of course, typing in your own password to whoever asks you for it is unwise.
dbmclose HASH
[This function has been superseded by the untie() function.]
Breaks the binding between a DBM file and a hash.
dbmopen HASH,DBNAME,MODE
[This function has been superseded by the tie() function.]
This binds a dbm(3), ndbm(3), sdbm(3), gdbm(3), or Berkeley DB file to a hash. HASH is the
name of the hash. (Unlike normal open(), the first argument is NOT a filehandle, even though
it looks like one). DBNAME is the name of the database (without the .dir or .pag extension if
any). If the database does not exist, it is created with protection specified by MODE (as
modified by the umask()). If your system supports only the older DBM functions, you may
perform only one dbmopen() in your program. In older versions of Perl, if your system had
neither DBM nor ndbm, calling dbmopen() produced a fatal error; it now falls back to
sdbm(3).
If you don't have write access to the DBM file, you can only read hash variables, not set them.
If you want to test whether you can write, either use file tests or try setting a dummy hash entry
inside an eval(), which will trap the error.
Note that functions such as keys() and values() may return huge lists when used on large
DBM files. You may prefer to use the each() function to iterate over large DBM files.
Example:
# print out history file offsets
dbmopen(%HIST,'/usr/lib/news/history',0666);
while (($key,$val) = each %HIST) {
    print $key, ' = ', unpack('L',$val), "\n";
}
dbmclose(%HIST);
See also AnyDBM_File for a more general description of the pros and cons of the various dbm
approaches, as well as DB_File for a particularly rich implementation.
defined EXPR
defined
Returns a Boolean value telling whether EXPR has a value other than the undefined value
undef. If EXPR is not present, $_ will be checked.
Many operations return undef to indicate failure, end of file, system error, uninitialized
variable, and other exceptional conditions. This function allows you to distinguish undef from
other values. (A simple Boolean test will not distinguish among undef, zero, the empty string,
and "0", which are all equally false.) Note that since undef is a valid scalar, its presence
doesn't necessarily indicate an exceptional condition: pop() returns undef when its argument
is an empty array, or when the element to return happens to be undef.
You may also use defined() to check whether a subroutine exists, by saying defined
&func without parentheses. On the other hand, use of defined() upon aggregates (hashes
and arrays) is not guaranteed to produce intuitive results, and should probably be avoided.
When used on a hash element, it tells you whether the value is defined, not whether the key
exists in the hash. Use exists() for the latter purpose.
Examples:
print if defined $switch{'D'};
print "$val\n" while defined($val = pop(@ary));
die "Can't readlink $sym: $!"
    unless defined($value = readlink $sym);
sub foo { defined &$bar ? &$bar(@_) : die "No bar"; }
$debugging = 0 unless defined $debugging;
Note: Many folks tend to overuse defined(), and then are surprised to discover that the
number 0 and "" (the zero-length string) are, in fact, defined values. For example, if you say
"ab" =~ /a(.*)b/;
The pattern match succeeds, and $1 is defined, despite the fact that it matched "nothing". But it
didn't really match nothing; rather, it matched something that happened to be 0 characters long.
This is all very above-board and honest. When a function returns an undefined value, it's an
admission that it couldn't give you an honest answer. So you should use defined() only
when you're questioning the integrity of what you're trying to do. At other times, a simple
comparison to 0 or "" is what you want.
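The distinction between false and undefined can be seen in a short sketch like the one below (variable names are illustrative):

```perl
use strict;
use warnings;

# 0, "" and "0" are all false, yet all defined; only undef is not.
my @report;
for my $value (0, "", "0", undef) {
    push @report, defined($value) ? "defined" : "undefined";
}

# A successful match that captures nothing still defines $1.
my $captured_length;
if ("ab" =~ /a(.*)b/) {
    $captured_length = length $1;
}

print "@report\n";
print "captured $captured_length characters\n";
```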
Currently, using defined() on an entire array or hash reports whether memory for that
aggregate has ever been allocated. So an array you set to the empty list appears undefined
initially, and one that once was full and that you then set to the empty list still appears defined.
You should instead use a simple test for size:
if (@an_array) { print "has array elements\n" }
if (%a_hash)   { print "has hash members\n"   }
Using undef() on these, however, does clear their memory and then report them as not defined
anymore, but you shouldn't do that unless you don't plan to use them again, because it saves
time when you load them up again to have memory already ready to be filled. The normal way
to free up space used by an aggregate is to assign the empty list.
This counterintuitive behavior of defined() on aggregates may be changed, fixed, or broken
in a future release of Perl.
See also undef(), exists(), ref().
delete EXPR
Deletes the specified key(s) and their associated values from a hash. For each key, returns the
deleted value associated with that key, or the undefined value if there was no such key. Deleting
from $ENV{} modifies the environment. Deleting from a hash tied to a DBM file deletes the
entry from the DBM file. (But deleting from a tie()d hash doesn't necessarily return
anything.)
The following deletes all the values of a hash:
foreach $key (keys %HASH) {
delete $HASH{$key};
}
And so does this:
delete @HASH{keys %HASH}
(But both of these are slower than just assigning the empty list, or using undef().) Note that
the EXPR can be arbitrarily complicated as long as the final operation is a hash element lookup
or hash slice:
delete $ref->[$x][$y]{$key};
delete @{$ref->[$x][$y]}{$key1, $key2, @morekeys};
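The return values described above can be checked with a small sketch. The hash and its keys are invented for the example.

```perl
use strict;
use warnings;

my %stock = (apples => 3, pears => 5, plums => 7, figs => 9);

my $removed = delete $stock{apples};            # value of the deleted key
my @removed = delete @stock{qw(pears plums)};   # slice: list of deleted values
my $missing = delete $stock{kiwis};             # no such key: undef

my @left = sort keys %stock;
print "removed $removed and @removed; left: @left\n";
```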
die LIST
Outside an eval(), prints the value of LIST to STDERR and exits with the current value of $!
(errno). If $! is 0, exits with the value of ($? >> 8) (backtick `command` status). If ($?
>> 8) is 0, exits with 255. Inside an eval(), the error message is stuffed into $@ and the
eval() is terminated with the undefined value. This makes die() the way to raise an
exception.
Equivalent examples:
die "Can't cd to spool: $!\n" unless chdir '/usr/spool/news';
chdir '/usr/spool/news' or die "Can't cd to spool: $!\n"
If the value of EXPR does not end in a newline, the current script line number and input line
number (if any) are also printed, and a newline is supplied. Hint: sometimes appending ",
stopped" to your message will cause it to make better sense when the string "at foo line
123" is appended. Suppose you are running script "canasta".
die "/etc/games is no good";
die "/etc/games is no good, stopped";
produce, respectively
/etc/games is no good at canasta line 123.
/etc/games is no good, stopped at canasta line 123.
See also exit() and warn().
If LIST is empty and $@ already contains a value (typically from a previous eval) that value is
reused after appending "\t...propagated". This is useful for propagating exceptions:
eval { ... };
die unless $@ =~ /Expected exception/;
If $@ is empty then the string "Died" is used.
You can arrange for a callback to be run just before the die() does its deed, by setting the
$SIG{__DIE__} hook. The associated handler will be called with the error text and can
change the error message, if it sees fit, by calling die() again. See $SIG{expr} in perlvar for details
on setting %SIG entries, and "eval BLOCK" for some examples.
Note that the $SIG{__DIE__} hook is called even inside eval()ed blocks/strings. If one
wants the hook to do nothing in such situations, put
die @_ if $^S;
as the first line of the handler (see $^S in perlvar).
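The following sketch pulls together three behaviours described in this entry: the location suffix added when the message lacks a newline, the verbatim message when it has one, and a __DIE__ hook that rewrites the message by calling die() again. The message text and the "[fatal]" prefix are made up for illustration.

```perl
use strict;
use warnings;

# Without a trailing newline, die() appends the script name and line.
eval { die "out of cheese" };
my $with_location = $@;

# With a trailing newline, the message is left exactly as written.
eval { die "out of cheese\n" };
my $verbatim = $@;

# A __DIE__ hook may rewrite the message by calling die() again.
my $rewritten;
{
    local $SIG{__DIE__} = sub { die "[fatal] $_[0]" };
    eval { die "out of cheese\n" };
    $rewritten = $@;
}

print $with_location, $verbatim, $rewritten;
```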
do BLOCK
Not really a function. Returns the value of the last command in the sequence of commands
indicated by BLOCK. When modified by a loop modifier, executes the BLOCK once before
testing the loop condition. (On other statements the loop modifiers test the conditional first.)
do SUBROUTINE(LIST)
A deprecated form of subroutine call. See perlsub.
do EXPR
Uses the value of EXPR as a filename and executes the contents of the file as a Perl script. Its
primary use is to include subroutines from a Perl subroutine library.
do 'stat.pl';
is just like
scalar eval `cat stat.pl`;
except that it's more efficient and concise, keeps track of the current filename for error
messages, and searches all the -I libraries if the file isn't in the current directory (see also the
@INC array in Predefined Names). It also differs in that code evaluated with do
FILENAME cannot see lexicals in the enclosing scope, as eval STRING can. It's the same,
however, in that it does reparse the file every time you call it, so you probably don't want to do
this inside a loop.
If do cannot read the file, it returns undef and sets $! to the error. If do can read the file but
cannot compile it, it returns undef and sets an error message in $@. If the file is successfully
compiled, do returns the value of the last expression evaluated.
Note that inclusion of library modules is better done with the use() and require()
operators, which also do automatic error checking and raise an exception if there's a problem.
You might like to use do to read in a program configuration file. Manual error checking can be
done this way:
# read in config files: system first, then user
for $file ("/share/prog/defaults.rc",
           "$ENV{HOME}/.someprogrc") {
    unless ($return = do $file) {
        warn "couldn't parse $file: $@" if $@;
        warn "couldn't do $file: $!"    unless defined $return;
        warn "couldn't run $file"       unless $return;
    }
}
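For a self-contained illustration of reading a configuration file with do, the sketch below writes a throwaway file and reads it back. It assumes a modern Perl where the File::Temp module is available (it was not part of the 5.005 distribution); the file contents and key names are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);   # assumption: available on modern Perls

# Write a throwaway configuration file whose last expression is a
# hash reference. The leading + keeps Perl from parsing { } as a block.
my ($fh, $path) = tempfile(SUFFIX => '.rc', UNLINK => 1);
print $fh "+{ name => 'demo', level => 3 };\n";
close $fh or die "couldn't close $path: $!";

# do FILE returns the value of the last expression in the file.
my $config = do $path;
die "couldn't parse $path: $@" if $@;
die "couldn't do $path: $!"    unless defined $config;

print "$config->{name} runs at level $config->{level}\n";
```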
dump LABEL
This causes an immediate core dump. Primarily this is so that you can use the undump program
to turn your core dump into an executable binary after having initialized all your variables at the
beginning of the program. When the new binary is executed it will begin by executing a goto
LABEL (with all the restrictions that goto suffers). Think of it as a goto with an intervening
core dump and reincarnation. If LABEL is omitted, restarts the program from the top.
WARNING: Any files opened at the time of the dump will NOT be open any more when the
program is reincarnated, with possible resulting confusion on the part of Perl. See also the -u option
in perlrun.
Example:
#!/usr/bin/perl
require 'getopt.pl';
require 'stat.pl';
%days = (
    'Sun' => 1,
    'Mon' => 2,
    'Tue' => 3,
    'Wed' => 4,
    'Thu' => 5,
    'Fri' => 6,
    'Sat' => 7,
);
dump QUICKSTART if $ARGV[0] eq '-d';
QUICKSTART:
Getopt('f');
This operator is largely obsolete, partly because it's very hard to convert a core file into an
executable, and because the real perl-to-C compiler has superseded it.
each HASH
When called in list context, returns a 2-element list consisting of the key and value for the next
element of a hash, so that you can iterate over it. When called in scalar context, returns the key
for only the "next" element in the hash. (Note: Keys may be "0" or "", which are logically
false; you may wish to avoid constructs like while ($k = each %foo) {} for this
reason.)
Entries are returned in an apparently random order. When the hash is entirely read, a null array
is returned in list context (which when assigned produces a FALSE () value), and undef in
scalar context. The next call to each() after that will start iterating again. There is a single
iterator for each hash, shared by all each(), keys(), and values() function calls in the
program; it can be reset by reading all the elements from the hash, or by evaluating keys HASH
or values HASH. If you add or delete elements of a hash while you're iterating over it, you
may get entries skipped or duplicated, so don't.
The following prints out your environment like the printenv(1) program, only in a different
order:
while (($key,$value) = each %ENV) {
print "$key=$value\n";
}
See also keys() and values().
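The warning about false keys in scalar context can be illustrated with a hash that deliberately contains "0" and "" as keys. This is a sketch with invented data; wrapping the call in defined() is one common way to make the scalar-context loop safe.

```perl
use strict;
use warnings;

my %count = ("0" => 'zero', "" => 'empty', "x" => 'ex');

# Testing the key for truth would stop early at "0" or "";
# testing for definedness visits every entry.
my @seen;
while (defined(my $key = each %count)) {
    push @seen, $key;
}

# List context returns key and value together.
my %copy;
while (my ($key, $value) = each %count) {
    $copy{$key} = $value;
}

printf "%d keys seen, %d pairs copied\n", scalar(@seen), scalar(keys %copy);
```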
eof FILEHANDLE
eof ()
eof
Returns 1 if the next read on FILEHANDLE will return end of file, or if FILEHANDLE is not
open. FILEHANDLE may be an expression whose value gives the real filehandle. (Note that
this function actually reads a character and then ungetc()s it, so isn't very useful in an
interactive context.) Do not read from a terminal file (or call eof(FILEHANDLE) on it) after
end-of-file is reached. Filetypes such as terminals may lose the end-of-file condition if you
do.
An eof without an argument uses the last file read as argument. Using eof() with empty
parentheses is very different. It indicates the pseudo file formed of the files listed on the
command line, i.e., eof() is reasonable to use inside a while (<>) loop to detect the end of
only the last file. Use eof(ARGV) or eof without the parentheses to test EACH file in a while
(<>) loop. Examples:
# reset line numbering on each input file
while (<>) {
    next if /^\s*#/;        # skip comments
    print "$.\t$_";
} continue {
    close ARGV if eof;      # Not eof()!
}
# insert dashes just before last line of last file
while (<>) {
    if (eof()) {            # check for end of last file
        print "--------------\n";
        close(ARGV);        # close or break; is needed if we
                            # are reading from the terminal
    }
    print;
}
Practical hint: you almost never need to use eof in Perl, because the input operators return false
values when they run out of data, or if there was an error.
eval EXPR
eval BLOCK
In the first form, the return value of EXPR is parsed and executed as if it were a little Perl
program. The value of the expression (which is itself determined within scalar context) is first
parsed, and if there weren't any errors, executed in the context of the current Perl program, so
that any variable settings or subroutine and format definitions remain afterwards. Note that the
value is parsed every time the eval executes. If EXPR is omitted, evaluates $_. This form is
typically used to delay parsing and subsequent execution of the text of EXPR until run time.
In the second form, the code within the BLOCK is parsed only once—at the same time the code
surrounding the eval itself was parsed—and executed within the context of the current Perl
program. This form is typically used to trap exceptions more efficiently than the first (see
below), while also providing the benefit of checking the code within BLOCK at compile time.
The final semicolon, if any, may be omitted from the value of EXPR or within the BLOCK.
In both forms, the value returned is the value of the last expression evaluated inside the
mini-program; a return statement may be also used, just as with subroutines. The expression
providing the return value is evaluated in void, scalar, or list context, depending on the context of
the eval itself. See wantarray() for more on how the evaluation context can be determined.
If there is a syntax error or runtime error, or a die() statement is executed, an undefined value
is returned by eval(), and $@ is set to the error message. If there was no error, $@ is
guaranteed to be a null string. Beware that using eval() neither silences perl from printing
warnings to STDERR, nor does it stuff the text of warning messages into $@. To do either of
those, you have to use the $SIG{__WARN__} facility. See warn() and perlvar.
Note that, because eval() traps otherwise-fatal errors, it is useful for determining whether a
particular feature (such as socket() or symlink()) is implemented. It is also Perl's
exception trapping mechanism, where the die operator is used to raise exceptions.
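A feature test of this kind might look like the sketch below, which probes for symlink(). The call is given empty arguments so that it cannot create anything; it only raises an exception where the function is not implemented.

```perl
use strict;
use warnings;

# symlink("", "") cannot succeed, but it only dies if the function
# is unimplemented on this platform. The trailing 1 is the eval's
# value when no exception was raised.
my $symlink_exists = eval { symlink("", ""); 1 };

print $symlink_exists
    ? "symlink is implemented\n"
    : "no symlink here: $@";
```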
If the code to be executed doesn't vary, you may use the eval-BLOCK form to trap run-time
errors without incurring the penalty of recompiling each time. The error, if any, is still returned
in $@. Examples:
# make divide-by-zero nonfatal
eval { $answer = $a / $b; }; warn $@ if $@;
# same thing, but less efficient
eval '$answer = $a / $b'; warn $@ if $@;
# a compile-time error
eval { $answer = };                 # WRONG
# a run-time error
eval '$answer =';                   # sets $@
When using the eval{} form as an exception trap in libraries, you may wish not to trigger any
__DIE__ hooks that user code may have installed. You can use the local
$SIG{__DIE__} construct for this purpose, as shown in this example:
# a very private exception trap for divide-by-zero
eval { local $SIG{'__DIE__'}; $answer = $a / $b; };
warn $@ if $@;
This is especially significant, given that __DIE__ hooks can call die() again, which has the
effect of changing their error messages:
# __DIE__ hooks may modify error messages
{
    local $SIG{'__DIE__'} =
        sub { (my $x = $_[0]) =~ s/foo/bar/g; die $x };
    eval { die "foo lives here" };
    print $@ if $@;                 # prints "bar lives here"
}
With an eval(), you should be especially careful to remember what's being looked at when:
eval $x;            # CASE 1
eval "$x";          # CASE 2
eval '$x';          # CASE 3
eval { $x };        # CASE 4
eval "\$$x++";      # CASE 5
$$x++;              # CASE 6
Cases 1 and 2 above behave identically: they run the code contained in the variable $x.
(Although case 2 has misleading double quotes making the reader wonder what else might be
happening (nothing is).) Cases 3 and 4 likewise behave in the same way: they run the code
'$x', which does nothing but return the value of $x. (Case 4 is preferred for purely visual
reasons, but it also has the advantage of compiling at compile-time instead of at run-time.)
Case 5 is a place where normally you WOULD like to use double quotes, except that in this
particular situation, you can just use symbolic references instead, as in case 6.
exec LIST
exec PROGRAM LIST
The exec() function executes a system command AND NEVER RETURNS; use system()
instead of exec() if you want it to return. It fails and returns FALSE only if the command does
not exist and it is executed directly instead of via your system's command shell (see below).
Since it's a common mistake to use exec() instead of system(), Perl warns you if there is a
following statement which isn't die(), warn(), or exit() (if -w is set, but you always
do that). If you really want to follow an exec() with some other statement, you can use one of
these styles to avoid the warning:
exec ('foo')   or print STDERR "couldn't exec foo: $!";
{ exec ('foo') }; print STDERR "couldn't exec foo: $!";
If there is more than one argument in LIST, or if LIST is an array with more than one value, calls
execvp(3) with the arguments in LIST. If there is only one scalar argument or an array with one
element in it, the argument is checked for shell metacharacters, and if there are any, the entire
argument is passed to the system's command shell for parsing (this is /bin/sh -c on Unix
platforms, but varies on other platforms). If there are no shell metacharacters in the argument, it
is split into words and passed directly to execvp(), which is more efficient. Note: exec()
and system() do not flush your output buffer, so you may need to set $| to avoid lost output.
Examples:
exec '/bin/echo', 'Your arguments are: ', @ARGV;
exec "sort $outfile | uniq";
If you don't really want to execute the first argument, but want to lie to the program you are
executing about its own name, you can specify the program you actually want to run as an
"indirect object" (without a comma) in front of the LIST. (This always forces interpretation of
the LIST as a multivalued list, even if there is only a single scalar in the list.) Example:
$shell = '/bin/csh';
exec $shell '-sh';          # pretend it's a login shell
or, more directly,
exec {'/bin/csh'} '-sh';    # pretend it's a login shell
When the arguments get executed via the system shell, results will be subject to its quirks and
capabilities. See `STRING` in perlop for details.
Using an indirect object with exec() or system() is also more secure. This usage forces
interpretation of the arguments as a multivalued list, even if the list had just one argument. That
way you're safe from the shell expanding wildcards or splitting up words with whitespace in
them.
@args = ( "echo surprise" );
system @args;               # subject to shell escapes
                            # if @args == 1
system { $args[0] } @args;  # safe even with one-arg list
The first version, the one without the indirect object, ran the echo program, passing it
"surprise" as an argument. The second version didn't; it tried to run a program literally called
"echo surprise", didn't find it, and set $? to a non-zero value indicating failure.
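A runnable variation on the same idea is sketched below. It assumes a Unix-like system, uses the running interpreter ($^X) as a harmless child program, and invents a non-existent program name to show the failure case.

```perl
use strict;
use warnings;

# $^X is the running perl; the child simply exits with status 3.
my @args = ($^X, '-e', 'exit 3');

# The block names the program to run; @args is always taken as a
# list, so no shell is involved even if it held a single element.
system { $args[0] } @args;
my $exit_status = $? >> 8;

# A one-element list with an indirect object is looked up literally,
# so this hypothetical program name is not split into words.
my @bogus = ("no-such-program --flag");
my $rc;
{
    no warnings 'exec';     # silence the "Can't exec" warning
    $rc = system { $bogus[0] } @bogus;
}

print "exit status: $exit_status\n";
print "bogus command failed\n" if $rc != 0;
```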
Note that exec() will not call your END blocks, nor will it call any DESTROY methods in your
objects.
exists EXPR
Returns TRUE if the specified hash key exists in its hash array, even if the corresponding value
is undefined.
print "Exists\n" if exists $array{$key};
print "Defined\n" if defined $array{$key};
print "True\n" if $array{$key};
A hash element can be TRUE only if it's defined, and defined if it exists, but the reverse doesn't
necessarily hold true.
Note that the EXPR can be arbitrarily complicated as long as the final operation is a hash key
lookup:
if (exists $ref->{"A"}{"B"}{$key}) { ... }
Although the last element will not spring into existence just because its existence was tested,
intervening ones will. Thus $ref->{"A"} and $ref->{"A"}{"B"} will spring into existence due to
the existence test for a $key element. This autovivification may be fixed in a later release.
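The side effect described above can be observed directly. In this sketch (names invented), an existence test on a deeply nested key leaves the intermediate hashes behind even though the tested key itself is never created.

```perl
use strict;
use warnings;

my $ref = {};

# Test for a key three levels down in an empty structure.
my $found = exists $ref->{A}{B}{key};

# The intermediate levels were created as a side effect of the test.
my $a_now   = exists $ref->{A};
my $b_now   = exists $ref->{A}{B};
my $key_now = exists $ref->{A}{B}{key};

print "key found: ",   ($found   ? "yes" : "no"), "\n";
print "A exists: ",    ($a_now   ? "yes" : "no"), "\n";
print "A/B exists: ",  ($b_now   ? "yes" : "no"), "\n";
print "key exists: ",  ($key_now ? "yes" : "no"), "\n";
```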
exit EXPR
Evaluates EXPR and exits immediately with that value. (Actually, it calls any defined END
routines first, but the END routines may not abort the exit. Likewise any object destructors that
need to be called are called before exit.) Example:
$ans = <STDIN>;
exit 0 if $ans =~ /^[Xx]/;
See also die(). If EXPR is omitted, exits with 0 status. The only universally portable values
for EXPR are 0 for success and 1 for error; all other values are subject to unpredictable
interpretation depending on the environment in which the Perl program is running.
You shouldn't use exit() to abort a subroutine if there's any chance that someone might want
to trap whatever error happened. Use die() instead, which can be trapped by an eval().
All END{} blocks are run at exit time. See perlsub for details.
exp EXPR
exp
Returns e (the natural logarithm base) to the power of EXPR. If EXPR is omitted, gives
exp($_).
fcntl FILEHANDLE,FUNCTION,SCALAR
Implements the fcntl(2) function. You'll probably have to say
use Fcntl;
first to get the correct constant definitions. Argument processing and value return works just like
ioctl() below. For example:
use Fcntl;
fcntl($filehandle, F_GETFL, $packed_return_buffer)
    or die "can't fcntl F_GETFL: $!";
You don't have to check for defined() on the return from fcntl(). Like ioctl(), it
maps a 0 return from the system call into "0 but true" in Perl. This string is true in boolean
context and 0 in numeric context. It is also exempt from the normal -w warnings on improper
numeric conversions.
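The sketch below shows both halves of that paragraph: a fcntl() call checked with a plain "or die", and the behaviour of the special "0 but true" string. It assumes a Unix-like system and a modern Perl with the File::Temp module; the temporary file exists only to provide an open handle.

```perl
use strict;
use warnings;
use Fcntl;                       # F_GETFL and friends
use File::Temp qw(tempfile);     # assumption: available on modern Perls

my ($fh, $path) = tempfile(UNLINK => 1);

# No defined() check is needed: a successful call that yields 0
# comes back as the true string "0 but true".
my $flags = fcntl($fh, F_GETFL, 0)
    or die "can't fcntl F_GETFL: $!";

# The special string is true as a boolean yet 0 as a number,
# and it does not trigger a numeric-conversion warning.
my $zero      = "0 but true";
my $is_true   = $zero ? 1 : 0;
my $as_number = $zero + 0;

printf "flags: %d\n", $flags;
print  "true as boolean: $is_true, numeric value: $as_number\n";
```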
Note that fcntl() will produce a fatal error if used on a machine that doesn't implement
fcntl(2).
fileno FILEHANDLE
Returns the file descriptor for a filehandle. This is useful for constructing bitmaps for
select() and low-level POSIX tty-handling operations. If FILEHANDLE is an expression,
the value is taken as an indirect filehandle, generally its name.
You can use this to find out whether two handles refer to the same underlying descriptor:
if (fileno(THIS) == fileno(THAT)) {
print "THIS and THAT are dups\n";
}
flock FILEHANDLE,OPERATION
Calls flock(2), or an emulation of it, on FILEHANDLE. Returns TRUE for success, FALSE on
failure. Produces a fatal error if used on a machine that doesn't implement flock(2), fcntl(2)
locking, or lockf(3). flock() is Perl's portable file locking interface, although it locks only
entire files, not records.
On many platforms (including most versions or clones of Unix), locks established by flock()
are merely advisory. Such discretionary locks are more flexible, but offer fewer guarantees.
This means that files locked with flock() may be modified by programs that do not also use
flock(). Windows NT and OS/2 are among the platforms which enforce mandatory locking.
See your local documentation for details.
OPERATION is one of LOCK_SH, LOCK_EX, or LOCK_UN, possibly combined with
LOCK_NB. These constants are traditionally valued 1, 2, 8 and 4, but you can use the symbolic
names if you import them from the Fcntl module, either individually, or as a group using the ':flock'
tag. LOCK_SH requests a shared lock, LOCK_EX requests an exclusive lock, and LOCK_UN
releases a previously requested lock. If LOCK_NB is added to LOCK_SH or LOCK_EX then
flock() will return immediately rather than blocking waiting for the lock (check the return
status to see if you got it).
To avoid the possibility of mis-coordination, Perl flushes FILEHANDLE before (un)locking it.
Note that the emulation built with lockf(3) doesn't provide shared locks, and it requires that
FILEHANDLE be open with write intent. These are the semantics that lockf(3) implements.
Most (all?) systems implement lockf(3) in terms of fcntl(2) locking, though, so the differing
semantics shouldn't bite too many people.
Note also that some versions of flock() cannot lock things over the network; you would need
to use the more system-specific fcntl() for that. If you like you can force Perl to ignore your
system's flock(2) function, and so provide its own fcntl(2)-based emulation, by passing the
switch -Ud_flock to the Configure program when you configure perl.
Here's a mailbox appender for BSD systems.
use Fcntl ':flock'; # import LOCK_* constants
sub lock {
    flock(MBOX,LOCK_EX);
    # and, in case someone appended
    # while we were waiting...
    seek(MBOX, 0, 2);
}
sub unlock {
    flock(MBOX,LOCK_UN);
}
open(MBOX, ">>/usr/spool/mail/$ENV{'USER'}")
    or die "Can't open mailbox: $!";
lock();
print MBOX $msg,"\n\n";
unlock();
See also DB_File for other flock() examples.
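For a version that can be run without a mail spool, here is a minimal sketch that takes a non-blocking exclusive lock on a temporary file. It assumes a modern Perl with File::Temp and a filesystem on which flock() works; the file and its contents are placeholders.

```perl
use strict;
use warnings;
use Fcntl ':flock';              # import LOCK_* constants
use File::Temp qw(tempfile);     # assumption: available on modern Perls

my ($fh, $path) = tempfile(UNLINK => 1);

# LOCK_NB makes flock() return at once instead of waiting for the lock.
flock($fh, LOCK_EX | LOCK_NB)
    or die "Can't lock $path: $!";

print $fh "one entry\n";

# Perl flushes the handle before unlocking it, so the data is on
# disk by the time another process could obtain the lock.
flock($fh, LOCK_UN)
    or die "Can't unlock $path: $!";

my $size = -s $path;
print "wrote $size bytes under an exclusive lock\n";
```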
fork
Does a fork(2) system call. Returns the child pid to the parent process, 0 to the child process, or
undef if the fork is unsuccessful.
Note: unflushed buffers remain unflushed in both processes, which means you may need to set
$| ($AUTOFLUSH in English) or call the autoflush() method of IO::Handle to avoid
duplicate output.
If you fork() without ever waiting on your children, you will accumulate zombies. To avoid that, reap them in a SIGCHLD handler:
$SIG{CHLD} = sub { wait };
There's also the double-fork trick (error checking on fork() returns omitted):
unless ($pid = fork) {
    unless (fork) {
        exec "what you really wanna do";
        die "no exec";
        # ... or ...
        ## (some_perl_code_here)
        exit 0;
    }
    exit 0;
}
waitpid($pid,0);
See also perlipc for more examples of forking and reaping moribund children.
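The three possible return values of fork() can be handled as in the sketch below, which assumes a system with a real fork(2). The child's exit status of 7 is arbitrary.

```perl
use strict;
use warnings;

$| = 1;   # flush output so nothing buffered is duplicated in the child

my $pid = fork;
die "Can't fork: $!" unless defined $pid;   # undef: the fork failed

if ($pid == 0) {
    # 0: we are the child
    exit 7;
}

# Otherwise we are the parent and $pid is the child's process ID.
waitpid($pid, 0);
my $child_status = $? >> 8;

print "child $pid exited with status $child_status\n";
```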
Note that if your forked child inherits system file descriptors like STDIN and STDOUT that are
actually connected by a pipe or socket, even if you exit, then the remote server (such as, say,
httpd or rsh) won't think you're done. You should reopen those to /dev/null if it's any issue.
format
Declare a picture format for use by the write() function. For example:
format Something =
    Test: @<<<<<<<< @||||| @>>>>>
          $str,     $%,    '$' . int($num)
.
$str = "widget";
$num = $cost/$quantity;
$~ = 'Something';
write;
See perlform for many details and examples.
formline PICTURE,LIST
This is an internal function used by formats, though you may call it, too. It formats (see
perlform) a list of values according to the contents of PICTURE, placing the output into the
format output accumulator, $^A (or $ACCUMULATOR in English). Eventually, when a
write() is done, the contents of $^A are written to some filehandle, but you could also read
$^A yourself and then set $^A back to "". Note that a format typically does one formline()
per line of form, but the formline() function itself doesn't care how many newlines are
embedded in the PICTURE. This means that the ~ and ~~ tokens will treat the entire PICTURE
as a single line. You may therefore need to use multiple formlines to implement a single record
format, just like the format compiler.
Be careful if you put double quotes around the picture, because an "@" character may be taken to
mean the beginning of an array name. formline() always returns TRUE. See perlform for
other examples.
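A small sketch of calling formline() directly, assuming nothing beyond the builtin itself: the picture is single-quoted so that "@" is not taken as the start of an array name, and the accumulator is read and then reset by hand, as the text suggests.

```perl
use strict;
use warnings;

$^A = "";                                   # start with an empty accumulator
# Single quotes keep the "@" characters from being read as array names.
formline('@<<<<<<< @>>>>' . "\n", "widget", 42);
my $line = $^A;                             # read what formline produced
$^A = "";                                   # reset the accumulator afterwards
print $line;
```

The first field is left-justified and the second right-justified, so the two values end up separated by padding spaces.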
getc FILEHANDLE
getc
Returns the next character from the input file attached to FILEHANDLE, or the undefined value
at end of file, or if there was an error. If FILEHANDLE is omitted, reads from STDIN. This is
not particularly efficient. It cannot be used to get unbuffered single characters, however. For
that, try something more like:
    if ($BSD_STYLE) {
        system "stty cbreak </dev/tty >/dev/tty 2>&1";
    }
    else {
        system "stty", '-icanon', 'eol', "\001";
    }
    $key = getc(STDIN);
    if ($BSD_STYLE) {
        system "stty -cbreak </dev/tty >/dev/tty 2>&1";
    }
    else {
        system "stty", 'icanon', 'eol', '^@'; # ASCII null
    }
    print "\n";
Determination of whether $BSD_STYLE should be set is left as an exercise to the reader.
The POSIX::getattr() function can do this more portably on systems purporting POSIX
compliance. See also the Term::ReadKey module from your nearest CPAN site; details on
CPAN can be found in the CPAN section of perlmod.
getlogin
Implements the C library function of the same name, which on most systems returns the current
login from /etc/utmp, if any. If null, use getpwuid().
$login = getlogin || getpwuid($<) || "Kilroy";
Do not consider getlogin() for authentication: it is not as secure as getpwuid().
getpeername SOCKET
Returns the packed sockaddr address of other end of the SOCKET connection.
    use Socket;
    $hersockaddr    = getpeername(SOCK);
    ($port, $iaddr) = unpack_sockaddr_in($hersockaddr);
    $herhostname    = gethostbyaddr($iaddr, AF_INET);
    $herstraddr     = inet_ntoa($iaddr);
getpgrp PID
Returns the current process group for the specified PID. Use a PID of 0 to get the current
process group for the current process. Will raise an exception if used on a machine that doesn‘t
implement getpgrp(2). If PID is omitted, returns process group of current process. Note that the
POSIX version of getpgrp() does not accept a PID argument, so only PID==0 is truly
portable.
getppid
Returns the process id of the parent process.
getpriority WHICH,WHO
Returns the current priority for a process, a process group, or a user. (See getpriority(2).) Will
raise a fatal exception if used on a machine that doesn‘t implement getpriority(2).
getpwnam NAME
getgrnam NAME
gethostbyname NAME
getnetbyname NAME
getprotobyname NAME
getpwuid UID
getgrgid GID
getservbyname NAME,PROTO
gethostbyaddr ADDR,ADDRTYPE
getnetbyaddr ADDR,ADDRTYPE
getprotobynumber NUMBER
getservbyport PORT,PROTO
getpwent
getgrent
gethostent
getnetent
getprotoent
getservent
setpwent
setgrent
sethostent STAYOPEN
setnetent STAYOPEN
setprotoent STAYOPEN
setservent STAYOPEN
endpwent
endgrent
endhostent
endnetent
endprotoent
endservent
These routines perform the same functions as their counterparts in the system library. In list
context, the return values from the various get routines are as follows:
($name,$passwd,$uid,$gid,
$quota,$comment,$gcos,$dir,$shell,$expire) = getpw*
($name,$passwd,$gid,$members) = getgr*
($name,$aliases,$addrtype,$length,@addrs) = gethost*
($name,$aliases,$addrtype,$net) = getnet*
($name,$aliases,$proto) = getproto*
($name,$aliases,$port,$proto) = getserv*
(If the entry doesn‘t exist you get a null list.)
In scalar context, you get the name, unless the function was a lookup by name, in which case you
get the other thing, whatever it is. (If the entry doesn‘t exist you get the undefined value.) For
example:
    $uid   = getpwnam($name);
    $name  = getpwuid($num);
    $name  = getpwent();
    $gid   = getgrnam($name);
    $name  = getgrgid($num);
    $name  = getgrent();
    #etc.
In getpw*() the fields $quota, $comment, and $expire are special cases in the sense
that in many systems they are unsupported. If the $quota is unsupported, it is an empty scalar.
If it is supported, it usually encodes the disk quota. If the $comment field is unsupported, it is
an empty scalar. If it is supported it usually encodes some administrative comment about the
user. In some systems the $quota field may be $change or $age, fields that have to do
with password aging. In some systems the $comment field may be $class. The $expire
field, if present, encodes the expiration period of the account or the password. For the
availability and the exact meaning of these fields in your system, please consult your
getpwnam(3) documentation and your pwd.h file. You can also find out from within Perl which
meaning your $quota and $comment fields have and whether you have the $expire field
by using the Config module and the values d_pwquota, d_pwage, d_pwchange,
d_pwcomment, and d_pwexpire.
The $members value returned by getgr*() is a space separated list of the login names of the
members of the group.
For the gethost*() functions, if the h_errno variable is supported in C, it will be returned
to you via $? if the function call fails. The @addrs value returned by a successful call is a list
of the raw addresses returned by the corresponding system library call. In the Internet domain,
each address is four bytes long and you can unpack it by saying something like:
    ($a,$b,$c,$d) = unpack('C4',$addrs[0]);
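To try the unpacking step without a name lookup, the sketch below builds a four-byte address with inet_aton() from the Socket module (standing in for one element of @addrs; the loopback address is just an example) and compares the manual result with inet_ntoa():

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

my $packed    = inet_aton("127.0.0.1");     # four raw bytes, like an element of @addrs
my @octets    = unpack('C4', $packed);      # one unsigned char per byte
my $manual    = join('.', @octets);
my $viasocket = inet_ntoa($packed);         # Socket's own conversion, for comparison
print "$manual $viasocket\n";
```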
If you get tired of remembering which element of the return list contains which return value,
by-name interfaces are also provided in modules: File::stat, Net::hostent,
Net::netent, Net::protoent, Net::servent, Time::gmtime,
Time::localtime, User::grent, and User::pwent. These override the normal built-ins, replacing
them with versions that return objects with the appropriate names for each field. For example:
    use File::stat;
    use User::pwent;
    $is_his = (stat($filename)->uid == pwent($whoever)->uid);
Even though it looks like they're the same method calls (uid), they aren't, because a
File::stat object is different from a User::pwent object.
getsockname SOCKET
Returns the packed sockaddr address of this end of the SOCKET connection.
use Socket;
$mysockaddr = getsockname(SOCK);
($port, $myaddr) = unpack_sockaddr_in($mysockaddr);
getsockopt SOCKET,LEVEL,OPTNAME
Returns the socket option requested, or undef if there is an error.
glob EXPR
glob
Returns the value of EXPR with filename expansions such as the standard Unix shell /bin/sh
would do. This is the internal function implementing the <*.c> operator, but you can use it
directly. If EXPR is omitted, $_ is used. The <*.c> operator is discussed in more detail in
I/O Operators in perlop.
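A short sketch of calling glob() directly. It assumes a reasonably modern perl (lexical filehandles, three-argument open, and the File::Temp module); the file names are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);            # scratch directory, removed at exit
for my $name (qw(a.c b.c notes.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Can't create $dir/$name: $!";
    close($fh);
}

my @c_files = sort glob("$dir/*.c");        # same expansion the <*.c> operator performs
print scalar(@c_files), " C files\n";
```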
gmtime EXPR
Converts a time as returned by the time function to a 9−element array with the time localized for
the standard Greenwich time zone. Typically used as follows:
    #  0    1    2     3     4    5     6     7     8
    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
                                            gmtime(time);
All array elements are numeric, and come straight out of a struct tm. In particular this means that
$mon has the range 0..11 and $wday has the range 0..6 with Sunday as day 0. Also,
$year is the number of years since 1900, that is, $year is 123 in year 2023, not simply the
last two digits of the year.
If EXPR is omitted, does gmtime(time()).
In scalar context, returns the ctime(3) value:
$now_string = gmtime;
# e.g., "Thu Oct 13 04:54:34 1994"
This scalar value is not locale dependent (see perllocale); it is a Perl builtin. Also see the
timegm() function provided by the Time::Local module, and the strftime(3) and mktime(3)
functions available via the POSIX module. To get somewhat similar but locale-dependent date
strings, set up your locale environment variables appropriately (please see perllocale) and try
for example:
use POSIX qw(strftime);
$now_string = strftime "%a %b %e %H:%M:%S %Y", gmtime;
Note that the %a and %b, the short forms of the day of the week and the month of the year, may
not necessarily be three characters wide.
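The field offsets are easy to get wrong, so here is a small sketch that applies them to a fixed time value (0, the Unix epoch, chosen only because its date is well known):

```perl
use strict;
use warnings;

my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime(0);   # the Unix epoch

# $year counts from 1900 and $mon from 0, so both need adjusting for display
my $stamp = sprintf("%04d-%02d-%02d %02d:%02d:%02d",
                    $year + 1900, $mon + 1, $mday, $hour, $min, $sec);

my @day = qw(Sun Mon Tue Wed Thu Fri Sat);  # $wday indexes this list, Sunday first
print "$stamp was a $day[$wday]\n";

my $ctime = gmtime(0);                      # scalar context: ctime(3)-style string
print "$ctime\n";
```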
goto LABEL
goto EXPR
goto &NAME
The goto−LABEL form finds the statement labeled with LABEL and resumes execution there.
It may not be used to go into any construct that requires initialization, such as a subroutine or a
foreach loop. It also can‘t be used to go into a construct that is optimized away, or to get out
of a block or subroutine given to sort(). It can be used to go almost anywhere else within the
dynamic scope, including out of subroutines, but it‘s usually better to use some other construct
such as last or die(). The author of Perl has never felt the need to use this form of goto (in
Perl, that is—C is another matter).
The goto−EXPR form expects a label name, whose scope will be resolved dynamically. This
allows for computed gotos per FORTRAN, but isn‘t necessarily recommended if you‘re
optimizing for maintainability:
goto ("FOO", "BAR", "GLARCH")[$i];
The goto−&NAME form is highly magical, and substitutes a call to the named subroutine for the
currently running subroutine. This is used by AUTOLOAD subroutines that wish to load another
subroutine and then pretend that the other subroutine had been called in the first place (except
that any modifications to @_ in the current subroutine are propagated to the other subroutine.)
After the goto, not even caller() will be able to tell that this routine was called first.
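The following sketch illustrates the goto-&NAME form outside of AUTOLOAD. The subroutine names trampoline and target are hypothetical; the point is that changes made to @_ are visible in the target, and that the frame reported by caller() belongs to the target alone.

```perl
use strict;
use warnings;

sub target {
    # caller(0) describes this frame; the trampoline's frame has been replaced
    return join(",", @_) . " via " . (caller(0))[3];
}

sub trampoline {
    push @_, "added";          # changes to @_ are seen by the target
    goto &target;              # substitutes a call to target() for this call
}

my $result = trampoline("a", "b");
print "$result\n";
```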
grep BLOCK LIST
grep EXPR,LIST
This is similar in spirit to, but not the same as, grep(1) and its relatives. In particular, it is not
limited to using regular expressions.
Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element)
and returns the list value consisting of those elements for which the expression evaluated to
TRUE. In a scalar context, returns the number of times the expression was TRUE.
    @foo = grep(!/^#/, @bar);    # weed out comments
or equivalently,
    @foo = grep {!/^#/} @bar;    # weed out comments
Note that, because $_ is a reference into the list value, it can be used to modify the elements of
the array. While this is useful and supported, it can cause bizarre results if the LIST is not a
named array. Similarly, grep returns aliases into the original list, much like the way that a for
loop‘s index variable aliases the list elements. That is, modifying an element of a list returned by
grep (for example, in a foreach, map() or another grep()) actually modifies the element in
the original list.
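A brief sketch of the three behaviours just described: list context, scalar context, and aliasing into the original array. The data values are invented for the example.

```perl
use strict;
use warnings;

my @bar = ("# header", "alpha", "# note", "beta");

my @foo      = grep { !/^#/ } @bar;         # list context: the matching elements
my $comments = grep { /^#/ } @bar;          # scalar context: how many matched

my @nums = (1, 2, 3, 4);
# grep returns aliases, so modifying them changes @nums itself
$_ *= 10 for grep { $_ % 2 == 0 } @nums;

print "@foo | $comments | @nums\n";
```

The last statement is the kind of action at a distance the text warns about; it is shown here only to make the aliasing visible.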
See also map for an array composed of the results of the BLOCK or EXPR.
hex EXPR
hex
Interprets EXPR as a hex string and returns the corresponding value. (To convert strings that
might start with either 0 or 0x see oct.) If EXPR is omitted, uses $_.
    print hex '0xAf'; # prints '175'
    print hex 'aF';   # same
import
There is no builtin import() function. It is just an ordinary method (subroutine) defined (or
inherited) by modules that wish to export names to another module. The use() function calls
the import() method for the package used. See also use(), perlmod, and Exporter.
index STR,SUBSTR,POSITION
index STR,SUBSTR
Returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If
POSITION is omitted, starts searching from the beginning of the string. The return value is
based at 0 (or whatever you've set the $[ variable to, but don't do that). If the substring is not
found, returns one less than the base, ordinarily −1.
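For instance, with the default base of 0 (the string and search terms below are arbitrary), POSITION can be used to step through successive occurrences:

```perl
use strict;
use warnings;

my $str = "hello world";

my $first  = index($str, "o");                 # first "o", counting from 0
my $second = index($str, "o", $first + 1);     # resume the search after the first hit
my $none   = index($str, "xyz");               # not found: one less than the base

# Collect every position of a substring by advancing POSITION each time
my @all;
for (my $pos = index($str, "l"); $pos >= 0; $pos = index($str, "l", $pos + 1)) {
    push @all, $pos;
}
print "$first $second $none (@all)\n";
```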
int EXPR
int
Returns the integer portion of EXPR. If EXPR is omitted, uses $_. You should not use this for
rounding, because it truncates towards 0, and because machine representations of floating point
numbers can sometimes produce counterintuitive results. Usually sprintf() or printf(),
or the POSIX::floor or POSIX::ceil functions, would serve you better.
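The difference between truncation and the alternatives shows up most clearly with negative numbers. A small sketch, using arbitrary sample values:

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

my $trunc = int(-3.7);              # truncates toward 0
my $floor = floor(-3.7);            # rounds down, toward negative infinity
my $ceil  = ceil(3.2);              # rounds up, toward positive infinity
my $round = sprintf("%.0f", 3.7);   # rounds to the nearest integer, as a string

print "$trunc $floor $ceil $round\n";
```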
ioctl FILEHANDLE,FUNCTION,SCALAR
Implements the ioctl(2) function. You‘ll probably have to say
require "ioctl.ph"; # probably in /usr/local/lib/perl/ioctl.ph
first to get the correct function definitions. If ioctl.ph doesn‘t exist or doesn‘t have the correct
definitions you'll have to roll your own, based on your C header files such as <sys/ioctl.h>.
(There is a Perl script called h2ph that comes with the Perl kit that may help you in this, but it‘s
nontrivial.) SCALAR will be read and/or written depending on the FUNCTION—a pointer to
the string value of SCALAR will be passed as the third argument of the actual ioctl() call.
(If SCALAR has no string value but does have a numeric value, that value will be passed rather
than a pointer to the string value. To guarantee this to be TRUE, add a 0 to the scalar before
using it.) The pack() and unpack() functions are useful for manipulating the values of
structures used by ioctl(). The following example sets the erase character to DEL.
    require 'ioctl.ph';
    $getp = &TIOCGETP;
    die "NO TIOCGETP" if $@ || !$getp;
    $sgttyb_t = "ccccs";        # 4 chars and a short
    if (ioctl(STDIN,$getp,$sgttyb)) {
        @ary = unpack($sgttyb_t,$sgttyb);
        $ary[2] = 127;
        $sgttyb = pack($sgttyb_t,@ary);
        ioctl(STDIN,&TIOCSETP,$sgttyb)
            || die "Can't ioctl: $!";
    }
The return value of ioctl() (and fcntl()) is as follows:
    if OS returns:      then Perl returns:
        -1                undefined value
         0                string "0 but true"
    anything else         that number
Thus Perl returns TRUE on success and FALSE on failure, yet you can still easily determine the
actual value returned by the operating system:
($retval = ioctl(...)) || ($retval = −1);
printf "System returned %d\n", $retval;
The special string "0 but true" is exempt from -w complaints about improper numeric
conversions.
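Since a real ioctl() call depends on the device at hand, the sketch below uses a hypothetical stand-in, fake_ioctl(), that maps a pretend OS return value the way the table above describes. It shows that "0 but true" is true in boolean context, numifies to 0 without a warning, and works with the recovery idiom shown earlier.

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };   # record any warnings raised

# Hypothetical stand-in for ioctl(): maps an OS result to Perl's return value
sub fake_ioctl {
    my $os_result = shift;
    return undef        if $os_result == -1;
    return "0 but true" if $os_result == 0;
    return $os_result;
}

my $retval  = fake_ioctl(0);
my $is_true = $retval ? 1 : 0;      # a non-empty string other than "0" is true
my $number  = $retval + 0;          # numifies to 0, with no warning

my $failed;
($failed = fake_ioctl(-1)) || ($failed = -1);         # the recovery idiom
printf "success gave %d, failure gave %d\n", $number, $failed;
```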
join EXPR,LIST
Joins the separate strings of LIST into a single string with fields separated by the value of EXPR,
and returns the string. Example:
$_ = join(’:’, $login,$passwd,$uid,$gid,$gcos,$home,$shell);
See split.
keys HASH
Returns a list consisting of all the keys of the named hash. (In a scalar context, returns the
number of keys.) The keys are returned in an apparently random order, but it is the same order
as either the values() or each() function produces (given that the hash has not been
modified). As a side effect, it resets HASH‘s iterator.
Here is yet another way to print your environment:
    @keys = keys %ENV;
    @values = values %ENV;
    while ($#keys >= 0) {
        print pop(@keys), '=', pop(@values), "\n";
    }
or how about sorted by key:
    foreach $key (sort(keys %ENV)) {
        print $key, '=', $ENV{$key}, "\n";
    }
To sort a hash by value, you'll need to use a sort() function. Here's a descending numeric
sort of a hash by its values:
    foreach $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) {
        printf "%4d %s\n", $hash{$key}, $key;
    }
As an lvalue keys() allows you to increase the number of hash buckets allocated for the given
hash. This can gain you a measure of efficiency if you know the hash is going to get big. (This
is similar to pre−extending an array by assigning a larger number to $#array.) If you say
keys %hash = 200;
then %hash will have at least 200 buckets allocated for it—256 of them, in fact, since it rounds
up to the next power of two. These buckets will be retained even if you do %hash = (); use
undef %hash if you want to free the storage while %hash is still in scope. You can't shrink
the number of buckets allocated for the hash using keys() in this way (but you needn‘t worry
about doing this by accident, as trying has no effect).
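The points above can be checked with a small hash (the fruit names and counts are invented): keys() and values() line up element by element, scalar context gives the count, and pre-sizing the buckets leaves the contents alone.

```perl
use strict;
use warnings;

my %h = (apple => 3, pear => 1, plum => 2);

my @k = keys %h;
my @v = values %h;                  # same order as keys, as long as %h is unmodified
my $paired = 1;
for my $i (0 .. $#k) {
    $paired = 0 unless $h{ $k[$i] } == $v[$i];
}

my $count = keys %h;                # scalar context: number of keys

keys(%h) = 200;                     # pre-extend the buckets; the contents are untouched
my $after = keys %h;

my @by_value = sort { $h{$b} <=> $h{$a} } keys %h;    # descending by value
print "$count keys, by value: @by_value\n";
```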
kill LIST
Sends a signal to a list of processes. The first element of the list must be the signal to send.
Returns the number of processes successfully signaled.
$cnt = kill 1, $child1, $child2;
kill 9, @goners;
Unlike in the shell, in Perl if the SIGNAL is negative, it kills process groups instead of processes.
(On System V, a negative PROCESS number will also kill process groups, but that‘s not
portable.) That means you usually want to use positive not negative signals. You may also use a
signal name in quotes. See Signals in perlipc for details.
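A minimal sketch of signalling by name, assuming a Unix-like system where SIGTERM has its default action in the child: the parent forks a sleeping child, sends it TERM, and then reads the terminating signal number from the low bits of $?.

```perl
use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {                    # child: just wait to be signalled
    sleep 60;
    exit 0;
}

my $signaled = kill 'TERM', $pid;   # a signal name in quotes; returns how many were signalled
waitpid($pid, 0);                   # reap the child
my $signal = $? & 127;              # low seven bits hold the terminating signal number
print "signalled $signaled process(es); child died from signal $signal\n";
```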
last LABEL
last
The last command is like the break statement in C (as used in loops); it immediately exits
the loop in question. If the LABEL is omitted, the command refers to the innermost enclosing
loop. The continue block, if any, is not executed:
    LINE: while (<STDIN>) {
        last LINE if /^$/;      # exit when done with header
        #...
    }
See also continue for an illustration of how last, next, and redo work.
lc EXPR
lc
Returns a lowercased version of EXPR. This is the internal function implementing the \L
escape in double-quoted strings. Respects current LC_CTYPE locale if use locale is in force.
See perllocale.
If EXPR is omitted, uses $_.
lcfirst EXPR
lcfirst
Returns the value of EXPR with the first character lowercased. This is the internal function
implementing the \l escape in double-quoted strings. Respects current LC_CTYPE locale if
use locale is in force. See perllocale.
If EXPR is omitted, uses $_.
length EXPR
length
Returns the length in bytes of the value of EXPR. If EXPR is omitted, returns length of $_.
link OLDFILE,NEWFILE
Creates a new filename linked to the old filename. Returns TRUE for success, FALSE
otherwise.
listen SOCKET,QUEUESIZE
Does the same thing that the listen system call does. Returns TRUE if it succeeded, FALSE
otherwise. See example in Sockets: Client/Server Communication in perlipc.
local EXPR
A local modifies the listed variables to be local to the enclosing block, file, or eval. If more than
one value is listed, the list must be placed in parentheses. See
"Temporary Values via local()" in perlsub for details, including issues with tied arrays and hashes.
You really probably want to be using my() instead, because local() isn't what most people
think of as "local". See "Private Variables via my()" in perlsub for details.
localtime EXPR
Converts a time as returned by the time function to a 9−element array with the time analyzed for
the local time zone. Typically used as follows:
    #  0    1    2     3     4    5     6     7     8
    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
                                            localtime(time);
All array elements are numeric, and come straight out of a struct tm. In particular this means that
$mon has the range 0..11 and $wday has the range 0..6 with Sunday as day 0. Also,
$year is the number of years since 1900, that is, $year is 123 in year 2023, and not simply
the last two digits of the year.
If EXPR is omitted, uses the current time (localtime(time)).
In scalar context, returns the ctime(3) value:
$now_string = localtime;
# e.g., "Thu Oct 13 04:54:34 1994"
This scalar value is not locale dependent, see perllocale, but instead a Perl builtin. Also see the
Time::Local module, and the strftime(3) and mktime(3) function available via the POSIX
module. To get somewhat similar but locale dependent date strings, set up your locale
environment variables appropriately (please see perllocale) and try for example:
use POSIX qw(strftime);
$now_string = strftime "%a %b %e %H:%M:%S %Y", localtime;
Note that the %a and %b, the short forms of the day of the week and the month of the year, may
not necessarily be three characters wide.
log EXPR
log
Returns the natural logarithm (base e) of EXPR. If EXPR is omitted, returns log of $_.
lstat FILEHANDLE
lstat EXPR
lstat
Does the same thing as the stat() function (including setting the special _ filehandle) but stats
a symbolic link instead of the file the symbolic link points to. If symbolic links are
unimplemented on your system, a normal stat() is done.
If EXPR is omitted, stats $_.
m//
The match operator. See perlop.
map BLOCK LIST
map EXPR,LIST
Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element)
and returns the list value composed of the results of each such evaluation. Evaluates BLOCK or
EXPR in a list context, so each element of LIST may produce zero, one, or more elements in the
returned value.
    @chars = map(chr, @nums);
translates a list of numbers to the corresponding characters. And
    %hash = map { getkey($_) => $_ } @array;
is just a funny way to write
    %hash = ();
    foreach $_ (@array) {
        $hash{getkey($_)} = $_;
    }
Note that, because $_ is a reference into the list value, it can be used to modify the elements of
the array. While this is useful and supported, it can cause bizarre results if the LIST is not a
named array. See also grep for an array composed of those items of the original list for which
the BLOCK or EXPR evaluates to true.
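Because the BLOCK is evaluated in list context, each input element may contribute zero, one, or several output elements. A short sketch with made-up data:

```perl
use strict;
use warnings;

my @nums  = (65, 66, 67);
my @chars = map(chr, @nums);                    # one result per element

# Each element yields as many copies of itself as its value: 0 gives none, 2 gives two
my @expanded = map { ($_) x $_ } (0, 1, 2);

# Two results per element build a hash of word => length
my %len = map { $_ => length($_) } qw(a bb ccc);

print "@chars | @expanded | $len{ccc}\n";
```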
mkdir FILENAME,MODE
Creates the directory specified by FILENAME, with permissions specified by MODE (as
modified by umask). If it succeeds it returns TRUE, otherwise it returns FALSE and sets $!
(errno).
msgctl ID,CMD,ARG
Calls the System V IPC function msgctl(2). You‘ll probably have to say
use IPC::SysV;
first to get the correct constant definitions. If CMD is IPC_STAT, then ARG must be a variable
which will hold the returned msqid_ds structure. Returns like ioctl(): the undefined value
for error, "0 but true" for zero, or the actual return value otherwise. See also IPC::SysV and
IPC::Msg documentation.
msgget KEY,FLAGS
Calls the System V IPC function msgget(2). Returns the message queue id, or the undefined
value if there is an error. See also IPC::SysV and IPC::Msg documentation.
msgsnd ID,MSG,FLAGS
Calls the System V IPC function msgsnd to send the message MSG to the message queue ID.
MSG must begin with the long integer message type, which may be created with pack("l",
$type). Returns TRUE if successful, or FALSE if there is an error. See also IPC::SysV
and IPC::Msg documentation.
msgrcv ID,VAR,SIZE,TYPE,FLAGS
Calls the System V IPC function msgrcv to receive a message from message queue ID into
variable VAR with a maximum message size of SIZE. Note that if a message is received, the
message type will be the first thing in VAR, and the maximum length of VAR is SIZE plus the
size of the message type. Returns TRUE if successful, or FALSE if there is an error. See also
IPC::SysV and IPC::Msg documentation.
my EXPR
A my() declares the listed variables to be local (lexically) to the enclosing block, file, or
eval(). If more than one value is listed, the list must be placed in parentheses. See
"Private Variables via my()" in perlsub for details.
next LABEL
next
The next command is like the continue statement in C; it starts the next iteration of the loop:
    LINE: while (<STDIN>) {
        next LINE if /^#/;      # discard comments
        #...
    }
Note that if there were a continue block on the above, it would get executed even on
discarded lines. If the LABEL is omitted, the command refers to the innermost enclosing loop.
See also continue for an illustration of how last, next, and redo work.
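The interaction between next, last, and a continue block can be seen in a loop over an in-memory list (the lines below are invented; a filehandle loop behaves the same way):

```perl
use strict;
use warnings;

my @lines = ("# comment", "alpha", "beta", "", "gamma");
my (@kept, $passes);

LINE: foreach my $line (@lines) {
    last LINE if $line eq "";       # stop at the first empty line
    next LINE if $line =~ /^#/;     # discard comments
    push @kept, $line;
}
continue {
    $passes++;                      # runs after next and after a normal pass, but not after last
}

print "kept @kept after $passes completed passes\n";
```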
no Module LIST
See the use function, of which no is the opposite.
oct EXPR
oct
Interprets EXPR as an octal string and returns the corresponding value. (If EXPR happens to
start off with 0x, interprets it as a hex string instead.) The following will handle decimal, octal,
and hex in the standard Perl or C notation:
$val = oct($val) if $val =~ /^0/;
If EXPR is omitted, uses $_. This function is commonly used when a string such as 644 needs
to be converted into a file mode, for example. (Although perl will automatically convert strings
into numbers as needed, this automatic conversion assumes base 10.)
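A small sketch wrapping the idiom above in a hypothetical helper, to_number(), together with the file-mode case:

```perl
use strict;
use warnings;

# Hypothetical helper: accept decimal, octal (leading 0), or hex (leading 0x) strings
sub to_number {
    my $val = shift;
    $val = oct($val) if $val =~ /^0/;
    return $val + 0;
}

my $hex_val = to_number("0x1F");    # hex notation
my $oct_val = to_number("0755");    # octal notation
my $dec_val = to_number("42");      # plain decimal is left alone

my $mode = oct("644");              # a permission string turned into a file mode
printf "%d %d %d %o\n", $hex_val, $oct_val, $dec_val, $mode;
```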
open FILEHANDLE,EXPR
open FILEHANDLE
Opens the file whose filename is given by EXPR, and associates it with FILEHANDLE. If
FILEHANDLE is an expression, its value is used as the name of the real filehandle wanted. If
EXPR is omitted, the scalar variable of the same name as the FILEHANDLE contains the
filename. (Note that lexical variables—those declared with my()—will not work for this
purpose; so if you‘re using my(), specify EXPR in your call to open.)
If the filename begins with ‘<’ or nothing, the file is opened for input. If the filename begins
with ‘>’, the file is truncated and opened for output, being created if necessary. If the filename
begins with ‘>>’, the file is opened for appending, again being created if necessary. You can
put a ‘+’ in front of the ‘>’ or ‘<’ to indicate that you want both read and write access to the
file; thus ‘+<’ is almost always preferred for read/write updates—the ‘+>’ mode would
clobber the file first. You can‘t usually use either read−write mode for updating textfiles, since
they have variable length records. See the −i switch in perlrun for a better approach.
The prefix and the filename may be separated with spaces. These various prefixes correspond to
the fopen(3) modes of ‘r’, ‘r+’, ‘w’, ‘w+’, ‘a’, and ‘a+’.
If the filename begins with '|', the filename is interpreted as a command to which output is to
be piped, and if the filename ends with a '|', the filename is interpreted as a command which
pipes output to us. See "Using open() for IPC" in perlipc for more examples of this. (You are
not allowed to open() to a command that pipes
both in and out, but see IPC::Open2, IPC::Open3, and Bidirectional Communication in perlipc
for alternatives.)
Opening ‘−’ opens STDIN and opening ‘>−’ opens STDOUT. Open returns nonzero upon
success, the undefined value otherwise. If the open() involved a pipe, the return value happens
to be the pid of the subprocess.
If you‘re unfortunate enough to be running Perl on a system that distinguishes between text files
and binary files (modern operating systems don't care), then you should check out binmode for
tips for dealing with this. The key distinction between systems that need binmode() and those
that don‘t is their text file formats. Systems like Unix, MacOS, and Plan9, which delimit lines
with a single character, and which encode that character in C as "\n", do not need
binmode(). The rest need it.
When opening a file, it‘s usually a bad idea to continue normal execution if the request failed, so
open() is frequently used in connection with die(). Even if die() won‘t do what you want
(say, in a CGI script, where you want to make a nicely formatted error message (but there are
modules that can help with that problem)) you should always check the return value from
opening a file. The infrequent exception is when working with an unopened filehandle is actually
what you want to do.
Examples:
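    $ARTICLE = 100;
    open ARTICLE or die "Can't find article $ARTICLE: $!\n";
    while (<ARTICLE>) {...
    open(LOG, '>>/usr/spool/news/twitlog'); # (log is reserved)
    # if the open fails, output is discarded
    open(DBASE, '+<dbase.mine')             # open for update
        or die "Can't open 'dbase.mine' for update: $!";
    open(ARTICLE, "caesar <$article |")     # decrypt article
        or die "Can't start caesar: $!";
    open(EXTRACT, "|sort >/tmp/Tmp$$")      # $$ is our process id
        or die "Can't start sort: $!";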
    # process argument list of files along with any includes
    foreach $file (@ARGV) {
        process($file, 'fh00');
    }
    sub process {
        my($filename, $input) = @_;
        $input++;               # this is a string increment
        unless (open($input, $filename)) {
            print STDERR "Can't open $filename: $!\n";
            return;
        }
        local $_;
        while (<$input>) {      # note use of indirection
            if (/^#include "(.*)"/) {
                process($1, $input);
                next;
            }
            #...                # whatever
        }
    }
You may also, in the Bourne shell tradition, specify an EXPR beginning with ‘>&’, in which
case the rest of the string is interpreted as the name of a filehandle (or file descriptor, if numeric)
to be duped and opened. You may use & after >, >>, <, +>, +>>, and +<. The mode you
specify should match the mode of the original filehandle. (Duping a filehandle does not take into
account any existing contents of stdio buffers.) Here is a script that saves, redirects, and restores
STDOUT and STDERR:
    #!/usr/bin/perl
    open(OLDOUT, ">&STDOUT");
    open(OLDERR, ">&STDERR");
    open(STDOUT, ">foo.out") || die "Can't redirect stdout";
    open(STDERR, ">&STDOUT") || die "Can't dup stdout";
    select(STDERR); $| = 1;     # make unbuffered
    select(STDOUT); $| = 1;     # make unbuffered
    print STDOUT "stdout 1\n";  # this works for
    print STDERR "stderr 1\n";  # subprocesses too
    close(STDOUT);
    close(STDERR);
    open(STDOUT, ">&OLDOUT");
    open(STDERR, ">&OLDERR");
    print STDOUT "stdout 2\n";
    print STDERR "stderr 2\n";
If you specify ‘<&=N’, where N is a number, then Perl will do an equivalent of C‘s fdopen()
of that file descriptor; this is more parsimonious of file descriptors. For example:
open(FILEHANDLE, "<&=$fd")
If you open a pipe on the command ‘−’, i.e., either ‘|−’ or ‘−|’, then there is an implicit fork
done, and the return value of open is the pid of the child within the parent process, and 0 within
the child process. (Use defined($pid) to determine whether the open was successful.) The
filehandle behaves normally for the parent, but i/o to that filehandle is piped from/to the
STDOUT/STDIN of the child process. In the child process the filehandle isn‘t opened—i/o
happens from/to the new STDOUT or STDIN. Typically this is used like the normal piped open
when you want to exercise more control over just how the pipe command gets executed, such as
when you are running setuid, and don‘t want to have to scan shell commands for metacharacters.
The following pairs are more or less equivalent:
open(FOO, "|tr ’[a−z]’ ’[A−Z]’");
open(FOO, "|−") || exec ’tr’, ’[a−z]’, ’[A−Z]’;
open(FOO, "cat −n ’$file’|");
open(FOO, "−|") || exec ’cat’, ’−n’, $file;
See Safe Pipe Opens in perlipc for more examples of this.
NOTE: On any operation that may do a fork, any unflushed buffers remain unflushed in both
processes, which means you may need to set $| to avoid duplicate output.
Closing any piped filehandle causes the parent process to wait for the child to finish, and returns
the status value in $?.
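To make the implicit-fork form concrete, here is a minimal sketch that reads from a forked child and then picks up its exit status when the handle is closed. It assumes a Unix-like system and a perl recent enough to accept a lexical filehandle in open(); the message text and exit code are arbitrary.

```perl
use strict;
use warnings;

my $pid = open(my $kid, "-|");      # implicit fork; the parent reads the child's STDOUT
die "Can't fork: $!" unless defined $pid;

if ($pid == 0) {                    # child: open returned 0
    print "hello from child\n";     # goes down the pipe to the parent
    exit 3;                         # illustrative exit code
}

my $line = <$kid>;                  # parent: read what the child printed
close($kid);                        # waits for the child and sets $?
my $status = $? >> 8;
print "got: $line", "child exit status: $status\n";
```

Note that close() itself returns false here because the child exited with a nonzero status, which is why its return value is not checked with die in this sketch.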
The filename passed to open will have leading and trailing whitespace deleted, and the normal
redirection characters honored. This property, known as "magic open", can often be used to
good effect. A user could specify a filename of "rsh cat file |", or you could change certain
filenames as needed:
$filename =~ s/(.*\.gz)\s*$/gzip −dc < $1|/;
open(FH, $filename) or die "Can’t open $filename: $!";
However, to open a file with arbitrary weird characters in it, it‘s necessary to protect any leading
and trailing whitespace:
$file =~ s#^(\s)#./$1#;
open(FOO, "< $file\0");
If you want a "real" C open() (see open(2) on your system), then you should use the
sysopen() function, which involves no such magic. This is another way to protect your
filenames from interpretation. For example:
    use Fcntl;                  # for the O_* constants
    use IO::Handle;
    sysopen(HANDLE, $path, O_RDWR|O_CREAT|O_EXCL)
        or die "sysopen $path: $!";
    $oldfh = select(HANDLE); $| = 1; select($oldfh);
    print HANDLE "stuff $$\n";
    seek(HANDLE, 0, 0);
    print "File contains: ", <HANDLE>;
Using the constructor from the IO::Handle package (or one of its subclasses, such as
IO::File or IO::Socket), you can generate anonymous filehandles that have the scope of
whatever variables hold references to them, and automatically close whenever and however you
leave that scope:
    use IO::File;
    #...
    sub read_myfile_munged {
        my $ALL = shift;
        my $handle = new IO::File;
        open($handle, "myfile") or die "myfile: $!";
        $first = <$handle>
            or return ();                       # Automatically closed here.
        mung $first or die "mung failed";       # Or here.
        return $first, <$handle> if $ALL;       # Or here.
        $first;                                 # Or here.
    }
See seek() for some details about mixing reading and writing.
opendir DIRHANDLE,EXPR
Opens a directory named EXPR for processing by readdir(), telldir(), seekdir(),
rewinddir(), and closedir(). Returns TRUE if successful. DIRHANDLEs have their
own namespace separate from FILEHANDLEs.
ord EXPR
ord
Returns the numeric ASCII value of the first character of EXPR. If EXPR is omitted, uses $_.
For the reverse, see chr.
pack TEMPLATE,LIST
Takes an array or list of values and packs it into a binary structure, returning the string
containing the structure. The TEMPLATE is a sequence of characters that give the order and
type of values, as follows:
A   An ascii string, will be space padded.
a   An ascii string, will be null padded.
b   A bit string (ascending bit order, like vec()).
B   A bit string (descending bit order).
h   A hex string (low nybble first).
H   A hex string (high nybble first).

c   A signed char value.
C   An unsigned char value.

s   A signed short value.
S   An unsigned short value.
      (This 'short' is _exactly_ 16 bits, which may differ from
       what a local C compiler calls 'short'.)

i   A signed integer value.
I   An unsigned integer value.
      (This 'integer' is _at_least_ 32 bits wide. Its exact
       size depends on what a local C compiler calls 'int',
       and may even be larger than the 'long' described in
       the next item.)

l   A signed long value.
L   An unsigned long value.
      (This 'long' is _exactly_ 32 bits, which may differ from
       what a local C compiler calls 'long'.)

n   A short in "network" (big-endian) order.
N   A long in "network" (big-endian) order.
v   A short in "VAX" (little-endian) order.
V   A long in "VAX" (little-endian) order.
      (These 'shorts' and 'longs' are _exactly_ 16 bits and
       _exactly_ 32 bits, respectively.)

f   A single-precision float in the native format.
d   A double-precision float in the native format.

p   A pointer to a null-terminated string.
P   A pointer to a structure (fixed-length string).

u   A uuencoded string.

w   A BER compressed integer. Its bytes represent an unsigned
    integer in base 128, most significant digit first, with as
    few digits as possible. Bit eight (the high bit) is set
    on each byte except the last.

x   A null byte.
X   Back up a byte.
@   Null fill to absolute position.
Each letter may optionally be followed by a number giving a repeat count. With all types except
"a", "A", "b", "B", "h", "H", and "P" the pack function will gobble up that many values
from the LIST. A * for the repeat count means to use however many items are left. The "a"
and "A" types gobble just one value, but pack it as a string of length count, padding with nulls or
spaces as necessary. (When unpacking, "A" strips trailing spaces and nulls, but "a" does not.)
Likewise, the "b" and "B" fields pack a string that many bits long. The "h" and "H" fields
pack a string that many nybbles long. The "p" type packs a pointer to a null-terminated string.
You are responsible for ensuring the string is not a temporary value (which can potentially get
deallocated before you get around to using the packed result). The "P" packs a pointer to a
structure of the size indicated by the length. A NULL pointer is created if the corresponding
value for "p" or "P" is undef. Real numbers (floats and doubles) are in the native machine
format only; due to the multiplicity of floating formats around, and the lack of a standard
"network" representation, no facility for interchange has been made. This means that packed
floating point data written on one machine may not be readable on another, even if both use
IEEE floating point arithmetic (as the endianness of the memory representation is not part of
the IEEE spec). Note that Perl uses doubles internally for all numeric calculation, and
converting from double into float and thence back to double again will lose precision (i.e.,
unpack("f", pack("f", $foo)) will not in general equal $foo).
Examples:
$foo = pack("cccc",65,66,67,68);
# foo eq "ABCD"
$foo = pack("c4",65,66,67,68);
# same thing
$foo = pack("ccxxcc",65,66,67,68);
# foo eq "AB\0\0CD"
$foo = pack("s2",1,2);
# "\1\0\2\0" on little-endian
# "\0\1\0\2" on big-endian
$foo = pack("a4","abcd","x","y","z");
# "abcd"
$foo = pack("aaaa","abcd","x","y","z");
# "axyz"
$foo = pack("a14","abcdefg");
# "abcdefg\0\0\0\0\0\0\0"
$foo = pack("i9pl", gmtime);
# a real struct tm (on my system anyway)
sub bintodec {
    unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
}
The same template may generally also be used in the unpack function.
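To make the round trip concrete, here is a minimal sketch that packs a small record using only the fixed-size, fixed-byte-order types from the table, then unpacks it with the same template. The record layout (a 16-bit id, a 32-bit length, and a 4-byte tag) is made up for the example.

```perl
# Sketch: "n" and "N" are big-endian on every machine, so the packed
# bytes are the same everywhere; "A4" space-pads the tag to 4 bytes.
my $template = "nNA4";
my $record   = pack($template, 513, 65536, "ab");

# The same template reverses the operation; "A" strips the padding.
my ($id, $len, $tag) = unpack($template, $record);

print length($record), " bytes: id=$id len=$len tag=$tag\n";
```

Because the sizes are exact (2 + 4 + 4 bytes), the record is always 10 bytes long, which is what makes such templates suitable for file formats and wire protocols.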
package
package NAMESPACE
Declares the compilation unit as being in the given namespace. The scope of the package
declaration is from the declaration itself through the end of the enclosing block (the same scope
as the local() operator). All further unqualified dynamic identifiers will be in this
namespace. A package statement affects only dynamic variables—including those you‘ve used
local() on—but not lexical variables created with my(). Typically it would be the first
declaration in a file to be included by the require or use operator. You can switch into a
package in more than one place; it merely influences which symbol table is used by the compiler
for the rest of that block. You can refer to variables and filehandles in other packages by
prefixing the identifier with the package name and a double colon: $Package::Variable.
If the package name is null, the main package is assumed. That is, $::sail is equivalent to
$main::sail.
If NAMESPACE is omitted, then there is no current package, and all identifiers must be fully
qualified or lexicals. This is stricter than use strict, since it also extends to function names.
See Packages in perlmod for more information about packages, modules, and classes. See
perlsub for other scoping issues.
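The block scoping and the qualified-name syntax described above can be sketched as follows. The package name Counter and the variable $count are invented for the illustration, and the code deliberately runs without use strict so that the unqualified name is allowed.

```perl
$main::count = 1;

{
    package Counter;     # in effect only until the closing brace
    $count = 10;         # unqualified, so this is $Counter::count
    $main::count++;      # a qualified name reaches another package
}

# Back in package main: $::count is shorthand for $main::count.
print "$main::count $Counter::count $::count\n";   # prints "2 10 2"
```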
pipe READHANDLE,WRITEHANDLE
Opens a pair of connected pipes like the corresponding system call. Note that if you set up a loop
of piped processes, deadlock can occur unless you are very careful. In addition, note that Perl‘s
pipes use stdio buffering, so you may need to set $| to flush your WRITEHANDLE after each
command, depending on the application.
See IPC::Open2, IPC::Open3, and Bidirectional Communication in perlipc for examples of such
things.
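As a minimal sketch of the mechanics only, the following uses both ends of a pipe inside a single process; a realistic program would fork() and give one end to each process. The handle names READER and WRITER are arbitrary, and the message is short enough to fit in the pipe's buffer without blocking.

```perl
use IO::Handle;

pipe(READER, WRITER) or die "pipe: $!";
WRITER->autoflush(1);      # same effect as selecting WRITER and setting $|

print WRITER "hello\n";
close(WRITER);             # lets the reader see end-of-file after the data

my $line = <READER>;
close(READER);
print "got: $line";
```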
pop ARRAY
pop
Pops and returns the last value of the array, shortening the array by 1. Has a similar effect to
$tmp = $ARRAY[$#ARRAY--];
If there are no elements in the array, returns the undefined value. If ARRAY is omitted, pops the
@ARGV array in the main program, and the @_ array in subroutines, just like shift().
pos SCALAR
pos
Returns the offset of where the last m//g search left off for the variable in question ($_ is
used when the variable is not specified). May be modified to change that offset. Such
modification will also influence the \G zero-width assertion in regular expressions. See perlre
and perlop.
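A short sketch of reading and assigning pos(), using a made-up string:

```perl
my $text = "aXbXc";

$text =~ /X/g or die "no match";
my $after_first = pos($text);      # offset just past the first X

pos($text) = 0;                    # rewind; the next m//g starts over
$text =~ /X/g or die "no match";
my $after_rewind = pos($text);

$text =~ /X/g or die "no match";   # continues from the stored offset
my $after_second = pos($text);

print "$after_first $after_rewind $after_second\n";   # prints "2 2 4"
```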
print FILEHANDLE LIST
print LIST
print
Prints a string or a comma−separated list of strings. Returns TRUE if successful.
FILEHANDLE may be a scalar variable name, in which case the variable contains the name of
or a reference to the filehandle, thus introducing one level of indirection. (NOTE: If
FILEHANDLE is a variable and the next token is a term, it may be misinterpreted as an operator
unless you interpose a + or put parentheses around the arguments.) If FILEHANDLE is omitted,
prints by default to standard output (or to the last selected output channel—see /select). If LIST
is also omitted, prints $_ to the currently selected output channel. To set the default output
channel to something other than STDOUT use the select operation. Note that, because print
takes a LIST, anything in the LIST is evaluated in list context, and any subroutine that you call
will have one or more of its expressions evaluated in list context. Also be careful not to follow
the print keyword with a left parenthesis unless you want the corresponding right parenthesis to
terminate the arguments to the print—interpose a + or put parentheses around all the arguments.
Note that if you‘re storing FILEHANDLES in an array or other expression, you will have to use
a block returning its value instead:
print { $files[$i] } "stuff\n";
print { $OK ? STDOUT : STDERR } "stuff\n";
printf FILEHANDLE FORMAT, LIST
printf FORMAT, LIST
Equivalent to print FILEHANDLE sprintf(FORMAT, LIST), except that $\ (the
output record separator) is not appended. The first argument of the list will be interpreted as the
printf() format. If use locale is in effect, the character used for the decimal point in
formatted real numbers is affected by the LC_NUMERIC locale. See perllocale.
Don‘t fall into the trap of using a printf() when a simple print() would do. The
print() is more efficient and less error prone.
prototype FUNCTION
Returns the prototype of a function as a string (or undef if the function has no prototype).
FUNCTION is a reference to, or the name of, the function whose prototype you want to retrieve.
If FUNCTION is a string starting with CORE::, the rest is taken as the name of a Perl builtin. If
the builtin is not overridable (such as qw//) or its arguments cannot be expressed by a prototype
(such as system()), in other words if the builtin does not behave like a Perl function, it returns
undef. Otherwise, the string describing the equivalent prototype is returned.
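For user-defined functions, a minimal sketch looks like this. The subroutines pair and plain are hypothetical, defined here only to have something to inspect.

```perl
sub pair($$) { return "$_[0]-$_[1]" }   # declared with a prototype
sub plain    { return scalar @_ }       # declared without one

my $by_ref  = prototype(\&pair);    # via a code reference
my $by_name = prototype('pair');    # via the function's name
my $none    = prototype(\&plain);   # undef: no prototype to report

print "$by_ref $by_name ", defined($none) ? $none : "undef", "\n";
```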
push ARRAY,LIST
Treats ARRAY as a stack, and pushes the values of LIST onto the end of ARRAY. The length
of ARRAY increases by the length of LIST. Has the same effect as
for $value (LIST) {
$ARRAY[++$#ARRAY] = $value;
}
but is more efficient. Returns the new number of elements in the array.
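A small sketch of the return value, with pop() shown alongside as its counterpart:

```perl
my @stack = (1, 2);

my $count = push(@stack, 3, 4);   # array is now (1, 2, 3, 4)
my $last  = pop(@stack);          # removes and returns the 4

print "$count $last @stack\n";    # prints "4 4 1 2 3"
```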
q/STRING/
qq/STRING/
qr/STRING/
qx/STRING/
qw/STRING/
Generalized quotes. See perlop.
quotemeta EXPR
quotemeta
Returns the value of EXPR with all non-alphanumeric characters backslashed. (That is, all
characters not matching /[A-Za-z_0-9]/ will be preceded by a backslash in the returned
string, regardless of any locale settings.) This is the internal function implementing the \Q
escape in double−quoted strings.
If EXPR is omitted, uses $_.
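The following sketch shows why the quoting matters when a string is interpolated into a pattern. The sample strings are arbitrary.

```perl
my $literal = "a.b*c";
my $quoted  = quotemeta($literal);    # a\.b\*c

# Unquoted, "." and "*" act as metacharacters, so "axc" matches.
my $loose = ("axc" =~ /^$literal$/) ? 1 : 0;

# \Q...\E applies quotemeta() during interpolation.
my $strict = ("axc" =~ /^\Q$literal\E$/) ? 1 : 0;
my $exact  = ($literal =~ /^$quoted$/) ? 1 : 0;

print "$quoted $loose $strict $exact\n";   # prints "a\.b\*c 1 0 1"
```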
rand EXPR
rand
Returns a random fractional number greater than or equal to 0 and less than the value of EXPR.
(EXPR should be positive.) If EXPR is omitted, the value 1 is used. Automatically calls
srand() unless srand() has already been called. See also srand().
(Note: If your rand function consistently returns numbers that are too large or too small, then
your version of Perl was probably compiled with the wrong number of RANDBITS.)
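Because the result is always less than EXPR, truncating it with int() gives a whole number from 0 to EXPR-1. A sketch of the usual die-rolling idiom, with the loop count chosen arbitrarily:

```perl
# int(rand(6)) is one of 0..5, so adding 1 gives a roll of 1..6.
my @rolls = map { 1 + int(rand(6)) } 1 .. 1000;

my @out_of_range = grep { $_ < 1 || $_ > 6 } @rolls;
print scalar(@rolls), " rolls, ", scalar(@out_of_range), " out of range\n";
```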
read FILEHANDLE,SCALAR,LENGTH,OFFSET
read FILEHANDLE,SCALAR,LENGTH
Attempts to read LENGTH bytes of data into variable SCALAR from the specified
FILEHANDLE. Returns the number of bytes actually read, 0 at end of file, or undef if there was
an error. SCALAR will be grown or shrunk to the length actually read. An OFFSET may be
specified to place the read data at some other place than the beginning of the string. This call is
actually implemented in terms of stdio‘s fread(3) call. To get a true read(2) system call, see
sysread().
readdir DIRHANDLE
Returns the next directory entry for a directory opened by opendir(). If used in list context,
returns all the rest of the entries in the directory. If there are no more entries, returns an
undefined value in scalar context or a null list in list context.
If you're planning to filetest the return values out of a readdir(), you'd better prepend the
directory in question. Otherwise, because we didn't chdir() there, it would have been testing
the wrong file.
opendir(DIR, $some_dir) || die "can't opendir $some_dir: $!";
@dots = grep { /^\./ && -f "$some_dir/$_" } readdir(DIR);
closedir DIR;
readline EXPR
Reads from the filehandle whose typeglob is contained in EXPR. In scalar context, a single line
is read and returned. In list context, reads until end−of−file is reached and returns a list of lines
(however you've defined lines with $/ or $INPUT_RECORD_SEPARATOR). This is the
internal function implementing the <EXPR> operator, but you can use it directly. The <EXPR>
operator is discussed in more detail in I/O Operators in perlop.
$line = <STDIN>;
$line = readline(*STDIN);    # same thing
readlink EXPR
readlink
Returns the value of a symbolic link, if symbolic links are implemented. If not, gives a fatal
error. If there is some system error, returns the undefined value and sets $! (errno). If EXPR is
omitted, uses $_.
readpipe EXPR
EXPR is executed as a system command. The collected standard output of the command is
returned. In scalar context, it comes back as a single (potentially multi−line) string. In list
context, returns a list of lines (however you‘ve defined lines with $/ or
$INPUT_RECORD_SEPARATOR). This is the internal function implementing the qx/EXPR/
operator, but you can use it directly. The qx/EXPR/ operator is discussed in more detail in
I/O Operators in perlop.
recv SOCKET,SCALAR,LEN,FLAGS
Receives a message on a socket. Attempts to receive LEN bytes of data into variable
SCALAR from the specified SOCKET filehandle. Actually does a C recvfrom(), so that it
can return the address of the sender. Returns the undefined value if there‘s an error. SCALAR
will be grown or shrunk to the length actually read. Takes the same flags as the system call of
the same name. See UDP: Message Passing in perlipc for examples.
redo LABEL
redo
The redo command restarts the loop block without evaluating the conditional again. The
continue block, if any, is not executed. If the LABEL is omitted, the command refers to the
innermost enclosing loop. This command is normally used by programs that want to lie to
themselves about what was just input:
# a simpleminded Pascal comment stripper
# (warning: assumes no { or } in strings)
LINE: while (<STDIN>) {
    while (s|({.*}.*){.*}|$1 |) {}
    s|{.*}| |;
    if (s|{.*| |) {
        $front = $_;
        while (<STDIN>) {
            if (/}/) {    # end of comment?
                s|^|$front\{|;
                redo LINE;
            }
        }
    }
    print;
}
See also /continue for an illustration of how last, next, and redo work.
ref EXPR
ref
Returns a TRUE value if EXPR is a reference, FALSE otherwise. If EXPR is not specified, $_
will be used. The value returned depends on the type of thing the reference is a reference to.
Builtin types include:
REF
SCALAR
ARRAY
HASH
CODE
GLOB
If the referenced object has been blessed into a package, then that package name is returned
instead. You can think of ref() as a typeof() operator.
if (ref($r) eq "HASH") {
print "r is a reference to a hash.\n";
}
if (!ref($r)) {
print "r is not a reference at all.\n";
}
See also perlref.
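A common use is to dispatch on the kind of value received. The helper below, describe(), is a hypothetical name used only for this sketch, and Foo is an arbitrary package name.

```perl
sub describe {
    my $thing = shift;
    my $type  = ref($thing);      # empty string for a non-reference

    return "plain scalar"                          unless $type;
    return "array of " . @$thing                   if $type eq "ARRAY";
    return "hash with " . keys(%$thing) . " keys"  if $type eq "HASH";
    return "reference of type $type";              # blessed objects land here
}

print describe(5), "\n";
print describe([1, 2, 3]), "\n";
print describe({ a => 1 }), "\n";
print describe(bless {}, 'Foo'), "\n";
```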
rename OLDNAME,NEWNAME
Changes the name of a file. Returns 1 for success, 0 otherwise. Will not work across file
system boundaries.
require EXPR
require
Demands some semantics specified by EXPR, or by $_ if EXPR is not supplied. If EXPR is
numeric, demands that the current version of Perl ($] or $PERL_VERSION) be equal or
greater than EXPR.
Otherwise, demands that a library file be included if it hasn‘t already been included. The file is
included via the do−FILE mechanism, which is essentially just a variety of eval(). Has
semantics similar to the following subroutine:
sub require {
    my($filename) = @_;
    return 1 if $INC{$filename};
    my($realfilename,$result);
    ITER: {
        foreach $prefix (@INC) {
            $realfilename = "$prefix/$filename";
            if (-f $realfilename) {
                $result = do $realfilename;
                last ITER;
            }
        }
        die "Can't find $filename in \@INC";
    }
    die $@ if $@;
    die "$filename did not return true value" unless $result;
    $INC{$filename} = $realfilename;
    return $result;
}
Note that the file will not be included twice under the same specified name. The file must return
TRUE as the last statement to indicate successful execution of any initialization code, so it‘s
customary to end such a file with "1;" unless you‘re sure it‘ll return TRUE otherwise. But it‘s
better just to put the "1;", in case you add more statements.
If EXPR is a bareword, the require assumes a ".pm" extension and replaces "::" with "/" in the
filename for you, to make it easy to load standard modules. This form of loading of modules
does not risk altering your namespace.
In other words, if you try this:
require Foo::Bar;
# a splendid bareword
The require function will actually look for the "Foo/Bar.pm" file in the directories specified in
the @INC array.
But if you try this:
$class = 'Foo::Bar';
require $class;       # $class is not a bareword
#or
require "Foo::Bar";   # not a bareword because of the ""
The require function will look for the "Foo::Bar" file in the @INC array and will complain
about not finding "Foo::Bar" there. In this case you can do:
eval "require $class";
For a yet−more−powerful import facility, see /use and perlmod.
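As an alternative to the string eval, the bareword translation can be done by hand before calling require. This is only a sketch of that idea, not a technique the text above prescribes; File::Basename is used because it ships with Perl, and the path passed to basename() is made up.

```perl
my $class = 'File::Basename';

# Do what require does for a bareword: "::" becomes "/", plus ".pm".
(my $file = "$class.pm") =~ s{::}{/}g;
require $file;

# The module is loaded and recorded in %INC under the translated name.
my $base = File::Basename::basename("/tmp/report.txt");
print "$file loaded, basename is $base\n";
```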
reset EXPR
reset
Generally used in a continue block at the end of a loop to clear variables and reset ??
searches so that they work again. The expression is interpreted as a list of single characters
(hyphens allowed for ranges). All variables and arrays beginning with one of those letters are
reset to their pristine state. If the expression is omitted, one−match searches (?pattern?) are
reset to match again. Resets only variables or searches in the current package. Always returns 1.
Examples:
reset 'X';      # reset all X variables
reset 'a-z';    # reset lower case variables
reset;          # just reset ?? searches
Resetting "A-Z" is not recommended because you'll wipe out your @ARGV and @INC arrays
and your %ENV hash. Resets only package variables; lexical variables are unaffected, but they
clean themselves up on scope exit anyway, so you'll probably want to use them instead. See
/my.
return EXPR
return
Returns from a subroutine, eval(), or do FILE with the value given in EXPR. Evaluation of
EXPR may be in list, scalar, or void context, depending on how the return value will be used,
and the context may vary from one execution to the next (see wantarray()). If no EXPR is
given, returns an empty list in list context, an undefined value in scalar context, or nothing in a
void context.
(Note that in the absence of a return, a subroutine, eval, or do FILE will automatically return the
value of the last expression evaluated.)
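A minimal sketch of a bare return combined with wantarray(); the function find_evens() is invented for the example.

```perl
sub find_evens {
    my @evens = grep { $_ % 2 == 0 } @_;
    return unless @evens;      # empty list or undef, depending on context
    return wantarray ? @evens : scalar(@evens);
}

my @none  = find_evens(1, 3);       # ()
my $none  = find_evens(1, 3);       # undef
my @some  = find_evens(1, 2, 4);    # (2, 4)
my $count = find_evens(1, 2, 4);    # 2
```

Using a bare return rather than return undef matters in list context, where return undef would yield a one-element list that tests as true.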
reverse LIST
In list context, returns a list value consisting of the elements of LIST in the opposite order. In
scalar context, concatenates the elements of LIST, and returns a string value consisting of those
bytes, but in the opposite order.
print reverse <>;           # line tac, last line first
undef $/;                   # for efficiency of <>
print scalar reverse <>;    # byte tac, last line tsrif
This operator is also handy for inverting a hash, although there are some caveats. If a value is
duplicated in the original hash, only one of those can be represented as a key in the inverted
hash. Also, this has to unwind one hash and build a whole new one, which may take some time
on a large hash.
%by_name = reverse %by_address;
# Invert the hash
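The duplicate-value caveat can be seen in a small sketch with made-up data, where two addresses map to the same name:

```perl
my %by_address = (
    '10.0.0.1' => 'alice',
    '10.0.0.2' => 'bob',
    '10.0.0.3' => 'alice',    # duplicate value
);

my %by_name = reverse %by_address;

# Only one of alice's two addresses survives; which one is not specified.
print scalar(keys %by_name), " names\n";   # prints "2 names"
```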
rewinddir DIRHANDLE
Sets the current position to the beginning of the directory for the readdir() routine on
DIRHANDLE.
rindex STR,SUBSTR,POSITION
rindex STR,SUBSTR
Works just like index except that it returns the position of the LAST occurrence of SUBSTR in
STR. If POSITION is specified, returns the last occurrence at or before that position.
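A sketch using an arbitrary path string, showing a plain search, a search bounded by POSITION, and a failed search (which returns -1, as index() does):

```perl
my $path = "/usr/local/lib/perl";

my $last = rindex($path, "/");              # 14, the final slash
my $prev = rindex($path, "/", $last - 1);   # 10, last slash at or before 13
my $none = rindex($path, "#");              # -1, not found

my $leaf = substr($path, $last + 1);        # "perl"
print "$last $prev $none $leaf\n";
```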
rmdir FILENAME
rmdir
Deletes the directory specified by FILENAME if that directory is empty. If it succeeds it returns
TRUE, otherwise it returns FALSE and sets $! (errno). If FILENAME is omitted, uses $_.
s///
The substitution operator. See perlop.
scalar EXPR
Forces EXPR to be interpreted in scalar context and returns the value of EXPR.
@counts = ( scalar @a, scalar @b, scalar @c );
There is no equivalent operator to force an expression to be interpolated in list context because
it‘s in practice never needed. If you really wanted to do so, however, you could use the
construction @{[ (some expression) ]}, but usually a simple (some expression)
suffices.
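Both directions can be sketched briefly. The @{[ ... ]} construction is mainly useful for interpolating a list-valued expression into a string.

```perl
my @a = (1, 2, 3);

# scalar() forces the count even inside a list.
my @list = (@a, scalar(@a));                       # (1, 2, 3, 3)

# @{[ ... ]} evaluates its contents in list context.
my $string = "items: @{[ map { $_ * 2 } @a ]}";    # "items: 2 4 6"

print "@list\n$string\n";
```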
seek FILEHANDLE,POSITION,WHENCE
Sets FILEHANDLE's position, just like the fseek() call of stdio. FILEHANDLE may
be an expression whose value gives the name of the filehandle. The values for WHENCE are 0
to set the new position to POSITION, 1 to set it to the current position plus POSITION, and 2 to
set it to EOF plus POSITION (typically negative). For WHENCE you may use the constants
SEEK_SET, SEEK_CUR, and SEEK_END from either the IO::Seekable or the POSIX
module. Returns 1 upon success, 0 otherwise.
If you want to position the file for sysread() or syswrite(), don't use seek(): buffering
makes its effect on the file's system position unpredictable and non-portable. Use sysseek()
instead.
On some systems you have to do a seek whenever you switch between reading and writing.
Amongst other things, this may have the effect of calling stdio‘s clearerr(3). A WHENCE of 1
(SEEK_CUR) is useful for not moving the file position:
seek(TEST,0,1);
This is also useful for applications emulating tail -f. Once you hit EOF on your read, and
then sleep for a while, you might have to stick in a seek() to reset things. The seek()
doesn't change the current position, but it does clear the end-of-file condition on the handle, so
that the next <FILE> makes Perl try again to read something. We hope.
If that doesn't work (some stdios are particularly cantankerous), then you may need something
more like this:
for (;;) {
    for ($curpos = tell(FILE); $_ = <FILE>;
         $curpos = tell(FILE)) {
        # search for some stuff and put it into files
    }
    sleep($for_a_while);
    seek(FILE, $curpos, 0);
}
seekdir DIRHANDLE,POS
Sets the current position for the readdir() routine on DIRHANDLE. POS must be a value
returned by telldir(). Has the same caveats about possible directory compaction as the
corresponding system library routine.
select FILEHANDLE
select
Returns the currently selected filehandle. Sets the current default filehandle for output, if
FILEHANDLE is supplied. This has two effects: first, a write() or a print() without a
filehandle will default to this FILEHANDLE. Second, references to variables related to output
will refer to this output channel. For example, if you have to set the top of form format for more
than one output channel, you might do the following:
select(REPORT1);
$^ = 'report1_top';
select(REPORT2);
$^ = 'report2_top';
FILEHANDLE may be an expression whose value gives the name of the actual filehandle.
Thus:
$oldfh = select(STDERR); $| = 1; select($oldfh);
Some programmers may prefer to think of filehandles as objects with methods, preferring to
write the last example as:
use IO::Handle;
STDERR->autoflush(1);
select RBITS,WBITS,EBITS,TIMEOUT
This calls the select(2) system call with the bit masks specified, which can be constructed using
fileno() and vec(), along these lines:
$rin = $win = $ein = '';
vec($rin,fileno(STDIN),1) = 1;
vec($win,fileno(STDOUT),1) = 1;
$ein = $rin | $win;
If you want to select on many filehandles you might wish to write a subroutine:
sub fhbits {
    my(@fhlist) = split(' ',$_[0]);
    my($bits);
    for (@fhlist) {
        vec($bits,fileno($_),1) = 1;
    }
    $bits;
}
$rin = fhbits('STDIN TTY SOCK');
The usual idiom is:
($nfound,$timeleft) =
select($rout=$rin, $wout=$win, $eout=$ein, $timeout);
or to block until something becomes ready just do this
$nfound = select($rout=$rin, $wout=$win, $eout=$ein, undef);
Most systems do not bother to return anything useful in $timeleft, so calling select() in
scalar context just returns $nfound.
Any of the bit masks can also be undef. The timeout, if specified, is in seconds, which may be
fractional. Note: not all implementations are capable of returning the $timeleft. If not, they
always return $timeleft equal to the supplied $timeout.
You can effect a sleep of 250 milliseconds this way:
select(undef, undef, undef, 0.25);
WARNING: One should not attempt to mix buffered I/O (like read() or <FH>) with
select(), except as permitted by POSIX, and even then only on POSIX systems. You have to
use sysread() instead.
semctl ID,SEMNUM,CMD,ARG
Calls the System V IPC function semctl(). You‘ll probably have to say
use IPC::SysV;
first to get the correct constant definitions. If CMD is IPC_STAT or GETALL, then ARG must
be a variable which will hold the returned semid_ds structure or semaphore value array. Returns
like ioctl(): the undefined value for error, "0 but true" for zero, or the actual return value
otherwise. See also IPC::SysV and IPC::Semaphore documentation.
semget KEY,NSEMS,FLAGS
Calls the System V IPC function semget. Returns the semaphore id, or the undefined value if
there is an error. See also IPC::SysV and IPC::Semaphore documentation.
semop KEY,OPSTRING
Calls the System V IPC function semop to perform semaphore operations such as signaling and
waiting. OPSTRING must be a packed array of semop structures. Each semop structure can be
generated with pack("sss", $semnum, $semop, $semflag). The number of
semaphore operations is implied by the length of OPSTRING. Returns TRUE if successful, or
FALSE if there is an error. As an example, the following code waits on semaphore $semnum of
semaphore id $semid:
$semop = pack("sss", $semnum, -1, 0);
die "Semaphore trouble: $!\n" unless semop($semid, $semop);
To signal the semaphore, replace -1 with 1. See also IPC::SysV and
IPC::Semaphore documentation.
send SOCKET,MSG,FLAGS,TO
send SOCKET,MSG,FLAGS
Sends a message on a socket. Takes the same flags as the system call of the same name. On
unconnected sockets you must specify a destination to send TO, in which case it does a C
sendto(). Returns the number of characters sent, or the undefined value if there is an error.
See UDP: Message Passing in perlipc for examples.
setpgrp PID,PGRP
Sets the current process group for the specified PID, 0 for the current process. Will produce a
fatal error if used on a machine that doesn‘t implement setpgrp(2). If the arguments are omitted,
it defaults to 0,0. Note that the POSIX version of setpgrp() does not accept any arguments,
so only setpgrp 0,0 is portable.
setpriority WHICH,WHO,PRIORITY
Sets the current priority for a process, a process group, or a user. (See setpriority(2).) Will
produce a fatal error if used on a machine that doesn‘t implement setpriority(2).
setsockopt SOCKET,LEVEL,OPTNAME,OPTVAL
Sets the socket option requested. Returns undefined if there is an error. OPTVAL may be
specified as undef if you don‘t want to pass an argument.
shift ARRAY
shift
Shifts the first value of the array off and returns it, shortening the array by 1 and moving
everything down. If there are no elements in the array, returns the undefined value. If ARRAY
is omitted, shifts the @_ array within the lexical scope of subroutines and formats, and the
@ARGV array at file scopes or within the lexical scopes established by the eval '', BEGIN
{}, END {}, and INIT {} constructs. See also unshift(), push(), and pop().
Shift() and unshift() do the same thing to the left end of an array that pop() and
push() do to the right end.
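A brief sketch of both defaults and the pairing with unshift(). The subroutine head_and_rest() is a made-up name.

```perl
sub head_and_rest {
    my $first = shift;               # inside a subroutine, shifts @_
    return ($first, scalar(@_));     # @_ is now one element shorter
}
my ($first, $left) = head_and_rest('a', 'b', 'c');

my @queue = (2, 3);
unshift(@queue, 1);                  # (1, 2, 3)
my $front = shift(@queue);           # 1, leaving (2, 3)

print "$first $left $front @queue\n";   # prints "a 2 1 2 3"
```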
shmctl ID,CMD,ARG
Calls the System V IPC function shmctl. You‘ll probably have to say
use IPC::SysV;
first to get the correct constant definitions. If CMD is IPC_STAT, then ARG must be a variable
which will hold the returned shmid_ds structure. Returns like ioctl: the undefined value for
error, "0 but true" for zero, or the actual return value otherwise. See also IPC::SysV
documentation.
shmget KEY,SIZE,FLAGS
Calls the System V IPC function shmget. Returns the shared memory segment id, or the
undefined value if there is an error. See also IPC::SysV documentation.
shmread ID,VAR,POS,SIZE
shmwrite ID,STRING,POS,SIZE
Reads or writes the System V shared memory segment ID starting at position POS for size SIZE
by attaching to it, copying in/out, and detaching from it. When reading, VAR must be a variable
that will hold the data read. When writing, if STRING is too long, only SIZE bytes are used; if
STRING is too short, nulls are written to fill out SIZE bytes. Return TRUE if successful, or
FALSE if there is an error. See also IPC::SysV documentation.
shutdown SOCKET,HOW
Shuts down a socket connection in the manner indicated by HOW, which has the same
interpretation as in the system call of the same name.
shutdown(SOCKET, 0);    # I/we have stopped reading data
shutdown(SOCKET, 1);    # I/we have stopped writing data
shutdown(SOCKET, 2);    # I/we have stopped using this socket
This is useful with sockets when you want to tell the other side you‘re done writing but not done
reading, or vice versa. It‘s also a more insistent form of close because it also disables the
filedescriptor in any forked copies in other processes.
sin EXPR
sin
Returns the sine of EXPR (expressed in radians). If EXPR is omitted, returns sine of $_.
For the inverse sine operation, you may use the POSIX::asin() function, or use this relation:
sub asin { atan2($_[0], sqrt(1 - $_[0] * $_[0])) }
sleep EXPR
sleep
Causes the script to sleep for EXPR seconds, or forever if no EXPR. May be interrupted if the
process receives a signal such as SIGALRM. Returns the number of seconds actually slept. You
probably cannot mix alarm() and sleep() calls, because sleep() is often implemented
using alarm().
On some older systems, it may sleep up to a full second less than what you requested, depending
on how it counts seconds. Most modern systems always sleep the full amount. They may appear
to sleep longer than that, however, because your process might not be scheduled right away in a
busy multitasking system.
For delays of finer granularity than one second, you may use Perl‘s syscall() interface to
access setitimer(2) if your system supports it, or else see /select() above.
See also the POSIX module‘s sigpause() function.
socket SOCKET,DOMAIN,TYPE,PROTOCOL
Opens a socket of the specified kind and attaches it to filehandle SOCKET. DOMAIN, TYPE,
and PROTOCOL are specified the same as for the system call of the same name. You should
"use Socket;" first to get the proper definitions imported. See the example in
Sockets: Client/Server Communication in perlipc.
socketpair SOCKET1,SOCKET2,DOMAIN,TYPE,PROTOCOL
Creates an unnamed pair of sockets in the specified domain, of the specified type. DOMAIN,
TYPE, and PROTOCOL are specified the same as for the system call of the same name. If
unimplemented, yields a fatal error. Returns TRUE if successful.
Some systems defined pipe() in terms of socketpair(), in which a call to pipe(Rdr,
Wtr) is essentially:
use Socket;
socketpair(Rdr, Wtr, AF_UNIX, SOCK_STREAM, PF_UNSPEC);
shutdown(Rdr, 1);    # no more writing for reader
shutdown(Wtr, 0);    # no more reading for writer
See perlipc for an example of socketpair use.
sort SUBNAME LIST
sort BLOCK LIST
sort LIST
Sorts the LIST and returns the sorted list value. If SUBNAME or BLOCK is omitted, sort()s
in standard string comparison order. If SUBNAME is specified, it gives the name of a
subroutine that returns an integer less than, equal to, or greater than 0, depending on how the
elements of the array are to be ordered. (The <=> and cmp operators are extremely useful in
such routines.) SUBNAME may be a scalar variable name (unsubscripted), in which case the
value provides the name of (or a reference to) the actual subroutine to use. In place of a
SUBNAME, you can provide a BLOCK as an anonymous, in−line sort subroutine.
In the interests of efficiency the normal calling code for subroutines is bypassed, with the
following effects: the subroutine may not be a recursive subroutine, and the two elements to be
compared are passed into the subroutine not via @_ but as the package global variables $a and
$b (see example below). They are passed by reference, so don‘t modify $a and $b. And don‘t
try to declare them as lexicals either.
You also cannot exit out of the sort block or subroutine using any of the loop control operators
described in perlsyn or with goto().
When use locale is in effect, sort LIST sorts LIST according to the current collation
locale. See perllocale.
Examples:
# sort lexically
@articles = sort @files;
# same thing, but with explicit sort routine
@articles = sort {$a cmp $b} @files;
# now case−insensitively
@articles = sort {uc($a) cmp uc($b)} @files;
# same thing in reversed order
@articles = sort {$b cmp $a} @files;
# sort numerically ascending
@articles = sort {$a <=> $b} @files;
# sort numerically descending
@articles = sort {$b <=> $a} @files;
# sort using explicit subroutine name
sub byage {
$age{$a} <=> $age{$b}; # presuming numeric
}
@sortedclass = sort byage @class;
# this sorts the %age hash by value instead of key
# using an in−line function
@eldest = sort { $age{$b} <=> $age{$a} } keys %age;
sub backwards { $b cmp $a; }
@harry = ('dog','cat','x','Cain','Abel');
@george = ('gone','chased','yz','Punished','Axed');
print sort @harry;
# prints AbelCaincatdogx
print sort backwards @harry;
# prints xdogcatCainAbel
print sort @george, 'to', @harry;
# prints AbelAxedCainPunishedcatchaseddoggonetoxyz
# inefficiently sort by descending numeric compare using
# the first integer after the first = sign, or the
# whole record case−insensitively otherwise
@new = sort {
($b =~ /=(\d+)/)[0] <=> ($a =~ /=(\d+)/)[0]
||
uc($a) cmp uc($b)
} @old;
# same thing, but much more efficiently;
# we’ll build auxiliary indices instead
# for speed
@nums = @caps = ();
for (@old) {
push @nums, /=(\d+)/;
push @caps, uc($_);
}
@new = @old[ sort {
$nums[$b] <=> $nums[$a]
||
$caps[$a] cmp $caps[$b]
} 0..$#old
];
# same thing using a Schwartzian Transform (no temps)
@new = map { $_->[0] }
    sort { $b->[1] <=> $a->[1]
                    ||
           $a->[2] cmp $b->[2]
    } map { [$_, /=(\d+)/, uc($_)] } @old;
If you‘re using strict, you MUST NOT declare $a and $b as lexicals. They are package globals.
That means if you‘re in the main package, it‘s
@articles = sort {$main::b <=> $main::a} @files;
or just
@articles = sort {$::b <=> $::a} @files;
but if you‘re in the FooPack package, it‘s
@articles = sort {$FooPack::b <=> $FooPack::a} @files;
The comparison function is required to behave. If it returns inconsistent results (sometimes
saying $x[1] is less than $x[2] and sometimes saying the opposite, for example) the results
are not well−defined.
splice ARRAY,OFFSET,LENGTH,LIST
splice ARRAY,OFFSET,LENGTH
splice ARRAY,OFFSET
Removes the elements designated by OFFSET and LENGTH from an array, and replaces them
with the elements of LIST, if any. In list context, returns the elements removed from the array.
In scalar context, returns the last element removed, or undef if no elements are removed. The
array grows or shrinks as necessary. If OFFSET is negative then it starts that far from the end of
the array. If LENGTH is omitted, removes everything from OFFSET onward. If LENGTH is
negative, leaves that many elements off the end of the array. The following equivalences hold
(assuming $[ == 0):
push(@a,$x,$y)        splice(@a,@a,0,$x,$y)
pop(@a)               splice(@a,-1)
shift(@a)             splice(@a,0,1)
unshift(@a,$x,$y)     splice(@a,0,0,$x,$y)
$a[$x] = $y           splice(@a,$x,1,$y)
Example, assuming array lengths are passed before arrays:
sub aeq {       # compare two list values
    my(@a) = splice(@_,0,shift);
    my(@b) = splice(@_,0,shift);
    return 0 unless @a == @b;       # same len?
    while (@a) {
        return 0 if pop(@a) ne pop(@b);
    }
    return 1;
}
if (&aeq($len,@foo[1..$len],0+@bar,@bar)) { ... }
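The OFFSET and LENGTH rules for splice can be checked with a short sketch. The array contents here are made up for illustration, and the script assumes a reasonably recent perl (lexical "my" variables with strict and warnings).

```perl
use strict;
use warnings;

my @a = (10, 20, 30, 40, 50);

# Negative OFFSET counts from the end: remove the last two elements.
my @tail = splice(@a, -2);              # @a is now (10, 20, 30)

# Replace one element with two; the array grows as needed.
my @old = splice(@a, 1, 1, 'x', 'y');   # @a is now (10, 'x', 'y', 30)

# In scalar context splice returns the last element removed.
my $last = splice(@a, 0, 2);            # removes 10 and 'x'

# Negative LENGTH leaves that many elements at the end untouched.
my @b   = (1 .. 5);
my @mid = splice(@b, 1, -1);            # removes 2, 3, 4

print "@tail | @old | $last | @a | @mid | @b\n";
```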
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
split
Splits a string into an array of strings, and returns it. By default, empty leading fields are
preserved, and empty trailing ones are deleted.
If not in list context, returns the number of fields found and splits into the @_ array. (In list
context, you can force the split into @_ by using ?? as the pattern delimiters, but it still returns
the list value.) The use of implicit split to @_ is deprecated, however, because it clobbers your
subroutine arguments.
If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on whitespace (after
skipping any leading whitespace). Anything matching PATTERN is taken to be a delimiter
separating the fields. (Note that the delimiter may be longer than one character.)
If LIMIT is specified and positive, splits into no more than that many fields (though it may split
into fewer). If LIMIT is unspecified or zero, trailing null fields are stripped (which potential
users of pop() would do well to remember). If LIMIT is negative, it is treated as if an
arbitrarily large LIMIT had been specified.
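As a rough illustration of the three LIMIT cases, the following sketch splits one record three ways. The record string is invented for the example.

```perl
use strict;
use warnings;

my $rec = "a:b:c:::";                 # three trailing empty fields

my @default = split(/:/, $rec);       # trailing empty fields stripped
my @limited = split(/:/, $rec, 2);    # at most two fields
my @all     = split(/:/, $rec, -1);   # trailing empty fields kept

print scalar(@default), " ", scalar(@limited), " ", scalar(@all), "\n";
print "second limited field: $limited[1]\n";
```

With a positive LIMIT the last field simply holds the unsplit remainder, which is why $limited[1] still contains colons.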
A pattern matching the null string (not to be confused with a null pattern //, which is just one
member of the set of patterns matching a null string) will split the value of EXPR into separate
characters at each point it matches that way. For example:
print join(':', split(/ */, 'hi there'));
produces the output 'h:i:t:h:e:r:e'.
The LIMIT parameter can be used to split a line partially:
($login, $passwd, $remainder) = split(/:/, $_, 3);
When assigning to a list, if LIMIT is omitted, Perl supplies a LIMIT one larger than the number
of variables in the list, to avoid unnecessary work. For the list above LIMIT would have been 4
by default. In time critical applications it behooves you not to split into more fields than you
really need.
If the PATTERN contains parentheses, additional array elements are created from each matching
substring in the delimiter.
split(/([,-])/, "1-10,20", 3);
produces the list value
(1, '-', 10, ',', 20)
If you had the entire header of a normal Unix email message in $header, you could split it up
into fields and their values this way:
$header =~ s/\n\s+/ /g;  # fix continuation lines
%hdrs = (UNIX_FROM => split /^(\S*?):\s*/m, $header);
The pattern /PATTERN/ may be replaced with an expression to specify patterns that vary at
runtime. (To do runtime compilation only once, use /$variable/o.)
As a special case, specifying a PATTERN of space (' ') will split on white space just as
split() with no arguments does. Thus, split(' ') can be used to emulate awk's default
behavior, whereas split(/ /) will give you as many null initial fields as there are leading
spaces. A split() on /\s+/ is like a split(' ') except that any leading whitespace
produces a null first field. A split() with no arguments really does a split(' ', $_)
internally.
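The difference between the three whitespace patterns is easiest to see side by side. This is only a sketch; the input line is invented.

```perl
use strict;
use warnings;

my $line = "  alpha beta  gamma";     # two leading spaces, one double space

my @awk    = split(' ',   $line);     # awk-style: leading whitespace skipped
my @single = split(/ /,   $line);     # every single space is a delimiter
my @ws     = split(/\s+/, $line);     # leading run yields one empty field

printf "%d %d %d\n", scalar(@awk), scalar(@single), scalar(@ws);
```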
Example:
open(PASSWD, '/etc/passwd');
while (<PASSWD>) {
    ($login, $passwd, $uid, $gid,
     $gcos, $home, $shell) = split(/:/);
    #...
}
(Note that $shell above will still have a newline on it. See chop, chomp, and join.)
sprintf FORMAT, LIST
Returns a string formatted by the usual printf() conventions of the C library function
sprintf(). See sprintf(3) or printf(3) on your system for an explanation of the general
principles.
Perl does its own sprintf() formatting — it emulates the C function sprintf(), but it
doesn‘t use it (except for floating−point numbers, and even then only the standard modifiers are
allowed). As a result, any non−standard extensions in your local sprintf() are not available
from Perl.
Perl‘s sprintf() permits the following universally−known conversions:
%%   a percent sign
%c   a character with the given number
%s   a string
%d   a signed integer, in decimal
%u   an unsigned integer, in decimal
%o   an unsigned integer, in octal
%x   an unsigned integer, in hexadecimal
%e   a floating-point number, in scientific notation
%f   a floating-point number, in fixed decimal notation
%g   a floating-point number, in %e or %f notation
In addition, Perl permits the following widely−supported conversions:
%X   like %x, but using upper-case letters
%E   like %e, but using an upper-case "E"
%G   like %g, but with an upper-case "E" (if applicable)
%p   a pointer (outputs the Perl value's address in hexadecimal)
%n   special: *stores* the number of characters output so far
     into the next variable in the parameter list
Finally, for backward (and we do mean "backward") compatibility, Perl permits these
unnecessary but widely−supported conversions:
%i   a synonym for %d
%D   a synonym for %ld
%U   a synonym for %lu
%O   a synonym for %lo
%F   a synonym for %f
Perl permits the following universally−known flags between the % and the conversion letter:
space    prefix positive number with a space
+        prefix positive number with a plus sign
-        left-justify within the field
0        use zeros, not spaces, to right-justify
#        prefix non-zero octal with "0", non-zero hex with "0x"
number   minimum field width
.number  "precision": digits after decimal point for
         floating-point, max length for string, minimum length
         for integer
l        interpret integer as C type "long" or "unsigned long"
h        interpret integer as C type "short" or "unsigned short"
There is also one Perl−specific flag:
V        interpret integer as Perl's standard integer type
Where a number would appear in the flags, an asterisk ("*") may be used instead, in which case
Perl uses the next item in the parameter list as the given number (that is, as the field width or
precision). If a field width obtained through "*" is negative, it has the same effect as the "−" flag:
left−justification.
If use locale is in effect, the character used for the decimal point in formatted real numbers
is affected by the LC_NUMERIC locale. See perllocale.
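A few of the flags described above, shown in a small sketch. The values are arbitrary, and no locale is in effect, so the decimal point is ".".

```perl
use strict;
use warnings;

my $padded  = sprintf("%05d", 42);         # zero-padded to width 5
my $left    = sprintf("%-6s|", "ab");      # left-justified in width 6
my $signed  = sprintf("%+d", 7);           # explicit plus sign
my $hex     = sprintf("%#x", 255);         # "0x" prefix
my $fixed   = sprintf("%.2f", 3.14159);    # two digits after the point
my $trunc   = sprintf("%.3s", "abcdef");   # precision caps string length
my $star    = sprintf("%*d", 6, 42);       # width taken from the argument list
my $negstar = sprintf("%*d|", -6, 42);     # negative width left-justifies

print join("\n", $padded, $left, $signed, $hex,
                 $fixed, $trunc, $star, $negstar), "\n";
```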
sqrt EXPR
sqrt
Returns the square root of EXPR. If EXPR is omitted, returns square root of $_.
srand EXPR
srand
Sets the random number seed for the rand() operator. If EXPR is omitted, uses a
semi−random value based on the current time and process ID, among other things. In versions
of Perl prior to 5.004 the default seed was just the current time(). This isn‘t a particularly
good seed, so many old programs supply their own seed value (often time ^ $$ or time ^
($$ + ($$ << 15))), but that isn‘t necessary any more.
In fact, it's usually not necessary to call srand() at all, because if it is not called explicitly, it is
called implicitly at the first use of the rand() operator. However, this was not the case in
versions of Perl before 5.004, so if your script will run under older Perl versions, it should call
srand().
Note that you need something much more random than the default seed for cryptographic
purposes. Checksumming the compressed output of one or more rapidly changing operating
system status programs is the usual method. For example:
srand (time ^ $$ ^ unpack "%L*", `ps axww | gzip`);
If you‘re particularly concerned with this, see the Math::TrulyRandom module in CPAN.
Do not call srand() multiple times in your program unless you know exactly what you‘re
doing and why you‘re doing it. The point of the function is to "seed" the rand() function so
that rand() can produce a different sequence each time you run your program. Just do it once
at the top of your program, or you won‘t get random numbers out of rand()!
Frequently called programs (like CGI scripts) that simply use
time ^ $$
for a seed can fall prey to the mathematical property that
a^b == (a+1)^(b+1)
one−third of the time. So don‘t do that.
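Two of the points above can be tried out directly. The sketch below first shows that re-seeding with the same value restarts the same sequence (the seed 42 is an arbitrary choice for illustration), and then counts how often the XOR property holds over all pairs of 8-bit values, which should come out close to one third.

```perl
use strict;
use warnings;

srand(42);                                   # arbitrary fixed seed
my @first  = map { int(rand(100)) } 1 .. 5;
srand(42);                                   # re-seeding restarts the sequence
my @second = map { int(rand(100)) } 1 .. 5;
print "re-seeding repeats the sequence\n" if "@first" eq "@second";

# Count how often x^y equals (x+1)^(y+1) over all 8-bit pairs.
my ($equal, $total) = (0, 0);
for my $x (0 .. 255) {
    for my $y (0 .. 255) {
        $total++;
        $equal++ if ($x ^ $y) == (($x + 1) ^ ($y + 1));
    }
}
printf "fraction of collisions: %.3f\n", $equal / $total;
```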
stat FILEHANDLE
stat EXPR
stat
Returns a 13−element list giving the status info for a file, either the file opened via
FILEHANDLE, or named by EXPR. If EXPR is omitted, it stats $_. Returns a null list if the
stat fails. Typically used as follows:
($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
$atime,$mtime,$ctime,$blksize,$blocks)
= stat($filename);
Not all fields are supported on all filesystem types. Here are the meanings of the fields:
 0 dev      device number of filesystem
 1 ino      inode number
 2 mode     file mode (type and permissions)
 3 nlink    number of (hard) links to the file
 4 uid      numeric user ID of file's owner
 5 gid      numeric group ID of file's owner
 6 rdev     the device identifier (special files only)
 7 size     total size of file, in bytes
 8 atime    last access time since the epoch
 9 mtime    last modify time since the epoch
10 ctime    inode change time (NOT creation time!) since the epoch
11 blksize  preferred block size for file system I/O
12 blocks   actual number of blocks allocated
(The epoch was at 00:00 January 1, 1970 GMT.)
If stat is passed the special filehandle consisting of an underline, no stat is done, but the current
contents of the stat structure from the last stat or filetest are returned. Example:
if (-x $file && (($d) = stat(_)) && $d < 0) {
    print "$file is executable NFS file\n";
}
(This works only on machines for which the device number is negative under NFS.)
In scalar context, stat() returns a boolean value indicating success or failure, and, if
successful, sets the information associated with the special filehandle _.
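Here is a small sketch of stat() in list context and of reusing the stat buffer through the _ filehandle. It assumes the File::Temp module, which ships with later Perl releases than the one documented here, and the file contents are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);      # assumption: available in your perl

# Create a scratch file with known contents.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);                     # keep the byte count the same everywhere
print {$fh} "hello\n";
close($fh) or die "close failed: $!";

my @st = stat($path);
@st or die "stat failed: $!";
my ($size, $mtime) = @st[7, 9];

# A filetest fills the same stat buffer; stat(_) reuses it without
# another system call.
my $reused_size = (-e $path) ? (stat(_))[7] : undef;

print "fields: ", scalar(@st), ", size: $size\n";
```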
study SCALAR
study
Takes extra time to study SCALAR ($_ if unspecified) in anticipation of doing many pattern
matches on the string before it is next modified. This may or may not save time, depending on
the nature and number of patterns you are searching on, and on the distribution of character
frequencies in the string to be searched — you probably want to compare run times with and
without it to see which runs faster. Those loops which scan for many short constant strings
(including the constant parts of more complex patterns) will benefit most. You may have only
one study() active at a time — if you study a different scalar the first is "unstudied". (The
way study() works is this: a linked list of every character in the string to be searched is made,
so we know, for example, where all the ‘k’ characters are. From each search string, the rarest
character is selected, based on some static frequency tables constructed from some C programs
and English text. Only those places that contain this "rarest" character are examined.)
For example, here is a loop that inserts index producing entries before any line containing a
certain pattern:
while (<>) {
study;
print ".IX foo\n" if /\bfoo\b/;
print ".IX bar\n" if /\bbar\b/;
print ".IX blurfl\n" if /\bblurfl\b/;
# ...
print;
}
In searching for /\bfoo\b/, only those locations in $_ that contain "f" will be looked at,
because "f" is rarer than "o". In general, this is a big win except in pathological cases. The
only question is whether it saves you more time than it took to build the linked list in the first
place.
Note that if you have to look for strings that you don‘t know till runtime, you can build an entire
loop as a string and eval() that to avoid recompiling all your patterns all the time. Together
with undefining $/ to input entire files as one record, this can be very fast, often faster than
specialized programs like fgrep(1). The following scans a list of files (@files) for a list of
words (@words), and prints out the names of those files that contain a match:
$search = 'while (<>) { study;';
foreach $word (@words) {
    $search .= "++\$seen{\$ARGV} if /\\b$word\\b/;\n";
}
$search .= "}";
@ARGV = @files;
undef $/;
eval $search;        # this screams
$/ = "\n";           # put back to normal input delimiter
foreach $file (sort keys(%seen)) {
    print $file, "\n";
}
sub BLOCK
sub NAME
sub NAME BLOCK
This is a subroutine definition, not a real function per se. With just a NAME (and possibly
prototypes), it's just a forward declaration. Without a NAME, it's an anonymous function
declaration, and does actually return a value: the CODE ref of the closure you just created. See
perlsub and perlref for details.
substr EXPR,OFFSET,LEN,REPLACEMENT
substr EXPR,OFFSET,LEN
substr EXPR,OFFSET
Extracts a substring out of EXPR and returns it. First character is at offset 0, or whatever you've
set $[ to (but don't do that). If OFFSET is negative (or more precisely, less than $[), starts
that far from the end of the string. If LEN is omitted, returns everything to the end of the string.
If LEN is negative, leaves that many characters off the end of the string.
If you specify a substring that is partly outside the string, the part within the string is returned.
If the substring is totally outside the string a warning is produced.
You can use the substr() function as an lvalue, in which case EXPR must be an lvalue. If
you assign something shorter than LEN, the string will shrink, and if you assign something
longer than LEN, the string will grow to accommodate it. To keep the string the same length you
may need to pad or chop your value using sprintf().
An alternative to using substr() as an lvalue is to specify the replacement string as the 4th
argument. This allows you to replace parts of the EXPR and return what was there before in one
operation.
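The offset, length, lvalue, and four-argument forms can be compared in one short sketch. The sample string is arbitrary.

```perl
use strict;
use warnings;

my $s = "The quick brown fox";

my $word  = substr($s, 4, 5);      # five characters starting at offset 4
my $tail  = substr($s, -3);        # negative OFFSET counts from the end
my $inner = substr($s, 4, -4);     # negative LEN leaves four characters off

substr($s, 4, 5) = "slow";         # lvalue form; the string shrinks by one
my $after_lvalue = $s;

my $was = substr($s, 0, 3, "A");   # 4-argument form returns what was there

print "$word | $tail | $inner | $after_lvalue | $was | $s\n";
```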
symlink OLDFILE,NEWFILE
Creates a new filename symbolically linked to the old filename. Returns 1 for success, 0
otherwise. On systems that don't support symbolic links, produces a fatal error at run time. To
check for that, use eval:
$symlink_exists = eval { symlink("",""); 1 };
syscall LIST
Calls the system call specified as the first element of the list, passing the remaining elements as
arguments to the system call. If unimplemented, produces a fatal error. The arguments are
interpreted as follows: if a given argument is numeric, the argument is passed as an int. If not,
the pointer to the string value is passed. You are responsible to make sure a string is
pre−extended long enough to receive any result that might be written into a string. You can‘t use
a string literal (or other read−only string) as an argument to syscall() because Perl has to
assume that any string pointer might be written through. If your integer arguments are not
literals and have never been interpreted in a numeric context, you may need to add 0 to them to
force them to look like numbers. This emulates the syswrite() function (or vice versa):
require 'syscall.ph';        # may need to run h2ph
$s = "hi there\n";
syscall(&SYS_write, fileno(STDOUT), $s, length $s);
Note that Perl supports passing of up to only 14 arguments to your system call, which in practice
should usually suffice.
syscall() returns whatever value is returned by the system call it calls. If the system call fails,
syscall() returns -1 and sets $! (errno). Note that some system calls can legitimately return
-1. The proper way to handle such calls is to assign $!=0; before the call and check the value
of $! if syscall() returns -1.
There‘s a problem with syscall(&SYS_pipe): it returns the file number of the read end of
the pipe it creates. There is no way to retrieve the file number of the other end. You can avoid
this problem by using pipe() instead.
sysopen FILEHANDLE,FILENAME,MODE
sysopen FILEHANDLE,FILENAME,MODE,PERMS
Opens the file whose filename is given by FILENAME, and associates it with FILEHANDLE.
If FILEHANDLE is an expression, its value is used as the name of the real filehandle wanted.
This function calls the underlying operating system‘s open() function with the parameters
FILENAME, MODE, PERMS.
The possible values and flag bits of the MODE parameter are system−dependent; they are
available via the standard module Fcntl. For historical reasons, some values work on almost
every system supported by perl: zero means read−only, one means write−only, and two means
read/write. We know that these values do not work under OS/390 Unix and on the Macintosh;
you probably don‘t want to use them in new code.
If the file named by FILENAME does not exist and the open() call creates it (typically because
MODE includes the O_CREAT flag), then the value of PERMS specifies the permissions of the
newly created file. If you omit the PERMS argument to sysopen(), Perl uses the octal value
0666. These permission values need to be in octal, and are modified by your process‘s current
umask. The umask value is a number representing disabled permissions bits—if your umask
were 027 (group can‘t write; others can‘t read, write, or execute), then passing sysopen()
0666 would create a file with mode 0640 (0666 &~ 027 is 0640).
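The umask arithmetic can be worked through in a few lines. This sketch only computes the resulting modes; it does not create any files, and the list of masks is just the set mentioned in this entry.

```perl
use strict;
use warnings;

my $requested = 0666;                      # the PERMS passed to sysopen()
my $umask     = 027;                       # the example umask from the text
my $effective = $requested & ~$umask;      # bits set in the umask are cleared

printf "%04o & ~%04o = %04o\n", $requested, $umask, $effective;

# The same request under a few other umasks.
my %mode_for = map { $_ => ($requested & ~$_) } (022, 027, 077);
printf "umask %04o gives %04o\n", $_, $mode_for{$_} for sort { $a <=> $b } keys %mode_for;
```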
If you find this umask() talk confusing, here‘s some advice: supply a creation mode of 0666
for regular files and one of 0777 for directories (in mkdir()) and executable files. This gives
users the freedom of choice: if they want protected files, they might choose process umasks of
022, 027, or even the particularly antisocial mask of 077. Programs should rarely if ever make
policy decisions better left to the user. The exception to this is when writing files that should be
kept private: mail files, web browser cookies, .rhosts files, and so on. In short, seldom if ever
use 0644 as argument to sysopen() because that takes away the user‘s option to have a more
permissive umask. Better to omit it.
The IO::File module provides a more object−oriented approach, if you‘re into that kind of
thing.
sysread FILEHANDLE,SCALAR,LENGTH,OFFSET
sysread FILEHANDLE,SCALAR,LENGTH
Attempts to read LENGTH bytes of data into variable SCALAR from the specified
FILEHANDLE, using the system call read(2). It bypasses stdio, so mixing this with other kinds
of reads, print(), write(), seek(), or tell() can cause confusion because stdio usually
buffers data. Returns the number of bytes actually read, 0 at end of file, or undef if there was an
error. SCALAR will be grown or shrunk so that the last byte actually read is the last byte of the
scalar after the read.
An OFFSET may be specified to place the read data at some place in the string other than the
beginning. A negative OFFSET specifies placement at that many bytes counting backwards
from the end of the string. A positive OFFSET greater than the length of SCALAR results in the
string being padded to the required size with "\0" bytes before the result of the read is
appended.
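The return values and the OFFSET argument can be seen in a sketch like the following. It assumes the Fcntl and File::Temp modules and lexical filehandles, all of which need a newer perl than the one documented here; the file name and data are invented.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_TRUNC SEEK_SET);
use File::Temp qw(tempdir);               # assumption: available in your perl

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.dat";               # hypothetical file name

sysopen(my $fh, $path, O_RDWR | O_CREAT | O_TRUNC, 0666)
    or die "sysopen failed: $!";

my $written = syswrite($fh, "abcdef", 6);
defined $written or die "syswrite failed: $!";

sysseek($fh, 0, SEEK_SET) or die "sysseek failed: $!";

my $buf = "12";
# OFFSET 2 places the data after the two bytes already in $buf.
my $got = sysread($fh, $buf, 4, 2);
defined $got or die "sysread failed: $!";
my $after_first = $buf;

my $rest = sysread($fh, $buf, 100);       # fewer bytes than asked for
my $eof  = sysread($fh, $buf, 100);       # 0 at end of file, not undef

close($fh);
print "$got $after_first $rest $eof\n";
```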
sysseek FILEHANDLE,POSITION,WHENCE
Sets FILEHANDLE‘s system position using the system call lseek(2). It bypasses stdio, so
mixing this with reads (other than sysread()), print(), write(), seek(), or tell()
may cause confusion. FILEHANDLE may be an expression whose value gives the name of the
filehandle. The values for WHENCE are 0 to set the new position to POSITION, 1 to set it
to the current position plus POSITION, and 2 to set it to EOF plus POSITION (typically
negative). For WHENCE, you may use the constants SEEK_SET, SEEK_CUR, and SEEK_END
from either the IO::Seekable or the POSIX module.
Returns the new position, or the undefined value on failure. A position of zero is returned as the
string "0 but true"; thus sysseek() returns TRUE on success and FALSE on failure, yet you
can still easily determine the new position.
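A sketch of the return value convention follows. As before, Fcntl, File::Temp, and the lexical filehandle are assumptions about your perl, and the file name is hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_TRUNC SEEK_SET SEEK_END);
use File::Temp qw(tempdir);               # assumption: available in your perl

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/seek.dat";               # hypothetical file name

sysopen(my $fh, $path, O_RDWR | O_CREAT | O_TRUNC, 0666)
    or die "sysopen failed: $!";
defined syswrite($fh, "abcdef", 6) or die "syswrite failed: $!";

my $start = sysseek($fh, 0, SEEK_SET);    # position zero, yet a true value
my $end   = sysseek($fh, 0, SEEK_END);    # the file size in bytes

print "seek to start succeeded\n" if $start;
printf "start=%d end=%d\n", $start, $end;
close($fh);
```

Because position zero comes back as a true string that is numerically zero, a plain boolean test on the result is enough to detect failure.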
system LIST
system PROGRAM LIST
Does exactly the same thing as "exec LIST" except that a fork is done first, and the parent
process waits for the child process to complete. Note that argument processing varies depending
on the number of arguments. If there is more than one argument in LIST, or if LIST is an array
with more than one value, starts the program given by the first element of the list with arguments
given by the rest of the list. If there is only one scalar argument, the argument is checked for
shell metacharacters, and if there are any, the entire argument is passed to the system‘s command
shell for parsing (this is /bin/sh −c on Unix platforms, but varies on other platforms). If
there are no shell metacharacters in the argument, it is split into words and passed directly to
execvp(), which is more efficient.
The return value is the exit status of the program as returned by the wait() call. To get the
actual exit value divide by 256. See also exec. This is NOT what you want to use to capture the
output from a command; for that you should use backticks or qx//, as described in
`STRING` in perlop.
Like exec(), system() allows you to lie to a program about its name if you use the
"system PROGRAM LIST" syntax. Again, see exec.
Because system() and backticks block SIGINT and SIGQUIT, killing the program they're
running doesn't actually interrupt your program.
@args = ("command", "arg1", "arg2");
system(@args) == 0
    or die "system @args failed: $?";
You can check all the failure possibilities by inspecting $? like this:
$exit_value = $? >> 8;
$signal_num = $? & 127;
$dumped_core = $? & 128;
When the arguments get executed via the system shell, results and return codes will be subject to
its quirks and capabilities. See `STRING` in perlop and exec for details.
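To see the $? decoding at work, one can run a child whose exit status is known in advance. This sketch uses $^X (the path of the running perl) as the child program, which is merely a convenient assumption; any command with a predictable exit status would do.

```perl
use strict;
use warnings;

# List form: no shell is involved, the child is started directly.
my @args = ($^X, '-e', 'exit 3');
system(@args);

my $exit_value  = $? >> 8;       # high byte: the child's exit status
my $signal_num  = $? & 127;      # low seven bits: signal, if any
my $dumped_core = $? & 128;      # core-dump flag

print "exit=$exit_value signal=$signal_num core=$dumped_core\n";
```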
syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET
syswrite FILEHANDLE,SCALAR,LENGTH
Attempts to write LENGTH bytes of data from variable SCALAR to the specified
FILEHANDLE, using the system call write(2). It bypasses stdio, so mixing this with reads
(other than sysread()), print(), write(), seek(), or tell() may cause confusion
because stdio usually buffers data. Returns the number of bytes actually written, or undef if
there was an error. If the LENGTH is greater than the available data in the SCALAR after the
OFFSET, only as much data as is available will be written.
An OFFSET may be specified to write the data from some part of the string other than the
beginning. A negative OFFSET specifies writing that many bytes counting backwards from the
end of the string. If SCALAR is empty, the only OFFSET you can use is zero.
tell FILEHANDLE
tell
Returns the current position for FILEHANDLE. FILEHANDLE may be an expression whose
value gives the name of the actual filehandle. If FILEHANDLE is omitted, assumes the file last
read.
telldir DIRHANDLE
Returns the current position of the readdir() routines on DIRHANDLE. Value may be given
to seekdir() to access a particular location in a directory. Has the same caveats about
possible directory compaction as the corresponding system library routine.
tie VARIABLE,CLASSNAME,LIST
This function binds a variable to a package class that will provide the implementation for the
variable. VARIABLE is the name of the variable to be enchanted. CLASSNAME is the name
of a class implementing objects of correct type. Any additional arguments are passed to the
"new()" method of the class (meaning TIESCALAR, TIEARRAY, or TIEHASH). Typically
these are arguments such as might be passed to the dbm_open() function of C. The object
returned by the "new()" method is also returned by the tie() function, which would be
useful if you want to access other methods in CLASSNAME.
Note that functions such as keys() and values() may return huge lists when used on large
objects, like DBM files. You may prefer to use the each() function to iterate over such.
Example:
# print out history file offsets
use NDBM_File;
tie(%HIST, 'NDBM_File', '/usr/lib/news/history', 1, 0);
while (($key,$val) = each %HIST) {
    print $key, ' = ', unpack('L',$val), "\n";
}
untie(%HIST);
A class implementing a hash should have the following methods:
TIEHASH classname, LIST
DESTROY this
FETCH this, key
STORE this, key, value
DELETE this, key
EXISTS this, key
FIRSTKEY this
NEXTKEY this, lastkey
A class implementing an ordinary array should have the following methods:
TIEARRAY classname, LIST
DESTROY this
FETCH this, key
STORE this, key, value
[others TBD]
A class implementing a scalar should have the following methods:
TIESCALAR classname, LIST
DESTROY this
FETCH this
STORE this, value
Unlike dbmopen(), the tie() function will not use or require a module for you—you need to
do that explicitly yourself. See DB_File or the Config module for interesting tie()
implementations.
For further details see perltie, tied VARIABLE.
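As a minimal sketch of the scalar interface listed above, here is a class that upper-cases whatever is stored in the tied variable. The class name UpperCase is hypothetical, and the script assumes a perl recent enough for "tie my $var".

```perl
use strict;
use warnings;

package UpperCase;               # hypothetical class, for illustration only

sub TIESCALAR {
    my ($class, $init) = @_;
    my $value = uc $init;
    return bless \$value, $class;
}
sub FETCH   { my ($self) = @_; return $$self; }
sub STORE   { my ($self, $value) = @_; $$self = uc $value; }
sub DESTROY { }

package main;

# tie() returns the object created by TIESCALAR.
my $obj = tie my $shout, 'UpperCase', 'hello';

my $first = $shout;              # goes through FETCH
$shout = 'quiet';                # goes through STORE
my $second = $shout;

my $same = (tied($shout) == $obj) ? 1 : 0;   # tied() returns the same object

undef $obj;                      # drop our reference before untying
untie $shout;

print "$first $second $same\n";
```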
tied VARIABLE
Returns a reference to the object underlying VARIABLE (the same value that was originally
returned by the tie() call that bound the variable to a package.) Returns the undefined value if
VARIABLE isn‘t tied to a package.
time
Returns the number of non−leap seconds since whatever time the system considers to be the
epoch (that‘s 00:00:00, January 1, 1904 for MacOS, and 00:00:00 UTC, January 1, 1970 for
most other systems). Suitable for feeding to gmtime() and localtime().
times
Returns a four−element list giving the user and system times, in seconds, for this process and the
children of this process.
($user,$system,$cuser,$csystem) = times;
tr///
The transliteration operator. Same as y///. See perlop.
truncate FILEHANDLE,LENGTH
truncate EXPR,LENGTH
Truncates the file opened on FILEHANDLE, or named by EXPR, to the specified length.
Produces a fatal error if truncate isn‘t implemented on your system. Returns TRUE if successful,
the undefined value otherwise.
uc EXPR
uc
Returns an uppercased version of EXPR. This is the internal function implementing the \U
escape in double−quoted strings. Respects current LC_CTYPE locale if use locale in force.
See perllocale.
If EXPR is omitted, uses $_.
ucfirst EXPR
ucfirst
Returns the value of EXPR with the first character uppercased. This is the internal function
implementing the \u escape in double−quoted strings. Respects current LC_CTYPE locale if
use locale in force. See perllocale.
If EXPR is omitted, uses $_.
umask EXPR
umask
Sets the umask for the process to EXPR and returns the previous value. If EXPR is omitted,
merely returns the current umask.
If umask(2) is not implemented on your system and you are trying to restrict access for yourself
(i.e., (EXPR & 0700) > 0), produces a fatal error at run time. If umask(2) is not implemented and
you are not trying to restrict access for yourself, returns undef.
Remember that a umask is a number, usually given in octal; it is not a string of octal digits. See
also oct, if all you have is a string.
undef EXPR
undef
Undefines the value of EXPR, which must be an lvalue. Use only on a scalar value, an array
(using "@"), a hash (using "%"), a subroutine (using "&"), or a typeglob (using "*"). (Saying
undef $hash{$key} will probably not do what you expect on most predefined variables or
DBM list values, so don‘t do that; see delete.) Always returns the undefined value. You can
omit the EXPR, in which case nothing is undefined, but you still get an undefined value that you
could, for instance, return from a subroutine, assign to a variable or pass as a parameter.
Examples:
undef $foo;
undef $bar{'blurfl'};      # Compare to: delete $bar{'blurfl'};
undef @ary;
undef %hash;
undef &mysub;
undef *xyz;       # destroys $xyz, @xyz, %xyz, &xyz, etc.
return (wantarray ? (undef, $errmsg) : undef) if $they_blew_it;
select undef, undef, undef, 0.25;
($a, $b, undef, $c) = &foo;       # Ignore third value returned
Note that this is a unary operator, not a list operator.
unlink LIST
unlink
Deletes a list of files. Returns the number of files successfully deleted.
$cnt = unlink 'a', 'b', 'c';
unlink @goners;
unlink <*.bak>;
Note: unlink() will not delete directories unless you are superuser and the -U flag is supplied
to Perl. Even if these conditions are met, be warned that unlinking a directory can inflict damage
on your filesystem. Use rmdir() instead.
If LIST is omitted, uses $_.
unpack TEMPLATE,EXPR
Unpack() does the reverse of pack(): it takes a string representing a structure and expands it
out into a list value, returning the array value. (In scalar context, it returns merely the first value
produced.) The TEMPLATE has the same format as in the pack() function. Here‘s a
subroutine that does substring:
sub substr {
my($what,$where,$howmuch) = @_;
unpack("x$where a$howmuch", $what);
}
and then there‘s
sub ordinal { unpack("c",$_[0]); } # same as ord()
In addition, you may prefix a field with a %<number> to indicate that you want a <number>-bit
checksum of the items instead of the items themselves. Default is a 16-bit checksum. For
example, the following computes the same number as the System V sum program:
while (<>) {
$checksum += unpack("%16C*", $_);
}
$checksum %= 65536;
The following efficiently counts the number of set bits in a bit vector:
$setbits = unpack("%32b*", $selectmask);
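As a small worked instance of the two checksum forms above, the sketch below applies them to short literal strings chosen only for illustration, so the results can be checked by hand.

```perl
use strict;

# Sum of the byte values, kept to 16 bits by the %16 prefix.
my $sum = unpack("%16C*", "abc");              # 97 + 98 + 99 = 294

# Number of set bits in a two-byte string (0xFF has 8, 0x01 has 1).
my $setbits = unpack("%32b*", "\xFF\x01");     # 9

print "sum=$sum setbits=$setbits\n";
```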
untie VARIABLE
Breaks the binding between a variable and a package. (See tie().)
unshift ARRAY,LIST
Does the opposite of a shift(). Or the opposite of a push(), depending on how you look at
it. Prepends list to the front of the array, and returns the new number of elements in the array.
unshift(@ARGV, '-e') unless $ARGV[0] =~ /^-/;
Note the LIST is prepended whole, not one element at a time, so the prepended elements stay in
the same order. Use reverse() to do the reverse.
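The ordering rule can be illustrated with a short sketch; the array names here are arbitrary.

```perl
use strict;

my @queue = (3, 4);
my $count = unshift(@queue, 1, 2);       # @queue is now (1, 2, 3, 4)

my @stack = (3, 4);
unshift(@stack, reverse(1, 2));          # @stack is now (2, 1, 3, 4)

print "@queue ($count elements); @stack\n";
```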
use Module LIST
use Module
use Module VERSION LIST
use VERSION
Imports some semantics into the current package from the named module, generally by aliasing
certain subroutine or variable names into your package. It is exactly equivalent to
BEGIN { require Module; import Module LIST; }
except that Module must be a bareword.
If the first argument to use is a number, it is treated as a version number instead of a module
name. If the version of the Perl interpreter is less than VERSION, then an error message is
printed and Perl exits immediately. This is often useful if you need to check the current Perl
version before useing library modules that have changed in incompatible ways from older
versions of Perl. (We try not to do this more than we have to.)
The BEGIN forces the require and import() to happen at compile time. The require
makes sure the module is loaded into memory if it hasn‘t been yet. The import() is not a
builtin—it‘s just an ordinary static method call into the "Module" package to tell the module to
import the list of features back into the current package. The module can implement its
import() method any way it likes, though most modules just choose to derive their
import() method via inheritance from the Exporter class that is defined in the Exporter
module. See Exporter. If no import() method can be found then the error is currently silently
ignored. This may change to a fatal error in a future version.
If you don‘t want your namespace altered, explicitly supply an empty list:
use Module ();
That is exactly equivalent to
BEGIN { require Module }
If the VERSION argument is present between Module and LIST, then the use will call the
VERSION method in class Module with the given version as an argument. The default
VERSION method, inherited from the UNIVERSAL class, croaks if the given version is larger than
the value of the variable $Module::VERSION. (Note that there is not a comma after
VERSION!)
Because this is a wide−open interface, pragmas (compiler directives) are also implemented this
way. Currently implemented pragmas are:
use integer;
use diagnostics;
use sigtrap qw(SEGV BUS);
use strict qw(subs vars refs);
use subs qw(afunc blurfl);
Some of these pseudo-modules import semantics into the current block scope (like
strict or integer), unlike ordinary modules, which import symbols into the current package
(which are effective through the end of the file).
There‘s a corresponding "no" command that unimports meanings imported by use, i.e., it calls
unimport Module LIST instead of import().
no integer;
no strict 'refs';
If no unimport() method can be found the call fails with a fatal error.
See perlmod for a list of standard modules and pragmas.
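The block scoping of a pragma can be seen in a short sketch using integer, which changes how division behaves only until the end of the enclosing block. The variable names are illustrative.

```perl
use strict;

my ($inside, $outside);
{
    use integer;            # in effect only until the closing brace
    $inside = 10 / 3;       # integer division: 3
}
$outside = 10 / 3;          # ordinary floating-point division again

print "inside=$inside outside=$outside\n";
```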
utime LIST
Changes the access and modification times on each file of a list of files. The first two elements
of the list must be the NUMERICAL access and modification times, in that order. Returns the
number of files successfully changed. The inode modification time of each file is set to the
current time. This code has the same effect as the "touch" command if the files already exist:
#!/usr/bin/perl
$now = time;
utime $now, $now, @ARGV;
values HASH
Returns a list consisting of all the values of the named hash. (In a scalar context, returns the
number of values.) The values are returned in an apparently random order, but it is the same
order as either the keys() or each() function would produce on the same hash. As a side
effect, it resets HASH‘s iterator. See also keys(), each(), and sort().
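Since keys() and values() agree on order for an unmodified hash, the two lists can be paired by position. A minimal sketch, with made-up data:

```perl
use strict;

my %age = (tom => 30, ann => 25, bob => 41);

my @k = keys %age;
my @v = values %age;        # same order as @k, provided %age is unmodified

# Pair each key with the value at the same position.
my %copy;
@copy{@k} = @v;

my $n = values %age;        # scalar context: number of values
print "$n values copied\n";
```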
vec EXPR,OFFSET,BITS
Treats the string in EXPR as a vector of unsigned integers, and returns the value of the bit field
specified by OFFSET. BITS specifies the number of bits that are reserved for each entry in the
bit vector. This must be a power of two from 1 to 32. vec() may also be assigned to, in which
case parentheses are needed to give the expression the correct precedence as in
vec($image, $max_x * $x + $y, 8) = 3;
Vectors created with vec() can also be manipulated with the logical operators |, &, and ^,
which will assume a bit vector operation is desired when both operands are strings.
The following code will build up an ASCII string saying ‘PerlPerlPerl’. The comments
show the string after each step. Note that this code works in the same way on big−endian or
little−endian machines.
my $foo = '';
vec($foo,  0, 32) = 0x5065726C;     # 'Perl'
vec($foo,  2, 16) = 0x5065;         # 'PerlPe'
vec($foo,  3, 16) = 0x726C;         # 'PerlPerl'
vec($foo,  8,  8) = 0x50;           # 'PerlPerlP'
vec($foo,  9,  8) = 0x65;           # 'PerlPerlPe'
vec($foo, 20,  4) = 2;              # 'PerlPerlPe'   . "\x02"
vec($foo, 21,  4) = 7;              # 'PerlPerlPer'
                                    # 'r' is "\x72"
vec($foo, 45,  2) = 3;              # 'PerlPerlPer'  . "\x0c"
vec($foo, 93,  1) = 1;              # 'PerlPerlPer'  . "\x2c"
vec($foo, 94,  1) = 1;              # 'PerlPerlPerl'
                                    # 'l' is "\x6c"
To transform a bit vector into a string or array of 0‘s and 1‘s, use these:
$bits = unpack("b*", $vector);
@bits = split(//, unpack("b*", $vector));
If you know the exact length in bits, it can be used in place of the *.
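A common use of vec() with BITS of 1 is a compact set of flags. The sketch below sets three bits and then reads the vector back in the ways described above; the variable names are arbitrary.

```perl
use strict;

my $flags = '';
vec($flags, $_, 1) = 1 for (0, 3, 9);     # set bits 0, 3 and 9

my $bits  = unpack("b*", $flags);         # "1001000001000000"
my $count = unpack("%32b*", $flags);      # 3 bits are set
my $bit3  = vec($flags, 3, 1);            # 1
my $bit4  = vec($flags, 4, 1);            # 0

print "$bits ($count set)\n";
```

Setting bit 9 extends the string to two bytes, which is why sixteen digits are printed.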
wait
Waits for a child process to terminate and returns the pid of the deceased process, or −1 if there
are no child processes. The status is returned in $?.
waitpid PID,FLAGS
Waits for a particular child process to terminate and returns the pid of the deceased process, or
−1 if there is no such child process. The status is returned in $?. If you say
use POSIX ":sys_wait_h";
#...
waitpid(-1,&WNOHANG);
then you can do a non-blocking wait for any process. Non-blocking wait is available on
machines supporting either the waitpid(2) or wait4(2) system calls. However, waiting for a
particular pid with FLAGS of 0 is implemented everywhere. (Perl emulates the system call by
remembering the status values of processes that have exited but have not been harvested by the
Perl script yet.)
See perlipc for other examples.
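A non-blocking wait is typically used in a polling loop. The sketch below assumes a Unix-like system with fork(); the three children and their exit codes are invented for the example.

```perl
use strict;
use POSIX ":sys_wait_h";

# Start three short-lived children, each exiting with its own status.
my %expected;
for my $code (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit($code) if $pid == 0;          # child
    $expected{$pid} = $code;           # parent remembers the child
}

# Poll without blocking until every child has been reaped.
my %status;
while (keys(%status) < keys(%expected)) {
    my $pid = waitpid(-1, WNOHANG);
    if ($pid > 0) {
        $status{$pid} = $? >> 8;       # exit value of that child
    } elsif ($pid == 0) {
        select(undef, undef, undef, 0.05);   # nothing ready yet; nap briefly
    } else {
        last;                          # -1: no children left
    }
}
print scalar(keys %status), " children reaped\n";
```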
wantarray
Returns TRUE if the context of the currently executing subroutine is looking for a list value.
Returns FALSE if the context is looking for a scalar. Returns the undefined value if the context
is looking for no value (void context).
return unless defined wantarray;        # don't bother doing more
my @a = complex_calculation();
return wantarray ? @a : "@a";
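The three possible results can be seen with a small helper that simply names the context it was called in. The subroutine name context() is hypothetical.

```perl
use strict;

# Hypothetical helper: reports the context in which it was called.
sub context {
    return "list"   if wantarray;
    return "scalar" if defined wantarray;
    return "void";                    # wantarray is undef here
}

my @list   = context();               # list context
my $scalar = context();               # scalar context
print "$list[0], $scalar\n";
```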
warn LIST
Produces a message on STDERR just like die(), but doesn‘t exit or throw an exception.
If LIST is empty and $@ already contains a value (typically from a previous eval) that value is
used after appending "\t...caught" to $@. This is useful for staying almost, but not
entirely, similar to die().
If $@ is empty then the string "Warning: Something‘s wrong" is used.
No message is printed if there is a $SIG{__WARN__} handler installed. It is the handler‘s
responsibility to deal with the message as it sees fit (like, for instance, converting it into a
die()). Most handlers must therefore make arrangements to actually display the warnings that
they are not prepared to deal with, by calling warn() again in the handler. Note that this is
quite safe and will not produce an endless loop, since __WARN__ hooks are not called from
inside one.
You will find this behavior is slightly different from that of $SIG{__DIE__} handlers (which
don‘t suppress the error text, but can instead call die() again to change it).
Using a __WARN__ handler provides a powerful way to silence all warnings (even the so−called
mandatory ones). An example:
# wipe out *all* compile-time warnings
BEGIN { $SIG{'__WARN__'} = sub { warn $_[0] if $DOWARN } }
my $foo = 10;
my $foo = 20;          # no warning about duplicate my $foo,
                       # but hey, you asked for it!
# no compile-time or run-time warnings before here
$DOWARN = 1;
# run-time warnings enabled after here
warn "\$foo is alive and $foo!";     # does show up
See perlvar for details on setting %SIG entries, and for more examples.
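A handler need not silence warnings; it can also collect them for later inspection. The following sketch, with invented messages, installs a handler for the duration of one block only.

```perl
use strict;

my @caught;
{
    # Collect warnings instead of printing them; local restores the
    # previous handler when the block ends.
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    warn "disk is nearly full\n";
    warn "no newline here";           # Perl appends " at FILE line N.\n"
}
print "caught ", scalar(@caught), " warnings\n";
```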
write FILEHANDLE
write EXPR
write
Writes a formatted record (possibly multi−line) to the specified FILEHANDLE, using the format
associated with that file. By default the format for a file is the one having the same name as the
filehandle, but the format for the current output channel (see the select() function) may be
set explicitly by assigning the name of the format to the $~ variable.
Top of form processing is handled automatically: if there is insufficient room on the current
page for the formatted record, the page is advanced by writing a form feed, a special
top−of−page format is used to format the new page header, and then the record is written. By
default the top−of−page format is the name of the filehandle with "_TOP" appended, but it may
be dynamically set to the format of your choice by assigning the name to the $^ variable while
the filehandle is selected. The number of lines remaining on the current page is in variable $-,
which can be set to 0 to force a new page.
If FILEHANDLE is unspecified, output goes to the current default output channel, which starts
out as STDOUT but may be changed by the select() operator. If the FILEHANDLE is an
EXPR, then the expression is evaluated and the resulting string is used to look up the name of the
FILEHANDLE at run time. For more on formats, see perlform.
Note that write is NOT the opposite of read(). Unfortunately.
y///
The transliteration operator. Same as tr///. See perlop.
perlvar
NAME
perlvar − Perl predefined variables
DESCRIPTION
Predefined Names
The following names have special meaning to Perl. Most punctuation names have reasonable mnemonics,
or analogues in one of the shells. Nevertheless, if you wish to use long variable names, you just need to say
use English;
at the top of your program. This will alias all the short names to the long names in the current package.
Some even have medium names, generally borrowed from awk.
To go a step further, those variables that depend on the currently selected filehandle may instead (and
preferably) be set by calling an object method on the FileHandle object. (Summary lines below for this
contain the word HANDLE.) First you must say
use FileHandle;
after which you may use either
method HANDLE EXPR
or more safely,
HANDLE−>method(EXPR)
Each of the methods returns the old value of the FileHandle attribute. The methods each take an optional
EXPR, which if supplied specifies the new value for the FileHandle attribute in question. If not supplied,
most of the methods do nothing to the current value, except for autoflush(), which will assume a 1 for
you, just to be different.
A few of these variables are considered "read−only". This means that if you try to assign to this variable,
either directly or indirectly through a reference, you‘ll raise a run−time exception.
The following list is ordered by scalar variables first, then the arrays, then the hashes (except $^M was added
in the wrong place). This is somewhat obscured by the fact that %ENV and %SIG are listed as
$ENV{expr} and $SIG{expr}.
$ARG
$_
The default input and pattern−searching space. The following pairs are equivalent:
while (<>) {...}    # equivalent in only while!
while (defined($_ = <>)) {...}

/^Subject:/
$_ =~ /^Subject:/

tr/a-z/A-Z/
$_ =~ tr/a-z/A-Z/

chop
chop($_)
Here are the places where Perl will assume $_ even if you don‘t use it:
* Various unary functions, including functions like ord() and int(), as well as all the file
  tests (-f, -d) except for -t, which defaults to STDIN.
* Various list functions like print() and unlink().
* The pattern matching operations m//, s///, and tr/// when used without an =~
  operator.
* The default iterator variable in a foreach loop if no other variable is supplied.
* The implicit iterator variable in the grep() and map() functions.
* The default place to put an input record when a <FH> operation's result is tested by itself as
  the sole criterion of a while test. Note that outside of a while test, this will not happen.
(Mnemonic: underline is understood in certain operations.)
$<digits>
Contains the subpattern from the corresponding set of parentheses in the last pattern matched,
not counting patterns matched in nested blocks that have been exited already. (Mnemonic: like
\digits.) These variables are all read−only.
$MATCH
$&
The string matched by the last successful pattern match (not counting any matches hidden within
a BLOCK or eval() enclosed by the current BLOCK). (Mnemonic: like & in some editors.)
This variable is read−only.
$PREMATCH
$`
The string preceding whatever was matched by the last successful pattern match (not counting
any matches hidden within a BLOCK or eval enclosed by the current BLOCK). (Mnemonic: `
often precedes a quoted string.) This variable is read-only.
$POSTMATCH
$'
The string following whatever was matched by the last successful pattern match (not counting
any matches hidden within a BLOCK or eval() enclosed by the current BLOCK).
(Mnemonic: ' often follows a quoted string.) Example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n";         # prints abc:def:ghi
This variable is read-only.
$LAST_PAREN_MATCH
$+
The last bracket matched by the last search pattern. This is useful if you don‘t know which of a
set of alternative patterns matched. For example:
/Version: (.*)|Revision: (.*)/ && ($rev = $+);
(Mnemonic: be positive and forward looking.) This variable is read−only.
$MULTILINE_MATCHING
$*
Set to 1 to do multi−line matching within a string, 0 to tell Perl that it can assume that strings
contain a single line, for the purpose of optimizing pattern matches. Pattern matches on strings
containing multiple newlines can produce confusing results when "$*" is 0. Default is 0.
(Mnemonic: * matches multiple things.) Note that this variable influences the interpretation of
only "^" and "$". A literal newline can be searched for even when $* == 0.
Use of "$*" is deprecated in modern Perls, supplanted by the /s and /m modifiers on pattern
matching.
input_line_number HANDLE EXPR
$INPUT_LINE_NUMBER
$NR
$.
The current input line number for the last file handle from which you read (or performed a seek
or tell on). An explicit close on a filehandle resets the line number. Because "<>" never does
an explicit close, line numbers increase across ARGV files (but see examples under eof()).
Localizing $. has the effect of also localizing Perl‘s notion of "the last read filehandle".
(Mnemonic: many programs use "." to mean the current line number.)
input_record_separator HANDLE EXPR
$INPUT_RECORD_SEPARATOR
$RS
$/
The input record separator, newline by default. Works like awk‘s RS variable, including treating
empty lines as delimiters if set to the null string. (Note: An empty line cannot contain any spaces
or tabs.) You may set it to a multi−character string to match a multi−character delimiter, or to
undef to read to end of file. Note that setting it to "\n\n" means something slightly different
than setting it to "", if the file contains consecutive empty lines. Setting it to "" will treat two
or more consecutive empty lines as a single empty line. Setting it to "\n\n" will blindly
assume that the next input character belongs to the next paragraph, even if it‘s a newline.
(Mnemonic: / is used to delimit line boundaries when quoting poetry.)
undef $/;
$_ = <FH>;                  # whole file now here
s/\n[ \t]+/ /g;
Remember: the value of $/ is a string, not a regexp. AWK has to be better for something :-)
Setting $/ to a reference to an integer, scalar containing an integer, or scalar that's convertible
to an integer will attempt to read records instead of lines, with the maximum record size being
the referenced integer. So this:
$/ = \32768; # or \"32768", or \$var_containing_32768
open(FILE, $myfile);
$_ = <FILE>;
will read a record of no more than 32768 bytes from FILE. If you‘re not reading from a
record−oriented file (or your OS doesn‘t have record−oriented files), then you‘ll likely get a full
chunk of data with every read. If a record is larger than the record size you‘ve set, you‘ll get the
record back in pieces.
On VMS, record reads are done with the equivalent of sysread, so it's best not to mix record
and non-record reads on the same file. (This is likely not a problem, as any file you'd want to
read in record mode is probably usable in line mode.) Non-VMS systems perform normal I/O, so
it's safe to mix record and non-record reads of a file.
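The different settings of $/ are easiest to compare by reading the same data under each of them. The sketch below is illustrative only: the helper records() is hypothetical, and reading from an in-memory string through a lexical filehandle assumes a Perl newer than the one described here (5.8 or later).

```perl
use strict;

my $text = "alpha\nbeta\n\n\n\ngamma\ndelta\n";

# Hypothetical helper: read every record of $text under a given $/.
sub records {
    my ($sep) = @_;
    local $/ = $sep;
    open(my $fh, '<', \$text) or die "can't open in-memory file: $!";
    my @recs = <$fh>;
    close($fh);
    return @recs;
}

my @lines      = records("\n");      # 7 records, three of them empty lines
my @paragraphs = records("");        # 2 records: runs of empty lines collapse
my @blind      = records("\n\n");    # 3 records: the extra "\n\n" is a record of its own
my @whole      = records(undef);     # 1 record: the entire text
my @chunks     = records(\4);        # fixed-size reads of at most 4 bytes

print scalar(@lines), " lines, ", scalar(@paragraphs), " paragraphs\n";
```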
autoflush HANDLE EXPR
$OUTPUT_AUTOFLUSH
$|
If set to nonzero, forces a flush right away and after every write or print on the currently selected
output channel. Default is 0 (regardless of whether the channel is actually buffered by the
system or not; $| tells you only whether you‘ve asked Perl explicitly to flush after each write).
Note that STDOUT will typically be line buffered if output is to the terminal and block buffered
otherwise. Setting this variable is useful primarily when you are outputting to a pipe, such as
when you are running a Perl script under rsh and want to see the output as it‘s happening. This
has no effect on input buffering. (Mnemonic: when you want your pipes to be piping hot.)
output_field_separator HANDLE EXPR
$OUTPUT_FIELD_SEPARATOR
$OFS
$,
The output field separator for the print operator. Ordinarily the print operator simply prints out
the comma−separated fields you specify. To get behavior more like awk, set this variable as you
would set awk‘s OFS variable to specify what is printed between fields. (Mnemonic: what is
printed when there is a , in your print statement.)
output_record_separator HANDLE EXPR
$OUTPUT_RECORD_SEPARATOR
$ORS
$\
The output record separator for the print operator. Ordinarily the print operator simply prints out
the comma−separated fields you specify, with no trailing newline or record separator assumed.
To get behavior more like awk, set this variable as you would set awk‘s ORS variable to specify
what is printed at the end of the print. (Mnemonic: you set "$\" instead of adding \n at the end
of the print. Also, it‘s just like $/, but it‘s what you get "back" from Perl.)
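The two output separators are usually set together and only for a limited scope. As a rough sketch, the following prints two colon-separated records into a string; the in-memory filehandle is an assumption that requires Perl 5.8 or later, and the field values are invented.

```perl
use strict;

my $out = '';
open(my $fh, '>', \$out) or die "can't open in-memory file: $!";
{
    local $, = ':';      # printed between the items of each print
    local $\ = "\n";     # printed after the last item of each print
    print $fh 'root', 'x', 0;
    print $fh 'daemon', 'x', 1;
}
close($fh);

# $out is now "root:x:0\ndaemon:x:1\n"
print $out;
```

Using local keeps the change from leaking into unrelated print statements elsewhere in the program.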
$LIST_SEPARATOR
$"
This is like "$," except that it applies to array values interpolated into a double−quoted string
(or similar interpreted string). Default is a space. (Mnemonic: obvious, I think.)
$SUBSCRIPT_SEPARATOR
$SUBSEP
$;
The subscript separator for multidimensional array emulation. If you refer to a hash element as
$foo{$a,$b,$c}
it really means
$foo{join($;, $a, $b, $c)}
But don't put
@foo{$a,$b,$c}      # a slice--note the @
which means
($foo{$a},$foo{$b},$foo{$c})
Default is "\034", the same as SUBSEP in awk. Note that if your keys contain binary data there
might not be any safe value for "$;". (Mnemonic: comma (the syntactic subscript separator) is
a semi−semicolon. Yeah, I know, it‘s pretty lame, but "$," is already taken for something more
important.)
Consider using "real" multidimensional arrays.
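A short sketch of the emulation and of the nested-hash alternative follows. The hash names are arbitrary, and the separator is passed through quotemeta() before use in a pattern as a precaution.

```perl
use strict;

my %grid;
$grid{2, 3} = 'x';                       # key is join($;, 2, 3)

# Recover the subscripts from the joined key.
my $sep = quotemeta($;);
my ($key) = keys %grid;
my ($row, $col) = split /$sep/, $key;

# The "real" alternative: a hash of hashes, no separator involved.
my %nested;
$nested{2}{3} = 'x';

print "row=$row col=$col\n";
```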
$OFMT
$#
The output format for printed numbers. This variable is a half−hearted attempt to emulate awk‘s
OFMT variable. There are times, however, when awk and Perl have differing notions of what is
in fact numeric. The initial value is %.ng, where n is the value of the macro DBL_DIG from
your system‘s float.h. This is different from awk‘s default OFMT setting of %.6g, so you need
to set "$#" explicitly to get awk‘s value. (Mnemonic: # is the number sign.)
Use of "$#" is deprecated.
format_page_number HANDLE EXPR
$FORMAT_PAGE_NUMBER
$%
The current page number of the currently selected output channel. (Mnemonic: % is page
number in nroff.)
format_lines_per_page HANDLE EXPR
$FORMAT_LINES_PER_PAGE
$=
The current page length (printable lines) of the currently selected output channel. Default is 60.
(Mnemonic: = has horizontal lines.)
format_lines_left HANDLE EXPR
$FORMAT_LINES_LEFT
$−
The number of lines left on the page of the currently selected output channel. (Mnemonic:
lines_on_page − lines_printed.)
format_name HANDLE EXPR
$FORMAT_NAME
$~
The name of the current report format for the currently selected output channel. Default is name
of the filehandle. (Mnemonic: brother to "$^".)
format_top_name HANDLE EXPR
$FORMAT_TOP_NAME
$^
The name of the current top−of−page format for the currently selected output channel. Default is
name of the filehandle with _TOP appended. (Mnemonic: points to top of page.)
format_line_break_characters HANDLE EXPR
$FORMAT_LINE_BREAK_CHARACTERS
$:
The current set of characters after which a string may be broken to fill continuation fields
(starting with ^) in a format. Default is " \n−", to break on whitespace or hyphens. (Mnemonic:
a "colon" in poetry is a part of a line.)
format_formfeed HANDLE EXPR
$FORMAT_FORMFEED
$^L
What formats output to perform a form feed. Default is \f.
$ACCUMULATOR
$^A
The current value of the write() accumulator for format() lines. A format contains
formline() commands that put their result into $^A. After calling its format, write()
prints out the contents of $^A and empties it. So you never actually see the contents of $^A unless
you call formline() yourself and then look at it. See perlform and formline().
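Calling formline() directly makes the accumulator visible. A minimal sketch, with an invented picture line and values; the picture is single-quoted so that @ is not taken as the start of an array name.

```perl
use strict;

$^A = '';                                          # start with an empty accumulator
formline('@<<<< @>>>>' . "\n", 'apple', 42);       # what a format line does internally
my $picture = $^A;                                 # "apple    42\n"
$^A = '';                                          # write() would empty it at this point

print $picture;
```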
$CHILD_ERROR
$?
The status returned by the last pipe close, backtick (``) command, or system() operator.
Note that this is the status word returned by the wait() system call (or else is made up to look
like it). Thus, the exit value of the subprocess is actually ($? >> 8), and $? & 127 gives
which signal, if any, the process died from, and $? & 128 reports whether there was a core
dump. (Mnemonic: similar to sh and ksh.)
Additionally, if the h_errno variable is supported in C, its value is returned via $? if any of
the gethost*() functions fail.
Note that if you have installed a signal handler for SIGCHLD, the value of $? will usually be
wrong outside that handler.
Inside an END subroutine $? contains the value that is going to be given to exit(). You can
modify $? in an END subroutine to change the exit status of the script.
Under VMS, the pragma use vmsish 'status' makes $? reflect the actual VMS exit
status, instead of the default emulation of POSIX status.
Also see Error Indicators.
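The three fields of the status word can be pulled apart as sketched below. The child is simply another copy of the running interpreter ($^X) asked to exit with status 3, which is an arbitrary choice for the example.

```perl
use strict;

# Run a child Perl that exits with status 3.
system($^X, '-e', 'exit 3');

my $exit_value  = $? >> 8;      # 3
my $signal      = $? & 127;     # 0: the child was not killed by a signal
my $dumped_core = $? & 128;     # 0: no core dump

print "exit=$exit_value signal=$signal core=$dumped_core\n";
```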
$OS_ERROR
$ERRNO
$!
If used in a numeric context, yields the current value of errno, with all the usual caveats. (This
means that you shouldn‘t depend on the value of $! to be anything in particular unless you‘ve
gotten a specific error return indicating a system error.) If used in a string context, yields the
corresponding system error string. You can assign to $! to set errno if, for instance, you want
"$!" to return the string for error n, or you want to set the exit value for the die() operator.
(Mnemonic: What just went bang?)
Also see Error Indicators.
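The dual nature of $! can be seen by saving it both ways right after a failed call. In this sketch the path is hypothetical and assumed not to exist, and the lexical filehandle assumes Perl 5.6 or later.

```perl
use strict;

my $missing = "/nonexistent/dir/file";     # hypothetical path, assumed absent
my ($errno, $message);
unless (open(my $fh, '<', $missing)) {
    $errno   = $! + 0;       # numeric context: the errno value
    $message = "$!";         # string context: the matching error text
}
print "open failed: $message (errno $errno)\n";
```

Both values are copied immediately, since a later system call may change errno.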
$EXTENDED_OS_ERROR
$^E
Error information specific to the current operating system. At the moment, this differs from $!
under only VMS, OS/2, and Win32 (and for MacPerl). On all other platforms, $^E is always
just the same as $!.
Under VMS, $^E provides the VMS status value from the last system error. This is more
specific information about the last system error than that provided by $!. This is particularly
important when $! is set to EVMSERR.
Under OS/2, $^E is set to the error code of the last call to OS/2 API either via CRT, or directly
from perl.
Under Win32, $^E always returns the last error information reported by the Win32 call
GetLastError() which describes the last error from within the Win32 API. Most
Win32−specific code will report errors via $^E. ANSI C and UNIX−like calls set errno and
so most portable Perl code will report errors via $!.
Caveats mentioned in the description of $! generally apply to $^E, also. (Mnemonic: Extra
error explanation.)
Also see Error Indicators.
$EVAL_ERROR
$@
The Perl syntax error message from the last eval() command. If null, the last eval() parsed
and executed correctly (although the operations you invoked may have failed in the normal
fashion). (Mnemonic: Where was the syntax error "at"?)
Note that warning messages are not collected in this variable. You can, however, set up a routine
to process warnings by setting $SIG{__WARN__} as described below.
Also see Error Indicators.
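A brief sketch of testing $@ after an eval follows; the error text is invented. Note that $@ also carries messages raised by die() inside the eval, not only syntax errors.

```perl
use strict;

my $ok = eval { die "out of cheese\n"; 1 };
my $failure = $@;                 # "out of cheese\n"

eval { 1 };
my $after_success = $@;           # empty string: the eval ran cleanly

print "first eval failed: $failure" unless $ok;
```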
$PROCESS_ID
$PID
$$
The process number of the Perl running this script. (Mnemonic: same as shells.)
$REAL_USER_ID
$UID
$<
The real uid of this process. (Mnemonic: it‘s the uid you came FROM, if you‘re running setuid.)
$EFFECTIVE_USER_ID
$EUID
$>
The effective uid of this process. Example:
$< = $>;                # set real to effective uid
($<,$>) = ($>,$<);      # swap real and effective uid
(Mnemonic: it‘s the uid you went TO, if you‘re running setuid.) Note: "$<" and "$>" can be
swapped only on machines supporting setreuid().
$REAL_GROUP_ID
$GID
$(
The real gid of this process. If you are on a machine that supports membership in multiple
groups simultaneously, gives a space separated list of groups you are in. The first number is the
one returned by getgid(), and the subsequent ones by getgroups(), one of which may be
the same as the first number.
However, a value assigned to "$(" must be a single number used to set the real gid. So the
value given by "$(" should not be assigned back to "$(" without being forced numeric, such as
by adding zero.
(Mnemonic: parentheses are used to GROUP things. The real gid is the group you LEFT, if
you‘re running setgid.)
$EFFECTIVE_GROUP_ID
$EGID
$)
The effective gid of this process. If you are on a machine that supports membership in multiple
groups simultaneously, gives a space separated list of groups you are in. The first number is the
one returned by getegid(), and the subsequent ones by getgroups(), one of which may
be the same as the first number.
Similarly, a value assigned to "$)" must also be a space−separated list of numbers. The first
number is used to set the effective gid, and the rest (if any) are passed to setgroups(). To
get the effect of an empty list for setgroups(), just repeat the new effective gid; that is, to
force an effective gid of 5 and an effectively empty setgroups() list, say $) = "5 5".
(Mnemonic: parentheses are used to GROUP things. The effective gid is the group that‘s RIGHT
for you, if you‘re running setgid.)
Note: "$<", "$>", "$(" and "$)" can be set only on machines that support the corresponding
set[re][ug]id() routine. "$(" and "$)" can be swapped only on machines supporting
setregid().
$PROGRAM_NAME
$0
Contains the name of the file containing the Perl script being executed. On some operating
systems assigning to "$0" modifies the argument area that the ps(1) program sees. This is more
useful as a way of indicating the current program state than it is for hiding the program you‘re
running. (Mnemonic: same as sh and ksh.)
$[
The index of the first element in an array, and of the first character in a substring. Default is 0,
but you could set it to 1 to make Perl behave more like awk (or Fortran) when subscripting and
when evaluating the index() and substr() functions. (Mnemonic: [ begins subscripts.)
As of Perl 5, assignment to "$[" is treated as a compiler directive, and cannot influence the
behavior of any other file. Its use is discouraged.
$PERL_VERSION
$]
The version + patchlevel / 1000 of the Perl interpreter. This variable can be used to determine
whether the Perl interpreter executing a script is in the right range of versions. (Mnemonic: Is
this version of perl in the right bracket?) Example:
warn "No checksumming!\n" if $] < 3.019;
See also the documentation of use VERSION and require VERSION for a convenient way
to fail if the Perl interpreter is too old.
$DEBUGGING
$^D
The current value of the debugging flags. (Mnemonic: value of −D switch.)
$SYSTEM_FD_MAX
$^F
The maximum system file descriptor, ordinarily 2. System file descriptors are passed to
exec()ed processes, while higher file descriptors are not. Also, during an open(), system
file descriptors are preserved even if the open() fails. (Ordinary file descriptors are closed
before the open() is attempted.) Note that the close−on−exec status of a file descriptor will be
decided according to the value of $^F at the time of the open, not the time of the exec.
$^H
The current set of syntax checks enabled by use strict and other block scoped compiler
hints. See the documentation of strict for more details.
$INPLACE_EDIT
$^I
The current value of the inplace-edit extension. Use undef to disable inplace editing.
(Mnemonic: value of -i switch.)
$^M
By default, running out of memory is not trappable. However, if compiled for this, Perl may
use the contents of $^M as an emergency pool after die()ing with this message. Suppose that
your Perl were compiled with -DPERL_EMERGENCY_SBRK and used Perl's malloc. Then
$^M = 'a' x (1<<16);
would allocate a 64K buffer for use when in emergency. See the INSTALL file for information
on how to enable this option. As a disincentive to casual use of this advanced feature, there is no
English long name for this variable.
$OSNAME
$^O
The name of the operating system under which this copy of Perl was built, as determined during
the configuration process. The value is identical to $Config{'osname'}.
$PERLDB
$^P
The internal variable for debugging support. Different bits mean the following (subject to
change):
0x01    Debug subroutine enter/exit.
0x02    Line-by-line debugging.
0x04    Switch off optimizations.
0x08    Preserve more data for future interactive inspections.
0x10    Keep info about source lines on which a subroutine is defined.
0x20    Start with single-step on.
Note that some bits may be relevant at compile-time only, some at run-time only. This is a new
mechanism and the details may change.
$^R
The result of evaluation of the last successful (?{ code }) regular expression assertion.
(Excluding those used as switches.) May be written to.
$^S
Current state of the interpreter. Undefined if parsing of the current module/eval is not finished
(may happen in $SIG{__DIE__} and $SIG{__WARN__} handlers). True if inside an eval,
otherwise false.
$BASETIME
$^T
The time at which the script began running, in seconds since the epoch (beginning of 1970). The
values returned by the −M, −A, and −C filetests are based on this value.
$WARNING
$^W
The current value of the warning switch, either TRUE or FALSE. (Mnemonic: related to the −w
switch.)
$EXECUTABLE_NAME
$^X
The name that the Perl binary itself was executed as, from C‘s argv[0].
$ARGV
contains the name of the current file when reading from <>.
@ARGV The array @ARGV contains the command line arguments intended for the script. Note that
$#ARGV is generally the number of arguments minus one, because $ARGV[0] is the first
argument, NOT the command name. See "$0" for the command name.
@INC
The array @INC contains the list of places to look for Perl scripts to be evaluated by the do
EXPR, require, or use constructs. It initially consists of the arguments to any −I command
line switches, followed by the default Perl library, probably /usr/local/lib/perl, followed by ".",
to represent the current directory. If you need to modify this at runtime, you should use the use
lib pragma to get the machine−dependent library properly loaded also:
    use lib '/mypath/libdir/';
    use SomeMod;
@_
Within a subroutine the array @_ contains the parameters passed to that subroutine. See perlsub.
%INC
The hash %INC contains entries for each filename that has been included via do or require.
The key is the filename you specified, and the value is the location of the file actually found. The
require command uses this hash to determine whether a given file has already been included.
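As a small illustration of how %INC is filled in, the sketch below loads File::Basename (chosen only because it ships with Perl; any installed module would do) and then looks at the entry that require recorded.

```perl
require File::Basename;   # example module only; nothing here depends on it

# require records the name it looked up as the key and the path it
# actually found as the value.
my $key        = 'File/Basename.pm';
my $was_loaded = exists $INC{$key} ? 1 : 0;
print "$key => $INC{$key}\n" if $was_loaded;

# A second require is a no-op because the %INC entry already exists.
require File::Basename;
```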
%ENV $ENV{expr}
The hash %ENV contains your current environment. Setting a value in %ENV changes the
environment for child processes.
%SIG $SIG{expr}
The hash %SIG is used to set signal handlers for various signals. Example:
    sub handler {
        # 1st argument is signal name
        my($sig) = @_;
        print "Caught a SIG$sig--shutting down\n";
        close(LOG);
        exit(0);
    }
    $SIG{'INT'} = \&handler;
    $SIG{'QUIT'} = \&handler;
    ...
    $SIG{'INT'} = 'DEFAULT';    # restore default action
    $SIG{'QUIT'} = 'IGNORE';    # ignore SIGQUIT
The %SIG hash contains values for only the signals actually set within the Perl script. Here are
some other examples:
    $SIG{"PIPE"} = Plumber;     # SCARY!!
    $SIG{"PIPE"} = "Plumber";   # assumes main::Plumber (not recommended)
    $SIG{"PIPE"} = \&Plumber;   # just fine; assume current Plumber
    $SIG{"PIPE"} = Plumber();   # oops, what did Plumber() return??
The one marked scary is problematic because it's a bareword, which means sometimes it's a
string representing the function, and sometimes it's going to call the subroutine right then
and there! Best to be sure and quote it or take a reference to it. *Plumber works too. See
perlsub.
If your system has the sigaction() function then signal handlers are installed using it. This
means you get reliable signal handling. If your system has the SA_RESTART flag it is used
when signal handlers are installed. This means that system calls for which it is supported
continue rather than returning when a signal arrives. If you want your system calls to be
interrupted by signal delivery then do something like this:
    use POSIX ':signal_h';
    my $alarm = 0;
    sigaction SIGALRM, new POSIX::SigAction sub { $alarm = 1 }
        or die "Error setting SIGALRM handler: $!\n";
See POSIX.
Certain internal hooks can also be set using the %SIG hash. The routine indicated by
$SIG{__WARN__} is called when a warning message is about to be printed. The warning
message is passed as the first argument. The presence of a __WARN__ hook causes the
ordinary printing of warnings to STDERR to be suppressed. You can use this to save warnings
in a variable, or turn warnings into fatal errors, like this:
local $SIG{__WARN__} = sub { die $_[0] };
eval $proggie;
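The other use mentioned above, saving warnings in a variable, might look like the sketch below. The @captured array and the warning texts are invented for the example.

```perl
my @captured;   # hypothetical collector for warning text

{
    # While this hook is installed, warnings go to the hook
    # instead of being printed to STDERR.
    local $SIG{__WARN__} = sub { push @captured, $_[0] };
    warn "disk nearly full\n";
    warn "retrying\n";
}
# Leaving the block restores whatever handler was in place before.

print "captured ", scalar(@captured), " warnings\n";
print "first: $captured[0]";
```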
The routine indicated by $SIG{__DIE__} is called when a fatal exception is about to be
thrown. The error message is passed as the first argument. When a __DIE__ hook routine
returns, the exception processing continues as it would have in the absence of the hook, unless
the hook routine itself exits via a goto, a loop exit, or a die(). The __DIE__ handler is
explicitly disabled during the call, so that you can die from a __DIE__ handler. Similarly for
__WARN__.
Note that the $SIG{__DIE__} hook is called even inside eval()ed blocks/strings. See die
and $^S for how to circumvent this.
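One way to use $^S for this, given as a sketch rather than a recommended idiom, is to have the hook return early whenever it is called during parsing or from inside an eval. The @log array is a made-up stand-in for whatever reporting a real handler would do.

```perl
my @log;   # hypothetical record of the errors the hook chose to report

local $SIG{__DIE__} = sub {
    my $msg = shift;
    # $^S is undefined while parsing and true inside an eval; in both
    # cases this sketch returns and lets the exception continue normally.
    return if !defined($^S) || $^S;
    push @log, "fatal: $msg";
};

eval { die "recoverable\n" };   # the hook runs, sees $^S true, does nothing
my $err = $@;

print "eval caught: $err";
print "hook logged ", scalar(@log), " messages\n";
```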
Note that __DIE__/__WARN__ handlers are very special in one respect: they may be called to
report (probable) errors found by the parser. In such a case the parser may be in inconsistent
state, so any attempt to evaluate Perl code from such a handler will probably result in a segfault.
This means that calls which result/may-result in parsing Perl should be used with extreme
caution, like this:
require Carp if defined $^S;
Carp::confess("Something wrong") if defined &Carp::confess;
die "Something wrong, but could not load Carp to give backtrace...
To see backtrace try starting Perl with −MCarp switch";
Here the first line will load Carp unless it is the parser who called the handler. The second line
will print backtrace and die if Carp was available. The third line will be executed only if Carp
was not available.
See die, warn and eval for additional info.
Error Indicators
The variables $@, $!, $^E, and $? contain information about different types of error conditions that may
appear during execution of a Perl script. The variables are shown ordered by the "distance" between the
subsystem which reported the error and the Perl process, and correspond to errors detected by the Perl
interpreter, C library, operating system, or an external program, respectively.
To illustrate the differences between these variables, consider the following Perl expression:
    eval '
        open PIPE, "/cdrom/install |";
        @res = <PIPE>;
        close PIPE or die "bad pipe: $?, $!";
    ';
After execution of this statement all 4 variables may have been set.
$@ is set if the string to be eval−ed did not compile (this may happen if open or close were imported
with bad prototypes), or if Perl code executed during evaluation die()d (either implicitly, say, if open
was imported from module Fatal, or the die after close was triggered). In these cases the value of $@ is
the compile error, or Fatal error (which will interpolate $!!), or the argument to die (which will
interpolate $! and $?!).
When the above expression is executed, open(), <PIPE>, and close are translated to C run-time library
calls. $! is set if one of these calls fails. The value is a symbolic indicator chosen by the C run−time
library, say No such file or directory.
On some systems the above C library calls are further translated to calls to the kernel. The kernel may have
set a more verbose error indicator than one of the handful of standard C errors. In such cases $^E contains
this verbose error indicator, which may be, say, CDROM tray not closed. On systems where C library
calls are identical to system calls $^E is a duplicate of $!.
Finally, $? may be set to a non-zero value if the external program /cdrom/install fails. Upper bits of the
particular value may reflect specific error conditions encountered by this program (this is
program-dependent), lower bits reflect the mode of failure (segfault, completion, etc.). Note that in contrast to
$@, $!, and $^E, which are set only if an error condition is detected, the variable $? is set on each wait or
pipe close, overwriting the old value.
For more details, see the individual descriptions at $@, $!, $^E, and $?.
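To make the upper-bits/lower-bits split of $? concrete, here is a minimal sketch. It assumes a Unix-style wait status and uses a child perl (found through $^X) that exits with an arbitrary status of 3; neither detail comes from the example above.

```perl
# Run a child perl that exits with status 3; $^X names the running perl binary.
system($^X, '-e', 'exit 3');

my $exit_status = $? >> 8;    # upper bits: the value the program passed to exit
my $signal      = $? & 127;   # lower bits: signal number, if a signal killed it
my $core_dumped = $? & 128;   # set if the child dumped core

print "exit=$exit_status signal=$signal\n";
```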
perlsub
Perl Programmers Reference Guide
perlsub
NAME
perlsub − Perl subroutines
SYNOPSIS
To declare subroutines:
    sub NAME;                     # A "forward" declaration.
    sub NAME(PROTO);              # ditto, but with prototypes
    sub NAME BLOCK                # A declaration and a definition.
    sub NAME(PROTO) BLOCK         # ditto, but with prototypes
To define an anonymous subroutine at runtime:
    $subref = sub BLOCK;          # no proto
    $subref = sub (PROTO) BLOCK;  # with proto
To import subroutines:
use PACKAGE qw(NAME1 NAME2 NAME3);
To call subroutines:
    NAME(LIST);    # & is optional with parentheses.
    NAME LIST;     # Parentheses optional if predeclared/imported.
    &NAME;         # Makes current @_ visible to called subroutine.
DESCRIPTION
Like many languages, Perl provides for user−defined subroutines. These may be located anywhere in the
main program, loaded in from other files via the do, require, or use keywords, or even generated on the
fly using eval or anonymous subroutines (closures). You can even call a function indirectly using a
variable containing its name or a CODE reference to it.
The Perl model for function call and return values is simple: all functions are passed as parameters one single
flat list of scalars, and all functions likewise return to their caller one single flat list of scalars. Any arrays or
hashes in these call and return lists will collapse, losing their identities—but you may always use
pass−by−reference instead to avoid this. Both call and return lists may contain as many or as few scalar
elements as you‘d like. (Often a function without an explicit return statement is called a subroutine, but
there‘s really no difference from the language‘s perspective.)
Any arguments passed to the routine come in as the array @_. Thus if you called a function with two
arguments, those would be stored in $_[0] and $_[1]. The array @_ is a local array, but its elements are
aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding
argument is updated (or an error occurs if it is not updatable). If an argument is an array or hash element
which did not exist when the function was called, that element is created only when (and if) it is modified or
if a reference to it is taken. (Some earlier versions of Perl created the element whether or not it was assigned
to.) Note that assigning to the whole array @_ removes the aliasing, and does not update any arguments.
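The difference between updating one element of @_ and assigning to the whole array can be seen in a short sketch. Both subroutine names are invented for the illustration.

```perl
sub double_first {
    $_[0] *= 2;     # $_[0] is an alias: this changes the caller's variable
}

sub reassign_all {
    @_ = (99);      # assigning to the whole array removes the aliasing
}

my $n = 21;
double_first($n);   # $n has been doubled in place

my $m = 5;
reassign_all($m);   # $m is left untouched

print "n=$n m=$m\n";
```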
The return value of the subroutine is the value of the last expression evaluated. Alternatively, a return
statement may be used to exit the subroutine, optionally specifying the returned value, which will be
evaluated in the appropriate context (list, scalar, or void) depending on the context of the subroutine call. If
you specify no return value, the subroutine will return an empty list in a list context, an undefined value in a
scalar context, or nothing in a void context. If you return one or more arrays and/or hashes, these will be
flattened together into one large indistinguishable list.
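The three calling contexts, and what a bare return yields in each, can be sketched with the wantarray operator. The subroutine names context_name and nothing are made up for this example.

```perl
sub context_name {
    return "list"   if wantarray;           # called in list context
    return "scalar" if defined wantarray;   # called in scalar context
    return;                                 # void context: nothing to return
}

sub nothing { return }    # a bare return with no value

my @list   = context_name();
my $scalar = context_name();
my @empty  = nothing();   # an empty list in list context
my $undef  = nothing();   # the undefined value in scalar context

print "$list[0], $scalar, ", scalar(@empty), ", ",
      defined($undef) ? "defined" : "undef", "\n";
```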
Perl does not have named formal parameters, but in practice all you do is assign to a my() list of these. Any
variables you use in the function that aren‘t declared private are global variables. For the gory details on
creating private variables, see "Private Variables via my()" and "Temporary Values via local()". To
create protected environments for a set of functions in a separate package (and probably a separate file), see
Packages in perlmod.
Example:
sub max {
my $max = shift(@_);
foreach $foo (@_) {
$max = $foo if $max < $foo;
}
return $max;
}
$bestday = max($mon,$tue,$wed,$thu,$fri);
Example:
    # get a line, combining continuation lines
    # that start with whitespace
    sub get_line {
        $thisline = $lookahead; # GLOBAL VARIABLES!!
        LINE: while (defined($lookahead = <STDIN>)) {
            if ($lookahead =~ /^[ \t]/) {
                $thisline .= $lookahead;
            }
            else {
                last LINE;
            }
        }
        $thisline;
    }
    $lookahead = <STDIN>;   # get first line
    while ($_ = get_line()) {
        ...
    }
Use array assignment to a local list to name your formal arguments:
sub maybeset {
my($key, $value) = @_;
$Foo{$key} = $value unless $Foo{$key};
}
This also has the effect of turning call−by−reference into call−by−value, because the assignment copies the
values. Otherwise a function is free to do in−place modifications of @_ and change its caller‘s values.
    upcase_in($v1, $v2); # this changes $v1 and $v2
    sub upcase_in {
        for (@_) { tr/a-z/A-Z/ }
    }
You aren‘t allowed to modify constants in this way, of course. If an argument were actually literal and you
tried to change it, you‘d take a (presumably fatal) exception. For example, this won‘t work:
upcase_in("frederick");
It would be much safer if the upcase_in() function were written to return a copy of its parameters
instead of changing them in place:
    ($v3, $v4) = upcase($v1, $v2); # this doesn't
    sub upcase {
        return unless defined wantarray; # void context, do nothing
        my @parms = @_;
        for (@parms) { tr/a-z/A-Z/ }
        return wantarray ? @parms : $parms[0];
    }
Notice how this (unprototyped) function doesn‘t care whether it was passed real scalars or arrays. Perl will
see everything as one big long flat @_ parameter list. This is one of the ways where Perl‘s simple
argument−passing style shines. The upcase() function would work perfectly well without changing the
upcase() definition even if we fed it things like this:
    @newlist = upcase(@list1, @list2);
    @newlist = upcase( split /:/, $var );
Do not, however, be tempted to do this:
    (@a, @b) = upcase(@list1, @list2);
Like its flat incoming parameter list, the return list is also flat. So all you have managed to do here is
store everything in @a and make @b an empty list. See Pass by Reference for alternatives.
A subroutine may be called using the "&" prefix. The "&" is optional in modern Perls, and so are the
parentheses if the subroutine has been predeclared. (Note, however, that the "&" is NOT optional when
you‘re just naming the subroutine, such as when it‘s used as an argument to defined() or undef(). Nor
is it optional when you want to do an indirect subroutine call with a subroutine name or reference using the
&$subref() or &{$subref}() constructs. See perlref for more on that.)
Subroutines may be called recursively. If a subroutine is called using the "&" form, the argument list is
optional, and if omitted, no @_ array is set up for the subroutine: the @_ array at the time of the call is visible
to the subroutine instead. This is an efficiency mechanism that new users may wish to avoid.
    &foo(1,2,3);    # pass three arguments
    foo(1,2,3);     # the same
    foo();          # pass a null list
    &foo();         # the same
    &foo;           # foo() get current args, like foo(@_) !!
    foo;            # like foo() IFF sub foo predeclared, else "foo"
Not only does the "&" form make the argument list optional, but it also disables any prototype checking on
the arguments you do provide. This is partly for historical reasons, and partly for having a convenient way to
cheat if you know what you‘re doing. See the section on Prototypes below.
Functions whose names are in all upper case are reserved to the Perl core, just as are modules whose names
are in all lower case. A function in all capitals is a loosely-held convention meaning it will be called
indirectly by the run-time system itself. Functions that do special, pre-defined things are BEGIN, END,
AUTOLOAD, and DESTROY, plus all the functions mentioned in perltie. The 5.005 release adds INIT to
this list.
Private Variables via my()
Synopsis:
    my $foo;            # declare $foo lexically local
    my (@wid, %get);    # declare list of variables local
    my $foo = "flurp";  # declare $foo lexical, and init it
    my @oof = @bar;     # declare @oof lexical, and init it
A "my" declares the listed variables to be confined (lexically) to the enclosing block, conditional
(if/unless/elsif/else), loop (for/foreach/while/until/continue), subroutine, eval,
or do/require/use‘d file. If more than one value is listed, the list must be placed in parentheses. All
listed elements must be legal lvalues. Only alphanumeric identifiers may be lexically scoped; magical
builtins like $/ must currently be localized with "local" instead.
Unlike dynamic variables created by the "local" operator, lexical variables declared with "my" are totally
hidden from the outside world, including any called subroutines (even if it‘s the same subroutine called from
itself or elsewhere—every call gets its own copy).
This doesn‘t mean that a my() variable declared in a statically enclosing lexical scope would be invisible.
Only the dynamic scopes are cut off. For example, the bumpx() function below has access to the lexical
$x variable because both the my and the sub occurred at the same scope, presumably the file scope.
my $x = 10;
sub bumpx { $x++ }
(An eval(), however, can see the lexical variables of the scope it is being evaluated in so long as the
names aren‘t hidden by declarations within the eval() itself. See perlref.)
The parameter list to my() may be assigned to if desired, which allows you to initialize your variables. (If
no initializer is given for a particular variable, it is created with the undefined value.) Commonly this is used
to name the parameters to a subroutine. Examples:
    $arg = "fred";          # "global" variable
    $n = cube_root(27);
    print "$arg thinks the root is $n\n";
    fred thinks the root is 3
    sub cube_root {
        my $arg = shift;    # name doesn't matter
        $arg **= 1/3;
        return $arg;
    }
The "my" is simply a modifier on something you might assign to. So when you do assign to the variables in
its argument list, the "my" doesn‘t change whether those variables are viewed as a scalar or an array. So
    my ($foo) = <STDIN>;    # WRONG?
    my @FOO = <STDIN>;
both supply a list context to the right-hand side, while
    my $foo = <STDIN>;
supplies a scalar context. But the following declares only one variable:
    my $foo, $bar = 1;      # WRONG
That has the same effect as
my $foo;
$bar = 1;
The declared variable is not introduced (is not visible) until after the current statement. Thus,
my $x = $x;
can be used to initialize the new $x with the value of the old $x, and the expression
my $x = 123 and $x == 123
is false unless the old $x happened to have the value 123.
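A short sketch of the first case: inside the inner block, the $x on the right-hand side is still the outer variable, because the new one is not visible until the statement is complete. The variable $seen_inside exists only to carry the inner value out of the block for printing.

```perl
my $x = 10;
my $seen_inside;

{
    my $x = $x + 1;      # the $x on the right is still the outer one
    $seen_inside = $x;   # the new $x is visible from here on
}

print "inside: $seen_inside, outside: $x\n";
```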
Lexical scopes of control structures are not bounded precisely by the braces that delimit their controlled
blocks; control expressions are part of the scope, too. Thus in the loop
    while (defined(my $line = <>)) {
        $line = lc $line;
    } continue {
        print $line;
    }
the scope of $line extends from its declaration throughout the rest of the loop construct (including the
continue clause), but not beyond it. Similarly, in the conditional
    if ((my $answer = <STDIN>) =~ /^yes$/i) {
        user_agrees();
    } elsif ($answer =~ /^no$/i) {
        user_disagrees();
    } else {
        chomp $answer;
        die "'$answer' is neither 'yes' nor 'no'";
    }
the scope of $answer extends from its declaration throughout the rest of the conditional (including elsif
and else clauses, if any), but not beyond it.
(None of the foregoing applies to if/unless or while/until modifiers appended to simple statements.
Such modifiers are not control structures and have no effect on scoping.)
The foreach loop defaults to scoping its index variable dynamically (in the manner of local; see below).
However, if the index variable is prefixed with the keyword "my", then it is lexically scoped instead. Thus
in the loop
for my $i (1, 2, 3) {
some_function();
}
the scope of $i extends to the end of the loop, but not beyond it, and so the value of $i is unavailable in
some_function().
Some users may wish to encourage the use of lexically scoped variables. As an aid to catching implicit
references to package variables, if you say
    use strict 'vars';
then any variable reference from there to the end of the enclosing block must either refer to a lexical
variable, or must be fully qualified with the package name. A compilation error results otherwise. An inner
block may countermand this with "no strict 'vars'".
A my() has both a compile-time and a run-time effect. At compile time, the compiler takes notice of it; the
principal usefulness of this is to quiet "use strict 'vars'". The actual initialization is delayed until
run time, so it gets executed appropriately; every time through a loop, for example.
Variables declared with "my" are not part of any package and are therefore never fully qualified with the
package name. In particular, you‘re not allowed to try to make a package variable (or other global) lexical:
    my $pack::var;      # ERROR! Illegal syntax
    my $_;              # also illegal (currently)
In fact, dynamic variables (also known as package or global variables) are still accessible using the fully
qualified :: notation even while a lexical of the same name is also visible:
    package main;
    local $x = 10;
    my $x = 20;
    print "$x and $::x\n";
That will print out 20 and 10.
You may declare "my" variables at the outermost scope of a file to hide any such identifiers totally from the
outside world. This is similar to C‘s static variables at the file level. To do this with a subroutine requires
the use of a closure (anonymous function with lexical access). If a block (such as an eval(), function, or
package) wants to create a private subroutine that cannot be called from outside that block, it can declare a
lexical variable containing an anonymous sub reference:
    my $secret_version = '1.001-beta';
    my $secret_sub = sub { print $secret_version };
    &$secret_sub();
As long as the reference is never returned by any function within the module, no outside module can see the
subroutine, because its name is not in any package‘s symbol table. Remember that it‘s not REALLY called
$some_pack::secret_version or anything; it‘s just $secret_version, unqualified and
unqualifiable.
This does not work with object methods, however; all object methods have to be in the symbol table of some
package to be found.
Persistent Private Variables
Just because a lexical variable is lexically (also called statically) scoped to its enclosing block, eval, or do
FILE, this doesn‘t mean that within a function it works like a C static. It normally works more like a C auto,
but with implicit garbage collection.
Unlike local variables in C or C++, Perl‘s lexical variables don‘t necessarily get recycled just because their
scope has exited. If something more permanent is still aware of the lexical, it will stick around. So long as
something else references a lexical, that lexical won't be freed, which is as it should be. You wouldn't
want memory being freed until you were done using it, or kept around once you were done. Automatic
garbage collection takes care of this for you.
This means that you can pass back or save away references to lexical variables, whereas to return a pointer to
a C auto is a grave error. It also gives us a way to simulate C‘s function statics. Here‘s a mechanism for
giving a function private variables with both lexical scoping and a static lifetime. If you do want to create
something like C‘s static variables, just enclose the whole function in an extra block, and put the static
variable outside the function but in the block.
{
my $secret_val = 0;
sub gimme_another {
return ++$secret_val;
}
}
# $secret_val now becomes unreachable by the outside
# world, but retains its value between calls to gimme_another
If this function is being sourced in from a separate file via require or use, then this is probably just fine.
If it‘s all in the main program, you‘ll need to arrange for the my() to be executed early, either by putting the
whole block above your main program, or more likely, placing merely a BEGIN sub around it to make sure it
gets executed before your program starts to run:
sub BEGIN {
my $secret_val = 0;
sub gimme_another {
return ++$secret_val;
}
}
See Package Constructors and Destructors in perlmod about the BEGIN function.
If declared at the outermost scope, the file scope, then lexicals work somewhat like C's file statics. They are
available to all functions in that same file declared below them, but are inaccessible from outside of the file.
This is sometimes used in modules to create private variables for the whole module.
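Because a lexical stays alive for as long as something refers to it, a subroutine can hand back an anonymous sub that keeps its own private copy. The sketch below is one illustration of that; make_counter and the variable names are invented for the example.

```perl
sub make_counter {
    my $count = shift;                # a fresh lexical for every call
    return sub { return $count++ };   # the closure keeps $count alive
}

my $from_one = make_counter(1);
my $from_ten = make_counter(10);

my @got;
push @got, &$from_one();   # first counter, first value
push @got, &$from_one();   # first counter, next value
push @got, &$from_ten();   # second counter has its own $count
push @got, &$from_one();   # first counter is unaffected by the second

print "@got\n";
```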
Temporary Values via local()
NOTE: In general, you should be using "my" instead of "local", because it‘s faster and safer. Exceptions
to this include the global punctuation variables, filehandles and formats, and direct manipulation of the Perl
symbol table itself. Format variables often use "local" though, as do other variables whose current value
must be visible to called subroutines.
Synopsis:
    local $foo;                 # declare $foo dynamically local
    local (@wid, %get);         # declare list of variables local
    local $foo = "flurp";       # declare $foo dynamic, and init it
    local @oof = @bar;          # declare @oof dynamic, and init it
    local *FH;                  # localize $FH, @FH, %FH, &FH ...
    local *merlyn = *randal;    # now $merlyn is really $randal, plus
                                # @merlyn is really @randal, etc
    local *merlyn = 'randal';   # SAME THING: promote 'randal' to *randal
    local *merlyn = \$randal;   # just alias $merlyn, not @merlyn etc
A local() modifies its listed variables to be "local" to the enclosing block, eval, or do FILE—and to
any subroutine called from within that block. A local() just gives temporary values to global (meaning
package) variables. It does not create a local variable. This is known as dynamic scoping. Lexical scoping
is done with "my", which works more like C‘s auto declarations.
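The contrast between the two kinds of scoping can be sketched as follows. The package variable $level and the three subroutine names are assumptions made for the illustration.

```perl
use vars qw($level);      # declare a package (global) variable
$level = 'global';

sub show_level { return $level }   # always reads the package variable

sub with_local {
    local $level = 'temporary';    # temporary value, seen by called subroutines
    return show_level();
}

sub with_my {
    my $level = 'lexical';         # private to this block; callees cannot see it
    return show_level();
}

my @results = (with_local(), with_my(), show_level());
print "@results\n";
```

Only with_local() changes what show_level() sees, and the old value is back as soon as with_local() returns.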
If more than one variable is given to local(), they must be placed in parentheses. All listed elements
must be legal lvalues. This operator works by saving the current values of those variables in its argument list
on a hidden stack and restoring them upon exiting the block, subroutine, or eval. This means that called
subroutines can also reference the local variable, but not the global one. The argument list may be assigned
to if desired, which allows you to initialize your local variables. (If no initializer is given for a particular
variable, it is created with an undefined value.) Commonly this is used to name the parameters to a
subroutine. Examples:
    for $i ( 0 .. 9 ) {
        $digits{$i} = $i;
    }
    # assume this function uses global %digits hash
    parse_num();
    # now temporarily add to %digits hash
    if ($base12) {
        # (NOTE: not claiming this is efficient!)
        local %digits = (%digits, 't' => 10, 'e' => 11);
        parse_num(); # parse_num gets this new %digits!
    }
    # old %digits restored here
Because local() is a run−time command, it gets executed every time through a loop. In releases of Perl
previous to 5.0, this used more stack storage each time until the loop was exited. Perl now reclaims the
space each time through, but it‘s still more efficient to declare your variables outside the loop.
A local is simply a modifier on an lvalue expression. When you assign to a localized variable, the
local doesn‘t change whether its list is viewed as a scalar or an array. So
    local($foo) = <STDIN>;
    local @FOO = <STDIN>;
both supply a list context to the right-hand side, while
    local $foo = <STDIN>;
supplies a scalar context.
A note about local() and composite types is in order. Something like local(%foo) works by
temporarily placing a brand new hash in the symbol table. The old hash is left alone, but is hidden "behind"
the new one.
This means the old variable is completely invisible via the symbol table (i.e. the hash entry in the *foo
typeglob) for the duration of the dynamic scope within which the local() was seen. This has the effect of
allowing one to temporarily occlude any magic on composite types. For instance, this will briefly alter a tied
hash to some other implementation:
    tie %ahash, 'APackage';
    [...]
    {
        local %ahash;
        tie %ahash, 'BPackage';
        [..called code will see %ahash tied to 'BPackage'..]
        {
            local %ahash;
            [..%ahash is a normal (untied) hash here..]
        }
    }
    [..%ahash back to its initial tied self again..]
As another example, a custom implementation of %ENV might look like this:
    {
        local %ENV;
        tie %ENV, 'MyOwnEnv';
        [..do your own fancy %ENV manipulation here..]
    }
    [..normal %ENV behavior here..]
It‘s also worth taking a moment to explain what happens when you localize a member of a composite type
(i.e. an array or hash element). In this case, the element is localized by name. This means that when the
scope of the local() ends, the saved value will be restored to the hash element whose key was named in
the local(), or the array element whose index was named in the local(). If that element was deleted
while the local() was in effect (e.g. by a delete() from a hash or a shift() of an array), it will
spring back into existence, possibly extending an array and filling in the skipped elements with undef. For
instance, if you say
    %hash = ( 'This' => 'is', 'a' => 'test' );
    @ary = ( 0..5 );
    {
        local($ary[5]) = 6;
        local($hash{'a'}) = 'drill';
        while (my $e = pop(@ary)) {
            print "$e . . .\n";
            last unless $e > 3;
        }
        if (@ary) {
            $hash{'only a'} = 'test';
            delete $hash{'a'};
        }
    }
    print join(' ', map { "$_ $hash{$_}" } sort keys %hash),".\n";
    print "The array has ",scalar(@ary)," elements: ",
        join(', ', map { defined $_ ? $_ : 'undef' } @ary),"\n";
Perl will print
6 . . .
4 . . .
3 . . .
This is a test only a test.
The array has 6 elements: 0, 1, 2, undef, undef, 5
Passing Symbol Table Entries (typeglobs)
[Note: The mechanism described in this section was originally the only way to simulate pass−by−reference
in older versions of Perl. While it still works fine in modern versions, the new reference mechanism is
generally easier to work with. See below.]
Sometimes you don‘t want to pass the value of an array to a subroutine but rather the name of it, so that the
subroutine can modify the global copy of it rather than working with a local copy. In perl you can refer to all
objects of a particular name by prefixing the name with a star: *foo. This is often known as a "typeglob",
because the star on the front can be thought of as a wildcard match for all the funny prefix characters on
variables and subroutines and such.
When evaluated, the typeglob produces a scalar value that represents all the objects of that name, including
any filehandle, format, or subroutine. When assigned to, it causes the name mentioned to refer to whatever
"*" value was assigned to it. Example:
sub doubleary {
local(*someary) = @_;
foreach $elem (@someary) {
$elem *= 2;
}
}
doubleary(*foo);
doubleary(*bar);
Note that scalars are already passed by reference, so you can modify scalar arguments without using this
mechanism by referring explicitly to $_[0] etc. You can modify all the elements of an array by passing all
the elements as scalars, but you have to use the * mechanism (or the equivalent reference mechanism) to
push, pop, or change the size of an array. It will certainly be faster to pass the typeglob (or reference).
Even if you don‘t want to modify an array, this mechanism is useful for passing multiple arrays in a single
LIST, because normally the LIST mechanism will merge all the array values so that you can‘t extract out the
individual arrays. For more on typeglobs, see Typeglobs and Filehandles in perldata.
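For comparison, the equivalent reference mechanism mentioned above can be sketched like this. The subroutine append_total is invented for the example; it changes the size of the caller's array, which cannot be done through the elements of @_ alone.

```perl
sub append_total {
    my $aref  = shift;       # reference to the caller's array
    my $total = 0;
    foreach my $n (@$aref) {
        $total += $n;
    }
    push @$aref, $total;     # grows the caller's array
}

my @nums = (1, 2, 3);
append_total(\@nums);
print "@nums\n";
```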
When to Still Use local()
Despite the existence of my(), there are three places where the local() operator still shines. In fact,
in these three places, you must use local instead of my.
1. You need to give a global variable a temporary value, especially $_.
The global variables, like @ARGV or the punctuation variables, must be localized with local().
This block reads in /etc/motd, and splits it up into chunks separated by lines of equal signs, which are
placed in @Fields.
{
local @ARGV = ("/etc/motd");
local $/ = undef;
local $_ = <>;
@Fields = split /^\s*=+\s*$/;
}
In particular, it's important to localize $_ in any routine that assigns to it. Look out for implicit
assignments in while conditionals.
2. You need to create a local file or directory handle or a local function.
A function that needs a filehandle of its own must use local() on a complete
typeglob. This can be used to create new symbol table entries:
    sub ioqueue {
        local (*READER, *WRITER);   # not my!
        pipe (READER, WRITER) or die "pipe: $!";
        return (*READER, *WRITER);
    }
    ($head, $tail) = ioqueue();
See the Symbol module for a way to create anonymous symbol table entries.
Because assignment of a reference to a typeglob creates an alias, this can be used to create what is
effectively a local function, or at least, a local alias.
    {
        local *grow = \&shrink;   # only until this block exits
        grow();                   # really calls shrink()
        move();                   # if move() grow()s, it shrink()s too
    }
    grow();                       # get the real grow() again
See Function Templates in perlref for more about manipulating functions by name in this way.
3. You want to temporarily change just one element of an array or hash.
You can localize just one element of an aggregate. Usually this is done on dynamics:
{
    local $SIG{INT} = 'IGNORE';
    funct();                # uninterruptible
}
# interruptibility automatically restored here
But it also works on lexically declared aggregates. Prior to 5.005, this operation could on occasion
misbehave.
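As a minimal sketch of cases 1 and 3 above, the following localizes $_ inside a routine and localizes a single hash element. The names count_words, verbosity, and %Settings are hypothetical and chosen only for illustration.

```perl
use strict;
use vars qw(%Settings);

%Settings = (verbose => 0);

# Count the words in a string via $_ without disturbing the caller's $_.
sub count_words {
    local $_ = shift;          # caller's $_ is restored on return
    my @words = split;         # split defaults to $_ and whitespace
    return scalar @words;
}

# Report the setting as seen by any code called during the block below.
sub verbosity { return $Settings{verbose} }

my @lines = ("a b c", "d e");
my @counts;
for (@lines) {
    push @counts, count_words($_);   # the loop's $_ survives the call
}

my $inside;
{
    local $Settings{verbose} = 1;    # temporary value for one element
    $inside = verbosity();
}
my $outside = verbosity();           # old value is back in place
```

Because local() is dynamic, verbosity() sees the temporary value even though it is defined outside the block; a my() variable could not do that.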
Pass by Reference
If you want to pass more than one array or hash into a function—or return them from it—and have them
maintain their integrity, then you‘re going to have to use an explicit pass−by−reference. Before you do that,
you need to understand references as detailed in perlref. This section may not make much sense to you
otherwise.
Here are a few simple examples. First, let's pass in several arrays to a function and have it pop all of them,
returning a new list of all their former last elements:
@tailings = popmany ( \@a, \@b, \@c, \@d );
sub popmany {
my $aref;
my @retlist = ();
foreach $aref ( @_ ) {
push @retlist, pop @$aref;
}
return @retlist;
}
Here‘s how you might write a function that returns a list of keys occurring in all the hashes passed to it:
@common = inter( \%foo, \%bar, \%joe );
sub inter {
my ($k, $href, %seen); # locals
foreach $href (@_) {
while ( $k = each %$href ) {
$seen{$k}++;
}
}
return grep { $seen{$_} == @_ } keys %seen;
}
So far, we‘re using just the normal list return mechanism. What happens if you want to pass or return a hash?
Well, if you‘re using only one of them, or you don‘t mind them concatenating, then the normal calling
convention is ok, although a little expensive.
Where people get into trouble is here:
(@a, @b) = func(@c, @d);
or
(%a, %b) = func(%c, %d);
That syntax simply won‘t work. It sets just @a or %a and clears the @b or %b. Plus the function didn‘t get
passed into two separate arrays or hashes: it got one long list in @_, as always.
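The flattening described above is easy to observe. This is a small sketch in which func is a hypothetical stand-in that simply returns its argument list:

```perl
use strict;

sub func { return @_ }        # hands back whatever flat list it received

my @c = (1, 2);
my @d = (3, 4, 5);

# Both arrays are flattened into one list on the way in, and the
# first array on the left swallows the whole list on the way out.
my (@a, @b) = func(@c, @d);

my $a_count = @a;             # all five elements land here
my $b_count = @b;             # nothing is left for @b
```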
If you can arrange for everyone to deal with this through references, it‘s cleaner code, although not so nice to
look at. Here's a function that takes two array references as arguments, returning the two array references in
order of how many elements they have in them:
($aref, $bref) = func(\@c, \@d);
print "@$aref has more than @$bref\n";
sub func {
my ($cref, $dref) = @_;
if (@$cref > @$dref) {
return ($cref, $dref);
} else {
return ($dref, $cref);
}
}
It turns out that you can actually do this also:
(*a, *b) = func(\@c, \@d);
print "@a has more than @b\n";
sub func {
local (*c, *d) = @_;
if (@c > @d) {
return (\@c, \@d);
} else {
return (\@d, \@c);
}
}
Here we‘re using the typeglobs to do symbol table aliasing. It‘s a tad subtle, though, and also won‘t work if
you‘re using my() variables, because only globals (well, and local()s) are in the symbol table.
If you're passing around filehandles, you could usually just use the bare typeglob, like *STDOUT, but
typeglob references would be better because they'll still work properly under use strict 'refs'.
For example:
splutter(\*STDOUT);
sub splutter {
my $fh = shift;
print $fh "her um well a hmmm\n";
}
$rec = get_rec(\*STDIN);
sub get_rec {
my $fh = shift;
return scalar <$fh>;
}
Another way to do this is using *HANDLE{IO}, see perlref for usage and caveats.
If you‘re planning on generating new filehandles, you could do this:
sub openit {
my $path = shift;
local *FH;
return open (FH, $path) ? *FH : undef;
}
Although that will actually produce a small memory leak. See the bottom of open() for a somewhat
cleaner way using the IO::Handle package.
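One alternative, sketched here under the assumption that the standard Symbol module mentioned earlier is acceptable, is to ask gensym() for an anonymous typeglob instead of localizing a named one. This openit is an illustrative variant, not the version above:

```perl
use strict;
use Symbol qw(gensym);

# Open a file for reading on a freshly generated handle.
# Returns the handle on success, undef on failure ($! holds the reason).
sub openit {
    my $path = shift;
    my $fh = gensym();                           # anonymous typeglob reference
    return open($fh, "< $path") ? $fh : undef;
}
```

A caller would write something like $fh = openit($file) or die "can't open $file: $!"; and then read from it with <$fh>.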
Prototypes
As of the 5.002 release of perl, if you declare
sub mypush (\@@)
then mypush() takes arguments exactly like push() does. The declaration of the function to be called
must be visible at compile time. The prototype affects only the interpretation of new−style calls to the
function, where new−style is defined as not using the & character. In other words, if you call it like a builtin
function, then it behaves like a builtin function. If you call it like an old−fashioned subroutine, then it
behaves like an old−fashioned subroutine. It naturally falls out from this rule that prototypes have no
influence on subroutine references like \&foo or on indirect subroutine calls like &{$subref}.
Method calls are not influenced by prototypes either, because the function to be called is indeterminate at
compile time, because it depends on inheritance.
Because the intent is primarily to let you define subroutines that work like builtin commands, here are the
prototypes for some other functions that parse almost exactly like the corresponding builtins.
Declared as                 Called as
sub mylink ($$)             mylink $old, $new
sub myvec ($$$)             myvec $var, $offset, 1
sub myindex ($$;$)          myindex &getstring, "substr"
sub mysyswrite ($$$;$)      mysyswrite $buf, 0, length($buf) - $off, $off
sub myreverse (@)           myreverse $a, $b, $c
sub myjoin ($@)             myjoin ":", $a, $b, $c
sub mypop (\@)              mypop @array
sub mysplice (\@$$@)        mysplice @array, @array, 0, @pushme
sub mykeys (\%)             mykeys %{$hashref}
sub myopen (*;$)            myopen HANDLE, $name
sub mypipe (**)             mypipe READHANDLE, WRITEHANDLE
sub mygrep (&@)             mygrep { /foo/ } $a, $b, $c
sub myrand ($)              myrand 42
sub mytime ()               mytime
Any backslashed prototype character represents an actual argument that absolutely must start with that
character. The value passed to the subroutine (as part of @_) will be a reference to the actual argument given
in the subroutine call, obtained by applying \ to that argument.
Unbackslashed prototype characters have special meanings. Any unbackslashed @ or % eats all the rest of
the arguments, and forces list context. An argument represented by $ forces scalar context. An & requires
an anonymous subroutine, which, if passed as the first argument, does not require the "sub" keyword or a
subsequent comma. A * does whatever it has to do to turn the argument into a reference to a symbol table
entry.
A semicolon separates mandatory arguments from optional arguments. (It is redundant before @ or %.)
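To make the backslashed form concrete, here is one possible body for the mypush declaration shown earlier. The body is an illustrative sketch; only the prototype comes from the text above.

```perl
use strict;

# \@ makes Perl pass a reference to the first argument, which must be
# a real array; the trailing @ soaks up the remaining arguments as a list.
sub mypush (\@@) {
    my $aref = shift;
    push @$aref, @_;
    return scalar @$aref;     # new length, like the builtin push
}

my @stack = (1, 2);
my $len = mypush @stack, 3, 4;    # called like a builtin: no \ and no parens
```

Note that the definition appears before the call, so the prototype is visible when the call is compiled.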
Note how the last three examples above are treated specially by the parser. mygrep() is parsed as a true list
operator, myrand() is parsed as a true unary operator with unary precedence the same as rand(), and
mytime() is truly without arguments, just like time(). That is, if you say
mytime +2;
you‘ll get mytime() + 2, not mytime(2), which is how it would be parsed without the prototype.
The interesting thing about & is that you can generate new syntax with it:
sub try (&@) {
my($try,$catch) = @_;
eval { &$try };
if ($@) {
local $_ = $@;
&$catch;
}
}
sub catch (&) { $_[0] }
try {
die "phooey";
} catch {
/phooey/ and print "unphooey\n";
};
That prints "unphooey". (Yes, there are still unresolved issues having to do with the visibility of @_. I‘m
ignoring that question for the moment. (But note that if we make @_ lexically scoped, those anonymous
subroutines can act like closures... (Gee, is this sounding a little Lispish? (Never mind.))))
And here‘s a reimplementation of grep:
sub mygrep (&@) {
my $code = shift;
my @result;
foreach $_ (@_) {
push(@result, $_) if &$code;
}
@result;
}
Some folks would prefer full alphanumeric prototypes. Alphanumerics have been intentionally left out of
prototypes for the express purpose of someday in the future adding named, formal parameters. The current
mechanism‘s main goal is to let module writers provide better diagnostics for module users. Larry feels the
notation quite understandable to Perl programmers, and that it will not intrude greatly upon the meat of the
module, nor make it harder to read. The line noise is visually encapsulated into a small pill that‘s easy to
swallow.
It‘s probably best to prototype new functions, not retrofit prototyping into older ones. That‘s because you
must be especially careful about silent impositions of differing list versus scalar contexts. For example, if
you decide that a function should take just one parameter, like this:
sub func ($) {
my $n = shift;
print "you gave me $n\n";
}
and someone has been calling it with an array or expression returning a list:
func(@foo);
func( split /:/ );
Then you‘ve just supplied an automatic scalar() in front of their argument, which can be more than a bit
surprising. The old @foo which used to hold one thing doesn‘t get passed in. Instead, the func() now
gets passed in 1, that is, the number of elements in @foo. And the split() gets called in a scalar context
and starts scribbling on your @_ parameter list.
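The difference is easy to see by calling the same function both ways. In this sketch the hypothetical describe() plays the role of func(), and the & form is used to bypass the prototype:

```perl
use strict;

sub describe ($) {
    my $n = shift;
    return "you gave me $n";
}

my @foo = ("apple");

my $with_proto = describe(@foo);     # @foo is evaluated in scalar context
my $old_style  = &describe(@foo);    # & ignores the prototype: list is passed
```

With the prototype in force the function receives the element count; the old-style call receives the element itself.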
This is all very powerful, of course, and should be used only in moderation to make the world a better place.
Constant Functions
Functions with a prototype of () are potential candidates for inlining. If the result after optimization and
constant folding is either a constant or a lexically−scoped scalar which has no other references, then it will
be used in place of function calls made without & or do. Calls made using & or do are never inlined. (See
constant.pm for an easy way to declare most constants.)
The following functions would all be inlined:
sub pi ()           { 3.14159 }             # Not exact, but close.
sub PI ()           { 4 * atan2 1, 1 }      # As good as it gets,
                                            # and it's inlined, too!
sub ST_DEV ()       { 0 }
sub ST_INO ()       { 1 }
sub FLAG_FOO ()     { 1 << 8 }
sub FLAG_BAR ()     { 1 << 9 }
sub FLAG_MASK ()    { FLAG_FOO | FLAG_BAR }
sub OPT_BAZ ()      { not (0x1B58 & FLAG_MASK) }
sub BAZ_VAL () {
    if (OPT_BAZ) {
        return 23;
    }
    else {
        return 42;
    }
}
sub N () { int(BAZ_VAL) / 3 }
BEGIN {
    my $prod = 1;
    for (1..N) { $prod *= $_ }
    sub N_FACTORIAL () { $prod }
}
If you redefine a subroutine that was eligible for inlining, you‘ll get a mandatory warning. (You can use this
warning to tell whether or not a particular subroutine is considered constant.) The warning is considered
severe enough not to be optional because previously compiled invocations of the function will still be using
the old value of the function. If you need to be able to redefine the subroutine you need to ensure that it isn‘t
inlined, either by dropping the () prototype (which changes the calling semantics, so beware) or by
thwarting the inlining mechanism in some other way, such as
sub not_inlined () {
23 if $];
}
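For everyday constants, the constant pragma mentioned above generates the ()-prototyped subroutines for you. A brief sketch, with hypothetical names PI and DEBUG:

```perl
use strict;
use constant PI    => 4 * atan2(1, 1);
use constant DEBUG => 0;

my $area    = PI * 2 ** 2;                    # PI is called without & or parens
my $message = DEBUG ? "debugging" : "quiet";  # a constant condition like this can be folded at compile time
```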
Overriding Builtin Functions
Many builtin functions may be overridden, though this should be tried only occasionally and for good
reason. Typically this might be done by a package attempting to emulate missing builtin functionality on a
non−Unix system.
Overriding may be done only by importing the name from a module—ordinary predeclaration isn‘t good
enough. However, the subs pragma (compiler directive) lets you, in effect, predeclare subs via the import
syntax, and these names may then override the builtin ones:
use subs 'chdir', 'chroot', 'chmod', 'chown';
chdir $somewhere;
sub chdir { ... }
To unambiguously refer to the builtin form, one may precede the builtin name with the special package
qualifier CORE::. For example, saying CORE::open() will always refer to the builtin open(), even if
the current package has imported some other subroutine called &open() from elsewhere.
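As an illustration of an override that still delegates to the builtin, this sketch wraps hex() so that each call is recorded before CORE::hex() does the real work. The @seen array is hypothetical bookkeeping added for the example.

```perl
use strict;
use subs 'hex';               # predeclare via import so the override takes effect

my @seen;

sub hex {
    my $str = shift;
    push @seen, $str;         # note what was asked for
    return CORE::hex($str);   # always the builtin, never this sub
}

my $n = hex("ff");            # resolves to the override in this package
```

Without the CORE:: qualifier inside the sub, the call would recurse into the override itself.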
Library modules should not in general export builtin names like "open" or "chdir" as part of their default
@EXPORT list, because these may sneak into someone else‘s namespace and change the semantics
unexpectedly. Instead, if the module adds the name to the @EXPORT_OK list, then it‘s possible for a user to
import the name explicitly, but not implicitly. That is, they could say
use Module 'open';
and it would import the open override, but if they said
use Module;
they would get the default imports without the overrides.
The foregoing mechanism for overriding builtins is restricted, quite deliberately, to the package that requests
the import. There is a second method that is sometimes applicable when you wish to override a builtin
everywhere, without regard to namespace boundaries. This is achieved by importing a sub into the special
namespace CORE::GLOBAL::. Here is an example that quite brazenly replaces the glob operator with
something that understands regular expressions.
package REGlob;
require Exporter;
@ISA = 'Exporter';
@EXPORT_OK = 'glob';
sub import {
    my $pkg = shift;
    return unless @_;
    my $sym = shift;
    my $where = ($sym =~ s/^GLOBAL_// ? 'CORE::GLOBAL' : caller(0));
    $pkg->export($where, $sym, @_);
}
sub glob {
    my $pat = shift;
    my @got;
    local(*D);
    if (opendir D, '.') { @got = grep /$pat/, readdir D; closedir D; }
    @got;
}
1;
And here‘s how it could be (ab)used:
#use REGlob 'GLOBAL_glob';      # override glob() in ALL namespaces
package Foo;
use REGlob 'glob';              # override glob() in Foo:: only
print for <^[a-z_]+\.pm\$>;     # show all pragmatic modules
Note that the initial comment shows a contrived, even dangerous example. By overriding glob globally,
you would be forcing the new (and subversive) behavior for the glob operator for every namespace,
without the complete cognizance or cooperation of the modules that own those namespaces. Naturally, this
should be done with extreme caution—if it must be done at all.
The REGlob example above does not implement all the support needed to cleanly override perl‘s glob
operator. The builtin glob has different behaviors depending on whether it appears in a scalar or list
context, but our REGlob doesn‘t. Indeed, many perl builtins have such context sensitive behaviors, and
these must be adequately supported by a properly written override. For a fully functional example of
overriding glob, study the implementation of File::DosGlob in the standard library.
Autoloading
If you call a subroutine that is undefined, you would ordinarily get an immediate fatal error complaining that
the subroutine doesn‘t exist. (Likewise for subroutines being used as methods, when the method doesn‘t
exist in any base class of the class package.) If, however, there is an AUTOLOAD subroutine defined in the
package or packages that were searched for the original subroutine, then that AUTOLOAD subroutine is called
with the arguments that would have been passed to the original subroutine. The fully qualified name of the
original subroutine magically appears in the $AUTOLOAD variable in the same package as the AUTOLOAD
routine. The name is not passed as an ordinary argument because, er, well, just because, that‘s why...
Most AUTOLOAD routines will load in a definition for the subroutine in question using eval, and then execute
that subroutine using a special form of "goto" that erases the stack frame of the AUTOLOAD routine without a
trace. (See the standard AutoLoader module, for example.) But an AUTOLOAD routine can also just
emulate the routine and never define it. For example, let‘s pretend that a function that wasn‘t defined should
just call system() with those arguments. All you‘d do is this:
sub AUTOLOAD {
my $program = $AUTOLOAD;
$program =~ s/.*:://;
system($program, @_);
}
date();
who('am', 'i');
ls('-l');
In fact, if you predeclare the functions you want to call that way, you don‘t even need the parentheses:
use subs qw(date who ls);
date;
who "am", "i";
ls -l;
A more complete example of this is the standard Shell module, which can treat undefined subroutine calls as
calls to Unix programs.
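The other style described above, in which AUTOLOAD defines the missing subroutine and then jumps to it, might look like the following sketch. The Greeter package and the generated function bodies are hypothetical; a real module would typically load code from a file instead.

```perl
package Greeter;
use strict;
use vars '$AUTOLOAD';

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                    # strip the package qualifier
    return if $name eq 'DESTROY';         # never autoload destructors

    no strict 'refs';
    # Install a real subroutine so later calls skip AUTOLOAD entirely.
    *{$AUTOLOAD} = sub { return "$name(@_)" };
    goto &$AUTOLOAD;                      # replace this frame with the new sub
}

package main;

my $first  = Greeter::hello('world');    # goes through AUTOLOAD
my $second = Greeter::hello('again');    # calls the installed sub directly
```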
Mechanisms are available for module writers to help split the modules up into autoloadable files. See the
standard AutoLoader module described in AutoLoader and in AutoSplit, the standard SelfLoader module in
SelfLoader, and the document on adding C functions to perl code in perlxs.
SEE ALSO
See perlref for more about references and closures. See perlxs if you‘d like to learn about calling C
subroutines from perl. See perlmod to learn about bundling up your functions in separate files.
perlmod
Perl Programmers Reference Guide
perlmod
NAME
perlmod − Perl modules (packages and symbol tables)
DESCRIPTION
Packages
Perl provides a mechanism for alternative namespaces to protect packages from stomping on each other‘s
variables. In fact, there‘s really no such thing as a global variable in Perl (although some identifiers default
to the main package instead of the current one). The package statement declares the compilation unit as
being in the given namespace. The scope of the package declaration is from the declaration itself through the
end of the enclosing block, eval, sub, or end of file, whichever comes first (the same scope as the my()
and local() operators). All further unqualified dynamic identifiers will be in this namespace. A package
statement only affects dynamic variables—including those you‘ve used local() on—but not lexical
variables created with my(). Typically it would be the first declaration in a file to be included by the
require or use operator. You can switch into a package in more than one place; it merely influences
which symbol table is used by the compiler for the rest of that block. You can refer to variables and
filehandles in other packages by prefixing the identifier with the package name and a double colon:
$Package::Variable. If the package name is null, the main package is assumed. That is, $::sail
is equivalent to $main::sail.
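A short sketch of qualified names at work follows. The Boat package is hypothetical; $sail is borrowed from the text above.

```perl
package Boat;
use vars '$sail';
$sail = 'jib';                 # this is $Boat::sail

package main;
use vars '$sail';
$sail = 'mainsail';            # this is $main::sail

# Three ways of naming variables from package main:
my @seen = ($Boat::sail, $main::sail, $::sail);
```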
The old package delimiter was a single quote, but double colon is now the preferred delimiter, in part
because it‘s more readable to humans, and in part because it‘s more readable to emacs macros. It also makes
C++ programmers feel like they know what‘s going on—as opposed to using the single quote as separator,
which was there to make Ada programmers feel like they knew what‘s going on. Because the old−fashioned
syntax is still supported for backwards compatibility, if you try to use a string like "This is $owner's
house", you'll be accessing $owner::s; that is, the $s variable in package owner, which is probably
not what you meant. Use braces to disambiguate, as in "This is ${owner}'s house".
Packages may be nested inside other packages: $OUTER::INNER::var. This implies nothing about the
order of name lookups, however. All symbols are either local to the current package, or must be fully
qualified from the outer package name down. For instance, there is nowhere within package OUTER that
$INNER::var refers to $OUTER::INNER::var. It would treat package INNER as a totally separate
global package.
Only identifiers starting with letters (or underscore) are stored in a package‘s symbol table. All other
symbols are kept in package main, including all of the punctuation variables like $_. In addition, when
unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, and SIG are
forced to be in package main, even when used for other purposes than their builtin one. Note also that, if
you have a package called m, s, or y, then you can‘t use the qualified form of an identifier because it will be
interpreted instead as a pattern match, a substitution, or a transliteration.
(Variables beginning with underscore used to be forced into package main, but we decided it was more
useful for package writers to be able to use leading underscore to indicate private variables and method
names. $_ is still global though.)
Eval()ed strings are compiled in the package in which the eval() was compiled. (Assignments to
$SIG{}, however, assume the signal handler specified is in the main package. Qualify the signal handler
name if you wish to have a signal handler in a package.) For an example, examine perldb.pl in the Perl
library. It initially switches to the DB package so that the debugger doesn‘t interfere with variables in the
script you are trying to debug. At various points, however, it temporarily switches back to the main
package to evaluate various expressions in the context of the main package (or wherever you came from).
See perldebug.
The special symbol __PACKAGE__ contains the current package, but cannot (easily) be used to construct
variables.
See perlsub for other scoping issues related to my() and local(), and perlref regarding closures.
Symbol Tables
The symbol table for a package happens to be stored in the hash of that name with two colons appended.
The main symbol table's name is thus %main::, or %:: for short. Likewise, the symbol table for the nested
package mentioned earlier is named %OUTER::INNER::.
The value in each entry of the hash is what you are referring to when you use the *name typeglob notation.
In fact, the following have the same effect, though the first is more efficient because it does the symbol table
lookups at compile time:
local *main::foo    = *main::bar;
local $main::{foo}  = $main::{bar};
You can use this to print out all the variables in a package, for instance. The standard dumpvar.pl library
and the CPAN module Devel::Symdump make use of this.
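A much-reduced sketch of that idea is shown here: it walks the symbol table of a hypothetical Demo package and reports which scalar, array, and hash slots are in use. It is not how dumpvar.pl itself is written.

```perl
package Demo;
use vars qw($colour @sizes %prices);
$colour = 'red';
@sizes  = (1, 2);
%prices = (small => 1);

package main;
use strict;

my @report;
foreach my $name (sort keys %Demo::) {
    no strict 'refs';                       # symbolic lookups by name
    push @report, "\$$name" if defined ${"Demo::$name"};
    push @report, "\@$name" if *{"Demo::$name"}{ARRAY};
    push @report, "\%$name" if *{"Demo::$name"}{HASH};
}
```

The *name{ARRAY} and *name{HASH} forms yield a reference only when that slot of the typeglob has been populated, which is what makes the test work.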
Assignment to a typeglob performs an aliasing operation, i.e.,
*dick = *richard;
causes variables, subroutines, formats, and file and directory handles accessible via the identifier richard
also to be accessible via the identifier dick. If you want to alias only a particular variable or subroutine,
you can assign a reference instead:
*dick = \$richard;
Which makes $richard and $dick the same variable, but leaves @richard and @dick as separate arrays.
Tricky, eh?
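A runnable sketch of the single-slot alias, reusing the names from the text:

```perl
use strict;
use vars qw($richard @richard $dick @dick);

$richard = 'original';
@richard = (1, 2, 3);

*dick = \$richard;        # alias only the scalar slot

$dick = 'changed';        # writes through to $richard
my $array_size = @dick;   # @dick is still its own, empty, array
```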
This mechanism may be used to pass and return cheap references into or from subroutines if you don't want
to copy the whole thing. It only works when assigning to dynamic variables, not lexicals.
%some_hash = ();                    # can't be my()
*some_hash = fn( \%another_hash );
sub fn {
    local *hashsym = shift;
    # now use %hashsym normally, and you
    # will affect the caller's %another_hash
    my %nhash = (); # do what you want
    return \%nhash;
}
On return, the reference will overwrite the hash slot in the symbol table specified by the *some_hash
typeglob. This is a somewhat tricky way of passing around references cheaply when you don't want to have
to remember to dereference variables explicitly.
Another use of symbol tables is for making "constant" scalars.
*PI = \3.14159265358979;
Now you cannot alter $PI, which is probably a good thing all in all. This isn‘t the same as a constant
subroutine, which is subject to optimization at compile−time. This isn‘t. A constant subroutine is one
prototyped to take no arguments and to return a constant expression. See perlsub for details on these. The
use constant pragma is a convenient shorthand for these.
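The read-only behaviour of such a scalar can be checked by trapping the failed assignment with eval. A sketch:

```perl
use strict;
use vars '$PI';

*PI = \3.14159265358979;          # alias $PI to a literal, which is read-only

my $ok    = eval { $PI = 3; 1 };  # the assignment dies inside the eval
my $error = $@;                   # the message mentions a read-only value
```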
You can say *foo{PACKAGE} and *foo{NAME} to find out what name and package the *foo symbol
table entry comes from. This may be useful in a subroutine that gets passed typeglobs as arguments:
sub identify_typeglob {
my $glob = shift;
print 'You gave me ', *{$glob}{PACKAGE}, '::', *{$glob}{NAME}, "\n";
}
identify_typeglob *foo;
identify_typeglob *bar::baz;
This prints
You gave me main::foo
You gave me bar::baz
The *foo{THING} notation can also be used to obtain references to the individual elements of *foo, see
perlref.
Package Constructors and Destructors
There are two special subroutine definitions that function as package constructors and destructors. These are
the BEGIN and END routines. The sub is optional for these routines.
A BEGIN subroutine is executed as soon as possible, that is, the moment it is completely defined, even
before the rest of the containing file is parsed. You may have multiple BEGIN blocks within a file—they
will execute in order of definition. Because a BEGIN block executes immediately, it can pull in definitions
of subroutines and such from other files in time to be visible to the rest of the file. Once a BEGIN has run, it
is immediately undefined and any code it used is returned to Perl‘s memory pool. This means you can‘t ever
explicitly call a BEGIN.
An END subroutine is executed as late as possible, that is, when the interpreter is being exited, even if it is
exiting as a result of a die() function. (But not if it‘s polymorphing into another program via exec, or
being blown out of the water by a signal—you have to trap that yourself (if you can).) You may have
multiple END blocks within a file—they will execute in reverse order of definition; that is: last in, first out
(LIFO).
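The ordering rules can be seen in a small script. This sketch records the order in a hypothetical @order array; the END blocks simply print, since they run after everything else.

```perl
use strict;
use vars '@order';

BEGIN { push @order, 'BEGIN 1' }     # runs as soon as it is parsed

push @order, 'main';                 # ordinary run-time code

BEGIN { push @order, 'BEGIN 2' }     # still runs before 'main' above

END { print "END 1\n" }              # defined first, runs last
END { print "END 2\n" }              # defined last, runs first
```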
Inside an END subroutine, $? contains the value that the script is going to pass to exit(). You can modify
$? to change the exit value of the script. Beware of changing $? by accident (e.g. by running something via
system).
Note that when you use the −n and −p switches to Perl, BEGIN and END work just as they do in awk, as a
degenerate case. As currently implemented (and subject to change, since it's inconvenient at best), both
BEGIN and END blocks are run when you use the −c switch for a compile−only syntax check, although your
main code is not.
Perl Classes
There is no special class syntax in Perl, but a package may function as a class if it provides subroutines to act
as methods. Such a package may also derive some of its methods from another class (package) by listing the
other package name in its global @ISA array (which must be a package global, not a lexical).
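As a minimal sketch, here are two hypothetical packages acting as a base class and a derived class. Dog finds new() and speak() in Animal through @ISA.

```perl
package Animal;
use strict;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;   # object is a blessed hash ref
}

sub speak {
    my $self = shift;
    return "$self->{name} makes a sound";
}

package Dog;
use strict;
use vars '@ISA';
@ISA = ('Animal');            # package global, consulted at method lookup

sub fetch {
    my $self = shift;
    return "$self->{name} fetches";
}

package main;

my $dog = Dog->new('Rex');    # new() is inherited from Animal
my @said = ($dog->speak, $dog->fetch);
```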
For more on this, see perltoot and perlobj.
Perl Modules
A module is just a package that is defined in a library file of the same name, and is designed to be reusable.
It may do this by providing a mechanism for exporting some of its symbols into the symbol table of any
package using it. Or it may function as a class definition and make its semantics available implicitly through
method calls on the class and its objects, without explicit exportation of any symbols. Or it can do a little of
both.
For example, to start a normal module called Some::Module, create a file called Some/Module.pm and start
with this template:
package Some::Module;  # assumes Some/Module.pm
use strict;
BEGIN {
    use Exporter   ();
    use vars       qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);
    # set the version for version checking
    $VERSION     = 1.00;
    # if using RCS/CVS, this may be preferred
    $VERSION = do { my @r = (q$Revision: 2.21 $ =~ /\d+/g); sprintf "%d."."%02d"
    @ISA         = qw(Exporter);
    @EXPORT      = qw(&func1 &func2 &func4);
    %EXPORT_TAGS = ( );     # eg: TAG => [ qw!name1 name2! ],
    # your exported package globals go here,
    # as well as any optionally exported functions
    @EXPORT_OK   = qw($Var1 %Hashit &func3);
}
use vars      @EXPORT_OK;
# non-exported package globals go here
use vars      qw(@more $stuff);
# initialize package globals, first exported ones
$Var1   = '';
%Hashit = ();
# then the others (which are still accessible as $Some::Module::stuff)
$stuff  = '';
@more   = ();
# all file-scoped lexicals must be created before
# the functions below that use them.
# file-private lexicals go here
my $priv_var    = '';
my %secret_hash = ();
# here's a file-private function as a closure,
# callable as &$priv_func; it cannot be prototyped.
my $priv_func = sub {
    # stuff goes here.
};
# make all your functions, whether exported or not;
# remember to put something interesting in the {} stubs
sub func1      {}    # no prototype
sub func2()    {}    # proto'd void
sub func3($$)  {}    # proto'd to 2 scalars
# this one isn't exported, but could be called!
sub func4(\%)  {}    # proto'd to 1 hash ref
END { }       # module clean-up code here (global destructor)
Then go on to declare and use your variables in functions without any qualifications. See Exporter and the
perlmodlib for details on mechanics and style issues in module creation.
Perl modules are included into your program by saying
use Module;
or
use Module LIST;
This is exactly equivalent to
BEGIN { require Module; import Module; }
or
BEGIN { require Module; import Module LIST; }
As a special case
use Module ();
is exactly equivalent to
BEGIN { require Module; }
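The equivalence can be tried with a real module. This sketch spells out by hand what use Cwd 'getcwd'; does, written with the arrow form of the method call rather than the indirect-object form shown above:

```perl
use strict;

# Hand-written expansion of:  use Cwd 'getcwd';
BEGIN {
    require Cwd;              # load Cwd.pm at compile time
    Cwd->import('getcwd');    # ask it to export getcwd into this package
}

my $here = getcwd();          # the imported name is already visible here
```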
All Perl module files have the extension .pm. use assumes this so that you don't have to spell out
"Module.pm" in quotes. This also helps to differentiate new modules from old .pl and .ph files. Module
names are also capitalized unless they're functioning as pragmas. "Pragmas" are in effect compiler
directives, and are sometimes called "pragmatic modules" (or even "pragmata" if you're a classicist).
The two statements:
require SomeModule;
require "SomeModule.pm";
differ from each other in two ways. In the first case, any double colons in the module name, such as
Some::Module, are translated into your system‘s directory separator, usually "/". The second case does
not, and would have to be specified literally. The other difference is that seeing the first require clues in
the compiler that uses of indirect object notation involving "SomeModule", as in $ob = purge
SomeModule, are method calls, not function calls. (Yes, this really can make a difference.)
Because the use statement implies a BEGIN block, the importation of semantics happens at the moment the
use statement is compiled, before the rest of the file is compiled. This is how it is able to function as a
pragma mechanism, and also how modules are able to declare subroutines that are then visible as list
operators for the rest of the current file. This will not work if you use require instead of use. With
require you can get into this problem:
require Cwd;                # make Cwd:: accessible
$here = Cwd::getcwd();

use Cwd;                    # import names from Cwd::
$here = getcwd();

require Cwd;                # make Cwd:: accessible
$here = getcwd();           # oops! no main::getcwd()
In general, use Module () is recommended over require Module, because it determines module
availability at compile time, not in the middle of your program‘s execution. An exception would be if two
modules each tried to use each other, and each also called a function from that other module. In that case,
it's easy to use require instead.
Perl packages may be nested inside other package names, so we can have package names containing ::.
But if we used that package name directly as a filename it would make for unwieldy or impossible
filenames on some systems. Therefore, if a module‘s name is, say, Text::Soundex, then its definition is
actually found in the library file Text/Soundex.pm.
Perl modules always have a .pm file, but there may also be dynamically linked executables or autoloaded
subroutine definitions associated with the module. If so, these will be entirely transparent to the user of the
module. It is the responsibility of the .pm file to load (or arrange to autoload) any additional functionality.
The POSIX module happens to do both dynamic loading and autoloading, but the user can say just use
POSIX to get it all.
For more information on writing extension modules, see perlxstut and perlguts.
SEE ALSO
See perlmodlib for general style issues related to building Perl modules and classes as well as descriptions of
the standard library and CPAN, Exporter for how Perl‘s standard import/export mechanism works, perltoot
for an in−depth tutorial on creating classes, perlobj for a hard−core reference document on objects, and
perlsub for an explanation of functions and scoping.
perlref
Perl Programmers Reference Guide
perlref
NAME
perlref − Perl references and nested data structures
DESCRIPTION
Before release 5 of Perl it was difficult to represent complex data structures, because all references had to be
symbolic—and even then it was difficult to refer to a variable instead of a symbol table entry. Perl now not
only makes it easier to use symbolic references to variables, but also lets you have "hard" references to any
piece of data or code. Any scalar may hold a hard reference. Because arrays and hashes contain scalars, you
can now easily build arrays of arrays, arrays of hashes, hashes of arrays, arrays of hashes of functions, and so
on.
Hard references are smart—they keep track of reference counts for you, automatically freeing the thing
referred to when its reference count goes to zero. (Note: the reference counts for values in self−referential or
cyclic data structures may not go to zero without a little help; see
Two−Phased Garbage Collection in perlobj for a detailed explanation.) If that thing happens to be an object,
the object is destructed. See perlobj for more about objects. (In a sense, everything in Perl is an object, but
we usually reserve the word for references to objects that have been officially "blessed" into a class
package.)
Symbolic references are names of variables or other objects, just as a symbolic link in a Unix filesystem
contains merely the name of a file. The *glob notation is a kind of symbolic reference. (Symbolic
references are sometimes called "soft references", but please don‘t call them that; references are confusing
enough without useless synonyms.)
In contrast, hard references are more like hard links in a Unix file system: They are used to access an
underlying object without concern for what its (other) name is. When the word "reference" is used without
an adjective, as in the following paragraph, it is usually talking about a hard reference.
References are easy to use in Perl. There is just one overriding principle: Perl does no implicit referencing or
dereferencing. When a scalar is holding a reference, it always behaves as a simple scalar. It doesn‘t
magically start being an array or hash or subroutine; you have to tell it explicitly to do so, by dereferencing
it.
Making References
References can be created in several ways.
1.
By using the backslash operator on a variable, subroutine, or value. (This works much like the &
(address−of) operator in C.) Note that this typically creates ANOTHER reference to a variable,
because there‘s already a reference to the variable in the symbol table. But the symbol table reference
might go away, and you‘ll still have the reference that the backslash returned. Here are some
examples:
    $scalarref = \$foo;
    $arrayref  = \@ARGV;
    $hashref   = \%ENV;
    $coderef   = \&handler;
    $globref   = \*foo;
It isn‘t possible to create a true reference to an IO handle (filehandle or dirhandle) using the backslash
operator. The most you can get is a reference to a typeglob, which is actually a complete symbol table
entry. But see the explanation of the *foo{THING} syntax below. However, you can still use type
globs and globrefs as though they were IO handles.
2.
A reference to an anonymous array can be created using square brackets:
    $arrayref = [1, 2, ['a', 'b', 'c']];
Here we‘ve created a reference to an anonymous array of three elements whose final element is itself a
reference to another anonymous array of three elements. (The multidimensional syntax described later
can be used to access this. For example, after the above, $arrayref->[2][1] would have the
value "b".)
Note that taking a reference to an enumerated list is not the same as using square brackets—instead it‘s
the same as creating a list of references!
    @list = (\$a, \@b, \%c);
    @list = \($a, @b, %c);      # same thing!
As a special case, \(@foo) returns a list of references to the contents of @foo, not a reference to
@foo itself. Likewise for %foo.
3.
A reference to an anonymous hash can be created using curly brackets:
    $hashref = {
        'Adam'  => 'Eve',
        'Clyde' => 'Bonnie',
    };
Anonymous hash and array composers like these can be intermixed freely to produce as complicated a
structure as you want. The multidimensional syntax described below works for these too. The values
above are literals, but variables and expressions would work just as well, because assignment operators
in Perl (even within local() or my()) are executable statements, not compile−time declarations.
Because curly brackets (braces) are used for several other things including BLOCKs, you may
occasionally have to disambiguate braces at the beginning of a statement by putting a + or a return
in front so that Perl realizes the opening brace isn‘t starting a BLOCK. The economy and mnemonic
value of using curlies is deemed worth this occasional extra hassle.
For example, if you wanted a function to make a new hash and return a reference to it, you have these
options:
    sub hashem {        { @_ } }   # silently wrong
    sub hashem {       +{ @_ } }   # ok
    sub hashem { return { @_ } }   # ok
On the other hand, if you want the other meaning, you can do this:
    sub showem {        { @_ } }   # ambiguous (currently ok, but may change)
    sub showem {       {; @_ } }   # ok
    sub showem { { return @_ } }   # ok
Note how the leading +{ and {; always serve to disambiguate the expression to mean either the
HASH reference, or the BLOCK.
4.
A reference to an anonymous subroutine can be created by using sub without a subname:
$coderef = sub { print "Boink!\n" };
Note the presence of the semicolon. Except for the fact that the code inside isn‘t executed
immediately, a sub {} is not so much a declaration as it is an operator, like do{} or eval{}.
(However, no matter how many times you execute that particular line (unless you‘re in an
eval("...")), $coderef will still have a reference to the SAME anonymous subroutine.)
Anonymous subroutines act as closures with respect to my() variables, that is, variables visible
lexically within the current scope. Closure is a notion out of the Lisp world that says if you define an
anonymous function in a particular lexical context, it pretends to run in that context even when it‘s
called outside of the context.
In human terms, it‘s a funny way of passing arguments to a subroutine when you define it as well as
when you call it. It‘s useful for setting up little bits of code to run later, such as callbacks. You can
even do object−oriented stuff with it, though Perl already provides a different mechanism to do
that—see perlobj.
You can also think of closure as a way to write a subroutine template without using eval. (In fact, in
version 5.000, eval was the only way to get closures. You may wish to use "require 5.001" if you use
closures.)
Here's a small example of how closures work:
sub newprint {
my $x = shift;
return sub { my $y = shift; print "$x, $y!\n"; };
}
$h = newprint("Howdy");
$g = newprint("Greetings");
# Time passes...
&$h("world");
&$g("earthlings");
This prints
Howdy, world!
Greetings, earthlings!
Note particularly that $x continues to refer to the value passed into newprint() despite the fact that
the "my $x" has seemingly gone out of scope by the time the anonymous subroutine runs. That‘s
what closure is all about.
This applies only to lexical variables, by the way. Dynamic variables continue to work as they have
always worked. Closure is not something that most Perl programmers need trouble themselves about
to begin with.
5.
References are often returned by special subroutines called constructors. Perl objects are just
references to a special kind of object that happens to know which package it‘s associated with.
Constructors are just special subroutines that know how to create that association. They do so by
starting with an ordinary reference, and it remains an ordinary reference even while it‘s also being an
object. Constructors are often named new() and called indirectly:
    $objref = new Doggie (Tail => 'short', Ears => 'long');
But don't have to be:
    $objref   = Doggie->new(Tail => 'short', Ears => 'long');

    use Term::Cap;
    $terminal = Term::Cap->Tgetent( { OSPEED => 9600 });

    use Tk;
    $main    = MainWindow->new();
    $menubar = $main->Frame(-relief      => "raised",
                            -borderwidth => 2)
6.
References of the appropriate type can spring into existence if you dereference them in a context that
assumes they exist. Because we haven't talked about dereferencing yet, we can't show you any
examples yet.
7.
A reference can be created by using a special syntax, lovingly known as the *foo{THING} syntax.
*foo{THING} returns a reference to the THING slot in *foo (which is the symbol table entry which
holds everything known as foo).
    $scalarref = *foo{SCALAR};
    $arrayref  = *ARGV{ARRAY};
    $hashref   = *ENV{HASH};
    $coderef   = *handler{CODE};
    $ioref     = *STDIN{IO};
    $globref   = *foo{GLOB};
All of these are self−explanatory except for *foo{IO}. It returns the IO handle, used for file handles
(open), sockets (socket and socketpair), and directory handles (opendir). For compatibility with
previous versions of Perl, *foo{FILEHANDLE} is a synonym for *foo{IO}.
*foo{THING} returns undef if that particular THING hasn‘t been used yet, except in the case of
scalars. *foo{SCALAR} returns a reference to an anonymous scalar if $foo hasn‘t been used yet.
This might change in a future release.
*foo{IO} is an alternative to the \*HANDLE mechanism given in
Typeglobs and Filehandles in perldata for passing filehandles into or out of subroutines, or storing into
larger data structures. Its disadvantage is that it won‘t create a new filehandle for you. Its advantage is
that you have no risk of clobbering more than you want to with a typeglob assignment, although if you
assign to a scalar instead of a typeglob, you‘re ok.
splutter(*STDOUT);
splutter(*STDOUT{IO});
sub splutter {
my $fh = shift;
print $fh "her um well a hmmm\n";
}
$rec = get_rec(*STDIN);
$rec = get_rec(*STDIN{IO});
sub get_rec {
my $fh = shift;
return scalar <$fh>;
}
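Several of the ways of making references listed above can be tried together in one short script. The following is a minimal sketch, assuming a perl of version 5.6 or later (for use warnings); the variable names are invented for illustration. It uses ref() only to show what kind of thing each reference points to.

```perl
use strict;
use warnings;

my @names = ("alpha", "beta");
my %age   = (alpha => 1);

my $aref = \@names;                          # backslash on a named array
my $href = \%age;                            # backslash on a named hash
my $anon = [1, 2, ['a', 'b']];               # anonymous array holding another
my $code = sub { return "called with @_" };  # anonymous subroutine

# \(@names) yields one scalar reference per element, not one array ref.
my @elem_refs = \(@names);

print join(" ", ref($aref), ref($href), ref($anon), ref($code)), "\n";
print scalar(@elem_refs), " element refs, the first is a ",
      ref($elem_refs[0]), " reference\n";
print $anon->[2][1], "\n";
```

The last line reaches into the nested anonymous array with the arrow syntax that the next section explains.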
Using References
That‘s it for creating references. By now you‘re probably dying to know how to use references to get back to
your long−lost data. There are several basic methods.
1.
Anywhere you‘d put an identifier (or chain of identifiers) as part of a variable or subroutine name, you
can replace the identifier with a simple scalar variable containing a reference of the correct type:
$bar = $$scalarref;
push(@$arrayref, $filename);
$$arrayref[0] = "January";
$$hashref{"KEY"} = "VALUE";
&$coderef(1,2,3);
print $globref "output\n";
It‘s important to understand that we are specifically NOT dereferencing $arrayref[0] or
$hashref{"KEY"} there. The dereference of the scalar variable happens BEFORE it does any key
lookups. Anything more complicated than a simple scalar variable must use methods 2 or 3 below.
However, a "simple scalar" includes an identifier that itself uses method 1 recursively. Therefore, the
following prints "howdy".
$refrefref = \\\"howdy";
print $$$$refrefref;
2.
Anywhere you‘d put an identifier (or chain of identifiers) as part of a variable or subroutine name, you
can replace the identifier with a BLOCK returning a reference of the correct type. In other words, the
previous examples could be written like this:
$bar = ${$scalarref};
push(@{$arrayref}, $filename);
${$arrayref}[0] = "January";
${$hashref}{"KEY"} = "VALUE";
&{$coderef}(1,2,3);
$globref->print("output\n");    # iff IO::Handle is loaded
Admittedly, it‘s a little silly to use the curlies in this case, but the BLOCK can contain any arbitrary
expression, in particular, subscripted expressions:
    &{ $dispatch{$index} }(1,2,3);      # call correct routine
Because of being able to omit the curlies for the simple case of $$x, people often make the mistake of
viewing the dereferencing symbols as proper operators, and wonder about their precedence. If they
were, though, you could use parentheses instead of braces. That‘s not the case. Consider the difference
below; case 0 is a short−hand version of case 1, NOT case 2:
    $$hashref{"KEY"}     = "VALUE";     # CASE 0
    ${$hashref}{"KEY"}   = "VALUE";     # CASE 1
    ${$hashref{"KEY"}}   = "VALUE";     # CASE 2
    ${$hashref->{"KEY"}} = "VALUE";     # CASE 3
Case 2 is also deceptive in that you‘re accessing a variable called %hashref, not dereferencing through
$hashref to the hash it‘s presumably referencing. That would be case 3.
3.
Subroutine calls and lookups of individual array elements arise often enough that it gets cumbersome
to use method 2. As a form of syntactic sugar, the examples for method 2 may be written:
    $arrayref->[0] = "January";     # Array element
    $hashref->{"KEY"} = "VALUE";    # Hash element
    $coderef->(1,2,3);              # Subroutine call
The left side of the arrow can be any expression returning a reference, including a previous
dereference. Note that $array[$x] is NOT the same thing as $array->[$x] here:
    $array[$x]->{"foo"}->[0] = "January";
This is one of the cases we mentioned earlier in which references could spring into existence when in
an lvalue context. Before this statement, $array[$x] may have been undefined. If so, it's
automatically defined with a hash reference so that we can look up {"foo"} in it. Likewise
$array[$x]->{"foo"} will automatically get defined with an array reference so that we can look
up [0] in it. This process is called autovivification.
One more thing here. The arrow is optional BETWEEN bracketed subscripts, so you can shrink the
above down to
$array[$x]{"foo"}[0] = "January";
Which, in the degenerate case of using only ordinary arrays, gives you multidimensional arrays just
like C‘s:
$score[$x][$y][$z] += 42;
Well, okay, not entirely like C‘s arrays, actually. C doesn‘t know how to grow its arrays on demand.
Perl does.
4.
If a reference happens to be a reference to an object, then there are probably methods to access the
things referred to, and you should probably stick to those methods unless you‘re in the class package
that defines the object‘s methods. In other words, be nice, and don‘t violate the object‘s encapsulation
without a very good reason. Perl does not enforce encapsulation. We are not totalitarians here. We do
expect some basic civility though.
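The dereferencing methods above, and autovivification in particular, can be seen side by side in a short script. This is a sketch under the assumption of a perl of version 5.6 or later; the data and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $aref = ["January", "February"];
my $code = sub { return $_[0] + $_[1] };

# The same array element reached by methods 1, 2 and 3.
my @same = ($$aref[0], ${$aref}[0], $aref->[0]);

my $sum = $code->(2, 3);        # same call as &$code(2, 3) or &{$code}(2, 3)

# Autovivification: %table starts out empty, and the intermediate hash
# and array references are created on demand by this one assignment.
my %table;
$table{row}{cells}[2] = "x";
my $cells = $table{row}{cells};

print "@same\n";
print "$sum\n";
print ref($table{row}), " ", ref($cells), " ", scalar(@$cells), "\n";
```

Note that the array created for the cells has three elements, because assigning to index 2 extends it; the first two are simply undefined.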
The ref() operator may be used to determine what type of thing the reference is pointing to. See perlfunc.
The bless() operator may be used to associate the object a reference points to with a package functioning
as an object class. See perlobj.
A typeglob may be dereferenced the same way a reference can, because the dereference syntax always
indicates the kind of reference desired. So ${*foo} and ${\$foo} both indicate the same scalar variable.
Here‘s a trick for interpolating a subroutine call into a string:
print "My sub returned @{[mysub(1,2,3)]} that time.\n";
The way it works is that when the @{...} is seen in the double−quoted string, it‘s evaluated as a block.
The block creates a reference to an anonymous array containing the results of the call to mysub(1,2,3).
So the whole block returns a reference to an array, which is then dereferenced by @{...} and stuck into the
double−quoted string. This chicanery is also useful for arbitrary expressions:
print "That yields @{[$n + 5]} widgets\n";
Symbolic references
We said that references spring into existence as necessary if they are undefined, but we didn‘t say what
happens if a value used as a reference is already defined, but ISN‘T a hard reference. If you use it as a
reference in this case, it‘ll be treated as a symbolic reference. That is, the value of the scalar is taken to be
the NAME of a variable, rather than a direct link to a (possibly) anonymous value.
People frequently expect it to work like this. So it does.
    $name = "foo";
    $$name = 1;                 # Sets $foo
    ${$name} = 2;               # Sets $foo
    ${$name x 2} = 3;           # Sets $foofoo
    $name->[0] = 4;             # Sets $foo[0]
    @$name = ();                # Clears @foo
    &$name();                   # Calls &foo() (as in Perl 4)
    $pack = "THAT";
    ${"${pack}::$name"} = 5;    # Sets $THAT::foo without eval
This is very powerful, and slightly dangerous, in that it‘s possible to intend (with the utmost sincerity) to use
a hard reference, and accidentally use a symbolic reference instead. To protect against that, you can say
    use strict 'refs';
and then only hard references will be allowed for the rest of the enclosing block. An inner block may
countermand that with
    no strict 'refs';
Only package variables (globals, even if localized) are visible to symbolic references. Lexical variables
(declared with my()) aren‘t in a symbol table, and thus are invisible to this mechanism. For example:
local $value = 10;
$ref = \$value;
{
my $value = 20;
print $$ref;
}
This will still print 10, not 20. Remember that local() affects package variables, which are all "global" to
the package.
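To see the strict 'refs' check in action, a script can attempt the same symbolic dereference with and without it. This is a minimal sketch, assuming a perl of version 5.6 or later (it relies on our and use warnings); the variable names are invented, and eval {} is used only to trap the run-time error so the script can carry on.

```perl
use strict;
use warnings;

our $foo = "package variable";
my $name = "foo";

# With strict refs in force, using a plain string as a reference dies
# at run time; eval {} catches the error and leaves the message in $@.
my $value = eval { ${$name} };
my $error = $@;

# An inner block can switch the check off where it is really wanted.
my $symbolic;
{
    no strict 'refs';
    $symbolic = ${$name};       # looks up $main::foo by name
}

print "strict: ", ($error ? "refused" : "allowed"), "\n";
print "no strict: $symbolic\n";
```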
Not−so−symbolic references
A new feature contributing to readability in perl version 5.001 is that the brackets around a symbolic
reference behave more like quotes, just as they always have within a string. That is,
$push = "pop on ";
print "${push}over";
has always meant to print "pop on over", despite the fact that push is a reserved word. This has been
generalized to work the same outside of quotes, so that
print ${push} . "over";
and even
print ${ push } . "over";
will have the same effect. (This would have been a syntax error in Perl 5.000, though Perl 4 allowed it in the
spaceless form.) Note that this construct is not considered to be a symbolic reference when you‘re using
strict refs:
    use strict 'refs';
    ${ bareword };      # Okay, means $bareword.
    ${ "bareword" };    # Error, symbolic reference.
Similarly, because of all the subscripting that is done using single words, we‘ve applied the same rule to any
bareword that is used for subscripting a hash. So now, instead of writing
$array{ "aaa" }{ "bbb" }{ "ccc" }
you can write just
$array{ aaa }{ bbb }{ ccc }
and not worry about whether the subscripts are reserved words. In the rare event that you do wish to do
something like
$array{ shift }
you can force interpretation as a reserved word by adding anything that makes it more than a bareword:
$array{ shift() }
$array{ +shift }
$array{ shift @_ }
The −w switch will warn you if it interprets a reserved word as a string. But it will no longer warn you about
using lowercase words, because the string is effectively quoted.
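The difference between a bareword subscript and a forced function call is easy to check. The following sketch assumes a perl of version 5.6 or later; the hash contents and subroutine names are invented for illustration.

```perl
use strict;
use warnings;

my %opt = (
    shift => "value stored under the string 'shift'",
    first => "value stored under 'first'",
);

sub by_bareword { return $opt{shift} }      # bareword: the literal key "shift"
sub by_call     { return $opt{shift()} }    # function call: key comes from @_
sub by_plus     { return $opt{+shift} }     # unary plus also forces the call

print by_bareword("first"), "\n";
print by_call("first"), "\n";
print by_plus("first"), "\n";
```

Only the first subroutine ignores its argument; the other two shift the key "first" off @_ before doing the lookup.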
Pseudo−hashes: Using an array as a hash
WARNING: This section describes an experimental feature. Details may change without notice in future
versions.
Beginning with release 5.005 of Perl you can use an array reference in some contexts that would normally
require a hash reference. This allows you to access array elements using symbolic names, as if they were
fields in a structure.
For this to work, the array must contain extra information. The first element of the array has to be a hash
reference that maps field names to array indices. Here is an example:
    $struct = [{foo => 1, bar => 2}, "FOO", "BAR"];

    $struct->{foo};  # same as $struct->[1], i.e. "FOO"
    $struct->{bar};  # same as $struct->[2], i.e. "BAR"

    keys %$struct;   # will return ("foo", "bar") in some order
    values %$struct; # will return ("FOO", "BAR") in the same order

    while (my($k,$v) = each %$struct) {
        print "$k => $v\n";
    }
Perl will raise an exception if you try to delete keys from a pseudo−hash or try to access nonexistent fields.
For better performance, Perl can also do the translation from field names to array indices at compile time for
typed object references. See fields.
Function Templates
As explained above, a closure is an anonymous function with access to the lexical variables visible when that
function was compiled. It retains access to those variables even though it doesn‘t get run until later, such as
in a signal handler or a Tk callback.
Using a closure as a function template allows us to generate many functions that act similarly. Suppose
you wanted functions named after the colors that generated HTML font changes for the various colors:
print "Be ", red("careful"), "with that ", green("light");
The red() and green() functions would be very similar. To create these, we‘ll assign a closure to a
typeglob of the name of the function we‘re trying to build.
@colors = qw(red blue green yellow orange purple violet);
for my $name (@colors) {
    no strict 'refs';    # allow symbol table manipulation
    *$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" };
}
Now all those different functions appear to exist independently. You can call red(), RED(), blue(),
BLUE(), green(), etc. This technique saves on both compile time and memory use, and is less
error-prone as well, since syntax checks happen at compile time. It's critical that any variables in the
anonymous subroutine be lexicals in order to create a proper closure. That's the reason for the my on the
loop iteration variable.
This is one of the only places where giving a prototype to a closure makes much sense. If you wanted to
impose scalar context on the arguments of these functions (probably not a wise idea for this particular
example), you could have written it this way instead:
    *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" };
However, since prototype checking happens at compile time, the assignment above happens too late to be of
much use. You could address this by putting the whole loop of assignments within a BEGIN block, forcing
it to occur during compilation.
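A sketch of the BEGIN block arrangement follows. It assumes a perl of version 5.6 or later, and the angle-bracket markup produced by the generated functions is a placeholder chosen for this example. Because the prototyped closures are installed while the file is still being compiled, the ($) prototype is already known when the later call is parsed, so the array argument is evaluated in scalar context.

```perl
use strict;
use warnings;

# Install the generated subs at compile time, so that their ($) prototype
# is in place before the calls further down are compiled.
BEGIN {
    for my $name (qw(red green blue)) {
        no strict 'refs';               # allow symbol table manipulation
        *$name = sub ($) { "<$name>$_[0]</$name>" };
    }
}

my @words  = ("stop", "now");
my $tagged = red(@words);               # prototype imposes scalar context
print "$tagged\n";                      # prints <red>2</red>
```

Without the BEGIN block the same call would be compiled before any prototype existed, and both elements of @words would be passed through as an ordinary list.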
Access to lexicals that change over time (like those in the for loop above) only works with closures, not
general subroutines. In the general case, then, named subroutines do not nest properly, although anonymous
ones do. If you are accustomed to using nested subroutines in other programming languages with their own
private variables, you‘ll have to work at it a bit in Perl. The intuitive coding of this kind of thing incurs
mysterious warnings about ‘‘will not stay shared‘’. For example, this won‘t work:
    sub outer {
        my $x = $_[0] + 35;
        sub inner { return $x * 19 }   # WRONG
        return $x + inner();
    }
A work−around is the following:
sub outer {
my $x = $_[0] + 35;
local *inner = sub { return $x * 19 };
return $x + inner();
}
Now inner() can only be called from within outer(), because of the temporary assignments of the
closure (anonymous subroutine). But when it does, it has normal access to the lexical variable $x from the
scope of outer().
This has the interesting effect of creating a function local to another function, something not normally
supported in Perl.
WARNING
You may not (usefully) use a reference as the key to a hash. It will be converted into a string:
$x{ \$a } = $a;
If you try to dereference the key, it won‘t do a hard dereference, and you won‘t accomplish what you‘re
attempting. You might want to do something more like
$r = \@a;
$x{ $r } = $r;
And then at least you can use the values(), which will be real refs, instead of the keys(), which won‘t.
The standard Tie::RefHash module provides a convenient workaround to this.
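The difference between an ordinary hash and one tied to Tie::RefHash can be sketched as follows, assuming a perl of version 5.6 or later on which the standard Tie::RefHash module is installed. The variable names are invented for illustration.

```perl
use strict;
use warnings;
use Tie::RefHash;

my @a = (1, 2, 3);

my %plain;
$plain{ \@a } = "x";
my ($plain_key) = keys %plain;      # a string such as "ARRAY(0x...)"

tie my %by_ref, 'Tie::RefHash';
$by_ref{ \@a } = "x";
my ($real_key) = keys %by_ref;      # still a usable array reference

print ref($plain_key) ? "ref" : "string", "\n";
print ref($real_key)  ? "ref" : "string", "\n";
print scalar(@$real_key), "\n";
```

The key that comes back from the tied hash can be dereferenced to reach the original array, which the stringified key from the plain hash cannot.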
SEE ALSO
Besides the obvious documents, source code can be instructive. Some rather pathological examples of the
use of references can be found in the t/op/ref.t regression test in the Perl source directory.
See also perldsc and perllol for how to use references to create complex data structures, and perltoot,
perlobj, and perlbot for how to use them to create objects.
perldsc
Perl Programmers Reference Guide
perldsc
NAME
perldsc − Perl Data Structures Cookbook
DESCRIPTION
The single feature most sorely lacking in the Perl programming language prior to its 5.0 release was complex
data structures. Even without direct language support, some valiant programmers did manage to emulate
them, but it was hard work and not for the faint of heart. You could occasionally get away with the
$m{$LoL,$b} notation borrowed from awk in which the keys are actually more like a single concatenated
string "$LoL$b", but traversal and sorting were difficult. More desperate programmers even hacked
Perl‘s internal symbol table directly, a strategy that proved hard to develop and maintain—to put it mildly.
The 5.0 release of Perl let us have complex data structures. You may now write something like this and all
of a sudden, you'd have an array with three dimensions!
    for $x (1 .. 10) {
        for $y (1 .. 10) {
            for $z (1 .. 10) {
                $LoL[$x][$y][$z] = $x ** $y + $z;
            }
        }
    }
Alas, however simple this may appear, underneath it‘s a much more elaborate construct than meets the eye!
How do you print it out? Why can‘t you say just print @LoL? How do you sort it? How can you pass it
to a function or get one of these back from a function? Is it an object? Can you save it to disk to read back
later? How do you access whole rows or columns of that matrix? Do all the values have to be numeric?
As you see, it‘s quite easy to become confused. While some small portion of the blame for this can be
attributed to the reference−based implementation, it‘s really more due to a lack of existing documentation
with examples designed for the beginner.
This document is meant to be a detailed but understandable treatment of the many different sorts of data
structures you might want to develop. It should also serve as a cookbook of examples. That way, when you
need to create one of these complex data structures, you can just pinch, pilfer, or purloin a drop−in example
from here.
Let‘s look at each of these possible constructs in detail. There are separate sections on each of the following:
arrays of arrays
hashes of arrays
arrays of hashes
hashes of hashes
more elaborate constructs
But for now, let‘s look at general issues common to all these types of data structures.
REFERENCES
The most important thing to understand about all data structures in Perl — including multidimensional
arrays—is that even though they might appear otherwise, Perl @ARRAYs and %HASHes are all internally
one−dimensional. They can hold only scalar values (meaning a string, number, or a reference). They cannot
directly contain other arrays or hashes, but instead contain references to other arrays or hashes.
You can't use a reference to an array or hash in quite the same way that you would a real array or hash. For C
or C++ programmers unused to distinguishing between arrays and pointers to the same, this can be
confusing. If so, just think of it as the difference between a structure and a pointer to a structure.
You can (and should) read more about references in the perlref(1) man page. Briefly, references are rather
like pointers that know what they point to. (Objects are also a kind of reference, but we won‘t be needing
them right away—if ever.) This means that when you have something which looks to you like an access to a
two−or−more−dimensional array and/or hash, what‘s really going on is that the base type is merely a
one−dimensional entity that contains references to the next level. It‘s just that you can use it as though it
were a two−dimensional one. This is actually the way almost all C multidimensional arrays work as well.
    $list[7][12]                        # array of arrays
    $list[7]{string}                    # array of hashes
    $hash{string}[7]                    # hash of arrays
    $hash{string}{'another string'}     # hash of hashes
Now, because the top level contains only references, if you try to print out your array with a simple
print() function, you'll get something that doesn't look very nice, like this:
@LoL = ( [2, 3], [4, 5, 7], [0] );
print $LoL[1][2];
7
print @LoL;
ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)
That‘s because Perl doesn‘t (ever) implicitly dereference your variables. If you want to get at the thing a
reference is referring to, then you have to do this yourself using either prefix typing indicators, like
${$blah}, @{$blah}, @{$blah[$i]}, or else postfix pointer arrows, like $a->[3],
$h->{fred}, or even $ob->method()->[3].
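As a small illustration of doing the dereferencing yourself, the following sketch prints the same @LoL one row per line. It assumes a perl of version 5.6 or later and is only one of several reasonable ways to write it.

```perl
use strict;
use warnings;

my @LoL = ( [2, 3], [4, 5, 7], [0] );

# Dereference each row explicitly; print @LoL on its own would show only
# the stringified references.
my @lines = map { join(" ", @$_) } @LoL;
print "$_\n" for @lines;
```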
COMMON MISTAKES
The two most common mistakes made in constructing something like an array of arrays are either accidentally
counting the number of elements or else taking a reference to the same memory location repeatedly. Here's
the case where you just get the count instead of a nested array:
    for $i (1..10) {
        @list = somefunc($i);
        $LoL[$i] = @list;       # WRONG!
    }
That‘s just the simple case of assigning a list to a scalar and getting its element count. If that‘s what you
really and truly want, then you might do well to consider being a tad more explicit about it, like this:
for $i (1..10) {
@list = somefunc($i);
$counts[$i] = scalar @list;
}
Here‘s the case of taking a reference to the same memory location again and again:
    for $i (1..10) {
        @list = somefunc($i);
        $LoL[$i] = \@list;      # WRONG!
    }
So, what‘s the big problem with that? It looks right, doesn‘t it? After all, I just told you that you need an
array of references, so by golly, you‘ve made me one!
Unfortunately, while this is true, it‘s still broken. All the references in @LoL refer to the very same place,
and they will therefore all hold whatever was last in @list! It‘s similar to the problem demonstrated in the
following C program:
    #include <pwd.h>
    #include <stdio.h>
    int main() {
        struct passwd *rp, *dp;
        rp = getpwnam("root");
        dp = getpwnam("daemon");
        printf("daemon name is %s\nroot name is %s\n",
               dp->pw_name, rp->pw_name);
        return 0;
    }
Which will print
daemon name is daemon
root name is daemon
The problem is that both rp and dp are pointers to the same location in memory! In C, you‘d have to
remember to malloc() yourself some new memory. In Perl, you‘ll want to use the array constructor [] or
the hash constructor {} instead. Here‘s the right way to do the preceding broken code fragments:
for $i (1..10) {
@list = somefunc($i);
$LoL[$i] = [ @list ];
}
The square brackets make a reference to a new array with a copy of what‘s in @list at the time of the
assignment. This is what you want.
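The two behaviours can be compared directly by filling one array each way from the same loop. This is a minimal sketch, assuming a perl of version 5.6 or later; the data is invented, and @list is deliberately declared outside the loop so that it is one variable throughout.

```perl
use strict;
use warnings;

my (@shared, @copied, @list);
for my $i (0 .. 2) {
    @list = ($i, $i * 10);
    $shared[$i] = \@list;       # every slot points at the one @list
    $copied[$i] = [ @list ];    # each slot gets its own fresh copy
}

print join(",", map { $_->[0] } @shared), "\n";    # prints 2,2,2
print join(",", map { $_->[0] } @copied), "\n";    # prints 0,1,2
```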
Note that this will produce something similar, but it‘s much harder to read:
for $i (1..10) {
@list = 0 .. $i;
@{$LoL[$i]} = @list;
}
Is it the same? Well, maybe so, and maybe not. The subtle difference is that when you assign something in
square brackets, you know for sure it's always a brand new reference with a new copy of the data. Something
else could be going on in this new case with the @{$LoL[$i]} dereference on the left-hand side of the
assignment. It all depends on whether $LoL[$i] had been undefined to start with, or whether it already
contained a reference. If you had already populated @LoL with references, as in
$LoL[3] = \@another_list;
Then the assignment with the indirection on the left−hand−side would use the existing reference that was
already there:
@{$LoL[3]} = @list;
Of course, this would have the "interesting" effect of clobbering @another_list. (Have you ever noticed how
when a programmer says something is "interesting", that rather than meaning "intriguing", they‘re
disturbingly more apt to mean that it‘s "annoying", "difficult", or both? :−)
So just remember always to use the array or hash constructors with [] or {}, and you‘ll be fine, although
it‘s not always optimally efficient.
Surprisingly, the following dangerous−looking construct will actually work out fine:
for $i (1..10) {
my @list = somefunc($i);
$LoL[$i] = \@list;
}
That‘s because my() is more of a run−time statement than it is a compile−time declaration per se. This
means that the my() variable is remade afresh each time through the loop. So even though it looks as
though you stored the same variable reference each time, you actually did not! This is a subtle distinction
that can produce more efficient code at the risk of misleading all but the most experienced of programmers.
So I usually advise against teaching it to beginners. In fact, except for passing arguments to functions, I
seldom like to see the gimme−a−reference operator (backslash) used much at all in code. Instead, I advise
beginners that they (and most of the rest of us) should try to use the much more easily understood
constructors [] and {} instead of relying upon lexical (or dynamic) scoping and hidden reference−counting
to do the right thing behind the scenes.
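If you'd like to see that for yourself, the following sketch stores references to a my() array and to a package array side by side. The names @shared and @bad are made up for the comparison.

```perl
use strict;
use vars qw(@shared);           # one package variable, reused every time

my @LoL;
for my $i (1 .. 3) {
    my @list = ($i, $i * 10);   # a brand-new @list on every iteration
    $LoL[$i] = \@list;          # so each stored reference is distinct
}

my @bad;
for my $i (1 .. 3) {
    @shared = ($i, $i * 10);
    $bad[$i] = \@shared;        # the very same reference each time
}

print "LoL row 1: @{ $LoL[1] }\n";
print "bad row 1: @{ $bad[1] }\n";   # shows data from the last iteration
```

Every row of @bad points at the one @shared array, which is exactly the trap the my() version only appears to fall into.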
In summary:
$LoL[$i] = [ @list ];     # usually best
$LoL[$i] = \@list;        # perilous; just how my() was that list?
@{ $LoL[$i] } = @list;    # way too tricky for most programmers
CAVEAT ON PRECEDENCE
Speaking of things like @{$LoL[$i]}, the following are actually the same thing:
$listref->[2][2]    # clear
$$listref[2][2]     # confusing
That‘s because Perl‘s precedence rules on its five prefix dereferencers (which look like someone swearing: $
@ * % &) make them bind more tightly than the postfix subscripting brackets or braces! This will no
doubt come as a great shock to the C or C++ programmer, who is quite accustomed to using *a[i] to mean
what‘s pointed to by the i‘th element of a. That is, they first take the subscript, and only then dereference
the thing at that subscript. That‘s fine in C, but this isn‘t C.
The seemingly equivalent construct in Perl, $$listref[$i], first dereferences $listref, treating
it as a reference to an array, and only then tells you the i'th value of the array it points to.
If you wanted the C notion, you'd have to write ${$LoL[$i]} to force
$LoL[$i] to be evaluated first, before the leading $ dereferencer is applied.
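A short sketch may help here. It assumes a small 3x3 structure invented for the purpose, and shows the prefix and arrow forms side by side along with the braces needed to dereference a subscripted element.

```perl
use strict;

my $listref = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ];
my @LoL     = @$listref;      # an array whose elements are row references
my $i       = 1;

my $via_arrow  = $listref->[2][2];      # the clear form
my $via_prefix = $$listref[2][2];       # same thing: $listref is dereferenced first
my $braced     = ${ $listref }[2][2];   # the prefix form with its braces spelled out

# To dereference an element, brace the subscripted expression explicitly.
my @row   = @{ $LoL[$i] };              # the whole row held in $LoL[$i]
my $first = ${ $LoL[$i] }[0];           # first element of that row

print "$via_arrow $via_prefix $braced\n";
print "@row / $first\n";
```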
WHY YOU SHOULD ALWAYS use strict
If this is starting to sound scarier than it‘s worth, relax. Perl has some features to help you avoid its most
common pitfalls. The best way to avoid getting confused is to start every program like this:
#!/usr/bin/perl −w
use strict;
This way, you'll be forced to declare all your variables with my(), and accidental "symbolic
dereferencing" will be disallowed. Therefore if you'd done this:
my $listref = [
[ "fred", "barney", "pebbles", "bambam", "dino", ],
[ "homer", "bart", "marge", "maggie", ],
[ "george", "jane", "elroy", "judy", ],
];
print $listref[2][2];
The compiler would immediately flag that as an error at compile time, because you were accidentally
accessing @listref, an undeclared variable, and it would thereby remind you to write instead:
print $listref->[2][2];
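One way to watch the compiler object without killing your program is to compile the faulty code with a string eval. This is only a sketch; the data and the $oops variable are placeholders.

```perl
use strict;

# The snippet is compiled with a string eval so that the compile-time
# error can be captured and shown instead of aborting this script.
my $code = <<'END_CODE';
use strict;
my $listref = [ [ "fred", "barney" ], [ "homer", "bart" ] ];
my $oops = $listref[1][1];    # accidentally refers to @listref
1;
END_CODE

my $ok    = eval $code;
my $error = $@;

print "compiled cleanly\n" if $ok;
print "strict complained: $error" unless $ok;
```

The message names @listref, which is the hint that an arrow went missing.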
DEBUGGING
Before version 5.002, the standard Perl debugger didn‘t do a very nice job of printing out complex data
structures. With 5.002 or above, the debugger includes several new features, including command line editing
as well as the x command to dump out complex data structures. For example, given the assignment to $LoL
above, here‘s the debugger output:
DB<1> x $LoL
$LoL = ARRAY(0x13b5a0)
   0  ARRAY(0x1f0a24)
      0  'fred'
      1  'barney'
      2  'pebbles'
      3  'bambam'
      4  'dino'
   1  ARRAY(0x13b558)
      0  'homer'
      1  'bart'
      2  'marge'
      3  'maggie'
   2  ARRAY(0x13b540)
      0  'george'
      1  'jane'
      2  'elroy'
      3  'judy'
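Outside the debugger, the Data::Dumper module produces a comparable dump from within a program. That is a different tool from the x command, shown here only as a sketch; the $LoL reference below is assumed to hold the same three families.

```perl
use strict;
use Data::Dumper;

my $LoL = [
    [ "fred", "barney", "pebbles", "bambam", "dino" ],
    [ "homer", "bart", "marge", "maggie" ],
    [ "george", "jane", "elroy", "judy" ],
];

$Data::Dumper::Indent = 1;      # fixed-width indentation, easier to scan
my $dump = Dumper($LoL);        # the structure rendered as Perl source text
print $dump;
```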
CODE EXAMPLES
Presented with little comment (these will get their own manpages someday) here are short code examples
illustrating access of various types of data structures.
LISTS OF LISTS
Declaration of a LIST OF LISTS
@LoL = (
[ "fred", "barney" ],
[ "george", "jane", "elroy" ],
[ "homer", "marge", "bart" ],
);
Generation of a LIST OF LISTS
# reading from file
while ( <> ) {
push @LoL, [ split ];
}
# calling a function
for $i ( 1 .. 10 ) {
$LoL[$i] = [ somefunc($i) ];
}
# using temp vars
for $i ( 1 .. 10 ) {
@tmp = somefunc($i);
$LoL[$i] = [ @tmp ];
}
# add to an existing row
push @{ $LoL[0] }, "wilma", "betty";
Access and Printing of a LIST OF LISTS
# one element
$LoL[0][0] = "Fred";
# another element
$LoL[1][1] =~ s/(\w)/\u$1/;
# print the whole thing with refs
for $aref ( @LoL ) {
print "\t [ @$aref ],\n";
}
# print the whole thing with indices
for $i ( 0 .. $#LoL ) {
print "\t [ @{$LoL[$i]} ],\n";
}
# print the whole thing one at a time
for $i ( 0 .. $#LoL ) {
for $j ( 0 .. $#{ $LoL[$i] } ) {
print "elt $i $j is $LoL[$i][$j]\n";
}
}
HASHES OF LISTS
Declaration of a HASH OF LISTS
%HoL = (
    flintstones => [ "fred", "barney" ],
    jetsons     => [ "george", "jane", "elroy" ],
    simpsons    => [ "homer", "marge", "bart" ],
);
Generation of a HASH OF LISTS
# reading from file
# flintstones: fred barney wilma dino
while ( <> ) {
next unless s/^(.*?):\s*//;
$HoL{$1} = [ split ];
}
# reading from file; more temps
# flintstones: fred barney wilma dino
while ( $line = <> ) {
($who, $rest) = split /:\s*/, $line, 2;
@fields = split ’ ’, $rest;
$HoL{$who} = [ @fields ];
}
# calling a function that returns a list
for $group ( "simpsons", "jetsons", "flintstones" ) {
$HoL{$group} = [ get_family($group) ];
}
# likewise, but using temps
for $group ( "simpsons", "jetsons", "flintstones" ) {
@members = get_family($group);
$HoL{$group} = [ @members ];
}
# append new members to an existing family
push @{ $HoL{"flintstones"} }, "wilma", "betty";
Access and Printing of a HASH OF LISTS
# one element
$HoL{flintstones}[0] = "Fred";
# another element
$HoL{simpsons}[1] =~ s/(\w)/\u$1/;
# print the whole thing
foreach $family ( keys %HoL ) {
print "$family: @{ $HoL{$family} }\n"
}
# print the whole thing with indices
foreach $family ( keys %HoL ) {
print "$family: ";
foreach $i ( 0 .. $#{ $HoL{$family} } ) {
print " $i = $HoL{$family}[$i]";
}
print "\n";
}
# print the whole thing sorted by number of members
foreach $family ( sort { @{$HoL{$b}} <=> @{$HoL{$a}} } keys %HoL ) {
print "$family: @{ $HoL{$family} }\n"
}
# print the whole thing sorted by number of members and name
foreach $family ( sort {
@{$HoL{$b}} <=> @{$HoL{$a}}
||
$a cmp $b
} keys %HoL )
{
print "$family: ", join(", ", sort @{ $HoL{$family} }), "\n";
}
LISTS OF HASHES
Declaration of a LIST OF HASHES
@LoH = (
    {
        Lead   => "fred",
        Friend => "barney",
    },
    {
        Lead   => "george",
        Wife   => "jane",
        Son    => "elroy",
    },
    {
        Lead   => "homer",
        Wife   => "marge",
        Son    => "bart",
    }
);
Generation of a LIST OF HASHES
# reading from file
# format: LEAD=fred FRIEND=barney
while ( <> ) {
$rec = {};
for $field ( split ) {
($key, $value) = split /=/, $field;
$rec−>{$key} = $value;
}
push @LoH, $rec;
}
# reading from file
# format: LEAD=fred FRIEND=barney
# no temp
while ( <> ) {
push @LoH, { split /[\s=]+/ };
}
# calling a function that returns a key,value list, like
# "lead","fred","daughter","pebbles"
while ( %fields = getnextpairset() ) {
push @LoH, { %fields };
}
# likewise, but using no temp vars
while (<>) {
push @LoH, { parsepairs($_) };
}
# add key/value to an element
$LoH[0]{pet} = "dino";
$LoH[2]{pet} = "santa’s little helper";
Access and Printing of a LIST OF HASHES
# one element
$LoH[0]{lead} = "fred";
# another element
$LoH[1]{lead} =~ s/(\w)/\u$1/;
# print the whole thing with refs
for $href ( @LoH ) {
print "{ ";
for $role ( keys %$href ) {
print "$role=$href−>{$role} ";
}
print "}\n";
}
# print the whole thing with indices
for $i ( 0 .. $#LoH ) {
print "$i is { ";
for $role ( keys %{ $LoH[$i] } ) {
print "$role=$LoH[$i]{$role} ";
}
print "}\n";
}
# print the whole thing one at a time
for $i ( 0 .. $#LoH ) {
for $role ( keys %{ $LoH[$i] } ) {
print "elt $i $role is $LoH[$i]{$role}\n";
}
}
HASHES OF HASHES
Declaration of a HASH OF HASHES
%HoH = (
    flintstones => {
        lead      => "fred",
        pal       => "barney",
    },
    jetsons     => {
        lead      => "george",
        wife      => "jane",
        "his boy" => "elroy",
    },
    simpsons    => {
        lead      => "homer",
        wife      => "marge",
        kid       => "bart",
    },
);
Generation of a HASH OF HASHES
# reading from file
# flintstones: lead=fred pal=barney wife=wilma pet=dino
while ( <> ) {
next unless s/^(.*?):\s*//;
$who = $1;
for $field ( split ) {
($key, $value) = split /=/, $field;
$HoH{$who}{$key} = $value;
}
}
# reading from file; more temps
while ( <> ) {
next unless s/^(.*?):\s*//;
$who = $1;
$rec = {};
$HoH{$who} = $rec;
for $field ( split ) {
($key, $value) = split /=/, $field;
$rec−>{$key} = $value;
}
}
# calling a function that returns a key,value hash
for $group ( "simpsons", "jetsons", "flintstones" ) {
$HoH{$group} = { get_family($group) };
}
# likewise, but using temps
for $group ( "simpsons", "jetsons", "flintstones" ) {
%members = get_family($group);
$HoH{$group} = { %members };
}
# append new members to an existing family
%new_folks = (
    wife => "wilma",
    pet  => "dino",
);
for $what (keys %new_folks) {
$HoH{flintstones}{$what} = $new_folks{$what};
}
Access and Printing of a HASH OF HASHES
# one element
$HoH{flintstones}{wife} = "wilma";
# another element
$HoH{simpsons}{lead} =~ s/(\w)/\u$1/;
# print the whole thing
foreach $family ( keys %HoH ) {
print "$family: { ";
for $role ( keys %{ $HoH{$family} } ) {
print "$role=$HoH{$family}{$role} ";
}
print "}\n";
}
# print the whole thing somewhat sorted
foreach $family ( sort keys %HoH ) {
print "$family: { ";
for $role ( sort keys %{ $HoH{$family} } ) {
print "$role=$HoH{$family}{$role} ";
}
print "}\n";
}
# print the whole thing sorted by number of members
foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) {
print "$family: { ";
for $role ( sort keys %{ $HoH{$family} } ) {
print "$role=$HoH{$family}{$role} ";
}
print "}\n";
}
# establish a sort order (rank) for each role
$i = 0;
for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }
# now print the whole thing sorted by number of members
foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) {
print "$family: { ";
# and print these according to rank order
for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) {
print "$role=$HoH{$family}{$role} ";
}
print "}\n";
}
MORE ELABORATE RECORDS
Declaration of MORE ELABORATE RECORDS
Here‘s a sample showing how to create and use a record whose fields are of many different sorts:
$rec = {
    TEXT     => $string,
    SEQUENCE => [ @old_values ],
    LOOKUP   => { %some_table },
    THATCODE => \&some_function,
    THISCODE => sub { $_[0] ** $_[1] },
    HANDLE   => \*STDOUT,
};
print $rec−>{TEXT};
print $rec->{SEQUENCE}[0];
$last = pop @{ $rec->{SEQUENCE} };
print $rec−>{LOOKUP}{"key"};
($first_k, $first_v) = each %{ $rec−>{LOOKUP} };
$answer = $rec−>{THATCODE}−>($arg);
$answer = $rec−>{THISCODE}−>($arg1, $arg2);
# careful of extra block braces on fh ref
print { $rec−>{HANDLE} } "a string\n";
use FileHandle;
$rec−>{HANDLE}−>autoflush(1);
$rec−>{HANDLE}−>print(" a string\n");
Declaration of a HASH OF COMPLEX RECORDS
%TV = (
    flintstones => {
        series  => "flintstones",
        nights  => [ qw(monday thursday friday) ],
        members => [
            { name => "fred",    role => "lead", age => 36, },
            { name => "wilma",   role => "wife", age => 31, },
            { name => "pebbles", role => "kid",  age => 4, },
        ],
    },
    jetsons => {
        series  => "jetsons",
        nights  => [ qw(wednesday saturday) ],
        members => [
            { name => "george",  role => "lead", age => 41, },
            { name => "jane",    role => "wife", age => 39, },
            { name => "elroy",   role => "kid",  age => 9, },
        ],
    },
    simpsons => {
        series  => "simpsons",
        nights  => [ qw(monday) ],
        members => [
            { name => "homer", role => "lead", age => 34, },
            { name => "marge", role => "wife", age => 37, },
            { name => "bart",  role => "kid",  age => 11, },
        ],
    },
);
Generation of a HASH OF COMPLEX RECORDS
# reading from file
# this is most easily done by having the file itself be
# in the raw data format as shown above. perl is happy
# to parse complex data structures if declared as data, so
# sometimes it’s easiest to do that
# here’s a piece by piece build up
$rec = {};
$rec−>{series} = "flintstones";
$rec−>{nights} = [ find_days() ];
@members = ();
# assume this file in field=value syntax
while (<>) {
%fields = split /[\s=]+/;
push @members, { %fields };
}
$rec−>{members} = [ @members ];
# now remember the whole thing
$TV{ $rec−>{series} } = $rec;
###########################################################
# now, you might want to make interesting extra fields that
# include pointers back into the same data structure, so that
# if you change one piece, it changes everywhere. for example,
# you might want a {kids} field that is an array reference
# to a list of the kids' records, without having duplicate
# records and thus update problems.
###########################################################
foreach $family (keys %TV) {
$rec = $TV{$family}; # temp pointer
@kids = ();
for $person ( @{ $rec−>{members} } ) {
if ($person−>{role} =~ /kid|son|daughter/) {
push @kids, $person;
}
}
# REMEMBER: $rec and $TV{$family} point to same data!!
$rec−>{kids} = [ @kids ];
}
# you copied the list, but the list itself contains pointers
# to uncopied objects. this means that if you make bart get
# older via
$TV{simpsons}{kids}[0]{age}++;
# then this would also change in
print $TV{simpsons}{members}[2]{age};
# because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]
# both point to the same underlying anonymous hash table
# print the whole thing
foreach $family ( keys %TV ) {
print "the $family";
print " is on during @{ $TV{$family}{nights} }\n";
print "its members are:\n";
for $who ( @{ $TV{$family}{members} } ) {
print " $who−>{name} ($who−>{role}), age $who−>{age}\n";
}
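# the family record has no {lead} key; find the member whose role is "lead"
($lead) = grep { $_->{role} eq "lead" } @{ $TV{$family}{members} };
print "it turns out that $lead->{name} has ";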
print scalar ( @{ $TV{$family}{kids} } ), " kids named ";
print join (", ", map { $_−>{name} } @{ $TV{$family}{kids} } );
print "\n";
}
Database Ties
You cannot easily tie a multilevel data structure (such as a hash of hashes) to a dbm file. The first problem is
that all but GDBM and Berkeley DB have size limitations, but beyond that, you also have problems with
how references are to be represented on disk. One experimental module that does partially attempt to
address this need is the MLDBM module. Check your nearest CPAN site as described in perlmodlib for
source code to MLDBM.
SEE ALSO
perlref(1), perllol(1), perldata(1), perlobj(1)
AUTHOR
Tom Christiansen

print $ref_to_LoL->[2][2];
Notice that the outer bracket type has changed, and so our access syntax has also changed. That‘s because
unlike C, in perl you can‘t freely interchange arrays and references thereto. $ref_to_LoL is a reference to
an array, whereas @LoL is an array proper. Likewise, $LoL[2] is not an array, but an array ref. So how
come you can write these:
$LoL[2][2]
$ref_to_LoL−>[2][2]
instead of having to write these:
$LoL[2]−>[2]
$ref_to_LoL−>[2]−>[2]
Well, that‘s because the rule is that on adjacent brackets only (whether square or curly), you are free to omit
the pointer dereferencing arrow. But you cannot do so for the very first one if it‘s a scalar containing a
reference, which means that $ref_to_LoL always needs it.
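Here's a small sketch of that rule in action, assuming an @LoL of three rows and a $ref_to_LoL pointing at it. All four expressions name the same element.

```perl
use strict;

my @LoL = (
    [ "fred", "barney" ],
    [ "george", "jane", "elroy" ],
    [ "homer", "marge", "bart" ],
);
my $ref_to_LoL = \@LoL;

# Between adjacent subscripts the arrow is optional...
my $short_a = $LoL[2][2];
my $long_a  = $LoL[2]->[2];

# ...but a scalar holding a reference needs the first arrow.
my $short_r = $ref_to_LoL->[2][2];
my $long_r  = $ref_to_LoL->[2]->[2];

print "$short_a $long_a $short_r $long_r\n";
```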
Growing Your Own
That‘s all well and good for declaration of a fixed data structure, but what if you wanted to add new elements
on the fly, or build it up entirely from scratch?
First, let‘s look at reading it in from a file. This is something like adding a row at a time. We‘ll assume that
there‘s a flat file in which each line is a row and each word an element. If you‘re trying to develop an @LoL
list containing all these, here‘s the right way to do that:
while (<>) {
@tmp = split;
push @LoL, [ @tmp ];
}
You might also have loaded that from a function:
for $i ( 1 .. 10 ) {
$LoL[$i] = [ somefunc($i) ];
}
Or you might have had a temporary variable sitting around with the list in it.
for $i ( 1 .. 10 ) {
@tmp = somefunc($i);
$LoL[$i] = [ @tmp ];
}
It‘s very important that you make sure to use the [] list reference constructor. That‘s because this will be
very wrong:
$LoL[$i] = @tmp;
You see, assigning a named list like that to a scalar just counts the number of elements in @tmp, which
probably isn‘t what you want.
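The mistake is easy to demonstrate. In this sketch (sample names only) the first assignment stores a count and the second stores a reference to a copy of the list:

```perl
use strict;

my @tmp = ("fred", "barney", "wilma");
my @LoL;

$LoL[0] = @tmp;        # WRONG: an array in scalar context yields its length
$LoL[1] = [ @tmp ];    # right: a reference to a new copy of the list

print "row 0 holds: $LoL[0]\n";
print "row 1 holds: @{ $LoL[1] }\n";
```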
If you are running under use strict, you‘ll have to add some declarations to make it happy:
use strict;
my(@LoL, @tmp);
while (<>) {
@tmp = split;
push @LoL, [ @tmp ];
}
Of course, you don‘t need the temporary array to have a name at all:
while (<>) {
push @LoL, [ split ];
}
You also don‘t have to use push(). You could just make a direct assignment if you knew where you
wanted to put it:
my (@LoL, $i, $line);
for $i ( 0 .. 10 ) {
$line = <>;
$LoL[$i] = [ split ’ ’, $line ];
}
or even just
my (@LoL, $i);
for $i ( 0 .. 10 ) {
$LoL[$i] = [ split ’ ’, <> ];
}
You should in general be leery of using potential list functions in a scalar context without explicitly stating
such. This would be clearer to the casual reader:
my (@LoL, $i);
for $i ( 0 .. 10 ) {
$LoL[$i] = [ split ’ ’, scalar(<>) ];
}
If you wanted to have a $ref_to_LoL variable as a reference to an array, you‘d have to do something like
this:
while (<>) {
push @$ref_to_LoL, [ split ];
}
Now you can add new rows. What about adding new columns? If you‘re dealing with just matrices, it‘s
often easiest to use simple assignment:
for $x (1 .. 10) {
for $y (1 .. 10) {
$LoL[$x][$y] = func($x, $y);
}
}
for $x ( 3, 7, 9 ) {
$LoL[$x][20] += func2($x);
}
It doesn‘t matter whether those elements are already there or not: it‘ll gladly create them for you, setting
intervening elements to undef as need be.
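As a quick sketch of that behavior, assign to one cell of an empty array and then look at what sprang into existence around it:

```perl
use strict;

my @LoL;
$LoL[2][3] = "x";      # neither row 2 nor column 3 existed beforehand

my $rows      = @LoL;                        # number of rows now present
my $row2_cols = @{ $LoL[2] };                # number of columns in row 2
my $row0_set  = defined $LoL[0]    ? 1 : 0;  # intervening row
my $gap_set   = defined $LoL[2][0] ? 1 : 0;  # intervening column

print "rows=$rows cols=$row2_cols row0=$row0_set gap=$gap_set\n";
```

Rows 0 and 1 exist only as undefined placeholders, and so do columns 0 through 2 of row 2.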
If you wanted just to append to a row, you‘d have to do something a bit funnier looking:
# add new columns to an existing row
push @{ $LoL[0] }, "wilma", "betty";
Notice that I couldn‘t say just:
push $LoL[0], "wilma", "betty";    # WRONG!
In fact, that wouldn‘t even compile. How come? Because the argument to push() must be a real array, not
just a reference to such.
Access and Printing
Now it‘s time to print your data structure out. How are you going to do that? Well, if you want only one of
the elements, it‘s trivial:
print $LoL[0][0];
If you want to print the whole thing, though, you can‘t say
print @LoL;    # WRONG
because you‘ll get just references listed, and perl will never automatically dereference things for you.
Instead, you have to roll yourself a loop or two. This prints the whole structure, using the shell−style for()
construct to loop across the outer set of subscripts.
for $aref ( @LoL ) {
print "\t [ @$aref ],\n";
}
If you wanted to keep track of subscripts, you might do this:
for $i ( 0 .. $#LoL ) {
print "\t elt $i is [ @{$LoL[$i]} ],\n";
}
or maybe even this. Notice the inner loop.
for $i ( 0 .. $#LoL ) {
for $j ( 0 .. $#{$LoL[$i]} ) {
print "elt $i $j is $LoL[$i][$j]\n";
}
}
As you can see, it's getting a bit complicated. That's why sometimes it's easier to take a temporary on your
way through:
for $i ( 0 .. $#LoL ) {
$aref = $LoL[$i];
for $j ( 0 .. $#{$aref} ) {
print "elt $i $j is $LoL[$i][$j]\n";
}
}
Hmm... that‘s still a bit ugly. How about this:
for $i ( 0 .. $#LoL ) {
$aref = $LoL[$i];
$n = @$aref − 1;
for $j ( 0 .. $n ) {
print "elt $i $j is $LoL[$i][$j]\n";
}
}
Slices
If you want to get at a slice (part of a row) in a multidimensional array, you‘re going to have to do some
fancy subscripting. That‘s because while we have a nice synonym for single elements via the pointer arrow
for dereferencing, no such convenience exists for slices. (Remember, of course, that you can always write a
loop to do a slice operation.)
Here‘s how to do one operation using a loop. We‘ll assume an @LoL variable as before.
@part = ();
$x = 4;
for ($y = 7; $y < 13; $y++) {
push @part, $LoL[$x][$y];
}
That same loop could be replaced with a slice operation:
@part = @{ $LoL[4] } [ 7..12 ];
but as you might well imagine, this is pretty rough on the reader.
Ah, but what if you wanted a two−dimensional slice, such as having $x run from 4..8 and $y run from 7 to
12? Hmm... here‘s the simple way:
@newLoL = ();
for ($startx = $x = 4; $x <= 8; $x++) {
for ($starty = $y = 7; $y <= 12; $y++) {
$newLoL[$x − $startx][$y − $starty] = $LoL[$x][$y];
}
}
We can reduce some of the looping through slices
for ($x = 4; $x <= 8; $x++) {
push @newLoL, [ @{ $LoL[$x] } [ 7..12 ] ];
}
If you were into Schwartzian Transforms, you would probably have selected map for that
@newLoL = map { [ @{ $LoL[$_] } [ 7..12 ] ] } 4 .. 8;
Although if your manager accused you of seeking job security (or rapid insecurity) through inscrutable code, it
would be hard to argue. :-) If I were you, I'd put that in a function:
@newLoL = splice_2D( \@LoL, 4 => 8, 7 => 12 );
sub splice_2D {
my $lrr = shift;    # ref to list of list refs!
my ($x_lo, $x_hi, $y_lo, $y_hi) = @_;
return map {
[ @{ $lrr−>[$_] } [ $y_lo .. $y_hi ] ]
} $x_lo .. $x_hi;
}
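To try the function out, here is a self-contained sketch. The 10x15 grid whose cells record their own coordinates is an assumption made purely so the result is easy to check by eye.

```perl
use strict;

# Build a 10x15 grid whose cells record their own coordinates.
my @LoL;
for my $x (0 .. 9) {
    for my $y (0 .. 14) {
        $LoL[$x][$y] = "$x,$y";
    }
}

sub splice_2D {
    my $lrr = shift;                       # ref to list of list refs
    my ($x_lo, $x_hi, $y_lo, $y_hi) = @_;
    return map {
        [ @{ $lrr->[$_] } [ $y_lo .. $y_hi ] ]
    } $x_lo .. $x_hi;
}

my @newLoL = splice_2D( \@LoL, 4 => 8, 7 => 12 );

print scalar(@newLoL), " rows of ", scalar(@{ $newLoL[0] }), " columns\n";
print "corners: $newLoL[0][0] and $newLoL[-1][-1]\n";
```

Each row of the result is a fresh anonymous array, so changing @newLoL leaves @LoL alone.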
SEE ALSO
perldata(1), perlref(1), perldsc(1)
AUTHOR
Tom Christiansen

$self->initialize();
return $self;
}
If you care about inheritance (and you should; see Modules: Creation, Use, and Abuse in perlmod), then you
want to use the two−arg form of bless so that your constructors may be inherited:
sub new {
my $class = shift;
my $self = {};
bless $self, $class;
$self−>initialize();
return $self;
}
Or if you expect people to call not just CLASS−>new() but also $obj−>new(), then use something like
this. The initialize() method used will be of whatever $class we blessed the object into:
sub new {
my $this = shift;
my $class = ref($this) || $this;
my $self = {};
bless $self, $class;
$self−>initialize();
return $self;
}
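The sketch below puts that constructor to work. The Critter and Dog classes and their initialize() bodies are hypothetical; the point is that one inherited new() serves both the class call and the object call.

```perl
use strict;

package Critter;
sub new {
    my $this  = shift;
    my $class = ref($this) || $this;   # accept a class name or an object
    my $self  = {};
    bless $self, $class;
    $self->initialize();
    return $self;
}
sub initialize { my $self = shift; $self->{sound} = "generic noise" }

package Dog;                           # hypothetical subclass
use vars qw(@ISA);
@ISA = ("Critter");
sub initialize { my $self = shift; $self->{sound} = "woof" }

package main;

my $dog   = Dog->new;       # inherited constructor, blessed into Dog
my $puppy = $dog->new;      # called on an object; class taken from ref()

print ref($dog), " says $dog->{sound}\n";
print ref($puppy), " says $puppy->{sound}\n";
```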
Within the class package, the methods will typically deal with the reference as an ordinary reference.
Outside the class package, the reference is generally treated as an opaque value that may be accessed only
through the class‘s methods.
A constructor may re−bless a referenced object currently belonging to another class, but then the new class is
responsible for all cleanup later. The previous blessing is forgotten, as an object may belong to only one
class at a time. (Although of course it‘s free to inherit methods from many classes.) If you find yourself
having to do this, the parent class is probably misbehaving, though.
A clarification: Perl objects are blessed. References are not. Objects know which package they belong to.
References do not. The bless() function uses the reference to find the object. Consider the following
example:
$a = {};
$b = $a;
bless $a, BLAH;
print "\$b is a ", ref($b), "\n";
This reports $b as being a BLAH, so obviously bless() operated on the object and not on the reference.
A Class is Simply a Package
Unlike say C++, Perl doesn‘t provide any special syntax for class definitions. You use a package as a class
by putting method definitions into the class.
There is a special array within each package called @ISA, which says where else to look for a method if you
can‘t find it in the current package. This is how Perl implements inheritance. Each element of the @ISA
array is just the name of another package that happens to be a class package. The classes are searched (depth
first) for missing methods in the order that they occur in @ISA. The classes accessible through @ISA are
known as base classes of the current class.
All classes implicitly inherit from class UNIVERSAL as their last base class. Several commonly used
methods are automatically supplied in the UNIVERSAL class; see "Default UNIVERSAL methods" for more
details.
If a missing method is found in one of the base classes, it is cached in the current class for efficiency.
Changing @ISA or defining new subroutines invalidates the cache and causes Perl to do the lookup again.
If neither the current class, its named base classes, nor the UNIVERSAL class contains the requested
method, these three places are searched all over again, this time looking for a method named AUTOLOAD().
If an AUTOLOAD is found, this method is called on behalf of the missing method, setting the package
global $AUTOLOAD to be the fully qualified name of the method that was intended to be called.
If none of that works, Perl finally gives up and complains.
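A minimal sketch of that lookup order follows. The Animal and Cat packages are invented for the example: one call is satisfied through @ISA, the other falls through to AUTOLOAD.

```perl
use strict;

package Animal;
sub speak { return "some sound" }

package Cat;
use vars qw(@ISA $AUTOLOAD);
@ISA = ("Animal");

# Called for any method not found in Cat, Animal, or UNIVERSAL.
sub AUTOLOAD {
    my $self = shift;
    my $name = $AUTOLOAD;               # fully qualified, e.g. "Cat::purr"
    return if $name =~ /::DESTROY$/;    # leave object destruction alone
    return "no method $name";
}

package main;

my $cat       = bless {}, "Cat";
my $inherited = $cat->speak;     # found in Animal by way of @ISA
my $fallback  = $cat->purr;      # found nowhere, so AUTOLOAD runs

print "$inherited\n$fallback\n";
```

The DESTROY check is there because an AUTOLOAD also gets asked to handle object destruction when the class defines no destructor of its own.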
Perl classes do method inheritance only. Data inheritance is left up to the class itself. By and large, this is
not a problem in Perl, because most classes model the attributes of their object using an anonymous hash,
which serves as its own little namespace to be carved up by the various classes that might want to do
something with the object. The only problem with this is that you can't be sure that you aren't using a piece of
the hash that is already used. A reasonable workaround is to prepend your fieldname in the hash with the
package name.
sub bump {
my $self = shift;
$self−>{ __PACKAGE__ . ".count"}++;
}
A Method is Simply a Subroutine
Unlike say C++, Perl doesn‘t provide any special syntax for method definition. (It does provide a little
syntax for method invocation though. More on that later.) A method expects its first argument to be the
object (reference) or package (string) it is being invoked on. There are just two types of methods, which
we‘ll call class and instance. (Sometimes you‘ll hear these called static and virtual, in honor of the two C++
method types they most closely resemble.)
A class method expects a class name as the first argument. It provides functionality for the class as a whole,
not for any individual object belonging to the class. Constructors are typically class methods. Many class
methods simply ignore their first argument, because they already know what package they‘re in, and don‘t
care what package they were invoked via. (These aren‘t necessarily the same, because class methods follow
the inheritance tree just like ordinary instance methods.) Another typical use for class methods is to look up
an object by name:
sub find {
my ($class, $name) = @_;
$objtable{$name};
}
An instance method expects an object reference as its first argument. Typically it shifts the first argument
into a "self" or "this" variable, and then uses that as an ordinary reference.
sub display {
my $self = shift;
my @keys = @_ ? @_ : sort keys %$self;
foreach $key (@keys) {
print "\t$key => $self−>{$key}\n";
}
}
Method Invocation
There are two ways to invoke a method, one of which you‘re already familiar with, and the other of which
will look familiar. Perl 4 already had an "indirect object" syntax that you use when you say
print STDERR "help!!!\n";
This same syntax can be used to call either class or instance methods. We‘ll use the two methods defined
above, the class method to lookup an object reference and the instance method to print out its attributes.
$fred = find Critter "Fred";
display $fred ’Height’, ’Weight’;
These could be combined into one statement by using a BLOCK in the indirect object slot:
display {find Critter "Fred"} ’Height’, ’Weight’;
For C++ fans, there‘s also a syntax using −> notation that does exactly the same thing. The parentheses are
required if there are any arguments.
$fred = Critter−>find("Fred");
$fred−>display(’Height’, ’Weight’);
or in one statement,
Critter−>find("Fred")−>display(’Height’, ’Weight’);
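For something you can actually run, here's a sketch of a Critter class supporting those calls. The %objtable registry, the new() constructor, and the attribute values are assumptions, and display() returns its text rather than printing it so the result is easy to inspect.

```perl
use strict;

package Critter;
my %objtable;                        # name => object registry (an assumption)

sub new {
    my ($class, $name, %attr) = @_;
    my $self = bless { Name => $name, %attr }, $class;
    $objtable{$name} = $self;
    return $self;
}

sub find {                           # class method: look up by name
    my ($class, $name) = @_;
    return $objtable{$name};
}

sub display {                        # instance method: format attributes
    my $self = shift;
    my @keys = @_ ? @_ : sort keys %$self;
    return join "", map { "\t$_ => $self->{$_}\n" } @keys;
}

package main;

Critter->new("Fred", Height => 1.8, Weight => 80);

my $fred   = Critter->find("Fred");
my $report = $fred->display('Height', 'Weight');
my $chain  = Critter->find("Fred")->display('Height', 'Weight');

print $report;
```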
There are times when one syntax is more readable, and times when the other syntax is more readable. The
indirect object syntax is less cluttered, but it has the same ambiguity as ordinary list operators. Indirect object
method calls are parsed using the same rule as list operators: "If it looks like a function, it is a function".
(Presuming for the moment that you think two words in a row can look like a function name. C++
programmers seem to think so with some regularity, especially when the first word is "new".) Thus, the
parentheses of
new Critter (’Barney’, 1.5, 70)
are assumed to surround ALL the arguments of the method call, regardless of what comes after. Saying
new Critter (’Bam’ x 2), 1.4, 45
would be equivalent to
Critter−>new(’Bam’ x 2), 1.4, 45
which is unlikely to do what you want.
There are times when you wish to specify which class‘s method to use. In this case, you can call your
method as an ordinary subroutine call, being sure to pass the requisite first argument explicitly:
$fred = MyCritter::find("Critter", "Fred");
MyCritter::display($fred, ’Height’, ’Weight’);
Note however, that this does not do any inheritance. If you wish merely to specify that Perl should START
looking for a method in a particular package, use an ordinary method call, but qualify the method name with
the package like this:
$fred = Critter−>MyCritter::find("Fred");
$fred−>MyCritter::display(’Height’, ’Weight’);
If you‘re trying to control where the method search begins and you‘re executing in the class itself, then you
may use the SUPER pseudo class, which says to start looking in your base class‘s @ISA list without having
to name it explicitly:
$self−>SUPER::display(’Height’, ’Weight’);
Please note that the SUPER:: construct is meaningful only within the class.
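Here is a sketch of SUPER in use, with a hypothetical MyCritter subclass that overrides display() and delegates to the inherited version:

```perl
use strict;

package Critter;
sub new     { my $class = shift; return bless { @_ }, $class }
sub display { my $self = shift; return "Critter $self->{Name}" }

package MyCritter;                   # hypothetical subclass
use vars qw(@ISA);
@ISA = ("Critter");

sub display {
    my $self = shift;
    # Start the method search in MyCritter's base classes.
    my $base = $self->SUPER::display(@_);
    return "$base (with extras)";
}

package main;

my $fred = MyCritter->new(Name => "Fred");
my $text = $fred->display;
print "$text\n";
```

SUPER is resolved relative to the package in which the call is compiled, which is why it only makes sense inside the class.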
Sometimes you want to call a method when you don‘t know the method name ahead of time. You can use
the arrow form, replacing the method name with a simple scalar variable containing the method name:
$method = $fast ? "findfirst" : "findbest";
$fred−>$method(@args);
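A runnable sketch of the same idea, using a made-up Finder class whose two methods stand in for findfirst and findbest:

```perl
use strict;

package Finder;
sub new       { return bless {}, shift }
sub findfirst { my ($self, @items) = @_; return $items[0] }
sub findbest  { my ($self, @items) = @_; return (sort @items)[-1] }

package main;

my $finder = Finder->new;
my @args   = ("pebbles", "wilma", "betty");
my %found;

for my $fast (1, 0) {
    my $method = $fast ? "findfirst" : "findbest";
    $found{$method} = $finder->$method(@args);   # method name taken from a scalar
    print "$method: $found{$method}\n";
}
```

The lookup still goes through the normal inheritance search; only the name is supplied at run time.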
Default UNIVERSAL methods
The UNIVERSAL package automatically contains the following methods that are inherited by all other
classes:
isa(CLASS)
isa returns true if its object is blessed into a subclass of CLASS
isa is also exportable and can be called as a sub with two arguments. This makes it possible to check
what a reference points to. Example:
use UNIVERSAL qw(isa);
if(isa($ref, ’ARRAY’)) {
#...
}
can(METHOD)
can checks to see if its object has a method called METHOD. If it does, a reference to the sub is
returned; if it does not, undef is returned.
VERSION( [NEED] )
VERSION returns the version number of the class (package). If the NEED argument is given then it
will check that the current version (as defined by the $VERSION variable in the given package) is not
less than NEED; it will die if this is not the case. This method is normally called as a class method.
This method is called automatically by the VERSION form of use.
use A 1.2 qw(some imported subs);
# implies:
A−>VERSION(1.2);
NOTE: can directly uses Perl‘s internal code for method lookup, and isa uses a very similar method and
cache−ing strategy. This may cause strange effects if the Perl code dynamically changes @ISA in any
package.
You may add other methods to the UNIVERSAL class via Perl or XS code. You do not need to use
UNIVERSAL in order to make these methods available to your program. This is necessary only if you wish
to have isa available as a plain subroutine in the current package.
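The three methods can be exercised together, called as methods, in a sketch like the following. The Critter and Dog classes and the version number 1.5 are placeholders.

```perl
use strict;

package Critter;
use vars qw($VERSION);
$VERSION = 1.5;
sub new   { return bless {}, shift }
sub speak { return "grr" }

package Dog;
use vars qw(@ISA);
@ISA = ("Critter");

package main;

my $dog = Dog->new;

my $is_critter = $dog->isa("Critter") ? 1 : 0;   # true by inheritance
my $is_fish    = $dog->isa("Fish")    ? 1 : 0;   # no such base class
my $speaker    = $dog->can("speak");             # code ref, or undef
my $sound      = $speaker ? $speaker->($dog) : "silence";
my $version    = Critter->VERSION;               # value of $Critter::VERSION

# Asking for a newer version than the class provides makes VERSION die.
my $too_new = eval { Critter->VERSION(2.0); 1 } ? 0 : 1;

print "$is_critter $is_fish $sound $version $too_new\n";
```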
Destructors
When the last reference to an object goes away, the object is automatically destroyed. (This may even be
after you exit, if you‘ve stored references in global variables.) If you want to capture control just before the
object is freed, you may define a DESTROY method in your class. It will automatically be called at the
appropriate moment, and you can do any extra cleanup you need to do. Perl passes a reference to the object
under destruction as the first (and only) argument. Beware that the reference is a read−only value, and
cannot be modified by manipulating $_[0] within the destructor. The object itself (i.e. the thingy the
reference points to, namely ${$_[0]}, @{$_[0]}, %{$_[0]} etc.) is not similarly constrained.
If you arrange to re−bless the reference before the destructor returns, perl will again call the DESTROY
method for the re−blessed object after the current one returns. This can be used for clean delegation of
object destruction, or for ensuring that destructors in the base classes of your choosing get called. Explicitly
calling DESTROY is also possible, but is rarely needed.
Do not confuse the foregoing with how objects CONTAINED in the current one are destroyed. Such objects
will be freed and destroyed automatically when the current object is freed, provided no other references to
them exist elsewhere.
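The timing described above can be observed with a short sketch. Local::Tracked is a hypothetical class whose destructor merely records that it ran.

```perl
use strict;

package Local::Tracked;           # hypothetical class
use vars qw(@LOG);

sub new {
    my ($class, $name) = @_;
    return bless { NAME => $name }, $class;
}

sub DESTROY {
    my $self = shift;             # the only argument Perl passes
    push @LOG, "destroyed $self->{NAME}";
}

package main;

{
    my $first = Local::Tracked->new('first');
    my $alias = $first;           # a second reference to the same object
    undef $first;                 # one reference remains: no DESTROY yet
    push @Local::Tracked::LOG, "still alive";
}                                 # $alias goes out of scope: DESTROY runs

print "$_\n" for @Local::Tracked::LOG;
```

The destructor runs only when the last reference disappears, not when the first variable holding the object is cleared.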
WARNING
While indirect object syntax may well be appealing to English speakers and to C++ programmers, be not
seduced! It suffers from two grave problems.
The first problem is that an indirect object is limited to a name, a scalar variable, or a block, because it would
have to do too much lookahead otherwise, just like any other postfix dereference in the language. (These are
the same quirky rules as are used for the filehandle slot in functions like print and printf.) This can
lead to horribly confusing precedence problems, as in these next two lines:
move $obj->{FIELD};                 # probably wrong!
move $ary[$i];                      # probably wrong!
Those actually parse as the very surprising:
$obj->move->{FIELD};                # Well, lookee here
$ary->move->[$i];                   # Didn't expect this one, eh?
Rather than what you might have expected:
$obj->{FIELD}->move();              # You should be so lucky.
$ary[$i]->move;                     # Yeah, sure.
The left side of "->" is not so limited, because it's an infix operator, not a postfix operator.
As if that weren't bad enough, think about this: Perl must guess at compile time whether name and move
above are functions or methods. Usually Perl gets it right, but when it doesn't, you get a function call
compiled as a method, or vice versa. This can introduce subtle bugs that are hard to unravel. For example,
calling a method new in indirect notation (as C++ programmers are so wont to do) can be miscompiled
into a subroutine call if there's already a new function in scope. You'd end up calling the current package's
new as a subroutine, rather than the desired class's method. The compiler tries to cheat by remembering
bareword requires, but the grief if it messes up just isn't worth the years of debugging it would likely take
you to track such subtle bugs down.
The infix arrow notation using "->" doesn't suffer from either of these disturbing ambiguities, so we
recommend you use it exclusively.
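A short sketch of the arrow form with invocants that are more than a simple name or scalar variable. The class Local::Mover and its move method are hypothetical stand-ins.

```perl
use strict;

package Local::Mover;             # hypothetical class
sub new  { my $class = shift; return bless { MOVES => 0 }, $class }
sub move { my $self = shift; return ++$self->{MOVES} }

package main;

my $obj = { FIELD => Local::Mover->new };
my @ary = ( Local::Mover->new );

# The invocant may be any expression on the left of the arrow.
my $first  = $obj->{FIELD}->move;
my $second = $obj->{FIELD}->move();
my $third  = $ary[0]->move;

print "$first $second $third\n";
```

Each call reaches the object stored in the hash element or array element, which is exactly what the indirect form fails to do.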
Summary
That‘s about all there is to it. Now you need just to go off and buy a book about object−oriented design
methodology, and bang your forehead with it for the next six months or so.
Two−Phased Garbage Collection
For most purposes, Perl uses a fast and simple reference−based garbage collection system. For this reason,
there‘s an extra dereference going on at some level, so if you haven‘t built your Perl executable using your C
compiler‘s −O flag, performance will suffer. If you have built Perl with cc −O, then this probably won‘t
matter.
A more serious concern is that unreachable memory with a non−zero reference count will not normally get
freed. Therefore, this is a bad idea:
{
my $a;
$a = \$a;
}
Even though $a should go away, it can't. When building recursive data structures, you'll have to break the
self-reference yourself explicitly if you don't care to leak. For example, here's a self-referential node such
as one might use in a sophisticated tree structure:
sub new_node {
my $self = shift;
my $class = ref($self) || $self;
my $node = {};
$node−>{LEFT} = $node−>{RIGHT} = $node;
$node−>{DATA} = [ @_ ];
return bless $node => $class;
}
If you create nodes like that, they (currently) won‘t go away unless you break their self reference yourself.
(In other words, this is not to be construed as a feature, and you shouldn‘t depend on it.)
Almost.
When an interpreter thread finally shuts down (usually when your program exits), then a rather costly but
complete mark−and−sweep style of garbage collection is performed, and everything allocated by that thread
gets destroyed. This is essential to support Perl as an embedded or a multithreadable language. For
example, this program demonstrates Perl‘s two−phased garbage collection:
#!/usr/bin/perl
package Subtle;
sub new {
my $test;
$test = \$test;
warn "CREATING " . \$test;
return bless \$test;
}
sub DESTROY {
my $self = shift;
warn "DESTROYING $self";
}
package main;
warn "starting program";
{
my $a = Subtle−>new;
my $b = Subtle−>new;
$$a = 0; # break selfref
warn "leaving block";
}
warn "just exited block";
warn "time to die...";
exit;
When run as /tmp/test, the following output is produced:
starting program at /tmp/test line 18.
CREATING SCALAR(0x8e5b8) at /tmp/test line 7.
CREATING SCALAR(0x8e57c) at /tmp/test line 7.
leaving block at /tmp/test line 23.
DESTROYING Subtle=SCALAR(0x8e5b8) at /tmp/test line 13.
just exited block at /tmp/test line 26.
time to die... at /tmp/test line 27.
DESTROYING Subtle=SCALAR(0x8e57c) during global destruction.
Notice that "global destruction" bit there? That's the thread garbage collector reaching the unreachable.
Objects are always destructed, even when regular refs aren't, and in fact are destructed in a separate pass
before ordinary refs, just to try to prevent object destructors from using refs that have themselves been
destructed. Plain refs are only garbage-collected if the destruct level is greater than 0. You can test the
higher levels of global destruction by setting the PERL_DESTRUCT_LEVEL environment variable,
presuming -DDEBUGGING was enabled during perl build time.
A more complete garbage collection strategy will be implemented at a future date.
In the meantime, the best solution is to create a non−recursive container class that holds a pointer to the
self−referential data structure. Define a DESTROY method for the containing object‘s class that manually
breaks the circularities in the self−referential structure.
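The container approach might look roughly like the following sketch. Local::Node and Local::Tree are hypothetical names, and the node here is a single self-referential record rather than a full tree.

```perl
use strict;

package Local::Node;              # hypothetical node class
use vars qw($DESTROYED);
$DESTROYED = 0;

sub new {
    my $class = shift;
    my $node  = bless { DATA => [ @_ ] }, $class;
    $node->{LEFT} = $node->{RIGHT} = $node;   # deliberate self-reference
    return $node;
}
sub DESTROY { $DESTROYED++ }

package Local::Tree;              # non-recursive container

sub new {
    my $class = shift;
    return bless { ROOT => Local::Node->new(@_) }, $class;
}

sub DESTROY {
    my $self = shift;
    my $root = $self->{ROOT};
    # Break the cycle by hand so the node's reference count can reach zero.
    delete $root->{LEFT};
    delete $root->{RIGHT};
}

package main;

{
    my $tree = Local::Tree->new('payload');
}   # container freed here; its DESTROY lets the node be freed as well

print "nodes destroyed: $Local::Node::DESTROYED\n";
```

The container itself holds no reference back to itself, so ordinary reference counting frees it on time, and its destructor is the one place that needs to know how the inner structure is wired.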
SEE ALSO
A kinder, gentler tutorial on object−oriented programming in Perl can be found in perltoot. You should also
check out perlbot for other object tricks, traps, and tips, as well as perlmodlib for some style guides on
constructing both modules and classes.
perltie
Perl Programmers Reference Guide
perltie
NAME
perltie − how to hide an object class in a simple variable
SYNOPSIS
tie VARIABLE, CLASSNAME, LIST
$object = tied VARIABLE
untie VARIABLE
DESCRIPTION
Prior to release 5.0 of Perl, a programmer could use dbmopen() to connect an on−disk database in the
standard Unix dbm(3x) format magically to a %HASH in their program. However, their Perl was either built
with one particular dbm library or another, but not both, and you couldn‘t extend this mechanism to other
packages or types of variables.
Now you can.
The tie() function binds a variable to a class (package) that will provide the implementation for access
methods for that variable. Once this magic has been performed, accessing a tied variable automatically
triggers method calls in the proper class. The complexity of the class is hidden behind magic method calls.
The method names are in ALL CAPS, which is a convention that Perl uses to indicate that they‘re called
implicitly rather than explicitly—just like the BEGIN() and END() functions.
In the tie() call, VARIABLE is the name of the variable to be enchanted. CLASSNAME is the name of a
class implementing objects of the correct type. Any additional arguments in the LIST are passed to the
appropriate constructor method for that class—meaning TIESCALAR(), TIEARRAY(), TIEHASH(), or
TIEHANDLE(). (Typically these are arguments such as might be passed to the dbminit() function of
C.) The object returned by the "new" method is also returned by the tie() function, which would be useful
if you wanted to access other methods in CLASSNAME. (You don‘t actually have to return a reference to a
right "type" (e.g., HASH or CLASSNAME) so long as it‘s a properly blessed object.) You can also retrieve a
reference to the underlying object using the tied() function.
Unlike dbmopen(), the tie() function will not use or require a module for you—you need to do that
explicitly yourself.
Tying Scalars
A class implementing a tied scalar should define the following methods: TIESCALAR, FETCH, STORE,
and possibly DESTROY.
Let‘s look at each in turn, using as an example a tie class for scalars that allows the user to do something
like:
tie $his_speed, ’Nice’, getppid();
tie $my_speed, ’Nice’, $$;
And now whenever either of those variables is accessed, its current system priority is retrieved and returned.
If those variables are set, then the process‘s priority is changed!
We'll use Jarkko Hietaniemi [...]
if ($new_nicety > PRIO_MAX) {
carp sprintf
"WARNING: priority %d greater than maximum system priority %d",
$new_nicety, PRIO_MAX if $^W;
$new_nicety = PRIO_MAX;
}
unless (defined setpriority(PRIO_PROCESS, $$self, $new_nicety)) {
confess "setpriority failed: $!";
}
return $new_nicety;
}
DESTROY this
This method will be triggered when the tied variable needs to be destructed. As with other object
classes, such a method is seldom necessary, because Perl deallocates its moribund object‘s memory for
you automatically—this isn‘t C++, you know. We‘ll use a DESTROY method here for debugging
purposes only.
sub DESTROY {
my $self = shift;
confess "wrong type" unless ref $self;
carp "[ Nice::DESTROY pid $$self ]" if $Nice::DEBUG;
}
That‘s about all there is to it. Actually, it‘s more than all there is to it, because we‘ve done a few nice things
here for the sake of completeness, robustness, and general aesthetics. Simpler TIESCALAR classes are
certainly possible.
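As one illustration of a simpler class, here is a minimal sketch of a tied scalar that upper-cases whatever is stored in it. Local::UpperCase is a hypothetical name; only TIESCALAR, FETCH, and STORE are defined.

```perl
use strict;

package Local::UpperCase;         # hypothetical tie class

sub TIESCALAR {
    my ($class, $initial) = @_;
    my $value = defined $initial ? uc $initial : '';
    return bless \$value, $class; # the object is a blessed scalar ref
}

sub FETCH { my $self = shift; return $$self }

sub STORE {
    my ($self, $new) = @_;
    return $$self = uc $new;      # force everything stored to upper case
}

package main;

my $shout;
tie $shout, 'Local::UpperCase', 'hello';
my $before = $shout;              # calls FETCH
$shout = 'quiet please';          # calls STORE
my $after = $shout;

print "$before / $after\n";
```

No DESTROY method is needed here, because the object owns nothing beyond its own memory.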
Tying Arrays
A class implementing a tied ordinary array should define the following methods: TIEARRAY, FETCH,
STORE, FETCHSIZE, STORESIZE and perhaps DESTROY.
FETCHSIZE and STORESIZE are used to provide $#array and equivalent scalar(@array) access.
The methods POP, PUSH, SHIFT, UNSHIFT, SPLICE are required if the perl operator with the
corresponding (but lowercase) name is to operate on the tied array. The Tie::Array class can be used as a
base class to implement these in terms of the basic five methods above.
In addition EXTEND will be called when perl would have pre−extended allocation in a real array.
This means that tied arrays are now complete. The example below needs upgrading to illustrate this. (The
documentation in Tie::Array is more complete.)
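A sketch of the inheritance route, assuming the Tie::StdArray class that ships inside Tie::Array as the base. Local::ClampedArray is a hypothetical subclass that overrides only STORE.

```perl
use strict;

package Local::ClampedArray;      # hypothetical tie class
require Tie::Array;
use vars qw(@ISA);
@ISA = ('Tie::StdArray');         # supplies every method we don't override

# Tie::StdArray keeps its elements in a blessed array reference.
sub STORE {
    my ($self, $idx, $value) = @_;
    $value = 100 if $value > 100; # clamp on the way in
    return $self->[$idx] = $value;
}

package main;

my @scores;
tie @scores, 'Local::ClampedArray';
$scores[0] = 42;
$scores[1] = 250;

my $count = scalar @scores;       # served by the inherited FETCHSIZE
print "count=$count values=$scores[0],$scores[1]\n";
```

Only element assignment is shown. Whether inherited methods such as PUSH route through an overridden STORE depends on the base class, so check Tie::Array before relying on it.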
For this discussion, we‘ll implement an array whose indices are fixed at its creation. If you try to access
anything beyond those bounds, you‘ll take an exception. For example:
require Bounded_Array;
tie @ary, ’Bounded_Array’, 2;
$| = 1;
for $i (0 .. 10) {
print "setting index $i: ";
$ary[$i] = 10 * $i;
print "value of elt $i now $ary[$i]\n";
}
The preamble code for the class is as follows:
package Bounded_Array;
use Carp;
use strict;
TIEARRAY classname, LIST
This is the constructor for the class. That means it is expected to return a blessed reference through
which the new array (probably an anonymous ARRAY ref) will be accessed.
In our example, just to show you that you don‘t really have to return an ARRAY reference, we‘ll
choose a HASH reference to represent our object. A HASH works out well as a generic record type:
the {BOUND} field will store the maximum bound allowed, and the {ARRAY} field will hold the true
ARRAY ref. If someone outside the class tries to dereference the object returned (doubtless thinking it
an ARRAY ref), they‘ll blow up. This just goes to show you that you should respect an object‘s
privacy.
sub TIEARRAY {
my $class = shift;
my $bound = shift;
confess "usage: tie(\@ary, ’Bounded_Array’, max_subscript)"
if @_ || $bound =~ /\D/;
return bless {
BOUND => $bound,
ARRAY => [],
}, $class;
}
FETCH this, index
This method will be triggered every time an individual element of the tied array is accessed (read). It
takes one argument beyond its self reference: the index whose value we're trying to fetch.
sub FETCH {
my($self,$idx) = @_;
if ($idx > $self−>{BOUND}) {
confess "Array OOB: $idx > $self−>{BOUND}";
}
return $self−>{ARRAY}[$idx];
}
As you may have noticed, the name of the FETCH method (et al.) is the same for all accesses, even
though the constructors differ in names (TIESCALAR vs TIEARRAY). While in theory you could
have the same class servicing several tied types, in practice this becomes cumbersome, and it‘s easiest
to keep them at simply one tie type per class.
STORE this, index, value
This method will be triggered every time an element in the tied array is set (written). It takes two
arguments beyond its self reference: the index at which we‘re trying to store something and the value
we‘re trying to put there. For example:
sub STORE {
my($self, $idx, $value) = @_;
print "[STORE $value at $idx]\n" if $Bounded_Array::DEBUG; # assumed package-level debug flag
if ($idx > $self−>{BOUND} ) {
confess "Array OOB: $idx > $self−>{BOUND}";
}
return $self−>{ARRAY}[$idx] = $value;
}
DESTROY this
This method will be triggered when the tied variable needs to be destructed. As with the scalar tie
class, this is almost never needed in a language that does its own garbage collection, so this time we‘ll
just leave it out.
The code we presented at the top of the tied array class accesses many elements of the array, far more than
we‘ve set the bounds to. Therefore, it will blow up once they try to access beyond the 2nd element of @ary,
as the following output demonstrates:
setting index 0: value of elt 0 now 0
setting index 1: value of elt 1 now 10
setting index 2: value of elt 2 now 20
setting index 3: Array OOB: 3 > 2 at Bounded_Array.pm line 39
        Bounded_Array::FETCH called at testba line 12
Tying Hashes
As the first Perl data type to be tied (see dbmopen()), hashes have the most complete and useful tie()
implementation. A class implementing a tied hash should define the following methods: TIEHASH is the
constructor. FETCH and STORE access the key and value pairs. EXISTS reports whether a key is present in
the hash, and DELETE deletes one. CLEAR empties the hash by deleting all the key and value pairs.
FIRSTKEY and NEXTKEY implement the keys() and each() functions to iterate over all the keys. And
DESTROY is called when the tied variable is garbage collected.
If this seems like a lot, then feel free to inherit from merely the standard Tie::Hash module for most of your
methods, redefining only the interesting ones. See Tie::Hash for details.
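Inheriting most of the methods might look like the sketch below. It assumes the Tie::StdHash class provided by Tie::Hash as the base; Local::CaseFold is a hypothetical subclass that folds keys to lower case.

```perl
use strict;

package Local::CaseFold;          # hypothetical tie class
require Tie::Hash;
use vars qw(@ISA);
@ISA = ('Tie::StdHash');          # inherits TIEHASH, FIRSTKEY, NEXTKEY, CLEAR

# Tie::StdHash stores its data in a blessed hash reference.
sub STORE  { my ($self, $key, $value) = @_; $self->{lc $key} = $value }
sub FETCH  { my ($self, $key) = @_; return $self->{lc $key} }
sub EXISTS { my ($self, $key) = @_; return exists $self->{lc $key} }
sub DELETE { my ($self, $key) = @_; return delete $self->{lc $key} }

package main;

my %color;
tie %color, 'Local::CaseFold';
$color{Red} = '#f00';

my $found = exists $color{RED} ? 1 : 0;
my @keys  = keys %color;
print "found=$found value=$color{rEd} keys=@keys\n";
```

The iteration methods are left to the base class, which is why keys reports the folded form of the key.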
Remember that Perl distinguishes between a key not existing in the hash, and the key existing in the hash but
having a corresponding value of undef. The two possibilities can be tested with the exists() and
defined() functions.
Here‘s an example of a somewhat interesting tied hash class: it gives you a hash representing a particular
user‘s dot files. You index into the hash with the name of the file (minus the dot) and you get back that dot
file‘s contents. For example:
use DotFiles;
tie %dot, ’DotFiles’;
if ( $dot{profile} =~ /MANPATH/ ||
     $dot{login}   =~ /MANPATH/ ||
     $dot{cshrc}   =~ /MANPATH/    )
{
print "you seem to set your MANPATH\n";
}
Or here‘s another sample of using our tied class:
tie %him, ’DotFiles’, ’daemon’;
foreach $f ( keys %him ) {
printf "daemon dot file %s is size %d\n",
$f, length $him{$f};
}
In our tied hash DotFiles example, we use a regular hash for the object containing several important fields, of
which only the {LIST} field will be what the user thinks of as the real hash.
USER     whose dot files this object represents
HOME     where those dot files live
CLOBBER  whether we should try to change or remove those dot files
LIST     the hash of dot file names and content mappings
Here's the start of DotFiles.pm:
package DotFiles;
use Carp;
sub whowasi { (caller(1))[3] . ’()’ }
my $DEBUG = 0;
sub debug { $DEBUG = @_ ? shift : 1 }
For our example, we want to be able to emit debugging info to help in tracing during development. We
also keep one convenience function around internally to help print out warnings; whowasi() returns the
name of the function that calls it.
Here are the methods for the DotFiles tied hash.
TIEHASH classname, LIST
This is the constructor for the class. That means it is expected to return a blessed reference through
which the new object (probably but not necessarily an anonymous hash) will be accessed.
Here‘s the constructor:
sub TIEHASH {
my $self = shift;
my $user = shift || $>;
my $dotdir = shift || ’’;
croak "usage: @{[&whowasi]} [USER [DOTDIR]]" if @_;
$user = getpwuid($user) if $user =~ /^\d+$/;
my $dir = (getpwnam($user))[7]
|| croak "@{[&whowasi]}: no user $user";
$dir .= "/$dotdir" if $dotdir;
my $node = {
USER    => $user,
HOME    => $dir,
LIST    => {},
CLOBBER => 0,
};
opendir(DIR, $dir)
|| croak "@{[&whowasi]}: can’t opendir $dir: $!";
foreach $dot ( grep /^\./ && −f "$dir/$_", readdir(DIR)) {
$dot =~ s/^\.//;
$node−>{LIST}{$dot} = undef;
}
closedir DIR;
return bless $node, $self;
}
It‘s probably worth mentioning that if you‘re going to filetest the return values out of a readdir, you‘d
better prepend the directory in question. Otherwise, because we didn‘t chdir() there, it would have
been testing the wrong file.
FETCH this, key
This method will be triggered every time an element in the tied hash is accessed (read). It takes one
argument beyond its self reference: the key whose value we‘re trying to fetch.
Here‘s the fetch for our DotFiles example.
sub FETCH {
carp &whowasi if $DEBUG;
my $self = shift;
my $dot = shift;
my $dir = $self−>{HOME};
my $file = "$dir/.$dot";
unless (exists $self−>{LIST}−>{$dot} || −f $file) {
carp "@{[&whowasi]}: no $dot file" if $DEBUG;
return undef;
}
if (defined $self−>{LIST}−>{$dot}) {
return $self−>{LIST}−>{$dot};
} else {
return $self->{LIST}->{$dot} = `cat $dir/.$dot`;
}
}
It was easy to write by having it call the Unix cat(1) command, but it would probably be more portable
to open the file manually (and somewhat more efficient). Of course, because dot files are a Unixy
concept, we‘re not that concerned.
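Opening the file manually could be done along these lines. This is a sketch only: slurp_file is a hypothetical helper that is not part of the DotFiles class, and the demonstration uses a scratch file rather than a real dot file.

```perl
use strict;

# Hypothetical helper: return a file's contents, or undef if it can't be read.
sub slurp_file {
    my $file = shift;
    local *FH;                    # localized glob keeps the handle private
    open(FH, "< $file") or return undef;
    local $/;                     # undefined record separator: read it all
    my $contents = <FH>;
    close(FH);
    return $contents;
}

# Demonstration with a scratch file rather than a real dot file.
my $scratch = "slurp_demo.$$";
open(OUT, "> $scratch") or die "can't create $scratch: $!";
print OUT "MANPATH=/usr/man\n";
close(OUT);

my $text = slurp_file($scratch);
unlink $scratch;
print $text;
```

A FETCH method could call such a helper in place of the backticks, which avoids starting a separate process for every dot file read.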
STORE this, key, value
This method will be triggered every time an element in the tied hash is set (written). It takes two
arguments beyond its self reference: the key at which we're trying to store something, and the value
we're trying to put there.
Here in our DotFiles example, we‘ll be careful not to let them try to overwrite the file unless they‘ve
called the clobber() method on the original object reference returned by tie().
sub STORE {
carp &whowasi if $DEBUG;
my $self = shift;
my $dot = shift;
my $value = shift;
my $file = $self−>{HOME} . "/.$dot";
my $user = $self−>{USER};
croak "@{[&whowasi]}: $file not clobberable"
unless $self−>{CLOBBER};
open(F, "> $file") || croak "can’t open $file: $!";
print F $value;
close(F);
}
If they wanted to clobber something, they might say:
$ob = tie %daemon_dots, 'DotFiles', 'daemon';
$ob−>clobber(1);
$daemon_dots{signature} = "A true daemon\n";
Another way to lay hands on a reference to the underlying object is to use the tied() function, so
they might alternately have set clobber using:
tie %daemon_dots, 'DotFiles', 'daemon';
tied(%daemon_dots)−>clobber(1);
The clobber method is simply:
sub clobber {
my $self = shift;
$self−>{CLOBBER} = @_ ? shift : 1;
}
DELETE this, key
This method is triggered when we remove an element from the hash, typically by using the
delete() function. Again, we‘ll be careful to check whether they really want to clobber files.
sub DELETE
{
carp &whowasi if $DEBUG;
my $self = shift;
my $dot = shift;
my $file = $self−>{HOME} . "/.$dot";
croak "@{[&whowasi]}: won’t remove file $file"
unless $self−>{CLOBBER};
delete $self−>{LIST}−>{$dot};
my $success = unlink($file);
carp "@{[&whowasi]}: can’t unlink $file: $!" unless $success;
$success;
}
The value returned by DELETE becomes the return value of the call to delete(). If you want to
emulate the normal behavior of delete(), you should return whatever FETCH would have returned
for this key. In this example, we have chosen instead to return a value which tells the caller whether
the file was successfully deleted.
CLEAR this
This method is triggered when the whole hash is to be cleared, usually by assigning the empty list to it.
In our example, that would remove all the user‘s dot files! It‘s such a dangerous thing that they‘ll have
to set CLOBBER to something higher than 1 to make it happen.
sub CLEAR
{
carp &whowasi if $DEBUG;
my $self = shift;
croak "@{[&whowasi]}: won’t remove all dot files for $self−>{USER}"
unless $self−>{CLOBBER} > 1;
my $dot;
foreach $dot ( keys %{$self−>{LIST}}) {
$self−>DELETE($dot);
}
}
EXISTS this, key
This method is triggered when the user uses the exists() function on a particular hash. In our
example, we‘ll look at the {LIST} hash element for this:
sub EXISTS
{
carp &whowasi if $DEBUG;
my $self = shift;
my $dot = shift;
return exists $self−>{LIST}−>{$dot};
}
FIRSTKEY this
This method will be triggered when the user is going to iterate through the hash, such as via a keys()
or each() call.
sub FIRSTKEY {
carp &whowasi if $DEBUG;
my $self = shift;
my $a = keys %{$self->{LIST}};      # reset each() iterator
each %{$self->{LIST}}
}
NEXTKEY this, lastkey
This method gets triggered during a keys() or each() iteration. It has a second argument which is
the last key that had been accessed. This is useful if you're caring about ordering or calling the
iterator from more than one sequence, or not really storing things in a hash anywhere.
For our example, we‘re using a real hash so we‘ll do just the simple thing, but we‘ll have to go through
the LIST field indirectly.
sub NEXTKEY {
carp &whowasi if $DEBUG;
my $self = shift;
return each %{ $self−>{LIST} }
}
DESTROY this
This method is triggered when a tied hash is about to go out of scope. You don‘t really need it unless
you‘re trying to add debugging or have auxiliary state to clean up. Here‘s a very simple function:
sub DESTROY {
carp &whowasi if $DEBUG;
}
Note that functions such as keys() and values() may return huge lists when used on large objects, like
DBM files. You may prefer to use the each() function to iterate over such. Example:
# print out history file offsets
use NDBM_File;
tie(%HIST, ’NDBM_File’, ’/usr/lib/news/history’, 1, 0);
while (($key,$val) = each %HIST) {
print $key, ’ = ’, unpack(’L’,$val), "\n";
}
untie(%HIST);
Tying FileHandles
This is partially implemented now.
A class implementing a tied filehandle should define the following methods: TIEHANDLE, at least one of
PRINT, PRINTF, WRITE, READLINE, GETC, READ, and possibly CLOSE and DESTROY.
It is especially useful when perl is embedded in some other program, where output to STDOUT and
STDERR may have to be redirected in some special way. See nvi and the Apache module for examples.
In our example we‘re going to create a shouting handle.
package Shout;
TIEHANDLE classname, LIST
This is the constructor for the class. That means it is expected to return a blessed reference of some
sort. The reference can be used to hold some internal information.
sub TIEHANDLE { print "\n"; my $i; bless \$i, shift }
WRITE this, LIST
This method will be called when the handle is written to via the syswrite function.
sub WRITE {
$r = shift;
my($buf,$len,$offset) = @_;
print "WRITE called, \$buf=$buf, \$len=$len, \$offset=$offset";
}
PRINT this, LIST
This method will be triggered every time the tied handle is printed to with the print() function.
Beyond its self reference it also expects the list that was passed to the print function.
sub PRINT { $r = shift; $$r++; print join($,,map(uc($_),@_)),$\ }
PRINTF this, LIST
This method will be triggered every time the tied handle is printed to with the printf() function.
Beyond its self reference it also expects the format and list that was passed to the printf function.
sub PRINTF {
shift;
my $fmt = shift;
print sprintf($fmt, @_)."\n";
}
READ this, LIST
This method will be called when the handle is read from via the read or sysread functions.
sub READ {
$r = shift;
my($buf,$len,$offset) = @_;
print "READ called, \$buf=$buf, \$len=$len, \$offset=$offset";
}
READLINE this
This method will be called when the handle is read from via <HANDLE>.
[...]
Here‘s how to use our little example:
tie(*FOO,’Shout’);
print FOO "hello\n";
$a = 4; $b = 6;
print FOO $a, " plus ", $b, " equals ", $a + $b, "\n";
print <FOO>;
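For redirecting output in some special way, a tied handle can simply collect what is printed. The following is a minimal sketch with a hypothetical class Local::Capture that defines only TIEHANDLE, PRINT, and PRINTF.

```perl
use strict;

package Local::Capture;           # hypothetical tie class

sub TIEHANDLE {
    my $class  = shift;
    my $buffer = '';
    return bless \$buffer, $class;
}

sub PRINT {
    my $self = shift;
    $$self .= join('', @_);       # append instead of writing anywhere
    return 1;
}

sub PRINTF {
    my $self = shift;
    my $fmt  = shift;
    $$self .= sprintf($fmt, @_);
    return 1;
}

package main;

tie *LOG, 'Local::Capture';
print  LOG "hello, ";
printf LOG "%d plus %d is %d", 4, 6, 4 + 6;

my $captured = ${ tied *LOG };    # the object is a reference to the buffer
untie *LOG;
print "$captured\n";
```

The buffer is copied out before untie, so no reference to the tied object survives the untie call.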
The untie Gotcha
If you intend making use of the object returned from either tie() or tied(), and if the tie‘s target class
defines a destructor, there is a subtle gotcha you must guard against.
As setup, consider this (admittedly rather contrived) example of a tie; all it does is use a file to keep a log of
the values assigned to a scalar.
package Remember;
use strict;
use IO::File;
sub TIESCALAR {
my $class = shift;
my $filename = shift;
my $handle = new IO::File "> $filename"
or die "Cannot open $filename: $!\n";
print $handle "The Start\n";
bless {FH => $handle, Value => 0}, $class;
}
sub FETCH {
my $self = shift;
return $self−>{Value};
}
sub STORE {
my $self = shift;
my $value = shift;
my $handle = $self−>{FH};
print $handle "$value\n";
$self−>{Value} = $value;
}
sub DESTROY {
my $self = shift;
my $handle = $self−>{FH};
print $handle "The End\n";
close $handle;
}
1;
Here is an example that makes use of this tie:
use strict;
use Remember;
my $fred;
tie $fred, ’Remember’, ’myfile.txt’;
$fred = 1;
$fred = 4;
$fred = 5;
untie $fred;
system "cat myfile.txt";
This is the output when it is executed:
The Start
1
4
5
The End
So far so good. Those of you who have been paying attention will have spotted that the tied object hasn't
been used so far. So let's add an extra method to the Remember class to allow comments to be included in
the file, say, something like this:
sub comment {
my $self = shift;
my $text = shift;
my $handle = $self−>{FH};
print $handle $text, "\n";
}
And here is the previous example modified to use the comment method (which requires the tied object):
use strict;
use Remember;
my ($fred, $x);
$x = tie $fred, ’Remember’, ’myfile.txt’;
$fred = 1;
$fred = 4;
comment $x "changing...";
$fred = 5;
untie $fred;
system "cat myfile.txt";
When this code is executed there is no output. Here‘s why:
When a variable is tied, it is associated with the object which is the return value of the TIESCALAR,
TIEARRAY, or TIEHASH function. This object normally has only one reference, namely, the implicit
reference from the tied variable. When untie() is called, that reference is destroyed. Then, as in the first
example above, the object‘s destructor (DESTROY) is called, which is normal for objects that have no more
valid references; and thus the file is closed.
In the second example, however, we have stored another reference to the tied object in $x. That means that
when untie() gets called there will still be a valid reference to the object in existence, so the destructor is
not called at that time, and thus the file is not closed. There is no output because the file buffers
have not been flushed to disk.
Now that you know what the problem is, what can you do to avoid it? Well, the good old −w flag will spot
any instances where you call untie() and there are still valid references to the tied object. If the second
script above is run with the −w flag, Perl prints this warning message:
untie attempted while 1 inner references still exist
To get the script to work properly and silence the warning, make sure there are no valid references to the tied
object before untie() is called:
undef $x;
untie $fred;
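The delayed destructor can be observed without any file at all. This sketch uses a hypothetical class Local::Flag whose DESTROY only counts its calls, and it deliberately performs the untie first to show the effect.

```perl
use strict;

package Local::Flag;              # hypothetical tie class
use vars qw($DESTROYED);
$DESTROYED = 0;

sub TIESCALAR { my $class = shift; my $v; return bless \$v, $class }
sub FETCH     { my $self = shift; return $$self }
sub STORE     { my $self = shift; return $$self = shift }
sub DESTROY   { $DESTROYED++ }

package main;

my $var;
my $obj = tie $var, 'Local::Flag';           # $obj is a second reference
$var = 1;

untie $var;
my $after_untie = $Local::Flag::DESTROYED;   # object still held by $obj

undef $obj;
my $after_undef = $Local::Flag::DESTROYED;   # last reference gone

print "after untie: $after_untie, after undef: $after_undef\n";
```

The destructor fires only once the extra reference is dropped, which is why clearing that reference before calling untie restores the expected behavior.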
SEE ALSO
See DB_File or Config for some interesting tie() implementations.
BUGS
Tied arrays are incomplete. They are also distinctly lacking something for the $#ARRAY access (which is
hard, as it‘s an lvalue), as well as the other obvious array functions, like push(), pop(), shift(),
unshift(), and splice().
You cannot easily tie a multilevel data structure (such as a hash of hashes) to a dbm file. The first problem is
that all but GDBM and Berkeley DB have size limitations, but beyond that, you also have problems with
how references are to be represented on disk. One experimental module that does attempt to address this
need partially is the MLDBM module. Check your nearest CPAN site as described in perlmodlib for source
code to MLDBM.
AUTHOR
Tom Christiansen
TIEHANDLE by Sven Verdoolaege [...]
package Foo;
sub new {
my $type = shift;
my %params = @_;
my $self = {};
$self->{'High'} = $params{'High'};
$self−>{’Low’} = $params{’Low’};
bless $self, $type;
}
package Bar;
sub new {
my $type = shift;
my %params = @_;
my $self = [];
$self−>[0] = $params{’Left’};
$self−>[1] = $params{’Right’};
bless $self, $type;
}
package main;
$a = Foo−>new( ’High’ => 42, ’Low’ => 11 );
print "High=$a−>{’High’}\n";
print "Low=$a−>{’Low’}\n";
$b = Bar−>new( ’Left’ => 78, ’Right’ => 40 );
print "Left=$b−>[0]\n";
print "Right=$b−>[1]\n";
SCALAR INSTANCE VARIABLES
An anonymous scalar can be used when only one instance variable is needed.
package Foo;
sub new {
my $type = shift;
my $self;
$self = shift;
bless \$self, $type;
}
package main;
$a = Foo−>new( 42 );
print "a=$$a\n";
INSTANCE VARIABLE INHERITANCE
This example demonstrates how one might inherit instance variables from a superclass for inclusion in the
new class. This requires calling the superclass‘s constructor and adding one‘s own instance variables to the
new object.
package Bar;
sub new {
my $type = shift;
my $self = {};
$self−>{’buz’} = 42;
bless $self, $type;
}
package Foo;
@ISA = qw( Bar );
sub new {
my $type = shift;
my $self = Bar−>new;
$self−>{’biz’} = 11;
bless $self, $type;
}
package main;
$a = Foo−>new;
print "buz = ", $a−>{’buz’}, "\n";
print "biz = ", $a−>{’biz’}, "\n";
OBJECT RELATIONSHIPS
The following demonstrates how one might implement "containing" and "using" relationships between
objects.
package Bar;
sub new {
my $type = shift;
my $self = {};
$self−>{’buz’} = 42;
bless $self, $type;
}
package Foo;
sub new {
my $type = shift;
my $self = {};
$self−>{’Bar’} = Bar−>new;
$self−>{’biz’} = 11;
bless $self, $type;
}
package main;
$a = Foo−>new;
print "buz = ", $a−>{’Bar’}−>{’buz’}, "\n";
print "biz = ", $a−>{’biz’}, "\n";
OVERRIDING SUPERCLASS METHODS
The following example demonstrates how to override a superclass method and then call the overridden
method. The SUPER pseudo−class allows the programmer to call an overridden superclass method without
actually knowing where that method is defined.
package Buz;
sub goo { print "here’s the goo\n" }
package Bar; @ISA = qw( Buz );
sub google { print "google here\n" }
package Baz;
sub mumble { print "mumbling\n" }
package Foo;
@ISA = qw( Bar Baz );
sub new {
my $type = shift;
bless [], $type;
}
sub grr { print "grumble\n" }
sub goo {
my $self = shift;
$self−>SUPER::goo();
}
sub mumble {
my $self = shift;
$self−>SUPER::mumble();
}
sub google {
my $self = shift;
$self−>SUPER::google();
}
package main;
$foo = Foo−>new;
$foo−>mumble;
$foo−>grr;
$foo−>goo;
$foo−>google;
USING RELATIONSHIP WITH SDBM
This example demonstrates an interface for the SDBM class. This creates a "using" relationship between the
SDBM class and the new class Mydbm.
package Mydbm;
require SDBM_File;
require Tie::Hash;
@ISA = qw( Tie::Hash );
sub TIEHASH {
my $type = shift;
my $ref = SDBM_File−>new(@_);
bless {’dbm’ => $ref}, $type;
}
sub FETCH {
my $self = shift;
my $ref = $self−>{’dbm’};
$ref−>FETCH(@_);
}
sub STORE {
my $self = shift;
if (defined $_[0]){
my $ref = $self−>{’dbm’};
$ref−>STORE(@_);
} else {
die "Cannot STORE an undefined key in Mydbm\n";
}
}
package main;
use Fcntl qw( O_RDWR O_CREAT );
tie %foo, "Mydbm", "Sdbm", O_RDWR|O_CREAT, 0640;
$foo{’bar’} = 123;
print "foo−bar = $foo{’bar’}\n";
tie %bar, "Mydbm", "Sdbm2", O_RDWR|O_CREAT, 0640;
$bar{’Cathy’} = 456;
print "bar−Cathy = $bar{’Cathy’}\n";
THINKING OF CODE REUSE
One strength of Object−Oriented languages is the ease with which old code can use new code. The
following examples will demonstrate first how one can hinder code reuse and then how one can promote
code reuse.
This first example illustrates a class which uses a fully−qualified method call to access the "private" method
BAZ(). The second example will show that it is impossible to override the BAZ() method.
package FOO;
sub new {
my $type = shift;
bless {}, $type;
}
sub bar {
my $self = shift;
$self−>FOO::private::BAZ;
}
package FOO::private;
sub BAZ {
print "in BAZ\n";
}
package main;
$a = FOO−>new;
$a−>bar;
Now we try to override the BAZ() method. We would like FOO::bar() to call GOOP::BAZ(), but this
cannot happen because FOO::bar() explicitly calls FOO::private::BAZ().
package FOO;
sub new {
my $type = shift;
bless {}, $type;
}
sub bar {
my $self = shift;
$self−>FOO::private::BAZ;
}
package FOO::private;
sub BAZ {
print "in BAZ\n";
}
package GOOP;
@ISA = qw( FOO );
sub new {
my $type = shift;
bless {}, $type;
}
sub BAZ {
print "in GOOP::BAZ\n";
}
package main;
$a = GOOP−>new;
$a−>bar;
To create reusable code we must modify class FOO, flattening class FOO::private. The next example shows
a reusable class FOO which allows the method GOOP::BAZ() to be used in place of FOO::BAZ().
package FOO;
sub new {
my $type = shift;
bless {}, $type;
}
sub bar {
my $self = shift;
$self−>BAZ;
}
sub BAZ {
print "in BAZ\n";
}
package GOOP;
@ISA = qw( FOO );
sub new {
my $type = shift;
bless {}, $type;
}
sub BAZ {
print "in GOOP::BAZ\n";
}
package main;
$a = GOOP−>new;
$a−>bar;
CLASS CONTEXT AND THE OBJECT
Use the object to solve package and class context problems. Everything a method needs should be available
via the object or should be passed as a parameter to the method.
A class will sometimes have static or global data to be used by the methods. A subclass may want to
override that data and replace it with new data. When this happens the superclass may not know how to find
the new copy of the data.
This problem can be solved by using the object to define the context of the method. Let the method look in
the object for a reference to the data. The alternative is to force the method to go hunting for the data ("Is it
in my class, or in a subclass? Which subclass?"), and this can be inconvenient and will lead to hackery. It is
better just to let the object tell the method where that data is located.
package Bar;
%fizzle = ( ’Password’ => ’XYZZY’ );
sub new {
my $type = shift;
my $self = {};
$self−>{’fizzle’} = \%fizzle;
bless $self, $type;
}
sub enter {
my $self = shift;
# Don’t try to guess if we should use %Bar::fizzle
# or %Foo::fizzle. The object already knows which
# we should use, so just ask it.
#
my $fizzle = $self−>{’fizzle’};
print "The word is ", $fizzle−>{’Password’}, "\n";
}
package Foo;
@ISA = qw( Bar );
%fizzle = ( ’Password’ => ’Rumple’ );
sub new {
my $type = shift;
my $self = Bar−>new;
$self−>{’fizzle’} = \%fizzle;
bless $self, $type;
}
package main;
$a = Bar−>new;
$b = Foo−>new;
$a−>enter;
$b−>enter;
INHERITING A CONSTRUCTOR
An inheritable constructor should use the second form of bless() which allows blessing directly into a
specified class. Notice in this example that the object will be a BAR not a FOO, even though the constructor
is in class FOO.
package FOO;
sub new {
my $type = shift;
my $self = {};
bless $self, $type;
}
sub baz {
print "in FOO::baz()\n";
}
package BAR;
@ISA = qw(FOO);
sub baz {
print "in BAR::baz()\n";
}
package main;
$a = BAR−>new;
$a−>baz;
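To make the difference between the two forms of bless() concrete, here is a small sketch that puts both in one class. The names Animal, Dog, new_fixed and new_generic are hypothetical and chosen only for illustration.

```perl
use strict;

package Animal;

# One-argument form: always blesses into the current package (Animal).
sub new_fixed {
    my $class = shift;
    my $self  = {};
    return bless $self;
}

# Two-argument form: blesses into whatever class the caller asked for.
sub new_generic {
    my $class = shift;
    my $self  = {};
    return bless $self, $class;
}

package Dog;
@Dog::ISA = ('Animal');

package main;
print ref(Dog->new_fixed),   "\n";   # prints "Animal"
print ref(Dog->new_generic), "\n";   # prints "Dog"
```

Only the second constructor is safe to inherit, because a subclass calling it gets an object of the subclass.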
DELEGATION
Some classes, such as SDBM_File, cannot be effectively subclassed because they create foreign objects.
Such a class can be extended with some sort of aggregation technique such as the "using" relationship
mentioned earlier or by delegation.
The following example demonstrates delegation using an AUTOLOAD() function to perform
message−forwarding. This will allow the Mydbm object to behave exactly like an SDBM_File object. The
Mydbm class could now extend the behavior by adding custom FETCH() and STORE() methods, if this is
desired.
package Mydbm;
require SDBM_File;
require Tie::Hash;
@ISA = qw(Tie::Hash);
sub TIEHASH {
my $type = shift;
my $ref = SDBM_File−>new(@_);
bless {’delegate’ => $ref};
}
sub AUTOLOAD {
my $self = shift;
# The Perl interpreter places the name of the
# message in a variable called $AUTOLOAD.
# DESTROY messages should never be propagated.
return if $AUTOLOAD =~ /::DESTROY$/;
# Remove the package name.
$AUTOLOAD =~ s/^Mydbm:://;
# Pass the message to the delegate.
$self−>{’delegate’}−>$AUTOLOAD(@_);
}
package main;
use Fcntl qw( O_RDWR O_CREAT );
tie %foo, "Mydbm", "adbm", O_RDWR|O_CREAT, 0640;
$foo{’bar’} = 123;
print "foo−bar = $foo{’bar’}\n";
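The same message-forwarding technique works for any object, not only a DBM handle. The sketch below uses two hypothetical classes, Counter and LoggingCounter, so that it runs without a DBM file; it also records each forwarded method name to show where extra behavior could be added.

```perl
use strict;

package Counter;
sub new       { my $class = shift; return bless { count => 0 }, $class }
sub increment { my $self = shift; return ++$self->{count} }
sub value     { my $self = shift; return $self->{count} }

package LoggingCounter;
use vars qw($AUTOLOAD);

sub new {
    my $class = shift;
    return bless { delegate => Counter->new, log => [] }, $class;
}

sub AUTOLOAD {
    my $self = shift;
    my $name = $AUTOLOAD;
    # DESTROY messages should never be propagated.
    return if $name =~ /::DESTROY$/;
    # Remove the package name.
    $name =~ s/^.*:://;
    # Extra behavior: remember which message was forwarded.
    push @{ $self->{log} }, $name;
    # Pass the message to the delegate.
    return $self->{delegate}->$name(@_);
}

package main;
my $c = LoggingCounter->new;
$c->increment;
$c->increment;
print "value = ", $c->value, "\n";              # prints "value = 2"
print "forwarded: @{ $c->{log} }\n";            # prints "forwarded: increment increment value"
```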
perldebug
Perl Programmers Reference Guide
perldebug
NAME
perldebug − Perl debugging
DESCRIPTION
First of all, have you tried using the −w switch?
The Perl Debugger
"As soon as we started programming, we found to our surprise that it wasn‘t as easy to get programs right as
we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a
large part of my life from then on was going to be spent in finding mistakes in my own programs."
—Maurice Wilkes, 1949
If you invoke Perl with the −d switch, your script runs under the Perl source debugger. This works like an
interactive Perl environment, prompting for debugger commands that let you examine source code, set
breakpoints, get stack backtraces, change the values of variables, etc. This is so convenient that you often
fire up the debugger all by itself just to test out Perl constructs interactively to see what they do. For
example:
perl −d −e 42
In Perl, the debugger is not a separate program as it usually is in the typical compiled environment. Instead,
the −d flag tells the compiler to insert source information into the parse trees it‘s about to hand off to the
interpreter. That means your code must first compile correctly for the debugger to work on it. Then when
the interpreter starts up, it preloads a Perl library file containing the debugger itself.
The program will halt right before the first run−time executable statement (but see below regarding
compile−time statements) and ask you to enter a debugger command. Contrary to popular expectations,
whenever the debugger halts and shows you a line of code, it always displays the line it‘s about to execute,
rather than the one it has just executed.
Any command not recognized by the debugger is directly executed (eval‘d) as Perl code in the current
package. (The debugger uses the DB package for its own state information.)
Leading white space before a command causes the debugger to treat it as Perl code rather than as a
debugger command, so be careful not to do that.
Debugger Commands
The debugger understands the following commands:
h [command]
Prints out a help message.
If you supply another debugger command as an argument to the h command, it prints out
the description for just that command. The special argument of h h produces a more
compact help listing, designed to fit together on one screen.
If the output of the h command (or any command, for that matter) scrolls past your screen,
precede the command with a leading pipe symbol so it's run through your pager, as
in
DB> |h
You may change the pager which is used via the O pager=... command.
p expr
Same as print {$DB::OUT} expr in the current package. In particular, because this
is just Perl‘s own print function, this means that nested data structures and objects are not
dumped, unlike with the x command.
The DB::OUT filehandle is opened to /dev/tty, regardless of where STDOUT may be
redirected to.
x expr
Evaluates its expression in list context and dumps out the result in a pretty−printed fashion.
Nested data structures are printed out recursively, unlike the print function.
The details of printout are governed by multiple Options.
V [pkg [vars]]
Display all (or some) variables in package (defaulting to the main package) using a data
pretty−printer (hashes show their keys and values so you see what‘s what, control
characters are made printable, etc.). Make sure you don‘t put the type specifier (like $)
there, just the symbol names, like this:
V DB filename line
Use ~pattern and !pattern for positive and negative regexps.
Nested data structures are printed out in a legible fashion, unlike the print function.
The details of printout are governed by multiple Options.
X [vars]
Same as V currentpackage [vars].
T
Produce a stack backtrace. See below for details on its output.
s [expr]
Single step. Executes until it reaches the beginning of another statement, descending into
subroutine calls. If an expression is supplied that includes function calls, it too will be
single−stepped.
n [expr]
Next. Executes over subroutine calls, until it reaches the beginning of the next statement.
If an expression is supplied that includes function calls, those functions will be executed
with stops before each statement.
<CR>
Repeat last n or s command.
c [line|sub]
Continue, optionally inserting a one−time−only breakpoint at the specified line or
subroutine.
l
List next window of lines.
l min+incr
List incr+1 lines starting at min.
l min−max
List lines min through max. l − is synonymous to −.
l line
List a single line.
l subname
List first window of lines from subroutine.
−
List previous window of lines.
w [line]
List window (a few lines) around the current line.
.
Return debugger pointer to the last−executed line and print it out.
f filename
Switch to viewing a different file or eval statement. If filename is not a full filename as
found in values of %INC, it is considered as a regexp.
/pattern/
Search forwards for pattern; final / is optional.
?pattern?
Search backwards for pattern; final ? is optional.
L
List all breakpoints and actions.
S [[!]pattern]
List subroutine names [not] matching pattern.
t
Toggle trace mode (see also AutoTrace Option).
t expr
Trace through execution of expr. For example:
$ perl −de 42
Stack dump during die enabled outside of evals.
Loading DB routines from perl5db.pl patch level 0.94
Emacs support available.
Enter h or ‘h h’ for help.
main::(−e:1):      0
DB<1> sub foo { 14 }
DB<2> sub bar { 3 }
DB<3> t print foo() * bar()
main::((eval 172):3):   print foo() * bar();
main::foo((eval 168):2):
main::bar((eval 170):2):
42
or, with the Option frame=2 set,
DB<4> O f=2
frame = ’2’
DB<5> t print foo() * bar()
3:      foo() * bar()
entering main::foo
 2:     sub foo { 14 };
exited main::foo
entering main::bar
 2:     sub bar { 3 };
exited main::bar
42
b [line] [condition]
Set a breakpoint. If line is omitted, sets a breakpoint on the line that is about to be
executed. If a condition is specified, it‘s evaluated each time the statement is reached and
a breakpoint is taken only if the condition is true. Breakpoints may be set only on lines
that begin an executable statement. Conditions don't use if:
b 237 $x > 30
b 237 ++$count237 < 11
b 33 /pattern/i
b subname [condition]
Set a breakpoint at the first line of the named subroutine.
b postpone subname [condition]
Set breakpoint at first line of subroutine after it is compiled.
b load filename
Set breakpoint at the first executed line of the file. Filename should be a full name as
found in values of %INC.
b compile subname
Sets breakpoint at the first statement executed after the subroutine is compiled.
d [line]
Delete a breakpoint at the specified line. If line is omitted, deletes the breakpoint on the
line that is about to be executed.
D
Delete all installed breakpoints.
a [line] command
Set an action to be done before the line is executed. The sequence of steps taken by the
debugger is
1. check for a breakpoint at this line
2. print the line if necessary (tracing)
3. do any actions associated with that line
4. prompt user if at a breakpoint or in single−step
5. evaluate line
For example, this will print out $foo every time line 53 is passed:
a 53 print "DB FOUND $foo\n"
A
Delete all installed actions.
W [expr]
Add a global watch−expression.
W
Delete all watch−expressions.
O [opt[=val]] [opt"val"] [opt?]...
Set or query values of options. val defaults to 1. opt can be abbreviated. Several options
can be listed.
recallCommand, ShellBang
The characters used to recall command or spawn shell. By default, these
are both set to !.
pager
Program to use for output of pager−piped commands (those beginning
with a | character.) By default, $ENV{PAGER} will be used.
tkRunning
Run Tk while prompting (with ReadLine).
signalLevel, warnLevel, dieLevel
Level of verbosity. By default the debugger is in a sane verbose mode,
thus it will print backtraces on all the warnings and die−messages which
are going to be printed out, and will print a message when interesting
uncaught signals arrive.
To disable this behaviour, set these values to 0. If dieLevel is 2, then
the messages which will be caught by surrounding eval are also
printed.
AutoTrace
Trace mode (similar to t command, but can be put into
PERLDB_OPTS).
LineInfo
File or pipe to print line number info to. If it is a pipe (say,
|visual_perl_db), then a short, "emacs like" message is used.
inhibit_exit
If 0, allows stepping off the end of the script.
PrintRet
affects printing of return value after r command.
ornaments
affects screen appearance of the command line (see Term::ReadLine).
frame
affects printing messages on entry and exit from subroutines. If frame
& 2 is false, messages are printed on entry only. (Printing on exit may
be useful if inter(di)spersed with other messages.)
If frame & 4, arguments to functions are printed as well as the context
and caller info. If frame & 8, overloaded stringify and tied
FETCH are enabled on the printed arguments. If frame & 16, the
return value from the subroutine is printed as well.
The length at which the argument list is truncated is governed by the
next option:
maxTraceLen
length at which the argument list is truncated when frame option's bit 4
is set.
The following options affect what happens with V, X, and x commands:
arrayDepth, hashDepth
Print only first N elements (‘’ for all).
compactDump, veryCompact
Change style of array and hash dump. If compactDump, short array
may be printed on one line.
globPrint
Whether to print contents of globs.
DumpDBFiles
Dump arrays holding debugged files.
DumpPackages
Dump symbol tables of packages.
DumpReused
Dump contents of "reused" addresses.
quote, HighBit, undefPrint
Change style of string dump. Default value of quote is auto, one can
enable either double−quotish dump, or single−quotish by setting it to "
or ’. By default, characters with high bit set are printed as is.
UsageOnly
A very rudimentary per−package memory usage dump. Calculates total
size of strings in variables in the package.
During startup options are initialized from $ENV{PERLDB_OPTS}. You can put
additional initialization options TTY, noTTY, ReadLine, and NonStop there.
Example rc file:
&parse_options("NonStop=1 LineInfo=db.out AutoTrace");
The script will run without human intervention, putting trace information into the file
db.out. (If you interrupt it, you had better reset LineInfo to something "interactive"!)
TTY
The TTY to use for debugging I/O.
noTTY
If set, goes into NonStop mode and does not connect to a TTY. If
interrupted (or if control goes to the debugger via explicit setting of
$DB::signal or $DB::single from the Perl script), connects to a
TTY specified by the TTY option at startup, or to a TTY found at
runtime using the Term::Rendezvous module of your choice.
This module should implement a method new which returns an object
with two methods: IN and OUT, returning two filehandles to use for
debugging input and output correspondingly. Method new may inspect
an argument which is a value of $ENV{PERLDB_NOTTY} at startup, or
is "/tmp/perldbtty$$" otherwise.
ReadLine
If false, readline support in debugger is disabled, so you can debug
ReadLine applications.
NonStop
If set, debugger goes into noninteractive mode until interrupted, or
programmatically by setting $DB::signal or $DB::single.
Here‘s an example of using the $ENV{PERLDB_OPTS} variable:
$ PERLDB_OPTS="N f=2" perl −d myprogram
will run the script myprogram without human intervention, printing out the call tree with
entry and exit points. Note that N f=2 is equivalent to NonStop=1 frame=2. Note
also that at the moment when this documentation was written all the options to the
debugger could be uniquely abbreviated by the first letter (with exception of Dump*
options).
Other examples may include
$ PERLDB_OPTS="N f A L=listing" perl −d myprogram
− runs script noninteractively, printing info on each entry into a subroutine and each
executed line into the file listing. (If you interrupt it, you had better reset LineInfo to
something "interactive"!)
$ env "PERLDB_OPTS=R=0 TTY=/dev/ttyc" perl −d myprogram
may be useful for debugging a program which uses Term::ReadLine itself. Do not
forget to detach the shell from the TTY in the window which corresponds to /dev/ttyc, say, by
issuing a command like
$ sleep 1000000
See "Debugger Internals" below for more details.
< [ command ]
Set an action (Perl command) to happen before every debugger prompt. A multi−line
command may be entered by backslashing the newlines. If command is missing, resets
the list of actions.
<< command
Add an action (Perl command) to happen before every debugger prompt. A multi−line
command may be entered by backslashing the newlines.
> command
Set an action (Perl command) to happen after the prompt when you‘ve just given a
command to return to executing the script. A multi−line command may be entered by
backslashing the newlines. If command is missing, resets the list of actions.
>> command
Adds an action (Perl command) to happen after the prompt when you‘ve just given a
command to return to executing the script. A multi−line command may be entered by
backslashing the newlines.
{ [ command ]
Set an action (debugger command) to happen before every debugger prompt. A multi−line
command may be entered by backslashing the newlines. If command is missing, resets
the list of actions.
{{ command
Add an action (debugger command) to happen before every debugger prompt. A
multi−line command may be entered by backslashing the newlines.
! number
Redo a previous command (default previous command).
! −number
Redo number‘th−to−last command.
! pattern
Redo last command that started with pattern. See O recallCommand, too.
!! cmd
Run cmd in a subprocess (reads from DB::IN, writes to DB::OUT) See O shellBang
too.
H −number
Display last n commands. Only commands longer than one character are listed. If number
is omitted, lists them all.
q or ^D
Quit. ("quit" doesn‘t work for this.) This is the only supported way to exit the debugger,
though typing exit twice may do it too.
Set the Option inhibit_exit to 0 if you want to be able to step off the end of the script.
You may also need to set $finished to 0 at some moment if you want to step through
global destruction.
R
Restart the debugger by execing a new session. It tries to maintain your history across this,
but internal settings and command line options may be lost.
Currently the following settings are preserved: history, breakpoints, actions, debugger
Options, and the following command line options: −w, −I, and −e.
|dbcmd
Run debugger command, piping DB::OUT to current pager.
||dbcmd
Same as |dbcmd but DB::OUT is temporarily selected as well. Often used with
commands that would otherwise produce long output, such as
|V main
= [alias value]
Define a command alias, like
= quit q
or list current aliases.
command
Execute command as a Perl statement. A missing semicolon will be supplied.
m expr
The expression is evaluated, and the methods which may be applied to the result are listed.
m package
The methods which may be applied to objects in the package are listed.
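The difference between the p and x commands described above comes down to printing a value versus dumping it. The sketch below shows that difference in an ordinary script. It uses Data::Dumper purely as a stand-in for the debugger's own pretty-printer, which is an assumption made for illustration; the variable names are hypothetical.

```perl
use strict;
use Data::Dumper;

my $config = { name => 'camel', legs => [ 1, 2, 3, 4 ] };

# What print (and therefore the p command) shows for a reference:
# only its type and address, something like HASH(0x...).
my $printed = "$config";
print "print gives: $printed\n";

# What a recursive dumper shows: the nested contents.
local $Data::Dumper::Sortkeys = 1;    # stable key order for readability
print "a dumper gives:\n", Dumper($config);
```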
Debugger input/output
Prompt
The debugger prompt is something like
DB<8>
or even
DB<<17>>
where that number is the command number, which you can use with the builtin csh−like
history mechanism; e.g., !17 would repeat command number 17. The number of angle brackets
indicates the depth of the debugger. You could get more than one set of brackets, for example, if
you're already at a breakpoint and then print out the result of a function call that itself also has
a breakpoint, or if you step into an expression via the s/n/t expression command.
Multiline commands
If you want to enter a multi−line command, such as a subroutine definition with several
statements, or a format, you may escape the newline that would normally end the debugger
command with a backslash. Here‘s an example:
DB<1> for (1..4) {         \
cont:     print "ok\n";   \
cont: }
ok
ok
ok
ok
Note that this business of escaping a newline is specific to interactive commands typed into the
debugger.
Stack backtrace
Here‘s an example of what a stack backtrace via T command might look like:
$ = main::infested called from file ‘Ambulation.pm’ line 10
@ = Ambulation::legs(1, 2, 3, 4) called from file ‘camel_flea’ line 7
$ = main::pests(’bactrian’, 4) called from file ‘camel_flea’ line 4
The left−hand character up there tells whether the function was called in a scalar or list context
(we bet you can tell which is which). What that says is that you were in the function
main::infested when you ran the stack dump, and that it was called in a scalar context
from line 10 of the file Ambulation.pm, but without any arguments at all, meaning it was called
as &infested. The next stack frame shows that the function Ambulation::legs was
called in a list context from the camel_flea file with four arguments. The last stack frame shows
that main::pests was called in a scalar context, also from camel_flea, but from line 4.
Note that if you execute the T command from inside an active use statement, the backtrace will
contain both a require frame and an eval frame.
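The information in such a backtrace is also available to ordinary Perl code through caller(). The following sketch builds lines in roughly the same shape as the T output above. The function names are borrowed from that example, backtrace() is a hypothetical helper, and marking void context with a dot is an assumption of this sketch rather than something the text above specifies.

```perl
use strict;

my @trace;

# Collect one line per active subroutine call, innermost first.
sub backtrace {
    my @lines;
    my $i = 1;    # skip the frame for backtrace() itself
    while (my ($pkg, $file, $line, $sub, $hasargs, $wantarray) = caller($i++)) {
        # $wantarray is true for list context, defined-but-false for
        # scalar context, and undef for void context.
        my $context = $wantarray ? '@' : defined $wantarray ? '$' : '.';
        push @lines, "$context = $sub called from file '$file' line $line";
    }
    return \@lines;
}

sub infested { @trace = @{ backtrace() }; return 1 }
sub legs     { my $count = infested(); return (1) x 4 }       # scalar call
sub pests    { my @legs = legs(1, 2, 3, 4); return scalar @legs }   # list call

my $n = pests('bactrian', 4);                                 # scalar call
print "$_\n" for @trace;
```

The three lines printed start with $, @ and $ respectively, matching the contexts in which infested, legs and pests were called.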
Listing
Listing given via different flavors of l command looks like this:
DB<<13>> l
101:                @i{@i} = ();
102:b               @isa{@i,$pack} = ()
103                     if(exists $i{$prevpack} || exists $isa{$pack});
104             }
105
106             next
107==>              if(exists $isa{$pack});
108
109:a           if ($extra−− > 0) {
110:                %isa = ($pack,1);
Note that the breakable lines are marked with :, lines with breakpoints are marked by b, with
actions by a, and the next executed line is marked by ==>.
Frame listing
When the frame option is set, the debugger prints entered (and optionally exited) subroutines in
different styles.
What follows is the start of the listing of
env "PERLDB_OPTS=f=n N" perl −d −V
for different values of n:
1
entering main::BEGIN
entering Config::BEGIN
Package lib/Exporter.pm.
Package lib/Carp.pm.
Package lib/Config.pm.
entering Config::TIEHASH
entering Exporter::import
entering Exporter::export
entering Config::myconfig
entering Config::FETCH
entering Config::FETCH
entering Config::FETCH
entering Config::FETCH
2
entering main::BEGIN
entering Config::BEGIN
Package lib/Exporter.pm.
Package lib/Carp.pm.
exited Config::BEGIN
Package lib/Config.pm.
entering Config::TIEHASH
exited Config::TIEHASH
entering Exporter::import
entering Exporter::export
exited Exporter::export
exited Exporter::import
exited main::BEGIN
entering Config::myconfig
entering Config::FETCH
exited Config::FETCH
entering Config::FETCH
exited Config::FETCH
entering Config::FETCH
4
in $=main::BEGIN() from /dev/nul:0
in $=Config::BEGIN() from lib/Config.pm:2
Package lib/Exporter.pm.
Package lib/Carp.pm.
Package lib/Config.pm.
in $=Config::TIEHASH(’Config’) from lib/Config.pm:644
in $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/
in $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f
in @=Config::myconfig() from /dev/nul:0
in $=Config::FETCH(ref(Config), ’package’) from lib/Config.pm:574
in $=Config::FETCH(ref(Config), ’baserev’) from lib/Config.pm:574
in $=Config::FETCH(ref(Config), ’PATCHLEVEL’) from lib/Config.pm:574
in $=Config::FETCH(ref(Config), ’SUBVERSION’) from lib/Config.pm:574
in $=Config::FETCH(ref(Config), ’osname’) from lib/Config.pm:574
in $=Config::FETCH(ref(Config), ’osvers’) from lib/Config.pm:574
6
in $=main::BEGIN() from /dev/nul:0
in $=Config::BEGIN() from lib/Config.pm:2
Package lib/Exporter.pm.
Package lib/Carp.pm.
out $=Config::BEGIN() from lib/Config.pm:0
Package lib/Config.pm.
in $=Config::TIEHASH(’Config’) from lib/Config.pm:644
out $=Config::TIEHASH(’Config’) from lib/Config.pm:644
in $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/
in $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f
out $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f
out $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/
out $=main::BEGIN() from /dev/nul:0
in @=Config::myconfig() from /dev/nul:0
in $=Config::FETCH(ref(Config), ’package’) from lib/Config.pm:574
out $=Config::FETCH(ref(Config), ’package’) from lib/Config.pm:574
in $=Config::FETCH(ref(Config), ’baserev’) from lib/Config.pm:574
out $=Config::FETCH(ref(Config), ’baserev’) from lib/Config.pm:574
in $=Config::FETCH(ref(Config), ’PATCHLEVEL’) from lib/Config.pm:574
out $=Config::FETCH(ref(Config), ’PATCHLEVEL’) from lib/Config.pm:574
in $=Config::FETCH(ref(Config), ’SUBVERSION’) from lib/Config.pm:574
14
in $=main::BEGIN() from /dev/nul:0
in $=Config::BEGIN() from lib/Config.pm:2
Package lib/Exporter.pm.
Package lib/Carp.pm.
out $=Config::BEGIN() from lib/Config.pm:0
Package lib/Config.pm.
in $=Config::TIEHASH(’Config’) from lib/Config.pm:644
out $=Config::TIEHASH(’Config’) from lib/Config.pm:644
in $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/
in $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f
out $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f
out $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/
out $=main::BEGIN() from /dev/nul:0
in @=Config::myconfig() from /dev/nul:0
in $=Config::FETCH(’Config=HASH(0x1aa444)’, ’package’) from lib/Confi
out $=Config::FETCH(’Config=HASH(0x1aa444)’, ’package’) from lib/Confi
in $=Config::FETCH(’Config=HASH(0x1aa444)’, ’baserev’) from lib/Confi
out $=Config::FETCH(’Config=HASH(0x1aa444)’, ’baserev’) from lib/Confi
30
in $=CODE(0x15eca4)() from /dev/null:0
in $=CODE(0x182528)() from lib/Config.pm:2
Package lib/Exporter.pm.
out $=CODE(0x182528)() from lib/Config.pm:0
scalar context return from CODE(0x182528): undef
Package lib/Config.pm.
in $=Config::TIEHASH(’Config’) from lib/Config.pm:628
out $=Config::TIEHASH(’Config’) from lib/Config.pm:628
scalar context return from Config::TIEHASH:
empty hash
in $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/
in $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f
out $=Exporter::export(’Config’, ’main’, ’myconfig’, ’config_vars’) f
scalar context return from Exporter::export: ’’
out $=Exporter::import(’Config’, ’myconfig’, ’config_vars’) from /dev/
scalar context return from Exporter::import: ’’
In all cases, the indentation of lines shows the call tree. If bit 2 of frame is set, a line is
also printed on exit from a subroutine. If bit 4 is set, the arguments are printed as well as
the caller info. If bit 8 is set, the arguments are printed even if they are tied or references. If bit 16
is set, the return value is printed as well.
When a package is compiled, a line like this
Package lib/Carp.pm.
is printed with proper indentation.
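Because frame is a bit mask, the values used in the listings above (1, 2, 4, 6, 14, 30) are sums of the individual bits. The sketch below decodes them with the same frame & N tests the option description uses. The helper describe_frame() and the wording of the descriptions are hypothetical; this is not code taken from the debugger.

```perl
use strict;

# Bits of the frame option and a short paraphrase of their effect.
my @bits = (
    [  2, 'also print a line on exit from each subroutine' ],
    [  4, 'print arguments, context and caller info' ],
    [  8, 'stringify overloaded or tied arguments' ],
    [ 16, 'print return values' ],
);

sub describe_frame {
    my ($frame) = @_;
    my @enabled = map { $_->[1] } grep { $frame & $_->[0] } @bits;
    return @enabled ? join('; ', @enabled) : 'entry messages only';
}

for my $frame (1, 2, 4, 6, 14, 30) {
    print "frame=$frame: ", describe_frame($frame), "\n";
}
```

For instance, 14 is 2 + 4 + 8, so it enables exit lines, argument printing and stringification, but not return values.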
Debugging compile−time statements
If you have any compile−time executable statements (code within a BEGIN block or a use statement), these
will NOT be stopped by debugger, although requires will (and compile−time statements can be traced
with AutoTrace option set in PERLDB_OPTS). From your own Perl code, however, you can transfer
control back to the debugger using the following statement, which is harmless if the debugger is not running:
$DB::single = 1;
If you set $DB::single to the value 2, it‘s equivalent to having just typed the n command, whereas a
value of 1 means the s command. The $DB::trace variable should be set to 1 to simulate having typed
the t command.
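A common use of $DB::single is to stop only when a particular condition arises, which saves stepping through uninteresting iterations. The following is a minimal sketch; the subroutine process() and the condition are hypothetical.

```perl
use strict;

sub process {
    my ($n) = @_;
    # Under perl -d this stops the debugger just before the next
    # statement, but only for the suspicious input. Without the
    # debugger the assignment has no visible effect.
    $DB::single = 1 if $n == 3;
    return $n * 2;
}

my @results = map { process($_) } 1 .. 4;
print "@results\n";    # prints "2 4 6 8"
```

Run normally, the script just prints its results; run with the −d switch and the c command, it should stop inside process() on the third call.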
Another way to debug compile−time code is to start debugger, set a breakpoint on load of some module
thusly
DB<7> b load f:/perllib/lib/Carp.pm
Will stop on load of ‘f:/perllib/lib/Carp.pm’.
and restart debugger by R command (if possible). One can use b compile subname for the same
purpose.
Debugger Customization
Most probably you do not want to modify the debugger: it contains enough hooks to satisfy most needs. You
may change the behaviour of the debugger from the debugger itself, using Options, from the command line via
the PERLDB_OPTS environment variable, and from customization files.
You can do some customization by setting up a .perldb file which contains initialization code. For instance,
you could make aliases like these (the last one is one people expect to be there):
$DB::alias{’len’}  = ’s/^len(.*)/p length($1)/’;
$DB::alias{’stop’} = ’s/^stop (at|in)/b/’;
$DB::alias{’ps’}   = ’s/^ps\b/p scalar /’;
$DB::alias{’quit’} = ’s/^quit(\s*)/exit\$/’;
One changes options from the .perldb file via calls like this one:
parse_options("NonStop=1 LineInfo=db.out AutoTrace=1 frame=2");
(the code is executed in the package DB). Note that .perldb is processed before processing PERLDB_OPTS.
If .perldb defines the subroutine afterinit, it is called after all the debugger initialization ends. .perldb
may be contained in the current directory, or in the LOGDIR/HOME directory.
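Putting these pieces together, a .perldb file might look like the sketch below. It is an illustration under stated assumptions, not a recommended configuration: the option value and the message are hypothetical, the package statement only makes explicit what the debugger already does, and the guard on parse_options is added here so that the file is harmless if it is ever loaded outside the debugger.

```perl
# Hypothetical .perldb file.
package DB;    # the debugger evaluates this file in package DB anyway

# Set an option at startup; skipped when the debugger is not loaded.
parse_options("frame=2") if defined &parse_options;

# Called by the debugger once all of its initialization has finished.
sub afterinit {
    print {$DB::OUT} "frame tracing enabled from .perldb\n";
}

1;
```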
If you want to modify the debugger, copy perl5db.pl from the Perl library to another name and modify it as
necessary. You‘ll also want to set your PERL5DB environment variable to say something like this:
BEGIN { require "myperl5db.pl" }
As the last resort, one can use PERL5DB to customize debugger by directly setting internal variables or
calling debugger functions.
Readline Support
As shipped, the only command line history supplied is a simplistic one that checks for leading exclamation
points. However, if you install the Term::ReadKey and Term::ReadLine modules from CPAN, you will
have full editing capabilities much like GNU readline(3) provides. Look for these in the
modules/by−module/Term directory on CPAN.
A rudimentary command line completion is also available. Unfortunately, the names of lexical variables are
not available for completion.
Editor Support for Debugging
If you have GNU emacs installed on your system, it can interact with the Perl debugger to provide an
integrated software development environment reminiscent of its interactions with C debuggers.
Perl is also delivered with a start file for making emacs act like a syntax−directed editor that understands
(some of) Perl‘s syntax. Look in the emacs directory of the Perl source distribution.
(Historically, a similar setup for interacting with vi and the X11 window system had also been available, but
at the time of this writing, no debugger support for vi currently exists.)
The Perl Profiler
If you wish to supply an alternative debugger for Perl to run, just invoke your script with a colon and a
package argument given to the −d flag. One of the most popular alternative debuggers for Perl is DProf, the
Perl profiler. As of this writing, DProf is not included with the standard Perl distribution, but it is expected
to be included soon, for certain values of "soon".
Meanwhile, you can fetch the Devel::DProf module from CPAN. Assuming it's properly installed on your
system, to profile your Perl program in the file mycode.pl, just type:
perl −d:DProf mycode.pl
When the script terminates the profiler will dump the profile information to a file called tmon.out. A tool
like dprofpp (also supplied with the Devel::DProf package) can be used to interpret the information which is
in that profile.
Debugger support in perl
When you call the caller function (see caller) from the package DB, Perl sets the array @DB::args to contain
the arguments the corresponding stack frame was called with.
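This behaviour of caller does not depend on the −d switch, so it can be tried in an ordinary script. The sketch below is only an illustration: the subroutine names args_of_caller and greet are made up for the example.

```perl
use strict;
use warnings;

# Return the arguments the calling subroutine was invoked with.
# caller() fills @DB::args only when it is called from package DB.
sub args_of_caller {
    package DB;
    no warnings 'once';          # @DB::args is named only once in this file
    my @frame = caller(1);       # frame of the sub that called args_of_caller
    return @frame ? @DB::args : ();
}

sub greet {
    my @seen = args_of_caller();
    return "greet was called with: @seen";
}

print greet('hello', 42), "\n";  # greet was called with: hello 42
```

The package DB statement is placed inside the subroutine body so that only the call to caller is compiled in that package.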
If perl is run with −d option, the following additional features are enabled (cf. $^P):
Perl inserts the contents of $ENV{PERL5DB} (or BEGIN {require ‘perl5db.pl‘} if not
present) before the first line of the application.
The array @{"_<$filename"} is the line−by−line contents of $filename for all the compiled
files. Same for evaled strings which contain subroutines, or which are currently executed. The
$filename for evaled strings looks like (eval 34).
The hash %{"_<$filename"} contains breakpoints and action (it is keyed by line number), and
individual entries are settable (as opposed to the whole hash). Only true/false is important to Perl,
though the values used by perl5db.pl have the form "$break_condition\0$action". Values
are magical in numeric context: they are zeros if the line is not breakable.
Same for evaluated strings which contain subroutines, or which are currently executed.
The $filename for evaled strings looks like (eval 34).
The scalar ${"_<$filename"} contains "_<$filename". Same for evaluated strings which
contain subroutines, or which are currently executed. The $filename for evaled strings looks like
(eval 34).
After each required file is compiled, but before it is executed,
DB::postponed(*{"_<$filename"}) is called (if subroutine DB::postponed exists).
Here the $filename is the expanded name of the required file (as found in values of %INC).
After each subroutine subname is compiled, the existence of $DB::postponed{subname} is
checked. If this key exists, DB::postponed(subname) is called (if subroutine
DB::postponed exists).
A hash %DB::sub is maintained, with keys being subroutine names, values having the form
filename:startline−endline. filename has the form (eval 31) for subroutines
defined inside evals.
When execution of the application reaches a place that can have a breakpoint, a call to DB::DB() is
performed if any one of variables $DB::trace, $DB::single, or $DB::signal is true. (Note
that these variables are not localizable.) This feature is disabled when the control is inside
DB::DB() or functions called from it (unless $^D & (1<<30)).
When execution of the application reaches a subroutine call, a call to &DB::sub(args) is
performed instead, with $DB::sub being the name of the called subroutine. (Unless the subroutine is
compiled in the package DB.)
Note that if &DB::sub needs some external data to be setup for it to work, no subroutine call is possible
until this is done. For the standard debugger $DB::deep (how many levels of recursion deep into the
debugger you can go before a mandatory break) gives an example of such a dependency.
The minimal working debugger consists of one line
sub DB::DB {}
which is quite handy as contents of PERL5DB environment variable:
env "PERL5DB=sub DB::DB {}" perl −d your−script
Another (a little bit more useful) minimal debugger can be created with the only line being
sub DB::DB {print ++$i; scalar <STDIN>}
This debugger would print the sequential number of encountered statement, and would wait for your CR to
continue.
The following debugger is quite functional:
{
package DB;
sub DB {}
sub sub {print ++$i, " $sub\n"; &$sub}
}
It prints the sequential number of subroutine call and the name of the called subroutine.
Note that &DB::sub should be compiled into the package DB.
Debugger Internals
At the start, the debugger reads your rc file (./.perldb or ~/.perldb under Unix), which can set important
options. This file may define a subroutine &afterinit to be executed after the debugger is initialized.
After the rc file is read, the debugger reads the environment variable PERLDB_OPTS and parses it as the rest
of an O ... line typed at the debugger prompt.
It also maintains magical internal variables, such as @DB::dbline, %DB::dbline, which are aliases for
@{"::_(13)
13: CURLYX {1,32767}(27)
15:   OPEN1(17)
17:     EXACT (19)
19:     STAR(22)
20:       EXACT (0)
22:     EXACT (24)
24:   CLOSE1(26)
26:   WHILEM(0)
27: NOTHING(28)
28: EXACT (30)
30: ANYOF(40)
40: EXACT (42)
42: EOL(43)
43: END(0)
anchored ‘de’ at 1 floating ‘gh’ at 3..2147483647 (checking floating)
stclass ‘ANYOF’ minlen 7
The first line shows the pre−compiled form of the regexp, and the second shows the size of the compiled
form (in arbitrary units, usually 4−byte words) and the label id of the first node which does a match.
The last line (split into two lines in the above) contains the optimizer info. In the example shown, the
optimizer found that the match should contain a substring de at the offset 1, and substring gh at some offset
between 3 and infinity. Moreover, when checking for these substrings (to abandon impossible matches
quickly) it will check for the substring gh before checking for the substring de. The optimizer may also use
the knowledge that the match starts (at the first id) with a character class, and the match cannot be shorter
than 7 chars.
The fields of interest which may appear in the last line are
anchored STRING at POS
floating STRING at POS1..POS2
see above;
matching floating/anchored
which substring to check first;
minlen
the minimal length of the match;
stclass TYPE
The type of the first matching node.
noscan
which advises to not scan for the found substrings;
isall
which says that the optimizer info is in fact all that the regular expression contains (thus one does not
need to enter the RE engine at all);
GPOS
if the pattern contains \G;
plus
if the pattern starts with a repeated char (as in x+y);
implicit
if the pattern starts with .*;
with eval
if the pattern contains eval-groups (see (?{ code }));
anchored(TYPE)
if the pattern may match only at a handful of places (with TYPE being BOL, MBOL, or GPOS, see the
table below).
If a substring is known to match at end−of−line only, it may be followed by $, as in floating ‘k‘$.
The optimizer-specific info is used to avoid entering the (slow) RE engine on strings which will definitely not
match. If the isall flag is set, a call to the RE engine may be avoided even when the optimizer has found an
appropriate place for the match.
The rest of the output contains the list of nodes of the compiled form of the RE. Each line has format
id: TYPE OPTIONAL−INFO (next−id)
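One way to get a dump of this kind for your own pattern is the re pragma's debug mode, which writes the compiled form and the run-time trace to STDERR. The following is a minimal sketch using the pattern and target string from this section; the exact text of the dump differs between Perl versions, so do not expect it to match the listing above line for line.

```perl
use strict;
use warnings;

my $matched;
{
    # The pragma is lexically scoped: only regexps compiled inside
    # this block are dumped (to STDERR).
    use re 'debug';
    $matched = ('abcdefg__gh__' =~ /[bc]d(ef*g)+h[ij]k$/) ? 1 : 0;
}

# The target string contains no "h" followed by "i" or "j", so the
# match fails.
print "matched: $matched\n";
```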
Types of nodes
Here is the list of possible types with short descriptions:
# TYPE arg-description [num-args] [longjump-len] DESCRIPTION

# Exit points
END         no        End of program.
SUCCEED     no        Return from a subroutine, basically.

# Anchors:
BOL         no        Match "" at beginning of line.
MBOL        no        Same, assuming multiline.
SBOL        no        Same, assuming singleline.
EOS         no        Match "" at end of string.
EOL         no        Match "" at end of line.
MEOL        no        Same, assuming multiline.
SEOL        no        Same, assuming singleline.
BOUND       no        Match "" at any word boundary
BOUNDL      no        Match "" at any word boundary
NBOUND      no        Match "" at any word non-boundary
NBOUNDL     no        Match "" at any word non-boundary
GPOS        no        Matches where last m//g left off.

# [Special] alternatives
ANY         no        Match any one character (except newline).
SANY        no        Match any one character.
ANYOF       sv        Match character in (or not in) this class.
ALNUM       no        Match any alphanumeric character
ALNUML      no        Match any alphanumeric char in locale
NALNUM      no        Match any non-alphanumeric character
NALNUML     no        Match any non-alphanumeric char in locale
SPACE       no        Match any whitespace character
SPACEL      no        Match any whitespace char in locale
NSPACE      no        Match any non-whitespace character
NSPACEL     no        Match any non-whitespace char in locale
DIGIT       no        Match any numeric character
NDIGIT      no        Match any non-numeric character

# BRANCH    The set of branches constituting a single choice are hooked
#           together with their "next" pointers, since precedence prevents
#           anything being concatenated to any individual branch. The
#           "next" pointer of the last BRANCH in a choice points to the
#           thing following the whole choice. This is also where the
#           final "next" pointer of each individual branch points; each
#           branch starts with the operand node of a BRANCH node.
#
BRANCH      node      Match this alternative, or the next...

# BACK      Normal "next" pointers all implicitly point forward; BACK
#           exists to make loop structures possible.
# not used
BACK        no        Match "", "next" ptr points backward.

# Literals
EXACT       sv        Match this string (preceded by length).
EXACTF      sv        Match this string, folded (prec. by length).
EXACTFL     sv        Match this string, folded in locale (w/len).

# Do nothing
NOTHING     no        Match empty string.
# A variant of above which delimits a group, thus stops optimizations
TAIL        no        Match empty string. Can jump here from outside.

# STAR,PLUS '?', and complex '*' and '+', are implemented as circular
#           BRANCH structures using BACK. Simple cases (one character
#           per match) are implemented with STAR and PLUS for speed
#           and to minimize recursive plunges.
#
STAR        node      Match this (simple) thing 0 or more times.
PLUS        node      Match this (simple) thing 1 or more times.

CURLY       sv 2      Match this simple thing {n,m} times.
CURLYN      no 2      Match next-after-this simple thing
#                     {n,m} times, set parenths.
CURLYM      no 2      Match this medium-complex thing {n,m} times.
CURLYX      sv 2      Match this complex thing {n,m} times.

# This terminator creates a loop structure for CURLYX
WHILEM      no        Do curly processing and see if rest matches.

# OPEN,CLOSE,GROUPP ...are numbered at compile time.
OPEN        num 1     Mark this point in input as start of #n.
CLOSE       num 1     Analogous to OPEN.

REF         num 1     Match some already matched string
REFF        num 1     Match already matched string, folded
REFFL       num 1     Match already matched string, folded in loc.

# grouping assertions
IFMATCH     off 1 2   Succeeds if the following matches.
UNLESSM     off 1 2   Fails if the following matches.
SUSPEND     off 1 1   "Independent" sub-RE.
IFTHEN      off 1 1   Switch, should be preceded by switcher.
GROUPP      num 1     Whether the group matched.

# Support for long RE
LONGJMP     off 1 1   Jump far away.
BRANCHJ     off 1 1   BRANCH with long offset.

# The heavy worker
EVAL        evl 1     Execute some Perl code.

# Modifiers
MINMOD      no        Next operator is not greedy.
LOGICAL     no        Next opcode should set the flag only.

# This is not used yet
RENUM       off 1 1   Group with independently numbered parens.

# This is not really a node, but an optimized away piece of a "long" node.
# To simplify debugging output, we mark it as if it were a node
OPTIMIZED   off       Placeholder for dump.
Run−time output
First of all, when doing a match, one may get no run-time output even if debugging is enabled. This means
that the RE engine was never entered: the whole job was done by the optimizer.
If the RE engine was entered, the output may look like this:
Matching ‘[bc]d(ef*g)+h[ij]k$’ against ‘abcdefg__gh__’
Setting an EVAL scope, savestack=3
2 | 1: ANYOF
3 | 11: EXACT
4 | 13: CURLYX {1,32767}
4 | 26: WHILEM
    0 out of 1..32767 cc=effff31c
4 | 15: OPEN1
4 | 17: EXACT
5 | 19: STAR
    EXACT can match 1 times out of 32767...
Setting an EVAL scope, savestack=3
6 | 22: EXACT
7 <__gh__> | 24: CLOSE1
7 <__gh__> | 26: WHILEM
    1 out of 1..32767 cc=effff31c
Setting an EVAL scope, savestack=12
7 <__gh__> | 15: OPEN1
7 <__gh__> | 17: EXACT
    restoring \1 to 4(4)..7
    failed, try continuation...
7 <__gh__> | 27: NOTHING
7 <__gh__> | 28: EXACT
    failed...
    failed...
The most significant information in the output is about the particular node of the compiled RE which is
currently being tested against the target string. The format of these lines is
STRING−OFFSET [12]−>{"susie"}
%s argument is not a HASH element or slice
(F) The argument to delete() must be either a hash element, such as
$foo{$bar}
$ref−>[12]−>{"susie"}
or a hash slice, such as
@foo{$bar, $baz, $xyzzy}
@{$ref−>[12]}{"susie", "queue"}
%s did not return a true value
(F) A required (or used) file must return a true value to indicate that it compiled correctly and ran its
initialization code correctly. It‘s traditional to end such a file with a "1;", though any true value would
do. See require.
%s found where operator expected
(S) The Perl lexer knows whether to expect a term or an operator. If it sees what it knows to be a term
when it was expecting to see an operator, it gives you this warning. Usually it indicates that an
operator or delimiter was omitted, such as a semicolon.
%s had compilation errors
(F) The final summary message when a perl −c fails.
%s has too many errors
(F) The parser has given up trying to parse the program after 10 errors. Further error messages would
likely be uninformative.
%s matches null string many times
(W) The pattern you‘ve specified would be an infinite loop if the regular expression engine didn‘t
specifically check for that. See perlre.
%s never introduced
(S) The symbol in question was declared but somehow went out of scope before it could possibly have
been used.
%s syntax OK
(F) The final summary message when a perl −c succeeds.
%s: Command not found
(A) You‘ve accidentally run your script through csh instead of Perl. Check the #! line, or manually
feed your script into Perl yourself.
%s: Expression syntax
(A) You‘ve accidentally run your script through csh instead of Perl. Check the #! line, or manually
feed your script into Perl yourself.
%s: Undefined variable
(A) You‘ve accidentally run your script through csh instead of Perl. Check the #! line, or manually
feed your script into Perl yourself.
%s: not found
(A) You‘ve accidentally run your script through the Bourne shell instead of Perl. Check the #! line, or
manually feed your script into Perl yourself.
(Missing semicolon on previous line?)
(S) This is an educated guess made in conjunction with the message "%s found where operator
expected". Don‘t automatically put a semicolon on the previous line just because you saw this
message.
−P not allowed for setuid/setgid script
(F) The script would have to be opened by the C preprocessor by name, which provides a race
condition that breaks security.
−T and −B not implemented on filehandles
(F) Perl can‘t peek at the stdio buffer of filehandles when it doesn‘t know about your kind of stdio.
You‘ll have to use a filename instead.
−p destination: %s
(F) An error occurred during the implicit output invoked by the −p command−line switch. (This
output goes to STDOUT unless you‘ve redirected it with select().)
500 Server error
See Server error.
?+* follows nothing in regexp
(F) You started a regular expression with a quantifier. Backslash it if you meant it literally. See
perlre.
@ outside of string
(F) You had a pack template that specified an absolute position outside the string being unpacked. See
pack.
accept() on closed fd
(W) You tried to do an accept on a closed socket. Did you forget to check the return value of your
socket() call? See accept.
Allocation too large: %lx
(X) You can‘t allocate more than 64K on an MS−DOS machine.
Applying %s to %s will act on scalar(%s)
(W) The pattern match (//), substitution (s///), and transliteration (tr///) operators work on scalar values.
If you apply one of them to an array or a hash, it will convert the array or hash to a scalar value — the
length of an array, or the population info of a hash — and then work on that scalar value. This is
probably not what you meant to do. See grep and map for alternatives.
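If what you wanted was to test every element of an array against a pattern, grep does that directly. A small sketch with made-up data:

```perl
use strict;
use warnings;

my @names = qw(alice bob carol);

# In list context grep returns the matching elements...
my @with_o = grep { /o/ } @names;

# ...and in scalar context it returns how many elements matched.
my $count = grep { /o/ } @names;

print "matching elements: @with_o\n";   # matching elements: bob carol
print "number of matches: $count\n";    # number of matches: 2
```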
Arg too short for msgsnd
(F) msgsnd() requires a string at least as long as sizeof(long).
Ambiguous use of %s resolved as %s
(W)(S) You said something that may not be interpreted the way you thought. Normally it‘s pretty easy
to disambiguate it by supplying a missing quote, operator, parenthesis pair or declaration.
Ambiguous call resolved as CORE::%s(), qualify as such or use &
(W) A subroutine you have declared has the same name as a Perl keyword, and you have used the
name without qualification for calling one or the other. Perl decided to call the builtin because the
subroutine is not imported.
To force interpretation as a subroutine call, either put an ampersand before the subroutine name, or
qualify the name with its package. Alternatively, you can import the subroutine (or pretend that it‘s
imported with the use subs pragma).
To silently interpret it as the Perl operator, use the CORE:: prefix on the operator (e.g.
CORE::log($x)) or declare the subroutine to be an object method (see attrs).
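The three unambiguous spellings can be compared side by side. In this sketch the user-defined log subroutine is hypothetical and exists only to collide with the builtin:

```perl
use strict;
use warnings;

# A user subroutine that shares its name with the builtin log operator.
sub log { return "custom log(@_)" }

my $via_ampersand = &log('x');        # user subroutine, forced with &
my $via_package   = main::log('x');   # user subroutine, package-qualified
my $builtin       = CORE::log(1);     # builtin natural logarithm

print "$via_ampersand\n";   # custom log(x)
print "$via_package\n";     # custom log(x)
print "$builtin\n";         # 0
```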
Args must match #! line
(F) The setuid emulator requires that the arguments Perl was invoked with match the arguments
specified on the #! line. Since some systems impose a one−argument limit on the #! line, try
combining switches; for example, turn −w −U into −wU.
Argument "%s" isn‘t numeric%s
(W) The indicated string was fed as an argument to an operator that expected a numeric value instead.
If you‘re fortunate the message will identify which operator was so unfortunate.
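When input comes from outside the program, one way to avoid this warning is to check each value before using it as a number. The sketch below assumes the Scalar::Util module is available (it is bundled with later Perl releases); the input values are made up.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my @inputs = ('42', '3.5e2', 'abc', '12abc');
my $sum    = 0;
my @rejected;

for my $input (@inputs) {
    if (looks_like_number($input)) {
        $sum += $input;              # safe: no "isn't numeric" warning
    }
    else {
        push @rejected, $input;      # would have triggered the warning
    }
}

print "sum of numeric inputs: $sum\n";   # sum of numeric inputs: 392
print "rejected: @rejected\n";           # rejected: abc 12abc
```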
Array @%s missing the @ in argument %d of %s()
(D) Really old Perl let you omit the @ on array names in some spots. This is now heavily deprecated.
assertion botched: %s
(P) The malloc package that comes with Perl had an internal failure.
Assertion failed: file "%s"
(P) A general assertion failed. The file in question must be examined.
Assignment to both a list and a scalar
(F) If you assign to a conditional operator, the 2nd and 3rd arguments must either both be scalars or
both be lists. Otherwise Perl won‘t know which context to supply to the right side.
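For comparison, here is a sketch of an assignment to a conditional operator that Perl accepts, because both alternatives are scalars. The variable names are illustrative only.

```perl
use strict;
use warnings;

my ($width, $height) = (10, 20);
my $use_width = 1;

# Both the 2nd and 3rd arguments are scalars, so the right side is
# evaluated in scalar context and assigned to whichever one is chosen.
($use_width ? $width : $height) = 99;

print "$width $height\n";   # 99 20
```

Writing ($use_width ? $width : @sizes) on the left side instead would mix a scalar with a list and produce the error above.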
Attempt to free non−arena SV: 0x%lx
(P) All SV objects are supposed to be allocated from arenas that will be garbage collected on exit. An
SV was discovered to be outside any of those arenas.
Attempt to free nonexistent shared string
(P) Perl maintains a reference counted internal table of strings to optimize the storage and access of
hash keys and other strings. This indicates someone tried to decrement the reference count of a string
that can no longer be found in the table.
Attempt to free temp prematurely
(W) Mortalized values are supposed to be freed by the free_tmps() routine. This indicates that
something else is freeing the SV before the free_tmps() routine gets a chance, which means that
the free_tmps() routine will be freeing an unreferenced scalar when it does try to free it.
Attempt to free unreferenced glob pointers
(P) The reference counts got screwed up on symbol aliases.
Attempt to free unreferenced scalar
(W) Perl went to decrement the reference count of a scalar to see if it would go to 0, and discovered
that it had already gone to 0 earlier, and should have been freed, and in fact, probably was freed. This
could indicate that SvREFCNT_dec() was called too many times, or that SvREFCNT_inc() was
called too few times, or that the SV was mortalized when it shouldn‘t have been, or that memory has
been corrupted.
Attempt to pack pointer to temporary value
(W) You tried to pass a temporary value (like the result of a function, or a computed expression) to the
"p" pack() template. This means the result contains a pointer to a location that could become invalid
anytime, even before the end of the current statement. Use literals or global values as arguments to the
"p" pack() template to avoid this warning.
Attempt to use reference as lvalue in substr
(W) You supplied a reference as the first argument to substr() used as an lvalue, which is pretty
strange. Perhaps you forgot to dereference it first. See substr.
Bad arg length for %s, is %d, should be %d
(F) You passed a buffer of the wrong size to one of msgctl(), semctl() or shmctl(). In C
parlance, the correct sizes are, respectively, sizeof(struct msqid_ds *), sizeof(struct semid_ds *), and
sizeof(struct shmid_ds *).
Bad filehandle: %s
(F) A symbol was passed to something wanting a filehandle, but the symbol has no filehandle
associated with it. Perhaps you didn‘t do an open(), or did it in another package.
Bad free() ignored
(S) An internal routine called free() on something that had never been malloc()ed in the first
place. Mandatory, but can be disabled by setting environment variable PERL_BADFREE to 1.
This message is seen quite often with DB_File on systems with "hard" dynamic linking, like AIX
and OS/2. It is a bug in Berkeley DB which goes unnoticed if DB uses a forgiving system
malloc().
Bad hash
(P) One of the internal hash routines was passed a null HV pointer.
Bad index while coercing array into hash
(F) The index looked up in the hash found as the 0'th element of a pseudo-hash is not legal. Index
values must be 1 or greater. See perlref.
Bad name after %s::
(F) You started to name a symbol by using a package prefix, and then didn‘t finish the symbol. In
particular, you can‘t interpolate outside of quotes, so
$var = ’myvar’;
$sym = mypack::$var;
is not the same as
$var = ’myvar’;
$sym = "mypack::$var";
Bad symbol for array
(P) An internal request asked to add an array entry to something that wasn‘t a symbol table entry.
Bad symbol for filehandle
(P) An internal request asked to add a filehandle entry to something that wasn‘t a symbol table entry.
Bad symbol for hash
(P) An internal request asked to add a hash entry to something that wasn‘t a symbol table entry.
Badly placed ()‘s
(A) You‘ve accidentally run your script through csh instead of Perl. Check the #! line, or manually
feed your script into Perl yourself.
Bareword "%s" not allowed while "strict subs" in use
(F) With "strict subs" in use, a bareword is only allowed as a subroutine identifier, in curly braces or to
the left of the "=" symbol. Perhaps you need to predeclare a subroutine?
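Predeclaring can be as small as a forward declaration placed before the first use. A minimal sketch, with a made-up subroutine name:

```perl
use strict;
use warnings;

sub greeting;                    # forward declaration

# With the declaration in place, the bareword is parsed as a call
# instead of being rejected by "strict subs".
my $text = greeting;

sub greeting { return 'hello' }

print "$text\n";   # hello
```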
Bareword "%s" refers to nonexistent package
(W) You used a qualified bareword of the form Foo::, but the compiler saw no other uses of that
namespace before that point. Perhaps you need to predeclare a package?
BEGIN failed—compilation aborted
(F) An untrapped exception was raised while executing a BEGIN subroutine. Compilation stops
immediately and the interpreter is exited.
BEGIN not safe after errors—compilation aborted
(F) Perl found a BEGIN {} subroutine (or a use directive, which implies a BEGIN {}) after one or
more compilation errors had already occurred. Since the intended environment for the BEGIN {}
could not be guaranteed (due to the errors), and since subsequent code likely depends on its correct
operation, Perl just gave up.
bind() on closed fd
(W) You tried to do a bind on a closed socket. Did you forget to check the return value of your
socket() call? See bind.
Bizarre copy of %s in %s
(P) Perl detected an attempt to copy an internal value that is not copiable.
Callback called exit
(F) A subroutine invoked from an external package via perl_call_sv() exited by calling exit.
Can‘t "goto" outside a block
(F) A "goto" statement was executed to jump out of what might look like a block, except that it isn‘t a
proper block. This usually occurs if you tried to jump out of a sort() block or subroutine, which is a
no−no. See goto.
Can‘t "goto" into the middle of a foreach loop
(F) A "goto" statement was executed to jump into the middle of a foreach loop. You can‘t get there
from here. See goto.
Can‘t "last" outside a block
(F) A "last" statement was executed to break out of the current block, except that there‘s this itty bitty
problem called there isn‘t a current block. Note that an "if" or "else" block doesn‘t count as a
"loopish" block, as doesn‘t a block given to sort(). You can usually double the curlies to get the
same effect though, because the inner curlies will be considered a block that loops once. See last.
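The doubled-curlies trick can be sketched like this; the variable names and values are illustrative only.

```perl
use strict;
use warnings;

my @log;
my $n = 5;

if ($n > 0) {{
    push @log, 'entered';
    last if $n > 3;              # leaves the inner bare block only
    push @log, 'not reached';
}}
push @log, 'after';

print "@log\n";   # entered after
```

The inner pair of braces is a bare block, which Perl treats as a loop that executes once, so last has something to leave.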
Can‘t "next" outside a block
(F) A "next" statement was executed to reiterate the current block, but there isn‘t a current block. Note
that an "if" or "else" block doesn‘t count as a "loopish" block, as doesn‘t a block given to sort().
You can usually double the curlies to get the same effect though, because the inner curlies will be
considered a block that loops once. See next.
Can‘t "redo" outside a block
(F) A "redo" statement was executed to restart the current block, but there isn‘t a current block. Note
that an "if" or "else" block doesn‘t count as a "loopish" block, as doesn‘t a block given to sort().
You can usually double the curlies to get the same effect though, because the inner curlies will be
considered a block that loops once. See redo.
Can‘t bless non−reference value
(F) Only hard references may be blessed. This is how Perl "enforces" encapsulation of objects. See
perlobj.
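A short sketch of both cases follows. The class name Counter is hypothetical, and eval is used only so that the fatal error can be shown without ending the script.

```perl
use strict;
use warnings;

# Blessing a hard reference works as usual.
my $counter = bless { count => 0 }, 'Counter';
print ref($counter), "\n";   # Counter

# Blessing a plain string is fatal; eval traps the error for display.
my $ok    = eval { bless 'just a string', 'Counter'; 1 };
my $error = $ok ? '' : $@;
print "caught: $error";
```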
Can‘t break at that line
(S) A warning intended to only be printed while running within the debugger, indicating the line
number specified wasn‘t the location of a statement that could be stopped at.
Can‘t call method "%s" in empty package "%s"
(F) You called a method correctly, and it correctly indicated a package functioning as a class, but that
package doesn‘t have ANYTHING defined in it, let alone methods. See perlobj.
Can‘t call method "%s" on unblessed reference
(F) A method call must know in what package it‘s supposed to run. It ordinarily finds this out from the
object reference you supply, but you didn‘t supply an object reference in this case. A reference isn‘t an
object reference until it has been blessed. See perlobj.
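If a value may or may not be an object, you can test it before calling a method on it. This sketch assumes the Scalar::Util module is available (it is bundled with later Perl releases); the Greeter class is made up for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

{
    package Greeter;                       # hypothetical example class
    sub new   { return bless {}, shift }
    sub hello { return 'hello' }
}

my @results;
for my $thing (Greeter->new, {}) {
    # blessed() returns the class name for objects, undef otherwise.
    push @results, blessed($thing) ? $thing->hello : 'not an object';
}
print join(', ', @results), "\n";   # hello, not an object

# Calling a method on the plain hash reference raises the fatal error.
my $plain = {};
my $ok    = eval { $plain->hello; 1 };
my $error = $ok ? '' : $@;
print "caught: $error";
```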
Can‘t call method "%s" without a package or object reference
(F) You used the syntax of a method call, but the slot filled by the object reference or package name
contains an expression that returns a defined value which is neither an object reference nor a package
name. Something like this will reproduce the error:
$BADREF = 42;
process $BADREF 1,2,3;
$BADREF−>process(1,2,3);
Can‘t call method "%s" on an undefined value
(F) You used the syntax of a method call, but the slot filled by the object reference or package name
contains an undefined value. Something like this will reproduce the error:
$BADREF = undef;
process $BADREF 1,2,3;
$BADREF−>process(1,2,3);
Can‘t chdir to %s
(F) You called perl −x/foo/bar, but /foo/bar is not a directory that you can chdir to, possibly
because it doesn‘t exist.
Can‘t coerce %s to integer in %s
(F) Certain types of SVs, in particular real symbol table entries (typeglobs), can‘t be forced to stop
being what they are. So you can‘t say things like:
*foo += 1;
You CAN say
$foo = *foo;
$foo += 1;
but then $foo no longer contains a glob.
Can‘t coerce %s to number in %s
(F) Certain types of SVs, in particular real symbol table entries (typeglobs), can‘t be forced to stop
being what they are.
Can‘t coerce %s to string in %s
(F) Certain types of SVs, in particular real symbol table entries (typeglobs), can‘t be forced to stop
being what they are.
Can‘t coerce array into hash
(F) You used an array where a hash was expected, but the array has no information on how to map
from keys to array indices. You can do that only with arrays that have a hash reference at index 0.
Can‘t create pipe mailbox
(P) An error peculiar to VMS. The process is suffering from exhausted quotas or other plumbing
problems.
Can‘t declare %s in my
(F) Only scalar, array, and hash variables may be declared as lexical variables. They must have
ordinary identifiers as names.
Can‘t do inplace edit on %s: %s
(S) The creation of the new file failed for the indicated reason.
Can‘t do inplace edit without backup
(F) You‘re on a system such as MS−DOS that gets confused if you try reading from a deleted (but still
opened) file. You have to say −i.bak, or some such.
Can‘t do inplace edit: %s > 14 characters
(S) There isn‘t enough room in the filename to make a backup name for the file.
Can‘t do inplace edit: %s is not a regular file
(S) You tried to use the −i switch on a special file, such as a file in /dev, or a FIFO. The file was
ignored.
Can‘t do setegid!
(P) The setegid() call failed for some reason in the setuid emulator of suidperl.
Can‘t do seteuid!
(P) The setuid emulator of suidperl failed for some reason.
Can‘t do setuid
(F) This typically means that ordinary perl tried to exec suidperl to do setuid emulation, but couldn‘t
exec it. It looks for a name of the form sperl5.000 in the same directory where the perl executable resides
under the name perl5.000, typically /usr/local/bin on Unix machines. If the file is there, check the
execute permissions. If it isn‘t, ask your sysadmin why he and/or she removed it.
Can‘t do waitpid with flags
(F) This machine doesn‘t have either waitpid() or wait4(), so only waitpid() without flags
is emulated.
Can‘t do {n,m} with n > m
(F) Minima must be less than or equal to maxima. If you really want your regexp to match something
0 times, just put {0}. See perlre.
Can‘t emulate −%s on #! line
(F) The #! line specifies a switch that doesn‘t make sense at this point. For example, it‘d be kind of
silly to put a −x on the #! line.
Can‘t exec "%s": %s
(W) A system(), exec(), or piped open call could not execute the named program for the
indicated reason. Typical reasons include: the permissions were wrong on the file, the file wasn‘t
found in $ENV{PATH}, the executable in question was compiled for another architecture, or the #!
line in a script points to an interpreter that can‘t be run for similar reasons. (Or maybe your system
doesn‘t support #! at all.)
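Since this is only a warning, the program carries on, so it is worth checking the return value as well. A sketch for a Unix-like system, using a program name that is assumed not to exist:

```perl
use strict;
use warnings;

# Hypothetical command that is assumed not to exist on this system.
my @command = ('no-such-program-for-demo');

# The block form bypasses the shell; system() returns -1 when the
# program could not be started at all, and $! holds the reason.
my $status = system { $command[0] } @command;

if ($status == -1) {
    print "could not start $command[0]: $!\n";
}
```

With warnings enabled, the "Can't exec" message appears on STDERR in addition to the line printed here.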
Can‘t exec %s
(F) Perl was trying to execute the indicated program for you because that‘s what the #! line said. If
that‘s not what you wanted, you may need to mention "perl" on the #! line somewhere.
Can‘t execute %s
(F) You used the −S switch, but the copies of the script to execute found in the PATH did not have
correct permissions.
Can‘t find %s on PATH, ’.’ not in PATH
(F) You used the −S switch, but the script to execute could not be found in the PATH, or at least not
with the correct permissions. The script exists in the current directory, but PATH prohibits running it.
Can‘t find %s on PATH
(F) You used the −S switch, but the script to execute could not be found in the PATH.
Can‘t find label %s
(F) You said to goto a label that isn‘t mentioned anywhere that it‘s possible for us to go to. See goto.
Can‘t find string terminator %s anywhere before EOF
(F) Perl strings can stretch over multiple lines. This message means that the closing delimiter was
omitted. Because bracketed quotes count nesting levels, the following is missing its final parenthesis:
print q(The character '(' starts a side comment.);
If you‘re getting this error from a here−document, you may have included unseen whitespace before
or after your closing tag. A good programmer‘s editor will have a way to help you find these
characters.
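As a rough sketch of how to avoid this error, the following shows a q() string whose parentheses balance, one where the stray parenthesis is backslashed, and a here-document whose closing tag sits alone on its line. The variable names and the END_TEXT tag are made up for illustration.

```perl
use strict;

# q() counts nesting, so paired parentheses inside it are harmless.
my $balanced = q(A parenthesis pair like (this) nests fine.);

# A lone opening parenthesis has to be backslashed to keep the count right.
my $escaped = q(The character '\(' starts a side comment.);

# The closing tag of a here-document must appear alone on its line,
# with no whitespace before or after it.
my $text = <<"END_TEXT";
first line
second line
END_TEXT

print "$balanced\n$escaped\n$text";
```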
Can‘t fork
(F) A fatal error occurred while trying to fork while opening a pipeline.
Can‘t get filespec − stale stat buffer?
(S) A warning peculiar to VMS. This arises because of the difference between access checks under
VMS and under the Unix model Perl assumes. Under VMS, access checks are done by filename,
rather than by bits in the stat buffer, so that ACLs and other protections can be taken into account.
Unfortunately, Perl assumes that the stat buffer contains all the necessary information, and passes it,
instead of the filespec, to the access checking routine. It will try to retrieve the filespec using the
device name and FID present in the stat buffer, but this works only if you haven‘t made a subsequent
call to the CRTL stat() routine, because the device name is overwritten with each call. If this
warning appears, the name lookup failed, and the access checking routine gave up and returned
FALSE, just to be conservative. (Note: The access checking routine knows about the Perl stat
operator and file tests, so you shouldn‘t ever see this warning in response to a Perl command; it arises
only if some internal code takes stat buffers lightly.)
Can‘t get pipe mailbox device name
(P) An error peculiar to VMS. After creating a mailbox to act as a pipe, Perl can‘t retrieve its name for
later use.
Can‘t get SYSGEN parameter value for MAXBUF
(P) An error peculiar to VMS. Perl asked $GETSYI how big you want your mailbox buffers to be,
and didn‘t get an answer.
Can‘t goto subroutine outside a subroutine
(F) The deeply magical "goto subroutine" call can only replace one subroutine call for another. It can‘t
manufacture one out of whole cloth. In general you should be calling it out of only an AUTOLOAD
routine anyway. See goto.
Can‘t goto subroutine from an eval−string
(F) The "goto subroutine" call can‘t be used to jump out of an eval "string". (You can use it to jump
out of an eval {BLOCK}, but you probably don‘t want to.)
Can‘t localize through a reference
(F) You said something like local $$ref, which Perl can‘t currently handle, because when it goes
to restore the old value of whatever $ref pointed to after the scope of the local() is finished, it
can‘t be sure that $ref will still be a reference.
Can‘t localize lexical variable %s
(F) You used local on a variable name that was previously declared as a lexical variable using "my".
This is not allowed. If you want to localize a package variable of the same name, qualify it with the
package name.
Can‘t localize pseudo−hash element
(F) You said something like local $ar->{'key'}, where $ar is a reference to a pseudo−hash.
That hasn‘t been implemented yet, but you can get a similar effect by localizing the corresponding
array element directly: local $ar->[$ar->[0]{'key'}].
Can‘t locate auto/%s.al in @INC
(F) A function (or method) was called in a package which allows autoload, but there is no function to
autoload. Most probable causes are a misprint in a function/method name or a failure to AutoSplit
the file, say, by doing make install.
Can‘t locate %s in @INC
(F) You said to do (or require, or use) a file that couldn‘t be found in any of the libraries mentioned in
@INC. Perhaps you need to set the PERL5LIB or PERL5OPT environment variable to say where the
extra library is, or maybe the script needs to add the library name to @INC. Or maybe you just
misspelled the name of the file. See require.
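One way a script can add a library directory to @INC, and test whether a module is available before relying on it, might look like the sketch below. The directory ./mylib and the module name No::Such::Module::Hopefully are placeholders chosen for illustration, not anything that ships with Perl.

```perl
use strict;

# "use lib" prepends the (hypothetical) directory to @INC at compile time.
use lib './mylib';

# Wrapping require in eval turns the fatal error into a testable condition.
my $module = 'No::Such::Module::Hopefully';    # placeholder name
(my $file = "$module.pm") =~ s{::}{/}g;        # No/Such/Module/Hopefully.pm

my $loaded = eval { require $file; 1 };
my $err    = $@;

print $loaded ? "loaded $module\n" : "could not load $module: $err";
```

Setting PERL5LIB in the environment has a similar effect to use lib without editing the script.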
Can‘t locate object method "%s" via package "%s"
(F) You called a method correctly, and it correctly indicated a package functioning as a class, but that
package doesn‘t define that particular method, nor do any of its base classes. See perlobj.
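If a method may or may not exist, the call can be guarded with can(), which every object inherits from UNIVERSAL. The Counter class and the method names below are invented for this sketch.

```perl
use strict;

package Counter;                       # hypothetical class
sub new  { my $class = shift; return bless { n => 0 }, $class }
sub incr { my $self  = shift; return ++$self->{n} }

package main;

my $c = Counter->new;
$c->incr;

# can() returns a code reference if the class (or a base class) defines
# the method, and a false value otherwise.
my @report;
for my $method (qw(incr clear)) {
    push @report, $c->can($method) ? "has $method" : "lacks $method";
}
print join(", ", @report), "\n";

# Calling the missing method anyway raises the fatal error; eval traps it.
my $ok  = eval { $c->clear; 1 };
my $err = $@;
```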
Can‘t locate package %s for @%s::ISA
(W) The @ISA array contained the name of another package that doesn‘t seem to exist.
Can‘t make list assignment to \%ENV on this system
(F) List assignment to %ENV is not supported on some systems, notably VMS.
Can‘t modify %s in %s
(F) You aren‘t allowed to assign to the item indicated, or otherwise try to change it, such as with an
auto−increment.
Can‘t modify nonexistent substring
(P) The internal routine that does assignment to a substr() was handed a NULL.
Can‘t msgrcv to read−only var
(F) The target of a msgrcv must be modifiable to be used as a receive buffer.
Can‘t open %s: %s
(S) The implicit opening of a file through use of the <> filehandle, either implicitly under the −n or −p
command−line switches, or explicitly, failed for the indicated reason. Usually this is because you
don‘t have read permission for a file which you named on the command line.
Can‘t open bidirectional pipe
(W) You tried to say open(CMD, "|cmd|"), which is not supported. You can try any of several
modules in the Perl library to do this, such as IPC::Open2. Alternately, direct the pipe‘s output to a file
using ">", and then read it in under a different file handle.
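A minimal sketch of the IPC::Open2 approach follows. It assumes a second copy of perl (found via $^X) as a stand-in child that upper-cases its input; any filter program could take its place. Note that talking to a child in both directions can deadlock if each side waits for the other, so this example sends all of its input and closes the write handle before reading.

```perl
use strict;
use IPC::Open2;

# open2() takes the read handle first, then the write handle, then the command.
my $pid = open2(my $from_child, my $to_child, $^X, '-pe', '$_ = uc');

print $to_child "hello\n";
close $to_child;                 # send EOF so the child can finish

my $reply = <$from_child>;
close $from_child;
waitpid $pid, 0;                 # reap the child

print $reply;
```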
Can‘t open error file %s as stderr
(F) An error peculiar to VMS. Perl does its own command line redirection, and couldn‘t open the file
specified after ‘2>’ or ‘2>>’ on the command line for writing.
Can‘t open input file %s as stdin
(F) An error peculiar to VMS. Perl does its own command line redirection, and couldn‘t open the file
specified after ‘<’ on the command line for reading.
Can‘t open output file %s as stdout
(F) An error peculiar to VMS. Perl does its own command line redirection, and couldn‘t open the file
specified after ‘>’ or ‘>>’ on the command line for writing.
Can‘t open output pipe (name: %s)
(P) An error peculiar to VMS. Perl does its own command line redirection, and couldn‘t open the pipe
into which to send data destined for stdout.
Can‘t open perl script "%s": %s
(F) The script you specified can‘t be opened for the indicated reason.
Can‘t redefine active sort subroutine %s
(F) Perl optimizes the internal handling of sort subroutines and keeps pointers into them. You tried to
redefine one such sort subroutine when it was currently active, which is not allowed. If you really
want to do this, you should write sort { &func } @x instead of sort func @x.
Can‘t rename %s to %s: %s, skipping file
(S) The rename done by the −i switch failed for some reason, probably because you don‘t have write
permission to the directory.
Can‘t reopen input pipe (name: %s) in binary mode
(P) An error peculiar to VMS. Perl thought stdin was a pipe, and tried to reopen it to accept binary
data. Alas, it failed.
Can‘t reswap uid and euid
(P) The setreuid() call failed for some reason in the setuid emulator of suidperl.
Can‘t return outside a subroutine
(F) The return statement was executed in mainline code, that is, where there was no subroutine call to
return out of. See perlsub.
Can‘t stat script "%s"
(P) For some reason you can‘t fstat() the script even though you have it open already. Bizarre.
Can‘t swap uid and euid
(P) The setreuid() call failed for some reason in the setuid emulator of suidperl.
Can‘t take log of %g
(F) For ordinary real numbers, you can‘t take the logarithm of a negative number or zero. There‘s a
Math::Complex package that comes standard with Perl, though, if you really want to do that for the
negative numbers.
Can‘t take sqrt of %g
(F) For ordinary real numbers, you can‘t take the square root of a negative number. There‘s a
Math::Complex package that comes standard with Perl, though, if you really want to do that.
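The two usual ways out can be sketched as follows: either load Math::Complex so that sqrt() and log() may return complex values, or check the argument first. The helper safe_log is hypothetical and exists only for this illustration.

```perl
use strict;
use Math::Complex;     # exports its own sqrt() and log(), plus Re() and Im()

# With Math::Complex loaded, the square root of a negative number is a
# purely imaginary value rather than a fatal error.
my $root = sqrt(-4);
print "real part ", Re($root), ", imaginary part ", Im($root), "\n";

# For code that wants real results only, test the argument before calling
# the builtin. safe_log() is a made-up helper that returns undef instead
# of dying.
sub safe_log {
    my ($x) = @_;
    return $x > 0 ? CORE::log($x) : undef;
}
```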
Can‘t undef active subroutine
(F) You can‘t undefine a routine that‘s currently running. You can, however, redefine it while it‘s
running, and you can even undef the redefined subroutine while the old routine is running. Go figure.
Can‘t unshift
(F) You tried to unshift an "unreal" array that can‘t be unshifted, such as the main Perl stack.
Can‘t upgrade that kind of scalar
(P) The internal sv_upgrade routine adds "members" to an SV, making it into a more specialized kind
of SV. The top several SV types are so specialized, however, that they cannot be interconverted. This
message indicates that such a conversion was attempted.
Can‘t upgrade to undef
(P) The undefined SV is the bottom of the totem pole, in the scheme of upgradability. Upgrading to
undef indicates an error in the code calling sv_upgrade.
Can‘t use %%! because Errno.pm is not available
(F) The first time the %! hash is used, perl automatically loads the Errno.pm module. The Errno
module is expected to tie the %! hash to provide symbolic names for $! errno values.
Can‘t use "my %s" in sort comparison
(F) The global variables $a and $b are reserved for sort comparisons. You mentioned $a or $b in the
same line as the <=> or cmp operator, and the variable had earlier been declared as a lexical variable.
Either qualify the sort variable with the package name, or rename the lexical variable.
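Both remedies might look like this in practice. The variable names are arbitrary, and the sketch assumes the code lives in package main, so the qualified sort variables are $::a and $::b.

```perl
use strict;

# Remedy 1: rename the lexical so it no longer hides sort's global $a.
my $first  = 3;
my @sorted = sort { $a <=> $b } (10, $first, 7);

# Remedy 2: keep the lexical, but qualify the sort variables with the
# package name ($::a is shorthand for $main::a).
my @again;
{
    my $a = 'a lexical that hides the global $a';
    @again = sort { $::a <=> $::b } (10, 3, 7);
}

print "@sorted\n@again\n";
```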
Can‘t use %s for loop variable
(F) Only a simple scalar variable may be used as a loop variable on a foreach.
Can‘t use %s ref as %s ref
(F) You‘ve mixed up your reference types. You have to dereference a reference of the type needed.
You can use the ref() function to test the type of the reference, if need be.
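As an illustration of testing with ref() before dereferencing, the hypothetical helper describe() below picks the dereference that matches the reference type instead of guessing.

```perl
use strict;

# describe() is a made-up helper that dispatches on the string ref() returns.
sub describe {
    my ($thing) = @_;
    my $type = ref $thing;
    return "array of " . scalar(@$thing) . " elements"  if $type eq 'ARRAY';
    return "hash with " . scalar(keys %$thing) . " keys" if $type eq 'HASH';
    return "code returning " . $thing->()                if $type eq 'CODE';
    return "scalar holding $$thing"                      if $type eq 'SCALAR';
    return "not a reference";
}

print describe([1, 2, 3]), "\n";
print describe({ a => 1 }), "\n";

# Dereferencing with the wrong type is the fatal error; eval traps it here.
my $hash_ref = { a => 1 };
my $ok  = eval { my @list = @$hash_ref; 1 };
my $err = $@;
```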
Can‘t use \1 to mean $1 in expression
(W) In an ordinary expression, backslash is a unary operator that creates a reference to its argument.
The use of backslash to indicate a backreference to a matched substring is valid only as part of a
regular expression pattern. Trying to do this in ordinary Perl code produces a value that prints out
looking like SCALAR(0xdecaf). Use the $1 form instead.
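A short sketch of the distinction, using an invented sample string: inside the pattern \1 is a backreference, while in the replacement part and in ordinary code the captured text is $1.

```perl
use strict;

my $text = "hello hello world";

# \1 inside the pattern matches whatever the first group matched;
# the replacement is ordinary Perl, so it uses $1.
(my $fixed = $text) =~ s/\b(\w+) \1\b/$1/;
print "$fixed\n";

# Outside a pattern, \1 is simply a reference to the constant 1.
my $ref = \1;
print ref($ref), "\n";
```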
Can‘t use bareword ("%s") as %s ref while "strict refs" in use
(F) Only hard references are allowed by "strict refs". Symbolic references are disallowed. See perlref.
Can‘t use string ("%s") as %s ref while "strict refs" in use
(F) Only hard references are allowed by "strict refs". Symbolic references are disallowed. See perlref.
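Code that calls a subroutine by name usually wants a table of hard references instead. The subroutine names and the run_command() helper below are invented for this sketch.

```perl
use strict;

sub start { return "starting" }
sub stop  { return "stopping" }

# A hash of code references replaces the symbolic lookup.
my %dispatch = (start => \&start, stop => \&stop);

sub run_command {
    my ($name) = @_;
    my $code = $dispatch{$name} or return "unknown command: $name";
    return $code->();
}

print run_command('start'), "\n";

# Using the string itself as a subroutine reference is what strict refs forbids.
my $name = 'start';
my $ok   = eval { &$name(); 1 };
my $err  = $@;
```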
Can‘t use an undefined value as %s reference
(F) A value used as either a hard reference or a symbolic reference must be a defined value. This helps
to delurk some insidious errors.
Can‘t use global %s in "my"
(F) You tried to declare a magical variable as a lexical variable. This is not allowed, because the
magic can be tied to only one location (namely the global variable) and it would be incredibly
confusing to have variables in your program that looked like magical variables but weren‘t.
Can‘t use subscript on %s
(F) The compiler tried to interpret a bracketed expression as a subscript. But to the left of the brackets
was an expression that didn‘t look like an array reference, or anything else subscriptable.
Can‘t x= to read−only value
(F) You tried to repeat a constant value (often the undefined value) with an assignment operator, which
implies modifying the value itself. Perhaps you need to copy the value to a temporary, and repeat that.
Cannot find an opnumber for "%s"
(F) A string of a form CORE::word was given to prototype(), but there is no builtin with the
name word.
Cannot resolve method ‘%s’ overloading ‘%s’ in package ‘%s’
(F|P) Error resolving overloading specified by a method name (as opposed to a subroutine reference):
no such method callable via the package. If the method name is ???, this is an internal error.
Character class syntax [. .] is reserved for future extensions
(W) Within regular expression character classes ([]) the syntax beginning with "[." and ending with ".]"
is reserved for future extensions. If you need to represent those character sequences inside a regular
expression character class, just quote the square brackets with the backslash: "\[." and ".\]".
Character class syntax [: :] is reserved for future extensions
(W) Within regular expression character classes ([]) the syntax beginning with "[:" and ending with
":]" is reserved for future extensions. If you need to represent those character sequences inside a
regular expression character class, just quote the square brackets with the backslash: "\[:" and ":\]".
Character class syntax [= =] is reserved for future extensions
(W) Within regular expression character classes ([]) the syntax beginning with "[=" and ending with
"=]" is reserved for future extensions. If you need to represent those character sequences inside a
regular expression character class, just quote the square brackets with the backslash: "\[=" and "=\]".
chmod: mode argument is missing initial 0
(W) A novice will sometimes say
chmod 777, $filename
not realizing that 777 will be interpreted as a decimal number, equivalent to 01411. Octal constants
are introduced with a leading 0 in Perl, as in C.
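The difference can be checked with printf's %o conversion, as in this sketch. It only prints the values and does not call chmod on any file; the variable names are arbitrary.

```perl
use strict;

my $decimal = 777;      # a decimal literal
my $octal   = 0777;     # the leading 0 makes this an octal literal

# %o prints a number in octal, which shows what chmod would actually see.
printf "%o %o\n", $decimal, $octal;

# A mode that arrives as a string (say, from a config file) is not a
# literal, so it needs oct() to be read as octal.
my $mode = oct("755");
printf "%04o\n", $mode;
```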
Close on unopened file <%s>
(W) You tried to close a filehandle that was never opened.
Compilation failed in require
(F) Perl could not compile a file specified in a require statement. Perl uses this generic message
when none of the errors that it encountered were severe enough to halt compilation immediately.
Complex regular subexpression recursion limit (%d) exceeded
(W) The regular expression engine uses recursion in complex situations where back−tracking is
required. Recursion depth is limited to 32766, or perhaps less in architectures where the stack cannot
grow arbitrarily. ("Simple" and "medium" situations are handled without recursion and are not subject
to a limit.) Try shortening the string under examination; looping in Perl code (e.g. with while) rather
than in the regular expression engine; or rewriting the regular expression so that it is simpler or
backtracks less. (See perlbook for information on Mastering Regular Expressions.)
connect() on closed fd
(W) You tried to do a connect on a closed socket. Did you forget to check the return value of your
socket() call? See connect.
Constant subroutine %s redefined
(S) You redefined a subroutine which had previously been eligible for inlining. See
Constant Functions in perlsub for commentary and workarounds.
Constant subroutine %s undefined
(S) You undefined a subroutine which had previously been eligible for inlining. See
Constant Functions in perlsub for commentary and workarounds.
Copy method did not return a reference
(F) The method which overloads "=" is buggy. See Copy Constructor.
Corrupt malloc ptr 0x%lx at 0x%lx
(P) The malloc package that comes with Perl had an internal failure.
corrupted regexp pointers
(P) The regular expression engine got confused by what the regular expression compiler gave it.
corrupted regexp program
(P) The regular expression engine got passed a regexp program without a valid magic number.
Deep recursion on subroutine "%s"
(W) This subroutine has called itself (directly or indirectly) 100 times more than it has returned. This
probably indicates an infinite recursion, unless you‘re writing strange benchmark programs, in which
case it indicates something else.
Delimiter for here document is too long
(F) In a here document construct like <<FOO, the label FOO is too long for Perl to handle.
[...] or "+>>" instead of with "<" or nothing. If you intended only to
write the file, use ">" or ">>". See open.
Filehandle opened for only input
(W) You tried to write on a read−only filehandle. If you intended it to be a read−write filehandle, you
needed to open it with "+<" or "+>" or "+>>" instead of with "<" or nothing. If you intended only to
write the file, use ">" or ">>". See open.
Final $ should be \$ or $name
(F) You must now decide whether the final $ in a string was meant to be a literal dollar sign, or was
meant to introduce a variable name that happens to be missing. So you have to put either the backslash
or the name.
Final @ should be \@ or @name
(F) You must now decide whether the final @ in a string was meant to be a literal "at" sign, or was
meant to introduce a variable name that happens to be missing. So you have to put either the backslash
or the name.
Format %s redefined
(W) You redefined a format. To suppress this warning, say
{
local $^W = 0;
eval "format NAME =...";
}
Format not terminated
(F) A format must be terminated by a line with a solitary dot. Perl got to the end of your file without
finding such a line.
Found = in conditional, should be ==
(W) You said
if ($foo = 123)
when you meant
if ($foo == 123)
(or something like that).
gdbm store returned %d, errno %d, key "%s"
(S) A warning from the GDBM_File extension that a store failed.
gethostent not implemented
(F) Your C library apparently doesn‘t implement gethostent(), probably because if it did, it‘d feel
morally obligated to return every hostname on the Internet.
get{sock,peer}name() on closed fd
(W) You tried to get a socket or peer socket name on a closed socket. Did you forget to check the
return value of your socket() call?
getpwnam returned invalid UIC %#o for user "%s"
(S) A warning peculiar to VMS. The call to sys$getuai underlying the getpwnam operator
returned an invalid UIC.
Glob not terminated
(F) The lexer saw a left angle bracket in a place where it was expecting a term, so it‘s looking for the
corresponding right angle bracket, and not finding it. Chances are you left some needed parentheses
out earlier in the line, and you really meant a "less than".
Global symbol "%s" requires explicit package name
(F) You‘ve said "use strict vars", which indicates that all variables must either be lexically scoped
(using "my"), or explicitly qualified to say which package the global variable is in (using "::").
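The three ways of satisfying "use strict vars" can be sketched like this. The variable names are arbitrary, and the undeclared variable is compiled inside a string eval only so that the resulting error can be shown without stopping the script.

```perl
use strict;
use vars qw($verbose);     # predeclares the package global $main::verbose

my $count = 0;             # lexically scoped with "my"
$verbose  = 1;             # allowed: declared through "use vars"
$main::debug = 0;          # allowed: explicitly qualified with "::"

# An undeclared, unqualified name fails at compile time.
my $ok  = eval q{ $undeclared = 1; 1 };
my $err = $@;
print "compile failed: $err" unless $ok;
```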
goto must have label
(F) Unlike with "next" or "last", you‘re not allowed to goto an unspecified destination. See goto.
Had to create %s unexpectedly
(S) A routine asked for a symbol from a symbol table that ought to have existed already, but for some
reason it didn‘t, and had to be created on an emergency basis to prevent a core dump.
Hash %%s missing the % in argument %d of %s()
(D) Really old Perl let you omit the % on hash names in some spots. This is now heavily deprecated.
Identifier too long
(F) Perl limits identifiers (names for variables, functions, etc.) to about 250 characters for simple
names, and somewhat more for compound names (like $A::B). You‘ve exceeded Perl‘s limits.
Future versions of Perl are likely to eliminate these arbitrary limitations.
Ill−formed logical name |%s| in prime_env_iter
(W) A warning peculiar to VMS. A logical name was encountered when preparing to iterate over
%ENV which violates the syntactic rules governing logical names. Because it cannot be translated
normally, it is skipped, and will not appear in %ENV. This may be a benign occurrence, as some
software packages might directly modify logical name tables and introduce nonstandard names, or it
may indicate that a logical name table has been corrupted.
Illegal character %s (carriage return)
(F) A carriage return character was found in the input. This is an error, and not a warning, because
carriage return characters can break multi−line strings, including here documents (e.g., print
<<EOF;).
[...] This may mean
that your csh (C shell) is broken. If so, you should change all of the csh−related variables in config.sh:
If you have tcsh, make the variables refer to it as if it were csh (e.g.
full_csh=‘/usr/bin/tcsh’); otherwise, make them all empty (except that d_csh should be
‘undef’) so that Perl will think csh is missing. In either case, after editing config.sh, run
./Configure −S and rebuild Perl.
internal urp in regexp at /%s/
(P) Something went badly awry in the regular expression parser.
invalid [] range in regexp
(F) The range specified in a character class had a minimum character greater than the maximum
character. See perlre.
Invalid conversion in %s: "%s"
(W) Perl does not understand the given format conversion. See sprintf.
Invalid type in pack: ‘%s’
(F) The given character is not a valid pack type. See pack.
(W) The given character is not a valid pack type but used to be silently ignored.
Invalid type in unpack: ‘%s’
(F) The given character is not a valid unpack type. See unpack.
(W) The given character is not a valid unpack type but used to be silently ignored.
ioctl is not implemented
(F) Your machine apparently doesn‘t implement ioctl(), which is pretty strange for a machine that
supports C.
junk on end of regexp
(P) The regular expression parser is confused.
Label not found for "last %s"
(F) You named a loop to break out of, but you‘re not currently in a loop of that name, not even if you
count where you were called from. See last.
Label not found for "next %s"
(F) You named a loop to continue, but you‘re not currently in a loop of that name, not even if you
count where you were called from. See next.
Label not found for "redo %s"
(F) You named a loop to restart, but you‘re not currently in a loop of that name, not even if you count
where you were called from. See redo.
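For reference, a labeled loop that last and next can legitimately name might look like the following sketch. The label OUTER and the loop bounds are arbitrary.

```perl
use strict;

my @found;
OUTER: for my $i (1 .. 3) {
    for my $j (1 .. 3) {
        next OUTER if $j > $i;         # move on to the next $i
        last OUTER if $i * $j == 6;    # leave both loops at once
        push @found, "$i*$j";
    }
}
print "@found\n";
```

The label must belong to a loop that encloses the statement at run time; naming a loop elsewhere in the file produces the errors described above.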
listen() on closed fd
(W) You tried to do a listen on a closed socket. Did you forget to check the return value of your
socket() call? See listen.
Method for operation %s not found in package %s during blessing
(F) An attempt was made to specify an entry in an overloading table that doesn‘t resolve to a valid
subroutine. See overload.
Might be a runaway multi−line %s string starting on line %d
(S) An advisory indicating that the previous error may have been caused by a missing delimiter on a
string or pattern, because it eventually ended earlier on the current line.
Misplaced _ in number
(W) An underline in a decimal constant wasn‘t on a 3−digit boundary.
Missing $ on loop variable
(F) Apparently you‘ve been programming in csh too much. Variables are always mentioned with the $
in Perl, unlike in the shells, where it can vary from one line to the next.
Missing comma after first argument to %s function
(F) While certain functions allow you to specify a filehandle or an "indirect object" before the
argument list, this ain‘t one of them.
Missing operator before %s?
(S) This is an educated guess made in conjunction with the message "%s found where operator
expected". Often the missing operator is a comma.
Missing right bracket
(F) The lexer counted more opening curly brackets (braces) than closing ones. As a general rule, you‘ll
find it‘s missing near the place you were last editing.
Modification of a read−only value attempted
(F) You tried, directly or indirectly, to change the value of a constant. You didn‘t, of course, try "2 =
1", because the compiler catches that. But an easy way to do the same thing is:
sub mod { $_[0] = 1 }
mod(2);
Another way is to assign to a substr() that‘s off the end of the string.
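The usual cure is to copy the argument before changing it, as in this sketch. The name mod_copy is made up here, and eval is used only to show the error without stopping the script.

```perl
use strict;

sub mod      { $_[0] = 1 }                        # assigns through the alias in @_
sub mod_copy { my ($n) = @_; $n = 1; return $n }  # changes a private copy only

# Passing a constant to mod() raises the fatal error.
my $ok  = eval { mod(2); 1 };
my $err = $@;

# Passing a variable is fine, and the caller's variable is changed.
my $var = 2;
mod($var);

print "var is now $var, copy gives ", mod_copy(2), "\n";
```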
Modification of non−creatable array value attempted, subscript %d
(F) You tried to make an array value spring into existence, and the subscript was probably negative,
even counting from end of the array backwards.
Modification of non−creatable hash value attempted, subscript "%s"
(P) You tried to make a hash value spring into existence, and it couldn‘t be created for some peculiar
reason.
Module name must be constant
(F) Only a bare module name is allowed as the first argument to a "use".
msg%s not implemented
(F) You don‘t have System V message IPC on your system.
Multidimensional syntax %s not supported
(W) Multidimensional arrays aren‘t written like $foo[1,2,3]. They‘re written like
$foo[1][2][3], as in C.
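A brief sketch of the supported form, with arbitrary data: each subscript gets its own brackets, and a comma-separated list of subscripts is only meaningful as an array slice, which is written with @.

```perl
use strict;

my @foo;
$foo[1][2][3] = 'deep';     # the intermediate arrays spring into existence
print $foo[1][2][3], "\n";

# A list of subscripts belongs in a slice, which takes @ rather than $.
my @flat  = (10, 20, 30, 40);
my @slice = @flat[1, 2, 3];
print "@slice\n";
```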
Name "%s::%s" used only once: possible typo
(W) Typographical errors often show up as unique variable names. If you had a good reason for having
a unique name, then just mention it again somehow to suppress the message. The use vars pragma
is provided for just this purpose.
Negative length
(F) You tried to do a read/write/send/recv operation with a buffer length that is less than 0. This is
difficult to imagine.
nested *?+ in regexp
(F) You can‘t quantify a quantifier without intervening parentheses. So things like ** or +* or ?* are
illegal.
Note, however, that the minimal matching quantifiers, *?, +?, and ?? appear to be nested quantifiers,
but aren‘t. See perlre.
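The following sketch contrasts the cases, using an invented sample string. The doubled quantifier is built from a string at run time so that the failure can be trapped with eval rather than stopping compilation of the whole script.

```perl
use strict;

my $s = "<b>one</b><b>two</b>";

# *? is the minimal-matching form of *, not a quantified quantifier.
my ($greedy)  = $s =~ /<b>(.*)<\/b>/;
my ($minimal) = $s =~ /<b>(.*?)<\/b>/;
print "$greedy\n$minimal\n";

# To repeat something that is already quantified, group it first.
my $grouped = "aaa bbb" =~ /^(?:\w+\s*)+$/ ? 1 : 0;

# A quantifier applied directly to another quantifier does not compile.
my $pattern  = 'a**';
my $compiled = eval { qr/$pattern/ };
```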
No #! line
(F) The setuid emulator requires that scripts have a well−formed #! line even on machines that don‘t
support the #! construct.
No %s allowed while running setuid
(F) Certain operations are deemed to be too insecure for a setuid or setgid script to even be allowed to
attempt. Generally speaking there will be another way to do what you want that is, if not secure, at
least securable. See perlsec.
No −e allowed in setuid scripts
(F) A setuid script can‘t be specified by the user.
No comma allowed after %s
(F) A list operator that has a filehandle or "indirect object" is not allowed to have a comma between
that and the following arguments. Otherwise it‘d be just another one of the arguments.
One possible cause is that you expected use or import to bring a constant into your name space, but
no such import took place, for example because your operating system does not support that
particular constant. An explicit import list for the constants you expect to see would probably have
caught this error earlier (see use and import), although it cannot remedy the fact that your operating
system does not support the constant. It is also worth checking for a typo, either in the import list
given to use or import or in the constant name on the line where this error was triggered.
No command into which to pipe on command line
(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a ‘|’ at the
end of the command line, so it doesn‘t know where you want to pipe the output from this command.
No DB::DB routine defined
(F) The currently executing code was compiled with the −d switch, but for some reason the perl5db.pl
file (or some facsimile thereof) didn‘t define a routine to be called at the beginning of each statement.
Which is odd, because the file should have been required automatically, and should have blown up the
require if it didn‘t parse right.
No dbm on this machine
(P) This is counted as an internal error, because every machine should supply dbm nowadays, because
Perl comes with SDBM. See SDBM_File.
No DBsub routine
(F) The currently executing code was compiled with the −d switch, but for some reason the perl5db.pl
file (or some facsimile thereof) didn‘t define a DB::sub routine to be called at the beginning of each
ordinary subroutine call.
No error file after 2> or 2>> on command line
(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a ‘2>’ or a
‘2>>’ on the command line, but can‘t find the name of the file to which to write data destined for
stderr.
No input file after < on command line
(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a ‘<’ on the
command line, but can‘t find the name of the file from which to read data for stdin.
No output file after > on command line
(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a lone ‘>’ at
the end of the command line, so it doesn‘t know where you wanted to redirect stdout.
No output file after > or >> on command line
(F) An error peculiar to VMS. Perl handles its own command line redirection, and found a ‘>’ or a
‘>>’ on the command line, but can‘t find the name of the file to which to write data destined for stdout.
No Perl script found in input
(F) You called perl −x, but no line was found in the file beginning with #! and containing the word
"perl".
No setregid available
(F) Configure didn‘t find anything resembling the setregid() call for your system.
No setreuid available
(F) Configure didn‘t find anything resembling the setreuid() call for your system.
No space allowed after −I
(F) The argument to −I must follow the −I immediately with no intervening space.
No such array field
(F) You tried to access an array as a hash, but the field name used is not defined. The hash at index 0
should map all valid field names to array indices for that to work.
No such field "%s" in variable %s of type %s
(F) You tried to access a field of a typed variable where the type does not know about the field name.
The field names are looked up in the %FIELDS hash in the type package at compile time. The
%FIELDS hash is usually set up with the ‘fields’ pragma.
No such pipe open
(P) An error peculiar to VMS. The internal routine my_pclose() tried to close a pipe which hadn‘t
been opened. This should have been caught earlier as an attempt to close an unopened filehandle.
No such signal: SIG%s
(W) You specified a signal name as a subscript to %SIG that was not recognized. Say kill −l in
your shell to see the valid signal names on your system.
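The valid names can also be read from inside Perl through the Config module, whose sig_name entry lists them separated by spaces. The helper install_handler() below is hypothetical; it sets a handler only when the name is known on the current system.

```perl
use strict;
use Config;

# Build a lookup of the signal names this perl was configured with.
my %known = map { $_ => 1 } split ' ', $Config{sig_name};

# install_handler() is a made-up helper: it refuses unknown names instead
# of letting the assignment to %SIG trigger the warning.
sub install_handler {
    my ($name, $handler) = @_;
    return 0 unless $known{$name};
    $SIG{$name} = $handler;
    return 1;
}

my $got_int  = install_handler('INT',       sub { print "interrupted\n" });
my $got_fake = install_handler('NOSUCHSIG', sub { });
print "INT: $got_int, NOSUCHSIG: $got_fake\n";
```

Note that the keys of %SIG omit the SIG prefix, so the handler for SIGINT is stored under INT.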
Not a CODE reference
(F) Perl was trying to evaluate a reference to a code value (that is, a subroutine), but found a reference
to something else instead. You can use the ref() function to find out what kind of ref it really was.
See also perlref.
Not a format reference
(F) I‘m not sure how you managed to generate a reference to an anonymous format, but this indicates
you did, and that it didn‘t exist.
Not a GLOB reference
(F) Perl was trying to evaluate a reference to a "typeglob" (that is, a symbol table entry that looks like
*foo), but found a reference to something else instead. You can use the ref() function to find out
what kind of ref it really was. See perlref.
Not a HASH reference
(F) Perl was trying to evaluate a reference to a hash value, but found a reference to something else
instead. You can use the ref() function to find out what kind of ref it really was. See perlref.
Not a perl script
(F) The setuid emulator requires that scripts have a well−formed #! line even on machines that don‘t
support the #! construct. The line must mention perl.
Not a SCALAR reference
(F) Perl was trying to evaluate a reference to a scalar value, but found a reference to something else
instead. You can use the ref() function to find out what kind of ref it really was. See perlref.
Not a subroutine reference
(F) Perl was trying to evaluate a reference to a code value (that is, a subroutine), but found a reference
to something else instead. You can use the ref() function to find out what kind of ref it really was.
See also perlref.
Not a subroutine reference in overload table
(F) An attempt was made to specify an entry in an overloading table that doesn‘t somehow point to a
valid subroutine. See overload.
Not an ARRAY reference
(F) Perl was trying to evaluate a reference to an array value, but found a reference to something else
instead. You can use the ref() function to find out what kind of ref it really was. See perlref.
Not enough arguments for %s
(F) The function requires more arguments than you specified.
Not enough format arguments
(W) A format specified more picture fields than the next line supplied. See perlform.
Null filename used
(F) You can‘t require the null filename, especially because on many machines that means the current
directory! See require.
Null picture in formline
(F) The first argument to formline must be a valid format picture specification. It was found to be
empty, which probably means you supplied it an uninitialized value. See perlform.
NULL OP IN RUN
(P) Some internal routine called run() with a null opcode pointer.
Null realloc
(P) An attempt was made to realloc NULL.
NULL regexp argument
(P) The internal pattern matching routines blew it big time.
NULL regexp parameter
(P) The internal pattern matching routines are out of their gourd.
Number too long
(F) Perl limits the representation of decimal numbers in programs to about 250 characters.
You‘ve exceeded that length. Future versions of Perl are likely to eliminate this arbitrary limitation.
In the meantime, try using scientific notation (e.g. "1e6" instead of "1_000_000").
Odd number of elements in hash assignment
(S) You specified an odd number of elements to initialize a hash, which is odd, because hashes come in
key/value pairs.
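One way to avoid this warning is to check the element count before assigning. This is a minimal sketch; hash_from_list is a hypothetical helper, not a Perl builtin.

```perl
use strict;

# Hypothetical helper: build a hash only if the list has an even
# number of elements; otherwise return undef instead of guessing.
sub hash_from_list {
    my @items = @_;
    return undef if @items % 2;      # odd count: some key has no value
    my %h = @items;
    return \%h;
}

my $good = hash_from_list(one => 1, two => 2);
my $bad  = hash_from_list('one', 1, 'two');   # 'two' has no value
```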
Offset outside string
(F) You tried to do a read/write/send/recv operation with an offset pointing outside the buffer. This is
difficult to imagine. The sole exception to this is that sysread()ing past the buffer will extend the
buffer and zero pad the new area.
oops: oopsAV
(S) An internal warning that the grammar is screwed up.
oops: oopsHV
(S) An internal warning that the grammar is screwed up.
Operation ‘%s‘: no method found, %s
(F) An attempt was made to perform an overloaded operation for which no handler was defined.
While some handlers can be autogenerated in terms of other handlers, there is no default handler for
any operation, unless fallback overloading key is specified to be true. See overload.
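As a rough illustration of the fallback key, the sketch below overloads only string conversion and sets fallback to true, so operations without a handler of their own use the string form. The class name Celsius and its deg field are invented for this example.

```perl
package Celsius;    # hypothetical class for illustration
use strict;
use overload
    '""'     => sub { $_[0]->{deg} . "C" },   # string conversion handler
    fallback => 1;   # operations without a handler use the conversion

sub new {
    my ($class, $deg) = @_;
    return bless { deg => $deg }, $class;
}

package main;

my $t    = Celsius->new(21);
my $text = "Now: $t";            # concatenation goes through the "" handler
my $same = ($t eq "21C");        # eq has no handler; the string form is used
```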
Operator or semicolon missing before %s
(S) You used a variable or subroutine call where the parser was expecting an operator. The parser has
assumed you really meant to use an operator, but this is highly likely to be incorrect. For example, if
you say "*foo *foo" it will be interpreted as if you said "*foo * 'foo'".
Out of memory for yacc stack
(F) The yacc parser wanted to grow its stack so it could continue parsing, but realloc() wouldn‘t
give it more memory, virtual or otherwise.
Out of memory during request for %s
(X|F) The malloc() function returned 0, indicating there was insufficient remaining memory (or
virtual memory) to satisfy the request.
The request was judged to be small, so the possibility to trap it depends on the way perl was compiled.
By default it is not trappable. However, if compiled for this, Perl may use the contents of $^M as an
emergency pool after die()ing with this message. In this case the error is trappable once.
Out of memory during "large" request for %s
(F) The malloc() function returned 0, indicating there was insufficient remaining memory (or
virtual memory) to satisfy the request. However, the request was judged large enough (compile−time
default is 64K), so a possibility to shut down by trapping this error is granted.
Out of memory during ridiculously large request
(F) You can‘t allocate more than 2^31+"small amount" bytes. This error is most likely to be caused by
a typo in the Perl program. e.g., $arr[time] instead of $arr[$time].
page overflow
(W) A single call to write() produced more lines than can fit on a page. See perlform.
panic: ck_grep
(P) Failed an internal consistency check trying to compile a grep.
panic: ck_split
(P) Failed an internal consistency check trying to compile a split.
panic: corrupt saved stack index
(P) The savestack was requested to restore more localized values than there are in the savestack.
panic: die %s
(P) We popped the context stack to an eval context, and then discovered it wasn‘t an eval context.
panic: do_match
(P) The internal pp_match() routine was called with invalid operational data.
panic: do_split
(P) Something terrible went wrong in setting up for the split.
panic: do_subst
(P) The internal pp_subst() routine was called with invalid operational data.
panic: do_trans
(P) The internal do_trans() routine was called with invalid operational data.
panic: frexp
(P) The library function frexp() failed, making printf("%f") impossible.
panic: goto
(P) We popped the context stack to a context with the specified label, and then discovered it wasn‘t a
context we know how to do a goto in.
panic: INTERPCASEMOD
(P) The lexer got into a bad state at a case modifier.
panic: INTERPCONCAT
(P) The lexer got into a bad state parsing a string with brackets.
panic: last
(P) We popped the context stack to a block context, and then discovered it wasn‘t a block context.
panic: leave_scope clearsv
(P) A writable lexical variable became read−only somehow within the scope.
panic: leave_scope inconsistency
(P) The savestack probably got out of sync. At least, there was an invalid enum on the top of it.
panic: malloc
(P) Something requested a negative number of bytes of malloc.
panic: mapstart
(P) The compiler is screwed up with respect to the map() function.
panic: null array
(P) One of the internal array routines was passed a null AV pointer.
panic: pad_alloc
(P) The compiler got confused about which scratch pad it was allocating and freeing temporaries and
lexicals from.
panic: pad_free curpad
(P) The compiler got confused about which scratch pad it was allocating and freeing temporaries and
lexicals from.
panic: pad_free po
(P) An invalid scratch pad offset was detected internally.
panic: pad_reset curpad
(P) The compiler got confused about which scratch pad it was allocating and freeing temporaries and
lexicals from.
panic: pad_sv po
(P) An invalid scratch pad offset was detected internally.
panic: pad_swipe curpad
(P) The compiler got confused about which scratch pad it was allocating and freeing temporaries and
lexicals from.
panic: pad_swipe po
(P) An invalid scratch pad offset was detected internally.
panic: pp_iter
(P) The foreach iterator got called in a non−loop context frame.
panic: realloc
(P) Something requested a negative number of bytes of realloc.
panic: restartop
(P) Some internal routine requested a goto (or something like it), and didn‘t supply the destination.
panic: return
(P) We popped the context stack to a subroutine or eval context, and then discovered it wasn‘t a
subroutine or eval context.
panic: scan_num
(P) scan_num() got called on something that wasn‘t a number.
panic: sv_insert
(P) The sv_insert() routine was told to remove more string than there was string.
panic: top_env
(P) The compiler attempted to do a goto, or something weird like that.
panic: yylex
(P) The lexer got into a bad state while processing a case modifier.
Parentheses missing around "%s" list
(W) You said something like
my $foo, $bar = @_;
when you meant
my ($foo, $bar) = @_;
Remember that "my" and "local" bind closer than comma.
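In a subroutine, the parenthesized form typically looks like the sketch below. The name full_name and its arguments are illustrative only.

```perl
use strict;

# Hypothetical subroutine: the parentheses make "my" declare both
# variables and put the assignment from @_ in list context.
sub full_name {
    my ($first, $last) = @_;
    return "$first $last";
}
```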
Perl %3.3f required—this is only version %s, stopped
(F) The module in question uses features of a version of Perl more recent than the currently running
version. How long has it been since you upgraded, anyway? See require.
Permission denied
(F) The setuid emulator in suidperl decided you were up to no good.
pid %d not a child
(W) A warning peculiar to VMS. Waitpid() was asked to wait for a process which isn‘t a
subprocess of the current process. While this is fine from VMS’ perspective, it‘s probably not what
you intended.
POSIX getpgrp can‘t take an argument
(F) Your C compiler uses POSIX getpgrp(), which takes no argument, unlike the BSD version,
which takes a pid.
Possible attempt to put comments in qw() list
(W) qw() lists contain items separated by whitespace; as with literal strings, comment characters are
not ignored, but are instead treated as literal data. (You may have used different delimiters than the
parentheses shown here; braces are also frequently used.)
You probably wrote something like this:
@list = qw(
a # a comment
b # another comment
);
when you should have written this:
@list = qw(
a
b
);
If you really want comments, build your list the old−fashioned way, with quotes and commas:
@list = (
'a', # a comment
'b', # another comment
);
Possible attempt to separate words with commas
(W) qw() lists contain items separated by whitespace; therefore commas aren‘t needed to separate the
items. (You may have used different delimiters than the parentheses shown here; braces are also
frequently used.)
You probably wrote something like this:
qw! a, b, c !;
which puts literal commas into some of the list items. Write it without commas if you don‘t want them
to appear in your data:
qw! a b c !;
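The effect of the stray commas can be seen by inspecting the resulting lists. This is a small sketch with made-up variable names; if warnings are enabled, the first line draws the warning described here.

```perl
my @with_commas = qw! a, b, c !;   # the commas become part of the data
my @plain       = qw! a b c !;     # three clean items

# The first list holds 'a,' and 'b,' while the second holds 'a' and 'b'.
my $first_with  = $with_commas[0];
my $first_plain = $plain[0];
```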
Possible memory corruption: %s overflowed 3rd argument
(F) An ioctl() or fcntl() returned more than Perl was bargaining for. Perl guesses a reasonable
buffer size, but puts a sentinel byte at the end of the buffer just in case. This sentinel byte got
clobbered, and Perl assumes that memory is now corrupted. See ioctl.
Precedence problem: open %s should be open(%s)
(S) The old irregular construct
open FOO || die;
is now misinterpreted as
open(FOO || die);
because of the strict regularization of Perl 5‘s grammar into unary and list operators. (The old open
was a little of both.) You must put parentheses around the filehandle, or use the new "or" operator
instead of "||".
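Both suggested fixes are sketched below. The subroutine try_open and the filehandle FH are names chosen for this example; it returns a message instead of dying so the two forms can be compared.

```perl
use strict;

# Hypothetical helper: open $path for reading using both corrected forms.
sub try_open {
    my ($path) = @_;
    local *FH;

    # Parenthesized form: || applies to the return value of open().
    open(FH, "< $path") || return "failed: parenthesized";
    close(FH);

    # "or" form: "or" binds more loosely than the list operator open.
    open FH, "< $path" or return "failed: or";
    close FH;

    return "ok";
}
```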
print on closed filehandle %s
(W) The filehandle you‘re printing on got itself closed sometime before now. Check your logic flow.
printf on closed filehandle %s
(W) The filehandle you‘re writing to got itself closed sometime before now. Check your logic flow.
Probable precedence problem on %s
(W) The compiler found a bareword where it expected a conditional, which often indicates that an || or
&& was parsed as part of the last argument of the previous construct, for example:
open FOO || die;
Prototype mismatch: %s vs %s
(S) The subroutine being declared or defined had previously been declared or defined with a different
function prototype.
Range iterator outside integer range
(F) One (or both) of the numeric arguments to the range operator ".." are outside the range which can
be represented by integers internally. One possible workaround is to force Perl to use magical string
increment by prepending "0" to your numbers.
Read on closed filehandle <%s>
(W) The filehandle you‘re reading from got itself closed sometime before now. Check your logic flow.
Reallocation too large: %lx
(F) You can‘t allocate more than 64K on an MS−DOS machine.
Recompile perl with −DDEBUGGING to use −D switch
(F) You can‘t use the −D option unless the code to produce the desired output is compiled into Perl,
which entails some overhead, which is why it‘s currently left out of your copy.
Recursive inheritance detected in package ‘%s’
(F) More than 100 levels of inheritance were used. Probably indicates an unintended loop in your
inheritance hierarchy.
Recursive inheritance detected while looking for method ‘%s’ in package ‘%s’
(F) More than 100 levels of inheritance were encountered while invoking a method. Probably
indicates an unintended loop in your inheritance hierarchy.
Reference found where even−sized list expected
(W) You gave a single reference where Perl was expecting a list with an even number of elements (for
assignment to a hash). This usually means that you used the anon hash constructor when you meant to
use parens. In any case, a hash requires key/value pairs.
%hash = { one => 1, two => 2, };   # WRONG
%hash = [ qw/ an anon array / ];   # WRONG
%hash = ( one => 1, two => 2, );   # right
%hash = qw( one 1 two 2 );         # also fine
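If the data really does arrive as a hash reference, dereferencing it supplies the key/value list that the assignment expects. A minimal sketch, with variable names chosen for illustration:

```perl
use strict;

my $ref    = { one => 1, two => 2 };    # anonymous hash: a single reference
my %copy   = %$ref;                     # dereference to get key/value pairs
my %direct = ( one => 1, two => 2 );    # plain list in parentheses
```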
Reference miscount in sv_replace()
(W) The internal sv_replace() function was handed a new SV with a reference count of other than
1.
regexp *+ operand could be empty
(F) The part of the regexp subject to either the * or + quantifier could match an empty string.
regexp memory corruption
(P) The regular expression engine got confused by what the regular expression compiler gave it.
regexp out of space
(P) A "can‘t happen" error, because safemalloc() should have caught it earlier.
regexp too big
(F) The current implementation of regular expressions uses shorts as address offsets within a string.
Unfortunately this means that if the regular expression compiles to longer than 32767, it‘ll blow up.
Usually when you want a regular expression this big, there is a better way to do it with multiple
statements. See perlre.
Reversed %s= operator
(W) You wrote your assignment operator backwards. The = must always come last, to avoid
ambiguity with subsequent unary operators.
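The difference between the compound operator and its reversed spelling can be sketched as follows, using throwaway variables. The second assignment is written with a space to show what a reversed "=-" actually means.

```perl
my $n = 10;
$n -= 3;       # compound assignment: subtract 3, leaving 7

my $m = 10;
$m = -3;       # what "=-" amounts to: assign negative three
```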
Runaway format
(F) Your format contained the ~~ repeat−until−blank sequence, but it produced 200 lines at once, and
the 200th line looked exactly like the 199th line. Apparently you didn‘t arrange for the arguments to
exhaust themselves, either by using ^ instead of @ (for scalar variables), or by shifting or popping (for
array variables). See perlform.
Scalar value @%s[%s] better written as $%s[%s]
(W) You‘ve used an array slice (indicated by @) to select a single element of an array. Generally it‘s
better to ask for a scalar value (indicated by $). The difference is that $foo[&bar] always behaves
like a scalar, both when assigning to it and when evaluating its argument, while @foo[&bar]
behaves like a list when you assign to it, and provides a list context to its subscript, which can do weird
things if you‘re expecting only one subscript.
On the other hand, if you were actually hoping to treat the array element as a list, you need to look into
how references work, because Perl will not magically convert between scalars and lists for you. See
perlref.
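The context difference described above can be observed with a subscript function that inspects wantarray. This is only a sketch; @foo and bar are the placeholder names from the text, and bar's return values are invented.

```perl
use strict;

my @foo = (10, 20, 30);

# Hypothetical subscript function: returns two indexes in list context
# and a single index in scalar context.
sub bar { return wantarray ? (0, 2) : 1 }

my $element = $foo[ bar() ];    # subscript in scalar context: index 1
my @slice   = @foo[ bar() ];    # subscript in list context: indexes 0 and 2
```

With warnings enabled, the slice line may itself draw the warning described here.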
Scalar value @%s{%s} better written as $%s{%s}
(W) You‘ve used a hash slice (indicated by @) to select a single element of a hash. Generally it‘s
better to ask for a scalar value (indicated by $). The difference is that $foo{&bar} always behaves
like a scalar, both when assigning to it and when evaluating its argument, while @foo{&bar}
behaves like a list when you assign to it, and provides a list context to its subscript, which can do weird
things if you‘re expecting only one subscript.
On the other hand, if you were actually hoping to treat the hash element as a list, you need to look into
how references work, because Perl will not magically convert between scalars and lists for you. See
perlref.
Script is not setuid/setgid in suidperl
(F) Oddly, the suidperl program was invoked on a script without a setuid or setgid bit set. This doesn‘t
make much sense.
Search pattern not terminated
(F) The lexer couldn‘t find the final delimiter of a // or m{} construct. Remember that bracketing
delimiters count nesting level. Missing the leading $ from a variable $m may cause this error.
%sseek() on unopened file
(W) You tried to use the seek() or sysseek() function on a filehandle that was either never
opened or has since been closed.
select not implemented
(F) This machine doesn‘t implement the select() system call.
sem%s not implemented
(F) You don‘t have System V semaphore IPC on your system.
semi−panic: attempt to dup freed string
(S) The internal newSVsv() routine was called to duplicate a scalar that had previously been marked
as free.
Semicolon seems to be missing
(W) A nearby syntax error was probably caused by a missing semicolon, or possibly some other
missing operator, such as a comma.
Send on closed socket
(W) The filehandle you‘re sending to got itself closed sometime before now. Check your logic flow.
Sequence (? incomplete
(F) A regular expression ended with an incomplete extension (?. See perlre.
Sequence (?#... not terminated
(F) A regular expression comment must be terminated by a closing parenthesis. Embedded
parentheses aren't allowed. See perlre.
Sequence (?%s...) not implemented
(F) A proposed regular expression extension has the character reserved but has not yet been written.
See perlre.
Sequence (?%s...) not recognized
(F) You used a regular expression extension that doesn‘t make sense. See perlre.
Server error
Also known as "500 Server error".
This is a CGI error, not a Perl error.
You need to make sure your script is executable, is accessible by the user CGI is running the script
under (which is probably not the user account you tested it under), does not rely on any environment
variables (like PATH) from the user it isn‘t running under, and isn‘t in a location where the CGI server
can‘t find it, basically, more or less. Please see the following for more information:
http://www.perl.com/perl/faq/idiots-guide.html
http://www.perl.com/perl/faq/perl-cgi-faq.html
ftp://rtfm.mit.edu/pub/usenet/news.answers/www/cgi-faq
http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html
setegid() not implemented
(F) You tried to assign to $), and your operating system doesn‘t support the setegid() system call
(or equivalent), or at least Configure didn‘t think so.
seteuid() not implemented
(F) You tried to assign to $>, and your operating system doesn‘t support the seteuid() system call
(or equivalent), or at least Configure didn‘t think so.
setrgid() not implemented
(F) You tried to assign to $(, and your operating system doesn‘t support the setrgid() system call
(or equivalent), or at least Configure didn‘t think so.
setruid() not implemented
(F) You tried to assign to $<, and your operating system doesn‘t support the setruid() system call
(or equivalent), or at least Configure didn‘t think so.
Setuid/gid script is writable by world
(F) The setuid emulator won‘t run a script that is writable by the world, because the world might have
written on it already.
shm%s not implemented
(F) You don‘t have System V shared memory IPC on your system.
shutdown() on closed fd
(W) You tried to do a shutdown on a closed socket. Seems a bit superfluous.
SIG%s handler "%s" not defined
(W) The signal handler named in %SIG doesn‘t, in fact, exist. Perhaps you put it into the wrong
package?
sort is now a reserved word
(F) An ancient error message that almost nobody ever runs into anymore. But before sort was a
keyword, people sometimes used it as a filehandle.
Sort subroutine didn‘t return a numeric value
(F) A sort comparison routine must return a number. You probably blew it by not using <=> or cmp,
or by not using them correctly. See sort.
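Comparison routines built on <=> and cmp might look like this sketch. The data and variable names are invented for the example.

```perl
use strict;

my @by_number = sort { $a <=> $b } (10, 9, 100);        # numeric order
my @by_string = sort { $a cmp $b } qw(pear apple fig);  # string order

# Hypothetical records: sort by the numeric field, then by name on ties.
my @people = ( ['bob', 30], ['al', 25], ['cy', 30] );
my @sorted = sort { $a->[1] <=> $b->[1] or $a->[0] cmp $b->[0] } @people;
```

Each block returns a single number (negative, zero, or positive), which is what sort requires.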
Sort subroutine didn‘t return single value
(F) A sort comparison subroutine may not return a list value with more or less than one element. See
sort.
Split loop
(P) The split was looping infinitely. (Obviously, a split shouldn‘t iterate more times than there are
characters of input, which is what happened.) See split.
Stat on unopened file <%s>
(W) You tried to use the stat() function (or an equivalent file test) on a filehandle that was either
never opened or has since been closed.
Statement unlikely to be reached
(W) You did an exec() with some statement after it other than a die(). This is almost always an
error, because exec() never returns unless there was a failure. You probably wanted to use
system() instead, which does return. To suppress this warning, put the exec() in a block by itself.
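The two alternatives can be sketched as below. The helper names child_status and become are hypothetical; $^X is the path of the running perl, used here only to have a command that is sure to exist.

```perl
use strict;

# Hypothetical helper: run a child perl that exits with $code, then
# return that exit status. Unlike exec(), system() returns.
sub child_status {
    my ($code) = @_;
    system($^X, '-e', "exit $code");
    return $? >> 8;
}

# Hypothetical helper: exec() in a block by itself, followed by die(),
# which is reached only if the exec itself failed.
sub become {
    my @command = @_;
    { exec(@command) };
    die "exec failed: $!";
}
```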
Stub found while resolving method ‘%s’ overloading ‘%s’ in package ‘%s’
(P) Overloading resolution over @ISA tree may be broken by importation stubs. Stubs should never be
implicitly created, but explicit calls to can may break this.
Subroutine %s redefined
(W) You redefined a subroutine. To suppress this warning, say
{
local $^W = 0;
eval "sub name { ... }";
}
Substitution loop
(P) The substitution was looping infinitely. (Obviously, a substitution shouldn‘t iterate more times
than there are characters of input, which is what happened.) See the discussion of substitution in
Quote and Quote−like Operators in perlop.
Substitution pattern not terminated
(F) The lexer couldn‘t find the interior delimiter of a s/// or s{}{} construct. Remember that
bracketing delimiters count nesting level. Missing the leading $ from variable $s may cause this error.
Substitution replacement not terminated
(F) The lexer couldn‘t find the final delimiter of a s/// or s{}{} construct. Remember that bracketing
delimiters count nesting level. Missing the leading $ from variable $s may cause this error.
substr outside of string
(S),(W) You tried to reference a substr() that pointed outside of a string. That is, the absolute
value of the offset was larger than the length of the string. See substr. This warning is mandatory if
substr is used in an lvalue context (as the left hand side of an assignment or as a subroutine argument
for example).
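A guard based on the condition described above (absolute offset larger than the string length) could look like this. safe_substr is a hypothetical wrapper, not a builtin.

```perl
use strict;

# Hypothetical wrapper: return undef instead of calling substr() when
# the offset points outside the string.
sub safe_substr {
    my ($string, $offset, $length) = @_;
    return undef if abs($offset) > length($string);
    return substr($string, $offset, $length);
}
```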
suidperl is no longer needed since %s
(F) Your Perl was compiled with −DSETUID_SCRIPTS_ARE_SECURE_NOW, but a version of the
setuid emulator somehow got run anyway.
syntax error
(F) Probably means you had a syntax error. Common reasons include:
A keyword is misspelled.
A semicolon is missing.
A comma is missing.
An opening or closing parenthesis is missing.
An opening or closing brace is missing.
A closing quote is missing.
Often there will be another error message associated with the syntax error giving more information.
(Sometimes it helps to turn on −w.) The error message itself often tells you where it was in the line
when it decided to give up. Sometimes the actual error is several tokens before this, because Perl is
good at understanding random input. Occasionally the line number may be misleading, and once in a
blue moon the only way to figure out what‘s triggering the error is to call perl −c repeatedly,
chopping away half the program each time to see if the error went away. Sort of the cybernetic version
of 20 questions.
syntax error at line %d: ‘%s’ unexpected
(A) You‘ve accidentally run your script through the Bourne shell instead of Perl. Check the #! line, or
manually feed your script into Perl yourself.
System V %s is not implemented on this machine
(F) You tried to do something with a function beginning with "sem", "shm", or "msg" but that System
V IPC is not implemented in your machine. In some machines the functionality can exist but be
unconfigured. Consult your system support.
Syswrite on closed filehandle
(W) The filehandle you‘re writing to got itself closed sometime before now. Check your logic flow.
Target of goto is too deeply nested
(F) You tried to use goto to reach a label that was too deeply nested for Perl to reach. Perl is doing
you a favor by refusing.
tell() on unopened file
(W) You tried to use the tell() function on a filehandle that was either never opened or has since
been closed.
Test on unopened file <%s>
(W) You tried to invoke a file test operator on a filehandle that isn‘t open. Check your logic. See also
−X.
That use of $[ is unsupported
(F) Assignment to $[ is now strictly circumscribed, and interpreted as a compiler directive. You may
say only one of
$[ = 0;
$[ = 1;
...
local $[ = 0;
local $[ = 1;
...
This is to prevent the problem of one module changing the array base out from under another module
inadvertently. See $[.
The %s function is unimplemented
The function indicated isn‘t implemented on this architecture, according to the probings of Configure.
The crypt() function is unimplemented due to excessive paranoia
(F) Configure couldn‘t find the crypt() function on your machine, probably because your vendor
didn‘t supply it, probably because they think the U.S. Government thinks it‘s a secret, or at least that
they will continue to pretend that it is. And if you quote me on that, I will deny it.
The stat preceding −l _ wasn‘t an lstat
(F) It makes no sense to test the current stat buffer for symbolic linkhood if the last stat that wrote to
the stat buffer already went past the symlink to get to the real file. Use an actual filename instead.
times not implemented
(F) Your version of the C library apparently doesn‘t do times(). I suspect you‘re not running on
Unix.
Too few args to syscall
(F) There has to be at least one argument to syscall() to specify the system call to call, silly dilly.
Too late for "−T" option
(X) The #! line (or local equivalent) in a Perl script contains the −T option, but Perl was not invoked
with −T in its command line. This is an error because, by the time Perl discovers a −T in a script, it‘s
too late to properly taint everything from the environment. So Perl gives up.
If the Perl script is being executed as a command using the #! mechanism (or its local equivalent), this
error can usually be fixed by editing the #! line so that the −T option is a part of Perl‘s first argument:
e.g. change perl −n −T to perl −T −n.
If the Perl script is being executed as perl scriptname, then the −T option must appear on the
command line: perl −T scriptname.
Too late for "−%s" option
(X) The #! line (or local equivalent) in a Perl script contains the −M or −m option. This is an error
because −M and −m options are not intended for use inside scripts. Use the use pragma instead.
Too many (‘s
Too many )‘s
(A) You‘ve accidentally run your script through csh instead of Perl. Check the #! line, or manually
feed your script into Perl yourself.
Too many args to syscall
(F) Perl supports a maximum of only 14 args to syscall().
Too many arguments for %s
(F) The function requires fewer arguments than you specified.
trailing \ in regexp
(F) The regular expression ends with an unbackslashed backslash. Backslash it. See perlre.
Transliteration pattern not terminated
(F) The lexer couldn‘t find the interior delimiter of a tr/// or tr[][] or y/// or y[][] construct. Missing the
leading $ from variables $tr or $y may cause this error.
Transliteration replacement not terminated
(F) The lexer couldn‘t find the final delimiter of a tr/// or tr[][] construct.
truncate not implemented
(F) Your machine doesn‘t implement a file truncation mechanism that Configure knows about.
Type of arg %d to %s must be %s (not %s)
(F) This function requires the argument in that position to be of a certain type. Arrays must be
@NAME or @{EXPR}. Hashes must be %NAME or %{EXPR}. No implicit dereferencing is
allowed—use the {EXPR} forms as an explicit dereference. See perlref.
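The explicit dereference forms can be sketched as follows, with variable names invented for illustration.

```perl
use strict;

my $list  = [1, 2];
my $table = { a => 1, b => 2 };

push @{$list}, 3;                   # @{EXPR}: explicit array dereference
my @names = sort keys %{$table};    # %{EXPR}: explicit hash dereference
```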
umask: argument is missing initial 0
(W) A umask of 222 is incorrect. It should be 0222, because octal literals always start with 0 in Perl,
as in C.
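The difference between the two literals is easy to check. This is a small sketch with made-up variable names; it restores the previous mask at the end.

```perl
use strict;

my $octal   = 0222;    # octal literal: decimal 146
my $decimal = 222;     # decimal literal: octal 0336, a different mask

my $previous = umask(0222);   # clear the write bits for newly created files
umask($previous);             # restore the earlier mask
```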
umask not implemented
(F) Your machine doesn‘t implement the umask function and you tried to use it to restrict permissions
for yourself (EXPR & 0700).
Unable to create sub named "%s"
(F) You attempted to create or access a subroutine with an illegal name.
Unbalanced context: %d more PUSHes than POPs
(W) The exit code detected an internal inconsistency in how many execution contexts were entered and
left.
Unbalanced saves: %d more saves than restores
(W) The exit code detected an internal inconsistency in how many values were temporarily localized.
Unbalanced scopes: %d more ENTERs than LEAVEs
(W) The exit code detected an internal inconsistency in how many blocks were entered and left.
Unbalanced tmps: %d more allocs than frees
(W) The exit code detected an internal inconsistency in how many mortal scalars were allocated and
freed.
Undefined format "%s" called
(F) The format indicated doesn‘t seem to exist. Perhaps it‘s really in another package? See perlform.
Undefined sort subroutine "%s" called
(F) The sort comparison routine specified doesn‘t seem to exist. Perhaps it‘s in a different package?
See sort.
Undefined subroutine &%s called
(F) The subroutine indicated hasn‘t been defined, or if it was, it has since been undefined.
Undefined subroutine called
(F) The anonymous subroutine you‘re trying to call hasn‘t been defined, or if it was, it has since been
undefined.
Undefined subroutine in sort
(F) The sort comparison routine specified is declared but doesn‘t seem to have been defined yet. See
sort.
Undefined top format "%s" called
(F) The format indicated doesn‘t seem to exist. Perhaps it‘s really in another package? See perlform.
Undefined value assigned to typeglob
(W) An undefined value was assigned to a typeglob, a la *foo = undef. This does nothing. It‘s
possible that you really mean undef *foo.
unexec of %s into %s failed!
(F) The unexec() routine failed for some reason. See your local FSF representative, who probably
put it there in the first place.
Unknown BYTEORDER
(F) There are no byte−swapping functions for a machine with this byte order.
unmatched () in regexp
(F) Unbackslashed parentheses must always be balanced in regular expressions. If you‘re a vi user, the
% key is valuable for finding the matching parenthesis. See perlre.
Unmatched right bracket
(F) The lexer counted more closing curly brackets (braces) than opening ones, so you‘re probably
missing an opening bracket. As a general rule, you‘ll find the missing one (so to speak) near the place
you were last editing.
unmatched [] in regexp
(F) The brackets around a character class must match. If you wish to include a closing bracket in a
character class, backslash it or put it first. See perlre.
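Both ways of including a closing bracket in a character class are sketched here, using a throwaway test string.

```perl
my $text = "a]b";

my $escaped = ($text =~ /[\]x]/) ? 1 : 0;   # closing bracket backslashed
my $leading = ($text =~ /[]x]/)  ? 1 : 0;   # closing bracket placed first
```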
Unquoted string "%s" may clash with future reserved word
(W) You used a bareword that might someday be claimed as a reserved word. It‘s best to put such a
word in quotes, or capitalize it somehow, or insert an underbar into it. You might also declare it as a
subroutine.
Unrecognized character %s
(F) The Perl parser has no idea what to do with the specified character in your Perl script (or eval).
Perhaps you tried to run a compressed script, a binary program, or a directory as a Perl program.
Unrecognized signal name "%s"
(F) You specified a signal name to the kill() function that was not recognized. Say kill −l in
your shell to see the valid signal names on your system.
Unrecognized switch: −%s (−h will show valid options)
(F) You specified an illegal option to Perl. Don‘t do that. (If you think you didn‘t do that, check the #!
line to see if it‘s supplying the bad switch on your behalf.)
Unsuccessful %s on filename containing newline
(W) A file operation was attempted on a filename, and that operation failed, PROBABLY because the
filename contained a newline, PROBABLY because you forgot to chop() or chomp() it off. See
chomp.
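A typical fix is to chomp the line before using it as a filename. In this sketch clean_name is a hypothetical helper, and the literal string stands in for a line read from input.

```perl
use strict;

# Hypothetical helper: strip the trailing newline from an input line
# before it is used as a filename.
sub clean_name {
    my ($line) = @_;
    chomp($line);
    return $line;
}

my $name   = clean_name("report.txt\n");
my $exists = -e $name ? 1 : 0;    # the file test now sees "report.txt"
```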
Unsupported directory function "%s" called
(F) Your machine doesn‘t support opendir() and readdir().
Unsupported function fork
(F) Your version of the executable does not support forking.
Note that under some systems, like OS/2, there may be different flavors of Perl executables, some of
which may support fork, some not. Try changing the name you call Perl by to perl_, perl__, and
so on.
Unsupported function %s
(F) This machine doesn‘t implement the indicated function, apparently. At least, Configure doesn‘t
think so.
Unsupported socket function "%s" called
(F) Your machine doesn‘t support the Berkeley socket mechanism, or at least that‘s what Configure
thought.
Unterminated <> operator
(F) The lexer saw a left angle bracket in a place where it was expecting a term, so it‘s looking for the
corresponding right angle bracket, and not finding it. Chances are you left some needed parentheses
out earlier in the line, and you really meant a "less than".
Use of "$$bar() or $obj−>bar()).
This bug will be rectified in Perl 5.005, which will use method lookup only for methods’ AUTOLOADs.
However, there is a significant base of existing code that may be using the old behavior. So, as an
interim step, Perl 5.004 issues an optional warning when non−methods use inherited AUTOLOADs.
The simple rule is: Inheritance will not work when autoloading non−methods. The simple fix for old
code is: In any module that used to depend on inheriting AUTOLOAD for non−methods from a base
class named BaseClass, execute *AUTOLOAD = \&BaseClass::AUTOLOAD during startup.
In code that currently says use AutoLoader; @ISA = qw(AutoLoader); you should
remove AutoLoader from @ISA and change use AutoLoader; to use AutoLoader
‘AUTOLOAD‘;.
Use of reserved word "%s" is deprecated
(D) The indicated bareword is a reserved word. Future versions of perl may use it as a keyword, so
you‘re better off either explicitly quoting the word in a manner appropriate for its context of use, or
using a different name altogether. The warning can be suppressed for subroutine names by either
adding a & prefix, or using a package qualifier, e.g. &our(), or Foo::our().
Use of %s is deprecated
(D) The construct indicated is no longer recommended for use, generally because there‘s a better way
to do it, and also because the old way has bad side effects.
Use of uninitialized value
(W) An undefined value was used as if it were already defined. It was interpreted as a "" or a 0, but
maybe it was a mistake. To suppress this warning assign an initial value to your variables.
Useless use of "re" pragma
(W) You did use re; without any arguments. That isn‘t very useful.
Useless use of %s in void context
(W) You did something without a side effect in a context that does nothing with the return value, such
as a statement that doesn‘t return a value from a block, or the left side of a scalar comma operator.
Very often this points not to stupidity on your part, but a failure of Perl to parse your program the way
you thought it would. For example, you‘d get this if you mixed up your C precedence with Python
precedence and said
$one, $two = 1, 2;
when you meant to say
($one, $two) = (1, 2);
Another common error is to use ordinary parentheses to construct a list reference when you should be
using square or curly brackets, for example, if you say
$array = (1,2);
when you should have said
$array = [1,2];
The square brackets explicitly turn a list value into a scalar value, while parentheses do not. So when a
parenthesized list is evaluated in a scalar context, the comma is treated like C‘s comma operator, which
throws away the left argument, which is not what you want. See perlref for more on this.
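The difference can be seen in a small sketch. The variable names are illustrative only, and the void warning is switched off inside the block purely so that the demonstration runs quietly:

```perl
use strict;
use warnings;

# Parenthesized list assignment: both variables receive a value.
my ($one, $two) = (1, 2);

# A parenthesized list in scalar context acts like C's comma operator:
# every element except the last is thrown away.
my $last = do { no warnings 'void'; my $x = (4, 5, 6); $x };

# Square brackets build an array reference, which is a single scalar.
my $ref = [4, 5, 6];

print "one=$one two=$two last=$last elements=", scalar(@$ref), "\n";
```

Here $last ends up holding only the final element, while $ref still gives access to all three.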
untie attempted while %d inner references still exist
(W) A copy of the object returned from tie (or tied) was still valid when untie was called.
Value of %s can be "0"; test with defined()
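(W) In a conditional expression, you used <HANDLE>, <*> (glob), each(), or readdir() as a boolean
value. Each of these constructs can return a value of "0"; that would make the conditional expression
false, which is probably not what you intended. When using these constructs in conditional expressions,
test their values with the defined operator.
'|' and '>' may not both be specified on command line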
(F) An error peculiar to VMS. Perl does its own command line redirection, and thinks you tried to
redirect stdout both to a file and into a pipe to another command. You need to choose one or the other,
though nothing‘s stopping you from piping into a program or Perl script which ‘splits’ output into two
streams, such as
open(OUT,">$ARGV[0]") or die "Can't write to $ARGV[0]: $!";
while (<STDIN>) {
    print;
    print OUT;
}
close OUT;
Got an error from DosAllocMem
(P) An error peculiar to OS/2. Most probably you‘re using an obsolete version of Perl, and this should
not happen anyway.
Malformed PERLLIB_PREFIX
(F) An error peculiar to OS/2. PERLLIB_PREFIX should be of the form
prefix1;prefix2
or
prefix1 prefix2
with nonempty prefix1 and prefix2. If prefix1 is indeed a prefix of a builtin library search path,
prefix2 is substituted. The error may appear if components are not found, or are too long. See
"PERLLIB_PREFIX" in README.os2.
PERL_SH_DIR too long
(F) An error peculiar to OS/2. PERL_SH_DIR is the directory to find the sh-shell in. See
"PERL_SH_DIR" in README.os2.
Process terminated by SIG%s
(W) This is a standard message issued by OS/2 applications, while *nix applications die in silence. It
is considered a feature of the OS/2 port. One can easily disable this by appropriate sighandlers, see
Signals in perlipc. See also "Process terminated by SIGTERM/SIGINT" in README.os2.
NAME
perlform − Perl formats
DESCRIPTION
Perl has a mechanism to help you generate simple reports and charts. To facilitate this, Perl helps you code
up your output page close to how it will look when it‘s printed. It can keep track of things like how many
lines are on a page, what page you‘re on, when to print page headers, etc. Keywords are borrowed from
FORTRAN: format() to declare and write() to execute; see their entries in perlfunc. Fortunately, the
layout is much more legible, more like BASIC‘s PRINT USING statement. Think of it as a poor man‘s
nroff(1).
Formats, like packages and subroutines, are declared rather than executed, so they may occur at any point in
your program. (Usually it‘s best to keep them all together though.) They have their own namespace apart
from all the other "types" in Perl. This means that if you have a function named "Foo", it is not the same
thing as having a format named "Foo". However, the default name for the format associated with a given
filehandle is the same as the name of the filehandle. Thus, the default format for STDOUT is named
"STDOUT", and the default format for filehandle TEMP is named "TEMP". They just look the same. They
aren‘t.
Output record formats are declared as follows:
format NAME =
FORMLIST
.
If name is omitted, format "STDOUT" is defined. FORMLIST consists of a sequence of lines, each of which
may be one of three types:
1. A comment, indicated by putting a '#' in the first column.
2. A "picture" line giving the format for one output line.
3. An argument line supplying values to plug into the previous picture line.
Picture lines are printed exactly as they look, except for certain fields that substitute values into the line.
Each field in a picture line starts with either "@" (at) or "^" (caret). These lines do not undergo any kind of
variable interpolation. The at field (not to be confused with the array marker @) is the normal kind of field;
the other kind, caret fields, are used to do rudimentary multi−line text block filling. The length of the field is
supplied by padding out the field with multiple "<", ">", or "|" characters to specify, respectively, left
justification, right justification, or centering. If the variable would exceed the width specified, it is truncated.
As an alternate form of right justification, you may also use "#" characters (with an optional ".") to specify a
numeric field. This way you can line up the decimal points. If any value supplied for these fields contains a
newline, only the text up to the newline is printed. Finally, the special field "@*" can be used for printing
multi−line, nontruncated values; it should appear by itself on a line.
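To see the field types in action without declaring a full format, one can feed single picture lines to formline(). This is only a sketch: fill_picture is a hypothetical helper name, not part of Perl, and the pictures are single-quoted so that no "@" can be mistaken for an array.

```perl
use strict;
use warnings;

# Hypothetical helper: run one picture through formline() and return
# the accumulated text instead of printing it.
sub fill_picture {
    my ($picture, @values) = @_;
    $^A = "";                       # clear the format accumulator
    formline($picture, @values);
    my $text = $^A;
    $^A = "";
    return $text;
}

# Left-justified, centered and right-justified fields, four columns each.
my $aligned   = fill_picture('@<<< @||| @>>>' . "\n", 1, 2, 3);

# A value wider than its field is truncated to the field width.
my $truncated = fill_picture('@<<<' . "\n", "abcdefgh");

# A numeric field with two decimal places.
my $numeric   = fill_picture('@##.##' . "\n", 3.14159);

print $aligned, $truncated, $numeric;
```

The same pictures behave identically inside a format declaration; formline() is simply the lower-level entry point.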
The values are specified on the following line in the same order as the picture fields. The expressions
providing the values should be separated by commas. The expressions are all evaluated in a list context
before the line is processed, so a single list expression could produce multiple list elements. The expressions
may be spread out to more than one line if enclosed in braces. If so, the opening brace must be the first
token on the first line. If an expression evaluates to a number with a decimal part, and if the corresponding
picture specifies that the decimal part should appear in the output (that is, any picture except multiple "#"
characters without an embedded "."), the character used for the decimal point is always determined by the
current LC_NUMERIC locale. This means that, if, for example, the run−time environment happens to
specify a German locale, "," will be used instead of the default ".". See perllocale and "WARNINGS" for
more information.
Picture fields that begin with ^ rather than @ are treated specially. With a # field, the field is blanked out if
the value is undefined. For other field types, the caret enables a kind of fill mode. Instead of an arbitrary
expression, the value supplied must be a scalar variable name that contains a text string. Perl puts as much
text as it can into the field, and then chops off the front of the string so that the next time the variable is
referenced, more of the text can be printed. (Yes, this means that the variable itself is altered during
execution of the write() call, and is not returned.) Normally you would use a sequence of fields in a
vertical stack to print out a block of text. You might wish to end the final field with the text "...", which will
appear in the output if the text was too long to appear in its entirety. You can change which characters are
legal to break on by changing the variable $: (that‘s $FORMAT_LINE_BREAK_CHARACTERS if you‘re
using the English module) to a list of the desired characters.
Using caret fields can produce variable length records. If the text to be formatted is short, you can suppress
blank lines by putting a "~" (tilde) character anywhere in the line. The tilde will be translated to a space
upon output. If you put a second tilde contiguous to the first, the line will be repeated until all the fields on
the line are exhausted. (If you use a field of the at variety, the expression you supply had better not give the
same value every time forever!)
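As a rough illustration of fill mode, the following sketch wraps a string into a ten-column field by calling formline() directly. The sample text and variable names are assumptions made for the example; the point to notice is that $text is consumed as the field is filled.

```perl
use strict;
use warnings;

my $text     = "The quick brown fox jumps over the lazy dog";
my $original = $text;

# One caret field, ten columns wide. The "~~" repeats the line until
# the field has used up all of $text, chopping $text as it goes.
$^A = "";
formline('^<<<<<<<<< ~~' . "\n", $text);
my $wrapped = $^A;
$^A = "";

print $wrapped;
```

Because the variable is altered, keep a copy (as $original does here) if the full text is needed afterwards.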
Top−of−form processing is by default handled by a format with the same name as the current filehandle with
"_TOP" concatenated to it. It‘s triggered at the top of each page. See write.
Examples:
# a report on the /etc/passwd file
format STDOUT_TOP =
                        Passwd File
Name                Login    Office   Uid   Gid Home
------------------------------------------------------------------
.
format STDOUT =
@<<<<<<<<<<<<<<<<<< @||||||| @<<<<<<@>>>> @>>>> @<<<<<<<<<<<<<<<<<
$name,              $login,  $office,$uid,$gid, $home
.
# a report from a bug report form
format STDOUT_TOP =
                        Bug Reports
@<<<<<<<<<<<<<<<<<<<<<<<     @|||         @>>>>>>>>>>>>>>>>>>>>>>>
$system,                      $%,         $date
------------------------------------------------------------------
.
format STDOUT =
Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
         $subject
Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
       $index,                       $description
Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
          $priority,        $date,   $description
From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      $from,                         $description
Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
             $programmer,            $description
~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                     $description
~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                     $description
~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                     $description
~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                     $description
~                                    ^<<<<<<<<<<<<<<<<<<<<<<<...
                                     $description
.
It is possible to intermix print()s with write()s on the same output channel, but you‘ll have to handle
$− ($FORMAT_LINES_LEFT) yourself.
Format Variables
The current format name is stored in the variable $~ ($FORMAT_NAME), and the current top of form
format name is in $^ ($FORMAT_TOP_NAME). The current output page number is stored in $%
($FORMAT_PAGE_NUMBER), and the number of lines on the page is in $=
($FORMAT_LINES_PER_PAGE). Whether to autoflush output on this handle is stored in $|
($OUTPUT_AUTOFLUSH). The string output before each top of page (except the first) is stored in $^L
($FORMAT_FORMFEED). These variables are set on a per−filehandle basis, so you‘ll need to select()
into a different one to affect them:
select((select(OUTF),
$~ = "My_Other_Format",
$^ = "My_Top_Format"
)[0]);
Pretty ugly, eh? It's a common idiom though, so don't be too surprised when you see it. You can at least
use a temporary variable to hold the previous filehandle (this is a much better approach in general: not
only does legibility improve, you also have an intermediate stage in the expression to single-step the
debugger through):
$ofh = select(OUTF);
$~ = "My_Other_Format";
$^ = "My_Top_Format";
select($ofh);
If you use the English module, you can even read the variable names:
use English;
$ofh = select(OUTF);
$FORMAT_NAME     = "My_Other_Format";
$FORMAT_TOP_NAME = "My_Top_Format";
select($ofh);
But you still have those funny select()s. So just use the FileHandle module. Now, you can access these
special variables using lowercase method names instead:
use FileHandle;
format_name     OUTF "My_Other_Format";
format_top_name OUTF "My_Top_Format";
Much better!
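Putting the pieces together, here is a small self-contained sketch that attaches a body format and a top-of-form format to a filehandle through method calls and then reads the report back. The format names PASSWD and PASSWD_TOP, the sample records and the use of a temporary file are all assumptions made for the example.

```perl
use strict;
use warnings;
use IO::Handle;                 # provides format_name/format_top_name
use File::Temp qw(tempfile);

our ($name, $uid);              # variables the formats refer to

format PASSWD_TOP =
Name       Uid
---------------
.
format PASSWD =
@<<<<<<<<< @>>>
$name,     $uid
.

my ($fh, $path) = tempfile(UNLINK => 1);
$fh->format_name("PASSWD");         # same effect as setting $~ for $fh
$fh->format_top_name("PASSWD_TOP"); # same effect as setting $^ for $fh

for my $record (["root", 0], ["daemon", 1]) {
    ($name, $uid) = @$record;
    write($fh);                     # first write also emits the header
}
close $fh or die "can't close $path: $!";

open(my $in, "<", $path) or die "can't read $path: $!";
my @report = <$in>;
close $in;
print @report;
```

No select() is needed here because the methods set the per-filehandle variables on the handle they are called on.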
NOTES
Because the values line may contain arbitrary expressions (for at fields, not caret fields), you can farm out
more sophisticated processing to other functions, like sprintf() or one of your own. For example:
format Ident =
@<<<<<<<<<<<<<<<
&commify($n)
.
To get a real at or caret into the field, do this:
format Ident =
I have an @ here.
"@"
.
To center a whole line of text, do something like this:
format Ident =
@|||||||||||||||||||||||||||||||||||||||||||||||
"Some text line"
.
There is no builtin way to say "float this to the right hand side of the page, however wide it is." You have to
specify where it goes. The truly desperate can generate their own format on the fly, based on the current
number of columns, and then eval() it:
$format  = "format STDOUT = \n"
         . '^' . '<' x $cols . "\n"
         . '$entry' . "\n"
         . "\t^" . "<" x ($cols-8) . "~~\n"
         . '$entry' . "\n"
         . ".\n";
print $format if $Debugging;
eval $format;
die $@ if $@;
Which would generate a format looking something like this:
format STDOUT =
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$entry
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~
$entry
.
Here‘s a little program that‘s somewhat like fmt(1):
format =
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~~
$_
.
$/ = '';
while (<>) {
s/\s*\n\s*/ /g;
write;
}
Footers
While $FORMAT_TOP_NAME contains the name of the current header format, there is no corresponding
mechanism to automatically do the same thing for a footer. Not knowing how big a format is going to be
until you evaluate it is one of the major problems. It‘s on the TODO list.
Here‘s one strategy: If you have a fixed−size footer, you can get footers by checking
$FORMAT_LINES_LEFT before each write() and print the footer yourself if necessary.
Here‘s another strategy: Open a pipe to yourself, using open(MYSELF, "|−") (see open()) and always
write() to MYSELF instead of STDOUT. Have your child process massage its STDIN to rearrange
headers and footers however you like. Not very convenient, but doable.
Accessing Formatting Internals
For low-level access to the formatting mechanism, you may use formline() and access $^A (the
$ACCUMULATOR variable) directly.
For example:
$str = formline <<'END', 1,2,3;
@<<< @||| @>>>
END
print "Wow, I just stored ‘$^A’ in the accumulator!\n";
Or to make an swrite() subroutine, which is to write() what sprintf() is to printf(), do this:
use Carp;
sub swrite {
croak "usage: swrite PICTURE ARGS" unless @_;
my $format = shift;
$^A = "";
formline($format,@_);
return $^A;
}
$string = swrite(<<'END', 1, 2, 3);
Check me out
@<<< @||| @>>>
END
print $string;
WARNINGS
The lone dot that ends a format can also prematurely end a mail message passing through a misconfigured
Internet mailer (and based on experience, such misconfiguration is the rule, not the exception). So when
sending format code through mail, you should indent it so that the format−ending dot is not on the left
margin; this will prevent SMTP cutoff.
Lexical variables (declared with "my") are not visible within a format unless the format is declared within
the scope of the lexical variable. (They weren‘t visible at all before version 5.001.)
Formats are the only part of Perl that unconditionally use information from a program‘s locale; if a
program‘s environment specifies an LC_NUMERIC locale, it is always used to specify the decimal point
character in formatted output. Perl ignores all other aspects of locale handling unless the use locale
pragma is in effect. Formatted output cannot be controlled by use locale because the pragma is tied to
the block structure of the program, and, for historical reasons, formats exist outside that block structure. See
perllocale for further discussion of locale handling.
NAME
perlipc − Perl interprocess communication (signals, fifos, pipes, safe subprocesses, sockets, and semaphores)
DESCRIPTION
The basic IPC facilities of Perl are built out of the good old Unix signals, named pipes, pipe opens, the
Berkeley socket routines, and SysV IPC calls. Each is used in slightly different situations.
Signals
Perl uses a simple signal handling model: the %SIG hash contains names or references of user−installed
signal handlers. These handlers will be called with an argument which is the name of the signal that
triggered it. A signal may be generated intentionally from a particular keyboard sequence like control−C or
control−Z, sent to you from another process, or triggered automatically by the kernel when special events
transpire, like a child process exiting, your process running out of stack space, or hitting a file size limit.
For example, to trap an interrupt signal, set up a handler like this. Do as little as you possibly can in your
handler; notice how all we do is set a global variable and then raise an exception. That‘s because on most
systems, libraries are not re−entrant; particularly, memory allocation and I/O routines are not. That means
that doing nearly anything in your handler could in theory trigger a memory fault and subsequent core dump.
sub catch_zap {
my $signame = shift;
$shucks++;
die "Somebody sent me a SIG$signame";
}
$SIG{INT} = 'catch_zap'; # could fail in modules
$SIG{INT} = \&catch_zap; # best strategy
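A quick way to watch a handler receive the signal name is to send the process a signal itself. This sketch assumes a Unix-like system where SIGUSR1 exists; the variable name and the short polling loop are illustrative choices.

```perl
use strict;
use warnings;

my $caught;                             # set by the handler, read later

# The handler's only argument is the signal name, without the "SIG".
$SIG{USR1} = sub { $caught = shift };

kill USR1 => $$;                        # send the signal to ourselves

# Delivery is normally immediate, but poll briefly rather than assume it.
for (1 .. 50) {
    last if defined $caught;
    select(undef, undef, undef, 0.02);
}

print "caught SIG$caught\n" if defined $caught;
```

As advised above, the handler does nothing more than record a value.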
The names of the signals are the ones listed out by kill −l on your system, or you can retrieve them from
the Config module. Set up an @signame list indexed by number to get the name and a %signo table indexed
by name to get the number:
use Config;
defined $Config{sig_name} || die "No sigs?";
foreach $name (split(' ', $Config{sig_name})) {
$signo{$name} = $i;
$signame[$i] = $name;
$i++;
}
So to check whether signal 17 and SIGALRM were the same, do just this:
print "signal #17 = $signame[17]\n";
if ($signo{ALRM}) {
print "SIGALRM is $signo{ALRM}\n";
}
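The loop above relies on the names being listed in numeric order. A variant sketch, assuming your Config also provides sig_num (a parallel list of signal numbers), pairs each name with its number explicitly:

```perl
use strict;
use warnings;
use Config;

defined $Config{sig_name} or die "No sigs?";
defined $Config{sig_num}  or die "No signal numbers?";

my @names = split ' ', $Config{sig_name};
my @nums  = split ' ', $Config{sig_num};

my (%signo, @signame);
for my $i (0 .. $#names) {
    $signo{ $names[$i] } = $nums[$i];
    # Some systems list aliases (for example CLD and CHLD) that share
    # a number; keep the first name seen for each number.
    $signame[ $nums[$i] ] = $names[$i]
        unless defined $signame[ $nums[$i] ];
}

print "SIGALRM is $signo{ALRM}\n" if defined $signo{ALRM};
```

Testing with defined rather than plain truth also copes with a signal whose number happens to be zero.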
You may also choose to assign the strings ‘IGNORE’ or ‘DEFAULT’ as the handler, in which case Perl
will try to discard the signal or do the default thing. Some signals can be neither trapped nor ignored, such as
the KILL and STOP (but not the TSTP) signals. One strategy for temporarily ignoring signals is to use a
local() statement, which will be automatically restored once your block is exited. (Remember that
local() values are "inherited" by functions called from within that block.)
sub precious {
local $SIG{INT} = 'IGNORE';
&more_functions;
}
sub more_functions {
# interrupts still ignored, for now...
}
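The scoping can be checked with a small sketch that records the handler setting as seen from the called function and again after the block has exited. The variables $seen_inside and $seen_after are names invented for the example.

```perl
use strict;
use warnings;

$SIG{INT} = 'DEFAULT';
my $seen_inside;

sub more_functions {
    # Called from precious(), so interrupts are still ignored here.
    $seen_inside = $SIG{INT};
}

sub precious {
    local $SIG{INT} = 'IGNORE';
    more_functions();
}

precious();
my $seen_after = $SIG{INT};     # restored once precious() returned

print "inside: $seen_inside, after: $seen_after\n";
```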
Sending a signal to a negative process ID means that you send the signal to the entire Unix process−group.
This code sends a hang−up signal to all processes in the current process group (and sets $SIG{HUP} to
IGNORE so it doesn‘t kill itself):
{
    local $SIG{HUP} = 'IGNORE';
    kill HUP => -$$;
    # snazzy writing of: kill('HUP', -$$)
}
Another interesting signal to send is signal number zero. This doesn‘t actually affect another process, but
instead checks whether it‘s alive or has changed its UID.
unless (kill 0 => $kid_pid) {
warn "something wicked happened to $kid_pid";
}
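A minimal sketch of such a liveness check, assuming a Unix-like system with fork(): is_alive is a hypothetical helper, and the child exits at once so that the check can be made both for a live process (ourselves) and for one that has already been reaped.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: true if signal 0 can be delivered to $pid.
sub is_alive {
    my $pid = shift;
    return kill(0 => $pid) ? 1 : 0;
}

my $kid_pid = fork();
die "cannot fork: $!" unless defined $kid_pid;
POSIX::_exit(0) if $kid_pid == 0;       # child: leave immediately

waitpid($kid_pid, 0);                   # reap the child

my $self_alive = is_alive($$);          # we are certainly running
my $kid_alive  = is_alive($kid_pid);    # reaped, so normally gone

print "self: $self_alive, kid: $kid_alive\n";
```

A false result can also mean the process exists but belongs to another user, which is the "changed its UID" case mentioned above.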
You might also want to employ anonymous functions for simple signal handlers:
$SIG{INT} = sub { die "\nOutta here!\n" };
But that will be problematic for the more complicated handlers that need to reinstall themselves. Because
Perl‘s signal mechanism is currently based on the signal(3) function from the C library, you may sometimes
be so misfortunate as to run on systems where that function is "broken", that is, it behaves in the old
unreliable SysV way rather than the newer, more reasonable BSD and POSIX fashion. So you‘ll see
defensive people writing signal handlers like this:
sub REAPER {
$waitedpid = wait;
# loathe sysV: it makes us not only reinstate
# the handler, but place it after the wait
$SIG{CHLD} = \&REAPER;
}
$SIG{CHLD} = \&REAPER;
# now do something that forks...
or even the more elaborate:
use POSIX ":sys_wait_h";
sub REAPER {
    my $child;
    # waitpid returns -1 once no children are left, so test for > 0
    while (($child = waitpid(-1,WNOHANG)) > 0) {
        $Kid_Status{$child} = $?;
    }
    $SIG{CHLD} = \&REAPER; # still loathe sysV
}
$SIG{CHLD} = \&REAPER;
# do something that forks...
Signal handling is also used for timeouts in Unix. While safely protected within an eval{} block, you set
a signal handler to trap alarm signals and then schedule to have one delivered to you in some number of
seconds. Then try your blocking operation, clearing the alarm when it‘s done but not before you‘ve exited
your eval{} block. If it goes off, you‘ll use die() to jump out of the block, much as you might using
longjmp() or throw() in other languages.
Here‘s an example:
eval {
local $SIG{ALRM} = sub { die "alarm clock restart" };
alarm 10;
flock(FH, 2);    # blocking write lock
alarm 0;
};
if ($@ and $@ !~ /alarm clock restart/) { die }
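The same pattern can be wrapped in a reusable function. This is a sketch only: with_timeout is a hypothetical name, and a four-argument select() stands in for the blocking operation.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $secs seconds.
# Returns 'ok' or 'timeout'; any other exception is rethrown.
sub with_timeout {
    my ($secs, $code) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "alarm clock restart\n" };
        alarm $secs;
        $code->();
        alarm 0;                # finished in time: cancel the alarm
        1;
    };
    alarm 0;                    # make sure no alarm is left pending
    return 'ok'      if $ok;
    return 'timeout' if $@ eq "alarm clock restart\n";
    die $@;                     # some unrelated failure
}

my $fast = with_timeout(5, sub { 1 });
my $slow = with_timeout(1, sub { select(undef, undef, undef, 10) });

print "fast: $fast, slow: $slow\n";
```

Comparing $@ against the exact message keeps genuine errors from being mistaken for a timeout.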
For more complex signal handling, you might see the standard POSIX module. Lamentably, this is almost
entirely undocumented, but the t/lib/posix.t file from the Perl source distribution has some examples in it.
Named Pipes
A named pipe (often referred to as a FIFO) is an old Unix IPC mechanism for processes communicating on
the same machine. It works just like regular, connected anonymous pipes, except that the processes
rendezvous using a filename and don‘t have to be related.
To create a named pipe, use the Unix command mknod(1) or on some systems, mkfifo(1). These may not be
in your normal path.
# system return val is backwards, so && not ||
#
$ENV{PATH} .= ":/etc:/usr/etc";
if  (      system('mknod',  $path, 'p')
        && system('mkfifo', $path) )
{
    die "mk{nod,fifo} $path failed";
}
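Where the POSIX module is available, its mkfifo() function avoids running an external command altogether. The sketch below is an assumption-laden illustration: it creates the fifo in a temporary directory and forks a writer so that both ends of the pipe can be exercised in one script.

```perl
use strict;
use warnings;
use POSIX qw(mkfifo);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/demo.fifo";            # illustrative location

mkfifo($path, 0600) or die "mkfifo $path failed: $!";
my $is_fifo = -p $path ? 1 : 0;         # the -p file test recognizes it

my $pid = fork();
die "cannot fork: $!" unless defined $pid;

if ($pid == 0) {                        # child: the writing end
    open(my $out, ">", $path) or POSIX::_exit(1);
    print $out "hello\n";
    close $out;
    POSIX::_exit(0);                    # skip the parent's cleanup code
}

# parent: this open blocks until the writer has opened its end
open(my $in, "<", $path) or die "can't read $path: $!";
my $line = <$in>;
close $in;
waitpid($pid, 0);

print $line;
```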
A fifo is convenient when you want to connect a process to an unrelated one. When you open a fifo, the
program will block until there‘s something on the other end.
For example, let‘s say you‘d like to have your .signature file be a named pipe that has a Perl program on the
other end. Now every time any program (like a mailer, news reader, finger program, etc.) tries to read from
that file, the reading program will block and your program will supply the new signature. We‘ll use the
pipe−checking file test −p to find out whether anyone (or anything) has accidentally removed our fifo.
chdir; # go home
$FIFO = '.signature';
$ENV{PATH} .= ":/etc:/usr/games";
while (1) {
    unless (-p $FIFO) {
        unlink $FIFO;
        system('mknod', $FIFO, 'p')
            && die "can't mknod $FIFO: $!";
    }
    # next line blocks until there's a reader
    open (FIFO, "> $FIFO") || die "can't write $FIFO: $!";
    print FIFO "John Smith (smith\@host.org)\n", `fortune -s`;
    close FIFO;
    sleep 2;    # to avoid dup signals
}
WARNING
By installing Perl code to deal with signals, you‘re exposing yourself to danger from two things. First, few
system library functions are re−entrant. If the signal interrupts while Perl is executing one function (like
malloc(3) or printf(3)), and your signal handler then calls the same function again, you could get
unpredictable behavior—often, a core dump. Second, Perl isn‘t itself re−entrant at the lowest levels. If the
signal interrupts Perl while Perl is changing its own internal data structures, similarly unpredictable
behaviour may result.
There are two things you can do, knowing this: be paranoid or be pragmatic. The paranoid approach is to do
as little as possible in your signal handler. Set an existing integer variable that already has a value, and
return. This doesn‘t help you if you‘re in a slow system call, which will just restart. That means you have to
die to longjmp(3) out of the handler. Even this is a little cavalier for the true paranoiac, who avoids die
in a handler because the system is out to get you. The pragmatic approach is to say ‘‘I know the risks, but
prefer the convenience‘’, and to do anything you want in your signal handler, prepared to clean up core
dumps now and again.
To forbid signal handlers altogether would bar you from many interesting programs, including virtually
everything in this manpage, since you could no longer even write SIGCHLD handlers. Their dodginess is
expected to be addressed in the 5.005 release.
Using open() for IPC
Perl‘s basic open() statement can also be used for unidirectional interprocess communication by either
appending or prepending a pipe symbol to the second argument to open(). Here‘s how to start something
up in a child process you intend to write to:
open(SPOOLER, "| cat -v | lpr -h 2>/dev/null")
    || die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close SPOOLER || die "bad spool: $! $?";
And here‘s how to start up a child process you intend to read from:
open(STATUS, "netstat -an 2>&1 |")
    || die "can't fork: $!";
while (<STATUS>) {
    next if /^(tcp|udp)/;
    print;
}
close STATUS || die "bad netstat: $! $?";
If one can be sure that a particular program is a Perl script that is expecting filenames in @ARGV, the clever
programmer can write something like this:
% program f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile
and irrespective of which shell it‘s called from, the Perl program will read from the file f1, the process cmd1,
standard input (tmpfile in this case), the f2 file, the cmd2 command, and finally the f3 file. Pretty nifty, eh?
You might notice that you could use backticks for much the same effect as opening a pipe for reading:
print grep { !/^(tcp|udp)/ } `netstat -an 2>&1`;
die "bad netstat" if $?;
While this is true on the surface, it‘s much more efficient to process the file one line or record at a time
because then you don‘t have to read the whole thing into memory at once. It also gives you finer control of
the whole process, letting you kill off the child process early if you'd like.
Be careful to check both the open() and the close() return values. If you‘re writing to a pipe, you
should also trap SIGPIPE. Otherwise, think of what happens when you start up a pipe to a command that
doesn‘t exist: the open() will in all likelihood succeed (it only reflects the fork()‘s success), but then
your output will fail—spectacularly. Perl can‘t know whether the command worked because your command
is actually running in a separate process whose exec() might have failed. Therefore, while readers of
bogus commands return just a quick end of file, writers to bogus commands will trigger a signal they'd better
be prepared to handle. Consider:
open(FH, "|bogus")  or die "can't fork: $!";
print FH "bang\n"   or die "can't write: $!";
close FH            or die "can't close: $!";
That won‘t blow up until the close, and it will blow up with a SIGPIPE. To catch it, you could use this:
$SIG{PIPE} = 'IGNORE';
open(FH, "|bogus")  or die "can't fork: $!";
print FH "bang\n"   or die "can't write: $!";
close FH            or die "can't close: status=$?";
Filehandles
Both the main process and any child processes it forks share the same STDIN, STDOUT, and STDERR
filehandles. If both processes try to access them at once, strange things can happen. You'll certainly want to
flush any stdio output buffers before forking. You may also want to close or reopen the filehandles for the
child. You can get around this by opening your pipe with open(), but on some systems this means that the
child process cannot outlive the parent.
Background Processes
You can run a command in the background with:
system("cmd &");
The command‘s STDOUT and STDERR (and possibly STDIN, depending on your shell) will be the same as
the parent‘s. You won‘t need to catch SIGCHLD because of the double−fork taking place (see below for
more details).
Complete Dissociation of Child from Parent
In some cases (starting server processes, for instance) you'll want to completely dissociate the child process
from the parent. The easiest way is to use:
use POSIX qw(setsid);
setsid()    or die "Can't start a new session: $!";
However, you may not be on POSIX. The following process is reported to work on most Unixish systems.
Non−Unix users should check their Your_OS::Process module for other solutions.
Open /dev/tty and use the TIOCNOTTY ioctl on it. See tty(4) for details.
Change directory to /
Reopen STDIN, STDOUT, and STDERR so they‘re not connected to the old tty.
Background yourself like this:
fork && exit;
Ignore hangup signals in case you‘re running on a shell that doesn‘t automatically no−hup you:
$SIG{HUP} = 'IGNORE';    # or whatever you'd like
Safe Pipe Opens
Another interesting approach to IPC is making your single program go multiprocess and communicate
between (or even amongst) yourselves. The open() function will accept a file argument of either "−|" or
"|−" to do a very interesting thing: it forks a child connected to the filehandle you‘ve opened. The child is
running the same program as the parent. This is useful for safely opening a file when running under an
assumed UID or GID, for example. If you open a pipe to minus, you can write to the filehandle you opened
and your kid will find it in his STDIN. If you open a pipe from minus, you can read from the filehandle you
opened whatever your kid writes to his STDOUT.
use English;
my $sleep_count = 0;
do {
    $pid = open(KID_TO_WRITE, "|-");
    unless (defined $pid) {
        warn "cannot fork: $!";
        die "bailing out" if $sleep_count++ > 6;
        sleep 10;
    }
} until defined $pid;
if ($pid) {  # parent
    print KID_TO_WRITE @some_data;
    close(KID_TO_WRITE) || warn "kid exited $?";
} else {     # child
    ($EUID, $EGID) = ($UID, $GID); # suid progs only
    open (FILE, "> /safe/file")
        || die "can't open /safe/file: $!";
    while (<STDIN>) {
        print FILE; # child's STDIN is parent's KID
    }
    exit;  # don't forget this
}
Another common use for this construct is when you need to execute something without the shell‘s
interference. With system(), it‘s straightforward, but you can‘t use a pipe open or backticks safely. That‘s
because there‘s no way to stop the shell from getting its hands on your arguments. Instead, use lower−level
control to call exec() directly.
Here‘s a safe backtick or pipe open for read:
# add error processing as above
$pid = open(KID_TO_READ, "-|");
if ($pid) {   # parent
    while (<KID_TO_READ>) {
        # do something interesting
    }
    close(KID_TO_READ) || warn "kid exited $?";
} else {      # child
    ($EUID, $EGID) = ($UID, $GID); # suid only
    exec($program, @options, @args)
        || die "can't exec program: $!";
    # NOTREACHED
}
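To make the point about the shell concrete, here is a runnable sketch of the same read pattern. It assumes an echo program can be found in the path; the arguments contain shell metacharacters on purpose, and they reach echo untouched because no shell is involved.

```perl
use strict;
use warnings;

my @output;
my $pid = open(my $kid, "-|");      # implicit fork; child's STDOUT is $kid
die "cannot fork: $!" unless defined $pid;

if ($pid) {                         # parent
    @output = <$kid>;
    close($kid) or warn "kid exited $?";
} else {                            # child
    # List-form exec: the ';' and '*' are passed through literally.
    exec("echo", "hello; ls", "*")
        or die "can't exec echo: $!";
}

print @output;
```

Had the same words been handed to backticks as one string, the shell would have run ls and expanded the asterisk.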
And here‘s a safe pipe open for writing:
# add error processing as above
$pid = open(KID_TO_WRITE, "|-");
$SIG{PIPE} = sub { die "whoops, $program pipe broke" };
if ($pid) {  # parent
    for (@data) {
        print KID_TO_WRITE;
    }
    close(KID_TO_WRITE) || warn "kid exited $?";
} else {     # child
    ($EUID, $EGID) = ($UID, $GID);
    exec($program, @options, @args)
        || die "can't exec program: $!";
    # NOTREACHED
}
Note that these operations are full Unix forks, which means they may not be correctly implemented on alien
systems. Additionally, these are not true multithreading. If you‘d like to learn more about threading, see the
modules file mentioned below in the SEE ALSO section.
Bidirectional Communication with Another Process
While this works reasonably well for unidirectional communication, what about bidirectional
communication? The obvious thing you‘d like to do doesn‘t actually work:
open(PROG_FOR_READING_AND_WRITING, "| some program |")
and if you forget to use the -w flag, then you'll miss out entirely on the diagnostic message:
Can't do bidirectional pipe at -e line 1.
If you really want to, you can use the standard open2() library function to catch both ends. There‘s also
an open3() for tridirectional I/O so you can also catch your child‘s STDERR, but doing so would then
require an awkward select() loop and wouldn‘t allow you to use normal Perl input operations.
If you look at its source, you‘ll see that open2() uses low−level primitives like Unix pipe() and
exec() calls to create all the connections. While it might have been slightly more efficient by using
socketpair(), it would have then been even less portable than it already is. The open2() and
open3() functions are unlikely to work anywhere except on a Unix system or some other one purporting
to be POSIX compliant.
Here‘s an example of using open2():
use FileHandle;
use IPC::Open2;
$pid = open2(*Reader, *Writer, "cat -u -n" );
Writer->autoflush(); # default here, actually
print Writer "stuff\n";
$got = <Reader>;
The problem with this is that Unix buffering is really going to ruin your day. Even though your Writer
filehandle is auto−flushed, and the process on the other end will get your data in a timely manner, you can‘t
usually do anything to force it to give it back to you in a similarly quick fashion. In this case, we could,
because we gave cat a −u flag to make it unbuffered. But very few Unix commands are designed to operate
over pipes, so this seldom works unless you yourself wrote the program on the other end of the
double−ended pipe.
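When the program on the other end buffers its output, one pattern that often still works is to send all of your input, close the write side, and only then read. The sketch below assumes a standard sort(1) on your PATH and a small amount of data; with large inputs the two processes can deadlock once the pipes fill up, so treat it as an illustration rather than a general cure.

```perl
#!/usr/bin/perl -w
use strict;
use IPC::Open2;

# Assumption: a standard sort(1) is on the PATH.
my $pid = open2(\*Reader, \*Writer, 'sort');
print Writer "pear\n", "apple\n", "fig\n";
close(Writer);                # EOF tells sort that its input is complete
my @sorted = <Reader>;        # sort now writes its output and exits
close(Reader);
waitpid($pid, 0);             # open2 does not reap the child for us
print @sorted;
```

This is not a conversation, of course: you get exactly one exchange, which is why interactive use calls for the pseudo-tty approaches.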
A solution to this is the nonstandard Comm.pl library. It uses pseudo−ttys to make your program behave
more reasonably:
require 'Comm.pl';
$ph = open_proc('cat -n');
for (1..10) {
    print $ph "a line\n";
    print "got back ", scalar <$ph>;
}
This way you don‘t have to have control over the source code of the program you‘re using. The Comm
library also has expect() and interact() functions. Find the library (and we hope its successor
IPC::Chat) at your nearest CPAN archive as detailed in the SEE ALSO section below.
The newer Expect.pm module from CPAN also addresses this kind of thing. This module requires two other
modules from CPAN: IO::Pty and IO::Stty. It sets up a pseudo−terminal to interact with programs that insist
on talking to the terminal device driver. If your system is amongst those supported, this may be your
best bet.
Bidirectional Communication with Yourself
If you want, you may make low-level pipe() and fork() calls to stitch this together by hand. This example
only talks to itself, but you could reopen the appropriate handles to STDIN and STDOUT and call other
processes.
#!/usr/bin/perl -w
# pipe1 - bidirectional communication using two pipe pairs
#         designed for the socketpair-challenged
use IO::Handle; # thousands of lines just for autoflush :-(
pipe(PARENT_RDR, CHILD_WTR) or die "pipe: $!";
pipe(CHILD_RDR, PARENT_WTR) or die "pipe: $!";
CHILD_WTR->autoflush(1);
PARENT_WTR->autoflush(1);
if ($pid = fork) {
    close PARENT_RDR; close PARENT_WTR;
    print CHILD_WTR "Parent Pid $$ is sending this\n";
    chomp($line = <CHILD_RDR>);
    print "Parent Pid $$ just read this: '$line'\n";
    close CHILD_RDR; close CHILD_WTR;
    waitpid($pid,0);
} else {
    die "cannot fork: $!" unless defined $pid;
    close CHILD_RDR; close CHILD_WTR;
    chomp($line = <PARENT_RDR>);
    print "Child Pid $$ just read this: '$line'\n";
    print PARENT_WTR "Child Pid $$ is sending this\n";
    close PARENT_RDR; close PARENT_WTR;
    exit;
}
But you don‘t actually have to make two pipe calls. If you have the socketpair() system call, it will do
this all for you.
#!/usr/bin/perl -w
# pipe2 - bidirectional communication using socketpair
#         "the best ones always go both ways"
use Socket;
use IO::Handle; # thousands of lines just for autoflush :-(
# We say AF_UNIX because although *_LOCAL is the
# POSIX 1003.1g form of the constant, many machines
# still don't have it.
socketpair(CHILD, PARENT, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
CHILD->autoflush(1);
PARENT->autoflush(1);
if ($pid = fork) {
    close PARENT;
    print CHILD "Parent Pid $$ is sending this\n";
    chomp($line = <CHILD>);
    print "Parent Pid $$ just read this: '$line'\n";
    close CHILD;
    waitpid($pid,0);
} else {
    die "cannot fork: $!" unless defined $pid;
    close CHILD;
    chomp($line = <PARENT>);
    print "Child Pid $$ just read this: '$line'\n";
    print PARENT "Child Pid $$ is sending this\n";
    close PARENT;
    exit;
}
Sockets: Client/Server Communication
While not limited to Unix−derived operating systems (e.g., WinSock on PCs provides socket support, as do
some VMS libraries), you may not have sockets on your system, in which case this section probably isn‘t
going to do you much good. With sockets, you can do both virtual circuits (i.e., TCP streams) and datagrams
(i.e., UDP packets). You may be able to do even more depending on your system.
The Perl function calls for dealing with sockets have the same names as the corresponding system calls in C,
but their arguments tend to differ for two reasons: first, Perl filehandles work differently than C file
descriptors. Second, Perl already knows the length of its strings, so you don‘t need to pass that information.
One of the major problems with old socket code in Perl was that it used hard−coded values for some of the
constants, which severely hurt portability. If you ever see code that does anything like explicitly setting
$AF_INET = 2, you know you're in for big trouble. An immeasurably superior approach is to use the
Socket module, which more reliably grants access to various constants and functions you‘ll need.
If you‘re not writing a server/client for an existing protocol like NNTP or SMTP, you should give some
thought to how your server will know when the client has finished talking, and vice−versa. Most protocols
are based on one−line messages and responses (so one party knows the other has finished when a "\n" is
received) or multi−line messages and responses that end with a period on an empty line ("\n.\n" terminates a
message/response).
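As a rough illustration of the second convention, here is one way a reader for period-terminated messages might look. The name read_multiline() is invented for this sketch, a pipe stands in for the network connection, and real protocols such as SMTP add details (like doubling a leading period) that are not handled here.

```perl
#!/usr/bin/perl -w
use strict;

# read_multiline() is a hypothetical helper. It collects lines from a
# filehandle until it sees a line holding only a period, and returns
# the lines before it with their terminators removed.
sub read_multiline {
    my ($fh) = @_;
    my @lines;
    while (defined(my $line = <$fh>)) {
        $line =~ s/\015?\012\z//;     # accept "\015\012" or a lone "\012"
        last if $line eq '.';         # end-of-message marker
        push @lines, $line;
    }
    return @lines;
}

# Demonstrate on a pipe standing in for a network connection.
pipe(FROM_PEER, TO_US) or die "pipe: $!";
print TO_US "first line\015\012second line\015\012.\015\012next message\015\012";
close(TO_US);
my @message = read_multiline(\*FROM_PEER);
print scalar(@message), " lines: @message\n";
```

Whatever follows the terminating period stays unread on the handle, ready for the next call.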
Internet Line Terminators
The Internet line terminator is "\015\012". Under ASCII variants of Unix, that could usually be written as
"\r\n", but under other systems, "\r\n" might at times be "\015\015\012", "\012\012\015", or something
completely different. The standards specify writing "\015\012" to be conformant (be strict in what you
provide), but they also recommend accepting a lone "\012" on input (be lenient in what you require). We
haven‘t always been very good about that in the code in this manpage, but unless you‘re on a Mac, you‘ll
probably be ok.
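To make the strict-when-writing, lenient-when-reading rule concrete, here is a small sketch. The helper names to_wire() and from_wire() are invented for this example; the octal escapes are used precisely so that the result does not depend on what "\r" and "\n" mean locally.

```perl
#!/usr/bin/perl -w
use strict;

my $CRLF = "\015\012";

# to_wire() and from_wire() are hypothetical helper names.
# Strict on output: every line gets exactly "\015\012".
sub to_wire {
    my @lines = @_;
    return join '', map { $_ . $CRLF } @lines;
}

# Lenient on input: split on "\015\012" or on a lone "\012".
sub from_wire {
    my ($data) = @_;
    return split /\015?\012/, $data;
}

my $sent     = to_wire("HELO example", "QUIT");
my @received = from_wire("250 ok\015\012221 bye\012");
printf "sent %d bytes; received: %s\n", length($sent), join(' | ', @received);
```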
Internet TCP Clients and Servers
Use Internet−domain sockets when you want to do client−server communication that might extend to
machines outside of your own system.
Here‘s a sample TCP client using Internet−domain sockets:
#!/usr/bin/perl -w
use strict;
use Socket;
my ($remote, $port, $iaddr, $paddr, $proto, $line);
$remote = shift || 'localhost';
$port = shift || 2345; # random port
if ($port =~ /\D/) { $port = getservbyname($port, 'tcp') }
die "No port" unless $port;
$iaddr = inet_aton($remote) || die "no host: $remote";
$paddr = sockaddr_in($port, $iaddr);
$proto = getprotobyname('tcp');
socket(SOCK, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
connect(SOCK, $paddr) || die "connect: $!";
while (defined($line = <SOCK>)) {
    print $line;
}
close (SOCK) || die "close: $!";
exit;
And here‘s a corresponding server to go along with it. We‘ll leave the address as INADDR_ANY so that the
kernel can choose the appropriate interface on multihomed hosts. If you want to sit on a particular interface
(like the external side of a gateway or firewall machine), you should fill this in with your real address
instead.
#!/usr/bin/perl -Tw
use strict;
BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
use Socket;
use Carp;
my $EOL = "\015\012";
sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }
my $port = shift || 2345;
my $proto = getprotobyname('tcp');
$port = $1 if $port =~ /(\d+)/; # untaint port number
socket(Server, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
setsockopt(Server, SOL_SOCKET, SO_REUSEADDR,
    pack("l", 1)) || die "setsockopt: $!";
bind(Server, sockaddr_in($port, INADDR_ANY)) || die "bind: $!";
listen(Server, SOMAXCONN) || die "listen: $!";
logmsg "server started on port $port";
my $paddr;
for ( ; $paddr = accept(Client,Server); close Client) {
    my($port,$iaddr) = sockaddr_in($paddr);
    my $name = gethostbyaddr($iaddr,AF_INET);
    logmsg "connection from $name [",
        inet_ntoa($iaddr), "] at port $port";
    print Client "Hello there, $name, it's now ",
        scalar localtime, $EOL;
}
And here‘s a multithreaded version. It‘s multithreaded in that like most typical servers, it spawns (forks) a
slave server to handle the client request so that the master server can quickly go back to service a new client.
#!/usr/bin/perl -Tw
use strict;
BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
use Socket;
use Carp;
my $EOL = "\015\012";
sub spawn; # forward declaration
sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }
my $port = shift || 2345;
my $proto = getprotobyname('tcp');
$port = $1 if $port =~ /(\d+)/; # untaint port number
socket(Server, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
setsockopt(Server, SOL_SOCKET, SO_REUSEADDR,
    pack("l", 1)) || die "setsockopt: $!";
bind(Server, sockaddr_in($port, INADDR_ANY)) || die "bind: $!";
listen(Server, SOMAXCONN) || die "listen: $!";
logmsg "server started on port $port";
my $waitedpid = 0;
my $paddr;
sub REAPER {
    $waitedpid = wait;
    $SIG{CHLD} = \&REAPER; # loathe sysV
    logmsg "reaped $waitedpid" . ($? ? " with exit $?" : '');
}
$SIG{CHLD} = \&REAPER;
for ( $waitedpid = 0;
      ($paddr = accept(Client,Server)) || $waitedpid;
      $waitedpid = 0, close Client)
{
    next if $waitedpid and not $paddr;
    my($port,$iaddr) = sockaddr_in($paddr);
    my $name = gethostbyaddr($iaddr,AF_INET);
    logmsg "connection from $name [",
        inet_ntoa($iaddr), "] at port $port";
    spawn sub {
        print "Hello there, $name, it's now ", scalar localtime, $EOL;
        exec '/usr/games/fortune' # XXX: 'wrong' line terminators
            or confess "can't exec fortune: $!";
    };
}
sub spawn {
    my $coderef = shift;
    unless (@_ == 0 && $coderef && ref($coderef) eq 'CODE') {
        confess "usage: spawn CODEREF";
    }
    my $pid;
    if (!defined($pid = fork)) {
        logmsg "cannot fork: $!";
        return;
    } elsif ($pid) {
        logmsg "begat $pid";
        return; # I'm the parent
    }
    # else I'm the child -- go spawn
    open(STDIN, "<&Client") || die "can't dup client to stdin";
    open(STDOUT, ">&Client") || die "can't dup client to stdout";
    ## open(STDERR, ">&STDOUT") || die "can't dup stdout to stderr";
    exit &$coderef();
}
This server takes the trouble to clone off a child version via fork() for each incoming request. That way it
can handle many requests at once, which you might not always want. Even if you don‘t fork(), the
listen() will allow that many pending connections. Forking servers have to be particularly careful about
cleaning up their dead children (called "zombies" in Unix parlance), because otherwise you‘ll quickly fill up
your process table.
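The REAPER() shown above collects one child per signal with wait(). A variation you may prefer, sketched here under the assumption that your system's POSIX module supplies WNOHANG, loops on a non-blocking waitpid() so that several children exiting at nearly the same moment are all collected. The %exit_code_for hash and the three throwaway children exist only for the demonstration.

```perl
#!/usr/bin/perl -w
use strict;
use POSIX ":sys_wait_h";

my %exit_code_for;     # pid => exit status, filled in by the handler

sub REAPER {
    # One SIGCHLD may stand for several dead children, so keep
    # collecting until waitpid() reports nothing more to reap.
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $exit_code_for{$kid} = $? >> 8;
    }
    $SIG{CHLD} = \&REAPER;   # reinstall, as the servers above do
}
$SIG{CHLD} = \&REAPER;

my @kids;
for my $code (1 .. 3) {
    my $pid = fork;
    die "cannot fork: $!" unless defined $pid;
    exit $code unless $pid;  # child: exit at once with a known status
    push @kids, $pid;
}

# sleep() returns early when a signal arrives, so this loop ends
# soon after the last child has been reaped.
sleep 1 while scalar(keys %exit_code_for) < scalar(@kids);
print "reaped: ", join(' ', map { "$_=$exit_code_for{$_}" } @kids), "\n";
```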
We suggest that you use the −T flag to use taint checking (see perlsec) even if we aren‘t running setuid or
setgid. This is always a good idea for servers and other programs run on behalf of someone else (like CGI
scripts), because it lessens the chances that people from the outside will be able to compromise your system.
Let‘s look at another TCP client. This one connects to the TCP "time" service on a number of different
machines and shows how far their clocks differ from the system on which it‘s being run:
#!/usr/bin/perl -w
use strict;
use Socket;
my $SECS_of_70_YEARS = 2208988800;
sub ctime { scalar localtime(shift) }
my $iaddr = gethostbyname('localhost');
my $proto = getprotobyname('tcp');
my $port = getservbyname('time', 'tcp');
my $paddr = sockaddr_in(0, $iaddr);
my($host);
$| = 1;
printf "%-24s %8s %s\n", "localhost", 0, ctime(time());
foreach $host (@ARGV) {
    printf "%-24s ", $host;
    my $hisiaddr = inet_aton($host) || die "unknown host";
    my $hispaddr = sockaddr_in($port, $hisiaddr);
    socket(SOCKET, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
    connect(SOCKET, $hispaddr) || die "connect: $!";
    my $rtime = '    ';
    read(SOCKET, $rtime, 4);
    close(SOCKET);
    my $histime = unpack("N", $rtime) - $SECS_of_70_YEARS;
    printf "%8d %s\n", $histime - time, ctime($histime);
}
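The magic number in that client deserves a word. The "time" service reports seconds since the start of 1900 as a 32-bit big-endian integer, while Perl's time() counts from the start of 1970. The sketch below recomputes the offset and decodes a simulated reply; decode_time_reply() is a name invented for this example, and no network is involved.

```perl
#!/usr/bin/perl -w
use strict;

# 70 years of 365 days plus 17 leap days (1904 .. 1968), in seconds.
my $SECS_of_70_YEARS = (70 * 365 + 17) * 86400;

# decode_time_reply() is a hypothetical helper name.
sub decode_time_reply {
    my ($raw) = @_;                       # 4 bytes as read from the socket
    return unpack("N", $raw) - $SECS_of_70_YEARS;
}

# Simulate a server whose clock reads one day past the Unix epoch.
my $reply = pack("N", $SECS_of_70_YEARS + 86400);
print "offset constant: $SECS_of_70_YEARS\n";
print "decoded Unix time: ", decode_time_reply($reply), "\n";
```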
Unix−Domain TCP Clients and Servers
That‘s fine for Internet−domain clients and servers, but what about local communications? While you can
use the same setup, sometimes you don‘t want to. Unix−domain sockets are local to the current host, and are
often used internally to implement pipes. Unlike Internet domain sockets, Unix domain sockets can show up
in the file system with an ls(1) listing.
% ls -l /dev/log
srw-rw-rw- 1 root 0 Oct 31 07:23 /dev/log
You can test for these with Perl‘s −S file test:
unless ( -S '/dev/log' ) {
    die "something's wicked with the print system";
}
Here‘s a sample Unix−domain client:
#!/usr/bin/perl -w
use Socket;
use strict;
my ($rendezvous, $line);
$rendezvous = shift || '/tmp/catsock';
socket(SOCK, PF_UNIX, SOCK_STREAM, 0) || die "socket: $!";
connect(SOCK, sockaddr_un($rendezvous)) || die "connect: $!";
while (defined($line = <SOCK>)) {
    print $line;
}
exit;
And here‘s a corresponding server. You don‘t have to worry about silly network terminators here because
Unix domain sockets are guaranteed to be on the localhost, and thus everything works right.
#!/usr/bin/perl -Tw
use strict;
use Socket;
use Carp;
BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }
my $NAME = '/tmp/catsock';
my $uaddr = sockaddr_un($NAME);
my $proto = getprotobyname('tcp');
socket(Server,PF_UNIX,SOCK_STREAM,0) || die "socket: $!";
unlink($NAME);
bind (Server, $uaddr) || die "bind: $!";
listen(Server,SOMAXCONN) || die "listen: $!";
logmsg "server started on $NAME";
my $waitedpid;
sub REAPER {
    $waitedpid = wait;
    $SIG{CHLD} = \&REAPER; # loathe sysV
    logmsg "reaped $waitedpid" . ($? ? " with exit $?" : '');
}
$SIG{CHLD} = \&REAPER;
for ( $waitedpid = 0;
      accept(Client,Server) || $waitedpid;
      $waitedpid = 0, close Client)
{
    next if $waitedpid;
    logmsg "connection on $NAME";
    spawn sub {
        print "Hello there, it's now ", scalar localtime, "\n";
        exec '/usr/games/fortune' or die "can't exec fortune: $!";
    };
}
As you see, it‘s remarkably similar to the Internet domain TCP server, so much so, in fact, that we‘ve
omitted several duplicate functions—spawn(), logmsg(), ctime(), and REAPER()—which are
exactly the same as in the other server.
So why would you ever want to use a Unix domain socket instead of a simpler named pipe? Because a
named pipe doesn‘t give you sessions. You can‘t tell one process‘s data from another‘s. With socket
programming, you get a separate session for each client: that‘s why accept() takes two arguments.
For example, let‘s say that you have a long running database server daemon that you want folks from the
World Wide Web to be able to access, but only if they go through a CGI interface. You‘d have a small,
simple CGI program that does whatever checks and logging you feel like, and then acts as a Unix−domain
client and connects to your private server.
TCP Clients with IO::Socket
For those preferring a higher−level interface to socket programming, the IO::Socket module provides an
object−oriented approach. IO::Socket is included as part of the standard Perl distribution as of the 5.004
release. If you're running an earlier version of Perl, just fetch IO::Socket from CPAN, where you'll also find
modules providing easy interfaces to the following systems: DNS, FTP, Ident (RFC 931), NIS and
NISPlus, NNTP, Ping, POP3, SMTP, SNMP, SSLeay, Telnet, and Time—just to name a few.
A Simple Client
Here‘s a client that creates a TCP connection to the "daytime" service at port 13 of the host name "localhost"
and prints out everything that the server there cares to provide.
#!/usr/bin/perl -w
use IO::Socket;
$remote = IO::Socket::INET->new(
    Proto    => "tcp",
    PeerAddr => "localhost",
    PeerPort => "daytime(13)",
)
    or die "cannot connect to daytime port at localhost";
while ( <$remote> ) { print }
When you run this program, you should get something back that looks like this:
Wed May 14 08:40:46 MDT 1997
Here are what those parameters to the new constructor mean:
Proto
This is which protocol to use. In this case, the socket handle returned will be connected to a TCP
socket, because we want a stream−oriented connection, that is, one that acts pretty much like a plain
old file. Not all sockets are of this type. For example, the UDP protocol can be used to make a
datagram socket, used for message−passing.
PeerAddr
This is the name or Internet address of the remote host the server is running on. We could have
specified a longer name like "www.perl.com", or an address like "204.148.40.9". For
demonstration purposes, we‘ve used the special hostname "localhost", which should always mean
the current machine you‘re running on. The corresponding Internet address for localhost is "127.1",
if you‘d rather use that.
PeerPort
This is the service name or port number we‘d like to connect to. We could have gotten away with using
just "daytime" on systems with a well−configured system services file,[FOOTNOTE: The system
services file is in /etc/services under Unix] but just in case, we‘ve specified the port number (13) in
parentheses. Using just the number would also have worked, but constant numbers make careful
programmers nervous.
Notice how the return value from the new constructor is used as a filehandle in the while loop? That‘s
what‘s called an indirect filehandle, a scalar variable containing a filehandle. You can use it the same way
you would a normal filehandle. For example, you can read one line from it this way:
$line = <$handle>;
all remaining lines from it this way:
@lines = <$handle>;
and send a line of data to it this way:
print $handle "some data\n";
A Webget Client
Here‘s a simple client that takes a remote host to fetch a document from, and then a list of documents to get
from that host. This is a more interesting client than the previous one because it first sends something to the
server before fetching the server‘s response.
#!/usr/bin/perl -w
use IO::Socket;
unless (@ARGV > 1) { die "usage: $0 host document ..." }
$host = shift(@ARGV);
$EOL = "\015\012";
$BLANK = $EOL x 2;
foreach $document ( @ARGV ) {
    $remote = IO::Socket::INET->new( Proto    => "tcp",
                                     PeerAddr => $host,
                                     PeerPort => "http(80)",
                                   );
    unless ($remote) { die "cannot connect to http daemon on $host" }
    $remote->autoflush(1);
    print $remote "GET $document HTTP/1.0" . $BLANK;
    while ( <$remote> ) { print }
    close $remote;
}
The web server handling the "http" service is assumed to be at its standard port, number 80. If the
web server you're trying to connect to is at a different port (like 1080 or 8080), you should specify it as the
named-parameter pair PeerPort => 8080. The autoflush method is used on the socket because
otherwise the system would buffer up the output we sent it. (If you‘re on a Mac, you‘ll also need to change
every "\n" in your code that sends data over the network to be a "\015\012" instead.)
Connecting to the server is only the first part of the process: once you have the connection, you have to use
the server‘s language. Each server on the network has its own little command language that it expects as
input. The string that we send to the server starting with "GET" is in HTTP syntax. In this case, we simply
request each specified document. Yes, we really are making a new connection for each document, even
though it‘s the same host. That‘s the way you always used to have to speak HTTP. Recent versions of web
browsers may request that the remote server leave the connection open a little while, but the server doesn‘t
have to honor such a request.
Here‘s an example of running that program, which we‘ll call webget:
% webget www.perl.com /guanaco.html
HTTP/1.1 404 File Not Found
Date: Thu, 08 May 1997 18:02:32 GMT
Server: Apache/1.2b6
Connection: close
Content-type: text/html
<HEAD><TITLE>404 File Not Found</TITLE></HEAD>
<BODY><H1>File Not Found</H1>
The requested URL /guanaco.html was not found on this server.<P>
</BODY>
Ok, so that‘s not very interesting, because it didn‘t find that particular document. But a long response
wouldn‘t have fit on this page.
For a more fully−featured version of this program, you should look to the lwp−request program included
with the LWP modules from CPAN.
Interactive Client with IO::Socket
Well, that‘s all fine if you want to send one command and get one answer, but what about setting up
something fully interactive, somewhat like the way telnet works? That way you can type a line, get the
answer, type a line, get the answer, etc.
This client is more complicated than the two we‘ve done so far, but if you‘re on a system that supports the
powerful fork call, the solution isn‘t that rough. Once you‘ve made the connection to whatever service
you'd like to chat with, call fork to clone your process. Each of these two identical processes has a very
simple job to do: the parent copies everything from the socket to standard output, while the child
simultaneously copies everything from standard input to the socket. To accomplish the same thing using just
one process would be much harder, because it's easier to code two processes to do one thing than it is to code
one process to do two things. (This keep-it-simple principle is a cornerstone of the Unix philosophy, and
of good software engineering as well, which is probably why it's spread to other systems.)
Here‘s the code:
#!/usr/bin/perl -w
use strict;
use IO::Socket;
my ($host, $port, $kidpid, $handle, $line);
unless (@ARGV == 2) { die "usage: $0 host port" }
($host, $port) = @ARGV;
# create a tcp connection to the specified host and port
$handle = IO::Socket::INET->new(Proto    => "tcp",
                                PeerAddr => $host,
                                PeerPort => $port)
    or die "can't connect to port $port on $host: $!";
$handle->autoflush(1); # so output gets there right away
print STDERR "[Connected to $host:$port]\n";
# split the program into two processes, identical twins
die "can't fork: $!" unless defined($kidpid = fork());
# the if{} block runs only in the parent process
if ($kidpid) {
    # copy the socket to standard output
    while (defined ($line = <$handle>)) {
        print STDOUT $line;
    }
    kill("TERM", $kidpid); # send SIGTERM to child
}
# the else{} block runs only in the child process
else {
    # copy standard input to the socket
    while (defined ($line = <STDIN>)) {
        print $handle $line;
    }
}
The kill function in the parent's if block is there to send a signal to our child process (currently running in
the else block) as soon as the remote server has closed its end of the connection.
If the remote server sends data a byte at a time, and you need that data immediately without waiting for a
newline (which might not happen), you may wish to replace the while loop in the parent with the
following:
my $byte;
while (sysread($handle, $byte, 1) == 1) {
print STDOUT $byte;
}
Making a system call for each byte you want to read is not very efficient (to put it mildly) but is the simplest
to explain and works reasonably well.
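If the per-byte system calls bother you, a middle road is to ask sysread() for a larger chunk. It still returns as soon as any data is available rather than waiting for a newline or a full buffer. In this sketch a pipe stands in for the socket, and the 4096-byte request size is an arbitrary choice.

```perl
#!/usr/bin/perl -w
use strict;

# A pipe stands in for the socket in this sketch.
pipe(FROM_SERVER, TO_CLIENT) or die "pipe: $!";
syswrite(TO_CLIENT, "Command? ", 9);   # a prompt with no newline
close(TO_CLIENT);

my ($chunk, $received) = ('', '');
# sysread() returns as soon as any data is available, up to the
# requested size, so a 4096-byte request does not wait for 4096 bytes.
# It returns 0 at end of file and undef on error; both end the loop.
while (sysread(FROM_SERVER, $chunk, 4096)) {
    $received .= $chunk;
}
print "received: [$received]\n";
```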
TCP Servers with IO::Socket
As always, setting up a server is a little bit more involved than running a client. The model is that the server
creates a special kind of socket that does nothing but listen on a particular port for incoming connections. It
does this by calling the IO::Socket::INET−>new() method with slightly different arguments than the
client did.
Proto
This is which protocol to use. Like our clients, we‘ll still specify "tcp" here.
LocalPort
We specify a local port in the LocalPort argument, which we didn't do for the client. This is the service
name or port number for which you want to be the server. (Under Unix, ports under 1024 are restricted
to the superuser.) In our sample, we'll use port 9000, but you can use any port that's not currently in
use on your system. If you try to use one already in use, you'll get an "Address already in use"
message. Under Unix, the netstat -a command will show which services currently have servers.
Listen
The Listen parameter is set to the maximum number of pending connections we can accept until we
turn away incoming clients. Think of it as a call−waiting queue for your telephone. The low−level
Socket module has a special symbol for the system maximum, which is SOMAXCONN.
Reuse
The Reuse parameter is needed so that we can restart our server manually without waiting a few minutes
to allow system buffers to clear out.
Once the generic server socket has been created using the parameters listed above, the server then waits for a
new client to connect to it. The server blocks in the accept method, which eventually returns a bidirectional
connection to the remote client. (Make sure to autoflush this handle to circumvent buffering.)
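You can try out the listen-and-accept cycle without a second machine. The sketch below makes some assumptions of its own: it binds to the loopback address only, passes 0 as LocalPort so that the kernel picks any free port (read back with sockport), and forks a child to play the client. The "ping" and "pong" messages are made up for the demonstration.

```perl
#!/usr/bin/perl -w
use strict;
use IO::Socket;

my $server = IO::Socket::INET->new(
    Proto     => 'tcp',
    LocalAddr => '127.0.0.1',
    LocalPort => 0,           # assumption: 0 lets the kernel choose a port
    Listen    => 5,
    Reuse     => 1,
) or die "can't set up server: $!";
my $port = $server->sockport;

my $pid = fork;
die "cannot fork: $!" unless defined $pid;
unless ($pid) {                         # child plays the client
    my $remote = IO::Socket::INET->new(
        Proto    => 'tcp',
        PeerAddr => '127.0.0.1',
        PeerPort => $port,
    ) or die "can't connect: $!";
    $remote->autoflush(1);
    print $remote "ping\n";
    my $reply = <$remote>;
    close $remote;
    exit(defined $reply && $reply eq "pong\n" ? 0 : 1);
}

my $client = $server->accept() or die "accept: $!";   # blocks until connected
$client->autoflush(1);
my $request = <$client>;
print $client "pong\n" if defined $request && $request eq "ping\n";
close $client;
waitpid($pid, 0);
my $child_status = $? >> 8;
print "client ", ($child_status == 0 ? "got its pong" : "failed"), "\n";
```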
To add to user−friendliness, our server prompts the user for commands. Most servers don‘t do this. Because
of the prompt without a newline, you‘ll have to use the sysread variant of the interactive client above.
This server accepts one of five different commands, sending output back to the client. Note that unlike most
network servers, this one only handles one incoming client at a time. Multithreaded servers are covered in
Chapter 6 of the Camel as well as earlier in this manpage.
Here's the code:
#!/usr/bin/perl -w
use IO::Socket;
use Net::hostent; # for OO version of gethostbyaddr
$PORT = 9000; # pick something not in use
$server = IO::Socket::INET->new( Proto     => 'tcp',
                                 LocalPort => $PORT,
                                 Listen    => SOMAXCONN,
                                 Reuse     => 1);
die "can't setup server" unless $server;
print "[Server $0 accepting clients]\n";
while ($client = $server->accept()) {
    $client->autoflush(1);
    print $client "Welcome to $0; type help for command list.\n";
    $hostinfo = gethostbyaddr($client->peeraddr);
    printf "[Connect from %s]\n", $hostinfo ? $hostinfo->name : $client->peerhost;
    print $client "Command? ";
    while ( <$client>) {
        next unless /\S/; # blank line
        if    (/quit|exit/i) { last; }
        elsif (/date|time/i) { printf $client "%s\n", scalar localtime; }
        elsif (/who/i )      { print $client `who 2>&1`; }
        elsif (/cookie/i )   { print $client `/usr/games/fortune 2>&1`; }
        elsif (/motd/i )     { print $client `cat /etc/motd 2>&1`; }
        else {
            print $client "Commands: quit date who cookie motd\n";
        }
    } continue {
        print $client "Command? ";
    }
    close $client;
}
UDP: Message Passing
Another kind of client−server setup is one that uses not connections, but messages. UDP communications
involve much lower overhead but also provide less reliability, as there are no promises that messages will
arrive at all, let alone in order and unmangled. Still, UDP offers some advantages over TCP, including being
able to "broadcast" or "multicast" to a whole bunch of destination hosts at once (usually on your local
subnet). If you find yourself overly concerned about reliability and start building checks into your message
system, then you probably should just use TCP to start with.
Here‘s a UDP program similar to the sample Internet TCP client given earlier. However, instead of checking
one host at a time, the UDP version will check many of them asynchronously by simulating a multicast and
then using select() to do a timed−out wait for I/O. To do something similar with TCP, you‘d have to use
a different socket handle for each host.
#!/usr/bin/perl -w
use strict;
use Socket;
use Sys::Hostname;
my ( $count, $hisiaddr, $hispaddr, $histime,
     $host, $iaddr, $paddr, $port, $proto,
     $rin, $rout, $rtime, $SECS_of_70_YEARS);
$SECS_of_70_YEARS = 2208988800;
$iaddr = gethostbyname(hostname());
$proto = getprotobyname('udp');
$port = getservbyname('time', 'udp');
$paddr = sockaddr_in(0, $iaddr); # 0 means let kernel pick
socket(SOCKET, PF_INET, SOCK_DGRAM, $proto) || die "socket: $!";
bind(SOCKET, $paddr) || die "bind: $!";
$| = 1;
printf "%-12s %8s %s\n", "localhost", 0, scalar localtime time;
$count = 0;
for $host (@ARGV) {
    $count++;
    $hisiaddr = inet_aton($host) || die "unknown host";
    $hispaddr = sockaddr_in($port, $hisiaddr);
    defined(send(SOCKET, 0, 0, $hispaddr)) || die "send $host: $!";
}
$rin = '';
vec($rin, fileno(SOCKET), 1) = 1;
# timeout after 10.0 seconds
while ($count && select($rout = $rin, undef, undef, 10.0)) {
    $rtime = '';
    ($hispaddr = recv(SOCKET, $rtime, 4, 0)) || die "recv: $!";
    ($port, $hisiaddr) = sockaddr_in($hispaddr);
    $host = gethostbyaddr($hisiaddr, AF_INET);
    $histime = unpack("N", $rtime) - $SECS_of_70_YEARS;
    printf "%-12s ", $host;
    printf "%8d %s\n", $histime - time, scalar localtime($histime);
    $count--;
}
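The timed wait is the part of that program most worth isolating. Here is a small sketch of the same vec() and four-argument select() idiom, using a pipe in place of the UDP socket so that it runs anywhere fork-free. The helper name ready_to_read() and the quarter-second timeout are choices made for this example.

```perl
#!/usr/bin/perl -w
use strict;
use IO::Handle;

# ready_to_read() is a hypothetical helper: true if $fh has data
# to read within $timeout seconds.
sub ready_to_read {
    my ($fh, $timeout) = @_;
    my $rin = '';
    vec($rin, fileno($fh), 1) = 1;        # set the bit for this descriptor
    my $rout;
    my $nfound = select($rout = $rin, undef, undef, $timeout);
    return $nfound > 0;
}

pipe(READER, WRITER) or die "pipe: $!";
WRITER->autoflush(1);

my $before = ready_to_read(\*READER, 0.25) ? "yes" : "no";
print WRITER "a datagram stand-in\n";
my $after  = ready_to_read(\*READER, 0.25) ? "yes" : "no";
print "readable before write: $before, after write: $after\n";
```

Copying $rin into $rout on each call matters because select() overwrites the bit vectors it is given.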
SysV IPC
While System V IPC isn‘t so widely used as sockets, it still has some interesting uses. You can‘t, however,
effectively use SysV IPC or Berkeley mmap() to have shared memory so as to share a variable amongst
several processes. That‘s because Perl would reallocate your string when you weren‘t wanting it to.
Here‘s a small example showing shared memory usage.
use IPC::SysV qw(IPC_PRIVATE IPC_RMID S_IRWXU S_IRWXG S_IRWXO);
$size = 2000;
$key = shmget(IPC_PRIVATE, $size, S_IRWXU|S_IRWXG|S_IRWXO) || die "$!";
print "shm key $key\n";
$message = "Message #1";
shmwrite($key, $message, 0, 60) || die "$!";
print "wrote: '$message'\n";
shmread($key, $buff, 0, 60) || die "$!";
print "read : '$buff'\n";
# the buffer of shmread is zero-character end-padded.
substr($buff, index($buff, "\0")) = '';
print "un" unless $buff eq $message;
print "swell\n";
print "deleting shm $key\n";
shmctl($key, IPC_RMID, 0) || die "$!";
Here‘s an example of a semaphore:
use IPC::SysV qw(IPC_CREAT);
$IPC_KEY = 1234;
$key = semget($IPC_KEY, 10, 0666 | IPC_CREAT ) || die "$!";
print "sem key $key\n";
Put this code in a separate file to be run in more than one process. Call the file take:
# look up the semaphore set created above
$IPC_KEY = 1234;
$key = semget($IPC_KEY, 0, 0);
die if !defined($key);
$semnum = 0;
$semflag = 0;
# 'take' semaphore
# wait for semaphore to be zero
$semop = 0;
$opstring1 = pack("sss", $semnum, $semop, $semflag);
# Increment the semaphore count
$semop = 1;
$opstring2 = pack("sss", $semnum, $semop, $semflag);
$opstring = $opstring1 . $opstring2;
semop($key,$opstring) || die "$!";
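The operation string handed to semop() is nothing more than a run of packed records, one per operation, each holding a semaphore number, an operation, and flags. The sketch below builds the same two-operation string and unpacks it again so you can see the layout. It makes no system calls, and like the code above it assumes that three native shorts match your system's semaphore operation structure.

```perl
#!/usr/bin/perl -w
use strict;

# Each operation is (semaphore number, operation, flags).
my @ops = (
    [0, 0, 0],   # wait until semaphore 0 is zero
    [0, 1, 0],   # then increment semaphore 0
);
my $opstring = join '', map { pack("sss", @$_) } @ops;

my $op_size = length pack("sss", 0, 0, 0);   # bytes per operation
my @fields  = unpack("s*", $opstring);
print "operations: ", length($opstring) / $op_size, "\n";
print "fields: @fields\n";
```

Because both operations travel in one semop() call, the kernel applies them together: the increment happens only once the wait-for-zero can succeed.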
Put this code in a separate file to be run in more than one process. Call this file give:
# 'give' the semaphore
# run this in the original process and you will see
# that the second process continues
$IPC_KEY = 1234;
$key = semget($IPC_KEY, 0, 0);
die if !defined($key);
$semnum = 0;
$semflag = 0;
# Decrement the semaphore count
$semop = -1;
$opstring = pack("sss", $semnum, $semop, $semflag);
semop($key,$opstring) || die "$!";
The SysV IPC code above was written long ago, and it‘s definitely clunky looking. For a more modern look,
see the IPC::SysV module which is included with Perl starting from Perl 5.005.
NOTES
Most of these routines quietly but politely return undef when they fail instead of causing your program to
die right then and there due to an uncaught exception. (Actually, some of the new Socket conversion
functions croak() on bad arguments.) It is therefore essential to check return values from these functions.
Always begin your socket programs this way for optimal success, and don't forget to add the -T taint-checking
flag to the #! line for servers:
#!/usr/bin/perl -Tw
use strict;
use sigtrap;
use Socket;
BUGS
All these routines create system−specific portability problems. As noted elsewhere, Perl is at the mercy of
your C libraries for much of its system behaviour. It‘s probably safest to assume broken SysV semantics for
signals and to stick with simple TCP and UDP socket operations; e.g., don‘t try to pass open file descriptors
over a local UDP datagram socket if you want your code to stand a chance of being portable.
As mentioned in the signals section, because few vendors provide C libraries that are safely re−entrant, the
prudent programmer will do little else within a handler beyond setting a numeric variable that already exists;
or, if locked into a slow (restarting) system call, using die() to raise an exception and longjmp(3) out. In
fact, even these may in some cases cause a core dump. It‘s probably best to avoid signals except where they
are absolutely inevitable. This will be addressed in a future release of Perl.
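The advice about keeping handlers minimal can be sketched as follows. This is only an illustration, not code from the original manual: the flag name $got_int, the choice of SIGINT, and the self-delivered signal are all assumptions made so the example is self-contained.

```perl
use strict;
use vars qw($got_int);       # hypothetical flag; it exists before any signal arrives

$got_int = 0;

# The handler does nothing except set the pre-existing numeric flag.
$SIG{INT} = sub { $got_int = 1 };

# For demonstration only: send ourselves the signal.
kill 'INT', $$;

# All real work happens in the main flow, outside the handler.
my $tries = 0;
until ($got_int or $tries++ >= 50) {
    select(undef, undef, undef, 0.1);    # short sleep; a signal may cut it short
}
print $got_int ? "caught SIGINT\n" : "no signal seen\n";
```

The main program polls the flag at a point of its own choosing, so nothing complicated ever runs inside the handler itself.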
AUTHOR
Tom Christiansen, with occasional vestiges of Larry Wall‘s original version and suggestions from the Perl
Porters.
SEE ALSO
There‘s a lot more to networking than this, but this should get you started.
For intrepid programmers, the indispensable textbook is Unix Network Programming by W. Richard Stevens
(published by Addison−Wesley). Note that most books on networking address networking from the
perspective of a C programmer; translation to Perl is left as an exercise for the reader.
The IO::Socket(3) manpage describes the object library, and the Socket(3) manpage describes the low−level
interface to sockets. Besides the obvious functions in perlfunc, you should also check out the modules file at
your nearest CPAN site. (See perlmodlib or best yet, the Perl FAQ for a description of what CPAN is and
where to get it.)
Section 5 of the modules file is devoted to "Networking, Device Control (modems), and Interprocess
Communication", and contains numerous unbundled modules: networking modules, Chat and
Expect operations, CGI programming, DCE, FTP, IPC, NNTP, Proxy, Ptty, RPC, SNMP, SMTP, Telnet,
Threads, and ToolTalk, to name just a few.
NAME
perlsec − Perl security
DESCRIPTION
Perl is designed to make it easy to program securely even when running with extra privileges, like setuid or
setgid programs. Unlike most command line shells, which are based on multiple substitution passes on each
line of the script, Perl uses a more conventional evaluation scheme with fewer hidden snags. Additionally,
because the language has more builtin functionality, it can rely less upon external (and possibly
untrustworthy) programs to accomplish its purposes.
Perl automatically enables a set of special security checks, called taint mode, when it detects its program
running with differing real and effective user or group IDs. The setuid bit in Unix permissions is mode
04000, the setgid bit mode 02000; either or both may be set. You can also enable taint mode explicitly by
using the −T command line flag. This flag is strongly suggested for server programs and any program run on
behalf of someone else, such as a CGI script. Once taint mode is on, it‘s on for the remainder of your script.
While in this mode, Perl takes special precautions called taint checks to prevent both obvious and subtle
traps. Some of these checks are reasonably simple, such as verifying that path directories aren‘t writable by
others; careful programmers have always used checks like these. Other checks, however, are best supported
by the language itself, and it is these checks especially that contribute to making a set−id Perl program more
secure than the corresponding C program.
You may not use data derived from outside your program to affect something else outside your program—at
least, not by accident. All command line arguments, environment variables, locale information (see
perllocale), results of certain system calls (readdir, readlink, the gecos field of getpw* calls), and all file
input are marked as "tainted". Tainted data may not be used directly or indirectly in any command that
invokes a sub−shell, nor in any command that modifies files, directories, or processes. (Important
exception: If you pass a list of arguments to either system or exec, the elements of that list are NOT
checked for taintedness.) Any variable set to a value derived from tainted data will itself be tainted, even if it
is logically impossible for the tainted data to alter the variable. Because taintedness is associated with each
scalar value, some elements of an array can be tainted and others not.
For example:
$arg = shift;                   # $arg is tainted
$hid = $arg . 'bar';            # $hid is also tainted
$line = <>;                     # Tainted
$line = <STDIN>;                # Also tainted
open FOO, "/home/me/bar" or die $!;
$line = <FOO>;                  # Still tainted
$path = $ENV{'PATH'};           # Tainted, but see below
$data = 'abc';                  # Not tainted

system "echo $arg";             # Insecure
system "/bin/echo", $arg;       # Secure (doesn't use sh)
system "echo $hid";             # Insecure
system "echo $data";            # Insecure until PATH set

$path = $ENV{'PATH'};           # $path now tainted

$ENV{'PATH'} = '/bin:/usr/bin';
delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'};

$path = $ENV{'PATH'};           # $path now NOT tainted
system "echo $data";            # Is secure now!

open(FOO, "< $arg");            # OK - read-only file
open(FOO, "> $arg");            # Not OK - trying to write

open(FOO,"echo $arg|");         # Not OK, but...
open(FOO,"-|")
    or exec 'echo', $arg;       # OK

$shout = `echo $arg`;           # Insecure, $shout now tainted

unlink $data, $arg;             # Insecure
umask $arg;                     # Insecure

exec "echo $arg";               # Insecure
exec "echo", $arg;              # Secure (doesn't use the shell)
exec "sh", '-c', $arg;          # Considered secure, alas!

@files = <*.c>;                 # Always insecure (uses csh)
@files = glob('*.c');           # Always insecure (uses csh)
If you try to do something insecure, you will get a fatal error saying something like "Insecure dependency"
or "Insecure $ENV{PATH}". Note that you can still write an insecure system or exec, but only by
explicitly doing something like the "considered secure" example above.
Laundering and Detecting Tainted Data
To test whether a variable contains tainted data, and whose use would thus trigger an "Insecure dependency"
message, check your nearby CPAN mirror for the Taint.pm module, which should become available around
November 1997. Or you may be able to use the following is_tainted() function.
sub is_tainted {
return ! eval {
    join('',@_), kill 0;
1;
};
}
This function makes use of the fact that the presence of tainted data anywhere within an expression renders
the entire expression tainted. It would be inefficient for every operator to test every argument for
taintedness. Instead, the slightly more efficient and conservative approach is used that if any tainted value
has been accessed within the same expression, the whole expression is considered tainted.
But testing for taintedness gets you only so far. Sometimes you have just to clear your data‘s taintedness.
The only way to bypass the tainting mechanism is by referencing subpatterns from a regular expression
match. Perl presumes that if you reference a substring using $1, $2, etc., that you knew what you were
doing when you wrote the pattern. That means using a bit of thought—don‘t just blindly untaint anything, or
you defeat the entire mechanism. It‘s better to verify that the variable has only good characters (for certain
values of "good") rather than checking whether it has any bad characters. That‘s because it‘s far too easy to
miss bad characters that you never thought of.
Here‘s a test to make sure that the data contains nothing but "word" characters (alphabetics, numerics, and
underscores), a hyphen, an at sign, or a dot.
if ($data =~ /^([-\@\w.]+)$/) {
    $data = $1;                     # $data now untainted
} else {
    die "Bad data in $data";        # log this somewhere
}
This is fairly secure because /\w+/ doesn‘t normally match shell metacharacters, nor are dot, dash, or at
going to mean something special to the shell. Use of /.+/ would have been insecure in theory because it
lets everything through, but Perl doesn‘t check for that. The lesson is that when untainting, you must be
exceedingly careful with your patterns. Laundering data using regular expression is the ONLY mechanism for
untainting dirty data, unless you use the strategy detailed below to fork a child of lesser privilege.
The example does not untaint $data if use locale is in effect, because the characters matched by \w
are determined by the locale. Perl considers locale definitions untrustworthy because they contain
data from outside the program. If you are writing a locale-aware program and want to launder data with a
regular expression containing \w, put no locale ahead of the expression in the same block. See the
SECURITY section of perllocale for further discussion and examples.
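The laundering test above can be wrapped in a small subroutine so that every caller applies the same whitelist. The following is a sketch under stated assumptions rather than a standard interface: the name launder() and the sample inputs are invented for illustration, and the pattern is the one shown earlier in this section.

```perl
use strict;

# Hypothetical helper: returns an untainted copy of $value if it consists
# solely of word characters, hyphens, at signs, and dots; undef otherwise.
sub launder {
    my ($value) = @_;
    return undef unless defined $value;
    no locale;                               # \w must not depend on the locale
    return $value =~ /^([-\@\w.]+)$/ ? $1 : undef;
}

for my $candidate ('report-1.txt', 'me@example.com', 'x; rm -rf /') {
    my $clean = launder($candidate);
    print(defined($clean) ? "ok:  $clean\n" : "bad: $candidate\n");
}
```

Returning undef instead of dying leaves the decision about logging or aborting to the caller. The helper is only as safe as its pattern, so a permissive pattern would defeat the purpose.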
Switches On the "#!" Line
When you make a script executable, in order to make it usable as a command, the system will pass switches
to perl from the script‘s #! line. Perl checks that any command line switches given to a setuid (or setgid)
script actually match the ones set on the #! line. Some Unix and Unix−like environments impose a
one−switch limit on the #! line, so you may need to use something like −wU instead of −w −U under such
systems. (This issue should arise only in Unix or Unix−like environments that support #! and setuid or
setgid scripts.)
Cleaning Up Your Path
For "Insecure $ENV{PATH}" messages, you need to set $ENV{'PATH'} to a known value, and each
directory in the path must be non−writable by others than its owner and group. You may be surprised to get
this message even if the pathname to your executable is fully qualified. This is not generated because you
didn‘t supply a full path to the program; instead, it‘s generated because you never set your PATH
environment variable, or you didn‘t set it to something that was safe. Because Perl can‘t guarantee that the
executable in question isn‘t itself going to turn around and execute some other program that is dependent on
your PATH, it makes sure you set the PATH.
The PATH isn‘t the only environment variable which can cause problems. Because some shells may use the
variables IFS, CDPATH, ENV, and BASH_ENV, Perl checks that those are either empty or untainted when
starting subprocesses. You may wish to add something like this to your setid and taint−checking scripts.
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer
It‘s also possible to get into trouble with other operations that don‘t care whether they use tainted values.
Make judicious use of the file tests in dealing with any user−supplied filenames. When possible, do opens
and such after properly dropping any special user (or group!) privileges. Perl doesn‘t prevent you from
opening tainted filenames for reading, so be careful what you print out. The tainting mechanism is intended
to prevent stupid mistakes, not to remove the need for thought.
Perl does not call the shell to expand wild cards when you pass system and exec explicit parameter lists
instead of strings with possible shell wildcards in them. Unfortunately, the open, glob, and backtick
functions provide no such alternate calling convention, so more subterfuge will be required.
Perl provides a reasonably safe way to open a file or pipe from a setuid or setgid program: just create a child
process with reduced privilege who does the dirty work for you. First, fork a child using the special open
syntax that connects the parent and child by a pipe. Now the child resets its ID set and any other per−process
attributes, like environment variables, umasks, current working directories, back to the originals or known
safe values. Then the child process, which no longer has any special permissions, does the open or other
system call. Finally, the child passes the data it managed to access back to the parent. Because the file or
pipe was opened in the child while running under less privilege than the parent, it‘s not apt to be tricked into
doing something it shouldn‘t.
Here‘s a way to do backticks reasonably safely. Notice how the exec is not called with a string that the shell
could expand. This is by far the best way to call something that might be subjected to shell escapes: just
never call the shell at all.
use English;
die "Can't fork: $!" unless defined($pid = open(KID, "-|"));
if ($pid) {                     # parent
    while (<KID>) {
        # do something
    }
    close KID;
} else {
    my @temp = ($EUID, $EGID);
    $EUID = $UID;
    $EGID = $GID;               # initgroups() also called!
    # Make sure privs are really gone
    ($EUID, $EGID) = @temp;
    die "Can't drop privileges"
        unless $UID == $EUID && $GID eq $EGID;
    $ENV{PATH} = "/bin:/usr/bin";
    exec 'myprog', 'arg1', 'arg2'
        or die "can't exec myprog: $!";
}
A similar strategy would work for wildcard expansion via glob, although you can use readdir instead.
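The readdir alternative might look something like the sketch below. The subroutine name c_files() and the ".c" suffix are assumptions chosen to mirror the glob('*.c') example earlier; nothing here invokes a shell.

```perl
use strict;

# Hypothetical helper: names of plain files in $dir ending in ".c",
# found with opendir/readdir instead of a shell wildcard.
sub c_files {
    my ($dir) = @_;
    local *DIR;                                   # keep the handle private to this call
    opendir(DIR, $dir) or die "Can't opendir $dir: $!";
    my @names = sort grep { /\.c$/ && -f "$dir/$_" } readdir(DIR);
    closedir(DIR);
    return @names;
}

print join("\n", c_files(".")), "\n";
```

Names returned by readdir are themselves tainted, as noted earlier, so they still need laundering before being used in anything that modifies files or runs commands.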
Taint checking is most useful when although you trust yourself not to have written a program to give away
the farm, you don‘t necessarily trust those who end up using it not to try to trick it into doing something bad.
This is the kind of security checking that‘s useful for set−id programs and programs launched on someone
else‘s behalf, like CGI programs.
This is quite different, however, from not even trusting the writer of the code not to try to do something evil.
That‘s the kind of trust needed when someone hands you a program you‘ve never seen before and says,
"Here, run this." For that kind of safety, check out the Safe module, included standard in the Perl
distribution. This module allows the programmer to set up special compartments in which all system
operations are trapped and namespace access is carefully controlled.
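A minimal sketch of such a compartment follows, assuming the default operator mask that Safe->new installs. The variable names and the code strings passed to reval() are illustrative only.

```perl
use strict;
use Safe;

my $compartment = Safe->new;

# Plain arithmetic is permitted by the default operator mask.
my $sum = $compartment->reval('2 + 3');
print "sum: $sum\n";

# An attempt to run an external command is refused; reval returns undef
# and leaves the reason in $@.
my $result = $compartment->reval('system("echo unsafe")');
print "trapped: $@" if $@;
```

Checking $@ after every reval() is the important habit here, since a trapped operation does not kill the calling program.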
Security Bugs
Beyond the obvious problems that stem from giving special privileges to systems as flexible as scripts, on
many versions of Unix, set−id scripts are inherently insecure right from the start. The problem is a race
condition in the kernel. Between the time the kernel opens the file to see which interpreter to run and when
the (now−set−id) interpreter turns around and reopens the file to interpret it, the file in question may have
changed, especially if you have symbolic links on your system.
Fortunately, sometimes this kernel "feature" can be disabled. Unfortunately, there are two ways to disable it.
The system can simply outlaw scripts with any set−id bit set, which doesn‘t help much. Alternately, it can
simply ignore the set−id bits on scripts. If the latter is true, Perl can emulate the setuid and setgid
mechanism when it notices the otherwise useless setuid/gid bits on Perl scripts. It does this via a special
executable called suidperl that is automatically invoked for you if it‘s needed.
However, if the kernel set−id script feature isn‘t disabled, Perl will complain loudly that your set−id script is
insecure. You‘ll need to either disable the kernel set−id script feature, or put a C wrapper around the script.
A C wrapper is just a compiled program that does nothing except call your Perl program. Compiled
programs are not subject to the kernel bug that plagues set−id scripts. Here‘s a simple wrapper, written in C:
#define REAL_PATH "/path/to/script"
main(ac, av)
char **av;
{
execv(REAL_PATH, av);
}
Compile this wrapper into a binary executable and then make it rather than your script setuid or setgid.
See the program wrapsuid in the eg directory of your Perl distribution for a convenient way to do this
automatically for all your setuid Perl programs. It moves setuid scripts into files with the same name plus a
leading dot, and then compiles a wrapper like the one above for each of them.
In recent years, vendors have begun to supply systems free of this inherent security bug. On such systems,
when the kernel passes the name of the set−id script to open to the interpreter, rather than using a pathname
subject to meddling, it instead passes /dev/fd/3. This is a special file already opened on the script, so that
there can be no race condition for evil scripts to exploit. On these systems, Perl should be compiled with
−DSETUID_SCRIPTS_ARE_SECURE_NOW. The Configure program that builds Perl tries to figure this
out for itself, so you should never have to specify this yourself. Most modern releases of SysVr4 and BSD
4.4 use this approach to avoid the kernel race condition.
Prior to release 5.003 of Perl, a bug in the code of suidperl could introduce a security hole in systems
compiled with strict POSIX compliance.
Protecting Your Programs
There are a number of ways to hide the source to your Perl programs, with varying levels of "security".
First of all, however, you can‘t take away read permission, because the source code has to be readable in
order to be compiled and interpreted. (That doesn‘t mean that a CGI script‘s source is readable by people on
the web, though.) So you have to leave the permissions at the socially friendly 0755 level. This lets people
on your local system only see your source.
Some people mistakenly regard this as a security problem. If your program does insecure things, and relies
on people not knowing how to exploit those insecurities, it is not secure. It is often possible for someone to
determine the insecure things and exploit them without viewing the source. Security through obscurity, the
name for hiding your bugs instead of fixing them, is little security indeed.
You can try using encryption via source filters (Filter::* from CPAN). But crackers might be able to decrypt
it. You can try using the byte code compiler and interpreter described below, but crackers might be able to
de−compile it. You can try using the native−code compiler described below, but crackers might be able to
disassemble it. These pose varying degrees of difficulty to people wanting to get at your code, but none can
definitively conceal it (this is true of every language, not just Perl).
If you‘re concerned about people profiting from your code, then the bottom line is that nothing but a
restrictive licence will give you legal security. License your software and pepper it with threatening
statements like "This is unpublished proprietary software of XYZ Corp. Your access to it does not give you
permission to use it blah blah blah." You should see a lawyer to be sure your licence‘s wording will stand up
in court.
SEE ALSO
perlrun for its description of cleaning up environment variables.
NAME
perltrap − Perl traps for the unwary
DESCRIPTION
The biggest trap of all is forgetting to use the −w switch; see perlrun. The second biggest trap is not making
your entire program runnable under use strict. The third biggest trap is not reading the list of changes
in this version of Perl; see perldelta.
Awk Traps
Accustomed awk users should take special note of the following:
The English module, loaded via
use English;
allows you to refer to special variables (like $/) with names (like $RS), as though they were in awk;
see perlvar for details.
Semicolons are required after all simple statements in Perl (except at the end of a block). Newline is
not a statement delimiter.
Curly brackets are required on ifs and whiles.
Variables begin with "$", "@" or "%" in Perl.
Arrays index from 0. Likewise string positions in substr() and index().
You have to decide whether your array has numeric or string indices.
Hash values do not spring into existence upon mere reference.
You have to decide whether you want to use string or numeric comparisons.
Reading an input line does not split it for you. You get to split it to an array yourself. And the
split() operator has different arguments than awk‘s.
The current input line is normally in $_, not $0. It generally does not have the newline stripped.
($0 is the name of the program executed.) See perlvar.
$<digit> does not refer to fields; it refers to substrings matched by the last match pattern.
The print() statement does not add field and record separators unless you set $, and $\. You can
set $OFS and $ORS if you‘re using the English module.
You must open your files before you print to them.
The range operator is "..", not comma. The comma operator works as in C.
The match operator is "=~", not "~". ("~" is the one‘s complement operator, as in C.)
The exponentiation operator is "**", not "^". "^" is the XOR operator, as in C. (You know, one could
get the feeling that awk is basically incompatible with C.)
The concatenation operator is ".", not the null string. (Using the null string would render /pat/
/pat/ unparsable, because the third slash would be interpreted as a division operator—the tokenizer
is in fact slightly context sensitive for operators like "/", "?", and ">". And in fact, "." itself can be the
beginning of a number.)
The next, exit, and continue keywords work differently.
The following variables work differently:
Awk         Perl
ARGC        $#ARGV or scalar @ARGV
ARGV[0]     $0
FILENAME    $ARGV
FNR         $. - something
FS          (whatever you like)
NF          $#Fld, or some such
NR          $.
OFMT        $#
OFS         $,
ORS         $\
RLENGTH     length($&)
RS          $/
RSTART      length($`)
SUBSEP      $;
You cannot set $RS to a pattern, only a string.
When in doubt, run the awk construct through a2p and see what it gives you.
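Several of the points above (no automatic field splitting, the retained newline, and zero-based indexing) come together in the following sketch, which approximates the awk program { print $2, NF }. The input line is a made-up record used only for illustration.

```perl
use strict;

# Hypothetical input record; a line read from a file keeps its trailing newline.
my $line = "alpha beta gamma\n";

chomp $line;                        # awk strips the newline for you; Perl does not
my @fields = split ' ', $line;      # ' ' splits on runs of whitespace, as awk does by default

my $nf     = scalar @fields;        # awk's NF
my $second = $fields[1];            # awk's $2: Perl arrays index from 0

print "$second $nf\n";
```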
C Traps
Cerebral C programmers should take note of the following:
Curly brackets are required on if‘s and while‘s.
You must use elsif rather than else if.
The break and continue keywords from C become in Perl last and next, respectively. Unlike
in C, these do NOT work within a do { } while construct.
There‘s no switch statement. (But it‘s easy to build one on the fly.)
Variables begin with "$", "@" or "%" in Perl.
printf() does not implement the "*" format for interpolating field widths, but it‘s trivial to use
interpolation of double−quoted strings to achieve the same effect.
Comments begin with "#", not "/*".
You can‘t take the address of anything, although a similar operator in Perl is the backslash, which
creates a reference.
ARGV must be capitalized. $ARGV[0] is C‘s argv[1], and argv[0] ends up in $0.
System calls such as link(), unlink(), rename(), etc. return nonzero for success, not 0.
Signal handlers deal with signal names, not numbers. Use kill −l to find their names on your
system.
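Two of these points lend themselves to a short sketch: a switch built on the fly and a field width interpolated into a format string. The subroutine classify() and its categories are hypothetical, and the labelled one-pass for loop is only one of several ways to write such a construct.

```perl
use strict;

# Hypothetical classifier standing in for a C switch statement.
sub classify {
    my ($value) = @_;
    my $kind = "other";                     # plays the role of "default:"
    SWITCH: for ($value) {                  # aliases $_ to $value for one pass
        /^\d+$/ and do { $kind = "number"; last SWITCH; };
        /^\w+$/ and do { $kind = "word";   last SWITCH; };
    }
    return $kind;
}

# Field width interpolated into the format string instead of using "*".
my $width  = 8;
my $padded = sprintf("%${width}s", "abc");

print classify("42"), " ", classify("perl"), " ", classify("a b"), "\n";
print "[$padded]\n";
```

Note that last works here because the do blocks sit inside a real for loop, which is different from the do { } while construct mentioned above.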
Sed Traps
Seasoned sed programmers should take note of the following:
Backreferences in substitutions use "$" rather than "\".
The pattern matching metacharacters "(", ")", and "|" do not have backslashes in front.
The range operator is ..., rather than comma.
Shell Traps
Sharp shell programmers should take note of the following:
The backtick operator does variable interpolation without regard to the presence of single quotes in the
command.
The backtick operator does no translation of the return value, unlike csh.
Shells (especially csh) do several levels of substitution on each command line. Perl does substitution
in only certain constructs such as double quotes, backticks, angle brackets, and search patterns.
Shells interpret scripts a little bit at a time. Perl compiles the entire program before executing it
(except for BEGIN blocks, which execute at compile time).
The arguments are available via @ARGV, not $1, $2, etc.
The environment is not automatically made available as separate scalar variables.
Perl Traps
Practicing Perl Programmers should take note of the following:
Remember that many operations behave differently in a list context than they do in a scalar one. See
perldata for details.
Avoid barewords if you can, especially all lowercase ones. You can‘t tell by just looking at it whether
a bareword is a function or a string. By using quotes on strings and parentheses on function calls, you
won‘t ever get them confused.
You cannot discern from mere inspection which builtins are unary operators (like chop() and
chdir()) and which are list operators (like print() and unlink()). (User−defined subroutines
can be only list operators, never unary ones.) See perlop.
People have a hard time remembering that some functions default to $_, or @ARGV, or whatever,
but that others which you might expect to do not.
The <FH> construct is not the name of the filehandle, it is a readline operation on that handle. The
data read is assigned to $_ only if the file read is the sole condition in a while loop:
while (<FH>)                 { }
while (defined($_ = <FH>))   { }
<FH>;                        # data discarded!
Remember not to use "=" when you need "=~"; these two constructs are quite different:
$x = /foo/;
$x =~ /foo/;
The do {} construct isn‘t a real loop that you can use loop control on.
Use my() for local variables whenever you can get away with it (but see perlform for where you
can‘t). Using local() actually gives a local value to a global variable, which leaves you open to
unforeseen side−effects of dynamic scoping.
If you localize an exported variable in a module, its exported value will not change. The local name
becomes an alias to a new value but the external name is still an alias for the original.
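The difference between my() and local() described above can be seen in a small sketch. The variable $setting and the three subroutines are invented for the illustration.

```perl
use strict;
use vars qw($setting);              # a global (package) variable

$setting = "global";

sub show { return $setting }        # always reads the global

sub with_local {
    local $setting = "localized";   # temporarily changes the global itself
    return show();                  # the callee sees the temporary value
}

sub with_my {
    my $setting = "lexical";        # a new private variable; the global is untouched
    return show();                  # the callee still sees the global
}

print with_local(), "\n";           # localized
print with_my(), "\n";              # global
print show(), "\n";                 # global
```

The first line of output is the dynamic-scoping side effect the text warns about: a subroutine called from with_local() is affected by a change it never asked for.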
Perl4 to Perl5 Traps
Practicing Perl4 Programmers should take note of the following Perl4−to−Perl5 specific traps.
They‘re crudely ordered according to the following list:
Discontinuance, Deprecation, and BugFix traps
Anything that‘s been fixed as a perl4 bug, removed as a perl4 feature or deprecated as a perl4 feature
with the intent to encourage usage of some other perl5 feature.
Parsing Traps
Traps that appear to stem from the new parser.
Numerical Traps
Traps having to do with numerical or mathematical operators.
General data type traps
Traps involving perl standard data types.
Context Traps − scalar, list contexts
Traps related to context within lists, scalar statements/declarations.
Precedence Traps
Traps related to the precedence of parsing, evaluation, and execution of code.
General Regular Expression Traps using s///, etc.
Traps related to the use of pattern matching.
Subroutine, Signal, Sorting Traps
Traps related to the use of signals and signal handlers, general subroutines, and sorting, along with
sorting subroutines.
OS Traps
OS−specific traps.
DBM Traps
Traps specific to the use of dbmopen(), and specific dbm implementations.
Unclassified Traps
Everything else.
If you find an example of a conversion trap that is not listed here, please submit it to Bill Middleton
(Variable Suicide)
Variable suicide behavior is more consistent under Perl 5. Perl5 exhibits the same behavior for hashes
and scalars, that perl4 exhibits for only scalars.
$aGlobal{ "aKey" } = "global value";
print "MAIN:", $aGlobal{"aKey"}, "\n";
$GlobalLevel = 0;
&test( *aGlobal );
sub test {
local( *theArgument ) = @_;
local( %aNewLocal ); # perl 4 != 5.001l,m
$aNewLocal{"aKey"} = "this should never appear";
print "SUB: ", $theArgument{"aKey"}, "\n";
    $aNewLocal{"aKey"} = "level $GlobalLevel";   # what should print
    $GlobalLevel++;
    if( $GlobalLevel<4 ) {
        &test( *aNewLocal );
    }
}
# Perl4:
# MAIN:global value
# SUB: global value
# SUB: level 0
# SUB: level 1
# SUB: level 2
# Perl5:
# MAIN:global value
# SUB: global value
# SUB: this should never appear
# SUB: this should never appear
# SUB: this should never appear
Context Traps − scalar, list contexts
(list context)
The elements of argument lists for formats are now evaluated in list context. This means you can
interpolate list values now.
@fmt = ("foo","bar","baz");
format STDOUT=
@<<<<< @||||| @>>>>>
@fmt;
.
write;
# perl4 errors: Please use commas to separate fields in file
# perl5 prints: foo     bar      baz
(scalar context)
The caller() function now returns a false value in a scalar context if there is no caller. This lets
library files determine if they‘re being required.
caller() ? (print "You rang?\n") : (print "Got a 0\n");
# perl4 errors: There is no caller
# perl5 prints: Got a 0
(scalar context)
The comma operator in a scalar context is now guaranteed to give a scalar context to its arguments.
@y = ('a','b','c');
$x = (1, 2, @y);
print "x = $x\n";
# Perl4 prints: x = c   # Thinks list context interpolates list
# Perl5 prints: x = 3   # Knows scalar uses length of list
(list, builtin)
sprintf() funkiness (array argument converted to scalar array count) This test could be added to
t/op/sprintf.t
@z = ('%s%s', 'foo', 'bar');
$x = sprintf(@z);
if ($x eq 'foobar') {print "ok 2\n";} else {print "not ok 2 '$x'\n";}
# perl4 prints: ok 2
# perl5 prints: not ok 2
printf() works fine, though:
printf STDOUT (@z);
print "\n";
# perl4 prints: foobar
# perl5 prints: foobar
Probably a bug.
Precedence Traps
Perl4−to−Perl5 traps involving precedence order.
Perl 4 has almost the same precedence rules as Perl 5 for the operators that they both have. Perl 4 however,
seems to have had some inconsistencies that made the behavior differ from what was documented.
Precedence
LHS vs. RHS of any assignment operator. LHS is evaluated first in perl4, second in perl5; this can
affect the relationship between side−effects in sub−expressions.
@arr = ( 'left', 'right' );
$a{shift @arr} = shift @arr;
print join( ' ', keys %a );
# perl4 prints: left
# perl5 prints: right
Precedence
These are now semantic errors because of precedence:
@list = (1,2,3,4,5);
%map = ("a",1,"b",2,"c",3,"d",4);
$n = shift @list + 2;   # first item in list plus 2
print "n is $n, ";
$m = keys %map + 2;     # number of items in hash plus 2
print "m is $m\n";
# perl4 prints: n is 3, m is 6
# perl5 errors and fails to compile
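One way to recover the Perl 4 result under Perl 5 is to parenthesize the operands explicitly. The following is a rewrite of the example above offered as a sketch; it is not part of the original test.

```perl
use strict;

my @list = (1, 2, 3, 4, 5);
my %map  = ("a", 1, "b", 2, "c", 3, "d", 4);

my $n = shift(@list) + 2;        # first item in list plus 2
my $m = keys(%map) + 2;          # number of keys in hash plus 2

print "n is $n, m is $m\n";
```

With the parentheses in place, the named unary operators receive exactly the array or hash they were meant to, and the addition happens afterwards.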
Precedence
The precedence of assignment operators is now the same as the precedence of assignment. Perl 4
mistakenly gave them the precedence of the associated operator. So you now must parenthesize them
in expressions like
/foo/ ? ($a += 2) : ($a -= 2);
Otherwise
/foo/ ? $a += 2 : $a -= 2
would be erroneously parsed as
(/foo/ ? $a += 2 : $a) -= 2;
On the other hand,
$a += /foo/ ? 1 : 2;
now works as a C programmer would expect.
Precedence
open FOO || die;
is now incorrect. You need parentheses around the filehandle. Otherwise, perl5 leaves the statement
as its default precedence:
open(FOO || die);
# perl4 opens or dies
# perl5 errors: Precedence problem: open FOO should be open(FOO)
Precedence
perl4 gives the special variable, $: precedence, where perl5 treats $:: as main package
$a = "x"; print "$::a";
# perl 4 prints: −:a
# perl 5 prints: x
Precedence
perl4 had buggy precedence for the file test operators vis-a-vis the assignment operators. Thus,
although the precedence table for perl4 leads one to believe -e $foo .= "q" should parse as
((-e $foo) .= "q"), it actually parses as (-e ($foo .= "q")). In perl5, the precedence
is as documented.
-e $foo .= "q"
# perl4 prints: no output
# perl5 prints: Can't modify -e in concatenation
Precedence
In perl4, keys(), each() and values() were special high−precedence operators that operated
on a single hash, but in perl5, they are regular named unary operators. As documented, named unary
operators have lower precedence than the arithmetic and concatenation operators + − ., but the
perl4 variants of these operators actually bind tighter than + − .. Thus, for:
%foo = 1..10;
print keys %foo − 1
# perl4 prints: 4
# perl5 prints: Type of arg 1 to keys must be hash (not subtraction)
The perl4 behavior was probably more useful, if less consistent.
General Regular Expression Traps using s///, etc.
All types of RE traps.
Regular Expression
s'$lhs'$rhs' now does no interpolation on either side. It used to interpolate $lhs but not
$rhs. (And still does not match a literal '$' in string)
$a=1;$b=2;
$string = '1 2 $a $b';
$string =~ s'$a'$b';
print $string,"\n";
# perl4 prints: $b 2 $a $b
# perl5 prints: 1 2 $a $b
Regular Expression
m//g now attaches its state to the searched string rather than the regular expression. (Once the scope
of a block is left for the sub, the state of the searched string is lost)
$_ = "ababab";
while(m/ab/g){
&doit("blah");
}
sub doit{local($_) = shift; print "Got $_ "}
# perl4 prints: blah blah blah
# perl5 prints: infinite loop blah...
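Because the match position now belongs to the string, it can be inspected with pos(). The sketch below also shows one way to keep a helper from disturbing that position: copying its argument into a my variable instead of using local($_). The helper name show is hypothetical.

```perl
use strict;
use warnings;

my $text = "ababab";
my @ends;

while ($text =~ /ab/g) {
    # pos() reports where the last m//g match on $text ended.
    push @ends, pos($text);
    show("blah");    # the matching state of $text is untouched by this call
}

# Hypothetical helper: works on a lexical copy, not on $_.
sub show {
    my ($copy) = @_;
    return length $copy;
}

print "@ends\n";    # 2 4 6
```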
Regular Expression
Currently, if you use the m//o qualifier on a regular expression within an anonymous sub, all
closures generated from that anonymous sub will use the regular expression as it was compiled when
it was used the very first time in any such closure. For instance, if you say
sub build_match {
my($left,$right) = @_;
return sub { $_[0] =~ /$left stuff $right/o; };
}
build_match() will always return a sub which matches the contents of $left and $right as
they were the first time that build_match() was called, not as they are in the current call.
This is probably a bug, and may change in future versions of Perl.
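One way to sidestep the problem, assuming the cost of re-evaluating the pattern is acceptable, is simply to omit the /o qualifier. In this sketch each closure matches against its own $left and $right; the variable names $ab and $xy are illustrative.

```perl
use strict;
use warnings;

sub build_match {
    my ($left, $right) = @_;
    # No /o: the pattern is built from this closure's own $left and $right.
    return sub { $_[0] =~ /$left stuff $right/ };
}

my $ab = build_match("a", "b");
my $xy = build_match("x", "y");

my $r1 = $ab->("a stuff b") ? "yes" : "no";
my $r2 = $xy->("x stuff y") ? "yes" : "no";
my $r3 = $xy->("a stuff b") ? "yes" : "no";

print "$r1 $r2 $r3\n";    # yes yes no
```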
Regular Expression
If no parentheses are used in a match, Perl4 sets $+ to the whole match, just like $&. Perl5 does not.
"abcdef" =~ /b.*e/;
print "\$+ = $+\n";
# perl4 prints: bcde
# perl5 prints:
Regular Expression
substitution now returns the null string if it fails
$string = "test";
$value = ($string =~ s/foo//);
print $value, "\n";
# perl4 prints: 0
# perl5 prints:
Also see Numerical Traps for another example of this new feature.
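In practice the return value is usually tested for truth rather than printed. A short sketch (variable names are illustrative):

```perl
use strict;
use warnings;

my $string = "test";

# A failed substitution returns the null string, which is false.
my $failed = ($string =~ s/foo//);
print "nothing replaced\n" unless $failed;

# A successful one returns the number of substitutions made.
my $count = ($string =~ s/t/T/g);
print "$count replaced, string is now $string\n";    # 2 replaced, string is now TesT
```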
Regular Expression
s`lhs`rhs` (using backticks) is now a normal substitution, with no backtick expansion
$string = "";
$string =~ s`^`hostname`;
print $string, "\n";
# perl4 prints:
# perl5 prints: hostname
Regular Expression
Stricter parsing of variables used in regular expressions
s/^([^$grpc]*$grpc[$opt$plus$rep]?)//o;
# perl4: compiles w/o error
# perl5: with Scalar found where operator expected ..., near "$opt$plus"
An added component of this example, apparently from the same script, is the actual value of the s'd
string after the substitution. [$opt] is a character class in perl4 and an array subscript in perl5.
$grpc = ’a’;
$opt = ’r’;
$_ = ’bar’;
s/^([^$grpc]*$grpc[$opt]?)/foo/;
print ;
# perl4 prints: foo
# perl5 prints: foobar
Regular Expression
Under perl5, m?x? matches only once, like ?x?. Under perl4, it matched repeatedly, like /x/ or
m!x!.
$test = "once";
sub match { $test =~ m?once?; }
&match();
if( &match() ) {
# m?x? matches more than once
print "perl4\n";
} else {
# m?x? matches only once
print "perl5\n";
}
# perl4 prints: perl4
# perl5 prints: perl5
Subroutine, Signal, Sorting Traps
The general group of Perl4−to−Perl5 traps having to do with Signals, Sorting, and their related subroutines,
as well as general subroutine traps. Includes some OS−Specific traps.
(Signals)
Barewords that used to look like strings to Perl will now look like subroutine calls if a subroutine by
that name is defined before the compiler sees them.
sub SeeYa { warn"Hasta la vista, baby!" }
$SIG{’TERM’} = SeeYa;
print "SIGTERM is now $SIG{’TERM’}\n";
# perl4 prints: SIGTERM is now main'SeeYa
# perl5 prints: SIGTERM is now main::1
Use −w to catch this one
(Sort Subroutine)
reverse is no longer allowed as the name of a sort subroutine.
sub reverse{ print "yup "; $a <=> $b }
print sort reverse a,b,c;
# perl4 prints: yup yup yup yup abc
# perl5 prints: abc
warn() won‘t let you specify a filehandle.
Although it _always_ printed to STDERR, warn() would let you specify a filehandle in perl4.
With perl5 it does not.
warn STDERR "Foo!";
# perl4 prints: Foo!
# perl5 prints: String found where operator expected
OS Traps
(SysV)
Under HPUX, and some other SysV OSes, one had to reset any signal handler, within the signal
handler function, each time a signal was handled with perl4. With perl5, the reset is now done
correctly. Any code relying on the handler _not_ being reset will have to be reworked.
Since version 5.002, Perl uses sigaction() under SysV.
sub gotit {
print "Got @_... ";
}
$SIG{’INT’} = ’gotit’;
$| = 1;
$pid = fork;
if ($pid) {
kill(’INT’, $pid);
sleep(1);
kill(’INT’, $pid);
} else {
while (1) {sleep(10);}
}
# perl4 (HPUX) prints: Got INT...
# perl5 (HPUX) prints: Got INT... Got INT...
(SysV)
Under SysV OSes, seek() on a file opened to append >> now does the right thing with respect to the
fopen() manpage: when a file is opened for append, it is impossible to overwrite
information already in the file.
open(TEST,">>seek.test");
$start = tell TEST ;
foreach(1 .. 9){
print TEST "$_ ";
}
$end = tell TEST ;
seek(TEST,$start,0);
print TEST "18 characters here";
# perl4 (solaris) seek.test has: 18 characters here
# perl5 (solaris) seek.test has: 1 2 3 4 5 6 7 8 9 18 characters here
Interpolation Traps
Perl4−to−Perl5 traps having to do with how things get interpolated within certain expressions, statements,
contexts, or whatever.
Interpolation
@ now always interpolates an array in double−quotish strings.
print "To: someone@somewhere.com\n";
# perl4 prints: To: someone@somewhere.com
# perl5 errors : In string, @somewhere now must be written as \@somewhere
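Either escaping the @ or switching to single quotes keeps the address literal under perl5. A minimal sketch (variable names are illustrative):

```perl
use strict;
use warnings;

my $escaped = "To: someone\@somewhere.com\n";      # backslash stops interpolation
my $single  = 'To: someone@somewhere.com' . "\n";  # single quotes never interpolate

print $escaped;
print "same\n" if $escaped eq $single;    # same
```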
Interpolation
Double−quoted strings may no longer end with an unescaped $ or @.
$foo = "foo$";
$bar = "bar@";
print "foo is $foo, bar is $bar\n";
# perl4 prints: foo is foo$, bar is bar@
# perl5 errors: Final $ should be \$ or $name
Note: perl5 DOES NOT error on the terminating @ in $bar
Interpolation
Perl now sometimes evaluates arbitrary expressions inside braces that occur within double quotes
(usually when the opening brace is preceded by $ or @).
@www = "buz";
$foo = "foo";
$bar = "bar";
sub foo { return "bar" };
print "|@{w.w.w}|${main’foo}|";
# perl4 prints: |@{w.w.w}|foo|
# perl5 prints: |buz|bar|
Note that you can use strict; to ward off such trappiness under perl5.
Interpolation
The construct "this is $$x" used to interpolate the pid at that point, but now apparently tries to
dereference $x. $$ by itself still works fine, however.
print "this is $$x\n";
# perl4 prints: this is XXXx
# perl5 prints: this is
(XXX is the current pid)
Interpolation
Creation of hashes on the fly with eval "EXPR" now requires either both $‘s to be protected in
the specification of the hash name, or both curlies to be protected. If both curlies are protected, the
result will be compatible with perl4 and perl5. This is a very common practice, and should be
changed to use the block form of eval{} if possible.
$hashname = "foobar";
$key = "baz";
$value = 1234;
eval "\$$hashname{’$key’} = q|$value|";
(defined($foobar{’baz’})) ? (print "Yup") : (print "Nope");
# perl4 prints: Yup
# perl5 prints: Nope
Changing
eval "\$$hashname{’$key’} = q|$value|";
to
eval "\$\$hashname{’$key’} = q|$value|";
causes the following result:
# perl4 prints: Nope
# perl5 prints: Yup
or, changing to
eval "\$$hashname\{’$key’\} = q|$value|";
causes the following result:
# perl4 prints: Yup
# perl5 prints: Yup
# and is compatible for both versions
Interpolation
perl4 programs which unconsciously rely on the bugs in earlier perl versions.
perl −e ’$bar=q/not/; print "This is $foo{$bar} perl5"’
# perl4 prints: This is not perl5
# perl5 prints: This is perl5
Interpolation
You also have to be careful about array references.
print "$foo{";
# perl4 prints: {
# perl5 prints: syntax error
Interpolation
Similarly, watch out for:
$foo = "array";
print "\$$foo{bar}\n";
# perl4 prints: $array{bar}
# perl5 prints: $
Perl 5 is looking for $array{bar} which doesn‘t exist, but perl 4 is happy just to expand $foo to
"array" by itself. Watch out for this especially in eval‘s.
Interpolation
qq() string passed to eval
eval qq(
foreach \$y (keys %\$x\) {
\$count++;
}
);
# perl4 runs this ok
# perl5 prints: Can’t find string terminator ")"
DBM Traps
General DBM traps.
DBM
Existing dbm databases created under perl4 (or any other dbm/ndbm tool) may cause the same script,
run under perl5, to fail. The build of perl5 must have been linked with the same dbm/ndbm as the
default for dbmopen() to function properly without tie‘ing to an extension dbm implementation.
dbmopen (%dbm, "file", undef);
print "ok\n";
# perl4 prints: ok
# perl5 prints: ok (IFF linked with −ldbm or −lndbm)
DBM
Existing dbm databases created under perl4 (or any other dbm/ndbm tool) may cause the same script,
run under perl5, to fail. The error generated when exceeding the limit on the key/value size will
cause perl5 to exit immediately.
dbmopen(DB, "testdb",0600) || die "couldn’t open db! $!";
$DB{’trap’} = "x" x 1024; # value too large for most dbm/ndbm
print "YUP\n";
# perl4 prints:
dbm store returned −1, errno 28, key "trap" at − line 3.
YUP
# perl5 prints:
dbm store returned −1, errno 28, key "trap" at − line 3.
Unclassified Traps
Everything else.
require/do trap using returned value
If the file doit.pl has:
sub foo {
$rc = do "./do.pl";
return 8;
}
print &foo, "\n";
And the do.pl file has the following single line:
return 3;
Running doit.pl gives the following:
# perl 4 prints: 3 (aborts the subroutine early)
# perl 5 prints: 8
Same behavior if you replace do with require.
split on empty string with LIMIT specified
$string = ’’;
@list = split(/foo/, $string, 2);
Perl4 returns a one element list containing the empty string but Perl5 returns an empty list.
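Code that indexes the result directly may therefore want a guard under perl5. A minimal sketch; the fallback to an empty string is an assumption about what the caller wants:

```perl
use strict;
use warnings;

my $string = '';
my @list = split(/foo/, $string, 2);

# Guard against the empty list before using the first field.
my $first = @list ? $list[0] : '';

print scalar(@list), "\n";    # 0
```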
As always, if any of these are ever officially declared as bugs, they‘ll be fixed and removed.
perlstyle
Perl Programmers Reference Guide
perlstyle
NAME
perlstyle − Perl style guide
DESCRIPTION
Each programmer will, of course, have his or her own preferences in regard to formatting, but there are
some general guidelines that will make your programs easier to read, understand, and maintain.
The most important thing is to run your programs under the −w flag at all times. You may turn it off
explicitly for particular portions of code via the $^W variable if you must. You should also always run under
use strict or know the reason why not. The use sigtrap and even use diagnostics pragmas
may also prove useful.
Regarding aesthetics of code layout, about the only thing Larry cares strongly about is that the closing curly
brace of a multi−line BLOCK should line up with the keyword that started the construct. Beyond that, he has
other preferences that aren‘t so strong:
4−column indent.
Opening curly on same line as keyword, if possible, otherwise line up.
Space before the opening curly of a multi−line BLOCK.
One−line BLOCK may be put on one line, including curlies.
No space before the semicolon.
Semicolon omitted in "short" one−line BLOCK.
Space around most operators.
Space around a "complex" subscript (inside brackets).
Blank lines between chunks that do different things.
Uncuddled elses.
No space between function name and its opening parenthesis.
Space after each comma.
Long lines broken after an operator (except "and" and "or").
Space after last parenthesis matching on current line.
Line up corresponding items vertically.
Omit redundant punctuation as long as clarity doesn‘t suffer.
Larry has his reasons for each of these things, but he doesn‘t claim that everyone else‘s mind works the same
as his does.
Here are some other more substantive style issues to think about:
Just because you CAN do something a particular way doesn‘t mean that you SHOULD do it that way.
Perl is designed to give you several ways to do anything, so consider picking the most readable one.
For instance
open(FOO,$foo) || die "Can’t open $foo: $!";
is better than
die "Can’t open $foo: $!" unless open(FOO,$foo);
because the second way hides the main point of the statement in a modifier. On the other hand
print "Starting analysis\n" if $verbose;
is better than
$verbose && print "Starting analysis\n";
because the main point isn‘t whether the user typed −v or not.
Similarly, just because an operator lets you assume default arguments doesn‘t mean that you have to
make use of the defaults. The defaults are there for lazy systems programmers writing one−shot
programs. If you want your program to be readable, consider supplying the argument.
Along the same lines, just because you CAN omit parentheses in many places doesn‘t mean that you
ought to:
return print reverse sort num values %array;
return print(reverse(sort num (values(%array))));
When in doubt, parenthesize. At the very least it will let some poor schmuck bounce on the % key in
vi.
Even if you aren‘t in doubt, consider the mental welfare of the person who has to maintain the code
after you, and who will probably put parentheses in the wrong place.
Don‘t go through silly contortions to exit a loop at the top or the bottom, when Perl provides the last
operator so you can exit in the middle. Just "outdent" it a little to make it more visible:
LINE:
for (;;) {
statements;
last LINE if $foo;
next LINE if /^#/;
statements;
}
Don‘t be afraid to use loop labels—they‘re there to enhance readability as well as to allow multilevel
loop breaks. See the previous example.
Avoid using grep() (or map()) or ‘backticks‘ in a void context, that is, when you just throw away
their return values. Those functions all have return values, so use them. Otherwise use a foreach()
loop or the system() function instead.
For portability, when using features that may not be implemented on every machine, test the construct
in an eval to see if it fails. If you know what version or patchlevel a particular feature was
implemented, you can test $] ($PERL_VERSION in English) to see if it will be there. The
Config module will also let you interrogate values determined by the Configure program when Perl
was installed.
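The three techniques might be combined as in the following sketch. The symlink() probe is the usual eval idiom; the 5.004 threshold and the choice of $Config{osname} are illustrative assumptions only.

```perl
use strict;
use warnings;
use Config;

# Probe for symlink() support; on systems without it the call dies inside eval.
my $symlink_exists = eval { symlink("", ""); 1 } ? 1 : 0;

# Test the interpreter version numerically (threshold chosen for illustration).
my $new_enough = ($] >= 5.004) ? 1 : 0;

print "symlink available: $symlink_exists\n";
print "perl $] on $Config{osname}\n";
```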
Choose mnemonic identifiers. If you can‘t remember what mnemonic means, you‘ve got a problem.
While short identifiers like $gotit are probably ok, use underscores to separate words. It is
generally easier to read $var_names_like_this than $VarNamesLikeThis, especially for
non−native speakers of English. It‘s also a simple rule that works consistently with
VAR_NAMES_LIKE_THIS.
Package names are sometimes an exception to this rule. Perl informally reserves lowercase module
names for "pragma" modules like integer and strict. Other modules should begin with a capital
letter and use mixed case, but probably without underscores due to limitations in primitive file
systems’ representations of module names as files that must fit into a few sparse bytes.
You may find it helpful to use letter case to indicate the scope or nature of a variable. For example:
$ALL_CAPS_HERE   constants only (beware clashes with perl vars!)
$Some_Caps_Here  package-wide global/static
$no_caps_here    function scope my() or local() variables
Function and method names seem to work best as all lowercase. E.g., $obj−>as_string().
You can use a leading underscore to indicate that a variable or function should not be used outside the
package that defined it.
If you have a really hairy regular expression, use the /x modifier and put in some whitespace to make
it look a little less like line noise. Don‘t use slash as a delimiter when your regexp has slashes or
backslashes.
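As a sketch of both points, the pattern below (an ISO-style date, chosen only for illustration) is spread out with /x and commented, and the second match uses braces as delimiters so that the slashes in the path need no backslashes:

```perl
use strict;
use warnings;

# Illustrative pattern, not from the original text.
my $date_re = qr{
    (\d{4})    # year
    -
    (\d{2})    # month
    -
    (\d{2})    # day
}x;

my ($year, $month, $day) = "1998-10-18" =~ $date_re;
print "$year $month $day\n";    # 1998 10 18

# A non-slash delimiter avoids backslashing the slashes in a path.
my $in_usr_local = "/usr/local/bin/perl" =~ m{^/usr/local/} ? 1 : 0;
print "$in_usr_local\n";        # 1
```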
Use the new "and" and "or" operators to avoid having to parenthesize list operators so much, and to
reduce the incidence of punctuation operators like && and ||. Call your subroutines as if they were
functions or list operators to avoid excessive ampersands and parentheses.
Use here documents instead of repeated print() statements.
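For instance, a usage message can be written once as a here document instead of as a series of print() calls. The program name and message text in this sketch are illustrative:

```perl
use strict;
use warnings;

my $program = "example";    # illustrative name

# One here document replaces two separate print() statements.
my $usage = <<"END_USAGE";
usage: $program file ...
Reads each file and prints a summary.
END_USAGE

print $usage;
```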
Line up corresponding things vertically, especially if it‘d be too long to fit on one line anyway.
$IDX = $ST_MTIME;
$IDX = $ST_ATIME       if $opt_u;
$IDX = $ST_CTIME       if $opt_c;
$IDX = $ST_SIZE        if $opt_s;
mkdir $tmpdir, 0700 or die "can't mkdir $tmpdir: $!";
chdir($tmpdir)      or die "can't chdir $tmpdir: $!";
mkdir 'tmp',   0777 or die "can't mkdir $tmpdir/tmp: $!";
Always check the return codes of system calls. Good error messages should go to STDERR, include
which program caused the problem, what the failed system call and arguments were, and (VERY
IMPORTANT) should contain the standard system error message for what went wrong. Here‘s a
simple but sufficient example:
opendir(D, $dir) or die "can't opendir $dir: $!";
Line up your transliterations when it makes sense:
tr [abc]
[xyz];
Think about reusability. Why waste brainpower on a one−shot when you might want to do something
like it again? Consider generalizing your code. Consider writing a module or object class. Consider
making your code run cleanly with use strict and −w in effect. Consider giving away your code.
Consider changing your whole world view. Consider... oh, never mind.
Be consistent.
Be nice.
perlxs
Perl Programmers Reference Guide
perlxs
NAME
perlxs − XS language reference manual
DESCRIPTION
Introduction
XS is a language used to create an extension interface between Perl and some C library which one wishes to
use with Perl. The XS interface is combined with the library to create a new library which can be linked to
Perl. An XSUB is a function in the XS language and is the core component of the Perl application interface.
The XS compiler is called xsubpp. This compiler will embed the constructs necessary to let an XSUB,
which is really a C function in disguise, manipulate Perl values and creates the glue necessary to let Perl
access the XSUB. The compiler uses typemaps to determine how to map C function parameters and
variables to Perl values. The default typemap handles many common C types. A supplementary typemap must
be created to handle special structures and types for the library being linked.
See perlxstut for a tutorial on the whole extension creation process.
Note: For many extensions, Dave Beazley‘s SWIG system provides a significantly more convenient
mechanism for creating the XS glue code. See http://www.cs.utah.edu/~beazley/SWIG for more information.
On The Road
Many of the examples which follow will concentrate on creating an interface between Perl and the ONC+
RPC bind library functions. The rpcb_gettime() function is used to demonstrate many features of the
XS language. This function has two parameters; the first is an input parameter and the second is an output
parameter. The function also returns a status value.
bool_t rpcb_gettime(const char *host, time_t *timep);
From C this function will be called with the following statements.
#include <rpc/rpc.h>
bool_t status;
time_t timep;
status = rpcb_gettime( "localhost", &timep );
If an XSUB is created to offer a direct translation between this function and Perl, then this XSUB will be
used from Perl with the following code. The $status and $timep variables will contain the output of the
function.
use RPC;
$status = rpcb_gettime( "localhost", $timep );
The following XS file shows an XS subroutine, or XSUB, which demonstrates one possible interface to the
rpcb_gettime() function. This XSUB represents a direct translation between C and Perl and so
preserves the interface even from Perl. This XSUB will be invoked from Perl with the usage shown above.
Note that the first three #include statements, for EXTERN.h, perl.h, and XSUB.h, will always be present
at the beginning of an XS file. This approach and others will be expanded later in this document.
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include <rpc/rpc.h>
MODULE = RPC  PACKAGE = RPC
bool_t
rpcb_gettime(host,timep)
     char *host
     time_t &timep
     OUTPUT:
     timep
Any extension to Perl, including those containing XSUBs, should have a Perl module to serve as the
bootstrap which pulls the extension into Perl. This module will export the extension‘s functions and
variables to the Perl program and will cause the extension‘s XSUBs to be linked into Perl. The following
module will be used for most of the examples in this document and should be used from Perl with the use
command as shown earlier. Perl modules are explained in more detail later in this document.
package RPC;
require Exporter;
require DynaLoader;
@ISA = qw(Exporter DynaLoader);
@EXPORT = qw( rpcb_gettime );
bootstrap RPC;
1;
Throughout this document a variety of interfaces to the rpcb_gettime() XSUB will be explored. The
XSUBs will take their parameters in different orders or will take different numbers of parameters. In each
case the XSUB is an abstraction between Perl and the real C rpcb_gettime() function, and the XSUB
must always ensure that the real rpcb_gettime() function is called with the correct parameters. This
abstraction will allow the programmer to create a more Perl−like interface to the C function.
The Anatomy of an XSUB
The following XSUB allows a Perl program to access a C library function called sin(). The XSUB will
imitate the C function which takes a single argument and returns a single value.
double
sin(x)
double x
When using C pointers the indirection operator * should be considered part of the type and the address
operator & should be considered part of the variable, as is demonstrated in the rpcb_gettime() function
above. See the section on typemaps for more about handling qualifiers and unary operators in C types.
The function name and the return type must be placed on separate lines.
INCORRECT                       CORRECT
double sin(x)                   double
  double x                      sin(x)
                                  double x
The function body may be indented or left−adjusted. The following example shows a function with its body
left−adjusted. Most examples in this document will indent the body.
CORRECT
double
sin(x)
double x
The Argument Stack
The argument stack is used to store the values which are sent as parameters to the XSUB and to store the
XSUB‘s return value. In reality all Perl functions keep their values on this stack at the same time, each
limited to its own range of positions on the stack. In this document the first position on that stack which
belongs to the active function will be referred to as position 0 for that function.
XSUBs refer to their stack arguments with the macro ST(x), where x refers to a position in this XSUB‘s part
of the stack. Position 0 for that function would be known to the XSUB as ST(0). The XSUB‘s incoming
parameters and outgoing return values always begin at ST(0). For many simple cases the xsubpp compiler
will generate the code necessary to handle the argument stack by embedding code fragments found in the
typemaps. In more complex cases the programmer must supply the code.
The RETVAL Variable
The RETVAL variable is a magic variable which always matches the return type of the C library function.
The xsubpp compiler will supply this variable in each XSUB and by default will use it to hold the return
value of the C library function being called. In simple cases the value of RETVAL will be placed in ST(0)
of the argument stack where it can be received by Perl as the return value of the XSUB.
If the XSUB has a return type of void then the compiler will not supply a RETVAL variable for that
function. When using the PPCODE: directive the RETVAL variable is not needed, unless used explicitly.
If the PPCODE: directive is not used, a void return value should be used only for subroutines which do not
return a value, even if a CODE: directive is used which sets ST(0) explicitly.
Older versions of this document recommended using a void return value in such cases. It was discovered that
this could lead to segfaults in cases when the XSUB was truly void. This practice is now deprecated, and may
not be supported in some future version. Use the return value SV * in such cases. (Currently xsubpp
contains some heuristic code which tries to disambiguate between "truly-void" and
"old-practice-declared-as-void" functions. Hence your code is at the mercy of this heuristic unless you use SV
* as the return value.)
The MODULE Keyword
The MODULE keyword is used to start the XS code and to specify the package of the functions which are
being defined. All text preceding the first MODULE keyword is considered C code and is passed through to
the output untouched. Every XS module will have a bootstrap function which is used to hook the XSUBs
into Perl. The package name of this bootstrap function will match the value of the last MODULE statement
in the XS source files. The value of MODULE should always remain constant within the same XS file,
though this is not required.
The following example will start the XS code and will place all functions in a package named RPC.
MODULE = RPC
The PACKAGE Keyword
When functions within an XS source file must be separated into packages the PACKAGE keyword should be
used. This keyword is used with the MODULE keyword and must follow immediately after it when used.
MODULE = RPC  PACKAGE = RPC
[ XS code in package RPC ]
MODULE = RPC  PACKAGE = RPCB
[ XS code in package RPCB ]
MODULE = RPC  PACKAGE = RPC
[ XS code in package RPC ]
Although this keyword is optional and in some cases provides redundant information it should always be
used. This keyword will ensure that the XSUBs appear in the desired package.
The PREFIX Keyword
The PREFIX keyword designates prefixes which should be removed from the Perl function names. If the C
function is rpcb_gettime() and the PREFIX value is rpcb_ then Perl will see this function as
gettime().
This keyword should follow the PACKAGE keyword when used. If PACKAGE is not used then PREFIX
should follow the MODULE keyword.
MODULE = RPC  PREFIX = rpc_
MODULE = RPC  PACKAGE = RPCB  PREFIX = rpcb_
The OUTPUT: Keyword
The OUTPUT: keyword indicates that certain function parameters should be updated (new values made
visible to Perl) when the XSUB terminates or that certain values should be returned to the calling Perl
function. For simple functions, such as the sin() function above, the RETVAL variable is automatically
designated as an output value. In more complex functions the xsubpp compiler will need help to determine
which variables are output variables.
This keyword will normally be used to complement the CODE: keyword. The RETVAL variable is not
recognized as an output variable when the CODE: keyword is present. The OUTPUT: keyword is used in
this situation to tell the compiler that RETVAL really is an output variable.
The OUTPUT: keyword can also be used to indicate that function parameters are output variables. This may
be necessary when a parameter has been modified within the function and the programmer would like the
update to be seen by Perl.
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep
OUTPUT:
timep
The OUTPUT: keyword will also allow an output parameter to be mapped to a matching piece of code rather
than to a typemap.
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep
OUTPUT:
timep sv_setnv(ST(1), (double)timep);
xsubpp emits an automatic SvSETMAGIC() for all parameters in the OUTPUT section of the XSUB,
except RETVAL. This is usually the desired behavior, as it takes care of properly invoking 'set' magic on
output parameters (needed for hash or array element parameters that must be created if they didn‘t exist). If
for some reason, this behavior is not desired, the OUTPUT section may contain a SETMAGIC: DISABLE
line to disable it for the remainder of the parameters in the OUTPUT section. Likewise, SETMAGIC:
ENABLE can be used to reenable it for the remainder of the OUTPUT section. See perlguts for more details
about ‘set’ magic.
The CODE: Keyword
This keyword is used in more complicated XSUBs which require special handling for the C function. The
RETVAL variable is available but will not be returned unless it is specified under the OUTPUT: keyword.
The following XSUB is for a C function which requires special handling of its parameters. The Perl usage is
given first.
$status = rpcb_gettime( "localhost", $timep );
The XSUB follows.
bool_t
rpcb_gettime(host,timep)
char *host
time_t timep
CODE:
RETVAL = rpcb_gettime( host, &timep );
OUTPUT:
timep
RETVAL
The INIT: Keyword
The INIT: keyword allows initialization to be inserted into the XSUB before the compiler generates the call
to the C function. Unlike the CODE: keyword above, this keyword does not affect the way the compiler
handles RETVAL.
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep
INIT:
printf("# Host is %s\n", host );
OUTPUT:
timep
The NO_INIT Keyword
The NO_INIT keyword is used to indicate that a function parameter is being used only as an output value.
The xsubpp compiler will normally generate code to read the values of all function parameters from the
argument stack and assign them to C variables upon entry to the function. NO_INIT will tell the compiler
that some parameters will be used for output rather than for input and that they will be handled before the
function terminates.
The following example shows a variation of the rpcb_gettime() function. This function uses the timep
variable only as an output variable and does not care about its initial contents.
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep = NO_INIT
OUTPUT:
timep
Initializing Function Parameters
Function parameters are normally initialized with their values from the argument stack. The typemaps
contain the code segments which are used to transfer the Perl values to the C parameters. The programmer,
however, is allowed to override the typemaps and supply alternate (or additional) initialization code.
The following code demonstrates how to supply initialization code for function parameters. The
initialization code is eval‘d within double quotes by the compiler before it is added to the output so anything
which should be interpreted literally [mainly $, @, or \\] must be protected with backslashes. The variables
$var, $arg, and $type can be used as in typemaps.
bool_t
rpcb_gettime(host,timep)
char *host = (char *)SvPV($arg,PL_na);
time_t &timep = 0;
OUTPUT:
timep
This should not be used to supply default values for parameters. One would normally use this when a
function parameter must be processed by another library function before it can be used. Default parameters
are covered in the next section.
If the initialization begins with =, then it is output on the same line where the input variable is declared. If
the initialization begins with ; or +, then it is output after all of the input variables have been declared. The
= and ; cases replace the initialization normally supplied from the typemap. For the + case, the initialization
from the typemap will precede the initialization code included after the +. A global variable, %v, is available
for the truly rare case where information from one initialization is needed in another initialization.
bool_t
rpcb_gettime(host,timep)
time_t &timep ; /*\$v{time}=@{[$v{time}=$arg]}*/
char *host + SvOK($v{time}) ? SvPV($arg,PL_na) : NULL;
OUTPUT:
timep
Default Parameter Values
Default values can be specified for function parameters by placing an assignment statement in the parameter
list. The default value may be a number or a string. Defaults should always be used on the right−most
parameters only.
To allow the XSUB for rpcb_gettime() to have a default host value the parameters to the XSUB could
be rearranged. The XSUB will then call the real rpcb_gettime() function with the parameters in the
correct order. Perl will call this XSUB with either of the following statements.
$status = rpcb_gettime( $timep, $host );
$status = rpcb_gettime( $timep );
The XSUB will look like the code which follows. A CODE: block is used to call the real
rpcb_gettime() function with the parameters in the correct order for that function.
bool_t
rpcb_gettime(timep,host="localhost")
char *host
time_t timep = NO_INIT
CODE:
RETVAL = rpcb_gettime( host, &timep );
OUTPUT:
timep
RETVAL
The PREINIT: Keyword
The PREINIT: keyword allows extra variables to be declared before the typemaps are expanded. If a
variable is declared in a CODE: block then that variable will follow any typemap code. This may result in a
C syntax error. To force the variable to be declared before the typemap code, place it into a PREINIT: block.
The PREINIT: keyword may be used one or more times within an XSUB.
The following examples are equivalent, but if the code is using complex typemaps then the first example is
safer.
bool_t
rpcb_gettime(timep)
time_t timep = NO_INIT
PREINIT:
char *host = "localhost";
CODE:
RETVAL = rpcb_gettime( host, &timep );
OUTPUT:
timep
RETVAL
A correct, but error−prone example.
bool_t
rpcb_gettime(timep)
time_t timep = NO_INIT
CODE:
char *host = "localhost";
RETVAL = rpcb_gettime( host, &timep );
OUTPUT:
timep
RETVAL
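The reason declaration order matters can be sketched in plain C. The following is only an illustration, not xsubpp output. It assumes a C89-style compiler, where every declaration in a block must come before the first statement, and it assumes that a complex typemap expands to statements rather than to simple initializers. The function host_length() and its three commented regions are hypothetical stand-ins for PREINIT:, typemap, and CODE: material.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the body of a generated XSUB. */
size_t host_length(const char *arg)
{
    /* 1. Declarations first: this is where PREINIT: material lands. */
    const char *host = "localhost";
    size_t len;

    /* 2. Statements that fetch and convert arguments (typemap material). */
    if (arg != NULL)
        host = arg;

    /* 3. The body proper (CODE: material).  Declaring `host` down here,
     *    after the statement above, is what a strict C89 compiler may
     *    refuse to accept. */
    assert(host != NULL);
    len = strlen(host);
    return len;
}
```

Calling host_length(NULL) uses the default host, while host_length("poplar") uses the supplied one; in both cases the declarations stay ahead of every statement.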
The SCOPE: Keyword
The SCOPE: keyword allows scoping to be enabled for a particular XSUB. If enabled, the XSUB will
invoke ENTER and LEAVE automatically.
To support potentially complex type mappings, if a typemap entry used by this XSUB contains a comment
like /*scope*/ then scoping will automatically be enabled for that XSUB.
To enable scoping:
SCOPE: ENABLE
To disable scoping:
SCOPE: DISABLE
The INPUT: Keyword
The XSUB‘s parameters are usually evaluated immediately after entering the XSUB. The INPUT: keyword
can be used to force those parameters to be evaluated a little later. The INPUT: keyword can be used
multiple times within an XSUB and can be used to list one or more input variables. This keyword is used
with the PREINIT: keyword.
The following example shows how the input parameter timep can be evaluated late, after a PREINIT.
bool_t
rpcb_gettime(host,timep)
char *host
PREINIT:
time_t tt;
INPUT:
time_t timep
CODE:
RETVAL = rpcb_gettime( host, &tt );
timep = tt;
OUTPUT:
timep
RETVAL
The next example shows each input parameter evaluated late.
bool_t
rpcb_gettime(host,timep)
PREINIT:
time_t tt;
INPUT:
char *host
PREINIT:
char *h;
INPUT:
time_t timep
CODE:
h = host;
RETVAL = rpcb_gettime( h, &tt );
timep = tt;
OUTPUT:
timep
RETVAL
Variable−length Parameter Lists
XSUBs can have variable−length parameter lists by specifying an ellipsis (...) in the parameter list. This
use of the ellipsis is similar to that found in ANSI C. The programmer is able to determine the number of
arguments passed to the XSUB by examining the items variable which the xsubpp compiler supplies for
all XSUBs. By using this mechanism one can create an XSUB which accepts a list of parameters of
unknown length.
The host parameter for the rpcb_gettime() XSUB can be optional so the ellipsis can be used to indicate
that the XSUB will take a variable number of parameters. Perl should be able to call this XSUB with either
of the following statements.
$status = rpcb_gettime( $timep, $host );
$status = rpcb_gettime( $timep );
The XS code, with ellipsis, follows.
bool_t
rpcb_gettime(timep, ...)
time_t timep = NO_INIT
PREINIT:
char *host = "localhost";
CODE:
if( items > 1 )
host = (char *)SvPV(ST(1), PL_na);
RETVAL = rpcb_gettime( host, &timep );
OUTPUT:
timep
RETVAL
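The role of the items variable in the XSUB above can be modelled in plain C. This is a toy sketch under stated assumptions: pick_host() and its args array are hypothetical names that stand in for the XSUB and for ST(0), ST(1), and so on; a real XSUB reads Perl values from the argument stack instead.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the XSUB's argument handling: `items` is the number of
 * arguments passed and `args` plays the role of ST(0), ST(1), ...
 * Both names are illustrative only. */
const char *pick_host(int items, const char *const *args)
{
    const char *host = "localhost";   /* default, as in the PREINIT: block */

    assert(items >= 1 && args != NULL);
    if (items > 1)                    /* optional second argument supplied */
        host = args[1];
    return host;
}
```

With one argument the default host is returned; with two, the caller's second argument is returned unchanged.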
The C_ARGS: Keyword
The C_ARGS: keyword allows the creation of XSUBs whose calling sequence from Perl differs from the calling
sequence in C, without the need to write a CODE: or PPCODE: section. The contents of the C_ARGS: paragraph
are passed as the arguments to the called C function without any change.
For example, suppose that a C function is declared as
symbolic nth_derivative(int n, symbolic function, int flags);
and that the default flags are kept in a global C variable default_flags. Suppose that you want to create
an interface which is called as
$second_deriv = $function−>nth_derivative(2);
To do this, declare the XSUB as
symbolic
nth_derivative(function, n)
symbolic function
int n
C_ARGS:
n, function, default_flags
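The effect of this C_ARGS: line can be sketched as an ordinary C wrapper. The sketch rests on assumptions: symbolic is replaced by a hypothetical typedef for double, the body of nth_derivative() is a made-up stand-in that merely encodes its inputs so the argument order is visible, and xsub_nth_derivative() is an illustrative name for what the generated XSUB amounts to.

```c
#include <assert.h>

typedef double symbolic;       /* hypothetical stand-in for the real type */

int default_flags = 0;         /* the global holding the default flags */

/* Hypothetical stand-in for the library function: it encodes its inputs
 * in the result so that the order of the arguments can be observed. */
symbolic nth_derivative(int n, symbolic function, int flags)
{
    assert(n >= 0);
    return function * 100.0 + n * 10.0 + flags;
}

/* Roughly what the XSUB amounts to: arguments arrive in Perl order
 * (function, n) and are forwarded in the C_ARGS: order. */
symbolic xsub_nth_derivative(symbolic function, int n)
{
    return nth_derivative(n, function, default_flags);
}
```

The Perl-side order (function first) and the C-side order (n first) stay independent, which is the point of the keyword.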
The PPCODE: Keyword
The PPCODE: keyword is an alternate form of the CODE: keyword and is used to tell the xsubpp compiler
that the programmer is supplying the code to control the argument stack for the XSUB's return values.
Occasionally one will want an XSUB to return a list of values rather than a single value. In these cases one
must use PPCODE: and then explicitly push the list of values on the stack. The PPCODE: and CODE:
keywords are not used together within the same XSUB.
The following XSUB will call the C rpcb_gettime() function and will return its two output values,
timep and status, to Perl as a single list.
void
rpcb_gettime(host)
char *host
PREINIT:
time_t timep;
bool_t status;
PPCODE:
status = rpcb_gettime( host, &timep );
EXTEND(SP, 2);
PUSHs(sv_2mortal(newSViv(status)));
PUSHs(sv_2mortal(newSViv(timep)));
Notice that the programmer must supply the C code necessary to have the real rpcb_gettime() function
called and to have the return values properly placed on the argument stack.
The void return type for this function tells the xsubpp compiler that the RETVAL variable is not needed or
used and that it should not be created. In most scenarios the void return type should be used with the
PPCODE: directive.
The EXTEND() macro is used to make room on the argument stack for 2 return values. The PPCODE:
directive causes the xsubpp compiler to create a stack pointer available as SP, and it is this pointer which is
being used in the EXTEND() macro. The values are then pushed onto the stack with the PUSHs() macro.
Now the rpcb_gettime() function can be used from Perl with the following statement.
($status, $timep) = rpcb_gettime("localhost");
When handling output parameters with a PPCODE section, be sure to handle ‘set’ magic properly. See
perlguts for details about ‘set’ magic.
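The make-room-then-push pattern used in the PPCODE: example can be modelled with a toy stack in plain C. Everything in this sketch is hypothetical: struct toy_stack, toy_extend(), and toy_push() are loose analogies for the argument stack, EXTEND(), and PUSHs(). Unlike the real EXTEND(), which makes room on the stack, toy_extend() only checks a fixed capacity.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 8

/* Toy value stack; a loose analogy for the Perl argument stack. */
struct toy_stack { long slot[STACK_MAX]; size_t count; };

/* Loosely analogous to EXTEND(): confirm that n more values will fit. */
int toy_extend(const struct toy_stack *s, size_t n)
{
    assert(s != NULL);
    return s->count + n <= STACK_MAX;
}

/* Loosely analogous to PUSHs(): append one value; the caller must
 * already have confirmed that there is room. */
void toy_push(struct toy_stack *s, long value)
{
    assert(s != NULL && s->count < STACK_MAX);
    s->slot[s->count++] = value;
}

/* Return (status, timep) as a two-element list, as the XSUB above does.
 * The result is the number of values now on the toy stack, or 0 if
 * there was no room. */
size_t return_pair(struct toy_stack *s, long status, long timep)
{
    assert(s != NULL);
    if (!toy_extend(s, 2))
        return 0;
    toy_push(s, status);
    toy_push(s, timep);
    return s->count;
}
```

The order of the pushes fixes the order of the returned list, which is why status is pushed before timep.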
Returning Undef And Empty Lists
Occasionally the programmer will want to return simply undef or an empty list if a function fails rather
than a separate status value. The rpcb_gettime() function offers just this situation. If the function
succeeds we would like to have it return the time and if it fails we would like to have undef returned. In the
following Perl code the value of $timep will either be undef or it will be a valid time.
$timep = rpcb_gettime( "localhost" );
The following XSUB uses the SV * return type as a mnemonic only, and uses a CODE: block to indicate to
the compiler that the programmer has supplied all the necessary code. The sv_newmortal() call will
initialize the return value to undef, making that the default return value.
SV *
rpcb_gettime(host)
char * host
PREINIT:
time_t timep;
bool_t x;
CODE:
ST(0) = sv_newmortal();
if( rpcb_gettime( host, &timep ) )
sv_setnv( ST(0), (double)timep);
The next example demonstrates how one would place an explicit undef in the return value, should the need
arise.
SV *
rpcb_gettime(host)
char * host
PREINIT:
time_t timep;
bool_t x;
CODE:
ST(0) = sv_newmortal();
if( rpcb_gettime( host, &timep ) ){
sv_setnv( ST(0), (double)timep);
}
else{
ST(0) = &PL_sv_undef;
}
To return an empty list one must use a PPCODE: block and then not push return values on the stack.
void
rpcb_gettime(host)
char *host
PREINIT:
time_t timep;
PPCODE:
if( rpcb_gettime( host, &timep ) )
PUSHs(sv_2mortal(newSViv(timep)));
else{
/* Nothing pushed on stack, so an empty */
/* list is implicitly returned. */
}
Some people may be inclined to include an explicit return in the above XSUB, rather than letting control
fall through to the end. In those situations XSRETURN_EMPTY should be used, instead. This will ensure
that the XSUB stack is properly adjusted. Consult API LISTING in perlguts for other XSRETURN macros.
The REQUIRE: Keyword
The REQUIRE: keyword is used to indicate the minimum version of the xsubpp compiler needed to compile
the XS module. An XS module which contains the following statement will compile only with xsubpp
version 1.922 or greater:
REQUIRE: 1.922
The CLEANUP: Keyword
This keyword can be used when an XSUB requires special cleanup procedures before it terminates. When
the CLEANUP: keyword is used it must follow any CODE:, PPCODE:, or OUTPUT: blocks which are
present in the XSUB. The code specified for the cleanup block will be added as the last statements in the
XSUB.
The BOOT: Keyword
The BOOT: keyword is used to add code to the extension‘s bootstrap function. The bootstrap function is
generated by the xsubpp compiler and normally holds the statements necessary to register any XSUBs with
Perl. With the BOOT: keyword the programmer can tell the compiler to add extra statements to the bootstrap
function.
This keyword may be used any time after the first MODULE keyword and should appear on a line by itself.
The first blank line after the keyword will terminate the code block.
BOOT:
# The following message will be printed when the
# bootstrap function executes.
printf("Hello from the bootstrap!\n");
The VERSIONCHECK: Keyword
The VERSIONCHECK: keyword corresponds to xsubpp‘s −versioncheck and −noversioncheck
options. This keyword overrides the command line options. Version checking is enabled by default. When
version checking is enabled the XS module will attempt to verify that its version matches the version of the
PM module.
To enable version checking:
VERSIONCHECK: ENABLE
To disable version checking:
VERSIONCHECK: DISABLE
The PROTOTYPES: Keyword
The PROTOTYPES: keyword corresponds to xsubpp‘s −prototypes and −noprototypes options.
This keyword overrides the command line options. Prototypes are enabled by default. When prototypes are
enabled XSUBs will be given Perl prototypes. This keyword may be used multiple times in an XS module to
enable and disable prototypes for different parts of the module.
To enable prototypes:
PROTOTYPES: ENABLE
To disable prototypes:
PROTOTYPES: DISABLE
The PROTOTYPE: Keyword
This keyword is similar to the PROTOTYPES: keyword above but can be used to force xsubpp to use a
specific prototype for the XSUB. This keyword overrides all other prototype options and keywords but
affects only the current XSUB. Consult Prototypes for information about Perl prototypes.
bool_t
rpcb_gettime(timep, ...)
time_t timep = NO_INIT
PROTOTYPE: $;$
PREINIT:
char *host = "localhost";
CODE:
if( items > 1 )
host = (char *)SvPV(ST(1), PL_na);
RETVAL = rpcb_gettime( host, &timep );
OUTPUT:
timep
RETVAL
The ALIAS: Keyword
The ALIAS: keyword allows an XSUB to have two or more unique Perl names and to know which of those
names was used when it was invoked. The Perl names may be fully−qualified with package names. Each
alias is given an index. The compiler will set up a variable called ix which contains the index of the alias
which was used. When the XSUB is called with its declared name ix will be 0.
The following example will create aliases FOO::gettime() and BAR::getit() for this function.
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep
ALIAS:
FOO::gettime = 1
BAR::getit = 2
INIT:
printf("# ix = %d\n", ix );
OUTPUT:
timep
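The relationship between the names and the ix value in this example can be modelled as a small lookup table in C. This is only a model of the idea, with hypothetical names (struct alias, aliases, alias_index()); it says nothing about how xsubpp records the index internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy alias table: each Perl-visible name maps to the index that the
 * XSUB would see in `ix`.  Index 0 is the declared name. */
struct alias { const char *name; int ix; };

static const struct alias aliases[] = {
    { "rpcb_gettime", 0 },
    { "FOO::gettime", 1 },
    { "BAR::getit",   2 },
};

/* Return the index for `name`, or -1 if it is not a known alias. */
int alias_index(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
        if (strcmp(aliases[i].name, name) == 0)
            return aliases[i].ix;
    return -1;
}
```

One function body can then branch on the index to behave differently for each name.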
The INTERFACE: Keyword
This keyword declares the current XSUB as a keeper of the given calling signature. If some text follows this
keyword, it is treated as a list of functions which have this signature and which should be attached to XSUBs.
For example, if you have four functions multiply(), divide(), add(), and subtract(), all having the signature
symbolic f(symbolic, symbolic);
you can code them all with a single XSUB:
symbolic
interface_s_ss(arg1, arg2)
symbolic arg1
symbolic arg2
INTERFACE:
multiply divide
add subtract
The advantage of this approach compared to the ALIAS: keyword is that one can attach an extra function
remainder() at runtime by using
CV *mycv = newXSproto("Symbolic::remainder",
XS_Symbolic_interface_s_ss, __FILE__, "$$");
XSINTERFACE_FUNC_SET(mycv, remainder);
(This example supposes that there was no INTERFACE_MACRO: section, otherwise one needs to use
something else instead of XSINTERFACE_FUNC_SET.)
The INTERFACE_MACRO: Keyword
This keyword allows one to define an INTERFACE that uses a different way of extracting a function pointer from
an XSUB. The text which follows this keyword should give the names of the macros which extract and set a
function pointer. The extractor macro is given the return type, the CV*, and XSANY.any_dptr for this CV*. The
setter macro is given the cv and the function pointer.
The default values are XSINTERFACE_FUNC and XSINTERFACE_FUNC_SET. An INTERFACE keyword
with an empty list of functions can be omitted if the INTERFACE_MACRO keyword is used.
Suppose that in the previous example the function pointers for multiply(), divide(), add(), and
subtract() are kept in a global C array fp[] with offsets multiply_off, divide_off,
add_off, and subtract_off. Then one can use
#define XSINTERFACE_FUNC_BYOFFSET(ret,cv,f) \
((XSINTERFACE_CVT(ret,))fp[CvXSUBANY(cv).any_i32])
#define XSINTERFACE_FUNC_BYOFFSET_set(cv,f) \
CvXSUBANY(cv).any_i32 = CAT2( f, _off )
in C section,
symbolic
interface_s_ss(arg1, arg2)
symbolic arg1
symbolic arg2
INTERFACE_MACRO:
XSINTERFACE_FUNC_BYOFFSET
XSINTERFACE_FUNC_BYOFFSET_set
INTERFACE:
multiply divide
add subtract
in XSUB section.
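The by-offset scheme described here can be sketched in plain C, without any Perl macros. The sketch makes several assumptions: symbolic is replaced by a hypothetical typedef for int, the four functions have made-up integer bodies, and the offset is passed as an ordinary argument, whereas the macros above fetch it from the CV.

```c
#include <assert.h>

typedef int symbolic;                 /* hypothetical stand-in type */
typedef symbolic (*binop)(symbolic, symbolic);

symbolic multiply(symbolic a, symbolic b) { return a * b; }
symbolic divide(symbolic a, symbolic b)   { assert(b != 0); return a / b; }
symbolic add(symbolic a, symbolic b)      { return a + b; }
symbolic subtract(symbolic a, symbolic b) { return a - b; }

/* Offsets into fp[], named as in the text above. */
enum { multiply_off, divide_off, add_off, subtract_off, n_ops };

binop fp[n_ops] = { multiply, divide, add, subtract };

/* One shared body serves all four functions: it fetches the function
 * pointer by offset and calls through it. */
symbolic interface_s_ss(int offset, symbolic arg1, symbolic arg2)
{
    assert(offset >= 0 && offset < n_ops);
    return fp[offset](arg1, arg2);
}
```

Adding a fifth function then means adding one table entry and one offset, with no new wrapper body.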
The INCLUDE: Keyword
This keyword can be used to pull other files into the XS module. The other files may have XS code.
INCLUDE: can also be used to run a command to generate the XS code to be pulled into the module.
The file Rpcb1.xsh contains our rpcb_gettime() function:
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep
OUTPUT:
timep
The XS module can use INCLUDE: to pull that file into it.
INCLUDE: Rpcb1.xsh
If the parameters to the INCLUDE: keyword are followed by a pipe (|) then the compiler will interpret the
parameters as a command.
INCLUDE: cat Rpcb1.xsh |
The CASE: Keyword
The CASE: keyword allows an XSUB to have multiple distinct parts with each part acting as a virtual
XSUB. CASE: is greedy and if it is used then all other XS keywords must be contained within a CASE:.
This means nothing may precede the first CASE: in the XSUB and anything following the last CASE: is
included in that case.
A CASE: might switch via a parameter of the XSUB, via the ix ALIAS: variable (see
"The ALIAS: Keyword"), or maybe via the items variable (see "Variable−length Parameter Lists"). The
last CASE: becomes the default case if it is not associated with a conditional. The following example shows
CASE switched via ix with a function rpcb_gettime() having an alias x_gettime(). When the
function is called as rpcb_gettime() its parameters are the usual (char *host, time_t
*timep), but when the function is called as x_gettime() its parameters are reversed, (time_t
*timep, char *host).
long
rpcb_gettime(a,b)
CASE: ix == 1
ALIAS:
x_gettime = 1
INPUT:
# ’a’ is timep, ’b’ is host
char *b
time_t a = NO_INIT
CODE:
RETVAL = rpcb_gettime( b, &a );
OUTPUT:
a
RETVAL
CASE:
# ’a’ is host, ’b’ is timep
char *a
time_t &b = NO_INIT
OUTPUT:
b
RETVAL
That function can be called with either of the following statements. Note the different argument lists.
$status = rpcb_gettime( $host, $timep );
$status = x_gettime( $timep, $host );
The & Unary Operator
The & unary operator is used to tell the compiler that it should pass the address of the object when it calls the C
function. This is used when a CODE: block is not used and the object is not a pointer type (the object is an
int or long but not an int* or long*).
The following XSUB will generate incorrect C code. The xsubpp compiler will turn this into code which
calls rpcb_gettime() with parameters (char *host, time_t timep), but the real
rpcb_gettime() wants the timep parameter to be of type time_t* rather than time_t.
bool_t
rpcb_gettime(host,timep)
char *host
time_t timep
OUTPUT:
timep
That problem is corrected by using the & operator. The xsubpp compiler will now turn this into code which
calls rpcb_gettime() correctly with parameters (char *host, time_t *timep). It does this by
carrying the & through, so the function call looks like rpcb_gettime(host, &timep).
bool_t
rpcb_gettime(host,timep)
char *host
time_t &timep
OUTPUT:
timep
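What carrying the & through means can be shown in plain C. This is a sketch under stated assumptions: fake_rpcb_gettime() is a hypothetical stand-in for the real library function that reports a fixed time, and call_with_address() is an illustrative name for the shape of the generated call, not actual xsubpp output.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the real rpcb_gettime(): it reports a fixed
 * time so that the example needs no network access. */
int fake_rpcb_gettime(const char *host, time_t *timep)
{
    assert(timep != NULL);
    if (host == NULL)
        return 0;                 /* failure */
    *timep = (time_t)12345;
    return 1;                     /* success */
}

/* Sketch of the call made when the parameter is declared `time_t &timep`:
 * the variable is a plain time_t and its address is passed to C. */
int call_with_address(const char *host, time_t *result)
{
    time_t timep = 0;             /* declared as time_t, not time_t * */
    int status = fake_rpcb_gettime(host, &timep);   /* & carried through */

    assert(result != NULL);
    *result = timep;              /* the OUTPUT: step copies the value back */
    return status;
}
```

The variable keeps the simple type that the typemap knows how to convert, while the C function still receives the pointer it expects.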
Inserting Comments and C Preprocessor Directives
C preprocessor directives are allowed within BOOT:, PREINIT:, INIT:, CODE:, PPCODE:, and CLEANUP:
blocks, as well as outside the functions. Comments are allowed anywhere after the MODULE keyword. The
compiler will pass the preprocessor directives through untouched and will remove the commented lines.
Comments can be added to XSUBs by placing a # as the first non−whitespace of a line. Care should be
taken to avoid making the comment look like a C preprocessor directive, lest it be interpreted as such. The
simplest way to prevent this is to put whitespace in front of the #.
If you use preprocessor directives to choose one of two versions of a function, use
#if ... version1
#else /* ... version2 */
#endif
and not
#if ... version1
#endif
#if ... version2
#endif
because otherwise xsubpp will believe that you made a duplicate definition of the function. Also, put a blank
line before the #else/#endif so it will not be seen as part of the function body.
Using XS With C++
If a function is defined as a C++ method then it will assume its first argument is an object pointer. The
object pointer will be stored in a variable called THIS. The object should have been created by C++ with the
new() function and should be blessed by Perl with the sv_setref_pv() macro. The blessing of the
object by Perl can be handled by a typemap. An example typemap is shown at the end of this section.
If the method is defined as static it will call the C++ function using the class::method() syntax. If the
method is not static the function will be called using the THIS−>method() syntax.
The next examples will use the following C++ class.
class color {
public:
color();
~color();
int blue();
void set_blue( int );
private:
int c_blue;
};
The XSUBs for the blue() and set_blue() methods are defined with the class name but the parameter
for the object (THIS, or "self") is implicit and is not listed.
int
color::blue()
void
color::set_blue( val )
int val
Both functions will expect an object as the first parameter. The xsubpp compiler will call that object THIS
and will use it to call the specified method. So in the C++ code the blue() and set_blue() methods
will be called in the following manner.
RETVAL = THIS−>blue();
THIS−>set_blue( val );
If the function‘s name is DESTROY then the C++ delete function will be called and THIS will be given
as its parameter.
void
color::DESTROY()
The C++ code will call delete.
delete THIS;
If the function‘s name is new then the C++ new function will be called to create a dynamic C++ object. The
XSUB will expect the class name, which will be kept in a variable called CLASS, to be given as the first
argument.
color *
color::new()
The C++ code will call new.
RETVAL = new color();
The following is an example of a typemap that could be used for this C++ example.
TYPEMAP
color *	O_OBJECT
OUTPUT
# The Perl object is blessed into ’CLASS’, which should be a
# char* having the name of the package for the blessing.
O_OBJECT
sv_setref_pv( $arg, CLASS, (void*)$var );
INPUT
O_OBJECT
if( sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG) )
$var = ($type)SvIV((SV*)SvRV( $arg ));
else{
warn( \"${Package}::$func_name() −− $var is not a blessed SV referenc
XSRETURN_UNDEF;
}
Interface Strategy
When designing an interface between Perl and a C library a straight translation from C to XS is often
sufficient. The interface will often be very C−like and occasionally nonintuitive, especially when the C
function modifies one of its parameters. In cases where the programmer wishes to create a more Perl−like
interface the following strategy may help to identify the more critical parts of the interface.
Identify the C functions which modify their parameters. The XSUBs for these functions may be able to
return lists to Perl, or may be candidates to return undef or an empty list in case of failure.
Identify which values are used by only the C and XSUB functions themselves. If Perl does not need to
access the contents of the value then it may not be necessary to provide a translation for that value from C to
Perl.
Identify the pointers in the C function parameter lists and return values. Some pointers can be handled in XS
with the & unary operator on the variable name while others will require the use of the * operator on the type
name. In general it is easier to work with the & operator.
Identify the structures used by the C functions. In many cases it may be helpful to use the T_PTROBJ
typemap for these structures so they can be manipulated by Perl as blessed objects.
Perl Objects And C Structures
When dealing with C structures one should select either T_PTROBJ or T_PTRREF for the XS type. Both
types are designed to handle pointers to complex objects. The T_PTRREF type will allow the Perl object to
be unblessed while the T_PTROBJ type requires that the object be blessed. By using T_PTROBJ one can
achieve a form of type−checking because the XSUB will attempt to verify that the Perl object is of the
expected type.
The following XS code shows the getnetconfigent() function which is used with ONC+ TIRPC. The
getnetconfigent() function will return a pointer to a C structure and has the C prototype shown
below. The example will demonstrate how the C pointer will become a Perl reference. Perl will consider
this reference to be a pointer to a blessed object and will attempt to call a destructor for the object. A
destructor will be provided in the XS source to free the memory used by getnetconfigent().
Destructors in XS can be created by specifying an XSUB function whose name ends with the word
DESTROY. XS destructors can be used to free memory which may have been malloc‘d by another XSUB.
struct netconfig *getnetconfigent(const char *netid);
A typedef will be created for struct netconfig. The Perl object will be blessed in a class matching
the name of the C type, with the tag Ptr appended, and the name should not have embedded spaces if it will
be a Perl package name. The destructor will be placed in a class corresponding to the class of the object and
the PREFIX keyword will be used to trim the name to the word DESTROY as Perl will expect.
typedef struct netconfig Netconfig;
MODULE = RPC  PACKAGE = RPC
Netconfig *
getnetconfigent(netid)
char *netid
MODULE = RPC  PACKAGE = NetconfigPtr  PREFIX = rpcb_
void
rpcb_DESTROY(netconf)
Netconfig *netconf
CODE:
printf("Now in NetconfigPtr::DESTROY\n");
free( netconf );
This example requires the following typemap entry. Consult the typemap section for more information about
adding new typemaps for an extension.
TYPEMAP
Netconfig *	T_PTROBJ
This example will be used with the following Perl statements.
use RPC;
$netconf = getnetconfigent("udp");
When Perl destroys the object referenced by $netconf it will send the object to the supplied XSUB
DESTROY function. Perl cannot determine, and does not care, that this object is a C struct and not a Perl
object. In this sense, there is no difference between the object created by the getnetconfigent()
XSUB and an object created by a normal Perl subroutine.
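The way a C pointer can travel through a Perl scalar and back can be sketched in plain C. This is a minimal illustration, assuming that the pointer is stored as an integer value and cast back before the C call. The names struct netconfig_like, pointer_to_integer(), integer_to_pointer(), and round_trip() are hypothetical, and intptr_t stands in for Perl's own integer type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct netconfig_like { int id; };      /* hypothetical stand-in struct */

/* Pack a pointer into an integer, as a pointer-object typemap does when
 * it stores the C pointer inside a Perl scalar. */
intptr_t pointer_to_integer(struct netconfig_like *p)
{
    return (intptr_t)(void *)p;
}

/* Recover the pointer from the stored integer before calling into C. */
struct netconfig_like *integer_to_pointer(intptr_t iv)
{
    return (struct netconfig_like *)(void *)iv;
}

/* Round trip: allocate, pack, unpack, read a field, then free.  The
 * final free() is the job a DESTROY XSUB performs for the real struct.
 * Returns the field value, or -1 if the allocation failed. */
int round_trip(int id)
{
    struct netconfig_like *p = malloc(sizeof *p);
    struct netconfig_like *q;
    int seen;

    if (p == NULL)
        return -1;
    p->id = id;
    q = integer_to_pointer(pointer_to_integer(p));
    assert(q == p);               /* the same pointer must come back */
    seen = q->id;
    free(q);
    return seen;
}
```

Because Perl only ever holds the integer, nothing but the destructor knows how to release the memory, which is why the DESTROY XSUB matters.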
The Typemap
The typemap is a collection of code fragments which are used by the xsubpp compiler to map C function
parameters and values to Perl values. The typemap file may consist of three sections labeled TYPEMAP,
INPUT, and OUTPUT. The INPUT section tells the compiler how to translate Perl values into variables of
certain C types. The OUTPUT section tells the compiler how to translate the values from certain C types
into values Perl can understand. The TYPEMAP section tells the compiler which of the INPUT and
OUTPUT code fragments should be used to map a given C type to a Perl value. Each of the sections of the
typemap must be preceded by one of the TYPEMAP, INPUT, or OUTPUT keywords.
The default typemap in the ext directory of the Perl source contains many useful types which can be used
by Perl extensions. Some extensions define additional typemaps which they keep in their own directory.
These additional typemaps may reference INPUT and OUTPUT maps in the main typemap. The xsubpp
compiler will allow the extension‘s own typemap to override any mappings which are in the default
typemap.
Most extensions which require a custom typemap will need only the TYPEMAP section of the typemap file.
The custom typemap used in the getnetconfigent() example shown earlier demonstrates what may be
the typical use of extension typemaps. That typemap is used to equate a C structure with the T_PTROBJ
typemap. The typemap used by getnetconfigent() is shown here. Note that the C type is separated
from the XS type with a tab and that the C unary operator * is considered to be a part of the C type name.
TYPEMAP
Netconfig *	T_PTROBJ
Here‘s a more complicated example: suppose that you wanted struct netconfig to be blessed into the
class Net::Config. One way to do this is to use underscores (_) to separate package names, as follows:
typedef struct netconfig * Net_Config;
And then provide a typemap entry T_PTROBJ_SPECIAL that maps underscores to double−colons (::), and
declare Net_Config to be of that type:
TYPEMAP
Net_Config	T_PTROBJ_SPECIAL
INPUT
T_PTROBJ_SPECIAL
if (sv_derived_from($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\")) {
IV tmp = SvIV((SV*)SvRV($arg));
$var = ($type) tmp;
}
else
croak(\"$var is not of type ${(my $ntt=$ntype)=~s/_/::/g;\$nt
OUTPUT
T_PTROBJ_SPECIAL
sv_setref_pv($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\",
(void*)$var);
The INPUT and OUTPUT sections substitute double−colons for underscores on the fly, giving the desired
effect. This example demonstrates some of the power and versatility of the typemap facility.
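The name mapping performed by the s/_/::/g substitution in this typemap can be written out in plain C for comparison. This is a sketch only: type_to_package() is a hypothetical helper, and the typemap itself performs the substitution in Perl when xsubpp expands the entry, not in C at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Copy `ntype` into `out`, writing "::" for every '_'.
 * Returns 0 on success, -1 if `out` (of size `outsz`) is too small. */
int type_to_package(const char *ntype, char *out, size_t outsz)
{
    size_t n = 0;

    assert(ntype != NULL && out != NULL);
    for (; *ntype != '\0'; ntype++) {
        size_t need = (*ntype == '_') ? 2 : 1;

        if (n + need >= outsz)        /* keep room for the terminator */
            return -1;
        if (*ntype == '_') {
            out[n++] = ':';
            out[n++] = ':';
        } else {
            out[n++] = *ntype;
        }
    }
    if (n >= outsz)
        return -1;
    out[n] = '\0';
    return 0;
}
```

Given the type name Net_Config, the helper writes the package name Net::Config, and it reports an error instead of overflowing a buffer that is too small.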
EXAMPLES
File RPC.xs: Interface to some ONC+ RPC bind library functions.
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include
typedef struct netconfig Netconfig;
MODULE = RPC  PACKAGE = RPC
SV *
rpcb_gettime(host="localhost")
char *host
PREINIT:
time_t timep;
CODE:
ST(0) = sv_newmortal();
if( rpcb_gettime( host, &timep ) )
sv_setnv( ST(0), (double)timep );
Netconfig *
getnetconfigent(netid="udp")
char *netid
MODULE = RPC  PACKAGE = NetconfigPtr  PREFIX = rpcb_
void
rpcb_DESTROY(netconf)
Netconfig *netconf
CODE:
printf("NetconfigPtr::DESTROY\n");
free( netconf );
File typemap: Custom typemap for RPC.xs.
TYPEMAP
Netconfig *	T_PTROBJ
File RPC.pm: Perl module for the RPC extension.
package RPC;
require Exporter;
require DynaLoader;
@ISA = qw(Exporter DynaLoader);
@EXPORT = qw(rpcb_gettime getnetconfigent);
bootstrap RPC;
1;
File rpctest.pl: Perl test program for the RPC extension.
use RPC;
$netconf = getnetconfigent();
$a = rpcb_gettime();
print "time = $a\n";
print "netconf = $netconf\n";
$netconf = getnetconfigent("tcp");
$a = rpcb_gettime("poplar");
print "time = $a\n";
print "netconf = $netconf\n";
XS VERSION
This document covers features supported by xsubpp 1.935.
AUTHOR
Dean Roehrich
’Mytest’,
’VERSION_FROM’ => ’Mytest.pm’, # finds $VERSION
’LIBS’ => [’’], # e.g., ’−lm’
’DEFINE’ => ’’, # e.g., ’−DHAVE_SOMETHING’
’INC’ => ’’, # e.g., ’−I/usr/include/other’
);
The file Mytest.pm should start with something like this:
package Mytest;
require Exporter;
require DynaLoader;
@ISA = qw(Exporter DynaLoader);
# Items to export into callers namespace by default. Note: do not export
# names by default without a very good reason. Use EXPORT_OK instead.
# Do not simply export all your public functions/methods/constants.
@EXPORT = qw(
);
$VERSION = ’0.01’;
bootstrap Mytest $VERSION;
# Preloaded methods go here.
# Autoload methods go after __END__, and are processed by the autosplit progr
1;
__END__
# Below is the stub of documentation for your module. You better edit it!
And the Mytest.xs file should look something like this:
#ifdef __cplusplus
extern "C" {
#endif
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#ifdef __cplusplus
}
#endif
PROTOTYPES: DISABLE
MODULE = Mytest  PACKAGE = Mytest
Let‘s edit the .xs file by adding this to the end of the file:
void
hello()
CODE:
printf("Hello, world!\n");
Now we‘ll run "perl Makefile.PL". This will create a real Makefile, which make needs. Its output looks
something like:
% perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for Mytest
%
Now, running make will produce output that looks something like this (some long lines shortened for
clarity):
% make
umask 0 && cp Mytest.pm ./blib/Mytest.pm
perl xsubpp −typemap typemap Mytest.xs >Mytest.tc && mv Mytest.tc Mytest.c
cc −c Mytest.c
Running Mkbootstrap for Mytest ()
chmod 644 Mytest.bs
LD_RUN_PATH="" ld −o ./blib/PA−RISC1.1/auto/Mytest/Mytest.sl −b Mytest.o
chmod 755 ./blib/PA−RISC1.1/auto/Mytest/Mytest.sl
cp Mytest.bs ./blib/PA−RISC1.1/auto/Mytest/Mytest.bs
chmod 644 ./blib/PA−RISC1.1/auto/Mytest/Mytest.bs
Now, although there is already a test.pl template ready for us, for this example only, we‘ll create a special
test script. Create a file called hello that looks like this:
#! /opt/perl5/bin/perl
use ExtUtils::testlib;
use Mytest;
Mytest::hello();
Now we run the script and we should see the following output:
% perl hello
Hello, world!
%
EXAMPLE 2
Now let‘s add to our extension a subroutine that will take a single argument and return 1 if the argument is
even, 0 if the argument is odd.
Add the following to the end of Mytest.xs:
int
is_even(input)
int input
CODE:
RETVAL = (input % 2 == 0);
OUTPUT:
RETVAL
There does not need to be white space at the start of the "int input" line, but it is useful for improving
readability. The semi−colon at the end of that line is also optional.
Any amount of white space may appear between the "int" and "input". It is also okay for the four lines starting at the
"CODE:" line to not be indented. However, for readability purposes, it is suggested that you indent them 8
spaces (or one normal tab stop).
Now rerun make to rebuild our new shared library.
Now perform the same steps as before, generating a Makefile from the Makefile.PL file, and running make.
To test that our extension works, we now need to look at the file test.pl. This file is set up to imitate the
same kind of testing structure that Perl itself has. Within the test script, you perform a number of tests to
confirm the behavior of the extension, printing "ok" when the test is correct, "not ok" when it is not. Change
the print statement in the BEGIN block to print "1..4", and add the following code to the end of the file:
print &Mytest::is_even(0) == 1 ? "ok 2" : "not ok 2", "\n";
print &Mytest::is_even(1) == 0 ? "ok 3" : "not ok 3", "\n";
print &Mytest::is_even(2) == 1 ? "ok 4" : "not ok 4", "\n";
We will be calling the test script through the command "make test". You should see output that looks
something like this:
% make test
PERL_DL_NONLAZY=1 /opt/perl5.002b2/bin/perl (lots of −I arguments) test.pl
1..4
ok 1
ok 2
ok 3
ok 4
%
WHAT HAS GONE ON?
The program h2xs is the starting point for creating extensions. In later examples we‘ll see how we can use
h2xs to read header files and generate templates to connect to C routines.
h2xs creates a number of files in the extension directory. The file Makefile.PL is a perl script which will
generate a true Makefile to build the extension. We‘ll take a closer look at it later.
The .pm and .xs files contain the meat of the extension. The .xs file holds the C
routines that make up the extension. The .pm file contains routines that tell Perl how to load your extension.
Generating and invoking the Makefile created a directory blib (which stands for "build library") in the
current working directory. This directory will contain the shared library that we will build. Once we have
tested it, we can install it into its final location.
Invoking the test script via "make test" did something very important. It invoked perl with all those −I
arguments so that it could find the various files that are part of the extension.
It is very important that you use "make test" while you are still testing extensions. If you try to run the
test script all by itself, you will get a fatal error.
Another reason it is important to use "make test" to run your test script is that if you are testing an upgrade to
an already-existing version, using "make test" ensures that you use your new extension, not the
already-existing version.
When Perl sees a use extension;, it searches for a file with the same name as the use‘d extension that
has a .pm suffix. If that file cannot be found, Perl dies with a fatal error. The default search path is
contained in the @INC array.
In our case, Mytest.pm tells perl that it will need the Exporter and Dynamic Loader extensions. It then sets
the @ISA and @EXPORT arrays and the $VERSION scalar; finally it tells perl to bootstrap the module.
Perl will call its dynamic loader routine (if there is one) and load the shared library.
The two arrays that are set in the .pm file are very important. The @ISA array contains a list of other
packages in which to search for methods (or subroutines) that do not exist in the current package. The
@EXPORT array tells Perl which of the extension‘s routines should be placed into the calling package‘s
namespace.
It‘s important to select what to export carefully. Do NOT export method names and do NOT export anything
else by default without a good reason.
As a general rule, if the module is trying to be object−oriented then don‘t export anything. If it‘s just a
collection of functions then you can export any of the functions via another array, called @EXPORT_OK.
See perlmod for more information.
The $VERSION variable is used to ensure that the .pm file and the shared library are "in sync" with each
other. Any time you make changes to the .pm or .xs files, you should increment the value of this variable.
WRITING GOOD TEST SCRIPTS
The importance of writing good test scripts cannot be overemphasized. You should closely follow the
"ok/not ok" style that Perl itself uses, so that it is very easy and unambiguous to determine the outcome of
each test case. When you find and fix a bug, make sure you add a test case for it.
By running "make test", you ensure that your test.pl script runs and uses the correct version of your
extension. If you have many test cases, you might want to copy Perl‘s test style. Create a directory named
"t", and ensure all your test files end with the suffix ".t". The Makefile will properly run all these test files.
EXAMPLE 3
Our third extension will take one argument as its input, round off that value, and set the argument to the
rounded value.
Add the following to the end of Mytest.xs:
void
round(arg)
        double  arg
        CODE:
        if (arg > 0.0) {
                arg = floor(arg + 0.5);
        } else if (arg < 0.0) {
                arg = ceil(arg - 0.5);
        } else {
                arg = 0.0;
        }
        OUTPUT:
        arg
Edit the Makefile.PL file so that the corresponding line looks like this:
'LIBS'      => ['-lm'],   # e.g., '-lm'
Generate the Makefile and run make. Change the BEGIN block to print out "1..9" and add the following to
test.pl:
$i = -1.5; &Mytest::round($i); print $i == -2.0 ? "ok 5" : "not ok 5", "\n";
$i = -1.1; &Mytest::round($i); print $i == -1.0 ? "ok 6" : "not ok 6", "\n";
$i = 0.0; &Mytest::round($i); print $i == 0.0 ? "ok 7" : "not ok 7", "\n";
$i = 0.5; &Mytest::round($i); print $i == 1.0 ? "ok 8" : "not ok 8", "\n";
$i = 1.2; &Mytest::round($i); print $i == 1.0 ? "ok 9" : "not ok 9", "\n";
Running "make test" should now print out that all nine tests are okay.
You might be wondering if you can round a constant. To see what happens, add the following line to test.pl
temporarily:
&Mytest::round(3);
Run "make test" and notice that Perl dies with a fatal error. Perl won‘t let you change the value of constants!
WHAT‘S NEW HERE?
Two things are new here. First, we‘ve made some changes to Makefile.PL. In this case, we‘ve specified an
extra library to link in, the math library libm. We‘ll talk later about how to write XSUBs that can call every
routine in a library.
Second, the value of the function is being passed back not as the function‘s return value, but through the
same variable that was passed into the function.
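The same idea, returning the result through the variable that was passed in, can be sketched in plain C with a pointer parameter. This is an illustration only: the name round_in_place is invented here, and a cast to long stands in for floor() and ceil() so that the sketch needs no math library, which assumes the value fits in a long.

```c
/* Round half away from zero, writing the result back through the
 * pointer, much as the XSUB in Example 3 writes back to its argument.
 * Assumption: truncation via a cast to long replaces floor()/ceil(),
 * so the value must fit in a long. */
void
round_in_place(double *arg)
{
    if (*arg > 0.0) {
        *arg = (double)(long)(*arg + 0.5);  /* equals floor(*arg + 0.5) here */
    } else if (*arg < 0.0) {
        *arg = (double)(long)(*arg - 0.5);  /* equals ceil(*arg - 0.5) here */
    } else {
        *arg = 0.0;
    }
}
```

Because the function takes an address, a caller cannot hand it a literal constant, which mirrors the fatal error Perl raises for &Mytest::round(3).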
INPUT AND OUTPUT PARAMETERS
You specify the parameters that will be passed into the XSUB just after you declare the function return value
and name. Each parameter line starts with optional white space, and may have an optional terminating
semicolon.
The list of output parameters occurs after the OUTPUT: directive. The use of RETVAL tells Perl that you
wish to send this value back as the return value of the XSUB function. In Example 3, the value we wanted
returned was contained in the same variable we passed in, so we listed it (and not RETVAL) in the
OUTPUT: section.
THE XSUBPP COMPILER
The compiler xsubpp takes the XS code in the .xs file and converts it into C code, placing it in a file whose
suffix is .c. The C code created makes heavy use of the C functions within Perl.
THE TYPEMAP FILE
The xsubpp compiler uses rules to convert from Perl‘s data types (scalar, array, etc.) to C‘s data types (int,
char *, etc.). These rules are stored in the typemap file ($PERLLIB/ExtUtils/typemap). This file is
split into three parts.
The first part attempts to map various C data types to a coded flag, which has some correspondence with the
various Perl types. The second part contains C code which xsubpp uses for input parameters. The third part
contains C code which xsubpp uses for output parameters. We‘ll talk more about the C code later.
Let‘s now take a look at a portion of the .c file created for our extension.
XS(XS_Mytest_round)
{
    dXSARGS;
    if (items != 1)
        croak("Usage: Mytest::round(arg)");
    {
        double  arg = (double)SvNV(ST(0));      /* XXXXX */
        if (arg > 0.0) {
                arg = floor(arg + 0.5);
        } else if (arg < 0.0) {
                arg = ceil(arg - 0.5);
        } else {
                arg = 0.0;
        }
        sv_setnv(ST(0), (double)arg);           /* XXXXX */
    }
    XSRETURN(1);
}
Notice the two lines marked with "XXXXX". If you check the first section of the typemap file, you'll see
that doubles are of type T_DOUBLE. In the INPUT section, an argument that is T_DOUBLE is converted by
calling the routine SvNV on it, casting the result to double, and assigning that to the variable arg.
Similarly, in the OUTPUT section, once arg has its final value, it is passed to the sv_setnv
function to be passed back to the calling subroutine. These two functions are explained in perlguts; we'll
talk more later about what that "ST(0)" means in the section on the argument stack.
WARNING
In general, it's not a good idea to write extensions that modify their input parameters, as in Example 3.
However, this behavior is tolerated in order to better accommodate calls to pre-existing C routines, which
often do modify their input parameters. The next example will show how to do this.
EXAMPLE 4
In this example, we‘ll now begin to write XSUBs that will interact with predefined C libraries. To begin
with, we will build a small library of our own, then let h2xs write our .pm and .xs files for us.
Create a new directory called Mytest2 at the same level as the directory Mytest. In the Mytest2 directory,
create another directory called mylib, and cd into that directory.
Here we‘ll create some files that will generate a test library. These will include a C source file and a header
file. We‘ll also create a Makefile.PL in this directory. Then we‘ll make sure that running make at the
Mytest2 level will automatically run this Makefile.PL file and the resulting Makefile.
In the mylib directory, create a file mylib.h that looks like this:
#define TESTVAL 4
extern double   foo(int, long, const char*);
Also create a file mylib.c that looks like this:
#include <stdlib.h>
#include "./mylib.h"
double
foo(a, b, c)
int             a;
long            b;
const char *    c;
{
        return (a + b + atof(c) + TESTVAL);
}
And finally create a file Makefile.PL that looks like this:
use ExtUtils::MakeMaker;
$Verbose = 1;
WriteMakefile(
    NAME      => 'Mytest2::mylib',
    SKIP      => [qw(all static static_lib dynamic dynamic_lib)],
    clean     => {'FILES' => 'libmylib$(LIB_EXT)'},
);
sub MY::top_targets {
'
all :: static
static ::       libmylib$(LIB_EXT)
libmylib$(LIB_EXT): $(O_FILES)
	$(AR) cr libmylib$(LIB_EXT) $(O_FILES)
	$(RANLIB) libmylib$(LIB_EXT)
';
}
We will now create the main top−level Mytest2 files. Change to the directory above Mytest2 and run the
following command:
% h2xs -O -n Mytest2 ./Mytest2/mylib/mylib.h
This will print out a warning about overwriting Mytest2, but that‘s okay. Our files are stored in
Mytest2/mylib, and will be untouched.
The normal Makefile.PL that h2xs generates doesn‘t know about the mylib directory. We need to tell it that
there is a subdirectory and that we will be generating a library in it. Let‘s add the following key−value pair
to the WriteMakefile call:
'MYEXTLIB' => 'mylib/libmylib$(LIB_EXT)',
and a new replacement subroutine too:
sub MY::postamble {
'
$(MYEXTLIB): mylib/Makefile
	cd mylib && $(MAKE) $(PASTHRU)
';
}
(Note: Most makes will require that there be a tab character that indents the line cd mylib && $(MAKE)
$(PASTHRU), similarly for the Makefile in the subdirectory.)
Let‘s also fix the MANIFEST file so that it accurately reflects the contents of our extension. The single line
that says "mylib" should be replaced by the following three lines:
mylib/Makefile.PL
mylib/mylib.c
mylib/mylib.h
To keep our namespace nice and unpolluted, edit the .pm file and change the lines setting @EXPORT to
@EXPORT_OK (there are two: one in the line beginning "use vars" and one setting the array itself).
Finally, in the .xs file, edit the #include line to read:
#include "mylib/mylib.h"
And also add the following function definition to the end of the .xs file:
double
foo(a,b,c)
        int             a
        long            b
        const char *    c
        OUTPUT:
        RETVAL
Now we also need to create a typemap file because the default Perl doesn‘t currently support the const char *
type. Create a file called typemap and place the following in it:
const char *    T_PV
Now run perl on the top−level Makefile.PL. Notice that it also created a Makefile in the mylib directory.
Run make and see that it does cd into the mylib directory and run make in there as well.
Now edit the test.pl script and change the BEGIN block to print "1..4", and add the following lines to the end
of the script:
print &Mytest2::foo(1, 2, "Hello, world!") == 7 ? "ok 2\n" : "not ok 2\n";
print &Mytest2::foo(1, 2, "0.0") == 7 ? "ok 3\n" : "not ok 3\n";
print abs(&Mytest2::foo(0, 0, "-3.4") - 0.6) <= 0.01 ? "ok 4\n" : "not ok 4\n";
(When dealing with floating-point comparisons, it is often useful not to check for equality, but rather to
check that the difference is below a certain epsilon factor, 0.01 in this case.)
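To see why these three tests expect the values they do, here is a standalone sketch of the library routine with an ANSI-style prototype, plus a small tolerance comparison. The helper close_enough is hypothetical and appears only in this sketch. The first two tests both give 7 because atof returns 0 when the string does not start with a number ("Hello, world!") and when it is "0.0", so the result is 1 + 2 + 0 + TESTVAL.

```c
#include <stdlib.h>   /* atof */

#define TESTVAL 4

/* ANSI-style restatement of foo() from mylib.c. */
double
foo(int a, long b, const char *c)
{
    return (a + b + atof(c) + TESTVAL);
}

/* Hypothetical helper: compare two doubles within a tolerance, in the
 * way test 4 does with an epsilon of 0.01. */
int
close_enough(double got, double want, double epsilon)
{
    double diff = got - want;
    if (diff < 0.0)
        diff = -diff;
    return diff <= epsilon;
}
```

The third test computes 0 + 0 + (-3.4) + 4, which is only approximately 0.6 in binary floating point; that is the reason for comparing within a tolerance rather than for equality.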
Run "make test" and all should be well.
WHAT HAS HAPPENED HERE?
Unlike previous examples, we‘ve now run h2xs on a real include file. This has caused some extra goodies to
appear in both the .pm and .xs files.
In the .xs file, there‘s now a #include declaration with the full path to the mylib.h header file.
There‘s now some new C code that‘s been added to the .xs file. The purpose of the constant
routine is to make the values that are #define‘d in the header file available to the Perl script (in this
case, by calling &main::TESTVAL). There‘s also some XS code to allow calls to the constant
routine.
The .pm file has exported the name TESTVAL in the @EXPORT array. This could lead to name
clashes. A good rule of thumb is that if a #define is going to be used only by the C routines
themselves, and not by the user, it should be removed from the @EXPORT array. Alternatively, if
you don't mind using the "fully qualified name" of a variable, you could remove most or all of the
items in the @EXPORT array.
If our include file contained #include directives, these would not be processed at all by h2xs. There is
no good solution to this right now.
We've also told Perl about the library that we built in the mylib subdirectory. That required the addition of
only the MYEXTLIB variable to the WriteMakefile call and the replacement of the postamble subroutine to
cd into the subdirectory and run make. The Makefile.PL for the library is a bit more complicated, but not
excessively so. Again we replaced a subroutine, this time top_targets, to insert our own code. This code
specified simply that the library to be created here was a static archive (as opposed to a dynamically
loadable library) and provided the commands to build it.
SPECIFYING ARGUMENTS TO XSUBPP
With the completion of Example 4, we now have an easy way to simulate some real−life libraries whose
interfaces may not be the cleanest in the world. We shall now continue with a discussion of the arguments
passed to the xsubpp compiler.
When you specify arguments in the .xs file, you are really passing three pieces of information for each one
listed. The first piece is the order of that argument relative to the others (first, second, etc). The second is
the type of argument, and consists of the type declaration of the argument (e.g., int, char*, etc). The third
piece is the exact way in which the argument should be used in the call to the library function from this
XSUB: whether or not to place a "&" before the argument, which means that the function expects to be
passed the address of the specified data type.
There is a difference between the two arguments in this hypothetical function:
int
foo(a,b)
        char    &a
        char *  b
The first argument to this function would be treated as a char and assigned to the variable a, and its address
would be passed into the function foo. The second argument would be treated as a string pointer and
assigned to the variable b. The value of b would be passed into the function foo. The actual call to the
function foo that xsubpp generates would look like this:
foo(&a, b);
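As a rough illustration of what that call means on the C side, the sketch below uses a hypothetical library routine that wants the address of a char and a string pointer. Both function names (lib_first_char and wrapper) are invented for this example and are not generated by xsubpp.

```c
/* Hypothetical library routine: wants the ADDRESS of a char and a string.
 * It stores the first character of b into *a and returns the length of b. */
int
lib_first_char(char *a, char *b)
{
    int len = 0;
    *a = b[0];
    while (b[len] != '\0')
        len++;
    return len;
}

/* Conceptually what a wrapper does for "char &a" and "char * b":
 * a is held as a plain char and its address is passed; b is passed as is. */
int
wrapper(char a, char *b, char *a_out)
{
    int result = lib_first_char(&a, b);
    *a_out = a;     /* expose the modified local copy to the caller */
    return result;
}
```

The point of the sketch is the call lib_first_char(&a, b): the variable is declared with the plain type, and the "&" only affects how it is handed to the library function.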
Xsubpp will identically parse the following function argument lists:
char    &a
char&a
char    & a
However, to help ease understanding, it is suggested that you place a "&" next to the variable name (and away
from the variable type), and place a "*" near the variable type, but away from the variable name (as in the
complete example above). By doing so, it is easy to understand exactly what will be passed to the C function:
it will be whatever is in the "last column".
You should take great pains to try to pass the function the type of variable it wants, when possible. It will
save you a lot of trouble in the long run.
THE ARGUMENT STACK
If you look at the C code generated by any of the examples except Example 1, you will notice a number
of references to ST(n), where n is usually 0. "ST" is actually a macro that refers to the n'th argument on
the argument stack. ST(0) is thus the first argument passed to the XSUB, ST(1) is the second argument, and
so on.
When you list the arguments to the XSUB in the .xs file, that tells xsubpp which argument corresponds to
which position on the argument stack (i.e., the first one listed is the first argument, and so on). You invite
disaster if you do not list them in the same order as the function expects them.
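The indexing idea behind ST(n) can be sketched with a toy model. Everything here is an assumption made for illustration: the entries are plain doubles rather than SV pointers, "ax" is taken to be the index of the first argument, and TOY_ST is not Perl's actual macro.

```c
/* Toy model of the argument stack. Assumptions: stack entries are plain
 * doubles and "ax" is the index of the first argument; Perl's real stack
 * holds SV pointers and its macros are defined in the Perl headers. */
#define TOY_ST(n) (stack_base[ax + (n)])

/* Sum the "items" arguments of a pretend XSUB call. */
double
toy_sum_args(const double *stack_base, int ax, int items)
{
    double total = 0.0;
    int n;
    for (n = 0; n < items; n++)
        total += TOY_ST(n);     /* TOY_ST(0) is the first argument */
    return total;
}
```

In this model, a stack holding 99.0, 1.5 and 2.5 with ax set to 1 and items set to 2 treats 1.5 as the first argument and 2.5 as the second, regardless of what sits below them.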
EXTENDING YOUR EXTENSION
Sometimes you might want to provide some extra methods or subroutines to assist in making the interface
between Perl and your extension simpler or easier to understand. These routines should live in the .pm file.
Whether they are automatically loaded when the extension itself is loaded or loaded only when called
depends on where in the .pm file the subroutine definition is placed.
DOCUMENTING YOUR EXTENSION
There is absolutely no excuse for not documenting your extension. Documentation belongs in the .pm file.
This file will be fed to pod2man, and the embedded documentation will be converted to the manpage format,
then placed in the blib directory. It will be copied to Perl‘s man page directory when the extension is
installed.
You may intersperse documentation and Perl code within the .pm file. In fact, if you want to use method
autoloading, you must do this, as the comment inside the .pm file explains.
See perlpod for more information about the pod format.
INSTALLING YOUR EXTENSION
Once your extension is complete and passes all its tests, installing it is quite simple: you simply run "make
install". You will either need to have write permission into the directories where Perl is installed, or ask your
system administrator to run the make for you.
SEE ALSO
For more information, consult perlguts, perlxs, perlmod, and perlpod.
Author
Jeff Okamoto
used only once: possible typo" warning.
GV_ADDWARN Issues the warning "Had to create <varname> unexpectedly" if
the variable did not exist before the function was called.
If you do not specify a package name, the variable is created in the current package.
Reference Counts and Mortality
Perl uses a reference-count-driven garbage collection mechanism. SVs, AVs, or HVs (xV for short in the
following) start their life with a reference count of 1. If the reference count of an xV ever drops to 0, then it
will be destroyed and its memory made available for reuse.
This normally doesn‘t happen at the Perl level unless a variable is undef‘ed or the last variable holding a
reference to it is changed or overwritten. At the internal level, however, reference counts can be manipulated
with the following macros:
int SvREFCNT(SV* sv);
SV* SvREFCNT_inc(SV* sv);
void SvREFCNT_dec(SV* sv);
However, there is one other function which manipulates the reference count of its argument. The
newRV_inc function, you will recall, creates a reference to the specified argument. As a side effect, it
increments the argument‘s reference count. If this is not what you want, use newRV_noinc instead.
For example, imagine you want to return a reference from an XSUB function. Inside the XSUB routine, you
create an SV which initially has a reference count of one. Then you call newRV_inc, passing it the
just−created SV. This returns the reference as a new SV, but the reference count of the SV you passed to
newRV_inc has been incremented to two. Now you return the reference from the XSUB routine and forget
about the SV. But Perl hasn‘t! Whenever the returned reference is destroyed, the reference count of the
original SV is decreased to one and nothing happens. The SV will hang around without any way to access it
until Perl itself terminates. This is a memory leak.
The correct procedure, then, is to use newRV_noinc instead of newRV_inc. Then, if and when the last
reference is destroyed, the reference count of the SV will go to zero and it will be destroyed, stopping any
memory leak.
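The leak described above can be reproduced in a small toy model of reference counting. All of the names below (ToySV, toy_new, toy_ref_inc, toy_ref_noinc, toy_dec, toy_live) are invented for this sketch and are not part of the Perl API; the model only mimics the counting rules stated in the text.

```c
#include <stdlib.h>

/* Toy model of reference counting. All names here (ToySV, toy_*) are
 * invented for illustration and are not part of the Perl API. */
typedef struct ToySV {
    int refcnt;
    struct ToySV *target;   /* non-NULL when this value is a reference */
} ToySV;

int toy_live = 0;           /* number of values not yet destroyed */

ToySV *
toy_new(ToySV *target)
{
    ToySV *sv = malloc(sizeof *sv);
    if (sv == NULL)
        abort();
    sv->refcnt = 1;         /* values start life with a count of 1 */
    sv->target = target;
    toy_live++;
    return sv;
}

/* Like newRV_inc: the new reference bumps the target's count. */
ToySV *
toy_ref_inc(ToySV *target)
{
    target->refcnt++;
    return toy_new(target);
}

/* Like newRV_noinc: the new reference takes over the caller's count. */
ToySV *
toy_ref_noinc(ToySV *target)
{
    return toy_new(target);
}

/* Drop one count; destroy at zero, releasing the referenced value too. */
void
toy_dec(ToySV *sv)
{
    if (--sv->refcnt > 0)
        return;
    if (sv->target != NULL)
        toy_dec(sv->target);
    free(sv);
    toy_live--;
}
```

With toy_ref_inc, destroying the reference leaves the target with a count of one and toy_live stays at 1, which is the leak. With toy_ref_noinc, destroying the reference takes the target down to zero and toy_live returns to 0.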
There are some convenience functions available that can help with the destruction of xVs. These functions
introduce the concept of "mortality". An xV that is mortal has had its reference count marked to be
decremented, but not actually decremented, until "a short time later". Generally the term "short time later"
means a single Perl statement, such as a call to an XSUB function. The actual determinant for when mortal
xVs have their reference count decremented depends on two macros, SAVETMPS and FREETMPS. See
perlcall and perlxs for more details on these macros.
"Mortalization" then is at its simplest a deferred SvREFCNT_dec. However, if you mortalize a variable
twice, the reference count will later be decremented twice.
You should be careful about creating mortal variables. Strange things can happen if you make the same
value mortal within multiple contexts, or if you make a variable mortal multiple times.
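The "deferred decrement" view of mortality can be sketched as a list of pending decrements that is flushed later. The names (ToyVal, toy_mortalize, toy_free_tmps) and the fixed-size array are assumptions made for this illustration; Perl's real temporaries stack is managed through SAVETMPS and FREETMPS.

```c
/* Toy model of mortality. The names (ToyVal, toy_*) and the fixed-size
 * array are assumptions for illustration only. */
#define TOY_TMPS_MAX 16

typedef struct ToyVal {
    int refcnt;
} ToyVal;

static ToyVal *toy_tmps[TOY_TMPS_MAX];
static int toy_tmps_count = 0;

/* Record that v owes one decrement "a short time later".
 * Returns 0 on success, -1 if the toy stack is full. */
int
toy_mortalize(ToyVal *v)
{
    if (toy_tmps_count >= TOY_TMPS_MAX)
        return -1;
    toy_tmps[toy_tmps_count++] = v;
    return 0;
}

/* Apply every deferred decrement, as the end of a statement would. */
void
toy_free_tmps(void)
{
    while (toy_tmps_count > 0)
        toy_tmps[--toy_tmps_count]->refcnt--;
}
```

Mortalizing the same value twice records two pending decrements, so a value whose count was 2 ends up at 0 after the flush. This is why making a variable mortal multiple times calls for care.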
To create a mortal variable, use the functions:
SV*  sv_newmortal()
SV*  sv_2mortal(SV*)
SV*  sv_mortalcopy(SV*)
The first call creates a mortal SV, the second converts an existing SV to a mortal SV (and thus defers a call
to SvREFCNT_dec), and the third creates a mortal copy of an existing SV.
The mortal routines are not just for SVs: AVs and HVs can be made mortal by passing their address
(cast to SV*) to the sv_2mortal or sv_mortalcopy routines.
Stashes and Globs
A "stash" is a hash that contains all of the different objects that are contained within a package. Each key of
the stash is a symbol name (shared by all the different types of objects that have the same name), and each
value in the hash table is a GV (Glob Value). This GV in turn contains references to the various objects of
that name, including (but not limited to) the following:
Scalar Value
Array Value
Hash Value
I/O Handle
Format
Subroutine
There is a single stash called "PL_defstash" that holds the items that exist in the "main" package. To get at
the items in other packages, append the string "::" to the package name. The items in the "Foo" package are
in the stash "Foo::" in PL_defstash. The items in the "Bar::Baz" package are in the stash "Baz::" in "Bar::"‘s
stash.
To get the stash pointer for a particular package, use the function:
HV*  gv_stashpv(char* name, I32 create)
HV*  gv_stashsv(SV*, I32 create)
The first function takes a literal string, the second uses the string stored in the SV. Remember that a stash is
just a hash table, so you get back an HV*. The create flag will create a new package if it is set.
The name that gv_stash*v wants is the name of the package whose symbol table you want. The default
package is called main. If you have multiply nested packages, pass their names to gv_stash*v,
separated by :: as in the Perl language itself.
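The nesting of stashes described earlier (the "Bar::Baz" package lives in the stash "Baz::" inside the stash "Bar::") amounts to splitting the package name at each "::" and appending "::" to every component. The helper below is a hypothetical sketch of that splitting step only; it is not a Perl API function, and gv_stashpv itself takes the full package name.

```c
#include <string.h>

#define TOY_KEY_MAX 64

/* Copy the n-th (0-based) stash key of a package name into key, e.g.
 * for "Bar::Baz": n = 0 gives "Bar::" and n = 1 gives "Baz::".
 * Returns 1 on success, 0 if there is no such component or it does not
 * fit. This helper is hypothetical; it is not a Perl API function. */
int
toy_stash_key(const char *package, int n, char key[TOY_KEY_MAX])
{
    const char *start = package;
    const char *end;
    size_t len;

    for (;;) {
        end = strstr(start, "::");
        len = end ? (size_t)(end - start) : strlen(start);
        if (n == 0)
            break;
        if (end == NULL)
            return 0;           /* ran out of components */
        start = end + 2;        /* skip the "::" separator */
        n--;
    }
    if (len == 0 || len + 3 > TOY_KEY_MAX)
        return 0;
    memcpy(key, start, len);
    memcpy(key + len, "::", 3); /* append "::" and the terminating NUL */
    return 1;
}
```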
Alternately, if you have an SV that is a blessed reference, you can find out the stash pointer by using:
HV*  SvSTASH(SvRV(SV*));
then use the following to get the package name itself:
char*  HvNAME(HV* stash);
If you need to bless or re-bless an object you can use the following function:
SV*  sv_bless(SV*, HV* stash)
where the first argument, an SV*, must be a reference, and the second argument is a stash. The returned
SV* can now be used in the same way as any other SV.
For more information on references and blessings, consult perlref.
Double−Typed SVs
Scalar variables normally contain only one type of value, an integer, double, pointer, or reference. Perl will
automatically convert the actual scalar data from the stored type into the requested type.
Some scalar variables contain more than one type of scalar data. For example, the variable $! contains
either the numeric value of errno or its string equivalent from either strerror or sys_errlist[].
To force multiple data values into an SV, you must do two things: use the sv_set*v routines to add the
additional scalar type, then set a flag so that Perl will believe it contains more than one type of data. The
four macros to set the flags are:
SvIOK_on
SvNOK_on
SvPOK_on
SvROK_on
The particular macro you must use depends on which sv_set*v routine you called first. This is because
every sv_set*v routine turns on only the bit for the particular type of data being set, and turns off all the
rest.
For example, to create a new Perl variable called "dberror" that contains both the numeric and descriptive
string error values, you could use the following code:
extern int dberror;
extern char *dberror_list[];
SV* sv = perl_get_sv("dberror", TRUE);
sv_setiv(sv, (IV) dberror);
sv_setpv(sv, dberror_list[dberror]);
SvIOK_on(sv);
If the order of sv_setiv and sv_setpv had been reversed, then the macro SvPOK_on would need to be
called instead of SvIOK_on.
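The interplay between the setters and the flag macros can be shown with a toy scalar. The structure, the flag names and the bit values below are assumptions made for illustration and are not Perl's actual definitions; the sketch only models the rule that each setter turns on its own flag and turns off the rest.

```c
/* Toy model of an SV's "OK" flags. The names and bit values are
 * assumptions for illustration, not Perl's actual definitions. */
#define TOY_IOK 0x1     /* integer slot is valid */
#define TOY_POK 0x2     /* string slot is valid  */

typedef struct ToyScalar {
    unsigned flags;
    long iv;
    const char *pv;
} ToyScalar;

/* Like the sv_set*v routines: set one slot, turn off every other flag. */
void
toy_set_iv(ToyScalar *sv, long iv)
{
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

void
toy_set_pv(ToyScalar *sv, const char *pv)
{
    sv->pv = pv;
    sv->flags = TOY_POK;
}

/* Like SvIOK_on: declare the integer slot valid again. */
void
toy_iok_on(ToyScalar *sv)
{
    sv->flags |= TOY_IOK;
}
```

After toy_set_iv followed by toy_set_pv, only the string flag is set even though the integer slot still holds its value; calling toy_iok_on afterwards marks both slots as valid, which is the role SvIOK_on plays in the example above.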
Magic Variables
[This section still under construction. Ignore everything here. Post no bills. Everything not permitted is
forbidden.]
Any SV may be magical, that is, it has special features that a normal SV does not have. These features are
stored in the SV structure in a linked list of struct magic‘s, typedef‘ed to MAGIC.
struct magic {
    MAGIC*      mg_moremagic;
    MGVTBL*     mg_virtual;
    U16         mg_private;
    char        mg_type;
    U8          mg_flags;
    SV*         mg_obj;
    char*       mg_ptr;
    I32         mg_len;
};
Note this is current as of patchlevel 0, and could change at any time.
Assigning Magic
Perl adds magic to an SV using the sv_magic function:
void sv_magic(SV* sv, SV* obj, int how, char* name, I32 namlen);
The sv argument is a pointer to the SV that is to acquire a new magical feature.
If sv is not already magical, Perl uses the SvUPGRADE macro to set the SVt_PVMG flag for the sv. Perl
then continues by adding it to the beginning of the linked list of magical features. Any prior entry of the
same type of magic is deleted. Note that this can be overridden, and multiple instances of the same type of
magic can be associated with an SV.
The name and namlen arguments are used to associate a string with the magic, typically the name of a
variable. namlen is stored in the mg_len field and if name is non-null and namlen >= 0 a malloc'd copy
of the name is stored in the mg_ptr field.
The sv_magic function uses how to determine which, if any, predefined "Magic Virtual Table" should be
assigned to the mg_virtual field. See the "Magic Virtual Table" section below. The how argument is
also stored in the mg_type field.
The obj argument is stored in the mg_obj field of the MAGIC structure. If it is not the same as the sv
argument, the reference count of the obj object is incremented. If it is the same, or if the how argument is
"#", or if it is a NULL pointer, then obj is merely stored, without the reference count being incremented.
There is also a function to add magic to an HV:
void hv_magic(HV *hv, GV *gv, int how);
This simply calls sv_magic and coerces the gv argument into an SV.
To remove the magic from an SV, call the function sv_unmagic:
void sv_unmagic(SV *sv, int type);
The type argument should be equal to the how value when the SV was initially made magical.
Magic Virtual Tables
The mg_virtual field in the MAGIC structure is a pointer to a MGVTBL (short for "Magic Virtual Table"),
which is a structure of function pointers that handle the various operations that might be applied to that
variable.
The MGVTBL has five pointers to the following routine types:
int  (*svt_get)(SV* sv, MAGIC* mg);
int  (*svt_set)(SV* sv, MAGIC* mg);
U32  (*svt_len)(SV* sv, MAGIC* mg);
int  (*svt_clear)(SV* sv, MAGIC* mg);
int  (*svt_free)(SV* sv, MAGIC* mg);
This MGVTBL structure is set at compile−time in perl.h and there are currently 19 types (or 21 with
overloading turned on). These different structures contain pointers to various routines that perform
additional actions depending on which function is being called.
Function pointer    Action taken
----------------    ------------
svt_get             Do something after the value of the SV is retrieved.
svt_set             Do something after the SV is assigned a value.
svt_len             Report on the SV's length.
svt_clear           Clear something the SV represents.
svt_free            Free any extra storage associated with the SV.
For instance, the MGVTBL structure called vtbl_sv (which corresponds to an mg_type of '\0') contains:
{ magic_get, magic_set, magic_len, 0, 0 }
Thus, when an SV is determined to be magical and of type '\0', if a get operation is being performed, the
routine magic_get is called. All the various routines for the various magical types begin with magic_.
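The dispatch through a table of optional function pointers can be sketched as follows. ToyValue, ToyVtbl and the toy_* functions are invented for this illustration; only the idea of a table whose unused entries are left empty (like the 0 entries in vtbl_sv) is taken from the text.

```c
#include <stddef.h>

/* Toy model of a magic virtual table. ToyValue, ToyVtbl and the toy_*
 * functions are invented for illustration. */
typedef struct ToyValue {
    int value;
    int get_calls;      /* how many times the "get" hook has run */
} ToyValue;

typedef struct ToyVtbl {
    int (*get)(ToyValue *v);
    int (*set)(ToyValue *v);
    int (*clear)(ToyValue *v);  /* may be NULL, like the 0 entries in vtbl_sv */
} ToyVtbl;

static int
toy_magic_get(ToyValue *v)
{
    v->get_calls++;
    return 0;
}

/* A table with only a "get" hook; the unused entries are left NULL. */
const ToyVtbl toy_vtbl = { toy_magic_get, NULL, NULL };

/* Fetch the value, invoking the "get" hook if the table has one. */
int
toy_fetch(ToyValue *v, const ToyVtbl *vtbl)
{
    if (vtbl != NULL && vtbl->get != NULL)
        vtbl->get(v);
    return v->value;
}

/* Clear the value; returns 1 if a "clear" hook was present and ran. */
int
toy_clear(ToyValue *v, const ToyVtbl *vtbl)
{
    int ran_hook = 0;
    if (vtbl != NULL && vtbl->clear != NULL) {
        vtbl->clear(v);
        ran_hook = 1;
    }
    v->value = 0;
    return ran_hook;
}
```

Checking each pointer before calling it is what lets a table supply hooks for some operations and leave the others empty.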
The current kinds of Magic Virtual Tables are:
mg_type  MGVTBL              Type of magic
-------  ------              ----------------------------
\0       vtbl_sv             Special scalar variable
A        vtbl_amagic         %OVERLOAD hash
a        vtbl_amagicelem     %OVERLOAD hash element
c        (none)              Holds overload table (AMT) on stash
B        vtbl_bm             Boyer-Moore (fast string search)
E        vtbl_env            %ENV hash
e        vtbl_envelem        %ENV hash element
f        vtbl_fm             Formline ('compiled' format)
g        vtbl_mglob          m//g target / study()ed string
I        vtbl_isa            @ISA array
i        vtbl_isaelem        @ISA array element
k        vtbl_nkeys          scalar(keys()) lvalue
L        (none)              Debugger %_
l        vtbl_dbline
o        vtbl_collxfrm
P        vtbl_pack
p        vtbl_packelem
q        vtbl_packelem
S        vtbl_sig
s        vtbl_sigelem
t        vtbl_taint
U        vtbl_uvar
v        vtbl_vec
x        vtbl_substr
y        vtbl_defelem
*        vtbl_glob
#        vtbl_arylen
.        vtbl_pos
~        (none)
$c ---> + ---> $a ---> assign-to
But with the actual compile tree for $a = $b + $c it is different: some nodes are optimized away. As a
corollary, though the actual tree contains more nodes than our simplified example, the execution order is the
same as in our example.
Examining the tree
If you have your perl compiled for debugging (usually done with -D optimize=-g on the Configure
command line), you may examine the compiled tree by specifying -Dx on the Perl command line. The
output takes several lines per node, and for $b+$c it looks like this:
5           TYPE = add  ===> 6
            TARG = 1
            FLAGS = (SCALAR,KIDS)
            {
                TYPE = null  ===> (4)
                  (was rv2sv)
                FLAGS = (SCALAR,KIDS)
                {
3                   TYPE = gvsv  ===> 4
                    FLAGS = (SCALAR)
                    GV = main::b
                }
            }
            {
                TYPE = null  ===> (5)
                  (was rv2sv)
                FLAGS = (SCALAR,KIDS)
                {
4                   TYPE = gvsv  ===> 5
                    FLAGS = (SCALAR)
                    GV = main::c
                }
            }
This tree has 5 nodes (one per TYPE specifier), but only 3 of them are not optimized away (one per number in
the left column). The immediate children of a given node correspond to {} pairs on the same level of
indentation, thus this listing corresponds to the tree:
           add
         /     \
       null    null
        |       |
       gvsv    gvsv
The execution order is indicated by ===> marks, thus it is 3 4 5 6 (node 6 is not included in the above
listing), i.e., gvsv gvsv add whatever.
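The difference between tree structure and execution order can be made concrete with a toy node that carries only the execution-order thread. The ToyOp structure and toy_run_order function are assumptions made for this sketch; they are not Perl's op structures.

```c
#include <stddef.h>
#include <string.h>

/* Toy op node carrying only the execution-order thread ("next").
 * The struct and names are assumptions for illustration. */
typedef struct ToyOp {
    const char *name;
    struct ToyOp *next;     /* op to execute after this one */
} ToyOp;

/* Follow the thread from "start", appending op names to buf.
 * Returns the number of ops visited; stops early if buf would overflow. */
int
toy_run_order(const ToyOp *start, char *buf, size_t bufsize)
{
    int visited = 0;
    const ToyOp *op;

    if (bufsize == 0)
        return 0;
    buf[0] = '\0';
    for (op = start; op != NULL; op = op->next) {
        /* room for the existing text, a space, the name and the NUL */
        size_t need = strlen(buf) + strlen(op->name) + 2;
        if (need > bufsize)
            break;
        if (visited > 0)
            strcat(buf, " ");
        strcat(buf, op->name);
        visited++;
    }
    return visited;
}
```

Threading the two gvsv nodes and then the add node reproduces the order given above: the nullified nodes are part of the tree, but they are not on the thread and so are never visited.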
Compile pass 1: check routines
The tree is created by the pseudo−compiler while yacc code feeds it the constructions it recognizes. Since
yacc works bottom−up, so does the first pass of perl compilation.
What makes this pass interesting for perl developers is that some optimization may be performed on this
pass. This is optimization by so−called check routines. The correspondence between node names and
corresponding check routines is described in opcode.pl (do not forget to run make regen_headers if
you modify this file).
A check routine is called when the node is fully constructed except for the execution-order thread. Since at
this time there are no back-links to the currently constructed node, one can do almost any operation to the
top-level node, including freeing it and/or creating new nodes above/below it.
The check routine returns the node which should be inserted into the tree (if the top−level node was not
modified, check routine returns its argument).
By convention, check routines have names ck_*. They are usually called from new*OP subroutines (or
convert) (which in turn are called from perly.y).
Compile pass 1a: constant folding
Immediately after the check routine is called the returned node is checked for being compile−time
executable. If it is (the value is judged to be constant) it is immediately executed, and a constant node with
the "return value" of the corresponding subtree is substituted instead. The subtree is deleted.
If constant folding was not performed, the execution−order thread is created.
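The idea of folding can be illustrated with a toy expression tree. The following is a minimal sketch, not Perl's implementation: the Expr type, the ExKind values and the function fold are all hypothetical names invented for this example, and only addition is handled. A node whose children are both constants is replaced by a constant holding the computed value, and the subtree is dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds for a toy expression tree (not Perl opcodes). */
typedef enum { EX_CONST, EX_VAR, EX_ADD } ExKind;

typedef struct Expr {
    ExKind kind;
    long value;                 /* meaningful only for EX_CONST */
    struct Expr *left, *right;  /* meaningful only for EX_ADD */
} Expr;

/* Fold bottom-up: children are folded first, then the node itself.
 * An EX_ADD with two constant children becomes an EX_CONST in place. */
static Expr *fold(Expr *e)
{
    assert(e != NULL);
    if (e->kind != EX_ADD)
        return e;
    e->left = fold(e->left);
    e->right = fold(e->right);
    if (e->left->kind == EX_CONST && e->right->kind == EX_CONST) {
        e->value = e->left->value + e->right->value;
        e->kind = EX_CONST;
        e->left = e->right = NULL;   /* the subtree is no longer referenced */
    }
    return e;
}

/* 2 + 3 folds to a constant; returns its value, or -1 if not folded. */
static long fold_two_plus_three(void)
{
    Expr two   = { EX_CONST, 2, NULL, NULL };
    Expr three = { EX_CONST, 3, NULL, NULL };
    Expr sum   = { EX_ADD, 0, &two, &three };
    Expr *r = fold(&sum);

    return r->kind == EX_CONST ? r->value : -1;
}

/* 2 + x cannot be folded; returns 1 if the node is still an addition. */
static int var_sum_stays_unfolded(void)
{
    Expr two = { EX_CONST, 2, NULL, NULL };
    Expr x   = { EX_VAR, 0, NULL, NULL };
    Expr sum = { EX_ADD, 0, &two, &x };

    return fold(&sum)->kind == EX_ADD;
}
```

The sketch mutates the node in place for brevity. The text above describes substituting a new constant node and deleting the subtree, which amounts to the same result from the parent's point of view.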
Compile pass 2: context propagation
When the context for a part of the compile tree is known, it is propagated down through the tree. At this time
the context can have 5 values (instead of 2 for runtime context): void, boolean, scalar, list, and lvalue. In
contrast with pass 1, this pass is processed from top to bottom: a node's context determines the context
for its children.
Additional context−dependent optimizations are performed at this time. Since at this moment the compile
tree contains back−references (via "thread" pointers), nodes cannot be free()d now. To allow
optimized−away nodes at this stage, such nodes are null()ified instead of free()ing (i.e. their type is
changed to OP_NULL).
Compile pass 3: peephole optimization
After the compile tree for a subroutine (or for an eval or a file) is created, an additional pass over the code
is performed. This pass is neither top-down nor bottom-up, but in the execution order (with additional
complications for conditionals). These optimizations are done in the subroutine peep(). Optimizations
performed at this stage are subject to the same restrictions as in pass 2.
API LISTING
This is a listing of functions, macros, flags, and variables that may be useful to extension writers or that may
be found while reading other extensions.
Note that all Perl API global variables must be referenced with the PL_ prefix. Some macros are provided
for compatibility with the older, unadorned names, but this support will be removed in a future release.
It is strongly recommended that all Perl API functions that don‘t begin with perl be referenced with an
explicit Perl_ prefix.
The sort order of the listing is case insensitive, with any occurrences of '_' ignored for the purpose of
sorting.
av_clear
Clears an array, making it empty. Does not free the memory used by the array itself.
void
av_clear (AV* ar)
av_extend
Pre−extend an array. The key is the index to which the array should be extended.
void
av_extend (AV* ar, I32 key)
av_fetch
Returns the SV at the specified index in the array. The key is the index. If lval is set then the
fetch will be part of a store. Check that the return value is non−null before dereferencing it to a
SV*.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use
this function on tied arrays.
SV**
av_fetch (AV* ar, I32 key, I32 lval)
AvFILL
Same as av_len(). Deprecated, use av_len() instead.
av_len
Returns the highest index in the array. Returns −1 if the array is empty.
I32
av_len (AV* ar)
av_make Creates a new AV and populates it with a list of SVs. The SVs are copied into the array, so they
may be freed after the call to av_make. The new AV will have a reference count of 1.
AV*
av_make (I32 size, SV** svp)
av_pop
Pops an SV off the end of the array. Returns &PL_sv_undef if the array is empty.
SV*
av_pop (AV* ar)
av_push
Pushes an SV onto the end of the array. The array will grow automatically to accommodate the
addition.
void
av_push (AV* ar, SV* val)
av_shift
Shifts an SV off the beginning of the array.
SV*
av_shift (AV* ar)
av_store Stores an SV in an array. The array index is specified as key. The return value will be NULL if
the operation failed or if the value did not need to be actually stored within the array (as in the
case of tied arrays). Otherwise it can be dereferenced to get the original SV*. Note that the
caller is responsible for suitably incrementing the reference count of val before the call, and
decrementing it if the function returned NULL.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use
this function on tied arrays.
SV**
av_store (AV* ar, I32 key, SV* val)
av_undef Undefines the array. Frees the memory used by the array itself.
void
av_undef (AV* ar)
av_unshift
Unshift the given number of undef values onto the beginning of the array. The array will grow
automatically to accommodate the addition. You must then use av_store to assign values to
these new elements.
void
av_unshift (AV* ar, I32 num)
CLASS
Variable which is setup by xsubpp to indicate the class name for a C++ XS constructor. This is
always a char*. See THIS and Using XS With C++ in perlxs.
Copy
The XSUB−writer‘s interface to the C memcpy function. The s is the source, d is the
destination, n is the number of items, and t is the type. May fail on overlapping copies. See
also Move.
void
Copy( s, d, n, t )
croak
This is the XSUB−writer‘s interface to Perl‘s die function. Use this function the same way you
use the C printf function. See warn.
CvSTASH
Returns the stash of the CV.
HV*
CvSTASH( SV* sv )
PL_DBsingle
When Perl is run in debugging mode, with the −d switch, this SV is a boolean which indicates
whether subs are being single−stepped. Single−stepping is automatically turned on after every
step. This is the C variable which corresponds to Perl‘s $DB::single variable. See
PL_DBsub.
PL_DBsub
When Perl is run in debugging mode, with the −d switch, this GV contains the SV which holds
the name of the sub being debugged. This is the C variable which corresponds to Perl‘s
$DB::sub variable. See PL_DBsingle. The sub name can be found by
SvPV( GvSV( PL_DBsub ), PL_na )
PL_DBtrace
Trace variable used when Perl is run in debugging mode, with the −d switch. This is the C
variable which corresponds to Perl‘s $DB::trace variable. See PL_DBsingle.
dMARK
Declare a stack marker variable, mark, for the XSUB. See MARK and dORIGMARK.
dORIGMARK
Saves the original stack mark for the XSUB. See ORIGMARK.
PL_dowarn
The C variable which corresponds to Perl‘s $^W warning variable.
dSP
Declares a local copy of perl's stack pointer for the XSUB, available via the SP macro. See SP.
dXSARGS
Sets up stack and mark pointers for an XSUB, calling dSP and dMARK. This is usually handled
automatically by xsubpp. Declares the items variable to indicate the number of items on the
stack.
dXSI32
Sets up the ix variable for an XSUB which has aliases. This is usually handled automatically by
xsubpp.
do_binmode
Switches filehandle to binmode. iotype is what IoTYPE(io) would contain.
do_binmode(fp, iotype, TRUE);
ENTER
Opening bracket on a callback. See LEAVE and perlcall.
ENTER;
EXTEND Used to extend the argument stack for an XSUB‘s return values.
EXTEND( sp, int x )
fbm_compile
Analyses the string in order to make fast searches on it using fbm_instr() — the
Boyer−Moore algorithm.
void
fbm_compile(SV* sv, U32 flags)
fbm_instr Returns the location of the SV in the string delimited by str and strend. It returns Nullch
if the string can‘t be found. The sv does not have to be fbm_compiled, but the search will not
be as fast then.
char*
fbm_instr(char *str, char *strend, SV *sv, U32 flags)
FREETMPS
Closing bracket for temporaries on a callback. See SAVETMPS and perlcall.
FREETMPS;
G_ARRAY
Used to indicate array context. See GIMME_V, GIMME and perlcall.
G_DISCARD
Indicates that arguments returned from a callback should be discarded. See perlcall.
G_EVAL Used to force a Perl eval wrapper around a callback. See perlcall.
GIMME
A backward−compatible version of GIMME_V which can only return G_SCALAR or G_ARRAY;
in a void context, it returns G_SCALAR.
GIMME_V
The XSUB-writer's equivalent to Perl's wantarray. Returns G_VOID, G_SCALAR or
G_ARRAY for void, scalar or array context, respectively.
G_NOARGS
Indicates that no arguments are being sent to a callback. See perlcall.
G_SCALAR
Used to indicate scalar context. See GIMME_V, GIMME, and perlcall.
gv_fetchmeth
Returns the glob with the given name and a defined subroutine or NULL. The glob lives in the
given stash, or in the stashes accessible via @ISA and @UNIVERSAL.
The argument level should be either 0 or −1. If level==0, as a side−effect creates a glob
with the given name in the given stash which in the case of success contains an alias for the
subroutine, and sets up caching info for this glob. Similarly for all the searched stashes.
This function grants "SUPER" token as a postfix of the stash name.
The GV returned from gv_fetchmeth may be a method cache entry, which is not visible to
Perl code. So when calling perl_call_sv, you should not use the GV directly; instead, you
should use the method‘s CV, which can be obtained from the GV with the GvCV macro.
GV*
gv_fetchmeth (HV* stash, char* name, STRLEN len, I32 level)
gv_fetchmethod
gv_fetchmethod_autoload
Returns the glob which contains the subroutine to call to invoke the method on the stash. In
fact in the presence of autoloading this may be the glob for "AUTOLOAD". In this case the
corresponding variable $AUTOLOAD is already set up.
The third parameter of gv_fetchmethod_autoload determines whether AUTOLOAD
lookup is performed if the given method is not present: non−zero means yes, look for
AUTOLOAD; zero means no, don‘t look for AUTOLOAD. Calling gv_fetchmethod is
equivalent to calling gv_fetchmethod_autoload with a non−zero autoload parameter.
These functions grant "SUPER" token as a prefix of the method name.
Note that if you want to keep the returned glob for a long time, you need to check for it being
"AUTOLOAD", since at the later time the call may load a different subroutine due to
$AUTOLOAD changing its value. Use the glob created via a side effect to do this.
These functions have the same side-effects as gv_fetchmeth with level==0. name
should be writable if it contains ':' or '\''. The warning against passing the GV returned by
gv_fetchmeth to perl_call_sv applies equally to these functions.
GV*
gv_fetchmethod (HV* stash, char* name)
GV*
gv_fetchmethod_autoload (HV* stash, char* name, I32 autoload)
G_VOID
Used to indicate void context. See GIMME_V and perlcall.
gv_stashpv
Returns a pointer to the stash for a specified package. If create is set then the package will be
created if it does not already exist. If create is not set and the package does not exist then
NULL is returned.
HV*
gv_stashpv (char* name, I32 create)
gv_stashsv
Returns a pointer to the stash for a specified package. See gv_stashpv.
HV*
gv_stashsv (SV* sv, I32 create)
GvSV
Return the SV from the GV.
HEf_SVKEY
This flag, used in the length slot of hash entries and magic structures, specifies the structure
contains a SV* pointer where a char* pointer is to be expected. (For information only—not to
be used).
HeHASH Returns the computed hash stored in the hash entry.
U32
HeHASH(HE* he)
HeKEY
Returns the actual pointer stored in the key slot of the hash entry. The pointer may be either
char* or SV*, depending on the value of HeKLEN(). Can be assigned to. The HePV() or
HeSVKEY() macros are usually preferable for finding the value of a key.
char*
HeKEY(HE* he)
HeKLEN If this is negative, and amounts to HEf_SVKEY, it indicates the entry holds an SV* key.
Otherwise, holds the actual length of the key. Can be assigned to. The HePV() macro is usually
preferable for finding key lengths.
int
HeKLEN(HE* he)
HePV
Returns the key slot of the hash entry as a char* value, doing any necessary dereferencing of
possibly SV* keys. The length of the string is placed in len (this is a macro, so do not use
&len). If you do not care about what the length of the key is, you may use the global variable
PL_na. Remember though, that hash keys in perl are free to contain embedded nulls, so using
strlen() or similar is not a good way to find the length of hash keys. This is very similar to
the SvPV() macro described elsewhere in this document.
char*
HePV(HE* he, STRLEN len)
HeSVKEY
Returns the key as an SV*, or Nullsv if the hash entry does not contain an SV* key.
HeSVKEY(HE* he)
HeSVKEY_force
Returns the key as an SV*. Will create and return a temporary mortal SV* if the hash entry
contains only a char* key.
HeSVKEY_force(HE* he)
HeSVKEY_set
Sets the key to a given SV*, taking care to set the appropriate flags to indicate the presence of an
SV* key, and returns the same SV*.
HeSVKEY_set(HE* he, SV* sv)
HeVAL
Returns the value slot (type SV*) stored in the hash entry.
HeVAL(HE* he)
hv_clear
Clears a hash, making it empty.
void
hv_clear (HV* tb)
hv_delayfree_ent
Releases a hash entry, such as while iterating through the hash, but delays actual freeing of key
and value until the end of the current statement (or thereabouts) with sv_2mortal. See
hv_iternext and hv_free_ent.
void
hv_delayfree_ent (HV* hv, HE* entry)
hv_delete
Deletes a key/value pair in the hash. The value SV is removed from the hash and returned to the
caller. The klen is the length of the key. The flags value will normally be zero; if set to
G_DISCARD then NULL will be returned.
SV*
hv_delete (HV* tb, char* key, U32 klen, I32 flags)
hv_delete_ent
Deletes a key/value pair in the hash. The value SV is removed from the hash and returned to the
caller. The flags value will normally be zero; if set to G_DISCARD then NULL will be
returned. hash can be a valid precomputed hash value, or 0 to ask for it to be computed.
SV*
hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash)
hv_exists Returns a boolean indicating whether the specified hash key exists. The klen is the length of
the key.
bool
hv_exists (HV* tb, char* key, U32 klen)
hv_exists_ent
Returns a boolean indicating whether the specified hash key exists. hash can be a valid
precomputed hash value, or 0 to ask for it to be computed.
bool
hv_exists_ent (HV* tb, SV* key, U32 hash)
hv_fetch
Returns the SV which corresponds to the specified key in the hash. The klen is the length of
the key. If lval is set then the fetch will be part of a store. Check that the return value is
non−null before dereferencing it to a SV*.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use
this function on tied hashes.
SV**
hv_fetch (HV* tb, char* key, U32 klen, I32 lval)
hv_fetch_ent
Returns the hash entry which corresponds to the specified key in the hash. hash must be a valid
precomputed hash number for the given key, or 0 if you want the function to compute it. If
lval is set then the fetch will be part of a store. Make sure the return value is non-null before
accessing it. The return value when tb is a tied hash is a pointer to a static location, so be sure
to make a copy of the structure if you need to store it somewhere.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use
this function on tied hashes.
HE*
hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash)
hv_free_ent
Releases a hash entry, such as while iterating through the hash. See hv_iternext and
hv_delayfree_ent.
void
hv_free_ent (HV* hv, HE* entry)
hv_iterinit Prepares a starting point to traverse a hash table.
I32
hv_iterinit (HV* tb)
Returns the number of keys in the hash (i.e. the same as HvKEYS(tb)). The return value is
currently only meaningful for hashes without tie magic.
NOTE: Before version 5.004_65, hv_iterinit used to return the number of hash buckets that
happen to be in use. If you still need that esoteric value, you can get it through the macro
HvFILL(tb).
hv_iterkey
Returns the key from the current position of the hash iterator. See hv_iterinit.
char*
hv_iterkey (HE* entry, I32* retlen)
hv_iterkeysv
Returns the key as an SV* from the current position of the hash iterator. The return value will
always be a mortal copy of the key. Also see hv_iterinit.
SV*
hv_iterkeysv (HE* entry)
hv_iternext
Returns entries from a hash iterator. See hv_iterinit.
HE*
hv_iternext (HV* tb)
hv_iternextsv
Performs an hv_iternext, hv_iterkey, and hv_iterval in one operation.
SV*
hv_iternextsv (HV* hv, char** key, I32* retlen)
hv_iterval Returns the value from the current position of the hash iterator. See hv_iterkey.
SV*
hv_iterval (HV* tb, HE* entry)
hv_magic Adds magic to a hash. See sv_magic.
void
hv_magic (HV* hv, GV* gv, int how)
HvNAME Returns the package name of a stash. See SvSTASH, CvSTASH.
char*
HvNAME (HV* stash)
hv_store Stores an SV in a hash. The hash key is specified as key and klen is the length of the key.
The hash parameter is the precomputed hash value; if it is zero then Perl will compute it. The
return value will be NULL if the operation failed or if the value did not need to be actually stored
within the hash (as in the case of tied hashes). Otherwise it can be dereferenced to get the
original SV*. Note that the caller is responsible for suitably incrementing the reference count of
val before the call, and decrementing it if the function returned NULL.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use
this function on tied hashes.
SV**
hv_store (HV* tb, char* key, U32 klen, SV* val, U32 hash)
hv_store_ent
Stores val in a hash. The hash key is specified as key. The hash parameter is the
precomputed hash value; if it is zero then Perl will compute it. The return value is the new hash
entry so created. It will be NULL if the operation failed or if the value did not need to be
actually stored within the hash (as in the case of tied hashes). Otherwise the contents of the
return value can be accessed using the He??? macros described here. Note that the caller is
responsible for suitably incrementing the reference count of val before the call, and
decrementing it if the function returned NULL.
See Understanding the Magic of Tied Hashes and Arrays for more information on how to use
this function on tied hashes.
HE*
hv_store_ent (HV* tb, SV* key, SV* val, U32 hash)
hv_undef Undefines the hash.
void
hv_undef (HV* tb)
isALNUM Returns a boolean indicating whether the C char is an ascii alphanumeric character or digit.
int
isALNUM (char c)
isALPHA Returns a boolean indicating whether the C char is an ascii alphabetic character.
int
isALPHA (char c)
isDIGIT
Returns a boolean indicating whether the C char is an ascii digit.
int
isDIGIT (char c)
isLOWER
Returns a boolean indicating whether the C char is a lowercase character.
int
isLOWER (char c)
isSPACE Returns a boolean indicating whether the C char is whitespace.
int
isSPACE (char c)
isUPPER Returns a boolean indicating whether the C char is an uppercase character.
int
isUPPER (char c)
items
Variable which is set up by xsubpp to indicate the number of items on the stack. See
Variable-length Parameter Lists in perlxs.
ix
Variable which is set up by xsubpp to indicate which of an XSUB's aliases was used to invoke
it. See The ALIAS: Keyword in perlxs.
LEAVE
Closing bracket on a callback. See ENTER and perlcall.
LEAVE;
looks_like_number
Tests whether the content of an SV looks like a number (or is a number).
int
looks_like_number(SV*)
MARK
Stack marker variable for the XSUB. See dMARK.
mg_clear Clear something magical that the SV represents. See sv_magic.
int
mg_clear (SV* sv)
mg_copy Copies the magic from one SV to another. See sv_magic.
int
mg_copy (SV *, SV *, char *, STRLEN)
mg_find
Finds the magic pointer for type matching the SV. See sv_magic.
MAGIC*
mg_find (SV* sv, int type)
mg_free
Free any magic storage used by the SV. See sv_magic.
int
mg_free (SV* sv)
mg_get
Do magic after a value is retrieved from the SV. See sv_magic.
int
mg_get (SV* sv)
mg_len
Report on the SV's length. See sv_magic.
U32
mg_len (SV* sv)
mg_magical
Turns on the magical status of an SV. See sv_magic.
void
mg_magical (SV* sv)
mg_set
Do magic after a value is assigned to the SV. See sv_magic.
int
mg_set (SV* sv)
Move
The XSUB-writer's interface to the C memmove function. The s is the source, d is the
destination, n is the number of items, and t is the type. Can do overlapping moves. See also
Copy.
void
Move( s, d, n, t )
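The difference between Copy and Move matters only when source and destination overlap. The following standalone sketch illustrates it without the Perl headers. DEMO_COPY and DEMO_MOVE are hypothetical stand-ins written for this example from the descriptions of the two entries (source first, then destination, item count, and item type); they are not the definitions used by Perl itself.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins modelled on the descriptions of Copy and Move:
 * s = source, d = destination, n = number of items, t = item type. */
#define DEMO_COPY(s, d, n, t) ((void)memcpy((d), (s), (n) * sizeof(t)))
#define DEMO_MOVE(s, d, n, t) ((void)memmove((d), (s), (n) * sizeof(t)))

/* Shift the first four ints of a five-element array up by one slot.
 * Source and destination overlap, so the memmove-based form is needed. */
static int shifted_element(int index)
{
    int a[5] = { 1, 2, 3, 4, 5 };

    assert(index >= 0 && index < 5);
    DEMO_MOVE(a, a + 1, 4, int);     /* a now holds 1 1 2 3 4 */
    return a[index];
}

/* Copy between two distinct arrays: no overlap, so the memcpy-based
 * form is sufficient. */
static int copied_element(int index)
{
    int src[3] = { 7, 8, 9 };
    int dst[3] = { 0, 0, 0 };

    assert(index >= 0 && index < 3);
    DEMO_COPY(src, dst, 3, int);
    return dst[index];
}
```

Note the argument order: the source comes first, the reverse of memcpy and memmove. Passing the item type rather than a byte count lets the macro compute the size, so callers count in elements.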
PL_na
A variable which may be used with SvPV to tell Perl to calculate the string length.
New
The XSUB−writer‘s interface to the C malloc function.
void*
New( x, void *ptr, int size, type )
newAV
Creates a new AV. The reference count is set to 1.
AV*
newAV (void)
Newc
The XSUB-writer's interface to the C malloc function, with cast.
void*
Newc( x, void *ptr, int size, type, cast )
newCONSTSUB
Creates a constant sub equivalent to Perl sub FOO () { 123 } which is eligible for inlining
at compile−time.
void
newCONSTSUB(HV* stash, char* name, SV* sv)
newHV
Creates a new HV. The reference count is set to 1.
HV*
newHV (void)
newRV_inc
Creates an RV wrapper for an SV. The reference count for the original SV is incremented.
SV*
newRV_inc (SV* ref)
For historical reasons, "newRV" is a synonym for "newRV_inc".
newRV_noinc
Creates an RV wrapper for an SV. The reference count for the original SV is not incremented.
SV*
newRV_noinc (SV* ref)
NEWSV
Creates a new SV. A non−zero len parameter indicates the number of bytes of preallocated
string space the SV should have. An extra byte for a trailing NUL is also reserved. (SvPOK is
not set for the SV even if string space is allocated.) The reference count for the new SV is set to
1. id is an integer id between 0 and 1299 (used to identify leaks).
SV*
NEWSV (int id, STRLEN len)
newSViv Creates a new SV and copies an integer into it. The reference count for the SV is set to 1.
SV*
newSViv (IV i)
newSVnv Creates a new SV and copies a double into it. The reference count for the SV is set to 1.
SV*
newSVnv (NV i)
newSVpv Creates a new SV and copies a string into it. The reference count for the SV is set to 1. If len
is zero then Perl will compute the length.
SV*
newSVpv (char* s, STRLEN len)
newSVpvf
Creates a new SV and initializes it with a string formatted like sprintf.
SV*
newSVpvf(const char* pat, ...);
newSVpvn
Creates a new SV and copies a string into it. The reference count for the SV is set to 1. If len
is zero then Perl will create a zero length string.
SV*
newSVpvn (char* s, STRLEN len)
newSVrv Creates a new SV for the RV, rv, to point to. If rv is not an RV then it will be upgraded to one.
If classname is non−null then the new SV will be blessed in the specified package. The new
SV is returned and its reference count is 1.
SV*
newSVrv (SV* rv, char* classname)
newSVsv Creates a new SV which is an exact duplicate of the original SV.
SV*
newSVsv (SV* old)
newXS
Used by xsubpp to hook up XSUBs as Perl subs.
newXSproto
Used by xsubpp to hook up XSUBs as Perl subs. Adds Perl prototypes to the subs.
Newz
The XSUB−writer‘s interface to the C malloc function. The allocated memory is zeroed with
memzero.
void*
Newz( x, void *ptr, int size, type )
Nullav
Null AV pointer.
Nullch
Null character pointer.
Nullcv
Null CV pointer.
Nullhv
Null HV pointer.
Nullsv
Null SV pointer.
ORIGMARK
The original stack mark for the XSUB. See dORIGMARK.
perl_alloc Allocates a new Perl interpreter. See perlembed.
perl_call_argv
Performs a callback to the specified Perl sub. See perlcall.
I32
perl_call_argv (char* subname, I32 flags, char** argv)
perl_call_method
Performs a callback to the specified Perl method. The blessed object must be on the stack. See
perlcall.
I32
perl_call_method (char* methname, I32 flags)
perl_call_pv
Performs a callback to the specified Perl sub. See perlcall.
I32
perl_call_pv (char* subname, I32 flags)
perl_call_sv
Performs a callback to the Perl sub whose name is in the SV. See perlcall.
I32
perl_call_sv (SV* sv, I32 flags)
perl_construct
Initializes a new Perl interpreter. See perlembed.
perl_destruct
Shuts down a Perl interpreter. See perlembed.
perl_eval_sv
Tells Perl to eval the string in the SV.
I32
perl_eval_sv (SV* sv, I32 flags)
perl_eval_pv
Tells Perl to eval the given string and return an SV* result.
SV*
perl_eval_pv (char* p, I32 croak_on_error)
perl_free Releases a Perl interpreter. See perlembed.
perl_get_av
Returns the AV of the specified Perl array. If create is set and the Perl variable does not exist
then it will be created. If create is not set and the variable does not exist then NULL is
returned.
AV*
perl_get_av (char* name, I32 create)
perl_get_cv
Returns the CV of the specified Perl sub. If create is set and the Perl variable does not exist
then it will be created. If create is not set and the variable does not exist then NULL is
returned.
CV*
perl_get_cv (char* name, I32 create)
perl_get_hv
Returns the HV of the specified Perl hash. If create is set and the Perl variable does not exist
then it will be created. If create is not set and the variable does not exist then NULL is
returned.
HV*
perl_get_hv (char* name, I32 create)
perl_get_sv
Returns the SV of the specified Perl scalar. If create is set and the Perl variable does not exist
then it will be created. If create is not set and the variable does not exist then NULL is
returned.
SV*
perl_get_sv (char* name, I32 create)
perl_parse
Tells a Perl interpreter to parse a Perl script. See perlembed.
perl_require_pv
Tells Perl to require a module.
void
perl_require_pv (char* pv)
perl_run
Tells a Perl interpreter to run. See perlembed.
POPi
Pops an integer off the stack.
int
POPi()
POPl
Pops a long off the stack.
long
POPl()
POPp
Pops a string off the stack.
char*
POPp()
POPn
Pops a double off the stack.
double
POPn()
POPs
Pops an SV off the stack.
SV*
POPs()
PUSHMARK
Opening bracket for arguments on a callback. See PUTBACK and perlcall.
PUSHMARK(p)
PUSHi
Push an integer onto the stack. The stack must have room for this element. Handles ‘set’ magic.
See XPUSHi.
void
PUSHi(int d)
PUSHn
Push a double onto the stack. The stack must have room for this element. Handles ‘set’ magic.
See XPUSHn.
void
PUSHn(double d)
PUSHp
Push a string onto the stack. The stack must have room for this element. The len indicates the
length of the string. Handles ‘set’ magic. See XPUSHp.
void
PUSHp(char *c, int len )
PUSHs
Push an SV onto the stack. The stack must have room for this element. Does not handle ‘set’
magic. See XPUSHs.
void
PUSHs(sv)
PUSHu
Push an unsigned integer onto the stack. The stack must have room for this element. See
XPUSHu.
void
PUSHu(unsigned int d)
PUTBACK
Closing bracket for XSUB arguments. This is usually handled by xsubpp. See PUSHMARK and
perlcall for other uses.
PUTBACK;
Renew
The XSUB−writer‘s interface to the C realloc function.
void*
Renew( void *ptr, int size, type )
Renewc
The XSUB−writer‘s interface to the C realloc function, with cast.
void*
Renewc( void *ptr, int size, type, cast )
RETVAL Variable which is setup by xsubpp to hold the return value for an XSUB. This is always the
proper type for the XSUB. See The RETVAL Variable in perlxs.
safefree
The XSUB−writer‘s interface to the C free function.
safemalloc
The XSUB−writer‘s interface to the C malloc function.
saferealloc
The XSUB−writer‘s interface to the C realloc function.
savepv
Copy a string to a safe spot. This does not use an SV.
char*
savepv (char* sv)
savepvn
Copy a string to a safe spot. The len indicates number of bytes to copy. This does not use an
SV.
char*
savepvn (char* sv, I32 len)
SAVETMPS
Opening bracket for temporaries on a callback. See FREETMPS and perlcall.
SAVETMPS;
SP
Stack pointer. This is usually handled by xsubpp. See dSP and SPAGAIN.
SPAGAIN
Refetch the stack pointer. Used after a callback. See perlcall.
SPAGAIN;
ST
Used to access elements on the XSUB's stack.
SV*
ST(int x)
strEQ
Test two strings to see if they are equal. Returns true or false.
int
strEQ( char *s1, char *s2 )
strGE
Test two strings to see if the first, s1, is greater than or equal to the second, s2. Returns true or
false.
int
strGE( char *s1, char *s2 )
strGT
Test two strings to see if the first, s1, is greater than the second, s2. Returns true or false.
int
strGT( char *s1, char *s2 )
strLE
Test two strings to see if the first, s1, is less than or equal to the second, s2. Returns true or
false.
int
strLE( char *s1, char *s2 )
strLT
Test two strings to see if the first, s1, is less than the second, s2. Returns true or false.
int
strLT( char *s1, char *s2 )
strNE
Test two strings to see if they are different. Returns true or false.
int
strNE( char *s1, char *s2 )
strnEQ
Test two strings to see if they are equal. The len parameter indicates the number of bytes to
compare. Returns true or false.
int
strnEQ( char *s1, char *s2, int len )
strnNE
Test two strings to see if they are different. The len parameter indicates the number of bytes to
compare. Returns true or false.
int
strnNE( char *s1, char *s2, int len )
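The comparison macros read more naturally than raw strcmp calls because they return a plain true or false. The sketch below shows the idea in standalone C. The DEMO_ macros and the function has_perl_prefix are hypothetical names written for this example from the descriptions above, on the assumption that each test reduces to a strcmp or strncmp comparison; they are not the definitions from the Perl headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins modelled on the descriptions of strEQ, strNE,
 * strLT and strnEQ; each evaluates to 1 (true) or 0 (false). */
#define DEMO_STREQ(s1, s2)       (strcmp((s1), (s2)) == 0)
#define DEMO_STRNE(s1, s2)       (strcmp((s1), (s2)) != 0)
#define DEMO_STRLT(s1, s2)       (strcmp((s1), (s2)) <  0)
#define DEMO_STRNEQ(s1, s2, len) (strncmp((s1), (s2), (len)) == 0)

/* Returns true if name begins with the five bytes "perl_". */
static int has_perl_prefix(const char *name)
{
    assert(name != NULL);
    return DEMO_STRNEQ(name, "perl_", 5);
}
```

With these definitions, has_perl_prefix("perl_run") is true and has_perl_prefix("sv_inc") is false. The length-limited form compares at most len bytes, so it is the one to use for prefix tests.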
sv_2mortal
Marks an SV as mortal. The SV will be destroyed when the current context ends.
SV*
sv_2mortal (SV* sv)
sv_bless Blesses an SV into a specified package. The SV must be an RV. The package must be
designated by its stash (see gv_stashpv()). The reference count of the SV is unaffected.
SV*
sv_bless (SV* sv, HV* stash)
sv_catpv Concatenates the string onto the end of the string which is in the SV. Handles ‘get’ magic, but
not ‘set’ magic. See sv_catpv_mg.
void
sv_catpv (SV* sv, char* ptr)
sv_catpv_mg
Like sv_catpv, but also handles ‘set’ magic.
void
sv_catpv_mg (SV* sv, char* ptr)
sv_catpvn
Concatenates the string onto the end of the string which is in the SV. The len indicates number
of bytes to copy. Handles ‘get’ magic, but not ‘set’ magic. See sv_catpvn_mg.
void
sv_catpvn (SV* sv, char* ptr, STRLEN len)
sv_catpvn_mg
Like sv_catpvn, but also handles ‘set’ magic.
void
sv_catpvn_mg (SV* sv, char* ptr, STRLEN len)
sv_catpvf Processes its arguments like sprintf and appends the formatted output to an SV. Handles
‘get’ magic, but not ‘set’ magic. SvSETMAGIC() must typically be called after calling this
function to handle ‘set’ magic.
void
sv_catpvf (SV* sv, const char* pat, ...)
sv_catpvf_mg
Like sv_catpvf, but also handles ‘set’ magic.
void
sv_catpvf_mg (SV* sv, const char* pat, ...)
sv_catsv Concatenates the string from SV ssv onto the end of the string in SV dsv. Handles ‘get’
magic, but not ‘set’ magic. See sv_catsv_mg.
void
sv_catsv (SV* dsv, SV* ssv)
sv_catsv_mg
Like sv_catsv, but also handles ‘set’ magic.
void
sv_catsv_mg (SV* dsv, SV* ssv)
sv_chop
Efficient removal of characters from the beginning of the string buffer. SvPOK(sv) must be true
and the ptr must be a pointer to somewhere inside the string buffer. The ptr becomes the first
character of the adjusted string.
void
sv_chop(SV* sv, char *ptr)
sv_cmp
Compares the strings in two SVs. Returns -1, 0, or 1 indicating whether the string in sv1 is less
than, equal to, or greater than the string in sv2.
I32
sv_cmp (SV* sv1, SV* sv2)
SvCUR
Returns the length of the string which is in the SV. See SvLEN.
int
SvCUR (SV* sv)
SvCUR_set
Set the length of the string which is in the SV. See SvCUR.
void
SvCUR_set (SV* sv, int val )
sv_dec
Auto−decrement of the value in the SV.
void
sv_dec (SV* sv)
sv_derived_from
Returns a boolean indicating whether the SV is derived from the specified class (that is, whether
it is a subclass of it). This is the function that implements UNIVERSAL::isa. It works for class
names as well as for objects.
bool
sv_derived_from (SV* sv, char* name)
SvEND
Returns a pointer to the position just past the last character of the string which is in the SV,
where the trailing NUL normally sits. See SvCUR. Access the character as *(SvEND(sv)).
char*
SvEND(sv)
sv_eq
Returns a boolean indicating whether the strings in the two SVs are identical.
I32
sv_eq (SV* sv1, SV* sv2)
SvGETMAGIC
Invokes mg_get on an SV if it has ‘get’ magic. This macro evaluates its argument more than
once.
void
SvGETMAGIC( SV *sv )
SvGROW
Expands the character buffer in the SV so that it has room for the indicated number of bytes
(remember to reserve space for an extra trailing NUL character). Calls sv_grow to perform the
expansion if necessary. Returns a pointer to the character buffer.
char*
SvGROW( SV* sv, int len )
sv_grow
Expands the character buffer in the SV. This will use sv_unref and will upgrade the SV to
SVt_PV. Returns a pointer to the character buffer. Use SvGROW.
sv_inc
Auto−increment of the value in the SV.
void
sv_inc (SV* sv)
sv_insert Inserts a string at the specified offset/length within the SV. Similar to the Perl substr()
function.
void
sv_insert(SV *sv, STRLEN offset, STRLEN len, char *str, STRLEN strlen)
SvIOK
Returns a boolean indicating whether the SV contains an integer.
int
SvIOK (SV* SV)
SvIOK_off
Unsets the IV status of an SV.
void
SvIOK_off (SV* sv)
SvIOK_on
Tells an SV that it is an integer.
void
SvIOK_on (SV* sv)
SvIOK_only
Tells an SV that it is an integer and disables all other OK bits.
void
SvIOK_only (SV* sv)
SvIOKp
Returns a boolean indicating whether the SV contains an integer. Checks the private setting.
Use SvIOK.
int
SvIOKp (SV* SV)
sv_isa
Returns a boolean indicating whether the SV is blessed into the specified class. This does not
check for subtypes; use sv_derived_from to verify an inheritance relationship.
int
sv_isa (SV* sv, char* name)
sv_isobject
Returns a boolean indicating whether the SV is an RV pointing to a blessed object. If the SV is
not an RV, or if the object is not blessed, then this will return false.
int
sv_isobject (SV* sv)
SvIV
Returns the integer which is in the SV.
int
SvIV (SV* sv)
SvIVX
Returns the integer which is stored in the SV.
int
SvIVX (SV* sv)
SvLEN
Returns the size of the string buffer in the SV. See SvCUR.
int
SvLEN (SV* sv)
sv_len
Returns the length of the string in the SV. Use SvCUR.
STRLEN
sv_len (SV* sv)
sv_magic Adds magic to an SV.
void
sv_magic (SV* sv, SV* obj, int how, char* name, I32 namlen)
sv_mortalcopy
Creates a new SV which is a copy of the original SV. The new SV is marked as mortal.
SV*
sv_mortalcopy (SV* oldsv)
sv_newmortal
Creates a new SV which is mortal. The reference count of the SV is set to 1.
SV*
sv_newmortal (void)
SvNIOK
Returns a boolean indicating whether the SV contains a number, integer or double.
int
SvNIOK (SV* SV)
SvNIOK_off
Unsets the NV/IV status of an SV.
void
SvNIOK_off (SV* sv)
SvNIOKp Returns a boolean indicating whether the SV contains a number, integer or double. Checks the
private setting. Use SvNIOK.
int
SvNIOKp (SV* SV)
PL_sv_no
This is the false SV. See PL_sv_yes. Always refer to this as &PL_sv_no.
SvNOK
Returns a boolean indicating whether the SV contains a double.
int
SvNOK (SV* SV)
SvNOK_off
Unsets the NV status of an SV.
void
SvNOK_off (SV* sv)
SvNOK_on
Tells an SV that it is a double.
void
SvNOK_on (SV* sv)
SvNOK_only
Tells an SV that it is a double and disables all other OK bits.
void
SvNOK_only (SV* sv)
SvNOKp Returns a boolean indicating whether the SV contains a double. Checks the private setting. Use
SvNOK.
int
SvNOKp (SV* SV)
SvNV
Returns the double which is stored in the SV.
double
SvNV (SV* sv)
SvNVX
Returns the double which is stored in the SV.
double
SvNVX (SV* sv)
SvOK
Returns a boolean indicating whether the value is an SV.
int
SvOK (SV* sv)
SvOOK
Returns a boolean indicating whether the SvIVX is a valid offset value for the SvPVX. This
hack is used internally to speed up removal of characters from the beginning of a SvPV. When
SvOOK is true, then the start of the allocated string buffer is really (SvPVX - SvIVX).
int
SvOOK(SV* sv)
SvPOK
Returns a boolean indicating whether the SV contains a character string.
int
SvPOK (SV* SV)
SvPOK_off
Unsets the PV status of an SV.
void
SvPOK_off (SV* sv)
SvPOK_on
Tells an SV that it is a string.
void
SvPOK_on (SV* sv)
SvPOK_only
Tells an SV that it is a string and disables all other OK bits.
void
SvPOK_only (SV* sv)
SvPOKp Returns a boolean indicating whether the SV contains a character string. Checks the private
setting. Use SvPOK.
int
SvPOKp (SV* SV)
SvPV
Returns a pointer to the string in the SV, or a stringified form of the SV if the SV does not
contain a string. If len is PL_na then Perl will handle the length on its own. Handles ‘get’
magic.
char*
SvPV (SV* sv, int len )
SvPV_force
Like new−>foo };
print "Saw: $@" if $@;
# should be, but isn’t
This example will fail to recognize that an error occurred inside the eval {}. Here's why: the
call_Subtract code got executed while perl was cleaning up temporaries when exiting the eval block, and
because call_Subtract is implemented with perl_call_pv using the G_EVAL flag, it promptly reset $@. This
results in the failure of the outermost test for $@, and thereby the failure of the error trap.
Appending the G_KEEPERR flag, so that the perl_call_pv call in call_Subtract reads:
count = perl_call_pv("Subtract", G_EVAL|G_SCALAR|G_KEEPERR);
will preserve the error and restore reliable error handling.
Using perl_call_sv
In all the previous examples I have ‘hard−wired’ the name of the Perl subroutine to be called from C. Most
of the time though, it is more convenient to be able to specify the name of the Perl subroutine from within
the Perl script.
Consider the Perl code below
sub fred
{
print "Hello there\n" ;
}
CallSubPV("fred") ;
Here is a snippet of XSUB which defines CallSubPV.
void
CallSubPV(name)
char * name
CODE:
PUSHMARK(SP) ;
perl_call_pv(name, G_DISCARD|G_NOARGS) ;
That is fine as far as it goes. The thing is, the Perl subroutine can be specified as only a string. For Perl 4 this
was adequate, but Perl 5 allows references to subroutines and anonymous subroutines. This is where
perl_call_sv is useful.
The code below for CallSubSV is identical to CallSubPV except that the name parameter is now defined as
an SV* and we use perl_call_sv instead of perl_call_pv.
void
CallSubSV(name)
SV * name
CODE:
PUSHMARK(SP) ;
perl_call_sv(name, G_DISCARD|G_NOARGS) ;
Because we are using an SV to call fred the following can all be used
CallSubSV("fred") ;
CallSubSV(\&fred) ;
$ref = \&fred ;
CallSubSV($ref) ;
CallSubSV( sub { print "Hello there\n" } ) ;
As you can see, perl_call_sv gives you much greater flexibility in how you can specify the Perl subroutine.
You should note that if it is necessary to store the SV (name in the example above) which corresponds to the
Perl subroutine so that it can be used later in the program, it is not enough just to store a copy of the pointer to
the SV. Say the code above had been like this
static SV * rememberSub ;
void
SaveSub1(name)
SV * name
CODE:
rememberSub = name ;
void
CallSavedSub1()
CODE:
PUSHMARK(SP) ;
perl_call_sv(rememberSub, G_DISCARD|G_NOARGS) ;
The reason this is wrong is that by the time you come to use the pointer rememberSub in
CallSavedSub1, it may or may not still refer to the Perl subroutine that was recorded in SaveSub1.
This is particularly true for these cases
SaveSub1(\&fred) ;
CallSavedSub1() ;
SaveSub1( sub { print "Hello there\n" } ) ;
CallSavedSub1() ;
By the time each of the SaveSub1 statements above has been executed, the SV*s which corresponded to
the parameters will no longer exist. Expect an error message from Perl of the form
Can’t use an undefined value as a subroutine reference at ...
for each of the CallSavedSub1 lines.
Similarly, with this code
$ref = \&fred ;
SaveSub1($ref) ;
$ref = 47 ;
CallSavedSub1() ;
you can expect one of these messages (which one you actually get depends on the version of Perl you are
using)
Not a CODE reference at ...
Undefined subroutine &main::47 called ...
The variable $ref may have referred to the subroutine fred whenever the call to SaveSub1 was made
but by the time CallSavedSub1 gets called it now holds the number 47. Because we saved only a pointer
to the original SV in SaveSub1, any changes to $ref will be tracked by the pointer rememberSub. This
means that whenever CallSavedSub1 gets called, it will attempt to execute the code which is referenced
by the SV* rememberSub. In this case though, it now refers to the integer 47, so expect Perl to complain
loudly.
A similar but more subtle problem is illustrated with this code
$ref = \&fred ;
SaveSub1($ref) ;
$ref = \&joe ;
CallSavedSub1() ;
This time whenever CallSavedSub1 gets called it will execute the Perl subroutine joe (assuming it
exists) rather than fred as was originally requested in the call to SaveSub1.
To get around these problems it is necessary to take a full copy of the SV. The code below shows
SaveSub2 modified to do that
static SV * keepSub = (SV*)NULL ;
void
SaveSub2(name)
SV * name
CODE:
/* Take a copy of the callback */
if (keepSub == (SV*)NULL)
/* First time, so create a new SV */
keepSub = newSVsv(name) ;
else
/* Been here before, so overwrite */
SvSetSV(keepSub, name) ;
void
CallSavedSub2()
CODE:
PUSHMARK(SP) ;
perl_call_sv(keepSub, G_DISCARD|G_NOARGS) ;
To avoid creating a new SV every time SaveSub2 is called, the function first checks to see if it has been
called before. If not, then space for a new SV is allocated and the reference to the Perl subroutine, name is
copied to the variable keepSub in one operation using newSVsv. Thereafter, whenever SaveSub2 is
called the existing SV, keepSub, is overwritten with the new value using SvSetSV.
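The difference between SaveSub1 and SaveSub2 can be seen without the Perl API at all. The following is a minimal sketch in plain C, not XS: Value, save_alias, save_copy and the other names are hypothetical stand-ins for the SV, SaveSub1 and SaveSub2, and the struct copy merely plays the role that newSVsv and SvSetSV play above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an SV holding a sub name; not the Perl API. */
typedef struct {
    char name[32];
} Value;

static const Value *alias_slot = NULL;  /* like rememberSub: pointer only */
static Value       *copy_slot  = NULL;  /* like keepSub: private copy     */

/* SaveSub1 analogue: remember only the caller's pointer. */
static void save_alias(const Value *v)
{
    alias_slot = v;
}

/* SaveSub2 analogue: allocate once, then overwrite the private copy. */
static void save_copy(const Value *v)
{
    assert(v != NULL);
    if (copy_slot == NULL) {
        copy_slot = malloc(sizeof *copy_slot);  /* first call: new storage */
        if (copy_slot == NULL)
            abort();
    }
    *copy_slot = *v;                            /* every call: overwrite   */
}

static const char *aliased_name(void) { return alias_slot->name; }
static const char *copied_name(void)  { return copy_slot->name; }

/* Helper for the demonstration: change what the caller's variable holds. */
static void set_name(Value *v, const char *name)
{
    strncpy(v->name, name, sizeof v->name - 1);
    v->name[sizeof v->name - 1] = '\0';
}
```

If a Value is set to "fred", saved both ways, and then changed to "joe", aliased_name() follows the change while copied_name() still reports "fred". That is the same behaviour as the $ref = \&joe example above.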
Using perl_call_argv
Here is a Perl subroutine which prints whatever parameters are passed to it.
sub PrintList
{
my(@list) = @_ ;
foreach (@list) { print "$_\n" }
}
and here is an example of perl_call_argv which will call PrintList.
static char * words[] = {"alpha", "beta", "gamma", "delta", NULL} ;
static void
call_PrintList()
{
dSP ;
perl_call_argv("PrintList", G_DISCARD, words) ;
}
Note that it is not necessary to call PUSHMARK in this instance. This is because perl_call_argv will do it for
you.
Using perl_call_method
Consider the following Perl code
{
package Mine ;
sub new
{
my($type) = shift ;
bless [@_]
}
sub Display
{
my ($self, $index) = @_ ;
print "$index: $$self[$index]\n" ;
}
sub PrintID
{
my($class) = @_ ;
print "This is Class $class version 1.0\n" ;
}
}
It implements just a very simple class to manage an array. Apart from the constructor, new, it declares
two methods, one static and one virtual. The static method, PrintID, simply prints out the class name and a
version number. The virtual method, Display, prints out a single element of the array. Here is an all-Perl
example of using it.
$a = new Mine ('red', 'green', 'blue') ;
$a->Display(1) ;
PrintID Mine;
will print
1: green
This is Class Mine version 1.0
Calling a Perl method from C is fairly straightforward. The following things are required
a reference to the object for a virtual method or the name of the class for a static method.
the name of the method.
any other parameters specific to the method.
Here is a simple XSUB which illustrates the mechanics of calling both the PrintID and Display
methods from C.
void
call_Method(ref, method, index)
SV * ref
char * method
int index
CODE:
PUSHMARK(SP);
XPUSHs(ref);
XPUSHs(sv_2mortal(newSViv(index))) ;
PUTBACK;
perl_call_method(method, G_DISCARD) ;
void
call_PrintID(class, method)
char * class
char * method
CODE:
PUSHMARK(SP);
XPUSHs(sv_2mortal(newSVpv(class, 0))) ;
PUTBACK;
perl_call_method(method, G_DISCARD) ;
So the methods PrintID and Display can be invoked like this
$a = new Mine ('red', 'green', 'blue') ;
call_Method($a, 'Display', 1) ;
call_PrintID('Mine', 'PrintID') ;
The only thing to note is that in both the static and virtual methods, the method name is not passed via the
stack − it is used as the first parameter to perl_call_method.
Using GIMME_V
Here is a trivial XSUB which prints the context in which it is currently executing.
void
PrintContext()
CODE:
I32 gimme = GIMME_V;
if (gimme == G_VOID)
printf ("Context is Void\n") ;
else if (gimme == G_SCALAR)
printf ("Context is Scalar\n") ;
else
printf ("Context is Array\n") ;
and here is some Perl to test it
PrintContext ;
$a = PrintContext ;
@a = PrintContext ;
The output from that will be
Context is Void
Context is Scalar
Context is Array
Using Perl to dispose of temporaries
In the examples given to date, any temporaries created in the callback (i.e., parameters passed on the stack to
the perl_call_* function or values returned via the stack) have been freed by one of these methods
specifying the G_DISCARD flag with perl_call_*.
explicitly disposed of using the ENTER/SAVETMPS − FREETMPS/LEAVE pairing.
There is another method which can be used, namely letting Perl do it for you automatically whenever it
regains control after the callback has terminated. This is done by simply not using the
ENTER ;
SAVETMPS ;
...
FREETMPS ;
LEAVE ;
sequence in the callback (and not, of course, specifying the G_DISCARD flag).
If you are going to use this method you have to be aware of a possible memory leak which can arise under
very specific circumstances. To explain these circumstances you need to know a bit about the flow of
control between Perl and the callback routine.
The examples given at the start of the document (an error handler and an event driven program) are typical of
the two main sorts of flow control that you are likely to encounter with callbacks. There is a very important
distinction between them, so pay attention.
In the first example, an error handler, the flow of control could be as follows. You have created an interface
to an external library. Control can reach the external library like this
perl --> XSUB --> external library
Whilst control is in the library, an error condition occurs. You have previously set up a Perl callback to
handle this situation, so it will get executed. Once the callback has finished, control will drop back to Perl
again. Here is what the flow of control will be like in that situation
perl --> XSUB --> external library
                  ...
                  error occurs
                  ...
                  external library --> perl_call --> perl
                                                      |
perl <-- XSUB <-- external library <-- perl_call <----+
After processing of the error using perl_call_* is completed, control reverts back to Perl more or less
immediately.
In the diagram, the further right you go the more deeply nested the scope is. It is only when control is back
with perl on the extreme left of the diagram that you will have dropped back to the enclosing scope and any
temporaries you have left hanging around will be freed.
In the second example, an event driven program, the flow of control will be more like this
perl --> XSUB --> event handler
                  ...
                  event handler --> perl_call --> perl
                                                   |
                  event handler <-- perl_call <----+
                  ...
                  event handler --> perl_call --> perl
                                                   |
                  event handler <-- perl_call <----+
                  ...
                  event handler --> perl_call --> perl
                                                   |
                  event handler <-- perl_call <----+
In this case the flow of control can consist of only the repeated sequence
event handler --> perl_call --> perl
for practically the complete duration of the program. This means that control may never drop back to the
surrounding scope in Perl at the extreme left.
So what is the big problem? Well, if you are expecting Perl to tidy up those temporaries for you, you might
be in for a long wait. For Perl to dispose of your temporaries, control must drop back to the enclosing scope
at some stage. In the event driven scenario that may never happen. This means that as time goes on, your
program will create more and more temporaries, none of which will ever be freed. As each of these
temporaries consumes some memory your program will eventually consume all the available memory in
your system − kapow!
So here is the bottom line − if you are sure that control will revert back to the enclosing Perl scope fairly
quickly after the end of your callback, then it isn't absolutely necessary to dispose explicitly of any
temporaries you may have created. Mind you, if you are at all uncertain about what to do, it doesn't do any
harm to tidy up anyway.
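To make the leak concrete, here is a minimal sketch in plain C. It is a toy model, not Perl's implementation: tmps_stack, make_temp, save_tmps and free_tmps are hypothetical names that stand in for the temporaries stack, sv_2mortal, SAVETMPS and FREETMPS. The event loop never returns to an enclosing scope, so only the callback that tidies up after itself keeps the count of live temporaries at zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a temporaries stack; not Perl's implementation. */
#define TMPS_MAX 1024

static int    tmps_stack[TMPS_MAX];
static size_t tmps_top = 0;             /* number of live temporaries */

/* Like sv_2mortal(): register a value to be freed later. */
static void make_temp(int value)
{
    assert(tmps_top < TMPS_MAX);
    tmps_stack[tmps_top++] = value;
}

/* Like SAVETMPS: remember where this scope's temporaries start. */
static size_t save_tmps(void)
{
    return tmps_top;
}

/* Like FREETMPS: release everything created since the saved mark. */
static void free_tmps(size_t floor)
{
    while (tmps_top > floor)
        --tmps_top;
}

/* A callback that creates two temporaries and tidies up after itself. */
static void tidy_callback(int arg)
{
    size_t floor = save_tmps();
    make_temp(arg);
    make_temp(arg * 2);
    free_tmps(floor);
}

/* The same callback, relying on the enclosing scope to clean up. */
static void lazy_callback(int arg)
{
    make_temp(arg);
    make_temp(arg * 2);
}

/* An event loop that never returns to the enclosing scope between events. */
static size_t run_events(void (*callback)(int), int events)
{
    int i;
    for (i = 0; i < events; ++i)
        callback(i);
    return tmps_top;            /* temporaries still alive after the loop */
}
```

In this model, run_events(tidy_callback, 100) leaves no temporaries behind, whereas run_events(lazy_callback, 100) leaves 200, and that number keeps growing for as long as the loop runs.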
Strategies for storing Callback Context Information
Potentially one of the trickiest problems to overcome when designing a callback interface can be figuring out
how to store the mapping between the C callback function and the Perl equivalent.
To help understand why this can be a real problem first consider how a callback is set up in an all C
environment. Typically a C API will provide a function to register a callback. This will expect a pointer to a
function as one of its parameters. Below is a call to a hypothetical function register_fatal which
registers the C function to get called when a fatal error occurs.
register_fatal(cb1) ;
The single parameter cb1 is a pointer to a function, so you must have defined cb1 in your code, say
something like this
static void
cb1()
{
printf ("Fatal Error\n") ;
exit(1) ;
}
Now change that to call a Perl subroutine instead
static SV * callback = (SV*)NULL;
static void
cb1()
{
dSP ;
PUSHMARK(SP) ;
/* Call the Perl sub to process the callback */
perl_call_sv(callback, G_DISCARD) ;
}
void
register_fatal(fn)
SV * fn
CODE:
/* Remember the Perl sub */
if (callback == (SV*)NULL)
callback = newSVsv(fn) ;
else
SvSetSV(callback, fn) ;
/* register the callback with the external library */
register_fatal(cb1) ;
where the Perl equivalent of register_fatal and the callback it registers, pcb1, might look like this
# Register the sub pcb1
register_fatal(\&pcb1) ;
sub pcb1
{
die "I’m dying...\n" ;
}
The mapping between the C callback and the Perl equivalent is stored in the global variable callback.
This will be adequate if you ever need to have only one callback registered at any time. An example could be
an error handler like the code sketched out above. Remember though, repeated calls to register_fatal
will replace the previously registered callback function with the new one.
Say for example you want to interface to a library which allows asynchronous file i/o. In this case you may
be able to register a callback whenever a read operation has completed. To be of any use we want to be able
to call separate Perl subroutines for each file that is opened. As it stands, the error handler example above
would not be adequate as it allows only a single callback to be defined at any time. What we require is a
means of storing the mapping between the opened file and the Perl subroutine we want to be called for that
file.
Say the i/o library has a function asynch_read which associates a C function ProcessRead with a file
handle fh − this assumes that it has also provided some routine to open the file and so obtain the file handle.
asynch_read(fh, ProcessRead)
It may expect the C function ProcessRead to have this form
void
ProcessRead(fh, buffer)
int fh ;
char * buffer ;
{
...
}
To provide a Perl interface to this library we need to be able to map between the fh parameter and the Perl
subroutine we want called. A hash is a convenient mechanism for storing this mapping. The code below
shows a possible implementation
static HV * Mapping = (HV*)NULL ;
void
asynch_read(fh, callback)
int fh
SV * callback
CODE:
/* If the hash doesn’t already exist, create it */
if (Mapping == (HV*)NULL)
Mapping = newHV() ;
/* Save the fh -> callback mapping */
hv_store(Mapping, (char*)&fh, sizeof(fh), newSVsv(callback), 0) ;
/* Register with the C Library */
asynch_read(fh, asynch_read_if) ;
and asynch_read_if could look like this
static void
asynch_read_if(fh, buffer)
int fh ;
char * buffer ;
{
dSP ;
SV ** sv ;
/* Get the callback associated with fh */
sv = hv_fetch(Mapping, (char*)&fh , sizeof(fh), FALSE) ;
if (sv == (SV**)NULL)
croak("Internal error...\n") ;
PUSHMARK(SP) ;
XPUSHs(sv_2mortal(newSViv(fh))) ;
XPUSHs(sv_2mortal(newSVpv(buffer, 0))) ;
PUTBACK ;
/* Call the Perl sub */
perl_call_sv(*sv, G_DISCARD) ;
}
For completeness, here is asynch_close. This shows how to remove the entry from the hash Mapping.
void
asynch_close(fh)
int fh
CODE:
/* Remove the entry from the hash */
(void) hv_delete(Mapping, (char*)&fh, sizeof(fh), G_DISCARD) ;
/* Now call the real asynch_close */
asynch_close(fh) ;
So the Perl interface would look like this
sub callback1
{
my($handle, $buffer) = @_ ;
}
# Register the Perl callback
asynch_read($fh, \&callback1) ;
asynch_close($fh) ;
The mapping between the C callback and Perl is stored in the global hash Mapping this time. Using a hash
has the distinct advantage that it allows an unlimited number of callbacks to be registered.
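The shape of that solution (one C entry point that looks up the real handler by file handle) can be sketched in plain C. This is only a model under stated assumptions: a small fixed-size table with a linear search stands in for the Perl hash, C function pointers stand in for the Perl subroutines, and every name here (Slot, register_read, dispatch_read, unregister_read, on_read_a, on_read_b) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; a plain C stand-in for the Perl hash. */
#define SLOTS 16                       /* fixed size keeps the sketch short */

typedef void (*ReadCallback)(int fh, const char *buffer);

struct Slot {
    int          used;
    int          fh;
    ReadCallback callback;
};

static struct Slot mapping[SLOTS];

/* Find the entry for fh, or NULL if none is registered. */
static struct Slot *lookup(int fh)
{
    size_t i;
    for (i = 0; i < SLOTS; ++i)
        if (mapping[i].used && mapping[i].fh == fh)
            return &mapping[i];
    return NULL;
}

/* Like the asynch_read XSUB: save or replace the fh -> callback mapping. */
static int register_read(int fh, ReadCallback callback)
{
    struct Slot *s = lookup(fh);
    size_t i;

    assert(callback != NULL);
    for (i = 0; s == NULL && i < SLOTS; ++i)
        if (!mapping[i].used)
            s = &mapping[i];
    if (s == NULL)
        return -1;                     /* table full */
    s->used = 1;
    s->fh = fh;
    s->callback = callback;
    return 0;
}

/* Like asynch_read_if: the single C entry point that dispatches on fh. */
static int dispatch_read(int fh, const char *buffer)
{
    struct Slot *s = lookup(fh);
    if (s == NULL)
        return -1;                     /* no callback registered for fh */
    s->callback(fh, buffer);
    return 0;
}

/* Like asynch_close: forget the mapping for fh. */
static void unregister_read(int fh)
{
    struct Slot *s = lookup(fh);
    if (s != NULL)
        s->used = 0;
}

/* Two sample callbacks that record which handle they were called for. */
static int last_fh_a = -1;
static int last_fh_b = -1;
static void on_read_a(int fh, const char *buffer) { (void)buffer; last_fh_a = fh; }
static void on_read_b(int fh, const char *buffer) { (void)buffer; last_fh_b = fh; }
```

Unlike a real hash, this table has a fixed capacity; it is used here only so that the sketch stays self-contained.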
What if the interface provided by the C callback doesn't contain a parameter which allows the file handle to
Perl subroutine mapping? Say in the asynchronous i/o package, the callback function gets passed only the
buffer parameter like this
void
ProcessRead(buffer)
char * buffer ;
{
...
}
Without the file handle there is no straightforward way to map from the C callback to the Perl subroutine.
In this case a possible way around this problem is to predefine a series of C functions to act as the interface
to Perl, thus
#define MAX_CB 3
#define NULL_HANDLE -1
typedef void (*FnMap)() ;
struct MapStruct {
FnMap Function ;
SV * PerlSub ;
int Handle ;
} ;
static void fn1() ;
static void fn2() ;
static void fn3() ;
static struct MapStruct Map [MAX_CB] =
{
{ fn1, NULL, NULL_HANDLE },
{ fn2, NULL, NULL_HANDLE },
{ fn3, NULL, NULL_HANDLE }
} ;
static void
Pcb(index, buffer)
int index ;
char * buffer ;
{
dSP ;
PUSHMARK(SP) ;
XPUSHs(sv_2mortal(newSVpv(buffer, 0))) ;
PUTBACK ;
/* Call the Perl sub */
perl_call_sv(Map[index].PerlSub, G_DISCARD) ;
}
static void
fn1(buffer)
char * buffer ;
{
Pcb(0, buffer) ;
}
static void
fn2(buffer)
char * buffer ;
{
Pcb(1, buffer) ;
}
static void
fn3(buffer)
char * buffer ;
{
Pcb(2, buffer) ;
}
void
array_asynch_read(fh, callback)
int fh
SV * callback
CODE:
int index ;
int null_index = MAX_CB ;
/* Find the same handle or an empty entry */
for (index = 0 ; index < MAX_CB ; ++index)
{
if (Map[index].Handle == fh)
break ;
if (Map[index].Handle == NULL_HANDLE)
null_index = index ;
}
if (index == MAX_CB && null_index == MAX_CB)
croak ("Too many callback functions registered\n") ;
if (index == MAX_CB)
index = null_index ;
/* Save the file handle */
Map[index].Handle = fh ;
/* Remember the Perl sub */
if (Map[index].PerlSub == (SV*)NULL)
Map[index].PerlSub = newSVsv(callback) ;
else
SvSetSV(Map[index].PerlSub, callback) ;
asynch_read(fh, Map[index].Function) ;
void
array_asynch_close(fh)
int fh
CODE:
int index ;
/* Find the file handle */
for (index = 0; index < MAX_CB ; ++ index)
if (Map[index].Handle == fh)
break ;
if (index == MAX_CB)
croak ("could not close fh %d\n", fh) ;
Map[index].Handle = NULL_HANDLE ;
SvREFCNT_dec(Map[index].PerlSub) ;
Map[index].PerlSub = (SV*)NULL ;
asynch_close(fh) ;
In this case the functions fn1, fn2, and fn3 are used to remember the Perl subroutine to be called. Each of
the functions holds a separate hard−wired index which is used in the function Pcb to access the Map array
and actually call the Perl subroutine.
There are some obvious disadvantages with this technique.
Firstly, the code is considerably more complex than with the previous example.
Secondly, there is a hard−wired limit (in this case 3) to the number of callbacks that can exist
simultaneously. The only way to increase the limit is by modifying the code to add more functions and then
recompiling. None the less, as long as the number of functions is chosen with some care, it is still a
workable solution and in some cases is the only one available.
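Stripped of the Perl-specific parts, the trick looks like the following plain C sketch. It is a model only: C function pointers take the place of the Perl subroutines, and handlers, dispatch, claim_slot and record are hypothetical names that do not appear in the example above. Each of fn1, fn2 and fn3 carries its own hard-wired index, which is the one piece of context the library's callback signature cannot supply.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; a plain C model of the Map/fn1..fn3 arrangement. */
#define MAX_CB 3

typedef void (*BufferCallback)(const char *buffer);  /* what the library calls */
typedef void (*UserHandler)(int index, const char *buffer);

static UserHandler handlers[MAX_CB];   /* plays the role of Map[].PerlSub */

/* Common back end: the index tells us which handler was meant. */
static void dispatch(int index, const char *buffer)
{
    assert(index >= 0 && index < MAX_CB);
    if (handlers[index] != NULL)
        handlers[index](index, buffer);
}

/* One trampoline per slot, each with its index hard-wired. */
static void fn1(const char *buffer) { dispatch(0, buffer); }
static void fn2(const char *buffer) { dispatch(1, buffer); }
static void fn3(const char *buffer) { dispatch(2, buffer); }

static const BufferCallback trampolines[MAX_CB] = { fn1, fn2, fn3 };

/* Claim a free slot. Returns the trampoline to hand to the library,
   or NULL when all MAX_CB slots are already in use. */
static BufferCallback claim_slot(UserHandler handler)
{
    int i;
    for (i = 0; i < MAX_CB; ++i) {
        if (handlers[i] == NULL) {
            handlers[i] = handler;
            return trampolines[i];
        }
    }
    return NULL;
}

/* Sample handler that records which slot it was reached through. */
static int  seen_index = -1;
static char seen_text[32];

static void record(int index, const char *buffer)
{
    seen_index = index;
    strncpy(seen_text, buffer, sizeof seen_text - 1);
    seen_text[sizeof seen_text - 1] = '\0';
}
```

A fourth call to claim_slot returns NULL, which is the hard-wired limit described above. Raising it means adding another trampoline and recompiling.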
To summarize, here are a number of possible methods for you to consider for storing the mapping between C
and the Perl callback
1. Ignore the problem − Allow only 1 callback
For a lot of situations, like interfacing to an error handler, this may be a perfectly adequate solution.
2. Create a sequence of callbacks − hard wired limit
If it is impossible to tell from the parameters passed back from the C callback what the context is,
then you may need to create a sequence of C callback interface functions, and store pointers to each
in an array.
3. Use a parameter to map to the Perl callback
A hash is an ideal mechanism to store the mapping between C and Perl.
Alternate Stack Manipulation
Although I have made use of only the POP* macros to access values returned from Perl subroutines, it is
also possible to bypass these macros and read the stack using the ST macro (See perlxs for a full description
of the ST macro).
Most of the time the POP* macros should be adequate; the main problem with them is that they force you to
process the returned values in sequence. This may not be the most suitable way to process the values in some
cases. What we want is to be able to access the stack in a random order. The ST macro as used when coding
an XSUB is ideal for this purpose.
The code below is the example given in the section Returning a list of values recoded to use ST instead of
POP*.
static void
call_AddSubtract2(a, b)
int a ;
int b ;
{
dSP ;
I32 ax ;
int count ;
ENTER ;
SAVETMPS;
PUSHMARK(SP) ;
XPUSHs(sv_2mortal(newSViv(a)));
XPUSHs(sv_2mortal(newSViv(b)));
PUTBACK ;
count = perl_call_pv("AddSubtract", G_ARRAY);
SPAGAIN ;
SP -= count ;
ax = (SP - PL_stack_base) + 1 ;
if (count != 2)
croak("Big trouble\n") ;
printf ("%d + %d = %d\n", a, b, SvIV(ST(0))) ;
printf ("%d - %d = %d\n", a, b, SvIV(ST(1))) ;
PUTBACK ;
FREETMPS ;
LEAVE ;
}
Notes
1. Notice that it was necessary to define the variable ax. This is because the ST macro expects it to
exist. If we were in an XSUB it would not be necessary to define ax as it is already defined for you.
2. The code
SPAGAIN ;
SP -= count ;
ax = (SP - PL_stack_base) + 1 ;
sets the stack up so that we can use the ST macro.
3. Unlike the original coding of this example, the returned values are not accessed in reverse order. So
ST(0) refers to the first value returned by the Perl subroutine and ST(count-1) refers to the last.
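The index arithmetic in note 2 is easier to follow on a toy stack. The sketch below is plain C and only imitates the idea: stack_base, sp, PUSH, POP and ST here are hypothetical miniature versions, not Perl's macros, and call_add_subtract is a stand-in for calling the Perl subroutine. As on the Perl stack, sp points at the top element, so the first returned value sits one slot above the position that sp is wound back to.

```c
#include <assert.h>

/* Hypothetical miniature stack; the names echo Perl's but this is not its API. */
static int  stack_base[16];
static int *sp = stack_base;            /* points at the top element */

#define PUSH(v)  (*++sp = (v))
#define POP()    (*sp--)
/* Like ST(n): index from the first returned value, using ax. */
#define ST(n)    (stack_base[ax + (n)])

/* Stand-in for calling AddSubtract: pushes a+b, then a-b, returns the count. */
static int call_add_subtract(int a, int b)
{
    PUSH(a + b);
    PUSH(a - b);
    return 2;
}

/* Read both results in call order, as call_AddSubtract2 does. */
static void add_subtract(int a, int b, int *sum, int *diff)
{
    int ax;
    int count = call_add_subtract(a, b);

    assert(count == 2);
    sp -= count;                        /* SP -= count                   */
    ax = (int)(sp - stack_base) + 1;    /* ax = (SP - PL_stack_base) + 1 */

    *sum  = ST(0);                      /* first value pushed  */
    *diff = ST(1);                      /* second value pushed */
}

/* For contrast: the POP-style version must take the values in reverse. */
static void add_subtract_pop(int a, int b, int *sum, int *diff)
{
    int count = call_add_subtract(a, b);

    assert(count == 2);
    *diff = POP();                      /* last value comes off first */
    *sum  = POP();
}
```

Both functions leave sp where it was before the call; they differ only in the order in which the results can be read.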
Creating and calling an anonymous subroutine in C
As we've already shown, perl_call_sv can be used to invoke an anonymous subroutine. However, our
example showed a Perl script invoking an XSUB to perform this operation. Let's see how it can be done
inside our C code:
...
SV *cvrv = perl_eval_pv("sub { print ’You will not find me cluttering any namespace!
...
perl_call_sv(cvrv, G_VOID|G_NOARGS);
perl_eval_pv is used to compile the anonymous subroutine, which will be the return value as well (read
more about perl_eval_pv in its entry in perlguts). Once this code reference is in hand, it can be mixed in with
all the previous examples we've shown.
SEE ALSO
perlxs, perlguts, perlembed
AUTHOR
Paul Marquess
#include <EXTERN.h>               /* from the Perl distribution     */
#include <perl.h>                 /* from the Perl distribution     */
static PerlInterpreter *my_perl;  /***    The Perl interpreter    ***/
int main(int argc, char **argv, char **env)
{
my_perl = perl_alloc();
perl_construct(my_perl);
perl_parse(my_perl, NULL, argc, argv, (char **)NULL);
perl_run(my_perl);
perl_destruct(my_perl);
perl_free(my_perl);
}
Notice that we don't use the env pointer. Normally handed to perl_parse as its final argument, env
here is replaced by NULL, which means that the current environment will be used.
Now compile this program (I'll call it interp.c) into an executable:
% cc -o interp interp.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
After a successful compilation, you'll be able to use interp just like perl itself:
% interp
print "Pretty Good Perl \n";
print "10890 - 9801 is ", 10890 - 9801;
Pretty Good Perl
10890 - 9801 is 1089
or
% interp -e 'printf("%x", 3735928559)'
deadbeef
You can also read and execute Perl statements from a file while in the midst of your C program, by placing
the filename in argv[1] before calling perl_run.
Calling a Perl subroutine from your C program
To call individual Perl subroutines, you can use any of the perl_call_* functions documented in perlcall. In
this example we'll use perl_call_argv.
That's shown below, in a program I'll call showtime.c.
#include <EXTERN.h>
#include <perl.h>
static PerlInterpreter *my_perl;
int main(int argc, char **argv, char **env)
{
char *args[] = { NULL };
my_perl = perl_alloc();
perl_construct(my_perl);
perl_parse(my_perl, NULL, argc, argv, NULL);
/*** skipping perl_run() ***/
perl_call_argv("showtime", G_DISCARD | G_NOARGS, args);
perl_destruct(my_perl);
perl_free(my_perl);
}
where showtime is a Perl subroutine that takes no arguments (that's the G_NOARGS) and for which I'll
ignore the return value (that's the G_DISCARD). Those flags, and others, are discussed in perlcall.
I'll define the showtime subroutine in a file called showtime.pl:
print "I shan’t be printed.";
sub showtime {
print time;
}
Simple enough. Now compile and run:
% cc -o showtime showtime.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
% showtime showtime.pl
818284590
yielding the number of seconds that elapsed between January 1, 1970 (the beginning of the Unix epoch), and
the moment I began writing this sentence.
In this particular case we don't have to call perl_run, but in general it's considered good practice to ensure
proper initialization of library code, including execution of all object DESTROY methods and package END
{} blocks.
If you want to pass arguments to the Perl subroutine, you can add strings to the NULL-terminated args list
passed to perl_call_argv. For other data types, or to examine return values, you'll need to manipulate the
Perl stack. That's demonstrated in the last section of this document:
Fiddling with the Perl stack from your C program.
Evaluating a Perl statement from your C program
Perl provides two API functions to evaluate pieces of Perl code. These are perl_eval_sv and perl_eval_pv.
Arguably, these are the only routines you'll ever need to execute snippets of Perl code from within your C
program. Your code can be as long as you wish; it can contain multiple statements; it can employ use,
require, and do to include external Perl files.
perl_eval_pv lets us evaluate individual Perl strings, and then extract variables for coercion into C types.
The following program, string.c, executes three Perl strings, extracting an int from the first, a float from
the second, and a char * from the third.
#include <EXTERN.h>
#include <perl.h>
static PerlInterpreter *my_perl;
int main (int argc, char **argv, char **env)
{
char *embedding[] = { "", "-e", "0" };
my_perl = perl_alloc();
perl_construct( my_perl );
perl_parse(my_perl, NULL, 3, embedding, NULL);
perl_run(my_perl);
/** Treat $a as an integer **/
perl_eval_pv("$a = 3; $a **= 2", TRUE);
printf("a = %d\n", SvIV(perl_get_sv("a", FALSE)));
/** Treat $a as a float **/
perl_eval_pv("$a = 3.14; $a **= 2", TRUE);
printf("a = %f\n", SvNV(perl_get_sv("a", FALSE)));
/** Treat $a as a string **/
perl_eval_pv("$a = 'rekcaH lreP rehtonA tsuJ'; $a = reverse($a);", TRUE);
printf("a = %s\n", SvPV(perl_get_sv("a", FALSE), PL_na));
perl_destruct(my_perl);
perl_free(my_perl);
}
All of those strange functions with sv in their names help convert Perl scalars to C types. They're described
in perlguts.
If you compile and run string.c, you'll see the results of using SvIV() to create an int, SvNV() to create
a float, and SvPV() to create a string:
a = 9
a = 9.859600
a = Just Another Perl Hacker
In the example above, we've created a global variable to temporarily store the computed value of our eval'd
expression. It is also possible and in most cases a better strategy to fetch the return value from
perl_eval_pv() instead. Example:
...
SV *val = perl_eval_pv("reverse 'rekcaH lreP rehtonA tsuJ'", TRUE);
printf("%s\n", SvPV(val,PL_na));
...
This way, we avoid namespace pollution by not creating global variables and we've simplified our code as
well.
Performing Perl pattern matches and substitutions from your C program
The perl_eval_sv() function lets us evaluate strings of Perl code, so we can define some functions that
use it to "specialize" in matches and substitutions: match(), substitute(), and matches().
I32 match(SV *string, char *pattern);
Given a string and a pattern (e.g., m/clasp/ or /\b\w*\b/, which in your C program might appear as
"/\\b\\w*\\b/"), match() returns 1 if the string matches the pattern and 0 otherwise.
int substitute(SV **string, char *pattern);
Given a pointer to an SV and an =~ operation (e.g., s/bob/robert/g or tr[A-Z][a-z]),
substitute() modifies the string within the SV according to the operation, returning the number of
substitutions made.
int matches(SV *string, char *pattern, AV **matches);
Given an SV, a pattern, and a pointer to an empty AV, matches() evaluates $string =~ $pattern in
an array context, and fills in matches with the array elements, returning the number of matches found.
Here's a sample program, match.c, that uses all three (long lines have been wrapped here):
#include <EXTERN.h>
#include <perl.h>
/** my_perl_eval_sv(code, error_check)
** kinda like perl_eval_sv(),
** but we pop the return value off the stack
**/
SV* my_perl_eval_sv(SV *sv, I32 croak_on_error)
{
dSP;
SV* retval;
PUSHMARK(SP);
perl_eval_sv(sv, G_SCALAR);
SPAGAIN;
retval = POPs;
PUTBACK;
if (croak_on_error && SvTRUE(ERRSV))
croak("%s", SvPVx(ERRSV, PL_na)); /* never use the message itself as a format */
return retval;
}
/** match(string, pattern)
**
** Used for matches in a scalar context.
**
** Returns 1 if the match was successful; 0 otherwise.
**/
I32 match(SV *string, char *pattern)
{
SV *command = NEWSV(1099, 0), *retval;
sv_setpvf(command, "my $string = '%s'; $string =~ %s",
SvPV(string,PL_na), pattern);
retval = my_perl_eval_sv(command, TRUE);
SvREFCNT_dec(command);
return SvIV(retval);
}
/** substitute(string, pattern)
**
** Used for =~ operations that modify their left-hand side (s/// and tr///)
**
** Returns the number of successful matches, and
** modifies the input string if there were any.
**/
I32 substitute(SV **string, char *pattern)
{
SV *command = NEWSV(1099, 0), *retval;
sv_setpvf(command, "$string = '%s'; ($string =~ %s)",
SvPV(*string,PL_na), pattern);
retval = my_perl_eval_sv(command, TRUE);
SvREFCNT_dec(command);
*string = perl_get_sv("string", FALSE);
return SvIV(retval);
}
/** matches(string, pattern, matches)
**
** Used for matches in an array context.
**
** Returns the number of matches,
** and fills in **matches with the matching substrings
**/
I32 matches(SV *string, char *pattern, AV **match_list)
{
SV *command = NEWSV(1099, 0);
I32 num_matches;
sv_setpvf(command, "my $string = '%s'; @array = ($string =~ %s)",
SvPV(string,PL_na), pattern);
my_perl_eval_sv(command, TRUE);
SvREFCNT_dec(command);
*match_list = perl_get_av("array", FALSE);
num_matches = av_len(*match_list) + 1; /** assume $[ is 0 **/
return num_matches;
}
int main (int argc, char **argv, char **env)
{
PerlInterpreter *my_perl = perl_alloc();
char *embedding[] = { "", "-e", "0" };
AV *match_list;
I32 num_matches, i;
SV *text = NEWSV(1099,0);
perl_construct(my_perl);
perl_parse(my_perl, NULL, 3, embedding, NULL);
sv_setpv(text, "When he is at a convenience store and the bill comes to some amo
if (match(text, "m/quarter/")) /** Does text contain 'quarter'? **/
printf("match: Text contains the word 'quarter'.\n\n");
else
printf("match: Text doesn't contain the word 'quarter'.\n\n");
if (match(text, "m/eighth/")) /** Does text contain 'eighth'? **/
printf("match: Text contains the word 'eighth'.\n\n");
else
printf("match: Text doesn't contain the word 'eighth'.\n\n");
/** Match all occurrences of /wi../ **/
num_matches = matches(text, "m/(wi..)/g", &match_list);
printf("matches: m/(wi..)/g found %d matches...\n", num_matches);
for (i = 0; i < num_matches; i++)
printf("match: %s\n", SvPV(*av_fetch(match_list, i, FALSE),PL_na));
printf("\n");
/** Remove all vowels from text **/
num_matches = substitute(&text, "s/[aeiou]//gi");
if (num_matches) {
printf("substitute: s/[aeiou]//gi...%d substitutions made.\n",
num_matches);
printf("Now text is: %s\n\n", SvPV(text,PL_na));
}
/** Attempt a substitution **/
if (!substitute(&text, "s/Perl/C/")) {
printf("substitute: s/Perl/C...No substitution made.\n\n");
}
SvREFCNT_dec(text);
PL_perl_destruct_level = 1;
perl_destruct(my_perl);
perl_free(my_perl);
}
which produces the output (again, long lines have been wrapped here)
match: Text contains the word 'quarter'.
match: Text doesn't contain the word 'eighth'.
matches: m/(wi..)/g found 2 matches...
match: will
match: with
substitute: s/[aeiou]//gi...139 substitutions made.
Now text is: Whn h s t cnvnnc str nd th bll cms t sm mnt lk 76 cnts,
Mynrd s wr tht thr s smthng h *shld* d, smthng tht wll nbl hm t gt bck
qrtr, bt h hs n d *wht*. H fmbls thrgh hs rd sqzy chngprs nd gvs th by
thr xtr pnns wth hs dllr, hpng tht h mght lck nt th crrct mnt. Th by gvs
hm bck tw f hs wn pnns nd thn th bg shny qrtr tht s hs prz. -RCHH
substitute: s/Perl/C...No substitution made.
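One caveat about match(), substitute(), and matches() as written: each pastes the subject text between single quotes in the Perl code it generates, so a subject containing ' or \ would end the literal early or change its meaning. One way to guard against that is to escape the text before it is handed to sv_setpvf(). The plain-C sketch below assumes the caller does exactly that; escape_single_quoted() is a hypothetical name, not a Perl API function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Perl API call): copies src into dst, putting a
 * backslash before every ' and \ so the result can sit between single
 * quotes in Perl source. Returns the escaped length, or -1 if dst is too
 * small (dst may then hold a partial copy). */
static long escape_single_quoted(const char *src, char *dst, size_t dstsize)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++) {
        size_t need = (*src == '\'' || *src == '\\') ? 2 : 1;

        if (out + need + 1 > dstsize)   /* +1 leaves room for the final NUL */
            return -1;
        if (need == 2)
            dst[out++] = '\\';          /* the escaping backslash */
        dst[out++] = *src;
    }
    if (out + 1 > dstsize)              /* only possible when dstsize is 0 */
        return -1;
    dst[out] = '\0';
    return (long)out;
}
```

The escaped buffer would then take the place of SvPV(string,PL_na) in the sv_setpvf() calls. Only ' and \ need attention, because those are the only characters Perl treats specially inside a single-quoted string.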
Fiddling with the Perl stack from your C program
When trying to explain stacks, most computer science textbooks mumble something about spring-loaded
columns of cafeteria plates: the last thing you pushed on the stack is the first thing you pop off. That'll do
for our purposes: your C program will push some arguments onto "the Perl stack", shut its eyes while some
magic happens, and then pop the results (the return value of your Perl subroutine) off the stack.
First you'll need to know how to convert between C types and Perl types, with newSViv() and
sv_setnv() and newAV() and all their friends. They're described in perlguts.
Then you'll need to know how to manipulate the Perl stack. That's described in perlcall.
Once you've understood those, embedding Perl in C is easy.
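Before meeting the real macros, it may help to see the push/call/pop rhythm in a toy model. The sketch below is plain C with an array of ints standing in for the stack; Perl's real stack holds SV pointers and is driven by the macros described in perlcall. Every toy_* name is hypothetical, and the stand-in subroutine just raises one integer to the power of another.

```c
#include <assert.h>

#define TOY_STACK_MAX 16

/* A toy model only; all toy_* names are hypothetical. */
static int toy_stack[TOY_STACK_MAX];
static int toy_sp = 0;              /* index of the next free slot */

static void toy_push(int value)
{
    assert(toy_sp < TOY_STACK_MAX); /* no room left would be a caller bug */
    toy_stack[toy_sp++] = value;
}

static int toy_pop(void)
{
    assert(toy_sp > 0);             /* popping an empty stack is a bug */
    return toy_stack[--toy_sp];
}

/* Stands in for the Perl subroutine: pops the exponent, then the base
 * (last in, first out), and pushes base ** exponent as its return value. */
static void toy_expo(void)
{
    int exponent = toy_pop();
    int base = toy_pop();
    int result = 1;

    while (exponent-- > 0)
        result *= base;
    toy_push(result);
}

/* The caller's side: push the arguments, call, pop the result. */
static int toy_power(int base, int exponent)
{
    toy_push(base);
    toy_push(exponent);
    toy_expo();
    return toy_pop();
}
```

Calling toy_power(3, 4) returns 81 and leaves the toy stack empty again. Keeping that balance is the caller's job on the real Perl stack too.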
Because C has no builtin function for integer exponentiation, let's make Perl's ** operator available to it
(this is less useful than it sounds, because Perl implements ** with C's pow() function). First I'll create a
stub exponentiation function in power.pl:
sub expo {
my ($a, $b) = @_;
return $a ** $b;
}
Now I'll create a C program, power.c, with a function PerlPower() that contains all the perlguts
necessary to push the two arguments into expo() and to pop the return value out. Take a deep breath...
#include <EXTERN.h>
#include <perl.h>
static PerlInterpreter *my_perl;
static void
PerlPower(int a, int b)
{
dSP;                            /* initialize stack pointer         */
ENTER;                          /* everything created after here    */
SAVETMPS;                       /* ...is a temporary variable.      */
PUSHMARK(SP);                   /* remember the stack pointer       */
XPUSHs(sv_2mortal(newSViv(a))); /* push the base onto the stack     */
XPUSHs(sv_2mortal(newSViv(b))); /* push the exponent onto stack     */
PUTBACK;                        /* make local stack pointer global  */
perl_call_pv("expo", G_SCALAR); /* call the function                */
SPAGAIN;                        /* refresh stack pointer            */
/* pop the return value from stack */
printf ("%d to the %dth power is %d.\n", a, b, POPi);
PUTBACK;
FREETMPS;                       /* free that return value           */
LEAVE;                          /* ...and the XPUSHed "mortal" args.*/
}
int main (int argc, char **argv, char **env)
{
char *my_argv[] = { "", "power.pl" };
my_perl = perl_alloc();
perl_construct( my_perl );
perl_parse(my_perl, NULL, 2, my_argv, (char **)NULL);
perl_run(my_perl);
PerlPower(3, 4); /*** Compute 3 ** 4 ***/
perl_destruct(my_perl);
perl_free(my_perl);
}
Compile and run:
% cc -o power power.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
% power
3 to the 4th power is 81.
Maintaining a persistent interpreter
When developing interactive and/or potentially long-running applications, it's a good idea to maintain a
persistent interpreter rather than allocating and constructing a new interpreter multiple times. The major
reason is speed: Perl will only be loaded into memory once.
However, you have to be more cautious with namespace and variable scoping when using a persistent
interpreter. In previous examples we've been using global variables in the default package main. We knew
exactly what code would be run, and assumed we could avoid variable collisions and outrageous symbol
table growth.
Let's say your application is a server that will occasionally run Perl code from some arbitrary file. Your
server has no way of knowing what code it's going to run. Very dangerous.
If the file is pulled in by perl_parse(), compiled into a newly constructed interpreter, and then
cleaned out with perl_destruct(), you're shielded from most namespace troubles.
One way to avoid namespace collisions in this scenario is to translate the filename into a guaranteed-unique
package name, and then compile the code into that package using eval. In the example below, each file will
only be compiled once. Or, the application might choose to clean out the symbol table associated with the
file after it's no longer needed. Using perl_call_argv, we'll call the subroutine
Embed::Persistent::eval_file which lives in the file persistent.pl and pass the filename
and boolean cleanup/cache flag as arguments.
Note that the process will continue to grow for each file that it uses. In addition, there might be
AUTOLOADed subroutines and other conditions that cause Perl's symbol table to grow. You might want to
add some logic that keeps track of the process size, or restarts itself after a certain number of requests, to
ensure that memory consumption is minimized. You'll also want to scope your variables with my whenever
possible.
package Embed::Persistent;
#persistent.pl
use strict;
use vars '%Cache';
use Symbol qw(delete_package);
sub valid_package_name {
my($string) = @_;
$string =~ s/([^A-Za-z0-9\/])/sprintf("_%2x",unpack("C",$1))/eg;
# second pass only for words starting with a digit
$string =~ s|/(\d)|sprintf("/_%2x",unpack("C",$1))|eg;
# Dress it up as a real package name
$string =~ s|/|::|g;
return "Embed" . $string;
}
sub eval_file {
my($filename, $delete) = @_;
my $package = valid_package_name($filename);
my $mtime = -M $filename;
if(defined $Cache{$package}{mtime}
&&
$Cache{$package}{mtime} <= $mtime)
{
# we have compiled this subroutine already,
# it has not been updated on disk, nothing left to do
print STDERR "already compiled $package->handler\n";
}
else {
local *FH;
open FH, $filename or die "open '$filename' $!";
local($/) = undef;
my $sub = <FH>;
close FH;
#wrap the code into a subroutine inside our unique package
my $eval = qq{package $package; sub handler { $sub; }};
{
# hide our variables within this block
my($filename,$mtime,$package,$sub);
eval $eval;
}
die $@ if $@;
#cache it unless we're cleaning out each time
$Cache{$package}{mtime} = $mtime unless $delete;
}
eval {$package->handler;};
die $@ if $@;
delete_package($package) if $delete;
#take a look if you want
#print Devel::Symdump->rnew($package)->as_string, $/;
}
1;
__END__
/* persistent.c */
#include <EXTERN.h>
#include <perl.h>
/* "1" = clean out filename's symbol table after each request, "0" = don't.
 * A string, because it is passed to Perl as an element of args[]. */
#ifndef DO_CLEAN
#define DO_CLEAN "0"
#endif
static PerlInterpreter *perl = NULL;
int
main(int argc, char **argv, char **env)
{
char *embedding[] = { "", "persistent.pl" };
char *args[] = { "", DO_CLEAN, NULL };
char filename [1024];
int exitstatus = 0;
if((perl = perl_alloc()) == NULL) {
fprintf(stderr, "no memory!");
exit(1);
}
perl_construct(perl);
exitstatus = perl_parse(perl, NULL, 2, embedding, NULL);
if(!exitstatus) {
exitstatus = perl_run(perl);
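while(printf("Enter file name: ") && fgets(filename, sizeof filename, stdin)) {
/* fgets keeps the trailing newline; strip it */
filename[strcspn(filename, "\n")] = '\0';
/* call the subroutine, passing it the filename as an argument */
args[0] = filename;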
perl_call_argv("Embed::Persistent::eval_file",
G_DISCARD | G_EVAL, args);
/* check $@ */
if(SvTRUE(ERRSV))
fprintf(stderr, "eval error: %s\n", SvPV(ERRSV,PL_na));
}
}
PL_perl_destruct_level = 0;
perl_destruct(perl);
perl_free(perl);
exit(exitstatus);
}
Now compile:
% cc -o persistent persistent.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
Here's an example script file:
#test.pl
my $string = "hello";
foo($string);
sub foo {
print "foo says: @_\n";
}
Now run:
% persistent
Enter file name: test.pl
foo says: hello
Enter file name: test.pl
already compiled Embed::test_2epl->handler
foo says: hello
Enter file name: ^C
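The translation done by valid_package_name() in persistent.pl is perhaps easier to follow when written out as a single pass over the string. The sketch below ports it to plain C purely for illustration; the embedding example itself keeps this logic in Perl. package_name_from_path() is a hypothetical name, and the port copies the "_%2x" format as written.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rough plain-C port of valid_package_name(); the function name is
 * hypothetical. Writes the package name into out and returns 0, or returns
 * -1 if out is too small (out may then hold a partial name). */
static int package_name_from_path(const char *path, char *out, size_t outsize)
{
    size_t used;
    int after_slash = 0;    /* was the previous character a '/'? */

    assert(path != NULL && out != NULL);
    if (outsize < sizeof "Embed")
        return -1;
    strcpy(out, "Embed");
    used = strlen(out);

    for (; *path != '\0'; path++) {
        unsigned char c = (unsigned char)*path;
        int is_letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        int is_digit = (c >= '0' && c <= '9');
        char piece[8];

        if (c == '/')
            strcpy(piece, "::");                  /* path separator */
        else if (is_letter || (is_digit && !after_slash))
            sprintf(piece, "%c", c);              /* kept unchanged */
        else
            sprintf(piece, "_%2x", (unsigned)c);  /* escaped as hex */

        if (used + strlen(piece) + 1 > outsize)   /* +1 for the final NUL */
            return -1;
        strcpy(out + used, piece);
        used += strlen(piece);
        after_slash = (c == '/');
    }
    return 0;
}
```

For the path /tmp/test.pl this yields Embed::tmp::test_2epl. A digit that directly follows a slash is escaped as well, since a package name component can't begin with a digit.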
Maintaining multiple interpreter instances
Some rare applications will need to create more than one interpreter during a session. Such an application
might sporadically decide to release any resources associated with the interpreter.
The program must take care to ensure that this takes place before the next interpreter is constructed. By
default, the global variable PL_perl_destruct_level is set to 0, since extra cleaning isn't needed
when a program has only one interpreter.
Setting PL_perl_destruct_level to 1 makes everything squeaky clean:
PL_perl_destruct_level = 1;
while(1) {
...
/* reset global variables here with PL_perl_destruct_level = 1 */
perl_construct(my_perl);
...
/* clean and reset _everything_ during perl_destruct */
perl_destruct(my_perl);
perl_free(my_perl);
...
/* let's go do it again! */
}
When perl_destruct() is called, the interpreter's syntax parse tree and symbol tables are cleaned up,
and global variables are reset.
Now suppose we have more than one interpreter instance running at the same time. This is feasible, but only
if you used the -DMULTIPLICITY flag when building Perl. By default, that sets
PL_perl_destruct_level to 1.
Let's give it a try:
#include <EXTERN.h>
#include <perl.h>
/* we're going to embed two interpreters */
#define SAY_HELLO "-e", "print qq(Hi, I'm $^X\n)"
int main(int argc, char **argv, char **env)
{
PerlInterpreter
*one_perl = perl_alloc(),
*two_perl = perl_alloc();
char *one_args[] = { "one_perl", SAY_HELLO };
char *two_args[] = { "two_perl", SAY_HELLO };
perl_construct(one_perl);
perl_construct(two_perl);
perl_parse(one_perl, NULL, 3, one_args, (char **)NULL);
perl_parse(two_perl, NULL, 3, two_args, (char **)NULL);
perl_run(one_perl);
perl_run(two_perl);
perl_destruct(one_perl);
perl_destruct(two_perl);
perl_free(one_perl);
perl_free(two_perl);
}
Compile as usual:
% cc -o multiplicity multiplicity.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
Run it, Run it:
% multiplicity
Hi, I'm one_perl
Hi, I'm two_perl
Using Perl modules, which themselves use C libraries, from your C program
If you've played with the examples above and tried to embed a script that use()s a Perl module (such as
Socket) which itself uses a C or C++ library, this probably happened:
Can't load module Socket, dynamic loading not available in this perl.
(You may need to build a new perl executable which either supports
dynamic loading or has the Socket module statically linked into it.)
What's wrong?
Your interpreter doesn't know how to communicate with these extensions on its own. A little glue will help.
Up until now you've been calling perl_parse(), handing it NULL for the second argument:
perl_parse(my_perl, NULL, argc, my_argv, NULL);
That's where the glue code can be inserted to create the initial contact between Perl and linked C/C++
routines. Let's take a look at some pieces of perlmain.c to see how Perl does this:
#ifdef __cplusplus
# define EXTERN_C extern "C"
#else
# define EXTERN_C extern
#endif
static void xs_init _((void));
EXTERN_C void boot_DynaLoader _((CV* cv));
EXTERN_C void boot_Socket _((CV* cv));
EXTERN_C void
xs_init()
{
char *file = __FILE__;
/* DynaLoader is a special case */
newXS("DynaLoader::boot_DynaLoader", boot_DynaLoader, file);
newXS("Socket::bootstrap", boot_Socket, file);
}
Simply put: for each extension linked with your Perl executable (determined during its initial configuration
on your computer or when adding a new extension), a Perl subroutine is created to incorporate the
extension's routines. Normally, that subroutine is named Module::bootstrap() and is invoked when
you say use Module. In turn, this hooks into an XSUB, boot_Module, which creates a Perl counterpart for
each of the extension's XSUBs. Don't worry about this part; leave that to the xsubpp and extension authors.
If your extension is dynamically loaded, DynaLoader creates Module::bootstrap() for you on the fly.
In fact, if you have a working DynaLoader then there is rarely any need to link in any other extensions
statically.
Once you have this code, slap it into the second argument of perl_parse():
perl_parse(my_perl, xs_init, argc, my_argv, NULL);
Then compile:
% cc -o interp interp.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
% interp
use Socket;
use SomeDynamicallyLoadedModule;
print "Now I can use extensions!\n";
ExtUtils::Embed can also automate writing the xs_init glue code.
% perl -MExtUtils::Embed -e xsinit -- -o perlxsi.c
% cc -c perlxsi.c `perl -MExtUtils::Embed -e ccopts`
% cc -c interp.c `perl -MExtUtils::Embed -e ccopts`
% cc -o interp perlxsi.o interp.o `perl -MExtUtils::Embed -e ldopts`
Consult perlxs and perlguts for more details.
Embedding Perl under Win32
At the time of this writing (5.004), there are two versions of Perl which run under Win32. (The two versions
are merging in 5.005.) Interfacing to ActiveState's Perl library is quite different from the examples in this
documentation, as significant changes were made to the internal Perl API. However, it is possible to embed
ActiveState's Perl runtime. For details, see the Perl for Win32 FAQ at
http://www.perl.com/perl/faq/win32/Perl_for_Win32_FAQ.html.
With the "official" Perl version 5.004 or higher, all the examples within this documentation will compile and
run untouched, although the build process is slightly different between Unix and Win32.
For starters, backticks don't work under the Win32 native command shell. The ExtUtils::Embed kit on
CPAN ships with a script called genmake, which generates a simple makefile to build a program from a
single C source file. It can be used like this:
C:\ExtUtils-Embed\eg> perl genmake interp.c
C:\ExtUtils-Embed\eg> nmake
C:\ExtUtils-Embed\eg> interp -e "print qq{I'm embedded in Win32!\n}"
You may wish to use a more robust environment such as the Microsoft Developer Studio. In this case, run
this to generate perlxsi.c:
perl -MExtUtils::Embed -e xsinit
Create a new project and Insert -> Files into Project: perlxsi.c, perl.lib, and your own source files, e.g.
interp.c. Typically you'll find perl.lib in C:\perl\lib\CORE; if not, you should see the CORE directory
relative to perl -V:archlib. The studio will also need this path so it knows where to find Perl include
files. This path can be added via the Tools -> Options -> Directories menu. Finally, select Build -> Build
interp.exe and you're ready to go.
MORAL
You can sometimes write faster code in C, but you can always write code faster in Perl. Because you can
use each from the other, combine them as you wish.
AUTHOR
Jon Orwant
This is a raw HTML paragraph
The paired commands "=begin" and "=end" work very similarly to "=for", but instead of only
accepting a single paragraph, all text from "=begin" to a paragraph with a matching "=end" is treated
as a particular format.
Here are some examples of how to use these:
=begin html
Figure 1.
=end html
=begin text
---------------
|  foo        |
|        bar  |
---------------
^^^^ Figure 1. ^^^^
=end text
Some format names that formatters currently are known to accept include "roff", "man", "latex", "tex",
"text", and "html". (Some formatters will treat some of these as synonyms.)
And don't forget, when using any command, that the command lasts up until the end of the
paragraph, not the line. Hence in the examples below, you can see the empty lines after each
command to end its paragraph.
Some examples of lists include:
=over 4
=item *
First item
=item *
Second item
=back
=over 4
=item Foo()
Description of Foo function
=item Bar()
Description of Bar function
=back
Ordinary Block of Text
It will be filled, and maybe even justified. Certain interior sequences are recognized both here and in
commands:
I<text>     italicize text, used for emphasis or variables
B<text>     embolden text, used for switches and programs
S<text>     text contains non-breaking spaces
C<code>     literal code
L<name>     A link (cross reference) to name
    L<name>             manual page
    L<name/ident>       item in manual page
    L<name/"sec">       section in other manual page
    L<"sec">            section in this manual page
                        (the quotes are optional)
    L</"sec">           ditto
  same as above but only 'text' is used for output.
  (Text can not contain the characters '|' or '>')
    L<text|name>
    L<text|name/ident>
    L<text|name/"sec">
    L<text|"sec">
    L<text|/"sec">
F