PDFlib 6 Reference Manual

A library for generating PDF on the fly
Reference Manual
General Edition for
Cobol, C, C++, Java, Perl,
PHP, Python, RPG, and Tcl
PDFlib GmbH München, Germany
Version 6.0.1
Copyright © 1997–2004 PDFlib GmbH and Thomas Merz. All rights reserved.
PDFlib GmbH
Tal 40, 80331 München, Germany
phone +49 • 89 • 29 16 46 87
fax +49 • 89 • 29 16 46 86
If you have questions, check the PDFlib mailing list and archive at groups.yahoo.com/group/pdflib
Licensing contact: sales@pdflib.com
Support for commercial PDFlib licensees: support@pdflib.com (please include your license number)
This publication and the information herein is furnished as is, is subject to change without notice, and
should not be construed as a commitment by PDFlib GmbH. PDFlib GmbH assumes no responsibility or liability
for any errors or inaccuracies, makes no warranty of any kind (express, implied or statutory) with respect
to this publication, and expressly disclaims any and all warranties of merchantability, fitness for particular
purposes and noninfringement of third party rights.
PDFlib and the PDFlib logo are registered trademarks of PDFlib GmbH. PDFlib licensees are granted the
right to use the PDFlib name and logo in their product documentation. However, this is not required.
Adobe, Acrobat, and PostScript are trademarks of Adobe Systems Inc. AIX, IBM, OS/390, WebSphere, iSeries,
and zSeries are trademarks of International Business Machines Corporation. ActiveX, Microsoft, Windows,
and Windows NT are trademarks of Microsoft Corporation. Apple, Macintosh and TrueType are trademarks
of Apple Computer, Inc. Unicode and the Unicode logo are trademarks of Unicode, Inc. Unix is a trademark
of The Open Group. Java and Solaris are trademarks of Sun Microsystems, Inc. HKS is a registered trademark
of the HKS brand association: Hostmann-Steinberg, K+E Printing Inks, Schmincke. Other company
product and service names may be trademarks or service marks of others.
PANTONE® colors displayed in the software application or in the user documentation may not match
PANTONE-identified standards. Consult current PANTONE Color Publications for accurate color. PANTONE®
and other Pantone, Inc. trademarks are the property of Pantone, Inc. © Pantone, Inc., 2003.
Pantone, Inc. is the copyright owner of color data and/or software which are licensed to PDFlib GmbH to
distribute for use only in combination with PDFlib Software. PANTONE Color Data and/or Software shall
not be copied onto another disk or into memory unless as part of the execution of PDFlib Software.
PDFlib contains modified parts of the following third-party software:
ICClib, Copyright © 1997-2002 Graeme W. Gill
GIF image decoder, Copyright © 1990-1994 David Koblas
PNG image reference library (libpng), Copyright © 1998-2004 Glenn Randers-Pehrson
Zlib compression library, Copyright © 1995-2002 Jean-loup Gailly and Mark Adler
TIFFlib image library, Copyright © 1988-1997 Sam Leffler, Copyright © 1991-1997 Silicon Graphics, Inc.
Cryptographic software written by Eric Young, Copyright © 1995-1998 Eric Young (eay@cryptsoft.com)
Independent JPEG Group’s JPEG software, Copyright © 1991-1998, Thomas G. Lane
PDFlib contains the RSA Security, Inc. MD5 message digest algorithm.
Viva Software GmbH contributed improvements to the font handling for Mac OS.
Author: Thomas Merz
Design and illustrations: Alessio Leonardi
Quality control (manual): Katja Schnelle Romaus, Kurt Stützer
Quality control (software): a cast of thousands
Contents
0 Applying the PDFlib License Key
1 Introduction
1.1 PDFlib Programming
1.2 Major new Features in PDFlib 6
1.3 PDFlib Features
1.4 Availability of Features in different Products
2 PDFlib Language Bindings
2.1 Overview
2.2 Cobol Binding
2.2.1 Special Considerations for Cobol
2.2.2 The »Hello world« Example in Cobol
2.3 COM Binding
2.4 C Binding
2.4.1 Availability and Special Considerations for C
2.4.2 The »Hello world« Example in C
2.4.3 Using PDFlib as a DLL loaded at Runtime
2.4.4 Error Handling in C
2.4.5 Memory Management in C
2.4.6 Unicode in the C language binding
2.5 C++ Binding
2.5.1 Availability and Special Considerations for C++
2.5.2 The »Hello world« Example in C++
2.5.3 Error Handling in C++
2.5.4 Memory Management in C++
2.5.5 Unicode in the C++ language binding
2.6 Java Binding
2.6.1 Installing the PDFlib Java Edition
2.6.2 The »Hello world« Example in Java
2.6.3 Error Handling in Java
2.7 .NET Binding
2.8 Perl Binding
2.8.1 Installing the PDFlib Perl Edition
2.8.2 The »Hello world« Example in Perl
2.8.3 Error Handling in Perl
2.9 PHP Binding
2.9.1 Installing the PDFlib PHP Edition
2.9.2 The »Hello world« Example in PHP
2.9.3 Error Handling in PHP
2.10 Python Binding
2.10.1 Installing the PDFlib Python Edition
2.10.2 The »Hello world« Example in Python
2.10.3 Error Handling in Python
2.11 REALbasic Binding
2.12 RPG Binding
2.12.1 Compiling and Binding RPG Programs for PDFlib
2.12.2 The »Hello world« Example in RPG
2.12.3 Error Handling in RPG
2.13 Tcl Binding
2.13.1 Installing the PDFlib Tcl Edition
2.13.2 The »Hello world« Example in Tcl
2.13.3 Error Handling in Tcl
3 PDFlib Programming
3.1 General Programming
3.1.1 PDFlib Program Structure and Function Scopes
3.1.2 Parameters
3.1.3 Exception Handling
3.1.4 Option Lists
3.1.5 The PDFlib Virtual File System (PVF)
3.1.6 Resource Configuration and File Searching
3.1.7 Generating PDF Documents in Memory
3.1.8 Using PDFlib on EBCDIC-based Platforms
3.1.9 Large File Support
3.2 Page Descriptions
3.2.1 Coordinate Systems
3.2.2 Page Sizes and Coordinate Limits
3.2.3 Paths
3.2.4 Templates
3.3 Working with Color
3.3.1 Color and Color Spaces
3.3.2 Patterns and Smooth Shadings
3.3.3 Spot Colors
3.3.4 Color Management and ICC Profiles
3.4 Hypertext Elements
3.4.1 Examples for Creating Hypertext Elements
3.4.2 Formatting Options for Text Fields
4 Text Handling
4.1 Overview of Fonts and Encodings
4.1.1 Supported Font Formats
4.1.2 Encodings
4.1.3 Support for the Unicode Standard
4.2 Font Format Details
4.2.1 PostScript Fonts
4.2.2 TrueType and OpenType Fonts
4.2.3 User-Defined (Type 3) Fonts
4.3 Font Embedding and Subsetting
4.3.1 How PDFlib Searches for Fonts
4.3.2 Font Embedding
4.3.3 Font Subsetting
4.4 Encoding Details
4.4.1 8-Bit Encodings
4.4.2 Symbol Fonts and Font-specific Encodings
4.4.3 Glyph ID Addressing for TrueType and OpenType Fonts
4.4.4 The Euro Glyph
4.5 Unicode Support
4.5.1 Unicode for Page Content and Hypertext
4.5.2 Content Strings, Hypertext Strings, and Name Strings
4.5.3 String Handling in Unicode-capable Languages
4.5.4 String Handling in non-Unicode-capable Languages
4.5.5 Character References
4.5.6 Unicode-compatible Fonts
4.6 Text Metrics and Text Variations
4.6.1 Font and Character Metrics
4.6.2 Kerning
4.6.3 Text Variations
4.7 Chinese, Japanese, and Korean Text
4.7.1 CJK support in Acrobat and PDF
4.7.2 Standard CJK Fonts and CMaps
4.7.3 Custom CJK Fonts
4.7.4 Forcing monospaced Fonts
4.8 Placing and Fitting Single-Line Text
4.8.1 Simple Text Placement
4.8.2 Placing Text in a Box
4.8.3 Aligning Text
4.9 Multi-Line Textflows
4.9.1 Placing Textflows in the Fitbox
4.9.2 Paragraph Formatting Options
4.9.3 Inline Option Lists and Macros
4.9.4 Tab Stops
4.9.5 Numbered Lists
4.9.6 Control Characters, Character Mapping, and Symbol Fonts
4.9.7 Hyphenation
4.9.8 Controlling the Linebreak Algorithm
4.9.9 Formatting CJK Text with Textflow
5 Importing and Placing Objects
5.1 Importing Raster Images
5.1.1 Basic Image Handling
5.1.2 Supported Image File Formats
5.1.3 Image Masks and Transparency
5.1.4 Colorizing Images
5.1.5 Multi-Page Image Files
5.1.6 OPI Support
5.2 Importing PDF Pages with PDI (PDF Import Library)
5.2.1 PDI Features and Applications
5.2.2 Using PDI Functions with PDFlib
5.2.3 Acceptable PDF Documents
5.3 Placing Images and Imported PDF Pages
5.3.1 Scaling, Orientation, and Rotation
5.3.2 Adjusting the Page Size
6 Variable Data and Blocks
6.1 Installing the PDFlib Block Plugin
6.2 Overview of the PDFlib Block Concept
6.2.1 Complete Separation of Document Design and Program Code
6.2.2 Block Properties
6.2.3 Why not use PDF Form Fields?
6.3 Creating PDFlib Blocks
6.3.1 Creating Blocks interactively with the PDFlib Block Plugin
6.3.2 Editing Block Properties
6.3.3 Copying Blocks between Pages and Documents
6.3.4 Converting PDF Form Fields to PDFlib Blocks
6.4 Standard Properties for Automated Processing
6.4.1 General Properties
6.4.2 Text Properties
6.4.3 Image Properties
6.4.4 PDF Properties
6.4.5 Custom Properties
6.5 Querying Block Names and Properties
6.6 PDFlib Block Specification
6.6.1 PDF Object Structure for PDFlib Blocks
6.6.2 Generating PDFlib Blocks with pdfmarks
7 Generating various PDF Flavors
7.1 Acrobat and PDF Versions
7.2 Encrypted PDF
7.2.1 Strengths and Weaknesses of PDF Security
7.2.2 Protecting Documents with PDFlib
7.3 Web-Optimized (Linearized) PDF
7.4 PDF/X
7.4.1 The PDF/X Family of Standards
7.4.2 Generating PDF/X-conforming Output
7.4.3 Importing PDF/X Documents with PDI
7.5 Tagged PDF
7.5.1 Generating Tagged PDF with PDFlib
7.5.2 Creating Tagged PDF with direct Text Output and Textflows
7.5.3 Activating Items for complex Layouts
7.5.4 Using Tagged PDF in Acrobat
8 API Reference for PDFlib, PDI, and PPS
8.1 Data Types and Naming Conventions
8.2 General Functions
8.2.1 Setup
8.2.2 Document and Page
8.2.3 Parameter Handling
8.2.4 PDFlib Virtual File System (PVF) Functions
8.2.5 Exception Handling
8.2.6 Utility Functions
8.3 Text Functions
8.3.1 Font Handling
8.3.2 User-defined (Type 3) Fonts
8.3.3 Encoding Definition
8.3.4 Simple Text Output
8.3.5 Multi-Line Text Output with Textflows
8.4 Graphics Functions
8.4.1 Graphics State Functions
8.4.2 Saving and Restoring Graphics States
8.4.3 Coordinate System Transformation Functions
8.4.4 Explicit Graphics States
8.4.5 Path Construction
8.4.6 Path Painting and Clipping
8.4.7 Layer Functions
8.5 Color Functions
8.5.1 Setting Color and Color Space
8.5.2 Patterns and Shadings
8.6 Image and Template Functions
8.6.1 Images
8.6.2 Templates
8.6.3 Thumbnails
8.7 PDF Import Functions (PDI)
8.7.1 Document and Page
8.7.2 Other PDI Processing
8.7.3 PDI Parameter Handling
8.8 Block Filling Functions (PPS)
8.9 Hypertext Functions
8.9.1 Actions
8.9.2 Named Destinations
8.9.3 Annotations
8.9.4 Form Fields
8.9.5 Bookmarks
8.9.6 Document Information Fields
8.9.7 Deprecated Hypertext Parameters and Functions
8.10 Structure Functions for Tagged PDF
B PDFlib Quick Reference
C Revision History
Index
0 Applying the PDFlib License Key
All binary versions of PDFlib, PDFlib+PDI, and PPS supplied by PDFlib GmbH can be used
as fully functional evaluation versions regardless of whether or not you obtained a
commercial license. However, unlicensed versions will display a www.pdflib.com demo
stamp (the »nagger«) across all generated pages. Companies which are seriously interested
in PDFlib licensing and wish to get rid of the nagger during the evaluation phase or
for prototype demos can submit their company and project details with a brief explanation
to sales@pdflib.com, and apply for a temporary license key (we reserve the right to
refuse evaluation keys, e.g. for anonymous requests).
Once you have purchased a license key you must apply it in order to get rid of the demo
stamp. There are several methods available:
>Add a line to your script or program which sets the license key at runtime:
PDF_set_parameter(p, "license", "...your license key...");
The license parameter must be set only once, immediately after instantiating the
PDFlib object (i.e., after PDF_new( ) or equivalent call).
>Enter the license key in a text file according to the following format (you can use the
license file template licensekeys.txt which is contained in all PDFlib distributions):
PDFlib license file 1.0
# Licensing information for PDFlib GmbH products
PDFlib 6.0.1 ...your license key...
The license file may contain license keys for multiple PDFlib GmbH products on separate
lines. Next, you must inform PDFlib about the license file, either by setting the
licensefile parameter immediately after instantiating the PDFlib object (i.e., after
PDF_new( ) or equivalent call) as follows:
PDF_set_parameter(p, "licensefile", "/path/to/license/file");
or by setting the environment variable PDFLIBLICENSEFILE with a command similar to
the following:
export PDFLIBLICENSEFILE=/path/to/license/file
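The license file layout shown above is simple enough to generate from an installation script. The following Python sketch builds the file contents for one or more products. The helper name format_license_file and the placeholder key are hypothetical and not part of the PDFlib API, and the sketch assumes that every product line follows the "product version key" layout of the template line.

```python
def format_license_file(entries):
    """Return license file text for a list of (product, version, key) tuples.

    Hypothetical helper, not a PDFlib function. It assumes one
    "product version key" line per product, as in the template.
    """
    lines = [
        "PDFlib license file 1.0",
        "# Licensing information for PDFlib GmbH products",
    ]
    for product, version, key in entries:
        lines.append(f"{product} {version} {key}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "YOUR-KEY" is a placeholder, not a real license key.
    print(format_license_file([("PDFlib", "6.0.1", "YOUR-KEY")]), end="")
```

The resulting text can be written to a file whose path is then passed via the licensefile parameter or the PDFLIBLICENSEFILE environment variable as described above.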
Note that PDFlib, PDFlib+PDI, and PDFlib Personalization Server (PPS) are different products
which require different license keys although they are delivered in a single package.
PDFlib+PDI license keys will also be valid for PDFlib, but not vice versa, and PPS
license keys will be valid for PDFlib+PDI and PDFlib. All license keys are platform-dependent,
and can only be used on the platform for which they have been purchased.
Accumulating individual CPU keys. If you purchased multiple CPU licenses with more
than one order (as opposed to a single order for all of these CPU licenses), you can accumulate
all keys in the license file by entering one after the other. The function
PDF_set_parameter( ) can also be called multiple times for individual license keys. However, the Windows
registry cannot be used to accumulate license keys.
Evaluating features which are not yet licensed. You can fully evaluate all features by
using the software without any license key applied. However, once you have applied a valid
license key for a particular product, features of a higher category will no longer be
available. For example, if you installed a valid PDFlib license key the PDI functionality
will no longer be available for testing. Similarly, after installing a PDFlib+PDI license key
the personalization features (block functions) will no longer be available.
When a license key for a product has already been installed, set the dummy license
key 0 to enable functionality of a higher product class for evaluation:
PDF_set_parameter(p, "license", "0");
This will enable the previously disabled functions, and re-activate the demo stamp
across all pages.
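The licensing rules in this chapter can be summarized as a small decision function. The Python sketch below is only a simplified model of the behavior described in the text, not PDFlib code: the names PRODUCTS and product_available are hypothetical, and the model ignores platform restrictions and CPU counts.

```python
# Product classes in ascending order, as described in the text.
PRODUCTS = ["PDFlib", "PDFlib+PDI", "PPS"]


def product_available(installed_key_product, requested_product):
    """Return (available, demo_stamp) for the requested product class.

    installed_key_product is None when no key (or the dummy key 0) is in
    effect, otherwise the product for which the applied key was issued.
    """
    if installed_key_product is None:
        # Evaluation mode: all features work, pages carry the demo stamp.
        return True, True
    licensed = PRODUCTS.index(installed_key_product)
    requested = PRODUCTS.index(requested_product)
    # A key covers its own product class and all lower ones, without stamp.
    return requested <= licensed, False


if __name__ == "__main__":
    print(product_available("PDFlib", "PDFlib+PDI"))
    print(product_available(None, "PPS"))
```

In this model a PDFlib key blocks the PDI functions, while no key or the dummy key 0 makes all product classes available again at the price of the demo stamp.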
Licensing options. Different licensing options are available for PDFlib use on one or
more servers, and for redistributing PDFlib with your own products. We also offer support
and source code contracts. Licensing details and the PDFlib purchase order form
can be found in the PDFlib distribution. Please contact us if you are interested in obtaining
a commercial PDFlib license, or have any questions:
PDFlib GmbH, Licensing Department
Tal 40, 80331 München, Germany
phone +49 • 89 • 29 16 46 87, fax +49 • 89 • 29 16 46 86
Licensing contact: sales@pdflib.com
Support for PDFlib licensees: support@pdflib.com
1 Introduction
1.1 PDFlib Programming
What is PDFlib? PDFlib is a library which allows you to generate files in Adobe’s Portable
Document Format (PDF). PDFlib acts as a backend to your own programs. While you
(the programmer) are responsible for retrieving the data to be processed, PDFlib takes
over the task of generating the PDF code which graphically represents your data. While
you must still format and arrange your text and graphical objects, PDFlib frees you
from the internal details of PDF. Our binary packages contain different products in a
single library:
>PDFlib contains all functions required to create PDF output containing text, vector
graphics and images plus hypertext elements.
>PDFlib+PDI includes all PDFlib functions, plus the PDF Import Library (PDI) for including
pages from existing PDF documents in the generated output.
>PDFlib Personalization Server (PPS) includes PDFlib+PDI, plus additional functions
for automatically filling PDFlib blocks. Blocks are placeholders on the page which
can be filled with text, images, or PDF pages. They can be created interactively with
the PDFlib Block Plugin for Adobe Acrobat (Mac or Windows), and will be filled automatically
with PPS. The plugin is included in PPS.
How can I use PDFlib? PDFlib is available on a variety of platforms, including Unix,
Windows, Mac, and EBCDIC-based systems such as IBM eServer iSeries and zSeries.
PDFlib itself is written in the C language, but it can also be accessed from several other
languages and programming environments which are called language bindings. These
language bindings cover all current Web and stand-alone application environments.
The Application Programming Interface (API) is easy to learn, and is identical for all
bindings. Currently the following bindings are supported:
>COM for use with Visual Basic, Active Server Pages with VBScript or JScript, Borland
Delphi, Windows Script Host, and other environments
>C
>C++
>Cobol (IBM eServer zSeries)
>Java, including servlets
>.NET for use with C#, VB.NET, ASP.NET, and other environments
>PHP hypertext processor
>Perl
>Python
>REALbasic
>RPG (IBM eServer iSeries)
>Tcl
What can I use PDFlib for? PDFlib’s primary target is dynamic PDF creation within
your own software, or on the World Wide Web. Similar to HTML pages dynamically generated
on the Web server, you can use a PDFlib program for dynamically generating PDF
reflecting user input or some other dynamic data, e.g. data retrieved from the Web server’s
database. The PDFlib approach offers several advantages:
>PDFlib can be integrated directly in the application generating the data, eliminating
the convoluted creation path application–PostScript–Acrobat Distiller–PDF.
>As an implication of this straightforward process, PDFlib is the fastest PDF-generating
method, making it perfectly suited for the Web.
>PDFlib’s thread-safety as well as its robust memory and error handling support the
implementation of high-performance server applications.
>PDFlib is available for a variety of operating systems and development environments.
Requirements for using PDFlib. PDFlib makes PDF generation possible without wading
through the PDF specification. While PDFlib tries to hide technical PDF details from the
user, a general understanding of PDF is useful. In order to make the best use of PDFlib,
application programmers should ideally be familiar with the basic graphics model of
PostScript (and therefore PDF). However, a reasonably experienced application programmer
who has dealt with any graphics API for screen display or printing shouldn’t
have much trouble adapting to the PDFlib API as described in this manual.
About this manual. This manual describes the API provided by PDFlib. It does not describe
the process of building the library binaries. Functions not described in this manual
are unsupported, and should not be used. This manual does not attempt to explain
Acrobat features. Please refer to the Acrobat product literature, and the material cited at
the end of this manual for further reference. The PDFlib distribution contains additional
examples for calling PDFlib functions.
1.2 Major new Features in PDFlib 6
The following list discusses the most important new or improved features in PDFlib 6.
Programming improvements. Many restrictions in previous versions have been lifted.
For example, pages can be created in arbitrary order, new pages can be inserted between
existing ones, and more content can be added later to an existing page.
Layers. PDF’s layer functionality (introduced in Acrobat 6) is important for CAD and
engineering applications, but can also be used for impressive interactive documents,
multi-lingual documentation, etc. PDFlib supports all layer control features available in
PDF 1.5, including various controls which are not accessible in Acrobat.
Unicode. PDFlib 6 improves support for the Unicode standard by allowing Unicode
strings in all relevant areas, such as file names, page content, hypertext, form fields, etc.
This is especially important for users outside of Europe and North America.
Text formatting. The new textflow formatter offers a powerful, yet simple to use facility
for formatting text according to a variety of options. Unicode text, ragged or justified
text, arbitrary font changes, multi-line body text or large tables in an invoice – the
new textflow feature handles all common formatting tasks.
Image handling. TIFF image processing has been extended to cover TIFF flavors which
were previously not supported, such as JPEG-compressed TIFFs or Lab and YCbCr color
spaces. Since PDF 1.5 supports 16-bit color depth, TIFF and PNG images with 16 bits per color
component can now be converted to 16-bit color in PDF.
Tagged PDF. Tagged PDF is the key for accessible PDF according to section 508 in the
USA and similar regulations in other countries. PDFlib is the first PDF library for general
use which supports Tagged PDF generation. Using the new features it is very easy to create
Tagged PDF from dynamic data. The generated output can leverage all Acrobat features
for Tagged PDF, such as page reflow, read aloud, and improved export to other formats
such as RTF, HTML, or XML. In combination with the new textflow formatter large
amounts of text can quickly be transformed to Tagged PDF. For the first time ever, PDF
generated dynamically on the Web server can satisfy accessibility regulations.
PDF/X for Prepress. PDFlib 6 is the first software on the market to support generating
and processing PDF documents according to the latest 2003 editions of the PDF/X standards
for prepress (PDF/X-1a:2003, PDF/X-2:2003, and PDF/X-3:2003). PDF/X plays an important
role for file exchange in the prepress world. More and more publishers worldwide
standardize on PDF/X in order to implement reliable data
exchange in the graphic arts industry. The new 2003 editions update, enhance, and
unify the PDF/X family of standards.
OPI for Prepress. Some workflows in the graphic arts industry still rely on the OPI
standard from the PostScript age, and use OPI information embedded in PDF documents.
PDFlib 6 supports this by offering options for adding OPI information to imported
images.
Linearized PDF. PDFlib 6 generates linearized PDF, also known as web-optimized PDF.
This enables page-at-a-time download (also known as byteserving) when viewing PDFs
in the Web browser, and significantly enhances the user experience.
PDFlib Blocks for variable data processing. The user interface of the PDFlib block plugin
for creating PDF templates has been extended and streamlined. Blocks can now be
filled with multi-line text, using the new textflow formatter. As a result, the PDFlib Personalization
Server (PPS) is no longer restricted to simple mail-merge pieces with small
amounts of text, but can also be used for complex applications with advanced text formatting
requirements.
Form fields. All types of PDF form fields can be generated and enhanced with JavaScript
and other actions. This can be used to create PDF forms dynamically subject to
user input or database information.
Hypertext. PDFlib’s hypertext features have been extended to fully support all PDF
options for bookmarks, actions, and annotations. Page labels can be created to attach a
symbolic name or roman numerals to a page, such as i, ii, iii... or A-1, A-2, etc.
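To make the idea of page labels concrete, the following Python sketch generates label strings like the ones mentioned above from a numbering style, an optional prefix, and a start value. It only illustrates what such labels look like. It does not call PDFlib, and the function names to_roman and page_labels as well as their parameters are hypothetical.

```python
def to_roman(n):
    """Convert a positive integer to lowercase roman numerals."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
             (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
             (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, numeral in pairs:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


def page_labels(count, style="decimal", prefix="", start=1):
    """Return the labels for `count` consecutive pages."""
    numbers = range(start, start + count)
    if style == "roman":
        return [prefix + to_roman(n) for n in numbers]
    return [prefix + str(n) for n in numbers]


if __name__ == "__main__":
    print(page_labels(3, style="roman"))   # front matter pages
    print(page_labels(2, prefix="A-"))     # appendix pages
```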
REALbasic. As a new member in the large family of supported programming environments,
PDFlib 6 introduces a new language binding for REALbasic on Mac and Windows.
REALbasic is a language for developing applications for multiple platforms. PDFlib 6 for
REALbasic smoothly integrates into RB’s object model, supports Unicode strings, and
gives the developer access to all PDFlib features from within REALbasic.
1.3 PDFlib Features
Table 1.1 lists the major PDFlib features for generating and importing PDF. New or improved
features in PDFlib 6 are marked with (1).
Table 1.1 Feature list for PDFlib, PDFlib+PDI, and the PDFlib Personalization Server (PPS)
topic         features
PDF output    PDF documents of arbitrary length, directly in memory (for Web servers) or on disk file
              compression for text, vector graphics, image data, and file attachments
              suspend/resume (1) and insert page (1) features to create pages out of order
PDF flavors   PDF 1.3, 1.4, 1.5, and 1.6 (Acrobat 4, 5, 6, and 7)
              Linearized (web-optimized) PDF for byteserving over the Web (1)
PDF input     import pages from existing PDF documents (only PDFlib+PDI and PPS)
Blocks        PDF personalization with PDFlib blocks for text, image, and PDF data (only PPS)
              PDFlib Block plugin for Acrobat to create PDFlib blocks (only PPS), redesigned user interface (1)
Graphics      common vector graphics primitives: lines, curves, arcs, rectangles, etc.
              smooth shadings (color blends), pattern fills and strokes
              efficiently re-use text or vector graphics with templates
              explicit graphics state parameters for text knockout, overprinting etc.
              transparency (opacity) and blend modes
              layers (1): optional page content which can selectively be enabled or disabled
Fonts         TrueType (ttf and ttc) and PostScript Type 1 fonts (pfb and pfa, plus lwfn on the Mac)
              OpenType fonts (ttf, otf) with PostScript or TrueType outlines
              AFM and PFM PostScript font metrics files
              font embedding
              directly use fonts which are installed on the Windows or Mac host system
              subsetting for TrueType and OpenType fonts
              user-defined (Type 3) fonts for bitmap fonts or custom logos
Text output   text output in different fonts; underlined, overlined, and strikeout text
              kerning for PostScript, TrueType, and OpenType fonts
              TrueType and OpenType glyph id addressing for advanced typesetting applications
              proportional widths for standard CJK fonts
Unicode       Unicode for page content, hypertext (1), and file names (1); UTF-8 and UCS-2 formats, little- and big-endian
              fully integrated handling of Unicode strings in COM, Java, .NET, REALbasic, Tcl
              support for a variety of encodings (international standards and vendor-specific code pages)
              fetch code pages from the system (Windows, IBM eServer iSeries and zSeries)
              standard CJK font and CMap support for Chinese, Japanese, and Korean text
              custom CJK fonts in the TrueType and OpenType formats with Unicode encoding
              embed Unicode information in PDF for correct text extraction in Acrobat
Images        embed BMP, GIF, PNG, TIFF (1), JPEG, and CCITT raster images
              automatic detection of image file formats (file format sniffing)
              transparent (masked) images including soft masks
              image masks (transparent images with a color applied)
              colorize images with a spot color
              image interpolation (smooth images with low resolution)
Color         grayscale, RGB, CMYK, CIE L*a*b* color
              built-in PANTONE® and HKS® spot color tables
              user-defined spot colors
              ICC-based color with ICC color profiles: honor embedded profiles in images, or apply external profiles to images
              rendering intent for text, graphics, and raster images
              default gray, RGB, and CMYK color spaces to remap device-dependent colors
Prepress      generate output conforming to PDF/X-1, PDF/X-1a, PDF/X-2 (1), and PDF/X-3, including 2003 flavors (1)
              embed output intent ICC profile or reference standard output intent
              copy output intent from imported PDF documents (only PDFlib+PDI and PPS)
              create OPI 1.3 and OPI 2.0 information for imported images (1)
              separation information (PlateColor) (1)
Formatting    Textflow formatting (1): format arbitrary amounts of text into one or more rectangular areas, using hyphenation, font and color changes, various justification methods, control commands
              text line placement and formatting
              flexible image placement and formatting
Security      generate output with 40-bit or 128-bit encryption
              generate output with permission settings
              import encrypted documents (master password required; only PDFlib+PDI and PPS)
Hypertext     create form fields (1) with all field options and JavaScript (1)
              create actions (1) for bookmarks, annotations, page open/close and other events
              create bookmarks (1) with a variety of options and controls
              page transition effects, such as shades and mosaic
              create all PDF annotation types (1), such as PDF links, launch links (other document types), Web links
              document information: standard fields (Title, Subject, Author, Keywords) plus unlimited number of user-defined info fields
              named destinations for links, bookmarks, and document open action
              viewer preferences (hide menu bar, etc.) (1)
              create page labels (symbolic names for pages) (1)
Tagged PDF    create Tagged PDF (1) and structure information for accessibility, page reflow, and improved content repurposing
              easily format large amounts of text for Tagged PDF (1)
Programming   language bindings for Cobol, COM, C, C++, Java, .NET, Perl, PHP (1), Python, REALbasic (1), RPG, Tcl
              thread-safe and robust for deployment in multi-threaded server applications
              virtual file system for supplying data in memory, e.g., images from a database
(1) New or considerably improved in PDFlib 6
1.4 Availability of Features in different Products
Table 1.2 details the availability of features in the open source edition PDFlib Lite and
different commercial products.
Table 1.2 Availability of features in different products
(X = available, – = not available; Lite = PDFlib Lite (open source), PPS = PDFlib Personalization Server)
Lite  PDFlib  PDFlib+PDI  PPS  Feature: API functions and parameters
X     X       X           X    basic PDF generation: (all except those listed below)
X     X       X           X    language bindings: C, C++, Java, Perl, Tcl, PHP, Python
–     X       X           X    language bindings: Cobol, COM, .NET, REALbasic, RPG
–     X       X           X    works on EBCDIC systems
–     X       X           X    password protection and permission settings: PDF_begin_document( ) with userpassword, masterpassword, permissions options
–     X       X           X    linearized PDF: PDF_begin_document( ) with linearize option
–     X       X           X    font subsetting: PDF_load_font( ) with subsetting option
–     X       X           X    kerning: PDF_load_font( ) with kerning option
–     X       X           X    access Mac and Windows host fonts: PDF_load_font( )
–     X       X           X    access system encodings on Windows, iSeries, zSeries: PDF_load_font( )
–     X       X           X    Unicode encoding and ToUnicode CMaps: PDF_load_font( ) with encoding = unicode, autocidfont, unicodemap parameters
–     X       X           X    numeric and character entity references: charref option in PDF_fit_textline( ), charref parameter
–     X       X           X    proportional glyph widths for standard CJK fonts with Unicode CMaps: PDF_load_font( ) with a UCS2-compatible CMap
–     X       X           X    glyph ID addressing: PDF_load_font( ) with encoding = glyphid
–     X       X           X    extended encoding for PostScript-based OpenType fonts: PDF_load_font( )
–     X       X           X    Textflow: PDF_create_textflow( ), PDF_delete_textflow( ), PDF_fit_textflow( ), PDF_info_textflow( )
–     X       X           X    spot color: PDF_makespotcolor( )
–     X       X           X    color separations: PDF_begin_page_ext( ) with separationinfo option
–     X       X           X    form fields: PDF_create_field( ), PDF_create_fieldgroup( ), PDF_create_action( ) with type=SetOCGState
–     X       X           X    JavaScript actions: PDF_create_action( ) with type=JavaScript
–     X       X           X    layers: PDF_define_layer( ), PDF_begin_layer( ), PDF_end_layer( ), PDF_set_layer_dependency( ), PDF_create_action( ) with type=SetOCGState
–     X       X           X    Tagged PDF: PDF_begin_item( ), PDF_end_item( ), PDF_activate_item( ), PDF_begin_document( ) with tagged and lang options
–     X       X           X    PDF/X support: PDF_process_pdi( ), PDF_begin_document( ) with pdfx option
–     X       X           X    ICC profile support: PDF_load_iccprofile( ), PDF_setcolor( ) with iccbasedgray/rgb/cmyk, PDF_load_image( ) with honoriccprofile option, honoriccprofile parameter, PDF_begin/end_page_ext( ) with defaultgray/rgb/cmyk option
–     X       X           X    CIE L*a*b* color: PDF_setcolor( ) with type = lab; Lab TIFF images
–     X       X           X    OPI support: PDF_load_image( ) with OPI-1.3/OPI-2.0 options
–     –       X           X    PDF import (PDI): PDF_open_pdi( ), PDF_open_pdi_callback( ), PDF_open_pdi_page( ), PDF_fit_pdi_page( ), PDF_process_pdi( )
–     –       X           X    Query information from existing PDF: PDF_get_pdi_value( ), PDF_get_pdi_parameter( )
–     –       –           X    variable data processing and personalization with blocks: PDF_fill_textblock( ), PDF_fill_imageblock( ), PDF_fill_pdfblock( )
–     –       –           X    query standard and custom block properties: PDF_get_pdi_value( ), PDF_get_pdi_parameter( ) with vdp/Blocks keys
–     –       –           X    PDFlib Block plugin for Adobe Acrobat: interactively create PDFlib blocks for use with PPS
2 PDFlib Language Bindings
2.1 Overview
Availability and platforms. All PDFlib features are available on all platforms and in all
language bindings (with a few minor exceptions which are noted in the manual). Table
2.1 lists the language/platform combinations we used for testing.
PDFlib on embedded systems. PDFlib can also be used on embedded systems, and has
been ported to the Windows CE, QNX, and EPOC environments as well as custom
embedded systems. For use in restricted environments, certain features are configurable
in order to reduce PDFlib’s resource requirements. If you are interested in details,
please contact us via sales@pdflib.com.
Table 2.1 Tested language and platform combinations
Language | Unix (Linux, Solaris, HP-UX, Mac OS X, AIX, IRIX a.o.) | Windows NT4SP2 or above | IBM eServer iSeries and zSeries
Cobol | – | – | ILE Cobol
COM | | ASP (PWS, IIS 4, 5, 6); WSH (VBScript 5, JScript 5); Visual Basic 6.0; Borland Delphi 5 – 7 |
ISO/ANSI C | gcc 3, HP C, IBM C 6, Sun Workshop 6, and other ISO C compilers | Microsoft Visual C++ 6, VS .NET; Metrowerks CodeWarrior 8; Borland C++ Builder 6 | IBM c89
ISO C++ | gcc 3 and other ISO C++ compilers | Microsoft Visual C++ 6, VS .NET; Metrowerks CodeWarrior 8 | IBM c89
Java | JDK 1.1.8, 1.2.2, 1.3, 1.4, 1.5 | Sun JDK 1.1.8, 1.2.2, 1.3, 1.4, 1.5; ColdFusion MX | JDK 1.3.1
.NET | | .NET Framework 1.0, 1.1: |
Perl | Perl 5.6 – 5.8 | Perl 5.6 – 5.8 |
PHP | PHP 4.3.x, 5.0.x | PHP 4.3.x, 5.0.x |
Python | Python 1.6, 2.0 – 2.3 | Python 1.6, 2.0 – 2.3 |
REALbasic | REALbasic 5.5 or above for Mac OS Classic, Mac OS X, and Windows
Tcl | Tcl 8.3.2 and 8.4.4 | Tcl 8.3.2 and 8.4.4 |
2.2 Cobol Binding
2.2.1 Special Considerations for Cobol
The PDFlib API functions for Cobol are not available under the names documented in
Chapter 8, but use abbreviated function names instead. The short function names are
not documented here, but can be found in a separate cross-reference listing (xref.txt).
For example, instead of using PDF_load_font( ) the short form PDLODFNT must be used.
PDFlib clients written in Cobol are statically linked to the PDFLBCOB object. It in turn
dynamically loads the PDLBDLCB Load Module (DLL), which in turn dynamically loads
the PDFlib Load Module (DLL) upon the first call to PDNEW (which corresponds to PDF_
new( )). The instance handle of the newly allocated PDFlib internal structure is stored in
the P parameter which must be provided to each call that follows.
The PDLBDLCB load module provides the interfaces between the 8-character Cobol
functions and the core PDFlib routines. It also provides the mapping between PDFlib’s
asynchronous exception handling and the monolithic »check each function’s return
code« method that Cobol expects.
Note PDLBDLCB and PDFLIB must be made available to the COBOL program through the use of a
Data types. The data types used in the PDFlib API reference must be mapped to Cobol
data types as in the following samples (taken from the hello example below):
05 PDFLIB-A4-WIDTH USAGE COMP-1 VALUE 5.95E+2. // float
05 WS-INT PIC S9(9) BINARY. // int
05 WS-FLOAT COMP-1. // float
05 WS-STRING PIC X(128). // const char *
05 P PIC S9(9) BINARY. // long *
05 RETURN-RC PIC S9(9) BINARY. // int *
All Cobol strings passed to the PDFlib API should be defined with one extra byte of stor-
age for the expected LOW-VALUES (NULL) terminator.
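The extra byte can be illustrated with a short Python sketch that models a fixed-size field such as PIC X(128). The helper name to_terminated_field, the use of ASCII, and the padding with null bytes are assumptions made for illustration only; on EBCDIC systems the character encoding differs.

```python
# Sketch: model a fixed-size field (e.g. PIC X(128)) that must hold a
# null-terminated string. Helper name and encoding are illustrative only.
def to_terminated_field(text, field_size=128):
    data = text.encode("ascii")
    # One byte of the field is reserved for the terminator.
    if len(data) > field_size - 1:
        raise ValueError(
            "text needs %d bytes plus a terminator, but the field has only %d"
            % (len(data), field_size)
        )
    # Pad the rest of the field with null bytes (LOW-VALUES).
    return data + b"\x00" * (field_size - len(data))


field = to_terminated_field("Hello.cbl")
print(len(field), field[:10])
```

With a field size of 128, at most 127 characters of text fit, since the last byte is reserved for the terminator.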
Return values. The return value of PDFlib API functions will be supplied in an addi-
tional ret parameter which is passed by reference. It will be filled with the result of the
respective function call. A zero return value means the function call executed just fine;
other values signal an error, and PDF generation cannot be continued.
Functions which do not return any result (C functions with a void return type) don’t
use this additional parameter.
Error handling. PDFlib exception handling is not available in the Cobol language bind-
ing. Instead, all API functions support an additional return code (rc) parameter which
signals errors. The rc parameter is passed by reference, and will be used to report prob-
lems. A non-zero value indicates that the function call failed.
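The general idea of mapping exceptions to return codes can be sketched in Python. This is not the code of the Cobol wrapper; the names LibraryError, with_return_code, and load_font as well as the error number are hypothetical and serve only to show the technique.

```python
# Sketch: translate exceptions into return codes, as a wrapper layer might do
# for a language without exception support. All names are hypothetical.
class LibraryError(Exception):
    def __init__(self, errnum, message):
        super().__init__(message)
        self.errnum = errnum


def with_return_code(func):
    """Wrap func so that it returns (rc, result) instead of raising."""
    def wrapper(*args, **kwargs):
        try:
            return 0, func(*args, **kwargs)   # rc 0: the call succeeded
        except LibraryError as exc:
            return exc.errnum, None           # non-zero rc: the call failed
    return wrapper


@with_return_code
def load_font(name):
    if not name:
        raise LibraryError(1234, "font name must not be empty")
    return 1   # pretend font handle


rc, font = load_font("Helvetica-Bold")
if rc != 0:
    print("call failed with rc", rc)
```

The caller checks rc after every call, which corresponds to the »check each function’s return code« style described above.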
2.2.2 The »Hello world« Example in Cobol
The following example shows a simple Cobol program which links against PDFlib. Note
that it does not do any error checking:
05 P PIC S9(9) BINARY.
05 WS-STRING PIC X(128).
05 WS-STRING2 PIC X(128).
...
STRING Z'Creator'
STRING Z'Hello.cbl'
STRING Z'Author'
STRING Z'Thomas Merz'
STRING Z'Hello, world (COBOL)!'
...
STRING Z'Helvetica-Bold'
STRING Z'ebcdic'
...
STRING Z'Hello, World!'
...
2.3 COM Binding
(This section is only included in the COM/.NET/REALbasic edition of the PDFlib manual.)
2.4 C Binding
2.4.1 Availability and Special Considerations for C
PDFlib itself is written in the ANSI C language. In order to use the PDFlib C binding, you
can use a static or shared library (DLL on Windows and MVS), and you need the central
PDFlib include file pdflib.h for inclusion in your PDFlib client source modules. Alterna-
tively, pdflibdl.h can be used for dynamically loading the PDFlib DLL at runtime (see Sec-
tion 2.4.3, »Using PDFlib as a DLL loaded at Runtime«, page 25).
2.4.2 The »Hello world« Example in C
The following example shows a simple C program which links against a static or shared/
dynamic PDFlib library:
#include <stdio.h>
#include <stdlib.h>
#include "pdflib.h"

int
main(void)
{
    PDF *p;
    int font;

    /* create a new PDFlib object */
    if ((p = PDF_new()) == (PDF *) 0) {
        printf("Couldn't create PDFlib object (out of memory)!\n");
        return 2;
    }

    PDF_TRY(p) {
        if (PDF_begin_document(p, "hello.pdf", 0, "") == -1) {
            printf("Error: %s\n", PDF_get_errmsg(p));
            return 2;
        }
        PDF_set_info(p, "Creator", "hello.c");
        PDF_set_info(p, "Author", "Thomas Merz");
        PDF_set_info(p, "Title", "Hello, world (C)!");
        PDF_begin_page_ext(p, a4_width, a4_height, "");
        /* Change "host" encoding to "winansi" or whatever you need! */
        font = PDF_load_font(p, "Helvetica-Bold", 0, "host", "");
        PDF_setfont(p, font, 24);
        PDF_set_text_pos(p, 50, 700);
        PDF_show(p, "Hello, world!");
        PDF_continue_text(p, "(says C)");
        PDF_end_page_ext(p, "");
        PDF_end_document(p, "");
    }
    PDF_CATCH(p) {
        printf("PDFlib exception occurred in hello sample:\n");
        printf("[%d] %s: %s\n",
            PDF_get_errnum(p), PDF_get_apiname(p), PDF_get_errmsg(p));
        PDF_delete(p);
        return 2;
    }

    /* delete the PDFlib object */
    PDF_delete(p);
    return 0;
}
2.4.3 Using PDFlib as a DLL loaded at Runtime
While most clients will use PDFlib as a statically bound library or a dynamic library
which is bound at link time, you can also load the PDFlib DLL at runtime and dynamical-
ly fetch pointers to all API functions. This is especially useful to load the PDFlib DLL only
on demand, and on MVS where the library is customarily loaded as a DLL at runtime
without explicitly linking against PDFlib. PDFlib supports a special mechanism to facili-
tate this dynamic usage. It works according to the following rules:
>Include pdflibdl.h instead of pdflib.h.
>Use PDF_new_dl( ) and PDF_delete_dl( ) instead of PDF_new( ) and PDF_delete( ).
>Use PDF_TRY_DL( ) and PDF_CATCH_DL( ) instead of PDF_TRY( ) and PDF_CATCH( ).
>Use function pointers for all other PDFlib calls.
>PDF_get_opaque( ) must not be used.
>Compile the auxiliary module pdflibdl.c and link your application against it.
Note Loading the PDFlib DLL at runtime is supported on selected platforms only.
The following example loads the PDFlib DLL at runtime using this technique:
#include <stdio.h>
#include <stdlib.h>
#include "pdflibdl.h"

int
main(void)
{
    PDF *p;
    int font;
    PDFlib_api *PDFlib;

    /* load the PDFlib dynamic library and create a new PDFlib object */
    if ((PDFlib = PDF_new_dl(&p)) == (PDFlib_api *) NULL) {
        printf("Couldn't create PDFlib object (DLL not found?)\n");
        return 2;
    }

    PDF_TRY_DL(PDFlib, p) {
        if (PDFlib->PDF_begin_document(p, "hellodl.pdf", 0, "") == -1) {
            printf("Error: %s\n", PDFlib->PDF_get_errmsg(p));
            return 2;
        }
        PDFlib->PDF_set_info(p, "Creator", "hello.c");
        PDFlib->PDF_set_info(p, "Author", "Thomas Merz");
        PDFlib->PDF_set_info(p, "Title", "Hello, world (C DLL)!");
        PDFlib->PDF_begin_page_ext(p, a4_width, a4_height, "");
        /* Change "host" encoding to "winansi" or whatever you need! */
        font = PDFlib->PDF_load_font(p, "Helvetica-Bold", 0, "host", "");
        PDFlib->PDF_setfont(p, font, 24);
        PDFlib->PDF_set_text_pos(p, 50, 700);
        PDFlib->PDF_show(p, "Hello, world!");
        PDFlib->PDF_continue_text(p, "(says C DLL)");
        PDFlib->PDF_end_page_ext(p, "");
        PDFlib->PDF_end_document(p, "");
    }
    PDF_CATCH_DL(PDFlib, p) {
        printf("PDFlib exception occurred in hellodl sample:\n");
        printf("[%d] %s: %s\n",
            PDFlib->PDF_get_errnum(p), PDFlib->PDF_get_apiname(p),
            PDFlib->PDF_get_errmsg(p));
        PDF_delete_dl(PDFlib, p);
        return 2;
    }

    /* delete the PDFlib object and unload the library */
    PDF_delete_dl(PDFlib, p);
    return 0;
}
2.4.4 Error Handling in C
PDFlib supports structured exception handling with try/catch clauses. This allows C and
C++ clients to catch exceptions which are thrown by PDFlib, and react to them
appropriately. In the catch clause the client has access to a string describing
the exact nature of the problem, a unique exception number, and the name of the
PDFlib API function which threw the exception. The general structure of a PDFlib C
client program with exception handling looks as follows:
PDF_TRY(p) {
    ...some PDFlib instructions...
} PDF_CATCH(p) {
    printf("PDFlib exception occurred in hello sample:\n");
    printf("[%d] %s: %s\n",
        PDF_get_errnum(p), PDF_get_apiname(p), PDF_get_errmsg(p));
    PDF_delete(p);
}
Note PDF_TRY/PDF_CATCH are implemented as tricky preprocessor macros. Accidentally omitting
one of these will result in compiler error messages which may be difficult to comprehend. Make
sure to use the macros exactly as shown above, with no additional code between the TRY and
CATCH clauses (except PDF_CATCH( )).
If you want to leave a try clause before its end you must first inform the exception
machinery using the PDF_EXIT_TRY( ) macro. No other PDFlib function must be called
between this macro and the end of the try block.
An important task of the catch clause is to clean up PDFlib internals using PDF_
delete( ) and the pointer to the PDFlib object. PDF_delete( ) will also close the output file if
necessary. PDFlib functions other than PDF_delete( ), PDF_get_opaque( ) and the excep-
tion functions PDF_get_errnum( ), PDF_get_apiname( ), and PDF_get_errmsg( ) must not be
called from within a client-supplied error handler. After fatal exceptions the PDF docu-
ment cannot be used, and will be left in an incomplete and inconsistent state. Obvious-
ly, the appropriate action when an exception occurs is completely application specific.
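The pattern described above (query the three pieces of error information, stop producing output, and always clean up) can be sketched in Python. PdfError and FakeDocument are hypothetical stand-ins rather than PDFlib classes, and the error number is made up for illustration.

```python
# Sketch: report an error and always clean up. Nothing here calls PDFlib.
class PdfError(Exception):
    """Hypothetical exception carrying number, API name, and message."""
    def __init__(self, errnum, apiname, errmsg):
        super().__init__(errmsg)
        self.errnum = errnum
        self.apiname = apiname
        self.errmsg = errmsg


class FakeDocument:
    """Stand-in for a PDF-generating object; not a real PDFlib class."""
    def __init__(self, fail=False):
        self.fail = fail
        self.deleted = False

    def show(self, text):
        if self.fail:
            raise PdfError(2516, "show", "simulated failure")

    def delete(self):
        self.deleted = True


def generate(doc):
    """Return None on success or a formatted error report on failure."""
    try:
        doc.show("Hello, world!")
        return None
    except PdfError as exc:
        # Only query the error details here; do not keep generating output.
        return "[%d] %s: %s" % (exc.errnum, exc.apiname, exc.errmsg)
    finally:
        doc.delete()   # clean up whether or not an exception occurred
```

How the report is presented (log file, dialog box, HTTP error page) is left to the application, as noted above.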
For C and C++ clients which do not catch exceptions, the default action upon exceptions
is to issue an appropriate message on the standard error channel, and exit on fatal
errors. The PDF output file will be left in an incomplete state! Since this may not be
adequate for a library routine, for serious PDFlib projects it is strongly advised to leverage
PDFlib’s exception handling facilities. A user-defined catch clause may, for example,
present the error message in a GUI dialog box, and take other measures instead of aborting.
Old-style error handlers. In addition to structured exception handling PDFlib also
supports the notion of a client-supplied callback function which will be called when an
exception occurs. However, this method is considered obsolete and supported for
compatibility reasons only. Error handlers will be ignored in PDF_TRY blocks.
2.4.5 Memory Management in C
In order to allow for maximum flexibility, PDFlib’s internal memory management rou-
tines (which are based on standard C malloc/free) can be replaced by external procedures
provided by the client. These procedures will be called for all PDFlib-internal memory
allocation or deallocation. Memory management routines can be installed with a call to
PDF_new2( ), and will be used in lieu of PDFlib’s internal routines. Either all or none of
the following routines must be supplied:
>an allocation routine
>a deallocation (free) routine
>a reallocation routine for enlarging memory blocks previously allocated with the al-
location routine.
The signatures of the memory routines can be found in Section 8.2, »General Func-
tions«, page 195. These routines must adhere to the standard C malloc/free/realloc se-
mantics, but may choose an arbitrary implementation. All routines will be supplied
with a pointer to the calling PDFlib object. The only exception to this rule is that the
very first call to the allocation routine will supply a PDF pointer of NULL. Client-provided
memory allocation routines must therefore be prepared to deal with a NULL PDF pointer.
Using the PDF_get_opaque( ) function, an opaque application specific pointer can be
retrieved from the PDFlib object. The opaque pointer itself is supplied by the client in
the PDF_new2( ) call. The opaque pointer is useful for multi-threaded applications which
may want to keep a pointer to thread- or class specific data inside the PDFlib object, for
use in memory management or error handling.
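The rules for client-supplied memory routines can be modeled in a few lines of Python. This is only a toy model: real routines would be C functions passed to PDF_new2( ), and the names check_hooks and CountingAllocator are hypothetical.

```python
# Sketch only: a toy model of client-supplied memory routines.
# All names are hypothetical; real routines would be C functions.

def check_hooks(alloc=None, realloc=None, free=None):
    """Return True if custom routines are in use, False if none were given.

    Raises ValueError when only some of the three routines are supplied.
    """
    supplied = [hook is not None for hook in (alloc, realloc, free)]
    if any(supplied) and not all(supplied):
        raise ValueError("supply all three memory routines or none of them")
    return all(supplied)


class CountingAllocator:
    """Tracks live blocks so that leaks become visible."""

    def __init__(self):
        self.live = {}
        self.next_id = 1

    def alloc(self, handle, size):
        # handle is None on the very first call, so it is not used here.
        block_id = self.next_id
        self.next_id += 1
        self.live[block_id] = bytearray(size)
        return block_id

    def realloc(self, handle, block_id, size):
        # Keep the old contents up to the smaller of the two sizes.
        old = self.live[block_id]
        self.live[block_id] = old[:size] + bytearray(max(0, size - len(old)))
        return block_id

    def free(self, handle, block_id):
        del self.live[block_id]
```

The handle argument plays the role of the PDF pointer: the allocation routine never relies on it, which keeps the first call with a missing handle safe.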
2.4.6 Unicode in the C language binding
Clients of the C language binding must take care not to use the standard text output
functions (PDF_show( ), PDF_show_xy( ), and PDF_continue_text( )) when the text may con-
tain embedded null characters. In such cases the alternate functions PDF_show2( ) etc.
must be used, and the length of the string must be supplied separately. This is not a
concern for all other language bindings since the PDFlib language wrappers internally
call PDF_show2( ) etc. in the first place.
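The effect of embedded null characters on C-style strings can be observed from Python with the standard ctypes module. The sketch below does not call PDFlib; the six sample bytes are just an example of text data that starts with a null byte.

```python
import ctypes

# Six bytes of text data whose first byte is a null byte.
data = b"\x00\x41\x96\x7B\x8C\xEA"

# create_string_buffer() copies the bytes and appends a terminating null.
buf = ctypes.create_string_buffer(data)

# .value stops at the first null byte, as a C string function would.
as_c_string = buf.value

# .raw ignores null bytes; slicing with an explicit length keeps all the data.
with_length = buf.raw[:len(data)]

print(len(as_c_string), len(with_length))
```

Reading the buffer as a null-terminated string yields nothing at all, while reading it with an explicit length returns all six bytes. This is the reason for passing the length separately.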
2.5 C++ Binding
2.5.1 Availability and Special Considerations for C++
In addition to the pdflib.h C header file, an object-oriented wrapper for C++ is supplied
for PDFlib clients. It requires the pdflib.hpp header file, which in turn includes pdflib.h.
The corresponding pdflib.cpp module should be linked against the application which in
turn should be linked against the generic PDFlib C library.
Using the C++ object wrapper replaces the PDF_ prefix in all PDFlib function names
with a more object-oriented approach. Keep this in mind when reading the PDFlib API
descriptions in this manual which are documented in C style.
2.5.2 The »Hello world« Example in C++
#include <iostream>
#include "pdflib.hpp"

using namespace std;

int
main(void)
{
    try {
        int font;
        PDFlib p;

        if (p.begin_document("hello.pdf", "") == -1) {
            cerr << "Error: " << p.get_errmsg() << endl;
            return 2;
        }
        p.set_info("Creator", "hello.cpp");
        p.set_info("Author", "Thomas Merz");
        p.set_info("Title", "Hello, world (C++)!");
        p.begin_page_ext((float) a4_width, (float) a4_height, "");
        // Change "host" encoding to "winansi" or whatever you need!
        font = p.load_font("Helvetica-Bold", "host", "");
        p.setfont(font, 24);
        p.set_text_pos(50, 700);
        p.show("Hello, world!");
        p.continue_text("(says C++)");
        p.end_page_ext("");
        p.end_document("");
    }
    catch (PDFlib::Exception &ex) {
        cerr << "PDFlib exception occurred in hello sample: " << endl;
        cerr << "[" << ex.get_errnum() << "] " << ex.get_apiname()
             << ": " << ex.get_errmsg() << endl;
        return 2;
    }
    return 0;
}
2.5.3 Error Handling in C++
PDFlib API functions will throw a C++ exception in case of an error. These exceptions
must be caught in the client code by using C++ try/catch clauses. In order to provide ex-
tended error information the PDFlib class provides a public PDFlib::Exception class which
exposes methods for retrieving the detailed error message, the exception number, and
the name of the PDFlib API function which threw the exception.
Native C++ exceptions thrown by PDFlib routines will behave as expected. The fol-
lowing code fragment will catch exceptions thrown by PDFlib:
try {
    ...some PDFlib instructions...
}
catch (PDFlib::Exception &ex) {
    cerr << "PDFlib exception occurred in hello sample: " << endl;
    cerr << "[" << ex.get_errnum() << "] " << ex.get_apiname()
         << ": " << ex.get_errmsg() << endl;
    return 2;
}
2.5.4 Memory Management in C++
Client-supplied memory management for the C++ binding works the same as with the C
language binding.
The PDFlib constructor accepts an optional error handler, optional memory manage-
ment procedures, and an optional opaque pointer argument. Default NULL arguments
are supplied in pdflib.hpp which will result in PDFlib’s internal error and memory man-
agement routines becoming active. All memory management functions must be »C«
functions, not C++ methods.
2.5.5 Unicode in the C++ language binding
C++ users must be aware of a pitfall related to the compiler automatically converting
literal strings to the C++ string type which is expected by the PDFlib API functions: this
conversion supports embedded null characters only if an explicit length parameter is
supplied. For example, the following will not work since the string will be truncated at
the first null character:
p.show("\x00\x41\x96\x7B\x8C\xEA"); // Wrong!
To fix this problem apply the string constructor with an explicit length parameter:
p.show(string("\x00\x41\x96\x7B\x8C\xEA", 6)); // Correct
2.6 Java Binding
Java supports a portable mechanism for attaching native language code to Java pro-
grams, the Java Native Interface (JNI). The JNI provides programming conventions for
calling native C or C++ routines from within Java code, and vice versa. Each C routine
has to be wrapped with the appropriate code in order to be available to the Java VM, and
the resulting library has to be generated as a shared or dynamic object in order to be
loaded into the Java VM.
PDFlib supplies JNI wrapper code for using the library from Java. This technique al-
lows us to attach PDFlib to Java by loading the shared library from the Java VM. The ac-
tual loading of the library is accomplished via a static member function in the pdflib
Java class. Therefore, the Java client doesn’t have to bother with the specifics of shared
library handling.
Taking into account PDFlib’s stability and maturity, attaching the native PDFlib li-
brary to the Java VM doesn’t impose any stability or security restrictions on your Java
application, while at the same time offering the performance benefits of a native imple-
mentation. Regarding portability remember that PDFlib is available for all platforms
where there is a Java VM!
2.6.1 Installing the PDFlib Java Edition
For the PDFlib binding to work, the Java VM must have access to the PDFlib Java wrapper
and the PDFlib Java package.
The PDFlib Java package. PDFlib is organized as a Java package with the following
package name:
com.pdflib
This package is available in the pdflib.jar file and contains a single class called pdflib. Us-
ing the source files provided in the PDFlib Lite distribution you can generate an abbrevi-
ated HTML version of the PDFlib API reference (this manual) using the javadoc utility
since the PDFlib class contains the necessary javadoc comments. Comments and restric-
tions for using PDFlib with specific Java environments may be found in text files in the
distribution set.
In order to supply this package to your application, you must add pdflib.jar to your
CLASSPATH environment variable, add the option -classpath pdflib.jar in your calls to the
Java compiler and runtime, or perform equivalent steps in your Java IDE. In the JDK you
can configure the Java VM to search for native libraries in a given directory by setting
the java.library.path property to the name of the directory, e.g.
java -Djava.library.path=. pdfclock
You can check the value of this property as follows:
System.out.println(System.getProperty("java.library.path"));
In addition, the following platform-dependent steps must be performed:
Unix. The library libpdf_java.so (on Mac OS X: libpdf_java.jnilib) must be placed in one of
the default locations for shared libraries, or in an appropriately configured directory.
Windows. The library pdf_java.dll must be placed in the Windows system directory, or
a directory which is listed in the PATH environment variable.
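The classpath and library path settings described above can be combined into one command line. The following Python sketch only assembles the argument list; the function name java_command and its defaults are hypothetical, and adding the current directory to the classpath is an assumption about where the compiled client class is located.

```python
import os

# Sketch: build a "java" command line that sets both the classpath (for
# pdflib.jar) and java.library.path (for the native wrapper library).
def java_command(main_class, native_dir=".", jar="pdflib.jar",
                 extra_classpath=(".",)):
    # os.pathsep is ":" on Unix and ";" on Windows, like Java's separator.
    classpath = os.pathsep.join((jar,) + tuple(extra_classpath))
    return [
        "java",
        "-Djava.library.path=" + native_dir,
        "-classpath", classpath,
        main_class,
    ]


print(" ".join(java_command("pdfclock")))
```

Returning a list instead of a single string avoids quoting problems when the command is passed to a process launcher.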
PDFlib servlets and Java application servers. PDFlib is perfectly suited for server-side
Java applications, especially servlets. The PDFlib distribution contains examples of
PDFlib Java servlets which demonstrate the basic use. When using PDFlib with a specific
servlet engine the following configuration issues must be observed:
>The directory where the servlet engine looks for native libraries varies among ven-
dors. Common candidate locations are system directories, directories specific to the
underlying Java VM, and local directories of the servlet engine. Please check the doc-
umentation supplied by the vendor of your servlet engine.
>Servlets are often loaded by a special class loader which may be restricted, or use a
dedicated classpath. For some servlet engines it is required to define a special engine
classpath to make sure that the PDFlib package will be found.
More detailed notes on using PDFlib with specific servlet engines and Java application
servers can be found in additional documentation in the PDFlib distribution.
Note Since the EJB (Enterprise Java Beans) specification disallows the use of native libraries, PDFlib
cannot be used within EJBs.
2.6.2 The »Hello world« Example in Java
import java.io.*;
import com.pdflib.pdflib;
import com.pdflib.PDFlibException;

public class hello
{
    public static void main (String argv[])
    {
        int font;
        pdflib p = null;

        try {
            p = new pdflib();
            if (p.begin_document("hello.pdf", "") == -1) {
                throw new Exception("Error: " + p.get_errmsg());
            }
            p.set_info("Creator", "hello.java");
            p.set_info("Author", "Thomas Merz");
            p.set_info("Title", "Hello world (Java)!");
            p.begin_page_ext(595, 842, "");
            font = p.load_font("Helvetica-Bold", "unicode", "");
            p.setfont(font, 18);
            p.set_text_pos(50, 700);
            p.show("Hello world!");
            p.continue_text("(says Java)");
            p.end_page_ext("");
            p.end_document("");
        } catch (PDFlibException e) {
            System.err.print("PDFlib exception occurred in hello sample:\n");
            System.err.print("[" + e.get_errnum() + "] " + e.get_apiname() +
                ": " + e.get_errmsg() + "\n");
        } catch (Exception e) {
            System.err.println(e.getMessage());
        } finally {
            if (p != null) {
                p.delete(); /* delete the PDFlib object */
            }
        }
    }
}
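The page size of 595 x 842 used in the example corresponds to the A4 format (210 mm x 297 mm), assuming that page dimensions are given in points with 72 points per inch. The Python sketch below shows the arithmetic; the helper name mm_to_points is hypothetical.

```python
# Sketch: convert page dimensions from millimetres to points.
# Assumption: 1 point = 1/72 inch and 1 inch = 25.4 mm.
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def mm_to_points(mm):
    """Return the length in points for a length given in millimetres."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


# A4 is 210 mm x 297 mm.
a4_width = mm_to_points(210)
a4_height = mm_to_points(297)
print(round(a4_width), round(a4_height))
```

The exact values are about 595.28 and 841.89 points, which round to the 595 and 842 used in the samples.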
2.6.3 Error Handling in Java
The Java binding installs a special error handler which translates PDFlib errors to native
Java exceptions. In case of an exception PDFlib will throw a native Java exception of the
following class:
PDFlibException
The Java exceptions can be dealt with by the usual try/catch technique:
try {
    ...some PDFlib instructions...
} catch (PDFlibException e) {
    System.err.print("PDFlib exception occurred in hello sample:\n");
    System.err.print("[" + e.get_errnum() + "] " + e.get_apiname() +
        ": " + e.get_errmsg() + "\n");
} catch (Exception e) {
    System.err.println(e.getMessage());
} finally {
    if (p != null) {
        p.delete(); /* delete the PDFlib object */
    }
}
Since PDFlib declares appropriate throws clauses, client code must either catch all possi-
ble PDFlib exceptions, or declare those itself.
2.7 .NET Binding
(This section is only included in the COM/.NET/REALbasic edition of the PDFlib manual.)
2.8 Perl Binding
Perl [1] supports a mechanism for extending the language interpreter via native C libraries.
The PDFlib wrapper for Perl consists of a C wrapper file and a Perl package module.
The C module is used to build a shared library which the Perl interpreter loads at run-
time, with some help from the package file. Perl scripts refer to the shared library mod-
ule via a use statement.
2.8.1 Installing the PDFlib Perl Edition
The Perl extension mechanism loads shared libraries at runtime through the DynaLoad-
er module. The Perl executable must have been compiled with support for shared librar-
ies (this is true for the majority of Perl configurations).
For the PDFlib binding to work, the Perl interpreter must access the PDFlib Perl wrap-
per and the module file pdflib_pl.pm. In addition to the platform-specific methods de-
scribed below you can add a directory to Perl’s @INC module search path using the -I
command line option:
perl -I/path/to/pdflib hello.pl
Unix. Perl will search for both pdflib_pl.so (on Mac OS X: pdflib_pl.dylib) and pdflib_pl.pm in
the current directory, or the directory printed by the following Perl command:
perl -e 'use Config; print $Config{sitearchexp};'
Perl will also search the subdirectory auto/pdflib_pl. Typical output of the above com-
mand looks like
Windows. PDFlib supports the ActiveState port of Perl 5 to Windows, also known as
ActivePerl [2]. Both pdflib_pl.dll and pdflib_pl.pm will be searched for in the current directory,
or the directory printed by the following Perl command:
perl -e "use Config; print $Config{sitearchexp};"
Typical output of the above command looks like
C:\Program Files\Perl5.8\site\lib
2.8.2 The »Hello world« Example in Perl
use pdflib_pl 6.0;

$p = PDF_new();

eval {
    if (PDF_begin_document($p, "hello.pdf", "") == -1) {
        printf("Error: %s\n", PDF_get_errmsg($p));
        exit;
    }
    PDF_set_info($p, "Creator", "hello.pl");
    PDF_set_info($p, "Author", "Thomas Merz");
    PDF_set_info($p, "Title", "Hello world (Perl)!");
    PDF_begin_page_ext($p, 595, 842, "");
    $font = PDF_load_font($p, "Helvetica-Bold", "winansi", "");
    PDF_setfont($p, $font, 24.0);
    PDF_set_text_pos($p, 50, 700);
    PDF_show($p, "Hello world!");
    PDF_continue_text($p, "(says Perl)");
    PDF_end_page_ext($p, "");
    PDF_end_document($p, "");
};
if ($@) {
    printf("hello: PDFlib Exception occurred:\n");
    printf(" $@\n");
    exit;
}
PDF_delete($p);

[1] See www.perl.com
[2] See www.activestate.com
2.8.3 Error Handling in Perl
The Perl binding installs a special error handler which translates PDFlib errors to native
Perl exceptions. The Perl exceptions can be dealt with by applying the appropriate lan-
guage constructs, i.e., by bracketing critical sections:
eval {
    ...some PDFlib instructions...
};
die "Exception caught" if $@;
2.9 PHP Binding
2.9.1 Installing the PDFlib PHP Edition
Detailed information about the various flavors and options for using PDFlib with PHP
(see www.php.net), including the question of whether or not to use a loadable PDFlib
module for PHP, can be found in the PDFlib-in-PHP-HowTo.pdf document, which is
available on the PDFlib Web site.
You must configure PHP so that it knows about the external PDFlib library. You have
two choices:
>Add one of the following lines in php.ini:
extension=libpdf_php.so ; for Unix
extension=libpdf_php.dll ; for Windows
PHP will search for the library in the directory specified in the extension_dir variable in
php.ini on Unix, and in the standard system directories on Windows. You can test
which version of the PHP PDFlib binding you have installed with the following
one-line PHP script:
<?php phpinfo(); ?>
This will display a long info page about your current PHP configuration. On this page
check the section titled pdf. If this section contains PDFlib GmbH Binary Version (and
the PDFlib version number) you are using the supported new PDFlib wrapper. The
unsupported old wrapper will display PDFlib GmbH Version instead.
>Load PDFlib at runtime with one of the following lines at the start of your script:
dl("libpdf_php.so"); # for Unix
dl("libpdf_php.dll"); # for Windows
PHP 5 features. PDFlib takes advantage of the following new features in PHP 5:
>New object model: the PDFlib functions are encapsulated within a PDFlib object.
>Exceptions: PDFlib exceptions will be propagated as PHP 5 exceptions, and can be
caught with the usual try/catch technique. New-style exception handling can be
used with both the new object-oriented approach and the old API functions.
See below for more details on these PHP 5 features.
Modified error return for PDFlib functions in PHP. Since PHP uses the convention of re-
turning the value 0 (FALSE) when an error occurs within a function, all PDFlib functions
have been adjusted to return 0 instead of -1 in case of an error. This difference is noted
in the function descriptions in Chapter 8. However, take care when reading the code
fragment examples in Chapter 3, »PDFlib Programming«, page 45, since these use the
usual PDFlib convention of returning -1 in case of an error.
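The difference between the two conventions can be captured in a small helper. The Python sketch below is purely illustrative: the names ERROR_RETURN and call_failed are hypothetical, and it assumes that every binding other than PHP reports errors with -1.

```python
# Sketch: decide whether a return value signals an error, depending on the
# language binding. PHP uses 0; the other bindings are assumed to use -1.
ERROR_RETURN = {"php": 0}
DEFAULT_ERROR_RETURN = -1


def call_failed(return_value, binding):
    """Return True if return_value is the error indicator for this binding."""
    return return_value == ERROR_RETURN.get(binding, DEFAULT_ERROR_RETURN)


print(call_failed(0, "php"), call_failed(0, "c"))
```

When porting a code fragment from another binding to PHP, each comparison against -1 has to become a comparison against 0.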
File name handling in PHP. Unqualified file names (without any path component) and
relative file names for PDF, image, font and other disk files are handled differently in
Unix and Windows versions of PHP:
>PHP on Unix systems will find files without any path component in the directory
where the script is located.
>PHP on Windows will find files without any path component only in the directory
where the PHP DLL is located.
In order to provide platform-independent file name handling, using PDFlib’s SearchPath
facility (see Section 3.1.6, »Resource Configuration and File Searching«, page 51) is
strongly recommended.
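The general idea behind a search path is to resolve unqualified file names against an explicit list of directories instead of relying on platform-specific defaults. The Python sketch below illustrates this idea only; it is not PDFlib’s SearchPath implementation, and the function name find_file and its exists parameter are hypothetical.

```python
import os

# Sketch: resolve a file name against an explicit list of directories.
# This illustrates the concept, not PDFlib's actual lookup rules.
def find_file(name, search_path, exists=os.path.isfile):
    """Return the first existing candidate path, or None if there is none."""
    if os.path.isabs(name):
        return name if exists(name) else None
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return None


print(find_file("no-such-file.pdf", ["/nonexistent-directory"]))
```

Because the directories are listed explicitly, the same script finds its files regardless of where the interpreter looks by default. The exists parameter makes the lookup easy to test without touching the disk.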
2.9.2 The »Hello world« Example in PHP
Example for PHP 4. The following sample works with PHP 4:
<?php
$p = PDF_new();

/* open new PDF file; insert a file name to create the PDF on disk */
if (PDF_begin_document($p, "", "") == 0) {
    die("Error: " . PDF_get_errmsg($p));
}
PDF_set_info($p, "Creator", "hello.php");
PDF_set_info($p, "Author", "Rainer Schaaf");
PDF_set_info($p, "Title", "Hello world (PHP)!");
PDF_begin_page_ext($p, 595, 842, "");
$font = PDF_load_font($p, "Helvetica-Bold", "winansi", "");
PDF_setfont($p, $font, 24.0);
PDF_set_text_pos($p, 50, 700);
PDF_show($p, "Hello world!");
PDF_continue_text($p, "(says PHP)");
PDF_end_page_ext($p, "");
PDF_end_document($p, "");

/* fetch the generated PDF from memory and send it to the browser */
$buf = PDF_get_buffer($p);
$len = strlen($buf);
header("Content-type: application/pdf");
header("Content-Length: $len");
header("Content-Disposition: inline; filename=hello.pdf");
print $buf;

PDF_delete($p);
?>
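The three HTTP headers sent in the sample depend only on the generated buffer and a file name. The following Python sketch builds the same headers for an in-memory PDF; the function name pdf_response_headers is hypothetical and the sample bytes merely stand in for real PDF output.

```python
# Sketch: HTTP headers for delivering an in-memory PDF, mirroring the
# header() calls in the sample above. The function name is hypothetical.
def pdf_response_headers(buf, filename="hello.pdf"):
    return [
        ("Content-type", "application/pdf"),
        # The length must be the number of bytes, not of characters.
        ("Content-Length", str(len(buf))),
        ("Content-Disposition", "inline; filename=" + filename),
    ]


for name, value in pdf_response_headers(b"%PDF-1.5\n"):
    print(name + ": " + value)
```

The headers must be sent before the first byte of the PDF data, which is why the sample fetches the complete buffer before producing any output.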
Example for PHP 5. The following sample uses the new exception handling and object
encapsulation features available in PHP 5:
<?php
try {
    $p = new PDFlib();

    /* open new PDF file; insert a file name to create the PDF on disk */
    if ($p->begin_document("", "") == 0) {
        die("Error: " . $p->get_errmsg());
    }
    $p->set_info("Creator", "hello.php");
    $p->set_info("Author", "Rainer Schaaf");
    $p->set_info("Title", "Hello world (PHP)!");
    $p->begin_page_ext(595, 842, "");
    $font = $p->load_font("Helvetica-Bold", "winansi", "");
    $p->setfont($font, 24.0);
    $p->set_text_pos(50, 700);
    $p->show("Hello world!");
    $p->continue_text("(says PHP)");
    $p->end_page_ext("");
    $p->end_document("");

    /* fetch the generated PDF from memory and send it to the browser */
    $buf = $p->get_buffer();
    $len = strlen($buf);
    header("Content-type: application/pdf");
    header("Content-Length: $len");
    header("Content-Disposition: inline; filename=hello.pdf");
    print $buf;
}
catch (PDFlibException $e) {
    die("PDFlib exception occurred in hello sample:\n" .
        "[" . $e->get_errnum() . "] " . $e->get_apiname() . ": " .
        $e->get_errmsg() . "\n");
}
catch (Exception $e) {
    die($e);
}
$p = 0;
?>
2.9.3 Error Handling in PHP
Error handling in PHP 4. When a PDFlib exception occurs, a PHP exception is thrown.
Since PHP 4 does not support structured exception handling there is no way to catch ex-
ceptions and act appropriately. Do not disable PHP warnings when using PDFlib, or you
will run into serious trouble.
PDFlib warnings (nonfatal errors) are mapped to PHP warnings, which can be dis-
abled in php.ini. Alternatively, warnings can be disabled at runtime with a PDFlib func-
tion call like in any other language binding:
PDF_set_parameter($p, "warning", "false");
Exception handling in PHP 5. Since PHP 5 supports structured exception handling,
PDFlib exceptions will be propagated as PHP exceptions. PDFlib will throw an exception
of the class PDFlibException, which is derived from PHP’s standard Exception class. You
can use the standard try/catch technique to deal with PDFlib exceptions:
try {
    ...some PDFlib instructions...
} catch (PDFlibException $e) {
    print "PDFlib exception occurred:\n";
    print "[" . $e->get_errnum() . "] " . $e->get_apiname() . ": " .
        $e->get_errmsg() . "\n";
} catch (Exception $e) {
    print $e;
}
Note that you can use PHP 5-style exception handling regardless of whether you work
with the old function-based PDFlib interface, or the new object-oriented one.
2.10 Python Binding
2.10.1 Installing the PDFlib Python Edition
The Python¹ extension mechanism works by loading shared libraries at runtime. For the
PDFlib binding to work, the Python interpreter must have access to the PDFlib Python
wrapper shared library.
Unix. The library pdflib_py.so (on Mac OS X: pdflib_py.dylib) will be searched in the direc-
tories listed in the PYTHONPATH environment variable.
Windows. The library pdflib_py.dll will be searched in the directories listed in the
PYTHONPATH environment variable.
2.10.2 The »Hello world« Example in Python
from sys import *
from pdflib_py import *

p = PDF_new()

# open new PDF file; stop if the output file cannot be created
if PDF_begin_document(p, "hello.pdf", "") == -1:
    print("Error: " + PDF_get_errmsg(p) + "\n")
    exit(2)

PDF_set_info(p, "Author", "Thomas Merz")
PDF_set_info(p, "Creator", "hello.py")
PDF_set_info(p, "Title", "Hello world (Python)")

PDF_begin_page_ext(p, 595, 842, "")

font = PDF_load_font(p, "Helvetica-Bold", "winansi", "")
PDF_setfont(p, font, 24)
PDF_set_text_pos(p, 50, 700)
PDF_show(p, "Hello world!")
PDF_continue_text(p, "(says Python)")
PDF_end_page_ext(p, "")

PDF_end_document(p, "")
PDF_delete(p)
2.10.3 Error Handling in Python
The Python binding installs a special error handler which translates PDFlib errors to native
Python exceptions. The Python exceptions can be dealt with by the usual try/except
technique:
try:
    ...some PDFlib instructions...
except:
    print('Exception caught!')
1. See www.python.org
2.11 REALbasic Binding¹
(This section is only included in the COM/.NET/REALbasic edition of the PDFlib manual.)
1. See www.realbasic.com
2.12 RPG Binding
PDFlib provides a /copy module that defines all prototypes and some useful constants
needed to compile ILE-RPG programs with embedded PDFlib functions.
Since all functions provided by PDFlib are implemented in the C language, you have
to add x'00' at the end of each string value passed to a PDFlib function. All strings re-
turned from PDFlib will have this terminating x'00' as well.
2.12.1 Compiling and Binding RPG Programs for PDFlib
Using PDFlib functions from RPG requires the compiled PDFlib service program. To in-
clude the PDFlib definitions at compile time you have to specify the name in the D
specs of your ILE-RPG program:
If the PDFlib source file library is not on top of your library list you have to specify the li-
brary as well:
Before you start compiling your ILE-RPG program you have to create a binding directory
that includes the PDFLIB service program shipped with PDFlib. The following example
assumes that you want to create a binding directory called PDFLIB in the library PDFLIB:
After creating the binding directory you need to add the PDFLIB service program to your
binding directory. The following example assumes that you want to add the service pro-
gram PDFLIB in the library PDFLIB to the binding directory created earlier.
Now you can compile your program using the CRTBNDRPG command (or option 14 in
2.12.2 The »Hello world« Example in RPG
d p S *
d font s 10i 0
d error s 50
d errmsg_p s *
d errmsg s 200 based(errmsg_p)
d filename s 256
d fontname s 50
d fontenc s 50
d infokey s 50
d infoval s 200
d text s 200
d n s 1 inz(x'00')
d empty s 1 inz(x'00')
c clear error
* Init on PDFlib
c eval p=pdf_new
c if p=*null
c eval error='Couldn''t create PDFlib object '+
c '(out of memory)!'
c exsr exit
c endif
* Open new pdf file
c eval filename='hello.pdf'+x'00'
c if PDF_begin_document(p:filename:0:empty) = -1
c exsr geterrmsg
c exsr exit
c endif
* This is required to avoid problems on Japanese systems
c eval infokey='hypertextencoding'+x'00'
c eval infoval='ebcdic'+x'00'
c callp PDF_set_parameter(p:infokey:infoval)
* Set info "Creator"
c eval infokey='Creator'+x'00'
c eval infoval='hello.rpg'+x'00'
c callp PDF_set_info(p:infokey:infoval)
* Set info "Author"
c eval infokey='Author'+x'00'
c eval infoval='Thomas Merz'+x'00'
c callp PDF_set_info(p:infokey:infoval)
* Set info "Title"
c eval infokey='Title'+x'00'
c eval infoval='Hello, world (RPG)!'+x'00'
c callp PDF_set_info(p:infokey:infoval)
c callp PDF_begin_page_ext(p:a4_width:a4_height:
c empty)
c eval fontname='Helvetica-Bold'+x'00'
c eval fontenc='ebcdic'+x'00'
c eval font=PDF_load_font(p:fontname:0:fontenc:n)
c callp PDF_setfont(p:font:24)
c callp PDF_set_text_pos(p:50:700)
c eval text='Hello world!'+x'00'
c callp PDF_show(p:text)
c eval text='(says ILE RPG)'+x'00'
c callp PDF_continue_text(p:text)
c callp PDF_end_page_ext(p:empty)
c callp PDF_end_document(p:empty)
c callp PDF_delete(p)
c exsr exit
c geterrmsg begsr
c eval errmsg_p=PDF_get_errmsg(p)
c if errmsg_p<>*NULL
c eval error=%subst(errmsg:1:%scan(x'00':errmsg)-1)
c endif
c endsr
c exit begsr
c if error<>*blanks
c eval error='Error: '+error
c error dsply
c endif
c seton lr
c return
c endsr
You can compile this program as follows:
2.12.3 Error Handling in RPG
PDFlib clients written in ILE-RPG can install an error handler in PDFlib which will be ac-
tivated when an exception occurs. Since ILE-RPG translates all procedure names to up-
percase, the name of the error handler procedure should be specified in uppercase. The
following skeleton demonstrates this technique:
d p S *
d font s 10i 0
d error s 50
d errhdl s * procptr
* Prototype for exception handling procedure
d errhandler PR
d p * value
d type 10i 0 value
d shortmsg 2048
c clear error
* Set the procedure pointer to the ERRHANDLER procedure.
c eval errhdl=%paddr('ERRHANDLER')
c eval p=pdf_new2(errhdl:*null:*null:*null:*null)
...PDFlib instructions...
c callp PDF_delete(p)
c exsr exit
c exit begsr
c if error<>*blanks
c error dsply
c endif
c seton lr
c return
c endsr
* If any of the PDFlib functions will cause an exception, first the error handler
* will be called and after that we will get a regular RPG exception.
c *pssr begsr
c exsr exit
c endsr
* Exception Handler Procedure
* This procedure will be linked to PDFlib by passing the procedure pointer to
* PDF_new2. This procedure will be called when a PDFlib exception occurs.
p errhandler B
d errhandler PI
d p * value
d type 10i 0 value
d c_message 2048
d length s 10i 0
* Chop off the trailing x'00' (we are called by a C program)
* and set the error (global) string
c clear error
c x'00' scan c_message length 50
c sub 1 length
c if *in50 and length>0
c if length>%size(error)
c eval error=c_message
c else
c eval error=%subst(c_message:1:length)
c endif
c endif
* Always call PDF_delete to clean up PDFlib
c callp PDF_delete(p)
c return
p errhandler E
2.13 Tcl Binding
2.13.1 Installing the PDFlib Tcl Edition
The Tcl¹ extension mechanism works by loading shared libraries at runtime. For the
PDFlib binding to work, the Tcl shell must have access to the PDFlib Tcl wrapper shared
library and the package index file pkgIndex.tcl. You can use the following idiom in your
script to make the library available from a certain directory (this may be useful if you
want to deploy PDFlib on a machine where you don’t have root privilege for installing
PDFlib):
lappend auto_path /path/to/pdflib
Unix. The library pdflib_tcl.so (on Mac OS X: pdflib_tcl.dylib) must be placed in one of
the default locations for shared libraries, or in an appropriately configured directory.
Usually both pkgIndex.tcl and pdflib_tcl.so will be placed in the directory
Windows. The files pkgIndex.tcl and pdflib_tcl.dll will be searched for in the directories
C:\Program Files\Tcl\lib\pdflib
C:\Program Files\Tcl\lib\tcl8.3\pdflib
2.13.2 The »Hello world« Example in Tcl
package require pdflib 6.0

set p [PDF_new]

# open new PDF file; stop if the output file cannot be created
if {[PDF_begin_document $p "hello.pdf" ""] == -1} {
    puts stderr "Error: [PDF_get_errmsg $p]"
    exit
}

PDF_set_info $p "Creator" "hello.tcl"
PDF_set_info $p "Author" "Thomas Merz"
PDF_set_info $p "Title" "Hello world (Tcl)"

PDF_begin_page_ext $p 595 842 ""

set font [PDF_load_font $p "Helvetica-Bold" "unicode" ""]
PDF_setfont $p $font 24.0
PDF_set_text_pos $p 50 700
PDF_show $p "Hello world!"
PDF_continue_text $p "(says Tcl)"
PDF_end_page_ext $p ""

PDF_end_document $p ""
PDF_delete $p
1. See www.tcl.tk
2.13.3 Error Handling in Tcl
The Tcl binding installs a special error handler which translates PDFlib errors to native
Tcl exceptions. The Tcl exceptions can be dealt with by the usual catch technique:
if [ catch { ...some PDFlib instructions... } result ] {
    puts stderr "Exception caught!"
    puts stderr $result
}
3 PDFlib Programming
3.1 General Programming
3.1.1 PDFlib Program Structure and Function Scopes
PDFlib applications must obey certain structural rules which are very easy to under-
stand. Writing applications according to these restrictions is straightforward. For exam-
ple, you don’t have to think about opening a document first before closing it. Since the
PDFlib API is very closely modelled after the document/page paradigm, generating doc-
uments the »natural« way usually leads to well-formed PDFlib client programs.
PDFlib enforces correct ordering of function calls with a strict scoping system. The
function descriptions specify the allowed scope for a particular function. Calling a func-
tion from a different scope will trigger a PDFlib exception. PDFlib will also throw an ex-
ception if bad parameters are supplied by a library client.
The function descriptions in Chapter 8 reference these scopes; the scope definitions
can be found in Table 3.1, and Figure 3.1 depicts the nesting of scopes. You can query
the current scope with the scope parameter.
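The scoping rules can be pictured as a small state machine. The following Python sketch is a toy model only and not part of PDFlib: the names ScopeTracker, TRANSITIONS and ScopeError are hypothetical, only a handful of functions are modelled, and each function is assumed to be legal in exactly one scope, which is stricter than the real library.

```python
class ScopeError(Exception):
    """Raised when a function is called from a scope that does not allow it."""


class ScopeTracker:
    # function name -> (scope required before the call, scope after the call)
    # Simplified: the real library allows many functions in several scopes.
    TRANSITIONS = {
        "PDF_begin_document": ("object", "document"),
        "PDF_begin_page_ext": ("document", "page"),
        "PDF_moveto": ("page", "path"),
        "PDF_stroke": ("path", "page"),
        "PDF_end_page_ext": ("page", "document"),
        "PDF_end_document": ("document", "object"),
    }

    def __init__(self):
        # Scope right after the PDFlib object has been created.
        self.scope = "object"

    def call(self, function):
        """Check the current scope, then switch to the scope after the call."""
        required, following = self.TRANSITIONS[function]
        if self.scope != required:
            raise ScopeError(
                "%s called in %s scope, allowed only in %s scope"
                % (function, self.scope, required)
            )
        self.scope = following
        return self.scope
```

In this model, calling PDF_end_document while a page is still open raises ScopeError and leaves the scope unchanged, which mirrors the exception described above.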
3.1.2 Parameters
PDFlib’s operation can be controlled by a variety of global parameters. These will retain
their settings across the life span of the PDFlib object, or until they are explicitly
changed by the client. The following functions can be used for dealing with parameters:
>PDF_set_parameter( ) can be used to set parameters of type string.
>PDF_set_value( ) can be used to set parameters with numerical values.
>PDF_get_parameter( ) can be used to query parameters of type string.
>PDF_get_value( ) can be used to query the values of numerical parameters.
Details of parameter names and possible values can be found in Chapter 8.
Table 3.1 Function scope definitions
scope name   definition
path         started by one of PDF_moveto( ), PDF_circle( ), PDF_arc( ), PDF_arcn( ), or
             PDF_rect( ); terminated by any of the functions in Section 8.4.6, »Path
             Painting and Clipping«, page 246
page         between PDF_begin_page( ) and PDF_end_page( ), but outside of path scope
template     between PDF_begin_template( ) and PDF_end_template( ), but outside of path
             scope
pattern      between PDF_begin_pattern( ) and PDF_end_pattern( ), but outside of path
             scope
font         between PDF_begin_font( ) and PDF_end_font( ), but outside of glyph scope
glyph        between PDF_begin_glyph( ) and PDF_end_glyph( ), but outside of path scope
document     between PDF_begin_document( ) and PDF_end_document( ), but outside of page,
             template, pattern, and font scope
object       in Java: the lifetime of the pdflib object, but outside of document scope;
             in other bindings between PDF_new( ) and PDF_delete( ), but outside of
             document scope
null         outside of object scope
any          when a function description mentions »any« scope it actually means any
             except null, since a PDFlib object doesn’t even exist in null scope.
3.1.3 Exception Handling
Errors of a certain kind are called exceptions in many languages for good reasons: they
are mere exceptions, and are not expected to occur very often during the lifetime of a
program. The general strategy is to use conventional error reporting mechanisms (read:
special error return codes) for function calls which may go wrong often, and use a
special exception mechanism for those rare occasions which don’t warrant cluttering
the code with conditionals. This is exactly the path that PDFlib takes. Some operations
can be expected to go wrong rather frequently, for example:
>Trying to open an output file for which one doesn’t have permission
>Trying to open an input PDF with a wrong file name
>Trying to open a corrupt image file
PDFlib signals such errors by returning a special value (usually -1, but 0 in the PHP
binding) as documented in the API reference. Other events may be considered harmful,
but will occur rather infrequently, e.g.
>running out of virtual memory
>scope violations (e.g., closing a document before opening it)
>supplying wrong parameters to PDFlib API functions (e.g., trying to draw a circle with
negative radius)
When PDFlib detects such a situation, an exception will be thrown instead of passing a
special error return value to the caller. In the C programming language, which doesn’t
natively support exceptions, the client can install a custom routine (called an error han-
dler) which will be called in case of an exception. However, the recommended method is
to make use of PDF_TRY( )/PDF_CATCH( ) blocks as detailed in Section 2.4.4, »Error Hand-
ling in C«, page 26.
Fig. 3.1 Nesting of scopes (diagram not reproduced; the scope definitions are listed in Table 3.1)
It is important to understand that the generated PDF document cannot be finished
after an exception occurred. The only methods which can safely be called after an ex-
ception are PDF_delete( ), PDF_get_apiname( ), PDF_get_errnum( ), and PDF_get_errmsg( ).
Calling any other PDFlib method after an exception may lead to unexpected results. The
exception (or data passed to the C error handler) will contain the following information:
>A unique error number (see Table 3.2);
>The name of the PDFlib API function which caused the exception;
>A descriptive text containing details of the problem;
Disabling exceptions. Some exceptions can be disabled. These fall into two categories:
non-fatal errors (warnings) and errors which may or may not justify an exception de-
pending on client preferences.
Warnings generally indicate some problem in your PDFlib code which you should in-
vestigate more closely. However, processing may continue in case of non-fatal errors.
For this reason, you can suppress warnings using the following function call:
PDF_set_parameter(p, "warning", "false");
In addition to the global warning parameter, some functions also support dedicated op-
tions for enabling or disabling warnings for individual function calls. The suggested
strategy is to enable warnings during the development cycle (and closely examine pos-
sible warnings), and disable warnings in a production system.
Certain operations may be considered fatal for some clients, while others are pre-
pared to deal with the situation. In these cases the behavior of the respective PDFlib API
function changes according to a parameter. This distinction is implemented for loading
fonts, images, imported PDF documents, and ICC profiles. For example, if a font cannot
be loaded due to some configuration problem, one client may simply give up, while an-
other may choose another font instead. When the parameter or option fontwarning is
set to true, an exception will be thrown when the font cannot be loaded. Otherwise the
function will return an error code instead. The parameter can be set as follows:
PDF_set_parameter(p, "fontwarning", "false");
Querying the reason of a failed function call. As noted above, the generated PDF out-
put document must always be abandoned when an exception occurs. Some clients,
however, may prefer to continue the document by adjusting some parameters. For ex-
ample, when a particular font cannot be loaded most clients will give up the document,
while others may prefer to work with a different font. This distinction can be achieved
with the fontwarning etc. parameters. In this case it may be desirable to retrieve the er-
ror message that would have been part of the exception. In this situation the functions
PDF_get_errnum( ), PDF_get_errmsg( ), and PDF_get_apiname( ) may be called immediately
after a failed function call, i.e., a function call which returned a -1 (in PHP: 0) error
code.
Table 3.2 Ranges of PDFlib exception numbers
error ranges   reasons
1000-1999      (PDCORE library): memory, I/O, arguments, parameters/values, options
2000-3999      (PDFlib library): configuration, scoping, graphics and text, color, images,
               fonts, encodings, PDF/X, hypertext, Tagged PDF, layers
4000-4999      (PDF import library PDI): configuration and parameter, corrupt PDF (file,
               object, or stream level)
The following code fragments summarize different strategies with respect to exception
handling. The examples try to load and embed a font, assuming that this font is not
available.
If the fontwarning parameter or option is true (which is the default) the document
must be abandoned:
font = PDF_load_font(p, "MyFontName", 0, "winansi", "fontwarning=true");
/* unless an exception was thrown the font handle is valid;
 * when an exception occurred the PDF output cannot be continued
 */
If the fontwarning parameter or option is false the return value must be checked for va-
lidity. If it indicates failure, the reason of the failure can be queried in order to properly
deal with the situation:
font = PDF_load_font(p, "MyFontName", 0, "winansi", "fontwarning=false");
if (font == -1) {
    /* font handle is invalid; find out what happened. */
    errmsg = PDF_get_errmsg(p);
    /* Log error message */
    /* Try a different font or give up */
} else {
    /* font handle is valid; continue as usual */
}
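The second strategy extends naturally to trying several fonts in turn. The following Python sketch keeps PDFlib out of the picture so that it can be run on its own: load_font and get_errmsg are caller-supplied callables standing in for PDF_load_font( ) (called with fontwarning=false) and PDF_get_errmsg( ), and the helper name load_first_available is hypothetical.

```python
def load_first_available(load_font, get_errmsg, candidates, failure=-1):
    """Return (handle, name, errors) for the first candidate font that loads.

    load_font(name) must return `failure` instead of raising, which
    corresponds to loading with fontwarning=false. `failure` is -1 in
    most bindings and 0 in the PHP binding.
    """
    errors = []
    for name in candidates:
        handle = load_font(name)
        if handle != failure:
            return handle, name, errors
        # Query the reason immediately after the failed call.
        errors.append((name, get_errmsg()))
    # No candidate could be loaded; the caller decides whether to give up.
    return failure, None, errors
```

With the Python binding the two callables could be small lambdas wrapping PDF_load_font and PDF_get_errmsg for a given PDFlib object; the collected error messages can then be logged before the client decides whether to continue or abandon the document.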
3.1.4 Option Lists
Option lists are a powerful yet easy method to control PDFlib operations. Instead of
requiring a multitude of function parameters, many PDFlib API methods support option
lists, or optlists for short. These are strings which may contain an arbitrary number
of options. Since option lists are evaluated from left to right, an option can be
supplied more than once within the same list; in this case the last occurrence overrides
earlier ones. Optlists support various data types and composite data like arrays. In most
languages optlists can easily be constructed by concatenating the required keywords and
values. C programmers may want to use the sprintf( ) function in order to construct
option lists.
An optlist is a string containing one or more pairs of the form
name value
Names and values, as well as multiple name/value pairs, can be separated by arbitrary
whitespace characters (space, tab, carriage return, newline). The value may consist of a
list of multiple values. You can also use an equal sign ’=’ between name and value:
name=value
Simple values. Simple values may use any of the following data types:
>Boolean: true or false; if the value of a boolean option is omitted, the value true is
assumed. As a shorthand notation noname can be used instead of name=false.
>String: these are plain ASCII strings which are generally used for non-localizable
keywords. Strings containing whitespace or ’=’ characters must be bracketed with { and }.
An empty string can be constructed with {}. The characters {, }, and \ must be preceded
by an additional \ character if they are supposed to be part of the string.
>Content strings, hypertext strings and name strings: these can hold Unicode content
in various formats; for details on these string types see Section 4.5, »Unicode
Support«, page 95.
>Unichar: these are single Unicode characters, where several syntax variants are sup-
ported: decimal values (e.g. 173), hexadecimal values prefixed with x, X, 0x, 0X, or U+
(xAD, 0xAD, U+00AD), numerical or character references according to Section 4.5.5,
»Character References«, page 100, but without the ’&’ and ’;’ decoration (shy, #xAD,
#173). Unichars must be in the range 0-65535 (0-xFFFF).
>Keyword: one of a predefined list of fixed keywords
>Float and integer: decimal floating point or integer numbers; point and comma can
be used as decimal separators for floating point values. Integer values can start with
x, X, 0x, or 0X to specify hexadecimal values. Some options (this is stated in the re-
spective documentation) support percentages by adding a % character directly after
the value.
>Handle: several PDFlib-internal object handles, e.g., font, image, or action handles.
Technically these are integer values.
Depending on the type and interpretation of an option additional restrictions may ap-
ply. For example, integer or float options may be restricted to a certain range of values;
handles must be valid for the corresponding type of object, etc. Conditions for options
are documented in their respective function descriptions in Chapter 8. Some examples
for simple values (the first line shows a password string containing a blank character):
PDF_open_pdi( ): password {secret string}
PDF_create_gstate( ): linewidth 0.5 blendmode overlay opacityfill 0.75
PDF_load_font( ): embedding=true subsetting=true subsetlimit=50 kerning=false
PDF_load_font( ): embedding subsetting subsetlimit=50 nokerning
PDF_create_textflow( ): leading=150%
PDF_create_textflow( ): charmapping={ 0x0A 0x20 }
List values. List values consist of multiple values, which may be simple values or list
values in turn. Lists are bracketed with { and }. Some examples for list values:
PDF_fit_image( ): boxsize={500 600} position={50 0}
PDF_create_gstate( ): dasharray={11 22 33}
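Because optlists are plain strings, they are easy to assemble programmatically. The following Python sketch applies the quoting rules for simple values and lists described above; the helper names quote_string, format_value and build_optlist are hypothetical and not part of any PDFlib binding, and floats are written with at most six decimal places, which is a simplification.

```python
def quote_string(text):
    """Quote a string value following the rules for optlist strings."""
    # {, } and \ must be preceded by an additional backslash.
    escaped = "".join("\\" + ch if ch in "{}\\" else ch for ch in text)
    # Empty strings and strings with whitespace or '=' need braces.
    needs_braces = text == "" or "=" in text or any(ch.isspace() for ch in text)
    return "{" + escaped + "}" if needs_braces else escaped


def format_value(value):
    """Convert a Python value to its optlist representation."""
    if isinstance(value, bool):  # must be tested before int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, float):
        return ("%f" % value).rstrip("0").rstrip(".")
    if isinstance(value, (list, tuple)):
        return "{" + " ".join(format_value(item) for item in value) + "}"
    return quote_string(str(value))


def build_optlist(options):
    """Join (name, value) pairs into one option list string."""
    return " ".join(
        "%s=%s" % (name, format_value(value)) for name, value in options
    )
```

For example, build_optlist([("boxsize", [500, 600]), ("position", [50, 0])]) yields boxsize={500 600} position={50 0}, matching the PDF_fit_image( ) example above. Passing pairs rather than a dictionary keeps the left-to-right order explicit, which matters because later occurrences of an option override earlier ones.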
Rectangles. A rectangle is a list of four float values specifying the coordinates of the
lower left and upper right corners of a rectangle. The coordinate system for interpreting
the rectangle coordinates (standard or user coordinate system) varies depending on the
option, and is documented separately. Example:
PDF_begin_document( ): cropbox {0 0 500 600}
Action lists. An action list specifies one or more actions. Each entry in the list consists
of an event keyword (trigger) and a list of action handles which must have been created
with PDF_create_action( ). Actions will be performed in the listed order. The set of
allowed events (e.g. docopen) and the type of actions (e.g. JavaScript) are documented
separately for the respective option. Examples (assuming the values 0, 1, and 2 have been
returned by earlier calls to PDF_create_action( )):
PDF_begin_document( ): action {open 0}
PDF_create_bookmark( ): action {activate {0 1 2}}
PDF_create_field( ): action {keystroke 0 format 1 validate 2}
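An action option can be assembled from event keywords and action handles in the same way. The Python sketch below reproduces the notation of the examples above, writing a single handle bare and several handles as a braced list; the function name format_action_list is hypothetical, and the handles are assumed to be the integers returned by PDF_create_action( ).

```python
def format_action_list(events):
    """Build an action option from (event keyword, [action handles]) pairs.

    The pairs are emitted in the given order; handles within one event
    are kept in list order, since actions are performed in that order.
    """
    parts = []
    for event, handles in events:
        if not handles:
            raise ValueError("event %r needs at least one action handle" % event)
        if len(handles) == 1:
            parts.append("%s %d" % (event, handles[0]))
        else:
            parts.append("%s {%s}" % (event, " ".join("%d" % h for h in handles)))
    return "action {" + " ".join(parts) + "}"
```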
Color values. Color values are lists consisting of a color space keyword and a list with a
variable number of float values depending on the particular color space. Color space
keywords are the same as for PDF_setcolor( ) (see Section 8.5.1, »Setting Color and Color
Space«, page 251), the possible values are explained in Section 3.3.1, »Color and Color
Spaces«, page 63:
>The color space keywords gray, rgb, and cmyk can be supplied along with one, three, or
four float values.
>The color space keyword lab can be supplied along with three float values.
>The color space keyword spot can be supplied along with a spot color handle. Alterna-
tively, the color space keyword spotname can be supplied along with a spot color
name and a float value containing the color tint.
>The color space keywords iccbasedgray, iccbasedrgb, and iccbasedcmyk can be supplied
along with one, three, or four float values.
>The color space keyword none can be supplied to specify the absence of color.
As detailed in the respective function descriptions in Chapter 8, a particular option list
may only supply a subset of the keywords presented above. Some examples for color
values:
PDF_fill_textblock( ): strokecolor={ rgb 1 0 0 }
PDF_fill_textblock( ): bordercolor=none
PDF_fill_textblock( ): fillcolor={ spotname {PANTONE 281 U} 0.5 }
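The number of float values expected for each color space keyword can be checked before the option list is handed to PDFlib. The following Python sketch covers the keywords listed above except spot (which takes a handle); the names COMPONENT_COUNTS and format_color are hypothetical, and the %g number formatting is a simplification chosen for readability.

```python
# Number of float values per color space keyword, as listed above.
COMPONENT_COUNTS = {
    "gray": 1, "rgb": 3, "cmyk": 4, "lab": 3,
    "iccbasedgray": 1, "iccbasedrgb": 3, "iccbasedcmyk": 4,
}


def format_color(space, *values):
    """Return a color option value such as '{ rgb 1 0 0 }'."""
    if space == "none":
        if values:
            raise ValueError("none takes no values")
        return "none"
    if space == "spotname":
        if len(values) != 2:
            raise ValueError("spotname needs a color name and a tint")
        name, tint = values
        # The spot color name may contain blanks, so it is always braced.
        return "{ spotname {%s} %g }" % (name, tint)
    if space not in COMPONENT_COUNTS:
        raise ValueError("unsupported color space keyword: %r" % space)
    if len(values) != COMPONENT_COUNTS[space]:
        raise ValueError(
            "%s needs %d value(s), got %d"
            % (space, COMPONENT_COUNTS[space], len(values))
        )
    return "{ %s %s }" % (space, " ".join("%g" % v for v in values))
```

The result is meant to be placed after an option name, for example "strokecolor=" + format_color("rgb", 1, 0, 0).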
3.1.5 The PDFlib Virtual File System (PVF)
In addition to disk files a facility called PDFlib Virtual File System (PVF) allows clients to di-
rectly supply data in memory without any disk files involved. This offers performance
benefits and can be used for data fetched from a database which does not even exist on
an isolated disk file, as well as other situations where the client already has the required
data available in memory as a result of some processing.
PVF is based on the concept of named virtual read-only files which can be used just
like regular file names with any API function. They can even be used in UPR configura-
tion files. Virtual file names can be generated in an arbitrary way by the client. Obvious-
ly, virtual file names must be chosen such that name clashes with regular disk files are
avoided. For this reason a hierarchical naming convention for virtual file names is rec-
ommended as follows (filename refers to a name chosen by the client which is unique in
the respective category). It is also recommended to keep standard file name suffixes:
>Raster image files: /pvf/image/filename
>font outline and metrics files (it is recommended to use the actual font name as the
base portion of the file name): /pvf/font/filename
>ICC profiles: /pvf/iccprofile/filename
>Encodings and codepages: /pvf/codepage/filename
>PDF documents: /pvf/pdf/filename
When searching for a named file PDFlib will first check whether the supplied file name
refers to a known virtual file, and then try to open the named file on disk.
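The naming convention recommended above is easy to enforce with a small helper. The Python sketch below only builds the name string; it does not create a virtual file. The names PVF_CATEGORIES and pvf_name are hypothetical, and rejecting path separators in the file name is an extra precaution of this sketch, not a PDFlib rule.

```python
# Categories from the naming convention recommended above.
PVF_CATEGORIES = ("image", "font", "iccprofile", "codepage", "pdf")


def pvf_name(category, filename):
    """Return a virtual file name of the form /pvf/<category>/<filename>."""
    if category not in PVF_CATEGORIES:
        raise ValueError("unknown PVF category: %r" % category)
    if not filename or "/" in filename:
        raise ValueError("filename must be a plain, non-empty name")
    # Keeping the standard suffix (e.g. .tif, .pdf) is recommended.
    return "/pvf/%s/%s" % (category, filename)
```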
Lifetime of virtual files. Some functions will immediately consume the data supplied
in a virtual file, while others will read only parts of the file, with other fragments being
used at a later point in time. For this reason close attention must be paid to the lifetime
of virtual files. PDFlib will place an internal lock on every virtual file, and remove the
lock only when the contents are no longer needed. Unless the client requested PDFlib to
make an immediate copy of the data (using the copy option in PDF_create_pvf( )), the vir-
tual file’s contents must only be modified, deleted, or freed by the client when it is no
longer locked by PDFlib. PDFlib will automatically delete all virtual files in PDF_delete( ).
However, the actual file contents (the data comprising a virtual file) must always be
freed by the client.
Different strategies. PVF supports different approaches with respect to managing the
memory required for virtual files. These are governed by the fact that PDFlib may need
access to a virtual file’s contents after the API call which accepted the virtual file name,
but never needs access to the contents after PDF_close( ). Remember that calling
PDF_delete_pvf( ) does not free the actual file contents (unless the copy option has been
supplied), but only the corresponding data structures used for PVF file name
administration. This gives rise to the following strategies:
>Minimize memory usage: it is recommended to call PDF_delete_pvf( ) immediately after
the API call which accepted the virtual file name, and another time after
PDF_close( ). The second call is required because PDFlib may still need access to the
data, in which case the first call refuses to unlock the virtual file. However, in some
cases the first call will already free the data, and the second call doesn’t do any harm.
The client may free the file contents only when PDF_delete_pvf( ) succeeded.
>Optimize performance by reusing virtual files: some clients may wish to reuse some
data (e.g., font definitions) within various output documents, and avoid multiple
create/delete cycles for the same file contents. In this case it is recommended not to
call PDF_delete_pvf( ) as long as more PDF output documents using the virtual file
will be generated.
>Lazy programming: if memory usage is not a concern the client may elect not to call
PDF_delete_pvf( ) at all. In this case PDFlib will internally delete all pending virtual
files in PDF_delete( ).
In all cases the client may free the corresponding data only when PDF_delete_pvf( ) re-
turned successfully, or after PDF_delete( ).
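The »minimize memory usage« strategy can be illustrated with a toy model of the locking behavior. The Python sketch below simulates the bookkeeping only and is not the PDFlib implementation: the class ToyPVF, its methods, and the function minimal_memory_strategy are hypothetical names, and delete_pvf( ) returns a plain boolean here rather than PDFlib's actual return value.

```python
class ToyPVF:
    """Toy model of PVF bookkeeping; it tracks locks, not file contents."""

    def __init__(self):
        self._locks = {}  # virtual file name -> number of outstanding locks

    def create_pvf(self, name):
        self._locks[name] = 0

    def use(self, name):
        # An API call accepted the name and still needs the data later.
        self._locks[name] += 1

    def close_document(self):
        # Once the document is closed, no file contents are needed any more.
        for name in self._locks:
            self._locks[name] = 0

    def delete_pvf(self, name):
        """Return True if the name was removed, False while it is locked."""
        if self._locks.get(name, 0) > 0:
            return False
        self._locks.pop(name, None)
        return True


def minimal_memory_strategy(pvf, name, api_call, free_data):
    """Try to delete right after the API call, then again after closing."""
    api_call(name)
    if pvf.delete_pvf(name):      # first attempt, right after the API call
        free_data()
        return "freed early"
    pvf.close_document()
    if pvf.delete_pvf(name):      # second attempt, after the document is closed
        free_data()
        return "freed after close"
    return "still locked"
```

The point of the model is the ordering: free_data is called only after a successful delete_pvf( ), never before.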
3.1.6 Resource Configuration and File Searching
In most advanced applications PDFlib needs access to resources such as font files,
encoding definitions, ICC color profiles, etc. In order to make PDFlib’s resource handling
platform-independent and customizable, a configuration file can be supplied for describing
the available resources along with the names of their corresponding disk files. In
addition to a static configuration file, dynamic configuration can be accomplished at
runtime by adding resources with PDF_set_parameter( ). For the configuration file we dug
out a simple text format called Unix PostScript Resource (UPR) which came to life in the
era of Display PostScript, and is still in use on several systems. However, we extended
the original UPR format for our purposes. The UPR file format as used by PDFlib will be
described below. There is a utility called makepsres (often distributed as part of the X
Window System) which can be used to automatically generate UPR files from PostScript
font outline and metrics files.
Resource categories. The resource categories supported by PDFlib are listed in Table
3.3. Other resource categories may be present in the UPR file for compatibility with Dis-
play PostScript installations, but they will silently be ignored.
Redundant resource entries should be avoided. For example, do not include multiple
entries for a certain font’s metrics data. Also, the font name as configured in the UPR file
should exactly match the actual font name in order to avoid confusion (although
PDFlib does not enforce this restriction).
In Mac OS Classic the colon character ’:’ must be used as a directory separator. The
font names of resource-based PostScript Type 1 fonts (LWFN fonts) must be specified us-
ing the full path including volume name, for example:
The UPR file format. UPR files are text files with a very simple structure that can easily
be written in a text editor or generated automatically. To start with, let’s take a look at
some syntactical issues:
>Lines can have a maximum of 255 characters.
>A backslash ’\’ escapes newline characters. This may be used to extend lines.
>An isolated period character ’ . ’ serves as a section terminator.
>All entries are case-sensitive.
>Comment lines may be introduced with a percent ’%’ character, and terminated by
the end of the line.
>Whitespace is ignored everywhere except in resource names and file names.
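The syntax rules above can be sketched as a small line classifier. This is only an illustration and not PDFlib's actual parser: the names upr_line_kind and classify_upr_line are hypothetical, and backslash continuation as well as the 255-character limit are deliberately left out.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical line kinds for a UPR file; not part of the PDFlib API. */
typedef enum { UPR_BLANK, UPR_COMMENT, UPR_TERMINATOR, UPR_ENTRY } upr_line_kind;

/* Classify one line (without its trailing newline) by the rules listed above. */
upr_line_kind classify_upr_line(const char *line)
{
    size_t len;

    while (*line != '\0' && isspace((unsigned char) *line))
        line++;                       /* leading whitespace is ignored */

    len = strlen(line);
    while (len > 0 && isspace((unsigned char) line[len - 1]))
        len--;                        /* trailing whitespace is ignored */

    if (len == 0)
        return UPR_BLANK;
    if (line[0] == '%')
        return UPR_COMMENT;           /* comment runs to the end of the line */
    if (len == 1 && line[0] == '.')
        return UPR_TERMINATOR;        /* an isolated period ends a section */
    return UPR_ENTRY;                 /* category name or resource entry */
}
```

A reader built on this would switch sections whenever UPR_TERMINATOR is returned and skip UPR_BLANK and UPR_COMMENT lines.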
UPR files consist of the following components:
>A magic line for identifying the file. It has the following form:
>A section listing all resource categories described in the file. Each line describes one
resource category. The list is terminated by a line with a single period character.
Available resource categories are described below.
>A section for each of the resource categories listed at the beginning of the file. Each
section starts with a line showing the resource category, followed by an arbitrary
number of lines describing available resources. The list is terminated by a line with a
single period character. Each resource data line contains the name of the resource
(equal signs have to be quoted). If the resource requires a file name, this name has to
be added after an equal sign. The SearchPath (see below) will be applied when PDFlib
searches for files listed in resource entries.
Table 3.3 Resource categories supported in PDFlib
resource category name   explanation
SearchPath               Relative or absolute path name of directories containing data files
FontAFM                  PostScript font metrics file in AFM format
FontPFM                  PostScript font metrics file in PFM format
FontOutline              PostScript, TrueType or OpenType font outline file
Encoding                 text file containing an 8-bit encoding or code page table
HostFont                 Name of a font installed on the system. The value can be encoded in ASCII or
                         UTF-8 with initial BOM. The latter can be useful for localized host font names.
ICCProfile               name of an ICC color profile
StandardOutputIntent     name of a standard output condition for PDF/X
File searching and the SearchPath resource category. PDFlib reads a variety of data
items, such as raster images, font outline and metrics information, encoding defini-
tions, PDF documents, and ICC color profiles from disk files. In addition to relative or ab-
solute path names you can also use file names without any path specification. The
SearchPath resource category can be used to specify a list of path names for directories
containing the required data files. When PDFlib must open a file it will first use the file
name exactly as supplied and try to open the file. If this attempt fails PDFlib will try to
open the file in the directories specified in the SearchPath resource category one after
another until it succeeds. SearchPath entries can be accumulated, and will be searched in
reverse order (paths set at a later point in time will be searched before earlier ones). This
feature can be used to separate the PDFlib application from platform-specific file sys-
tem schemes. In order to disable the search you can use a fully specified path name in
the PDFlib functions.
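The lookup order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not PDFlib's implementation: search_candidate and open_with_searchpath are hypothetical names, the directory list is kept in the order in which entries were added, and '/' is assumed as directory separator.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the PDFlib API: write the candidate path
 * for the given attempt into out. Attempt 0 is the file name exactly as
 * supplied; attempts 1..ndirs walk the search path entries in reverse order
 * of registration. Returns 1 on success, 0 if there is no such candidate
 * or the result does not fit into out.
 */
int search_candidate(const char *name, const char *const *dirs, size_t ndirs,
                     size_t attempt, char *out, size_t outlen)
{
    int n;

    if (attempt == 0)
        n = snprintf(out, outlen, "%s", name);
    else if (attempt <= ndirs)
        n = snprintf(out, outlen, "%s/%s", dirs[ndirs - attempt], name);
    else
        return 0;

    return n >= 0 && (size_t) n < outlen;
}

/* Try each candidate in turn until one can be opened for reading. */
FILE *open_with_searchpath(const char *name, const char *const *dirs, size_t ndirs)
{
    char path[1024];
    size_t attempt;

    for (attempt = 0; attempt <= ndirs; attempt++) {
        FILE *fp;

        if (!search_candidate(name, dirs, ndirs, attempt, path, sizeof path))
            continue;               /* candidate too long, skip it */
        fp = fopen(path, "rb");
        if (fp != NULL)
            return fp;
    }
    return NULL;                    /* not found anywhere */
}
```

Keeping the candidate construction separate from the fopen() call makes the search order easy to check without touching the file system.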
On Windows PDFlib will initialize the SearchPath resource category with an entry
read from the following registry entry:
This registry entry may contain a list of path names separated by a semicolon ’;’ character.
On IBM iSeries the SearchPath resource category will be initialized with the following
On MVS the SearchPath feature is not supported.
Sample UPR file. The following listing gives an example of a UPR configuration file as
used by PDFlib. It describes some font metrics and outline files plus a custom encoding:
Searching for the UPR resource file. If only the built-in resources (e.g., PDF core font,
built-in encodings, sRGB ICC profile) or system resources (host fonts) are to be used, a
UPR configuration file is not required, since PDFlib will find all necessary resources
without any additional configuration.
If other resources are to be used you can specify such resources via calls to
PDF_set_parameter( ) (see below) or in a UPR resource file. PDFlib reads this file automatically
when the first resource is requested. The detailed process is as follows:
>If the environment variable PDFLIBRESOURCE is defined PDFlib takes its value as the
name of the UPR file to be read. If this file cannot be read an exception will be thrown.
>If the environment variable PDFLIBRESOURCE is not defined PDFlib tries to open a file
with the following name:
upr (on MVS; a dataset is expected)
pdflib/<version>/fonts/pdflib.upr (on IBM eServer iSeries)
pdflib.upr (Windows, Unix, and all other systems)
If this file cannot be read no exception will be thrown.
>On Windows PDFlib will additionally try to read the registry entry
The value of this entry (which will be created by the PDFlib installer, but can also be
created by other means) will be taken as the name of the resource file to be used. If
this file cannot be read an exception will be thrown.
>The client can force PDFlib to read a resource file at runtime by explicitly setting the
resourcefile parameter:
PDF_set_parameter(p, "resourcefile", "/path/to/pdflib.upr");
This call can be repeated arbitrarily often; the resource entries will be accumulated.
Configuring resources at runtime. In addition to using a UPR file for the configuration,
it is also possible to directly configure individual resources within the source code via
the PDF_set_parameter( ) function. This function takes a category name and a corre-
sponding resource entry as it would appear in the respective section of this category in
a UPR resource file, for example:
PDF_set_parameter(p, "FontAFM", "Foobar-Bold=foobb___.afm");
PDF_set_parameter(p, "FontOutline", "Foobar-Bold=foobb___.pfa");
3.1.7 Generating PDF Documents in Memory
In addition to writing PDF documents to a file, PDFlib can also be instructed to generate
the PDF directly in memory (in-core). This technique offers performance benefits
since no disk-based I/O is involved, and the PDF document can, for example, directly be
streamed via HTTP. Webmasters will be especially happy to hear that their server will
not be cluttered with temporary PDF files.
You may, at your option, periodically collect partial data (e.g., every time a page has
been finished), or fetch the complete PDF document in one big chunk at the end (after
PDF_end_document( )). Interleaving production and consumption of the PDF data has
several advantages. Firstly, since not all data must be kept in memory, the memory re-
quirements are reduced. Secondly, such a scheme can boost performance since the first
chunk of data can be transmitted over a slow link while the next chunk is still being
generated. However, the total length of the generated data will only be known when the
complete document is finished.
The active in-core PDF generation interface. In order to generate PDF data in memory,
simply supply an empty filename to PDF_begin_document( ), and retrieve the data with
PDF_get_buffer( ):
PDF_begin_document(p, "", 0, "");
...create document...
buf = PDF_get_buffer(p, &size);
... use the PDF data contained in the buffer ...
Note The PDF data in the buffer must be treated as binary data.
This is considered »active« mode since the client decides when to fetch the
buffer contents. Active mode is available for all supported language bindings.
Note C and C++ clients must not free the returned buffer.
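Since the returned buffer belongs to PDFlib, a client which collects partial data will usually copy each chunk into storage of its own. The following is a minimal sketch of such a collector; pdf_sink and pdf_sink_append are hypothetical names, and the usage comment assumes that the size is delivered as a long, as in the PDF_get_buffer( ) call above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical client-side container for collected PDF data (binary). */
typedef struct {
    char  *data;
    size_t size;
} pdf_sink;

/* Append one chunk; returns 1 on success, 0 if memory is exhausted. */
int pdf_sink_append(pdf_sink *sink, const char *chunk, size_t len)
{
    char *grown;

    if (len == 0)
        return 1;                             /* nothing to collect */
    grown = realloc(sink->data, sink->size + len);
    if (grown == NULL)
        return 0;                             /* existing data stays valid */
    memcpy(grown + sink->size, chunk, len);   /* binary copy, not strcpy */
    sink->data = grown;
    sink->size += len;
    return 1;
}

/*
 * Intended use, e.g. after each finished page and once more after
 * PDF_end_document():
 *
 *     long size;
 *     const char *buf = PDF_get_buffer(p, &size);
 *     pdf_sink_append(&sink, buf, (size_t) size);
 */
```

A client streaming via HTTP would send each chunk instead of appending it; the important points are the binary copy and that only the client's own storage is ever freed.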
The passive in-core PDF generation interface. In »passive« mode, which is only avail-
able in the C and C++ language bindings, the user installs (via PDF_open_document_
callback( )) a callback function which will be called at unpredictable times by PDFlib
whenever PDF data is waiting to be consumed. Timing and buffer size constraints relat-
ed to flushing (transferring the PDF data from the library to the client) can be config-
ured by the client in order to provide for maximum flexibility. Depending on the envi-
ronment, it may be advantageous to fetch the complete PDF document at once, in
multiple chunks, or in many small segments in order to prevent PDFlib from increasing
the internal document buffer. The flushing strategy can be set using the flush option of
PDF_open_document_callback( ).
3.1.8 Using PDFlib on EBCDIC-based Platforms
The operators and structure elements in the PDF file format are based on ASCII, making
it difficult to mix text output and PDF operators on EBCDIC-based platforms such as
IBM eServer iSeries 400 and zSeries S/390. However, a special mainframe version of
PDFlib has been carefully crafted in order to allow mixing of ASCII-based PDF operators
and EBCDIC (or other) text output. The EBCDIC-safe version of PDFlib is available for
various operating systems and machine architectures.
In order to leverage PDFlib’s features on EBCDIC-based platforms the following items
are expected to be supplied in EBCDIC text format (more specifically, in code page 037
on iSeries, and code page 1047 on zSeries):
>PFA font files, UPR configuration files, AFM font metrics files
>encoding and code page files
>string parameters to PDFlib functions
>input and output file names
>environment variables (if supported by the runtime environment)
PDFlib error messages will also be generated in EBCDIC format (except in Java).
If you prefer to use input text files (PFA, UPR, AFM, encodings) in ASCII format you can
set the asciifile parameter to true (default is false). PDFlib will then expect these files in
ASCII encoding. String parameters will still be expected in EBCDIC encoding, however.
In contrast, the following items must always be treated in binary mode (i.e., any con-
version must be avoided):
>PDF input and output files
>PFB font outline and PFM font metrics files
>TrueType and OpenType font files
>image files and ICC profiles
3.1.9 Large File Support
In this section the term »large file« is used for files with a size of more than 2 GB. Al-
though there doesn’t seem to be any need for such large files for the average user, there
are actually enterprise applications which create or process single large files containing
large numbers of, say, invoices or statements. In such a scenario the file size may exceed
the limit of 2 GB.
PDFlib supports large output files, i.e. it can create PDF output with more than 2 GB.
PDI supports processing of large input files as well. However, large file support is only
available on platforms where the underlying operating system supports large files na-
tively. Obviously, the file system in use must also support large files. Note that Acrobat
6 and older versions are unable to process large files. However, Acrobat 7 properly deals
with large files.
Note Imported files other than PDF, such as fonts and images, cannot exceed the 2 GB limit. PDF
output fragments fetched with the PDF_get_buffer( ) interface are also subject to this limit. Finally,
PDF output files are generally limited to 10^10 bytes, which is roughly 9.3 GB.
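The figure of roughly 9.3 GB follows from dividing ten to the power of ten bytes by 2^30. A small sketch of this arithmetic, assuming that GB is meant as a binary gigabyte (the helper name bytes_to_gb is hypothetical):

```c
/* Convert a byte count to binary gigabytes (1 GB = 2^30 bytes here). */
double bytes_to_gb(double bytes)
{
    return bytes / (1024.0 * 1024.0 * 1024.0);
}
```

With this definition bytes_to_gb(1e10) yields about 9.31, and the 2 GB limit corresponds to 2147483648 bytes.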
3.2 Page Descriptions
3.2.1 Coordinate Systems
PDF’s default coordinate system is used within PDFlib. The default coordinate system
(or default user space) has the origin in the lower left corner of the page, and uses the
DTP point as unit:
1 pt = 1/72 inch = 25.4/72 mm = 0.3528 mm
The first coordinate increases to the right, the second coordinate increases upwards.
PDFlib client programs may change the default user space by rotating, scaling, translat-
ing, or skewing, resulting in new user coordinates. The respective functions for these
transformations are PDF_rotate( ), PDF_scale( ), PDF_translate( ), and PDF_skew( ). If the
user space has been transformed, all coordinates in graphics and text functions must be
supplied according to the new coordinate system. The coordinate system is reset to the
default coordinate system at the start of each page.
Using metric coordinates. Metric coordinates can easily be used by scaling the coordinate
system. The scaling factor is derived from the definition of the DTP point given above:
PDF_scale(p, 28.3465, 28.3465);
After this call PDFlib will interpret all coordinates (except for hypertext features, see be-
low) in centimeters since 72/2.54 = 28.3465.
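The conversions behind this scaling factor can be written down as a few helper functions. The names pt_to_mm, mm_to_pt, and scale_for_unit_mm are hypothetical and not part of PDFlib; they merely restate the definition 1 pt = 1/72 inch with 1 inch = 25.4 mm.

```c
/* Conversions based on 1 pt = 1/72 inch and 1 inch = 25.4 mm. */
double pt_to_mm(double pt) { return pt * 25.4 / 72.0; }
double mm_to_pt(double mm) { return mm * 72.0 / 25.4; }

/*
 * Scaling factor to pass to PDF_scale() for a user unit given in
 * millimeters, e.g. 10.0 for centimeters or 1.0 for millimeters.
 */
double scale_for_unit_mm(double unit_mm) { return mm_to_pt(unit_mm); }
```

For centimeters, scale_for_unit_mm(10.0) gives the factor 28.3465 used above; for millimeters the factor would be one tenth of that.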
Coordinates for hypertext elements. PDF always expects coordinates for hypertext
functions, such as the rectangle coordinates for creating text annotations, links, and file
annotations in the default coordinate system, and not in the (possibly transformed)
user coordinate system. Since this is very cumbersome PDFlib offers automatic conver-
sion of user coordinates to the format expected by PDF. This automatic conversion is ac-
tivated by setting the usercoordinates parameter to true:
PDF_set_parameter(p, "usercoordinates", "true");
Since PDF supports only hypertext rectangles with edges parallel to the page edges, the
supplied rectangles must be modified when the coordinate system has been trans-
formed by scaling, rotating, translating, or skewing it. In this case PDFlib will calculate
the smallest enclosing rectangle with edges parallel to the page edges, transform it to
default coordinates, and use the resulting values instead of the supplied coordinates.
The overall effect is that you can use the same coordinate systems for both page con-
tent and hypertext elements when the usercoordinates parameter has been set to true.
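The calculation of the smallest enclosing rectangle can be sketched as follows. This is an illustration of the geometry only, not PDFlib's internal code: the types matrix and rect and the function enclosing_rect are hypothetical, and the transformation is assumed to be given in PDF's usual six-value matrix notation.

```c
/*
 * Affine transformation in PDF's matrix notation:
 * x' = a*x + c*y + e,  y' = b*x + d*y + f
 */
typedef struct { double a, b, c, d, e, f; } matrix;
typedef struct { double llx, lly, urx, ury; } rect;

/*
 * Hypothetical helper (not a PDFlib function): map the four corners of r
 * through m and return the smallest enclosing rectangle with edges
 * parallel to the axes of the target coordinate system.
 */
rect enclosing_rect(matrix m, rect r)
{
    double xs[4], ys[4];
    rect out = { 0.0, 0.0, 0.0, 0.0 };
    int i;

    xs[0] = r.llx; ys[0] = r.lly;
    xs[1] = r.urx; ys[1] = r.lly;
    xs[2] = r.urx; ys[2] = r.ury;
    xs[3] = r.llx; ys[3] = r.ury;

    for (i = 0; i < 4; i++) {
        double x = m.a * xs[i] + m.c * ys[i] + m.e;
        double y = m.b * xs[i] + m.d * ys[i] + m.f;

        /* the first corner initializes the box, the others extend it */
        if (i == 0 || x < out.llx) out.llx = x;
        if (i == 0 || x > out.urx) out.urx = x;
        if (i == 0 || y < out.lly) out.lly = y;
        if (i == 0 || y > out.ury) out.ury = y;
    }
    return out;
}
```

For pure scaling and translation the result is simply the transformed rectangle; after a rotation or skew the enclosing rectangle is larger than the area the user actually specified.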
Visualizing coordinates. In order to assist PDFlib users in working with PDF’s coordi-
nate system, the PDFlib distribution contains the PDF file grid.pdf which visualizes the
coordinates for several common page sizes. Printing the appropriately sized page on
transparent material may provide a useful tool for preparing PDFlib development.
Acrobat 5/6 (full version only, not the free Reader) also has a helpful facility. Simply
choose Window, Info to display a measurement palette which uses points as units. Note
that the coordinates displayed refer to an origin in the top left corner of the page, and
not PDF’s default origin in the lower left corner.
Don’t be misled by PDF printouts which seem to have the wrong page dimensions.
These may be wrong for several common reasons:
>The Shrink oversized pages to paper size option has been checked in Acrobat’s print dia-
log, resulting in scaled print output.
>Non-PostScript printer drivers are not always able to retain the exact size of printed
Rotating objects. It is important to understand that objects cannot be modified once
they have been drawn on the page. Although there are PDFlib functions for rotating,
translating, scaling, and skewing the coordinate system, these do not affect existing ob-
jects on the page but only subsequently drawn objects. Rotating text, images, and im-
ported PDF pages by multiples of 90 degrees is easily accomplished with the orientate
option in PDF_fit_textline( ), PDF_fit_image( ), and PDF_fit_pdi_page( ) functions.
Arbitrary rotation angles can be achieved by applying the general coordinate trans-
formation functions. The following example generates some horizontal text, and ro-
tates the coordinate system in order to show rotated text. The save/restore nesting
makes it easy to continue with horizontal text in the original coordinate system after
the rotated text is done:
PDF_set_text_pos(p, 50, 600);
PDF_show(p, "This is horizontal text");
textx = PDF_get_value(p, "textx", 0); /* determine text position */
texty = PDF_get_value(p, "texty", 0); /* determine text position */
PDF_save(p); /* save the original coordinate system */
PDF_translate(p, textx, texty); /* move origin to end of text */
PDF_rotate(p, 45); /* rotate coordinates */
PDF_set_text_pos(p, 18, 0); /* provide for distance from horiz. text */
PDF_show(p, "rotated text");
PDF_restore(p); /* return to the original coordinate system */
PDF_continue_text(p, "horizontal text continues");
Using top-down coordinates. Unlike PDF’s bottom-up coordinate system some graph-
ics environments use top-down coordinates which may be preferred by some develop-
ers. Such a coordinate system can easily be established using PDFlib’s transformation
functions. However, since the transformations will also affect text output additional
calls are required in order to avoid text being displayed in a mirrored sense.
In order to facilitate the use of top-down coordinates PDFlib supports a special mode
in which all relevant coordinates will be interpreted differently: instead of working with
the default PDF coordinate system, with the origin (0, 0) at the lower left corner of the
page and y coordinates increasing upwards, a modified coordinate system will be used
which has its origin at the upper left corner of the page with y coordinates increasing
downwards. This top-down coordinate system can be activated with the topdown parameter:
PDF_set_parameter(p, "topdown", "true");
A different coordinate system can be established for each page, but the topdown para-
meter must not be set within a page description (but only between pages). The topdown
feature has been designed to make it quite natural for PDFlib users to work in a top-
down coordinate system. For the sake of completeness we’ll list the detailed conse-
quences of establishing a top-down coordinate system below.
»Absolute« coordinates will be interpreted in the user coordinate system without
any modification:
>All function parameters which are designated as »coordinates« in the function descriptions.
Some examples: x, y in PDF_moveto( ); x, y in PDF_circle( ); x, y (but not width
and height!) in PDF_rect( ); llx, lly, urx, ury in PDF_create_annotation( ).
»Relative« coordinate values will be modified internally to match the top-down system:
>Text (with positive font size) will be oriented towards the top of the page;
>When the manual talks about »lower left« corner of a rectangle, box etc. this will be
interpreted as you see it on the page;
>When a rotation angle is specified the center of the rotation is still the origin (0, 0) of
the user coordinate system. The visual result of a clockwise rotation will still be clockwise.
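For a page of known height the geometric relation between the top-down system and PDF's default system can be sketched in one line. The helper name topdown_to_default_y is hypothetical and only illustrates the geometry for an otherwise untransformed page; with the topdown parameter set, PDFlib takes care of the interpretation itself.

```c
/*
 * Map a y coordinate given in a top-down system (origin at the upper left
 * corner, y increasing downwards) to PDF's default bottom-up system for a
 * page of the given height. x coordinates are the same in both systems.
 */
double topdown_to_default_y(double y_topdown, double page_height)
{
    return page_height - y_topdown;
}
```

On an A4 page with a height of 842 points, a position 100 points below the top edge corresponds to y = 742 in the default coordinate system.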
3.2.2 Page Sizes and Coordinate Limits
Standard page formats. For the convenience of PDFlib users, Table 3.4 lists common
standard page sizes1. Symbolic page size names may be used for the width and height op-
tions in PDF_begin/end_page_ext( ). They are called <format>.width and <format>.height,
where <format> is one of the formats in Table 3.4 (in lowercase, e.g. a4.width).
Page size limits. Although PDF and PDFlib don’t impose any restrictions on the usable
page size, Acrobat implementations suffer from architectural limits regarding the page
size. Note that other PDF interpreters may well be able to deal with larger or smaller doc-
ument formats. PDFlib will throw a non-fatal exception if Acrobat’s page size limits are
exceeded. The page size limits for Acrobat are shown in Table 3.5.
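A client which wants to detect such page sizes before PDFlib complains could check them with a few lines of code. This is a sketch only: within_acrobat_limits and the two macros are hypothetical names, and the limits are the values for Acrobat 4 and above from Table 3.5 (3 pt and 14400 pt).

```c
#define ACROBAT_MIN_PT 3.0      /* 1/24 inch */
#define ACROBAT_MAX_PT 14400.0  /* 200 inches */

/* Returns 1 if both page dimensions (in points) are within Acrobat's limits. */
int within_acrobat_limits(double width_pt, double height_pt)
{
    return width_pt  >= ACROBAT_MIN_PT && width_pt  <= ACROBAT_MAX_PT
        && height_pt >= ACROBAT_MIN_PT && height_pt <= ACROBAT_MAX_PT;
}
```

Note that a document which fails this check may still be perfectly usable with other PDF interpreters.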
1. More information about ISO, Japanese, and U.S. standard formats can be found at the following URLs:
home.inter.net/eds/paper/papersize.html, www.cl.cam.ac.uk/~mgk25/iso-paper.html
Table 3.4 Common standard page size dimensions in points
format  width  height    format  width  height    format  width  height
a0      2380   3368      a4      595    842       letter  612    792
a1      1684   2380      a5      421    595       legal   612    1008
a2      1190   1684      a6      297    421       ledger  1224   792
a3      842    1190      b5      501    709       11x17   792    1224
Table 3.5 Minimum and maximum page size of Acrobat
PDF viewer            minimum page size          maximum page size
Acrobat 4 and above   1/24" = 3 pt = 0.106 cm    200" = 14400 pt = 508 cm
Different page size boxes. While many PDFlib developers only specify the width and
height of a page, some advanced applications (especially for prepress work) may want
to specify one or more of PDF’s additional box entries. PDFlib supports all of PDF’s box
entries. The following entries, which may be useful in certain environments, can be
specified by PDFlib clients (definitions taken from the PDF reference):
>MediaBox: this is used to specify the width and height of a page, and describes what
we usually consider the page size.
>CropBox: the region to which the page contents are to be clipped; Acrobat uses this
size for screen display and printing.
>TrimBox: the intended dimensions of the finished (possibly cropped) page;
>ArtBox: extent of the page’s meaningful content. It is rarely used by application software.
>BleedBox: the region to which the page contents are to be clipped when output in a
production environment. It may encompass additional bleed areas to account for in-
accuracies in the production process.
PDFlib will not use any of these values apart from recording them in the output file. By default
PDFlib generates a MediaBox according to the specified width and height of the
page, but does not generate any of the other entries. The following code fragment will
start a new page and set the four values of the CropBox:
/* start a new page with custom CropBox */
PDF_begin_page_ext(p, 595, 842, "cropbox {10 10 500 800}");
Number of pages in a document. There is no limit in PDFlib regarding the number of
generated pages in a document. PDFlib generates PDF structures which allow Acrobat to
efficiently navigate documents with hundreds of thousands of pages.
Output accuracy and coordinate range. PDFlib’s numerical output accuracy has been
carefully chosen to match the requirements of PDF and the supported environments,
while at the same time minimizing output file size. As detailed in Table 3.6 PDFlib’s ac-
curacy depends on the absolute value of coordinates. While most developers may safely
ignore this issue, demanding applications should take care in their scaling operations
in order to not exceed PDF’s built-in coordinate limits.
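The rounding behavior listed in Table 3.6 can be sketched as follows. This is a hypothetical helper which takes the table literally, not PDFlib's actual output routine; in particular we assume that »rounded to next integer« means rounding to the nearest integer, and that the boundary values belong to the larger range.

```c
/*
 * Sketch of the rounding in Table 3.6 (hypothetical, not PDFlib code).
 * Stores the value as it would appear in the output and returns 1, or
 * returns 0 in the case where an exception would be raised.
 */
int round_for_output(double value, double *out)
{
    double sign = value < 0.0 ? -1.0 : 1.0;
    double v = value < 0.0 ? -value : value;      /* absolute value */

    if (v >= 2147483648.0)                        /* 2^31: out of range */
        return 0;
    if (v < 0.000015)
        *out = 0.0;                               /* too small to matter */
    else if (v < 32768.0)                         /* four decimal digits */
        *out = sign * (double) (long long) (v * 10000.0 + 0.5) / 10000.0;
    else                                          /* integers only */
        *out = sign * (double) (long long) (v + 0.5);
    return 1;
}
```

The practical consequence is that coordinates beyond 32768 lose their fractional part, which is why large scaling factors should be applied with care.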
Table 3.6 Output accuracy and coordinate range
absolute value               output
0 ... 0.000015               0
0.000015 ... 32767.999999    rounded to four decimal digits
32768 ... 2^31 - 1           rounded to next integer
>= 2^31                      an exception will be raised
3.2.3 Paths
A path is a shape made of an arbitrary number of straight lines, rectangles, or curves. A
path may consist of several disconnected sections, called subpaths. There are several
operations which can be applied to a path (see Section 8.4.6, »Path Painting and Clip-
ping«, page 246):
>Stroking draws a line along the path, using client-supplied parameters (e.g., color,
line width) for drawing.
>Filling paints the entire region enclosed by the path, using client-supplied parame-
ters for filling.
>Clipping reduces the imageable area for subsequent drawing operations by replacing
the current clipping area (which is the page size by default) with the intersection of
the current clipping area and the area enclosed by the path.
>Merely terminating the path results in an invisible path, which will nevertheless be
present in the PDF file. This will only rarely be required.
It is an error to construct a path without applying any of the above operations to it.
PDFlib’s scoping system ensures that clients obey this restriction. These rules may
easily be summarized as »don’t change the appearance within a path description«.
Merely constructing a path doesn’t result in anything showing up on the page; you
must either fill or stroke the path in order to get visible results:
PDF_moveto(p, 100, 100);
PDF_lineto(p, 200, 100);
PDF_stroke(p);
Most graphics functions make use of the concept of a current point, which can be
thought of as the location of the pen used for drawing.
3.2.4 Templates
Templates in PDF. PDFlib supports a PDF feature with the technical name form
XObjects. However, since this term conflicts with interactive forms we refer to this fea-
ture as templates. A PDFlib template can be thought of as an off-page buffer into which
text, vector, and image operations are redirected (instead of acting on a regular page).
After the template is finished it can be used much like a raster image, and placed an ar-
bitrary number of times on arbitrary pages. Like images, templates can be subjected to
geometrical transformations such as scaling or skewing. When a template is used on
multiple pages (or multiply on the same page), the actual PDF operators for construct-
ing the template are only included once in the PDF file, thereby saving PDF output file
size. Templates suggest themselves for elements which appear repeatedly on several
pages, such as a constant background, a company logo, or graphical elements emitted
by CAD and geographical mapping software. Other typical examples for template usage
include crop and registration marks or custom Asian glyphs.
Using templates with PDFlib. Templates can only be defined outside of a page descrip-
tion, and can be used within a page description. However, templates may also contain
other templates. Obviously, using a template within its own definition is not possible.
Referring to an already defined template on a page is achieved with the PDF_fit_image( )
function just like images are placed on the page (see Section 5.3, »Placing Images and
Imported PDF Pages«, page 144). The general template idiom in PDFlib looks as follows:
/* define the template */
template = PDF_begin_template(p, template_width, template_height);
...place marks on the template using text, vector, and image functions...
PDF_end_template(p);
...
PDF_begin_page(p, page_width, page_height);
/* use the template */
PDF_fit_image(p, template, (float) 0.0, (float) 0.0, "");
...more page marking operations...
PDF_end_page(p);
...
PDF_close_image(p, template);
All text, graphics, and color functions can be used on a template. However, the follow-
ing functions must not be used while constructing a template:
>The functions in Section 8.6, »Image and Template Functions«, page 258, except
PDF_fit_image( ), and PDF_close_image( ). This is not a big restriction since images can be
opened outside of a template definition, and freely be used within a template (but
not opened).
>The functions in Section 8.9.7, »Deprecated Hypertext Parameters and Functions«,
page 294. Hypertext elements must always be defined on the page where they
should appear in the document, and cannot be generated as part of a template.
Template support in third-party software. Templates (form XObjects) are an integral
part of the PDF specification, and can be perfectly viewed and printed with Acrobat.
However, not all PDF consumers are prepared to deal with this construct. For example,
the Acrobat plugin Enfocus PitStop 5.0 can only move templates, but cannot access indi-
vidual elements within a template. On the other hand, Adobe Illustrator 9 and 10 fully
support templates.
3.3 Working with Color
3.3.1 Color and Color Spaces
PDFlib clients may specify the colors used for filling and stroking the interior of paths
and text characters. Colors may be specified in several color spaces:
>Gray values between 0=black and 1=white;
>RGB triples, i.e., three values between 0 and 1 specifying the percentage of red, green,
and blue; (0, 0, 0)=black, (1, 1, 1)=white;
>Four CMYK values between 0=no color and 1=full color, representing cyan, magenta,
yellow, and black values; (0, 0, 0, 0)=white, (0, 0, 0, 1)=black. Note that this is different
from the RGB specification.
>Device-independent colors in the CIE L*a*b* color space are specified by a luminance
value in the range 0-100 and two color values in the range -127 to 128 (see Section
3.3.4, »Color Management and ICC Profiles«, page 67).
>ICC-based colors are specified with the help of an ICC profile (see Section 3.3.4, »Color
Management and ICC Profiles«, page 67).
>Spot color (separation color space): a predefined or arbitrarily named custom color
with an alternate representation in one of the other color spaces above; this is gener-
ally used for preparing documents which are intended to be printed on an offset
printing machine with one or more custom colors. The tint value (percentage) rang-
es from 0=no color to 1=maximum intensity of the spot color. See Section 3.3.3, »Spot
Colors«, page 64, for a list of spot color names.
>Patterns: tiling with an object composed of arbitrary text, vector, or image graphics
(see Section 3.3.2, »Patterns and Smooth Shadings«, page 63).
>Shadings (smooth blends) provide a gradual transition between two colors, and are
based on another color space (see Section 3.3.2, »Patterns and Smooth Shadings«,
page 63).
>The indexed color space is not really a color space on its own, but rather an efficient
coding of another color space. It will automatically be generated when an indexed
(palette-based) image is imported.
The default color for stroke and fill operations is black.
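The opposite conventions of RGB (0 means no light) and CMYK (0 means no ink) are easily mixed up. The following sketch uses the well-known naive conversion formula to make the difference visible. It is an illustration only: rgb_color and naive_cmyk_to_rgb are hypothetical names, and the formula ignores color management altogether, so it says nothing about how PDFlib or Acrobat treat these colors.

```c
typedef struct { double r, g, b; } rgb_color;

/*
 * Naive device conversion, for illustration only. All values are in the
 * range 0..1; more ink (CMYK) means less light (RGB).
 */
rgb_color naive_cmyk_to_rgb(double c, double m, double y, double k)
{
    rgb_color out;

    out.r = (1.0 - c) * (1.0 - k);
    out.g = (1.0 - m) * (1.0 - k);
    out.b = (1.0 - y) * (1.0 - k);
    return out;
}
```

With this formula CMYK (0, 0, 0, 0) maps to RGB (1, 1, 1), which is white, and CMYK (0, 0, 0, 1) maps to RGB (0, 0, 0), which is black.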
3.3.2 Patterns and Smooth Shadings
As an alternative to solid colors, patterns and shadings are special kinds of colors which
can be used to fill or stroke arbitrary objects.
Patterns. A pattern is defined by an arbitrary number of painting operations which
are grouped into a single entity. This group of objects can be used to fill or stroke arbi-
trary other objects by replicating (or tiling) the group over the entire area to be filled or
the path to be stroked. Working with patterns involves the following steps:
>First, the pattern must be defined between PDF_begin_pattern( ) and
PDF_end_pattern( ). Most graphics operators can be used to define a pattern.
>The pattern handle returned by PDF_begin_pattern( ) can be used to set the pattern as
the current color using PDF_setcolor( ).
Depending on the painttype parameter of PDF_begin_pattern( ) the pattern definition
may or may not include its own color specification. If painttype is 1, the pattern defini-
tion must contain its own color specification and will always look the same; if painttype
is 2, the pattern definition must not include any color specification. Instead, the current
fill or stroke color will be applied when the pattern is used for filling or stroking.
Note Patterns can also be defined based on a smooth shading (see below).
Smooth shadings. Smooth shadings, also called color blends or gradients, provide a
continuous transition from one color to another. Both colors must be specified in the
same color space. PDFlib supports two different kinds of geometry for smooth shadings:
>axial shadings are defined along a line;
>radial shadings are defined between two circles.
Shadings are defined as a transition between two colors. The first color is always taken
to be the current fill color; the second color is provided in the c1, c2, c3, and c4 parameters
of PDF_shading( ). These numerical values will be interpreted in the first color’s color
space according to the description of PDF_setcolor( ).
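To get a feeling for what such a transition means numerically, the color at an intermediate position can be computed by interpolating each component. The sketch below assumes a linear transition, which is our assumption for illustration and not a statement about the shading PDFlib actually generates; blend_color is a hypothetical name.

```c
/*
 * Color at position t along the shading: t = 0 yields the start color,
 * t = 1 the end color. n is the number of color components, which is
 * the same for both colors since they share one color space.
 */
void blend_color(const double *c_start, const double *c_end, int n,
                 double t, double *out)
{
    int i;

    if (t < 0.0) t = 0.0;           /* clamp to the shading's extent */
    if (t > 1.0) t = 1.0;
    for (i = 0; i < n; i++)
        out[i] = c_start[i] + t * (c_end[i] - c_start[i]);
}
```

Halfway between RGB red (1, 0, 0) and blue (0, 0, 1) this yields (0.5, 0, 0.5). The requirement that both colors use the same color space is what makes the component-wise calculation meaningful.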
Calling PDF_shading( ) will return a handle to a shading object which can be used in
two ways:
>Fill an area with PDF_shfill( ). This method can be used when the geometry of the ob-
ject to be filled is the same as the geometry of the shading. Contrary to its name this
function will not only fill the interior of the object, but also affects the exterior. This
behavior can be modified with PDF_clip( ).
>Define a shading pattern to be used for filling more complex objects. This involves
calling PDF_shading_pattern( ) to create a pattern based on the shading, and using this
pattern to fill or stroke arbitrary objects.
3.3.3 Spot Colors
PDFlib supports spot colors (technically known as Separation color space in PDF, al-
though the term separation is generally used with process colors, too) which can be
used to print custom colors outside the range of colors mixed from process colors. Spot
colors are specified by name, and in PDF are always accompanied by an alternate color
which closely, but not exactly, resembles the spot color. Acrobat will use the alternate
color for screen display and printing to devices which do not support spot colors (such
as office printers). On the printing press the requested spot color will be applied in addi-
tion to any process colors which may be used in the document. This requires the PDF
files to be post-processed by a process called color separation.
Note Color separation is outside the scope of PDFlib. Acrobat 6, additional software for Acrobat 5
(such as the ARTS PDF Crackerjack1 plugin), or in-RIP separation is required to separate PDFs.
Note Some spot colors do not display correctly on screen in Acrobat 5 when Overprint Preview is
turned on. They can be separated and printed correctly, though.
PDFlib supports various built-in spot color libraries as well as custom (user-defined)
spot colors. When a spot color name is requested with PDF_makespotcolor( ) PDFlib will
first check whether the requested spot color can be found in one of its built-in libraries.
If so, PDFlib will use built-in values for the alternate color. Otherwise the spot color is assumed
to be a user-defined color, and the client must supply appropriate alternate color
values (via the current color). Spot colors can be tinted, i.e., they can be used with a
tint value between 0 and 1.
1. See www.artspdf.com
By default, built-in spot colors can not be redefined with custom alternate values.
However, this behavior can be changed with the spotcolorlookup parameter. This can be
useful to achieve compatibility with older applications which may use different color
definitions.
PDFlib will automatically generate suitable alternate colors for built-in spot colors
when a PDF/X conformance level has been selected (see Section 7.4, »PDF/X«, page 180).
For custom spot colors it is the user’s responsibility to provide alternate colors which
are compatible with the selected PDF/X conformance level.
Note Built-in spot color data and the corresponding trademarks have been licensed by PDFlib GmbH
from the respective trademark owners for use in PDFlib software.
PANTONE® colors. PANTONE Colors are well-known and
widely used on a world-wide basis. PDFlib fully supports the
PANTONE MATCHING SYSTEM®, totalling ca. 20 000 swatches.
All color swatch names from the following digital color libraries
can be used (sample swatch names are provided in parentheses):
>PANTONE solid coated (PANTONE 185 C)
>PANTONE solid uncoated (PANTONE 185 U)
>PANTONE solid matte (PANTONE 185 M)
>PANTONE process coated (PANTONE DS 35-1 C)
>PANTONE process uncoated (PANTONE DS 35-1 U)
>PANTONE process coated EURO (PANTONE DE 35-1 C)
>PANTONE pastel coated (PANTONE 9461 C)
>PANTONE pastel uncoated (PANTONE 9461 U)
>PANTONE metallic coated (PANTONE 871 C)
>PANTONE solid to process coated (PANTONE 185 PC)
>PANTONE solid to process coated EURO (PANTONE 185 EC)
>PANTONE hexachrome® coated (PANTONE H 305-1 C)
>PANTONE hexachrome® uncoated (PANTONE H 305-1 U)
>PANTONE solid in hexachrome coated (PANTONE 185 HC)
Spot color names are case-sensitive; use uppercase as shown in the examples. Old color
name prefixes CV, CVV, CVU, CVC, and CVP will also be accepted, and changed to the cor-
responding new color names unless the preserveoldpantonenames parameter is true. The
PANTONE prefix must always be provided in the swatch name as shown in the examples.
Generally, PANTONE Color names must be constructed according to the following scheme:
PANTONE <id> <paperstock>
where <id> is the identifier of the color (e.g., 185) and <paperstock> the abbreviation of the
paper stock in use (e.g., C for coated). A single space character must be provided between
all components constituting the swatch name. Requesting a spot color name starting
with the PANTONE prefix where the name does not represent a valid PANTONE Color will
result in a non-fatal exception (which can be disabled by setting the warning parameter
to false). The following code snippet demonstrates the use of a PANTONE Color with a
tint value of 70 percent:
spot = PDF_makespotcolor(p, "PANTONE 281 U", 0);
PDF_setcolor(p, "fill", "spot", spot, 0.7, 0, 0);
Note PANTONE® Colors displayed here may not match PANTONE-identified standards. Consult cur-
rent PANTONE Color Publications for accurate color. PANTONE® and other Pantone, Inc. trade-
marks are the property of Pantone, Inc. © Pantone, Inc., 2003.
Note PANTONE® Colors are not supported in the PDF/X-1:2001, PDF/X-1a:2001, and PDF/X-1a:2003
modes.
HKS colors. The HKS color system is widely used in Germany
and other European countries. PDFlib fully supports HKS
colors, including those from the new HKS 3000 plus palettes.
All color swatch names from the following digital color libraries
(Farbfächer) can be used (sample swatch names are provided
in parentheses):
>HKS K (Kunstdruckpapier) for gloss art paper, 88 colors (HKS 43 K)
>HKS N (Naturpapier) for natural paper, 88 colors (HKS 43 N)
>HKS E (Endlospapier) for continuous stationery/coated, 90 colors (HKS 43 E)
>HKS Ek (Endlospapier) for continuous stationery/uncoated, 88 colors (HKS 43 E)
>HKS En: identical to HKS E (HKS 43 En)
>HKS Z (Zeitungspapier) for newsprint, 50 colors (HKS 43 Z)
Spot color names are case-sensitive; use uppercase as shown in the examples. The HKS
prefix must always be provided in the swatch name as shown in the examples. General-
ly, HKS color names must be constructed according to one of the following schemes:
HKS <id> <paperstock>
where <id> is the identifier of the color (e.g., 43) and <paperstock> the abbreviation of the
paper stock in use (e.g., N for natural paper). A single space character must be provided
between the HKS, <id>, and <paperstock> components constituting the swatch name. Re-
questing a spot color name starting with the HKS prefix where the name does not repre-
sent a valid HKS color results in a non-fatal exception (which can be disabled by setting
the warning parameter to false). The following code snippet demonstrates the use of an
HKS color with a tint value of 70 percent:
spot = PDF_makespotcolor(p, "HKS 38 E", 0);
PDF_setcolor(p, "fill", "spot", spot, 0.7, 0, 0);
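Since a malformed name is only detected when the spot color is requested, it can be convenient to check the form of a name beforehand. The following sketch tests whether a string follows the HKS <id> <paperstock> scheme with single spaces and one of the paper stock abbreviations listed above. The function name hks_name_is_well_formed is hypothetical, and the sketch assumes that <id> consists of decimal digits only. It checks the form of the name, not whether the swatch actually exists in PDFlib's built-in library.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if name has the form "HKS <id> <paperstock>", else 0.
 * Assumption for this sketch: <id> consists of decimal digits only. */
int hks_name_is_well_formed(const char *name)
{
    static const char *stocks[] = { "K", "N", "E", "Ek", "En", "Z" };
    size_t i;

    /* the prefix is case-sensitive and followed by exactly one space */
    if (name == NULL || strncmp(name, "HKS ", 4) != 0)
        return 0;
    name += 4;

    /* <id>: one or more digits */
    if (!isdigit((unsigned char) *name))
        return 0;
    while (isdigit((unsigned char) *name))
        name++;

    /* exactly one space before <paperstock> */
    if (*name != ' ')
        return 0;
    name++;

    /* <paperstock> must match one abbreviation exactly */
    for (i = 0; i < sizeof stocks / sizeof stocks[0]; i++)
        if (strcmp(name, stocks[i]) == 0)
            return 1;
    return 0;
}
```

A name which passes this check can still be rejected by PDF_makespotcolor( ) if the color is not part of the requested library.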
User-defined spot colors. In addition to built-in spot colors as detailed above, PDFlib
supports custom spot colors. These can be assigned an arbitrary name (which must not
conflict with the name of any built-in color, however) and an alternate color which will
be used for screen preview or low-quality printing, but not for high-quality color separations.
The client is responsible for providing suitable alternate colors for custom spot colors.
There is no separate PDFlib function for setting the alternate color for a new spot color;
instead, the current fill color will be used. Except for an additional call to set the alternate
color, defining and using custom spot colors works similarly to using built-in
spot colors:
PDF_setcolor(p, "fill", "cmyk", 0.2, 1.0, 0.2, 0); /* define alternate CMYK values */
spot = PDF_makespotcolor(p, "CompanyLogo", 0); /* derive a spot color from it */
PDF_setcolor(p, "fill", "spot", spot, 1, 0, 0); /* set the spot color */
3.3.4 Color Management and ICC Profiles
PDFlib supports several color management concepts including device-independent col-
or, rendering intents, and ICC profiles.
Device-Independent CIE L*a*b* Color. Device-independent color values can be speci-
fied in the CIE 1976 L*a*b* color space by supplying the color space name lab to PDF_
setcolor( ). Colors in the L*a*b* color space are specified by a luminance value in the
range 0-100, and two color values in the range -127 to 128. The illuminant used for the
lab color space will be D50 (daylight 5000K, 2˚ observer).
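The value ranges quoted above can be checked before color values are handed to PDF_setcolor( ). The following minimal sketch does just that; the function name lab_in_range is hypothetical and not part of PDFlib.

```c
/* Return 1 if the L*a*b* values lie within the ranges given in the text
 * (L: 0 to 100, a and b: -127 to 128), else 0. */
int lab_in_range(double L, double a, double b)
{
    return L >= 0.0 && L <= 100.0 &&
           a >= -127.0 && a <= 128.0 &&
           b >= -127.0 && b <= 128.0;
}
```

Values which pass the check would then be supplied as the first three color parameters of PDF_setcolor( ) together with the color space name lab.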
Rendering Intents. Although PDFlib clients can specify device-independent color val-
ues, a particular output device is not necessarily capable of accurately reproducing the
required colors. In this situation some compromises have to be made regarding the
trade-offs in a process called gamut compression, i.e., reducing the range of colors to a
smaller range which can be reproduced by a particular device. The rendering intent can
be used to control this process. Rendering intents can be specified for individual images
by supplying the renderingintent parameter or option to PDF_load_image( ). In addition,
rendering intents can be specified for text and vector graphics by supplying the
renderingintent option to PDF_create_gstate( ). Table 3.7 lists the available rendering in-
tents and their meanings.
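Applications which let the user choose a rendering intent usually keep it as an enumeration and convert it to the keyword only when the option list is built. The following sketch shows such a mapping. The enumeration and the function name are hypothetical; the keywords are the rendering intent names of Table 3.7.

```c
/* Hypothetical application-side enumeration of the rendering intents. */
typedef enum {
    INTENT_AUTO,
    INTENT_ABSOLUTE_COLORIMETRIC,
    INTENT_RELATIVE_COLORIMETRIC,
    INTENT_SATURATION,
    INTENT_PERCEPTUAL
} RenderingIntent;

/* Return the keyword to be used as value of the renderingintent option.
 * Unknown values fall back to "Auto", which is the documented default. */
const char *intent_keyword(RenderingIntent intent)
{
    switch (intent) {
    case INTENT_ABSOLUTE_COLORIMETRIC: return "AbsoluteColorimetric";
    case INTENT_RELATIVE_COLORIMETRIC: return "RelativeColorimetric";
    case INTENT_SATURATION:            return "Saturation";
    case INTENT_PERCEPTUAL:            return "Perceptual";
    default:                           return "Auto";
    }
}
```

The returned keyword can be placed after renderingintent in the option list for PDF_load_image( ) or PDF_create_gstate( ).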
Table 3.7 Rendering intents
rendering intent      explanation                                                 typical use
Auto                  Do not specify any rendering intent in the PDF file, but   unknown or unspecific uses
                      use the device’s default intent instead. This is the
                      default.
AbsoluteColorimetric  No correction for the device’s white point (such as paper  exact reproduction of solid
                      white) is made. Colors which are out of gamut are mapped   colors; not recommended
                      to the nearest value within the device’s gamut.            for other uses
RelativeColorimetric  The color data is scaled into the device’s gamut, mapping  vector graphics
                      the white points onto one another while slightly shifting
                      colors.
Saturation            Saturation of the colors will be preserved while the color business graphics
                      values may be shifted.
Perceptual            Color relationships are preserved by modifying both        scanned images
                      in-gamut and out-of-gamut colors in order to provide a
                      pleasing appearance.
ICC profiles. The International Color Consortium (ICC, see www.color.org) defined a file format
for specifying color characteristics of input and output devices. These ICC color profiles are
considered an industry standard, and are supported by all major color management system
and application vendors. PDFlib supports color management with ICC profiles in the
following areas:
>Define ICC-based color spaces for text and vector graphics on the page.
>Process ICC profiles embedded in imported image files.
>Apply an ICC profile to an imported image (possibly overriding an ICC profile em-
bedded in the image).
>Define default color spaces for mapping grayscale, RGB, or CMYK data to ICC-based
color spaces.
>Define a PDF/X output intent by means of an external ICC profile.
Color management does not change the number of components in a color specification
(e.g., from RGB to CMYK).
Searching for ICC profiles. PDFlib will search for ICC profiles according to the following
steps, using the profilename parameter supplied to PDF_load_iccprofile( ):
>If profilename = sRGB, PDFlib will use its internal sRGB profile (see below), and termi-
nate the search.
>Check whether there is a resource named profilename in the ICCProfile resource cate-
gory. If so, use its value as file name in the following steps. If there is no such re-
source, use profilename as a file name directly.
>Use the file name determined in the previous step to locate a disk file by trying the
following combinations one after another:
On Windows 2000/XP colordir designates the directory where device-specific ICC pro-
files are stored by the operating system (typically C:\WINNT\system32\spool\drivers\
color). On Mac OS X the following paths will be tried for colordir:
On other systems the steps involving colordir will be omitted.
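The first two search steps can be sketched as follows. This is only an illustration of the documented order and not PDFlib's implementation: the IccResource type is a hypothetical stand-in for the ICCProfile resource category, the function name is invented for this example, and the comparison with sRGB is assumed to be case-sensitive. The subsequent disk search, including colordir, is not covered.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the ICCProfile resource category. */
typedef struct {
    const char *name;   /* resource name */
    const char *file;   /* file name stored as the resource value */
} IccResource;

/* Return NULL if the internal sRGB profile applies (the search terminates),
 * otherwise the file name with which the disk search would continue. */
const char *resolve_profile_file(const char *profilename,
                                 const IccResource *res, size_t count)
{
    size_t i;

    /* Step 1: sRGB is served from internal profile data. */
    if (strcmp(profilename, "sRGB") == 0)
        return NULL;

    /* Step 2: a resource with this name supplies the file name. */
    for (i = 0; i < count; i++)
        if (strcmp(res[i].name, profilename) == 0)
            return res[i].file;

    /* No such resource: use the profile name itself as file name. */
    return profilename;
}
```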
Acceptable ICC profiles. The type of acceptable ICC profiles depends on the usage pa-
rameter supplied to PDF_load_iccprofile( ):
>If usage = outputintent, only output device (printer) profiles will be accepted.
>If usage = iccbased, input, display and output device (scanner, monitor, and printer)
profiles plus color space conversion profiles will be accepted. They may be specified
in the gray, RGB, CMYK, or Lab color spaces.
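The rule above can be expressed as a small decision function. The sketch below uses the four-character device class signatures from the ICC profile header (scnr for input, mntr for display, prtr for output, and spac for color space conversion profiles). The function name is hypothetical, and the sketch covers only the profile class rule, not the color space of the profile or any other check PDFlib may apply.

```c
#include <string.h>

/* Return 1 if a profile with the given ICC device class signature is
 * acceptable for the given usage value, else 0. */
int profile_class_accepted(const char *usage, const char *device_class)
{
    if (strcmp(usage, "outputintent") == 0)
        return strcmp(device_class, "prtr") == 0;

    if (strcmp(usage, "iccbased") == 0)
        return strcmp(device_class, "scnr") == 0 ||
               strcmp(device_class, "mntr") == 0 ||
               strcmp(device_class, "prtr") == 0 ||
               strcmp(device_class, "spac") == 0;

    return 0;   /* unknown usage value */
}
```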
The sRGB color space and sRGB ICC profile. PDFlib supports the industry-standard
RGB color space called sRGB (formally IEC 61966-2-1). sRGB is supported by a variety of
software and hardware vendors and is widely used for simplified color management for
consumer RGB devices such as digital still cameras, office equipment such as color
printers, and monitors. PDFlib supports the sRGB color space and includes the required
ICC profile data internally. Therefore an sRGB profile need not be configured explicitly
by the client; it is always available without any additional configuration. It can be
requested by calling PDF_load_iccprofile( ) with profilename = sRGB.
Using embedded profiles in images (ICC-tagged images). Some images may contain
embedded ICC profiles describing the nature of the image’s color values. For example,
an embedded ICC profile can describe the color characteristics of the scanner used to
produce the image data. PDFlib can handle embedded ICC profiles in the PNG, JPEG, and
TIFF image file formats. If the honoriccprofile option or parameter is set to true (which is
the default) the ICC profile embedded in an image will be extracted from the image, and
embedded in the PDF output such that Acrobat will apply it to the image. This process is
sometimes referred to as tagging an image with an ICC profile. PDFlib will not alter the
image’s pixel values.
The image:iccprofile parameter can be used to obtain an ICC profile handle for the
profile embedded in an image. This may be useful when the same profile shall be ap-
plied to multiple images.
In order to check the number of color components in an unknown ICC profile use the
icccomponents parameter.
Applying external ICC profiles to images (tagging). As an alternative to using ICC pro-
files embedded in an image, an external profile may be applied to an individual image
by supplying a profile handle along with the iccprofile option to PDF_load_image( ).
In order to apply certain ICC profiles to all images, the image:iccprofile parameter can
be used. As opposed to setting default color spaces (see below) these parameters affect
only images, but not text and vector graphics.
ICC-based color spaces for page descriptions. The color values for text and vector
graphics can directly be specified in the ICC-based color space specified by a profile. The
color space must first be set by supplying the ICC profile handle as value to one of the
setcolor:iccprofilegray, setcolor:iccprofilergb, setcolor:iccprofilecmyk parameters. Subse-
quently ICC-based color values can be supplied to PDF_setcolor( ) along with one of the
color space keywords iccbasedgray, iccbasedrgb, or iccbasedcmyk:
icchandle = PDF_load_iccprofile(...);
if (icchandle == -1) {
    /* error handling */
    return;
}
PDF_set_value(p, "setcolor:iccprofilecmyk", icchandle);
PDF_setcolor(p, "fill", "iccbasedcmyk", 0, 1, 0, 0);
Mapping device colors to ICC-based default color spaces. PDF provides a feature for
mapping device-dependent gray, RGB, or CMYK colors in a page description to device-
independent colors. This can be used to attach a precise colorimetric specification to
color values which otherwise would be device-dependent. Mapping color values this
way is accomplished by supplying a DefaultGray, DefaultRGB, or DefaultCMYK color
space definition. In PDFlib it can be achieved by setting the defaultgray, defaultrgb, or
defaultcmyk parameters and supplying an ICC profile handle as the corresponding value.
The following example will set the sRGB color space as the default RGB color space
for text, images, and vector graphics:
icchandle = PDF_load_iccprofile(p, "sRGB", 0, "usage=iccbased");
if (icchandle == -1) {
    /* error handling */
    return;
}
PDF_set_value(p, "defaultrgb", icchandle);
Defining PDF/X output intents. An output device (printer) profile can be used to speci-
fy an output condition for PDF/X. This is done by supplying usage = outputintent in the
call to PDF_load_iccprofile( ). For details see Section 7.4.2, »Generating PDF/X-conforming
Output«, page 181.
3.4 Hypertext Elements
3.4.1 Examples for Creating Hypertext Elements
This section explains how to create hypertext elements such as bookmarks, form fields,
and annotations. Figure 3.2 shows the resulting document with all hypertext elements
that we will create in this section. The document contains the following hypertext elements:
>At the top right there is an invisible Web link at the text www.pdflib.com. Clicking this
area will bring up the corresponding Web page.
>A gray form field of type text is located below the Web link. Using JavaScript code it
will automatically be filled with the current date.
>The red pushpin contains an annotation with an attachment. Clicking it will open
the attached file.
>At the bottom left there is a form field of type button with a printer symbol. Clicking
this button will execute Acrobat’s menu item File, Print.
>The navigation pane contains the bookmark »Our Paper Planes Catalog«. Clicking
this bookmark will bring up a page of another PDF document.
In the next paragraphs we will show in detail how to create these hypertext elements
with PDFlib.
Web link. Let’s start with a link to the Web site www.pdflib.com. This is accomplished in
two steps. First we create an action of type URI (in Acrobat: Open a web link). This will pro-
vide us with an action handle which subsequently can be assigned to one or more hy-
pertext elements:
web_action = PDF_create_action(p, "URI", "url http://www.pdflib.com");
In the second step we create the actual link. A link in PDF is an annotation of type Link.
The action option for the link contains the event name activate which will trigger the ac-
tion, plus the web_action handle created above for the action itself:
sprintf(optlist, "linewidth=0 action {activate %d}", web_action);
PDF_create_annotation (p, left_x, left_y, right_x, right_y, "Link", optlist);
Fig. 3.2 Document with hypertext elements
By default the link will be displayed with a thin black border. Initially this is convenient
for precise positioning, but we disable the border with linewidth=0.
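The samples in this section format option lists with sprintf( ), which relies on the buffer being large enough. A bounded variant can be sketched as follows. The helper name build_link_optlist is hypothetical; it only produces the option list string shown above and reports whether it fit into the buffer.

```c
#include <stdio.h>

/* Format the option list for a Link annotation which triggers the action
 * with the given handle when activated.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_link_optlist(char *buf, size_t size, int action_handle)
{
    int n = snprintf(buf, size, "linewidth=0 action {activate %d}",
                     action_handle);

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

The resulting string would be passed as option list to PDF_create_annotation( ) exactly like the optlist variable in the sample above. Checking the return value avoids handing a truncated option list to PDFlib.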
Bookmark for jumping to another file. Now let’s create the bookmark »Our Paper
Planes Catalog« which jumps to another PDF file called paper_planes_catalog.pdf. First
we create an action of type GoToR (in Acrobat: Go to a page in another document). In the
option list for this action we define the name of the target document with the filename
option; the destination option specifies a certain part of the page which will be enlarged.
More precisely, the document will be displayed on the second page (page 2) with a fixed
view (type fixed), where the middle of the page is visible (left 50 top 200) and the zoom
factor is 200% (zoom 2):
char optlist[256] =
    "filename paper_planes_catalog.pdf "
    "destination {page 2 type fixed left 50 top 200 zoom 2}";
goto_action = PDF_create_action(p, "GoToR", optlist);
In the next step we create the actual bookmark. The action option for the bookmark con-
tains the activate event which will trigger the action, plus the goto_action handle created
above for the desired action. The option fontstyle bold specifies bold text, and textcolor
{rgb 0 0 1} makes the bookmark blue. The bookmark text »Our Paper Planes Catalog« is
provided as a function parameter:
sprintf(optlist, "action {activate %d} fontstyle bold textcolor {rgb 0 0 1}",
    goto_action);
catalog_bookmark = PDF_create_bookmark(p, "Our Paper Planes Catalog", 0, optlist);
Clicking the bookmark will display the specified part of the page in the target document.
Annotation with file attachment. In the next example we create a file attachment. We
start by creating an annotation of type FileAttachment. The filename option specifies the
name of the attachment, the option mimetype image/gif specifies its type (MIME is a
common convention for classifying file contents). The annotation will be displayed as a
pushpin (iconname pushpin) in red (annotcolor {rgb 1 0 0}) and has a tooltip (contents {Get
the Kraxi Paper Plane!}). It will not be printed (display noprint):
char optlist[256] =
    "filename kraxi_logo.gif mimetype image/gif iconname pushpin "
    "annotcolor {rgb 1 0 0} contents {Get the Kraxi Paper Plane!} display noprint";
PDF_create_annotation(p, left_x, left_y, right_x, right_y, "FileAttachment", optlist);
Note that the size of the symbol defined with iconname does not vary; the icon will be
displayed in its standard size in the top left corner of the specified rectangle.
Button form field for printing. The next example creates a button form field which
can be used for printing the document. In the first version we add a caption to the but-
ton; later we will use a printer symbol instead of the caption. We start by creating an ac-
tion of type Named (in Acrobat: Execute a menu item). Also, we must specify the font for
the caption:
print_action = PDF_create_action(p, "Named", "menuname Print");
button_font = PDF_load_font(p, "Helvetica-Bold", 0, "winansi", "");
The action option for the button form field contains the up event (in Acrobat: Mouse Up)
as a trigger for executing the action, plus the print_action handle created above for the
action itself. The backgroundcolor {rgb 1 1 0} option specifies yellow background, while
bordercolor {rgb 0 0 0} specifies black border. The option caption Print adds the text Print
to the button, and tooltip {Print the document} creates an additional explanation for the
user. The font option specifies the font using the button_font handle created above. By
default, the size of the caption will be adjusted so that it completely fits into the but-
ton’s area. Finally, the actual button form field is created with proper coordinates, the
name print_button, the type pushbutton and the appropriate options:
sprintf(optlist, "action {up %d} backgroundcolor {rgb 1 1 0} bordercolor {rgb 0 0 0} "
"caption Print tooltip {Print the document} font %d",
print_action, button_font);
PDF_create_field(p, left_x, left_y, right_x, right_y, "print_button", 0,
"pushbutton", optlist);
Now we extend the first version of the button by replacing the text Print with a little
printer icon. To achieve this we load the corresponding image file print_icon.jpg as a
template before creating the page. Using the icon option we assign the template handle
print_icon to the button field, and create the form field similarly to the code above:
print_icon = PDF_load_image(p, "auto", "print_icon.jpg", 0, "template");
if (print_icon == -1) {
    /* Error handling */
    return;
}
PDF_begin_page_ext(p, pagewidth, pageheight, "");
sprintf(optlist, "action {up %d} icon %d tooltip {Print the document} font %d",
print_action, print_icon, button_font);
PDF_create_field(p, left_x, left_y, right_x, right_y, "print_button", 0,
"pushbutton", optlist);
Simple text field. Now we create a text field near the upper right corner of the page.
The user will be able to enter the current date in this field. We acquire a font handle and
create a form field of type textfield which is called date, and has a gray background:
textfield_font = PDF_load_font(p, "Helvetica-Bold", 0, "winansi", "");
sprintf(optlist, "backgroundcolor {gray 0.8} font %d", textfield_font);
PDF_create_field(p, left_x, left_y, right_x, right_y, "date", 0, "textfield", optlist);
By default the font size is auto, which means that initially the field height is used as the
font size. When the input reaches the end of the field the font size is decreased so that
the text always fits into the field.
Text field with JavaScript. In order to improve the text form field created above we au-
tomatically fill it with the current date when the page is opened. First we create an ac-
tion of type JavaScript (in Acrobat: Run a JavaScript). The script option in the action’s op-
tion list defines a JavaScript snippet which displays the current date in the date text
field in the format month-day-year:
char optlist[256] =
    "script {var d = util.printd('mmm dd yyyy', new Date()); "
    "var date = this.getField('date'); date.value = d;}";
show_date = PDF_create_action(p, "JavaScript", optlist);
In the second step we create the page. In the option list we supply the action option
which attaches the show_date action created above to the trigger event open (in Acrobat:
Page Open):
sprintf(optlist, "action {open %d}", show_date);
PDF_begin_page_ext(p, pagewidth, pageheight, optlist);
Finally we create the text field as we did above. It will automatically be filled with the
current date whenever the page is opened:
textfield_font = PDF_load_font(p, "Helvetica-Bold", 0, "winansi", "");
sprintf(optlist, "backgroundcolor {gray 0.8} font %d", textfield_font);
PDF_create_field(p, left_x, left_y, right_x, right_y, "date", 0, "textfield", optlist);
3.4.2 Formatting Options for Text Fields
In Acrobat it is possible to specify various options for formatting the contents of a text
field, such as monetary amounts, dates, or percentages. This is implemented via custom
JavaScript code used by Acrobat. PDFlib does not directly support these formatting fea-
tures since they are not specified in the PDF reference. However, for the benefit of
PDFlib users we present some information below which will allow you to implement formatting
options for text fields by supplying simple JavaScript code fragments with the
action option of PDF_create_field( ).
In order to apply formatting to a text field JavaScript snippets are attached to a text
field as keystroke and format actions. The JavaScript code calls some internal Acrobat
function where the parameters control details of the formatting.
The following sample creates a keystroke action and a format action, and attaches them to
a form field so that the field contents will be formatted with two decimal places and the
EUR currency identifier:
keystroke_action = PDF_create_action(p, "JavaScript",
"script {AFNumber_Keystroke(2, 0, 3, 0, \"EUR \", true); }");
format_action = PDF_create_action(p, "JavaScript",
"script {AFNumber_Format(2, 0, 0, 0, \"EUR \", true); }");
sprintf(optlist, "font = %d action = {keystroke %d format %d}",
font, keystroke_action, format_action);
PDF_create_field(p, 50, 500, 250, 600, "price", 0, "textfield", optlist);
In order to specify the various formats which are supported in Acrobat you must use ap-
propriate functions in the JavaScript code. Table 3.8 lists the JavaScript function names
for the keystroke and format actions for all supported formats; the function parameters
are described in Table 3.9. These functions must be used similarly to the example above.
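When several fields need different numbers of decimal places or different currency strings, the script option can be generated instead of being written out by hand. The following sketch builds the option list for the number format functions. The helper name build_number_script is hypothetical; currStyle is fixed at 0 as in the sample above, and the currency string is assumed to contain neither quotes nor braces.

```c
#include <stdio.h>

/* Build the option list for a JavaScript action which calls one of the
 * number formatting functions; func is "AFNumber_Keystroke" or
 * "AFNumber_Format". currStyle is always written as 0.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_number_script(char *buf, size_t size, const char *func,
                        int ndec, int sep_style, int neg_style,
                        const char *currency, int prepend)
{
    int n = snprintf(buf, size, "script {%s(%d, %d, %d, 0, \"%s\", %s); }",
                     func, ndec, sep_style, neg_style, currency,
                     prepend ? "true" : "false");

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Called with AFNumber_Format, 2, 0, 0, the string "EUR " and a nonzero prepend flag, the helper yields the same option list as the format action in the sample above. The result would be passed to PDF_create_action( ) with the action type JavaScript.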
Table 3.8 JavaScript formatting functions for text fields
format      JavaScript functions to be used for keystroke and format actions
number      AFNumber_Keystroke(nDec, sepStyle, negStyle, currStyle, strCurrency, bCurrencyPrepend)
            AFNumber_Format(nDec, sepStyle, negStyle, currStyle, strCurrency, bCurrencyPrepend)
percentage  AFPercent_Keystroke(nDec, sepStyle)
            AFPercent_Format(nDec, sepStyle)
date        AFDate_KeystrokeEx(cFormat)
time        AFTime_Keystroke(tFormat)
special     AFSpecial_Keystroke(psf)
Table 3.9 Parameters for the JavaScript formatting functions
parameters        explanation and possible values
nDec              Number of decimal places
sepStyle          The decimal separator style:
                  0      1,234.56
                  2      1.234,56
negStyle          Emphasis used for negative numbers:
                  1      Use red text
                  2      Show parenthesis
strCurrency       Currency string to use, e.g. "\u20AC" for the Euro sign
bCurrencyPrepend  false  do not prepend currency symbol
                  true   prepend currency symbol
cFormat           A date format string. It may contain the following format placeholders, or any of the
                  time formats listed below for tFormat:
                  d      day of month
                  dd     day of month with leading zero
                  ddd    abbreviated day of the week
                  m      month as number
                  mm     month as number with leading zero
                  mmm    abbreviated month name
                  mmmm   full month name
                  yyyy   year with four digits
                  yy     last two digits of year
tFormat           A time format string. It may contain the following format placeholders:
                  h      hour (0-12)
                  hh     hour with leading zero (0-12)
                  H      hour (0-24)
                  HH     hour with leading zero (0-24)
                  M      minutes
                  MM     minutes with leading zero
                  ss     seconds with leading zero
                  t      'a' or 'p'
                  tt     'am' or 'pm'
Form fields activate the document’s dirty flag. When a PDF document containing
form fields is closed in Acrobat, it will ask whether you want to save the file, even if you
didn’t touch any fields. In technical terms, opening a PDFlib-generated PDF with form
fields will cause the document’s dirty flag to be set, i.e. Acrobat considers it as changed.
While usually this doesn’t really matter since the user will want to fill the form fields
anyway, some users may consider this behavior inelegant and annoying. You can work
around it with a small JavaScript which resets the document’s dirty flag after loading
the file. Use the following idiom to achieve this:
/* ...create some form fields... */
PDF_create_field(p, 100, 500, 300, 600, "field1", 0, "textfield", "...");
/* Create a JavaScript action which will be hooked up in the document */
action = PDF_create_action(p, "JavaScript", "script={this.dirty=false;}");
sprintf(optlist, "action={open %d}", action);
PDF_end_document(p, optlist);
Table 3.9 Parameters for the JavaScript formatting functions (continued)
parameters        explanation and possible values
psf               Describes a few additional formats:
                  0      Zip Code
                  1      Zip Code + 4
                  2      Phone Number
                  3      Social Security Number
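To illustrate what the sepStyle values 0 and 2 in Table 3.9 mean, the following sketch formats a number in C with the corresponding thousands and decimal separators. It is an illustration only: in the PDF the formatting is carried out by Acrobat's JavaScript functions, and the function name format_grouped is hypothetical. Other sepStyle values are rejected, and the default C locale is assumed.

```c
#include <stdio.h>
#include <string.h>

/* Format value with ndec decimals using sepStyle 0 ("1,234.56") or
 * sepStyle 2 ("1.234,56"). Returns 0 on success, -1 on error. */
int format_grouped(char *out, size_t size, double value, int ndec,
                   int sep_style)
{
    char raw[64];
    char group, point;
    const char *digits, *dot;
    size_t int_len, i, pos = 0;
    int n;

    if (sep_style == 0)      { group = ','; point = '.'; }
    else if (sep_style == 2) { group = '.'; point = ','; }
    else return -1;          /* other styles are not covered here */

    if (ndec < 0)
        return -1;
    n = snprintf(raw, sizeof raw, "%.*f", ndec, value);
    if (n < 0 || (size_t) n >= sizeof raw)
        return -1;

    /* generous bound: at most one separator per input character */
    if (size < 2 * strlen(raw) + 1)
        return -1;

    digits = raw;
    dot = strchr(raw, '.');
    int_len = dot ? (size_t) (dot - raw) : strlen(raw);

    if (*digits == '-') {    /* copy the sign, exclude it from grouping */
        out[pos++] = '-';
        digits++;
        int_len--;
    }
    for (i = 0; i < int_len; i++) {
        /* insert a separator before each complete group of three digits */
        if (i > 0 && (int_len - i) % 3 == 0)
            out[pos++] = group;
        out[pos++] = digits[i];
    }
    if (dot) {
        out[pos++] = point;
        strcpy(out + pos, dot + 1);
    } else {
        out[pos] = '\0';
    }
    return 0;
}
```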
4 Text Handling
4.1 Overview of Fonts and Encodings
Font handling is one of the most complex aspects of page descriptions and document
formats like PDF. In this section we will summarize PDFlib’s main characteristics with
regard to font and encoding handling (encoding refers to the mapping between individual
bytes or byte combinations and the characters which they actually represent). Except
where noted otherwise, PDFlib supports the same font formats on all platforms.
4.1.1 Supported Font Formats
PDFlib supports a variety of font types. This section summarizes the supported font
types and notes some of the most important aspects of these formats.
PostScript Type 1 fonts. PostScript fonts can be packaged in various file formats, and
are usually accompanied by a separate file containing metrics and other font-related in-
formation. PDFlib supports Mac and Windows PostScript fonts, and all common file for-
mats for PostScript font outline and metrics data.
TrueType fonts. PDFlib supports vector-based TrueType fonts, but not those based on
bitmaps. The TrueType font file must be supplied in Windows TTF or TTC format, or
must be installed in the Mac or Windows operating system. Contrary to PostScript
Type 1 fonts, TrueType and OpenType fonts do not require any additional metrics file
since the metrics information is included in the font file itself.
OpenType fonts. OpenType is a modern font format which combines PostScript and
TrueType technology, and uses a platform-independent file format. OpenType is natively
supported on Windows 2000/XP, and Mac OS X. There are two flavors of OpenType
fonts, both of which are supported by PDFlib:
>OpenType fonts with TrueType outlines (*.ttf) look and feel like usual TrueType fonts.
>OpenType fonts with PostScript outlines (*.otf) contain PostScript data in a True-
Type-like file format. This flavor is also called CFF (Compact Font Format).
Chinese, Japanese, and Korean (CJK) fonts. In addition to Acrobat’s standard CJK fonts
(see Section 4.7, »Chinese, Japanese, and Korean Text«, page 108), PDFlib supports cus-
tom CJK fonts in the TrueType and OpenType formats. Generally these fonts are treated
similarly to Western fonts. However, certain restrictions apply.
Type 3 fonts. In addition to PostScript, TrueType, and OpenType fonts, PDFlib also
supports the concept of user-defined (Type 3) PDF fonts. Unlike the common font for-
mats, user-defined fonts are not fetched from an external source (font file or operating
system services), but must be completely defined by the client by means of PDFlib’s native
text, graphics, and image functions. Type 3 fonts are useful for the following purposes:
>bitmap fonts,
>custom graphics, such as logos, which can easily be printed using simple text operators,
>Japanese gaiji (user-defined characters) which are not available in any predefined
font or encoding.
4.1.2 Encodings
An encoding defines how the actual bytes in a string will be interpreted by PDFlib and
Acrobat, and how they translate into text on a page. PDFlib supports a variety of encod-
ing methods.
All supported encodings can be arbitrarily mixed in one document. You may even
use different encodings for a single font, although the need to do so will only rarely
arise.
Note Not all encodings can be used with a given font. The user is responsible for making sure that
the font contains all characters required by a particular encoding. This can even be problematic
with Acrobat’s core fonts (see Table 4.2).
Identifying glyphs. There are three fundamentally different methods for identifying
individual glyphs (representations of a character) in a font:
>PostScript Type 1 fonts are based on the concept of glyph names: each glyph is la-
belled with a unique name which can be used to identify the character, and con-
struct code mappings which are suitable for a certain environment. While glyph
names have served their purpose for quite some time they impose severe restric-
tions on modern computing because of their space requirements and because they
do not really meet the requirements of international use (in particular CJK fonts).
>TrueType and OpenType fonts identify individual glyphs based on their Unicode
values. This makes it easy to add clear semantics to all glyphs in a text font. However,
there are no standard Unicode assignments for pi or symbol fonts. This implies some
difficulties when using symbol fonts in a Unicode environment.
>Chinese, Japanese, and Korean OpenType fonts are based on the concept of Character
IDs (CIDs). These are basically numbers which refer to a standard repository (called
character complement) for the respective language.
There is considerable overlap among these concepts. For example, TrueType fonts may
contain an auxiliary table of PostScript glyph names for compatibility reasons. On the
other hand, Unicode semantics for many standard PostScript glyph names are available
in the Adobe Glyph List (AGL). PDFlib supports all three methods (name-based, Unicode,
and CID).
8-Bit encodings. 8-bit encodings (also called single-byte encodings) map each byte in a
text string to a single character, and are thus limited to 256 different characters at a
time. 8-bit encodings used in PDFlib are based on glyph names or Unicode values, and
can be drawn from various sources:
>A large number of predefined encodings according to Table 4.2. These cover the most
important encodings currently in use on a variety of systems, and in a variety of locales.
>User-defined encodings which can be supplied in an external file or constructed dy-
namically at runtime with PDF_encoding_set_char( ). These encodings can be based on
glyph names or Unicode values.
>Encodings pulled from the operating system, also known as system encoding. This
feature is only available on Windows, IBM eServer iSeries, and zSeries.
>Abbreviated Unicode-based encodings which can be used to conveniently address
any Unicode range of 256 consecutive characters with 8-bit values.
>Encodings specific to a particular font. These are also called font-specific or builtin encodings.
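The idea behind the abbreviated Unicode-based encodings mentioned above can be sketched in a few lines of C. The two helpers below are hypothetical and not part of the PDFlib API; they only illustrate how a single byte can address one character within a range of 256 consecutive Unicode values, assuming the range is identified by its first value.

```c
#include <stdint.h>

/* Hypothetical helper (not a PDFlib function): translate an 8-bit code
 * into a Unicode value, given the first value of a 256-character range. */
uint32_t range_to_unicode(uint32_t range_start, uint8_t code)
{
    return range_start + code;
}

/* Reverse direction: returns 1 and stores the 8-bit code if the Unicode
 * value lies inside the range, 0 otherwise. */
int unicode_to_range(uint32_t range_start, uint32_t unicode, uint8_t *code)
{
    if (unicode < range_start || unicode - range_start > 0xFF)
        return 0;
    *code = (uint8_t) (unicode - range_start);
    return 1;
}
```

With a range starting at U+0400, for instance, the byte 0x10 addresses U+0410, while U+0500 is outside the range and cannot be addressed at all.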
Wide-character addressing. In addition to 8-bit encodings, various other addressing
schemes are supported which are much more powerful, and not subject to the 256 char-
acter limit.
>Purely Unicode-based addressing via the unicode encoding keyword. In this case the
client directly supplies Unicode strings to PDFlib. The Unicode strings may be for-
matted according to one of several standard methods (such as UTF-16, UTF-8) and
byte orderings (little-endian or big-endian).
>CMap-based addressing for a variety of Chinese, Japanese, and Korean standards. In
combination with standard CJK fonts PDFlib supports all CMaps supported by Acro-
bat. This includes both Unicode-based CMaps and others (see Section 4.7, »Chinese,
Japanese, and Korean Text«, page 108).
>Glyph id addressing for TrueType and OpenType fonts via the glyphid encoding key-
word. This is useful for advanced text processing applications which need access to
individual glyphs in a font without reference to any particular encoding scheme, or
must address glyphs which do not have any Unicode mapping. The number of valid
glyph ids in a font can be queried with the fontmaxcode parameter.
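Since Unicode strings may be supplied in either byte ordering, it can help to see how the two orderings differ. The following is a minimal sketch of UTF-16 encoding for a single Unicode value; the function name is hypothetical and the code is independent of PDFlib, which performs its own string handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode value as UTF-16 in the
 * requested byte order. Returns the number of bytes written to out
 * (2 or 4), or 0 if the value is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, int big_endian, unsigned char out[4])
{
    uint16_t units[2];
    size_t n, i;

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    if (cp < 0x10000) {
        units[0] = (uint16_t) cp;                 /* single code unit */
        n = 1;
    } else {
        cp -= 0x10000;                            /* surrogate pair */
        units[0] = (uint16_t) (0xD800 | (cp >> 10));
        units[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        n = 2;
    }

    for (i = 0; i < n; i++) {
        unsigned char hi = (unsigned char) (units[i] >> 8);
        unsigned char lo = (unsigned char) (units[i] & 0xFF);
        out[2 * i]     = big_endian ? hi : lo;
        out[2 * i + 1] = big_endian ? lo : hi;
    }
    return 2 * n;
}
```

The Euro sign U+20AC, for example, yields the bytes 20 AC in big-endian and AC 20 in little-endian order.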
4.1.3 Support for the Unicode Standard
Unicode is a large character set which covers all current and many ancient languages
and scripts in the world, and has significant support in many applications, operating
systems, and programming languages. PDFlib supports the Unicode standard to a large
extent. The following features in PDFlib are Unicode-enabled:
>Unicode can be supplied directly in page descriptions.
>Unicode can be supplied for various hypertext elements.
>Unicode strings for text on a page or hypertext elements can be supplied in UTF-8 or
UTF-16 formats with any byte ordering.
>PDFlib will include additional information (a ToUnicode CMap) in the PDF output
which helps Acrobat in assigning proper Unicode values for exporting text (e.g., via
the clipboard) and searching for Unicode text.
4.2 Font Format Details
4.2.1 PostScript Fonts
PostScript font file formats. PDFlib supports the following file formats for PostScript
Type 1 metrics and outline data on all platforms:
>The platform-independent AFM (Adobe Font Metrics) and the Windows-specific PFM
(Printer Font Metrics) format for metrics information. While AFM-based font metrics
can be rearranged to any encoding supported by the font, PFM font metrics can only
be used with the following encodings: winansi, iso8859-1, unicode, ebcdic, and builtin
(the latter only for symbol fonts).
>The platform-independent PFA (Printer Font ASCII) and the Windows-specific PFB
(Printer Font Binary) format for font outline information in the PostScript Type 1 format
(sometimes also called »ATM fonts«).
>On the Mac, resource-based PostScript Type 1 fonts, sometimes called LWFN (Laser-
Writer Font) fonts, are also supported.
>OpenType fonts with PostScript outlines (*.otf).
If you can get hold of a PostScript font file, but not the corresponding metrics file, you
can try to generate the missing metrics using one of several freely available utilities.
However, be warned that such conversions often result in font or encoding problems.
For this reason it is recommended to use the font outline and metrics data as supplied
by the font vendor.
PostScript font names. When working with host fonts it is important to use the exact
(case-sensitive) PostScript font name. If you are working with disk-based font files you
can use arbitrary alias names (see Section 4.3.1, »How PDFlib Searches for Fonts«, page
84). There are several possibilities to find a PostScript font’s exact name:
>Open the font outline file (*.pfa or *.pfb), and look for the string after the entry
/FontName. Omit the leading / character from this entry, and use the remainder as
the font name.
>If you have ATM (Adobe Type Manager) installed or are working with Windows
2000/XP, you can double-click the font (*.pfb) or metrics (*.pfm) file, and will see a
font sample along with the PostScript name of the font.
>Open the AFM metrics file and look for the string after the entry FontName.
Note The PostScript font name may differ substantially from the Windows font menu name, e.g.
»AvantGarde-Demi« (PostScript name) vs. »AvantGarde, Bold« (Windows font menu name).
Also, the font name as given in any Windows .inf file is not relevant for use with PDF.
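The first method (looking for the /FontName entry in a font outline file) is easy to automate. The following sketch is not part of PDFlib; the function name is hypothetical, and it assumes that the relevant portion of a PFA file has already been read into a string containing an entry of the usual form /FontName /SomeName def.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the PostScript font name which follows the
 * /FontName entry in text into out, omitting the leading slash.
 * Returns 1 on success, 0 if no name was found or it does not fit. */
int extract_font_name(const char *text, char *out, size_t outsize)
{
    const char *p = strstr(text, "/FontName");
    size_t len;

    if (p == NULL)
        return 0;

    p += strlen("/FontName");
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p != '/')
        return 0;
    p++;                                /* omit the leading / character */

    len = strcspn(p, " \t\r\n");        /* the name ends at white space */
    if (len == 0 || len >= outsize)
        return 0;

    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

For AFM files the entry reads FontName without any slashes, so a slightly different scan would be required there.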
PostScript glyph names. In order to write a custom encoding file or find fonts which
can be used with one of the supplied encodings you will have to find information about
the exact definition of the character set to be defined by the encoding, as well as the ex-
act glyph names used in the font files. You must also ensure that a chosen font provides
all necessary characters for the encoding. For example, the core fonts supplied with Ac-
robat 4/5 do not support ISO 8859-2 (Latin 2) nor Windows code page 1250. If you happen
to have the FontLab font editor (see www.fontlab.com; by the way, a great tool for dealing
with all kinds of font and encoding issues), you may use it to find out about the encodings
supported by a given font (look for »code pages« in the FontLab documentation).1
For the convenience of PDFlib users, the PostScript program print_glyphs.ps in the dis-
tribution fileset can be used to find the names of all characters contained in a PostScript
font. In order to use it, enter the name of the font at the end of the PostScript file and
send it (along with the font) to a PostScript Level 2 or 3 printer, convert it with Acrobat
Distiller, or view it with a Level-2-compatible PostScript viewer. The program will print
all glyphs in the font, sorted alphabetically by glyph name.
If a font does not contain a glyph required for a custom encoding, it will be missing
from the PDF document.
4.2.2 TrueType and OpenType Fonts
TrueType and OpenType file formats. PDFlib supports the following file formats for
TrueType and OpenType fonts:
>Windows TrueType fonts (*.ttf), including CJK fonts
>Platform-independent OpenType fonts with TrueType (*.ttf) or PostScript outlines
(*.otf), including CJK fonts.
>TrueType collections (*.ttc) with multiple fonts in a single file (mostly used for CJK fonts)
>End-user defined character (EUDC) fonts (*.tte) created with Microsoft’s eudcedit.exe
>On Mac OS any TrueType font installed on the system (including .dfont) can also be
used in PDFlib.
TrueType and OpenType font names. When working with host fonts it is important to
use the exact (case-sensitive) TrueType font name (on Windows you can also use the
base name of the font plus a style name suffix, see below). If you are working with disk-
based font files you can use arbitrary alias names (see Section 4.3.1, »How PDFlib Search-
es for Fonts«, page 84). In the generated PDF the name of a TrueType font may differ
from the name used in PDFlib (or Windows). This is normal, and results from the fact
that PDF uses the PostScript name of a TrueType font, which differs from its genuine
TrueType name (e.g., TimesNewRomanPSMT vs. Times New Roman).
Note Contrary to PostScript fonts, TrueType and OpenType font names may contain blank characters.
Finding TrueType font names on Windows. You can easily find the name of an in-
stalled font by double-clicking the TrueType font file, and taking note of the full font
name which will be displayed in the first line of the resulting window (without the
TrueType or OpenType term in parentheses, of course). Do not use the entry in the second
line after the label Typeface name! Also, some fonts may have parts of their name local-
ized according to the respective Windows version in use. For example, the common font
name portion Bold may appear as the translated word Fett on a German system. In order
to retrieve the font data from the Windows system (host fonts) you must use the trans-
lated form of the font name in PDFlib, or use font style names (see below). However, in
order to retrieve the font data directly from file you must use the generic (non-local-
ized) form of the font name.
1. Information about the glyph names used in PostScript fonts can be found at partners.adobe.com/asn/tech/type/
unicodegn.jsp (although font vendors are not required to follow these glyph naming recommendations).
If you want to examine TrueType fonts in more detail take a look at Microsoft’s free
»font properties extension«1 which will display many entries of the font’s TrueType ta-
bles in human-readable form.
Windows font style names. When querying host fonts from the Windows operating
system PDFlib users have access to a feature provided by the Windows font selection
machinery: style names can be provided for the weight and slant of a TrueType or
OpenType font, for example by appending the style name bold to the base font name, separated by a comma.
This will instruct Windows to search for a particular bold, italic, or other variation of the
base font. Depending on the available fonts Windows will select a font which most
closely resembles the requested style (it will not create a new font variation). The font
found by Windows may be different from the requested font, and the font name in the
generated PDF may be different from the requested name; PDFlib does not have any
control over Windows’ font selection. Also, font style names only work with TrueType
and OpenType host fonts, but not for PostScript fonts or fonts configured via a disk-
based font file.
The following keywords (separated from the font name with a comma) can be at-
tached to the base font name supplied to PDF_load_font( ) to specify the font weight:
none, thin, extralight, ultralight, light, normal, regular, medium,
semibold, demibold, bold, extrabold, ultrabold, heavy, black
The keyword italic can be specified alternatively or in addition to the above.
The keywords are case-insensitive. If two style names are used both must be separated
with a comma, for example by appending ,bold,italic to the base font name.
Note Windows style names for fonts may be useful if you have to deal with localized font names
since they provide a universal method to access font variations regardless of their localized
names.
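Composing a font name with style names is a plain string operation. The following sketch shows one way to do it; the helper is hypothetical (not a PDFlib function), the base font name used in the usage remark is only an example, and the slant keyword italic is an assumption based on the description above. The resulting string would be passed as the font name to PDF_load_font( ).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: append an optional weight keyword and an optional
 * slant keyword to a base font name, separated by commas.
 * Returns 1 if the result fits into out, 0 otherwise. */
int build_styled_name(char *out, size_t outsize, const char *base,
                      const char *weight, int italic)
{
    int n;

    if (weight != NULL && italic)
        n = snprintf(out, outsize, "%s,%s,italic", base, weight);
    else if (weight != NULL)
        n = snprintf(out, outsize, "%s,%s", base, weight);
    else if (italic)
        n = snprintf(out, outsize, "%s,italic", base);
    else
        n = snprintf(out, outsize, "%s", base);

    return n >= 0 && (size_t) n < outsize;
}
```

Calling the helper with the base name Verdana, the weight bold, and the italic flag set produces the string Verdana,bold,italic.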
Finding host font names on the Mac. Generally, you can find the name of an installed
font in the font menu of applications such as TextEdit on Mac OS X. However, this
method does not always result in the proper font name as expected by PDFlib. For this
reason we recommend Apple’s freely available Font Tools2. This suite of command-line
utilities contains a program called ftxinstalledfonts which is useful for determining the
exact name of all installed fonts. In order to determine the font name expected by
PDFlib, install Font Tools and issue the following statement in a terminal window:
ftxinstalledfonts -q
1. See www.microsoft.com/typography/property/property.htm
2. See developer.apple.com/fonts/OSXTools.html
4.2.3 User-Defined (Type 3) Fonts
Type 3 fonts in PDF (as opposed to PostScript Type 3 fonts) are not actually a file format.
Instead, the glyphs in a Type 3 font must be defined at runtime with standard PDFlib
graphics functions. Since all PDFlib features for vector graphics, raster images, and even
text output can be used in Type 3 font definitions, there are no restrictions regarding
the contents of the characters in a Type 3 font. Combined with the PDF import library
PDI you can even import complex drawings as a PDF page, and use those for defining a
character in a Type 3 font.
Note PostScript Type 3 fonts are not supported.
Type 3 fonts must completely be defined outside of any page (more precisely, the font
definition must take place in document scope). The following example demonstrates the
definition of a simple Type 3 font:
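PDF_begin_font(p, "Fuzzyfont", 0, 0.001, 0.0, 0.0, 0.001, 0.0, 0.0, "");
PDF_begin_glyph(p, "circle", 1000, 0, 0, 1000, 1000);
PDF_arc(p, 500, 500, 500, 0, 360);
PDF_fill(p);                    /* paint the circle as a solid disc */
PDF_end_glyph(p);
PDF_begin_glyph(p, "ring", 400, 0, 0, 400, 400);
PDF_arc(p, 200, 200, 200, 0, 360);
PDF_stroke(p);                  /* paint only the outline of the ring */
PDF_end_glyph(p);
PDF_end_font(p);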
The font will be registered in PDFlib, and its name can be supplied to PDF_load_font( )
along with an encoding which contains the names of the glyphs in the Type 3 font.
Please note the following when working with Type 3 fonts:
>Similar to patterns and templates, images cannot be opened within a glyph descrip-
tion. However, they can be opened before starting a glyph description, and placed
within the glyph description. Alternatively, inline images may be used for small bit-
maps to overcome this restriction.
>Due to restrictions in PDF consumers all characters used with text output operators
must actually be defined in the font: if character code x is to be displayed with PDF_
show( ) or a similar function, and the encoding contains glyphname at position x,
then glyphname must have been defined via PDF_begin_glyph( ). This restriction af-
fects only Type 3 fonts; missing glyphs in PostScript Type 1, TrueType, or OpenType
fonts will simply be ignored.
>Some PDF consumers (this is not true for Acrobat) require a glyph named .notdef if
codes will be used for which the corresponding glyph names are not defined in the
font. The .notdef glyph must be present, but it may simply contain an empty glyph
description.
>When normal bitmap data is used to define characters, unused pixels in the bitmap
will print as white, regardless of the background. In order to avoid this and have the
original background color shine through, use the mask parameter for constructing
the bitmap image.
>The interpolate option for images may be useful for enhancing the screen and print
appearance of Type 3 bitmap fonts.
4.3 Font Embedding and Subsetting
4.3.1 How PDFlib Searches for Fonts
Sources of font data. PDFlib can access font data from various sources:
>Disk-based font files which have been statically configured via a UPR configuration
file (see Section 3.1.6, »Resource Configuration and File Searching«, page 51) or dy-
namically via PDF_set_parameter( ) and the FontOutline resource category.
>Fonts which have been installed in the operating system. We refer to such fonts as
host fonts. Instead of fiddling with font and configuration files simply install the font
in the operating system (read: drop it into the appropriate fonts directory), and
PDFlib will happily use it. Host fonts are available on Mac (only TrueType and Open-
Type, but not PostScript fonts) and Windows systems. They can explicitly be config-
ured with the HostFont UPR resource category in order to control the search order.
This feature can be used, for example, to prefer host fonts over the built-in core
fonts.
>Font data passed by the client directly in memory by means of a PDFlib virtual file
(PVF). This is useful for advanced applications which have the font data already load-
ed into memory and want to avoid unnecessary disk access by PDFlib (see Section
3.1.5, »The PDFlib Virtual File System (PVF)«, page 50 for details on virtual files).
Potential problem with Windows fonts. We’d like to alert users to a potential problem
with font installation on Windows systems. If you install fonts via the File, Install new
font... menu item (as opposed to dragging fonts to the Windows Fonts directory) there’s
a check box Copy fonts to Fonts folder. If this box is unchecked, Windows will only place a
shortcut (link) to the original font file in the fonts folder. In this case the original font
file must live in a directory which is accessible to the application using PDFlib. In partic-
ular, font files outside of the Windows Fonts directory may not be accessible to IIS with
default security settings. Solution: either copy font files to the Fonts directory, or place
the original font file in a directory where IIS has read permission.
Similar problems may arise with Adobe Type Manager (ATM) if the Add without
copying fonts option is checked while installing fonts.
Font name aliasing. Since it can be difficult to find the exact internal name of a font,
PDFlib supports font name aliasing for PostScript, TrueType, and OpenType fonts. With
font name aliasing you can specify an arbitrary name as an alias for some font. The alias
can be specified as a resource of type HostFont, FontOutline, FontAFM, or FontPFM, either
in a UPR file or at runtime. The following sample defines an alias for a disk-based font:
PDF_set_parameter(p, "FontOutline", "x=DFHSMincho-W3.ttf");
font = PDF_load_font(p, "x", 0, "winansi", "");
Searching for fonts. The font name supplied to PDF_load_font( ) can be encoded in
ASCII, UTF-8, or UTF-16. However, not all encodings are supported for all font sources.
The font is searched according to the following scheme:
>If the name is an alias (configured via a UPR file or a call to PDF_set_parameter( )) it
can be encoded in ASCII or UTF-8. The name to which the alias refers will be used in
the next steps to locate a font file (for disk-based fonts) or host font.
>If the name specifies a host font, it can be encoded in ASCII. On Windows UTF-8 and
UTF-16 can also be used.
>If the font was not found as a (possibly localized) host font, and was not encoded in
UTF-8 or UTF-16, a corresponding font file will be searched by applying the exten-
sion-based search described below.
>For TTC (TrueType Collection) fonts the name can be encoded in ASCII, UTF-8, or
UTF-16, and will be matched against all names of all fonts in the TTC file.
Extension-based search for disk-based font files. When PDFlib searches for a font out-
line or metrics file on disk (as opposed to fetching host fonts directly from the operat-
ing system) it applies the following search algorithm if the font name consists of plain
ASCII characters:
>When the font has been configured as a FontAFM, FontPFM, or FontOutline resource via
UPR file or at runtime the configured file name will be used.
>If no file could be found, the following suffixes will be added to the font name, and
the resulting file names tried one after the other to find the font metrics (and outline
in the case of TrueType and OpenType fonts):
.ttf .otf .afm .pfm .ttc .tte
>If embedding is requested for a PostScript font, the following suffixes will be added
to the font name and tried one after the other to find the font outline file:
.pfa .pfb
>All trial file names above will be searched for »as is«, and then by prepending all di-
rectory names configured in the SearchPath resource category.
This means that PDFlib will find a font without any manual configuration provided the
corresponding font file consists of the font name plus the standard file name suffix ac-
cording to the font type, and is located in one of the SearchPath directories.
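The extension-based search can be pictured as generating a list of trial file names. The sketch below is only an illustration of the scheme, not PDFlib's actual implementation: the function name and the fixed path length are hypothetical, a forward slash is assumed as directory separator, and the order (each suffix first »as is«, then with every SearchPath directory) is one possible reading of the description above. Only the metrics suffixes are covered; the .pfa and .pfb outline suffixes would be handled in the same way.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_PATH_LEN 256

/* Suffixes tried for font metrics, in the order given in the text. */
static const char *const metrics_suffixes[] = {
    ".ttf", ".otf", ".afm", ".pfm", ".ttc", ".tte"
};

/* Hypothetical helper: fill out with trial file names for fontname and
 * return the number of names generated (at most maxout). */
size_t font_file_candidates(const char *fontname,
                            const char *const *searchpath, size_t ndirs,
                            char out[][MAX_PATH_LEN], size_t maxout)
{
    size_t nsuffixes = sizeof metrics_suffixes / sizeof metrics_suffixes[0];
    size_t count = 0, s, d;

    for (s = 0; s < nsuffixes; s++) {
        /* first try the file name "as is" */
        if (count < maxout)
            snprintf(out[count++], MAX_PATH_LEN, "%s%s",
                     fontname, metrics_suffixes[s]);

        /* then prepend each configured SearchPath directory */
        for (d = 0; d < ndirs; d++) {
            if (count < maxout)
                snprintf(out[count++], MAX_PATH_LEN, "%s/%s%s",
                         searchpath[d], fontname, metrics_suffixes[s]);
        }
    }
    return count;
}
```

With one search directory the helper generates twelve trial names for a given font name, starting with the name plus .ttf in the current directory.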
4.3.2 Font Embedding
The PDF core fonts. PDF viewers support a core set of 14 fonts which are assumed to be
always available. Full metrics information for the core fonts is already built into the
PDFlib binary so that no additional font files are required (unless the font is to be em-
bedded). The core fonts are the following:
Courier, Courier-Bold, Courier-Oblique, Courier-BoldOblique,
Helvetica, Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique,
Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic,
Symbol, ZapfDingbats
In order to replace one of the core fonts with a font installed on the system (host font)
you must configure the font in the HostFont resource category. For example, the follow-
ing line makes sure that instead of using the built-in core font data, the Symbol font
will be taken from the host system:
PDF_set_parameter(p, "HostFont", "Symbol=Symbol");
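Client code which must decide whether additional font files are needed may want to test a font name against the list of core fonts. The following is a minimal sketch with a hypothetical helper name; it is not a PDFlib function and simply performs a case-sensitive comparison against the 14 names listed above.

```c
#include <stddef.h>
#include <string.h>

/* The 14 PDF core fonts as listed in the text. */
static const char *const core_fonts[] = {
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique",
    "Helvetica-BoldOblique",
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Symbol", "ZapfDingbats"
};

/* Hypothetical helper: returns 1 if name is one of the core fonts
 * (case-sensitive comparison), 0 otherwise. */
int is_core_font(const char *name)
{
    size_t i, n = sizeof core_fonts / sizeof core_fonts[0];

    for (i = 0; i < n; i++) {
        if (strcmp(name, core_fonts[i]) == 0)
            return 1;
    }
    return 0;
}
```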
PDF supports fonts outside the set of 14 core fonts in several ways. PDFlib is capable of
embedding font outlines into the generated PDF output. Font embedding is controlled
via the embedding option of PDF_load_font( ), although in some cases PDFlib will en-
force font embedding (see below).
Alternatively, a font descriptor containing only the character metrics and some gen-
eral information about the font (without the actual glyph outlines) can be embedded. If
a font is not embedded in a PDF document, Acrobat will take it from the target system if
available, or construct a substitute font according to the font descriptor. Table 4.1 lists
different situations with respect to font usage, each of which poses different require-
ments on the font and metrics files required by PDFlib.
When a font with font-specific encoding (a symbol font) or one containing glyphs
outside Adobe’s Standard Latin character set is used, but not embedded in the PDF out-
put, the resulting PDF will be unusable unless the font is already natively installed on
the target system (since Acrobat can only simulate Latin text fonts). Such PDF files are
inherently nonportable, although they may be of use in controlled environments, such
as intra-corporate document exchange.
Forced font embedding. PDF requires font embedding for certain combinations of
font and encoding. PDFlib will therefore force font embedding (regardless of the embed-
ding option) in the following cases:
>Using glyphid or unicode encoding with a TrueType or OpenType font with TrueType outlines.
>Using a TrueType font or an OpenType font with TrueType outlines with an encod-
ing different from winansi, macroman, and ebcdic.
Note that font embedding will not be enforced for OpenType fonts with PostScript out-
lines. The requirement for font embedding is caused by the internal conversion to a CID
font, which can be disabled by setting the autocidfont parameter to false. Doing so will
also disable forced embedding. Note that in this case not all Latin characters will be ac-
cessible, and characters outside the Adobe Glyph List (AGL) won’t work at all.
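The rules for forced embedding of TrueType and OpenType fonts can be condensed into a small decision function. The sketch below merely restates the rules above in code; the type and function names are hypothetical and do not exist in the PDFlib API. Note that glyphid and unicode need no separate test since they differ from winansi, macroman, and ebcdic anyway.

```c
#include <string.h>

/* Hypothetical classification of the outline data in a TrueType or
 * OpenType font. */
typedef enum { OUTLINE_TRUETYPE, OUTLINE_POSTSCRIPT } outline_kind;

/* Hypothetical helper: returns 1 if embedding would be forced for a
 * TrueType or OpenType font according to the rules above, else 0.
 * autocidfont reflects the parameter of the same name. */
int embedding_forced(outline_kind outlines, const char *encoding,
                     int autocidfont)
{
    /* never enforced for OpenType fonts with PostScript outlines */
    if (outlines != OUTLINE_TRUETYPE)
        return 0;

    /* disabling the CID conversion also disables forced embedding */
    if (!autocidfont)
        return 0;

    return strcmp(encoding, "winansi") != 0
        && strcmp(encoding, "macroman") != 0
        && strcmp(encoding, "ebcdic") != 0;
}
```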
Table 4.1 Different font usage situations and required metrics and outline files
font usage                                          font metrics file required?   font outline file required?
one of the 14 core fonts                            no                            no (font outlines may be supplied if embedding is desired)
TrueType or OpenType font installed on the Mac,     no                            no
or TrueType, OpenType, or PostScript fonts
installed on the Windows system (host fonts)
non-core PostScript fonts                           PFM or AFM                    PFB or PFA (only for font embedding)
TrueType fonts                                      no                            TTF, TTE
OpenType fonts with TrueType or PS outlines,        no                            TTF, OTF
including CJK TrueType and OpenType fonts
standard CJK fonts (see Section 4.7, »Chinese,      no                            no
Japanese, and Korean Text«, page 108)
Legal aspects of font embedding. It’s important to note that mere possession of a font
file may not justify embedding the font in PDF, even for holders of a legal font license.
Many font vendors restrict embedding of their fonts. Some type foundries completely
forbid PDF font embedding, others offer special online or embedding licenses for their
fonts, while still others allow font embedding provided subsetting is applied to the font.
Please check the legal implications of font embedding before attempting to embed
fonts with PDFlib. PDFlib will honor embedding restrictions which may be specified in a
TrueType or OpenType font. If the embedding flag in a TrueType font is set to no
embedding1, PDFlib will honor the font vendor’s request, and reject any attempt at embedding
the font.
4.3.3 Font Subsetting
In order to decrease the size of the PDF output, PDFlib can embed only those characters
from a font which are actually used in the document. This process is called font subset-
ting. It creates a new font which contains fewer glyphs than the original font, and omits
font information which is not required for PDF viewing. Note, however, that Acrobat’s
TouchUp tool will refuse to work with text in subset fonts. Font subsetting is particular-
ly important for CJK fonts. PDFlib supports subsetting for the following types of fonts:
>TrueType fonts,
>OpenType fonts with PostScript or TrueType outlines.
When a font for which subsetting has been requested is used in a document, PDFlib will
keep track of the characters actually used for text output. There are several controls for
the subsetting behavior:
>The default subsetting behavior is controlled by the autosubsetting parameter. If it is
true, subsetting will be enabled for all fonts where subsetting is possible. The default
value is true.
>If the autosubsetting parameter is false, but subsetting is desired for a particular font
nevertheless, the subsetting option must be supplied to PDF_load_font( ).
>The subsetlimit parameter contains a percentage value. If a document uses more than
this percentage of glyphs in a font, subsetting will be disabled for this particular
font, and the complete font will be embedded instead. This saves some processing
time at the expense of larger output files:
PDF_set_value(p, "subsetlimit", 75); /* set subset limit to 75% */
The default value of subsetlimit is 100 percent. In other words, the subsetting option
requested at PDF_load_font( ) will be honored unless the client explicitly requests a
lower limit than 100 percent.
>The subsetminsize parameter can be used to completely disable subsetting for small
fonts. If the original font file is smaller than the value of subsetminsize in KB, font
subsetting will be disabled for this font. The default value is 100 KB.
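The interplay of the four subsetting controls can be summarized in a decision function. The sketch below is a simplified model under stated assumptions, not PDFlib's actual logic: the structure and function names are hypothetical, the subsetting option is modelled as a plain flag meaning »subsetting requested for this font«, and the glyph counts and file size are assumed to be known to the caller.

```c
/* Hypothetical container for the global subsetting parameters. */
typedef struct {
    int    autosubsetting;   /* default: true */
    double subsetlimit;      /* percentage of used glyphs, default: 100 */
    double subsetminsize;    /* font file size in KB, default: 100 */
} subset_params;

/* Hypothetical helper: returns 1 if a font would be subset, 0 if the
 * complete font would be embedded instead. */
int will_subset(const subset_params *sp, int subsetting_option,
                unsigned used_glyphs, unsigned total_glyphs,
                double font_file_kb)
{
    double used_percent;

    /* subsetting must be enabled globally or requested for this font */
    if (!sp->autosubsetting && !subsetting_option)
        return 0;

    /* small fonts are never subset */
    if (font_file_kb < sp->subsetminsize)
        return 0;

    if (total_glyphs == 0)
        return 0;

    /* too many glyphs in use: embed the complete font */
    used_percent = 100.0 * used_glyphs / total_glyphs;
    if (used_percent > sp->subsetlimit)
        return 0;

    return 1;
}
```

With the default values, a 250 KB font of which 5 percent of the glyphs are used would be subset, while a 40 KB font would be embedded completely.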
Embedding and subsetting TrueType fonts. The dependencies for TrueType handling
are a bit confusing due to certain requirements in PDF. The following is a summary of
the information in previous paragraphs.
If a TrueType font is used with an encoding different from winansi and macroman it
will be converted to a CID font for PDF output by default. For encodings which contain
only characters from the Adobe Glyph List (AGL) this can be prevented by setting the
autocidfont parameter to false. If the font is converted to a CID font, it will always be embedded.
Subsetting will be applied by default, unless the autosubsetting parameter is set
to false, or the percentage of used glyphs is higher than the subsetlimit parameter, or the
font file size in KB is smaller than the value of the subsetminsize parameter.
1. More specifically: if the fsType flag in the OS/2 table of the font has a value of 2.
4.4 Encoding Details
4.4.1 8-Bit Encodings
Table 4.2 lists the predefined encodings in PDFlib, and details their use with several im-
portant classes of fonts. It is important to realize that certain scripts or languages have
requirements which cannot be met by common fonts. For example, Acrobat’s core fonts
do not contain all characters required for ISO 8859-2 (e.g. Polish), while PostScript 3,
OpenType Pro, and TrueType »big fonts« do.
Note The »chartab« example contained in the PDFlib distribution can be used to easily print charac-
ter tables for arbitrary font/encoding combinations.
Notes on the macroman encoding. This encoding reflects the Mac OS character set, al-
beit with the old currency symbol at position 219 = 0xDB, and not the Euro glyph as re-
defined by Apple (this incompatibility is dictated by the PDF specification). The
macroman_euro encoding is identical to macroman except that position 219 = 0xDB
holds the Euro glyph instead of the currency symbol. Also, the macroman and mac-
roman_euro encodings don’t include the Apple glyph and the mathematical symbols as
defined in the Mac OS character set. These are available in the macroman_apple encod-
ing, but the required glyphs are contained only in few fonts.
Host encoding. The special encoding host does not have any fixed meaning, but will be
mapped to another 8-bit encoding depending on the current platform as follows:
>on Mac OS Classic it will be mapped to macroman;
>on IBM eServer zSeries with MVS or USS it will be mapped to ebcdic;
>on IBM eServer iSeries it will be mapped to ebcdic_37;
>on Windows it will be mapped to winansi;
>on all other systems (including Mac OS X) it will be mapped to iso8859-1.
Host encoding is primarily useful for writing platform-independent test programs (like
those contained in the PDFlib distribution) and other simple applications. Host encoding
is not recommended for production use, but should be replaced by whatever encoding
is appropriate.
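The platform-dependent mapping of host encoding amounts to a simple lookup. The following sketch restates the list above in code; the enumeration and the function are hypothetical illustrations and not part of the PDFlib API, where the mapping happens internally.

```c
/* Hypothetical platform classification for the purpose of this sketch. */
typedef enum {
    PLATFORM_MACOS_CLASSIC,
    PLATFORM_ZSERIES,        /* IBM eServer zSeries with MVS or USS */
    PLATFORM_ISERIES,        /* IBM eServer iSeries */
    PLATFORM_WINDOWS,
    PLATFORM_OTHER           /* all other systems, including Mac OS X */
} platform_kind;

/* Hypothetical helper: name of the 8-bit encoding to which host
 * encoding is mapped on the given platform. */
const char *host_encoding(platform_kind platform)
{
    switch (platform) {
    case PLATFORM_MACOS_CLASSIC: return "macroman";
    case PLATFORM_ZSERIES:       return "ebcdic";
    case PLATFORM_ISERIES:       return "ebcdic_37";
    case PLATFORM_WINDOWS:       return "winansi";
    default:                     return "iso8859-1";
    }
}
```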
Automatic encoding. PDFlib supports a mechanism which can be used to specify the
most natural encoding for certain environments without further ado. Supplying the
keyword auto as an encoding name specifies a platform- and environment-specific 8-bit
encoding for text fonts as follows:
>On Windows: the current system code page (see below for details)
>On Unix and Mac OS X: iso8859-1
>On Mac OS Classic: macroman
>On IBM eServer iSeries: the current job’s encoding (IBMCCSID000000000000)
>On IBM eServer zSeries: ebcdic (=code page 1047).
For symbol fonts the keyword auto will be mapped to builtin encoding. While automatic
encoding is convenient in many circumstances, using this method will make your
PDFlib client programs inherently non-portable.
Tapping system code pages. PDFlib can be instructed to fetch code page definitions
from the system and transform them appropriately for internal use. This is very convenient
Table 4.2 Availability of glyphs for predefined encodings in several classes of fonts: some languages cannot be
represented with Acrobat’s core fonts.
code page | supported languages | availability in: PS Level 1/2, Acrobat 4/5 (1); Acrobat 6 core fonts (2); PostScript 3 fonts (3); OpenType Pro Fonts (4); TrueType »Big Fonts« (5)
winansi | identical to cp1252 (superset of iso8859-1) | yes yes yes yes yes
macroman | Mac Roman encoding, the original Macintosh character set | yes yes yes yes yes
macroman_euro | similar to macroman, but includes the Euro glyph instead of currency | yes yes yes yes yes
macroman_apple | similar to macroman_euro, but includes additional mathematical symbols | – – – yes yes
ebcdic | EBCDIC code page 1047 | yes yes yes yes yes
ebcdic_37 | EBCDIC code page 037 | yes yes yes yes yes
pdfdoc | PDFDocEncoding | yes yes yes yes yes
iso8859-1 | (Latin-1) Western European languages | yes yes yes yes yes
iso8859-2 | (Latin-2) Slavic languages of Central Europe | yes yes yes
iso8859-3 | (Latin-3) Esperanto, Maltese | – – – yes yes
iso8859-4 | (Latin-4) Estonian, the Baltic languages, Greenlandic | – – – yes yes
iso8859-5 | Bulgarian, Russian, Serbian | – – – yes yes
iso8859-6 | Arabic | – – – – yes
iso8859-7 | Modern Greek | – – – 1 miss.
iso8859-8 | Hebrew and Yiddish | – – – – yes
iso8859-9 | (Latin-5) Western European, Turkish | 5 miss. 5 miss. yes yes yes
iso8859-10 | (Latin-6) Nordic languages | 1 miss.
iso8859-13 | (Latin-7) Baltic languages | – – yes yes yes
iso8859-14 | (Latin-8) Celtic | – – – – –
iso8859-15 | (Latin-9) Adds Euro as well as French and Finnish characters to Latin-1 | yes yes yes yes
iso8859-16 | (Latin-10) Hungarian, Polish, Romanian, Slovenian | yes yes yes
cp1250 | Central European | – – yes yes yes
cp1251 | Cyrillic | – – – yes yes
cp1252 | Western European (same as winansi) | yes yes yes yes yes
cp1253 | Greek | – – – 1 miss.
cp1254 | Turkish | 5 miss. yes yes yes
cp1255 | Hebrew | – – – – yes
cp1256 | Arabic | – – – – 5 miss.
cp1257 | Baltic | – – yes yes yes
cp1258 | Viet Nam | – – – – yes
1. Core fonts shipped with Acrobat 4/5 (original Adobe Latin character set; generally Type 1 Fonts since 1982)
2. Acrobat 6 relies on the fonts which are available with the system in order to display Times and Helvetica. Therefore the results vary widely depending on the number and kind of installed fonts. For example, the system fonts shipped with Windows XP contain more glyphs than those available in older versions of Windows.
3. Extended Adobe Latin character set (CE-Fonts), generally Type 1 Fonts shipped with PostScript 3 devices
4. Adobe OpenType Pro fonts contain more glyphs than regular OpenType fonts.
5. Windows TrueType fonts containing large glyph complements, e.g. Tahoma
4.4 Encoding Details 91
since it frees you from implementing the code page definition yourself. Instead of sup-
plying the name of a built-in or user-defined encoding for PDF_load_font( ), simply use
an encoding name which is known to the system. This feature is only available on se-
lected platforms, and the syntax for the encoding string is platform-specific:
>On Windows the encoding name is cp<number>, where <number> is the number of
any single-byte code page installed on the system (see Section 4.7.3, »Custom CJK
Fonts«, page 112, for information on multi-byte Windows code pages):
PDF_load_font(p, "Helvetica", 0, "cp1250", "");
Single-byte code pages will be transformed into an internal 8-bit encoding, while
multi-byte code pages will be mapped to unicode. The text must be supplied in a for-
mat which is compatible with the chosen code page (e.g. SJIS for cp932).
>On IBM eServer iSeries any Coded Character Set Identifier (CCSID) can be used. The
CCSID must be supplied as a string, and PDFlib will apply the prefix IBMCCSID to the
supplied code page number. PDFlib will also add leading 0 characters if the code page
number uses fewer than 5 characters. Supplying 0 (zero) as the code page number
will result in the current job’s encoding being used:
PDF_load_font(p, "Helvetica", 0, "273", "");
>On IBM eServer zSeries with USS or MVS any Coded Character Set Identifier (CCSID) can
be used. The CCSID must be supplied as a string, and PDFlib will pass the supplied
code page name to the system literally without applying any change:
PDF_load_font(p, "Helvetica", 0, "IBM-273", "");
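The name construction described for iSeries can be illustrated with a small sketch. The helper build_iseries_ccsid_name( ) below is hypothetical and not part of the PDFlib API; it only models the rule stated above (prefix IBMCCSID, pad with leading zeros to five characters). Any further digits which the system may expect after the five-digit number are not modelled here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PDFlib function): prefix the CCSID string
 * with "IBMCCSID" and pad it with leading zeros to five digits.
 * Returns 0 on success, -1 if the input is not 1-5 decimal digits or
 * the output buffer is too small. */
int build_iseries_ccsid_name(const char *ccsid, char *out, size_t outsize)
{
    size_t len = strlen(ccsid);
    size_t i;

    if (len == 0 || len > 5)
        return -1;
    for (i = 0; i < len; i++)
        if (ccsid[i] < '0' || ccsid[i] > '9')
            return -1;
    if (outsize < sizeof("IBMCCSID00000"))
        return -1;

    strcpy(out, "IBMCCSID");
    for (i = len; i < 5; i++)       /* add leading zeros */
        strcat(out, "0");
    strcat(out, ccsid);
    return 0;
}
```

With this sketch the string "273" yields IBMCCSID00273, and "0" yields IBMCCSID00000.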
User-defined 8-bit encodings. In addition to predefined encodings PDFlib supports
user-defined 8-bit encodings. These are the way to go if you want to deal with some
character set which is not internally available in PDFlib, such as EBCDIC character sets
different from the one supported internally in PDFlib. PDFlib supports encoding tables
defined by PostScript glyph names, as well as tables defined by Unicode values.
The following tasks must be done before a user-defined encoding can be used in a
PDFlib program (alternatively the encoding can also be constructed at runtime using
PDF_encoding_set_char( )):
>Generate a description of the encoding in a simple text format.
>Configure the encoding in the PDFlib resource file (see Section 3.1.6, »Resource Con-
figuration and File Searching«, page 51) or via PDF_set_parameter( ).
>Provide a font (metrics and possibly outline file) that supports all characters used in
the encoding.
The encoding file simply lists glyph names and numbers line by line. The following ex-
cerpt shows the start of an encoding definition:
% Encoding definition for PDFlib, based on glyph names
% name code Unicode (optional)
space 32 0x0020
exclam 33 0x0021
The next example shows a snippet from a Unicode code page:
% Code page definition for PDFlib, based on Unicode values
% Unicode code
0x0020 32
0x0021 33
More formally, the contents of an encoding or code page file are governed by the follow-
ing rules:
>Comments are introduced by a percent ’%’ character, and terminated by the end of
the line.
>The first entry in each line is either a PostScript glyph name or a hexadecimal Unicode
value composed of a 0x prefix and four hex digits (upper or lower case). This is
followed by whitespace and a hexadecimal (0x00–0xFF) or decimal (0–255) character
code. Optionally, name-based encoding files may contain a third column with the
corresponding Unicode value.
>Character codes which are not mentioned in the encoding file are assumed to be un-
defined. Alternatively, a Unicode value of 0x0000 or the character name .notdef can
be provided for unused slots.
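The rules above can be sketched as a small line parser. This is only an illustration of the file format, not PDFlib's own reader; the type enc_entry and the function parse_encoding_line( ) are hypothetical names, and details such as the maximum line length are assumptions.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One parsed line of an encoding (*.enc) or code page (*.cpg) file.
 * Illustrative names only; they are not part of the PDFlib API. */
typedef struct {
    int is_unicode;     /* 1: first column is 0xXXXX, 0: glyph name */
    char name[64];      /* glyph name, only set if is_unicode == 0 */
    int has_uv;         /* 1 if uv holds a Unicode value */
    unsigned long uv;   /* Unicode value (column 1 or optional column 3) */
    int code;           /* character code, 0..255 */
} enc_entry;

/* Checks for "0x" followed by exactly four hex digits. */
static int is_hex4(const char *s)
{
    int i;

    if (strlen(s) != 6 || s[0] != '0' || s[1] != 'x')
        return 0;
    for (i = 2; i < 6; i++)
        if (!isxdigit((unsigned char) s[i]))
            return 0;
    return 1;
}

/* Returns 1 if an entry was parsed, 0 for blank or comment-only lines,
 * and -1 for lines which violate the rules above. */
int parse_encoding_line(const char *line, enc_entry *e)
{
    char buf[256];
    char *tok[3];
    char *p, *end;
    int n = 0, base;
    long code;

    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);
    if ((p = strchr(buf, '%')) != NULL)
        *p = '\0';                          /* comment runs to end of line */

    for (p = strtok(buf, " \t\r\n"); p != NULL && n < 3;
         p = strtok(NULL, " \t\r\n"))
        tok[n++] = p;
    if (n == 0)
        return 0;
    if (n < 2 || p != NULL)                 /* too few or too many columns */
        return -1;

    memset(e, 0, sizeof *e);
    if (is_hex4(tok[0])) {
        if (n == 3)                         /* third column is for names only */
            return -1;
        e->is_unicode = 1;
        e->has_uv = 1;
        e->uv = strtoul(tok[0], NULL, 16);
    } else {
        if (strlen(tok[0]) >= sizeof e->name)
            return -1;
        strcpy(e->name, tok[0]);
        if (n == 3) {
            if (!is_hex4(tok[2]))
                return -1;
            e->has_uv = 1;
            e->uv = strtoul(tok[2], NULL, 16);
        }
    }

    /* The character code is hexadecimal with 0x prefix, or decimal. */
    base = (tok[1][0] == '0' && (tok[1][1] == 'x' || tok[1][1] == 'X')) ? 16 : 10;
    code = strtol(tok[1], &end, base);
    if (end == tok[1] || *end != '\0' || code < 0 || code > 255)
        return -1;
    e->code = (int) code;
    return 1;
}
```

Feeding the line "space 32 0x0020" to this sketch yields a name-based entry with code 32 and Unicode value 0x0020, while "0x0021 33" yields a Unicode-based entry with code 33.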
As a naming convention we refer to name-based tables as encoding files (*.enc), and
Unicode-based tables as code page files (*.cpg), although PDFlib treats both kinds in the
same way, and doesn’t care about file names. In fact, PDFlib will automatically convert
between name-based encoding files and Unicode-based code page files whenever it is
necessary. This conversion is based on Adobe’s standard list of PostScript glyph names
(the Adobe Glyph List, or AGL¹), but non-AGL names can also be used. PDFlib will assign
free Unicode values to these non-AGL names, and adjust the values when reading an
OpenType font file which includes a mapping from glyph names to Unicode values.
The AGL is built into PDFlib, and contains more than 1000 glyph names. Encoding
files are required for PostScript fonts with non-standard glyph names, while code pages
are more convenient when dealing with Unicode-based TrueType or OpenType fonts.
4.4.2 Symbol Fonts and Font-specific Encodings
Since Symbol or logo fonts (also called pi fonts) do not usually contain standard charac-
ters they must use a different encoding scheme compared to text fonts.
The builtin encoding for PostScript fonts. The encoding name builtin doesn’t describe a
particular character ordering but rather means »take this font as it is, and don’t mess
with the character set«. This concept is sometimes called a »font-specific« encoding and
is very important when it comes to non-text fonts (such as logo and symbol fonts). It is
also widely used (somewhat inappropriately) for non-Latin text fonts (such as Greek
and Cyrillic). Such fonts cannot be reencoded using one of the standard encodings since
their character names don’t match those in these encodings. Therefore builtin must be
used for all symbolic or non-text PostScript fonts. Non-text fonts can be recognized by
the following entry in their AFM file:
EncodingScheme FontSpecific
Text fonts can be reencoded (adjusted to a certain code page or character set), while
symbolic fonts can’t, and must use builtin encoding instead. However, the widely used
Symbol and ZapfDingbats fonts can also be used with unicode encoding.
1. The AGL can be found at partners.adobe.com/asn/tech/type/glyphlist.txt
The builtin encoding cannot be used for user-defined (Type 3) fonts since these do
not include any default encoding.
Note Unfortunately, many typographers and font vendors didn’t fully grasp the concept of font spe-
cific encodings (this may be due to less-than-perfect production tools). For this reason, there
are many Latin text fonts labeled as FontSpecific encoding, and many symbol fonts incorrectly
labeled as text fonts.
Builtin encoding for TrueType fonts. TrueType fonts with non-text characters, such as
the Wingdings font, must be used with builtin encoding. If a font requires builtin encoding
but the client requested a different encoding, PDFlib will enforce builtin encoding.
Builtin encoding for OpenType fonts with PostScript outlines (*.otf). OTF fonts with
non-text characters must be used with builtin encoding. Some OTF fonts contain an in-
ternal default encoding. PDFlib will detect this case, and dynamically construct an en-
coding which is suited for this particular font. The encoding name builtin will be modi-
fied to builtin_<fontname>. Although this new encoding name can be used in future calls
to PDF_load_font( ) it is only reasonable for use with the same font.
4.4.3 Glyph ID Addressing for TrueType and OpenType Fonts
In addition to 8-bit encodings, Unicode, and CMaps PDFlib supports a method of ad-
dressing individual characters within a font called glyph id addressing. In order to use
this technique all of the following requirements must be met:
>The font is available in the TrueType or OpenType format.
>The font must be embedded in the PDF document (with or without subsetting).
>The developer is familiar with the internal numbering of glyphs within the font.
Glyph ids (GIDs) are used internally in TrueType and OpenType fonts, and uniquely ad-
dress individual glyphs within a font. GID addressing frees the developer from any re-
striction in a given encoding scheme, and provides access to all glyphs which the font
designer put into the font file. However, there is generally no relationship at all be-
tween GIDs and more common addressing schemes, such as Windows encoding or Uni-
code. The burden of converting application-specific codes to GIDs is placed on the
PDFlib user.
GID addressing is invoked by supplying the keyword glyphid as the encoding parame-
ter of PDF_load_font( ). GIDs are numbered consecutively from 0 to the last glyph id val-
ue, which can be queried with the fontmaxcode parameter.
4.4.4 The Euro Glyph
The symbol denoting the European currency Euro raises a number of issues
when it comes to properly displaying and printing it. In this section
we’d like to give some hints so that you can successfully deal with the
Euro character. First of all you’ll have to choose an encoding which includes
the Euro character and check at which position the Euro is located.
Some examples:
>With unicode encoding use the character U+20AC.
>In winansi encoding the location is 0x80 (hexadecimal) or 128 (decimal).
>The common iso8859-1 encoding does not contain the Euro character. However, the
iso8859-15 encoding is an extension of iso8859-1 which adds the Euro character at
0xA4 (hexadecimal) or 164 (decimal).
>The original macroman encoding, which is still the same in PDF, does not contain the
Euro character. However, Apple modified this encoding and replaced the old currency
glyph with the Euro glyph at 0xDB (hexadecimal) or 219 (decimal). In order to
use this modified Mac encoding use macroman_euro instead of macroman.
Next, you must choose a font which contains the Euro glyph. Many modern fonts in-
clude the Euro glyph, but not all do. Again, some examples:
>The built-in fonts in PostScript Level 1 and Level 2 devices do not contain the Euro
character, while those in PostScript 3 devices usually do.
>If a font does not contain the Euro character you can use the Euro from the Symbol
core font instead, which is located at position 0xA0 (hexadecimal) or 160 (decimal). It
is available in the version of the Symbol font shipped with Acrobat 4.0 and above,
and the one built into PostScript 3 devices.
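The Euro positions listed for the 8-bit encodings above can be collected in a small lookup table. The function euro_code( ) is a hypothetical helper for client code, not a PDFlib function; it covers only the encodings mentioned in this section.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the 8-bit code of the Euro glyph for the
 * encodings discussed above, or -1 if the encoding is not in the table
 * (this includes iso8859-1 and macroman, which do not contain the Euro). */
int euro_code(const char *encoding)
{
    static const struct { const char *name; int code; } table[] = {
        { "winansi",       0x80 },
        { "iso8859-15",    0xA4 },
        { "macroman_euro", 0xDB }
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(encoding, table[i].name) == 0)
            return table[i].code;
    return -1;
}
```

A client could use the returned code to build an 8-bit string for the text output functions, and fall back to unicode encoding with U+20AC if -1 is returned.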
4.5 Unicode Support
PDFlib supports the Unicode standard¹, almost identical to ISO
10646, for a variety of features related to page content and hypertext
elements.
4.5.1 Unicode for Page Content and Hypertext
Unicode strings can be supplied directly in page descriptions for
use with the following kinds of fonts:
>PostScript fonts with unicode encoding. Up to 255 distinct Unicode values can be
used. If more are requested they will be replaced with the space character. The encod-
ing unicode will always be mapped to winansi if a font with a PFM metrics file is used.
>TrueType and OpenType fonts with unicode encoding. For TrueType and OpenType
fonts this will force font embedding.
>Standard CJK fonts with a Unicode-based CMap. Unicode-compatible CMaps are easi-
ly identified by the Uni prefix in their name (see Table 4.7).
>Custom CJK fonts with unicode encoding.
>On Windows systems Unicode filenames can be used.
In addition to unicode encoding PDFlib supports several other methods for selecting
Unicode characters.
Unicode code pages for PostScript and TrueType fonts. PDFlib supports Unicode ad-
dressing for characters within the Adobe Glyph List (AGL). This kind of Unicode support
is available for Unicode-based TrueType fonts and PostScript fonts with glyph names in
the AGL.
This feature can be activated by using any of PDFlib’s internal code pages, or supplying
a suitable custom encoding or code page file (see Section 4.4.1, »8-Bit Encodings«,
page 89).
8-Bit strings for addressing Unicode segments. PDFlib supports an abbreviated format
which can be used to address up to 256 consecutive Unicode characters starting at an ar-
bitrary offset between U+0000 and U+FFFF. This can be used to easily access a small
range of Unicode characters while still working with 8-bit characters.
This feature can be activated by using the string U+XXXX as the encoding parameter
for PDF_load_font( ), where XXXX denotes a hexadecimal offset. The 8-bit character value
will be added to the supplied offset. For example, using the encoding U+0400
will select the Cyrillic Unicode section, and 8-bit strings supplied to the text functions
will select the Unicode characters U+0400, U+0401, etc.
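The offset arithmetic can be sketched as follows. The function unicode_from_offset_encoding( ) is a hypothetical illustration of the rule "offset plus 8-bit value"; it is not part of the PDFlib API, and it does not model how PDFlib treats sums beyond U+FFFF.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: compute the Unicode value addressed by an 8-bit
 * code under an encoding name of the form U+XXXX.
 * Returns -1 if the encoding name does not have this form. */
long unicode_from_offset_encoding(const char *encoding, unsigned char code)
{
    unsigned long offset;
    int i;

    if (strlen(encoding) != 6 || encoding[0] != 'U' || encoding[1] != '+')
        return -1;
    for (i = 2; i < 6; i++)
        if (!isxdigit((unsigned char) encoding[i]))
            return -1;

    offset = strtoul(encoding + 2, NULL, 16);
    return (long) (offset + code);      /* 8-bit value is added to the offset */
}
```

For the encoding U+0400 the 8-bit codes 0x00, 0x01, and 0x10 map to U+0400, U+0401, and U+0410, respectively.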
Proper Unicode values for cut-and-paste and find operations. PDFlib will include addi-
tional information (a ToUnicode CMap) in the PDF output which helps Acrobat in assign-
ing proper Unicode values for exporting text (e.g., via the clipboard) and searching for
text. By default ToUnicode CMaps will be generated for all supported font types, but
they can only be included if Unicode information is available for a given font/encoding
combination. While this is the case for most font/encoding combinations, user-defined
Type 3 fonts, for example, may be missing Unicode information. In this case PDFlib will
not be able to generate a ToUnicode CMap, and text export or searching will not work in
Acrobat.
1. See www.unicode.org
Generation of a ToUnicode CMap can be globally disabled with the unicodemap pa-
rameter, or on a per-font basis with the PDF_load_font( ) option of the same name. The
default of this parameter/option is true. Setting it to false will decrease the output file
size while potentially disabling proper cut-and-paste support in Acrobat.
Unicode for hypertext strings. Unicode can be supplied for various hypertext ele-
ments, such as bookmarks, contents and title of note annotations (see Figure 4.1), stan-
dard and user-defined document information field contents, description and author of
file attachments.
While PDF supports only Unicode in big-endian UTF-16 format and PDFDocEncoding
(which is a superset of ISO 8859-1) for hypertext elements, PDFlib supports all 8-bit and
Unicode-based encodings as well as system-installed code pages which are allowed for
PDF_load_font( ), and will automatically apply any required conversions.
4.5.2 Content Strings, Hypertext Strings, and Name Strings
There are different string types in the PDFlib API depending on their usage:
>Content strings: these will be used to create genuine page content (page descriptions)
according to the encoding chosen by the user for a particular font. All text parameters
of the page content functions in Section 8.3.4, »Simple Text Output«, page
219, and Section 8.3.5, »Multi-Line Text Output with Textflows«, page 227, fall in this
class.
>Hypertext strings: these are mostly used for hypertext functions such as bookmarks
and annotations, and are explicitly labeled Hypertext string in the function descrip-
tions. Many parameters and options of the functions in Section 8.9, »Hypertext
Functions«, page 278, fall in this class, as well as some others.
>Name strings: these are used for external file names, font names, block names, etc.,
and are marked as name string in the function descriptions. They slightly differ from
Hypertext strings, but only in languages which are not Unicode-aware.
Fig. 4.1 Unicode bookmarks (left) and Unicode text annotations (right)
Replacement mechanism for Unicode code points with unavailable glyphs. Content
strings will be visualized on the page with a particular font. However, no font contains
all characters contained in the latest Unicode standard. While obtaining suitable fonts is
obviously a task of the PDFlib user, PDFlib tries to work around some common prob-
lems by substituting certain characters with visually similar glyphs if the original glyph
is not available in the font, and the glyphwarning option is set to false. The following (in-
complete) list contains some of these glyph mappings. If the first character in the list is
unavailable in a font, it will automatically be replaced with the second:
In addition to the builtin table the fullwidth characters U+FF01 to U+FF5E will be re-
placed with the corresponding ISO 8859-1 characters (i.e. U+0021 to U+007E) if the full-
width variants are not available in a font.
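The fullwidth replacement amounts to a fixed offset between the two ranges. The sketch below shows only this mapping; the function name fullwidth_fallback( ) is hypothetical, and the check whether the fullwidth glyph is actually missing from the font is left to the caller.

```c
/* Hypothetical helper: map a fullwidth character U+FF01..U+FF5E to the
 * corresponding character U+0021..U+007E; other values are returned
 * unchanged. */
unsigned long fullwidth_fallback(unsigned long cp)
{
    if (cp >= 0xFF01UL && cp <= 0xFF5EUL)
        return cp - 0xFEE0UL;   /* 0xFF01 - 0x0021 == 0xFEE0 */
    return cp;
}
```

For example, the fullwidth letter A (U+FF21) is mapped to U+0041.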
4.5.3 String Handling in Unicode-capable Languages
The following PDFlib language bindings are Unicode-capable:
>Java
>Tcl
String handling in these environments is straightforward: all strings will automatically
be provided to the PDFlib kernel as Unicode strings in UTF-16 format. The language
wrappers will correctly deal with Unicode strings provided by the client, and automati-
cally set certain PDFlib parameters. This has the following consequences:
>Since the language wrapper automatically sets the textformat, hypertextformat, and
hypertextencoding parameters, these are not accessible by the client, and must not be
used. The PDFlib language wrapper applies all required conversions so that client-supplied
hypertext strings will always arrive in PDFlib in utf16 format and unicode
encoding.
>Since the language environment always passes strings in UTF-16 to PDFlib, UTF-8
cannot be used with Unicode-capable languages. It must be converted to UTF-16 beforehand,
using the native methods provided by the environment.
>Using unicode encoding for page descriptions is the easiest way to deal with encod-
ings in Unicode-aware languages.
>Non-Unicode CMaps for standard CJK fonts on page descriptions must be avoided
since the wrapper will always supply Unicode to the PDFlib core; only Unicode
CMaps can be used.
The overall effect is that clients can provide plain Unicode strings to PDFlib functions
without any additional configuration or parameter settings.
4.5.4 String Handling in non-Unicode-capable Languages
Note This section does not apply to the Unicode-capable languages Java and Tcl.
The following PDFlib language bindings are not Unicode-capable:
>Cobol
>C
>C++
>Perl
>PHP
>Python
>RPG
Although Unicode text can be used in these languages, handling of the various string
types is a bit more complicated:
>Content strings: These are strings used to create genuine page content. Interpretation
of these strings is controlled by the textformat parameter (detailed below) and the
encoding parameter of PDF_load_font( ). If textformat=auto (which is the default) utf16
format will be used for the unicode and glyphid encodings as well as UCS-2 CMaps. For
all other encodings the format will be bytes. The length of UTF-16 strings must be
supplied in a separate length parameter.
>Hypertext strings: string interpretation is controlled by the hypertextformat and hyper-
textencoding parameters (detailed below). If hypertextformat=auto (which is the de-
fault) utf16 format will be used if hypertextencoding=unicode, and bytes otherwise. In
languages which do not support native string objects (Cobol, C, and RPG) the length
of UTF-16 strings must be supplied in a separate length parameter.
>Name strings: these are interpreted slightly differently from page description strings,
depending on the length parameter and the existence of a BOM at the beginning of
the string. In C, if the length parameter is different from 0 the string will be interpret-
ed as UTF-16. Otherwise (i.e., if the length parameter is 0, the function doesn’t provide
one, or a language other than C is used) it will be interpreted as UTF-8 if it starts with
a UTF-8 BOM, or as EBCDIC UTF-8 if it starts with an EBCDIC UTF-8 BOM, or as host if
no BOM is found (or ebcdic on EBCDIC-based platforms).
Strings in option lists. Strings within option lists require special attention since they
cannot be expressed as Unicode strings in UTF-16 format, but only as byte strings. For
this reason UTF-8 is used for Unicode options. By looking for a BOM at the beginning of
an option PDFlib decides how to interpret it. The BOM will be used to determine the for-
mat of the string, and the string type (content string, hypertext string, or name string as
defined above) will be used to determine the appropriate encoding. More precisely, in-
terpreting a string option works as follows:
>If the option starts with a UTF-8 BOM (\xEF\xBB\xBF) it will be interpreted as UTF-8.
>If the option starts with an EBCDIC UTF-8 BOM (\x57\x8B\xAB) it will be interpreted as
EBCDIC UTF-8.
>If no BOM is found, the string will be treated as winansi (or ebcdic on EBCDIC-based
platforms).
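The BOM check for option strings can be sketched in a few lines of C. The enum opt_format and the function detect_option_format( ) are hypothetical names used for illustration only; the sketch reports the detected format and how many BOM bytes to skip, but performs no conversion.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for the BOM check described above. */
typedef enum { OPT_UTF8, OPT_EBCDIC_UTF8, OPT_NO_BOM } opt_format;

/* Inspect the first bytes of an option string. *skip receives the number
 * of BOM bytes (3 or 0) which precede the actual text. */
opt_format detect_option_format(const char *s, size_t len, size_t *skip)
{
    static const unsigned char utf8_bom[3]   = { 0xEF, 0xBB, 0xBF };
    static const unsigned char ebcdic_bom[3] = { 0x57, 0x8B, 0xAB };

    *skip = 0;
    if (len >= 3 && memcmp(s, utf8_bom, 3) == 0) {
        *skip = 3;
        return OPT_UTF8;
    }
    if (len >= 3 && memcmp(s, ebcdic_bom, 3) == 0) {
        *skip = 3;
        return OPT_EBCDIC_UTF8;
    }
    return OPT_NO_BOM;      /* caller treats the string as winansi or ebcdic */
}
```

Note that in C source code the BOM should be written as a separate string literal (e.g. "\xEF\xBB\xBF" "text"), since a hexadecimal escape would otherwise absorb following hex digits.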
Note The PDF_utf16_to_utf8( ) utility function can be used to create UTF-8 strings from UTF-16
strings, which is useful for creating option lists with Unicode values.
Text Format for Unicode Strings. The Unicode standard supports several transforma-
tion formats for storing the actual byte values which comprise a Unicode string. These
vary in the number of bytes per character and the ordering of bytes within a character.
Unicode strings in PDFlib can be supplied in UTF-8 or UTF-16 formats with any byte or-
dering. This can be controlled with the textformat parameter for all text on page descrip-
tions, and the hypertextformat parameter for all hypertext elements. Table 4.3 lists the
values which are supported for both of these parameters.
The default setting for the textformat parameter is utf16 for Unicode-capable languages,
and auto otherwise.
Although the textformat setting is in effect for all encodings, it will be most useful for
unicode encoding. Table 4.4 details the interpretation of text strings for various combi-
nations of font encodings and textformat settings.
Table 4.3 Text formats
textformat | explanation
bytes | One byte in the string corresponds to one character. This is mainly useful for 8-bit encodings.
utf8 | Strings are expected in UTF-8 format.
ebcdicutf8 | Strings are expected in EBCDIC-coded UTF-8 format (only on iSeries and zSeries).
utf16 | Strings are expected in UTF-16 format. A Unicode Byte Order Mark (BOM) at the start of the string will be evaluated and then removed. If no BOM is present the string is expected in the machine’s native byte ordering (on Intel x86 architectures the native byte order is little-endian, while on Sparc and PowerPC systems it is big-endian).
utf16be | Strings are expected in UTF-16 format in big-endian byte ordering. There is no special treatment for Byte Order Marks.
utf16le | Strings are expected in UTF-16 format in little-endian byte ordering. There is no special treatment for Byte Order Marks.
auto | Equivalent to bytes for 8-bit encodings, and utf16 for wide-character addressing (unicode, glyphid, or a UCS2 or UTF16 CMap). This setting will provide proper text interpretation in most environments which do not use Unicode natively.
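The BOM handling described for textformat=utf16 can be illustrated with the following sketch. All names (byte_order, utf16_byte_order( ), utf16_unit( )) are hypothetical and not part of the PDFlib API; the code only shows how a BOM selects the byte order and how the native order serves as fallback.

```c
#include <stddef.h>

typedef enum { ORDER_BE, ORDER_LE } byte_order;

/* Determine the byte order of the machine at runtime. */
static byte_order native_order(void)
{
    const unsigned short probe = 1;

    return *(const unsigned char *) &probe == 1 ? ORDER_LE : ORDER_BE;
}

/* Evaluate a BOM at the start of a UTF-16 byte string. *offset receives
 * the number of bytes to skip (2 if a BOM was found, 0 otherwise).
 * Without a BOM the native byte order is assumed. */
byte_order utf16_byte_order(const unsigned char *s, size_t len, size_t *offset)
{
    *offset = 0;
    if (len >= 2) {
        if (s[0] == 0xFE && s[1] == 0xFF) {
            *offset = 2;
            return ORDER_BE;
        }
        if (s[0] == 0xFF && s[1] == 0xFE) {
            *offset = 2;
            return ORDER_LE;
        }
    }
    return native_order();
}

/* Read one 16-bit code unit from two bytes in the given byte order. */
unsigned int utf16_unit(const unsigned char *s, byte_order order)
{
    return order == ORDER_BE
        ? ((unsigned int) s[0] << 8) | s[1]
        : ((unsigned int) s[1] << 8) | s[0];
}
```

The byte sequences FE FF 20 AC and FF FE AC 20 both decode to the Euro character U+20AC with this sketch.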
Table 4.4 Relationship of font encodings and text format
font encoding | textformat = bytes | textformat = utf8, utf16, utf16be, or utf16le
8-bit, or builtin encoding for TTF/OTF | 8-bit codes | convert Unicode values to 8-bit codes according to the chosen encoding¹
builtin encoding for PostScript fonts | 8-bit codes | only in Unicode-capable language bindings. PDFlib will throw an exception otherwise
U+XXXX | 8-bit codes will be added to the offset XXXX to address Unicode values | convert Unicode values to 8-bit codes according to the chosen Unicode offset
glyphid | 8-bit codes address glyph ids from 0 to 255 | Unicode values will be interpreted as glyph ids²
unicode and UCS2 or UTF16 CMaps | 8-bit codes address Unicode values from U+0000 to U+00FF | any Unicode value, encoded according to the chosen text format¹
any other CMap (not Unicode-based) | any single- or multibyte codes according to the chosen CMap | only in Unicode-capable language bindings. PDFlib will throw an exception otherwise
Hypertext encoding. The hypertextencoding parameter works analogously to the
encoding parameter of PDF_load_font( ), and controls the 8-bit encoding of hypertext
strings. It accepts most encoding names known to PDFlib, including auto (see Section
4.4, »Encoding Details«, page 89). Note that glyphid, builtin, and CMap names are not
allowed for this parameter. The default setting for the hypertextencoding parameter is
Hypertext format. Similarly to the textformat parameter, the format of hypertext
strings can be controlled with the hypertextformat parameter. However, interpretation
of the allowed values is somewhat different for the hypertextformat parameter. While
utf8, utf16, utf16be, and utf16le have the same meaning as for the textformat parameter,
the behavior of bytes and auto is slightly different:
>auto: UTF-16 strings with big-endian BOM will be detected (in C such strings must be
terminated with a double-null), and Unicode output will be generated. If the string
does not start with a big-endian BOM it will be interpreted as an 8-bit encoded string
according to the hypertextencoding parameter (see above). If it contains at least one
character which is not contained in PDFDocEncoding, the complete string will be
converted to a big-endian UTF-16 string, and written to the PDF output as Unicode.
Otherwise it will be written to the PDF output as 8-bit encoded PDFDocEncoding text.
>bytes: one byte in the string corresponds to one character, and the string will be out-
put without any interpretation. This is mainly useful for 8-bit encodings. In addi-
tion, UTF-16 strings with big-endian BOM will automatically be detected. In C, such
strings must be terminated with a double-null unless the length in bytes is explicitly
supplied in the respective function call.
The default setting for the hypertextformat parameter is auto.
4.5.5 Character References
Some environments require the programmer to write source code in 8-bit encodings
(such as winansi, macroman, or ebcdic). This makes it cumbersome to include isolated
Unicode characters in 8-bit encoded text without changing all characters in the text to
multi-byte encoding. In order to aid developers in this situation, PDFlib supports char-
acter references, a method known from markup languages such as SGML and HTML.
HTML-style character references. PDFlib supports all numeric character references and
character entity references defined in HTML 4.0¹. Numeric character references can be
supplied in decimal or hexadecimal notation for the character’s Unicode value.
Note Code points 128-159 (decimal) or 0x80-0x9F (hexadecimal) do not reference winansi code
points. In Unicode they do not refer to printable characters, but only control characters.
1. (Table 4.4) If the Unicode character is not available in the font PDFlib will issue a warning and replace it with the space character
(this can be controlled via the glyphwarning parameter).
2. (Table 4.4) If the glyph id is not available in the font PDFlib will issue a warning and replace it with glyph id 0.
1. (HTML character references) See www.w3.org/TR/REC-html40/charset.html#h-5.3
The following are examples for valid character references along with a description of
the resulting character:
&#173; soft hyphen
&#xAD; soft hyphen
&shy; soft hyphen
&#229; letter a with small circle above (decimal)
&#xE5; letter a with small circle above (hexadecimal, lowercase x)
&#Xe5; letter a with small circle above (hexadecimal, uppercase X)
&#x20AC; Euro glyph (hexadecimal)
&#8364; Euro glyph (decimal)
&euro; Euro glyph (entity name)
&lt; less than sign
&gt; greater than sign
&amp; ampersand sign
&Alpha; Greek Alpha
Note Although you can reference any Unicode character with character references (e.g. Greek char-
acters and mathematical symbols), the font will not automatically be switched. In order to ac-
tually use such characters you must explicitly select an appropriate font if the current font does
not contain the specified characters.
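The notation of the references shown above can be illustrated with a small parser. This is a sketch and not PDFlib's implementation: parse_charref( ) is a hypothetical name, and the entity table contains only the names used in the examples above instead of the full HTML entity list.

```c
#include <stddef.h>
#include <string.h>

/* Parse the digits in [p, end) in base 10 or 16. Returns -1 for empty or
 * malformed input, or for values above U+10FFFF. */
static long parse_digits(const char *p, const char *end, int base)
{
    long v = 0;

    if (p == end)
        return -1;
    for (; p < end; p++) {
        int d;

        if (*p >= '0' && *p <= '9')
            d = *p - '0';
        else if (base == 16 && *p >= 'a' && *p <= 'f')
            d = *p - 'a' + 10;
        else if (base == 16 && *p >= 'A' && *p <= 'F')
            d = *p - 'A' + 10;
        else
            return -1;
        v = v * base + d;
        if (v > 0x10FFFFL)
            return -1;
    }
    return v;
}

/* Hypothetical helper: parse one character reference at the start of s.
 * Returns the Unicode value and stores the length of the reference in
 * *consumed, or returns -1 (with *consumed == 0) if s does not start
 * with a reference known to this sketch. */
long parse_charref(const char *s, size_t *consumed)
{
    static const struct { const char *name; long cp; } entities[] = {
        { "shy", 0x00AD }, { "euro", 0x20AC }, { "lt", 0x003C },
        { "gt", 0x003E }, { "amp", 0x0026 }, { "Alpha", 0x0391 }
    };
    const char *semi;
    size_t i, len;
    long cp = -1;

    *consumed = 0;
    if (s[0] != '&' || (semi = strchr(s, ';')) == NULL)
        return -1;

    if (s[1] == '#') {
        if (s[2] == 'x' || s[2] == 'X')
            cp = parse_digits(s + 3, semi, 16);     /* hexadecimal */
        else
            cp = parse_digits(s + 2, semi, 10);     /* decimal */
    } else {
        len = (size_t) (semi - (s + 1));
        for (i = 0; i < sizeof entities / sizeof entities[0]; i++) {
            if (strlen(entities[i].name) == len
                && strncmp(s + 1, entities[i].name, len) == 0) {
                cp = entities[i].cp;
                break;
            }
        }
    }
    if (cp >= 0)
        *consumed = (size_t) (semi - s) + 1;
    return cp;
}
```

With this sketch the references &#x20AC;, &#8364;, and &euro; all yield the value 0x20AC.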
Additional references for control characters in Textflows. In addition to the HTML-style
references above PDFlib supports custom character entity references which can be
used to specify control characters for textflows. Table 4.5 lists these additional character
references.
Table 4.5 Control characters and their meaning in Textflows
Unicode character | entity name | equiv. textflow option | meaning within textflows in Unicode-compatible fonts
U+0020 | SP, space | space | align words and break lines
U+00A0 | NBSP, nbsp | (none) | (no-break space) space character which will not break lines
U+0009 | HT, hortab | (none) | horizontal tab: will be processed according to the ruler, tabalignchar, and tabalignment options
U+002D | HY, hyphen | (none) | separator character for hyphenated words
U+00AD | SHY, shy | (none) | (soft hyphen) hyphenation opportunity, only visible at line breaks
U+000B; U+2028 | VT, verttab; LS, linesep | nextline | (next line) forces a new line
U+000A; U+000D; U+000D and U+000A; U+0085; U+2029 | LF, linefeed; CR, return; NEL, newline; PS, parasep | | (next paragraph) Same effect as »next line«; in addition, the parindent option will affect the next
U+000C | FF, formfeed | return | end of a paragraph; the function PDF_fit_textflow( ) will return the string _nextpage.
Using character references. Character references can be used in all content strings, hypertext
strings, and name strings, e.g. in text which will be placed on the page using the
show or textflow functions, as well as in text supplied to the hypertext functions.
Character references will not be converted by default; you must explicitly set the
charref parameter to true if you want to use character references in all content strings:
PDF_set_parameter(p, "charref", "true");
Character references can also be enabled for textflow processing by supplying the
charref option to PDF_create_textflow( ) (either directly or as an inline option), PDF_fit_
textline( ), or PDF_fill_textblock( ).
When character references are enabled you can supply numeric or entity references
in 8-bit-encoded text:
PDF_set_parameter(p, "charref", "true");
PDF_set_parameter(p, "textformat", "bytes");
font = PDF_load_font(p, "Helvetica", 0, "unicode", "");
PDF_setfont(p, font, 24);
PDF_show_xy(p, "Price: 500&euro;", 50, 500);
Character references will not be substituted in option lists, but they will be recognized
in options with the Unichar data type (see Section 3.1.4, »Option Lists«, page 48). This
recognition will always be active; it is not subject to the charref parameter.
4.5.6 Unicode-compatible Fonts
Precise Unicode semantics are important for PDFlib’s internal processing, and crucial
for properly extracting text from a PDF document, or otherwise reusing the document,
e.g., converting the contents to another format. This is especially important when creating
Tagged PDF which has strict requirements regarding Unicode compliance (see Section
7.5.1, »Generating Tagged PDF with PDFlib«, page 185). In addition to Tagged PDF,
Unicode compatibility is relevant for the textflow feature.
Unicode-compatible fonts. A font loaded with PDF_load_font( ) – more precisely: a
combination of font and encoding – is considered Unicode-compatible if the encoding
used for loading the font complies with all of the following conditions:
>The encoding builtin is only allowed for the Symbol and ZapfDingbats fonts and
PostScript-based OpenType fonts.
>The encoding is not glyphid.
>If the encoding is one of the predefined CMaps in Table 4.7 it must be one of the UCS2
or UTF16 CMaps.
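The three conditions above can be sketched as a small predicate. This is an illustration under stated assumptions, not part of the PDFlib API: the font_spec structure and is_unicode_compatible( ) are hypothetical, the caller must say whether the font is a PostScript-based OpenType font and whether the encoding is one of the predefined CMaps, and the sketch assumes that UCS2 and UTF16 CMaps can be recognized by the substring UCS2 or UTF16 in their name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical description of a font/encoding pair (not a PDFlib type). */
typedef struct {
    const char *fontname;     /* e.g. "Helvetica" */
    const char *encoding;     /* e.g. "unicode", "builtin", "glyphid", or a CMap name */
    int is_ps_opentype;       /* nonzero for PostScript-based OpenType fonts */
    int is_predefined_cmap;   /* nonzero if encoding names a predefined CMap */
} font_spec;

/* Returns 1 if the font/encoding pair meets all three conditions, else 0. */
int is_unicode_compatible(const font_spec *f)
{
    assert(f != NULL && f->fontname != NULL && f->encoding != NULL);

    /* Condition 1: builtin only for Symbol, ZapfDingbats, and
     * PostScript-based OpenType fonts. */
    if (strcmp(f->encoding, "builtin") == 0
        && strcmp(f->fontname, "Symbol") != 0
        && strcmp(f->fontname, "ZapfDingbats") != 0
        && !f->is_ps_opentype)
        return 0;

    /* Condition 2: the encoding is not glyphid. */
    if (strcmp(f->encoding, "glyphid") == 0)
        return 0;

    /* Condition 3: a predefined CMap must be a UCS2 or UTF16 CMap.
     * Assumption: such CMaps carry "UCS2" or "UTF16" in their name. */
    if (f->is_predefined_cmap
        && strstr(f->encoding, "UCS2") == NULL
        && strstr(f->encoding, "UTF16") == NULL)
        return 0;

    return 1;
}
```

For example, a font_spec for Helvetica with the unicode encoding passes the check, while Helvetica with builtin or glyphid does not, and a predefined CMap such as 90ms-RKSJ-H fails where UniJIS-UCS2-H passes.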
Unicode-compatible output. The output must be Unicode-compatible if you want to make
sure that text can reliably be extracted from the generated PDF, and also when generating
Tagged PDF. PDF output created with PDFlib will be Unicode-compatible if all of the
following conditions are true:
>All fonts used in the document must be Unicode-compatible as defined above, or use
one of the predefined CMaps in Table 4.7.
>If the encoding has been constructed with PDF_encoding_set_char( ) and glyph names
without corresponding Unicode values, or loaded from an encoding file, all glyph
names must be contained in the Adobe Glyph List or the list of well-known glyph
names in the Symbol font.
>The unicodemap parameter or option is true.
>All text strings must have clearly defined semantics according to the Unicode
standard, i.e. characters from the Private Use Area (PUA) are not allowed.
>PDF pages imported with PDI must be Unicode-compatible. PDI does not change the
Unicode compatibility status of imported pages: it will neither remove nor add
Unicode information.
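The condition regarding the Private Use Area can be checked before text is handed to the library. The following minimal sketch assumes that the text is available as an array of Unicode code points; is_private_use( ) and first_private_use( ) are hypothetical helpers, not PDFlib functions. The ranges are those which the Unicode standard reserves for private use: U+E000 to U+F8FF in the Basic Multilingual Plane, plus U+F0000 to U+FFFFD and U+100000 to U+10FFFD in planes 15 and 16.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not PDFlib API): returns 1 if cp lies in one of
 * the Private Use Areas defined by the Unicode standard, else 0. */
int is_private_use(unsigned long cp)
{
    return (cp >= 0xE000UL && cp <= 0xF8FFUL)        /* BMP PUA */
        || (cp >= 0xF0000UL && cp <= 0xFFFFDUL)      /* plane 15 */
        || (cp >= 0x100000UL && cp <= 0x10FFFDUL);   /* plane 16 */
}

/* Hypothetical helper (not PDFlib API): returns the index of the first
 * PUA code point in text, or -1 if the text contains none. */
long first_private_use(const unsigned long *text, size_t len)
{
    size_t i;

    assert(text != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (is_private_use(text[i]))
            return (long) i;
    }
    return -1;
}
```

A result other than -1 indicates a text string which would prevent Unicode-compatible output according to the condition above.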
When creating Tagged PDF output, text portions which violate these rules can still be
made Unicode-compatible by supplying proper Unicode text with the ActualText option
in PDF_begin_item( ).
4.6 Text Metrics and Text Variations
4.6.1 Font and Character Metrics
Text position. PDFlib maintains the text position independently from the current </